pax_global_header00006660000000000000000000000064125554366620014530gustar00rootroot0000000000000052 comment=cd70ddee5fb32a1a67f4599c56d2effebc337a4d mothur-1.36.1/000077500000000000000000000000001255543666200131365ustar00rootroot00000000000000mothur-1.36.1/.gitignore000066400000000000000000000003211255543666200151220ustar00rootroot00000000000000*.logfile *.o *.pbxproj *.zip .DS_Store .idea build xcuserdata project.xcworkspace *.xcuserdata TARGET_BUILD_DIRTARGET_BUILD_DIR *.xcuserdata *.xml nbproject/ mothur Mothur.1 *.stdout uchime *.stderr *.pat mothur-1.36.1/INSTALL.md000066400000000000000000000023031255543666200145640ustar00rootroot00000000000000
#For Unix / Linux / CentOS
#Current makefile compatible with Boost 1.58.0

1. Boost requires some things to be installed on your machine already. Most come standard on many flavors of Unix, but you may need to install the devel packages. Install libz, bzip2, and python if they are not already on your machine, including the zlib-devel, bzip2-devel, and python-devel packages. This can be done easily with yum or apt-get.

2. Download Boost, http://www.boost.org

3. Follow their install instructions, http://www.boost.org/doc/libs/1_58_0/more/getting_started/unix-variants.html#easy-build-and-install

    ./bootstrap.sh --prefix=/usr/local/
    ./b2 install

4. Edit mothur's makefile to indicate your Boost installation location. This should match the setting of --prefix in the Boost install.

    #From the boost install
    ./bootstrap.sh --prefix=/usr/local/

    #Mothur's makefile
    BOOST_INCLUDE_DIR="/usr/local/include/"
    BOOST_LIBRARY_DIR="/usr/local/lib/"

5. Copy libz.a into BOOST_LIBRARY_DIR.

6. Run make. If you get linking errors, it is likely because the zlib files were not found correctly. You may need to add gzip.cpp and zlib.cpp to mothur's source folder. Both are located in boost_versionNumber/libs/iostreams/src/ (for example, boost_versionNumber/libs/iostreams/src/gzip.cpp).
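
The relationship in step 4 between the Boost `--prefix` and the two makefile settings can be sketched in shell. This is only an illustration: the `PREFIX` variable is a hypothetical name introduced here, and the script only prints the two lines to paste into mothur's makefile, without editing the makefile or running any build.

```shell
#!/bin/sh
# Hypothetical variable: the same value passed to ./bootstrap.sh --prefix=...
PREFIX="/usr/local/"

# Strip a trailing slash (if any) so the joined paths contain no "//".
BASE="${PREFIX%/}"

# Boost installs headers under <prefix>/include and libraries under <prefix>/lib.
BOOST_INCLUDE_DIR="${BASE}/include/"
BOOST_LIBRARY_DIR="${BASE}/lib/"

# Print the lines in the form shown in step 4, ready to copy into the makefile.
echo "BOOST_INCLUDE_DIR=\"${BOOST_INCLUDE_DIR}\""
echo "BOOST_LIBRARY_DIR=\"${BOOST_LIBRARY_DIR}\""
```

If Boost was installed under a different prefix, changing `PREFIX` alone keeps both makefile values consistent with it.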
mothur-1.36.1/LICENSE.md000066400000000000000000000763101255543666200145510ustar00rootroot00000000000000GNU GENERAL PUBLIC LICENSE Version 3, 29 June 2007 Copyright © 2007 Free Software Foundation, Inc. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The GNU General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users. We, the Free Software Foundation, use the GNU General Public License for most of our software; it applies also to any other work released this way by its authors. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for them if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs, and that you know you can do these things. To protect your rights, we need to prevent others from denying you these rights or asking you to surrender the rights. Therefore, you have certain responsibilities if you distribute copies of the software, or if you modify it: responsibilities to respect the freedom of others. For example, if you distribute copies of such a program, whether gratis or for a fee, you must pass on to the recipients the same freedoms that you received. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. 
Developers that use the GNU GPL protect your rights with two steps: (1) assert copyright on the software, and (2) offer you this License giving you legal permission to copy, distribute and/or modify it. For the developers' and authors' protection, the GPL clearly explains that there is no warranty for this free software. For both users' and authors' sake, the GPL requires that modified versions be marked as changed, so that their problems will not be attributed erroneously to authors of previous versions. Some devices are designed to deny users access to install or run modified versions of the software inside them, although the manufacturer can do so. This is fundamentally incompatible with the aim of protecting users' freedom to change the software. The systematic pattern of such abuse occurs in the area of products for individuals to use, which is precisely where it is most unacceptable. Therefore, we have designed this version of the GPL to prohibit the practice for those products. If such problems arise substantially in other domains, we stand ready to extend this provision to those domains in future versions of the GPL, as needed to protect the freedom of users. Finally, every program is threatened constantly by software patents. States should not allow patents to restrict development and use of software on general-purpose computers, but in those that do, we wish to avoid the special danger that patents applied to a free program could make it effectively proprietary. To prevent this, the GPL assures that patents cannot be used to render the program non-free. The precise terms and conditions for copying, distribution and modification follow. TERMS AND CONDITIONS 0. Definitions. “This License” refers to version 3 of the GNU General Public License. “Copyright” also means copyright-like laws that apply to other kinds of works, such as semiconductor masks. “The Program” refers to any copyrightable work licensed under this License. 
Each licensee is addressed as “you”. “Licensees” and “recipients” may be individuals or organizations. To “modify” a work means to copy from or adapt all or part of the work in a fashion requiring copyright permission, other than the making of an exact copy. The resulting work is called a “modified version” of the earlier work or a work “based on” the earlier work. A “covered work” means either the unmodified Program or a work based on the Program. To “propagate” a work means to do anything with it that, without permission, would make you directly or secondarily liable for infringement under applicable copyright law, except executing it on a computer or modifying a private copy. Propagation includes copying, distribution (with or without modification), making available to the public, and in some countries other activities as well. To “convey” a work means any kind of propagation that enables other parties to make or receive copies. Mere interaction with a user through a computer network, with no transfer of a copy, is not conveying. An interactive user interface displays “Appropriate Legal Notices” to the extent that it includes a convenient and prominently visible feature that (1) displays an appropriate copyright notice, and (2) tells the user that there is no warranty for the work (except to the extent that warranties are provided), that licensees may convey the work under this License, and how to view a copy of this License. If the interface presents a list of user commands or options, such as a menu, a prominent item in the list meets this criterion. 1. Source Code. The “source code” for a work means the preferred form of the work for making modifications to it. “Object code” means any non-source form of a work. 
A “Standard Interface” means an interface that either is an official standard defined by a recognized standards body, or, in the case of interfaces specified for a particular programming language, one that is widely used among developers working in that language. The “System Libraries” of an executable work include anything, other than the work as a whole, that (a) is included in the normal form of packaging a Major Component, but which is not part of that Major Component, and (b) serves only to enable use of the work with that Major Component, or to implement a Standard Interface for which an implementation is available to the public in source code form. A “Major Component”, in this context, means a major essential component (kernel, window system, and so on) of the specific operating system (if any) on which the executable work runs, or a compiler used to produce the work, or an object code interpreter used to run it. The “Corresponding Source” for a work in object code form means all the source code needed to generate, install, and (for an executable work) run the object code and to modify the work, including scripts to control those activities. However, it does not include the work's System Libraries, or general-purpose tools or generally available free programs which are used unmodified in performing those activities but which are not part of the work. For example, Corresponding Source includes interface definition files associated with source files for the work, and the source code for shared libraries and dynamically linked subprograms that the work is specifically designed to require, such as by intimate data communication or control flow between those subprograms and other parts of the work. The Corresponding Source need not include anything that users can regenerate automatically from other parts of the Corresponding Source. The Corresponding Source for a work in source code form is that same work. 2. Basic Permissions. 
All rights granted under this License are granted for the term of copyright on the Program, and are irrevocable provided the stated conditions are met. This License explicitly affirms your unlimited permission to run the unmodified Program. The output from running a covered work is covered by this License only if the output, given its content, constitutes a covered work. This License acknowledges your rights of fair use or other equivalent, as provided by copyright law. You may make, run and propagate covered works that you do not convey, without conditions so long as your license otherwise remains in force. You may convey covered works to others for the sole purpose of having them make modifications exclusively for you, or provide you with facilities for running those works, provided that you comply with the terms of this License in conveying all material for which you do not control copyright. Those thus making or running the covered works for you must do so exclusively on your behalf, under your direction and control, on terms that prohibit them from making any copies of your copyrighted material outside their relationship with you. Conveying under any other circumstances is permitted solely under the conditions stated below. Sublicensing is not allowed; section 10 makes it unnecessary. 3. Protecting Users' Legal Rights From Anti-Circumvention Law. No covered work shall be deemed part of an effective technological measure under any applicable law fulfilling obligations under article 11 of the WIPO copyright treaty adopted on 20 December 1996, or similar laws prohibiting or restricting circumvention of such measures. 
When you convey a covered work, you waive any legal power to forbid circumvention of technological measures to the extent such circumvention is effected by exercising rights under this License with respect to the covered work, and you disclaim any intention to limit operation or modification of the work as a means of enforcing, against the work's users, your or third parties' legal rights to forbid circumvention of technological measures. 4. Conveying Verbatim Copies. You may convey verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice; keep intact all notices stating that this License and any non-permissive terms added in accord with section 7 apply to the code; keep intact all notices of the absence of any warranty; and give all recipients a copy of this License along with the Program. You may charge any price or no price for each copy that you convey, and you may offer support or warranty protection for a fee. 5. Conveying Modified Source Versions. You may convey a work based on the Program, or the modifications to produce it from the Program, in the form of source code under the terms of section 4, provided that you also meet all of these conditions: a) The work must carry prominent notices stating that you modified it, and giving a relevant date. b) The work must carry prominent notices stating that it is released under this License and any conditions added under section 7. This requirement modifies the requirement in section 4 to “keep intact all notices”. c) You must license the entire work, as a whole, under this License to anyone who comes into possession of a copy. This License will therefore apply, along with any applicable section 7 additional terms, to the whole of the work, and all its parts, regardless of how they are packaged. 
This License gives no permission to license the work in any other way, but it does not invalidate such permission if you have separately received it. d) If the work has interactive user interfaces, each must display Appropriate Legal Notices; however, if the Program has interactive interfaces that do not display Appropriate Legal Notices, your work need not make them do so. A compilation of a covered work with other separate and independent works, which are not by their nature extensions of the covered work, and which are not combined with it such as to form a larger program, in or on a volume of a storage or distribution medium, is called an “aggregate” if the compilation and its resulting copyright are not used to limit the access or legal rights of the compilation's users beyond what the individual works permit. Inclusion of a covered work in an aggregate does not cause this License to apply to the other parts of the aggregate. 6. Conveying Non-Source Forms. You may convey a covered work in object code form under the terms of sections 4 and 5, provided that you also convey the machine-readable Corresponding Source under the terms of this License, in one of these ways: a) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by the Corresponding Source fixed on a durable physical medium customarily used for software interchange. 
b) Convey the object code in, or embodied in, a physical product (including a physical distribution medium), accompanied by a written offer, valid for at least three years and valid for as long as you offer spare parts or customer support for that product model, to give anyone who possesses the object code either (1) a copy of the Corresponding Source for all the software in the product that is covered by this License, on a durable physical medium customarily used for software interchange, for a price no more than your reasonable cost of physically performing this conveying of source, or (2) access to copy the Corresponding Source from a network server at no charge. c) Convey individual copies of the object code with a copy of the written offer to provide the Corresponding Source. This alternative is allowed only occasionally and noncommercially, and only if you received the object code with such an offer, in accord with subsection 6b. d) Convey the object code by offering access from a designated place (gratis or for a charge), and offer equivalent access to the Corresponding Source in the same way through the same place at no further charge. You need not require recipients to copy the Corresponding Source along with the object code. If the place to copy the object code is a network server, the Corresponding Source may be on a different server (operated by you or a third party) that supports equivalent copying facilities, provided you maintain clear directions next to the object code saying where to find the Corresponding Source. Regardless of what server hosts the Corresponding Source, you remain obligated to ensure that it is available for as long as needed to satisfy these requirements. e) Convey the object code using peer-to-peer transmission, provided you inform other peers where the object code and Corresponding Source of the work are being offered to the general public at no charge under subsection 6d. 
A separable portion of the object code, whose source code is excluded from the Corresponding Source as a System Library, need not be included in conveying the object code work. A “User Product” is either (1) a “consumer product”, which means any tangible personal property which is normally used for personal, family, or household purposes, or (2) anything designed or sold for incorporation into a dwelling. In determining whether a product is a consumer product, doubtful cases shall be resolved in favor of coverage. For a particular product received by a particular user, “normally used” refers to a typical or common use of that class of product, regardless of the status of the particular user or of the way in which the particular user actually uses, or expects or is expected to use, the product. A product is a consumer product regardless of whether the product has substantial commercial, industrial or non-consumer uses, unless such uses represent the only significant mode of use of the product. “Installation Information” for a User Product means any methods, procedures, authorization keys, or other information required to install and execute modified versions of a covered work in that User Product from a modified version of its Corresponding Source. The information must suffice to ensure that the continued functioning of the modified object code is in no case prevented or interfered with solely because modification has been made. If you convey an object code work under this section in, or with, or specifically for use in, a User Product, and the conveying occurs as part of a transaction in which the right of possession and use of the User Product is transferred to the recipient in perpetuity or for a fixed term (regardless of how the transaction is characterized), the Corresponding Source conveyed under this section must be accompanied by the Installation Information. 
But this requirement does not apply if neither you nor any third party retains the ability to install modified object code on the User Product (for example, the work has been installed in ROM). The requirement to provide Installation Information does not include a requirement to continue to provide support service, warranty, or updates for a work that has been modified or installed by the recipient, or for the User Product in which it has been modified or installed. Access to a network may be denied when the modification itself materially and adversely affects the operation of the network or violates the rules and protocols for communication across the network. Corresponding Source conveyed, and Installation Information provided, in accord with this section must be in a format that is publicly documented (and with an implementation available to the public in source code form), and must require no special password or key for unpacking, reading or copying. 7. Additional Terms. “Additional permissions” are terms that supplement the terms of this License by making exceptions from one or more of its conditions. Additional permissions that are applicable to the entire Program shall be treated as though they were included in this License, to the extent that they are valid under applicable law. If additional permissions apply only to part of the Program, that part may be used separately under those permissions, but the entire Program remains governed by this License without regard to the additional permissions. When you convey a copy of a covered work, you may at your option remove any additional permissions from that copy, or from any part of it. (Additional permissions may be written to require their own removal in certain cases when you modify the work.) You may place additional permissions on material, added by you to a covered work, for which you have or can give appropriate copyright permission. 
Notwithstanding any other provision of this License, for material you add to a covered work, you may (if authorized by the copyright holders of that material) supplement the terms of this License with terms: a) Disclaiming warranty or limiting liability differently from the terms of sections 15 and 16 of this License; or b) Requiring preservation of specified reasonable legal notices or author attributions in that material or in the Appropriate Legal Notices displayed by works containing it; or c) Prohibiting misrepresentation of the origin of that material, or requiring that modified versions of such material be marked in reasonable ways as different from the original version; or d) Limiting the use for publicity purposes of names of licensors or authors of the material; or e) Declining to grant rights under trademark law for use of some trade names, trademarks, or service marks; or f) Requiring indemnification of licensors and authors of that material by anyone who conveys the material (or modified versions of it) with contractual assumptions of liability to the recipient, for any liability that these contractual assumptions directly impose on those licensors and authors. All other non-permissive additional terms are considered “further restrictions” within the meaning of section 10. If the Program as you received it, or any part of it, contains a notice stating that it is governed by this License along with a term that is a further restriction, you may remove that term. If a license document contains a further restriction but permits relicensing or conveying under this License, you may add to a covered work material governed by the terms of that license document, provided that the further restriction does not survive such relicensing or conveying. 
If you add terms to a covered work in accord with this section, you must place, in the relevant source files, a statement of the additional terms that apply to those files, or a notice indicating where to find the applicable terms. Additional terms, permissive or non-permissive, may be stated in the form of a separately written license, or stated as exceptions; the above requirements apply either way. 8. Termination. You may not propagate or modify a covered work except as expressly provided under this License. Any attempt otherwise to propagate or modify it is void, and will automatically terminate your rights under this License (including any patent licenses granted under the third paragraph of section 11). However, if you cease all violation of this License, then your license from a particular copyright holder is reinstated (a) provisionally, unless and until the copyright holder explicitly and finally terminates your license, and (b) permanently, if the copyright holder fails to notify you of the violation by some reasonable means prior to 60 days after the cessation. Moreover, your license from a particular copyright holder is reinstated permanently if the copyright holder notifies you of the violation by some reasonable means, this is the first time you have received notice of violation of this License (for any work) from that copyright holder, and you cure the violation prior to 30 days after your receipt of the notice. Termination of your rights under this section does not terminate the licenses of parties who have received copies or rights from you under this License. If your rights have been terminated and not permanently reinstated, you do not qualify to receive new licenses for the same material under section 10. 9. Acceptance Not Required for Having Copies. You are not required to accept this License in order to receive or run a copy of the Program. 
Ancillary propagation of a covered work occurring solely as a consequence of using peer-to-peer transmission to receive a copy likewise does not require acceptance. However, nothing other than this License grants you permission to propagate or modify any covered work. These actions infringe copyright if you do not accept this License. Therefore, by modifying or propagating a covered work, you indicate your acceptance of this License to do so. 10. Automatic Licensing of Downstream Recipients. Each time you convey a covered work, the recipient automatically receives a license from the original licensors, to run, modify and propagate that work, subject to this License. You are not responsible for enforcing compliance by third parties with this License. An “entity transaction” is a transaction transferring control of an organization, or substantially all assets of one, or subdividing an organization, or merging organizations. If propagation of a covered work results from an entity transaction, each party to that transaction who receives a copy of the work also receives whatever licenses to the work the party's predecessor in interest had or could give under the previous paragraph, plus a right to possession of the Corresponding Source of the work from the predecessor in interest, if the predecessor has it or can get it with reasonable efforts. You may not impose any further restrictions on the exercise of the rights granted or affirmed under this License. For example, you may not impose a license fee, royalty, or other charge for exercise of rights granted under this License, and you may not initiate litigation (including a cross-claim or counterclaim in a lawsuit) alleging that any patent claim is infringed by making, using, selling, offering for sale, or importing the Program or any portion of it. 11. Patents. A “contributor” is a copyright holder who authorizes use under this License of the Program or a work on which the Program is based. 
The work thus licensed is called the contributor's “contributor version”. A contributor's “essential patent claims” are all patent claims owned or controlled by the contributor, whether already acquired or hereafter acquired, that would be infringed by some manner, permitted by this License, of making, using, or selling its contributor version, but do not include claims that would be infringed only as a consequence of further modification of the contributor version. For purposes of this definition, “control” includes the right to grant patent sublicenses in a manner consistent with the requirements of this License. Each contributor grants you a non-exclusive, worldwide, royalty-free patent license under the contributor's essential patent claims, to make, use, sell, offer for sale, import and otherwise run, modify and propagate the contents of its contributor version. In the following three paragraphs, a “patent license” is any express agreement or commitment, however denominated, not to enforce a patent (such as an express permission to practice a patent or covenant not to sue for patent infringement). To “grant” such a patent license to a party means to make such an agreement or commitment not to enforce a patent against the party. If you convey a covered work, knowingly relying on a patent license, and the Corresponding Source of the work is not available for anyone to copy, free of charge and under the terms of this License, through a publicly available network server or other readily accessible means, then you must either (1) cause the Corresponding Source to be so available, or (2) arrange to deprive yourself of the benefit of the patent license for this particular work, or (3) arrange, in a manner consistent with the requirements of this License, to extend the patent license to downstream recipients. 
“Knowingly relying” means you have actual knowledge that, but for the patent license, your conveying the covered work in a country, or your recipient's use of the covered work in a country, would infringe one or more identifiable patents in that country that you have reason to believe are valid. If, pursuant to or in connection with a single transaction or arrangement, you convey, or propagate by procuring conveyance of, a covered work, and grant a patent license to some of the parties receiving the covered work authorizing them to use, propagate, modify or convey a specific copy of the covered work, then the patent license you grant is automatically extended to all recipients of the covered work and works based on it. A patent license is “discriminatory” if it does not include within the scope of its coverage, prohibits the exercise of, or is conditioned on the non-exercise of one or more of the rights that are specifically granted under this License. You may not convey a covered work if you are a party to an arrangement with a third party that is in the business of distributing software, under which you make payment to the third party based on the extent of your activity of conveying the work, and under which the third party grants, to any of the parties who would receive the covered work from you, a discriminatory patent license (a) in connection with copies of the covered work conveyed by you (or copies made from those copies), or (b) primarily for and in connection with specific products or compilations that contain the covered work, unless you entered into that arrangement, or that patent license was granted, prior to 28 March 2007. Nothing in this License shall be construed as excluding or limiting any implied license or other defenses to infringement that may otherwise be available to you under applicable patent law. 12. No Surrender of Others' Freedom. 
If conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot convey a covered work so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not convey it at all. For example, if you agree to terms that obligate you to collect a royalty for further conveying from those to whom you convey the Program, the only way you could satisfy both those terms and this License would be to refrain entirely from conveying the Program. 13. Use with the GNU Affero General Public License. Notwithstanding any other provision of this License, you have permission to link or combine any covered work with a work licensed under version 3 of the GNU Affero General Public License into a single combined work, and to convey the resulting work. The terms of this License will continue to apply to the part which is the covered work, but the special requirements of the GNU Affero General Public License, section 13, concerning interaction through a network will apply to the combination as such. 14. Revised Versions of this License. The Free Software Foundation may publish revised and/or new versions of the GNU General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies that a certain numbered version of the GNU General Public License “or any later version” applies to it, you have the option of following the terms and conditions either of that numbered version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of the GNU General Public License, you may choose any version ever published by the Free Software Foundation. 
If the Program specifies that a proxy can decide which future versions of the GNU General Public License can be used, that proxy's public statement of acceptance of a version permanently authorizes you to choose that version for the Program. Later license versions may give you additional or different permissions. However, no additional obligations are imposed on any author or copyright holder as a result of your choosing to follow a later version. 15. Disclaimer of Warranty. THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 16. Limitation of Liability. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. 17. Interpretation of Sections 15 and 16. 
If the disclaimer of warranty and limitation of liability provided above cannot be given local legal effect according to their terms, reviewing courts shall apply local law that most closely approximates an absolute waiver of all civil liability in connection with the Program, unless a warranty or assumption of liability accompanies a copy of the Program in return for a fee.

END OF TERMS AND CONDITIONS

mothur-1.36.1/Mothur.xcodeproj/project.pbxproj

// !$*UTF8*$!
{ archiveVersion = 1; classes = { }; objectVersion = 46; objects = { /* Begin PBXBuildFile section */ 219C1DE01552C4BD004209F9 /* newcommandtemplate.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 219C1DDF1552C4BD004209F9 /* newcommandtemplate.cpp */; }; 219C1DE41559BCCF004209F9 /* getcoremicrobiomecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 219C1DE31559BCCD004209F9 /* getcoremicrobiomecommand.cpp */; }; 481623E21B56A2DB004C60B7 /* pcrseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 481623E11B56A2DB004C60B7 /* pcrseqscommand.cpp */; }; 481FB51C1AC0A63E0076CFF3 /* main.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 481FB51B1AC0A63E0076CFF3 /* main.cpp */; }; 481FB5251AC0AA430076CFF3 /* testsequence.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 481FB5231AC0AA430076CFF3 /* testsequence.cpp */; }; 481FB5261AC0ADA00076CFF3 /* sequence.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7DB12D37EC400DA6239 /* sequence.cpp */; }; 481FB5271AC0ADBA0076CFF3 /* mothurout.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75D12D37EC400DA6239 /* mothurout.cpp */; }; 481FB52A1AC19F8B0076CFF3 /* setseedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 481FB5281AC19F8B0076CFF3 /* setseedcommand.cpp */; }; 481FB52B1AC1B09F0076CFF3 /* setseedcommand.cpp in Sources */ = {isa = 
PBXBuildFile; fileRef = 481FB5281AC19F8B0076CFF3 /* setseedcommand.cpp */; }; 481FB52C1AC1B0A70076CFF3 /* commandfactory.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6AF12D37EC400DA6239 /* commandfactory.cpp */; }; 481FB52E1AC1B0CB0076CFF3 /* testsetseedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 481FB52D1AC1B0CB0076CFF3 /* testsetseedcommand.cpp */; }; 481FB52F1AC1B5C20076CFF3 /* averagelinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65912D37EC300DA6239 /* averagelinkage.cpp */; }; 481FB5301AC1B5C80076CFF3 /* calcsparcc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77B7189173D40E4002163C2 /* calcsparcc.cpp */; }; 481FB5311AC1B5CD0076CFF3 /* clearcut.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69412D37EC400DA6239 /* clearcut.cpp */; }; 481FB5321AC1B5D00076CFF3 /* cmdargs.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A412D37EC400DA6239 /* cmdargs.cpp */; }; 481FB5331AC1B5D30076CFF3 /* distclearcut.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6CF12D37EC400DA6239 /* distclearcut.cpp */; }; 481FB5341AC1B5D60076CFF3 /* dmat.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6D312D37EC400DA6239 /* dmat.cpp */; }; 481FB5351AC1B5D90076CFF3 /* fasta.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6DC12D37EC400DA6239 /* fasta.cpp */; }; 481FB5361AC1B5DC0076CFF3 /* getopt_long.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6FC12D37EC400DA6239 /* getopt_long.cpp */; }; 481FB5371AC1B5E00076CFF3 /* cluster.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69812D37EC400DA6239 /* cluster.cpp */; }; 481FB5381AC1B5E30076CFF3 /* clusterclassic.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69A12D37EC400DA6239 /* clusterclassic.cpp */; }; 481FB5391AC1B5E90076CFF3 /* ace.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B64F12D37EC300DA6239 /* ace.cpp */; }; 481FB53A1AC1B5EC0076CFF3 /* bergerparker.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B65E12D37EC300DA6239 /* bergerparker.cpp */; }; 481FB53B1AC1B5EF0076CFF3 /* boneh.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66612D37EC400DA6239 /* boneh.cpp */; }; 481FB53C1AC1B5F10076CFF3 /* bootstrap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66812D37EC400DA6239 /* bootstrap.cpp */; }; 481FB53D1AC1B5F80076CFF3 /* bstick.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66C12D37EC400DA6239 /* bstick.cpp */; }; 481FB53E1AC1B5FC0076CFF3 /* calculator.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66E12D37EC400DA6239 /* calculator.cpp */; }; 481FB53F1AC1B6000076CFF3 /* canberra.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67012D37EC400DA6239 /* canberra.cpp */; }; 481FB5401AC1B6030076CFF3 /* chao1.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67612D37EC400DA6239 /* chao1.cpp */; }; 481FB5411AC1B6070076CFF3 /* coverage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6BB12D37EC400DA6239 /* coverage.cpp */; }; 481FB5421AC1B60D0076CFF3 /* efron.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6D712D37EC400DA6239 /* efron.cpp */; }; 481FB5431AC1B6110076CFF3 /* geom.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F012D37EC400DA6239 /* geom.cpp */; }; 481FB5441AC1B6140076CFF3 /* goodscoverage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70E12D37EC400DA6239 /* goodscoverage.cpp */; }; 481FB5451AC1B6170076CFF3 /* gower.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71212D37EC400DA6239 /* gower.cpp */; }; 481FB5461AC1B6190076CFF3 /* hamming.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71612D37EC400DA6239 /* hamming.cpp */; }; 481FB5471AC1B61C0076CFF3 /* heip.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72412D37EC400DA6239 /* heip.cpp */; }; 481FB5481AC1B61F0076CFF3 /* hellinger.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72612D37EC400DA6239 /* hellinger.cpp */; }; 481FB5491AC1B6220076CFF3 /* invsimpson.cpp in Sources */ = {isa = 
PBXBuildFile; fileRef = A7E9B72F12D37EC400DA6239 /* invsimpson.cpp */; }; 481FB54A1AC1B6270076CFF3 /* jackknife.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73112D37EC400DA6239 /* jackknife.cpp */; }; 481FB54B1AC1B62A0076CFF3 /* logsd.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74112D37EC400DA6239 /* logsd.cpp */; }; 481FB54C1AC1B62D0076CFF3 /* manhattan.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74712D37EC400DA6239 /* manhattan.cpp */; }; 481FB54D1AC1B6300076CFF3 /* memchi2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74B12D37EC400DA6239 /* memchi2.cpp */; }; 481FB54E1AC1B6340076CFF3 /* memchord.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74D12D37EC400DA6239 /* memchord.cpp */; }; 481FB54F1AC1B63A0076CFF3 /* memeuclidean.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74F12D37EC400DA6239 /* memeuclidean.cpp */; }; 481FB5501AC1B63D0076CFF3 /* mempearson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75112D37EC400DA6239 /* mempearson.cpp */; }; 481FB5511AC1B6410076CFF3 /* npshannon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76D12D37EC400DA6239 /* npshannon.cpp */; }; 481FB5521AC1B6450076CFF3 /* odum.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77112D37EC400DA6239 /* odum.cpp */; }; 481FB5531AC1B6490076CFF3 /* parsimony.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78312D37EC400DA6239 /* parsimony.cpp */; }; 481FB5541AC1B64C0076CFF3 /* prng.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79912D37EC400DA6239 /* prng.cpp */; }; 481FB5551AC1B64F0076CFF3 /* qstat.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79D12D37EC400DA6239 /* qstat.cpp */; }; 481FB5561AC1B6520076CFF3 /* shannon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E512D37EC400DA6239 /* shannon.cpp */; }; 481FB5571AC1B6550076CFF3 /* shannoneven.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E712D37EC400DA6239 /* shannoneven.cpp */; }; 481FB5581AC1B6590076CFF3 /* 
shannonrange.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A09B0F18773C0E00FAA081 /* shannonrange.cpp */; }; 481FB5591AC1B65D0076CFF3 /* sharedjabund.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F412D37EC400DA6239 /* sharedjabund.cpp */; }; 481FB55A1AC1B6600076CFF3 /* sharedace.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E912D37EC400DA6239 /* sharedace.cpp */; }; 481FB55B1AC1B6630076CFF3 /* sharedanderbergs.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7EC12D37EC400DA6239 /* sharedanderbergs.cpp */; }; 481FB55C1AC1B6660076CFF3 /* sharedbraycurtis.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7EE12D37EC400DA6239 /* sharedbraycurtis.cpp */; }; 481FB55D1AC1B6690076CFF3 /* sharedchao1.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F012D37EC400DA6239 /* sharedchao1.cpp */; }; 481FB55E1AC1B66D0076CFF3 /* sharedjackknife.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F612D37EC400DA6239 /* sharedjackknife.cpp */; }; 481FB55F1AC1B6750076CFF3 /* sharedjclass.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F812D37EC400DA6239 /* sharedjclass.cpp */; }; 481FB5601AC1B6790076CFF3 /* sharedjest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7FA12D37EC400DA6239 /* sharedjest.cpp */; }; 481FB5611AC1B69B0076CFF3 /* sharedjsd.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7222D721856277C0055A993 /* sharedjsd.cpp */; }; 481FB5621AC1B69E0076CFF3 /* sharedkstest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7FC12D37EC400DA6239 /* sharedkstest.cpp */; }; 481FB5631AC1B6A10076CFF3 /* sharedkulczynski.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7FE12D37EC400DA6239 /* sharedkulczynski.cpp */; }; 481FB5641AC1B6A40076CFF3 /* sharedkulczynskicody.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80012D37EC400DA6239 /* sharedkulczynskicody.cpp */; }; 481FB5651AC1B6A70076CFF3 /* sharedlennon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80212D37EC400DA6239 /* 
sharedlennon.cpp */; }; 481FB5661AC1B6AA0076CFF3 /* sharedmarczewski.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80612D37EC400DA6239 /* sharedmarczewski.cpp */; }; 481FB5671AC1B6AD0076CFF3 /* sharedmorisitahorn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80812D37EC400DA6239 /* sharedmorisitahorn.cpp */; }; 481FB5681AC1B6B20076CFF3 /* sharedochiai.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80B12D37EC400DA6239 /* sharedochiai.cpp */; }; 481FB5691AC1B6B50076CFF3 /* sharedrjsd.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705AC119BE32C50075E977 /* sharedrjsd.cpp */; }; 481FB56A1AC1B6B80076CFF3 /* sharedsobs.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81512D37EC400DA6239 /* sharedsobs.cpp */; }; 481FB56B1AC1B6BB0076CFF3 /* sharedsobscollectsummary.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81712D37EC400DA6239 /* sharedsobscollectsummary.cpp */; }; 481FB56C1AC1B6BE0076CFF3 /* sharedsorabund.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81912D37EC400DA6239 /* sharedsorabund.cpp */; }; 481FB56D1AC1B6C10076CFF3 /* sharedsorclass.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81B12D37EC400DA6239 /* sharedsorclass.cpp */; }; 481FB56E1AC1B6C30076CFF3 /* sharedsorest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81D12D37EC400DA6239 /* sharedsorest.cpp */; }; 481FB56F1AC1B6C70076CFF3 /* sharedthetan.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81F12D37EC400DA6239 /* sharedthetan.cpp */; }; 481FB5701AC1B6CA0076CFF3 /* sharedthetayc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82112D37EC400DA6239 /* sharedthetayc.cpp */; }; 481FB5711AC1B6D40076CFF3 /* shen.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82512D37EC400DA6239 /* shen.cpp */; }; 481FB5721AC1B6D40076CFF3 /* simpson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82912D37EC400DA6239 /* simpson.cpp */; }; 481FB5731AC1B6EA0076CFF3 /* simpsoneven.cpp in Sources */ = {isa = PBXBuildFile; 
fileRef = A7E9B82B12D37EC400DA6239 /* simpsoneven.cpp */; }; 481FB5741AC1B6EA0076CFF3 /* smithwilson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83212D37EC400DA6239 /* smithwilson.cpp */; }; 481FB5751AC1B6EA0076CFF3 /* soergel.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83512D37EC400DA6239 /* soergel.cpp */; }; 481FB5761AC1B6EA0076CFF3 /* solow.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83712D37EC400DA6239 /* solow.cpp */; }; 481FB5771AC1B6EA0076CFF3 /* spearman.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83B12D37EC400DA6239 /* spearman.cpp */; }; 481FB5781AC1B6EA0076CFF3 /* speciesprofile.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83D12D37EC400DA6239 /* speciesprofile.cpp */; }; 481FB5791AC1B6EA0076CFF3 /* structchi2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84512D37EC400DA6239 /* structchi2.cpp */; }; 481FB57A1AC1B6EA0076CFF3 /* structchord.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84712D37EC400DA6239 /* structchord.cpp */; }; 481FB57B1AC1B6EA0076CFF3 /* structeuclidean.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84912D37EC400DA6239 /* structeuclidean.cpp */; }; 481FB57C1AC1B6EA0076CFF3 /* structkulczynski.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84B12D37EC400DA6239 /* structkulczynski.cpp */; }; 481FB57D1AC1B6EA0076CFF3 /* structpearson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84D12D37EC400DA6239 /* structpearson.cpp */; }; 481FB57E1AC1B6EA0076CFF3 /* unweighted.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87012D37EC400DA6239 /* unweighted.cpp */; }; 481FB57F1AC1B6EA0076CFF3 /* uvest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87212D37EC400DA6239 /* uvest.cpp */; }; 481FB5801AC1B6EA0076CFF3 /* weighted.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87C12D37EC400DA6239 /* weighted.cpp */; }; 481FB5811AC1B6EA0076CFF3 /* whittaker.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B87F12D37EC400DA6239 /* whittaker.cpp */; }; 481FB5821AC1B6FF0076CFF3 /* bellerophon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65C12D37EC300DA6239 /* bellerophon.cpp */; }; 481FB5831AC1B6FF0076CFF3 /* ccode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67412D37EC400DA6239 /* ccode.cpp */; }; 481FB5841AC1B6FF0076CFF3 /* chimera.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67812D37EC400DA6239 /* chimera.cpp */; }; 481FB5851AC1B6FF0076CFF3 /* chimeracheckrdp.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68012D37EC400DA6239 /* chimeracheckrdp.cpp */; }; 481FB5861AC1B6FF0076CFF3 /* chimerarealigner.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68412D37EC400DA6239 /* chimerarealigner.cpp */; }; 481FB5871AC1B6FF0076CFF3 /* decalc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C112D37EC400DA6239 /* decalc.cpp */; }; 481FB5881AC1B6FF0076CFF3 /* chimeraslayer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68812D37EC400DA6239 /* chimeraslayer.cpp */; }; 481FB5891AC1B6FF0076CFF3 /* maligner.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74512D37EC400DA6239 /* maligner.cpp */; }; 481FB58A1AC1B6FF0076CFF3 /* myPerseus.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7BF221214587886000AD524 /* myPerseus.cpp */; }; 481FB58B1AC1B6FF0076CFF3 /* pintail.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79312D37EC400DA6239 /* pintail.cpp */; }; 481FB58C1AC1B6FF0076CFF3 /* slayer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82E12D37EC400DA6239 /* slayer.cpp */; }; 481FB58D1AC1B7060076CFF3 /* collect.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A612D37EC400DA6239 /* collect.cpp */; }; 481FB58E1AC1B7060076CFF3 /* completelinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48F98E4C1A9CFD670005E81B /* completelinkage.cpp */; }; 481FB58F1AC1B71B0076CFF3 /* newcommandtemplate.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 219C1DDF1552C4BD004209F9 /* 
newcommandtemplate.cpp */; }; 481FB5901AC1B71B0076CFF3 /* aligncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65112D37EC300DA6239 /* aligncommand.cpp */; }; 481FB5911AC1B71B0076CFF3 /* amovacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A61F2C130062E000E05B6B /* amovacommand.cpp */; }; 481FB5921AC1B71B0076CFF3 /* anosimcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A71CB15E130B04A2001E7287 /* anosimcommand.cpp */; }; 481FB5931AC1B71B0076CFF3 /* binsequencecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66012D37EC300DA6239 /* binsequencecommand.cpp */; }; 481FB5941AC1B71B0076CFF3 /* catchallcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67212D37EC400DA6239 /* catchallcommand.cpp */; }; 481FB5951AC1B71B0076CFF3 /* chimerabellerophoncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67A12D37EC400DA6239 /* chimerabellerophoncommand.cpp */; }; 481FB5961AC1B71B0076CFF3 /* chimeraccodecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67C12D37EC400DA6239 /* chimeraccodecommand.cpp */; }; 481FB5971AC1B71B0076CFF3 /* chimeracheckcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67E12D37EC400DA6239 /* chimeracheckcommand.cpp */; }; 481FB5981AC1B71B0076CFF3 /* chimerapintailcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68212D37EC400DA6239 /* chimerapintailcommand.cpp */; }; 481FB5991AC1B71B0076CFF3 /* chimeraperseuscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7BF2231145879B2000AD524 /* chimeraperseuscommand.cpp */; }; 481FB59A1AC1B71B0076CFF3 /* chimeraslayercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68A12D37EC400DA6239 /* chimeraslayercommand.cpp */; }; 481FB59B1AC1B71B0076CFF3 /* chimerauchimecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A74D36B7137DAFAA00332B0C /* chimerauchimecommand.cpp */; }; 481FB59C1AC1B71B0076CFF3 /* chopseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B68C12D37EC400DA6239 /* chopseqscommand.cpp */; }; 481FB59D1AC1B71B0076CFF3 /* classifyotucommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69012D37EC400DA6239 /* classifyotucommand.cpp */; }; 481FB59E1AC1B71B0076CFF3 /* classifyseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69212D37EC400DA6239 /* classifyseqscommand.cpp */; }; 481FB59F1AC1B71B0076CFF3 /* classifyrfsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7F24FC117EA365F0021DC9A /* classifyrfsharedcommand.cpp */; }; 481FB5A01AC1B71B0076CFF3 /* classifysvmsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 7B2181FE17AD777B00286E6A /* classifysvmsharedcommand.cpp */; }; 481FB5A11AC1B71B0076CFF3 /* classifytreecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7EEB0F414F29BFD00344B83 /* classifytreecommand.cpp */; }; 481FB5A21AC1B71B0076CFF3 /* clearcutcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69612D37EC400DA6239 /* clearcutcommand.cpp */; }; 481FB5A31AC1B7300076CFF3 /* clearmemorycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A73DDBB913C4A0D1006AAE38 /* clearmemorycommand.cpp */; }; 481FB5A41AC1B7300076CFF3 /* clustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69C12D37EC400DA6239 /* clustercommand.cpp */; }; 481FB5A51AC1B7300076CFF3 /* clusterdoturcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69E12D37EC400DA6239 /* clusterdoturcommand.cpp */; }; 481FB5A61AC1B7300076CFF3 /* clusterfragmentscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A012D37EC400DA6239 /* clusterfragmentscommand.cpp */; }; 481FB5A71AC1B7300076CFF3 /* clustersplitcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A212D37EC400DA6239 /* clustersplitcommand.cpp */; }; 481FB5A81AC1B7300076CFF3 /* collectcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A812D37EC400DA6239 /* collectcommand.cpp */; }; 481FB5A91AC1B7300076CFF3 /* collectsharedcommand.cpp 
in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6AC12D37EC400DA6239 /* collectsharedcommand.cpp */; }; 481FB5AA1AC1B7300076CFF3 /* consensusseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B712D37EC400DA6239 /* consensusseqscommand.cpp */; }; 481FB5AB1AC1B7300076CFF3 /* cooccurrencecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7C3DC0914FE457500FE1924 /* cooccurrencecommand.cpp */; }; 481FB5AC1AC1B7300076CFF3 /* corraxescommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B912D37EC400DA6239 /* corraxescommand.cpp */; }; 481FB5AD1AC1B7300076CFF3 /* countgroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A795840C13F13CD900F201D5 /* countgroupscommand.cpp */; }; 481FB5AE1AC1B7300076CFF3 /* countseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7730EFE13967241007433A3 /* countseqscommand.cpp */; }; 481FB5AF1AC1B7300076CFF3 /* createdatabasecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77EBD2E1523709100ED407C /* createdatabasecommand.cpp */; }; 481FB5B01AC1B7300076CFF3 /* deconvolutecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C312D37EC400DA6239 /* deconvolutecommand.cpp */; }; 481FB5B11AC1B7300076CFF3 /* degapseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C512D37EC400DA6239 /* degapseqscommand.cpp */; }; 481FB5B21AC1B7300076CFF3 /* deuniqueseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C712D37EC400DA6239 /* deuniqueseqscommand.cpp */; }; 481FB5B31AC1B7300076CFF3 /* deuniquetreecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77A221E139001B600B0BE70 /* deuniquetreecommand.cpp */; }; 481FB5B41AC1B7300076CFF3 /* distancecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6CB12D37EC400DA6239 /* distancecommand.cpp */; }; 481FB5B51AC1B7300076CFF3 /* filterseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E312D37EC400DA6239 /* filterseqscommand.cpp */; }; 481FB5B61AC1B74F0076CFF3 
/* filtersharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A79EEF8516971D4A0006DEC1 /* filtersharedcommand.cpp */; }; 481FB5B71AC1B74F0076CFF3 /* getcommandinfocommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A778FE6A134CA6CA00C0BA33 /* getcommandinfocommand.cpp */; }; 481FB5B81AC1B74F0076CFF3 /* getcoremicrobiomecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 219C1DE31559BCCD004209F9 /* getcoremicrobiomecommand.cpp */; }; 481FB5B91AC1B74F0076CFF3 /* getcurrentcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FE7C3F1330EA1000F7B327 /* getcurrentcommand.cpp */; }; 481FB5BA1AC1B74F0076CFF3 /* getdistscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7128B1C16B7002600723BE4 /* getdistscommand.cpp */; }; 481FB5BB1AC1B74F0076CFF3 /* getgroupcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F212D37EC400DA6239 /* getgroupcommand.cpp */; }; 481FB5BC1AC1B74F0076CFF3 /* getgroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F412D37EC400DA6239 /* getgroupscommand.cpp */; }; 481FB5BD1AC1B74F0076CFF3 /* getlabelcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F612D37EC400DA6239 /* getlabelcommand.cpp */; }; 481FB5BE1AC1B74F0076CFF3 /* getmetacommunitycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7548FAC17142EBC00B1F05A /* getmetacommunitycommand.cpp */; }; 481FB5BF1AC1B74F0076CFF3 /* getmimarkspackagecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705ABB19BE32C50075E977 /* getmimarkspackagecommand.cpp */; }; 481FB5C01AC1B74F0076CFF3 /* getlineagecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F812D37EC400DA6239 /* getlineagecommand.cpp */; }; 481FB5C11AC1B74F0076CFF3 /* getlistcountcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6FA12D37EC400DA6239 /* getlistcountcommand.cpp */; }; 481FB5C21AC1B74F0076CFF3 /* getoturepcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6FE12D37EC400DA6239 /* 
getoturepcommand.cpp */; }; 481FB5C31AC1B74F0076CFF3 /* getotuscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70012D37EC400DA6239 /* getotuscommand.cpp */; }; 481FB5C41AC1B74F0076CFF3 /* getotulabelscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A70056E5156A93D000924A2D /* getotulabelscommand.cpp */; }; 481FB5C51AC1B74F0076CFF3 /* getrabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70212D37EC400DA6239 /* getrabundcommand.cpp */; }; 481FB5C61AC1B74F0076CFF3 /* getrelabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70412D37EC400DA6239 /* getrelabundcommand.cpp */; }; 481FB5C71AC1B74F0076CFF3 /* getsabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70612D37EC400DA6239 /* getsabundcommand.cpp */; }; 481FB5C81AC1B74F0076CFF3 /* getseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70812D37EC400DA6239 /* getseqscommand.cpp */; }; 481FB5C91AC1B74F0076CFF3 /* getsharedotucommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70A12D37EC400DA6239 /* getsharedotucommand.cpp */; }; 481FB5CA1AC1B74F0076CFF3 /* hclustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71A12D37EC400DA6239 /* hclustercommand.cpp */; }; 481FB5CB1AC1B74F0076CFF3 /* heatmapcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71E12D37EC400DA6239 /* heatmapcommand.cpp */; }; 481FB5CC1AC1B74F0076CFF3 /* heatmapsimcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72212D37EC400DA6239 /* heatmapsimcommand.cpp */; }; 481FB5CD1AC1B74F0076CFF3 /* helpcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72812D37EC400DA6239 /* helpcommand.cpp */; }; 481FB5CE1AC1B75C0076CFF3 /* homovacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A75790581301749D00A30DAB /* homovacommand.cpp */; }; 481FB5CF1AC1B75C0076CFF3 /* indicatorcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72B12D37EC400DA6239 /* indicatorcommand.cpp */; }; 
481FB5D01AC1B75C0076CFF3 /* kruskalwalliscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7496D2C167B531B00CC7D7C /* kruskalwalliscommand.cpp */; }; 481FB5D11AC1B75C0076CFF3 /* lefsecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7190B201768E0DF00A9AFA6 /* lefsecommand.cpp */; }; 481FB5D21AC1B75C0076CFF3 /* libshuffcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73B12D37EC400DA6239 /* libshuffcommand.cpp */; }; 481FB5D31AC1B75C0076CFF3 /* listotulabelscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A067191562946F0095C8C5 /* listotulabelscommand.cpp */; }; 481FB5D41AC1B75C0076CFF3 /* listseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73D12D37EC400DA6239 /* listseqscommand.cpp */; }; 481FB5D51AC1B75C0076CFF3 /* loadlogfilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A73901071588C40900ED2ED6 /* loadlogfilecommand.cpp */; }; 481FB5D61AC1B75C0076CFF3 /* mantelcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FA10011302E096003860FE /* mantelcommand.cpp */; }; 481FB5D71AC1B75C0076CFF3 /* makebiomcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A724D2B6153C8628000A826F /* makebiomcommand.cpp */; }; 481FB5D81AC1B75C0076CFF3 /* makecontigscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A0671E1562AC3E0095C8C5 /* makecontigscommand.cpp */; }; 481FB5D91AC1B75C0076CFF3 /* makefastqcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A799F5B81309A3E000AEEFA0 /* makefastqcommand.cpp */; }; 481FB5DA1AC1B75C0076CFF3 /* makegroupcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74312D37EC400DA6239 /* makegroupcommand.cpp */; }; 481FB5DB1AC1B75C0076CFF3 /* makelefsecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A741744A175CD9B1007DF49B /* makelefsecommand.cpp */; }; 481FB5DC1AC1B75C0076CFF3 /* makelookupcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E6F69D17427D06006775E2 /* makelookupcommand.cpp */; }; 
481FB5DD1AC1B77E0076CFF3 /* matrixoutputcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74912D37EC400DA6239 /* matrixoutputcommand.cpp */; }; 481FB5DE1AC1B77E0076CFF3 /* mergesfffilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705ABF19BE32C50075E977 /* mergesfffilecommand.cpp */; }; 481FB5DF1AC1B77E0076CFF3 /* mergefilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75312D37EC400DA6239 /* mergefilecommand.cpp */; }; 481FB5E01AC1B77E0076CFF3 /* mergegroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A71FE12B12EDF72400963CA7 /* mergegroupscommand.cpp */; }; 481FB5E11AC1B77E0076CFF3 /* mergetaxsummarycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A799314A16CBD0CD0017E888 /* mergetaxsummarycommand.cpp */; }; 481FB5E21AC1B77E0076CFF3 /* metastatscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75712D37EC400DA6239 /* metastatscommand.cpp */; }; 481FB5E31AC1B77E0076CFF3 /* mgclustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75912D37EC400DA6239 /* mgclustercommand.cpp */; }; 481FB5E41AC1B77E0076CFF3 /* mimarksattributescommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 487C5A851AB88B93002AF48A /* mimarksattributescommand.cpp */; }; 481FB5E51AC1B77E0076CFF3 /* nocommands.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76912D37EC400DA6239 /* nocommands.cpp */; }; 481FB5E61AC1B77E0076CFF3 /* normalizesharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76B12D37EC400DA6239 /* normalizesharedcommand.cpp */; }; 481FB5E71AC1B77E0076CFF3 /* nmdscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A713EBEC12DC7C5E000092AC /* nmdscommand.cpp */; }; 481FB5E81AC1B77E0076CFF3 /* otuassociationcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A3C8C714D041AD00B1BFBE /* otuassociationcommand.cpp */; }; 481FB5E91AC1B77E0076CFF3 /* otuhierarchycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77912D37EC400DA6239 /* 
otuhierarchycommand.cpp */; }; 481FB5EA1AC1B77E0076CFF3 /* pairwiseseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77D12D37EC400DA6239 /* pairwiseseqscommand.cpp */; }; 481FB5EB1AC1B77E0076CFF3 /* parsefastaqcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77F12D37EC400DA6239 /* parsefastaqcommand.cpp */; }; 481FB5EC1AC1B77E0076CFF3 /* parselistscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78112D37EC400DA6239 /* parselistscommand.cpp */; }; 481FB5ED1AC1B77E0076CFF3 /* parsimonycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78512D37EC400DA6239 /* parsimonycommand.cpp */; }; 481FB5EE1AC1B77E0076CFF3 /* pcacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FC486612D795D60055BC5C /* pcacommand.cpp */; }; 481FB5EF1AC1B77E0076CFF3 /* pcoacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78712D37EC400DA6239 /* pcoacommand.cpp */; }; 481FB5F11AC1B77E0076CFF3 /* phylodiversitycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78B12D37EC400DA6239 /* phylodiversitycommand.cpp */; }; 481FB5F21AC1B77E0076CFF3 /* phylotypecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79112D37EC400DA6239 /* phylotypecommand.cpp */; }; 481FB5F31AC1B77E0076CFF3 /* pipelinepdscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79512D37EC400DA6239 /* pipelinepdscommand.cpp */; }; 481FB5F41AC1B77E0076CFF3 /* preclustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79712D37EC400DA6239 /* preclustercommand.cpp */; }; 481FB5F51AC1B77E0076CFF3 /* primerdesigncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A74C06E816A9C0A8008390A3 /* primerdesigncommand.cpp */; }; 481FB5F61AC1B77E0076CFF3 /* quitcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A112D37EC400DA6239 /* quitcommand.cpp */; }; 481FB5F71AC1B77E0076CFF3 /* rarefactcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7AB12D37EC400DA6239 /* rarefactcommand.cpp 
*/; }; 481FB5F81AC1B77E0076CFF3 /* rarefactsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7AE12D37EC400DA6239 /* rarefactsharedcommand.cpp */; }; 481FB5F91AC1B77E0076CFF3 /* removedistscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7B0231416B8244B006BA09E /* removedistscommand.cpp */; }; 481FB5FA1AC1B77E0076CFF3 /* removegroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C312D37EC400DA6239 /* removegroupscommand.cpp */; }; 481FB5FB1AC1B77E0076CFF3 /* removelineagecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C512D37EC400DA6239 /* removelineagecommand.cpp */; }; 481FB5FC1AC1B7970076CFF3 /* removeotuscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C712D37EC400DA6239 /* removeotuscommand.cpp */; }; 481FB5FD1AC1B7970076CFF3 /* removeotulabelscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A70056EA156AB6E500924A2D /* removeotulabelscommand.cpp */; }; 481FB5FE1AC1B7970076CFF3 /* removerarecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A727864312E9E28C00F86ABA /* removerarecommand.cpp */; }; 481FB5FF1AC1B7970076CFF3 /* removeseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C912D37EC400DA6239 /* removeseqscommand.cpp */; }; 481FB6001AC1B7970076CFF3 /* renameseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7CFA4301755401800D9ED4D /* renameseqscommand.cpp */; }; 481FB6011AC1B7970076CFF3 /* reversecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7CD12D37EC400DA6239 /* reversecommand.cpp */; }; 481FB6021AC1B7970076CFF3 /* screenseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D112D37EC400DA6239 /* screenseqscommand.cpp */; }; 481FB6031AC1B7970076CFF3 /* secondarystructurecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D312D37EC400DA6239 /* secondarystructurecommand.cpp */; }; 481FB6041AC1B7970076CFF3 /* sensspeccommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B7D512D37EC400DA6239 /* sensspeccommand.cpp */; }; 481FB6051AC1B7970076CFF3 /* seqerrorcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D712D37EC400DA6239 /* seqerrorcommand.cpp */; }; 481FB6061AC1B7970076CFF3 /* seqsummarycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D912D37EC400DA6239 /* seqsummarycommand.cpp */; }; 481FB6071AC1B7970076CFF3 /* setcurrentcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FE7E6C13311EA400F7B327 /* setcurrentcommand.cpp */; }; 481FB6081AC1B7970076CFF3 /* setdircommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7DF12D37EC400DA6239 /* setdircommand.cpp */; }; 481FB6091AC1B7970076CFF3 /* setlogfilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E112D37EC400DA6239 /* setlogfilecommand.cpp */; }; 481FB60A1AC1B7970076CFF3 /* sffinfocommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E312D37EC400DA6239 /* sffinfocommand.cpp */; }; 481FB60B1AC1B7AC0076CFF3 /* sffmultiplecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7C7DAB815DA758B0059B0CF /* sffmultiplecommand.cpp */; }; 481FB60C1AC1B7AC0076CFF3 /* sharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F212D37EC400DA6239 /* sharedcommand.cpp */; }; 481FB60D1AC1B7AC0076CFF3 /* shhhercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82712D37EC400DA6239 /* shhhercommand.cpp */; }; 481FB60E1AC1B7AC0076CFF3 /* shhhseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A774101314695AF60098E6AC /* shhhseqscommand.cpp */; }; 481FB60F1AC1B7AC0076CFF3 /* sortseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A32DA914DC43B00001D2E5 /* sortseqscommand.cpp */; }; 481FB6101AC1B7AC0076CFF3 /* sparcccommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77B7184173D2240002163C2 /* sparcccommand.cpp */; }; 481FB6111AC1B7AC0076CFF3 /* splitabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83F12D37EC400DA6239 /* 
splitabundcommand.cpp */; }; 481FB6121AC1B7AC0076CFF3 /* splitgroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84112D37EC400DA6239 /* splitgroupscommand.cpp */; }; 481FB6131AC1B7AC0076CFF3 /* sracommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A747EC70181EA0F900345732 /* sracommand.cpp */; }; 481FB6141AC1B7AC0076CFF3 /* subsamplecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84F12D37EC400DA6239 /* subsamplecommand.cpp */; }; 481FB6151AC1B7AC0076CFF3 /* summarycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85712D37EC400DA6239 /* summarycommand.cpp */; }; 481FB6161AC1B7AC0076CFF3 /* summaryqualcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A754149614840CF7005850D1 /* summaryqualcommand.cpp */; }; 481FB6171AC1B7AC0076CFF3 /* summarysharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85912D37EC400DA6239 /* summarysharedcommand.cpp */; }; 481FB6181AC1B7AC0076CFF3 /* summarytaxcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FFB557142CA02C004884F2 /* summarytaxcommand.cpp */; }; 481FB6191AC1B7AC0076CFF3 /* systemcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85B12D37EC400DA6239 /* systemcommand.cpp */; }; 481FB61A1AC1B7AC0076CFF3 /* treegroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86212D37EC400DA6239 /* treegroupscommand.cpp */; }; 481FB61B1AC1B7AC0076CFF3 /* trimflowscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86812D37EC400DA6239 /* trimflowscommand.cpp */; }; 481FB61C1AC1B7AC0076CFF3 /* trimseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86A12D37EC400DA6239 /* trimseqscommand.cpp */; }; 481FB61D1AC1B7AC0076CFF3 /* unifracunweightedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86C12D37EC400DA6239 /* unifracunweightedcommand.cpp */; }; 481FB61E1AC1B7AC0076CFF3 /* unifracweightedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86E12D37EC400DA6239 /* 
unifracweightedcommand.cpp */; }; 481FB61F1AC1B7AC0076CFF3 /* venncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87A12D37EC400DA6239 /* venncommand.cpp */; }; 481FB6201AC1B7B30076CFF3 /* commandoptionparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B112D37EC400DA6239 /* commandoptionparser.cpp */; }; 481FB6211AC1B7BA0076CFF3 /* communitytype.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7132EB2184E792700AAA402 /* communitytype.cpp */; }; 481FB6221AC1B7BA0076CFF3 /* kmeans.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7D395C3184FA3A200A350D7 /* kmeans.cpp */; }; 481FB6231AC1B7BA0076CFF3 /* pam.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7B093BF18579F0400843CD1 /* pam.cpp */; }; 481FB6241AC1B7BA0076CFF3 /* qFinderDMM.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7548FAE171440EC00B1F05A /* qFinderDMM.cpp */; }; 481FB6251AC1B7EA0076CFF3 /* alignment.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65312D37EC300DA6239 /* alignment.cpp */; }; 481FB6261AC1B7EA0076CFF3 /* alignmentcell.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65512D37EC300DA6239 /* alignmentcell.cpp */; }; 481FB6271AC1B7EA0076CFF3 /* alignmentdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65712D37EC300DA6239 /* alignmentdb.cpp */; }; 481FB6281AC1B7EA0076CFF3 /* blastalign.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66212D37EC300DA6239 /* blastalign.cpp */; }; 481FB6291AC1B7EA0076CFF3 /* blastdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66412D37EC400DA6239 /* blastdb.cpp */; }; 481FB62A1AC1B7EA0076CFF3 /* counttable.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A74D59A3159A1E2000043046 /* counttable.cpp */; }; 481FB62B1AC1B7EA0076CFF3 /* database.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6BD12D37EC400DA6239 /* database.cpp */; }; 481FB62C1AC1B7EA0076CFF3 /* designmap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77916E6176F7F7600EEFE18 /* designmap.cpp */; }; 
481FB62D1AC1B7EA0076CFF3 /* distancedb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6CD12D37EC400DA6239 /* distancedb.cpp */; }; 481FB62E1AC1B7EA0076CFF3 /* fastamap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6DE12D37EC400DA6239 /* fastamap.cpp */; }; 481FB62F1AC1B7EA0076CFF3 /* fastqread.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48C51DEF1A76B888004ECDF1 /* fastqread.cpp */; }; 481FB6301AC1B7EA0076CFF3 /* flowdata.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E712D37EC400DA6239 /* flowdata.cpp */; }; 481FB6311AC1B7EA0076CFF3 /* fullmatrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6EE12D37EC400DA6239 /* fullmatrix.cpp */; }; 481FB6321AC1B7EA0076CFF3 /* groupmap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71412D37EC400DA6239 /* groupmap.cpp */; }; 481FB6331AC1B7EA0076CFF3 /* kmer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73312D37EC400DA6239 /* kmer.cpp */; }; 481FB6341AC1B7EA0076CFF3 /* kmeralign.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48C51DF11A793EFE004ECDF1 /* kmeralign.cpp */; }; 481FB6351AC1B7EA0076CFF3 /* kmerdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73512D37EC400DA6239 /* kmerdb.cpp */; }; 481FB6361AC1B7EA0076CFF3 /* listvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73F12D37EC400DA6239 /* listvector.cpp */; }; 481FB6371AC1B7EA0076CFF3 /* nameassignment.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75F12D37EC400DA6239 /* nameassignment.cpp */; }; 481FB6381AC1B7EA0076CFF3 /* oligos.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705ABD19BE32C50075E977 /* oligos.cpp */; }; 481FB6391AC1B7EA0076CFF3 /* ordervector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77712D37EC400DA6239 /* ordervector.cpp */; }; 481FB63A1AC1B7EA0076CFF3 /* qualityscores.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79F12D37EC400DA6239 /* qualityscores.cpp */; }; 481FB63B1AC1B7EA0076CFF3 /* rabundvector.cpp in Sources */ = {isa = 
PBXBuildFile; fileRef = A7E9B7A312D37EC400DA6239 /* rabundvector.cpp */; }; 481FB63C1AC1B7EA0076CFF3 /* referencedb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721765613BB9F7D0014DAAE /* referencedb.cpp */; }; 481FB63D1AC1B7EA0076CFF3 /* reportfile.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7CB12D37EC400DA6239 /* reportfile.cpp */; }; 481FB63E1AC1B7EA0076CFF3 /* sabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7CF12D37EC400DA6239 /* sabundvector.cpp */; }; 481FB63F1AC1B7EA0076CFF3 /* sequencecountparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A741FAD115D1688E0067BCC5 /* sequencecountparser.cpp */; }; 481FB6401AC1B7EA0076CFF3 /* sequencedb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7DD12D37EC400DA6239 /* sequencedb.cpp */; }; 481FB6411AC1B7EA0076CFF3 /* sequenceparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7F9F5CE141A5E500032F693 /* sequenceparser.cpp */; }; 481FB6421AC1B7EA0076CFF3 /* sharedlistvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80412D37EC400DA6239 /* sharedlistvector.cpp */; }; 481FB6431AC1B7EA0076CFF3 /* sharedordervector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80D12D37EC400DA6239 /* sharedordervector.cpp */; }; 481FB6441AC1B7EA0076CFF3 /* sharedrabundfloatvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80F12D37EC400DA6239 /* sharedrabundfloatvector.cpp */; }; 481FB6451AC1B7EA0076CFF3 /* sharedrabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81112D37EC400DA6239 /* sharedrabundvector.cpp */; }; 481FB6461AC1B7EA0076CFF3 /* sharedsabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81312D37EC400DA6239 /* sharedsabundvector.cpp */; }; 481FB6471AC1B7EA0076CFF3 /* sparsematrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83912D37EC400DA6239 /* sparsematrix.cpp */; }; 481FB6481AC1B7EA0076CFF3 /* sparsedistancematrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E0243C15B4520A00A5F046 /* sparsedistancematrix.cpp */; }; 481FB6491AC1B7F40076CFF3 /* suffixdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85112D37EC400DA6239 /* suffixdb.cpp */; }; 481FB64A1AC1B7F40076CFF3 /* suffixnodes.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85312D37EC400DA6239 /* suffixnodes.cpp */; }; 481FB64B1AC1B7F40076CFF3 /* suffixtree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85512D37EC400DA6239 /* suffixtree.cpp */; }; 481FB64C1AC1B7F40076CFF3 /* tree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85F12D37EC400DA6239 /* tree.cpp */; }; 481FB64D1AC1B7F40076CFF3 /* treemap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86412D37EC400DA6239 /* treemap.cpp */; }; 481FB64E1AC1B7F40076CFF3 /* treenode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86612D37EC400DA6239 /* treenode.cpp */; }; 481FB64F1AC1B8100076CFF3 /* consensus.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B512D37EC400DA6239 /* consensus.cpp */; }; 481FB6501AC1B8100076CFF3 /* dlibshuff.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6D112D37EC400DA6239 /* dlibshuff.cpp */; }; 481FB6511AC1B8100076CFF3 /* engine.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6DA12D37EC400DA6239 /* engine.cpp */; }; 481FB6521AC1B8100076CFF3 /* fileoutput.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E012D37EC400DA6239 /* fileoutput.cpp */; }; 481FB6531AC1B8100076CFF3 /* gotohoverlap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71012D37EC400DA6239 /* gotohoverlap.cpp */; }; 481FB6541AC1B8100076CFF3 /* hcluster.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71812D37EC400DA6239 /* hcluster.cpp */; }; 481FB6551AC1B8100076CFF3 /* heatmap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71C12D37EC400DA6239 /* heatmap.cpp */; }; 481FB6561AC1B8100076CFF3 /* heatmapsim.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72012D37EC400DA6239 /* heatmapsim.cpp */; }; 
481FB6571AC1B8100076CFF3 /* inputdata.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72D12D37EC400DA6239 /* inputdata.cpp */; }; 481FB6581AC1B8100076CFF3 /* libshuff.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73912D37EC400DA6239 /* libshuff.cpp */; }; 481FB6591AC1B8100076CFF3 /* linearalgebra.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FC480D12D788F20055BC5C /* linearalgebra.cpp */; }; 481FB65A1AC1B8100076CFF3 /* wilcox.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7D9378917B146B5001E90B0 /* wilcox.cpp */; }; 481FB65B1AC1B82C0076CFF3 /* mothurfisher.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A79234D613C74BF6002B08E2 /* mothurfisher.cpp */; }; 481FB65C1AC1B82C0076CFF3 /* mothurmetastats.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A73DDC3713C4BF64006AAE38 /* mothurmetastats.cpp */; }; 481FB65F1AC1B8450076CFF3 /* myseqdist.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A774104614696F320098E6AC /* myseqdist.cpp */; }; 481FB6601AC1B8450076CFF3 /* nast.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76112D37EC400DA6239 /* nast.cpp */; }; 481FB6611AC1B8450076CFF3 /* nastreport.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76312D37EC400DA6239 /* nastreport.cpp */; }; 481FB6621AC1B8450076CFF3 /* noalign.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76712D37EC400DA6239 /* noalign.cpp */; }; 481FB6631AC1B8450076CFF3 /* needlemanoverlap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76512D37EC400DA6239 /* needlemanoverlap.cpp */; }; 481FB6641AC1B8450076CFF3 /* optionparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77512D37EC400DA6239 /* optionparser.cpp */; }; 481FB6651AC1B8450076CFF3 /* overlap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77B12D37EC400DA6239 /* overlap.cpp */; }; 481FB6661AC1B8450076CFF3 /* progress.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79B12D37EC400DA6239 /* progress.cpp */; }; 481FB6671AC1B8450076CFF3 /* randomnumber.cpp in 
Sources */ = {isa = PBXBuildFile; fileRef = A77B7186173D4041002163C2 /* randomnumber.cpp */; }; 481FB6681AC1B8450076CFF3 /* rarecalc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A512D37EC400DA6239 /* rarecalc.cpp */; }; 481FB6691AC1B8520076CFF3 /* abstractdecisiontree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7386C241619E52200651424 /* abstractdecisiontree.cpp */; }; 481FB66A1AC1B8520076CFF3 /* abstractrandomforest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705AC319BE32C50075E977 /* abstractrandomforest.cpp */; }; 481FB66B1AC1B8520076CFF3 /* decisiontree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7386C28161A110700651424 /* decisiontree.cpp */; }; 481FB66C1AC1B8520076CFF3 /* randomforest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77E1937161B201E00DB1A2A /* randomforest.cpp */; }; 481FB66D1AC1B8520076CFF3 /* rftreenode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77E193A161B289600DB1A2A /* rftreenode.cpp */; }; 481FB66E1AC1B8520076CFF3 /* forest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 83F25B0A163B031200ABE73D /* forest.cpp */; }; 481FB66F1AC1B8520076CFF3 /* regularizedrandomforest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 834D9D561656D7C400E7FAB9 /* regularizedrandomforest.cpp */; }; 481FB6701AC1B8820076CFF3 /* raredisplay.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A712D37EC400DA6239 /* raredisplay.cpp */; }; 481FB6711AC1B8820076CFF3 /* rarefact.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A912D37EC400DA6239 /* rarefact.cpp */; }; 481FB6721AC1B8820076CFF3 /* refchimeratest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 7E6BE10912F710D8007ADDBE /* refchimeratest.cpp */; }; 481FB6731AC1B8820076CFF3 /* seqnoise.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77410F414697C300098E6AC /* seqnoise.cpp */; }; 481FB6741AC1B88F0076CFF3 /* formatcolumn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E912D37EC400DA6239 /* formatcolumn.cpp */; }; 
481FB6751AC1B88F0076CFF3 /* formatphylip.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6EC12D37EC400DA6239 /* formatphylip.cpp */; }; 481FB6761AC1B88F0076CFF3 /* readblast.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7B012D37EC400DA6239 /* readblast.cpp */; }; 481FB6771AC1B88F0076CFF3 /* readcluster.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7B212D37EC400DA6239 /* readcluster.cpp */; }; 481FB6781AC1B88F0076CFF3 /* readcolumn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7B412D37EC400DA6239 /* readcolumn.cpp */; }; 481FB6791AC1B88F0076CFF3 /* readphylip.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7BD12D37EC400DA6239 /* readphylip.cpp */; }; 481FB67A1AC1B88F0076CFF3 /* readtree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7BF12D37EC400DA6239 /* readtree.cpp */; }; 481FB67B1AC1B88F0076CFF3 /* readphylipvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A713EBAB12DC7613000092AC /* readphylipvector.cpp */; }; 481FB67C1AC1B88F0076CFF3 /* splitmatrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84312D37EC400DA6239 /* splitmatrix.cpp */; }; 481FB67D1AC1B88F0076CFF3 /* treereader.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7D755D91535F679009BF21A /* treereader.cpp */; }; 481FB67E1AC1B8960076CFF3 /* sharedutilities.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82312D37EC400DA6239 /* sharedutilities.cpp */; }; 481FB67F1AC1B8960076CFF3 /* singlelinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82D12D37EC400DA6239 /* singlelinkage.cpp */; }; 481FB6801AC1B8960076CFF3 /* slibshuff.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83012D37EC400DA6239 /* slibshuff.cpp */; }; 481FB6811AC1B8960076CFF3 /* subsample.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7876A25152A017C00A0AE86 /* subsample.cpp */; }; 481FB6821AC1B8AF0076CFF3 /* svm.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 7B21820117AD77BD00286E6A /* svm.cpp */; }; 481FB6831AC1B8B80076CFF3 /* 
trialSwap2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7C3DC0D14FE469500FE1924 /* trialSwap2.cpp */; }; 481FB6841AC1B8B80076CFF3 /* trimoligos.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FF19F1140FFDA500AD216D /* trimoligos.cpp */; }; 481FB6851AC1B8B80076CFF3 /* validcalculator.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87412D37EC400DA6239 /* validcalculator.cpp */; }; 481FB6861AC1B8B80076CFF3 /* validparameter.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87612D37EC400DA6239 /* validparameter.cpp */; }; 481FB6871AC1B8B80076CFF3 /* venn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87812D37EC400DA6239 /* venn.cpp */; }; 481FB6881AC1B8B80076CFF3 /* weightedlinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87E12D37EC400DA6239 /* weightedlinkage.cpp */; }; 481FB6891AC1BA760076CFF3 /* phylosummary.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78D12D37EC400DA6239 /* phylosummary.cpp */; }; 481FB68A1AC1BA9E0076CFF3 /* alignnode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB66161C570F009860A1 /* alignnode.cpp */; }; 481FB68B1AC1BA9E0076CFF3 /* aligntree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB68161C570F009860A1 /* aligntree.cpp */; }; 481FB68C1AC1BA9E0076CFF3 /* bayesian.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65A12D37EC300DA6239 /* bayesian.cpp */; }; 481FB68D1AC1BA9E0076CFF3 /* classify.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68E12D37EC400DA6239 /* classify.cpp */; }; 481FB68E1AC1BA9E0076CFF3 /* kmernode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB6D161C572A009860A1 /* kmernode.cpp */; }; 481FB68F1AC1BA9E0076CFF3 /* kmertree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB6F161C572A009860A1 /* kmertree.cpp */; }; 481FB6901AC1BA9E0076CFF3 /* knn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73712D37EC400DA6239 /* knn.cpp */; }; 481FB6911AC1BAA60076CFF3 /* phylotree.cpp in Sources */ = {isa = PBXBuildFile; 
fileRef = A7E9B78F12D37EC400DA6239 /* phylotree.cpp */; }; 481FB6921AC1BAA60076CFF3 /* taxonomyequalizer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85D12D37EC400DA6239 /* taxonomyequalizer.cpp */; }; 481FB6931AC1BAA60076CFF3 /* taxonomynode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB73161C573B009860A1 /* taxonomynode.cpp */; }; 483C952E188F0CAD0035E7B7 /* (null) in Sources */ = {isa = PBXBuildFile; }; 48705AC419BE32C50075E977 /* getmimarkspackagecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705ABB19BE32C50075E977 /* getmimarkspackagecommand.cpp */; }; 48705AC519BE32C50075E977 /* oligos.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705ABD19BE32C50075E977 /* oligos.cpp */; }; 48705AC619BE32C50075E977 /* mergesfffilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705ABF19BE32C50075E977 /* mergesfffilecommand.cpp */; }; 48705AC719BE32C50075E977 /* sharedrjsd.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705AC119BE32C50075E977 /* sharedrjsd.cpp */; }; 48705AC819BE32C50075E977 /* abstractrandomforest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48705AC319BE32C50075E977 /* abstractrandomforest.cpp */; }; 487C5A871AB88B93002AF48A /* mimarksattributescommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 487C5A851AB88B93002AF48A /* mimarksattributescommand.cpp */; }; 4893DE2918EEF28100C615DF /* (null) in Sources */ = {isa = PBXBuildFile; }; 48A85BAD18E1AF2000199B6F /* (null) in Sources */ = {isa = PBXBuildFile; }; 48C51DF01A76B888004ECDF1 /* fastqread.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48C51DEF1A76B888004ECDF1 /* fastqread.cpp */; }; 48C51DF31A793EFE004ECDF1 /* kmeralign.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48C51DF11A793EFE004ECDF1 /* kmeralign.cpp */; }; 48DB37B31B3B27E000C372A4 /* makefilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48DB37B11B3B27E000C372A4 /* makefilecommand.cpp */; }; 48DB37B41B3B27E000C372A4 /* makefilecommand.cpp in Sources */ = 
{isa = PBXBuildFile; fileRef = 48DB37B11B3B27E000C372A4 /* makefilecommand.cpp */; }; 48E981CF189C38FB0042BE9D /* (null) in Sources */ = {isa = PBXBuildFile; }; 48F98E4D1A9CFD670005E81B /* completelinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 48F98E4C1A9CFD670005E81B /* completelinkage.cpp */; }; 7E6BE10A12F710D8007ADDBE /* refchimeratest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 7E6BE10912F710D8007ADDBE /* refchimeratest.cpp */; }; 834D9D581656D7C400E7FAB9 /* regularizedrandomforest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 834D9D561656D7C400E7FAB9 /* regularizedrandomforest.cpp */; }; 835FE03D19F00640005AA754 /* classifysvmsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 7B2181FE17AD777B00286E6A /* classifysvmsharedcommand.cpp */; }; 835FE03E19F00A4D005AA754 /* svm.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 7B21820117AD77BD00286E6A /* svm.cpp */; }; 83F25B0C163B031200ABE73D /* forest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 83F25B0A163B031200ABE73D /* forest.cpp */; }; A70056E6156A93D000924A2D /* getotulabelscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A70056E5156A93D000924A2D /* getotulabelscommand.cpp */; }; A70056EB156AB6E500924A2D /* removeotulabelscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A70056EA156AB6E500924A2D /* removeotulabelscommand.cpp */; }; A70332B712D3A13400761E33 /* makefile in Sources */ = {isa = PBXBuildFile; fileRef = A70332B512D3A13400761E33 /* makefile */; }; A7128B1D16B7002A00723BE4 /* getdistscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7128B1C16B7002600723BE4 /* getdistscommand.cpp */; }; A7132EB3184E792700AAA402 /* communitytype.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7132EB2184E792700AAA402 /* communitytype.cpp */; }; A713EBAC12DC7613000092AC /* readphylipvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A713EBAB12DC7613000092AC /* readphylipvector.cpp */; }; A713EBED12DC7C5E000092AC /* nmdscommand.cpp in 
Sources */ = {isa = PBXBuildFile; fileRef = A713EBEC12DC7C5E000092AC /* nmdscommand.cpp */; }; A7190B221768E0DF00A9AFA6 /* lefsecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7190B201768E0DF00A9AFA6 /* lefsecommand.cpp */; }; A71CB160130B04A2001E7287 /* anosimcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A71CB15E130B04A2001E7287 /* anosimcommand.cpp */; }; A71FE12C12EDF72400963CA7 /* mergegroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A71FE12B12EDF72400963CA7 /* mergegroupscommand.cpp */; }; A721765713BB9F7D0014DAAE /* referencedb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721765613BB9F7D0014DAAE /* referencedb.cpp */; }; A721AB6A161C570F009860A1 /* alignnode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB66161C570F009860A1 /* alignnode.cpp */; }; A721AB6B161C570F009860A1 /* aligntree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB68161C570F009860A1 /* aligntree.cpp */; }; A721AB71161C572A009860A1 /* kmernode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB6D161C572A009860A1 /* kmernode.cpp */; }; A721AB72161C572A009860A1 /* kmertree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB6F161C572A009860A1 /* kmertree.cpp */; }; A721AB77161C573B009860A1 /* taxonomynode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A721AB73161C573B009860A1 /* taxonomynode.cpp */; }; A7222D731856277C0055A993 /* sharedjsd.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7222D721856277C0055A993 /* sharedjsd.cpp */; }; A724D2B7153C8628000A826F /* makebiomcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A724D2B6153C8628000A826F /* makebiomcommand.cpp */; }; A727864412E9E28C00F86ABA /* removerarecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A727864312E9E28C00F86ABA /* removerarecommand.cpp */; }; A7386C251619E52300651424 /* abstractdecisiontree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7386C241619E52200651424 /* abstractdecisiontree.cpp */; }; 
A7386C29161A110800651424 /* decisiontree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7386C28161A110700651424 /* decisiontree.cpp */; }; A73901081588C40900ED2ED6 /* loadlogfilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A73901071588C40900ED2ED6 /* loadlogfilecommand.cpp */; }; A73DDBBA13C4A0D1006AAE38 /* clearmemorycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A73DDBB913C4A0D1006AAE38 /* clearmemorycommand.cpp */; }; A73DDC3813C4BF64006AAE38 /* mothurmetastats.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A73DDC3713C4BF64006AAE38 /* mothurmetastats.cpp */; }; A741744C175CD9B1007DF49B /* makelefsecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A741744A175CD9B1007DF49B /* makelefsecommand.cpp */; }; A741FAD215D1688E0067BCC5 /* sequencecountparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A741FAD115D1688E0067BCC5 /* sequencecountparser.cpp */; }; A747EC71181EA0F900345732 /* sracommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A747EC70181EA0F900345732 /* sracommand.cpp */; }; A7496D2E167B531B00CC7D7C /* kruskalwalliscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7496D2C167B531B00CC7D7C /* kruskalwalliscommand.cpp */; }; A74C06E916A9C0A9008390A3 /* primerdesigncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A74C06E816A9C0A8008390A3 /* primerdesigncommand.cpp */; }; A74D36B8137DAFAA00332B0C /* chimerauchimecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A74D36B7137DAFAA00332B0C /* chimerauchimecommand.cpp */; }; A74D59A4159A1E2000043046 /* counttable.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A74D59A3159A1E2000043046 /* counttable.cpp */; }; A754149714840CF7005850D1 /* summaryqualcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A754149614840CF7005850D1 /* summaryqualcommand.cpp */; }; A7548FAD17142EBC00B1F05A /* getmetacommunitycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7548FAC17142EBC00B1F05A /* getmetacommunitycommand.cpp */; }; 
A7548FB0171440ED00B1F05A /* qFinderDMM.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7548FAE171440EC00B1F05A /* qFinderDMM.cpp */; }; A75790591301749D00A30DAB /* homovacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A75790581301749D00A30DAB /* homovacommand.cpp */; }; A7730EFF13967241007433A3 /* countseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7730EFE13967241007433A3 /* countseqscommand.cpp */; }; A774101414695AF60098E6AC /* shhhseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A774101314695AF60098E6AC /* shhhseqscommand.cpp */; }; A774104814696F320098E6AC /* myseqdist.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A774104614696F320098E6AC /* myseqdist.cpp */; }; A77410F614697C300098E6AC /* seqnoise.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77410F414697C300098E6AC /* seqnoise.cpp */; }; A778FE6B134CA6CA00C0BA33 /* getcommandinfocommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A778FE6A134CA6CA00C0BA33 /* getcommandinfocommand.cpp */; }; A77916E8176F7F7600EEFE18 /* designmap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77916E6176F7F7600EEFE18 /* designmap.cpp */; }; A77A221F139001B600B0BE70 /* deuniquetreecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77A221E139001B600B0BE70 /* deuniquetreecommand.cpp */; }; A77B7185173D2240002163C2 /* sparcccommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77B7184173D2240002163C2 /* sparcccommand.cpp */; }; A77B7188173D4042002163C2 /* randomnumber.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77B7186173D4041002163C2 /* randomnumber.cpp */; }; A77B718B173D40E5002163C2 /* calcsparcc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77B7189173D40E4002163C2 /* calcsparcc.cpp */; }; A77E1938161B201E00DB1A2A /* randomforest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77E1937161B201E00DB1A2A /* randomforest.cpp */; }; A77E193B161B289600DB1A2A /* rftreenode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A77E193A161B289600DB1A2A /* rftreenode.cpp */; }; A77EBD2F1523709100ED407C /* createdatabasecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A77EBD2E1523709100ED407C /* createdatabasecommand.cpp */; }; A7876A26152A017C00A0AE86 /* subsample.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7876A25152A017C00A0AE86 /* subsample.cpp */; }; A79234D713C74BF6002B08E2 /* mothurfisher.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A79234D613C74BF6002B08E2 /* mothurfisher.cpp */; }; A795840D13F13CD900F201D5 /* countgroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A795840C13F13CD900F201D5 /* countgroupscommand.cpp */; }; A799314B16CBD0CD0017E888 /* mergetaxsummarycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A799314A16CBD0CD0017E888 /* mergetaxsummarycommand.cpp */; }; A799F5B91309A3E000AEEFA0 /* makefastqcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A799F5B81309A3E000AEEFA0 /* makefastqcommand.cpp */; }; A79EEF8616971D4A0006DEC1 /* filtersharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A79EEF8516971D4A0006DEC1 /* filtersharedcommand.cpp */; }; A7A0671A1562946F0095C8C5 /* listotulabelscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A067191562946F0095C8C5 /* listotulabelscommand.cpp */; }; A7A0671F1562AC3E0095C8C5 /* makecontigscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A0671E1562AC3E0095C8C5 /* makecontigscommand.cpp */; }; A7A09B1018773C0E00FAA081 /* shannonrange.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A09B0F18773C0E00FAA081 /* shannonrange.cpp */; }; A7A32DAA14DC43B00001D2E5 /* sortseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A32DA914DC43B00001D2E5 /* sortseqscommand.cpp */; }; A7A3C8C914D041AD00B1BFBE /* otuassociationcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7A3C8C714D041AD00B1BFBE /* otuassociationcommand.cpp */; }; A7A61F2D130062E000E05B6B /* amovacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7A61F2C130062E000E05B6B /* amovacommand.cpp */; }; A7B0231516B8244C006BA09E /* removedistscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7B0231416B8244B006BA09E /* removedistscommand.cpp */; }; A7B093C018579F0400843CD1 /* pam.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7B093BF18579F0400843CD1 /* pam.cpp */; }; A7BF221414587886000AD524 /* myPerseus.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7BF221214587886000AD524 /* myPerseus.cpp */; }; A7BF2232145879B2000AD524 /* chimeraperseuscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7BF2231145879B2000AD524 /* chimeraperseuscommand.cpp */; }; A7C3DC0B14FE457500FE1924 /* cooccurrencecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7C3DC0914FE457500FE1924 /* cooccurrencecommand.cpp */; }; A7C3DC0F14FE469500FE1924 /* trialSwap2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7C3DC0D14FE469500FE1924 /* trialSwap2.cpp */; }; A7C7DAB915DA758B0059B0CF /* sffmultiplecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7C7DAB815DA758B0059B0CF /* sffmultiplecommand.cpp */; }; A7CFA4311755401800D9ED4D /* renameseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7CFA4301755401800D9ED4D /* renameseqscommand.cpp */; }; A7D395C4184FA3A200A350D7 /* kmeans.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7D395C3184FA3A200A350D7 /* kmeans.cpp */; }; A7D755DA1535F679009BF21A /* treereader.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7D755D91535F679009BF21A /* treereader.cpp */; }; A7D9378A17B146B5001E90B0 /* wilcox.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7D9378917B146B5001E90B0 /* wilcox.cpp */; }; A7E0243D15B4520A00A5F046 /* sparsedistancematrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E0243C15B4520A00A5F046 /* sparsedistancematrix.cpp */; }; A7E6F69E17427D06006775E2 /* makelookupcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E6F69D17427D06006775E2 /* makelookupcommand.cpp */; }; A7E9B88112D37EC400DA6239 /* ace.cpp in 
Sources */ = {isa = PBXBuildFile; fileRef = A7E9B64F12D37EC300DA6239 /* ace.cpp */; }; A7E9B88212D37EC400DA6239 /* aligncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65112D37EC300DA6239 /* aligncommand.cpp */; }; A7E9B88312D37EC400DA6239 /* alignment.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65312D37EC300DA6239 /* alignment.cpp */; }; A7E9B88412D37EC400DA6239 /* alignmentcell.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65512D37EC300DA6239 /* alignmentcell.cpp */; }; A7E9B88512D37EC400DA6239 /* alignmentdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65712D37EC300DA6239 /* alignmentdb.cpp */; }; A7E9B88612D37EC400DA6239 /* averagelinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65912D37EC300DA6239 /* averagelinkage.cpp */; }; A7E9B88712D37EC400DA6239 /* bayesian.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65A12D37EC300DA6239 /* bayesian.cpp */; }; A7E9B88812D37EC400DA6239 /* bellerophon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65C12D37EC300DA6239 /* bellerophon.cpp */; }; A7E9B88912D37EC400DA6239 /* bergerparker.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B65E12D37EC300DA6239 /* bergerparker.cpp */; }; A7E9B88A12D37EC400DA6239 /* binsequencecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66012D37EC300DA6239 /* binsequencecommand.cpp */; }; A7E9B88B12D37EC400DA6239 /* blastalign.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66212D37EC300DA6239 /* blastalign.cpp */; }; A7E9B88C12D37EC400DA6239 /* blastdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66412D37EC400DA6239 /* blastdb.cpp */; }; A7E9B88D12D37EC400DA6239 /* boneh.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66612D37EC400DA6239 /* boneh.cpp */; }; A7E9B88E12D37EC400DA6239 /* bootstrap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66812D37EC400DA6239 /* bootstrap.cpp */; }; A7E9B89012D37EC400DA6239 /* bstick.cpp in Sources */ = {isa = 
PBXBuildFile; fileRef = A7E9B66C12D37EC400DA6239 /* bstick.cpp */; }; A7E9B89112D37EC400DA6239 /* calculator.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B66E12D37EC400DA6239 /* calculator.cpp */; }; A7E9B89212D37EC400DA6239 /* canberra.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67012D37EC400DA6239 /* canberra.cpp */; }; A7E9B89312D37EC400DA6239 /* catchallcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67212D37EC400DA6239 /* catchallcommand.cpp */; }; A7E9B89412D37EC400DA6239 /* ccode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67412D37EC400DA6239 /* ccode.cpp */; }; A7E9B89512D37EC400DA6239 /* chao1.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67612D37EC400DA6239 /* chao1.cpp */; }; A7E9B89612D37EC400DA6239 /* chimera.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67812D37EC400DA6239 /* chimera.cpp */; }; A7E9B89712D37EC400DA6239 /* chimerabellerophoncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67A12D37EC400DA6239 /* chimerabellerophoncommand.cpp */; }; A7E9B89812D37EC400DA6239 /* chimeraccodecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67C12D37EC400DA6239 /* chimeraccodecommand.cpp */; }; A7E9B89912D37EC400DA6239 /* chimeracheckcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B67E12D37EC400DA6239 /* chimeracheckcommand.cpp */; }; A7E9B89A12D37EC400DA6239 /* chimeracheckrdp.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68012D37EC400DA6239 /* chimeracheckrdp.cpp */; }; A7E9B89B12D37EC400DA6239 /* chimerapintailcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68212D37EC400DA6239 /* chimerapintailcommand.cpp */; }; A7E9B89C12D37EC400DA6239 /* chimerarealigner.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68412D37EC400DA6239 /* chimerarealigner.cpp */; }; A7E9B89E12D37EC400DA6239 /* chimeraslayer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68812D37EC400DA6239 /* chimeraslayer.cpp */; }; 
A7E9B89F12D37EC400DA6239 /* chimeraslayercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68A12D37EC400DA6239 /* chimeraslayercommand.cpp */; }; A7E9B8A012D37EC400DA6239 /* chopseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68C12D37EC400DA6239 /* chopseqscommand.cpp */; }; A7E9B8A112D37EC400DA6239 /* classify.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B68E12D37EC400DA6239 /* classify.cpp */; }; A7E9B8A212D37EC400DA6239 /* classifyotucommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69012D37EC400DA6239 /* classifyotucommand.cpp */; }; A7E9B8A312D37EC400DA6239 /* classifyseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69212D37EC400DA6239 /* classifyseqscommand.cpp */; }; A7E9B8A412D37EC400DA6239 /* clearcut.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69412D37EC400DA6239 /* clearcut.cpp */; }; A7E9B8A512D37EC400DA6239 /* clearcutcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69612D37EC400DA6239 /* clearcutcommand.cpp */; }; A7E9B8A612D37EC400DA6239 /* cluster.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69812D37EC400DA6239 /* cluster.cpp */; }; A7E9B8A712D37EC400DA6239 /* clusterclassic.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69A12D37EC400DA6239 /* clusterclassic.cpp */; }; A7E9B8A812D37EC400DA6239 /* clustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69C12D37EC400DA6239 /* clustercommand.cpp */; }; A7E9B8A912D37EC400DA6239 /* clusterdoturcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B69E12D37EC400DA6239 /* clusterdoturcommand.cpp */; }; A7E9B8AA12D37EC400DA6239 /* clusterfragmentscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A012D37EC400DA6239 /* clusterfragmentscommand.cpp */; }; A7E9B8AB12D37EC400DA6239 /* clustersplitcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A212D37EC400DA6239 /* clustersplitcommand.cpp */; }; A7E9B8AC12D37EC400DA6239 /* cmdargs.cpp 
in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A412D37EC400DA6239 /* cmdargs.cpp */; }; A7E9B8AD12D37EC400DA6239 /* collect.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A612D37EC400DA6239 /* collect.cpp */; }; A7E9B8AE12D37EC400DA6239 /* collectcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6A812D37EC400DA6239 /* collectcommand.cpp */; }; A7E9B8AF12D37EC400DA6239 /* collectsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6AC12D37EC400DA6239 /* collectsharedcommand.cpp */; }; A7E9B8B012D37EC400DA6239 /* commandfactory.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6AF12D37EC400DA6239 /* commandfactory.cpp */; }; A7E9B8B112D37EC400DA6239 /* commandoptionparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B112D37EC400DA6239 /* commandoptionparser.cpp */; }; A7E9B8B312D37EC400DA6239 /* consensus.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B512D37EC400DA6239 /* consensus.cpp */; }; A7E9B8B412D37EC400DA6239 /* consensusseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B712D37EC400DA6239 /* consensusseqscommand.cpp */; }; A7E9B8B512D37EC400DA6239 /* corraxescommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6B912D37EC400DA6239 /* corraxescommand.cpp */; }; A7E9B8B612D37EC400DA6239 /* coverage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6BB12D37EC400DA6239 /* coverage.cpp */; }; A7E9B8B712D37EC400DA6239 /* database.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6BD12D37EC400DA6239 /* database.cpp */; }; A7E9B8B812D37EC400DA6239 /* decalc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C112D37EC400DA6239 /* decalc.cpp */; }; A7E9B8B912D37EC400DA6239 /* deconvolutecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C312D37EC400DA6239 /* deconvolutecommand.cpp */; }; A7E9B8BA12D37EC400DA6239 /* degapseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C512D37EC400DA6239 /* degapseqscommand.cpp */; 
}; A7E9B8BB12D37EC400DA6239 /* deuniqueseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6C712D37EC400DA6239 /* deuniqueseqscommand.cpp */; }; A7E9B8BC12D37EC400DA6239 /* distancecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6CB12D37EC400DA6239 /* distancecommand.cpp */; }; A7E9B8BD12D37EC400DA6239 /* distancedb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6CD12D37EC400DA6239 /* distancedb.cpp */; }; A7E9B8BE12D37EC400DA6239 /* distclearcut.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6CF12D37EC400DA6239 /* distclearcut.cpp */; }; A7E9B8BF12D37EC400DA6239 /* dlibshuff.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6D112D37EC400DA6239 /* dlibshuff.cpp */; }; A7E9B8C012D37EC400DA6239 /* dmat.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6D312D37EC400DA6239 /* dmat.cpp */; }; A7E9B8C112D37EC400DA6239 /* efron.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6D712D37EC400DA6239 /* efron.cpp */; }; A7E9B8C212D37EC400DA6239 /* engine.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6DA12D37EC400DA6239 /* engine.cpp */; }; A7E9B8C312D37EC400DA6239 /* fasta.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6DC12D37EC400DA6239 /* fasta.cpp */; }; A7E9B8C412D37EC400DA6239 /* fastamap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6DE12D37EC400DA6239 /* fastamap.cpp */; }; A7E9B8C512D37EC400DA6239 /* fileoutput.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E012D37EC400DA6239 /* fileoutput.cpp */; }; A7E9B8C612D37EC400DA6239 /* filterseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E312D37EC400DA6239 /* filterseqscommand.cpp */; }; A7E9B8C812D37EC400DA6239 /* flowdata.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E712D37EC400DA6239 /* flowdata.cpp */; }; A7E9B8C912D37EC400DA6239 /* formatcolumn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6E912D37EC400DA6239 /* formatcolumn.cpp */; }; A7E9B8CA12D37EC400DA6239 /* 
formatphylip.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6EC12D37EC400DA6239 /* formatphylip.cpp */; }; A7E9B8CB12D37EC400DA6239 /* fullmatrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6EE12D37EC400DA6239 /* fullmatrix.cpp */; }; A7E9B8CC12D37EC400DA6239 /* geom.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F012D37EC400DA6239 /* geom.cpp */; }; A7E9B8CD12D37EC400DA6239 /* getgroupcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F212D37EC400DA6239 /* getgroupcommand.cpp */; }; A7E9B8CE12D37EC400DA6239 /* getgroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F412D37EC400DA6239 /* getgroupscommand.cpp */; }; A7E9B8CF12D37EC400DA6239 /* getlabelcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F612D37EC400DA6239 /* getlabelcommand.cpp */; }; A7E9B8D012D37EC400DA6239 /* getlineagecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6F812D37EC400DA6239 /* getlineagecommand.cpp */; }; A7E9B8D112D37EC400DA6239 /* getlistcountcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6FA12D37EC400DA6239 /* getlistcountcommand.cpp */; }; A7E9B8D212D37EC400DA6239 /* getopt_long.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6FC12D37EC400DA6239 /* getopt_long.cpp */; }; A7E9B8D312D37EC400DA6239 /* getoturepcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B6FE12D37EC400DA6239 /* getoturepcommand.cpp */; }; A7E9B8D412D37EC400DA6239 /* getotuscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70012D37EC400DA6239 /* getotuscommand.cpp */; }; A7E9B8D512D37EC400DA6239 /* getrabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70212D37EC400DA6239 /* getrabundcommand.cpp */; }; A7E9B8D612D37EC400DA6239 /* getrelabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70412D37EC400DA6239 /* getrelabundcommand.cpp */; }; A7E9B8D712D37EC400DA6239 /* getsabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B70612D37EC400DA6239 /* getsabundcommand.cpp */; }; A7E9B8D812D37EC400DA6239 /* getseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70812D37EC400DA6239 /* getseqscommand.cpp */; }; A7E9B8D912D37EC400DA6239 /* getsharedotucommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70A12D37EC400DA6239 /* getsharedotucommand.cpp */; }; A7E9B8DB12D37EC400DA6239 /* goodscoverage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B70E12D37EC400DA6239 /* goodscoverage.cpp */; }; A7E9B8DC12D37EC400DA6239 /* gotohoverlap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71012D37EC400DA6239 /* gotohoverlap.cpp */; }; A7E9B8DD12D37EC400DA6239 /* gower.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71212D37EC400DA6239 /* gower.cpp */; }; A7E9B8DE12D37EC400DA6239 /* groupmap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71412D37EC400DA6239 /* groupmap.cpp */; }; A7E9B8DF12D37EC400DA6239 /* hamming.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71612D37EC400DA6239 /* hamming.cpp */; }; A7E9B8E012D37EC400DA6239 /* hcluster.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71812D37EC400DA6239 /* hcluster.cpp */; }; A7E9B8E112D37EC400DA6239 /* hclustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71A12D37EC400DA6239 /* hclustercommand.cpp */; }; A7E9B8E212D37EC400DA6239 /* heatmap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71C12D37EC400DA6239 /* heatmap.cpp */; }; A7E9B8E312D37EC400DA6239 /* heatmapcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B71E12D37EC400DA6239 /* heatmapcommand.cpp */; }; A7E9B8E412D37EC400DA6239 /* heatmapsim.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72012D37EC400DA6239 /* heatmapsim.cpp */; }; A7E9B8E512D37EC400DA6239 /* heatmapsimcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72212D37EC400DA6239 /* heatmapsimcommand.cpp */; }; A7E9B8E612D37EC400DA6239 /* heip.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B72412D37EC400DA6239 /* heip.cpp */; }; A7E9B8E712D37EC400DA6239 /* hellinger.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72612D37EC400DA6239 /* hellinger.cpp */; }; A7E9B8E812D37EC400DA6239 /* helpcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72812D37EC400DA6239 /* helpcommand.cpp */; }; A7E9B8E912D37EC400DA6239 /* indicatorcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72B12D37EC400DA6239 /* indicatorcommand.cpp */; }; A7E9B8EA12D37EC400DA6239 /* inputdata.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72D12D37EC400DA6239 /* inputdata.cpp */; }; A7E9B8EB12D37EC400DA6239 /* invsimpson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B72F12D37EC400DA6239 /* invsimpson.cpp */; }; A7E9B8EC12D37EC400DA6239 /* jackknife.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73112D37EC400DA6239 /* jackknife.cpp */; }; A7E9B8ED12D37EC400DA6239 /* kmer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73312D37EC400DA6239 /* kmer.cpp */; }; A7E9B8EE12D37EC400DA6239 /* kmerdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73512D37EC400DA6239 /* kmerdb.cpp */; }; A7E9B8EF12D37EC400DA6239 /* knn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73712D37EC400DA6239 /* knn.cpp */; }; A7E9B8F012D37EC400DA6239 /* libshuff.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73912D37EC400DA6239 /* libshuff.cpp */; }; A7E9B8F112D37EC400DA6239 /* libshuffcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73B12D37EC400DA6239 /* libshuffcommand.cpp */; }; A7E9B8F212D37EC400DA6239 /* listseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73D12D37EC400DA6239 /* listseqscommand.cpp */; }; A7E9B8F312D37EC400DA6239 /* listvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B73F12D37EC400DA6239 /* listvector.cpp */; }; A7E9B8F412D37EC400DA6239 /* logsd.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74112D37EC400DA6239 /* logsd.cpp */; }; 
A7E9B8F512D37EC400DA6239 /* makegroupcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74312D37EC400DA6239 /* makegroupcommand.cpp */; }; A7E9B8F612D37EC400DA6239 /* maligner.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74512D37EC400DA6239 /* maligner.cpp */; }; A7E9B8F712D37EC400DA6239 /* manhattan.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74712D37EC400DA6239 /* manhattan.cpp */; }; A7E9B8F812D37EC400DA6239 /* matrixoutputcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74912D37EC400DA6239 /* matrixoutputcommand.cpp */; }; A7E9B8F912D37EC400DA6239 /* memchi2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74B12D37EC400DA6239 /* memchi2.cpp */; }; A7E9B8FA12D37EC400DA6239 /* memchord.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74D12D37EC400DA6239 /* memchord.cpp */; }; A7E9B8FB12D37EC400DA6239 /* memeuclidean.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B74F12D37EC400DA6239 /* memeuclidean.cpp */; }; A7E9B8FC12D37EC400DA6239 /* mempearson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75112D37EC400DA6239 /* mempearson.cpp */; }; A7E9B8FD12D37EC400DA6239 /* mergefilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75312D37EC400DA6239 /* mergefilecommand.cpp */; }; A7E9B8FF12D37EC400DA6239 /* metastatscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75712D37EC400DA6239 /* metastatscommand.cpp */; }; A7E9B90012D37EC400DA6239 /* mgclustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75912D37EC400DA6239 /* mgclustercommand.cpp */; }; A7E9B90112D37EC400DA6239 /* mothur.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75B12D37EC400DA6239 /* mothur.cpp */; }; A7E9B90212D37EC400DA6239 /* mothurout.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75D12D37EC400DA6239 /* mothurout.cpp */; }; A7E9B90312D37EC400DA6239 /* nameassignment.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B75F12D37EC400DA6239 /* 
nameassignment.cpp */; }; A7E9B90412D37EC400DA6239 /* nast.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76112D37EC400DA6239 /* nast.cpp */; }; A7E9B90512D37EC400DA6239 /* nastreport.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76312D37EC400DA6239 /* nastreport.cpp */; }; A7E9B90612D37EC400DA6239 /* needlemanoverlap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76512D37EC400DA6239 /* needlemanoverlap.cpp */; }; A7E9B90712D37EC400DA6239 /* noalign.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76712D37EC400DA6239 /* noalign.cpp */; }; A7E9B90812D37EC400DA6239 /* nocommands.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76912D37EC400DA6239 /* nocommands.cpp */; }; A7E9B90912D37EC400DA6239 /* normalizesharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76B12D37EC400DA6239 /* normalizesharedcommand.cpp */; }; A7E9B90A12D37EC400DA6239 /* npshannon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B76D12D37EC400DA6239 /* npshannon.cpp */; }; A7E9B90B12D37EC400DA6239 /* odum.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77112D37EC400DA6239 /* odum.cpp */; }; A7E9B90C12D37EC400DA6239 /* optionparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77512D37EC400DA6239 /* optionparser.cpp */; }; A7E9B90D12D37EC400DA6239 /* ordervector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77712D37EC400DA6239 /* ordervector.cpp */; }; A7E9B90E12D37EC400DA6239 /* otuhierarchycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77912D37EC400DA6239 /* otuhierarchycommand.cpp */; }; A7E9B90F12D37EC400DA6239 /* overlap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77B12D37EC400DA6239 /* overlap.cpp */; }; A7E9B91012D37EC400DA6239 /* pairwiseseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B77D12D37EC400DA6239 /* pairwiseseqscommand.cpp */; }; A7E9B91112D37EC400DA6239 /* parsefastaqcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B77F12D37EC400DA6239 /* parsefastaqcommand.cpp */; }; A7E9B91212D37EC400DA6239 /* parselistscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78112D37EC400DA6239 /* parselistscommand.cpp */; }; A7E9B91312D37EC400DA6239 /* parsimony.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78312D37EC400DA6239 /* parsimony.cpp */; }; A7E9B91412D37EC400DA6239 /* parsimonycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78512D37EC400DA6239 /* parsimonycommand.cpp */; }; A7E9B91512D37EC400DA6239 /* pcoacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78712D37EC400DA6239 /* pcoacommand.cpp */; }; A7E9B91712D37EC400DA6239 /* phylodiversitycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78B12D37EC400DA6239 /* phylodiversitycommand.cpp */; }; A7E9B91812D37EC400DA6239 /* phylosummary.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78D12D37EC400DA6239 /* phylosummary.cpp */; }; A7E9B91912D37EC400DA6239 /* phylotree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B78F12D37EC400DA6239 /* phylotree.cpp */; }; A7E9B91A12D37EC400DA6239 /* phylotypecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79112D37EC400DA6239 /* phylotypecommand.cpp */; }; A7E9B91B12D37EC400DA6239 /* pintail.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79312D37EC400DA6239 /* pintail.cpp */; }; A7E9B91C12D37EC400DA6239 /* pipelinepdscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79512D37EC400DA6239 /* pipelinepdscommand.cpp */; }; A7E9B91D12D37EC400DA6239 /* preclustercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79712D37EC400DA6239 /* preclustercommand.cpp */; }; A7E9B91E12D37EC400DA6239 /* prng.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79912D37EC400DA6239 /* prng.cpp */; }; A7E9B91F12D37EC400DA6239 /* progress.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79B12D37EC400DA6239 /* progress.cpp */; }; A7E9B92012D37EC400DA6239 /* qstat.cpp in Sources */ 
= {isa = PBXBuildFile; fileRef = A7E9B79D12D37EC400DA6239 /* qstat.cpp */; }; A7E9B92112D37EC400DA6239 /* qualityscores.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B79F12D37EC400DA6239 /* qualityscores.cpp */; }; A7E9B92212D37EC400DA6239 /* quitcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A112D37EC400DA6239 /* quitcommand.cpp */; }; A7E9B92312D37EC400DA6239 /* rabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A312D37EC400DA6239 /* rabundvector.cpp */; }; A7E9B92412D37EC400DA6239 /* rarecalc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A512D37EC400DA6239 /* rarecalc.cpp */; }; A7E9B92512D37EC400DA6239 /* raredisplay.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A712D37EC400DA6239 /* raredisplay.cpp */; }; A7E9B92612D37EC400DA6239 /* rarefact.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7A912D37EC400DA6239 /* rarefact.cpp */; }; A7E9B92712D37EC400DA6239 /* rarefactcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7AB12D37EC400DA6239 /* rarefactcommand.cpp */; }; A7E9B92812D37EC400DA6239 /* rarefactsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7AE12D37EC400DA6239 /* rarefactsharedcommand.cpp */; }; A7E9B92912D37EC400DA6239 /* readblast.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7B012D37EC400DA6239 /* readblast.cpp */; }; A7E9B92A12D37EC400DA6239 /* readcluster.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7B212D37EC400DA6239 /* readcluster.cpp */; }; A7E9B92B12D37EC400DA6239 /* readcolumn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7B412D37EC400DA6239 /* readcolumn.cpp */; }; A7E9B92F12D37EC400DA6239 /* readphylip.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7BD12D37EC400DA6239 /* readphylip.cpp */; }; A7E9B93012D37EC400DA6239 /* readtree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7BF12D37EC400DA6239 /* readtree.cpp */; }; A7E9B93212D37EC400DA6239 /* removegroupscommand.cpp in Sources */ = 
{isa = PBXBuildFile; fileRef = A7E9B7C312D37EC400DA6239 /* removegroupscommand.cpp */; }; A7E9B93312D37EC400DA6239 /* removelineagecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C512D37EC400DA6239 /* removelineagecommand.cpp */; }; A7E9B93412D37EC400DA6239 /* removeotuscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C712D37EC400DA6239 /* removeotuscommand.cpp */; }; A7E9B93512D37EC400DA6239 /* removeseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7C912D37EC400DA6239 /* removeseqscommand.cpp */; }; A7E9B93612D37EC400DA6239 /* reportfile.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7CB12D37EC400DA6239 /* reportfile.cpp */; }; A7E9B93712D37EC400DA6239 /* reversecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7CD12D37EC400DA6239 /* reversecommand.cpp */; }; A7E9B93812D37EC400DA6239 /* sabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7CF12D37EC400DA6239 /* sabundvector.cpp */; }; A7E9B93912D37EC400DA6239 /* screenseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D112D37EC400DA6239 /* screenseqscommand.cpp */; }; A7E9B93A12D37EC400DA6239 /* secondarystructurecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D312D37EC400DA6239 /* secondarystructurecommand.cpp */; }; A7E9B93B12D37EC400DA6239 /* sensspeccommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D512D37EC400DA6239 /* sensspeccommand.cpp */; }; A7E9B93C12D37EC400DA6239 /* seqerrorcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D712D37EC400DA6239 /* seqerrorcommand.cpp */; }; A7E9B93D12D37EC400DA6239 /* seqsummarycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7D912D37EC400DA6239 /* seqsummarycommand.cpp */; }; A7E9B93E12D37EC400DA6239 /* sequence.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7DB12D37EC400DA6239 /* sequence.cpp */; }; A7E9B93F12D37EC400DA6239 /* sequencedb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = 
A7E9B7DD12D37EC400DA6239 /* sequencedb.cpp */; }; A7E9B94012D37EC400DA6239 /* setdircommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7DF12D37EC400DA6239 /* setdircommand.cpp */; }; A7E9B94112D37EC400DA6239 /* setlogfilecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E112D37EC400DA6239 /* setlogfilecommand.cpp */; }; A7E9B94212D37EC400DA6239 /* sffinfocommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E312D37EC400DA6239 /* sffinfocommand.cpp */; }; A7E9B94312D37EC400DA6239 /* shannon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E512D37EC400DA6239 /* shannon.cpp */; }; A7E9B94412D37EC400DA6239 /* shannoneven.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E712D37EC400DA6239 /* shannoneven.cpp */; }; A7E9B94512D37EC400DA6239 /* sharedace.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7E912D37EC400DA6239 /* sharedace.cpp */; }; A7E9B94612D37EC400DA6239 /* sharedanderbergs.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7EC12D37EC400DA6239 /* sharedanderbergs.cpp */; }; A7E9B94712D37EC400DA6239 /* sharedbraycurtis.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7EE12D37EC400DA6239 /* sharedbraycurtis.cpp */; }; A7E9B94812D37EC400DA6239 /* sharedchao1.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F012D37EC400DA6239 /* sharedchao1.cpp */; }; A7E9B94912D37EC400DA6239 /* sharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F212D37EC400DA6239 /* sharedcommand.cpp */; }; A7E9B94A12D37EC400DA6239 /* sharedjabund.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F412D37EC400DA6239 /* sharedjabund.cpp */; }; A7E9B94B12D37EC400DA6239 /* sharedjackknife.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F612D37EC400DA6239 /* sharedjackknife.cpp */; }; A7E9B94C12D37EC400DA6239 /* sharedjclass.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7F812D37EC400DA6239 /* sharedjclass.cpp */; }; A7E9B94D12D37EC400DA6239 /* sharedjest.cpp in Sources */ 
= {isa = PBXBuildFile; fileRef = A7E9B7FA12D37EC400DA6239 /* sharedjest.cpp */; }; A7E9B94E12D37EC400DA6239 /* sharedkstest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7FC12D37EC400DA6239 /* sharedkstest.cpp */; }; A7E9B94F12D37EC400DA6239 /* sharedkulczynski.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B7FE12D37EC400DA6239 /* sharedkulczynski.cpp */; }; A7E9B95012D37EC400DA6239 /* sharedkulczynskicody.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80012D37EC400DA6239 /* sharedkulczynskicody.cpp */; }; A7E9B95112D37EC400DA6239 /* sharedlennon.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80212D37EC400DA6239 /* sharedlennon.cpp */; }; A7E9B95212D37EC400DA6239 /* sharedlistvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80412D37EC400DA6239 /* sharedlistvector.cpp */; }; A7E9B95312D37EC400DA6239 /* sharedmarczewski.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80612D37EC400DA6239 /* sharedmarczewski.cpp */; }; A7E9B95412D37EC400DA6239 /* sharedmorisitahorn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80812D37EC400DA6239 /* sharedmorisitahorn.cpp */; }; A7E9B95512D37EC400DA6239 /* sharedochiai.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80B12D37EC400DA6239 /* sharedochiai.cpp */; }; A7E9B95612D37EC400DA6239 /* sharedordervector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80D12D37EC400DA6239 /* sharedordervector.cpp */; }; A7E9B95712D37EC400DA6239 /* sharedrabundfloatvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B80F12D37EC400DA6239 /* sharedrabundfloatvector.cpp */; }; A7E9B95812D37EC400DA6239 /* sharedrabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81112D37EC400DA6239 /* sharedrabundvector.cpp */; }; A7E9B95912D37EC400DA6239 /* sharedsabundvector.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81312D37EC400DA6239 /* sharedsabundvector.cpp */; }; A7E9B95A12D37EC400DA6239 /* sharedsobs.cpp in Sources */ = {isa = PBXBuildFile; 
fileRef = A7E9B81512D37EC400DA6239 /* sharedsobs.cpp */; }; A7E9B95B12D37EC400DA6239 /* sharedsobscollectsummary.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81712D37EC400DA6239 /* sharedsobscollectsummary.cpp */; }; A7E9B95C12D37EC400DA6239 /* sharedsorabund.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81912D37EC400DA6239 /* sharedsorabund.cpp */; }; A7E9B95D12D37EC400DA6239 /* sharedsorclass.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81B12D37EC400DA6239 /* sharedsorclass.cpp */; }; A7E9B95E12D37EC400DA6239 /* sharedsorest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81D12D37EC400DA6239 /* sharedsorest.cpp */; }; A7E9B95F12D37EC400DA6239 /* sharedthetan.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B81F12D37EC400DA6239 /* sharedthetan.cpp */; }; A7E9B96012D37EC400DA6239 /* sharedthetayc.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82112D37EC400DA6239 /* sharedthetayc.cpp */; }; A7E9B96112D37EC400DA6239 /* sharedutilities.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82312D37EC400DA6239 /* sharedutilities.cpp */; }; A7E9B96212D37EC400DA6239 /* shen.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82512D37EC400DA6239 /* shen.cpp */; }; A7E9B96312D37EC400DA6239 /* shhhercommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82712D37EC400DA6239 /* shhhercommand.cpp */; }; A7E9B96412D37EC400DA6239 /* simpson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82912D37EC400DA6239 /* simpson.cpp */; }; A7E9B96512D37EC400DA6239 /* simpsoneven.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82B12D37EC400DA6239 /* simpsoneven.cpp */; }; A7E9B96612D37EC400DA6239 /* singlelinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82D12D37EC400DA6239 /* singlelinkage.cpp */; }; A7E9B96712D37EC400DA6239 /* slayer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B82E12D37EC400DA6239 /* slayer.cpp */; }; A7E9B96812D37EC400DA6239 /* slibshuff.cpp in Sources */ = {isa 
= PBXBuildFile; fileRef = A7E9B83012D37EC400DA6239 /* slibshuff.cpp */; }; A7E9B96912D37EC400DA6239 /* smithwilson.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83212D37EC400DA6239 /* smithwilson.cpp */; }; A7E9B96A12D37EC400DA6239 /* soergel.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83512D37EC400DA6239 /* soergel.cpp */; }; A7E9B96B12D37EC400DA6239 /* solow.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83712D37EC400DA6239 /* solow.cpp */; }; A7E9B96C12D37EC400DA6239 /* sparsematrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83912D37EC400DA6239 /* sparsematrix.cpp */; }; A7E9B96D12D37EC400DA6239 /* spearman.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83B12D37EC400DA6239 /* spearman.cpp */; }; A7E9B96E12D37EC400DA6239 /* speciesprofile.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83D12D37EC400DA6239 /* speciesprofile.cpp */; }; A7E9B96F12D37EC400DA6239 /* splitabundcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B83F12D37EC400DA6239 /* splitabundcommand.cpp */; }; A7E9B97012D37EC400DA6239 /* splitgroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84112D37EC400DA6239 /* splitgroupscommand.cpp */; }; A7E9B97112D37EC400DA6239 /* splitmatrix.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84312D37EC400DA6239 /* splitmatrix.cpp */; }; A7E9B97212D37EC400DA6239 /* structchi2.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84512D37EC400DA6239 /* structchi2.cpp */; }; A7E9B97312D37EC400DA6239 /* structchord.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84712D37EC400DA6239 /* structchord.cpp */; }; A7E9B97412D37EC400DA6239 /* structeuclidean.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84912D37EC400DA6239 /* structeuclidean.cpp */; }; A7E9B97512D37EC400DA6239 /* structkulczynski.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84B12D37EC400DA6239 /* structkulczynski.cpp */; }; A7E9B97612D37EC400DA6239 /* structpearson.cpp in 
Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84D12D37EC400DA6239 /* structpearson.cpp */; }; A7E9B97712D37EC400DA6239 /* subsamplecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B84F12D37EC400DA6239 /* subsamplecommand.cpp */; }; A7E9B97812D37EC400DA6239 /* suffixdb.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85112D37EC400DA6239 /* suffixdb.cpp */; }; A7E9B97912D37EC400DA6239 /* suffixnodes.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85312D37EC400DA6239 /* suffixnodes.cpp */; }; A7E9B97A12D37EC400DA6239 /* suffixtree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85512D37EC400DA6239 /* suffixtree.cpp */; }; A7E9B97B12D37EC400DA6239 /* summarycommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85712D37EC400DA6239 /* summarycommand.cpp */; }; A7E9B97C12D37EC400DA6239 /* summarysharedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85912D37EC400DA6239 /* summarysharedcommand.cpp */; }; A7E9B97D12D37EC400DA6239 /* systemcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85B12D37EC400DA6239 /* systemcommand.cpp */; }; A7E9B97E12D37EC400DA6239 /* taxonomyequalizer.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85D12D37EC400DA6239 /* taxonomyequalizer.cpp */; }; A7E9B97F12D37EC400DA6239 /* tree.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B85F12D37EC400DA6239 /* tree.cpp */; }; A7E9B98012D37EC400DA6239 /* treegroupscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86212D37EC400DA6239 /* treegroupscommand.cpp */; }; A7E9B98112D37EC400DA6239 /* treemap.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86412D37EC400DA6239 /* treemap.cpp */; }; A7E9B98212D37EC400DA6239 /* treenode.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86612D37EC400DA6239 /* treenode.cpp */; }; A7E9B98312D37EC400DA6239 /* trimflowscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86812D37EC400DA6239 /* trimflowscommand.cpp */; }; 
A7E9B98412D37EC400DA6239 /* trimseqscommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86A12D37EC400DA6239 /* trimseqscommand.cpp */; }; A7E9B98512D37EC400DA6239 /* unifracunweightedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86C12D37EC400DA6239 /* unifracunweightedcommand.cpp */; }; A7E9B98612D37EC400DA6239 /* unifracweightedcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B86E12D37EC400DA6239 /* unifracweightedcommand.cpp */; }; A7E9B98712D37EC400DA6239 /* unweighted.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87012D37EC400DA6239 /* unweighted.cpp */; }; A7E9B98812D37EC400DA6239 /* uvest.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87212D37EC400DA6239 /* uvest.cpp */; }; A7E9B98912D37EC400DA6239 /* validcalculator.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87412D37EC400DA6239 /* validcalculator.cpp */; }; A7E9B98A12D37EC400DA6239 /* validparameter.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87612D37EC400DA6239 /* validparameter.cpp */; }; A7E9B98B12D37EC400DA6239 /* venn.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87812D37EC400DA6239 /* venn.cpp */; }; A7E9B98C12D37EC400DA6239 /* venncommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87A12D37EC400DA6239 /* venncommand.cpp */; }; A7E9B98D12D37EC400DA6239 /* weighted.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87C12D37EC400DA6239 /* weighted.cpp */; }; A7E9B98E12D37EC400DA6239 /* weightedlinkage.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87E12D37EC400DA6239 /* weightedlinkage.cpp */; }; A7E9B98F12D37EC400DA6239 /* whittaker.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7E9B87F12D37EC400DA6239 /* whittaker.cpp */; }; A7EEB0F514F29BFE00344B83 /* classifytreecommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7EEB0F414F29BFD00344B83 /* classifytreecommand.cpp */; }; A7F24FC317EA36600021DC9A /* classifyrfsharedcommand.cpp in Sources */ = {isa = PBXBuildFile; 
fileRef = A7F24FC117EA365F0021DC9A /* classifyrfsharedcommand.cpp */; }; A7F9F5CF141A5E500032F693 /* sequenceparser.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7F9F5CE141A5E500032F693 /* sequenceparser.cpp */; }; A7FA10021302E097003860FE /* mantelcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FA10011302E096003860FE /* mantelcommand.cpp */; }; A7FC480E12D788F20055BC5C /* linearalgebra.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FC480D12D788F20055BC5C /* linearalgebra.cpp */; }; A7FC486712D795D60055BC5C /* pcacommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FC486612D795D60055BC5C /* pcacommand.cpp */; }; A7FE7C401330EA1000F7B327 /* getcurrentcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FE7C3F1330EA1000F7B327 /* getcurrentcommand.cpp */; }; A7FE7E6D13311EA400F7B327 /* setcurrentcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FE7E6C13311EA400F7B327 /* setcurrentcommand.cpp */; }; A7FF19F2140FFDA500AD216D /* trimoligos.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FF19F1140FFDA500AD216D /* trimoligos.cpp */; }; A7FFB558142CA02C004884F2 /* summarytaxcommand.cpp in Sources */ = {isa = PBXBuildFile; fileRef = A7FFB557142CA02C004884F2 /* summarytaxcommand.cpp */; }; /* End PBXBuildFile section */ /* Begin PBXBuildRule section */ 481FB6A11AC1BE060076CFF3 /* PBXBuildRule */ = { isa = PBXBuildRule; compilerSpec = com.apple.compilers.proxy.script; fileType = sourcecode.fortran; isEditable = 1; outputFiles = ( "$(TARGET_BUILD_DIR)/$(INPUT_FILE_BASE).o", ); script = ""; }; A7D162CB149F96CA000523E8 /* PBXBuildRule */ = { isa = PBXBuildRule; compilerSpec = com.apple.compilers.proxy.script; fileType = sourcecode.fortran; isEditable = 1; outputFiles = ( "$(TARGET_BUILD_DIR)/$(INPUT_FILE_BASE).o", ); script = ""; }; /* End PBXBuildRule section */ /* Begin PBXCopyFilesBuildPhase section */ 481FB5171AC0A63E0076CFF3 /* CopyFiles */ = { isa = PBXCopyFilesBuildPhase; buildActionMask = 2147483647; dstPath = 
/usr/share/man/man1/; dstSubfolderSpec = 0; files = ( ); runOnlyForDeploymentPostprocessing = 1; }; 8DD76FAF0486AB0100D96B5E /* CopyFiles */ = { isa = PBXCopyFilesBuildPhase; buildActionMask = 8; dstPath = Users/SarahsWork/desktop/debug; dstSubfolderSpec = 16; files = ( ); runOnlyForDeploymentPostprocessing = 1; }; /* End PBXCopyFilesBuildPhase section */ /* Begin PBXFileReference section */ 219C1DDF1552C4BD004209F9 /* newcommandtemplate.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = newcommandtemplate.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/newcommandtemplate.cpp; sourceTree = ""; }; 219C1DE11552C508004209F9 /* newcommandtemplate.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = newcommandtemplate.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/newcommandtemplate.h; sourceTree = ""; }; 219C1DE31559BCCD004209F9 /* getcoremicrobiomecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getcoremicrobiomecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getcoremicrobiomecommand.cpp; sourceTree = ""; }; 219C1DE51559BCF2004209F9 /* getcoremicrobiomecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getcoremicrobiomecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getcoremicrobiomecommand.h; sourceTree = ""; }; 481623E11B56A2DB004C60B7 /* pcrseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pcrseqscommand.cpp; path = source/commands/pcrseqscommand.cpp; sourceTree = ""; }; 481623E31B58267D004C60B7 /* INSTALL.md */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = net.daringfireball.markdown; path = INSTALL.md; sourceTree = ""; }; 481FB5191AC0A63E0076CFF3 /* TestMothur */ = {isa = PBXFileReference; explicitFileType = 
"compiled.mach-o.executable"; includeInIndex = 0; path = TestMothur; sourceTree = BUILT_PRODUCTS_DIR; }; 481FB51B1AC0A63E0076CFF3 /* main.cpp */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.cpp.cpp; path = main.cpp; sourceTree = ""; }; 481FB5201AC0A6B60076CFF3 /* catch.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; path = catch.hpp; sourceTree = ""; }; 481FB5231AC0AA430076CFF3 /* testsequence.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = testsequence.cpp; path = testdatastructures/testsequence.cpp; sourceTree = ""; }; 481FB5281AC19F8B0076CFF3 /* setseedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = setseedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setseedcommand.cpp; sourceTree = ""; }; 481FB5291AC19F8B0076CFF3 /* setseedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = setseedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setseedcommand.h; sourceTree = ""; }; 481FB52D1AC1B0CB0076CFF3 /* testsetseedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = testsetseedcommand.cpp; path = testcommands/testsetseedcommand.cpp; sourceTree = ""; }; 48705ABA19BE32C50075E977 /* abstractrandomforest.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = abstractrandomforest.hpp; path = source/randomforest/abstractrandomforest.hpp; sourceTree = ""; }; 48705ABB19BE32C50075E977 /* getmimarkspackagecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getmimarkspackagecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getmimarkspackagecommand.cpp; sourceTree = ""; }; 48705ABC19BE32C50075E977 /* getmimarkspackagecommand.h */ = {isa = PBXFileReference; 
fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getmimarkspackagecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getmimarkspackagecommand.h; sourceTree = ""; }; 48705ABD19BE32C50075E977 /* oligos.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = oligos.cpp; path = source/datastructures/oligos.cpp; sourceTree = ""; }; 48705ABE19BE32C50075E977 /* oligos.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = oligos.h; path = source/datastructures/oligos.h; sourceTree = ""; }; 48705ABF19BE32C50075E977 /* mergesfffilecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mergesfffilecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergesfffilecommand.cpp; sourceTree = ""; }; 48705AC019BE32C50075E977 /* mergesfffilecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mergesfffilecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergesfffilecommand.h; sourceTree = ""; }; 48705AC119BE32C50075E977 /* sharedrjsd.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedrjsd.cpp; path = source/calculators/sharedrjsd.cpp; sourceTree = ""; }; 48705AC219BE32C50075E977 /* sharedrjsd.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedrjsd.h; path = source/calculators/sharedrjsd.h; sourceTree = ""; }; 48705AC319BE32C50075E977 /* abstractrandomforest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = abstractrandomforest.cpp; path = source/randomforest/abstractrandomforest.cpp; sourceTree = ""; }; 487C5A851AB88B93002AF48A /* mimarksattributescommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mimarksattributescommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/mimarksattributescommand.cpp; sourceTree = ""; }; 487C5A861AB88B93002AF48A /* mimarksattributescommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mimarksattributescommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mimarksattributescommand.h; sourceTree = ""; }; 48844B261AA74AF9006EF2B8 /* compare.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = compare.h; path = source/datastructures/compare.h; sourceTree = ""; }; 48C51DEE1A76B870004ECDF1 /* fastqread.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = fastqread.h; path = source/datastructures/fastqread.h; sourceTree = ""; }; 48C51DEF1A76B888004ECDF1 /* fastqread.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = fastqread.cpp; path = source/datastructures/fastqread.cpp; sourceTree = ""; }; 48C51DF11A793EFE004ECDF1 /* kmeralign.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kmeralign.cpp; path = source/datastructures/kmeralign.cpp; sourceTree = ""; }; 48C51DF21A793EFE004ECDF1 /* kmeralign.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = kmeralign.h; path = source/datastructures/kmeralign.h; sourceTree = ""; }; 48DB37B11B3B27E000C372A4 /* makefilecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = makefilecommand.cpp; path = source/commands/makefilecommand.cpp; sourceTree = ""; }; 48DB37B21B3B27E000C372A4 /* makefilecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = makefilecommand.h; path = source/commands/makefilecommand.h; sourceTree = ""; }; 48F98E4C1A9CFD670005E81B /* completelinkage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = 
completelinkage.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/completelinkage.cpp; sourceTree = ""; }; 7B2181FE17AD777B00286E6A /* classifysvmsharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = classifysvmsharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifysvmsharedcommand.cpp; sourceTree = ""; }; 7B2181FF17AD777B00286E6A /* classifysvmsharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = classifysvmsharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifysvmsharedcommand.h; sourceTree = ""; }; 7B21820117AD77BD00286E6A /* svm.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = svm.cpp; path = source/svm/svm.cpp; sourceTree = ""; }; 7B21820217AD77BD00286E6A /* svm.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = svm.hpp; path = source/svm/svm.hpp; sourceTree = ""; }; 7E6BE10812F710D8007ADDBE /* refchimeratest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = refchimeratest.h; path = /Users/sarahwestcott/Desktop/mothur/source/refchimeratest.h; sourceTree = ""; }; 7E6BE10912F710D8007ADDBE /* refchimeratest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = refchimeratest.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/refchimeratest.cpp; sourceTree = ""; }; 7E78911B135F3E8600E725D2 /* eachgapdistignorens.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = eachgapdistignorens.h; path = source/calculators/eachgapdistignorens.h; sourceTree = ""; }; 834D9D561656D7C400E7FAB9 /* regularizedrandomforest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = regularizedrandomforest.cpp; path = 
source/randomforest/regularizedrandomforest.cpp; sourceTree = ""; }; 834D9D571656D7C400E7FAB9 /* regularizedrandomforest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = regularizedrandomforest.h; path = source/randomforest/regularizedrandomforest.h; sourceTree = ""; }; 83F25B0A163B031200ABE73D /* forest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = forest.cpp; path = source/randomforest/forest.cpp; sourceTree = ""; }; 83F25B0B163B031200ABE73D /* forest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = forest.h; path = source/randomforest/forest.h; sourceTree = ""; }; 8DD76FB20486AB0100D96B5E /* mothur */ = {isa = PBXFileReference; explicitFileType = "compiled.mach-o.executable"; includeInIndex = 0; path = mothur; sourceTree = BUILT_PRODUCTS_DIR; }; A70056E5156A93D000924A2D /* getotulabelscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getotulabelscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getotulabelscommand.cpp; sourceTree = ""; }; A70056E8156A93E300924A2D /* getotulabelscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getotulabelscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getotulabelscommand.h; sourceTree = ""; }; A70056E9156AB6D400924A2D /* removeotulabelscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removeotulabelscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removeotulabelscommand.h; sourceTree = ""; }; A70056EA156AB6E500924A2D /* removeotulabelscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removeotulabelscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removeotulabelscommand.cpp; sourceTree = ""; 
}; A70332B512D3A13400761E33 /* makefile */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.make; path = makefile; sourceTree = ""; }; A7128B1A16B7001200723BE4 /* getdistscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getdistscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getdistscommand.h; sourceTree = ""; }; A7128B1C16B7002600723BE4 /* getdistscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getdistscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getdistscommand.cpp; sourceTree = ""; }; A7132EAE184E76EB00AAA402 /* communitytype.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = communitytype.h; path = source/communitytype/communitytype.h; sourceTree = ""; }; A7132EB2184E792700AAA402 /* communitytype.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = communitytype.cpp; path = source/communitytype/communitytype.cpp; sourceTree = ""; }; A713EBAA12DC7613000092AC /* readphylipvector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = readphylipvector.h; path = source/read/readphylipvector.h; sourceTree = ""; }; A713EBAB12DC7613000092AC /* readphylipvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = readphylipvector.cpp; path = source/read/readphylipvector.cpp; sourceTree = ""; }; A713EBEB12DC7C5E000092AC /* nmdscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = nmdscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/nmdscommand.h; sourceTree = ""; }; A713EBEC12DC7C5E000092AC /* nmdscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = nmdscommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/nmdscommand.cpp; sourceTree = ""; }; A7190B201768E0DF00A9AFA6 /* lefsecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = lefsecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/lefsecommand.cpp; sourceTree = ""; }; A7190B211768E0DF00A9AFA6 /* lefsecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = lefsecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/lefsecommand.h; sourceTree = ""; }; A71CB15E130B04A2001E7287 /* anosimcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = anosimcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/anosimcommand.cpp; sourceTree = ""; }; A71CB15F130B04A2001E7287 /* anosimcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = anosimcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/anosimcommand.h; sourceTree = ""; }; A71FE12A12EDF72400963CA7 /* mergegroupscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mergegroupscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergegroupscommand.h; sourceTree = ""; }; A71FE12B12EDF72400963CA7 /* mergegroupscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mergegroupscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergegroupscommand.cpp; sourceTree = ""; }; A721765513BB9F7D0014DAAE /* referencedb.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = referencedb.h; path = source/datastructures/referencedb.h; sourceTree = ""; }; A721765613BB9F7D0014DAAE /* referencedb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = referencedb.cpp; 
path = source/datastructures/referencedb.cpp; sourceTree = ""; }; A721AB66161C570F009860A1 /* alignnode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = alignnode.cpp; path = source/classifier/alignnode.cpp; sourceTree = ""; }; A721AB67161C570F009860A1 /* alignnode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = alignnode.h; path = source/classifier/alignnode.h; sourceTree = ""; }; A721AB68161C570F009860A1 /* aligntree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = aligntree.cpp; path = source/classifier/aligntree.cpp; sourceTree = ""; }; A721AB69161C570F009860A1 /* aligntree.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = aligntree.h; path = source/classifier/aligntree.h; sourceTree = ""; }; A721AB6D161C572A009860A1 /* kmernode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kmernode.cpp; path = source/classifier/kmernode.cpp; sourceTree = ""; }; A721AB6E161C572A009860A1 /* kmernode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = kmernode.h; path = source/classifier/kmernode.h; sourceTree = ""; }; A721AB6F161C572A009860A1 /* kmertree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kmertree.cpp; path = source/classifier/kmertree.cpp; sourceTree = ""; }; A721AB70161C572A009860A1 /* kmertree.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = kmertree.h; path = source/classifier/kmertree.h; sourceTree = ""; }; A721AB73161C573B009860A1 /* taxonomynode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = taxonomynode.cpp; path = source/classifier/taxonomynode.cpp; sourceTree = ""; }; A721AB74161C573B009860A1 /* taxonomynode.h */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = taxonomynode.h; path = source/classifier/taxonomynode.h; sourceTree = ""; }; A7222D711856276C0055A993 /* sharedjsd.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = sharedjsd.h; path = source/calculators/sharedjsd.h; sourceTree = ""; }; A7222D721856277C0055A993 /* sharedjsd.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedjsd.cpp; path = source/calculators/sharedjsd.cpp; sourceTree = ""; }; A724D2B4153C8600000A826F /* makebiomcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = makebiomcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makebiomcommand.h; sourceTree = ""; }; A724D2B6153C8628000A826F /* makebiomcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = makebiomcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makebiomcommand.cpp; sourceTree = ""; }; A727864212E9E28C00F86ABA /* removerarecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removerarecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removerarecommand.h; sourceTree = ""; }; A727864312E9E28C00F86ABA /* removerarecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removerarecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removerarecommand.cpp; sourceTree = ""; }; A7386C1B1619CACB00651424 /* abstractdecisiontree.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = abstractdecisiontree.hpp; path = source/randomforest/abstractdecisiontree.hpp; sourceTree = ""; }; A7386C1D1619CACB00651424 /* decisiontree.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = 
decisiontree.hpp; path = source/randomforest/decisiontree.hpp; sourceTree = ""; }; A7386C1E1619CACB00651424 /* macros.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = macros.h; path = source/randomforest/macros.h; sourceTree = ""; }; A7386C1F1619CACB00651424 /* randomforest.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = randomforest.hpp; path = source/randomforest/randomforest.hpp; sourceTree = ""; }; A7386C201619CACB00651424 /* rftreenode.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = rftreenode.hpp; path = source/randomforest/rftreenode.hpp; sourceTree = ""; }; A7386C241619E52200651424 /* abstractdecisiontree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = abstractdecisiontree.cpp; path = source/randomforest/abstractdecisiontree.cpp; sourceTree = ""; }; A7386C28161A110700651424 /* decisiontree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = decisiontree.cpp; path = source/randomforest/decisiontree.cpp; sourceTree = ""; }; A73901051588C3EF00ED2ED6 /* loadlogfilecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = loadlogfilecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/loadlogfilecommand.h; sourceTree = ""; }; A73901071588C40900ED2ED6 /* loadlogfilecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = loadlogfilecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/loadlogfilecommand.cpp; sourceTree = ""; }; A73DDBB813C4A0D1006AAE38 /* clearmemorycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = clearmemorycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clearmemorycommand.h; sourceTree = ""; }; 
A73DDBB913C4A0D1006AAE38 /* clearmemorycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clearmemorycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clearmemorycommand.cpp; sourceTree = ""; }; A73DDC3613C4BF64006AAE38 /* mothurmetastats.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mothurmetastats.h; path = source/metastats/mothurmetastats.h; sourceTree = ""; }; A73DDC3713C4BF64006AAE38 /* mothurmetastats.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mothurmetastats.cpp; path = source/metastats/mothurmetastats.cpp; sourceTree = ""; }; A741744A175CD9B1007DF49B /* makelefsecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = makelefsecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makelefsecommand.cpp; sourceTree = ""; }; A741744B175CD9B1007DF49B /* makelefsecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = makelefsecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makelefsecommand.h; sourceTree = ""; }; A741FAD115D1688E0067BCC5 /* sequencecountparser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sequencecountparser.cpp; path = source/datastructures/sequencecountparser.cpp; sourceTree = ""; }; A741FAD415D168A00067BCC5 /* sequencecountparser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sequencecountparser.h; path = source/datastructures/sequencecountparser.h; sourceTree = ""; }; A747EC6F181EA0E500345732 /* sracommand.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = sracommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sracommand.h; sourceTree = ""; }; A747EC70181EA0F900345732 /* sracommand.cpp */ = 
{isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sracommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sracommand.cpp; sourceTree = ""; }; A7496D2C167B531B00CC7D7C /* kruskalwalliscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kruskalwalliscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/kruskalwalliscommand.cpp; sourceTree = ""; }; A7496D2D167B531B00CC7D7C /* kruskalwalliscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = kruskalwalliscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/kruskalwalliscommand.h; sourceTree = ""; }; A74C06E616A9C097008390A3 /* primerdesigncommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = primerdesigncommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/primerdesigncommand.h; sourceTree = ""; }; A74C06E816A9C0A8008390A3 /* primerdesigncommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = primerdesigncommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/primerdesigncommand.cpp; sourceTree = ""; }; A74D36B6137DAFAA00332B0C /* chimerauchimecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimerauchimecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimerauchimecommand.h; sourceTree = ""; }; A74D36B7137DAFAA00332B0C /* chimerauchimecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimerauchimecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimerauchimecommand.cpp; sourceTree = ""; }; A74D59A3159A1E2000043046 /* counttable.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = counttable.cpp; path = 
source/datastructures/counttable.cpp; sourceTree = ""; }; A74D59A6159A1E3600043046 /* counttable.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = counttable.h; path = source/datastructures/counttable.h; sourceTree = ""; }; A754149514840CF7005850D1 /* summaryqualcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = summaryqualcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summaryqualcommand.h; sourceTree = ""; }; A754149614840CF7005850D1 /* summaryqualcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = summaryqualcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summaryqualcommand.cpp; sourceTree = ""; }; A7548FAB17142EA500B1F05A /* getmetacommunitycommand.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = getmetacommunitycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getmetacommunitycommand.h; sourceTree = ""; }; A7548FAC17142EBC00B1F05A /* getmetacommunitycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getmetacommunitycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getmetacommunitycommand.cpp; sourceTree = ""; }; A7548FAE171440EC00B1F05A /* qFinderDMM.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = qFinderDMM.cpp; path = source/communitytype/qFinderDMM.cpp; sourceTree = ""; }; A7548FAF171440ED00B1F05A /* qFinderDMM.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = qFinderDMM.h; path = source/communitytype/qFinderDMM.h; sourceTree = ""; }; A75790571301749D00A30DAB /* homovacommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = homovacommand.h; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/homovacommand.h; sourceTree = ""; }; A75790581301749D00A30DAB /* homovacommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = homovacommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/homovacommand.cpp; sourceTree = ""; }; A76CDD7F1510F09A004C8458 /* pcrseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = pcrseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pcrseqscommand.h; sourceTree = ""; }; A7730EFD13967241007433A3 /* countseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = countseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/countseqscommand.h; sourceTree = ""; }; A7730EFE13967241007433A3 /* countseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = countseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/countseqscommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A774101214695AF60098E6AC /* shhhseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = shhhseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/shhhseqscommand.h; sourceTree = ""; }; A774101314695AF60098E6AC /* shhhseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = shhhseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/shhhseqscommand.cpp; sourceTree = ""; }; A774104614696F320098E6AC /* myseqdist.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = myseqdist.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/myseqdist.cpp; sourceTree = ""; }; A774104714696F320098E6AC /* myseqdist.h */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = myseqdist.h; path = /Users/sarahwestcott/Desktop/mothur/source/myseqdist.h; sourceTree = ""; }; A77410F414697C300098E6AC /* seqnoise.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = seqnoise.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/seqnoise.cpp; sourceTree = ""; }; A77410F514697C300098E6AC /* seqnoise.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = seqnoise.h; path = /Users/sarahwestcott/Desktop/mothur/source/seqnoise.h; sourceTree = ""; }; A778FE69134CA6CA00C0BA33 /* getcommandinfocommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getcommandinfocommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getcommandinfocommand.h; sourceTree = ""; }; A778FE6A134CA6CA00C0BA33 /* getcommandinfocommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getcommandinfocommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getcommandinfocommand.cpp; sourceTree = ""; }; A77916E6176F7F7600EEFE18 /* designmap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = designmap.cpp; path = source/datastructures/designmap.cpp; sourceTree = ""; }; A77916E7176F7F7600EEFE18 /* designmap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = designmap.h; path = source/datastructures/designmap.h; sourceTree = ""; }; A77A221D139001B600B0BE70 /* deuniquetreecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = deuniquetreecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/deuniquetreecommand.h; sourceTree = ""; }; A77A221E139001B600B0BE70 /* deuniquetreecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = 
sourcecode.cpp.cpp; name = deuniquetreecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/deuniquetreecommand.cpp; sourceTree = ""; }; A77B7183173D222F002163C2 /* sparcccommand.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = sparcccommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sparcccommand.h; sourceTree = ""; }; A77B7184173D2240002163C2 /* sparcccommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sparcccommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sparcccommand.cpp; sourceTree = ""; }; A77B7186173D4041002163C2 /* randomnumber.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = randomnumber.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/randomnumber.cpp; sourceTree = ""; }; A77B7187173D4041002163C2 /* randomnumber.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = randomnumber.h; path = /Users/sarahwestcott/Desktop/mothur/source/randomnumber.h; sourceTree = ""; }; A77B7189173D40E4002163C2 /* calcsparcc.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = calcsparcc.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/calcsparcc.cpp; sourceTree = ""; }; A77B718A173D40E4002163C2 /* calcsparcc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = calcsparcc.h; path = /Users/sarahwestcott/Desktop/mothur/source/calcsparcc.h; sourceTree = ""; }; A77E1937161B201E00DB1A2A /* randomforest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = randomforest.cpp; path = source/randomforest/randomforest.cpp; sourceTree = ""; }; A77E193A161B289600DB1A2A /* rftreenode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = rftreenode.cpp; path = 
source/randomforest/rftreenode.cpp; sourceTree = ""; }; A77EBD2C1523707F00ED407C /* createdatabasecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = createdatabasecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/createdatabasecommand.h; sourceTree = ""; }; A77EBD2E1523709100ED407C /* createdatabasecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = createdatabasecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/createdatabasecommand.cpp; sourceTree = ""; }; A7876A25152A017C00A0AE86 /* subsample.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = subsample.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/subsample.cpp; sourceTree = ""; }; A7876A28152A018B00A0AE86 /* subsample.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = subsample.h; path = /Users/sarahwestcott/Desktop/mothur/source/subsample.h; sourceTree = ""; }; A79234D513C74BF6002B08E2 /* mothurfisher.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mothurfisher.h; path = source/metastats/mothurfisher.h; sourceTree = ""; }; A79234D613C74BF6002B08E2 /* mothurfisher.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mothurfisher.cpp; path = source/metastats/mothurfisher.cpp; sourceTree = ""; }; A795840B13F13CD900F201D5 /* countgroupscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = countgroupscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/countgroupscommand.h; sourceTree = ""; }; A795840C13F13CD900F201D5 /* countgroupscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = countgroupscommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/countgroupscommand.cpp; sourceTree = ""; }; A799314816CBD0BC0017E888 /* mergetaxsummarycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mergetaxsummarycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergetaxsummarycommand.h; sourceTree = ""; }; A799314A16CBD0CD0017E888 /* mergetaxsummarycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mergetaxsummarycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergetaxsummarycommand.cpp; sourceTree = ""; }; A799F5B71309A3E000AEEFA0 /* makefastqcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = makefastqcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makefastqcommand.h; sourceTree = ""; }; A799F5B81309A3E000AEEFA0 /* makefastqcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = makefastqcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makefastqcommand.cpp; sourceTree = ""; }; A79EEF8516971D4A0006DEC1 /* filtersharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = filtersharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/filtersharedcommand.cpp; sourceTree = ""; }; A79EEF8816971D640006DEC1 /* filtersharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = filtersharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/filtersharedcommand.h; sourceTree = ""; }; A7A067191562946F0095C8C5 /* listotulabelscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = listotulabelscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/listotulabelscommand.cpp; sourceTree = ""; 
}; A7A0671C156294810095C8C5 /* listotulabelscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = listotulabelscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/listotulabelscommand.h; sourceTree = ""; }; A7A0671D1562AC230095C8C5 /* makecontigscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = makecontigscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makecontigscommand.h; sourceTree = ""; }; A7A0671E1562AC3E0095C8C5 /* makecontigscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = makecontigscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makecontigscommand.cpp; sourceTree = ""; }; A7A09B0E18773BF700FAA081 /* shannonrange.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = shannonrange.h; path = source/calculators/shannonrange.h; sourceTree = ""; }; A7A09B0F18773C0E00FAA081 /* shannonrange.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = shannonrange.cpp; path = source/calculators/shannonrange.cpp; sourceTree = ""; }; A7A32DA914DC43B00001D2E5 /* sortseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sortseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sortseqscommand.cpp; sourceTree = ""; }; A7A32DAC14DC43D10001D2E5 /* sortseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sortseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sortseqscommand.h; sourceTree = ""; }; A7A3C8C714D041AD00B1BFBE /* otuassociationcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = otuassociationcommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/otuassociationcommand.cpp; sourceTree = ""; }; A7A3C8C814D041AD00B1BFBE /* otuassociationcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = otuassociationcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/otuassociationcommand.h; sourceTree = ""; }; A7A61F1A130035C800E05B6B /* LICENSE.md */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = net.daringfireball.markdown; path = LICENSE.md; sourceTree = ""; }; A7A61F2B130062E000E05B6B /* amovacommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = amovacommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/amovacommand.h; sourceTree = ""; }; A7A61F2C130062E000E05B6B /* amovacommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = amovacommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/amovacommand.cpp; sourceTree = ""; }; A7AACFBA132FE008003D6C4D /* currentfile.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = currentfile.h; path = /Users/sarahwestcott/Desktop/mothur/source/currentfile.h; sourceTree = ""; }; A7B0231416B8244B006BA09E /* removedistscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removedistscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removedistscommand.cpp; sourceTree = ""; }; A7B0231716B8245D006BA09E /* removedistscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removedistscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removedistscommand.h; sourceTree = ""; }; A7B093BE18579EF600843CD1 /* pam.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = pam.h; path = source/communitytype/pam.h; sourceTree = ""; }; 
A7B093BF18579F0400843CD1 /* pam.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pam.cpp; path = source/communitytype/pam.cpp; sourceTree = ""; }; A7BF221214587886000AD524 /* myPerseus.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = myPerseus.cpp; path = source/chimera/myPerseus.cpp; sourceTree = ""; }; A7BF221314587886000AD524 /* myPerseus.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = myPerseus.h; path = source/chimera/myPerseus.h; sourceTree = ""; }; A7BF2230145879B2000AD524 /* chimeraperseuscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimeraperseuscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeraperseuscommand.h; sourceTree = ""; }; A7BF2231145879B2000AD524 /* chimeraperseuscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimeraperseuscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeraperseuscommand.cpp; sourceTree = ""; }; A7C3DC0914FE457500FE1924 /* cooccurrencecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = cooccurrencecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/cooccurrencecommand.cpp; sourceTree = ""; }; A7C3DC0A14FE457500FE1924 /* cooccurrencecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = cooccurrencecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/cooccurrencecommand.h; sourceTree = ""; }; A7C3DC0D14FE469500FE1924 /* trialSwap2.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = trialSwap2.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/trialSwap2.cpp; sourceTree = ""; }; A7C3DC0E14FE469500FE1924 /* trialswap2.h */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = trialswap2.h; path = /Users/sarahwestcott/Desktop/mothur/source/trialswap2.h; sourceTree = ""; }; A7C7DAB615DA75760059B0CF /* sffmultiplecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sffmultiplecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sffmultiplecommand.h; sourceTree = ""; }; A7C7DAB815DA758B0059B0CF /* sffmultiplecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sffmultiplecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sffmultiplecommand.cpp; sourceTree = ""; }; A7CFA42F1755400500D9ED4D /* renameseqscommand.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = renameseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/renameseqscommand.h; sourceTree = ""; }; A7CFA4301755401800D9ED4D /* renameseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = renameseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/renameseqscommand.cpp; sourceTree = ""; }; A7D395C2184FA39300A350D7 /* kmeans.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = kmeans.h; path = source/communitytype/kmeans.h; sourceTree = ""; }; A7D395C3184FA3A200A350D7 /* kmeans.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kmeans.cpp; path = source/communitytype/kmeans.cpp; sourceTree = ""; }; A7D755D71535F665009BF21A /* treereader.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = treereader.h; path = source/read/treereader.h; sourceTree = ""; }; A7D755D91535F679009BF21A /* treereader.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = treereader.cpp; path = source/read/treereader.cpp; 
sourceTree = ""; }; A7D9378917B146B5001E90B0 /* wilcox.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = wilcox.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/wilcox.cpp; sourceTree = ""; }; A7D9378B17B15215001E90B0 /* wilcox.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = wilcox.h; path = /Users/sarahwestcott/Desktop/mothur/source/wilcox.h; sourceTree = ""; }; A7DAAFA3133A254E003956EB /* commandparameter.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = commandparameter.h; path = /Users/sarahwestcott/Desktop/mothur/source/commandparameter.h; sourceTree = ""; }; A7E0243C15B4520A00A5F046 /* sparsedistancematrix.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sparsedistancematrix.cpp; path = source/datastructures/sparsedistancematrix.cpp; sourceTree = ""; }; A7E0243F15B4522000A5F046 /* sparsedistancematrix.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sparsedistancematrix.h; path = source/datastructures/sparsedistancematrix.h; sourceTree = ""; }; A7E6F69C17427CF2006775E2 /* makelookupcommand.h */ = {isa = PBXFileReference; lastKnownFileType = sourcecode.c.h; name = makelookupcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makelookupcommand.h; sourceTree = ""; }; A7E6F69D17427D06006775E2 /* makelookupcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = makelookupcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makelookupcommand.cpp; sourceTree = ""; }; A7E9B64F12D37EC300DA6239 /* ace.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = ace.cpp; path = source/calculators/ace.cpp; sourceTree = ""; }; A7E9B65012D37EC300DA6239 /* ace.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = 
sourcecode.c.h; name = ace.h; path = source/calculators/ace.h; sourceTree = ""; }; A7E9B65112D37EC300DA6239 /* aligncommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = aligncommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/aligncommand.cpp; sourceTree = ""; }; A7E9B65212D37EC300DA6239 /* aligncommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = aligncommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/aligncommand.h; sourceTree = ""; }; A7E9B65312D37EC300DA6239 /* alignment.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = alignment.cpp; path = source/datastructures/alignment.cpp; sourceTree = ""; }; A7E9B65412D37EC300DA6239 /* alignment.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = alignment.hpp; path = source/datastructures/alignment.hpp; sourceTree = ""; }; A7E9B65512D37EC300DA6239 /* alignmentcell.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = alignmentcell.cpp; path = source/datastructures/alignmentcell.cpp; sourceTree = ""; }; A7E9B65612D37EC300DA6239 /* alignmentcell.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = alignmentcell.hpp; path = source/datastructures/alignmentcell.hpp; sourceTree = ""; }; A7E9B65712D37EC300DA6239 /* alignmentdb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = alignmentdb.cpp; path = source/datastructures/alignmentdb.cpp; sourceTree = ""; }; A7E9B65812D37EC300DA6239 /* alignmentdb.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = alignmentdb.h; path = source/datastructures/alignmentdb.h; sourceTree = ""; }; A7E9B65912D37EC300DA6239 /* averagelinkage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; 
lastKnownFileType = sourcecode.cpp.cpp; name = averagelinkage.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/averagelinkage.cpp; sourceTree = ""; }; A7E9B65A12D37EC300DA6239 /* bayesian.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = bayesian.cpp; path = source/classifier/bayesian.cpp; sourceTree = ""; }; A7E9B65B12D37EC300DA6239 /* bayesian.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = bayesian.h; path = source/classifier/bayesian.h; sourceTree = ""; }; A7E9B65C12D37EC300DA6239 /* bellerophon.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = bellerophon.cpp; path = source/chimera/bellerophon.cpp; sourceTree = ""; }; A7E9B65D12D37EC300DA6239 /* bellerophon.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = bellerophon.h; path = source/chimera/bellerophon.h; sourceTree = ""; }; A7E9B65E12D37EC300DA6239 /* bergerparker.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = bergerparker.cpp; path = source/calculators/bergerparker.cpp; sourceTree = ""; }; A7E9B65F12D37EC300DA6239 /* bergerparker.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = bergerparker.h; path = source/calculators/bergerparker.h; sourceTree = ""; }; A7E9B66012D37EC300DA6239 /* binsequencecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = binsequencecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/binsequencecommand.cpp; sourceTree = ""; }; A7E9B66112D37EC300DA6239 /* binsequencecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = binsequencecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/binsequencecommand.h; sourceTree = ""; }; A7E9B66212D37EC300DA6239 /* blastalign.cpp 
*/ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = blastalign.cpp; path = source/datastructures/blastalign.cpp; sourceTree = ""; }; A7E9B66312D37EC400DA6239 /* blastalign.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = blastalign.hpp; path = source/datastructures/blastalign.hpp; sourceTree = ""; }; A7E9B66412D37EC400DA6239 /* blastdb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = blastdb.cpp; path = source/datastructures/blastdb.cpp; sourceTree = ""; }; A7E9B66512D37EC400DA6239 /* blastdb.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = blastdb.hpp; path = source/datastructures/blastdb.hpp; sourceTree = ""; }; A7E9B66612D37EC400DA6239 /* boneh.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = boneh.cpp; path = source/calculators/boneh.cpp; sourceTree = ""; }; A7E9B66712D37EC400DA6239 /* boneh.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = boneh.h; path = source/calculators/boneh.h; sourceTree = ""; }; A7E9B66812D37EC400DA6239 /* bootstrap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = bootstrap.cpp; path = source/calculators/bootstrap.cpp; sourceTree = ""; }; A7E9B66912D37EC400DA6239 /* bootstrap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = bootstrap.h; path = source/calculators/bootstrap.h; sourceTree = ""; }; A7E9B66C12D37EC400DA6239 /* bstick.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = bstick.cpp; path = source/calculators/bstick.cpp; sourceTree = ""; }; A7E9B66D12D37EC400DA6239 /* bstick.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = bstick.h; path = source/calculators/bstick.h; 
sourceTree = ""; }; A7E9B66E12D37EC400DA6239 /* calculator.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = calculator.cpp; path = source/calculators/calculator.cpp; sourceTree = ""; }; A7E9B66F12D37EC400DA6239 /* calculator.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = calculator.h; path = source/calculators/calculator.h; sourceTree = ""; }; A7E9B67012D37EC400DA6239 /* canberra.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = canberra.cpp; path = source/calculators/canberra.cpp; sourceTree = ""; }; A7E9B67112D37EC400DA6239 /* canberra.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = canberra.h; path = source/calculators/canberra.h; sourceTree = ""; }; A7E9B67212D37EC400DA6239 /* catchallcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = catchallcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/catchallcommand.cpp; sourceTree = ""; }; A7E9B67312D37EC400DA6239 /* catchallcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = catchallcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/catchallcommand.h; sourceTree = ""; }; A7E9B67412D37EC400DA6239 /* ccode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = ccode.cpp; path = source/chimera/ccode.cpp; sourceTree = ""; }; A7E9B67512D37EC400DA6239 /* ccode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; lineEnding = 0; name = ccode.h; path = source/chimera/ccode.h; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.objcpp; }; A7E9B67612D37EC400DA6239 /* chao1.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chao1.cpp; path = 
source/calculators/chao1.cpp; sourceTree = ""; }; A7E9B67712D37EC400DA6239 /* chao1.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chao1.h; path = source/calculators/chao1.h; sourceTree = ""; }; A7E9B67812D37EC400DA6239 /* chimera.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimera.cpp; path = source/chimera/chimera.cpp; sourceTree = ""; }; A7E9B67912D37EC400DA6239 /* chimera.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimera.h; path = source/chimera/chimera.h; sourceTree = ""; }; A7E9B67A12D37EC400DA6239 /* chimerabellerophoncommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimerabellerophoncommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimerabellerophoncommand.cpp; sourceTree = ""; }; A7E9B67B12D37EC400DA6239 /* chimerabellerophoncommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimerabellerophoncommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimerabellerophoncommand.h; sourceTree = ""; }; A7E9B67C12D37EC400DA6239 /* chimeraccodecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = chimeraccodecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeraccodecommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B67D12D37EC400DA6239 /* chimeraccodecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimeraccodecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeraccodecommand.h; sourceTree = ""; }; A7E9B67E12D37EC400DA6239 /* chimeracheckcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = 
chimeracheckcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeracheckcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B67F12D37EC400DA6239 /* chimeracheckcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimeracheckcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeracheckcommand.h; sourceTree = ""; }; A7E9B68012D37EC400DA6239 /* chimeracheckrdp.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimeracheckrdp.cpp; path = source/chimera/chimeracheckrdp.cpp; sourceTree = ""; }; A7E9B68112D37EC400DA6239 /* chimeracheckrdp.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; lineEnding = 0; name = chimeracheckrdp.h; path = source/chimera/chimeracheckrdp.h; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.objcpp; }; A7E9B68212D37EC400DA6239 /* chimerapintailcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = chimerapintailcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimerapintailcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B68312D37EC400DA6239 /* chimerapintailcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimerapintailcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimerapintailcommand.h; sourceTree = ""; }; A7E9B68412D37EC400DA6239 /* chimerarealigner.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimerarealigner.cpp; path = source/chimera/chimerarealigner.cpp; sourceTree = ""; }; A7E9B68512D37EC400DA6239 /* chimerarealigner.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimerarealigner.h; path = 
source/chimera/chimerarealigner.h; sourceTree = ""; }; A7E9B68812D37EC400DA6239 /* chimeraslayer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimeraslayer.cpp; path = source/chimera/chimeraslayer.cpp; sourceTree = ""; }; A7E9B68912D37EC400DA6239 /* chimeraslayer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimeraslayer.h; path = source/chimera/chimeraslayer.h; sourceTree = ""; }; A7E9B68A12D37EC400DA6239 /* chimeraslayercommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = chimeraslayercommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeraslayercommand.cpp; sourceTree = ""; }; A7E9B68B12D37EC400DA6239 /* chimeraslayercommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chimeraslayercommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chimeraslayercommand.h; sourceTree = ""; }; A7E9B68C12D37EC400DA6239 /* chopseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = chopseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chopseqscommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B68D12D37EC400DA6239 /* chopseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = chopseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/chopseqscommand.h; sourceTree = ""; }; A7E9B68E12D37EC400DA6239 /* classify.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = classify.cpp; path = source/classifier/classify.cpp; sourceTree = ""; }; A7E9B68F12D37EC400DA6239 /* classify.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = classify.h; path = 
source/classifier/classify.h; sourceTree = ""; }; A7E9B69012D37EC400DA6239 /* classifyotucommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = classifyotucommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifyotucommand.cpp; sourceTree = ""; }; A7E9B69112D37EC400DA6239 /* classifyotucommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = classifyotucommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifyotucommand.h; sourceTree = ""; }; A7E9B69212D37EC400DA6239 /* classifyseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = classifyseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifyseqscommand.cpp; sourceTree = ""; }; A7E9B69312D37EC400DA6239 /* classifyseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = classifyseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifyseqscommand.h; sourceTree = ""; }; A7E9B69412D37EC400DA6239 /* clearcut.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clearcut.cpp; path = source/clearcut/clearcut.cpp; sourceTree = ""; }; A7E9B69512D37EC400DA6239 /* clearcut.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = clearcut.h; path = source/clearcut/clearcut.h; sourceTree = ""; }; A7E9B69612D37EC400DA6239 /* clearcutcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clearcutcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clearcutcommand.cpp; sourceTree = ""; }; A7E9B69712D37EC400DA6239 /* clearcutcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = clearcutcommand.h; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/clearcutcommand.h; sourceTree = ""; }; A7E9B69812D37EC400DA6239 /* cluster.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = cluster.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/cluster.cpp; sourceTree = ""; }; A7E9B69912D37EC400DA6239 /* cluster.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = cluster.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/cluster.hpp; sourceTree = ""; }; A7E9B69A12D37EC400DA6239 /* clusterclassic.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clusterclassic.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/clusterclassic.cpp; sourceTree = ""; }; A7E9B69B12D37EC400DA6239 /* clusterclassic.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = clusterclassic.h; path = /Users/sarahwestcott/Desktop/mothur/source/clusterclassic.h; sourceTree = ""; }; A7E9B69C12D37EC400DA6239 /* clustercommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clustercommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clustercommand.cpp; sourceTree = ""; }; A7E9B69D12D37EC400DA6239 /* clustercommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; lineEnding = 0; name = clustercommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clustercommand.h; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.objcpp; }; A7E9B69E12D37EC400DA6239 /* clusterdoturcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clusterdoturcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clusterdoturcommand.cpp; sourceTree = ""; }; A7E9B69F12D37EC400DA6239 /* clusterdoturcommand.h */ = {isa = PBXFileReference; fileEncoding = 
4; lastKnownFileType = sourcecode.c.h; name = clusterdoturcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clusterdoturcommand.h; sourceTree = ""; }; A7E9B6A012D37EC400DA6239 /* clusterfragmentscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = clusterfragmentscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clusterfragmentscommand.cpp; sourceTree = ""; }; A7E9B6A112D37EC400DA6239 /* clusterfragmentscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = clusterfragmentscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clusterfragmentscommand.h; sourceTree = ""; }; A7E9B6A212D37EC400DA6239 /* clustersplitcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = clustersplitcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clustersplitcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B6A312D37EC400DA6239 /* clustersplitcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = clustersplitcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/clustersplitcommand.h; sourceTree = ""; }; A7E9B6A412D37EC400DA6239 /* cmdargs.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = cmdargs.cpp; path = source/clearcut/cmdargs.cpp; sourceTree = ""; }; A7E9B6A512D37EC400DA6239 /* cmdargs.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = cmdargs.h; path = source/clearcut/cmdargs.h; sourceTree = ""; }; A7E9B6A612D37EC400DA6239 /* collect.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = collect.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/collect.cpp; sourceTree = ""; }; 
A7E9B6A712D37EC400DA6239 /* collect.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = collect.h; path = /Users/sarahwestcott/Desktop/mothur/source/collect.h; sourceTree = ""; }; A7E9B6A812D37EC400DA6239 /* collectcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = collectcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/collectcommand.cpp; sourceTree = ""; }; A7E9B6A912D37EC400DA6239 /* collectcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = collectcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/collectcommand.h; sourceTree = ""; }; A7E9B6AA12D37EC400DA6239 /* collectdisplay.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = collectdisplay.h; path = /Users/sarahwestcott/Desktop/mothur/source/collectdisplay.h; sourceTree = ""; }; A7E9B6AB12D37EC400DA6239 /* collectorscurvedata.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = collectorscurvedata.h; path = /Users/sarahwestcott/Desktop/mothur/source/collectorscurvedata.h; sourceTree = ""; }; A7E9B6AC12D37EC400DA6239 /* collectsharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = collectsharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/collectsharedcommand.cpp; sourceTree = ""; }; A7E9B6AD12D37EC400DA6239 /* collectsharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = collectsharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/collectsharedcommand.h; sourceTree = ""; }; A7E9B6AE12D37EC400DA6239 /* command.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = command.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/command.hpp; 
sourceTree = ""; }; A7E9B6AF12D37EC400DA6239 /* commandfactory.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = commandfactory.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commandfactory.cpp; sourceTree = ""; }; A7E9B6B012D37EC400DA6239 /* commandfactory.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = commandfactory.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/commandfactory.hpp; sourceTree = ""; }; A7E9B6B112D37EC400DA6239 /* commandoptionparser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = commandoptionparser.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commandoptionparser.cpp; sourceTree = ""; }; A7E9B6B212D37EC400DA6239 /* commandoptionparser.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = commandoptionparser.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/commandoptionparser.hpp; sourceTree = ""; }; A7E9B6B312D37EC400DA6239 /* common.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = common.h; path = source/clearcut/common.h; sourceTree = ""; }; A7E9B6B512D37EC400DA6239 /* consensus.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = consensus.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/consensus.cpp; sourceTree = ""; }; A7E9B6B612D37EC400DA6239 /* consensus.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = consensus.h; path = /Users/sarahwestcott/Desktop/mothur/source/consensus.h; sourceTree = ""; }; A7E9B6B712D37EC400DA6239 /* consensusseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = consensusseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/consensusseqscommand.cpp; sourceTree = 
""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B6B812D37EC400DA6239 /* consensusseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = consensusseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/consensusseqscommand.h; sourceTree = ""; }; A7E9B6B912D37EC400DA6239 /* corraxescommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = corraxescommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/corraxescommand.cpp; sourceTree = ""; }; A7E9B6BA12D37EC400DA6239 /* corraxescommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = corraxescommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/corraxescommand.h; sourceTree = ""; }; A7E9B6BB12D37EC400DA6239 /* coverage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = coverage.cpp; path = source/calculators/coverage.cpp; sourceTree = ""; }; A7E9B6BC12D37EC400DA6239 /* coverage.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = coverage.h; path = source/calculators/coverage.h; sourceTree = ""; }; A7E9B6BD12D37EC400DA6239 /* database.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = database.cpp; path = source/datastructures/database.cpp; sourceTree = ""; }; A7E9B6BE12D37EC400DA6239 /* database.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = database.hpp; path = source/datastructures/database.hpp; sourceTree = ""; }; A7E9B6BF12D37EC400DA6239 /* datavector.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = datavector.hpp; path = source/datastructures/datavector.hpp; sourceTree = ""; }; A7E9B6C012D37EC400DA6239 /* dayhoff.h */ = {isa = PBXFileReference; fileEncoding = 4; 
lastKnownFileType = sourcecode.c.h; name = dayhoff.h; path = source/calculators/dayhoff.h; sourceTree = ""; }; A7E9B6C112D37EC400DA6239 /* decalc.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = decalc.cpp; path = source/chimera/decalc.cpp; sourceTree = ""; }; A7E9B6C212D37EC400DA6239 /* decalc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; lineEnding = 0; name = decalc.h; path = source/chimera/decalc.h; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.objcpp; }; A7E9B6C312D37EC400DA6239 /* deconvolutecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = deconvolutecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/deconvolutecommand.cpp; sourceTree = ""; }; A7E9B6C412D37EC400DA6239 /* deconvolutecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = deconvolutecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/deconvolutecommand.h; sourceTree = ""; }; A7E9B6C512D37EC400DA6239 /* degapseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = degapseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/degapseqscommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B6C612D37EC400DA6239 /* degapseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = degapseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/degapseqscommand.h; sourceTree = ""; }; A7E9B6C712D37EC400DA6239 /* deuniqueseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = deuniqueseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/deuniqueseqscommand.cpp; sourceTree = ""; }; 
A7E9B6C812D37EC400DA6239 /* deuniqueseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = deuniqueseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/deuniqueseqscommand.h; sourceTree = ""; }; A7E9B6C912D37EC400DA6239 /* display.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = display.h; path = /Users/sarahwestcott/Desktop/mothur/source/display.h; sourceTree = ""; }; A7E9B6CA12D37EC400DA6239 /* dist.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = dist.h; path = source/calculators/dist.h; sourceTree = ""; }; A7E9B6CB12D37EC400DA6239 /* distancecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = distancecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/distancecommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B6CC12D37EC400DA6239 /* distancecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = distancecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/distancecommand.h; sourceTree = ""; }; A7E9B6CD12D37EC400DA6239 /* distancedb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = distancedb.cpp; path = source/datastructures/distancedb.cpp; sourceTree = ""; }; A7E9B6CE12D37EC400DA6239 /* distancedb.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = distancedb.hpp; path = source/datastructures/distancedb.hpp; sourceTree = ""; }; A7E9B6CF12D37EC400DA6239 /* distclearcut.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = distclearcut.cpp; path = source/clearcut/distclearcut.cpp; sourceTree = ""; }; A7E9B6D012D37EC400DA6239 /* distclearcut.h */ = {isa = PBXFileReference; 
fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = distclearcut.h; path = source/clearcut/distclearcut.h; sourceTree = ""; }; A7E9B6D112D37EC400DA6239 /* dlibshuff.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = dlibshuff.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/dlibshuff.cpp; sourceTree = ""; }; A7E9B6D212D37EC400DA6239 /* dlibshuff.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = dlibshuff.h; path = /Users/sarahwestcott/Desktop/mothur/source/dlibshuff.h; sourceTree = ""; }; A7E9B6D312D37EC400DA6239 /* dmat.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = dmat.cpp; path = source/clearcut/dmat.cpp; sourceTree = ""; }; A7E9B6D412D37EC400DA6239 /* dmat.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = dmat.h; path = source/clearcut/dmat.h; sourceTree = ""; }; A7E9B6D512D37EC400DA6239 /* eachgapdist.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = eachgapdist.h; path = source/calculators/eachgapdist.h; sourceTree = ""; }; A7E9B6D612D37EC400DA6239 /* eachgapignore.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = eachgapignore.h; path = source/calculators/eachgapignore.h; sourceTree = ""; }; A7E9B6D712D37EC400DA6239 /* efron.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = efron.cpp; path = source/calculators/efron.cpp; sourceTree = ""; }; A7E9B6D812D37EC400DA6239 /* efron.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = efron.h; path = source/calculators/efron.h; sourceTree = ""; }; A7E9B6D912D37EC400DA6239 /* endiannessmacros.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = endiannessmacros.h; path = 
/Users/sarahwestcott/Desktop/mothur/source/endiannessmacros.h; sourceTree = ""; }; A7E9B6DA12D37EC400DA6239 /* engine.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = engine.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/engine.cpp; sourceTree = ""; }; A7E9B6DB12D37EC400DA6239 /* engine.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = engine.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/engine.hpp; sourceTree = ""; }; A7E9B6DC12D37EC400DA6239 /* fasta.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = fasta.cpp; path = source/clearcut/fasta.cpp; sourceTree = ""; }; A7E9B6DD12D37EC400DA6239 /* fasta.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = fasta.h; path = source/clearcut/fasta.h; sourceTree = ""; }; A7E9B6DE12D37EC400DA6239 /* fastamap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = fastamap.cpp; path = source/datastructures/fastamap.cpp; sourceTree = ""; }; A7E9B6DF12D37EC400DA6239 /* fastamap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = fastamap.h; path = source/datastructures/fastamap.h; sourceTree = ""; }; A7E9B6E012D37EC400DA6239 /* fileoutput.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = fileoutput.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/fileoutput.cpp; sourceTree = ""; }; A7E9B6E112D37EC400DA6239 /* fileoutput.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = fileoutput.h; path = /Users/sarahwestcott/Desktop/mothur/source/fileoutput.h; sourceTree = ""; }; A7E9B6E212D37EC400DA6239 /* filters.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = filters.h; path = source/calculators/filters.h; sourceTree = 
""; }; A7E9B6E312D37EC400DA6239 /* filterseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = filterseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/filterseqscommand.cpp; sourceTree = ""; }; A7E9B6E412D37EC400DA6239 /* filterseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = filterseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/filterseqscommand.h; sourceTree = ""; }; A7E9B6E712D37EC400DA6239 /* flowdata.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = flowdata.cpp; path = source/datastructures/flowdata.cpp; sourceTree = ""; }; A7E9B6E812D37EC400DA6239 /* flowdata.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = flowdata.h; path = source/datastructures/flowdata.h; sourceTree = ""; }; A7E9B6E912D37EC400DA6239 /* formatcolumn.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = formatcolumn.cpp; path = source/read/formatcolumn.cpp; sourceTree = ""; }; A7E9B6EA12D37EC400DA6239 /* formatcolumn.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = formatcolumn.h; path = source/read/formatcolumn.h; sourceTree = ""; }; A7E9B6EB12D37EC400DA6239 /* formatmatrix.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = formatmatrix.h; path = source/read/formatmatrix.h; sourceTree = ""; }; A7E9B6EC12D37EC400DA6239 /* formatphylip.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = formatphylip.cpp; path = source/read/formatphylip.cpp; sourceTree = ""; }; A7E9B6ED12D37EC400DA6239 /* formatphylip.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = formatphylip.h; path = source/read/formatphylip.h; sourceTree = ""; }; 
A7E9B6EE12D37EC400DA6239 /* fullmatrix.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = fullmatrix.cpp; path = source/datastructures/fullmatrix.cpp; sourceTree = ""; }; A7E9B6EF12D37EC400DA6239 /* fullmatrix.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = fullmatrix.h; path = source/datastructures/fullmatrix.h; sourceTree = ""; }; A7E9B6F012D37EC400DA6239 /* geom.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = geom.cpp; path = source/calculators/geom.cpp; sourceTree = ""; }; A7E9B6F112D37EC400DA6239 /* geom.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = geom.h; path = source/calculators/geom.h; sourceTree = ""; }; A7E9B6F212D37EC400DA6239 /* getgroupcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = getgroupcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getgroupcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B6F312D37EC400DA6239 /* getgroupcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getgroupcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getgroupcommand.h; sourceTree = ""; }; A7E9B6F412D37EC400DA6239 /* getgroupscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getgroupscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getgroupscommand.cpp; sourceTree = ""; }; A7E9B6F512D37EC400DA6239 /* getgroupscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getgroupscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getgroupscommand.h; sourceTree = ""; }; A7E9B6F612D37EC400DA6239 /* getlabelcommand.cpp */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getlabelcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getlabelcommand.cpp; sourceTree = ""; }; A7E9B6F712D37EC400DA6239 /* getlabelcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getlabelcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getlabelcommand.h; sourceTree = ""; }; A7E9B6F812D37EC400DA6239 /* getlineagecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getlineagecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getlineagecommand.cpp; sourceTree = ""; }; A7E9B6F912D37EC400DA6239 /* getlineagecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getlineagecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getlineagecommand.h; sourceTree = ""; }; A7E9B6FA12D37EC400DA6239 /* getlistcountcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getlistcountcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getlistcountcommand.cpp; sourceTree = ""; }; A7E9B6FB12D37EC400DA6239 /* getlistcountcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getlistcountcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getlistcountcommand.h; sourceTree = ""; }; A7E9B6FC12D37EC400DA6239 /* getopt_long.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getopt_long.cpp; path = source/clearcut/getopt_long.cpp; sourceTree = ""; }; A7E9B6FD12D37EC400DA6239 /* getopt_long.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getopt_long.h; path = source/clearcut/getopt_long.h; sourceTree = ""; }; A7E9B6FE12D37EC400DA6239 /* 
getoturepcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getoturepcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getoturepcommand.cpp; sourceTree = ""; }; A7E9B6FF12D37EC400DA6239 /* getoturepcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getoturepcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getoturepcommand.h; sourceTree = ""; }; A7E9B70012D37EC400DA6239 /* getotuscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getotuscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getotuscommand.cpp; sourceTree = ""; }; A7E9B70112D37EC400DA6239 /* getotuscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getotuscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getotuscommand.h; sourceTree = ""; }; A7E9B70212D37EC400DA6239 /* getrabundcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = getrabundcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getrabundcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B70312D37EC400DA6239 /* getrabundcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getrabundcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getrabundcommand.h; sourceTree = ""; }; A7E9B70412D37EC400DA6239 /* getrelabundcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getrelabundcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getrelabundcommand.cpp; sourceTree = ""; }; A7E9B70512D37EC400DA6239 /* getrelabundcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = 
sourcecode.c.h; name = getrelabundcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getrelabundcommand.h; sourceTree = ""; }; A7E9B70612D37EC400DA6239 /* getsabundcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = getsabundcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getsabundcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B70712D37EC400DA6239 /* getsabundcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getsabundcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getsabundcommand.h; sourceTree = ""; }; A7E9B70812D37EC400DA6239 /* getseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getseqscommand.cpp; sourceTree = ""; }; A7E9B70912D37EC400DA6239 /* getseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getseqscommand.h; sourceTree = ""; }; A7E9B70A12D37EC400DA6239 /* getsharedotucommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getsharedotucommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getsharedotucommand.cpp; sourceTree = ""; }; A7E9B70B12D37EC400DA6239 /* getsharedotucommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getsharedotucommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getsharedotucommand.h; sourceTree = ""; }; A7E9B70E12D37EC400DA6239 /* goodscoverage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = goodscoverage.cpp; path = source/calculators/goodscoverage.cpp; 
sourceTree = ""; }; A7E9B70F12D37EC400DA6239 /* goodscoverage.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = goodscoverage.h; path = source/calculators/goodscoverage.h; sourceTree = ""; }; A7E9B71012D37EC400DA6239 /* gotohoverlap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = gotohoverlap.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/gotohoverlap.cpp; sourceTree = ""; }; A7E9B71112D37EC400DA6239 /* gotohoverlap.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = gotohoverlap.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/gotohoverlap.hpp; sourceTree = ""; }; A7E9B71212D37EC400DA6239 /* gower.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = gower.cpp; path = source/calculators/gower.cpp; sourceTree = ""; }; A7E9B71312D37EC400DA6239 /* gower.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = gower.h; path = source/calculators/gower.h; sourceTree = ""; }; A7E9B71412D37EC400DA6239 /* groupmap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = groupmap.cpp; path = source/datastructures/groupmap.cpp; sourceTree = ""; }; A7E9B71512D37EC400DA6239 /* groupmap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = groupmap.h; path = source/datastructures/groupmap.h; sourceTree = ""; }; A7E9B71612D37EC400DA6239 /* hamming.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = hamming.cpp; path = source/calculators/hamming.cpp; sourceTree = ""; }; A7E9B71712D37EC400DA6239 /* hamming.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = hamming.h; path = source/calculators/hamming.h; sourceTree = ""; }; A7E9B71812D37EC400DA6239 /* hcluster.cpp */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = hcluster.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/hcluster.cpp; sourceTree = ""; }; A7E9B71912D37EC400DA6239 /* hcluster.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = hcluster.h; path = /Users/sarahwestcott/Desktop/mothur/source/hcluster.h; sourceTree = ""; }; A7E9B71A12D37EC400DA6239 /* hclustercommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = hclustercommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/hclustercommand.cpp; sourceTree = ""; }; A7E9B71B12D37EC400DA6239 /* hclustercommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; lineEnding = 0; name = hclustercommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/hclustercommand.h; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.objcpp; }; A7E9B71C12D37EC400DA6239 /* heatmap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = heatmap.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/heatmap.cpp; sourceTree = ""; }; A7E9B71D12D37EC400DA6239 /* heatmap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = heatmap.h; path = /Users/sarahwestcott/Desktop/mothur/source/heatmap.h; sourceTree = ""; }; A7E9B71E12D37EC400DA6239 /* heatmapcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = heatmapcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/heatmapcommand.cpp; sourceTree = ""; }; A7E9B71F12D37EC400DA6239 /* heatmapcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = heatmapcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/heatmapcommand.h; sourceTree = ""; }; A7E9B72012D37EC400DA6239 /* 
heatmapsim.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = heatmapsim.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/heatmapsim.cpp; sourceTree = ""; }; A7E9B72112D37EC400DA6239 /* heatmapsim.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = heatmapsim.h; path = /Users/sarahwestcott/Desktop/mothur/source/heatmapsim.h; sourceTree = ""; }; A7E9B72212D37EC400DA6239 /* heatmapsimcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = heatmapsimcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/heatmapsimcommand.cpp; sourceTree = ""; }; A7E9B72312D37EC400DA6239 /* heatmapsimcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = heatmapsimcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/heatmapsimcommand.h; sourceTree = ""; }; A7E9B72412D37EC400DA6239 /* heip.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = heip.cpp; path = source/calculators/heip.cpp; sourceTree = ""; }; A7E9B72512D37EC400DA6239 /* heip.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = heip.h; path = source/calculators/heip.h; sourceTree = ""; }; A7E9B72612D37EC400DA6239 /* hellinger.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = hellinger.cpp; path = source/calculators/hellinger.cpp; sourceTree = ""; }; A7E9B72712D37EC400DA6239 /* hellinger.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = hellinger.h; path = source/calculators/hellinger.h; sourceTree = ""; }; A7E9B72812D37EC400DA6239 /* helpcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = helpcommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/helpcommand.cpp; sourceTree = ""; }; A7E9B72912D37EC400DA6239 /* helpcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = helpcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/helpcommand.h; sourceTree = ""; }; A7E9B72A12D37EC400DA6239 /* ignoregaps.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = ignoregaps.h; path = source/calculators/ignoregaps.h; sourceTree = ""; }; A7E9B72B12D37EC400DA6239 /* indicatorcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = indicatorcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/indicatorcommand.cpp; sourceTree = ""; }; A7E9B72C12D37EC400DA6239 /* indicatorcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = indicatorcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/indicatorcommand.h; sourceTree = ""; }; A7E9B72D12D37EC400DA6239 /* inputdata.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = inputdata.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/inputdata.cpp; sourceTree = ""; }; A7E9B72E12D37EC400DA6239 /* inputdata.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = inputdata.h; path = /Users/sarahwestcott/Desktop/mothur/source/inputdata.h; sourceTree = ""; }; A7E9B72F12D37EC400DA6239 /* invsimpson.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = invsimpson.cpp; path = source/calculators/invsimpson.cpp; sourceTree = ""; }; A7E9B73012D37EC400DA6239 /* invsimpson.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = invsimpson.h; path = source/calculators/invsimpson.h; sourceTree = ""; }; A7E9B73112D37EC400DA6239 /* jackknife.cpp */ = {isa 
= PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = jackknife.cpp; path = source/calculators/jackknife.cpp; sourceTree = ""; }; A7E9B73212D37EC400DA6239 /* jackknife.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = jackknife.h; path = source/calculators/jackknife.h; sourceTree = ""; }; A7E9B73312D37EC400DA6239 /* kmer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kmer.cpp; path = source/datastructures/kmer.cpp; sourceTree = ""; }; A7E9B73412D37EC400DA6239 /* kmer.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = kmer.hpp; path = source/datastructures/kmer.hpp; sourceTree = ""; }; A7E9B73512D37EC400DA6239 /* kmerdb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = kmerdb.cpp; path = source/datastructures/kmerdb.cpp; sourceTree = ""; }; A7E9B73612D37EC400DA6239 /* kmerdb.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = kmerdb.hpp; path = source/datastructures/kmerdb.hpp; sourceTree = ""; }; A7E9B73712D37EC400DA6239 /* knn.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = knn.cpp; path = source/classifier/knn.cpp; sourceTree = ""; }; A7E9B73812D37EC400DA6239 /* knn.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = knn.h; path = source/classifier/knn.h; sourceTree = ""; }; A7E9B73912D37EC400DA6239 /* libshuff.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = libshuff.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/libshuff.cpp; sourceTree = ""; }; A7E9B73A12D37EC400DA6239 /* libshuff.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = libshuff.h; path = /Users/sarahwestcott/Desktop/mothur/source/libshuff.h; 
sourceTree = ""; }; A7E9B73B12D37EC400DA6239 /* libshuffcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = libshuffcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/libshuffcommand.cpp; sourceTree = ""; }; A7E9B73C12D37EC400DA6239 /* libshuffcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = libshuffcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/libshuffcommand.h; sourceTree = ""; }; A7E9B73D12D37EC400DA6239 /* listseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = listseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/listseqscommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B73E12D37EC400DA6239 /* listseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = listseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/listseqscommand.h; sourceTree = ""; }; A7E9B73F12D37EC400DA6239 /* listvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = listvector.cpp; path = source/datastructures/listvector.cpp; sourceTree = ""; }; A7E9B74012D37EC400DA6239 /* listvector.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = listvector.hpp; path = source/datastructures/listvector.hpp; sourceTree = ""; }; A7E9B74112D37EC400DA6239 /* logsd.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = logsd.cpp; path = source/calculators/logsd.cpp; sourceTree = ""; }; A7E9B74212D37EC400DA6239 /* logsd.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = logsd.h; path = source/calculators/logsd.h; sourceTree = ""; }; A7E9B74312D37EC400DA6239 /* 
makegroupcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; lineEnding = 0; name = makegroupcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makegroupcommand.cpp; sourceTree = ""; xcLanguageSpecificationIdentifier = xcode.lang.cpp; }; A7E9B74412D37EC400DA6239 /* makegroupcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = makegroupcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/makegroupcommand.h; sourceTree = ""; }; A7E9B74512D37EC400DA6239 /* maligner.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = maligner.cpp; path = source/chimera/maligner.cpp; sourceTree = ""; }; A7E9B74612D37EC400DA6239 /* maligner.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = maligner.h; path = source/chimera/maligner.h; sourceTree = ""; }; A7E9B74712D37EC400DA6239 /* manhattan.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = manhattan.cpp; path = source/calculators/manhattan.cpp; sourceTree = ""; }; A7E9B74812D37EC400DA6239 /* manhattan.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = manhattan.h; path = source/calculators/manhattan.h; sourceTree = ""; }; A7E9B74912D37EC400DA6239 /* matrixoutputcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = matrixoutputcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/matrixoutputcommand.cpp; sourceTree = ""; }; A7E9B74A12D37EC400DA6239 /* matrixoutputcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = matrixoutputcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/matrixoutputcommand.h; sourceTree = ""; }; A7E9B74B12D37EC400DA6239 /* memchi2.cpp */ = {isa = PBXFileReference; 
fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = memchi2.cpp; path = source/calculators/memchi2.cpp; sourceTree = ""; }; A7E9B74C12D37EC400DA6239 /* memchi2.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = memchi2.h; path = source/calculators/memchi2.h; sourceTree = ""; }; A7E9B74D12D37EC400DA6239 /* memchord.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = memchord.cpp; path = source/calculators/memchord.cpp; sourceTree = ""; }; A7E9B74E12D37EC400DA6239 /* memchord.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = memchord.h; path = source/calculators/memchord.h; sourceTree = ""; }; A7E9B74F12D37EC400DA6239 /* memeuclidean.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = memeuclidean.cpp; path = source/calculators/memeuclidean.cpp; sourceTree = ""; }; A7E9B75012D37EC400DA6239 /* memeuclidean.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = memeuclidean.h; path = source/calculators/memeuclidean.h; sourceTree = ""; }; A7E9B75112D37EC400DA6239 /* mempearson.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mempearson.cpp; path = source/calculators/mempearson.cpp; sourceTree = ""; }; A7E9B75212D37EC400DA6239 /* mempearson.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mempearson.h; path = source/calculators/mempearson.h; sourceTree = ""; }; A7E9B75312D37EC400DA6239 /* mergefilecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mergefilecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergefilecommand.cpp; sourceTree = ""; }; A7E9B75412D37EC400DA6239 /* mergefilecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; 
name = mergefilecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mergefilecommand.h; sourceTree = ""; }; A7E9B75712D37EC400DA6239 /* metastatscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = metastatscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/metastatscommand.cpp; sourceTree = ""; }; A7E9B75812D37EC400DA6239 /* metastatscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = metastatscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/metastatscommand.h; sourceTree = ""; }; A7E9B75912D37EC400DA6239 /* mgclustercommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mgclustercommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mgclustercommand.cpp; sourceTree = ""; }; A7E9B75A12D37EC400DA6239 /* mgclustercommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mgclustercommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mgclustercommand.h; sourceTree = ""; }; A7E9B75B12D37EC400DA6239 /* mothur.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mothur.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/mothur.cpp; sourceTree = ""; }; A7E9B75C12D37EC400DA6239 /* mothur.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mothur.h; path = /Users/sarahwestcott/Desktop/mothur/source/mothur.h; sourceTree = ""; }; A7E9B75D12D37EC400DA6239 /* mothurout.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mothurout.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/mothurout.cpp; sourceTree = ""; }; A7E9B75E12D37EC400DA6239 /* mothurout.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = 
mothurout.h; path = /Users/sarahwestcott/Desktop/mothur/source/mothurout.h; sourceTree = ""; }; A7E9B75F12D37EC400DA6239 /* nameassignment.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = nameassignment.cpp; path = source/datastructures/nameassignment.cpp; sourceTree = ""; }; A7E9B76012D37EC400DA6239 /* nameassignment.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = nameassignment.hpp; path = source/datastructures/nameassignment.hpp; sourceTree = ""; }; A7E9B76112D37EC400DA6239 /* nast.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = nast.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/nast.cpp; sourceTree = ""; }; A7E9B76212D37EC400DA6239 /* nast.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = nast.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/nast.hpp; sourceTree = ""; }; A7E9B76312D37EC400DA6239 /* nastreport.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = nastreport.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/nastreport.cpp; sourceTree = ""; }; A7E9B76412D37EC400DA6239 /* nastreport.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = nastreport.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/nastreport.hpp; sourceTree = ""; }; A7E9B76512D37EC400DA6239 /* needlemanoverlap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = needlemanoverlap.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/needlemanoverlap.cpp; sourceTree = ""; }; A7E9B76612D37EC400DA6239 /* needlemanoverlap.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = needlemanoverlap.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/needlemanoverlap.hpp; sourceTree = ""; }; 
A7E9B76712D37EC400DA6239 /* noalign.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = noalign.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/noalign.cpp; sourceTree = ""; }; A7E9B76812D37EC400DA6239 /* noalign.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = noalign.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/noalign.hpp; sourceTree = ""; }; A7E9B76912D37EC400DA6239 /* nocommands.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = nocommands.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/nocommands.cpp; sourceTree = ""; }; A7E9B76A12D37EC400DA6239 /* nocommands.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = nocommands.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/nocommands.h; sourceTree = ""; }; A7E9B76B12D37EC400DA6239 /* normalizesharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = normalizesharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/normalizesharedcommand.cpp; sourceTree = ""; }; A7E9B76C12D37EC400DA6239 /* normalizesharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = normalizesharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/normalizesharedcommand.h; sourceTree = ""; }; A7E9B76D12D37EC400DA6239 /* npshannon.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = npshannon.cpp; path = source/calculators/npshannon.cpp; sourceTree = ""; }; A7E9B76E12D37EC400DA6239 /* npshannon.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = npshannon.h; path = source/calculators/npshannon.h; sourceTree = ""; }; A7E9B76F12D37EC400DA6239 /* nseqs.h */ = {isa = PBXFileReference; 
fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = nseqs.h; path = source/calculators/nseqs.h; sourceTree = ""; }; A7E9B77012D37EC400DA6239 /* observable.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = observable.h; path = /Users/sarahwestcott/Desktop/mothur/source/observable.h; sourceTree = ""; }; A7E9B77112D37EC400DA6239 /* odum.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = odum.cpp; path = source/calculators/odum.cpp; sourceTree = ""; }; A7E9B77212D37EC400DA6239 /* odum.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = odum.h; path = source/calculators/odum.h; sourceTree = ""; }; A7E9B77312D37EC400DA6239 /* onegapdist.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = onegapdist.h; path = source/calculators/onegapdist.h; sourceTree = ""; }; A7E9B77412D37EC400DA6239 /* onegapignore.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = onegapignore.h; path = source/calculators/onegapignore.h; sourceTree = ""; }; A7E9B77512D37EC400DA6239 /* optionparser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = optionparser.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/optionparser.cpp; sourceTree = ""; }; A7E9B77612D37EC400DA6239 /* optionparser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = optionparser.h; path = /Users/sarahwestcott/Desktop/mothur/source/optionparser.h; sourceTree = ""; }; A7E9B77712D37EC400DA6239 /* ordervector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = ordervector.cpp; path = source/datastructures/ordervector.cpp; sourceTree = ""; }; A7E9B77812D37EC400DA6239 /* ordervector.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = 
ordervector.hpp; path = source/datastructures/ordervector.hpp; sourceTree = ""; }; A7E9B77912D37EC400DA6239 /* otuhierarchycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = otuhierarchycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/otuhierarchycommand.cpp; sourceTree = ""; }; A7E9B77A12D37EC400DA6239 /* otuhierarchycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = otuhierarchycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/otuhierarchycommand.h; sourceTree = ""; }; A7E9B77B12D37EC400DA6239 /* overlap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = overlap.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/overlap.cpp; sourceTree = ""; }; A7E9B77C12D37EC400DA6239 /* overlap.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = overlap.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/overlap.hpp; sourceTree = ""; }; A7E9B77D12D37EC400DA6239 /* pairwiseseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pairwiseseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pairwiseseqscommand.cpp; sourceTree = ""; }; A7E9B77E12D37EC400DA6239 /* pairwiseseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = pairwiseseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pairwiseseqscommand.h; sourceTree = ""; }; A7E9B77F12D37EC400DA6239 /* parsefastaqcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = parsefastaqcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/parsefastaqcommand.cpp; sourceTree = ""; }; A7E9B78012D37EC400DA6239 /* parsefastaqcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; 
lastKnownFileType = sourcecode.c.h; name = parsefastaqcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/parsefastaqcommand.h; sourceTree = ""; }; A7E9B78112D37EC400DA6239 /* parselistscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = parselistscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/parselistscommand.cpp; sourceTree = ""; }; A7E9B78212D37EC400DA6239 /* parselistscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = parselistscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/parselistscommand.h; sourceTree = ""; }; A7E9B78312D37EC400DA6239 /* parsimony.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = parsimony.cpp; path = source/calculators/parsimony.cpp; sourceTree = ""; }; A7E9B78412D37EC400DA6239 /* parsimony.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = parsimony.h; path = source/calculators/parsimony.h; sourceTree = ""; }; A7E9B78512D37EC400DA6239 /* parsimonycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = parsimonycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/parsimonycommand.cpp; sourceTree = ""; }; A7E9B78612D37EC400DA6239 /* parsimonycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = parsimonycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/parsimonycommand.h; sourceTree = ""; }; A7E9B78712D37EC400DA6239 /* pcoacommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pcoacommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pcoacommand.cpp; sourceTree = ""; }; A7E9B78812D37EC400DA6239 /* pcoacommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = 
sourcecode.c.h; name = pcoacommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pcoacommand.h; sourceTree = ""; }; A7E9B78B12D37EC400DA6239 /* phylodiversitycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = phylodiversitycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/phylodiversitycommand.cpp; sourceTree = ""; }; A7E9B78C12D37EC400DA6239 /* phylodiversitycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = phylodiversitycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/phylodiversitycommand.h; sourceTree = ""; }; A7E9B78D12D37EC400DA6239 /* phylosummary.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = phylosummary.cpp; path = source/classifier/phylosummary.cpp; sourceTree = ""; }; A7E9B78E12D37EC400DA6239 /* phylosummary.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = phylosummary.h; path = source/classifier/phylosummary.h; sourceTree = ""; }; A7E9B78F12D37EC400DA6239 /* phylotree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = phylotree.cpp; path = source/classifier/phylotree.cpp; sourceTree = ""; }; A7E9B79012D37EC400DA6239 /* phylotree.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = phylotree.h; path = source/classifier/phylotree.h; sourceTree = ""; }; A7E9B79112D37EC400DA6239 /* phylotypecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = phylotypecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/phylotypecommand.cpp; sourceTree = ""; }; A7E9B79212D37EC400DA6239 /* phylotypecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = phylotypecommand.h; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/phylotypecommand.h; sourceTree = ""; }; A7E9B79312D37EC400DA6239 /* pintail.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pintail.cpp; path = source/chimera/pintail.cpp; sourceTree = ""; }; A7E9B79412D37EC400DA6239 /* pintail.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = pintail.h; path = source/chimera/pintail.h; sourceTree = ""; }; A7E9B79512D37EC400DA6239 /* pipelinepdscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pipelinepdscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pipelinepdscommand.cpp; sourceTree = ""; }; A7E9B79612D37EC400DA6239 /* pipelinepdscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = pipelinepdscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pipelinepdscommand.h; sourceTree = ""; }; A7E9B79712D37EC400DA6239 /* preclustercommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = preclustercommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/preclustercommand.cpp; sourceTree = ""; }; A7E9B79812D37EC400DA6239 /* preclustercommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = preclustercommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/preclustercommand.h; sourceTree = ""; }; A7E9B79912D37EC400DA6239 /* prng.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = prng.cpp; path = source/calculators/prng.cpp; sourceTree = ""; }; A7E9B79A12D37EC400DA6239 /* prng.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = prng.h; path = source/calculators/prng.h; sourceTree = ""; }; A7E9B79B12D37EC400DA6239 /* progress.cpp */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = progress.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/progress.cpp; sourceTree = ""; }; A7E9B79C12D37EC400DA6239 /* progress.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = progress.hpp; path = /Users/sarahwestcott/Desktop/mothur/source/progress.hpp; sourceTree = ""; }; A7E9B79D12D37EC400DA6239 /* qstat.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = qstat.cpp; path = source/calculators/qstat.cpp; sourceTree = ""; }; A7E9B79E12D37EC400DA6239 /* qstat.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = qstat.h; path = source/calculators/qstat.h; sourceTree = ""; }; A7E9B79F12D37EC400DA6239 /* qualityscores.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = qualityscores.cpp; path = source/datastructures/qualityscores.cpp; sourceTree = ""; }; A7E9B7A012D37EC400DA6239 /* qualityscores.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = qualityscores.h; path = source/datastructures/qualityscores.h; sourceTree = ""; }; A7E9B7A112D37EC400DA6239 /* quitcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = quitcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/quitcommand.cpp; sourceTree = ""; }; A7E9B7A212D37EC400DA6239 /* quitcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = quitcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/quitcommand.h; sourceTree = ""; }; A7E9B7A312D37EC400DA6239 /* rabundvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = rabundvector.cpp; path = source/datastructures/rabundvector.cpp; sourceTree = ""; }; A7E9B7A412D37EC400DA6239 /* 
rabundvector.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = rabundvector.hpp; path = source/datastructures/rabundvector.hpp; sourceTree = ""; }; A7E9B7A512D37EC400DA6239 /* rarecalc.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = rarecalc.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/rarecalc.cpp; sourceTree = ""; }; A7E9B7A612D37EC400DA6239 /* rarecalc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = rarecalc.h; path = /Users/sarahwestcott/Desktop/mothur/source/rarecalc.h; sourceTree = ""; }; A7E9B7A712D37EC400DA6239 /* raredisplay.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = raredisplay.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/raredisplay.cpp; sourceTree = ""; }; A7E9B7A812D37EC400DA6239 /* raredisplay.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = raredisplay.h; path = /Users/sarahwestcott/Desktop/mothur/source/raredisplay.h; sourceTree = ""; }; A7E9B7A912D37EC400DA6239 /* rarefact.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = rarefact.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/rarefact.cpp; sourceTree = ""; }; A7E9B7AA12D37EC400DA6239 /* rarefact.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = rarefact.h; path = /Users/sarahwestcott/Desktop/mothur/source/rarefact.h; sourceTree = ""; }; A7E9B7AB12D37EC400DA6239 /* rarefactcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = rarefactcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/rarefactcommand.cpp; sourceTree = ""; }; A7E9B7AC12D37EC400DA6239 /* rarefactcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = 
rarefactcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/rarefactcommand.h; sourceTree = ""; }; A7E9B7AD12D37EC400DA6239 /* rarefactioncurvedata.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = rarefactioncurvedata.h; path = /Users/sarahwestcott/Desktop/mothur/source/rarefactioncurvedata.h; sourceTree = ""; }; A7E9B7AE12D37EC400DA6239 /* rarefactsharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = rarefactsharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/rarefactsharedcommand.cpp; sourceTree = ""; }; A7E9B7AF12D37EC400DA6239 /* rarefactsharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = rarefactsharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/rarefactsharedcommand.h; sourceTree = ""; }; A7E9B7B012D37EC400DA6239 /* readblast.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = readblast.cpp; path = source/read/readblast.cpp; sourceTree = ""; }; A7E9B7B112D37EC400DA6239 /* readblast.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = readblast.h; path = source/read/readblast.h; sourceTree = ""; }; A7E9B7B212D37EC400DA6239 /* readcluster.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = readcluster.cpp; path = source/read/readcluster.cpp; sourceTree = ""; }; A7E9B7B312D37EC400DA6239 /* readcluster.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = readcluster.h; path = source/read/readcluster.h; sourceTree = ""; }; A7E9B7B412D37EC400DA6239 /* readcolumn.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = readcolumn.cpp; path = source/read/readcolumn.cpp; sourceTree = ""; }; A7E9B7B512D37EC400DA6239 /* readcolumn.h */ 
= {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = readcolumn.h; path = source/read/readcolumn.h; sourceTree = ""; }; A7E9B7B812D37EC400DA6239 /* readmatrix.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = readmatrix.hpp; path = source/read/readmatrix.hpp; sourceTree = ""; }; A7E9B7BD12D37EC400DA6239 /* readphylip.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = readphylip.cpp; path = source/read/readphylip.cpp; sourceTree = ""; }; A7E9B7BE12D37EC400DA6239 /* readphylip.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = readphylip.h; path = source/read/readphylip.h; sourceTree = ""; }; A7E9B7BF12D37EC400DA6239 /* readtree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = readtree.cpp; path = source/read/readtree.cpp; sourceTree = ""; }; A7E9B7C012D37EC400DA6239 /* readtree.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = readtree.h; path = source/read/readtree.h; sourceTree = ""; }; A7E9B7C312D37EC400DA6239 /* removegroupscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removegroupscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removegroupscommand.cpp; sourceTree = ""; }; A7E9B7C412D37EC400DA6239 /* removegroupscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removegroupscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removegroupscommand.h; sourceTree = ""; }; A7E9B7C512D37EC400DA6239 /* removelineagecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removelineagecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removelineagecommand.cpp; sourceTree = ""; }; 
A7E9B7C612D37EC400DA6239 /* removelineagecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removelineagecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removelineagecommand.h; sourceTree = ""; }; A7E9B7C712D37EC400DA6239 /* removeotuscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removeotuscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removeotuscommand.cpp; sourceTree = ""; }; A7E9B7C812D37EC400DA6239 /* removeotuscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removeotuscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removeotuscommand.h; sourceTree = ""; }; A7E9B7C912D37EC400DA6239 /* removeseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = removeseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removeseqscommand.cpp; sourceTree = ""; }; A7E9B7CA12D37EC400DA6239 /* removeseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = removeseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/removeseqscommand.h; sourceTree = ""; }; A7E9B7CB12D37EC400DA6239 /* reportfile.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = reportfile.cpp; path = source/datastructures/reportfile.cpp; sourceTree = ""; }; A7E9B7CC12D37EC400DA6239 /* reportfile.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = reportfile.h; path = source/datastructures/reportfile.h; sourceTree = ""; }; A7E9B7CD12D37EC400DA6239 /* reversecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = reversecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/reversecommand.cpp; 
sourceTree = ""; }; A7E9B7CE12D37EC400DA6239 /* reversecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = reversecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/reversecommand.h; sourceTree = ""; }; A7E9B7CF12D37EC400DA6239 /* sabundvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sabundvector.cpp; path = source/datastructures/sabundvector.cpp; sourceTree = ""; }; A7E9B7D012D37EC400DA6239 /* sabundvector.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = sabundvector.hpp; path = source/datastructures/sabundvector.hpp; sourceTree = ""; }; A7E9B7D112D37EC400DA6239 /* screenseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = screenseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/screenseqscommand.cpp; sourceTree = ""; }; A7E9B7D212D37EC400DA6239 /* screenseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = screenseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/screenseqscommand.h; sourceTree = ""; }; A7E9B7D312D37EC400DA6239 /* secondarystructurecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = secondarystructurecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/secondarystructurecommand.cpp; sourceTree = ""; }; A7E9B7D412D37EC400DA6239 /* secondarystructurecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = secondarystructurecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/secondarystructurecommand.h; sourceTree = ""; }; A7E9B7D512D37EC400DA6239 /* sensspeccommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sensspeccommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/sensspeccommand.cpp; sourceTree = ""; }; A7E9B7D612D37EC400DA6239 /* sensspeccommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sensspeccommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sensspeccommand.h; sourceTree = ""; }; A7E9B7D712D37EC400DA6239 /* seqerrorcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = seqerrorcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/seqerrorcommand.cpp; sourceTree = ""; }; A7E9B7D812D37EC400DA6239 /* seqerrorcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = seqerrorcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/seqerrorcommand.h; sourceTree = ""; }; A7E9B7D912D37EC400DA6239 /* seqsummarycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = seqsummarycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/seqsummarycommand.cpp; sourceTree = ""; }; A7E9B7DA12D37EC400DA6239 /* seqsummarycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = seqsummarycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/seqsummarycommand.h; sourceTree = ""; }; A7E9B7DB12D37EC400DA6239 /* sequence.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sequence.cpp; path = source/datastructures/sequence.cpp; sourceTree = ""; }; A7E9B7DC12D37EC400DA6239 /* sequence.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = sequence.hpp; path = source/datastructures/sequence.hpp; sourceTree = ""; }; A7E9B7DD12D37EC400DA6239 /* sequencedb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sequencedb.cpp; path = 
source/datastructures/sequencedb.cpp; sourceTree = ""; }; A7E9B7DE12D37EC400DA6239 /* sequencedb.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sequencedb.h; path = source/datastructures/sequencedb.h; sourceTree = ""; }; A7E9B7DF12D37EC400DA6239 /* setdircommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = setdircommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setdircommand.cpp; sourceTree = ""; }; A7E9B7E012D37EC400DA6239 /* setdircommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = setdircommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setdircommand.h; sourceTree = ""; }; A7E9B7E112D37EC400DA6239 /* setlogfilecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = setlogfilecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setlogfilecommand.cpp; sourceTree = ""; }; A7E9B7E212D37EC400DA6239 /* setlogfilecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = setlogfilecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setlogfilecommand.h; sourceTree = ""; }; A7E9B7E312D37EC400DA6239 /* sffinfocommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sffinfocommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sffinfocommand.cpp; sourceTree = ""; }; A7E9B7E412D37EC400DA6239 /* sffinfocommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sffinfocommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sffinfocommand.h; sourceTree = ""; }; A7E9B7E512D37EC400DA6239 /* shannon.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = shannon.cpp; path = 
source/calculators/shannon.cpp; sourceTree = ""; }; A7E9B7E612D37EC400DA6239 /* shannon.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = shannon.h; path = source/calculators/shannon.h; sourceTree = ""; }; A7E9B7E712D37EC400DA6239 /* shannoneven.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = shannoneven.cpp; path = source/calculators/shannoneven.cpp; sourceTree = ""; }; A7E9B7E812D37EC400DA6239 /* shannoneven.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = shannoneven.h; path = source/calculators/shannoneven.h; sourceTree = ""; }; A7E9B7E912D37EC400DA6239 /* sharedace.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedace.cpp; path = source/calculators/sharedace.cpp; sourceTree = ""; }; A7E9B7EA12D37EC400DA6239 /* sharedace.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedace.h; path = source/calculators/sharedace.h; sourceTree = ""; }; A7E9B7EC12D37EC400DA6239 /* sharedanderbergs.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedanderbergs.cpp; path = source/calculators/sharedanderbergs.cpp; sourceTree = ""; }; A7E9B7ED12D37EC400DA6239 /* sharedanderbergs.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedanderbergs.h; path = source/calculators/sharedanderbergs.h; sourceTree = ""; }; A7E9B7EE12D37EC400DA6239 /* sharedbraycurtis.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedbraycurtis.cpp; path = source/calculators/sharedbraycurtis.cpp; sourceTree = ""; }; A7E9B7EF12D37EC400DA6239 /* sharedbraycurtis.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedbraycurtis.h; path = source/calculators/sharedbraycurtis.h; sourceTree 
= ""; }; A7E9B7F012D37EC400DA6239 /* sharedchao1.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedchao1.cpp; path = source/calculators/sharedchao1.cpp; sourceTree = ""; }; A7E9B7F112D37EC400DA6239 /* sharedchao1.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedchao1.h; path = source/calculators/sharedchao1.h; sourceTree = ""; }; A7E9B7F212D37EC400DA6239 /* sharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sharedcommand.cpp; sourceTree = ""; }; A7E9B7F312D37EC400DA6239 /* sharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/sharedcommand.h; sourceTree = ""; }; A7E9B7F412D37EC400DA6239 /* sharedjabund.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedjabund.cpp; path = source/calculators/sharedjabund.cpp; sourceTree = ""; }; A7E9B7F512D37EC400DA6239 /* sharedjabund.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedjabund.h; path = source/calculators/sharedjabund.h; sourceTree = ""; }; A7E9B7F612D37EC400DA6239 /* sharedjackknife.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedjackknife.cpp; path = source/calculators/sharedjackknife.cpp; sourceTree = ""; }; A7E9B7F712D37EC400DA6239 /* sharedjackknife.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedjackknife.h; path = source/calculators/sharedjackknife.h; sourceTree = ""; }; A7E9B7F812D37EC400DA6239 /* sharedjclass.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedjclass.cpp; path = 
source/calculators/sharedjclass.cpp; sourceTree = ""; }; A7E9B7F912D37EC400DA6239 /* sharedjclass.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedjclass.h; path = source/calculators/sharedjclass.h; sourceTree = ""; }; A7E9B7FA12D37EC400DA6239 /* sharedjest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedjest.cpp; path = source/calculators/sharedjest.cpp; sourceTree = ""; }; A7E9B7FB12D37EC400DA6239 /* sharedjest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedjest.h; path = source/calculators/sharedjest.h; sourceTree = ""; }; A7E9B7FC12D37EC400DA6239 /* sharedkstest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedkstest.cpp; path = source/calculators/sharedkstest.cpp; sourceTree = ""; }; A7E9B7FD12D37EC400DA6239 /* sharedkstest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedkstest.h; path = source/calculators/sharedkstest.h; sourceTree = ""; }; A7E9B7FE12D37EC400DA6239 /* sharedkulczynski.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedkulczynski.cpp; path = source/calculators/sharedkulczynski.cpp; sourceTree = ""; }; A7E9B7FF12D37EC400DA6239 /* sharedkulczynski.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedkulczynski.h; path = source/calculators/sharedkulczynski.h; sourceTree = ""; }; A7E9B80012D37EC400DA6239 /* sharedkulczynskicody.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedkulczynskicody.cpp; path = source/calculators/sharedkulczynskicody.cpp; sourceTree = ""; }; A7E9B80112D37EC400DA6239 /* sharedkulczynskicody.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedkulczynskicody.h; path 
= source/calculators/sharedkulczynskicody.h; sourceTree = ""; }; A7E9B80212D37EC400DA6239 /* sharedlennon.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedlennon.cpp; path = source/calculators/sharedlennon.cpp; sourceTree = ""; }; A7E9B80312D37EC400DA6239 /* sharedlennon.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedlennon.h; path = source/calculators/sharedlennon.h; sourceTree = ""; }; A7E9B80412D37EC400DA6239 /* sharedlistvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedlistvector.cpp; path = source/datastructures/sharedlistvector.cpp; sourceTree = ""; }; A7E9B80512D37EC400DA6239 /* sharedlistvector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedlistvector.h; path = source/datastructures/sharedlistvector.h; sourceTree = ""; }; A7E9B80612D37EC400DA6239 /* sharedmarczewski.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedmarczewski.cpp; path = source/calculators/sharedmarczewski.cpp; sourceTree = ""; }; A7E9B80712D37EC400DA6239 /* sharedmarczewski.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedmarczewski.h; path = source/calculators/sharedmarczewski.h; sourceTree = ""; }; A7E9B80812D37EC400DA6239 /* sharedmorisitahorn.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedmorisitahorn.cpp; path = source/calculators/sharedmorisitahorn.cpp; sourceTree = ""; }; A7E9B80912D37EC400DA6239 /* sharedmorisitahorn.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedmorisitahorn.h; path = source/calculators/sharedmorisitahorn.h; sourceTree = ""; }; A7E9B80A12D37EC400DA6239 /* sharednseqs.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType 
= sourcecode.c.h; name = sharednseqs.h; path = source/calculators/sharednseqs.h; sourceTree = ""; }; A7E9B80B12D37EC400DA6239 /* sharedochiai.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedochiai.cpp; path = source/calculators/sharedochiai.cpp; sourceTree = ""; }; A7E9B80C12D37EC400DA6239 /* sharedochiai.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedochiai.h; path = source/calculators/sharedochiai.h; sourceTree = ""; }; A7E9B80D12D37EC400DA6239 /* sharedordervector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedordervector.cpp; path = source/datastructures/sharedordervector.cpp; sourceTree = ""; }; A7E9B80E12D37EC400DA6239 /* sharedordervector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedordervector.h; path = source/datastructures/sharedordervector.h; sourceTree = ""; }; A7E9B80F12D37EC400DA6239 /* sharedrabundfloatvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedrabundfloatvector.cpp; path = source/datastructures/sharedrabundfloatvector.cpp; sourceTree = ""; }; A7E9B81012D37EC400DA6239 /* sharedrabundfloatvector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedrabundfloatvector.h; path = source/datastructures/sharedrabundfloatvector.h; sourceTree = ""; }; A7E9B81112D37EC400DA6239 /* sharedrabundvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedrabundvector.cpp; path = source/datastructures/sharedrabundvector.cpp; sourceTree = ""; }; A7E9B81212D37EC400DA6239 /* sharedrabundvector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedrabundvector.h; path = source/datastructures/sharedrabundvector.h; sourceTree = ""; }; 
A7E9B81312D37EC400DA6239 /* sharedsabundvector.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedsabundvector.cpp; path = source/datastructures/sharedsabundvector.cpp; sourceTree = ""; }; A7E9B81412D37EC400DA6239 /* sharedsabundvector.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedsabundvector.h; path = source/datastructures/sharedsabundvector.h; sourceTree = ""; }; A7E9B81512D37EC400DA6239 /* sharedsobs.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedsobs.cpp; path = source/calculators/sharedsobs.cpp; sourceTree = ""; }; A7E9B81612D37EC400DA6239 /* sharedsobs.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedsobs.h; path = source/calculators/sharedsobs.h; sourceTree = ""; }; A7E9B81712D37EC400DA6239 /* sharedsobscollectsummary.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedsobscollectsummary.cpp; path = source/calculators/sharedsobscollectsummary.cpp; sourceTree = ""; }; A7E9B81812D37EC400DA6239 /* sharedsobscollectsummary.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedsobscollectsummary.h; path = source/calculators/sharedsobscollectsummary.h; sourceTree = ""; }; A7E9B81912D37EC400DA6239 /* sharedsorabund.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedsorabund.cpp; path = source/calculators/sharedsorabund.cpp; sourceTree = ""; }; A7E9B81A12D37EC400DA6239 /* sharedsorabund.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedsorabund.h; path = source/calculators/sharedsorabund.h; sourceTree = ""; }; A7E9B81B12D37EC400DA6239 /* sharedsorclass.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = 
sharedsorclass.cpp; path = source/calculators/sharedsorclass.cpp; sourceTree = ""; }; A7E9B81C12D37EC400DA6239 /* sharedsorclass.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedsorclass.h; path = source/calculators/sharedsorclass.h; sourceTree = ""; }; A7E9B81D12D37EC400DA6239 /* sharedsorest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedsorest.cpp; path = source/calculators/sharedsorest.cpp; sourceTree = ""; }; A7E9B81E12D37EC400DA6239 /* sharedsorest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedsorest.h; path = source/calculators/sharedsorest.h; sourceTree = ""; }; A7E9B81F12D37EC400DA6239 /* sharedthetan.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedthetan.cpp; path = source/calculators/sharedthetan.cpp; sourceTree = ""; }; A7E9B82012D37EC400DA6239 /* sharedthetan.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedthetan.h; path = source/calculators/sharedthetan.h; sourceTree = ""; }; A7E9B82112D37EC400DA6239 /* sharedthetayc.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedthetayc.cpp; path = source/calculators/sharedthetayc.cpp; sourceTree = ""; }; A7E9B82212D37EC400DA6239 /* sharedthetayc.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sharedthetayc.h; path = source/calculators/sharedthetayc.h; sourceTree = ""; }; A7E9B82312D37EC400DA6239 /* sharedutilities.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sharedutilities.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/sharedutilities.cpp; sourceTree = ""; }; A7E9B82412D37EC400DA6239 /* sharedutilities.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; 
name = sharedutilities.h; path = /Users/sarahwestcott/Desktop/mothur/source/sharedutilities.h; sourceTree = ""; }; A7E9B82512D37EC400DA6239 /* shen.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = shen.cpp; path = source/calculators/shen.cpp; sourceTree = ""; }; A7E9B82612D37EC400DA6239 /* shen.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = shen.h; path = source/calculators/shen.h; sourceTree = ""; }; A7E9B82712D37EC400DA6239 /* shhhercommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = shhhercommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/shhhercommand.cpp; sourceTree = ""; }; A7E9B82812D37EC400DA6239 /* shhhercommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = shhhercommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/shhhercommand.h; sourceTree = ""; }; A7E9B82912D37EC400DA6239 /* simpson.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = simpson.cpp; path = source/calculators/simpson.cpp; sourceTree = ""; }; A7E9B82A12D37EC400DA6239 /* simpson.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = simpson.h; path = source/calculators/simpson.h; sourceTree = ""; }; A7E9B82B12D37EC400DA6239 /* simpsoneven.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = simpsoneven.cpp; path = source/calculators/simpsoneven.cpp; sourceTree = ""; }; A7E9B82C12D37EC400DA6239 /* simpsoneven.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = simpsoneven.h; path = source/calculators/simpsoneven.h; sourceTree = ""; }; A7E9B82D12D37EC400DA6239 /* singlelinkage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = singlelinkage.cpp; 
path = /Users/sarahwestcott/Desktop/mothur/source/singlelinkage.cpp; sourceTree = ""; }; A7E9B82E12D37EC400DA6239 /* slayer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = slayer.cpp; path = source/chimera/slayer.cpp; sourceTree = ""; }; A7E9B82F12D37EC400DA6239 /* slayer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = slayer.h; path = source/chimera/slayer.h; sourceTree = ""; }; A7E9B83012D37EC400DA6239 /* slibshuff.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = slibshuff.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/slibshuff.cpp; sourceTree = ""; }; A7E9B83112D37EC400DA6239 /* slibshuff.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = slibshuff.h; path = /Users/sarahwestcott/Desktop/mothur/source/slibshuff.h; sourceTree = ""; }; A7E9B83212D37EC400DA6239 /* smithwilson.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = smithwilson.cpp; path = source/calculators/smithwilson.cpp; sourceTree = ""; }; A7E9B83312D37EC400DA6239 /* smithwilson.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = smithwilson.h; path = source/calculators/smithwilson.h; sourceTree = ""; }; A7E9B83412D37EC400DA6239 /* sobs.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sobs.h; path = source/calculators/sobs.h; sourceTree = ""; }; A7E9B83512D37EC400DA6239 /* soergel.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = soergel.cpp; path = source/calculators/soergel.cpp; sourceTree = ""; }; A7E9B83612D37EC400DA6239 /* soergel.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = soergel.h; path = source/calculators/soergel.h; sourceTree = ""; }; A7E9B83712D37EC400DA6239 /* 
solow.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = solow.cpp; path = source/calculators/solow.cpp; sourceTree = ""; }; A7E9B83812D37EC400DA6239 /* solow.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = solow.h; path = source/calculators/solow.h; sourceTree = ""; }; A7E9B83912D37EC400DA6239 /* sparsematrix.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sparsematrix.cpp; path = source/datastructures/sparsematrix.cpp; sourceTree = ""; }; A7E9B83A12D37EC400DA6239 /* sparsematrix.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = sparsematrix.hpp; path = source/datastructures/sparsematrix.hpp; sourceTree = ""; }; A7E9B83B12D37EC400DA6239 /* spearman.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = spearman.cpp; path = source/calculators/spearman.cpp; sourceTree = ""; }; A7E9B83C12D37EC400DA6239 /* spearman.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = spearman.h; path = source/calculators/spearman.h; sourceTree = ""; }; A7E9B83D12D37EC400DA6239 /* speciesprofile.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = speciesprofile.cpp; path = source/calculators/speciesprofile.cpp; sourceTree = ""; }; A7E9B83E12D37EC400DA6239 /* speciesprofile.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = speciesprofile.h; path = source/calculators/speciesprofile.h; sourceTree = ""; }; A7E9B83F12D37EC400DA6239 /* splitabundcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = splitabundcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/splitabundcommand.cpp; sourceTree = ""; }; A7E9B84012D37EC400DA6239 /* splitabundcommand.h */ = {isa = 
PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = splitabundcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/splitabundcommand.h; sourceTree = ""; }; A7E9B84112D37EC400DA6239 /* splitgroupscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = splitgroupscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/splitgroupscommand.cpp; sourceTree = ""; }; A7E9B84212D37EC400DA6239 /* splitgroupscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = splitgroupscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/splitgroupscommand.h; sourceTree = ""; }; A7E9B84312D37EC400DA6239 /* splitmatrix.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = splitmatrix.cpp; path = source/read/splitmatrix.cpp; sourceTree = ""; }; A7E9B84412D37EC400DA6239 /* splitmatrix.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = splitmatrix.h; path = source/read/splitmatrix.h; sourceTree = ""; }; A7E9B84512D37EC400DA6239 /* structchi2.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = structchi2.cpp; path = source/calculators/structchi2.cpp; sourceTree = ""; }; A7E9B84612D37EC400DA6239 /* structchi2.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = structchi2.h; path = source/calculators/structchi2.h; sourceTree = ""; }; A7E9B84712D37EC400DA6239 /* structchord.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = structchord.cpp; path = source/calculators/structchord.cpp; sourceTree = ""; }; A7E9B84812D37EC400DA6239 /* structchord.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = structchord.h; path = source/calculators/structchord.h; sourceTree = ""; }; 
A7E9B84912D37EC400DA6239 /* structeuclidean.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = structeuclidean.cpp; path = source/calculators/structeuclidean.cpp; sourceTree = ""; }; A7E9B84A12D37EC400DA6239 /* structeuclidean.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = structeuclidean.h; path = source/calculators/structeuclidean.h; sourceTree = ""; }; A7E9B84B12D37EC400DA6239 /* structkulczynski.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = structkulczynski.cpp; path = source/calculators/structkulczynski.cpp; sourceTree = ""; }; A7E9B84C12D37EC400DA6239 /* structkulczynski.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = structkulczynski.h; path = source/calculators/structkulczynski.h; sourceTree = ""; }; A7E9B84D12D37EC400DA6239 /* structpearson.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = structpearson.cpp; path = source/calculators/structpearson.cpp; sourceTree = ""; }; A7E9B84E12D37EC400DA6239 /* structpearson.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = structpearson.h; path = source/calculators/structpearson.h; sourceTree = ""; }; A7E9B84F12D37EC400DA6239 /* subsamplecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = subsamplecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/subsamplecommand.cpp; sourceTree = ""; }; A7E9B85012D37EC400DA6239 /* subsamplecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = subsamplecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/subsamplecommand.h; sourceTree = ""; }; A7E9B85112D37EC400DA6239 /* suffixdb.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = 
sourcecode.cpp.cpp; name = suffixdb.cpp; path = source/datastructures/suffixdb.cpp; sourceTree = ""; }; A7E9B85212D37EC400DA6239 /* suffixdb.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = suffixdb.hpp; path = source/datastructures/suffixdb.hpp; sourceTree = ""; }; A7E9B85312D37EC400DA6239 /* suffixnodes.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = suffixnodes.cpp; path = source/datastructures/suffixnodes.cpp; sourceTree = ""; }; A7E9B85412D37EC400DA6239 /* suffixnodes.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = suffixnodes.hpp; path = source/datastructures/suffixnodes.hpp; sourceTree = ""; }; A7E9B85512D37EC400DA6239 /* suffixtree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = suffixtree.cpp; path = source/datastructures/suffixtree.cpp; sourceTree = ""; }; A7E9B85612D37EC400DA6239 /* suffixtree.hpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.h; name = suffixtree.hpp; path = source/datastructures/suffixtree.hpp; sourceTree = ""; }; A7E9B85712D37EC400DA6239 /* summarycommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = summarycommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summarycommand.cpp; sourceTree = ""; }; A7E9B85812D37EC400DA6239 /* summarycommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = summarycommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summarycommand.h; sourceTree = ""; }; A7E9B85912D37EC400DA6239 /* summarysharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = summarysharedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summarysharedcommand.cpp; sourceTree = ""; }; 
A7E9B85A12D37EC400DA6239 /* summarysharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = summarysharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summarysharedcommand.h; sourceTree = ""; }; A7E9B85B12D37EC400DA6239 /* systemcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = systemcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/systemcommand.cpp; sourceTree = ""; }; A7E9B85C12D37EC400DA6239 /* systemcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = systemcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/systemcommand.h; sourceTree = ""; }; A7E9B85D12D37EC400DA6239 /* taxonomyequalizer.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = taxonomyequalizer.cpp; path = source/classifier/taxonomyequalizer.cpp; sourceTree = ""; }; A7E9B85E12D37EC400DA6239 /* taxonomyequalizer.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = taxonomyequalizer.h; path = source/classifier/taxonomyequalizer.h; sourceTree = ""; }; A7E9B85F12D37EC400DA6239 /* tree.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = tree.cpp; path = source/datastructures/tree.cpp; sourceTree = ""; }; A7E9B86012D37EC400DA6239 /* tree.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = tree.h; path = source/datastructures/tree.h; sourceTree = ""; }; A7E9B86112D37EC400DA6239 /* treecalculator.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = treecalculator.h; path = source/calculators/treecalculator.h; sourceTree = ""; }; A7E9B86212D37EC400DA6239 /* treegroupscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = 
treegroupscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/treegroupscommand.cpp; sourceTree = ""; }; A7E9B86312D37EC400DA6239 /* treegroupscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = treegroupscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/treegroupscommand.h; sourceTree = ""; }; A7E9B86412D37EC400DA6239 /* treemap.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = treemap.cpp; path = source/datastructures/treemap.cpp; sourceTree = ""; }; A7E9B86512D37EC400DA6239 /* treemap.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = treemap.h; path = source/datastructures/treemap.h; sourceTree = ""; }; A7E9B86612D37EC400DA6239 /* treenode.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = treenode.cpp; path = source/datastructures/treenode.cpp; sourceTree = ""; }; A7E9B86712D37EC400DA6239 /* treenode.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = treenode.h; path = source/datastructures/treenode.h; sourceTree = ""; }; A7E9B86812D37EC400DA6239 /* trimflowscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = trimflowscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/trimflowscommand.cpp; sourceTree = ""; }; A7E9B86912D37EC400DA6239 /* trimflowscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = trimflowscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/trimflowscommand.h; sourceTree = ""; }; A7E9B86A12D37EC400DA6239 /* trimseqscommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = trimseqscommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/trimseqscommand.cpp; sourceTree = ""; }; 
A7E9B86B12D37EC400DA6239 /* trimseqscommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = trimseqscommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/trimseqscommand.h; sourceTree = ""; }; A7E9B86C12D37EC400DA6239 /* unifracunweightedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = unifracunweightedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/unifracunweightedcommand.cpp; sourceTree = ""; }; A7E9B86D12D37EC400DA6239 /* unifracunweightedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = unifracunweightedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/unifracunweightedcommand.h; sourceTree = ""; }; A7E9B86E12D37EC400DA6239 /* unifracweightedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = unifracweightedcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/unifracweightedcommand.cpp; sourceTree = ""; }; A7E9B86F12D37EC400DA6239 /* unifracweightedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = unifracweightedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/unifracweightedcommand.h; sourceTree = ""; }; A7E9B87012D37EC400DA6239 /* unweighted.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = unweighted.cpp; path = source/calculators/unweighted.cpp; sourceTree = ""; }; A7E9B87112D37EC400DA6239 /* unweighted.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = unweighted.h; path = source/calculators/unweighted.h; sourceTree = ""; }; A7E9B87212D37EC400DA6239 /* uvest.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = uvest.cpp; path = source/calculators/uvest.cpp; sourceTree 
= ""; }; A7E9B87312D37EC400DA6239 /* uvest.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = uvest.h; path = source/calculators/uvest.h; sourceTree = ""; }; A7E9B87412D37EC400DA6239 /* validcalculator.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = validcalculator.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/validcalculator.cpp; sourceTree = ""; }; A7E9B87512D37EC400DA6239 /* validcalculator.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = validcalculator.h; path = /Users/sarahwestcott/Desktop/mothur/source/validcalculator.h; sourceTree = ""; }; A7E9B87612D37EC400DA6239 /* validparameter.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = validparameter.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/validparameter.cpp; sourceTree = ""; }; A7E9B87712D37EC400DA6239 /* validparameter.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = validparameter.h; path = /Users/sarahwestcott/Desktop/mothur/source/validparameter.h; sourceTree = ""; }; A7E9B87812D37EC400DA6239 /* venn.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = venn.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/venn.cpp; sourceTree = ""; }; A7E9B87912D37EC400DA6239 /* venn.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = venn.h; path = /Users/sarahwestcott/Desktop/mothur/source/venn.h; sourceTree = ""; }; A7E9B87A12D37EC400DA6239 /* venncommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = venncommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/venncommand.cpp; sourceTree = ""; }; A7E9B87B12D37EC400DA6239 /* venncommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = 
sourcecode.c.h; name = venncommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/venncommand.h; sourceTree = ""; }; A7E9B87C12D37EC400DA6239 /* weighted.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = weighted.cpp; path = source/calculators/weighted.cpp; sourceTree = ""; }; A7E9B87D12D37EC400DA6239 /* weighted.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = weighted.h; path = source/calculators/weighted.h; sourceTree = ""; }; A7E9B87E12D37EC400DA6239 /* weightedlinkage.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = weightedlinkage.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/weightedlinkage.cpp; sourceTree = ""; }; A7E9B87F12D37EC400DA6239 /* whittaker.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = whittaker.cpp; path = source/calculators/whittaker.cpp; sourceTree = ""; }; A7E9B88012D37EC400DA6239 /* whittaker.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = whittaker.h; path = source/calculators/whittaker.h; sourceTree = ""; }; A7EEB0F414F29BFD00344B83 /* classifytreecommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = classifytreecommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifytreecommand.cpp; sourceTree = ""; }; A7EEB0F714F29C1B00344B83 /* classifytreecommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = classifytreecommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifytreecommand.h; sourceTree = ""; }; A7F24FC117EA365F0021DC9A /* classifyrfsharedcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = classifyrfsharedcommand.cpp; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/classifyrfsharedcommand.cpp; sourceTree = ""; }; A7F24FC217EA365F0021DC9A /* classifyrfsharedcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = classifyrfsharedcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/classifyrfsharedcommand.h; sourceTree = ""; }; A7F9F5CD141A5E500032F693 /* sequenceparser.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = sequenceparser.h; path = source/datastructures/sequenceparser.h; sourceTree = ""; }; A7F9F5CE141A5E500032F693 /* sequenceparser.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = sequenceparser.cpp; path = source/datastructures/sequenceparser.cpp; sourceTree = ""; }; A7FA10001302E096003860FE /* mantelcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = mantelcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mantelcommand.h; sourceTree = ""; }; A7FA10011302E096003860FE /* mantelcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = mantelcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/mantelcommand.cpp; sourceTree = ""; }; A7FC480C12D788F20055BC5C /* linearalgebra.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = linearalgebra.h; path = /Users/sarahwestcott/Desktop/mothur/source/linearalgebra.h; sourceTree = ""; }; A7FC480D12D788F20055BC5C /* linearalgebra.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = linearalgebra.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/linearalgebra.cpp; sourceTree = ""; }; A7FC486512D795D60055BC5C /* pcacommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = pcacommand.h; path = 
/Users/sarahwestcott/Desktop/mothur/source/commands/pcacommand.h; sourceTree = ""; }; A7FC486612D795D60055BC5C /* pcacommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = pcacommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/pcacommand.cpp; sourceTree = ""; }; A7FE7C3E1330EA1000F7B327 /* getcurrentcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = getcurrentcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getcurrentcommand.h; sourceTree = ""; }; A7FE7C3F1330EA1000F7B327 /* getcurrentcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = getcurrentcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/getcurrentcommand.cpp; sourceTree = ""; }; A7FE7E6B13311EA400F7B327 /* setcurrentcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = setcurrentcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setcurrentcommand.h; sourceTree = ""; }; A7FE7E6C13311EA400F7B327 /* setcurrentcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = setcurrentcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/setcurrentcommand.cpp; sourceTree = ""; }; A7FF19F0140FFDA500AD216D /* trimoligos.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; name = trimoligos.h; path = /Users/sarahwestcott/Desktop/mothur/source/trimoligos.h; sourceTree = ""; }; A7FF19F1140FFDA500AD216D /* trimoligos.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = trimoligos.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/trimoligos.cpp; sourceTree = ""; }; A7FFB556142CA02C004884F2 /* summarytaxcommand.h */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.c.h; 
name = summarytaxcommand.h; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summarytaxcommand.h; sourceTree = ""; }; A7FFB557142CA02C004884F2 /* summarytaxcommand.cpp */ = {isa = PBXFileReference; fileEncoding = 4; lastKnownFileType = sourcecode.cpp.cpp; name = summarytaxcommand.cpp; path = /Users/sarahwestcott/Desktop/mothur/source/commands/summarytaxcommand.cpp; sourceTree = ""; }; /* End PBXFileReference section */ /* Begin PBXFrameworksBuildPhase section */ 481FB5161AC0A63E0076CFF3 /* Frameworks */ = { isa = PBXFrameworksBuildPhase; buildActionMask = 2147483647; files = ( ); runOnlyForDeploymentPostprocessing = 0; }; 8DD76FAD0486AB0100D96B5E /* Frameworks */ = { isa = PBXFrameworksBuildPhase; buildActionMask = 2147483647; files = ( ); runOnlyForDeploymentPostprocessing = 0; }; /* End PBXFrameworksBuildPhase section */ /* Begin PBXGroup section */ 08FB7794FE84155DC02AAC07 /* mothur */ = { isa = PBXGroup; children = ( 08FB7795FE84155DC02AAC07 /* Source */, 481FB51A1AC0A63E0076CFF3 /* TestMothur */, 1AB674ADFE9D54B511CA2CBB /* Products */, ); name = mothur; sourceTree = ""; }; 08FB7795FE84155DC02AAC07 /* Source */ = { isa = PBXGroup; children = ( A7A61F1A130035C800E05B6B /* LICENSE.md */, A70332B512D3A13400761E33 /* makefile */, 481623E31B58267D004C60B7 /* INSTALL.md */, A7E9B65912D37EC300DA6239 /* averagelinkage.cpp */, A77B718A173D40E4002163C2 /* calcsparcc.h */, A77B7189173D40E4002163C2 /* calcsparcc.cpp */, A7E9BA4F12D398D700DA6239 /* clearcut */, A7E9B69812D37EC400DA6239 /* cluster.cpp */, A7E9B69912D37EC400DA6239 /* cluster.hpp */, A7E9B69A12D37EC400DA6239 /* clusterclassic.cpp */, A7E9B69B12D37EC400DA6239 /* clusterclassic.h */, A7E9BA3F12D395F700DA6239 /* calculators */, A7E9BA4512D3965600DA6239 /* chimera */, A7E9BA4B12D3966900DA6239 /* classifier */, A7E9B6A612D37EC400DA6239 /* collect.cpp */, A7E9B6A712D37EC400DA6239 /* collect.h */, A7E9B6AA12D37EC400DA6239 /* collectdisplay.h */, A7E9B6AB12D37EC400DA6239 /* collectorscurvedata.h */, 
48F98E4C1A9CFD670005E81B /* completelinkage.cpp */, A7E9BA3812D3956100DA6239 /* commands */, A7E9B6B012D37EC400DA6239 /* commandfactory.hpp */, A7E9B6AF12D37EC400DA6239 /* commandfactory.cpp */, A7E9B6B112D37EC400DA6239 /* commandoptionparser.cpp */, A7E9B6B212D37EC400DA6239 /* commandoptionparser.hpp */, A7DAAFA3133A254E003956EB /* commandparameter.h */, A7D395C1184FA34300A350D7 /* communitytype */, A7E9BA4212D3960D00DA6239 /* containers */, A7E9B6B612D37EC400DA6239 /* consensus.h */, A7E9B6B512D37EC400DA6239 /* consensus.cpp */, A7AACFBA132FE008003D6C4D /* currentfile.h */, A7E9B6C912D37EC400DA6239 /* display.h */, A7E9B6D112D37EC400DA6239 /* dlibshuff.cpp */, A7E9B6D212D37EC400DA6239 /* dlibshuff.h */, A7E9B6D912D37EC400DA6239 /* endiannessmacros.h */, A7E9B6DA12D37EC400DA6239 /* engine.cpp */, A7E9B6DB12D37EC400DA6239 /* engine.hpp */, A7E9B6E012D37EC400DA6239 /* fileoutput.cpp */, A7E9B6E112D37EC400DA6239 /* fileoutput.h */, A7E9B71112D37EC400DA6239 /* gotohoverlap.hpp */, A7E9B71012D37EC400DA6239 /* gotohoverlap.cpp */, A7E9B71812D37EC400DA6239 /* hcluster.cpp */, A7E9B71912D37EC400DA6239 /* hcluster.h */, A7E9B71C12D37EC400DA6239 /* heatmap.cpp */, A7E9B71D12D37EC400DA6239 /* heatmap.h */, A7E9B72012D37EC400DA6239 /* heatmapsim.cpp */, A7E9B72112D37EC400DA6239 /* heatmapsim.h */, A7E9B72E12D37EC400DA6239 /* inputdata.h */, A7E9B72D12D37EC400DA6239 /* inputdata.cpp */, A7E9B73912D37EC400DA6239 /* libshuff.cpp */, A7E9B73A12D37EC400DA6239 /* libshuff.h */, A7FC480C12D788F20055BC5C /* linearalgebra.h */, A7FC480D12D788F20055BC5C /* linearalgebra.cpp */, A7D9378B17B15215001E90B0 /* wilcox.h */, A7D9378917B146B5001E90B0 /* wilcox.cpp */, A7E9BA5612D39BD800DA6239 /* metastats */, A7E9B75B12D37EC400DA6239 /* mothur.cpp */, A7E9B75C12D37EC400DA6239 /* mothur.h */, A7E9B75D12D37EC400DA6239 /* mothurout.cpp */, A7E9B75E12D37EC400DA6239 /* mothurout.h */, A774104714696F320098E6AC /* myseqdist.h */, A774104614696F320098E6AC /* myseqdist.cpp */, A7E9B76112D37EC400DA6239 
/* nast.cpp */, A7E9B76212D37EC400DA6239 /* nast.hpp */, A7E9B76312D37EC400DA6239 /* nastreport.cpp */, A7E9B76412D37EC400DA6239 /* nastreport.hpp */, A7E9B76712D37EC400DA6239 /* noalign.cpp */, A7E9B76812D37EC400DA6239 /* noalign.hpp */, A7E9B76512D37EC400DA6239 /* needlemanoverlap.cpp */, A7E9B76612D37EC400DA6239 /* needlemanoverlap.hpp */, A7E9B77012D37EC400DA6239 /* observable.h */, A7E9B77512D37EC400DA6239 /* optionparser.cpp */, A7E9B77612D37EC400DA6239 /* optionparser.h */, A7E9B77B12D37EC400DA6239 /* overlap.cpp */, A7E9B77C12D37EC400DA6239 /* overlap.hpp */, A7E9B79B12D37EC400DA6239 /* progress.cpp */, A7E9B79C12D37EC400DA6239 /* progress.hpp */, A77B7187173D4041002163C2 /* randomnumber.h */, A77B7186173D4041002163C2 /* randomnumber.cpp */, A7E9B7A512D37EC400DA6239 /* rarecalc.cpp */, A7386C191619C9FB00651424 /* randomforest */, A7E9B7A612D37EC400DA6239 /* rarecalc.h */, A7E9B7A712D37EC400DA6239 /* raredisplay.cpp */, A7E9B7A812D37EC400DA6239 /* raredisplay.h */, A7E9B7A912D37EC400DA6239 /* rarefact.cpp */, A7E9B7AA12D37EC400DA6239 /* rarefact.h */, A7E9B7AD12D37EC400DA6239 /* rarefactioncurvedata.h */, 7E6BE10812F710D8007ADDBE /* refchimeratest.h */, 7E6BE10912F710D8007ADDBE /* refchimeratest.cpp */, A77410F514697C300098E6AC /* seqnoise.h */, A77410F414697C300098E6AC /* seqnoise.cpp */, A7E9BA5312D39A5E00DA6239 /* read */, A7E9B82312D37EC400DA6239 /* sharedutilities.cpp */, A7E9B82412D37EC400DA6239 /* sharedutilities.h */, A7E9B82D12D37EC400DA6239 /* singlelinkage.cpp */, A7E9B83012D37EC400DA6239 /* slibshuff.cpp */, A7E9B83112D37EC400DA6239 /* slibshuff.h */, A7876A28152A018B00A0AE86 /* subsample.h */, A7876A25152A017C00A0AE86 /* subsample.cpp */, 7B17437A17AF6F02004C161B /* svm */, A7C3DC0E14FE469500FE1924 /* trialswap2.h */, A7C3DC0D14FE469500FE1924 /* trialSwap2.cpp */, A7FF19F0140FFDA500AD216D /* trimoligos.h */, A7FF19F1140FFDA500AD216D /* trimoligos.cpp */, A7E9B87412D37EC400DA6239 /* validcalculator.cpp */, A7E9B87512D37EC400DA6239 /* 
validcalculator.h */, A7E9B87612D37EC400DA6239 /* validparameter.cpp */, A7E9B87712D37EC400DA6239 /* validparameter.h */, A7E9B87812D37EC400DA6239 /* venn.cpp */, A7E9B87912D37EC400DA6239 /* venn.h */, A7E9B87E12D37EC400DA6239 /* weightedlinkage.cpp */, ); name = Source; sourceTree = ""; }; 1AB674ADFE9D54B511CA2CBB /* Products */ = { isa = PBXGroup; children = ( 8DD76FB20486AB0100D96B5E /* mothur */, 481FB5191AC0A63E0076CFF3 /* TestMothur */, ); name = Products; sourceTree = ""; }; 481FB51A1AC0A63E0076CFF3 /* TestMothur */ = { isa = PBXGroup; children = ( 481FB5201AC0A6B60076CFF3 /* catch.hpp */, 481FB51B1AC0A63E0076CFF3 /* main.cpp */, 481FB5221AC0AA010076CFF3 /* testcontainers */, 481FB5211AC0A9B40076CFF3 /* testcommands */, ); path = TestMothur; sourceTree = ""; }; 481FB5211AC0A9B40076CFF3 /* testcommands */ = { isa = PBXGroup; children = ( 481FB52D1AC1B0CB0076CFF3 /* testsetseedcommand.cpp */, ); name = testcommands; sourceTree = ""; }; 481FB5221AC0AA010076CFF3 /* testcontainers */ = { isa = PBXGroup; children = ( 481FB5231AC0AA430076CFF3 /* testsequence.cpp */, ); name = testcontainers; sourceTree = ""; }; 7B17437A17AF6F02004C161B /* svm */ = { isa = PBXGroup; children = ( 7B21820117AD77BD00286E6A /* svm.cpp */, 7B21820217AD77BD00286E6A /* svm.hpp */, ); name = svm; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7386C191619C9FB00651424 /* randomforest */ = { isa = PBXGroup; children = ( A7386C1B1619CACB00651424 /* abstractdecisiontree.hpp */, A7386C241619E52200651424 /* abstractdecisiontree.cpp */, 48705ABA19BE32C50075E977 /* abstractrandomforest.hpp */, 48705AC319BE32C50075E977 /* abstractrandomforest.cpp */, A7386C1D1619CACB00651424 /* decisiontree.hpp */, A7386C28161A110700651424 /* decisiontree.cpp */, A7386C1E1619CACB00651424 /* macros.h */, A7386C1F1619CACB00651424 /* randomforest.hpp */, A77E1937161B201E00DB1A2A /* randomforest.cpp */, A7386C201619CACB00651424 /* rftreenode.hpp */, A77E193A161B289600DB1A2A /* rftreenode.cpp */, 
83F25B0A163B031200ABE73D /* forest.cpp */, 83F25B0B163B031200ABE73D /* forest.h */, 834D9D561656D7C400E7FAB9 /* regularizedrandomforest.cpp */, 834D9D571656D7C400E7FAB9 /* regularizedrandomforest.h */, ); name = randomforest; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7D395C1184FA34300A350D7 /* communitytype */ = { isa = PBXGroup; children = ( A7132EAE184E76EB00AAA402 /* communitytype.h */, A7132EB2184E792700AAA402 /* communitytype.cpp */, A7D395C2184FA39300A350D7 /* kmeans.h */, A7D395C3184FA3A200A350D7 /* kmeans.cpp */, A7B093BE18579EF600843CD1 /* pam.h */, A7B093BF18579F0400843CD1 /* pam.cpp */, A7548FAF171440ED00B1F05A /* qFinderDMM.h */, A7548FAE171440EC00B1F05A /* qFinderDMM.cpp */, ); name = communitytype; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA3812D3956100DA6239 /* commands */ = { isa = PBXGroup; children = ( A7E9B6AE12D37EC400DA6239 /* command.hpp */, 219C1DE11552C508004209F9 /* newcommandtemplate.h */, 219C1DDF1552C4BD004209F9 /* newcommandtemplate.cpp */, A7E9B65212D37EC300DA6239 /* aligncommand.h */, A7E9B65112D37EC300DA6239 /* aligncommand.cpp */, A7A61F2B130062E000E05B6B /* amovacommand.h */, A7A61F2C130062E000E05B6B /* amovacommand.cpp */, A71CB15F130B04A2001E7287 /* anosimcommand.h */, A71CB15E130B04A2001E7287 /* anosimcommand.cpp */, A7E9B66112D37EC300DA6239 /* binsequencecommand.h */, A7E9B66012D37EC300DA6239 /* binsequencecommand.cpp */, A7E9B67312D37EC400DA6239 /* catchallcommand.h */, A7E9B67212D37EC400DA6239 /* catchallcommand.cpp */, A7E9B67B12D37EC400DA6239 /* chimerabellerophoncommand.h */, A7E9B67A12D37EC400DA6239 /* chimerabellerophoncommand.cpp */, A7E9B67D12D37EC400DA6239 /* chimeraccodecommand.h */, A7E9B67C12D37EC400DA6239 /* chimeraccodecommand.cpp */, A7E9B67F12D37EC400DA6239 /* chimeracheckcommand.h */, A7E9B67E12D37EC400DA6239 /* chimeracheckcommand.cpp */, A7E9B68312D37EC400DA6239 /* chimerapintailcommand.h */, A7E9B68212D37EC400DA6239 /* chimerapintailcommand.cpp */, 
A7BF2230145879B2000AD524 /* chimeraperseuscommand.h */, A7BF2231145879B2000AD524 /* chimeraperseuscommand.cpp */, A7E9B68B12D37EC400DA6239 /* chimeraslayercommand.h */, A7E9B68A12D37EC400DA6239 /* chimeraslayercommand.cpp */, A74D36B6137DAFAA00332B0C /* chimerauchimecommand.h */, A74D36B7137DAFAA00332B0C /* chimerauchimecommand.cpp */, A7E9B68D12D37EC400DA6239 /* chopseqscommand.h */, A7E9B68C12D37EC400DA6239 /* chopseqscommand.cpp */, A7E9B69112D37EC400DA6239 /* classifyotucommand.h */, A7E9B69012D37EC400DA6239 /* classifyotucommand.cpp */, A7E9B69312D37EC400DA6239 /* classifyseqscommand.h */, A7E9B69212D37EC400DA6239 /* classifyseqscommand.cpp */, A7F24FC217EA365F0021DC9A /* classifyrfsharedcommand.h */, A7F24FC117EA365F0021DC9A /* classifyrfsharedcommand.cpp */, 7B2181FF17AD777B00286E6A /* classifysvmsharedcommand.h */, 7B2181FE17AD777B00286E6A /* classifysvmsharedcommand.cpp */, A7EEB0F714F29C1B00344B83 /* classifytreecommand.h */, A7EEB0F414F29BFD00344B83 /* classifytreecommand.cpp */, A7E9B69712D37EC400DA6239 /* clearcutcommand.h */, A7E9B69612D37EC400DA6239 /* clearcutcommand.cpp */, A73DDBB813C4A0D1006AAE38 /* clearmemorycommand.h */, A73DDBB913C4A0D1006AAE38 /* clearmemorycommand.cpp */, A7E9B69D12D37EC400DA6239 /* clustercommand.h */, A7E9B69C12D37EC400DA6239 /* clustercommand.cpp */, A7E9B69F12D37EC400DA6239 /* clusterdoturcommand.h */, A7E9B69E12D37EC400DA6239 /* clusterdoturcommand.cpp */, A7E9B6A112D37EC400DA6239 /* clusterfragmentscommand.h */, A7E9B6A012D37EC400DA6239 /* clusterfragmentscommand.cpp */, A7E9B6A312D37EC400DA6239 /* clustersplitcommand.h */, A7E9B6A212D37EC400DA6239 /* clustersplitcommand.cpp */, A7E9B6A912D37EC400DA6239 /* collectcommand.h */, A7E9B6A812D37EC400DA6239 /* collectcommand.cpp */, A7E9B6AD12D37EC400DA6239 /* collectsharedcommand.h */, A7E9B6AC12D37EC400DA6239 /* collectsharedcommand.cpp */, A7E9B6B812D37EC400DA6239 /* consensusseqscommand.h */, A7E9B6B712D37EC400DA6239 /* consensusseqscommand.cpp */, 
A7C3DC0A14FE457500FE1924 /* cooccurrencecommand.h */, A7C3DC0914FE457500FE1924 /* cooccurrencecommand.cpp */, A7E9B6BA12D37EC400DA6239 /* corraxescommand.h */, A7E9B6B912D37EC400DA6239 /* corraxescommand.cpp */, A795840B13F13CD900F201D5 /* countgroupscommand.h */, A795840C13F13CD900F201D5 /* countgroupscommand.cpp */, A7730EFD13967241007433A3 /* countseqscommand.h */, A7730EFE13967241007433A3 /* countseqscommand.cpp */, A77EBD2C1523707F00ED407C /* createdatabasecommand.h */, A77EBD2E1523709100ED407C /* createdatabasecommand.cpp */, A7E9B6C412D37EC400DA6239 /* deconvolutecommand.h */, A7E9B6C312D37EC400DA6239 /* deconvolutecommand.cpp */, A7E9B6C612D37EC400DA6239 /* degapseqscommand.h */, A7E9B6C512D37EC400DA6239 /* degapseqscommand.cpp */, A7E9B6C812D37EC400DA6239 /* deuniqueseqscommand.h */, A7E9B6C712D37EC400DA6239 /* deuniqueseqscommand.cpp */, A77A221D139001B600B0BE70 /* deuniquetreecommand.h */, A77A221E139001B600B0BE70 /* deuniquetreecommand.cpp */, A7E9B6CC12D37EC400DA6239 /* distancecommand.h */, A7E9B6CB12D37EC400DA6239 /* distancecommand.cpp */, A7E9B6E412D37EC400DA6239 /* filterseqscommand.h */, A7E9B6E312D37EC400DA6239 /* filterseqscommand.cpp */, A79EEF8816971D640006DEC1 /* filtersharedcommand.h */, A79EEF8516971D4A0006DEC1 /* filtersharedcommand.cpp */, A778FE69134CA6CA00C0BA33 /* getcommandinfocommand.h */, A778FE6A134CA6CA00C0BA33 /* getcommandinfocommand.cpp */, 219C1DE51559BCF2004209F9 /* getcoremicrobiomecommand.h */, 219C1DE31559BCCD004209F9 /* getcoremicrobiomecommand.cpp */, A7FE7C3E1330EA1000F7B327 /* getcurrentcommand.h */, A7FE7C3F1330EA1000F7B327 /* getcurrentcommand.cpp */, A7128B1A16B7001200723BE4 /* getdistscommand.h */, A7128B1C16B7002600723BE4 /* getdistscommand.cpp */, A7E9B6F312D37EC400DA6239 /* getgroupcommand.h */, A7E9B6F212D37EC400DA6239 /* getgroupcommand.cpp */, A7E9B6F512D37EC400DA6239 /* getgroupscommand.h */, A7E9B6F412D37EC400DA6239 /* getgroupscommand.cpp */, A7E9B6F712D37EC400DA6239 /* getlabelcommand.h */, 
A7E9B6F612D37EC400DA6239 /* getlabelcommand.cpp */, A7548FAB17142EA500B1F05A /* getmetacommunitycommand.h */, A7548FAC17142EBC00B1F05A /* getmetacommunitycommand.cpp */, 48705ABB19BE32C50075E977 /* getmimarkspackagecommand.cpp */, 48705ABC19BE32C50075E977 /* getmimarkspackagecommand.h */, A7E9B6F912D37EC400DA6239 /* getlineagecommand.h */, A7E9B6F812D37EC400DA6239 /* getlineagecommand.cpp */, A7E9B6FB12D37EC400DA6239 /* getlistcountcommand.h */, A7E9B6FA12D37EC400DA6239 /* getlistcountcommand.cpp */, A7E9B6FF12D37EC400DA6239 /* getoturepcommand.h */, A7E9B6FE12D37EC400DA6239 /* getoturepcommand.cpp */, A7E9B70112D37EC400DA6239 /* getotuscommand.h */, A7E9B70012D37EC400DA6239 /* getotuscommand.cpp */, A70056E8156A93E300924A2D /* getotulabelscommand.h */, A70056E5156A93D000924A2D /* getotulabelscommand.cpp */, A7E9B70312D37EC400DA6239 /* getrabundcommand.h */, A7E9B70212D37EC400DA6239 /* getrabundcommand.cpp */, A7E9B70512D37EC400DA6239 /* getrelabundcommand.h */, A7E9B70412D37EC400DA6239 /* getrelabundcommand.cpp */, A7E9B70712D37EC400DA6239 /* getsabundcommand.h */, A7E9B70612D37EC400DA6239 /* getsabundcommand.cpp */, A7E9B70912D37EC400DA6239 /* getseqscommand.h */, A7E9B70812D37EC400DA6239 /* getseqscommand.cpp */, A7E9B70B12D37EC400DA6239 /* getsharedotucommand.h */, A7E9B70A12D37EC400DA6239 /* getsharedotucommand.cpp */, A7E9B71B12D37EC400DA6239 /* hclustercommand.h */, A7E9B71A12D37EC400DA6239 /* hclustercommand.cpp */, A7E9B71F12D37EC400DA6239 /* heatmapcommand.h */, A7E9B71E12D37EC400DA6239 /* heatmapcommand.cpp */, A7E9B72312D37EC400DA6239 /* heatmapsimcommand.h */, A7E9B72212D37EC400DA6239 /* heatmapsimcommand.cpp */, A7E9B72912D37EC400DA6239 /* helpcommand.h */, A7E9B72812D37EC400DA6239 /* helpcommand.cpp */, A75790571301749D00A30DAB /* homovacommand.h */, A75790581301749D00A30DAB /* homovacommand.cpp */, A7E9B72C12D37EC400DA6239 /* indicatorcommand.h */, A7E9B72B12D37EC400DA6239 /* indicatorcommand.cpp */, A7496D2D167B531B00CC7D7C /* 
kruskalwalliscommand.h */, A7496D2C167B531B00CC7D7C /* kruskalwalliscommand.cpp */, A7190B211768E0DF00A9AFA6 /* lefsecommand.h */, A7190B201768E0DF00A9AFA6 /* lefsecommand.cpp */, A7E9B73C12D37EC400DA6239 /* libshuffcommand.h */, A7E9B73B12D37EC400DA6239 /* libshuffcommand.cpp */, A7A0671C156294810095C8C5 /* listotulabelscommand.h */, A7A067191562946F0095C8C5 /* listotulabelscommand.cpp */, A7E9B73E12D37EC400DA6239 /* listseqscommand.h */, A7E9B73D12D37EC400DA6239 /* listseqscommand.cpp */, A73901051588C3EF00ED2ED6 /* loadlogfilecommand.h */, A73901071588C40900ED2ED6 /* loadlogfilecommand.cpp */, A7FA10001302E096003860FE /* mantelcommand.h */, A7FA10011302E096003860FE /* mantelcommand.cpp */, A724D2B4153C8600000A826F /* makebiomcommand.h */, A724D2B6153C8628000A826F /* makebiomcommand.cpp */, A7A0671D1562AC230095C8C5 /* makecontigscommand.h */, A7A0671E1562AC3E0095C8C5 /* makecontigscommand.cpp */, A799F5B71309A3E000AEEFA0 /* makefastqcommand.h */, A799F5B81309A3E000AEEFA0 /* makefastqcommand.cpp */, 48DB37B21B3B27E000C372A4 /* makefilecommand.h */, 48DB37B11B3B27E000C372A4 /* makefilecommand.cpp */, A7E9B74412D37EC400DA6239 /* makegroupcommand.h */, A7E9B74312D37EC400DA6239 /* makegroupcommand.cpp */, A741744B175CD9B1007DF49B /* makelefsecommand.h */, A741744A175CD9B1007DF49B /* makelefsecommand.cpp */, A7E6F69C17427CF2006775E2 /* makelookupcommand.h */, A7E6F69D17427D06006775E2 /* makelookupcommand.cpp */, A7E9B74A12D37EC400DA6239 /* matrixoutputcommand.h */, A7E9B74912D37EC400DA6239 /* matrixoutputcommand.cpp */, 48705ABF19BE32C50075E977 /* mergesfffilecommand.cpp */, 48705AC019BE32C50075E977 /* mergesfffilecommand.h */, A7E9B75412D37EC400DA6239 /* mergefilecommand.h */, A7E9B75312D37EC400DA6239 /* mergefilecommand.cpp */, A71FE12A12EDF72400963CA7 /* mergegroupscommand.h */, A71FE12B12EDF72400963CA7 /* mergegroupscommand.cpp */, A799314816CBD0BC0017E888 /* mergetaxsummarycommand.h */, A799314A16CBD0CD0017E888 /* mergetaxsummarycommand.cpp */, 
A7E9B75812D37EC400DA6239 /* metastatscommand.h */, A7E9B75712D37EC400DA6239 /* metastatscommand.cpp */, A7E9B75A12D37EC400DA6239 /* mgclustercommand.h */, A7E9B75912D37EC400DA6239 /* mgclustercommand.cpp */, 487C5A851AB88B93002AF48A /* mimarksattributescommand.cpp */, 487C5A861AB88B93002AF48A /* mimarksattributescommand.h */, A7E9B76A12D37EC400DA6239 /* nocommands.h */, A7E9B76912D37EC400DA6239 /* nocommands.cpp */, A7E9B76C12D37EC400DA6239 /* normalizesharedcommand.h */, A7E9B76B12D37EC400DA6239 /* normalizesharedcommand.cpp */, A713EBEB12DC7C5E000092AC /* nmdscommand.h */, A713EBEC12DC7C5E000092AC /* nmdscommand.cpp */, A7A3C8C814D041AD00B1BFBE /* otuassociationcommand.h */, A7A3C8C714D041AD00B1BFBE /* otuassociationcommand.cpp */, A7E9B77A12D37EC400DA6239 /* otuhierarchycommand.h */, A7E9B77912D37EC400DA6239 /* otuhierarchycommand.cpp */, A7E9B77E12D37EC400DA6239 /* pairwiseseqscommand.h */, A7E9B77D12D37EC400DA6239 /* pairwiseseqscommand.cpp */, A7E9B78012D37EC400DA6239 /* parsefastaqcommand.h */, A7E9B77F12D37EC400DA6239 /* parsefastaqcommand.cpp */, A7E9B78212D37EC400DA6239 /* parselistscommand.h */, A7E9B78112D37EC400DA6239 /* parselistscommand.cpp */, A7E9B78612D37EC400DA6239 /* parsimonycommand.h */, A7E9B78512D37EC400DA6239 /* parsimonycommand.cpp */, A7FC486512D795D60055BC5C /* pcacommand.h */, A7FC486612D795D60055BC5C /* pcacommand.cpp */, A7E9B78812D37EC400DA6239 /* pcoacommand.h */, A7E9B78712D37EC400DA6239 /* pcoacommand.cpp */, A76CDD7F1510F09A004C8458 /* pcrseqscommand.h */, 481623E11B56A2DB004C60B7 /* pcrseqscommand.cpp */, A7E9B78C12D37EC400DA6239 /* phylodiversitycommand.h */, A7E9B78B12D37EC400DA6239 /* phylodiversitycommand.cpp */, A7E9B79212D37EC400DA6239 /* phylotypecommand.h */, A7E9B79112D37EC400DA6239 /* phylotypecommand.cpp */, A7E9B79612D37EC400DA6239 /* pipelinepdscommand.h */, A7E9B79512D37EC400DA6239 /* pipelinepdscommand.cpp */, A7E9B79812D37EC400DA6239 /* preclustercommand.h */, A7E9B79712D37EC400DA6239 /* preclustercommand.cpp */, 
A74C06E616A9C097008390A3 /* primerdesigncommand.h */, A74C06E816A9C0A8008390A3 /* primerdesigncommand.cpp */, A7E9B7A212D37EC400DA6239 /* quitcommand.h */, A7E9B7A112D37EC400DA6239 /* quitcommand.cpp */, A7E9B7AC12D37EC400DA6239 /* rarefactcommand.h */, A7E9B7AB12D37EC400DA6239 /* rarefactcommand.cpp */, A7E9B7AF12D37EC400DA6239 /* rarefactsharedcommand.h */, A7E9B7AE12D37EC400DA6239 /* rarefactsharedcommand.cpp */, A7B0231716B8245D006BA09E /* removedistscommand.h */, A7B0231416B8244B006BA09E /* removedistscommand.cpp */, A7E9B7C412D37EC400DA6239 /* removegroupscommand.h */, A7E9B7C312D37EC400DA6239 /* removegroupscommand.cpp */, A7E9B7C612D37EC400DA6239 /* removelineagecommand.h */, A7E9B7C512D37EC400DA6239 /* removelineagecommand.cpp */, A7E9B7C812D37EC400DA6239 /* removeotuscommand.h */, A7E9B7C712D37EC400DA6239 /* removeotuscommand.cpp */, A70056E9156AB6D400924A2D /* removeotulabelscommand.h */, A70056EA156AB6E500924A2D /* removeotulabelscommand.cpp */, A727864212E9E28C00F86ABA /* removerarecommand.h */, A727864312E9E28C00F86ABA /* removerarecommand.cpp */, A7E9B7CA12D37EC400DA6239 /* removeseqscommand.h */, A7E9B7C912D37EC400DA6239 /* removeseqscommand.cpp */, A7CFA42F1755400500D9ED4D /* renameseqscommand.h */, A7CFA4301755401800D9ED4D /* renameseqscommand.cpp */, A7E9B7CE12D37EC400DA6239 /* reversecommand.h */, A7E9B7CD12D37EC400DA6239 /* reversecommand.cpp */, A7E9B7D212D37EC400DA6239 /* screenseqscommand.h */, A7E9B7D112D37EC400DA6239 /* screenseqscommand.cpp */, A7E9B7D412D37EC400DA6239 /* secondarystructurecommand.h */, A7E9B7D312D37EC400DA6239 /* secondarystructurecommand.cpp */, A7E9B7D612D37EC400DA6239 /* sensspeccommand.h */, A7E9B7D512D37EC400DA6239 /* sensspeccommand.cpp */, A7E9B7D812D37EC400DA6239 /* seqerrorcommand.h */, A7E9B7D712D37EC400DA6239 /* seqerrorcommand.cpp */, A7E9B7DA12D37EC400DA6239 /* seqsummarycommand.h */, A7E9B7D912D37EC400DA6239 /* seqsummarycommand.cpp */, A7FE7E6B13311EA400F7B327 /* setcurrentcommand.h */, 
A7FE7E6C13311EA400F7B327 /* setcurrentcommand.cpp */, A7E9B7E012D37EC400DA6239 /* setdircommand.h */, A7E9B7DF12D37EC400DA6239 /* setdircommand.cpp */, A7E9B7E212D37EC400DA6239 /* setlogfilecommand.h */, A7E9B7E112D37EC400DA6239 /* setlogfilecommand.cpp */, 481FB5291AC19F8B0076CFF3 /* setseedcommand.h */, 481FB5281AC19F8B0076CFF3 /* setseedcommand.cpp */, A7E9B7E412D37EC400DA6239 /* sffinfocommand.h */, A7E9B7E312D37EC400DA6239 /* sffinfocommand.cpp */, A7C7DAB615DA75760059B0CF /* sffmultiplecommand.h */, A7C7DAB815DA758B0059B0CF /* sffmultiplecommand.cpp */, A7E9B7F312D37EC400DA6239 /* sharedcommand.h */, A7E9B7F212D37EC400DA6239 /* sharedcommand.cpp */, A7E9B82812D37EC400DA6239 /* shhhercommand.h */, A7E9B82712D37EC400DA6239 /* shhhercommand.cpp */, A774101214695AF60098E6AC /* shhhseqscommand.h */, A774101314695AF60098E6AC /* shhhseqscommand.cpp */, A7A32DAC14DC43D10001D2E5 /* sortseqscommand.h */, A7A32DA914DC43B00001D2E5 /* sortseqscommand.cpp */, A77B7183173D222F002163C2 /* sparcccommand.h */, A77B7184173D2240002163C2 /* sparcccommand.cpp */, A7E9B84012D37EC400DA6239 /* splitabundcommand.h */, A7E9B83F12D37EC400DA6239 /* splitabundcommand.cpp */, A7E9B84212D37EC400DA6239 /* splitgroupscommand.h */, A7E9B84112D37EC400DA6239 /* splitgroupscommand.cpp */, A747EC6F181EA0E500345732 /* sracommand.h */, A747EC70181EA0F900345732 /* sracommand.cpp */, A7E9B85012D37EC400DA6239 /* subsamplecommand.h */, A7E9B84F12D37EC400DA6239 /* subsamplecommand.cpp */, A7E9B85812D37EC400DA6239 /* summarycommand.h */, A7E9B85712D37EC400DA6239 /* summarycommand.cpp */, A754149514840CF7005850D1 /* summaryqualcommand.h */, A754149614840CF7005850D1 /* summaryqualcommand.cpp */, A7E9B85A12D37EC400DA6239 /* summarysharedcommand.h */, A7E9B85912D37EC400DA6239 /* summarysharedcommand.cpp */, A7FFB556142CA02C004884F2 /* summarytaxcommand.h */, A7FFB557142CA02C004884F2 /* summarytaxcommand.cpp */, A7E9B85C12D37EC400DA6239 /* systemcommand.h */, A7E9B85B12D37EC400DA6239 /* systemcommand.cpp */, 
A7E9B86312D37EC400DA6239 /* treegroupscommand.h */, A7E9B86212D37EC400DA6239 /* treegroupscommand.cpp */, A7E9B86912D37EC400DA6239 /* trimflowscommand.h */, A7E9B86812D37EC400DA6239 /* trimflowscommand.cpp */, A7E9B86B12D37EC400DA6239 /* trimseqscommand.h */, A7E9B86A12D37EC400DA6239 /* trimseqscommand.cpp */, A7E9B86D12D37EC400DA6239 /* unifracunweightedcommand.h */, A7E9B86C12D37EC400DA6239 /* unifracunweightedcommand.cpp */, A7E9B86F12D37EC400DA6239 /* unifracweightedcommand.h */, A7E9B86E12D37EC400DA6239 /* unifracweightedcommand.cpp */, A7E9B87B12D37EC400DA6239 /* venncommand.h */, A7E9B87A12D37EC400DA6239 /* venncommand.cpp */, ); name = commands; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA3F12D395F700DA6239 /* calculators */ = { isa = PBXGroup; children = ( 7E78911B135F3E8600E725D2 /* eachgapdistignorens.h */, A7E9B64F12D37EC300DA6239 /* ace.cpp */, A7E9B65012D37EC300DA6239 /* ace.h */, A7E9B65E12D37EC300DA6239 /* bergerparker.cpp */, A7E9B65F12D37EC300DA6239 /* bergerparker.h */, A7E9B66612D37EC400DA6239 /* boneh.cpp */, A7E9B66712D37EC400DA6239 /* boneh.h */, A7E9B66812D37EC400DA6239 /* bootstrap.cpp */, A7E9B66912D37EC400DA6239 /* bootstrap.h */, A7E9B66C12D37EC400DA6239 /* bstick.cpp */, A7E9B66D12D37EC400DA6239 /* bstick.h */, A7E9B66E12D37EC400DA6239 /* calculator.cpp */, A7E9B66F12D37EC400DA6239 /* calculator.h */, A7E9B67012D37EC400DA6239 /* canberra.cpp */, A7E9B67112D37EC400DA6239 /* canberra.h */, A7E9B67612D37EC400DA6239 /* chao1.cpp */, A7E9B67712D37EC400DA6239 /* chao1.h */, A7E9B6BB12D37EC400DA6239 /* coverage.cpp */, A7E9B6BC12D37EC400DA6239 /* coverage.h */, A7E9B6CA12D37EC400DA6239 /* dist.h */, A7E9B6C012D37EC400DA6239 /* dayhoff.h */, A7E9B6D512D37EC400DA6239 /* eachgapdist.h */, A7E9B6D612D37EC400DA6239 /* eachgapignore.h */, A7E9B6D712D37EC400DA6239 /* efron.cpp */, A7E9B6D812D37EC400DA6239 /* efron.h */, A7E9B6E212D37EC400DA6239 /* filters.h */, A7E9B6F012D37EC400DA6239 /* geom.cpp */, 
A7E9B6F112D37EC400DA6239 /* geom.h */, A7E9B70E12D37EC400DA6239 /* goodscoverage.cpp */, A7E9B70F12D37EC400DA6239 /* goodscoverage.h */, A7E9B71212D37EC400DA6239 /* gower.cpp */, A7E9B71312D37EC400DA6239 /* gower.h */, A7E9B71612D37EC400DA6239 /* hamming.cpp */, A7E9B71712D37EC400DA6239 /* hamming.h */, A7E9B72412D37EC400DA6239 /* heip.cpp */, A7E9B72512D37EC400DA6239 /* heip.h */, A7E9B72612D37EC400DA6239 /* hellinger.cpp */, A7E9B72712D37EC400DA6239 /* hellinger.h */, A7E9B72A12D37EC400DA6239 /* ignoregaps.h */, A7E9B72F12D37EC400DA6239 /* invsimpson.cpp */, A7E9B73012D37EC400DA6239 /* invsimpson.h */, A7E9B73112D37EC400DA6239 /* jackknife.cpp */, A7E9B73212D37EC400DA6239 /* jackknife.h */, A7E9B74112D37EC400DA6239 /* logsd.cpp */, A7E9B74212D37EC400DA6239 /* logsd.h */, A7E9B74712D37EC400DA6239 /* manhattan.cpp */, A7E9B74812D37EC400DA6239 /* manhattan.h */, A7E9B74B12D37EC400DA6239 /* memchi2.cpp */, A7E9B74C12D37EC400DA6239 /* memchi2.h */, A7E9B74D12D37EC400DA6239 /* memchord.cpp */, A7E9B74E12D37EC400DA6239 /* memchord.h */, A7E9B74F12D37EC400DA6239 /* memeuclidean.cpp */, A7E9B75012D37EC400DA6239 /* memeuclidean.h */, A7E9B75112D37EC400DA6239 /* mempearson.cpp */, A7E9B75212D37EC400DA6239 /* mempearson.h */, A7E9B76D12D37EC400DA6239 /* npshannon.cpp */, A7E9B76E12D37EC400DA6239 /* npshannon.h */, A7E9B76F12D37EC400DA6239 /* nseqs.h */, A7E9B77112D37EC400DA6239 /* odum.cpp */, A7E9B77212D37EC400DA6239 /* odum.h */, A7E9B77312D37EC400DA6239 /* onegapdist.h */, A7E9B77412D37EC400DA6239 /* onegapignore.h */, A7E9B78412D37EC400DA6239 /* parsimony.h */, A7E9B78312D37EC400DA6239 /* parsimony.cpp */, A7E9B79912D37EC400DA6239 /* prng.cpp */, A7E9B79A12D37EC400DA6239 /* prng.h */, A7E9B79D12D37EC400DA6239 /* qstat.cpp */, A7E9B79E12D37EC400DA6239 /* qstat.h */, A7E9B7E512D37EC400DA6239 /* shannon.cpp */, A7E9B7E612D37EC400DA6239 /* shannon.h */, A7E9B7E712D37EC400DA6239 /* shannoneven.cpp */, A7E9B7E812D37EC400DA6239 /* shannoneven.h */, A7A09B0E18773BF700FAA081 /* 
shannonrange.h */, A7A09B0F18773C0E00FAA081 /* shannonrange.cpp */, A7E9B7E912D37EC400DA6239 /* sharedace.cpp */, A7E9B7F412D37EC400DA6239 /* sharedjabund.cpp */, A7E9B7EA12D37EC400DA6239 /* sharedace.h */, A7E9B7EC12D37EC400DA6239 /* sharedanderbergs.cpp */, A7E9B7ED12D37EC400DA6239 /* sharedanderbergs.h */, A7E9B7EE12D37EC400DA6239 /* sharedbraycurtis.cpp */, A7E9B7EF12D37EC400DA6239 /* sharedbraycurtis.h */, A7E9B7F012D37EC400DA6239 /* sharedchao1.cpp */, A7E9B7F112D37EC400DA6239 /* sharedchao1.h */, A7E9B7F512D37EC400DA6239 /* sharedjabund.h */, A7E9B7F612D37EC400DA6239 /* sharedjackknife.cpp */, A7E9B7F712D37EC400DA6239 /* sharedjackknife.h */, A7E9B7F812D37EC400DA6239 /* sharedjclass.cpp */, A7E9B7F912D37EC400DA6239 /* sharedjclass.h */, A7E9B7FA12D37EC400DA6239 /* sharedjest.cpp */, A7E9B7FB12D37EC400DA6239 /* sharedjest.h */, A7222D711856276C0055A993 /* sharedjsd.h */, A7222D721856277C0055A993 /* sharedjsd.cpp */, A7E9B7FC12D37EC400DA6239 /* sharedkstest.cpp */, A7E9B7FD12D37EC400DA6239 /* sharedkstest.h */, A7E9B7FE12D37EC400DA6239 /* sharedkulczynski.cpp */, A7E9B7FF12D37EC400DA6239 /* sharedkulczynski.h */, A7E9B80012D37EC400DA6239 /* sharedkulczynskicody.cpp */, A7E9B80112D37EC400DA6239 /* sharedkulczynskicody.h */, A7E9B80212D37EC400DA6239 /* sharedlennon.cpp */, A7E9B80312D37EC400DA6239 /* sharedlennon.h */, A7E9B80612D37EC400DA6239 /* sharedmarczewski.cpp */, A7E9B80712D37EC400DA6239 /* sharedmarczewski.h */, A7E9B80812D37EC400DA6239 /* sharedmorisitahorn.cpp */, A7E9B80912D37EC400DA6239 /* sharedmorisitahorn.h */, A7E9B80A12D37EC400DA6239 /* sharednseqs.h */, A7E9B80B12D37EC400DA6239 /* sharedochiai.cpp */, A7E9B80C12D37EC400DA6239 /* sharedochiai.h */, 48705AC119BE32C50075E977 /* sharedrjsd.cpp */, 48705AC219BE32C50075E977 /* sharedrjsd.h */, A7E9B81512D37EC400DA6239 /* sharedsobs.cpp */, A7E9B81612D37EC400DA6239 /* sharedsobs.h */, A7E9B81712D37EC400DA6239 /* sharedsobscollectsummary.cpp */, A7E9B81812D37EC400DA6239 /* sharedsobscollectsummary.h 
*/, A7E9B81912D37EC400DA6239 /* sharedsorabund.cpp */, A7E9B81A12D37EC400DA6239 /* sharedsorabund.h */, A7E9B81B12D37EC400DA6239 /* sharedsorclass.cpp */, A7E9B81C12D37EC400DA6239 /* sharedsorclass.h */, A7E9B81D12D37EC400DA6239 /* sharedsorest.cpp */, A7E9B81E12D37EC400DA6239 /* sharedsorest.h */, A7E9B81F12D37EC400DA6239 /* sharedthetan.cpp */, A7E9B82012D37EC400DA6239 /* sharedthetan.h */, A7E9B82112D37EC400DA6239 /* sharedthetayc.cpp */, A7E9B82212D37EC400DA6239 /* sharedthetayc.h */, A7E9B82512D37EC400DA6239 /* shen.cpp */, A7E9B82612D37EC400DA6239 /* shen.h */, A7E9B82912D37EC400DA6239 /* simpson.cpp */, A7E9B82A12D37EC400DA6239 /* simpson.h */, A7E9B82B12D37EC400DA6239 /* simpsoneven.cpp */, A7E9B82C12D37EC400DA6239 /* simpsoneven.h */, A7E9B83212D37EC400DA6239 /* smithwilson.cpp */, A7E9B83312D37EC400DA6239 /* smithwilson.h */, A7E9B83412D37EC400DA6239 /* sobs.h */, A7E9B83512D37EC400DA6239 /* soergel.cpp */, A7E9B83612D37EC400DA6239 /* soergel.h */, A7E9B83712D37EC400DA6239 /* solow.cpp */, A7E9B83812D37EC400DA6239 /* solow.h */, A7E9B83B12D37EC400DA6239 /* spearman.cpp */, A7E9B83C12D37EC400DA6239 /* spearman.h */, A7E9B83D12D37EC400DA6239 /* speciesprofile.cpp */, A7E9B83E12D37EC400DA6239 /* speciesprofile.h */, A7E9B84512D37EC400DA6239 /* structchi2.cpp */, A7E9B84612D37EC400DA6239 /* structchi2.h */, A7E9B84712D37EC400DA6239 /* structchord.cpp */, A7E9B84812D37EC400DA6239 /* structchord.h */, A7E9B84912D37EC400DA6239 /* structeuclidean.cpp */, A7E9B84A12D37EC400DA6239 /* structeuclidean.h */, A7E9B84B12D37EC400DA6239 /* structkulczynski.cpp */, A7E9B84C12D37EC400DA6239 /* structkulczynski.h */, A7E9B84D12D37EC400DA6239 /* structpearson.cpp */, A7E9B84E12D37EC400DA6239 /* structpearson.h */, A7E9B86112D37EC400DA6239 /* treecalculator.h */, A7E9B87112D37EC400DA6239 /* unweighted.h */, A7E9B87012D37EC400DA6239 /* unweighted.cpp */, A7E9B87212D37EC400DA6239 /* uvest.cpp */, A7E9B87312D37EC400DA6239 /* uvest.h */, A7E9B87D12D37EC400DA6239 /* weighted.h */, 
A7E9B87C12D37EC400DA6239 /* weighted.cpp */, A7E9B87F12D37EC400DA6239 /* whittaker.cpp */, A7E9B88012D37EC400DA6239 /* whittaker.h */, ); name = calculators; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA4212D3960D00DA6239 /* containers */ = { isa = PBXGroup; children = ( A7E9B65312D37EC300DA6239 /* alignment.cpp */, A7E9B65412D37EC300DA6239 /* alignment.hpp */, A7E9B65512D37EC300DA6239 /* alignmentcell.cpp */, A7E9B65612D37EC300DA6239 /* alignmentcell.hpp */, A7E9B65712D37EC300DA6239 /* alignmentdb.cpp */, A7E9B65812D37EC300DA6239 /* alignmentdb.h */, A7E9B66212D37EC300DA6239 /* blastalign.cpp */, A7E9B66312D37EC400DA6239 /* blastalign.hpp */, A7E9B66412D37EC400DA6239 /* blastdb.cpp */, A7E9B66512D37EC400DA6239 /* blastdb.hpp */, 48844B261AA74AF9006EF2B8 /* compare.h */, A74D59A6159A1E3600043046 /* counttable.h */, A74D59A3159A1E2000043046 /* counttable.cpp */, A7E9B6BD12D37EC400DA6239 /* database.cpp */, A7E9B6BE12D37EC400DA6239 /* database.hpp */, A7E9B6BF12D37EC400DA6239 /* datavector.hpp */, A77916E7176F7F7600EEFE18 /* designmap.h */, A77916E6176F7F7600EEFE18 /* designmap.cpp */, A7E9B6CE12D37EC400DA6239 /* distancedb.hpp */, A7E9B6CD12D37EC400DA6239 /* distancedb.cpp */, A7E9B6DE12D37EC400DA6239 /* fastamap.cpp */, A7E9B6DF12D37EC400DA6239 /* fastamap.h */, 48C51DEE1A76B870004ECDF1 /* fastqread.h */, 48C51DEF1A76B888004ECDF1 /* fastqread.cpp */, A7E9B6E812D37EC400DA6239 /* flowdata.h */, A7E9B6E712D37EC400DA6239 /* flowdata.cpp */, A7E9B6EE12D37EC400DA6239 /* fullmatrix.cpp */, A7E9B6EF12D37EC400DA6239 /* fullmatrix.h */, A7E9B71412D37EC400DA6239 /* groupmap.cpp */, A7E9B71512D37EC400DA6239 /* groupmap.h */, A7E9B73312D37EC400DA6239 /* kmer.cpp */, A7E9B73412D37EC400DA6239 /* kmer.hpp */, 48C51DF21A793EFE004ECDF1 /* kmeralign.h */, 48C51DF11A793EFE004ECDF1 /* kmeralign.cpp */, A7E9B73512D37EC400DA6239 /* kmerdb.cpp */, A7E9B73612D37EC400DA6239 /* kmerdb.hpp */, A7E9B73F12D37EC400DA6239 /* listvector.cpp */, A7E9B74012D37EC400DA6239 /* 
listvector.hpp */, A7E9B75F12D37EC400DA6239 /* nameassignment.cpp */, A7E9B76012D37EC400DA6239 /* nameassignment.hpp */, 48705ABE19BE32C50075E977 /* oligos.h */, 48705ABD19BE32C50075E977 /* oligos.cpp */, A7E9B77712D37EC400DA6239 /* ordervector.cpp */, A7E9B77812D37EC400DA6239 /* ordervector.hpp */, A7E9B79F12D37EC400DA6239 /* qualityscores.cpp */, A7E9B7A012D37EC400DA6239 /* qualityscores.h */, A7E9B7A312D37EC400DA6239 /* rabundvector.cpp */, A7E9B7A412D37EC400DA6239 /* rabundvector.hpp */, A721765513BB9F7D0014DAAE /* referencedb.h */, A721765613BB9F7D0014DAAE /* referencedb.cpp */, A7E9B7CB12D37EC400DA6239 /* reportfile.cpp */, A7E9B7CC12D37EC400DA6239 /* reportfile.h */, A7E9B7CF12D37EC400DA6239 /* sabundvector.cpp */, A7E9B7D012D37EC400DA6239 /* sabundvector.hpp */, A7E9B7DB12D37EC400DA6239 /* sequence.cpp */, A7E9B7DC12D37EC400DA6239 /* sequence.hpp */, A741FAD415D168A00067BCC5 /* sequencecountparser.h */, A741FAD115D1688E0067BCC5 /* sequencecountparser.cpp */, A7E9B7DD12D37EC400DA6239 /* sequencedb.cpp */, A7E9B7DE12D37EC400DA6239 /* sequencedb.h */, A7F9F5CD141A5E500032F693 /* sequenceparser.h */, A7F9F5CE141A5E500032F693 /* sequenceparser.cpp */, A7E9B80412D37EC400DA6239 /* sharedlistvector.cpp */, A7E9B80512D37EC400DA6239 /* sharedlistvector.h */, A7E9B80E12D37EC400DA6239 /* sharedordervector.h */, A7E9B80D12D37EC400DA6239 /* sharedordervector.cpp */, A7E9B81012D37EC400DA6239 /* sharedrabundfloatvector.h */, A7E9B80F12D37EC400DA6239 /* sharedrabundfloatvector.cpp */, A7E9B81112D37EC400DA6239 /* sharedrabundvector.cpp */, A7E9B81212D37EC400DA6239 /* sharedrabundvector.h */, A7E9B81312D37EC400DA6239 /* sharedsabundvector.cpp */, A7E9B81412D37EC400DA6239 /* sharedsabundvector.h */, A7E9B83912D37EC400DA6239 /* sparsematrix.cpp */, A7E9B83A12D37EC400DA6239 /* sparsematrix.hpp */, A7E0243F15B4522000A5F046 /* sparsedistancematrix.h */, A7E0243C15B4520A00A5F046 /* sparsedistancematrix.cpp */, A7E9B85112D37EC400DA6239 /* suffixdb.cpp */, A7E9B85212D37EC400DA6239 /* 
suffixdb.hpp */, A7E9B85312D37EC400DA6239 /* suffixnodes.cpp */, A7E9B85412D37EC400DA6239 /* suffixnodes.hpp */, A7E9B85512D37EC400DA6239 /* suffixtree.cpp */, A7E9B85612D37EC400DA6239 /* suffixtree.hpp */, A7E9B85F12D37EC400DA6239 /* tree.cpp */, A7E9B86012D37EC400DA6239 /* tree.h */, A7E9B86412D37EC400DA6239 /* treemap.cpp */, A7E9B86512D37EC400DA6239 /* treemap.h */, A7E9B86612D37EC400DA6239 /* treenode.cpp */, A7E9B86712D37EC400DA6239 /* treenode.h */, ); name = containers; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA4512D3965600DA6239 /* chimera */ = { isa = PBXGroup; children = ( A7E9B65C12D37EC300DA6239 /* bellerophon.cpp */, A7E9B65D12D37EC300DA6239 /* bellerophon.h */, A7E9B67412D37EC400DA6239 /* ccode.cpp */, A7E9B67512D37EC400DA6239 /* ccode.h */, A7E9B67812D37EC400DA6239 /* chimera.cpp */, A7E9B67912D37EC400DA6239 /* chimera.h */, A7E9B68012D37EC400DA6239 /* chimeracheckrdp.cpp */, A7E9B68112D37EC400DA6239 /* chimeracheckrdp.h */, A7E9B68412D37EC400DA6239 /* chimerarealigner.cpp */, A7E9B68512D37EC400DA6239 /* chimerarealigner.h */, A7E9B6C212D37EC400DA6239 /* decalc.h */, A7E9B6C112D37EC400DA6239 /* decalc.cpp */, A7E9B68812D37EC400DA6239 /* chimeraslayer.cpp */, A7E9B68912D37EC400DA6239 /* chimeraslayer.h */, A7E9B74612D37EC400DA6239 /* maligner.h */, A7E9B74512D37EC400DA6239 /* maligner.cpp */, A7BF221314587886000AD524 /* myPerseus.h */, A7BF221214587886000AD524 /* myPerseus.cpp */, A7E9B79312D37EC400DA6239 /* pintail.cpp */, A7E9B79412D37EC400DA6239 /* pintail.h */, A7E9B82E12D37EC400DA6239 /* slayer.cpp */, A7E9B82F12D37EC400DA6239 /* slayer.h */, ); name = chimera; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA4B12D3966900DA6239 /* classifier */ = { isa = PBXGroup; children = ( A721AB67161C570F009860A1 /* alignnode.h */, A721AB66161C570F009860A1 /* alignnode.cpp */, A721AB69161C570F009860A1 /* aligntree.h */, A721AB68161C570F009860A1 /* aligntree.cpp */, A7E9B65B12D37EC300DA6239 /* bayesian.h */, 
A7E9B65A12D37EC300DA6239 /* bayesian.cpp */, A7E9B68E12D37EC400DA6239 /* classify.cpp */, A7E9B68F12D37EC400DA6239 /* classify.h */, A721AB6E161C572A009860A1 /* kmernode.h */, A721AB6D161C572A009860A1 /* kmernode.cpp */, A721AB70161C572A009860A1 /* kmertree.h */, A721AB6F161C572A009860A1 /* kmertree.cpp */, A7E9B73812D37EC400DA6239 /* knn.h */, A7E9B73712D37EC400DA6239 /* knn.cpp */, A7E9B78D12D37EC400DA6239 /* phylosummary.cpp */, A7E9B78E12D37EC400DA6239 /* phylosummary.h */, A7E9B78F12D37EC400DA6239 /* phylotree.cpp */, A7E9B79012D37EC400DA6239 /* phylotree.h */, A7E9B85D12D37EC400DA6239 /* taxonomyequalizer.cpp */, A7E9B85E12D37EC400DA6239 /* taxonomyequalizer.h */, A721AB74161C573B009860A1 /* taxonomynode.h */, A721AB73161C573B009860A1 /* taxonomynode.cpp */, ); name = classifier; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA4F12D398D700DA6239 /* clearcut */ = { isa = PBXGroup; children = ( A7E9B69412D37EC400DA6239 /* clearcut.cpp */, A7E9B69512D37EC400DA6239 /* clearcut.h */, A7E9B6A412D37EC400DA6239 /* cmdargs.cpp */, A7E9B6A512D37EC400DA6239 /* cmdargs.h */, A7E9B6B312D37EC400DA6239 /* common.h */, A7E9B6CF12D37EC400DA6239 /* distclearcut.cpp */, A7E9B6D012D37EC400DA6239 /* distclearcut.h */, A7E9B6D312D37EC400DA6239 /* dmat.cpp */, A7E9B6D412D37EC400DA6239 /* dmat.h */, A7E9B6DC12D37EC400DA6239 /* fasta.cpp */, A7E9B6DD12D37EC400DA6239 /* fasta.h */, A7E9B6FD12D37EC400DA6239 /* getopt_long.h */, A7E9B6FC12D37EC400DA6239 /* getopt_long.cpp */, ); name = clearcut; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA5312D39A5E00DA6239 /* read */ = { isa = PBXGroup; children = ( A7E9B6E912D37EC400DA6239 /* formatcolumn.cpp */, A7E9B6EA12D37EC400DA6239 /* formatcolumn.h */, A7E9B6EB12D37EC400DA6239 /* formatmatrix.h */, A7E9B6EC12D37EC400DA6239 /* formatphylip.cpp */, A7E9B6ED12D37EC400DA6239 /* formatphylip.h */, A7E9B7B012D37EC400DA6239 /* readblast.cpp */, A7E9B7B112D37EC400DA6239 /* readblast.h */, 
A7E9B7B212D37EC400DA6239 /* readcluster.cpp */, A7E9B7B312D37EC400DA6239 /* readcluster.h */, A7E9B7B412D37EC400DA6239 /* readcolumn.cpp */, A7E9B7B512D37EC400DA6239 /* readcolumn.h */, A7E9B7B812D37EC400DA6239 /* readmatrix.hpp */, A7E9B7BD12D37EC400DA6239 /* readphylip.cpp */, A7E9B7BE12D37EC400DA6239 /* readphylip.h */, A7E9B7BF12D37EC400DA6239 /* readtree.cpp */, A7E9B7C012D37EC400DA6239 /* readtree.h */, A713EBAA12DC7613000092AC /* readphylipvector.h */, A713EBAB12DC7613000092AC /* readphylipvector.cpp */, A7E9B84312D37EC400DA6239 /* splitmatrix.cpp */, A7E9B84412D37EC400DA6239 /* splitmatrix.h */, A7D755D71535F665009BF21A /* treereader.h */, A7D755D91535F679009BF21A /* treereader.cpp */, ); name = read; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; A7E9BA5612D39BD800DA6239 /* metastats */ = { isa = PBXGroup; children = ( A79234D513C74BF6002B08E2 /* mothurfisher.h */, A79234D613C74BF6002B08E2 /* mothurfisher.cpp */, A73DDC3613C4BF64006AAE38 /* mothurmetastats.h */, A73DDC3713C4BF64006AAE38 /* mothurmetastats.cpp */, ); name = metastats; path = /Users/sarahwestcott/Desktop/mothur; sourceTree = ""; }; /* End PBXGroup section */ /* Begin PBXNativeTarget section */ 481FB5181AC0A63E0076CFF3 /* TestMothur */ = { isa = PBXNativeTarget; buildConfigurationList = 481FB51F1AC0A63E0076CFF3 /* Build configuration list for PBXNativeTarget "TestMothur" */; buildPhases = ( 481FB5151AC0A63E0076CFF3 /* Sources */, 481FB5161AC0A63E0076CFF3 /* Frameworks */, 481FB5171AC0A63E0076CFF3 /* CopyFiles */, ); buildRules = ( 481FB6A11AC1BE060076CFF3 /* PBXBuildRule */, ); dependencies = ( ); name = TestMothur; productName = TestMothur; productReference = 481FB5191AC0A63E0076CFF3 /* TestMothur */; productType = "com.apple.product-type.tool"; }; 8DD76FA90486AB0100D96B5E /* Mothur */ = { isa = PBXNativeTarget; buildConfigurationList = 1DEB928508733DD80010E9CD /* Build configuration list for PBXNativeTarget "Mothur" */; buildPhases = ( 8DD76FAB0486AB0100D96B5E /* Sources 
*/, 8DD76FAD0486AB0100D96B5E /* Frameworks */, 8DD76FAF0486AB0100D96B5E /* CopyFiles */, ); buildRules = ( A7D162CB149F96CA000523E8 /* PBXBuildRule */, ); dependencies = ( ); name = Mothur; productInstallPath = "$(HOME)/bin"; productName = mothur; productReference = 8DD76FB20486AB0100D96B5E /* mothur */; productType = "com.apple.product-type.tool"; }; /* End PBXNativeTarget section */ /* Begin PBXProject section */ 08FB7793FE84155DC02AAC07 /* Project object */ = { isa = PBXProject; attributes = { LastUpgradeCheck = 0600; ORGANIZATIONNAME = "Schloss Lab"; TargetAttributes = { 481FB5181AC0A63E0076CFF3 = { CreatedOnToolsVersion = 6.2; }; }; }; buildConfigurationList = 1DEB928908733DD80010E9CD /* Build configuration list for PBXProject "Mothur" */; compatibilityVersion = "Xcode 3.2"; developmentRegion = English; hasScannedForEncodings = 1; knownRegions = ( English, Japanese, French, German, ); mainGroup = 08FB7794FE84155DC02AAC07 /* mothur */; projectDirPath = ""; projectRoot = ""; targets = ( 8DD76FA90486AB0100D96B5E /* Mothur */, 481FB5181AC0A63E0076CFF3 /* TestMothur */, ); }; /* End PBXProject section */ /* Begin PBXSourcesBuildPhase section */ 481FB5151AC0A63E0076CFF3 /* Sources */ = { isa = PBXSourcesBuildPhase; buildActionMask = 2147483647; files = ( 481FB5E51AC1B77E0076CFF3 /* nocommands.cpp in Sources */, 481FB5F61AC1B77E0076CFF3 /* quitcommand.cpp in Sources */, 481FB52C1AC1B0A70076CFF3 /* commandfactory.cpp in Sources */, 481FB5C71AC1B74F0076CFF3 /* getsabundcommand.cpp in Sources */, 481FB5A51AC1B7300076CFF3 /* clusterdoturcommand.cpp in Sources */, 481FB6271AC1B7EA0076CFF3 /* alignmentdb.cpp in Sources */, 481FB6351AC1B7EA0076CFF3 /* kmerdb.cpp in Sources */, 481FB5721AC1B6D40076CFF3 /* simpson.cpp in Sources */, 481FB5A31AC1B7300076CFF3 /* clearmemorycommand.cpp in Sources */, 481FB55D1AC1B6690076CFF3 /* sharedchao1.cpp in Sources */, 481FB5FE1AC1B7970076CFF3 /* removerarecommand.cpp in Sources */, 481FB53C1AC1B5F10076CFF3 /* bootstrap.cpp in Sources */, 
481FB5E21AC1B77E0076CFF3 /* metastatscommand.cpp in Sources */, 481FB5631AC1B6A10076CFF3 /* sharedkulczynski.cpp in Sources */, 481FB5EF1AC1B77E0076CFF3 /* pcoacommand.cpp in Sources */, 481FB64E1AC1B7F40076CFF3 /* treenode.cpp in Sources */, 481FB5801AC1B6EA0076CFF3 /* weighted.cpp in Sources */, 481FB54F1AC1B63A0076CFF3 /* memeuclidean.cpp in Sources */, 481FB5611AC1B69B0076CFF3 /* sharedjsd.cpp in Sources */, 481FB5AF1AC1B7300076CFF3 /* createdatabasecommand.cpp in Sources */, 481FB5731AC1B6EA0076CFF3 /* simpsoneven.cpp in Sources */, 481FB52F1AC1B5C20076CFF3 /* averagelinkage.cpp in Sources */, 481FB58D1AC1B7060076CFF3 /* collect.cpp in Sources */, 481FB5A01AC1B71B0076CFF3 /* classifysvmsharedcommand.cpp in Sources */, 481FB5741AC1B6EA0076CFF3 /* smithwilson.cpp in Sources */, 481FB6661AC1B8450076CFF3 /* progress.cpp in Sources */, 481FB6511AC1B8100076CFF3 /* engine.cpp in Sources */, 481FB5381AC1B5E30076CFF3 /* clusterclassic.cpp in Sources */, 481FB5EC1AC1B77E0076CFF3 /* parselistscommand.cpp in Sources */, 481FB5B61AC1B74F0076CFF3 /* filtersharedcommand.cpp in Sources */, 481FB5F81AC1B77E0076CFF3 /* rarefactsharedcommand.cpp in Sources */, 481FB62E1AC1B7EA0076CFF3 /* fastamap.cpp in Sources */, 481FB5C41AC1B74F0076CFF3 /* getotulabelscommand.cpp in Sources */, 481FB5A61AC1B7300076CFF3 /* clusterfragmentscommand.cpp in Sources */, 481FB5C01AC1B74F0076CFF3 /* getlineagecommand.cpp in Sources */, 481FB5F71AC1B77E0076CFF3 /* rarefactcommand.cpp in Sources */, 481FB61F1AC1B7AC0076CFF3 /* venncommand.cpp in Sources */, 481FB61A1AC1B7AC0076CFF3 /* treegroupscommand.cpp in Sources */, 481FB66B1AC1B8520076CFF3 /* decisiontree.cpp in Sources */, 481FB60A1AC1B7970076CFF3 /* sffinfocommand.cpp in Sources */, 481FB58C1AC1B6FF0076CFF3 /* slayer.cpp in Sources */, 481FB6531AC1B8100076CFF3 /* gotohoverlap.cpp in Sources */, 481FB66D1AC1B8520076CFF3 /* rftreenode.cpp in Sources */, 481FB5C31AC1B74F0076CFF3 /* getotuscommand.cpp in Sources */, 481FB65B1AC1B82C0076CFF3 /* 
mothurfisher.cpp in Sources */, 481FB6721AC1B8820076CFF3 /* refchimeratest.cpp in Sources */, 481FB6051AC1B7970076CFF3 /* seqerrorcommand.cpp in Sources */, 481FB6871AC1B8B80076CFF3 /* venn.cpp in Sources */, 481FB5D71AC1B75C0076CFF3 /* makebiomcommand.cpp in Sources */, 481FB6601AC1B8450076CFF3 /* nast.cpp in Sources */, 481FB5861AC1B6FF0076CFF3 /* chimerarealigner.cpp in Sources */, 481FB5A81AC1B7300076CFF3 /* collectcommand.cpp in Sources */, 481FB5961AC1B71B0076CFF3 /* chimeraccodecommand.cpp in Sources */, 481FB61B1AC1B7AC0076CFF3 /* trimflowscommand.cpp in Sources */, 481FB6781AC1B88F0076CFF3 /* readcolumn.cpp in Sources */, 481FB6291AC1B7EA0076CFF3 /* blastdb.cpp in Sources */, 481FB6831AC1B8B80076CFF3 /* trialSwap2.cpp in Sources */, 481FB63A1AC1B7EA0076CFF3 /* qualityscores.cpp in Sources */, 481FB5FD1AC1B7970076CFF3 /* removeotulabelscommand.cpp in Sources */, 481FB63F1AC1B7EA0076CFF3 /* sequencecountparser.cpp in Sources */, 481FB67E1AC1B8960076CFF3 /* sharedutilities.cpp in Sources */, 481FB67C1AC1B88F0076CFF3 /* splitmatrix.cpp in Sources */, 481FB59C1AC1B71B0076CFF3 /* chopseqscommand.cpp in Sources */, 481FB5DE1AC1B77E0076CFF3 /* mergesfffilecommand.cpp in Sources */, 481FB6421AC1B7EA0076CFF3 /* sharedlistvector.cpp in Sources */, 481FB5A71AC1B7300076CFF3 /* clustersplitcommand.cpp in Sources */, 481FB65C1AC1B82C0076CFF3 /* mothurmetastats.cpp in Sources */, 481FB5EB1AC1B77E0076CFF3 /* parsefastaqcommand.cpp in Sources */, 481FB6341AC1B7EA0076CFF3 /* kmeralign.cpp in Sources */, 481FB55B1AC1B6630076CFF3 /* sharedanderbergs.cpp in Sources */, 481FB5B81AC1B74F0076CFF3 /* getcoremicrobiomecommand.cpp in Sources */, 481FB54E1AC1B6340076CFF3 /* memchord.cpp in Sources */, 481FB6021AC1B7970076CFF3 /* screenseqscommand.cpp in Sources */, 481FB52E1AC1B0CB0076CFF3 /* testsetseedcommand.cpp in Sources */, 481FB65A1AC1B8100076CFF3 /* wilcox.cpp in Sources */, 481FB6251AC1B7EA0076CFF3 /* alignment.cpp in Sources */, 481FB5C51AC1B74F0076CFF3 /* 
getrabundcommand.cpp in Sources */, 481FB56E1AC1B6C30076CFF3 /* sharedsorest.cpp in Sources */, 481FB6161AC1B7AC0076CFF3 /* summaryqualcommand.cpp in Sources */, 481FB56F1AC1B6C70076CFF3 /* sharedthetan.cpp in Sources */, 481FB5B21AC1B7300076CFF3 /* deuniqueseqscommand.cpp in Sources */, 481FB6331AC1B7EA0076CFF3 /* kmer.cpp in Sources */, 481FB5BC1AC1B74F0076CFF3 /* getgroupscommand.cpp in Sources */, 481FB5891AC1B6FF0076CFF3 /* maligner.cpp in Sources */, 481FB5CC1AC1B74F0076CFF3 /* heatmapsimcommand.cpp in Sources */, 481FB54C1AC1B62D0076CFF3 /* manhattan.cpp in Sources */, 481FB5E41AC1B77E0076CFF3 /* mimarksattributescommand.cpp in Sources */, 481FB5C11AC1B74F0076CFF3 /* getlistcountcommand.cpp in Sources */, 481FB57C1AC1B6EA0076CFF3 /* structkulczynski.cpp in Sources */, 481FB5BF1AC1B74F0076CFF3 /* getmimarkspackagecommand.cpp in Sources */, 481FB66C1AC1B8520076CFF3 /* randomforest.cpp in Sources */, 481FB67A1AC1B88F0076CFF3 /* readtree.cpp in Sources */, 481FB6061AC1B7970076CFF3 /* seqsummarycommand.cpp in Sources */, 481FB54A1AC1B6270076CFF3 /* jackknife.cpp in Sources */, 481FB5431AC1B6110076CFF3 /* geom.cpp in Sources */, 481FB5761AC1B6EA0076CFF3 /* solow.cpp in Sources */, 481FB5421AC1B60D0076CFF3 /* efron.cpp in Sources */, 481FB5461AC1B6190076CFF3 /* hamming.cpp in Sources */, 481FB6891AC1BA760076CFF3 /* phylosummary.cpp in Sources */, 481FB6881AC1B8B80076CFF3 /* weightedlinkage.cpp in Sources */, 481FB61E1AC1B7AC0076CFF3 /* unifracweightedcommand.cpp in Sources */, 481FB5951AC1B71B0076CFF3 /* chimerabellerophoncommand.cpp in Sources */, 481FB68D1AC1BA9E0076CFF3 /* classify.cpp in Sources */, 481FB65F1AC1B8450076CFF3 /* myseqdist.cpp in Sources */, 481FB6391AC1B7EA0076CFF3 /* ordervector.cpp in Sources */, 481FB59A1AC1B71B0076CFF3 /* chimeraslayercommand.cpp in Sources */, 481FB5901AC1B71B0076CFF3 /* aligncommand.cpp in Sources */, 481FB6081AC1B7970076CFF3 /* setdircommand.cpp in Sources */, 481FB62C1AC1B7EA0076CFF3 /* designmap.cpp in Sources */, 
481FB5661AC1B6AA0076CFF3 /* sharedmarczewski.cpp in Sources */, 481FB5881AC1B6FF0076CFF3 /* chimeraslayer.cpp in Sources */, 481FB6761AC1B88F0076CFF3 /* readblast.cpp in Sources */, 481FB5D81AC1B75C0076CFF3 /* makecontigscommand.cpp in Sources */, 481FB6481AC1B7EA0076CFF3 /* sparsedistancematrix.cpp in Sources */, 481FB5531AC1B6490076CFF3 /* parsimony.cpp in Sources */, 481FB63C1AC1B7EA0076CFF3 /* referencedb.cpp in Sources */, 481FB6641AC1B8450076CFF3 /* optionparser.cpp in Sources */, 481FB68B1AC1BA9E0076CFF3 /* aligntree.cpp in Sources */, 481FB5FB1AC1B77E0076CFF3 /* removelineagecommand.cpp in Sources */, 481FB57A1AC1B6EA0076CFF3 /* structchord.cpp in Sources */, 481FB6651AC1B8450076CFF3 /* overlap.cpp in Sources */, 481FB6841AC1B8B80076CFF3 /* trimoligos.cpp in Sources */, 481FB6401AC1B7EA0076CFF3 /* sequencedb.cpp in Sources */, 481FB5C81AC1B74F0076CFF3 /* getseqscommand.cpp in Sources */, 481FB6011AC1B7970076CFF3 /* reversecommand.cpp in Sources */, 481FB55E1AC1B66D0076CFF3 /* sharedjackknife.cpp in Sources */, 481FB5D51AC1B75C0076CFF3 /* loadlogfilecommand.cpp in Sources */, 481FB64B1AC1B7F40076CFF3 /* suffixtree.cpp in Sources */, 481FB5F21AC1B77E0076CFF3 /* phylotypecommand.cpp in Sources */, 481FB6281AC1B7EA0076CFF3 /* blastalign.cpp in Sources */, 481FB61D1AC1B7AC0076CFF3 /* unifracunweightedcommand.cpp in Sources */, 481FB6141AC1B7AC0076CFF3 /* subsamplecommand.cpp in Sources */, 481FB5481AC1B61F0076CFF3 /* hellinger.cpp in Sources */, 481FB5CA1AC1B74F0076CFF3 /* hclustercommand.cpp in Sources */, 481FB5D41AC1B75C0076CFF3 /* listseqscommand.cpp in Sources */, 481FB6521AC1B8100076CFF3 /* fileoutput.cpp in Sources */, 481FB6851AC1B8B80076CFF3 /* validcalculator.cpp in Sources */, 481FB56D1AC1B6C10076CFF3 /* sharedsorclass.cpp in Sources */, 481FB5931AC1B71B0076CFF3 /* binsequencecommand.cpp in Sources */, 481FB6861AC1B8B80076CFF3 /* validparameter.cpp in Sources */, 481FB6431AC1B7EA0076CFF3 /* sharedordervector.cpp in Sources */, 481FB5301AC1B5C80076CFF3 
/* calcsparcc.cpp in Sources */, 481FB5B01AC1B7300076CFF3 /* deconvolutecommand.cpp in Sources */, 481FB6001AC1B7970076CFF3 /* renameseqscommand.cpp in Sources */, 481FB5921AC1B71B0076CFF3 /* anosimcommand.cpp in Sources */, 481FB6201AC1B7B30076CFF3 /* commandoptionparser.cpp in Sources */, 481FB5341AC1B5D60076CFF3 /* dmat.cpp in Sources */, 481FB6171AC1B7AC0076CFF3 /* summarysharedcommand.cpp in Sources */, 481FB68C1AC1BA9E0076CFF3 /* bayesian.cpp in Sources */, 481FB5F41AC1B77E0076CFF3 /* preclustercommand.cpp in Sources */, 481FB5911AC1B71B0076CFF3 /* amovacommand.cpp in Sources */, 481FB58A1AC1B6FF0076CFF3 /* myPerseus.cpp in Sources */, 481FB63E1AC1B7EA0076CFF3 /* sabundvector.cpp in Sources */, 481FB57D1AC1B6EA0076CFF3 /* structpearson.cpp in Sources */, 481FB5331AC1B5D30076CFF3 /* distclearcut.cpp in Sources */, 481FB6811AC1B8960076CFF3 /* subsample.cpp in Sources */, 481FB5521AC1B6450076CFF3 /* odum.cpp in Sources */, 481FB68E1AC1BA9E0076CFF3 /* kmernode.cpp in Sources */, 481FB5CE1AC1B75C0076CFF3 /* homovacommand.cpp in Sources */, 481FB6551AC1B8100076CFF3 /* heatmap.cpp in Sources */, 481FB5E61AC1B77E0076CFF3 /* normalizesharedcommand.cpp in Sources */, 481FB5E71AC1B77E0076CFF3 /* nmdscommand.cpp in Sources */, 481FB52B1AC1B09F0076CFF3 /* setseedcommand.cpp in Sources */, 481FB5261AC0ADA00076CFF3 /* sequence.cpp in Sources */, 481FB5C61AC1B74F0076CFF3 /* getrelabundcommand.cpp in Sources */, 481FB6571AC1B8100076CFF3 /* inputdata.cpp in Sources */, 481FB5451AC1B6170076CFF3 /* gower.cpp in Sources */, 481FB5AC1AC1B7300076CFF3 /* corraxescommand.cpp in Sources */, 481FB5A11AC1B71B0076CFF3 /* classifytreecommand.cpp in Sources */, 481FB62F1AC1B7EA0076CFF3 /* fastqread.cpp in Sources */, 481FB6901AC1BA9E0076CFF3 /* knn.cpp in Sources */, 481FB5941AC1B71B0076CFF3 /* catchallcommand.cpp in Sources */, 481FB56C1AC1B6BE0076CFF3 /* sharedsorabund.cpp in Sources */, 481FB6411AC1B7EA0076CFF3 /* sequenceparser.cpp in Sources */, 481FB6381AC1B7EA0076CFF3 /* oligos.cpp 
in Sources */, 481FB59E1AC1B71B0076CFF3 /* classifyseqscommand.cpp in Sources */, 481FB5F31AC1B77E0076CFF3 /* pipelinepdscommand.cpp in Sources */, 481FB5CF1AC1B75C0076CFF3 /* indicatorcommand.cpp in Sources */, 481FB64F1AC1B8100076CFF3 /* consensus.cpp in Sources */, 481FB5441AC1B6140076CFF3 /* goodscoverage.cpp in Sources */, 481FB5DD1AC1B77E0076CFF3 /* matrixoutputcommand.cpp in Sources */, 481FB5771AC1B6EA0076CFF3 /* spearman.cpp in Sources */, 481FB6031AC1B7970076CFF3 /* secondarystructurecommand.cpp in Sources */, 481FB66F1AC1B8520076CFF3 /* regularizedrandomforest.cpp in Sources */, 481FB5361AC1B5DC0076CFF3 /* getopt_long.cpp in Sources */, 481FB5A41AC1B7300076CFF3 /* clustercommand.cpp in Sources */, 481FB5671AC1B6AD0076CFF3 /* sharedmorisitahorn.cpp in Sources */, 481FB5581AC1B6590076CFF3 /* shannonrange.cpp in Sources */, 481FB5601AC1B6790076CFF3 /* sharedjest.cpp in Sources */, 481FB64A1AC1B7F40076CFF3 /* suffixnodes.cpp in Sources */, 481FB53F1AC1B6000076CFF3 /* canberra.cpp in Sources */, 481FB62B1AC1B7EA0076CFF3 /* database.cpp in Sources */, 481FB5BD1AC1B74F0076CFF3 /* getlabelcommand.cpp in Sources */, 481FB5B91AC1B74F0076CFF3 /* getcurrentcommand.cpp in Sources */, 481FB5991AC1B71B0076CFF3 /* chimeraperseuscommand.cpp in Sources */, 481FB68F1AC1BA9E0076CFF3 /* kmertree.cpp in Sources */, 481FB5CB1AC1B74F0076CFF3 /* heatmapcommand.cpp in Sources */, 481FB60C1AC1B7AC0076CFF3 /* sharedcommand.cpp in Sources */, 481FB5701AC1B6CA0076CFF3 /* sharedthetayc.cpp in Sources */, 481FB62D1AC1B7EA0076CFF3 /* distancedb.cpp in Sources */, 481FB5AA1AC1B7300076CFF3 /* consensusseqscommand.cpp in Sources */, 481FB5AE1AC1B7300076CFF3 /* countseqscommand.cpp in Sources */, 481FB5811AC1B6EA0076CFF3 /* whittaker.cpp in Sources */, 481FB58E1AC1B7060076CFF3 /* completelinkage.cpp in Sources */, 481FB6301AC1B7EA0076CFF3 /* flowdata.cpp in Sources */, 481FB59B1AC1B71B0076CFF3 /* chimerauchimecommand.cpp in Sources */, 481FB5971AC1B71B0076CFF3 /* chimeracheckcommand.cpp in 
Sources */, 481FB5271AC0ADBA0076CFF3 /* mothurout.cpp in Sources */, 481FB54D1AC1B6300076CFF3 /* memchi2.cpp in Sources */, 481FB5E01AC1B77E0076CFF3 /* mergegroupscommand.cpp in Sources */, 481FB56B1AC1B6BB0076CFF3 /* sharedsobscollectsummary.cpp in Sources */, 481FB57F1AC1B6EA0076CFF3 /* uvest.cpp in Sources */, 481FB5791AC1B6EA0076CFF3 /* structchi2.cpp in Sources */, 481FB63B1AC1B7EA0076CFF3 /* rabundvector.cpp in Sources */, 481FB5A91AC1B7300076CFF3 /* collectsharedcommand.cpp in Sources */, 481FB6211AC1B7BA0076CFF3 /* communitytype.cpp in Sources */, 481FB5621AC1B69E0076CFF3 /* sharedkstest.cpp in Sources */, 481FB5E91AC1B77E0076CFF3 /* otuhierarchycommand.cpp in Sources */, 481FB5351AC1B5D90076CFF3 /* fasta.cpp in Sources */, 481FB6321AC1B7EA0076CFF3 /* groupmap.cpp in Sources */, 481FB5251AC0AA430076CFF3 /* testsequence.cpp in Sources */, 481FB5FF1AC1B7970076CFF3 /* removeseqscommand.cpp in Sources */, 481FB6771AC1B88F0076CFF3 /* readcluster.cpp in Sources */, 481FB5831AC1B6FF0076CFF3 /* ccode.cpp in Sources */, 481FB5681AC1B6B20076CFF3 /* sharedochiai.cpp in Sources */, 481FB66E1AC1B8520076CFF3 /* forest.cpp in Sources */, 481FB66A1AC1B8520076CFF3 /* abstractrandomforest.cpp in Sources */, 481FB56A1AC1B6B80076CFF3 /* sharedsobs.cpp in Sources */, 481FB6671AC1B8450076CFF3 /* randomnumber.cpp in Sources */, 481FB5DB1AC1B75C0076CFF3 /* makelefsecommand.cpp in Sources */, 481FB6371AC1B7EA0076CFF3 /* nameassignment.cpp in Sources */, 481FB5D21AC1B75C0076CFF3 /* libshuffcommand.cpp in Sources */, 481FB5561AC1B6520076CFF3 /* shannon.cpp in Sources */, 481FB6591AC1B8100076CFF3 /* linearalgebra.cpp in Sources */, 481FB5411AC1B6070076CFF3 /* coverage.cpp in Sources */, 481FB6231AC1B7BA0076CFF3 /* pam.cpp in Sources */, 481FB5BA1AC1B74F0076CFF3 /* getdistscommand.cpp in Sources */, 481FB6191AC1B7AC0076CFF3 /* systemcommand.cpp in Sources */, 481FB6611AC1B8450076CFF3 /* nastreport.cpp in Sources */, 48DB37B41B3B27E000C372A4 /* makefilecommand.cpp in Sources */, 
481FB6181AC1B7AC0076CFF3 /* summarytaxcommand.cpp in Sources */, 481FB5CD1AC1B74F0076CFF3 /* helpcommand.cpp in Sources */, 481FB6451AC1B7EA0076CFF3 /* sharedrabundvector.cpp in Sources */, 481FB6701AC1B8820076CFF3 /* raredisplay.cpp in Sources */, 481FB5F91AC1B77E0076CFF3 /* removedistscommand.cpp in Sources */, 481FB6581AC1B8100076CFF3 /* libshuff.cpp in Sources */, 481FB59D1AC1B71B0076CFF3 /* classifyotucommand.cpp in Sources */, 481FB5781AC1B6EA0076CFF3 /* speciesprofile.cpp in Sources */, 481FB5401AC1B6030076CFF3 /* chao1.cpp in Sources */, 481FB5591AC1B65D0076CFF3 /* sharedjabund.cpp in Sources */, 481FB62A1AC1B7EA0076CFF3 /* counttable.cpp in Sources */, 481FB53A1AC1B5EC0076CFF3 /* bergerparker.cpp in Sources */, 481FB6751AC1B88F0076CFF3 /* formatphylip.cpp in Sources */, 481FB5AD1AC1B7300076CFF3 /* countgroupscommand.cpp in Sources */, 481FB61C1AC1B7AC0076CFF3 /* trimseqscommand.cpp in Sources */, 481FB5311AC1B5CD0076CFF3 /* clearcut.cpp in Sources */, 481FB5651AC1B6A70076CFF3 /* sharedlennon.cpp in Sources */, 481FB53E1AC1B5FC0076CFF3 /* calculator.cpp in Sources */, 481FB6241AC1B7BA0076CFF3 /* qFinderDMM.cpp in Sources */, 481FB6541AC1B8100076CFF3 /* hcluster.cpp in Sources */, 481FB6311AC1B7EA0076CFF3 /* fullmatrix.cpp in Sources */, 481FB51C1AC0A63E0076CFF3 /* main.cpp in Sources */, 481FB58F1AC1B71B0076CFF3 /* newcommandtemplate.cpp in Sources */, 481FB6741AC1B88F0076CFF3 /* formatcolumn.cpp in Sources */, 481FB5571AC1B6550076CFF3 /* shannoneven.cpp in Sources */, 481FB5D11AC1B75C0076CFF3 /* lefsecommand.cpp in Sources */, 481FB6561AC1B8100076CFF3 /* heatmapsim.cpp in Sources */, 481FB5EA1AC1B77E0076CFF3 /* pairwiseseqscommand.cpp in Sources */, 481FB63D1AC1B7EA0076CFF3 /* reportfile.cpp in Sources */, 481FB5F11AC1B77E0076CFF3 /* phylodiversitycommand.cpp in Sources */, 481FB5501AC1B63D0076CFF3 /* mempearson.cpp in Sources */, 481FB5B51AC1B7300076CFF3 /* filterseqscommand.cpp in Sources */, 481FB6621AC1B8450076CFF3 /* noalign.cpp in Sources */, 
481FB5E31AC1B77E0076CFF3 /* mgclustercommand.cpp in Sources */, 481FB5491AC1B6220076CFF3 /* invsimpson.cpp in Sources */, 481FB5821AC1B6FF0076CFF3 /* bellerophon.cpp in Sources */, 481FB6731AC1B8820076CFF3 /* seqnoise.cpp in Sources */, 481FB5DC1AC1B75C0076CFF3 /* makelookupcommand.cpp in Sources */, 481FB53D1AC1B5F80076CFF3 /* bstick.cpp in Sources */, 481FB6681AC1B8450076CFF3 /* rarecalc.cpp in Sources */, 481FB60B1AC1B7AC0076CFF3 /* sffmultiplecommand.cpp in Sources */, 481FB59F1AC1B71B0076CFF3 /* classifyrfsharedcommand.cpp in Sources */, 481FB5F51AC1B77E0076CFF3 /* primerdesigncommand.cpp in Sources */, 481FB5B41AC1B7300076CFF3 /* distancecommand.cpp in Sources */, 481FB5391AC1B5E90076CFF3 /* ace.cpp in Sources */, 481FB5751AC1B6EA0076CFF3 /* soergel.cpp in Sources */, 481FB5DA1AC1B75C0076CFF3 /* makegroupcommand.cpp in Sources */, 481FB5691AC1B6B50076CFF3 /* sharedrjsd.cpp in Sources */, 481FB6801AC1B8960076CFF3 /* slibshuff.cpp in Sources */, 481FB67B1AC1B88F0076CFF3 /* readphylipvector.cpp in Sources */, 481FB64C1AC1B7F40076CFF3 /* tree.cpp in Sources */, 481FB6631AC1B8450076CFF3 /* needlemanoverlap.cpp in Sources */, 481FB6931AC1BAA60076CFF3 /* taxonomynode.cpp in Sources */, 481FB60E1AC1B7AC0076CFF3 /* shhhseqscommand.cpp in Sources */, 481FB5E11AC1B77E0076CFF3 /* mergetaxsummarycommand.cpp in Sources */, 481FB5AB1AC1B7300076CFF3 /* cooccurrencecommand.cpp in Sources */, 481FB5D61AC1B75C0076CFF3 /* mantelcommand.cpp in Sources */, 481FB57E1AC1B6EA0076CFF3 /* unweighted.cpp in Sources */, 481FB60F1AC1B7AC0076CFF3 /* sortseqscommand.cpp in Sources */, 481FB67D1AC1B88F0076CFF3 /* treereader.cpp in Sources */, 481FB6131AC1B7AC0076CFF3 /* sracommand.cpp in Sources */, 481FB5541AC1B64C0076CFF3 /* prng.cpp in Sources */, 481FB6691AC1B8520076CFF3 /* abstractdecisiontree.cpp in Sources */, 481FB57B1AC1B6EA0076CFF3 /* structeuclidean.cpp in Sources */, 481FB6221AC1B7BA0076CFF3 /* kmeans.cpp in Sources */, 481FB54B1AC1B62A0076CFF3 /* logsd.cpp in Sources */, 
481FB55A1AC1B6600076CFF3 /* sharedace.cpp in Sources */, 481FB5BB1AC1B74F0076CFF3 /* getgroupcommand.cpp in Sources */, 481FB6361AC1B7EA0076CFF3 /* listvector.cpp in Sources */, 481FB5ED1AC1B77E0076CFF3 /* parsimonycommand.cpp in Sources */, 481FB55F1AC1B6750076CFF3 /* sharedjclass.cpp in Sources */, 481FB6101AC1B7AC0076CFF3 /* sparcccommand.cpp in Sources */, 481FB5E81AC1B77E0076CFF3 /* otuassociationcommand.cpp in Sources */, 481FB5B31AC1B7300076CFF3 /* deuniquetreecommand.cpp in Sources */, 481FB5D91AC1B75C0076CFF3 /* makefastqcommand.cpp in Sources */, 481FB5A21AC1B71B0076CFF3 /* clearcutcommand.cpp in Sources */, 481FB5851AC1B6FF0076CFF3 /* chimeracheckrdp.cpp in Sources */, 481FB55C1AC1B6660076CFF3 /* sharedbraycurtis.cpp in Sources */, 481FB5FC1AC1B7970076CFF3 /* removeotuscommand.cpp in Sources */, 481FB5BE1AC1B74F0076CFF3 /* getmetacommunitycommand.cpp in Sources */, 481FB6821AC1B8AF0076CFF3 /* svm.cpp in Sources */, 481FB6911AC1BAA60076CFF3 /* phylotree.cpp in Sources */, 481FB6261AC1B7EA0076CFF3 /* alignmentcell.cpp in Sources */, 481FB5C21AC1B74F0076CFF3 /* getoturepcommand.cpp in Sources */, 481FB5D01AC1B75C0076CFF3 /* kruskalwalliscommand.cpp in Sources */, 481FB5511AC1B6410076CFF3 /* npshannon.cpp in Sources */, 481FB6471AC1B7EA0076CFF3 /* sparsematrix.cpp in Sources */, 481FB5871AC1B6FF0076CFF3 /* decalc.cpp in Sources */, 481FB6791AC1B88F0076CFF3 /* readphylip.cpp in Sources */, 481FB6151AC1B7AC0076CFF3 /* summarycommand.cpp in Sources */, 481FB5EE1AC1B77E0076CFF3 /* pcacommand.cpp in Sources */, 481FB5B71AC1B74F0076CFF3 /* getcommandinfocommand.cpp in Sources */, 481FB5711AC1B6D40076CFF3 /* shen.cpp in Sources */, 481FB6501AC1B8100076CFF3 /* dlibshuff.cpp in Sources */, 481FB64D1AC1B7F40076CFF3 /* treemap.cpp in Sources */, 481FB67F1AC1B8960076CFF3 /* singlelinkage.cpp in Sources */, 481FB5641AC1B6A40076CFF3 /* sharedkulczynskicody.cpp in Sources */, 481FB6461AC1B7EA0076CFF3 /* sharedsabundvector.cpp in Sources */, 481FB60D1AC1B7AC0076CFF3 /* 
shhhercommand.cpp in Sources */, 481FB5FA1AC1B77E0076CFF3 /* removegroupscommand.cpp in Sources */, 481FB5371AC1B5E00076CFF3 /* cluster.cpp in Sources */, 481FB53B1AC1B5EF0076CFF3 /* boneh.cpp in Sources */, 481FB6441AC1B7EA0076CFF3 /* sharedrabundfloatvector.cpp in Sources */, 481FB6071AC1B7970076CFF3 /* setcurrentcommand.cpp in Sources */, 481FB5321AC1B5D00076CFF3 /* cmdargs.cpp in Sources */, 481FB6711AC1B8820076CFF3 /* rarefact.cpp in Sources */, 481FB5841AC1B6FF0076CFF3 /* chimera.cpp in Sources */, 481FB6121AC1B7AC0076CFF3 /* splitgroupscommand.cpp in Sources */, 481FB6921AC1BAA60076CFF3 /* taxonomyequalizer.cpp in Sources */, 481FB68A1AC1BA9E0076CFF3 /* alignnode.cpp in Sources */, 481FB58B1AC1B6FF0076CFF3 /* pintail.cpp in Sources */, 481FB6041AC1B7970076CFF3 /* sensspeccommand.cpp in Sources */, 481FB6491AC1B7F40076CFF3 /* suffixdb.cpp in Sources */, 481FB6111AC1B7AC0076CFF3 /* splitabundcommand.cpp in Sources */, 481FB5471AC1B61C0076CFF3 /* heip.cpp in Sources */, 481FB5D31AC1B75C0076CFF3 /* listotulabelscommand.cpp in Sources */, 481FB5551AC1B64F0076CFF3 /* qstat.cpp in Sources */, 481FB5DF1AC1B77E0076CFF3 /* mergefilecommand.cpp in Sources */, 481FB5981AC1B71B0076CFF3 /* chimerapintailcommand.cpp in Sources */, 481FB6091AC1B7970076CFF3 /* setlogfilecommand.cpp in Sources */, 481FB5C91AC1B74F0076CFF3 /* getsharedotucommand.cpp in Sources */, 481FB5B11AC1B7300076CFF3 /* degapseqscommand.cpp in Sources */, ); runOnlyForDeploymentPostprocessing = 0; }; 8DD76FAB0486AB0100D96B5E /* Sources */ = { isa = PBXSourcesBuildPhase; buildActionMask = 2147483647; files = ( A7E9B88112D37EC400DA6239 /* ace.cpp in Sources */, A7E9B88212D37EC400DA6239 /* aligncommand.cpp in Sources */, A7E9B88312D37EC400DA6239 /* alignment.cpp in Sources */, A7E9B88412D37EC400DA6239 /* alignmentcell.cpp in Sources */, A7E9B88512D37EC400DA6239 /* alignmentdb.cpp in Sources */, A7E9B88612D37EC400DA6239 /* averagelinkage.cpp in Sources */, A7E9B88712D37EC400DA6239 /* bayesian.cpp in Sources 
*/, A7E9B88812D37EC400DA6239 /* bellerophon.cpp in Sources */, A7E9B88912D37EC400DA6239 /* bergerparker.cpp in Sources */, A7E9B88A12D37EC400DA6239 /* binsequencecommand.cpp in Sources */, A7E9B88B12D37EC400DA6239 /* blastalign.cpp in Sources */, A7E9B88C12D37EC400DA6239 /* blastdb.cpp in Sources */, A7E9B88D12D37EC400DA6239 /* boneh.cpp in Sources */, A7E9B88E12D37EC400DA6239 /* bootstrap.cpp in Sources */, A7E9B89012D37EC400DA6239 /* bstick.cpp in Sources */, A7E9B89112D37EC400DA6239 /* calculator.cpp in Sources */, A7E9B89212D37EC400DA6239 /* canberra.cpp in Sources */, A7E9B89312D37EC400DA6239 /* catchallcommand.cpp in Sources */, A7E9B89412D37EC400DA6239 /* ccode.cpp in Sources */, A7E9B89512D37EC400DA6239 /* chao1.cpp in Sources */, A7E9B89612D37EC400DA6239 /* chimera.cpp in Sources */, A7E9B89712D37EC400DA6239 /* chimerabellerophoncommand.cpp in Sources */, A7E9B89812D37EC400DA6239 /* chimeraccodecommand.cpp in Sources */, A7E9B89912D37EC400DA6239 /* chimeracheckcommand.cpp in Sources */, 48E981CF189C38FB0042BE9D /* (null) in Sources */, A7E9B89A12D37EC400DA6239 /* chimeracheckrdp.cpp in Sources */, A7E9B89B12D37EC400DA6239 /* chimerapintailcommand.cpp in Sources */, A7E9B89C12D37EC400DA6239 /* chimerarealigner.cpp in Sources */, A7E9B89E12D37EC400DA6239 /* chimeraslayer.cpp in Sources */, A7E9B89F12D37EC400DA6239 /* chimeraslayercommand.cpp in Sources */, A7E9B8A012D37EC400DA6239 /* chopseqscommand.cpp in Sources */, A7E9B8A112D37EC400DA6239 /* classify.cpp in Sources */, A7E9B8A212D37EC400DA6239 /* classifyotucommand.cpp in Sources */, A7E9B8A312D37EC400DA6239 /* classifyseqscommand.cpp in Sources */, A7E9B8A412D37EC400DA6239 /* clearcut.cpp in Sources */, A7E9B8A512D37EC400DA6239 /* clearcutcommand.cpp in Sources */, A7E9B8A612D37EC400DA6239 /* cluster.cpp in Sources */, A7E9B8A712D37EC400DA6239 /* clusterclassic.cpp in Sources */, A7E9B8A812D37EC400DA6239 /* clustercommand.cpp in Sources */, A7E9B8A912D37EC400DA6239 /* clusterdoturcommand.cpp in Sources 
*/, A7E9B8AA12D37EC400DA6239 /* clusterfragmentscommand.cpp in Sources */, A7E9B8AB12D37EC400DA6239 /* clustersplitcommand.cpp in Sources */, A7E9B8AC12D37EC400DA6239 /* cmdargs.cpp in Sources */, A7E9B8AD12D37EC400DA6239 /* collect.cpp in Sources */, A7E9B8AE12D37EC400DA6239 /* collectcommand.cpp in Sources */, A7E9B8AF12D37EC400DA6239 /* collectsharedcommand.cpp in Sources */, A7E9B8B012D37EC400DA6239 /* commandfactory.cpp in Sources */, A7E9B8B112D37EC400DA6239 /* commandoptionparser.cpp in Sources */, A7E9B8B312D37EC400DA6239 /* consensus.cpp in Sources */, A7E9B8B412D37EC400DA6239 /* consensusseqscommand.cpp in Sources */, A7E9B8B512D37EC400DA6239 /* corraxescommand.cpp in Sources */, A7E9B8B612D37EC400DA6239 /* coverage.cpp in Sources */, A7E9B8B712D37EC400DA6239 /* database.cpp in Sources */, A7E9B8B812D37EC400DA6239 /* decalc.cpp in Sources */, A7E9B8B912D37EC400DA6239 /* deconvolutecommand.cpp in Sources */, A7E9B8BA12D37EC400DA6239 /* degapseqscommand.cpp in Sources */, A7E9B8BB12D37EC400DA6239 /* deuniqueseqscommand.cpp in Sources */, A7E9B8BC12D37EC400DA6239 /* distancecommand.cpp in Sources */, A7E9B8BD12D37EC400DA6239 /* distancedb.cpp in Sources */, A7E9B8BE12D37EC400DA6239 /* distclearcut.cpp in Sources */, A7E9B8BF12D37EC400DA6239 /* dlibshuff.cpp in Sources */, A7E9B8C012D37EC400DA6239 /* dmat.cpp in Sources */, A7E9B8C112D37EC400DA6239 /* efron.cpp in Sources */, A7E9B8C212D37EC400DA6239 /* engine.cpp in Sources */, A7E9B8C312D37EC400DA6239 /* fasta.cpp in Sources */, A7E9B8C412D37EC400DA6239 /* fastamap.cpp in Sources */, A7E9B8C512D37EC400DA6239 /* fileoutput.cpp in Sources */, A7E9B8C612D37EC400DA6239 /* filterseqscommand.cpp in Sources */, A7E9B8C812D37EC400DA6239 /* flowdata.cpp in Sources */, A7E9B8C912D37EC400DA6239 /* formatcolumn.cpp in Sources */, A7E9B8CA12D37EC400DA6239 /* formatphylip.cpp in Sources */, A7E9B8CB12D37EC400DA6239 /* fullmatrix.cpp in Sources */, A7E9B8CC12D37EC400DA6239 /* geom.cpp in Sources */, 
A7E9B8CD12D37EC400DA6239 /* getgroupcommand.cpp in Sources */, A7E9B8CE12D37EC400DA6239 /* getgroupscommand.cpp in Sources */, A7E9B8CF12D37EC400DA6239 /* getlabelcommand.cpp in Sources */, A7E9B8D012D37EC400DA6239 /* getlineagecommand.cpp in Sources */, A7E9B8D112D37EC400DA6239 /* getlistcountcommand.cpp in Sources */, A7E9B8D212D37EC400DA6239 /* getopt_long.cpp in Sources */, A7E9B8D312D37EC400DA6239 /* getoturepcommand.cpp in Sources */, A7E9B8D412D37EC400DA6239 /* getotuscommand.cpp in Sources */, A7E9B8D512D37EC400DA6239 /* getrabundcommand.cpp in Sources */, A7E9B8D612D37EC400DA6239 /* getrelabundcommand.cpp in Sources */, A7E9B8D712D37EC400DA6239 /* getsabundcommand.cpp in Sources */, A7E9B8D812D37EC400DA6239 /* getseqscommand.cpp in Sources */, A7E9B8D912D37EC400DA6239 /* getsharedotucommand.cpp in Sources */, A7E9B8DB12D37EC400DA6239 /* goodscoverage.cpp in Sources */, A7E9B8DC12D37EC400DA6239 /* gotohoverlap.cpp in Sources */, A7E9B8DD12D37EC400DA6239 /* gower.cpp in Sources */, A7E9B8DE12D37EC400DA6239 /* groupmap.cpp in Sources */, 4893DE2918EEF28100C615DF /* (null) in Sources */, A7E9B8DF12D37EC400DA6239 /* hamming.cpp in Sources */, A7E9B8E012D37EC400DA6239 /* hcluster.cpp in Sources */, A7E9B8E112D37EC400DA6239 /* hclustercommand.cpp in Sources */, A7E9B8E212D37EC400DA6239 /* heatmap.cpp in Sources */, A7E9B8E312D37EC400DA6239 /* heatmapcommand.cpp in Sources */, A7E9B8E412D37EC400DA6239 /* heatmapsim.cpp in Sources */, A7E9B8E512D37EC400DA6239 /* heatmapsimcommand.cpp in Sources */, A7E9B8E612D37EC400DA6239 /* heip.cpp in Sources */, 481FB52A1AC19F8B0076CFF3 /* setseedcommand.cpp in Sources */, A7E9B8E712D37EC400DA6239 /* hellinger.cpp in Sources */, A7E9B8E812D37EC400DA6239 /* helpcommand.cpp in Sources */, A7E9B8E912D37EC400DA6239 /* indicatorcommand.cpp in Sources */, A7E9B8EA12D37EC400DA6239 /* inputdata.cpp in Sources */, A7E9B8EB12D37EC400DA6239 /* invsimpson.cpp in Sources */, A7E9B8EC12D37EC400DA6239 /* jackknife.cpp in Sources */, 
A7E9B8ED12D37EC400DA6239 /* kmer.cpp in Sources */, A7E9B8EE12D37EC400DA6239 /* kmerdb.cpp in Sources */, A7E9B8EF12D37EC400DA6239 /* knn.cpp in Sources */, 48A85BAD18E1AF2000199B6F /* (null) in Sources */, A7E9B8F012D37EC400DA6239 /* libshuff.cpp in Sources */, 48F98E4D1A9CFD670005E81B /* completelinkage.cpp in Sources */, A7E9B8F112D37EC400DA6239 /* libshuffcommand.cpp in Sources */, A7E9B8F212D37EC400DA6239 /* listseqscommand.cpp in Sources */, A7E9B8F312D37EC400DA6239 /* listvector.cpp in Sources */, A7E9B8F412D37EC400DA6239 /* logsd.cpp in Sources */, A7E9B8F512D37EC400DA6239 /* makegroupcommand.cpp in Sources */, 48705AC719BE32C50075E977 /* sharedrjsd.cpp in Sources */, A7E9B8F612D37EC400DA6239 /* maligner.cpp in Sources */, A7E9B8F712D37EC400DA6239 /* manhattan.cpp in Sources */, A7E9B8F812D37EC400DA6239 /* matrixoutputcommand.cpp in Sources */, A7E9B8F912D37EC400DA6239 /* memchi2.cpp in Sources */, A7E9B8FA12D37EC400DA6239 /* memchord.cpp in Sources */, A7E9B8FB12D37EC400DA6239 /* memeuclidean.cpp in Sources */, A7E9B8FC12D37EC400DA6239 /* mempearson.cpp in Sources */, A7E9B8FD12D37EC400DA6239 /* mergefilecommand.cpp in Sources */, A7E9B8FF12D37EC400DA6239 /* metastatscommand.cpp in Sources */, A7E9B90012D37EC400DA6239 /* mgclustercommand.cpp in Sources */, A7E9B90112D37EC400DA6239 /* mothur.cpp in Sources */, A7E9B90212D37EC400DA6239 /* mothurout.cpp in Sources */, A7E9B90312D37EC400DA6239 /* nameassignment.cpp in Sources */, A7E9B90412D37EC400DA6239 /* nast.cpp in Sources */, A7E9B90512D37EC400DA6239 /* nastreport.cpp in Sources */, A7E9B90612D37EC400DA6239 /* needlemanoverlap.cpp in Sources */, A7E9B90712D37EC400DA6239 /* noalign.cpp in Sources */, A7E9B90812D37EC400DA6239 /* nocommands.cpp in Sources */, A7E9B90912D37EC400DA6239 /* normalizesharedcommand.cpp in Sources */, A7E9B90A12D37EC400DA6239 /* npshannon.cpp in Sources */, A7E9B90B12D37EC400DA6239 /* odum.cpp in Sources */, A7E9B90C12D37EC400DA6239 /* optionparser.cpp in Sources */, 
A7E9B90D12D37EC400DA6239 /* ordervector.cpp in Sources */, A7E9B90E12D37EC400DA6239 /* otuhierarchycommand.cpp in Sources */, A7E9B90F12D37EC400DA6239 /* overlap.cpp in Sources */, A7E9B91012D37EC400DA6239 /* pairwiseseqscommand.cpp in Sources */, A7E9B91112D37EC400DA6239 /* parsefastaqcommand.cpp in Sources */, A7E9B91212D37EC400DA6239 /* parselistscommand.cpp in Sources */, A7E9B91312D37EC400DA6239 /* parsimony.cpp in Sources */, A7E9B91412D37EC400DA6239 /* parsimonycommand.cpp in Sources */, A7E9B91512D37EC400DA6239 /* pcoacommand.cpp in Sources */, A7E9B91712D37EC400DA6239 /* phylodiversitycommand.cpp in Sources */, A7E9B91812D37EC400DA6239 /* phylosummary.cpp in Sources */, A7E9B91912D37EC400DA6239 /* phylotree.cpp in Sources */, A7E9B91A12D37EC400DA6239 /* phylotypecommand.cpp in Sources */, A7E9B91B12D37EC400DA6239 /* pintail.cpp in Sources */, A7E9B91C12D37EC400DA6239 /* pipelinepdscommand.cpp in Sources */, 48DB37B31B3B27E000C372A4 /* makefilecommand.cpp in Sources */, A7E9B91D12D37EC400DA6239 /* preclustercommand.cpp in Sources */, A7E9B91E12D37EC400DA6239 /* prng.cpp in Sources */, A7E9B91F12D37EC400DA6239 /* progress.cpp in Sources */, A7E9B92012D37EC400DA6239 /* qstat.cpp in Sources */, A7E9B92112D37EC400DA6239 /* qualityscores.cpp in Sources */, A7E9B92212D37EC400DA6239 /* quitcommand.cpp in Sources */, A7E9B92312D37EC400DA6239 /* rabundvector.cpp in Sources */, A7E9B92412D37EC400DA6239 /* rarecalc.cpp in Sources */, A7E9B92512D37EC400DA6239 /* raredisplay.cpp in Sources */, A7E9B92612D37EC400DA6239 /* rarefact.cpp in Sources */, A7E9B92712D37EC400DA6239 /* rarefactcommand.cpp in Sources */, A7E9B92812D37EC400DA6239 /* rarefactsharedcommand.cpp in Sources */, A7E9B92912D37EC400DA6239 /* readblast.cpp in Sources */, A7E9B92A12D37EC400DA6239 /* readcluster.cpp in Sources */, A7E9B92B12D37EC400DA6239 /* readcolumn.cpp in Sources */, A7E9B92F12D37EC400DA6239 /* readphylip.cpp in Sources */, A7E9B93012D37EC400DA6239 /* readtree.cpp in Sources */, 
A7E9B93212D37EC400DA6239 /* removegroupscommand.cpp in Sources */, A7E9B93312D37EC400DA6239 /* removelineagecommand.cpp in Sources */, A7E9B93412D37EC400DA6239 /* removeotuscommand.cpp in Sources */, A7E9B93512D37EC400DA6239 /* removeseqscommand.cpp in Sources */, A7E9B93612D37EC400DA6239 /* reportfile.cpp in Sources */, A7E9B93712D37EC400DA6239 /* reversecommand.cpp in Sources */, A7E9B93812D37EC400DA6239 /* sabundvector.cpp in Sources */, A7E9B93912D37EC400DA6239 /* screenseqscommand.cpp in Sources */, A7E9B93A12D37EC400DA6239 /* secondarystructurecommand.cpp in Sources */, A7E9B93B12D37EC400DA6239 /* sensspeccommand.cpp in Sources */, A7E9B93C12D37EC400DA6239 /* seqerrorcommand.cpp in Sources */, A7E9B93D12D37EC400DA6239 /* seqsummarycommand.cpp in Sources */, A7E9B93E12D37EC400DA6239 /* sequence.cpp in Sources */, A7E9B93F12D37EC400DA6239 /* sequencedb.cpp in Sources */, A7E9B94012D37EC400DA6239 /* setdircommand.cpp in Sources */, A7E9B94112D37EC400DA6239 /* setlogfilecommand.cpp in Sources */, A7E9B94212D37EC400DA6239 /* sffinfocommand.cpp in Sources */, A7E9B94312D37EC400DA6239 /* shannon.cpp in Sources */, 483C952E188F0CAD0035E7B7 /* (null) in Sources */, A7E9B94412D37EC400DA6239 /* shannoneven.cpp in Sources */, A7E9B94512D37EC400DA6239 /* sharedace.cpp in Sources */, A7E9B94612D37EC400DA6239 /* sharedanderbergs.cpp in Sources */, A7E9B94712D37EC400DA6239 /* sharedbraycurtis.cpp in Sources */, A7E9B94812D37EC400DA6239 /* sharedchao1.cpp in Sources */, A7E9B94912D37EC400DA6239 /* sharedcommand.cpp in Sources */, A7E9B94A12D37EC400DA6239 /* sharedjabund.cpp in Sources */, A7E9B94B12D37EC400DA6239 /* sharedjackknife.cpp in Sources */, A7E9B94C12D37EC400DA6239 /* sharedjclass.cpp in Sources */, A7E9B94D12D37EC400DA6239 /* sharedjest.cpp in Sources */, A7E9B94E12D37EC400DA6239 /* sharedkstest.cpp in Sources */, A7E9B94F12D37EC400DA6239 /* sharedkulczynski.cpp in Sources */, A7E9B95012D37EC400DA6239 /* sharedkulczynskicody.cpp in Sources */, 
48705AC419BE32C50075E977 /* getmimarkspackagecommand.cpp in Sources */, A7E9B95112D37EC400DA6239 /* sharedlennon.cpp in Sources */, A7E9B95212D37EC400DA6239 /* sharedlistvector.cpp in Sources */, A7E9B95312D37EC400DA6239 /* sharedmarczewski.cpp in Sources */, A7E9B95412D37EC400DA6239 /* sharedmorisitahorn.cpp in Sources */, A7E9B95512D37EC400DA6239 /* sharedochiai.cpp in Sources */, A7E9B95612D37EC400DA6239 /* sharedordervector.cpp in Sources */, A7E9B95712D37EC400DA6239 /* sharedrabundfloatvector.cpp in Sources */, A7E9B95812D37EC400DA6239 /* sharedrabundvector.cpp in Sources */, A7E9B95912D37EC400DA6239 /* sharedsabundvector.cpp in Sources */, A7E9B95A12D37EC400DA6239 /* sharedsobs.cpp in Sources */, A7E9B95B12D37EC400DA6239 /* sharedsobscollectsummary.cpp in Sources */, A7E9B95C12D37EC400DA6239 /* sharedsorabund.cpp in Sources */, A7E9B95D12D37EC400DA6239 /* sharedsorclass.cpp in Sources */, A7E9B95E12D37EC400DA6239 /* sharedsorest.cpp in Sources */, A7E9B95F12D37EC400DA6239 /* sharedthetan.cpp in Sources */, 48705AC819BE32C50075E977 /* abstractrandomforest.cpp in Sources */, A7E9B96012D37EC400DA6239 /* sharedthetayc.cpp in Sources */, A7E9B96112D37EC400DA6239 /* sharedutilities.cpp in Sources */, A7E9B96212D37EC400DA6239 /* shen.cpp in Sources */, A7E9B96312D37EC400DA6239 /* shhhercommand.cpp in Sources */, A7E9B96412D37EC400DA6239 /* simpson.cpp in Sources */, A7E9B96512D37EC400DA6239 /* simpsoneven.cpp in Sources */, A7E9B96612D37EC400DA6239 /* singlelinkage.cpp in Sources */, A7E9B96712D37EC400DA6239 /* slayer.cpp in Sources */, A7E9B96812D37EC400DA6239 /* slibshuff.cpp in Sources */, A7E9B96912D37EC400DA6239 /* smithwilson.cpp in Sources */, A7E9B96A12D37EC400DA6239 /* soergel.cpp in Sources */, A7E9B96B12D37EC400DA6239 /* solow.cpp in Sources */, A7E9B96C12D37EC400DA6239 /* sparsematrix.cpp in Sources */, 487C5A871AB88B93002AF48A /* mimarksattributescommand.cpp in Sources */, A7E9B96D12D37EC400DA6239 /* spearman.cpp in Sources */, 48705AC519BE32C50075E977 
/* oligos.cpp in Sources */, A7E9B96E12D37EC400DA6239 /* speciesprofile.cpp in Sources */, A7E9B96F12D37EC400DA6239 /* splitabundcommand.cpp in Sources */, A7E9B97012D37EC400DA6239 /* splitgroupscommand.cpp in Sources */, A7E9B97112D37EC400DA6239 /* splitmatrix.cpp in Sources */, A7E9B97212D37EC400DA6239 /* structchi2.cpp in Sources */, A7E9B97312D37EC400DA6239 /* structchord.cpp in Sources */, A7E9B97412D37EC400DA6239 /* structeuclidean.cpp in Sources */, A7E9B97512D37EC400DA6239 /* structkulczynski.cpp in Sources */, A7E9B97612D37EC400DA6239 /* structpearson.cpp in Sources */, A7E9B97712D37EC400DA6239 /* subsamplecommand.cpp in Sources */, A7E9B97812D37EC400DA6239 /* suffixdb.cpp in Sources */, A7E9B97912D37EC400DA6239 /* suffixnodes.cpp in Sources */, A7E9B97A12D37EC400DA6239 /* suffixtree.cpp in Sources */, A7E9B97B12D37EC400DA6239 /* summarycommand.cpp in Sources */, A7E9B97C12D37EC400DA6239 /* summarysharedcommand.cpp in Sources */, A7E9B97D12D37EC400DA6239 /* systemcommand.cpp in Sources */, 48C51DF31A793EFE004ECDF1 /* kmeralign.cpp in Sources */, A7E9B97E12D37EC400DA6239 /* taxonomyequalizer.cpp in Sources */, A7E9B97F12D37EC400DA6239 /* tree.cpp in Sources */, A7E9B98012D37EC400DA6239 /* treegroupscommand.cpp in Sources */, A7E9B98112D37EC400DA6239 /* treemap.cpp in Sources */, A7E9B98212D37EC400DA6239 /* treenode.cpp in Sources */, A7E9B98312D37EC400DA6239 /* trimflowscommand.cpp in Sources */, A7E9B98412D37EC400DA6239 /* trimseqscommand.cpp in Sources */, A7E9B98512D37EC400DA6239 /* unifracunweightedcommand.cpp in Sources */, A7E9B98612D37EC400DA6239 /* unifracweightedcommand.cpp in Sources */, A7E9B98712D37EC400DA6239 /* unweighted.cpp in Sources */, A7E9B98812D37EC400DA6239 /* uvest.cpp in Sources */, A7E9B98912D37EC400DA6239 /* validcalculator.cpp in Sources */, A7E9B98A12D37EC400DA6239 /* validparameter.cpp in Sources */, A7E9B98B12D37EC400DA6239 /* venn.cpp in Sources */, A7E9B98C12D37EC400DA6239 /* venncommand.cpp in Sources */, 
A7E9B98D12D37EC400DA6239 /* weighted.cpp in Sources */, A7E9B98E12D37EC400DA6239 /* weightedlinkage.cpp in Sources */, A7E9B98F12D37EC400DA6239 /* whittaker.cpp in Sources */, A70332B712D3A13400761E33 /* makefile in Sources */, A7FC480E12D788F20055BC5C /* linearalgebra.cpp in Sources */, A7FC486712D795D60055BC5C /* pcacommand.cpp in Sources */, A713EBAC12DC7613000092AC /* readphylipvector.cpp in Sources */, A713EBED12DC7C5E000092AC /* nmdscommand.cpp in Sources */, A727864412E9E28C00F86ABA /* removerarecommand.cpp in Sources */, A71FE12C12EDF72400963CA7 /* mergegroupscommand.cpp in Sources */, 7E6BE10A12F710D8007ADDBE /* refchimeratest.cpp in Sources */, A7A61F2D130062E000E05B6B /* amovacommand.cpp in Sources */, A75790591301749D00A30DAB /* homovacommand.cpp in Sources */, 481623E21B56A2DB004C60B7 /* pcrseqscommand.cpp in Sources */, A7FA10021302E097003860FE /* mantelcommand.cpp in Sources */, A799F5B91309A3E000AEEFA0 /* makefastqcommand.cpp in Sources */, A71CB160130B04A2001E7287 /* anosimcommand.cpp in Sources */, A7FE7C401330EA1000F7B327 /* getcurrentcommand.cpp in Sources */, A7FE7E6D13311EA400F7B327 /* setcurrentcommand.cpp in Sources */, A778FE6B134CA6CA00C0BA33 /* getcommandinfocommand.cpp in Sources */, A74D36B8137DAFAA00332B0C /* chimerauchimecommand.cpp in Sources */, A77A221F139001B600B0BE70 /* deuniquetreecommand.cpp in Sources */, A7730EFF13967241007433A3 /* countseqscommand.cpp in Sources */, A721765713BB9F7D0014DAAE /* referencedb.cpp in Sources */, A73DDBBA13C4A0D1006AAE38 /* clearmemorycommand.cpp in Sources */, A73DDC3813C4BF64006AAE38 /* mothurmetastats.cpp in Sources */, A79234D713C74BF6002B08E2 /* mothurfisher.cpp in Sources */, A795840D13F13CD900F201D5 /* countgroupscommand.cpp in Sources */, A7FF19F2140FFDA500AD216D /* trimoligos.cpp in Sources */, A7F9F5CF141A5E500032F693 /* sequenceparser.cpp in Sources */, A7FFB558142CA02C004884F2 /* summarytaxcommand.cpp in Sources */, A7BF221414587886000AD524 /* myPerseus.cpp in Sources */, 
A7BF2232145879B2000AD524 /* chimeraperseuscommand.cpp in Sources */, A774101414695AF60098E6AC /* shhhseqscommand.cpp in Sources */, A774104814696F320098E6AC /* myseqdist.cpp in Sources */, 835FE03E19F00A4D005AA754 /* svm.cpp in Sources */, A77410F614697C300098E6AC /* seqnoise.cpp in Sources */, A754149714840CF7005850D1 /* summaryqualcommand.cpp in Sources */, 48705AC619BE32C50075E977 /* mergesfffilecommand.cpp in Sources */, A7A3C8C914D041AD00B1BFBE /* otuassociationcommand.cpp in Sources */, A7A32DAA14DC43B00001D2E5 /* sortseqscommand.cpp in Sources */, A7EEB0F514F29BFE00344B83 /* classifytreecommand.cpp in Sources */, 48C51DF01A76B888004ECDF1 /* fastqread.cpp in Sources */, A7C3DC0B14FE457500FE1924 /* cooccurrencecommand.cpp in Sources */, A7C3DC0F14FE469500FE1924 /* trialSwap2.cpp in Sources */, A77EBD2F1523709100ED407C /* createdatabasecommand.cpp in Sources */, A7876A26152A017C00A0AE86 /* subsample.cpp in Sources */, A7D755DA1535F679009BF21A /* treereader.cpp in Sources */, A724D2B7153C8628000A826F /* makebiomcommand.cpp in Sources */, 219C1DE01552C4BD004209F9 /* newcommandtemplate.cpp in Sources */, 219C1DE41559BCCF004209F9 /* getcoremicrobiomecommand.cpp in Sources */, A7A0671A1562946F0095C8C5 /* listotulabelscommand.cpp in Sources */, A7A0671F1562AC3E0095C8C5 /* makecontigscommand.cpp in Sources */, A70056E6156A93D000924A2D /* getotulabelscommand.cpp in Sources */, A70056EB156AB6E500924A2D /* removeotulabelscommand.cpp in Sources */, A73901081588C40900ED2ED6 /* loadlogfilecommand.cpp in Sources */, A74D59A4159A1E2000043046 /* counttable.cpp in Sources */, A7E0243D15B4520A00A5F046 /* sparsedistancematrix.cpp in Sources */, A741FAD215D1688E0067BCC5 /* sequencecountparser.cpp in Sources */, A7C7DAB915DA758B0059B0CF /* sffmultiplecommand.cpp in Sources */, A7386C251619E52300651424 /* abstractdecisiontree.cpp in Sources */, A7386C29161A110800651424 /* decisiontree.cpp in Sources */, A77E1938161B201E00DB1A2A /* randomforest.cpp in Sources */, 
835FE03D19F00640005AA754 /* classifysvmsharedcommand.cpp in Sources */, A77E193B161B289600DB1A2A /* rftreenode.cpp in Sources */, A721AB6A161C570F009860A1 /* alignnode.cpp in Sources */, A721AB6B161C570F009860A1 /* aligntree.cpp in Sources */, A721AB71161C572A009860A1 /* kmernode.cpp in Sources */, A721AB72161C572A009860A1 /* kmertree.cpp in Sources */, A721AB77161C573B009860A1 /* taxonomynode.cpp in Sources */, 83F25B0C163B031200ABE73D /* forest.cpp in Sources */, 834D9D581656D7C400E7FAB9 /* regularizedrandomforest.cpp in Sources */, A7496D2E167B531B00CC7D7C /* kruskalwalliscommand.cpp in Sources */, A79EEF8616971D4A0006DEC1 /* filtersharedcommand.cpp in Sources */, A74C06E916A9C0A9008390A3 /* primerdesigncommand.cpp in Sources */, A7128B1D16B7002A00723BE4 /* getdistscommand.cpp in Sources */, A7B0231516B8244C006BA09E /* removedistscommand.cpp in Sources */, A799314B16CBD0CD0017E888 /* mergetaxsummarycommand.cpp in Sources */, A7548FAD17142EBC00B1F05A /* getmetacommunitycommand.cpp in Sources */, A7548FB0171440ED00B1F05A /* qFinderDMM.cpp in Sources */, A77B7185173D2240002163C2 /* sparcccommand.cpp in Sources */, A77B7188173D4042002163C2 /* randomnumber.cpp in Sources */, A77B718B173D40E5002163C2 /* calcsparcc.cpp in Sources */, A7E6F69E17427D06006775E2 /* makelookupcommand.cpp in Sources */, A7CFA4311755401800D9ED4D /* renameseqscommand.cpp in Sources */, A741744C175CD9B1007DF49B /* makelefsecommand.cpp in Sources */, A7190B221768E0DF00A9AFA6 /* lefsecommand.cpp in Sources */, A77916E8176F7F7600EEFE18 /* designmap.cpp in Sources */, A7D9378A17B146B5001E90B0 /* wilcox.cpp in Sources */, A7F24FC317EA36600021DC9A /* classifyrfsharedcommand.cpp in Sources */, A747EC71181EA0F900345732 /* sracommand.cpp in Sources */, A7132EB3184E792700AAA402 /* communitytype.cpp in Sources */, A7D395C4184FA3A200A350D7 /* kmeans.cpp in Sources */, A7222D731856277C0055A993 /* sharedjsd.cpp in Sources */, A7B093C018579F0400843CD1 /* pam.cpp in Sources */, A7A09B1018773C0E00FAA081 /* 
shannonrange.cpp in Sources */, ); runOnlyForDeploymentPostprocessing = 0; }; /* End PBXSourcesBuildPhase section */ /* Begin XCBuildConfiguration section */ 1DEB928608733DD80010E9CD /* Debug */ = { isa = XCBuildConfiguration; buildSettings = { ALWAYS_SEARCH_USER_PATHS = YES; ARCHS = "$(ARCHS_STANDARD_64_BIT)"; CLANG_CXX_LANGUAGE_STANDARD = "compiler-default"; CLANG_CXX_LIBRARY = "compiler-default"; CLANG_WARN_CXX0X_EXTENSIONS = YES; "CLANG_WARN_CXX0X_EXTENSIONS[arch=*]" = NO; CLANG_WARN_UNREACHABLE_CODE = YES; COPY_PHASE_STRIP = NO; DEPLOYMENT_LOCATION = YES; DSTROOT = TARGET_BUILD_DIR; "DSTROOT[sdk=*]" = TARGET_BUILD_DIR; "DYLIB_CURRENT_VERSION[sdk=*]" = ""; GCC_DYNAMIC_NO_PIC = NO; GCC_MODEL_TUNING = G5; GCC_OPTIMIZATION_LEVEL = 3; GCC_WARN_ABOUT_MISSING_PROTOTYPES = NO; GCC_WARN_UNINITIALIZED_AUTOS = NO; GCC_WARN_UNUSED_FUNCTION = YES; "INSTALL_PATH[sdk=*]" = TARGET_BUILD_DIR; "PRELINK_LIBS[arch=*]" = ""; PRODUCT_NAME = mothur; SDKROOT = macosx10.9; SKIP_INSTALL = NO; }; name = Debug; }; 1DEB928708733DD80010E9CD /* Release */ = { isa = XCBuildConfiguration; buildSettings = { ALWAYS_SEARCH_USER_PATHS = YES; ARCHS = "$(ARCHS_STANDARD_64_BIT)"; CLANG_CXX_LANGUAGE_STANDARD = "compiler-default"; CLANG_CXX_LIBRARY = "compiler-default"; CLANG_WARN_CXX0X_EXTENSIONS = NO; CLANG_WARN_UNREACHABLE_CODE = YES; DEBUG_INFORMATION_FORMAT = "dwarf-with-dsym"; DEPLOYMENT_LOCATION = YES; DSTROOT = TARGET_BUILD_DIR; GCC_MODEL_TUNING = G5; GCC_OPTIMIZATION_LEVEL = 3; GCC_WARN_ABOUT_MISSING_PROTOTYPES = NO; GCC_WARN_UNINITIALIZED_AUTOS = NO; GCC_WARN_UNUSED_VALUE = YES; ONLY_ACTIVE_ARCH = YES; PRODUCT_NAME = mothur; SDKROOT = macosx10.9; SKIP_INSTALL = NO; "VALID_ARCHS[sdk=*]" = "i386 x86_64"; }; name = Release; }; 1DEB928A08733DD80010E9CD /* Debug */ = { isa = XCBuildConfiguration; buildSettings = { CLANG_WARN_UNREACHABLE_CODE = YES; DEPLOYMENT_LOCATION = NO; GCC_C_LANGUAGE_STANDARD = "compiler-default"; GCC_ENABLE_SSE3_EXTENSIONS = NO; GCC_ENABLE_SSE41_EXTENSIONS = NO; 
GCC_ENABLE_SSE42_EXTENSIONS = NO; GCC_OPTIMIZATION_LEVEL = 3; GCC_PREPROCESSOR_DEFINITIONS = ( "MOTHUR_FILES=\"\\\"/Users/sarahwestcott/desktop/release\\\"\"", "VERSION=\"\\\"1.36.0\\\"\"", "RELEASE_DATE=\"\\\"07/23/2015\\\"\"", ); GCC_VERSION = "/usr/bin/c++"; "GCC_VERSION[arch=*]" = ""; GCC_WARN_ABOUT_MISSING_NEWLINE = YES; GCC_WARN_ABOUT_MISSING_PROTOTYPES = NO; GCC_WARN_ABOUT_RETURN_TYPE = YES; GCC_WARN_UNUSED_FUNCTION = YES; GCC_WARN_UNUSED_VARIABLE = YES; HEADER_SEARCH_PATHS = /usr/local/include/; "HEADER_SEARCH_PATHS[arch=*]" = ""; INSTALL_PATH = TARGET_BUILD_DIR; LIBRARY_SEARCH_PATHS = ""; "LIBRARY_SEARCH_PATHS[arch=*]" = /usr/local/lib/; MACH_O_TYPE = mh_execute; ONLY_ACTIVE_ARCH = YES; OTHER_CPLUSPLUSFLAGS = ( "-DBIT_VERSION", "-DUSE_BOOST", "-DUNIT_TEST", "-DUSE_READLINE", "$(OTHER_CFLAGS)", ); OTHER_LDFLAGS = ( "-lncurses", "-lreadline", "-lboost_iostreams", ); SDKROOT = macosx; SKIP_INSTALL = NO; USER_HEADER_SEARCH_PATHS = /usr/local/include/; }; name = Debug; }; 1DEB928B08733DD80010E9CD /* Release */ = { isa = XCBuildConfiguration; buildSettings = { CLANG_WARN_UNREACHABLE_CODE = YES; DEPLOYMENT_LOCATION = NO; GCC_C_LANGUAGE_STANDARD = "compiler-default"; GCC_GENERATE_DEBUGGING_SYMBOLS = NO; GCC_MODEL_TUNING = ""; GCC_OPTIMIZATION_LEVEL = 3; GCC_PREPROCESSOR_DEFINITIONS = ( "VERSION=\"\\\"1.36.0\\\"\"", "RELEASE_DATE=\"\\\"07/23/2015\\\"\"", ); GCC_VERSION = ""; GCC_WARN_ABOUT_MISSING_NEWLINE = YES; GCC_WARN_ABOUT_MISSING_PROTOTYPES = NO; GCC_WARN_ABOUT_RETURN_TYPE = YES; GCC_WARN_MISSING_PARENTHESES = YES; GCC_WARN_MULTIPLE_DEFINITION_TYPES_FOR_SELECTOR = YES; GCC_WARN_UNUSED_FUNCTION = YES; GCC_WARN_UNUSED_PARAMETER = YES; GCC_WARN_UNUSED_VALUE = YES; GCC_WARN_UNUSED_VARIABLE = YES; "HEADER_SEARCH_PATHS[arch=*]" = ( "$(inherited)", /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include, ); INSTALL_PATH = TARGET_BUILD_DIR; LIBRARY_SEARCH_PATHS = ""; "LIBRARY_SEARCH_PATHS[arch=*]" = /usr/local/lib/; MACH_O_TYPE = 
mh_execute; OTHER_CPLUSPLUSFLAGS = ( "-DUSE_READLINE", "-DUSE_BOOST", "-DUNIT_TEST", "-DBIT_VERSION", "$(OTHER_CFLAGS)", ); OTHER_LDFLAGS = ( "-lncurses", "-lreadline", "-lboost_iostreams", ); SDKROOT = macosx; SKIP_INSTALL = NO; USER_HEADER_SEARCH_PATHS = /usr/local/include/; }; name = Release; }; 481FB51D1AC0A63E0076CFF3 /* Debug */ = { isa = XCBuildConfiguration; buildSettings = { ALWAYS_SEARCH_USER_PATHS = YES; CLANG_CXX_LANGUAGE_STANDARD = "gnu++0x"; CLANG_CXX_LIBRARY = "libc++"; CLANG_ENABLE_MODULES = YES; CLANG_ENABLE_OBJC_ARC = YES; CLANG_WARN_BOOL_CONVERSION = YES; CLANG_WARN_CONSTANT_CONVERSION = YES; CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; CLANG_WARN_EMPTY_BODY = YES; CLANG_WARN_ENUM_CONVERSION = YES; CLANG_WARN_INT_CONVERSION = YES; CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; COPY_PHASE_STRIP = NO; ENABLE_STRICT_OBJC_MSGSEND = YES; GCC_C_LANGUAGE_STANDARD = gnu99; GCC_DYNAMIC_NO_PIC = NO; GCC_OPTIMIZATION_LEVEL = 0; GCC_PREPROCESSOR_DEFINITIONS = ( "DEBUG=1", "$(inherited)", ); GCC_SYMBOLS_PRIVATE_EXTERN = NO; GCC_WARN_64_TO_32_BIT_CONVERSION = YES; GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; GCC_WARN_UNDECLARED_SELECTOR = YES; GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; MACOSX_DEPLOYMENT_TARGET = 10.9; MTL_ENABLE_DEBUG_INFO = YES; PRODUCT_NAME = "$(TARGET_NAME)"; }; name = Debug; }; 481FB51E1AC0A63E0076CFF3 /* Release */ = { isa = XCBuildConfiguration; buildSettings = { ALWAYS_SEARCH_USER_PATHS = YES; CLANG_CXX_LANGUAGE_STANDARD = "gnu++0x"; CLANG_CXX_LIBRARY = "libc++"; CLANG_ENABLE_MODULES = YES; CLANG_ENABLE_OBJC_ARC = YES; CLANG_WARN_BOOL_CONVERSION = YES; CLANG_WARN_CONSTANT_CONVERSION = YES; CLANG_WARN_DIRECT_OBJC_ISA_USAGE = YES_ERROR; CLANG_WARN_EMPTY_BODY = YES; CLANG_WARN_ENUM_CONVERSION = YES; CLANG_WARN_INT_CONVERSION = YES; CLANG_WARN_OBJC_ROOT_CLASS = YES_ERROR; CLANG_WARN__DUPLICATE_METHOD_MATCH = YES; COPY_PHASE_STRIP = NO; DEBUG_INFORMATION_FORMAT = "dwarf-with-dsym"; ENABLE_NS_ASSERTIONS = 
NO; ENABLE_STRICT_OBJC_MSGSEND = YES; GCC_C_LANGUAGE_STANDARD = gnu99; GCC_WARN_64_TO_32_BIT_CONVERSION = YES; GCC_WARN_ABOUT_RETURN_TYPE = YES_ERROR; GCC_WARN_UNDECLARED_SELECTOR = YES; GCC_WARN_UNINITIALIZED_AUTOS = YES_AGGRESSIVE; MACOSX_DEPLOYMENT_TARGET = 10.9; MTL_ENABLE_DEBUG_INFO = NO; PRODUCT_NAME = "$(TARGET_NAME)"; }; name = Release; }; /* End XCBuildConfiguration section */ /* Begin XCConfigurationList section */ 1DEB928508733DD80010E9CD /* Build configuration list for PBXNativeTarget "Mothur" */ = { isa = XCConfigurationList; buildConfigurations = ( 1DEB928608733DD80010E9CD /* Debug */, 1DEB928708733DD80010E9CD /* Release */, ); defaultConfigurationIsVisible = 0; defaultConfigurationName = Debug; }; 1DEB928908733DD80010E9CD /* Build configuration list for PBXProject "Mothur" */ = { isa = XCConfigurationList; buildConfigurations = ( 1DEB928A08733DD80010E9CD /* Debug */, 1DEB928B08733DD80010E9CD /* Release */, ); defaultConfigurationIsVisible = 0; defaultConfigurationName = Debug; }; 481FB51F1AC0A63E0076CFF3 /* Build configuration list for PBXNativeTarget "TestMothur" */ = { isa = XCConfigurationList; buildConfigurations = ( 481FB51D1AC0A63E0076CFF3 /* Debug */, 481FB51E1AC0A63E0076CFF3 /* Release */, ); defaultConfigurationIsVisible = 0; defaultConfigurationName = Debug; }; /* End XCConfigurationList section */ }; rootObject = 08FB7793FE84155DC02AAC07 /* Project object */; }
mothur-1.36.1/README.md000066400000000000000000000016211255543666200144150ustar00rootroot00000000000000
# README

Welcome to the mothur project, initiated by Dr. Patrick Schloss and his software development team in the Department of Microbiology & Immunology at The University of Michigan. This project seeks to develop a single piece of open-source, expandable software to fill the bioinformatics needs of the microbial ecology community. mothur is available under the GPL license.

Useful links...

* [The current release](https://github.com/mothur/mothur/releases/latest)
* [Wiki documentation](http://www.mothur.org/wiki)
* [User forum](http://www.mothur.org/forum)
* [Blog](http://www.mothur.org/forum)

SOPs...

* [MiSeq](http://www.mothur.org/wiki/MiSeq_SOP)
* [454](http://www.mothur.org/wiki/454_SOP)

References...

* [SILVA](http://www.mothur.org/wiki/Silva_reference_files)
* [greengenes](http://www.mothur.org/wiki/Greengenes-formatted_databases)
* [RDP](http://www.mothur.org/wiki/RDP_reference_files)

mothur-1.36.1/TestMothur/000077500000000000000000000000001255543666200152545ustar00rootroot00000000000000mothur-1.36.1/TestMothur/catch.hpp000077500000000000000000011610041255543666200170550ustar00rootroot00000000000000
/*
 *  CATCH v1.0 build 53 (master branch)
 *  Generated: 2014-08-20 08:08:19.533804
 *  ----------------------------------------------------------
 *  This file has been merged from multiple headers. Please don't edit it directly
 *  Copyright (c) 2012 Two Blue Cubes Ltd. All rights reserved.
 *
 *  Distributed under the Boost Software License, Version 1.0.
 *  (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
 */
#ifndef TWOBLUECUBES_SINGLE_INCLUDE_CATCH_HPP_INCLUDED
#define TWOBLUECUBES_SINGLE_INCLUDE_CATCH_HPP_INCLUDED
#define UNIT_TEST
#define TWOBLUECUBES_CATCH_HPP_INCLUDED

// #included from: internal/catch_suppress_warnings.h

#define TWOBLUECUBES_CATCH_SUPPRESS_WARNINGS_H_INCLUDED

#ifdef __clang__
#pragma clang diagnostic ignored "-Wglobal-constructors"
#pragma clang diagnostic ignored "-Wvariadic-macros"
#pragma clang diagnostic ignored "-Wc99-extensions"
#pragma clang diagnostic ignored "-Wunused-variable"
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpadded"
#pragma clang diagnostic ignored "-Wc++98-compat"
#pragma clang diagnostic ignored "-Wc++98-compat-pedantic"
#elif defined __GNUC__
#pragma GCC diagnostic ignored "-Wvariadic-macros"
#pragma GCC diagnostic ignored "-Wunused-variable"
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpadded"
#endif

#ifdef CATCH_CONFIG_MAIN
#  define CATCH_CONFIG_RUNNER
#endif

#ifdef CATCH_CONFIG_RUNNER
#  ifndef CLARA_CONFIG_MAIN
#    define CLARA_CONFIG_MAIN_NOT_DEFINED
#    define CLARA_CONFIG_MAIN
#  endif
#endif

// #included from: internal/catch_notimplemented_exception.h
#define TWOBLUECUBES_CATCH_NOTIMPLEMENTED_EXCEPTION_H_INCLUDED

// #included from: catch_common.h
#define TWOBLUECUBES_CATCH_COMMON_H_INCLUDED

#define INTERNAL_CATCH_UNIQUE_NAME_LINE2( name, line ) name##line
#define INTERNAL_CATCH_UNIQUE_NAME_LINE( name, line ) INTERNAL_CATCH_UNIQUE_NAME_LINE2( name, line )
#define INTERNAL_CATCH_UNIQUE_NAME( name ) INTERNAL_CATCH_UNIQUE_NAME_LINE( name, __LINE__ )

#define INTERNAL_CATCH_STRINGIFY2( expr ) #expr
#define INTERNAL_CATCH_STRINGIFY( expr ) INTERNAL_CATCH_STRINGIFY2( expr )

#include <sstream>
#include <stdexcept>
#include <algorithm>

// #included from: catch_compiler_capabilities.h
#define TWOBLUECUBES_CATCH_COMPILER_CAPABILITIES_HPP_INCLUDED

// Much of the following code is based on Boost (1.53)

#ifdef __clang__

#  if \
__has_feature(cxx_nullptr)
#    define CATCH_CONFIG_CPP11_NULLPTR
#  endif

#  if __has_feature(cxx_noexcept)
#    define CATCH_CONFIG_CPP11_NOEXCEPT
#  endif

#endif // __clang__

////////////////////////////////////////////////////////////////////////////////
// Borland
#ifdef __BORLANDC__

#if (__BORLANDC__ > 0x582 )
//#define CATCH_CONFIG_SFINAE // Not confirmed
#endif

#endif // __BORLANDC__

////////////////////////////////////////////////////////////////////////////////
// EDG
#ifdef __EDG_VERSION__

#if (__EDG_VERSION__ > 238 )
//#define CATCH_CONFIG_SFINAE // Not confirmed
#endif

#endif // __EDG_VERSION__

////////////////////////////////////////////////////////////////////////////////
// Digital Mars
#ifdef __DMC__

#if (__DMC__ > 0x840 )
//#define CATCH_CONFIG_SFINAE // Not confirmed
#endif

#endif // __DMC__

////////////////////////////////////////////////////////////////////////////////
// GCC
#ifdef __GNUC__

#if __GNUC__ < 3

#if (__GNUC_MINOR__ >= 96 )
//#define CATCH_CONFIG_SFINAE
#endif

#elif __GNUC__ >= 3

// #define CATCH_CONFIG_SFINAE // Taking this out completely for now

#endif // __GNUC__ < 3

#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6 && defined(__GXX_EXPERIMENTAL_CXX0X__) )

#define CATCH_CONFIG_CPP11_NULLPTR
#endif

#endif // __GNUC__

////////////////////////////////////////////////////////////////////////////////
// Visual C++
#ifdef _MSC_VER

#if (_MSC_VER >= 1310 ) // (VC++ 7.0+)
//#define CATCH_CONFIG_SFINAE // Not confirmed
#endif

#endif // _MSC_VER

// Use variadic macros if the compiler supports them
#if ( defined _MSC_VER && _MSC_VER > 1400 && !defined __EDGE__) || \
    ( defined __WAVE__ && __WAVE_HAS_VARIADICS ) || \
    ( defined __GNUC__ && __GNUC__ >= 3 ) || \
    ( !defined __cplusplus && __STDC_VERSION__ >= 199901L || __cplusplus >= 201103L )

#ifndef CATCH_CONFIG_NO_VARIADIC_MACROS
#define CATCH_CONFIG_VARIADIC_MACROS
#endif

#endif

////////////////////////////////////////////////////////////////////////////////
// C++ language feature support

//
// detect language version:
#if (__cplusplus == 201103L)
#  define CATCH_CPP11
#  define CATCH_CPP11_OR_GREATER
#elif (__cplusplus >= 201103L)
#  define CATCH_CPP11_OR_GREATER
#endif

// noexcept support:
#if defined(CATCH_CONFIG_CPP11_NOEXCEPT) && !defined(CATCH_NOEXCEPT)
#  define CATCH_NOEXCEPT noexcept
#  define CATCH_NOEXCEPT_IS(x) noexcept(x)
#else
#  define CATCH_NOEXCEPT throw()
#  define CATCH_NOEXCEPT_IS(x)
#endif

namespace Catch {

    class NonCopyable {
        NonCopyable( NonCopyable const& );
        void operator = ( NonCopyable const& );
    protected:
        NonCopyable() {}
        virtual ~NonCopyable();
    };

    class SafeBool {
    public:
        typedef void (SafeBool::*type)() const;

        static type makeSafe( bool value ) {
            return value ? &SafeBool::trueValue : 0;
        }
    private:
        void trueValue() const {}
    };

    template<typename ContainerT>
    inline void deleteAll( ContainerT& container ) {
        typename ContainerT::const_iterator it = container.begin();
        typename ContainerT::const_iterator itEnd = container.end();
        for(; it != itEnd; ++it )
            delete *it;
    }
    template<typename AssociativeContainerT>
    inline void deleteAllValues( AssociativeContainerT& container ) {
        typename AssociativeContainerT::const_iterator it = container.begin();
        typename AssociativeContainerT::const_iterator itEnd = container.end();
        for(; it != itEnd; ++it )
            delete it->second;
    }

    bool startsWith( std::string const& s, std::string const& prefix );
    bool endsWith( std::string const& s, std::string const& suffix );
    bool contains( std::string const& s, std::string const& infix );
    void toLowerInPlace( std::string& s );
    std::string toLower( std::string const& s );
    std::string trim( std::string const& str );

    struct pluralise {
        pluralise( std::size_t count, std::string const& label );

        friend std::ostream& operator << ( std::ostream& os, pluralise const& pluraliser );

        std::size_t m_count;
        std::string m_label;
    };

    struct SourceLineInfo {

        SourceLineInfo();
        SourceLineInfo( char const* _file, std::size_t _line );
        SourceLineInfo( SourceLineInfo const& other );
#  ifdef CATCH_CPP11_OR_GREATER
        SourceLineInfo( SourceLineInfo && ) =
default;
        SourceLineInfo& operator = ( SourceLineInfo const& ) = default;
        SourceLineInfo& operator = ( SourceLineInfo && ) = default;
#  endif
        bool empty() const;
        bool operator == ( SourceLineInfo const& other ) const;

        std::string file;
        std::size_t line;
    };

    std::ostream& operator << ( std::ostream& os, SourceLineInfo const& info );

    // This is just here to avoid compiler warnings with macro constants and boolean literals
    inline bool isTrue( bool value ){ return value; }
    inline bool alwaysTrue() { return true; }
    inline bool alwaysFalse() { return false; }

    void throwLogicError( std::string const& message, SourceLineInfo const& locationInfo );

    // Use this in variadic streaming macros to allow
    //    >> +StreamEndStop
    // as well as
    //    >> stuff +StreamEndStop
    struct StreamEndStop {
        std::string operator+() {
            return std::string();
        }
    };
    template<typename T>
    T const& operator + ( T const& value, StreamEndStop ) {
        return value;
    }
}

#define CATCH_INTERNAL_LINEINFO ::Catch::SourceLineInfo( __FILE__, static_cast<std::size_t>( __LINE__ ) )
#define CATCH_INTERNAL_ERROR( msg ) ::Catch::throwLogicError( msg, CATCH_INTERNAL_LINEINFO );

#include <ostream>

namespace Catch {

    class NotImplementedException : public std::exception
    {
    public:
        NotImplementedException( SourceLineInfo const& lineInfo );
        NotImplementedException( NotImplementedException const& ) {}

        virtual ~NotImplementedException() CATCH_NOEXCEPT {}

        virtual const char* what() const CATCH_NOEXCEPT;

    private:
        std::string m_what;
        SourceLineInfo m_lineInfo;
    };

} // end namespace Catch

///////////////////////////////////////////////////////////////////////////////
#define CATCH_NOT_IMPLEMENTED throw Catch::NotImplementedException( CATCH_INTERNAL_LINEINFO )

// #included from: internal/catch_context.h
#define TWOBLUECUBES_CATCH_CONTEXT_H_INCLUDED

// #included from: catch_interfaces_generators.h
#define TWOBLUECUBES_CATCH_INTERFACES_GENERATORS_H_INCLUDED

#include <string>

namespace Catch {

    struct IGeneratorInfo {
        virtual ~IGeneratorInfo();
        virtual bool moveNext() = 0;
        virtual
std::size_t getCurrentIndex() const = 0;
    };

    struct IGeneratorsForTest {
        virtual ~IGeneratorsForTest();

        virtual IGeneratorInfo& getGeneratorInfo( std::string const& fileInfo, std::size_t size ) = 0;
        virtual bool moveNext() = 0;
    };

    IGeneratorsForTest* createGeneratorsForTest();

} // end namespace Catch

// #included from: catch_ptr.hpp
#define TWOBLUECUBES_CATCH_PTR_HPP_INCLUDED

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpadded"
#endif

namespace Catch {

    // An intrusive reference counting smart pointer.
    // T must implement addRef() and release() methods
    // typically implementing the IShared interface
    template<typename T>
    class Ptr {
    public:
        Ptr() : m_p( NULL ){}
        Ptr( T* p ) : m_p( p ){
            if( m_p )
                m_p->addRef();
        }
        Ptr( Ptr const& other ) : m_p( other.m_p ){
            if( m_p )
                m_p->addRef();
        }
        ~Ptr(){
            if( m_p )
                m_p->release();
        }
        void reset() {
            if( m_p )
                m_p->release();
            m_p = NULL;
        }
        Ptr& operator = ( T* p ){
            Ptr temp( p );
            swap( temp );
            return *this;
        }
        Ptr& operator = ( Ptr const& other ){
            Ptr temp( other );
            swap( temp );
            return *this;
        }
        void swap( Ptr& other ) { std::swap( m_p, other.m_p ); }
        T* get() { return m_p; }
        const T* get() const{ return m_p; }
        T& operator*() const { return *m_p; }
        T* operator->() const { return m_p; }
        bool operator !() const { return m_p == NULL; }
        operator SafeBool::type() const { return SafeBool::makeSafe( m_p != NULL ); }

    private:
        T* m_p;
    };

    struct IShared : NonCopyable {
        virtual ~IShared();
        virtual void addRef() const = 0;
        virtual void release() const = 0;
    };

    template<typename T = IShared>
    struct SharedImpl : T {

        SharedImpl() : m_rc( 0 ){}

        virtual void addRef() const {
            ++m_rc;
        }
        virtual void release() const {
            if( --m_rc == 0 )
                delete this;
        }

        mutable unsigned int m_rc;
    };

} // end namespace Catch

#ifdef __clang__
#pragma clang diagnostic pop
#endif

#include <memory>
#include <vector>
#include <stdlib.h>

namespace Catch {

    class TestCase;
    class Stream;
    struct IResultCapture;
    struct IRunner;
    struct IGeneratorsForTest;
    struct IConfig;

    struct IContext
    {
        virtual ~IContext();

        virtual IResultCapture* getResultCapture() = 0;
        virtual IRunner* getRunner() = 0;
        virtual size_t getGeneratorIndex( std::string const& fileInfo, size_t totalSize ) = 0;
        virtual bool advanceGeneratorsForCurrentTest() = 0;
        virtual Ptr<IConfig const> getConfig() const = 0;
    };

    struct IMutableContext : IContext
    {
        virtual ~IMutableContext();
        virtual void setResultCapture( IResultCapture* resultCapture ) = 0;
        virtual void setRunner( IRunner* runner ) = 0;
        virtual void setConfig( Ptr<IConfig const> const& config ) = 0;
    };

    IContext& getCurrentContext();
    IMutableContext& getCurrentMutableContext();
    void cleanUpContext();
    Stream createStream( std::string const& streamName );

}

// #included from: internal/catch_test_registry.hpp
#define TWOBLUECUBES_CATCH_TEST_REGISTRY_HPP_INCLUDED

// #included from: catch_interfaces_testcase.h
#define TWOBLUECUBES_CATCH_INTERFACES_TESTCASE_H_INCLUDED

#include <vector>

namespace Catch {

    class TestSpec;

    struct ITestCase : IShared {
        virtual void invoke () const = 0;
    protected:
        virtual ~ITestCase();
    };

    class TestCase;
    struct IConfig;

    struct ITestCaseRegistry {
        virtual ~ITestCaseRegistry();
        virtual std::vector<TestCase> const& getAllTests() const = 0;
        virtual void getFilteredTests( TestSpec const& testSpec, IConfig const& config, std::vector<TestCase>& matchingTestCases ) const = 0;

    };
}

namespace Catch {

template<typename C>
class MethodTestCase : public SharedImpl<ITestCase> {

public:
    MethodTestCase( void (C::*method)() ) : m_method( method ) {}

    virtual void invoke() const {
        C obj;
        (obj.*m_method)();
    }

private:
    virtual ~MethodTestCase() {}

    void (C::*m_method)();
};

typedef void(*TestFunction)();

struct NameAndDesc {
    NameAndDesc( const char* _name = "", const char* _description= "" )
    : name( _name ), description( _description )
    {}

    const char* name;
    const char* description;
};

struct AutoReg {

    AutoReg( TestFunction function, SourceLineInfo const& lineInfo, NameAndDesc const& nameAndDesc );

    template<typename C>
    AutoReg( void (C::*method)(), char const* className, NameAndDesc const& nameAndDesc, SourceLineInfo const& lineInfo ) {
        registerTestCase( new MethodTestCase<C>( method ), className, nameAndDesc, lineInfo );
    }

    void registerTestCase( ITestCase* testCase, char const* className, NameAndDesc const& nameAndDesc, SourceLineInfo const& lineInfo );

    ~AutoReg();

private:
    AutoReg( AutoReg const& );
    void operator= ( AutoReg const& );
};

} // end namespace Catch

#ifdef CATCH_CONFIG_VARIADIC_MACROS
    ///////////////////////////////////////////////////////////////////////////////
    #define INTERNAL_CATCH_TESTCASE( ... ) \
        static void INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )(); \
        namespace{ Catch::AutoReg INTERNAL_CATCH_UNIQUE_NAME( autoRegistrar )( &INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ ), CATCH_INTERNAL_LINEINFO, Catch::NameAndDesc( __VA_ARGS__ ) ); }\
        static void INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )()

    ///////////////////////////////////////////////////////////////////////////////
    #define INTERNAL_CATCH_METHOD_AS_TEST_CASE( QualifiedMethod, ... ) \
        namespace{ Catch::AutoReg INTERNAL_CATCH_UNIQUE_NAME( autoRegistrar )( &QualifiedMethod, "&" #QualifiedMethod, Catch::NameAndDesc( __VA_ARGS__ ), CATCH_INTERNAL_LINEINFO ); }

    ///////////////////////////////////////////////////////////////////////////////
    #define INTERNAL_CATCH_TEST_CASE_METHOD( ClassName, ... \
)\
        namespace{ \
            struct INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ ) : ClassName{ \
                void test(); \
            }; \
            Catch::AutoReg INTERNAL_CATCH_UNIQUE_NAME( autoRegistrar ) ( &INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )::test, #ClassName, Catch::NameAndDesc( __VA_ARGS__ ), CATCH_INTERNAL_LINEINFO ); \
        } \
        void INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )::test()

#else
    ///////////////////////////////////////////////////////////////////////////////
    #define INTERNAL_CATCH_TESTCASE( Name, Desc ) \
        static void INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )(); \
        namespace{ Catch::AutoReg INTERNAL_CATCH_UNIQUE_NAME( autoRegistrar )( &INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ ), CATCH_INTERNAL_LINEINFO, Catch::NameAndDesc( Name, Desc ) ); }\
        static void INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )()

    ///////////////////////////////////////////////////////////////////////////////
    #define INTERNAL_CATCH_METHOD_AS_TEST_CASE( QualifiedMethod, Name, Desc ) \
        namespace{ Catch::AutoReg INTERNAL_CATCH_UNIQUE_NAME( autoRegistrar )( &QualifiedMethod, "&" #QualifiedMethod, Catch::NameAndDesc( Name, Desc ), CATCH_INTERNAL_LINEINFO ); }

    ///////////////////////////////////////////////////////////////////////////////
    #define INTERNAL_CATCH_TEST_CASE_METHOD( ClassName, TestName, Desc )\
        namespace{ \
            struct INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ ) : ClassName{ \
                void test(); \
            }; \
            Catch::AutoReg INTERNAL_CATCH_UNIQUE_NAME( autoRegistrar ) ( &INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )::test, #ClassName, Catch::NameAndDesc( TestName, Desc ), CATCH_INTERNAL_LINEINFO ); \
        } \
        void INTERNAL_CATCH_UNIQUE_NAME( ____C_A_T_C_H____T_E_S_T____ )::test()

#endif

// #included from: internal/catch_capture.hpp
#define TWOBLUECUBES_CATCH_CAPTURE_HPP_INCLUDED

// #included from: catch_result_builder.h
#define TWOBLUECUBES_CATCH_RESULT_BUILDER_H_INCLUDED

// #included from: catch_result_type.h
#define \
TWOBLUECUBES_CATCH_RESULT_TYPE_H_INCLUDED

namespace Catch {

    // ResultWas::OfType enum
    struct ResultWas { enum OfType {
        Unknown = -1,
        Ok = 0,
        Info = 1,
        Warning = 2,

        FailureBit = 0x10,

        ExpressionFailed = FailureBit | 1,
        ExplicitFailure = FailureBit | 2,

        Exception = 0x100 | FailureBit,

        ThrewException = Exception | 1,
        DidntThrowException = Exception | 2

    }; };

    inline bool isOk( ResultWas::OfType resultType ) {
        return ( resultType & ResultWas::FailureBit ) == 0;
    }
    inline bool isJustInfo( int flags ) {
        return flags == ResultWas::Info;
    }

    // ResultDisposition::Flags enum
    struct ResultDisposition { enum Flags {
        Normal = 0x00,

        ContinueOnFailure = 0x01,   // Failures fail test, but execution continues
        FalseTest = 0x02,           // Prefix expression with !
        SuppressFail = 0x04         // Failures are reported but do not fail the test
    }; };

    inline ResultDisposition::Flags operator | ( ResultDisposition::Flags lhs, ResultDisposition::Flags rhs ) {
        return static_cast<ResultDisposition::Flags>( static_cast<int>( lhs ) | static_cast<int>( rhs ) );
    }

    inline bool shouldContinueOnFailure( int flags ) { return ( flags & ResultDisposition::ContinueOnFailure ) != 0; }
    inline bool isFalseTest( int flags ) { return ( flags & ResultDisposition::FalseTest ) != 0; }
    inline bool shouldSuppressFailure( int flags ) { return ( flags & ResultDisposition::SuppressFail ) != 0; }

} // end namespace Catch

// #included from: catch_assertionresult.h
#define TWOBLUECUBES_CATCH_ASSERTIONRESULT_H_INCLUDED

#include <string>

namespace Catch {

    struct AssertionInfo
    {
        AssertionInfo() {}
        AssertionInfo( std::string const& _macroName,
                       SourceLineInfo const& _lineInfo,
                       std::string const& _capturedExpression,
                       ResultDisposition::Flags _resultDisposition );

        std::string macroName;
        SourceLineInfo lineInfo;
        std::string capturedExpression;
        ResultDisposition::Flags resultDisposition;
    };

    struct AssertionResultData
    {
        AssertionResultData() : resultType( ResultWas::Unknown ) {}

        std::string reconstructedExpression;
        std::string message;
        ResultWas::OfType resultType;
    };

    class AssertionResult {
    public:
        AssertionResult();
        AssertionResult( AssertionInfo const& info, AssertionResultData const& data );
        ~AssertionResult();
#  ifdef CATCH_CPP11_OR_GREATER
        AssertionResult( AssertionResult const& ) = default;
        AssertionResult( AssertionResult && ) = default;
        AssertionResult& operator = ( AssertionResult const& ) = default;
        AssertionResult& operator = ( AssertionResult && ) = default;
#  endif

        bool isOk() const;
        bool succeeded() const;
        ResultWas::OfType getResultType() const;
        bool hasExpression() const;
        bool hasMessage() const;
        std::string getExpression() const;
        std::string getExpressionInMacro() const;
        bool hasExpandedExpression() const;
        std::string getExpandedExpression() const;
        std::string getMessage() const;
        SourceLineInfo getSourceInfo() const;
        std::string getTestMacroName() const;

    protected:
        AssertionInfo m_info;
        AssertionResultData m_resultData;
    };

} // end namespace Catch

namespace Catch {

    struct TestFailureException{};

    template<typename T> class ExpressionLhs;

    struct STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison;

    struct CopyableStream {
        CopyableStream() {}
        CopyableStream( CopyableStream const& other ) {
            oss << other.oss.str();
        }
        CopyableStream& operator=( CopyableStream const& other ) {
            oss.str("");
            oss << other.oss.str();
            return *this;
        }
        std::ostringstream oss;
    };

    class ResultBuilder {
    public:
        ResultBuilder( char const* macroName,
                       SourceLineInfo const& lineInfo,
                       char const* capturedExpression,
                       ResultDisposition::Flags resultDisposition );

        template<typename T>
        ExpressionLhs<T const&> operator->* ( T const& operand );
        ExpressionLhs<bool> operator->* ( bool value );

        template<typename T>
        ResultBuilder& operator << ( T const& value ) {
            m_stream.oss << value;
            return *this;
        }

        template<typename RhsT> STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator && ( RhsT const& );
        template<typename RhsT> STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator || ( RhsT const& );

        ResultBuilder& setResultType( ResultWas::OfType result );
        ResultBuilder& setResultType( bool
result );
        ResultBuilder& setLhs( std::string const& lhs );
        ResultBuilder& setRhs( std::string const& rhs );
        ResultBuilder& setOp( std::string const& op );

        void endExpression();

        std::string reconstructExpression() const;
        AssertionResult build() const;

        void useActiveException( ResultDisposition::Flags resultDisposition = ResultDisposition::Normal );
        void captureResult( ResultWas::OfType resultType );
        void captureExpression();
        void react();
        bool shouldDebugBreak() const;
        bool allowThrows() const;

    private:
        AssertionInfo m_assertionInfo;
        AssertionResultData m_data;
        struct ExprComponents {
            ExprComponents() : testFalse( false ) {}
            bool testFalse;
            std::string lhs, rhs, op;
        } m_exprComponents;
        CopyableStream m_stream;

        bool m_shouldDebugBreak;
        bool m_shouldThrow;
    };

} // namespace Catch

// Include after due to circular dependency:
// #included from: catch_expression_lhs.hpp
#define TWOBLUECUBES_CATCH_EXPRESSION_LHS_HPP_INCLUDED

// #included from: catch_evaluate.hpp
#define TWOBLUECUBES_CATCH_EVALUATE_HPP_INCLUDED

#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable:4389) // '==' : signed/unsigned mismatch
#endif

#include <cstddef>

namespace Catch {
namespace Internal {

    enum Operator {
        IsEqualTo,
        IsNotEqualTo,
        IsLessThan,
        IsGreaterThan,
        IsLessThanOrEqualTo,
        IsGreaterThanOrEqualTo
    };

    template<Operator Op> struct OperatorTraits             { static const char* getName(){ return "*error*"; } };
    template<> struct OperatorTraits<IsEqualTo>             { static const char* getName(){ return "=="; } };
    template<> struct OperatorTraits<IsNotEqualTo>          { static const char* getName(){ return "!="; } };
    template<> struct OperatorTraits<IsLessThan>            { static const char* getName(){ return "<"; } };
    template<> struct OperatorTraits<IsGreaterThan>         { static const char* getName(){ return ">"; } };
    template<> struct OperatorTraits<IsLessThanOrEqualTo>   { static const char* getName(){ return "<="; } };
    template<> struct OperatorTraits<IsGreaterThanOrEqualTo>{ static const char* getName(){ return ">="; } };

    template<typename T>
    inline T& opCast(T const& t) { return const_cast<T&>(t); }

// nullptr_t support based on pull request #154 from
Konstantin Baumann
#ifdef CATCH_CONFIG_CPP11_NULLPTR
    inline std::nullptr_t opCast(std::nullptr_t) { return nullptr; }
#endif // CATCH_CONFIG_CPP11_NULLPTR

    // So the compare overloads can be operator agnostic we convey the operator as a template
    // enum, which is used to specialise an Evaluator for doing the comparison.
    template<typename T1, typename T2, Operator Op>
    class Evaluator{};

    template<typename T1, typename T2>
    struct Evaluator<T1, T2, IsEqualTo> {
        static bool evaluate( T1 const& lhs, T2 const& rhs) {
            return opCast( lhs ) == opCast( rhs );
        }
    };
    template<typename T1, typename T2>
    struct Evaluator<T1, T2, IsNotEqualTo> {
        static bool evaluate( T1 const& lhs, T2 const& rhs ) {
            return opCast( lhs ) != opCast( rhs );
        }
    };
    template<typename T1, typename T2>
    struct Evaluator<T1, T2, IsLessThan> {
        static bool evaluate( T1 const& lhs, T2 const& rhs ) {
            return opCast( lhs ) < opCast( rhs );
        }
    };
    template<typename T1, typename T2>
    struct Evaluator<T1, T2, IsGreaterThan> {
        static bool evaluate( T1 const& lhs, T2 const& rhs ) {
            return opCast( lhs ) > opCast( rhs );
        }
    };
    template<typename T1, typename T2>
    struct Evaluator<T1, T2, IsGreaterThanOrEqualTo> {
        static bool evaluate( T1 const& lhs, T2 const& rhs ) {
            return opCast( lhs ) >= opCast( rhs );
        }
    };
    template<typename T1, typename T2>
    struct Evaluator<T1, T2, IsLessThanOrEqualTo> {
        static bool evaluate( T1 const& lhs, T2 const& rhs ) {
            return opCast( lhs ) <= opCast( rhs );
        }
    };

    template<Operator Op, typename T1, typename T2>
    bool applyEvaluator( T1 const& lhs, T2 const& rhs ) {
        return Evaluator<T1, T2, Op>::evaluate( lhs, rhs );
    }

    // This level of indirection allows us to specialise for integer types
    // to avoid signed/ unsigned warnings

    // "base" overload
    template<Operator Op, typename T1, typename T2>
    bool compare( T1 const& lhs, T2 const& rhs ) {
        return Evaluator<T1, T2, Op>::evaluate( lhs, rhs );
    }

    // unsigned X to int
    template<Operator Op> bool compare( unsigned int lhs, int rhs ) {
        return applyEvaluator<Op>( lhs, static_cast<unsigned int>( rhs ) );
    }
    template<Operator Op> bool compare( unsigned long lhs, int rhs ) {
        return applyEvaluator<Op>( lhs, static_cast<unsigned int>( rhs ) );
    }
    template<Operator Op> bool compare( unsigned char lhs, int rhs ) {
        return applyEvaluator<Op>( lhs, static_cast<unsigned int>( rhs ) );
    }

    // unsigned X to long
    template<Operator Op> bool compare( unsigned int lhs, long rhs ) {
        return applyEvaluator<Op>( lhs, static_cast<unsigned long>( rhs ) );
    }
    template<Operator Op> bool compare( unsigned long lhs, long rhs ) {
        return applyEvaluator<Op>( lhs, static_cast<unsigned long>( rhs ) );
    }
    template<Operator Op> bool compare( unsigned char lhs, long rhs ) {
        return applyEvaluator<Op>( lhs, static_cast<unsigned long>( rhs ) );
    }

    // int to unsigned X
    template<Operator Op> bool compare( int lhs, unsigned int rhs ) {
        return applyEvaluator<Op>( static_cast<unsigned int>( lhs ), rhs );
    }
    template<Operator Op> bool compare( int lhs, unsigned long rhs ) {
        return applyEvaluator<Op>( static_cast<unsigned int>( lhs ), rhs );
    }
    template<Operator Op> bool compare( int lhs, unsigned char rhs ) {
        return applyEvaluator<Op>( static_cast<unsigned int>( lhs ), rhs );
    }

    // long to unsigned X
    template<Operator Op> bool compare( long lhs, unsigned int rhs ) {
        return applyEvaluator<Op>( static_cast<unsigned long>( lhs ), rhs );
    }
    template<Operator Op> bool compare( long lhs, unsigned long rhs ) {
        return applyEvaluator<Op>( static_cast<unsigned long>( lhs ), rhs );
    }
    template<Operator Op> bool compare( long lhs, unsigned char rhs ) {
        return applyEvaluator<Op>( static_cast<unsigned long>( lhs ), rhs );
    }

    // pointer to long (when comparing against NULL)
    template<Operator Op, typename T> bool compare( long lhs, T* rhs ) {
        return Evaluator<T*, T*, Op>::evaluate( reinterpret_cast<T*>( lhs ), rhs );
    }
    template<Operator Op, typename T> bool compare( T* lhs, long rhs ) {
        return Evaluator<T*, T*, Op>::evaluate( lhs, reinterpret_cast<T*>( rhs ) );
    }

    // pointer to int (when comparing against NULL)
    template<Operator Op, typename T> bool compare( int lhs, T* rhs ) {
        return Evaluator<T*, T*, Op>::evaluate( reinterpret_cast<T*>( lhs ), rhs );
    }
    template<Operator Op, typename T> bool compare( T* lhs, int rhs ) {
        return Evaluator<T*, T*, Op>::evaluate( lhs, reinterpret_cast<T*>( rhs ) );
    }

#ifdef CATCH_CONFIG_CPP11_NULLPTR
    // pointer to nullptr_t (when comparing against nullptr)
    template<Operator Op, typename T> bool compare( std::nullptr_t, T* rhs ) {
        return Evaluator<T*, T*, Op>::evaluate( NULL, rhs );
    }
    template<Operator Op, typename T> bool compare( T* lhs, std::nullptr_t ) {
        return Evaluator<T*, T*, Op>::evaluate( lhs, NULL );
    }
#endif // CATCH_CONFIG_CPP11_NULLPTR

} // end of namespace Internal
} // end of namespace Catch

#ifdef _MSC_VER
#pragma warning(pop)
#endif

// #included from: catch_tostring.h
#define TWOBLUECUBES_CATCH_TOSTRING_H_INCLUDED

// #included from: catch_sfinae.hpp
#define TWOBLUECUBES_CATCH_SFINAE_HPP_INCLUDED

// Try to detect if the current compiler supports SFINAE

namespace Catch {

    struct
TrueType { static const bool value = true; typedef void Enable; char sizer[1]; }; struct FalseType { static const bool value = false; typedef void Disable; char sizer[2]; }; #ifdef CATCH_CONFIG_SFINAE template struct NotABooleanExpression; template struct If : NotABooleanExpression {}; template<> struct If : TrueType {}; template<> struct If : FalseType {}; template struct SizedIf; template<> struct SizedIf : TrueType {}; template<> struct SizedIf : FalseType {}; #endif // CATCH_CONFIG_SFINAE } // end namespace Catch #include #include #include #include #include #ifdef __OBJC__ // #included from: catch_objc_arc.hpp #define TWOBLUECUBES_CATCH_OBJC_ARC_HPP_INCLUDED #import #ifdef __has_feature #define CATCH_ARC_ENABLED __has_feature(objc_arc) #else #define CATCH_ARC_ENABLED 0 #endif void arcSafeRelease( NSObject* obj ); id performOptionalSelector( id obj, SEL sel ); #if !CATCH_ARC_ENABLED inline void arcSafeRelease( NSObject* obj ) { [obj release]; } inline id performOptionalSelector( id obj, SEL sel ) { if( [obj respondsToSelector: sel] ) return [obj performSelector: sel]; return nil; } #define CATCH_UNSAFE_UNRETAINED #define CATCH_ARC_STRONG #else inline void arcSafeRelease( NSObject* ){} inline id performOptionalSelector( id obj, SEL sel ) { #ifdef __clang__ #pragma clang diagnostic push #pragma clang diagnostic ignored "-Warc-performSelector-leaks" #endif if( [obj respondsToSelector: sel] ) return [obj performSelector: sel]; #ifdef __clang__ #pragma clang diagnostic pop #endif return nil; } #define CATCH_UNSAFE_UNRETAINED __unsafe_unretained #define CATCH_ARC_STRONG __strong #endif #endif namespace Catch { namespace Detail { // SFINAE is currently disabled by default for all compilers. 
// If the non SFINAE version of IsStreamInsertable is ambiguous for you // and your compiler supports SFINAE, try #defining CATCH_CONFIG_SFINAE #ifdef CATCH_CONFIG_SFINAE template class IsStreamInsertableHelper { template struct TrueIfSizeable : TrueType {}; template static TrueIfSizeable dummy(T2*); static FalseType dummy(...); public: typedef SizedIf type; }; template struct IsStreamInsertable : IsStreamInsertableHelper::type {}; #else struct BorgType { template BorgType( T const& ); }; TrueType& testStreamable( std::ostream& ); FalseType testStreamable( FalseType ); FalseType operator<<( std::ostream const&, BorgType const& ); template struct IsStreamInsertable { static std::ostream &s; static T const&t; enum { value = sizeof( testStreamable(s << t) ) == sizeof( TrueType ) }; }; #endif template struct StringMakerBase { template static std::string convert( T const& ) { return "{?}"; } }; template<> struct StringMakerBase { template static std::string convert( T const& _value ) { std::ostringstream oss; oss << _value; return oss.str(); } }; std::string rawMemoryToString( const void *object, std::size_t size ); template inline std::string rawMemoryToString( const T& object ) { return rawMemoryToString( &object, sizeof(object) ); } } // end namespace Detail template std::string toString( T const& value ); template struct StringMaker : Detail::StringMakerBase::value> {}; template struct StringMaker { template static std::string convert( U* p ) { if( !p ) return INTERNAL_CATCH_STRINGIFY( NULL ); else return Detail::rawMemoryToString( p ); } }; template struct StringMaker { static std::string convert( R C::* p ) { if( !p ) return INTERNAL_CATCH_STRINGIFY( NULL ); else return Detail::rawMemoryToString( p ); } }; namespace Detail { template std::string rangeToString( InputIterator first, InputIterator last ); } template struct StringMaker > { static std::string convert( std::vector const& v ) { return Detail::rangeToString( v.begin(), v.end() ); } }; namespace Detail { 
    template<typename T>
    std::string makeString( T const& value ) {
        return StringMaker<T>::convert( value );
    }
} // end namespace Detail

/// \brief converts any type to a string
///
/// The default template forwards on to ostringstream - except when an
/// ostringstream overload does not exist - in which case it attempts to detect
/// that and writes {?}.
/// Overload (not specialise) this template for custom types that you don't want
/// to provide an ostream overload for.
template<typename T>
std::string toString( T const& value ) {
    return StringMaker<T>::convert( value );
}

// Built in overloads

std::string toString( std::string const& value );
std::string toString( std::wstring const& value );
std::string toString( const char* const value );
std::string toString( char* const value );
std::string toString( const wchar_t* const value );
std::string toString( wchar_t* const value );
std::string toString( int value );
std::string toString( unsigned long value );
std::string toString( unsigned int value );
std::string toString( const double value );
std::string toString( const float value );
std::string toString( bool value );
std::string toString( char value );
std::string toString( signed char value );
std::string toString( unsigned char value );

#ifdef CATCH_CONFIG_CPP11_NULLPTR
std::string toString( std::nullptr_t );
#endif

#ifdef __OBJC__
    std::string toString( NSString const * const& nsstring );
    std::string toString( NSString * CATCH_ARC_STRONG const& nsstring );
    std::string toString( NSObject* const& nsObject );
#endif

    namespace Detail {
    template<typename InputIterator>
    std::string rangeToString( InputIterator first, InputIterator last ) {
        std::ostringstream oss;
        oss << "{ ";
        if( first != last ) {
            oss << toString( *first );
            for( ++first ; first != last ; ++first ) {
                oss << ", " << toString( *first );
            }
        }
        oss << " }";
        return oss.str();
    }
}

} // end namespace Catch

namespace Catch {

// Wraps the LHS of an expression and captures the operator and RHS (if any) -
// wrapping them all in a ResultBuilder object
template<typename T>
class ExpressionLhs {
    ExpressionLhs& operator = ( ExpressionLhs const& );
#  ifdef CATCH_CPP11_OR_GREATER
    ExpressionLhs& operator = ( ExpressionLhs && ) = delete;
#  endif

public:
    ExpressionLhs( ResultBuilder& rb, T lhs ) : m_rb( rb ), m_lhs( lhs ) {}
#  ifdef CATCH_CPP11_OR_GREATER
    ExpressionLhs( ExpressionLhs const& ) = default;
    ExpressionLhs( ExpressionLhs && ) = default;
#  endif

    template<typename RhsT>
    ResultBuilder& operator == ( RhsT const& rhs ) {
        return captureExpression<Internal::IsEqualTo>( rhs );
    }

    template<typename RhsT>
    ResultBuilder& operator != ( RhsT const& rhs ) {
        return captureExpression<Internal::IsNotEqualTo>( rhs );
    }

    template<typename RhsT>
    ResultBuilder& operator < ( RhsT const& rhs ) {
        return captureExpression<Internal::IsLessThan>( rhs );
    }

    template<typename RhsT>
    ResultBuilder& operator > ( RhsT const& rhs ) {
        return captureExpression<Internal::IsGreaterThan>( rhs );
    }

    template<typename RhsT>
    ResultBuilder& operator <= ( RhsT const& rhs ) {
        return captureExpression<Internal::IsLessThanOrEqualTo>( rhs );
    }

    template<typename RhsT>
    ResultBuilder& operator >= ( RhsT const& rhs ) {
        return captureExpression<Internal::IsGreaterThanOrEqualTo>( rhs );
    }

    ResultBuilder& operator == ( bool rhs ) {
        return captureExpression<Internal::IsEqualTo>( rhs );
    }

    ResultBuilder& operator != ( bool rhs ) {
        return captureExpression<Internal::IsNotEqualTo>( rhs );
    }

    void endExpression() {
        bool value = m_lhs ? true : false;
        m_rb
            .setLhs( Catch::toString( value ) )
            .setResultType( value )
            .endExpression();
    }

    // Only simple binary expressions are allowed on the LHS.
// If more complex compositions are required then place the sub expression in parentheses template STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator + ( RhsT const& ); template STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator - ( RhsT const& ); template STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator / ( RhsT const& ); template STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator * ( RhsT const& ); template STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator && ( RhsT const& ); template STATIC_ASSERT_Expression_Too_Complex_Please_Rewrite_As_Binary_Comparison& operator || ( RhsT const& ); private: template ResultBuilder& captureExpression( RhsT const& rhs ) { return m_rb .setResultType( Internal::compare( m_lhs, rhs ) ) .setLhs( Catch::toString( m_lhs ) ) .setRhs( Catch::toString( rhs ) ) .setOp( Internal::OperatorTraits::getName() ); } private: ResultBuilder& m_rb; T m_lhs; }; } // end namespace Catch namespace Catch { template inline ExpressionLhs ResultBuilder::operator->* ( T const& operand ) { return ExpressionLhs( *this, operand ); } inline ExpressionLhs ResultBuilder::operator->* ( bool value ) { return ExpressionLhs( *this, value ); } } // namespace Catch // #included from: catch_message.h #define TWOBLUECUBES_CATCH_MESSAGE_H_INCLUDED #include namespace Catch { struct MessageInfo { MessageInfo( std::string const& _macroName, SourceLineInfo const& _lineInfo, ResultWas::OfType _type ); std::string macroName; SourceLineInfo lineInfo; ResultWas::OfType type; std::string message; unsigned int sequence; bool operator == ( MessageInfo const& other ) const { return sequence == other.sequence; } bool operator < ( MessageInfo const& other ) const { return sequence < other.sequence; } private: static unsigned int globalCount; }; struct MessageBuilder { MessageBuilder( std::string const& macroName, 
SourceLineInfo const& lineInfo, ResultWas::OfType type ) : m_info( macroName, lineInfo, type ) {} template MessageBuilder& operator << ( T const& value ) { m_stream << value; return *this; } MessageInfo m_info; std::ostringstream m_stream; }; class ScopedMessage { public: ScopedMessage( MessageBuilder const& builder ); ScopedMessage( ScopedMessage const& other ); ~ScopedMessage(); MessageInfo m_info; }; } // end namespace Catch // #included from: catch_interfaces_capture.h #define TWOBLUECUBES_CATCH_INTERFACES_CAPTURE_H_INCLUDED #include namespace Catch { class TestCase; class AssertionResult; struct AssertionInfo; struct SectionInfo; struct MessageInfo; class ScopedMessageBuilder; struct Counts; struct IResultCapture { virtual ~IResultCapture(); virtual void assertionEnded( AssertionResult const& result ) = 0; virtual bool sectionStarted( SectionInfo const& sectionInfo, Counts& assertions ) = 0; virtual void sectionEnded( SectionInfo const& name, Counts const& assertions, double _durationInSeconds ) = 0; virtual void pushScopedMessage( MessageInfo const& message ) = 0; virtual void popScopedMessage( MessageInfo const& message ) = 0; virtual std::string getCurrentTestName() const = 0; virtual const AssertionResult* getLastResult() const = 0; }; IResultCapture& getResultCapture(); } // #included from: catch_debugger.h #define TWOBLUECUBES_CATCH_DEBUGGER_H_INCLUDED // #included from: catch_platform.h #define TWOBLUECUBES_CATCH_PLATFORM_H_INCLUDED #if defined(__MAC_OS_X_VERSION_MIN_REQUIRED) #define CATCH_PLATFORM_MAC #elif defined(__IPHONE_OS_VERSION_MIN_REQUIRED) #define CATCH_PLATFORM_IPHONE #elif defined(WIN32) || defined(__WIN32__) || defined(_WIN32) || defined(_MSC_VER) #define CATCH_PLATFORM_WINDOWS #endif #include namespace Catch{ bool isDebuggerActive(); void writeToDebugConsole( std::string const& text ); } #ifdef CATCH_PLATFORM_MAC // The following code snippet based on: // http://cocoawithlove.com/2008/03/break-into-debugger.html #ifdef DEBUG #if 
defined(__ppc64__) || defined(__ppc__)
            #define CATCH_BREAK_INTO_DEBUGGER() \
                if( Catch::isDebuggerActive() ) { \
                    __asm__("li r0, 20\nsc\nnop\nli r0, 37\nli r4, 2\nsc\nnop\n" \
                    : : : "memory","r0","r3","r4" ); \
                }
        #else
            #define CATCH_BREAK_INTO_DEBUGGER() if( Catch::isDebuggerActive() ) {__asm__("int $3\n" : : );}
        #endif
    #endif

#elif defined(_MSC_VER)
    #define CATCH_BREAK_INTO_DEBUGGER() if( Catch::isDebuggerActive() ) { __debugbreak(); }
#elif defined(__MINGW32__)
    extern "C" __declspec(dllimport) void __stdcall DebugBreak();
    #define CATCH_BREAK_INTO_DEBUGGER() if( Catch::isDebuggerActive() ) { DebugBreak(); }
#endif

#ifndef CATCH_BREAK_INTO_DEBUGGER
#define CATCH_BREAK_INTO_DEBUGGER() Catch::alwaysTrue();
#endif

// #included from: catch_interfaces_runner.h
#define TWOBLUECUBES_CATCH_INTERFACES_RUNNER_H_INCLUDED

namespace Catch {
    class TestCase;

    struct IRunner {
        virtual ~IRunner();
        virtual bool aborting() const = 0;
    };
}

///////////////////////////////////////////////////////////////////////////////
// In the event of a failure works out if the debugger needs to be invoked
// and/or an exception thrown and takes appropriate action.
// This needs to be done as a macro so the debugger will stop in the user
// source code rather than in Catch library code
#define INTERNAL_CATCH_REACT( resultBuilder ) \
    if( resultBuilder.shouldDebugBreak() ) CATCH_BREAK_INTO_DEBUGGER(); \
    resultBuilder.react();

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_TEST( expr, resultDisposition, macroName ) \
    do { \
        Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, #expr, resultDisposition ); \
        try { \
            ( __catchResult->*expr ).endExpression(); \
        } \
        catch( ... ) { \
            __catchResult.useActiveException( Catch::ResultDisposition::Normal ); \
        } \
        INTERNAL_CATCH_REACT( __catchResult ) \
    } while( Catch::isTrue( false && (expr) ) ) // expr here is never evaluated at runtime but it forces the compiler to give it a look

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_IF( expr, resultDisposition, macroName ) \
    INTERNAL_CATCH_TEST( expr, resultDisposition, macroName ); \
    if( Catch::getResultCapture().getLastResult()->succeeded() )

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_ELSE( expr, resultDisposition, macroName ) \
    INTERNAL_CATCH_TEST( expr, resultDisposition, macroName ); \
    if( !Catch::getResultCapture().getLastResult()->succeeded() )

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_NO_THROW( expr, resultDisposition, macroName ) \
    do { \
        Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, #expr, resultDisposition ); \
        try { \
            expr; \
            __catchResult.captureResult( Catch::ResultWas::Ok ); \
        } \
        catch( ... ) { \
            __catchResult.useActiveException( resultDisposition ); \
        } \
        INTERNAL_CATCH_REACT( __catchResult ) \
    } while( Catch::alwaysFalse() )

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_THROWS( expr, resultDisposition, macroName ) \
    do { \
        Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, #expr, resultDisposition ); \
        if( __catchResult.allowThrows() ) \
            try { \
                expr; \
                __catchResult.captureResult( Catch::ResultWas::DidntThrowException ); \
            } \
            catch( ... ) { \
                __catchResult.captureResult( Catch::ResultWas::Ok ); \
            } \
        else \
            __catchResult.captureResult( Catch::ResultWas::Ok ); \
        INTERNAL_CATCH_REACT( __catchResult ) \
    } while( Catch::alwaysFalse() )

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_THROWS_AS( expr, exceptionType, resultDisposition, macroName ) \
    do { \
        Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, #expr, resultDisposition ); \
        if( __catchResult.allowThrows() ) \
            try { \
                expr; \
                __catchResult.captureResult( Catch::ResultWas::DidntThrowException ); \
            } \
            catch( exceptionType ) { \
                __catchResult.captureResult( Catch::ResultWas::Ok ); \
            } \
            catch( ... ) { \
                __catchResult.useActiveException( resultDisposition ); \
            } \
        else \
            __catchResult.captureResult( Catch::ResultWas::Ok ); \
        INTERNAL_CATCH_REACT( __catchResult ) \
    } while( Catch::alwaysFalse() )

///////////////////////////////////////////////////////////////////////////////
#ifdef CATCH_CONFIG_VARIADIC_MACROS
    #define INTERNAL_CATCH_MSG( messageType, resultDisposition, macroName, ... ) \
        do { \
            Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, "", resultDisposition ); \
            __catchResult << __VA_ARGS__ + ::Catch::StreamEndStop(); \
            __catchResult.captureResult( messageType ); \
            INTERNAL_CATCH_REACT( __catchResult ) \
        } while( Catch::alwaysFalse() )
#else
    #define INTERNAL_CATCH_MSG( messageType, resultDisposition, macroName, log ) \
        do { \
            Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, "", resultDisposition ); \
            __catchResult << log + ::Catch::StreamEndStop(); \
            __catchResult.captureResult( messageType ); \
            INTERNAL_CATCH_REACT( __catchResult ) \
        } while( Catch::alwaysFalse() )
#endif

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_INFO( log, macroName ) \
    Catch::ScopedMessage INTERNAL_CATCH_UNIQUE_NAME( scopedMessage ) = Catch::MessageBuilder( macroName, CATCH_INTERNAL_LINEINFO, Catch::ResultWas::Info ) << log;

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CHECK_THAT( arg, matcher, resultDisposition, macroName ) \
    do { \
        Catch::ResultBuilder __catchResult( macroName, CATCH_INTERNAL_LINEINFO, #arg " " #matcher, resultDisposition ); \
        try { \
            std::string matcherAsString = ::Catch::Matchers::matcher.toString(); \
            __catchResult \
                .setLhs( Catch::toString( arg ) ) \
                .setRhs( matcherAsString == "{?}" ? #matcher : matcherAsString ) \
                .setOp( "matches" ) \
                .setResultType( ::Catch::Matchers::matcher.match( arg ) ); \
            __catchResult.captureExpression(); \
        } catch( ...
) { \ __catchResult.useActiveException( resultDisposition | Catch::ResultDisposition::ContinueOnFailure ); \ } \ INTERNAL_CATCH_REACT( __catchResult ) \ } while( Catch::alwaysFalse() ) // #included from: internal/catch_section.h #define TWOBLUECUBES_CATCH_SECTION_H_INCLUDED // #included from: catch_section_info.h #define TWOBLUECUBES_CATCH_SECTION_INFO_H_INCLUDED namespace Catch { struct SectionInfo { SectionInfo ( SourceLineInfo const& _lineInfo, std::string const& _name, std::string const& _description = std::string() ); std::string name; std::string description; SourceLineInfo lineInfo; }; } // end namespace Catch // #included from: catch_totals.hpp #define TWOBLUECUBES_CATCH_TOTALS_HPP_INCLUDED #include namespace Catch { struct Counts { Counts() : passed( 0 ), failed( 0 ), failedButOk( 0 ) {} Counts operator - ( Counts const& other ) const { Counts diff; diff.passed = passed - other.passed; diff.failed = failed - other.failed; diff.failedButOk = failedButOk - other.failedButOk; return diff; } Counts& operator += ( Counts const& other ) { passed += other.passed; failed += other.failed; failedButOk += other.failedButOk; return *this; } std::size_t total() const { return passed + failed + failedButOk; } bool allPassed() const { return failed == 0 && failedButOk == 0; } std::size_t passed; std::size_t failed; std::size_t failedButOk; }; struct Totals { Totals operator - ( Totals const& other ) const { Totals diff; diff.assertions = assertions - other.assertions; diff.testCases = testCases - other.testCases; return diff; } Totals delta( Totals const& prevTotals ) const { Totals diff = *this - prevTotals; if( diff.assertions.failed > 0 ) ++diff.testCases.failed; else if( diff.assertions.failedButOk > 0 ) ++diff.testCases.failedButOk; else ++diff.testCases.passed; return diff; } Totals& operator += ( Totals const& other ) { assertions += other.assertions; testCases += other.testCases; return *this; } Counts assertions; Counts testCases; }; } // #included from: 
catch_timer.h #define TWOBLUECUBES_CATCH_TIMER_H_INCLUDED #ifdef CATCH_PLATFORM_WINDOWS typedef unsigned long long uint64_t; #else #include #endif namespace Catch { class Timer { public: Timer() : m_ticks( 0 ) {} void start(); unsigned int getElapsedNanoseconds() const; unsigned int getElapsedMilliseconds() const; double getElapsedSeconds() const; private: uint64_t m_ticks; }; } // namespace Catch #include namespace Catch { class Section { public: Section( SectionInfo const& info ); ~Section(); // This indicates whether the section should be executed or not operator bool() const; private: #ifdef CATCH_CPP11_OR_GREATER Section( Section const& ) = delete; Section( Section && ) = delete; Section& operator = ( Section const& ) = delete; Section& operator = ( Section && ) = delete; #else Section( Section const& info ); Section& operator = ( Section const& ); #endif SectionInfo m_info; std::string m_name; Counts m_assertions; bool m_sectionIncluded; Timer m_timer; }; } // end namespace Catch #ifdef CATCH_CONFIG_VARIADIC_MACROS #define INTERNAL_CATCH_SECTION( ... 
) \ if( Catch::Section const& INTERNAL_CATCH_UNIQUE_NAME( catch_internal_Section ) = Catch::SectionInfo( CATCH_INTERNAL_LINEINFO, __VA_ARGS__ ) ) #else #define INTERNAL_CATCH_SECTION( name, desc ) \ if( Catch::Section const& INTERNAL_CATCH_UNIQUE_NAME( catch_internal_Section ) = Catch::SectionInfo( CATCH_INTERNAL_LINEINFO, name, desc ) ) #endif // #included from: internal/catch_generators.hpp #define TWOBLUECUBES_CATCH_GENERATORS_HPP_INCLUDED #include #include #include #include namespace Catch { template struct IGenerator { virtual ~IGenerator() {} virtual T getValue( std::size_t index ) const = 0; virtual std::size_t size () const = 0; }; template class BetweenGenerator : public IGenerator { public: BetweenGenerator( T from, T to ) : m_from( from ), m_to( to ){} virtual T getValue( std::size_t index ) const { return m_from+static_cast( index ); } virtual std::size_t size() const { return static_cast( 1+m_to-m_from ); } private: T m_from; T m_to; }; template class ValuesGenerator : public IGenerator { public: ValuesGenerator(){} void add( T value ) { m_values.push_back( value ); } virtual T getValue( std::size_t index ) const { return m_values[index]; } virtual std::size_t size() const { return m_values.size(); } private: std::vector m_values; }; template class CompositeGenerator { public: CompositeGenerator() : m_totalSize( 0 ) {} // *** Move semantics, similar to auto_ptr *** CompositeGenerator( CompositeGenerator& other ) : m_fileInfo( other.m_fileInfo ), m_totalSize( 0 ) { move( other ); } CompositeGenerator& setFileInfo( const char* fileInfo ) { m_fileInfo = fileInfo; return *this; } ~CompositeGenerator() { deleteAll( m_composed ); } operator T () const { size_t overallIndex = getCurrentContext().getGeneratorIndex( m_fileInfo, m_totalSize ); typename std::vector*>::const_iterator it = m_composed.begin(); typename std::vector*>::const_iterator itEnd = m_composed.end(); for( size_t index = 0; it != itEnd; ++it ) { const IGenerator* generator = *it; if( 
overallIndex >= index && overallIndex < index + generator->size() ) {
                return generator->getValue( overallIndex-index );
            }
            index += generator->size();
        }
        CATCH_INTERNAL_ERROR( "Indexed past end of generated range" );
        return T(); // Suppress spurious "not all control paths return a value" warning in Visual Studio - if you know how to fix this please do so
    }

    void add( const IGenerator<T>* generator ) {
        m_totalSize += generator->size();
        m_composed.push_back( generator );
    }

    CompositeGenerator& then( CompositeGenerator& other ) {
        move( other );
        return *this;
    }

    CompositeGenerator& then( T value ) {
        ValuesGenerator<T>* valuesGen = new ValuesGenerator<T>();
        valuesGen->add( value );
        add( valuesGen );
        return *this;
    }

private:

    void move( CompositeGenerator& other ) {
        std::copy( other.m_composed.begin(), other.m_composed.end(), std::back_inserter( m_composed ) );
        m_totalSize += other.m_totalSize;
        other.m_composed.clear();
    }

    std::vector<const IGenerator<T>*> m_composed;
    std::string m_fileInfo;
    size_t m_totalSize;
};

namespace Generators
{
    template<typename T>
    CompositeGenerator<T> between( T from, T to ) {
        CompositeGenerator<T> generators;
        generators.add( new BetweenGenerator<T>( from, to ) );
        return generators;
    }

    template<typename T>
    CompositeGenerator<T> values( T val1, T val2 ) {
        CompositeGenerator<T> generators;
        ValuesGenerator<T>* valuesGen = new ValuesGenerator<T>();
        valuesGen->add( val1 );
        valuesGen->add( val2 );
        generators.add( valuesGen );
        return generators;
    }

    template<typename T>
    CompositeGenerator<T> values( T val1, T val2, T val3 ){
        CompositeGenerator<T> generators;
        ValuesGenerator<T>* valuesGen = new ValuesGenerator<T>();
        valuesGen->add( val1 );
        valuesGen->add( val2 );
        valuesGen->add( val3 );
        generators.add( valuesGen );
        return generators;
    }

    template<typename T>
    CompositeGenerator<T> values( T val1, T val2, T val3, T val4 ) {
        CompositeGenerator<T> generators;
        ValuesGenerator<T>* valuesGen = new ValuesGenerator<T>();
        valuesGen->add( val1 );
        valuesGen->add( val2 );
        valuesGen->add( val3 );
        valuesGen->add( val4 );
        generators.add( valuesGen );
        return generators;
    }

} // end namespace
Generators using namespace Generators; } // end namespace Catch #define INTERNAL_CATCH_LINESTR2( line ) #line #define INTERNAL_CATCH_LINESTR( line ) INTERNAL_CATCH_LINESTR2( line ) #define INTERNAL_CATCH_GENERATE( expr ) expr.setFileInfo( __FILE__ "(" INTERNAL_CATCH_LINESTR( __LINE__ ) ")" ) // #included from: internal/catch_interfaces_exception.h #define TWOBLUECUBES_CATCH_INTERFACES_EXCEPTION_H_INCLUDED #include // #included from: catch_interfaces_registry_hub.h #define TWOBLUECUBES_CATCH_INTERFACES_REGISTRY_HUB_H_INCLUDED #include namespace Catch { class TestCase; struct ITestCaseRegistry; struct IExceptionTranslatorRegistry; struct IExceptionTranslator; struct IReporterRegistry; struct IReporterFactory; struct IRegistryHub { virtual ~IRegistryHub(); virtual IReporterRegistry const& getReporterRegistry() const = 0; virtual ITestCaseRegistry const& getTestCaseRegistry() const = 0; virtual IExceptionTranslatorRegistry& getExceptionTranslatorRegistry() = 0; }; struct IMutableRegistryHub { virtual ~IMutableRegistryHub(); virtual void registerReporter( std::string const& name, IReporterFactory* factory ) = 0; virtual void registerTest( TestCase const& testInfo ) = 0; virtual void registerTranslator( const IExceptionTranslator* translator ) = 0; }; IRegistryHub& getRegistryHub(); IMutableRegistryHub& getMutableRegistryHub(); void cleanUp(); std::string translateActiveException(); } namespace Catch { typedef std::string(*exceptionTranslateFunction)(); struct IExceptionTranslator { virtual ~IExceptionTranslator(); virtual std::string translate() const = 0; }; struct IExceptionTranslatorRegistry { virtual ~IExceptionTranslatorRegistry(); virtual std::string translateActiveException() const = 0; }; class ExceptionTranslatorRegistrar { template class ExceptionTranslator : public IExceptionTranslator { public: ExceptionTranslator( std::string(*translateFunction)( T& ) ) : m_translateFunction( translateFunction ) {} virtual std::string translate() const { try { throw; } 
                catch( T& ex ) {
                    return m_translateFunction( ex );
                }
            }

        protected:
            std::string(*m_translateFunction)( T& );
        };

    public:
        template<typename T>
        ExceptionTranslatorRegistrar( std::string(*translateFunction)( T& ) ) {
            getMutableRegistryHub().registerTranslator
                ( new ExceptionTranslator<T>( translateFunction ) );
        }
    };
}

///////////////////////////////////////////////////////////////////////////////
#define INTERNAL_CATCH_TRANSLATE_EXCEPTION( signature ) \
    static std::string INTERNAL_CATCH_UNIQUE_NAME( catch_internal_ExceptionTranslator )( signature ); \
    namespace{ Catch::ExceptionTranslatorRegistrar INTERNAL_CATCH_UNIQUE_NAME( catch_internal_ExceptionRegistrar )( &INTERNAL_CATCH_UNIQUE_NAME( catch_internal_ExceptionTranslator ) ); }\
    static std::string INTERNAL_CATCH_UNIQUE_NAME( catch_internal_ExceptionTranslator )( signature )

// #included from: internal/catch_approx.hpp
#define TWOBLUECUBES_CATCH_APPROX_HPP_INCLUDED

#include <cmath>
#include <limits>

namespace Catch {
namespace Detail {

    class Approx {
    public:
        explicit Approx ( double value )
        :   m_epsilon( std::numeric_limits<float>::epsilon()*100 ),
            m_scale( 1.0 ),
            m_value( value )
        {}

        Approx( Approx const& other )
        :   m_epsilon( other.m_epsilon ),
            m_scale( other.m_scale ),
            m_value( other.m_value )
        {}

        static Approx custom() {
            return Approx( 0 );
        }

        Approx operator()( double value ) {
            Approx approx( value );
            approx.epsilon( m_epsilon );
            approx.scale( m_scale );
            return approx;
        }

        friend bool operator == ( double lhs, Approx const& rhs ) {
            // Thanks to Richard Harris for his help refining this formula
            return fabs( lhs - rhs.m_value ) < rhs.m_epsilon * (rhs.m_scale + (std::max)( fabs(lhs), fabs(rhs.m_value) ) );
        }

        friend bool operator == ( Approx const& lhs, double rhs ) {
            return operator==( rhs, lhs );
        }

        friend bool operator != ( double lhs, Approx const& rhs ) {
            return !operator==( lhs, rhs );
        }

        friend bool operator != ( Approx const& lhs, double rhs ) {
            return !operator==( rhs, lhs );
        }

        Approx& epsilon( double newEpsilon ) {
            m_epsilon = newEpsilon;
            return *this;
        }

        Approx& scale( double newScale ) {
            m_scale = newScale;
            return *this;
        }

        std::string toString() const {
            std::ostringstream oss;
            oss << "Approx( " << Catch::toString( m_value ) << " )";
            return oss.str();
        }

    private:
        double m_epsilon;
        double m_scale;
        double m_value;
    };
}

template<>
inline std::string toString<Detail::Approx>( Detail::Approx const& value ) {
    return value.toString();
}

} // end namespace Catch

// #included from: internal/catch_matchers.hpp
#define TWOBLUECUBES_CATCH_MATCHERS_HPP_INCLUDED

namespace Catch {
namespace Matchers {
    namespace Impl {

    template<typename ExpressionT>
    struct Matcher : SharedImpl<IShared>
    {
        typedef ExpressionT ExpressionType;

        virtual ~Matcher() {}
        virtual Ptr<Matcher> clone() const = 0;
        virtual bool match( ExpressionT const& expr ) const = 0;
        virtual std::string toString() const = 0;
    };

    template<typename DerivedT, typename ExpressionT>
    struct MatcherImpl : Matcher<ExpressionT> {

        virtual Ptr<Matcher<ExpressionT> > clone() const {
            return Ptr<Matcher<ExpressionT> >( new DerivedT( static_cast<DerivedT const&>( *this ) ) );
        }
    };

    namespace Generic {

        template<typename ExpressionT>
        class AllOf : public MatcherImpl<AllOf<ExpressionT>, ExpressionT> {
        public:

            AllOf() {}
            AllOf( AllOf const& other ) : m_matchers( other.m_matchers ) {}

            AllOf& add( Matcher<ExpressionT> const& matcher ) {
                m_matchers.push_back( matcher.clone() );
                return *this;
            }
            virtual bool match( ExpressionT const& expr ) const
            {
                for( std::size_t i = 0; i < m_matchers.size(); ++i )
                    if( !m_matchers[i]->match( expr ) )
                        return false;
                return true;
            }
            virtual std::string toString() const {
                std::ostringstream oss;
                oss << "( ";
                for( std::size_t i = 0; i < m_matchers.size(); ++i ) {
                    if( i != 0 )
                        oss << " and ";
                    oss << m_matchers[i]->toString();
                }
                oss << " )";
                return oss.str();
            }

        private:
            std::vector<Ptr<Matcher<ExpressionT> > > m_matchers;
        };

        template<typename ExpressionT>
        class AnyOf : public MatcherImpl<AnyOf<ExpressionT>, ExpressionT> {
        public:

            AnyOf() {}
            AnyOf( AnyOf const& other ) : m_matchers( other.m_matchers ) {}

            AnyOf& add( Matcher<ExpressionT> const& matcher ) {
                m_matchers.push_back( matcher.clone() );
                return *this;
            }
            virtual bool match( ExpressionT const& expr ) const
            {
                for( std::size_t i = 0; i < m_matchers.size(); ++i )
                    if( m_matchers[i]->match( expr ) )
                        return true;
                return
                    false;
            }
            virtual std::string toString() const {
                std::ostringstream oss;
                oss << "( ";
                for( std::size_t i = 0; i < m_matchers.size(); ++i ) {
                    if( i != 0 )
                        oss << " or ";
                    oss << m_matchers[i]->toString();
                }
                oss << " )";
                return oss.str();
            }

        private:
            std::vector<Ptr<Matcher<ExpressionT> > > m_matchers;
        };

    }

    namespace StdString {

        inline std::string makeString( std::string const& str ) { return str; }
        inline std::string makeString( const char* str ) { return str ? std::string( str ) : std::string(); }

        struct Equals : MatcherImpl<Equals, std::string> {
            Equals( std::string const& str ) : m_str( str ){}
            Equals( Equals const& other ) : m_str( other.m_str ){}

            virtual ~Equals();

            virtual bool match( std::string const& expr ) const {
                return m_str == expr;
            }
            virtual std::string toString() const {
                return "equals: \"" + m_str + "\"";
            }

            std::string m_str;
        };

        struct Contains : MatcherImpl<Contains, std::string> {
            Contains( std::string const& substr ) : m_substr( substr ){}
            Contains( Contains const& other ) : m_substr( other.m_substr ){}

            virtual ~Contains();

            virtual bool match( std::string const& expr ) const {
                return expr.find( m_substr ) != std::string::npos;
            }
            virtual std::string toString() const {
                return "contains: \"" + m_substr + "\"";
            }

            std::string m_substr;
        };

        struct StartsWith : MatcherImpl<StartsWith, std::string> {
            StartsWith( std::string const& substr ) : m_substr( substr ){}
            StartsWith( StartsWith const& other ) : m_substr( other.m_substr ){}

            virtual ~StartsWith();

            virtual bool match( std::string const& expr ) const {
                return expr.find( m_substr ) == 0;
            }
            virtual std::string toString() const {
                return "starts with: \"" + m_substr + "\"";
            }

            std::string m_substr;
        };

        struct EndsWith : MatcherImpl<EndsWith, std::string> {
            EndsWith( std::string const& substr ) : m_substr( substr ){}
            EndsWith( EndsWith const& other ) : m_substr( other.m_substr ){}

            virtual ~EndsWith();

            virtual bool match( std::string const& expr ) const {
                // Compare against the tail of expr. find() reports the first
                // occurrence only, and the size subtraction wraps around when
                // the suffix is longer than expr, so guard the length first.
                return expr.size() >= m_substr.size() &&
                       expr.compare( expr.size() - m_substr.size(), m_substr.size(), m_substr ) == 0;
            }
            virtual std::string toString() const {
                return "ends with: \"" + m_substr + "\"";
            }

            std::string m_substr;
        };
    } // namespace StdString
    } // namespace Impl

    // The following functions create the actual matcher objects.
    // This allows the types to be inferred
    template<typename ExpressionT>
    inline Impl::Generic::AllOf<ExpressionT> AllOf( Impl::Matcher<ExpressionT> const& m1,
                                                    Impl::Matcher<ExpressionT> const& m2 ) {
        return Impl::Generic::AllOf<ExpressionT>().add( m1 ).add( m2 );
    }
    template<typename ExpressionT>
    inline Impl::Generic::AllOf<ExpressionT> AllOf( Impl::Matcher<ExpressionT> const& m1,
                                                    Impl::Matcher<ExpressionT> const& m2,
                                                    Impl::Matcher<ExpressionT> const& m3 ) {
        return Impl::Generic::AllOf<ExpressionT>().add( m1 ).add( m2 ).add( m3 );
    }
    template<typename ExpressionT>
    inline Impl::Generic::AnyOf<ExpressionT> AnyOf( Impl::Matcher<ExpressionT> const& m1,
                                                    Impl::Matcher<ExpressionT> const& m2 ) {
        return Impl::Generic::AnyOf<ExpressionT>().add( m1 ).add( m2 );
    }
    template<typename ExpressionT>
    inline Impl::Generic::AnyOf<ExpressionT> AnyOf( Impl::Matcher<ExpressionT> const& m1,
                                                    Impl::Matcher<ExpressionT> const& m2,
                                                    Impl::Matcher<ExpressionT> const& m3 ) {
        return Impl::Generic::AnyOf<ExpressionT>().add( m1 ).add( m2 ).add( m3 );
    }

    inline Impl::StdString::Equals Equals( std::string const& str ) {
        return Impl::StdString::Equals( str );
    }
    inline Impl::StdString::Equals Equals( const char* str ) {
        return Impl::StdString::Equals( Impl::StdString::makeString( str ) );
    }
    inline Impl::StdString::Contains Contains( std::string const& substr ) {
        return Impl::StdString::Contains( substr );
    }
    inline Impl::StdString::Contains Contains( const char* substr ) {
        return Impl::StdString::Contains( Impl::StdString::makeString( substr ) );
    }
    inline Impl::StdString::StartsWith StartsWith( std::string const& substr ) {
        return Impl::StdString::StartsWith( substr );
    }
    inline Impl::StdString::StartsWith StartsWith( const char* substr ) {
        return Impl::StdString::StartsWith( Impl::StdString::makeString( substr ) );
    }
    inline Impl::StdString::EndsWith EndsWith( std::string const& substr ) {
        return Impl::StdString::EndsWith( substr );
    }
    inline Impl::StdString::EndsWith EndsWith( const char* substr ) {
        return Impl::StdString::EndsWith( Impl::StdString::makeString( substr ) );
    }

} // namespace Matchers

using namespace Matchers;

} // namespace Catch

// #included from: internal/catch_interfaces_tag_alias_registry.h
#define TWOBLUECUBES_CATCH_INTERFACES_TAG_ALIAS_REGISTRY_H_INCLUDED

// #included from: catch_tag_alias.h
#define TWOBLUECUBES_CATCH_TAG_ALIAS_H_INCLUDED

#include <string>

namespace Catch {

    struct TagAlias {
        TagAlias( std::string _tag, SourceLineInfo _lineInfo ) : tag( _tag ), lineInfo( _lineInfo ) {}

        std::string tag;
        SourceLineInfo lineInfo;
    };

    struct RegistrarForTagAliases {
        RegistrarForTagAliases( char const* alias, char const* tag, SourceLineInfo const& lineInfo );
    };

} // end namespace Catch

#define CATCH_REGISTER_TAG_ALIAS( alias, spec ) namespace{ Catch::RegistrarForTagAliases INTERNAL_CATCH_UNIQUE_NAME( AutoRegisterTagAlias )( alias, spec, CATCH_INTERNAL_LINEINFO ); }
// #included from: catch_option.hpp
#define TWOBLUECUBES_CATCH_OPTION_HPP_INCLUDED

namespace Catch {

    // An optional type
    template<typename T>
    class Option {
    public:
        Option() : nullableValue( NULL ) {}
        Option( T const& _value )
        : nullableValue( new( storage ) T( _value ) )
        {}
        Option( Option const& _other )
        : nullableValue( _other ? new( storage ) T( *_other ) : NULL )
        {}

        ~Option() {
            reset();
        }

        Option& operator= ( Option const& _other ) {
            if( &_other != this ) {
                reset();
                if( _other )
                    nullableValue = new( storage ) T( *_other );
            }
            return *this;
        }
        Option& operator = ( T const& _value ) {
            reset();
            nullableValue = new( storage ) T( _value );
            return *this;
        }

        void reset() {
            if( nullableValue )
                nullableValue->~T();
            nullableValue = NULL;
        }

        T& operator*() { return *nullableValue; }
        T const& operator*() const { return *nullableValue; }
        T* operator->() { return nullableValue; }
        const T* operator->() const { return nullableValue; }

        T valueOr( T const& defaultValue ) const {
            return nullableValue ?
                *nullableValue : defaultValue;
        }

        bool some() const { return nullableValue != NULL; }
        bool none() const { return nullableValue == NULL; }

        bool operator !() const { return nullableValue == NULL; }
        operator SafeBool::type() const {
            return SafeBool::makeSafe( some() );
        }

    private:
        T* nullableValue;
        char storage[sizeof(T)];
    };

} // end namespace Catch

namespace Catch {

    struct ITagAliasRegistry {
        virtual ~ITagAliasRegistry();
        virtual Option<TagAlias> find( std::string const& alias ) const = 0;
        virtual std::string expandAliases( std::string const& unexpandedTestSpec ) const = 0;

        static ITagAliasRegistry const& get();
    };

} // end namespace Catch

// These files are included here so the single_include script doesn't put them
// in the conditionally compiled sections
// #included from: internal/catch_test_case_info.h
#define TWOBLUECUBES_CATCH_TEST_CASE_INFO_H_INCLUDED

#include <string>
#include <set>

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpadded"
#endif

namespace Catch {

    struct ITestCase;

    struct TestCaseInfo {
        enum SpecialProperties{
            None = 0,
            IsHidden = 1 << 1,
            ShouldFail = 1 << 2,
            MayFail = 1 << 3,
            Throws = 1 << 4
        };

        TestCaseInfo(   std::string const& _name,
                        std::string const& _className,
                        std::string const& _description,
                        std::set<std::string> const& _tags,
                        SourceLineInfo const& _lineInfo );

        TestCaseInfo( TestCaseInfo const& other );

        bool isHidden() const;
        bool throws() const;
        bool okToFail() const;
        bool expectedToFail() const;

        std::string name;
        std::string className;
        std::string description;
        std::set<std::string> tags;
        std::set<std::string> lcaseTags;
        std::string tagsAsString;
        SourceLineInfo lineInfo;
        SpecialProperties properties;
    };

    class TestCase : public TestCaseInfo {
    public:

        TestCase( ITestCase* testCase, TestCaseInfo const& info );
        TestCase( TestCase const& other );

        TestCase withName( std::string const& _newName ) const;

        void invoke() const;

        TestCaseInfo const& getTestCaseInfo() const;

        void swap( TestCase& other );
        bool operator == ( TestCase const& other ) const;
        bool operator < ( TestCase const& other ) const;
        TestCase& operator = ( TestCase const& other );

    private:
        Ptr<ITestCase> test;
    };

    TestCase makeTestCase(  ITestCase* testCase,
                            std::string const& className,
                            std::string const& name,
                            std::string const& description,
                            SourceLineInfo const& lineInfo );
}

#ifdef __clang__
#pragma clang diagnostic pop
#endif


#ifdef __OBJC__
// #included from: internal/catch_objc.hpp
#define TWOBLUECUBES_CATCH_OBJC_HPP_INCLUDED

#import <objc/runtime.h>

#include <string>

// NB. Any general catch headers included here must be included
// in catch.hpp first to make sure they are included by the single
// header for non obj-usage

///////////////////////////////////////////////////////////////////////////////
// This protocol is really only here for (self) documenting purposes, since
// all its methods are optional.
@protocol OcFixture

@optional

-(void) setUp;
-(void) tearDown;

@end

namespace Catch {

    class OcMethod : public SharedImpl<ITestCase> {

    public:
        OcMethod( Class cls, SEL sel ) : m_cls( cls ), m_sel( sel ) {}

        virtual void invoke() const {
            id obj = [[m_cls alloc] init];

            performOptionalSelector( obj, @selector(setUp)  );
            performOptionalSelector( obj, m_sel );
            performOptionalSelector( obj, @selector(tearDown)  );

            arcSafeRelease( obj );
        }
    private:
        virtual ~OcMethod() {}

        Class m_cls;
        SEL m_sel;
    };

    namespace Detail{

        inline std::string getAnnotation(   Class cls,
                                            std::string const& annotationName,
                                            std::string const& testCaseName ) {
            NSString* selStr = [[NSString alloc] initWithFormat:@"Catch_%s_%s", annotationName.c_str(), testCaseName.c_str()];
            SEL sel = NSSelectorFromString( selStr );
            arcSafeRelease( selStr );
            id value = performOptionalSelector( cls, sel );
            if( value )
                return [(NSString*)value UTF8String];
            return "";
        }
    }

    inline size_t registerTestMethods() {
        size_t noTestMethods = 0;
        int noClasses = objc_getClassList( NULL, 0 );

        Class* classes = (CATCH_UNSAFE_UNRETAINED Class *)malloc( sizeof(Class) * noClasses);
        objc_getClassList( classes, noClasses );

        for( int c = 0; c < noClasses; c++ ) {
            Class cls =
                classes[c];
            {
                u_int count;
                Method* methods = class_copyMethodList( cls, &count );
                for( u_int m = 0; m < count ; m++ ) {
                    SEL selector = method_getName(methods[m]);
                    std::string methodName = sel_getName(selector);
                    if( startsWith( methodName, "Catch_TestCase_" ) ) {
                        std::string testCaseName = methodName.substr( 15 );
                        std::string name = Detail::getAnnotation( cls, "Name", testCaseName );
                        std::string desc = Detail::getAnnotation( cls, "Description", testCaseName );
                        const char* className = class_getName( cls );

                        getMutableRegistryHub().registerTest( makeTestCase( new OcMethod( cls, selector ), className, name.c_str(), desc.c_str(), SourceLineInfo() ) );
                        noTestMethods++;
                    }
                }
                free(methods);
            }
        }
        return noTestMethods;
    }

    namespace Matchers {
        namespace Impl {
        namespace NSStringMatchers {

            template<typename MatcherT>
            struct StringHolder : MatcherImpl<MatcherT, NSString*>{
                StringHolder( NSString* substr ) : m_substr( [substr copy] ){}
                StringHolder( StringHolder const& other ) : m_substr( [other.m_substr copy] ){}
                // Release the copy taken by the constructors
                ~StringHolder() {
                    arcSafeRelease( m_substr );
                }

                NSString* m_substr;
            };

            struct Equals : StringHolder<Equals> {
                Equals( NSString* substr ) : StringHolder( substr ){}

                virtual bool match( ExpressionType const& str ) const {
                    return  (str != nil || m_substr == nil ) &&
                            [str isEqualToString:m_substr];
                }

                virtual std::string toString() const {
                    return "equals string: " + Catch::toString( m_substr );
                }
            };

            struct Contains : StringHolder<Contains> {
                Contains( NSString* substr ) : StringHolder( substr ){}

                virtual bool match( ExpressionType const& str ) const {
                    return  (str != nil || m_substr == nil ) &&
                            [str rangeOfString:m_substr].location != NSNotFound;
                }

                virtual std::string toString() const {
                    return "contains string: " + Catch::toString( m_substr );
                }
            };

            struct StartsWith : StringHolder<StartsWith> {
                StartsWith( NSString* substr ) : StringHolder( substr ){}

                virtual bool match( ExpressionType const& str ) const {
                    return  (str != nil || m_substr == nil ) &&
                            [str rangeOfString:m_substr].location == 0;
                }

                virtual std::string toString() const {
                    return "starts with: " + Catch::toString( m_substr );
                }
            };
            struct EndsWith : StringHolder<EndsWith> {
                EndsWith( NSString* substr ) : StringHolder( substr ){}

                virtual bool match( ExpressionType const& str ) const {
                    // hasSuffix: checks the tail of str; rangeOfString: would
                    // report only the first occurrence of the suffix
                    return  (str != nil || m_substr == nil ) &&
                            [str hasSuffix:m_substr];
                }

                virtual std::string toString() const {
                    return "ends with: " + Catch::toString( m_substr );
                }
            };

        } // namespace NSStringMatchers
        } // namespace Impl

        inline Impl::NSStringMatchers::Equals
            Equals( NSString* substr ){ return Impl::NSStringMatchers::Equals( substr ); }

        inline Impl::NSStringMatchers::Contains
            Contains( NSString* substr ){ return Impl::NSStringMatchers::Contains( substr ); }

        inline Impl::NSStringMatchers::StartsWith
            StartsWith( NSString* substr ){ return Impl::NSStringMatchers::StartsWith( substr ); }

        inline Impl::NSStringMatchers::EndsWith
            EndsWith( NSString* substr ){ return Impl::NSStringMatchers::EndsWith( substr ); }

    } // namespace Matchers

    using namespace Matchers;

} // namespace Catch

///////////////////////////////////////////////////////////////////////////////
#define OC_TEST_CASE( name, desc )\
+(NSString*) INTERNAL_CATCH_UNIQUE_NAME( Catch_Name_test ) \
{\
return @ name; \
}\
+(NSString*) INTERNAL_CATCH_UNIQUE_NAME( Catch_Description_test ) \
{ \
return @ desc; \
} \
-(void) INTERNAL_CATCH_UNIQUE_NAME( Catch_TestCase_test )

#endif

#ifdef CATCH_CONFIG_RUNNER
// #included from: internal/catch_impl.hpp
#define TWOBLUECUBES_CATCH_IMPL_HPP_INCLUDED

// Collect all the implementation files together here
// These are the equivalent of what would usually be cpp files

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wweak-vtables"
#endif

// #included from: catch_runner.hpp
#define TWOBLUECUBES_CATCH_RUNNER_HPP_INCLUDED

// #included from: internal/catch_commandline.hpp
#define TWOBLUECUBES_CATCH_COMMANDLINE_HPP_INCLUDED

// #included from: catch_config.hpp
#define TWOBLUECUBES_CATCH_CONFIG_HPP_INCLUDED

// #included from: catch_test_spec_parser.hpp
#define TWOBLUECUBES_CATCH_TEST_SPEC_PARSER_HPP_INCLUDED

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpadded"
#endif

// #included from: catch_test_spec.hpp
#define TWOBLUECUBES_CATCH_TEST_SPEC_HPP_INCLUDED

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wpadded"
#endif

#include <string>
#include <vector>

namespace Catch {

    class TestSpec {
        struct Pattern : SharedImpl<> {
            virtual ~Pattern();
            virtual bool matches( TestCaseInfo const& testCase ) const = 0;
        };
        class NamePattern : public Pattern {
            enum WildcardPosition {
                NoWildcard = 0,
                WildcardAtStart = 1,
                WildcardAtEnd = 2,
                WildcardAtBothEnds = WildcardAtStart | WildcardAtEnd
            };

        public:
            NamePattern( std::string const& name ) : m_name( toLower( name ) ), m_wildcard( NoWildcard ) {
                if( startsWith( m_name, "*" ) ) {
                    m_name = m_name.substr( 1 );
                    m_wildcard = WildcardAtStart;
                }
                if( endsWith( m_name, "*" ) ) {
                    m_name = m_name.substr( 0, m_name.size()-1 );
                    m_wildcard = static_cast<WildcardPosition>( m_wildcard | WildcardAtEnd );
                }
            }
            virtual ~NamePattern();
            virtual bool matches( TestCaseInfo const& testCase ) const {
                switch( m_wildcard ) {
                    case NoWildcard:
                        return m_name == toLower( testCase.name );
                    case WildcardAtStart:
                        return endsWith( toLower( testCase.name ), m_name );
                    case WildcardAtEnd:
                        return startsWith( toLower( testCase.name ), m_name );
                    case WildcardAtBothEnds:
                        return contains( toLower( testCase.name ), m_name );
                }

#ifdef __clang__
#pragma clang diagnostic push
#pragma clang diagnostic ignored "-Wunreachable-code"
#endif
                throw std::logic_error( "Unknown enum" );
#ifdef __clang__
#pragma clang diagnostic pop
#endif
            }
        private:
            std::string m_name;
            WildcardPosition m_wildcard;
        };
        class TagPattern : public Pattern {
        public:
            TagPattern( std::string const& tag ) : m_tag( toLower( tag ) ) {}
            virtual ~TagPattern();
            virtual bool matches( TestCaseInfo const& testCase ) const {
                return testCase.lcaseTags.find( m_tag ) != testCase.lcaseTags.end();
            }
        private:
            std::string m_tag;
        };
        class ExcludedPattern : public Pattern {
        public:
            ExcludedPattern( Ptr<Pattern> const& underlyingPattern ) : m_underlyingPattern( underlyingPattern ) {}
            virtual ~ExcludedPattern();
            virtual bool matches( TestCaseInfo const& testCase ) const { return !m_underlyingPattern->matches( testCase ); }
        private:
            Ptr<Pattern> m_underlyingPattern;
        };

        struct Filter {
            std::vector<Ptr<Pattern> > m_patterns;

            bool matches( TestCaseInfo const& testCase ) const {
                // All patterns in a filter must match for the filter to be a match
                for( std::vector<Ptr<Pattern> >::const_iterator it = m_patterns.begin(), itEnd = m_patterns.end(); it != itEnd; ++it )
                    if( !(*it)->matches( testCase ) )
                        return false;
                return true;
            }
        };

    public:
        bool hasFilters() const {
            return !m_filters.empty();
        }
        bool matches( TestCaseInfo const& testCase ) const {
            // A TestSpec matches if any filter matches
            for( std::vector<Filter>::const_iterator it = m_filters.begin(), itEnd = m_filters.end(); it != itEnd; ++it )
                if( it->matches( testCase ) )
                    return true;
            return false;
        }

    private:
        std::vector<Filter> m_filters;

        friend class TestSpecParser;
    };
}

#ifdef __clang__
#pragma clang diagnostic pop
#endif

namespace Catch {

    class TestSpecParser {
        enum Mode{ None, Name, QuotedName, Tag };
        Mode m_mode;
        bool m_exclusion;
        std::size_t m_start, m_pos;
        std::string m_arg;
        TestSpec::Filter m_currentFilter;
        TestSpec m_testSpec;
        ITagAliasRegistry const* m_tagAliases;

    public:
        TestSpecParser( ITagAliasRegistry const& tagAliases ) : m_tagAliases( &tagAliases ) {}

        TestSpecParser& parse( std::string const& arg ) {
            m_mode = None;
            m_exclusion = false;
            m_start = std::string::npos;
            m_arg = m_tagAliases->expandAliases( arg );
            for( m_pos = 0; m_pos < m_arg.size(); ++m_pos )
                visitChar( m_arg[m_pos] );
            if( m_mode == Name )
                addPattern<TestSpec::NamePattern>();
            return *this;
        }
        TestSpec testSpec() {
            addFilter();
            return m_testSpec;
        }
    private:
        void visitChar( char c ) {
            if( m_mode == None ) {
                switch( c ) {
                case ' ': return;
                case '~': m_exclusion = true; return;
                case '[': return startNewMode( Tag, ++m_pos );
                case '"': return startNewMode( QuotedName, ++m_pos );
                default: startNewMode( Name, m_pos ); break;
                }
            }
            if( m_mode == Name ) {
                if( c == ',' ) {
                    addPattern<TestSpec::NamePattern>();
                    addFilter();
                }
                else if( c == '[' ) {
                    if( subString() == "exclude:" )
                        m_exclusion = true;
                    else
                        addPattern<TestSpec::NamePattern>();
                    startNewMode( Tag, ++m_pos );
                }
            }
            else if( m_mode == QuotedName && c == '"' )
                addPattern<TestSpec::NamePattern>();
            else if( m_mode == Tag && c == ']' )
                addPattern<TestSpec::TagPattern>();
        }
        void startNewMode( Mode mode, std::size_t start ) {
            m_mode = mode;
            m_start = start;
        }
        std::string subString() const { return m_arg.substr( m_start, m_pos - m_start ); }
        template<typename T>
        void addPattern() {
            std::string token = subString();
            if( startsWith( token, "exclude:" ) ) {
                m_exclusion = true;
                token = token.substr( 8 );
            }
            if( !token.empty() ) {
                Ptr<TestSpec::Pattern> pattern = new T( token );
                if( m_exclusion )
                    pattern = new TestSpec::ExcludedPattern( pattern );
                m_currentFilter.m_patterns.push_back( pattern );
            }
            m_exclusion = false;
            m_mode = None;
        }
        void addFilter() {
            if( !m_currentFilter.m_patterns.empty() ) {
                m_testSpec.m_filters.push_back( m_currentFilter );
                m_currentFilter = TestSpec::Filter();
            }
        }
    };
    inline TestSpec parseTestSpec( std::string const& arg ) {
        return TestSpecParser( ITagAliasRegistry::get() ).parse( arg ).testSpec();
    }

} // namespace Catch

#ifdef __clang__
#pragma clang diagnostic pop
#endif

// #included from: catch_interfaces_config.h
#define TWOBLUECUBES_CATCH_INTERFACES_CONFIG_H_INCLUDED

#include <iostream>
#include <string>
#include <vector>

namespace Catch {

    struct Verbosity { enum Level {
        NoOutput = 0,
        Quiet,
        Normal
    }; };

    struct WarnAbout { enum What {
        Nothing = 0x00,
        NoAssertions = 0x01
    }; };

    struct ShowDurations { enum OrNot {
        DefaultForReporter,
        Always,
        Never
    }; };

    class TestSpec;

    struct IConfig : IShared {

        virtual ~IConfig();

        virtual bool allowThrows() const = 0;
        virtual std::ostream& stream() const = 0;
        virtual std::string name() const = 0;
        virtual bool includeSuccessfulResults() const = 0;
        virtual bool shouldDebugBreak() const = 0;
        virtual bool warnAboutMissingAssertions() const = 0;
        virtual
            int abortAfter() const = 0;
        virtual bool showInvisibles() const = 0;
        virtual ShowDurations::OrNot showDurations() const = 0;
        virtual TestSpec const& testSpec() const = 0;
    };
}

// #included from: catch_stream.h
#define TWOBLUECUBES_CATCH_STREAM_H_INCLUDED

#include <streambuf>

#ifdef __clang__
#pragma clang diagnostic ignored "-Wpadded"
#endif

namespace Catch {

    class Stream {
    public:
        Stream();
        Stream( std::streambuf* _streamBuf, bool _isOwned );
        void release();

        std::streambuf* streamBuf;

    private:
        bool isOwned;
    };
}

#include <memory>
#include <vector>
#include <string>
#include <iostream>

#ifndef CATCH_CONFIG_CONSOLE_WIDTH
#define CATCH_CONFIG_CONSOLE_WIDTH 80
#endif

namespace Catch {

    struct ConfigData {

        ConfigData()
        :   listTests( false ),
            listTags( false ),
            listReporters( false ),
            listTestNamesOnly( false ),
            showSuccessfulTests( false ),
            shouldDebugBreak( false ),
            noThrow( false ),
            showHelp( false ),
            showInvisibles( false ),
            abortAfter( -1 ),
            verbosity( Verbosity::Normal ),
            warnings( WarnAbout::Nothing ),
            showDurations( ShowDurations::DefaultForReporter )
        {}

        bool listTests;
        bool listTags;
        bool listReporters;
        bool listTestNamesOnly;

        bool showSuccessfulTests;
        bool shouldDebugBreak;
        bool noThrow;
        bool showHelp;
        bool showInvisibles;

        int abortAfter;

        Verbosity::Level verbosity;
        WarnAbout::What warnings;
        ShowDurations::OrNot showDurations;

        std::string reporterName;
        std::string outputFilename;
        std::string name;
        std::string processName;

        std::vector<std::string> testsOrTags;
    };

    class Config : public SharedImpl<IConfig> {
    private:
        Config( Config const& other );
        Config& operator = ( Config const& other );
        virtual void dummy();
    public:

        Config()
        :   m_os( std::cout.rdbuf() )
        {}

        Config( ConfigData const& data )
        :   m_data( data ),
            m_os( std::cout.rdbuf() )
        {
            if( !data.testsOrTags.empty() ) {
                TestSpecParser parser( ITagAliasRegistry::get() );
                for( std::size_t i = 0; i < data.testsOrTags.size(); ++i )
                    parser.parse( data.testsOrTags[i] );
                m_testSpec = parser.testSpec();
            }
        }

        virtual ~Config() {
            m_os.rdbuf( std::cout.rdbuf() );
            m_stream.release();
        }

        void setFilename( std::string const& filename ) {
            m_data.outputFilename = filename;
        }

        std::string const& getFilename() const {
            return m_data.outputFilename ;
        }

        bool listTests() const { return m_data.listTests; }
        bool listTestNamesOnly() const { return m_data.listTestNamesOnly; }
        bool listTags() const { return m_data.listTags; }
        bool listReporters() const { return m_data.listReporters; }

        std::string getProcessName() const { return m_data.processName; }

        bool shouldDebugBreak() const { return m_data.shouldDebugBreak; }

        void setStreamBuf( std::streambuf* buf ) {
            m_os.rdbuf( buf ? buf : std::cout.rdbuf() );
        }

        void useStream( std::string const& streamName ) {
            Stream stream = createStream( streamName );
            setStreamBuf( stream.streamBuf );
            m_stream.release();
            m_stream = stream;
        }

        std::string getReporterName() const { return m_data.reporterName; }

        int abortAfter() const { return m_data.abortAfter; }

        TestSpec const& testSpec() const { return m_testSpec; }

        bool showHelp() const { return m_data.showHelp; }
        bool showInvisibles() const { return m_data.showInvisibles; }

        // IConfig interface
        virtual bool allowThrows() const { return !m_data.noThrow; }
        virtual std::ostream& stream() const { return m_os; }
        virtual std::string name() const { return m_data.name.empty() ? m_data.processName : m_data.name; }
        virtual bool includeSuccessfulResults() const { return m_data.showSuccessfulTests; }
        virtual bool warnAboutMissingAssertions() const { return m_data.warnings & WarnAbout::NoAssertions; }
        virtual ShowDurations::OrNot showDurations() const { return m_data.showDurations; }

    private:
        ConfigData m_data;

        Stream m_stream;
        mutable std::ostream m_os;
        TestSpec m_testSpec;
    };

} // end namespace Catch

// #included from: catch_clara.h
#define TWOBLUECUBES_CATCH_CLARA_H_INCLUDED

// Use Catch's value for console width (store Clara's off to the side, if present)
#ifdef CLARA_CONFIG_CONSOLE_WIDTH
#define CATCH_TEMP_CLARA_CONFIG_CONSOLE_WIDTH CLARA_CONFIG_CONSOLE_WIDTH
#undef CLARA_CONFIG_CONSOLE_WIDTH
#endif
#define CLARA_CONFIG_CONSOLE_WIDTH CATCH_CONFIG_CONSOLE_WIDTH

// Declare Clara inside the Catch namespace
#define STITCH_CLARA_OPEN_NAMESPACE namespace Catch {
// #included from: ../external/clara.h

// Only use header guard if we are not using an outer namespace
#if !defined(TWOBLUECUBES_CLARA_H_INCLUDED) || defined(STITCH_CLARA_OPEN_NAMESPACE)

#ifndef STITCH_CLARA_OPEN_NAMESPACE
#define TWOBLUECUBES_CLARA_H_INCLUDED
#define STITCH_CLARA_OPEN_NAMESPACE
#define STITCH_CLARA_CLOSE_NAMESPACE
#else
#define STITCH_CLARA_CLOSE_NAMESPACE }
#endif

#define STITCH_TBC_TEXT_FORMAT_OPEN_NAMESPACE STITCH_CLARA_OPEN_NAMESPACE

// ----------- #included from tbc_text_format.h -----------

// Only use header guard if we are not using an outer namespace
#if !defined(TBC_TEXT_FORMAT_H_INCLUDED) || defined(STITCH_TBC_TEXT_FORMAT_OUTER_NAMESPACE)
#ifndef STITCH_TBC_TEXT_FORMAT_OUTER_NAMESPACE
#define TBC_TEXT_FORMAT_H_INCLUDED
#endif

#include <string>
#include <vector>
#include <sstream>

// Use optional outer namespace
#ifdef STITCH_TBC_TEXT_FORMAT_OUTER_NAMESPACE
namespace STITCH_TBC_TEXT_FORMAT_OUTER_NAMESPACE {
#endif

namespace Tbc {

#ifdef TBC_TEXT_FORMAT_CONSOLE_WIDTH
    const unsigned int consoleWidth = TBC_TEXT_FORMAT_CONSOLE_WIDTH;
#else
    const unsigned int consoleWidth = 80;
#endif

    struct
        TextAttributes {
        TextAttributes()
        :   initialIndent( std::string::npos ),
            indent( 0 ),
            width( consoleWidth-1 ),
            tabChar( '\t' )
        {}

        TextAttributes& setInitialIndent( std::size_t _value )  { initialIndent = _value; return *this; }
        TextAttributes& setIndent( std::size_t _value )         { indent = _value; return *this; }
        TextAttributes& setWidth( std::size_t _value )          { width = _value; return *this; }
        TextAttributes& setTabChar( char _value )               { tabChar = _value; return *this; }

        std::size_t initialIndent;  // indent of first line, or npos
        std::size_t indent;         // indent of subsequent lines, or all if initialIndent is npos
        std::size_t width;          // maximum width of text, including indent. Longer text will wrap
        char tabChar;               // If this char is seen the indent is changed to current pos
    };

    class Text {
    public:
        Text( std::string const& _str, TextAttributes const& _attr = TextAttributes() )
        : attr( _attr )
        {
            std::string wrappableChars = " [({.,/|\\-";
            std::size_t indent = _attr.initialIndent != std::string::npos
                ? _attr.initialIndent
                : _attr.indent;
            std::string remainder = _str;

            while( !remainder.empty() ) {
                if( lines.size() >= 1000 ) {
                    lines.push_back( "... message truncated due to excessive size" );
                    return;
                }
                std::size_t tabPos = std::string::npos;
                std::size_t width = (std::min)( remainder.size(), _attr.width - indent );
                std::size_t pos = remainder.find_first_of( '\n' );
                if( pos <= width ) {
                    width = pos;
                }
                pos = remainder.find_last_of( _attr.tabChar, width );
                if( pos != std::string::npos ) {
                    tabPos = pos;
                    if( remainder[width] == '\n' )
                        width--;
                    remainder = remainder.substr( 0, tabPos ) + remainder.substr( tabPos+1 );
                }

                if( width == remainder.size() ) {
                    spliceLine( indent, remainder, width );
                }
                else if( remainder[width] == '\n' ) {
                    spliceLine( indent, remainder, width );
                    if( width <= 1 || remainder.size() != 1 )
                        remainder = remainder.substr( 1 );
                    indent = _attr.indent;
                }
                else {
                    pos = remainder.find_last_of( wrappableChars, width );
                    if( pos != std::string::npos && pos > 0 ) {
                        spliceLine( indent, remainder, pos );
                        if( remainder[0] == ' ' )
                            remainder = remainder.substr( 1 );
                    }
                    else {
                        spliceLine( indent, remainder, width-1 );
                        lines.back() += "-";
                    }
                    if( lines.size() == 1 )
                        indent = _attr.indent;
                    if( tabPos != std::string::npos )
                        indent += tabPos;
                }
            }
        }

        void spliceLine( std::size_t _indent, std::string& _remainder, std::size_t _pos ) {
            lines.push_back( std::string( _indent, ' ' ) + _remainder.substr( 0, _pos ) );
            _remainder = _remainder.substr( _pos );
        }

        typedef std::vector<std::string>::const_iterator const_iterator;

        const_iterator begin() const { return lines.begin(); }
        const_iterator end() const { return lines.end(); }
        std::string const& last() const { return lines.back(); }
        std::size_t size() const { return lines.size(); }
        std::string const& operator[]( std::size_t _index ) const { return lines[_index]; }
        std::string toString() const {
            std::ostringstream oss;
            oss << *this;
            return oss.str();
        }

        inline friend std::ostream& operator << ( std::ostream& _stream, Text const& _text ) {
            for( Text::const_iterator it = _text.begin(), itEnd = _text.end();
                it != itEnd; ++it ) {
                if( it != _text.begin() )
                    _stream << "\n";
                _stream << *it;
            }
            return
_stream; } private: std::string str; TextAttributes attr; std::vector lines; }; } // end namespace Tbc #ifdef STITCH_TBC_TEXT_FORMAT_OUTER_NAMESPACE } // end outer namespace #endif #endif // TBC_TEXT_FORMAT_H_INCLUDED // ----------- end of #include from tbc_text_format.h ----------- // ........... back in /Users/philnash/Dev/OSS/Clara/srcs/clara.h #undef STITCH_TBC_TEXT_FORMAT_OPEN_NAMESPACE #include #include #include #include // Use optional outer namespace #ifdef STITCH_CLARA_OPEN_NAMESPACE STITCH_CLARA_OPEN_NAMESPACE #endif namespace Clara { struct UnpositionalTag {}; extern UnpositionalTag _; #ifdef CLARA_CONFIG_MAIN UnpositionalTag _; #endif namespace Detail { #ifdef CLARA_CONSOLE_WIDTH const unsigned int consoleWidth = CLARA_CONFIG_CONSOLE_WIDTH; #else const unsigned int consoleWidth = 80; #endif using namespace Tbc; inline bool startsWith( std::string const& str, std::string const& prefix ) { return str.size() >= prefix.size() && str.substr( 0, prefix.size() ) == prefix; } template struct RemoveConstRef{ typedef T type; }; template struct RemoveConstRef{ typedef T type; }; template struct RemoveConstRef{ typedef T type; }; template struct RemoveConstRef{ typedef T type; }; template struct IsBool { static const bool value = false; }; template<> struct IsBool { static const bool value = true; }; template void convertInto( std::string const& _source, T& _dest ) { std::stringstream ss; ss << _source; ss >> _dest; if( ss.fail() ) throw std::runtime_error( "Unable to convert " + _source + " to destination type" ); } inline void convertInto( std::string const& _source, std::string& _dest ) { _dest = _source; } inline void convertInto( std::string const& _source, bool& _dest ) { std::string sourceLC = _source; std::transform( sourceLC.begin(), sourceLC.end(), sourceLC.begin(), ::tolower ); if( sourceLC == "y" || sourceLC == "1" || sourceLC == "true" || sourceLC == "yes" || sourceLC == "on" ) _dest = true; else if( sourceLC == "n" || sourceLC == "0" || sourceLC == 
"false" || sourceLC == "no" || sourceLC == "off" )
                _dest = false;
            else
                throw std::runtime_error( "Expected a boolean value but did not recognise:\n '" + _source + "'" );
        }
        inline void convertInto( bool _source, bool& _dest ) {
            _dest = _source;
        }
        template<typename T>
        inline void convertInto( bool, T& ) {
            throw std::runtime_error( "Invalid conversion" );
        }

        template<typename ConfigT>
        struct IArgFunction {
            virtual ~IArgFunction() {}
# ifdef CATCH_CPP11_OR_GREATER
            IArgFunction() = default;
            IArgFunction( IArgFunction const& ) = default;
# endif
            virtual void set( ConfigT& config, std::string const& value ) const = 0;
            virtual void setFlag( ConfigT& config ) const = 0;
            virtual bool takesArg() const = 0;
            virtual IArgFunction* clone() const = 0;
        };

        template<typename ConfigT>
        class BoundArgFunction {
        public:
            BoundArgFunction() : functionObj( NULL ) {}
            BoundArgFunction( IArgFunction<ConfigT>* _functionObj ) : functionObj( _functionObj ) {}
            BoundArgFunction( BoundArgFunction const& other )
            : functionObj( other.functionObj ? other.functionObj->clone() : NULL )
            {}
            BoundArgFunction& operator = ( BoundArgFunction const& other ) {
                IArgFunction<ConfigT>* newFunctionObj = other.functionObj ? other.functionObj->clone() : NULL;
                delete functionObj;
                functionObj = newFunctionObj;
                return *this;
            }
            ~BoundArgFunction() { delete functionObj; }

            void set( ConfigT& config, std::string const& value ) const {
                functionObj->set( config, value );
            }
            void setFlag( ConfigT& config ) const {
                functionObj->setFlag( config );
            }
            bool takesArg() const { return functionObj->takesArg(); }

            bool isSet() const {
                return functionObj != NULL;
            }
        private:
            IArgFunction<ConfigT>* functionObj;
        };

        template<typename C>
        struct NullBinder : IArgFunction<C>{
            virtual void set( C&, std::string const& ) const {}
            virtual void setFlag( C& ) const {}
            virtual bool takesArg() const { return true; }
            virtual IArgFunction<C>* clone() const { return new NullBinder( *this ); }
        };

        template<typename C, typename M>
        struct BoundDataMember : IArgFunction<C>{
            BoundDataMember( M C::* _member ) : member( _member ) {}
            virtual void set( C& p, std::string const& stringValue ) const {
                convertInto( stringValue, p.*member );
            }
            virtual void setFlag( C& p ) const {
                convertInto( true, p.*member );
            }
            virtual bool takesArg() const { return !IsBool<M>::value; }
            virtual IArgFunction<C>* clone() const { return new BoundDataMember( *this ); }
            M C::* member;
        };

        template<typename C, typename M>
        struct BoundUnaryMethod : IArgFunction<C>{
            BoundUnaryMethod( void (C::*_member)( M ) ) : member( _member ) {}
            virtual void set( C& p, std::string const& stringValue ) const {
                typename RemoveConstRef<M>::type value;
                convertInto( stringValue, value );
                (p.*member)( value );
            }
            virtual void setFlag( C& p ) const {
                typename RemoveConstRef<M>::type value;
                convertInto( true, value );
                (p.*member)( value );
            }
            virtual bool takesArg() const { return !IsBool<M>::value; }
            virtual IArgFunction<C>* clone() const { return new BoundUnaryMethod( *this ); }
            void (C::*member)( M );
        };

        template<typename C>
        struct BoundNullaryMethod : IArgFunction<C>{
            BoundNullaryMethod( void (C::*_member)() ) : member( _member ) {}
            virtual void set( C& p, std::string const& stringValue ) const {
                bool value;
                convertInto( stringValue, value );
                if( value )
                    (p.*member)();
            }
            virtual void setFlag( C& p ) const {
                (p.*member)();
            }
            virtual bool takesArg() const { return false; }
            virtual IArgFunction<C>* clone() const { return new BoundNullaryMethod( *this ); }
            void (C::*member)();
        };

        template<typename C>
        struct BoundUnaryFunction : IArgFunction<C>{
            BoundUnaryFunction( void (*_function)( C& ) ) : function( _function ) {}
            virtual void set( C& obj, std::string const& stringValue ) const {
                bool value;
                convertInto( stringValue, value );
                if( value )
                    function( obj );
            }
            virtual void setFlag( C& p ) const {
                function( p );
            }
            virtual bool takesArg() const { return false; }
            virtual IArgFunction<C>* clone() const { return new BoundUnaryFunction( *this ); }
            void (*function)( C& );
        };

        template<typename C, typename T>
        struct BoundBinaryFunction : IArgFunction<C>{
            BoundBinaryFunction( void (*_function)( C&, T ) ) : function( _function ) {}
            virtual void set( C& obj, std::string const& stringValue ) const {
                typename RemoveConstRef<T>::type value;
                convertInto( stringValue, value );
                function( obj, value );
            }
            virtual void setFlag( C& obj ) const {
                typename RemoveConstRef<T>::type value;
                convertInto( true, value );
                function( obj, value );
            }
            virtual bool takesArg() const { return !IsBool<T>::value; }
            virtual IArgFunction<C>* clone() const { return new BoundBinaryFunction( *this ); }
            void (*function)( C&, T );
        };

    } // namespace Detail

    struct Parser {
        Parser() : separators( " \t=:" ) {}

        struct Token {
            enum Type { Positional, ShortOpt, LongOpt };
            Token( Type _type, std::string const& _data ) : type( _type ), data( _data ) {}
            Type type;
            std::string data;
        };

        void parseIntoTokens( int argc, char const * const * argv, std::vector<Parser::Token>& tokens ) const {
            const std::string doubleDash = "--";
            for( int i = 1; i < argc && argv[i] != doubleDash; ++i )
                parseIntoTokens( argv[i] , tokens);
        }
        void parseIntoTokens( std::string arg, std::vector<Parser::Token>& tokens ) const {
            while( !arg.empty() ) {
                Parser::Token token( Parser::Token::Positional, arg );
                arg = "";
                if( token.data[0] == '-' ) {
                    if( token.data.size() > 1 && token.data[1] == '-' ) {
                        token = Parser::Token(
Parser::Token::LongOpt, token.data.substr( 2 ) ); } else { token = Parser::Token( Parser::Token::ShortOpt, token.data.substr( 1 ) ); if( token.data.size() > 1 && separators.find( token.data[1] ) == std::string::npos ) { arg = "-" + token.data.substr( 1 ); token.data = token.data.substr( 0, 1 ); } } } if( token.type != Parser::Token::Positional ) { std::size_t pos = token.data.find_first_of( separators ); if( pos != std::string::npos ) { arg = token.data.substr( pos+1 ); token.data = token.data.substr( 0, pos ); } } tokens.push_back( token ); } } std::string separators; }; template struct CommonArgProperties { CommonArgProperties() {} CommonArgProperties( Detail::BoundArgFunction const& _boundField ) : boundField( _boundField ) {} Detail::BoundArgFunction boundField; std::string description; std::string detail; std::string placeholder; // Only value if boundField takes an arg bool takesArg() const { return !placeholder.empty(); } void validate() const { if( !boundField.isSet() ) throw std::logic_error( "option not bound" ); } }; struct OptionArgProperties { std::vector shortNames; std::string longName; bool hasShortName( std::string const& shortName ) const { return std::find( shortNames.begin(), shortNames.end(), shortName ) != shortNames.end(); } bool hasLongName( std::string const& _longName ) const { return _longName == longName; } }; struct PositionalArgProperties { PositionalArgProperties() : position( -1 ) {} int position; // -1 means non-positional (floating) bool isFixedPositional() const { return position != -1; } }; template class CommandLine { struct Arg : CommonArgProperties, OptionArgProperties, PositionalArgProperties { Arg() {} Arg( Detail::BoundArgFunction const& _boundField ) : CommonArgProperties( _boundField ) {} using CommonArgProperties::placeholder; // !TBD std::string dbgName() const { if( !longName.empty() ) return "--" + longName; if( !shortNames.empty() ) return "-" + shortNames[0]; return "positional args"; } std::string commands() const 
{
                std::ostringstream oss;
                bool first = true;
                std::vector<std::string>::const_iterator it = shortNames.begin(), itEnd = shortNames.end();
                for(; it != itEnd; ++it ) {
                    if( first )
                        first = false;
                    else
                        oss << ", ";
                    oss << "-" << *it;
                }
                if( !longName.empty() ) {
                    if( !first )
                        oss << ", ";
                    oss << "--" << longName;
                }
                if( !placeholder.empty() )
                    oss << " <" << placeholder << ">";
                return oss.str();
            }
        };

        // NOTE: std::auto_ptr is deprecated in c++11/c++0x
#if defined(__cplusplus) && __cplusplus > 199711L
        typedef std::unique_ptr<Arg> ArgAutoPtr;
#else
        typedef std::auto_ptr<Arg> ArgAutoPtr;
#endif

        friend void addOptName( Arg& arg, std::string const& optName )
        {
            if( optName.empty() )
                return;
            if( Detail::startsWith( optName, "--" ) ) {
                if( !arg.longName.empty() )
                    throw std::logic_error( "Only one long opt may be specified. '"
                        + arg.longName
                        + "' already specified, now attempting to add '"
                        + optName + "'" );
                arg.longName = optName.substr( 2 );
            }
            else if( Detail::startsWith( optName, "-" ) )
                arg.shortNames.push_back( optName.substr( 1 ) );
            else
                throw std::logic_error( "option must begin with - or --. Option was: '" + optName + "'" );
        }
        friend void setPositionalArg( Arg& arg, int position )
        {
            arg.position = position;
        }

        class ArgBuilder {
        public:
            ArgBuilder( Arg* arg ) : m_arg( arg ) {}

            // Bind a non-boolean data member (requires placeholder string)
            template<typename C, typename M>
            void bind( M C::* field, std::string const& placeholder ) {
                m_arg->boundField = new Detail::BoundDataMember<C,M>( field );
                m_arg->placeholder = placeholder;
            }
            // Bind a boolean data member (no placeholder required)
            template<typename C>
            void bind( bool C::* field ) {
                m_arg->boundField = new Detail::BoundDataMember<C,bool>( field );
            }

            // Bind a method taking a single, non-boolean argument (requires a placeholder string)
            template<typename C, typename M>
            void bind( void (C::* unaryMethod)( M ), std::string const& placeholder ) {
                m_arg->boundField = new Detail::BoundUnaryMethod<C,M>( unaryMethod );
                m_arg->placeholder = placeholder;
            }

            // Bind a method taking a single, boolean argument (no placeholder string required)
            template<typename C>
            void bind( void (C::* unaryMethod)( bool ) ) {
                m_arg->boundField = new Detail::BoundUnaryMethod<C,bool>( unaryMethod );
            }

            // Bind a method that takes no arguments (will be called if opt is present)
            template<typename C>
            void bind( void (C::* nullaryMethod)() ) {
                m_arg->boundField = new Detail::BoundNullaryMethod<C>( nullaryMethod );
            }

            // Bind a free function taking a single argument - the object to operate on (no placeholder string required)
            template<typename C>
            void bind( void (* unaryFunction)( C& ) ) {
                m_arg->boundField = new Detail::BoundUnaryFunction<C>( unaryFunction );
            }

            // Bind a free function taking two arguments - the object to operate on and a value (requires a placeholder string)
            template<typename C, typename T>
            void bind( void (* binaryFunction)( C&, T ), std::string const& placeholder ) {
                m_arg->boundField = new Detail::BoundBinaryFunction<C, T>( binaryFunction );
                m_arg->placeholder = placeholder;
            }

            ArgBuilder& describe( std::string const& description ) {
                m_arg->description = description;
                return *this;
            }
            ArgBuilder& detail( std::string const& detail ) {
                m_arg->detail = detail;
                return *this;
            }

        protected:
            Arg*
m_arg; }; class OptBuilder : public ArgBuilder { public: OptBuilder( Arg* arg ) : ArgBuilder( arg ) {} OptBuilder( OptBuilder& other ) : ArgBuilder( other ) {} OptBuilder& operator[]( std::string const& optName ) { addOptName( *ArgBuilder::m_arg, optName ); return *this; } }; public: CommandLine() : m_boundProcessName( new Detail::NullBinder() ), m_highestSpecifiedArgPosition( 0 ), m_throwOnUnrecognisedTokens( false ) {} CommandLine( CommandLine const& other ) : m_boundProcessName( other.m_boundProcessName ), m_options ( other.m_options ), m_positionalArgs( other.m_positionalArgs ), m_highestSpecifiedArgPosition( other.m_highestSpecifiedArgPosition ), m_throwOnUnrecognisedTokens( other.m_throwOnUnrecognisedTokens ) { if( other.m_floatingArg.get() ) m_floatingArg = ArgAutoPtr( new Arg( *other.m_floatingArg ) ); } CommandLine& setThrowOnUnrecognisedTokens( bool shouldThrow = true ) { m_throwOnUnrecognisedTokens = shouldThrow; return *this; } OptBuilder operator[]( std::string const& optName ) { m_options.push_back( Arg() ); addOptName( m_options.back(), optName ); OptBuilder builder( &m_options.back() ); return builder; } ArgBuilder operator[]( int position ) { m_positionalArgs.insert( std::make_pair( position, Arg() ) ); if( position > m_highestSpecifiedArgPosition ) m_highestSpecifiedArgPosition = position; setPositionalArg( m_positionalArgs[position], position ); ArgBuilder builder( &m_positionalArgs[position] ); return builder; } // Invoke this with the _ instance ArgBuilder operator[]( UnpositionalTag ) { if( m_floatingArg.get() ) throw std::logic_error( "Only one unpositional argument can be added" ); m_floatingArg = ArgAutoPtr( new Arg() ); ArgBuilder builder( m_floatingArg.get() ); return builder; } template void bindProcessName( M C::* field ) { m_boundProcessName = new Detail::BoundDataMember( field ); } template void bindProcessName( void (C::*_unaryMethod)( M ) ) { m_boundProcessName = new Detail::BoundUnaryMethod( _unaryMethod ); } void optUsage( 
std::ostream& os, std::size_t indent = 0, std::size_t width = Detail::consoleWidth ) const { typename std::vector::const_iterator itBegin = m_options.begin(), itEnd = m_options.end(), it; std::size_t maxWidth = 0; for( it = itBegin; it != itEnd; ++it ) maxWidth = (std::max)( maxWidth, it->commands().size() ); for( it = itBegin; it != itEnd; ++it ) { Detail::Text usage( it->commands(), Detail::TextAttributes() .setWidth( maxWidth+indent ) .setIndent( indent ) ); Detail::Text desc( it->description, Detail::TextAttributes() .setWidth( width - maxWidth - 3 ) ); for( std::size_t i = 0; i < (std::max)( usage.size(), desc.size() ); ++i ) { std::string usageCol = i < usage.size() ? usage[i] : ""; os << usageCol; if( i < desc.size() && !desc[i].empty() ) os << std::string( indent + 2 + maxWidth - usageCol.size(), ' ' ) << desc[i]; os << "\n"; } } } std::string optUsage() const { std::ostringstream oss; optUsage( oss ); return oss.str(); } void argSynopsis( std::ostream& os ) const { for( int i = 1; i <= m_highestSpecifiedArgPosition; ++i ) { if( i > 1 ) os << " "; typename std::map::const_iterator it = m_positionalArgs.find( i ); if( it != m_positionalArgs.end() ) os << "<" << it->second.placeholder << ">"; else if( m_floatingArg.get() ) os << "<" << m_floatingArg->placeholder << ">"; else throw std::logic_error( "non consecutive positional arguments with no floating args" ); } // !TBD No indication of mandatory args if( m_floatingArg.get() ) { if( m_highestSpecifiedArgPosition > 1 ) os << " "; os << "[<" << m_floatingArg->placeholder << "> ...]"; } } std::string argSynopsis() const { std::ostringstream oss; argSynopsis( oss ); return oss.str(); } void usage( std::ostream& os, std::string const& procName ) const { validate(); os << "usage:\n " << procName << " "; argSynopsis( os ); if( !m_options.empty() ) { os << " [options]\n\nwhere options are: \n"; optUsage( os, 2 ); } os << "\n"; } std::string usage( std::string const& procName ) const { std::ostringstream oss; usage( 
oss, procName ); return oss.str(); } ConfigT parse( int argc, char const * const * argv ) const { ConfigT config; parseInto( argc, argv, config ); return config; } std::vector parseInto( int argc, char const * const * argv, ConfigT& config ) const { std::string processName = argv[0]; std::size_t lastSlash = processName.find_last_of( "/\\" ); if( lastSlash != std::string::npos ) processName = processName.substr( lastSlash+1 ); m_boundProcessName.set( config, processName ); std::vector tokens; Parser parser; parser.parseIntoTokens( argc, argv, tokens ); return populate( tokens, config ); } std::vector populate( std::vector const& tokens, ConfigT& config ) const { validate(); std::vector unusedTokens = populateOptions( tokens, config ); unusedTokens = populateFixedArgs( unusedTokens, config ); unusedTokens = populateFloatingArgs( unusedTokens, config ); return unusedTokens; } std::vector populateOptions( std::vector const& tokens, ConfigT& config ) const { std::vector unusedTokens; std::vector errors; for( std::size_t i = 0; i < tokens.size(); ++i ) { Parser::Token const& token = tokens[i]; typename std::vector::const_iterator it = m_options.begin(), itEnd = m_options.end(); for(; it != itEnd; ++it ) { Arg const& arg = *it; try { if( ( token.type == Parser::Token::ShortOpt && arg.hasShortName( token.data ) ) || ( token.type == Parser::Token::LongOpt && arg.hasLongName( token.data ) ) ) { if( arg.takesArg() ) { if( i == tokens.size()-1 || tokens[i+1].type != Parser::Token::Positional ) errors.push_back( "Expected argument to option: " + token.data ); else arg.boundField.set( config, tokens[++i].data ); } else { arg.boundField.setFlag( config ); } break; } } catch( std::exception& ex ) { errors.push_back( std::string( ex.what() ) + "\n- while parsing: (" + arg.commands() + ")" ); } } if( it == itEnd ) { if( token.type == Parser::Token::Positional || !m_throwOnUnrecognisedTokens ) unusedTokens.push_back( token ); else if( m_throwOnUnrecognisedTokens ) errors.push_back( 
"unrecognised option: " + token.data ); } } if( !errors.empty() ) { std::ostringstream oss; for( std::vector::const_iterator it = errors.begin(), itEnd = errors.end(); it != itEnd; ++it ) { if( it != errors.begin() ) oss << "\n"; oss << *it; } throw std::runtime_error( oss.str() ); } return unusedTokens; } std::vector populateFixedArgs( std::vector const& tokens, ConfigT& config ) const { std::vector unusedTokens; int position = 1; for( std::size_t i = 0; i < tokens.size(); ++i ) { Parser::Token const& token = tokens[i]; typename std::map::const_iterator it = m_positionalArgs.find( position ); if( it != m_positionalArgs.end() ) it->second.boundField.set( config, token.data ); else unusedTokens.push_back( token ); if( token.type == Parser::Token::Positional ) position++; } return unusedTokens; } std::vector populateFloatingArgs( std::vector const& tokens, ConfigT& config ) const { if( !m_floatingArg.get() ) return tokens; std::vector unusedTokens; for( std::size_t i = 0; i < tokens.size(); ++i ) { Parser::Token const& token = tokens[i]; if( token.type == Parser::Token::Positional ) m_floatingArg->boundField.set( config, token.data ); else unusedTokens.push_back( token ); } return unusedTokens; } void validate() const { if( m_options.empty() && m_positionalArgs.empty() && !m_floatingArg.get() ) throw std::logic_error( "No options or arguments specified" ); for( typename std::vector::const_iterator it = m_options.begin(), itEnd = m_options.end(); it != itEnd; ++it ) it->validate(); } private: Detail::BoundArgFunction m_boundProcessName; std::vector m_options; std::map m_positionalArgs; ArgAutoPtr m_floatingArg; int m_highestSpecifiedArgPosition; bool m_throwOnUnrecognisedTokens; }; } // end namespace Clara STITCH_CLARA_CLOSE_NAMESPACE #undef STITCH_CLARA_OPEN_NAMESPACE #undef STITCH_CLARA_CLOSE_NAMESPACE #endif // TWOBLUECUBES_CLARA_H_INCLUDED #undef STITCH_CLARA_OPEN_NAMESPACE // Restore Clara's value for console width, if present #ifdef 
CATCH_TEMP_CLARA_CONFIG_CONSOLE_WIDTH #define CLARA_CONFIG_CONSOLE_WIDTH CATCH_TEMP_CLARA_CONFIG_CONSOLE_WIDTH #undef CATCH_TEMP_CLARA_CONFIG_CONSOLE_WIDTH #endif #include namespace Catch { inline void abortAfterFirst( ConfigData& config ) { config.abortAfter = 1; } inline void abortAfterX( ConfigData& config, int x ) { if( x < 1 ) throw std::runtime_error( "Value after -x or --abortAfter must be greater than zero" ); config.abortAfter = x; } inline void addTestOrTags( ConfigData& config, std::string const& _testSpec ) { config.testsOrTags.push_back( _testSpec ); } inline void addWarning( ConfigData& config, std::string const& _warning ) { if( _warning == "NoAssertions" ) config.warnings = static_cast( config.warnings | WarnAbout::NoAssertions ); else throw std::runtime_error( "Unrecognised warning: '" + _warning + "'" ); } inline void setVerbosity( ConfigData& config, int level ) { // !TBD: accept strings? config.verbosity = static_cast( level ); } inline void setShowDurations( ConfigData& config, bool _showDurations ) { config.showDurations = _showDurations ? 
ShowDurations::Always : ShowDurations::Never;
    }
    inline void loadTestNamesFromFile( ConfigData& config, std::string const& _filename ) {
        std::ifstream f( _filename.c_str() );
        if( !f.is_open() )
            throw std::domain_error( "Unable to load input file: " + _filename );

        std::string line;
        while( std::getline( f, line ) ) {
            line = trim(line);
            if( !line.empty() && !startsWith( line, "#" ) )
                addTestOrTags( config, "\"" + line + "\"," );
        }
    }

    inline Clara::CommandLine<ConfigData> makeCommandLineParser() {

        using namespace Clara;
        CommandLine<ConfigData> cli;

        cli.bindProcessName( &ConfigData::processName );

        cli["-?"]["-h"]["--help"]
            .describe( "display usage information" )
            .bind( &ConfigData::showHelp );

        cli["-l"]["--list-tests"]
            .describe( "list all/matching test cases" )
            .bind( &ConfigData::listTests );

        cli["-t"]["--list-tags"]
            .describe( "list all/matching tags" )
            .bind( &ConfigData::listTags );

        cli["-s"]["--success"]
            .describe( "include successful tests in output" )
            .bind( &ConfigData::showSuccessfulTests );

        cli["-b"]["--break"]
            .describe( "break into debugger on failure" )
            .bind( &ConfigData::shouldDebugBreak );

        cli["-e"]["--nothrow"]
            .describe( "skip exception tests" )
            .bind( &ConfigData::noThrow );

        cli["-i"]["--invisibles"]
            .describe( "show invisibles (tabs, newlines)" )
            .bind( &ConfigData::showInvisibles );

        cli["-o"]["--out"]
            .describe( "output filename" )
            .bind( &ConfigData::outputFilename, "filename" );

        cli["-r"]["--reporter"]
//            .placeholder( "name[:filename]" )
            .describe( "reporter to use (defaults to console)" )
            .bind( &ConfigData::reporterName, "name" );

        cli["-n"]["--name"]
            .describe( "suite name" )
            .bind( &ConfigData::name, "name" );

        cli["-a"]["--abort"]
            .describe( "abort at first failure" )
            .bind( &abortAfterFirst );

        cli["-x"]["--abortx"]
            .describe( "abort after x failures" )
            .bind( &abortAfterX, "no. failures" );

        cli["-w"]["--warn"]
            .describe( "enable warnings" )
            .bind( &addWarning, "warning name" );

// - needs updating if reinstated
//        cli.into( &setVerbosity )
//            .describe( "level of verbosity (0=no output)" )
//            .shortOpt( "v")
//            .longOpt( "verbosity" )
//            .placeholder( "level" );

        cli[_]
            .describe( "which test or tests to use" )
            .bind( &addTestOrTags, "test name, pattern or tags" );

        cli["-d"]["--durations"]
            .describe( "show test durations" )
            .bind( &setShowDurations, "yes/no" );

        cli["-f"]["--input-file"]
            .describe( "load test names to run from a file" )
            .bind( &loadTestNamesFromFile, "filename" );

        // Less common commands which don't have a short form
        cli["--list-test-names-only"]
            .describe( "list all/matching test cases names only" )
            .bind( &ConfigData::listTestNamesOnly );

        cli["--list-reporters"]
            .describe( "list all reporters" )
            .bind( &ConfigData::listReporters );

        return cli;
    }

} // end namespace Catch

// #included from: internal/catch_list.hpp
#define TWOBLUECUBES_CATCH_LIST_HPP_INCLUDED

// #included from: catch_text.h
#define TWOBLUECUBES_CATCH_TEXT_H_INCLUDED

#define TBC_TEXT_FORMAT_CONSOLE_WIDTH CATCH_CONFIG_CONSOLE_WIDTH

#define CLICHE_TBC_TEXT_FORMAT_OUTER_NAMESPACE Catch
// #included from: ../external/tbc_text_format.h
// Only use header guard if we are not using an outer namespace
#ifndef CLICHE_TBC_TEXT_FORMAT_OUTER_NAMESPACE
# ifdef TWOBLUECUBES_TEXT_FORMAT_H_INCLUDED
#  ifndef TWOBLUECUBES_TEXT_FORMAT_H_ALREADY_INCLUDED
#   define TWOBLUECUBES_TEXT_FORMAT_H_ALREADY_INCLUDED
#  endif
# else
#  define TWOBLUECUBES_TEXT_FORMAT_H_INCLUDED
# endif
#endif
#ifndef TWOBLUECUBES_TEXT_FORMAT_H_ALREADY_INCLUDED
#include <string>
#include <vector>
#include <sstream>

// Use optional outer namespace
#ifdef CLICHE_TBC_TEXT_FORMAT_OUTER_NAMESPACE
namespace CLICHE_TBC_TEXT_FORMAT_OUTER_NAMESPACE {
#endif

namespace Tbc {

#ifdef TBC_TEXT_FORMAT_CONSOLE_WIDTH
    const unsigned int consoleWidth = TBC_TEXT_FORMAT_CONSOLE_WIDTH;
#else
    const unsigned int consoleWidth = 80;
#endif

    struct TextAttributes
{ TextAttributes() : initialIndent( std::string::npos ), indent( 0 ), width( consoleWidth-1 ), tabChar( '\t' ) {} TextAttributes& setInitialIndent( std::size_t _value ) { initialIndent = _value; return *this; } TextAttributes& setIndent( std::size_t _value ) { indent = _value; return *this; } TextAttributes& setWidth( std::size_t _value ) { width = _value; return *this; } TextAttributes& setTabChar( char _value ) { tabChar = _value; return *this; } std::size_t initialIndent; // indent of first line, or npos std::size_t indent; // indent of subsequent lines, or all if initialIndent is npos std::size_t width; // maximum width of text, including indent. Longer text will wrap char tabChar; // If this char is seen the indent is changed to current pos }; class Text { public: Text( std::string const& _str, TextAttributes const& _attr = TextAttributes() ) : attr( _attr ) { std::string wrappableChars = " [({.,/|\\-"; std::size_t indent = _attr.initialIndent != std::string::npos ? _attr.initialIndent : _attr.indent; std::string remainder = _str; while( !remainder.empty() ) { if( lines.size() >= 1000 ) { lines.push_back( "... 
message truncated due to excessive size" ); return; } std::size_t tabPos = std::string::npos; std::size_t width = (std::min)( remainder.size(), _attr.width - indent ); std::size_t pos = remainder.find_first_of( '\n' ); if( pos <= width ) { width = pos; } pos = remainder.find_last_of( _attr.tabChar, width ); if( pos != std::string::npos ) { tabPos = pos; if( remainder[width] == '\n' ) width--; remainder = remainder.substr( 0, tabPos ) + remainder.substr( tabPos+1 ); } if( width == remainder.size() ) { spliceLine( indent, remainder, width ); } else if( remainder[width] == '\n' ) { spliceLine( indent, remainder, width ); if( width <= 1 || remainder.size() != 1 ) remainder = remainder.substr( 1 ); indent = _attr.indent; } else { pos = remainder.find_last_of( wrappableChars, width ); if( pos != std::string::npos && pos > 0 ) { spliceLine( indent, remainder, pos ); if( remainder[0] == ' ' ) remainder = remainder.substr( 1 ); } else { spliceLine( indent, remainder, width-1 ); lines.back() += "-"; } if( lines.size() == 1 ) indent = _attr.indent; if( tabPos != std::string::npos ) indent += tabPos; } } } void spliceLine( std::size_t _indent, std::string& _remainder, std::size_t _pos ) { lines.push_back( std::string( _indent, ' ' ) + _remainder.substr( 0, _pos ) ); _remainder = _remainder.substr( _pos ); } typedef std::vector::const_iterator const_iterator; const_iterator begin() const { return lines.begin(); } const_iterator end() const { return lines.end(); } std::string const& last() const { return lines.back(); } std::size_t size() const { return lines.size(); } std::string const& operator[]( std::size_t _index ) const { return lines[_index]; } std::string toString() const { std::ostringstream oss; oss << *this; return oss.str(); } inline friend std::ostream& operator << ( std::ostream& _stream, Text const& _text ) { for( Text::const_iterator it = _text.begin(), itEnd = _text.end(); it != itEnd; ++it ) { if( it != _text.begin() ) _stream << "\n"; _stream << *it; } return 
_stream; } private: std::string str; TextAttributes attr; std::vector lines; }; } // end namespace Tbc #ifdef CLICHE_TBC_TEXT_FORMAT_OUTER_NAMESPACE } // end outer namespace #endif #endif // TWOBLUECUBES_TEXT_FORMAT_H_ALREADY_INCLUDED #undef CLICHE_TBC_TEXT_FORMAT_OUTER_NAMESPACE namespace Catch { using Tbc::Text; using Tbc::TextAttributes; } // #included from: catch_console_colour.hpp #define TWOBLUECUBES_CATCH_CONSOLE_COLOUR_HPP_INCLUDED namespace Catch { namespace Detail { struct IColourImpl; } struct Colour { enum Code { None = 0, White, Red, Green, Blue, Cyan, Yellow, Grey, Bright = 0x10, BrightRed = Bright | Red, BrightGreen = Bright | Green, LightGrey = Bright | Grey, BrightWhite = Bright | White, // By intention FileName = LightGrey, Warning = Yellow, ResultError = BrightRed, ResultSuccess = BrightGreen, ResultExpectedFailure = Warning, Error = BrightRed, Success = Green, OriginalExpression = Cyan, ReconstructedExpression = Yellow, SecondaryText = LightGrey, Headers = White }; // Use constructed object for RAII guard Colour( Code _colourCode ); Colour( Colour const& other ); ~Colour(); // Use static method for one-shot changes static void use( Code _colourCode ); private: static Detail::IColourImpl* impl(); bool m_moved; }; inline std::ostream& operator << ( std::ostream& os, Colour const& ) { return os; } } // end namespace Catch // #included from: catch_interfaces_reporter.h #define TWOBLUECUBES_CATCH_INTERFACES_REPORTER_H_INCLUDED #include #include #include #include namespace Catch { struct ReporterConfig { explicit ReporterConfig( Ptr const& _fullConfig ) : m_stream( &_fullConfig->stream() ), m_fullConfig( _fullConfig ) {} ReporterConfig( Ptr const& _fullConfig, std::ostream& _stream ) : m_stream( &_stream ), m_fullConfig( _fullConfig ) {} std::ostream& stream() const { return *m_stream; } Ptr fullConfig() const { return m_fullConfig; } private: std::ostream* m_stream; Ptr m_fullConfig; }; struct ReporterPreferences { ReporterPreferences() : 
shouldRedirectStdOut( false ) {} bool shouldRedirectStdOut; }; template struct LazyStat : Option { LazyStat() : used( false ) {} LazyStat& operator=( T const& _value ) { Option::operator=( _value ); used = false; return *this; } void reset() { Option::reset(); used = false; } bool used; }; struct TestRunInfo { TestRunInfo( std::string const& _name ) : name( _name ) {} std::string name; }; struct GroupInfo { GroupInfo( std::string const& _name, std::size_t _groupIndex, std::size_t _groupsCount ) : name( _name ), groupIndex( _groupIndex ), groupsCounts( _groupsCount ) {} std::string name; std::size_t groupIndex; std::size_t groupsCounts; }; struct AssertionStats { AssertionStats( AssertionResult const& _assertionResult, std::vector const& _infoMessages, Totals const& _totals ) : assertionResult( _assertionResult ), infoMessages( _infoMessages ), totals( _totals ) { if( assertionResult.hasMessage() ) { // Copy message into messages list. // !TBD This should have been done earlier, somewhere MessageBuilder builder( assertionResult.getTestMacroName(), assertionResult.getSourceInfo(), assertionResult.getResultType() ); builder << assertionResult.getMessage(); builder.m_info.message = builder.m_stream.str(); infoMessages.push_back( builder.m_info ); } } virtual ~AssertionStats(); # ifdef CATCH_CPP11_OR_GREATER AssertionStats( AssertionStats const& ) = default; AssertionStats( AssertionStats && ) = default; AssertionStats& operator = ( AssertionStats const& ) = default; AssertionStats& operator = ( AssertionStats && ) = default; # endif AssertionResult assertionResult; std::vector infoMessages; Totals totals; }; struct SectionStats { SectionStats( SectionInfo const& _sectionInfo, Counts const& _assertions, double _durationInSeconds, bool _missingAssertions ) : sectionInfo( _sectionInfo ), assertions( _assertions ), durationInSeconds( _durationInSeconds ), missingAssertions( _missingAssertions ) {} virtual ~SectionStats(); # ifdef CATCH_CPP11_OR_GREATER SectionStats( 
SectionStats const& ) = default; SectionStats( SectionStats && ) = default; SectionStats& operator = ( SectionStats const& ) = default; SectionStats& operator = ( SectionStats && ) = default; # endif SectionInfo sectionInfo; Counts assertions; double durationInSeconds; bool missingAssertions; }; struct TestCaseStats { TestCaseStats( TestCaseInfo const& _testInfo, Totals const& _totals, std::string const& _stdOut, std::string const& _stdErr, bool _aborting ) : testInfo( _testInfo ), totals( _totals ), stdOut( _stdOut ), stdErr( _stdErr ), aborting( _aborting ) {} virtual ~TestCaseStats(); # ifdef CATCH_CPP11_OR_GREATER TestCaseStats( TestCaseStats const& ) = default; TestCaseStats( TestCaseStats && ) = default; TestCaseStats& operator = ( TestCaseStats const& ) = default; TestCaseStats& operator = ( TestCaseStats && ) = default; # endif TestCaseInfo testInfo; Totals totals; std::string stdOut; std::string stdErr; bool aborting; }; struct TestGroupStats { TestGroupStats( GroupInfo const& _groupInfo, Totals const& _totals, bool _aborting ) : groupInfo( _groupInfo ), totals( _totals ), aborting( _aborting ) {} TestGroupStats( GroupInfo const& _groupInfo ) : groupInfo( _groupInfo ), aborting( false ) {} virtual ~TestGroupStats(); # ifdef CATCH_CPP11_OR_GREATER TestGroupStats( TestGroupStats const& ) = default; TestGroupStats( TestGroupStats && ) = default; TestGroupStats& operator = ( TestGroupStats const& ) = default; TestGroupStats& operator = ( TestGroupStats && ) = default; # endif GroupInfo groupInfo; Totals totals; bool aborting; }; struct TestRunStats { TestRunStats( TestRunInfo const& _runInfo, Totals const& _totals, bool _aborting ) : runInfo( _runInfo ), totals( _totals ), aborting( _aborting ) {} virtual ~TestRunStats(); # ifndef CATCH_CPP11_OR_GREATER TestRunStats( TestRunStats const& _other ) : runInfo( _other.runInfo ), totals( _other.totals ), aborting( _other.aborting ) {} # else TestRunStats( TestRunStats const& ) = default; TestRunStats( TestRunStats 
&& ) = default; TestRunStats& operator = ( TestRunStats const& ) = default; TestRunStats& operator = ( TestRunStats && ) = default; # endif TestRunInfo runInfo; Totals totals; bool aborting; }; struct IStreamingReporter : IShared { virtual ~IStreamingReporter(); // Implementing class must also provide the following static method: // static std::string getDescription(); virtual ReporterPreferences getPreferences() const = 0; virtual void noMatchingTestCases( std::string const& spec ) = 0; virtual void testRunStarting( TestRunInfo const& testRunInfo ) = 0; virtual void testGroupStarting( GroupInfo const& groupInfo ) = 0; virtual void testCaseStarting( TestCaseInfo const& testInfo ) = 0; virtual void sectionStarting( SectionInfo const& sectionInfo ) = 0; virtual void assertionStarting( AssertionInfo const& assertionInfo ) = 0; virtual bool assertionEnded( AssertionStats const& assertionStats ) = 0; virtual void sectionEnded( SectionStats const& sectionStats ) = 0; virtual void testCaseEnded( TestCaseStats const& testCaseStats ) = 0; virtual void testGroupEnded( TestGroupStats const& testGroupStats ) = 0; virtual void testRunEnded( TestRunStats const& testRunStats ) = 0; }; struct IReporterFactory { virtual ~IReporterFactory(); virtual IStreamingReporter* create( ReporterConfig const& config ) const = 0; virtual std::string getDescription() const = 0; }; struct IReporterRegistry { typedef std::map FactoryMap; virtual ~IReporterRegistry(); virtual IStreamingReporter* create( std::string const& name, Ptr const& config ) const = 0; virtual FactoryMap const& getFactories() const = 0; }; } #include #include namespace Catch { inline std::size_t listTests( Config const& config ) { TestSpec testSpec = config.testSpec(); if( config.testSpec().hasFilters() ) std::cout << "Matching test cases:\n"; else { std::cout << "All available test cases:\n"; testSpec = TestSpecParser( ITagAliasRegistry::get() ).parse( "*" ).testSpec(); } std::size_t matchedTests = 0; TextAttributes 
nameAttr, tagsAttr; nameAttr.setInitialIndent( 2 ).setIndent( 4 ); tagsAttr.setIndent( 6 ); std::vector matchedTestCases; getRegistryHub().getTestCaseRegistry().getFilteredTests( testSpec, config, matchedTestCases ); for( std::vector::const_iterator it = matchedTestCases.begin(), itEnd = matchedTestCases.end(); it != itEnd; ++it ) { matchedTests++; TestCaseInfo const& testCaseInfo = it->getTestCaseInfo(); Colour::Code colour = testCaseInfo.isHidden() ? Colour::SecondaryText : Colour::None; Colour colourGuard( colour ); std::cout << Text( testCaseInfo.name, nameAttr ) << std::endl; if( !testCaseInfo.tags.empty() ) std::cout << Text( testCaseInfo.tagsAsString, tagsAttr ) << std::endl; } if( !config.testSpec().hasFilters() ) std::cout << pluralise( matchedTests, "test case" ) << "\n" << std::endl; else std::cout << pluralise( matchedTests, "matching test case" ) << "\n" << std::endl; return matchedTests; } inline std::size_t listTestsNamesOnly( Config const& config ) { TestSpec testSpec = config.testSpec(); if( !config.testSpec().hasFilters() ) testSpec = TestSpecParser( ITagAliasRegistry::get() ).parse( "*" ).testSpec(); std::size_t matchedTests = 0; std::vector matchedTestCases; getRegistryHub().getTestCaseRegistry().getFilteredTests( testSpec, config, matchedTestCases ); for( std::vector::const_iterator it = matchedTestCases.begin(), itEnd = matchedTestCases.end(); it != itEnd; ++it ) { matchedTests++; TestCaseInfo const& testCaseInfo = it->getTestCaseInfo(); std::cout << testCaseInfo.name << std::endl; } return matchedTests; } struct TagInfo { TagInfo() : count ( 0 ) {} void add( std::string const& spelling ) { ++count; spellings.insert( spelling ); } std::string all() const { std::string out; for( std::set::const_iterator it = spellings.begin(), itEnd = spellings.end(); it != itEnd; ++it ) out += "[" + *it + "]"; return out; } std::set spellings; std::size_t count; }; inline std::size_t listTags( Config const& config ) { TestSpec testSpec = config.testSpec(); if( 
config.testSpec().hasFilters() ) std::cout << "Tags for matching test cases:\n"; else { std::cout << "All available tags:\n"; testSpec = TestSpecParser( ITagAliasRegistry::get() ).parse( "*" ).testSpec(); } std::map tagCounts; std::vector matchedTestCases; getRegistryHub().getTestCaseRegistry().getFilteredTests( testSpec, config, matchedTestCases ); for( std::vector::const_iterator it = matchedTestCases.begin(), itEnd = matchedTestCases.end(); it != itEnd; ++it ) { for( std::set::const_iterator tagIt = it->getTestCaseInfo().tags.begin(), tagItEnd = it->getTestCaseInfo().tags.end(); tagIt != tagItEnd; ++tagIt ) { std::string tagName = *tagIt; std::string lcaseTagName = toLower( tagName ); std::map::iterator countIt = tagCounts.find( lcaseTagName ); if( countIt == tagCounts.end() ) countIt = tagCounts.insert( std::make_pair( lcaseTagName, TagInfo() ) ).first; countIt->second.add( tagName ); } } for( std::map::const_iterator countIt = tagCounts.begin(), countItEnd = tagCounts.end(); countIt != countItEnd; ++countIt ) { std::ostringstream oss; oss << " " << std::setw(2) << countIt->second.count << " "; Text wrapper( countIt->second.all(), TextAttributes() .setInitialIndent( 0 ) .setIndent( oss.str().size() ) .setWidth( CATCH_CONFIG_CONSOLE_WIDTH-10 ) ); std::cout << oss.str() << wrapper << "\n"; } std::cout << pluralise( tagCounts.size(), "tag" ) << "\n" << std::endl; return tagCounts.size(); } inline std::size_t listReporters( Config const& /*config*/ ) { std::cout << "Available reports:\n"; IReporterRegistry::FactoryMap const& factories = getRegistryHub().getReporterRegistry().getFactories(); IReporterRegistry::FactoryMap::const_iterator itBegin = factories.begin(), itEnd = factories.end(), it; std::size_t maxNameLen = 0; for(it = itBegin; it != itEnd; ++it ) maxNameLen = (std::max)( maxNameLen, it->first.size() ); for(it = itBegin; it != itEnd; ++it ) { Text wrapper( it->second->getDescription(), TextAttributes() .setInitialIndent( 0 ) .setIndent( 7+maxNameLen ) 
.setWidth( CATCH_CONFIG_CONSOLE_WIDTH - maxNameLen-8 ) ); std::cout << " " << it->first << ":" << std::string( maxNameLen - it->first.size() + 2, ' ' ) << wrapper << "\n"; } std::cout << std::endl; return factories.size(); } inline Option list( Config const& config ) { Option listedCount; if( config.listTests() ) listedCount = listedCount.valueOr(0) + listTests( config ); if( config.listTestNamesOnly() ) listedCount = listedCount.valueOr(0) + listTestsNamesOnly( config ); if( config.listTags() ) listedCount = listedCount.valueOr(0) + listTags( config ); if( config.listReporters() ) listedCount = listedCount.valueOr(0) + listReporters( config ); return listedCount; } } // end namespace Catch // #included from: internal/catch_runner_impl.hpp #define TWOBLUECUBES_CATCH_RUNNER_IMPL_HPP_INCLUDED // #included from: catch_test_case_tracker.hpp #define TWOBLUECUBES_CATCH_TEST_CASE_TRACKER_HPP_INCLUDED #include #include #include namespace Catch { namespace SectionTracking { class TrackedSection { typedef std::map TrackedSections; public: enum RunState { NotStarted, Executing, ExecutingChildren, Completed }; TrackedSection( std::string const& name, TrackedSection* parent ) : m_name( name ), m_runState( NotStarted ), m_parent( parent ) {} RunState runState() const { return m_runState; } TrackedSection* findChild( std::string const& childName ) { TrackedSections::iterator it = m_children.find( childName ); return it != m_children.end() ? 
&it->second : NULL; } TrackedSection* acquireChild( std::string const& childName ) { if( TrackedSection* child = findChild( childName ) ) return child; m_children.insert( std::make_pair( childName, TrackedSection( childName, this ) ) ); return findChild( childName ); } void enter() { if( m_runState == NotStarted ) m_runState = Executing; } void leave() { for( TrackedSections::const_iterator it = m_children.begin(), itEnd = m_children.end(); it != itEnd; ++it ) if( it->second.runState() != Completed ) { m_runState = ExecutingChildren; return; } m_runState = Completed; } TrackedSection* getParent() { return m_parent; } bool hasChildren() const { return !m_children.empty(); } private: std::string m_name; RunState m_runState; TrackedSections m_children; TrackedSection* m_parent; }; class TestCaseTracker { public: TestCaseTracker( std::string const& testCaseName ) : m_testCase( testCaseName, NULL ), m_currentSection( &m_testCase ), m_completedASectionThisRun( false ) {} bool enterSection( std::string const& name ) { TrackedSection* child = m_currentSection->acquireChild( name ); if( m_completedASectionThisRun || child->runState() == TrackedSection::Completed ) return false; m_currentSection = child; m_currentSection->enter(); return true; } void leaveSection() { m_currentSection->leave(); m_currentSection = m_currentSection->getParent(); assert( m_currentSection != NULL ); m_completedASectionThisRun = true; } bool currentSectionHasChildren() const { return m_currentSection->hasChildren(); } bool isCompleted() const { return m_testCase.runState() == TrackedSection::Completed; } class Guard { public: Guard( TestCaseTracker& tracker ) : m_tracker( tracker ) { m_tracker.enterTestCase(); } ~Guard() { m_tracker.leaveTestCase(); } private: Guard( Guard const& ); void operator = ( Guard const& ); TestCaseTracker& m_tracker; }; private: void enterTestCase() { m_currentSection = &m_testCase; m_completedASectionThisRun = false; m_testCase.enter(); } void leaveTestCase() { 
m_testCase.leave(); } TrackedSection m_testCase; TrackedSection* m_currentSection; bool m_completedASectionThisRun; }; } // namespace SectionTracking using SectionTracking::TestCaseTracker; } // namespace Catch #include #include namespace Catch { class StreamRedirect { public: StreamRedirect( std::ostream& stream, std::string& targetString ) : m_stream( stream ), m_prevBuf( stream.rdbuf() ), m_targetString( targetString ) { stream.rdbuf( m_oss.rdbuf() ); } ~StreamRedirect() { m_targetString += m_oss.str(); m_stream.rdbuf( m_prevBuf ); } private: std::ostream& m_stream; std::streambuf* m_prevBuf; std::ostringstream m_oss; std::string& m_targetString; }; /////////////////////////////////////////////////////////////////////////// class RunContext : public IResultCapture, public IRunner { RunContext( RunContext const& ); void operator =( RunContext const& ); public: explicit RunContext( Ptr const& config, Ptr const& reporter ) : m_runInfo( config->name() ), m_context( getCurrentMutableContext() ), m_activeTestCase( NULL ), m_config( config ), m_reporter( reporter ), m_prevRunner( m_context.getRunner() ), m_prevResultCapture( m_context.getResultCapture() ), m_prevConfig( m_context.getConfig() ) { m_context.setRunner( this ); m_context.setConfig( m_config ); m_context.setResultCapture( this ); m_reporter->testRunStarting( m_runInfo ); } virtual ~RunContext() { m_reporter->testRunEnded( TestRunStats( m_runInfo, m_totals, aborting() ) ); m_context.setRunner( m_prevRunner ); m_context.setConfig( NULL ); m_context.setResultCapture( m_prevResultCapture ); m_context.setConfig( m_prevConfig ); } void testGroupStarting( std::string const& testSpec, std::size_t groupIndex, std::size_t groupsCount ) { m_reporter->testGroupStarting( GroupInfo( testSpec, groupIndex, groupsCount ) ); } void testGroupEnded( std::string const& testSpec, Totals const& totals, std::size_t groupIndex, std::size_t groupsCount ) { m_reporter->testGroupEnded( TestGroupStats( GroupInfo( testSpec, groupIndex, 
groupsCount ), totals, aborting() ) ); } Totals runTest( TestCase const& testCase ) { Totals prevTotals = m_totals; std::string redirectedCout; std::string redirectedCerr; TestCaseInfo testInfo = testCase.getTestCaseInfo(); m_reporter->testCaseStarting( testInfo ); m_activeTestCase = &testCase; m_testCaseTracker = TestCaseTracker( testInfo.name ); do { do { runCurrentTest( redirectedCout, redirectedCerr ); } while( !m_testCaseTracker->isCompleted() && !aborting() ); } while( getCurrentContext().advanceGeneratorsForCurrentTest() && !aborting() ); Totals deltaTotals = m_totals.delta( prevTotals ); m_totals.testCases += deltaTotals.testCases; m_reporter->testCaseEnded( TestCaseStats( testInfo, deltaTotals, redirectedCout, redirectedCerr, aborting() ) ); m_activeTestCase = NULL; m_testCaseTracker.reset(); return deltaTotals; } Ptr config() const { return m_config; } private: // IResultCapture virtual void assertionEnded( AssertionResult const& result ) { if( result.getResultType() == ResultWas::Ok ) { m_totals.assertions.passed++; } else if( !result.isOk() ) { m_totals.assertions.failed++; } if( m_reporter->assertionEnded( AssertionStats( result, m_messages, m_totals ) ) ) m_messages.clear(); // Reset working state m_lastAssertionInfo = AssertionInfo( "", m_lastAssertionInfo.lineInfo, "{Unknown expression after the reported line}" , m_lastAssertionInfo.resultDisposition ); m_lastResult = result; } virtual bool sectionStarted ( SectionInfo const& sectionInfo, Counts& assertions ) { std::ostringstream oss; oss << sectionInfo.name << "@" << sectionInfo.lineInfo; if( !m_testCaseTracker->enterSection( oss.str() ) ) return false; m_lastAssertionInfo.lineInfo = sectionInfo.lineInfo; m_reporter->sectionStarting( sectionInfo ); assertions = m_totals.assertions; return true; } bool testForMissingAssertions( Counts& assertions ) { if( assertions.total() != 0 || !m_config->warnAboutMissingAssertions() || m_testCaseTracker->currentSectionHasChildren() ) return false; 
m_totals.assertions.failed++; assertions.failed++; return true; } virtual void sectionEnded( SectionInfo const& info, Counts const& prevAssertions, double _durationInSeconds ) { if( std::uncaught_exception() ) { m_unfinishedSections.push_back( UnfinishedSections( info, prevAssertions, _durationInSeconds ) ); return; } Counts assertions = m_totals.assertions - prevAssertions; bool missingAssertions = testForMissingAssertions( assertions ); m_testCaseTracker->leaveSection(); m_reporter->sectionEnded( SectionStats( info, assertions, _durationInSeconds, missingAssertions ) ); m_messages.clear(); } virtual void pushScopedMessage( MessageInfo const& message ) { m_messages.push_back( message ); } virtual void popScopedMessage( MessageInfo const& message ) { m_messages.erase( std::remove( m_messages.begin(), m_messages.end(), message ), m_messages.end() ); } virtual std::string getCurrentTestName() const { return m_activeTestCase ? m_activeTestCase->getTestCaseInfo().name : ""; } virtual const AssertionResult* getLastResult() const { return &m_lastResult; } public: // !TBD We need to do this another way! 
bool aborting() const { return m_totals.assertions.failed == static_cast<std::size_t>( m_config->abortAfter() ); } private: void runCurrentTest( std::string& redirectedCout, std::string& redirectedCerr ) { TestCaseInfo const& testCaseInfo = m_activeTestCase->getTestCaseInfo(); SectionInfo testCaseSection( testCaseInfo.lineInfo, testCaseInfo.name, testCaseInfo.description ); m_reporter->sectionStarting( testCaseSection ); Counts prevAssertions = m_totals.assertions; double duration = 0; try { m_lastAssertionInfo = AssertionInfo( "TEST_CASE", testCaseInfo.lineInfo, "", ResultDisposition::Normal ); TestCaseTracker::Guard guard( *m_testCaseTracker ); Timer timer; timer.start(); if( m_reporter->getPreferences().shouldRedirectStdOut ) { StreamRedirect coutRedir( std::cout, redirectedCout ); StreamRedirect cerrRedir( std::cerr, redirectedCerr ); m_activeTestCase->invoke(); } else { m_activeTestCase->invoke(); } duration = timer.getElapsedSeconds(); } catch( TestFailureException& ) { // This just means the test was aborted due to failure } catch(...) { ResultBuilder exResult( m_lastAssertionInfo.macroName.c_str(), m_lastAssertionInfo.lineInfo, m_lastAssertionInfo.capturedExpression.c_str(), m_lastAssertionInfo.resultDisposition ); exResult.useActiveException(); } // If sections ended prematurely due to an exception we stored their // infos here so we can tear them down outside the unwind process. 
for( std::vector::const_reverse_iterator it = m_unfinishedSections.rbegin(), itEnd = m_unfinishedSections.rend(); it != itEnd; ++it ) sectionEnded( it->info, it->prevAssertions, it->durationInSeconds ); m_unfinishedSections.clear(); m_messages.clear(); Counts assertions = m_totals.assertions - prevAssertions; bool missingAssertions = testForMissingAssertions( assertions ); if( testCaseInfo.okToFail() ) { std::swap( assertions.failedButOk, assertions.failed ); m_totals.assertions.failed -= assertions.failedButOk; m_totals.assertions.failedButOk += assertions.failedButOk; } SectionStats testCaseSectionStats( testCaseSection, assertions, duration, missingAssertions ); m_reporter->sectionEnded( testCaseSectionStats ); } private: struct UnfinishedSections { UnfinishedSections( SectionInfo const& _info, Counts const& _prevAssertions, double _durationInSeconds ) : info( _info ), prevAssertions( _prevAssertions ), durationInSeconds( _durationInSeconds ) {} SectionInfo info; Counts prevAssertions; double durationInSeconds; }; TestRunInfo m_runInfo; IMutableContext& m_context; TestCase const* m_activeTestCase; Option m_testCaseTracker; AssertionResult m_lastResult; Ptr m_config; Totals m_totals; Ptr m_reporter; std::vector m_messages; IRunner* m_prevRunner; IResultCapture* m_prevResultCapture; Ptr m_prevConfig; AssertionInfo m_lastAssertionInfo; std::vector m_unfinishedSections; }; IResultCapture& getResultCapture() { if( IResultCapture* capture = getCurrentContext().getResultCapture() ) return *capture; else throw std::logic_error( "No result capture instance" ); } } // end namespace Catch // #included from: internal/catch_version.h #define TWOBLUECUBES_CATCH_VERSION_H_INCLUDED namespace Catch { // Versioning information struct Version { Version( unsigned int _majorVersion, unsigned int _minorVersion, unsigned int _buildNumber, char const* const _branchName ) : majorVersion( _majorVersion ), minorVersion( _minorVersion ), buildNumber( _buildNumber ), branchName( _branchName 
) {} unsigned int const majorVersion; unsigned int const minorVersion; unsigned int const buildNumber; char const* const branchName; private: void operator=( Version const& ); }; extern Version libraryVersion; } #include #include #include namespace Catch { class Runner { public: Runner( Ptr const& config ) : m_config( config ) { openStream(); makeReporter(); } Totals runTests() { RunContext context( m_config.get(), m_reporter ); Totals totals; context.testGroupStarting( "", 1, 1 ); // deprecated? TestSpec testSpec = m_config->testSpec(); if( !testSpec.hasFilters() ) testSpec = TestSpecParser( ITagAliasRegistry::get() ).parse( "~[.]" ).testSpec(); // All not hidden tests std::vector testCases; getRegistryHub().getTestCaseRegistry().getFilteredTests( testSpec, *m_config, testCases ); int testsRunForGroup = 0; for( std::vector::const_iterator it = testCases.begin(), itEnd = testCases.end(); it != itEnd; ++it ) { testsRunForGroup++; if( m_testsAlreadyRun.find( *it ) == m_testsAlreadyRun.end() ) { if( context.aborting() ) break; totals += context.runTest( *it ); m_testsAlreadyRun.insert( *it ); } } context.testGroupEnded( "", totals, 1, 1 ); return totals; } private: void openStream() { // Open output file, if specified if( !m_config->getFilename().empty() ) { m_ofs.open( m_config->getFilename().c_str() ); if( m_ofs.fail() ) { std::ostringstream oss; oss << "Unable to open file: '" << m_config->getFilename() << "'"; throw std::domain_error( oss.str() ); } m_config->setStreamBuf( m_ofs.rdbuf() ); } } void makeReporter() { std::string reporterName = m_config->getReporterName().empty() ? 
"console" : m_config->getReporterName(); m_reporter = getRegistryHub().getReporterRegistry().create( reporterName, m_config.get() ); if( !m_reporter ) { std::ostringstream oss; oss << "No reporter registered with name: '" << reporterName << "'"; throw std::domain_error( oss.str() ); } } private: Ptr m_config; std::ofstream m_ofs; Ptr m_reporter; std::set m_testsAlreadyRun; }; class Session { static bool alreadyInstantiated; public: struct OnUnusedOptions { enum DoWhat { Ignore, Fail }; }; Session() : m_cli( makeCommandLineParser() ) { if( alreadyInstantiated ) { std::string msg = "Only one instance of Catch::Session can ever be used"; std::cerr << msg << std::endl; throw std::logic_error( msg ); } alreadyInstantiated = true; } ~Session() { Catch::cleanUp(); } void showHelp( std::string const& processName ) { std::cout << "\nCatch v" << libraryVersion.majorVersion << "." << libraryVersion.minorVersion << " build " << libraryVersion.buildNumber; if( libraryVersion.branchName != std::string( "master" ) ) std::cout << " (" << libraryVersion.branchName << " branch)"; std::cout << "\n"; m_cli.usage( std::cout, processName ); std::cout << "For more detail usage please see the project docs\n" << std::endl; } int applyCommandLine( int argc, char* const argv[], OnUnusedOptions::DoWhat unusedOptionBehaviour = OnUnusedOptions::Fail ) { try { m_cli.setThrowOnUnrecognisedTokens( unusedOptionBehaviour == OnUnusedOptions::Fail ); m_unusedTokens = m_cli.parseInto( argc, argv, m_configData ); if( m_configData.showHelp ) showHelp( m_configData.processName ); m_config.reset(); } catch( std::exception& ex ) { { Colour colourGuard( Colour::Red ); std::cerr << "\nError(s) in input:\n" << Text( ex.what(), TextAttributes().setIndent(2) ) << "\n\n"; } m_cli.usage( std::cout, m_configData.processName ); return (std::numeric_limits::max)(); } return 0; } void useConfigData( ConfigData const& _configData ) { m_configData = _configData; m_config.reset(); } int run( int argc, char* const argv[] 
) { int returnCode = applyCommandLine( argc, argv ); if( returnCode == 0 ) returnCode = run(); return returnCode; } int run() { if( m_configData.showHelp ) return 0; try { config(); // Force config to be constructed Runner runner( m_config ); // Handle list request if( Option listed = list( config() ) ) return static_cast( *listed ); return static_cast( runner.runTests().assertions.failed ); } catch( std::exception& ex ) { std::cerr << ex.what() << std::endl; return (std::numeric_limits::max)(); } } Clara::CommandLine const& cli() const { return m_cli; } std::vector const& unusedTokens() const { return m_unusedTokens; } ConfigData& configData() { return m_configData; } Config& config() { if( !m_config ) m_config = new Config( m_configData ); return *m_config; } private: Clara::CommandLine m_cli; std::vector m_unusedTokens; ConfigData m_configData; Ptr m_config; }; bool Session::alreadyInstantiated = false; } // end namespace Catch // #included from: catch_registry_hub.hpp #define TWOBLUECUBES_CATCH_REGISTRY_HUB_HPP_INCLUDED // #included from: catch_test_case_registry_impl.hpp #define TWOBLUECUBES_CATCH_TEST_CASE_REGISTRY_IMPL_HPP_INCLUDED #include #include #include #include namespace Catch { class TestRegistry : public ITestCaseRegistry { public: TestRegistry() : m_unnamedCount( 0 ) {} virtual ~TestRegistry(); virtual void registerTest( TestCase const& testCase ) { std::string name = testCase.getTestCaseInfo().name; if( name == "" ) { std::ostringstream oss; oss << "Anonymous test case " << ++m_unnamedCount; return registerTest( testCase.withName( oss.str() ) ); } if( m_functions.find( testCase ) == m_functions.end() ) { m_functions.insert( testCase ); m_functionsInOrder.push_back( testCase ); if( !testCase.isHidden() ) m_nonHiddenFunctions.push_back( testCase ); } else { TestCase const& prev = *m_functions.find( testCase ); { Colour colourGuard( Colour::Red ); std::cerr << "error: TEST_CASE( \"" << name << "\" ) already defined.\n" << "\tFirst seen at " << 
prev.getTestCaseInfo().lineInfo << "\n" << "\tRedefined at " << testCase.getTestCaseInfo().lineInfo << std::endl; } exit(1); } } virtual std::vector const& getAllTests() const { return m_functionsInOrder; } virtual std::vector const& getAllNonHiddenTests() const { return m_nonHiddenFunctions; } virtual void getFilteredTests( TestSpec const& testSpec, IConfig const& config, std::vector& matchingTestCases ) const { for( std::vector::const_iterator it = m_functionsInOrder.begin(), itEnd = m_functionsInOrder.end(); it != itEnd; ++it ) { if( testSpec.matches( *it ) && ( config.allowThrows() || !it->throws() ) ) matchingTestCases.push_back( *it ); } } private: std::set m_functions; std::vector m_functionsInOrder; std::vector m_nonHiddenFunctions; size_t m_unnamedCount; }; /////////////////////////////////////////////////////////////////////////// class FreeFunctionTestCase : public SharedImpl { public: FreeFunctionTestCase( TestFunction fun ) : m_fun( fun ) {} virtual void invoke() const { m_fun(); } private: virtual ~FreeFunctionTestCase(); TestFunction m_fun; }; inline std::string extractClassName( std::string const& classOrQualifiedMethodName ) { std::string className = classOrQualifiedMethodName; if( startsWith( className, "&" ) ) { std::size_t lastColons = className.rfind( "::" ); std::size_t penultimateColons = className.rfind( "::", lastColons-1 ); if( penultimateColons == std::string::npos ) penultimateColons = 1; className = className.substr( penultimateColons, lastColons-penultimateColons ); } return className; } /////////////////////////////////////////////////////////////////////////// AutoReg::AutoReg( TestFunction function, SourceLineInfo const& lineInfo, NameAndDesc const& nameAndDesc ) { registerTestCase( new FreeFunctionTestCase( function ), "", nameAndDesc, lineInfo ); } AutoReg::~AutoReg() {} void AutoReg::registerTestCase( ITestCase* testCase, char const* classOrQualifiedMethodName, NameAndDesc const& nameAndDesc, SourceLineInfo const& lineInfo ) { 
getMutableRegistryHub().registerTest ( makeTestCase( testCase, extractClassName( classOrQualifiedMethodName ), nameAndDesc.name, nameAndDesc.description, lineInfo ) ); } } // end namespace Catch // #included from: catch_reporter_registry.hpp #define TWOBLUECUBES_CATCH_REPORTER_REGISTRY_HPP_INCLUDED #include namespace Catch { class ReporterRegistry : public IReporterRegistry { public: virtual ~ReporterRegistry() { deleteAllValues( m_factories ); } virtual IStreamingReporter* create( std::string const& name, Ptr const& config ) const { FactoryMap::const_iterator it = m_factories.find( name ); if( it == m_factories.end() ) return NULL; return it->second->create( ReporterConfig( config ) ); } void registerReporter( std::string const& name, IReporterFactory* factory ) { m_factories.insert( std::make_pair( name, factory ) ); } FactoryMap const& getFactories() const { return m_factories; } private: FactoryMap m_factories; }; } // #included from: catch_exception_translator_registry.hpp #define TWOBLUECUBES_CATCH_EXCEPTION_TRANSLATOR_REGISTRY_HPP_INCLUDED #ifdef __OBJC__ #import "Foundation/Foundation.h" #endif namespace Catch { class ExceptionTranslatorRegistry : public IExceptionTranslatorRegistry { public: ~ExceptionTranslatorRegistry() { deleteAll( m_translators ); } virtual void registerTranslator( const IExceptionTranslator* translator ) { m_translators.push_back( translator ); } virtual std::string translateActiveException() const { try { #ifdef __OBJC__ // In Objective-C try objective-c exceptions first @try { throw; } @catch (NSException *exception) { return toString( [exception description] ); } #else throw; #endif } catch( TestFailureException& ) { throw; } catch( std::exception& ex ) { return ex.what(); } catch( std::string& msg ) { return msg; } catch( const char* msg ) { return msg; } catch(...) 
{ return tryTranslators( m_translators.begin() ); } } std::string tryTranslators( std::vector<const IExceptionTranslator*>::const_iterator it ) const { if( it == m_translators.end() ) return "Unknown exception"; try { return (*it)->translate(); } catch(...) { return tryTranslators( it+1 ); } } private: std::vector<const IExceptionTranslator*> m_translators; }; } namespace Catch { namespace { class RegistryHub : public IRegistryHub, public IMutableRegistryHub { RegistryHub( RegistryHub const& ); void operator=( RegistryHub const& ); public: // IRegistryHub RegistryHub() { } virtual IReporterRegistry const& getReporterRegistry() const { return m_reporterRegistry; } virtual ITestCaseRegistry const& getTestCaseRegistry() const { return m_testCaseRegistry; } virtual IExceptionTranslatorRegistry& getExceptionTranslatorRegistry() { return m_exceptionTranslatorRegistry; } public: // IMutableRegistryHub virtual void registerReporter( std::string const& name, IReporterFactory* factory ) { m_reporterRegistry.registerReporter( name, factory ); } virtual void registerTest( TestCase const& testInfo ) { m_testCaseRegistry.registerTest( testInfo ); } virtual void registerTranslator( const IExceptionTranslator* translator ) { m_exceptionTranslatorRegistry.registerTranslator( translator ); } private: TestRegistry m_testCaseRegistry; ReporterRegistry m_reporterRegistry; ExceptionTranslatorRegistry m_exceptionTranslatorRegistry; }; // Single, global, instance inline RegistryHub*& getTheRegistryHub() { static RegistryHub* theRegistryHub = NULL; if( !theRegistryHub ) theRegistryHub = new RegistryHub(); return theRegistryHub; } } IRegistryHub& getRegistryHub() { return *getTheRegistryHub(); } IMutableRegistryHub& getMutableRegistryHub() { return *getTheRegistryHub(); } void cleanUp() { delete getTheRegistryHub(); getTheRegistryHub() = NULL; cleanUpContext(); } std::string translateActiveException() { return getRegistryHub().getExceptionTranslatorRegistry().translateActiveException(); } } // end namespace Catch // #included from: 
catch_notimplemented_exception.hpp #define TWOBLUECUBES_CATCH_NOTIMPLEMENTED_EXCEPTION_HPP_INCLUDED #include namespace Catch { NotImplementedException::NotImplementedException( SourceLineInfo const& lineInfo ) : m_lineInfo( lineInfo ) { std::ostringstream oss; oss << lineInfo << ": function "; oss << "not implemented"; m_what = oss.str(); } const char* NotImplementedException::what() const CATCH_NOEXCEPT { return m_what.c_str(); } } // end namespace Catch // #included from: catch_context_impl.hpp #define TWOBLUECUBES_CATCH_CONTEXT_IMPL_HPP_INCLUDED // #included from: catch_stream.hpp #define TWOBLUECUBES_CATCH_STREAM_HPP_INCLUDED // #included from: catch_streambuf.h #define TWOBLUECUBES_CATCH_STREAMBUF_H_INCLUDED #include namespace Catch { class StreamBufBase : public std::streambuf { public: virtual ~StreamBufBase() CATCH_NOEXCEPT; }; } #include #include namespace Catch { template class StreamBufImpl : public StreamBufBase { char data[bufferSize]; WriterF m_writer; public: StreamBufImpl() { setp( data, data + sizeof(data) ); } ~StreamBufImpl() CATCH_NOEXCEPT { sync(); } private: int overflow( int c ) { sync(); if( c != EOF ) { if( pbase() == epptr() ) m_writer( std::string( 1, static_cast( c ) ) ); else sputc( static_cast( c ) ); } return 0; } int sync() { if( pbase() != pptr() ) { m_writer( std::string( pbase(), static_cast( pptr() - pbase() ) ) ); setp( pbase(), epptr() ); } return 0; } }; /////////////////////////////////////////////////////////////////////////// struct OutputDebugWriter { void operator()( std::string const&str ) { writeToDebugConsole( str ); } }; Stream::Stream() : streamBuf( NULL ), isOwned( false ) {} Stream::Stream( std::streambuf* _streamBuf, bool _isOwned ) : streamBuf( _streamBuf ), isOwned( _isOwned ) {} void Stream::release() { if( isOwned ) { delete streamBuf; streamBuf = NULL; isOwned = false; } } } namespace Catch { class Context : public IMutableContext { Context() : m_config( NULL ), m_runner( NULL ), m_resultCapture( NULL ) {} 
Context( Context const& ); void operator=( Context const& ); public: // IContext virtual IResultCapture* getResultCapture() { return m_resultCapture; } virtual IRunner* getRunner() { return m_runner; } virtual size_t getGeneratorIndex( std::string const& fileInfo, size_t totalSize ) { return getGeneratorsForCurrentTest() .getGeneratorInfo( fileInfo, totalSize ) .getCurrentIndex(); } virtual bool advanceGeneratorsForCurrentTest() { IGeneratorsForTest* generators = findGeneratorsForCurrentTest(); return generators && generators->moveNext(); } virtual Ptr getConfig() const { return m_config; } public: // IMutableContext virtual void setResultCapture( IResultCapture* resultCapture ) { m_resultCapture = resultCapture; } virtual void setRunner( IRunner* runner ) { m_runner = runner; } virtual void setConfig( Ptr const& config ) { m_config = config; } friend IMutableContext& getCurrentMutableContext(); private: IGeneratorsForTest* findGeneratorsForCurrentTest() { std::string testName = getResultCapture()->getCurrentTestName(); std::map::const_iterator it = m_generatorsByTestName.find( testName ); return it != m_generatorsByTestName.end() ? 
it->second : NULL; } IGeneratorsForTest& getGeneratorsForCurrentTest() { IGeneratorsForTest* generators = findGeneratorsForCurrentTest(); if( !generators ) { std::string testName = getResultCapture()->getCurrentTestName(); generators = createGeneratorsForTest(); m_generatorsByTestName.insert( std::make_pair( testName, generators ) ); } return *generators; } private: Ptr m_config; IRunner* m_runner; IResultCapture* m_resultCapture; std::map m_generatorsByTestName; }; namespace { Context* currentContext = NULL; } IMutableContext& getCurrentMutableContext() { if( !currentContext ) currentContext = new Context(); return *currentContext; } IContext& getCurrentContext() { return getCurrentMutableContext(); } Stream createStream( std::string const& streamName ) { if( streamName == "stdout" ) return Stream( std::cout.rdbuf(), false ); if( streamName == "stderr" ) return Stream( std::cerr.rdbuf(), false ); if( streamName == "debug" ) return Stream( new StreamBufImpl, true ); throw std::domain_error( "Unknown stream: " + streamName ); } void cleanUpContext() { delete currentContext; currentContext = NULL; } } // #included from: catch_console_colour_impl.hpp #define TWOBLUECUBES_CATCH_CONSOLE_COLOUR_IMPL_HPP_INCLUDED namespace Catch { namespace Detail { struct IColourImpl { virtual ~IColourImpl() {} virtual void use( Colour::Code _colourCode ) = 0; }; }} #if defined ( CATCH_PLATFORM_WINDOWS ) ///////////////////////////////////////// #ifndef NOMINMAX #define NOMINMAX #endif #ifdef __AFXDLL #include #else #include #endif namespace Catch { namespace { class Win32ColourImpl : public Detail::IColourImpl { public: Win32ColourImpl() : stdoutHandle( GetStdHandle(STD_OUTPUT_HANDLE) ) { CONSOLE_SCREEN_BUFFER_INFO csbiInfo; GetConsoleScreenBufferInfo( stdoutHandle, &csbiInfo ); originalAttributes = csbiInfo.wAttributes; } virtual void use( Colour::Code _colourCode ) { switch( _colourCode ) { case Colour::None: return setTextAttribute( originalAttributes ); case Colour::White: return 
setTextAttribute( FOREGROUND_GREEN | FOREGROUND_RED | FOREGROUND_BLUE ); case Colour::Red: return setTextAttribute( FOREGROUND_RED ); case Colour::Green: return setTextAttribute( FOREGROUND_GREEN ); case Colour::Blue: return setTextAttribute( FOREGROUND_BLUE ); case Colour::Cyan: return setTextAttribute( FOREGROUND_BLUE | FOREGROUND_GREEN ); case Colour::Yellow: return setTextAttribute( FOREGROUND_RED | FOREGROUND_GREEN ); case Colour::Grey: return setTextAttribute( 0 ); case Colour::LightGrey: return setTextAttribute( FOREGROUND_INTENSITY ); case Colour::BrightRed: return setTextAttribute( FOREGROUND_INTENSITY | FOREGROUND_RED ); case Colour::BrightGreen: return setTextAttribute( FOREGROUND_INTENSITY | FOREGROUND_GREEN ); case Colour::BrightWhite: return setTextAttribute( FOREGROUND_INTENSITY | FOREGROUND_GREEN | FOREGROUND_RED | FOREGROUND_BLUE ); case Colour::Bright: throw std::logic_error( "not a colour" ); } } private: void setTextAttribute( WORD _textAttribute ) { SetConsoleTextAttribute( stdoutHandle, _textAttribute ); } HANDLE stdoutHandle; WORD originalAttributes; }; inline bool shouldUseColourForPlatform() { return true; } static Detail::IColourImpl* platformColourInstance() { static Win32ColourImpl s_instance; return &s_instance; } } // end anon namespace } // end namespace Catch #else // Not Windows - assumed to be POSIX compatible ////////////////////////// #include <unistd.h> namespace Catch { namespace { // use POSIX/ ANSI console terminal codes // Thanks to Adam Strzelecki for original contribution // (http://github.com/nanoant) // https://github.com/philsquared/Catch/pull/131 class PosixColourImpl : public Detail::IColourImpl { public: virtual void use( Colour::Code _colourCode ) { switch( _colourCode ) { case Colour::None: case Colour::White: return setColour( "[0m" ); case Colour::Red: return setColour( "[0;31m" ); case Colour::Green: return setColour( "[0;32m" ); case Colour::Blue: return setColour( "[0;34m" ); case Colour::Cyan: return setColour( "[0;36m" 
); case Colour::Yellow: return setColour( "[0;33m" ); case Colour::Grey: return setColour( "[1;30m" ); case Colour::LightGrey: return setColour( "[0;37m" ); case Colour::BrightRed: return setColour( "[1;31m" ); case Colour::BrightGreen: return setColour( "[1;32m" ); case Colour::BrightWhite: return setColour( "[1;37m" ); case Colour::Bright: throw std::logic_error( "not a colour" ); } } private: void setColour( const char* _escapeCode ) { std::cout << '\033' << _escapeCode; } }; inline bool shouldUseColourForPlatform() { return isatty(STDOUT_FILENO); } static Detail::IColourImpl* platformColourInstance() { static PosixColourImpl s_instance; return &s_instance; } } // end anon namespace } // end namespace Catch #endif // not Windows namespace Catch { namespace { struct NoColourImpl : Detail::IColourImpl { void use( Colour::Code ) {} static IColourImpl* instance() { static NoColourImpl s_instance; return &s_instance; } }; static bool shouldUseColour() { return shouldUseColourForPlatform() && !isDebuggerActive(); } } Colour::Colour( Code _colourCode ) : m_moved( false ) { use( _colourCode ); } Colour::Colour( Colour const& _other ) : m_moved( false ) { const_cast<Colour&>( _other ).m_moved = true; } Colour::~Colour(){ if( !m_moved ) use( None ); } void Colour::use( Code _colourCode ) { impl()->use( _colourCode ); } Detail::IColourImpl* Colour::impl() { return shouldUseColour() ? 
platformColourInstance() : NoColourImpl::instance(); } } // end namespace Catch // #included from: catch_generators_impl.hpp #define TWOBLUECUBES_CATCH_GENERATORS_IMPL_HPP_INCLUDED #include #include #include namespace Catch { struct GeneratorInfo : IGeneratorInfo { GeneratorInfo( std::size_t size ) : m_size( size ), m_currentIndex( 0 ) {} bool moveNext() { if( ++m_currentIndex == m_size ) { m_currentIndex = 0; return false; } return true; } std::size_t getCurrentIndex() const { return m_currentIndex; } std::size_t m_size; std::size_t m_currentIndex; }; /////////////////////////////////////////////////////////////////////////// class GeneratorsForTest : public IGeneratorsForTest { public: ~GeneratorsForTest() { deleteAll( m_generatorsInOrder ); } IGeneratorInfo& getGeneratorInfo( std::string const& fileInfo, std::size_t size ) { std::map::const_iterator it = m_generatorsByName.find( fileInfo ); if( it == m_generatorsByName.end() ) { IGeneratorInfo* info = new GeneratorInfo( size ); m_generatorsByName.insert( std::make_pair( fileInfo, info ) ); m_generatorsInOrder.push_back( info ); return *info; } return *it->second; } bool moveNext() { std::vector::const_iterator it = m_generatorsInOrder.begin(); std::vector::const_iterator itEnd = m_generatorsInOrder.end(); for(; it != itEnd; ++it ) { if( (*it)->moveNext() ) return true; } return false; } private: std::map m_generatorsByName; std::vector m_generatorsInOrder; }; IGeneratorsForTest* createGeneratorsForTest() { return new GeneratorsForTest(); } } // end namespace Catch // #included from: catch_assertionresult.hpp #define TWOBLUECUBES_CATCH_ASSERTIONRESULT_HPP_INCLUDED namespace Catch { AssertionInfo::AssertionInfo( std::string const& _macroName, SourceLineInfo const& _lineInfo, std::string const& _capturedExpression, ResultDisposition::Flags _resultDisposition ) : macroName( _macroName ), lineInfo( _lineInfo ), capturedExpression( _capturedExpression ), resultDisposition( _resultDisposition ) {} 
AssertionResult::AssertionResult() {} AssertionResult::AssertionResult( AssertionInfo const& info, AssertionResultData const& data ) : m_info( info ), m_resultData( data ) {} AssertionResult::~AssertionResult() {} // Result was a success bool AssertionResult::succeeded() const { return Catch::isOk( m_resultData.resultType ); } // Result was a success, or failure is suppressed bool AssertionResult::isOk() const { return Catch::isOk( m_resultData.resultType ) || shouldSuppressFailure( m_info.resultDisposition ); } ResultWas::OfType AssertionResult::getResultType() const { return m_resultData.resultType; } bool AssertionResult::hasExpression() const { return !m_info.capturedExpression.empty(); } bool AssertionResult::hasMessage() const { return !m_resultData.message.empty(); } std::string AssertionResult::getExpression() const { if( isFalseTest( m_info.resultDisposition ) ) return "!" + m_info.capturedExpression; else return m_info.capturedExpression; } std::string AssertionResult::getExpressionInMacro() const { if( m_info.macroName.empty() ) return m_info.capturedExpression; else return m_info.macroName + "( " + m_info.capturedExpression + " )"; } bool AssertionResult::hasExpandedExpression() const { return hasExpression() && getExpandedExpression() != getExpression(); } std::string AssertionResult::getExpandedExpression() const { return m_resultData.reconstructedExpression; } std::string AssertionResult::getMessage() const { return m_resultData.message; } SourceLineInfo AssertionResult::getSourceInfo() const { return m_info.lineInfo; } std::string AssertionResult::getTestMacroName() const { return m_info.macroName; } } // end namespace Catch // #included from: catch_test_case_info.hpp #define TWOBLUECUBES_CATCH_TEST_CASE_INFO_HPP_INCLUDED namespace Catch { inline TestCaseInfo::SpecialProperties parseSpecialTag( std::string const& tag ) { if( tag == "." 
|| tag == "hide" || tag == "!hide" ) return TestCaseInfo::IsHidden; else if( tag == "!throws" ) return TestCaseInfo::Throws; else if( tag == "!shouldfail" ) return TestCaseInfo::ShouldFail; else if( tag == "!mayfail" ) return TestCaseInfo::MayFail; else return TestCaseInfo::None; } inline bool isReservedTag( std::string const& tag ) { return parseSpecialTag( tag ) == TestCaseInfo::None && tag.size() > 0 && !isalnum( tag[0] ); } inline void enforceNotReservedTag( std::string const& tag, SourceLineInfo const& _lineInfo ) { if( isReservedTag( tag ) ) { { Colour colourGuard( Colour::Red ); std::cerr << "Tag name [" << tag << "] not allowed.\n" << "Tag names starting with non alpha-numeric characters are reserved\n"; } { Colour colourGuard( Colour::FileName ); std::cerr << _lineInfo << std::endl; } exit(1); } } TestCase makeTestCase( ITestCase* _testCase, std::string const& _className, std::string const& _name, std::string const& _descOrTags, SourceLineInfo const& _lineInfo ) { bool isHidden( startsWith( _name, "./" ) ); // Legacy support // Parse out tags std::set tags; std::string desc, tag; bool inTag = false; for( std::size_t i = 0; i < _descOrTags.size(); ++i ) { char c = _descOrTags[i]; if( !inTag ) { if( c == '[' ) inTag = true; else desc += c; } else { if( c == ']' ) { enforceNotReservedTag( tag, _lineInfo ); inTag = false; if( tag == "hide" || tag == "." ) isHidden = true; else tags.insert( tag ); tag.clear(); } else tag += c; } } if( isHidden ) { tags.insert( "hide" ); tags.insert( "." 
); } TestCaseInfo info( _name, _className, desc, tags, _lineInfo ); return TestCase( _testCase, info ); } TestCaseInfo::TestCaseInfo( std::string const& _name, std::string const& _className, std::string const& _description, std::set const& _tags, SourceLineInfo const& _lineInfo ) : name( _name ), className( _className ), description( _description ), tags( _tags ), lineInfo( _lineInfo ), properties( None ) { std::ostringstream oss; for( std::set::const_iterator it = _tags.begin(), itEnd = _tags.end(); it != itEnd; ++it ) { oss << "[" << *it << "]"; std::string lcaseTag = toLower( *it ); properties = static_cast( properties | parseSpecialTag( lcaseTag ) ); lcaseTags.insert( lcaseTag ); } tagsAsString = oss.str(); } TestCaseInfo::TestCaseInfo( TestCaseInfo const& other ) : name( other.name ), className( other.className ), description( other.description ), tags( other.tags ), lcaseTags( other.lcaseTags ), tagsAsString( other.tagsAsString ), lineInfo( other.lineInfo ), properties( other.properties ) {} bool TestCaseInfo::isHidden() const { return ( properties & IsHidden ) != 0; } bool TestCaseInfo::throws() const { return ( properties & Throws ) != 0; } bool TestCaseInfo::okToFail() const { return ( properties & (ShouldFail | MayFail ) ) != 0; } bool TestCaseInfo::expectedToFail() const { return ( properties & (ShouldFail ) ) != 0; } TestCase::TestCase( ITestCase* testCase, TestCaseInfo const& info ) : TestCaseInfo( info ), test( testCase ) {} TestCase::TestCase( TestCase const& other ) : TestCaseInfo( other ), test( other.test ) {} TestCase TestCase::withName( std::string const& _newName ) const { TestCase other( *this ); other.name = _newName; return other; } void TestCase::swap( TestCase& other ) { test.swap( other.test ); name.swap( other.name ); className.swap( other.className ); description.swap( other.description ); tags.swap( other.tags ); lcaseTags.swap( other.lcaseTags ); tagsAsString.swap( other.tagsAsString ); std::swap( TestCaseInfo::properties, 
static_cast( other ).properties ); std::swap( lineInfo, other.lineInfo ); } void TestCase::invoke() const { test->invoke(); } bool TestCase::operator == ( TestCase const& other ) const { return test.get() == other.test.get() && name == other.name && className == other.className; } bool TestCase::operator < ( TestCase const& other ) const { return name < other.name; } TestCase& TestCase::operator = ( TestCase const& other ) { TestCase temp( other ); swap( temp ); return *this; } TestCaseInfo const& TestCase::getTestCaseInfo() const { return *this; } } // end namespace Catch // #included from: catch_version.hpp #define TWOBLUECUBES_CATCH_VERSION_HPP_INCLUDED namespace Catch { // These numbers are maintained by a script Version libraryVersion( 1, 0, 53, "master" ); } // #included from: catch_message.hpp #define TWOBLUECUBES_CATCH_MESSAGE_HPP_INCLUDED namespace Catch { MessageInfo::MessageInfo( std::string const& _macroName, SourceLineInfo const& _lineInfo, ResultWas::OfType _type ) : macroName( _macroName ), lineInfo( _lineInfo ), type( _type ), sequence( ++globalCount ) {} // This may need protecting if threading support is added unsigned int MessageInfo::globalCount = 0; //////////////////////////////////////////////////////////////////////////// ScopedMessage::ScopedMessage( MessageBuilder const& builder ) : m_info( builder.m_info ) { m_info.message = builder.m_stream.str(); getResultCapture().pushScopedMessage( m_info ); } ScopedMessage::ScopedMessage( ScopedMessage const& other ) : m_info( other.m_info ) {} ScopedMessage::~ScopedMessage() { getResultCapture().popScopedMessage( m_info ); } } // end namespace Catch // #included from: catch_legacy_reporter_adapter.hpp #define TWOBLUECUBES_CATCH_LEGACY_REPORTER_ADAPTER_HPP_INCLUDED // #included from: catch_legacy_reporter_adapter.h #define TWOBLUECUBES_CATCH_LEGACY_REPORTER_ADAPTER_H_INCLUDED namespace Catch { // Deprecated struct IReporter : IShared { virtual ~IReporter(); virtual bool shouldRedirectStdout() const = 
0; virtual void StartTesting() = 0; virtual void EndTesting( Totals const& totals ) = 0; virtual void StartGroup( std::string const& groupName ) = 0; virtual void EndGroup( std::string const& groupName, Totals const& totals ) = 0; virtual void StartTestCase( TestCaseInfo const& testInfo ) = 0; virtual void EndTestCase( TestCaseInfo const& testInfo, Totals const& totals, std::string const& stdOut, std::string const& stdErr ) = 0; virtual void StartSection( std::string const& sectionName, std::string const& description ) = 0; virtual void EndSection( std::string const& sectionName, Counts const& assertions ) = 0; virtual void NoAssertionsInSection( std::string const& sectionName ) = 0; virtual void NoAssertionsInTestCase( std::string const& testName ) = 0; virtual void Aborted() = 0; virtual void Result( AssertionResult const& result ) = 0; }; class LegacyReporterAdapter : public SharedImpl { public: LegacyReporterAdapter( Ptr const& legacyReporter ); virtual ~LegacyReporterAdapter(); virtual ReporterPreferences getPreferences() const; virtual void noMatchingTestCases( std::string const& ); virtual void testRunStarting( TestRunInfo const& ); virtual void testGroupStarting( GroupInfo const& groupInfo ); virtual void testCaseStarting( TestCaseInfo const& testInfo ); virtual void sectionStarting( SectionInfo const& sectionInfo ); virtual void assertionStarting( AssertionInfo const& ); virtual bool assertionEnded( AssertionStats const& assertionStats ); virtual void sectionEnded( SectionStats const& sectionStats ); virtual void testCaseEnded( TestCaseStats const& testCaseStats ); virtual void testGroupEnded( TestGroupStats const& testGroupStats ); virtual void testRunEnded( TestRunStats const& testRunStats ); private: Ptr m_legacyReporter; }; } namespace Catch { LegacyReporterAdapter::LegacyReporterAdapter( Ptr const& legacyReporter ) : m_legacyReporter( legacyReporter ) {} LegacyReporterAdapter::~LegacyReporterAdapter() {} ReporterPreferences 
LegacyReporterAdapter::getPreferences() const { ReporterPreferences prefs; prefs.shouldRedirectStdOut = m_legacyReporter->shouldRedirectStdout(); return prefs; } void LegacyReporterAdapter::noMatchingTestCases( std::string const& ) {} void LegacyReporterAdapter::testRunStarting( TestRunInfo const& ) { m_legacyReporter->StartTesting(); } void LegacyReporterAdapter::testGroupStarting( GroupInfo const& groupInfo ) { m_legacyReporter->StartGroup( groupInfo.name ); } void LegacyReporterAdapter::testCaseStarting( TestCaseInfo const& testInfo ) { m_legacyReporter->StartTestCase( testInfo ); } void LegacyReporterAdapter::sectionStarting( SectionInfo const& sectionInfo ) { m_legacyReporter->StartSection( sectionInfo.name, sectionInfo.description ); } void LegacyReporterAdapter::assertionStarting( AssertionInfo const& ) { // Not on legacy interface } bool LegacyReporterAdapter::assertionEnded( AssertionStats const& assertionStats ) { if( assertionStats.assertionResult.getResultType() != ResultWas::Ok ) { for( std::vector::const_iterator it = assertionStats.infoMessages.begin(), itEnd = assertionStats.infoMessages.end(); it != itEnd; ++it ) { if( it->type == ResultWas::Info ) { ResultBuilder rb( it->macroName.c_str(), it->lineInfo, "", ResultDisposition::Normal ); rb << it->message; rb.setResultType( ResultWas::Info ); AssertionResult result = rb.build(); m_legacyReporter->Result( result ); } } } m_legacyReporter->Result( assertionStats.assertionResult ); return true; } void LegacyReporterAdapter::sectionEnded( SectionStats const& sectionStats ) { if( sectionStats.missingAssertions ) m_legacyReporter->NoAssertionsInSection( sectionStats.sectionInfo.name ); m_legacyReporter->EndSection( sectionStats.sectionInfo.name, sectionStats.assertions ); } void LegacyReporterAdapter::testCaseEnded( TestCaseStats const& testCaseStats ) { m_legacyReporter->EndTestCase ( testCaseStats.testInfo, testCaseStats.totals, testCaseStats.stdOut, testCaseStats.stdErr ); } void 
LegacyReporterAdapter::testGroupEnded( TestGroupStats const& testGroupStats ) { if( testGroupStats.aborting ) m_legacyReporter->Aborted(); m_legacyReporter->EndGroup( testGroupStats.groupInfo.name, testGroupStats.totals ); } void LegacyReporterAdapter::testRunEnded( TestRunStats const& testRunStats ) { m_legacyReporter->EndTesting( testRunStats.totals ); } } // #included from: catch_timer.hpp #ifdef __clang__ #pragma clang diagnostic push #pragma clang diagnostic ignored "-Wc++11-long-long" #endif #ifdef CATCH_PLATFORM_WINDOWS #include <windows.h> #else #include <sys/time.h> #endif namespace Catch { namespace { /* getCurrentTicks() returns a tick count in microseconds on both platforms */ #ifdef CATCH_PLATFORM_WINDOWS uint64_t getCurrentTicks() { static uint64_t hz=0, hzo=0; if (!hz) { QueryPerformanceFrequency((LARGE_INTEGER*)&hz); QueryPerformanceCounter((LARGE_INTEGER*)&hzo); } uint64_t t; QueryPerformanceCounter((LARGE_INTEGER*)&t); return ((t-hzo)*1000000)/hz; } #else uint64_t getCurrentTicks() { timeval t; gettimeofday(&t,NULL); return static_cast<uint64_t>( t.tv_sec ) * 1000000ull + static_cast<uint64_t>( t.tv_usec ); } #endif } void Timer::start() { m_ticks = getCurrentTicks(); } /* despite its name, this returns elapsed microseconds, because ticks are microseconds */ unsigned int Timer::getElapsedNanoseconds() const { return static_cast<unsigned int>(getCurrentTicks() - m_ticks); } unsigned int Timer::getElapsedMilliseconds() const { return static_cast<unsigned int>((getCurrentTicks() - m_ticks)/1000); } double Timer::getElapsedSeconds() const { return (getCurrentTicks() - m_ticks)/1000000.0; } } // namespace Catch #ifdef __clang__ #pragma clang diagnostic pop #endif // #included from: catch_common.hpp #define TWOBLUECUBES_CATCH_COMMON_HPP_INCLUDED namespace Catch { bool startsWith( std::string const& s, std::string const& prefix ) { return s.size() >= prefix.size() && s.substr( 0, prefix.size() ) == prefix; } bool endsWith( std::string const& s, std::string const& suffix ) { return s.size() >= suffix.size() && s.substr( s.size()-suffix.size(), suffix.size() ) == suffix; } bool contains( std::string const& s, std::string const& infix ) { return s.find( infix ) != std::string::npos; } void 
toLowerInPlace( std::string& s ) { std::transform( s.begin(), s.end(), s.begin(), ::tolower ); } std::string toLower( std::string const& s ) { std::string lc = s; toLowerInPlace( lc ); return lc; } std::string trim( std::string const& str ) { static char const* whitespaceChars = "\n\r\t "; std::string::size_type start = str.find_first_not_of( whitespaceChars ); std::string::size_type end = str.find_last_not_of( whitespaceChars ); return start != std::string::npos ? str.substr( start, 1+end-start ) : ""; } pluralise::pluralise( std::size_t count, std::string const& label ) : m_count( count ), m_label( label ) {} std::ostream& operator << ( std::ostream& os, pluralise const& pluraliser ) { os << pluraliser.m_count << " " << pluraliser.m_label; if( pluraliser.m_count != 1 ) os << "s"; return os; } SourceLineInfo::SourceLineInfo() : line( 0 ){} SourceLineInfo::SourceLineInfo( char const* _file, std::size_t _line ) : file( _file ), line( _line ) {} SourceLineInfo::SourceLineInfo( SourceLineInfo const& other ) : file( other.file ), line( other.line ) {} bool SourceLineInfo::empty() const { return file.empty(); } bool SourceLineInfo::operator == ( SourceLineInfo const& other ) const { return line == other.line && file == other.file; } std::ostream& operator << ( std::ostream& os, SourceLineInfo const& info ) { #ifndef __GNUG__ os << info.file << "(" << info.line << ")"; #else os << info.file << ":" << info.line; #endif return os; } void throwLogicError( std::string const& message, SourceLineInfo const& locationInfo ) { std::ostringstream oss; oss << locationInfo << ": Internal Catch error: '" << message << "'"; if( alwaysTrue() ) throw std::logic_error( oss.str() ); } } // #included from: catch_section.hpp #define TWOBLUECUBES_CATCH_SECTION_HPP_INCLUDED namespace Catch { SectionInfo::SectionInfo ( SourceLineInfo const& _lineInfo, std::string const& _name, std::string const& _description ) : name( _name ), description( _description ), lineInfo( _lineInfo ) {} 
Section::Section( SectionInfo const& info ) : m_info( info ), m_sectionIncluded( getResultCapture().sectionStarted( m_info, m_assertions ) ) { m_timer.start(); } Section::~Section() { if( m_sectionIncluded ) getResultCapture().sectionEnded( m_info, m_assertions, m_timer.getElapsedSeconds() ); } // This indicates whether the section should be executed or not Section::operator bool() const { return m_sectionIncluded; } } // end namespace Catch // #included from: catch_debugger.hpp #define TWOBLUECUBES_CATCH_DEBUGGER_HPP_INCLUDED #include #ifdef CATCH_PLATFORM_MAC #include #include #include #include #include namespace Catch{ // The following function is taken directly from the following technical note: // http://developer.apple.com/library/mac/#qa/qa2004/qa1361.html // Returns true if the current process is being debugged (either // running under the debugger or has a debugger attached post facto). bool isDebuggerActive(){ int mib[4]; struct kinfo_proc info; size_t size; // Initialize the flags so that, if sysctl fails for some bizarre // reason, we get a predictable result. info.kp_proc.p_flag = 0; // Initialize mib, which tells sysctl the info we want, in this case // we're looking for information about a specific process ID. mib[0] = CTL_KERN; mib[1] = KERN_PROC; mib[2] = KERN_PROC_PID; mib[3] = getpid(); // Call sysctl. size = sizeof(info); if( sysctl(mib, sizeof(mib) / sizeof(*mib), &info, &size, NULL, 0) != 0 ) { std::cerr << "\n** Call to sysctl failed - unable to determine if debugger is active **\n" << std::endl; return false; } // We're being debugged if the P_TRACED flag is set. 
return ( (info.kp_proc.p_flag & P_TRACED) != 0 ); } } // namespace Catch #elif defined(_MSC_VER) extern "C" __declspec(dllimport) int __stdcall IsDebuggerPresent(); namespace Catch { bool isDebuggerActive() { return IsDebuggerPresent() != 0; } } #elif defined(__MINGW32__) extern "C" __declspec(dllimport) int __stdcall IsDebuggerPresent(); namespace Catch { bool isDebuggerActive() { return IsDebuggerPresent() != 0; } } #else namespace Catch { inline bool isDebuggerActive() { return false; } } #endif // Platform #ifdef CATCH_PLATFORM_WINDOWS extern "C" __declspec(dllimport) void __stdcall OutputDebugStringA( const char* ); namespace Catch { void writeToDebugConsole( std::string const& text ) { ::OutputDebugStringA( text.c_str() ); } } #else namespace Catch { void writeToDebugConsole( std::string const& text ) { // !TBD: Need a version for Mac/ XCode and other IDEs std::cout << text; } } #endif // Platform // #included from: catch_tostring.hpp #define TWOBLUECUBES_CATCH_TOSTRING_HPP_INCLUDED namespace Catch { namespace Detail { namespace { struct Endianness { enum Arch { Big, Little }; static Arch which() { union _{ int asInt; char asChar[sizeof (int)]; } u; u.asInt = 1; return ( u.asChar[sizeof(int)-1] == 1 ) ? 
Big : Little; } }; } std::string rawMemoryToString( const void *object, std::size_t size ) { // Reverse order for little endian architectures int i = 0, end = static_cast( size ), inc = 1; if( Endianness::which() == Endianness::Little ) { i = end-1; end = inc = -1; } unsigned char const *bytes = static_cast(object); std::ostringstream os; os << "0x" << std::setfill('0') << std::hex; for( ; i != end; i += inc ) os << std::setw(2) << static_cast(bytes[i]); return os.str(); } } std::string toString( std::string const& value ) { std::string s = value; if( getCurrentContext().getConfig()->showInvisibles() ) { for(size_t i = 0; i < s.size(); ++i ) { std::string subs; switch( s[i] ) { case '\n': subs = "\\n"; break; case '\t': subs = "\\t"; break; default: break; } if( !subs.empty() ) { s = s.substr( 0, i ) + subs + s.substr( i+1 ); ++i; } } } return "\"" + s + "\""; } std::string toString( std::wstring const& value ) { std::string s; s.reserve( value.size() ); for(size_t i = 0; i < value.size(); ++i ) s += value[i] <= 0xff ? static_cast( value[i] ) : '?'; return toString( s ); } std::string toString( const char* const value ) { return value ? Catch::toString( std::string( value ) ) : std::string( "{null string}" ); } std::string toString( char* const value ) { return Catch::toString( static_cast( value ) ); } std::string toString( const wchar_t* const value ) { return value ? 
Catch::toString( std::wstring(value) ) : std::string( "{null string}" ); } std::string toString( wchar_t* const value ) { return Catch::toString( static_cast<const wchar_t*>( value ) ); } std::string toString( int value ) { std::ostringstream oss; oss << value; return oss.str(); } std::string toString( unsigned long value ) { std::ostringstream oss; if( value > 8192 ) oss << "0x" << std::hex << value; else oss << value; return oss.str(); } std::string toString( unsigned int value ) { return toString( static_cast<unsigned long>( value ) ); } template<typename T> std::string fpToString( T value, int precision ) { std::ostringstream oss; oss << std::setprecision( precision ) << std::fixed << value; std::string d = oss.str(); std::size_t i = d.find_last_not_of( '0' ); if( i != std::string::npos && i != d.size()-1 ) { if( d[i] == '.' ) i++; d = d.substr( 0, i+1 ); } return d; } std::string toString( const double value ) { return fpToString( value, 10 ); } std::string toString( const float value ) { return fpToString( value, 5 ) + "f"; } std::string toString( bool value ) { return value ? "true" : "false"; } std::string toString( char value ) { return value < ' ' ? 
toString( static_cast( value ) ) : Detail::makeString( value ); } std::string toString( signed char value ) { return toString( static_cast( value ) ); } std::string toString( unsigned char value ) { return toString( static_cast( value ) ); } #ifdef CATCH_CONFIG_CPP11_NULLPTR std::string toString( std::nullptr_t ) { return "nullptr"; } #endif #ifdef __OBJC__ std::string toString( NSString const * const& nsstring ) { if( !nsstring ) return "nil"; return "@" + toString([nsstring UTF8String]); } std::string toString( NSString * CATCH_ARC_STRONG const& nsstring ) { if( !nsstring ) return "nil"; return "@" + toString([nsstring UTF8String]); } std::string toString( NSObject* const& nsObject ) { return toString( [nsObject description] ); } #endif } // end namespace Catch // #included from: catch_result_builder.hpp #define TWOBLUECUBES_CATCH_RESULT_BUILDER_HPP_INCLUDED namespace Catch { ResultBuilder::ResultBuilder( char const* macroName, SourceLineInfo const& lineInfo, char const* capturedExpression, ResultDisposition::Flags resultDisposition ) : m_assertionInfo( macroName, lineInfo, capturedExpression, resultDisposition ), m_shouldDebugBreak( false ), m_shouldThrow( false ) {} ResultBuilder& ResultBuilder::setResultType( ResultWas::OfType result ) { m_data.resultType = result; return *this; } ResultBuilder& ResultBuilder::setResultType( bool result ) { m_data.resultType = result ? 
ResultWas::Ok : ResultWas::ExpressionFailed; return *this; } ResultBuilder& ResultBuilder::setLhs( std::string const& lhs ) { m_exprComponents.lhs = lhs; return *this; } ResultBuilder& ResultBuilder::setRhs( std::string const& rhs ) { m_exprComponents.rhs = rhs; return *this; } ResultBuilder& ResultBuilder::setOp( std::string const& op ) { m_exprComponents.op = op; return *this; } void ResultBuilder::endExpression() { m_exprComponents.testFalse = isFalseTest( m_assertionInfo.resultDisposition ); captureExpression(); } void ResultBuilder::useActiveException( ResultDisposition::Flags resultDisposition ) { m_assertionInfo.resultDisposition = resultDisposition; m_stream.oss << Catch::translateActiveException(); captureResult( ResultWas::ThrewException ); } void ResultBuilder::captureResult( ResultWas::OfType resultType ) { setResultType( resultType ); captureExpression(); } void ResultBuilder::captureExpression() { AssertionResult result = build(); getResultCapture().assertionEnded( result ); if( !result.isOk() ) { if( getCurrentContext().getConfig()->shouldDebugBreak() ) m_shouldDebugBreak = true; if( getCurrentContext().getRunner()->aborting() || m_assertionInfo.resultDisposition == ResultDisposition::Normal ) m_shouldThrow = true; } } void ResultBuilder::react() { if( m_shouldThrow ) throw Catch::TestFailureException(); } bool ResultBuilder::shouldDebugBreak() const { return m_shouldDebugBreak; } bool ResultBuilder::allowThrows() const { return getCurrentContext().getConfig()->allowThrows(); } AssertionResult ResultBuilder::build() const { assert( m_data.resultType != ResultWas::Unknown ); AssertionResultData data = m_data; // Flip bool results if testFalse is set if( m_exprComponents.testFalse ) { if( data.resultType == ResultWas::Ok ) data.resultType = ResultWas::ExpressionFailed; else if( data.resultType == ResultWas::ExpressionFailed ) data.resultType = ResultWas::Ok; } data.message = m_stream.oss.str(); data.reconstructedExpression = reconstructExpression(); 
if( m_exprComponents.testFalse ) { if( m_exprComponents.op == "" ) data.reconstructedExpression = "!" + data.reconstructedExpression; else data.reconstructedExpression = "!(" + data.reconstructedExpression + ")"; } return AssertionResult( m_assertionInfo, data ); } std::string ResultBuilder::reconstructExpression() const { if( m_exprComponents.op == "" ) return m_exprComponents.lhs.empty() ? m_assertionInfo.capturedExpression : m_exprComponents.op + m_exprComponents.lhs; else if( m_exprComponents.op == "matches" ) return m_exprComponents.lhs + " " + m_exprComponents.rhs; else if( m_exprComponents.op != "!" ) { if( m_exprComponents.lhs.size() + m_exprComponents.rhs.size() < 40 && m_exprComponents.lhs.find("\n") == std::string::npos && m_exprComponents.rhs.find("\n") == std::string::npos ) return m_exprComponents.lhs + " " + m_exprComponents.op + " " + m_exprComponents.rhs; else return m_exprComponents.lhs + "\n" + m_exprComponents.op + "\n" + m_exprComponents.rhs; } else return "{can't expand - use " + m_assertionInfo.macroName + "_FALSE( " + m_assertionInfo.capturedExpression.substr(1) + " ) instead of " + m_assertionInfo.macroName + "( " + m_assertionInfo.capturedExpression + " ) for better diagnostics}"; } } // end namespace Catch // #included from: catch_tag_alias_registry.hpp #define TWOBLUECUBES_CATCH_TAG_ALIAS_REGISTRY_HPP_INCLUDED // #included from: catch_tag_alias_registry.h #define TWOBLUECUBES_CATCH_TAG_ALIAS_REGISTRY_H_INCLUDED #include namespace Catch { class TagAliasRegistry : public ITagAliasRegistry { public: virtual ~TagAliasRegistry(); virtual Option find( std::string const& alias ) const; virtual std::string expandAliases( std::string const& unexpandedTestSpec ) const; void add( char const* alias, char const* tag, SourceLineInfo const& lineInfo ); static TagAliasRegistry& get(); private: std::map m_registry; }; } // end namespace Catch #include #include namespace Catch { TagAliasRegistry::~TagAliasRegistry() {} Option TagAliasRegistry::find( 
std::string const& alias ) const { std::map::const_iterator it = m_registry.find( alias ); if( it != m_registry.end() ) return it->second; else return Option(); } std::string TagAliasRegistry::expandAliases( std::string const& unexpandedTestSpec ) const { std::string expandedTestSpec = unexpandedTestSpec; for( std::map::const_iterator it = m_registry.begin(), itEnd = m_registry.end(); it != itEnd; ++it ) { std::size_t pos = expandedTestSpec.find( it->first ); if( pos != std::string::npos ) { expandedTestSpec = expandedTestSpec.substr( 0, pos ) + it->second.tag + expandedTestSpec.substr( pos + it->first.size() ); } } return expandedTestSpec; } void TagAliasRegistry::add( char const* alias, char const* tag, SourceLineInfo const& lineInfo ) { if( !startsWith( alias, "[@" ) || !endsWith( alias, "]" ) ) { std::ostringstream oss; oss << "error: tag alias, \"" << alias << "\" is not of the form [@alias name].\n" << lineInfo; throw std::domain_error( oss.str().c_str() ); } if( !m_registry.insert( std::make_pair( alias, TagAlias( tag, lineInfo ) ) ).second ) { std::ostringstream oss; oss << "error: tag alias, \"" << alias << "\" already registered.\n" << "\tFirst seen at " << find(alias)->lineInfo << "\n" << "\tRedefined at " << lineInfo; throw std::domain_error( oss.str().c_str() ); } } TagAliasRegistry& TagAliasRegistry::get() { static TagAliasRegistry instance; return instance; } ITagAliasRegistry::~ITagAliasRegistry() {} ITagAliasRegistry const& ITagAliasRegistry::get() { return TagAliasRegistry::get(); } RegistrarForTagAliases::RegistrarForTagAliases( char const* alias, char const* tag, SourceLineInfo const& lineInfo ) { try { TagAliasRegistry::get().add( alias, tag, lineInfo ); } catch( std::exception& ex ) { Colour colourGuard( Colour::Red ); std::cerr << ex.what() << std::endl; exit(1); } } } // end namespace Catch // #included from: ../reporters/catch_reporter_xml.hpp #define TWOBLUECUBES_CATCH_REPORTER_XML_HPP_INCLUDED // #included from: catch_reporter_bases.hpp 
#define TWOBLUECUBES_CATCH_REPORTER_BASES_HPP_INCLUDED namespace Catch { struct StreamingReporterBase : SharedImpl { StreamingReporterBase( ReporterConfig const& _config ) : m_config( _config.fullConfig() ), stream( _config.stream() ) {} virtual ~StreamingReporterBase(); virtual void noMatchingTestCases( std::string const& ) {} virtual void testRunStarting( TestRunInfo const& _testRunInfo ) { currentTestRunInfo = _testRunInfo; } virtual void testGroupStarting( GroupInfo const& _groupInfo ) { currentGroupInfo = _groupInfo; } virtual void testCaseStarting( TestCaseInfo const& _testInfo ) { currentTestCaseInfo = _testInfo; } virtual void sectionStarting( SectionInfo const& _sectionInfo ) { m_sectionStack.push_back( _sectionInfo ); } virtual void sectionEnded( SectionStats const& /* _sectionStats */ ) { m_sectionStack.pop_back(); } virtual void testCaseEnded( TestCaseStats const& /* _testCaseStats */ ) { currentTestCaseInfo.reset(); assert( m_sectionStack.empty() ); } virtual void testGroupEnded( TestGroupStats const& /* _testGroupStats */ ) { currentGroupInfo.reset(); } virtual void testRunEnded( TestRunStats const& /* _testRunStats */ ) { currentTestCaseInfo.reset(); currentGroupInfo.reset(); currentTestRunInfo.reset(); } Ptr m_config; std::ostream& stream; LazyStat currentTestRunInfo; LazyStat currentGroupInfo; LazyStat currentTestCaseInfo; std::vector m_sectionStack; }; struct CumulativeReporterBase : SharedImpl { template struct Node : SharedImpl<> { explicit Node( T const& _value ) : value( _value ) {} virtual ~Node() {} typedef std::vector > ChildNodes; T value; ChildNodes children; }; struct SectionNode : SharedImpl<> { explicit SectionNode( SectionStats const& _stats ) : stats( _stats ) {} virtual ~SectionNode(); bool operator == ( SectionNode const& other ) const { return stats.sectionInfo.lineInfo == other.stats.sectionInfo.lineInfo; } bool operator == ( Ptr const& other ) const { return operator==( *other ); } SectionStats stats; typedef std::vector > 
ChildSections; typedef std::vector Assertions; ChildSections childSections; Assertions assertions; std::string stdOut; std::string stdErr; }; struct BySectionInfo { BySectionInfo( SectionInfo const& other ) : m_other( other ) {} BySectionInfo( BySectionInfo const& other ) : m_other( other.m_other ) {} bool operator() ( Ptr const& node ) const { return node->stats.sectionInfo.lineInfo == m_other.lineInfo; } private: void operator=( BySectionInfo const& ); SectionInfo const& m_other; }; typedef Node TestCaseNode; typedef Node TestGroupNode; typedef Node TestRunNode; CumulativeReporterBase( ReporterConfig const& _config ) : m_config( _config.fullConfig() ), stream( _config.stream() ) {} ~CumulativeReporterBase(); virtual void testRunStarting( TestRunInfo const& ) {} virtual void testGroupStarting( GroupInfo const& ) {} virtual void testCaseStarting( TestCaseInfo const& ) {} virtual void sectionStarting( SectionInfo const& sectionInfo ) { SectionStats incompleteStats( sectionInfo, Counts(), 0, false ); Ptr node; if( m_sectionStack.empty() ) { if( !m_rootSection ) m_rootSection = new SectionNode( incompleteStats ); node = m_rootSection; } else { SectionNode& parentNode = *m_sectionStack.back(); SectionNode::ChildSections::const_iterator it = std::find_if( parentNode.childSections.begin(), parentNode.childSections.end(), BySectionInfo( sectionInfo ) ); if( it == parentNode.childSections.end() ) { node = new SectionNode( incompleteStats ); parentNode.childSections.push_back( node ); } else node = *it; } m_sectionStack.push_back( node ); m_deepestSection = node; } virtual void assertionStarting( AssertionInfo const& ) {} virtual bool assertionEnded( AssertionStats const& assertionStats ) { assert( !m_sectionStack.empty() ); SectionNode& sectionNode = *m_sectionStack.back(); sectionNode.assertions.push_back( assertionStats ); return true; } virtual void sectionEnded( SectionStats const& sectionStats ) { assert( !m_sectionStack.empty() ); SectionNode& node = 
*m_sectionStack.back(); node.stats = sectionStats; m_sectionStack.pop_back(); } virtual void testCaseEnded( TestCaseStats const& testCaseStats ) { Ptr node = new TestCaseNode( testCaseStats ); assert( m_sectionStack.size() == 0 ); node->children.push_back( m_rootSection ); m_testCases.push_back( node ); m_rootSection.reset(); assert( m_deepestSection ); m_deepestSection->stdOut = testCaseStats.stdOut; m_deepestSection->stdErr = testCaseStats.stdErr; } virtual void testGroupEnded( TestGroupStats const& testGroupStats ) { Ptr node = new TestGroupNode( testGroupStats ); node->children.swap( m_testCases ); m_testGroups.push_back( node ); } virtual void testRunEnded( TestRunStats const& testRunStats ) { Ptr node = new TestRunNode( testRunStats ); node->children.swap( m_testGroups ); m_testRuns.push_back( node ); testRunEndedCumulative(); } virtual void testRunEndedCumulative() = 0; Ptr m_config; std::ostream& stream; std::vector m_assertions; std::vector > > m_sections; std::vector > m_testCases; std::vector > m_testGroups; std::vector > m_testRuns; Ptr m_rootSection; Ptr m_deepestSection; std::vector > m_sectionStack; }; } // end namespace Catch // #included from: ../internal/catch_reporter_registrars.hpp #define TWOBLUECUBES_CATCH_REPORTER_REGISTRARS_HPP_INCLUDED namespace Catch { template class LegacyReporterRegistrar { class ReporterFactory : public IReporterFactory { virtual IStreamingReporter* create( ReporterConfig const& config ) const { return new LegacyReporterAdapter( new T( config ) ); } virtual std::string getDescription() const { return T::getDescription(); } }; public: LegacyReporterRegistrar( std::string const& name ) { getMutableRegistryHub().registerReporter( name, new ReporterFactory() ); } }; template class ReporterRegistrar { class ReporterFactory : public IReporterFactory { // *** Please Note ***: // - If you end up here looking at a compiler error because it's trying to register // your custom reporter class be aware that the native reporter 
interface has changed // to IStreamingReporter. The "legacy" interface, IReporter, is still supported via // an adapter. Just use REGISTER_LEGACY_REPORTER to take advantage of the adapter. // However please consider updating to the new interface as the old one is now // deprecated and will probably be removed quite soon! // Please contact me via github if you have any questions at all about this. // In fact, ideally, please contact me anyway to let me know you've hit this - as I have // no idea who is actually using custom reporters at all (possibly no-one!). // The new interface is designed to minimise exposure to interface changes in the future. virtual IStreamingReporter* create( ReporterConfig const& config ) const { return new T( config ); } virtual std::string getDescription() const { return T::getDescription(); } }; public: ReporterRegistrar( std::string const& name ) { getMutableRegistryHub().registerReporter( name, new ReporterFactory() ); } }; } #define INTERNAL_CATCH_REGISTER_LEGACY_REPORTER( name, reporterType ) \ namespace{ Catch::LegacyReporterRegistrar catch_internal_RegistrarFor##reporterType( name ); } #define INTERNAL_CATCH_REGISTER_REPORTER( name, reporterType ) \ namespace{ Catch::ReporterRegistrar catch_internal_RegistrarFor##reporterType( name ); } // #included from: ../internal/catch_xmlwriter.hpp #define TWOBLUECUBES_CATCH_XMLWRITER_HPP_INCLUDED #include #include #include #include namespace Catch { class XmlWriter { public: class ScopedElement { public: ScopedElement( XmlWriter* writer ) : m_writer( writer ) {} ScopedElement( ScopedElement const& other ) : m_writer( other.m_writer ){ other.m_writer = NULL; } ~ScopedElement() { if( m_writer ) m_writer->endElement(); } ScopedElement& writeText( std::string const& text, bool indent = true ) { m_writer->writeText( text, indent ); return *this; } template ScopedElement& writeAttribute( std::string const& name, T const& attribute ) { m_writer->writeAttribute( name, attribute ); return *this; } 
private: mutable XmlWriter* m_writer; }; XmlWriter() : m_tagIsOpen( false ), m_needsNewline( false ), m_os( &std::cout ) {} XmlWriter( std::ostream& os ) : m_tagIsOpen( false ), m_needsNewline( false ), m_os( &os ) {} ~XmlWriter() { while( !m_tags.empty() ) endElement(); } //# ifndef CATCH_CPP11_OR_GREATER // XmlWriter& operator = ( XmlWriter const& other ) { // XmlWriter temp( other ); // swap( temp ); // return *this; // } //# else // XmlWriter( XmlWriter const& ) = default; // XmlWriter( XmlWriter && ) = default; // XmlWriter& operator = ( XmlWriter const& ) = default; // XmlWriter& operator = ( XmlWriter && ) = default; //# endif // // void swap( XmlWriter& other ) { // std::swap( m_tagIsOpen, other.m_tagIsOpen ); // std::swap( m_needsNewline, other.m_needsNewline ); // std::swap( m_tags, other.m_tags ); // std::swap( m_indent, other.m_indent ); // std::swap( m_os, other.m_os ); // } XmlWriter& startElement( std::string const& name ) { ensureTagClosed(); newlineIfNecessary(); stream() << m_indent << "<" << name; m_tags.push_back( name ); m_indent += "  "; m_tagIsOpen = true; return *this; } ScopedElement scopedElement( std::string const& name ) { ScopedElement scoped( this ); startElement( name ); return scoped; } XmlWriter& endElement() { newlineIfNecessary(); m_indent = m_indent.substr( 0, m_indent.size()-2 ); if( m_tagIsOpen ) { stream() << "/>\n"; m_tagIsOpen = false; } else { stream() << m_indent << "</" << m_tags.back() << ">\n"; } m_tags.pop_back(); return *this; } XmlWriter& writeAttribute( std::string const& name, std::string const& attribute ) { if( !name.empty() && !attribute.empty() ) { stream() << " " << name << "=\""; writeEncodedText( attribute ); stream() << "\""; } return *this; } XmlWriter& writeAttribute( std::string const& name, bool attribute ) { stream() << " " << name << "=\"" << ( attribute ? 
"true" : "false" ) << "\""; return *this; } template XmlWriter& writeAttribute( std::string const& name, T const& attribute ) { if( !name.empty() ) stream() << " " << name << "=\"" << attribute << "\""; return *this; } XmlWriter& writeText( std::string const& text, bool indent = true ) { if( !text.empty() ){ bool tagWasOpen = m_tagIsOpen; ensureTagClosed(); if( tagWasOpen && indent ) stream() << m_indent; writeEncodedText( text ); m_needsNewline = true; } return *this; } XmlWriter& writeComment( std::string const& text ) { ensureTagClosed(); stream() << m_indent << ""; m_needsNewline = true; return *this; } XmlWriter& writeBlankLine() { ensureTagClosed(); stream() << "\n"; return *this; } void setStream( std::ostream& os ) { m_os = &os; } private: XmlWriter( XmlWriter const& ); void operator=( XmlWriter const& ); std::ostream& stream() { return *m_os; } void ensureTagClosed() { if( m_tagIsOpen ) { stream() << ">\n"; m_tagIsOpen = false; } } void newlineIfNecessary() { if( m_needsNewline ) { stream() << "\n"; m_needsNewline = false; } } void writeEncodedText( std::string const& text ) { static const char* charsToEncode = "<&\""; std::string mtext = text; std::string::size_type pos = mtext.find_first_of( charsToEncode ); while( pos != std::string::npos ) { stream() << mtext.substr( 0, pos ); switch( mtext[pos] ) { case '<': stream() << "<"; break; case '&': stream() << "&"; break; case '\"': stream() << """; break; } mtext = mtext.substr( pos+1 ); pos = mtext.find_first_of( charsToEncode ); } stream() << mtext; } bool m_tagIsOpen; bool m_needsNewline; std::vector m_tags; std::string m_indent; std::ostream* m_os; }; } namespace Catch { class XmlReporter : public SharedImpl { public: XmlReporter( ReporterConfig const& config ) : m_config( config ), m_sectionDepth( 0 ) {} static std::string getDescription() { return "Reports test results as an XML document"; } virtual ~XmlReporter(); private: // IReporter virtual bool shouldRedirectStdout() const { return true; } 
virtual void StartTesting() { m_xml.setStream( m_config.stream() ); m_xml.startElement( "Catch" ); if( !m_config.fullConfig()->name().empty() ) m_xml.writeAttribute( "name", m_config.fullConfig()->name() ); } virtual void EndTesting( const Totals& totals ) { m_xml.scopedElement( "OverallResults" ) .writeAttribute( "successes", totals.assertions.passed ) .writeAttribute( "failures", totals.assertions.failed ) .writeAttribute( "expectedFailures", totals.assertions.failedButOk ); m_xml.endElement(); } virtual void StartGroup( const std::string& groupName ) { m_xml.startElement( "Group" ) .writeAttribute( "name", groupName ); } virtual void EndGroup( const std::string&, const Totals& totals ) { m_xml.scopedElement( "OverallResults" ) .writeAttribute( "successes", totals.assertions.passed ) .writeAttribute( "failures", totals.assertions.failed ) .writeAttribute( "expectedFailures", totals.assertions.failedButOk ); m_xml.endElement(); } virtual void StartSection( const std::string& sectionName, const std::string& description ) { if( m_sectionDepth++ > 0 ) { m_xml.startElement( "Section" ) .writeAttribute( "name", trim( sectionName ) ) .writeAttribute( "description", description ); } } virtual void NoAssertionsInSection( const std::string& ) {} virtual void NoAssertionsInTestCase( const std::string& ) {} virtual void EndSection( const std::string& /*sectionName*/, const Counts& assertions ) { if( --m_sectionDepth > 0 ) { m_xml.scopedElement( "OverallResults" ) .writeAttribute( "successes", assertions.passed ) .writeAttribute( "failures", assertions.failed ) .writeAttribute( "expectedFailures", assertions.failedButOk ); m_xml.endElement(); } } virtual void StartTestCase( const Catch::TestCaseInfo& testInfo ) { m_xml.startElement( "TestCase" ).writeAttribute( "name", trim( testInfo.name ) ); m_currentTestSuccess = true; } virtual void Result( const Catch::AssertionResult& assertionResult ) { if( !m_config.fullConfig()->includeSuccessfulResults() && 
assertionResult.getResultType() == ResultWas::Ok ) return; if( assertionResult.hasExpression() ) { m_xml.startElement( "Expression" ) .writeAttribute( "success", assertionResult.succeeded() ) .writeAttribute( "filename", assertionResult.getSourceInfo().file ) .writeAttribute( "line", assertionResult.getSourceInfo().line ); m_xml.scopedElement( "Original" ) .writeText( assertionResult.getExpression() ); m_xml.scopedElement( "Expanded" ) .writeText( assertionResult.getExpandedExpression() ); m_currentTestSuccess &= assertionResult.succeeded(); } switch( assertionResult.getResultType() ) { case ResultWas::ThrewException: m_xml.scopedElement( "Exception" ) .writeAttribute( "filename", assertionResult.getSourceInfo().file ) .writeAttribute( "line", assertionResult.getSourceInfo().line ) .writeText( assertionResult.getMessage() ); m_currentTestSuccess = false; break; case ResultWas::Info: m_xml.scopedElement( "Info" ) .writeText( assertionResult.getMessage() ); break; case ResultWas::Warning: m_xml.scopedElement( "Warning" ) .writeText( assertionResult.getMessage() ); break; case ResultWas::ExplicitFailure: m_xml.scopedElement( "Failure" ) .writeText( assertionResult.getMessage() ); m_currentTestSuccess = false; break; case ResultWas::Unknown: case ResultWas::Ok: case ResultWas::FailureBit: case ResultWas::ExpressionFailed: case ResultWas::Exception: case ResultWas::DidntThrowException: break; } if( assertionResult.hasExpression() ) m_xml.endElement(); } virtual void Aborted() { // !TBD } virtual void EndTestCase( const Catch::TestCaseInfo&, const Totals&, const std::string&, const std::string& ) { m_xml.scopedElement( "OverallResult" ).writeAttribute( "success", m_currentTestSuccess ); m_xml.endElement(); } private: ReporterConfig m_config; bool m_currentTestSuccess; XmlWriter m_xml; int m_sectionDepth; }; } // end namespace Catch // #included from: ../reporters/catch_reporter_junit.hpp #define TWOBLUECUBES_CATCH_REPORTER_JUNIT_HPP_INCLUDED #include namespace Catch { 
class JunitReporter : public CumulativeReporterBase { public: JunitReporter( ReporterConfig const& _config ) : CumulativeReporterBase( _config ), xml( _config.stream() ) {} ~JunitReporter(); static std::string getDescription() { return "Reports test results in an XML format that looks like Ant's junitreport target"; } virtual void noMatchingTestCases( std::string const& /*spec*/ ) {} virtual ReporterPreferences getPreferences() const { ReporterPreferences prefs; prefs.shouldRedirectStdOut = true; return prefs; } virtual void testRunStarting( TestRunInfo const& runInfo ) { CumulativeReporterBase::testRunStarting( runInfo ); xml.startElement( "testsuites" ); } virtual void testGroupStarting( GroupInfo const& groupInfo ) { suiteTimer.start(); stdOutForSuite.str(""); stdErrForSuite.str(""); unexpectedExceptions = 0; CumulativeReporterBase::testGroupStarting( groupInfo ); } virtual bool assertionEnded( AssertionStats const& assertionStats ) { if( assertionStats.assertionResult.getResultType() == ResultWas::ThrewException ) unexpectedExceptions++; return CumulativeReporterBase::assertionEnded( assertionStats ); } virtual void testCaseEnded( TestCaseStats const& testCaseStats ) { stdOutForSuite << testCaseStats.stdOut; stdErrForSuite << testCaseStats.stdErr; CumulativeReporterBase::testCaseEnded( testCaseStats ); } virtual void testGroupEnded( TestGroupStats const& testGroupStats ) { double suiteTime = suiteTimer.getElapsedSeconds(); CumulativeReporterBase::testGroupEnded( testGroupStats ); writeGroup( *m_testGroups.back(), suiteTime ); } virtual void testRunEndedCumulative() { xml.endElement(); } void writeGroup( TestGroupNode const& groupNode, double suiteTime ) { XmlWriter::ScopedElement e = xml.scopedElement( "testsuite" ); TestGroupStats const& stats = groupNode.value; xml.writeAttribute( "name", stats.groupInfo.name ); xml.writeAttribute( "errors", unexpectedExceptions ); xml.writeAttribute( "failures", stats.totals.assertions.failed-unexpectedExceptions ); 
xml.writeAttribute( "tests", stats.totals.assertions.total() ); xml.writeAttribute( "hostname", "tbd" ); // !TBD if( m_config->showDurations() == ShowDurations::Never ) xml.writeAttribute( "time", "" ); else xml.writeAttribute( "time", suiteTime ); xml.writeAttribute( "timestamp", "tbd" ); // !TBD // Write test cases for( TestGroupNode::ChildNodes::const_iterator it = groupNode.children.begin(), itEnd = groupNode.children.end(); it != itEnd; ++it ) writeTestCase( **it ); xml.scopedElement( "system-out" ).writeText( trim( stdOutForSuite.str() ), false ); xml.scopedElement( "system-err" ).writeText( trim( stdErrForSuite.str() ), false ); } void writeTestCase( TestCaseNode const& testCaseNode ) { TestCaseStats const& stats = testCaseNode.value; // All test cases have exactly one section - which represents the // test case itself. That section may have 0-n nested sections assert( testCaseNode.children.size() == 1 ); SectionNode const& rootSection = *testCaseNode.children.front(); std::string className = stats.testInfo.className; if( className.empty() ) { if( rootSection.childSections.empty() ) className = "global"; } writeSection( className, "", rootSection ); } void writeSection( std::string const& className, std::string const& rootName, SectionNode const& sectionNode ) { std::string name = trim( sectionNode.stats.sectionInfo.name ); if( !rootName.empty() ) name = rootName + "/" + name; if( !sectionNode.assertions.empty() || !sectionNode.stdOut.empty() || !sectionNode.stdErr.empty() ) { XmlWriter::ScopedElement e = xml.scopedElement( "testcase" ); if( className.empty() ) { xml.writeAttribute( "classname", name ); xml.writeAttribute( "name", "root" ); } else { xml.writeAttribute( "classname", className ); xml.writeAttribute( "name", name ); } xml.writeAttribute( "time", toString( sectionNode.stats.durationInSeconds ) ); writeAssertions( sectionNode ); if( !sectionNode.stdOut.empty() ) xml.scopedElement( "system-out" ).writeText( trim( sectionNode.stdOut ), false ); if( 
!sectionNode.stdErr.empty() ) xml.scopedElement( "system-err" ).writeText( trim( sectionNode.stdErr ), false ); } for( SectionNode::ChildSections::const_iterator it = sectionNode.childSections.begin(), itEnd = sectionNode.childSections.end(); it != itEnd; ++it ) if( className.empty() ) writeSection( name, "", **it ); else writeSection( className, name, **it ); } void writeAssertions( SectionNode const& sectionNode ) { for( SectionNode::Assertions::const_iterator it = sectionNode.assertions.begin(), itEnd = sectionNode.assertions.end(); it != itEnd; ++it ) writeAssertion( *it ); } void writeAssertion( AssertionStats const& stats ) { AssertionResult const& result = stats.assertionResult; if( !result.isOk() ) { std::string elementName; switch( result.getResultType() ) { case ResultWas::ThrewException: elementName = "error"; break; case ResultWas::ExplicitFailure: elementName = "failure"; break; case ResultWas::ExpressionFailed: elementName = "failure"; break; case ResultWas::DidntThrowException: elementName = "failure"; break; // We should never see these here: case ResultWas::Info: case ResultWas::Warning: case ResultWas::Ok: case ResultWas::Unknown: case ResultWas::FailureBit: case ResultWas::Exception: elementName = "internalError"; break; } XmlWriter::ScopedElement e = xml.scopedElement( elementName ); xml.writeAttribute( "message", result.getExpandedExpression() ); xml.writeAttribute( "type", result.getTestMacroName() ); std::ostringstream oss; if( !result.getMessage().empty() ) oss << result.getMessage() << "\n"; for( std::vector::const_iterator it = stats.infoMessages.begin(), itEnd = stats.infoMessages.end(); it != itEnd; ++it ) if( it->type == ResultWas::Info ) oss << it->message << "\n"; oss << "at " << result.getSourceInfo(); xml.writeText( oss.str(), false ); } } XmlWriter xml; Timer suiteTimer; std::ostringstream stdOutForSuite; std::ostringstream stdErrForSuite; unsigned int unexpectedExceptions; }; INTERNAL_CATCH_REGISTER_REPORTER( "junit", 
JunitReporter ) } // end namespace Catch // #included from: ../reporters/catch_reporter_console.hpp #define TWOBLUECUBES_CATCH_REPORTER_CONSOLE_HPP_INCLUDED #include namespace Catch { struct ConsoleReporter : StreamingReporterBase { ConsoleReporter( ReporterConfig const& _config ) : StreamingReporterBase( _config ), m_headerPrinted( false ) {} virtual ~ConsoleReporter(); static std::string getDescription() { return "Reports test results as plain lines of text"; } virtual ReporterPreferences getPreferences() const { ReporterPreferences prefs; prefs.shouldRedirectStdOut = false; return prefs; } virtual void noMatchingTestCases( std::string const& spec ) { stream << "No test cases matched '" << spec << "'" << std::endl; } virtual void assertionStarting( AssertionInfo const& ) { } virtual bool assertionEnded( AssertionStats const& _assertionStats ) { AssertionResult const& result = _assertionStats.assertionResult; bool printInfoMessages = true; // Drop out if result was successful and we're not printing those if( !m_config->includeSuccessfulResults() && result.isOk() ) { if( result.getResultType() != ResultWas::Warning ) return false; printInfoMessages = false; } lazyPrint(); AssertionPrinter printer( stream, _assertionStats, printInfoMessages ); printer.print(); stream << std::endl; return true; } virtual void sectionStarting( SectionInfo const& _sectionInfo ) { m_headerPrinted = false; StreamingReporterBase::sectionStarting( _sectionInfo ); } virtual void sectionEnded( SectionStats const& _sectionStats ) { if( _sectionStats.missingAssertions ) { lazyPrint(); Colour colour( Colour::ResultError ); if( m_sectionStack.size() > 1 ) stream << "\nNo assertions in section"; else stream << "\nNo assertions in test case"; stream << " '" << _sectionStats.sectionInfo.name << "'\n" << std::endl; } if( m_headerPrinted ) { if( m_config->showDurations() == ShowDurations::Always ) stream << "Completed in " << _sectionStats.durationInSeconds << "s" << std::endl; m_headerPrinted = 
false; } else { if( m_config->showDurations() == ShowDurations::Always ) stream << _sectionStats.sectionInfo.name << " completed in " << _sectionStats.durationInSeconds << "s" << std::endl; } StreamingReporterBase::sectionEnded( _sectionStats ); } virtual void testCaseEnded( TestCaseStats const& _testCaseStats ) { StreamingReporterBase::testCaseEnded( _testCaseStats ); m_headerPrinted = false; } virtual void testGroupEnded( TestGroupStats const& _testGroupStats ) { if( currentGroupInfo.used ) { printSummaryDivider(); stream << "Summary for group '" << _testGroupStats.groupInfo.name << "':\n"; printTotals( _testGroupStats.totals ); stream << "\n" << std::endl; } StreamingReporterBase::testGroupEnded( _testGroupStats ); } virtual void testRunEnded( TestRunStats const& _testRunStats ) { printTotalsDivider( _testRunStats.totals ); printTotals( _testRunStats.totals ); stream << std::endl; StreamingReporterBase::testRunEnded( _testRunStats ); } private: class AssertionPrinter { void operator= ( AssertionPrinter const& ); public: AssertionPrinter( std::ostream& _stream, AssertionStats const& _stats, bool _printInfoMessages ) : stream( _stream ), stats( _stats ), result( _stats.assertionResult ), colour( Colour::None ), message( result.getMessage() ), messages( _stats.infoMessages ), printInfoMessages( _printInfoMessages ) { switch( result.getResultType() ) { case ResultWas::Ok: colour = Colour::Success; passOrFail = "PASSED"; //if( result.hasMessage() ) if( _stats.infoMessages.size() == 1 ) messageLabel = "with message"; if( _stats.infoMessages.size() > 1 ) messageLabel = "with messages"; break; case ResultWas::ExpressionFailed: if( result.isOk() ) { colour = Colour::Success; passOrFail = "FAILED - but was ok"; } else { colour = Colour::Error; passOrFail = "FAILED"; } if( _stats.infoMessages.size() == 1 ) messageLabel = "with message"; if( _stats.infoMessages.size() > 1 ) messageLabel = "with messages"; break; case ResultWas::ThrewException: colour = Colour::Error; 
passOrFail = "FAILED"; messageLabel = "due to unexpected exception with message"; break; case ResultWas::DidntThrowException: colour = Colour::Error; passOrFail = "FAILED"; messageLabel = "because no exception was thrown where one was expected"; break; case ResultWas::Info: messageLabel = "info"; break; case ResultWas::Warning: messageLabel = "warning"; break; case ResultWas::ExplicitFailure: passOrFail = "FAILED"; colour = Colour::Error; if( _stats.infoMessages.size() == 1 ) messageLabel = "explicitly with message"; if( _stats.infoMessages.size() > 1 ) messageLabel = "explicitly with messages"; break; // These cases are here to prevent compiler warnings case ResultWas::Unknown: case ResultWas::FailureBit: case ResultWas::Exception: passOrFail = "** internal error **"; colour = Colour::Error; break; } } void print() const { printSourceInfo(); if( stats.totals.assertions.total() > 0 ) { if( result.isOk() ) stream << "\n"; printResultType(); printOriginalExpression(); printReconstructedExpression(); } else { stream << "\n"; } printMessage(); } private: void printResultType() const { if( !passOrFail.empty() ) { Colour colourGuard( colour ); stream << passOrFail << ":\n"; } } void printOriginalExpression() const { if( result.hasExpression() ) { Colour colourGuard( Colour::OriginalExpression ); stream << " "; stream << result.getExpressionInMacro(); stream << "\n"; } } void printReconstructedExpression() const { if( result.hasExpandedExpression() ) { stream << "with expansion:\n"; Colour colourGuard( Colour::ReconstructedExpression ); stream << Text( result.getExpandedExpression(), TextAttributes().setIndent(2) ) << "\n"; } } void printMessage() const { if( !messageLabel.empty() ) stream << messageLabel << ":" << "\n"; for( std::vector::const_iterator it = messages.begin(), itEnd = messages.end(); it != itEnd; ++it ) { // If this assertion is a warning ignore any INFO messages if( printInfoMessages || it->type != ResultWas::Info ) stream << Text( it->message, 
TextAttributes().setIndent(2) ) << "\n"; } } void printSourceInfo() const { Colour colourGuard( Colour::FileName ); stream << result.getSourceInfo() << ": "; } std::ostream& stream; AssertionStats const& stats; AssertionResult const& result; Colour::Code colour; std::string passOrFail; std::string messageLabel; std::string message; std::vector messages; bool printInfoMessages; }; void lazyPrint() { if( !currentTestRunInfo.used ) lazyPrintRunInfo(); if( !currentGroupInfo.used ) lazyPrintGroupInfo(); if( !m_headerPrinted ) { printTestCaseAndSectionHeader(); m_headerPrinted = true; } } void lazyPrintRunInfo() { stream << "\n" << getLineOfChars<'~'>() << "\n"; Colour colour( Colour::SecondaryText ); stream << currentTestRunInfo->name << " is a Catch v" << libraryVersion.majorVersion << "." << libraryVersion.minorVersion << " b" << libraryVersion.buildNumber; if( libraryVersion.branchName != std::string( "master" ) ) stream << " (" << libraryVersion.branchName << ")"; stream << " host application.\n" << "Run with -? 
for options\n\n"; currentTestRunInfo.used = true; } void lazyPrintGroupInfo() { if( !currentGroupInfo->name.empty() && currentGroupInfo->groupsCounts > 1 ) { printClosedHeader( "Group: " + currentGroupInfo->name ); currentGroupInfo.used = true; } } void printTestCaseAndSectionHeader() { assert( !m_sectionStack.empty() ); printOpenHeader( currentTestCaseInfo->name ); if( m_sectionStack.size() > 1 ) { Colour colourGuard( Colour::Headers ); std::vector::const_iterator it = m_sectionStack.begin()+1, // Skip first section (test case) itEnd = m_sectionStack.end(); for( ; it != itEnd; ++it ) printHeaderString( it->name, 2 ); } SourceLineInfo lineInfo = m_sectionStack.front().lineInfo; if( !lineInfo.empty() ){ stream << getLineOfChars<'-'>() << "\n"; Colour colourGuard( Colour::FileName ); stream << lineInfo << "\n"; } stream << getLineOfChars<'.'>() << "\n" << std::endl; } void printClosedHeader( std::string const& _name ) { printOpenHeader( _name ); stream << getLineOfChars<'.'>() << "\n"; } void printOpenHeader( std::string const& _name ) { stream << getLineOfChars<'-'>() << "\n"; { Colour colourGuard( Colour::Headers ); printHeaderString( _name ); } } // if string has a : in first line will set indent to follow it on // subsequent lines void printHeaderString( std::string const& _string, std::size_t indent = 0 ) { std::size_t i = _string.find( ": " ); if( i != std::string::npos ) i+=2; else i = 0; stream << Text( _string, TextAttributes() .setIndent( indent+i) .setInitialIndent( indent ) ) << "\n"; } struct SummaryColumn { SummaryColumn( std::string const& _label, Colour::Code _colour ) : label( _label ), colour( _colour ) {} SummaryColumn addRow( std::size_t count ) { std::ostringstream oss; oss << count; std::string row = oss.str(); for( std::vector::iterator it = rows.begin(); it != rows.end(); ++it ) { while( it->size() < row.size() ) *it = " " + *it; while( it->size() > row.size() ) row = " " + row; } rows.push_back( row ); return *this; } std::string label; 
Colour::Code colour; std::vector rows; }; void printTotals( Totals const& totals ) { if( totals.testCases.total() == 0 ) { stream << Colour( Colour::Warning ) << "No tests ran\n"; } else if( totals.assertions.total() > 0 && totals.assertions.allPassed() ) { stream << Colour( Colour::ResultSuccess ) << "All tests passed"; stream << " (" << pluralise( totals.assertions.passed, "assertion" ) << " in " << pluralise( totals.testCases.passed, "test case" ) << ")" << "\n"; } else { std::vector columns; columns.push_back( SummaryColumn( "", Colour::None ) .addRow( totals.testCases.total() ) .addRow( totals.assertions.total() ) ); columns.push_back( SummaryColumn( "passed", Colour::Success ) .addRow( totals.testCases.passed ) .addRow( totals.assertions.passed ) ); columns.push_back( SummaryColumn( "failed", Colour::ResultError ) .addRow( totals.testCases.failed ) .addRow( totals.assertions.failed ) ); columns.push_back( SummaryColumn( "failed as expected", Colour::ResultExpectedFailure ) .addRow( totals.testCases.failedButOk ) .addRow( totals.assertions.failedButOk ) ); printSummaryRow( "test cases", columns, 0 ); printSummaryRow( "assertions", columns, 1 ); } } void printSummaryRow( std::string const& label, std::vector const& cols, std::size_t row ) { for( std::vector::const_iterator it = cols.begin(); it != cols.end(); ++it ) { std::string value = it->rows[row]; if( it->label.empty() ) { stream << label << ": "; if( value != "0" ) stream << value; else stream << Colour( Colour::Warning ) << "- none -"; } else if( value != "0" ) { stream << Colour( Colour::LightGrey ) << " | "; stream << Colour( it->colour ) << value << " " << it->label; } } stream << "\n"; } static std::size_t makeRatio( std::size_t number, std::size_t total ) { std::size_t ratio = total > 0 ? CATCH_CONFIG_CONSOLE_WIDTH * number/ total : 0; return ( ratio == 0 && number > 0 ) ? 
1 : ratio; } static std::size_t& findMax( std::size_t& i, std::size_t& j, std::size_t& k ) { if( i > j && i > k ) return i; else if( j > k ) return j; else return k; } void printTotalsDivider( Totals const& totals ) { if( totals.testCases.total() > 0 ) { std::size_t failedRatio = makeRatio( totals.testCases.failed, totals.testCases.total() ); std::size_t failedButOkRatio = makeRatio( totals.testCases.failedButOk, totals.testCases.total() ); std::size_t passedRatio = makeRatio( totals.testCases.passed, totals.testCases.total() ); while( failedRatio + failedButOkRatio + passedRatio < CATCH_CONFIG_CONSOLE_WIDTH-1 ) findMax( failedRatio, failedButOkRatio, passedRatio )++; while( failedRatio + failedButOkRatio + passedRatio > CATCH_CONFIG_CONSOLE_WIDTH-1 ) findMax( failedRatio, failedButOkRatio, passedRatio )--; stream << Colour( Colour::Error ) << std::string( failedRatio, '=' ); stream << Colour( Colour::ResultExpectedFailure ) << std::string( failedButOkRatio, '=' ); if( totals.testCases.allPassed() ) stream << Colour( Colour::ResultSuccess ) << std::string( passedRatio, '=' ); else stream << Colour( Colour::Success ) << std::string( passedRatio, '=' ); } else { stream << Colour( Colour::Warning ) << std::string( CATCH_CONFIG_CONSOLE_WIDTH-1, '=' ); } stream << "\n"; } void printSummaryDivider() { stream << getLineOfChars<'-'>() << "\n"; } template static char const* getLineOfChars() { static char line[CATCH_CONFIG_CONSOLE_WIDTH] = {0}; if( !*line ) { memset( line, C, CATCH_CONFIG_CONSOLE_WIDTH-1 ); line[CATCH_CONFIG_CONSOLE_WIDTH-1] = 0; } return line; } private: bool m_headerPrinted; }; INTERNAL_CATCH_REGISTER_REPORTER( "console", ConsoleReporter ) } // end namespace Catch // #included from: ../reporters/catch_reporter_compact.hpp #define TWOBLUECUBES_CATCH_REPORTER_COMPACT_HPP_INCLUDED namespace Catch { struct CompactReporter : StreamingReporterBase { CompactReporter( ReporterConfig const& _config ) : StreamingReporterBase( _config ) {} virtual ~CompactReporter(); 
static std::string getDescription() { return "Reports test results on a single line, suitable for IDEs"; } virtual ReporterPreferences getPreferences() const { ReporterPreferences prefs; prefs.shouldRedirectStdOut = false; return prefs; } virtual void noMatchingTestCases( std::string const& spec ) { stream << "No test cases matched '" << spec << "'" << std::endl; } virtual void assertionStarting( AssertionInfo const& ) { } virtual bool assertionEnded( AssertionStats const& _assertionStats ) { AssertionResult const& result = _assertionStats.assertionResult; bool printInfoMessages = true; // Drop out if result was successful and we're not printing those if( !m_config->includeSuccessfulResults() && result.isOk() ) { if( result.getResultType() != ResultWas::Warning ) return false; printInfoMessages = false; } AssertionPrinter printer( stream, _assertionStats, printInfoMessages ); printer.print(); stream << std::endl; return true; } virtual void testRunEnded( TestRunStats const& _testRunStats ) { printTotals( _testRunStats.totals ); stream << "\n" << std::endl; StreamingReporterBase::testRunEnded( _testRunStats ); } private: class AssertionPrinter { void operator= ( AssertionPrinter const& ); public: AssertionPrinter( std::ostream& _stream, AssertionStats const& _stats, bool _printInfoMessages ) : stream( _stream ) , stats( _stats ) , result( _stats.assertionResult ) , messages( _stats.infoMessages ) , itMessage( _stats.infoMessages.begin() ) , printInfoMessages( _printInfoMessages ) {} void print() { printSourceInfo(); itMessage = messages.begin(); switch( result.getResultType() ) { case ResultWas::Ok: printResultType( Colour::ResultSuccess, passedString() ); printOriginalExpression(); printReconstructedExpression(); if ( ! 
result.hasExpression() ) printRemainingMessages( Colour::None ); else printRemainingMessages(); break; case ResultWas::ExpressionFailed: if( result.isOk() ) printResultType( Colour::ResultSuccess, failedString() + std::string( " - but was ok" ) ); else printResultType( Colour::Error, failedString() ); printOriginalExpression(); printReconstructedExpression(); printRemainingMessages(); break; case ResultWas::ThrewException: printResultType( Colour::Error, failedString() ); printIssue( "unexpected exception with message:" ); printMessage(); printExpressionWas(); printRemainingMessages(); break; case ResultWas::DidntThrowException: printResultType( Colour::Error, failedString() ); printIssue( "expected exception, got none" ); printExpressionWas(); printRemainingMessages(); break; case ResultWas::Info: printResultType( Colour::None, "info" ); printMessage(); printRemainingMessages(); break; case ResultWas::Warning: printResultType( Colour::None, "warning" ); printMessage(); printRemainingMessages(); break; case ResultWas::ExplicitFailure: printResultType( Colour::Error, failedString() ); printIssue( "explicitly" ); printRemainingMessages( Colour::None ); break; // These cases are here to prevent compiler warnings case ResultWas::Unknown: case ResultWas::FailureBit: case ResultWas::Exception: printResultType( Colour::Error, "** internal error **" ); break; } } private: // Colour::LightGrey static Colour::Code dimColour() { return Colour::FileName; } #ifdef CATCH_PLATFORM_MAC static const char* failedString() { return "FAILED"; } static const char* passedString() { return "PASSED"; } #else static const char* failedString() { return "failed"; } static const char* passedString() { return "passed"; } #endif void printSourceInfo() const { Colour colourGuard( Colour::FileName ); stream << result.getSourceInfo() << ":"; } void printResultType( Colour::Code colour, std::string passOrFail ) const { if( !passOrFail.empty() ) { { Colour colourGuard( colour ); stream << " " << 
passOrFail; } stream << ":"; } } void printIssue( std::string issue ) const { stream << " " << issue; } void printExpressionWas() { if( result.hasExpression() ) { stream << ";"; { Colour colour( dimColour() ); stream << " expression was:"; } printOriginalExpression(); } } void printOriginalExpression() const { if( result.hasExpression() ) { stream << " " << result.getExpression(); } } void printReconstructedExpression() const { if( result.hasExpandedExpression() ) { { Colour colour( dimColour() ); stream << " for: "; } stream << result.getExpandedExpression(); } } void printMessage() { if ( itMessage != messages.end() ) { stream << " '" << itMessage->message << "'"; ++itMessage; } } void printRemainingMessages( Colour::Code colour = dimColour() ) { if ( itMessage == messages.end() ) return; // using messages.end() directly yields compilation error: std::vector::const_iterator itEnd = messages.end(); const std::size_t N = static_cast( std::distance( itMessage, itEnd ) ); { Colour colourGuard( colour ); stream << " with " << pluralise( N, "message" ) << ":"; } for(; itMessage != itEnd; ) { // If this assertion is a warning ignore any INFO messages if( printInfoMessages || itMessage->type != ResultWas::Info ) { stream << " '" << itMessage->message << "'"; if ( ++itMessage != itEnd ) { Colour colourGuard( dimColour() ); stream << " and"; } } } } private: std::ostream& stream; AssertionStats const& stats; AssertionResult const& result; std::vector messages; std::vector::const_iterator itMessage; bool printInfoMessages; }; // Colour, message variants: // - white: No tests ran. // - red: Failed [both/all] N test cases, failed [both/all] M assertions. // - white: Passed [both/all] N test cases (no assertions). // - red: Failed N tests cases, failed M assertions. // - green: Passed [both/all] N tests cases with M assertions. std::string bothOrAll( std::size_t count ) const { return count == 1 ? "" : count == 2 ? 
"both " : "all " ; } void printTotals( const Totals& totals ) const { if( totals.testCases.total() == 0 ) { stream << "No tests ran."; } else if( totals.testCases.failed == totals.testCases.total() ) { Colour colour( Colour::ResultError ); const std::string qualify_assertions_failed = totals.assertions.failed == totals.assertions.total() ? bothOrAll( totals.assertions.failed ) : ""; stream << "Failed " << bothOrAll( totals.testCases.failed ) << pluralise( totals.testCases.failed, "test case" ) << ", " "failed " << qualify_assertions_failed << pluralise( totals.assertions.failed, "assertion" ) << "."; } else if( totals.assertions.total() == 0 ) { stream << "Passed " << bothOrAll( totals.testCases.total() ) << pluralise( totals.testCases.total(), "test case" ) << " (no assertions)."; } else if( totals.assertions.failed ) { Colour colour( Colour::ResultError ); stream << "Failed " << pluralise( totals.testCases.failed, "test case" ) << ", " "failed " << pluralise( totals.assertions.failed, "assertion" ) << "."; } else { Colour colour( Colour::ResultSuccess ); stream << "Passed " << bothOrAll( totals.testCases.passed ) << pluralise( totals.testCases.passed, "test case" ) << " with " << pluralise( totals.assertions.passed, "assertion" ) << "."; } } }; INTERNAL_CATCH_REGISTER_REPORTER( "compact", CompactReporter ) } // end namespace Catch namespace Catch { NonCopyable::~NonCopyable() {} IShared::~IShared() {} StreamBufBase::~StreamBufBase() CATCH_NOEXCEPT {} IContext::~IContext() {} IResultCapture::~IResultCapture() {} ITestCase::~ITestCase() {} ITestCaseRegistry::~ITestCaseRegistry() {} IRegistryHub::~IRegistryHub() {} IMutableRegistryHub::~IMutableRegistryHub() {} IExceptionTranslator::~IExceptionTranslator() {} IExceptionTranslatorRegistry::~IExceptionTranslatorRegistry() {} IReporter::~IReporter() {} IReporterFactory::~IReporterFactory() {} IReporterRegistry::~IReporterRegistry() {} IStreamingReporter::~IStreamingReporter() {} AssertionStats::~AssertionStats() {} 
SectionStats::~SectionStats() {} TestCaseStats::~TestCaseStats() {} TestGroupStats::~TestGroupStats() {} TestRunStats::~TestRunStats() {} CumulativeReporterBase::SectionNode::~SectionNode() {} CumulativeReporterBase::~CumulativeReporterBase() {} StreamingReporterBase::~StreamingReporterBase() {} ConsoleReporter::~ConsoleReporter() {} CompactReporter::~CompactReporter() {} IRunner::~IRunner() {} IMutableContext::~IMutableContext() {} IConfig::~IConfig() {} XmlReporter::~XmlReporter() {} JunitReporter::~JunitReporter() {} TestRegistry::~TestRegistry() {} FreeFunctionTestCase::~FreeFunctionTestCase() {} IGeneratorInfo::~IGeneratorInfo() {} IGeneratorsForTest::~IGeneratorsForTest() {} TestSpec::Pattern::~Pattern() {} TestSpec::NamePattern::~NamePattern() {} TestSpec::TagPattern::~TagPattern() {} TestSpec::ExcludedPattern::~ExcludedPattern() {} Matchers::Impl::StdString::Equals::~Equals() {} Matchers::Impl::StdString::Contains::~Contains() {} Matchers::Impl::StdString::StartsWith::~StartsWith() {} Matchers::Impl::StdString::EndsWith::~EndsWith() {} void Config::dummy() {} INTERNAL_CATCH_REGISTER_LEGACY_REPORTER( "xml", XmlReporter ) } #ifdef __clang__ #pragma clang diagnostic pop #endif #endif #ifdef CATCH_CONFIG_MAIN // #included from: internal/catch_default_main.hpp #define TWOBLUECUBES_CATCH_DEFAULT_MAIN_HPP_INCLUDED #ifndef __OBJC__ // Standard C/C++ main entry point int main (int argc, char * const argv[]) { return Catch::Session().run( argc, argv ); } #else // __OBJC__ // Objective-C entry point int main (int argc, char * const argv[]) { #if !CATCH_ARC_ENABLED NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init]; #endif Catch::registerTestMethods(); int result = Catch::Session().run( argc, (char* const*)argv ); #if !CATCH_ARC_ENABLED [pool drain]; #endif return result; } #endif // __OBJC__ #endif #ifdef CLARA_CONFIG_MAIN_NOT_DEFINED # undef CLARA_CONFIG_MAIN #endif ////// // If this config identifier is defined then all CATCH macros are prefixed with CATCH_ 
#ifdef CATCH_CONFIG_PREFIX_ALL

#define CATCH_REQUIRE( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::Normal, "CATCH_REQUIRE" )
#define CATCH_REQUIRE_FALSE( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::Normal | Catch::ResultDisposition::FalseTest, "CATCH_REQUIRE_FALSE" )

#define CATCH_REQUIRE_THROWS( expr ) INTERNAL_CATCH_THROWS( expr, Catch::ResultDisposition::Normal, "CATCH_REQUIRE_THROWS" )
#define CATCH_REQUIRE_THROWS_AS( expr, exceptionType ) INTERNAL_CATCH_THROWS_AS( expr, exceptionType, Catch::ResultDisposition::Normal, "CATCH_REQUIRE_THROWS_AS" )
#define CATCH_REQUIRE_NOTHROW( expr ) INTERNAL_CATCH_NO_THROW( expr, Catch::ResultDisposition::Normal, "CATCH_REQUIRE_NOTHROW" )

#define CATCH_CHECK( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECK" )
#define CATCH_CHECK_FALSE( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::ContinueOnFailure | Catch::ResultDisposition::FalseTest, "CATCH_CHECK_FALSE" )
#define CATCH_CHECKED_IF( expr ) INTERNAL_CATCH_IF( expr, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECKED_IF" )
#define CATCH_CHECKED_ELSE( expr ) INTERNAL_CATCH_ELSE( expr, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECKED_ELSE" )
#define CATCH_CHECK_NOFAIL( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::ContinueOnFailure | Catch::ResultDisposition::SuppressFail, "CATCH_CHECK_NOFAIL" )

#define CATCH_CHECK_THROWS( expr ) INTERNAL_CATCH_THROWS( expr, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECK_THROWS" )
#define CATCH_CHECK_THROWS_AS( expr, exceptionType ) INTERNAL_CATCH_THROWS_AS( expr, exceptionType, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECK_THROWS_AS" )
#define CATCH_CHECK_NOTHROW( expr ) INTERNAL_CATCH_NO_THROW( expr, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECK_NOTHROW" )

// In prefix-all mode every macro carries the CATCH_ prefix, matching the "CATCH_CHECK_THAT" name string
#define CATCH_CHECK_THAT( arg, matcher ) INTERNAL_CHECK_THAT( arg, matcher, Catch::ResultDisposition::ContinueOnFailure, "CATCH_CHECK_THAT" )
#define CATCH_REQUIRE_THAT( arg, matcher ) INTERNAL_CHECK_THAT( arg, matcher, Catch::ResultDisposition::Normal, "CATCH_REQUIRE_THAT" )

#define CATCH_INFO( msg ) INTERNAL_CATCH_INFO( msg, "CATCH_INFO" )
#define CATCH_WARN( msg ) INTERNAL_CATCH_MSG( Catch::ResultWas::Warning, Catch::ResultDisposition::ContinueOnFailure, "CATCH_WARN", msg )
#define CATCH_SCOPED_INFO( msg ) INTERNAL_CATCH_INFO( msg, "CATCH_INFO" )
#define CATCH_CAPTURE( msg ) INTERNAL_CATCH_INFO( #msg " := " << msg, "CATCH_CAPTURE" )
#define CATCH_SCOPED_CAPTURE( msg ) INTERNAL_CATCH_INFO( #msg " := " << msg, "CATCH_CAPTURE" )

#ifdef CATCH_CONFIG_VARIADIC_MACROS
    #define CATCH_TEST_CASE( ... ) INTERNAL_CATCH_TESTCASE( __VA_ARGS__ )
    #define CATCH_TEST_CASE_METHOD( className, ... ) INTERNAL_CATCH_TEST_CASE_METHOD( className, __VA_ARGS__ )
    #define CATCH_METHOD_AS_TEST_CASE( method, ... ) INTERNAL_CATCH_METHOD_AS_TEST_CASE( method, __VA_ARGS__ )
    #define CATCH_SECTION( ... ) INTERNAL_CATCH_SECTION( __VA_ARGS__ )
    #define CATCH_FAIL( ... ) INTERNAL_CATCH_MSG( Catch::ResultWas::ExplicitFailure, Catch::ResultDisposition::Normal, "CATCH_FAIL", __VA_ARGS__ )
    #define CATCH_SUCCEED( ... ) INTERNAL_CATCH_MSG( Catch::ResultWas::Ok, Catch::ResultDisposition::ContinueOnFailure, "CATCH_SUCCEED", __VA_ARGS__ )
#else
    #define CATCH_TEST_CASE( name, description ) INTERNAL_CATCH_TESTCASE( name, description )
    #define CATCH_TEST_CASE_METHOD( className, name, description ) INTERNAL_CATCH_TEST_CASE_METHOD( className, name, description )
    #define CATCH_METHOD_AS_TEST_CASE( method, name, description ) INTERNAL_CATCH_METHOD_AS_TEST_CASE( method, name, description )
    #define CATCH_SECTION( name, description ) INTERNAL_CATCH_SECTION( name, description )
    #define CATCH_FAIL( msg ) INTERNAL_CATCH_MSG( Catch::ResultWas::ExplicitFailure, Catch::ResultDisposition::Normal, "CATCH_FAIL", msg )
    #define CATCH_SUCCEED( msg ) INTERNAL_CATCH_MSG( Catch::ResultWas::Ok, Catch::ResultDisposition::ContinueOnFailure, "CATCH_SUCCEED", msg )
#endif
#define CATCH_ANON_TEST_CASE() INTERNAL_CATCH_TESTCASE( "", "" )

#define CATCH_REGISTER_REPORTER( name, reporterType ) INTERNAL_CATCH_REGISTER_REPORTER( name, reporterType )
#define CATCH_REGISTER_LEGACY_REPORTER( name, reporterType ) INTERNAL_CATCH_REGISTER_LEGACY_REPORTER( name, reporterType )

#define CATCH_GENERATE( expr) INTERNAL_CATCH_GENERATE( expr )

// "BDD-style" convenience wrappers
#ifdef CATCH_CONFIG_VARIADIC_MACROS
#define CATCH_SCENARIO( ... ) CATCH_TEST_CASE( "Scenario: " __VA_ARGS__ )
#define CATCH_SCENARIO_METHOD( className, ... ) INTERNAL_CATCH_TEST_CASE_METHOD( className, "Scenario: " __VA_ARGS__ )
#else
#define CATCH_SCENARIO( name, tags ) CATCH_TEST_CASE( "Scenario: " name, tags )
#define CATCH_SCENARIO_METHOD( className, name, tags ) INTERNAL_CATCH_TEST_CASE_METHOD( className, "Scenario: " name, tags )
#endif
#define CATCH_GIVEN( desc ) CATCH_SECTION( "Given: " desc, "" )
#define CATCH_WHEN( desc ) CATCH_SECTION( " When: " desc, "" )
#define CATCH_AND_WHEN( desc ) CATCH_SECTION( " And: " desc, "" )
#define CATCH_THEN( desc ) CATCH_SECTION( " Then: " desc, "" )
#define CATCH_AND_THEN( desc ) CATCH_SECTION( " And: " desc, "" )

// If CATCH_CONFIG_PREFIX_ALL is not defined then the CATCH_ prefix is not required
#else

#define REQUIRE( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::Normal, "REQUIRE" )
#define REQUIRE_FALSE( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::Normal | Catch::ResultDisposition::FalseTest, "REQUIRE_FALSE" )

#define REQUIRE_THROWS( expr ) INTERNAL_CATCH_THROWS( expr, Catch::ResultDisposition::Normal, "REQUIRE_THROWS" )
#define REQUIRE_THROWS_AS( expr, exceptionType ) INTERNAL_CATCH_THROWS_AS( expr, exceptionType, Catch::ResultDisposition::Normal, "REQUIRE_THROWS_AS" )
#define REQUIRE_NOTHROW( expr ) INTERNAL_CATCH_NO_THROW( expr, Catch::ResultDisposition::Normal, "REQUIRE_NOTHROW" )

#define CHECK( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::ContinueOnFailure, "CHECK" )
#define CHECK_FALSE( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::ContinueOnFailure | Catch::ResultDisposition::FalseTest, "CHECK_FALSE" )
#define CHECKED_IF( expr ) INTERNAL_CATCH_IF( expr, Catch::ResultDisposition::ContinueOnFailure, "CHECKED_IF" )
#define CHECKED_ELSE( expr ) INTERNAL_CATCH_ELSE( expr, Catch::ResultDisposition::ContinueOnFailure, "CHECKED_ELSE" )
#define CHECK_NOFAIL( expr ) INTERNAL_CATCH_TEST( expr, Catch::ResultDisposition::ContinueOnFailure | Catch::ResultDisposition::SuppressFail, "CHECK_NOFAIL" )

#define CHECK_THROWS( expr ) INTERNAL_CATCH_THROWS( expr, Catch::ResultDisposition::ContinueOnFailure, "CHECK_THROWS" )
#define CHECK_THROWS_AS( expr, exceptionType ) INTERNAL_CATCH_THROWS_AS( expr, exceptionType, Catch::ResultDisposition::ContinueOnFailure, "CHECK_THROWS_AS" )
#define CHECK_NOTHROW( expr ) INTERNAL_CATCH_NO_THROW( expr, Catch::ResultDisposition::ContinueOnFailure, "CHECK_NOTHROW" )

#define CHECK_THAT( arg, matcher ) INTERNAL_CHECK_THAT( arg, matcher, Catch::ResultDisposition::ContinueOnFailure, "CHECK_THAT" )
#define REQUIRE_THAT( arg, matcher ) INTERNAL_CHECK_THAT( arg, matcher, Catch::ResultDisposition::Normal, "REQUIRE_THAT" )

#define INFO( msg ) INTERNAL_CATCH_INFO( msg, "INFO" )
#define WARN( msg ) INTERNAL_CATCH_MSG( Catch::ResultWas::Warning, Catch::ResultDisposition::ContinueOnFailure, "WARN", msg )
#define SCOPED_INFO( msg ) INTERNAL_CATCH_INFO( msg, "INFO" )
#define CAPTURE( msg ) INTERNAL_CATCH_INFO( #msg " := " << msg, "CAPTURE" )
#define SCOPED_CAPTURE( msg ) INTERNAL_CATCH_INFO( #msg " := " << msg, "CAPTURE" )

#ifdef CATCH_CONFIG_VARIADIC_MACROS
    #define TEST_CASE( ... ) INTERNAL_CATCH_TESTCASE( __VA_ARGS__ )
    #define TEST_CASE_METHOD( className, ... ) INTERNAL_CATCH_TEST_CASE_METHOD( className, __VA_ARGS__ )
    #define METHOD_AS_TEST_CASE( method, ... ) INTERNAL_CATCH_METHOD_AS_TEST_CASE( method, __VA_ARGS__ )
    #define SECTION( ... ) INTERNAL_CATCH_SECTION( __VA_ARGS__ )
    #define FAIL( ... ) INTERNAL_CATCH_MSG( Catch::ResultWas::ExplicitFailure, Catch::ResultDisposition::Normal, "FAIL", __VA_ARGS__ )
    #define SUCCEED( ... ) INTERNAL_CATCH_MSG( Catch::ResultWas::Ok, Catch::ResultDisposition::ContinueOnFailure, "SUCCEED", __VA_ARGS__ )
#else
    #define TEST_CASE( name, description ) INTERNAL_CATCH_TESTCASE( name, description )
    #define TEST_CASE_METHOD( className, name, description ) INTERNAL_CATCH_TEST_CASE_METHOD( className, name, description )
    #define METHOD_AS_TEST_CASE( method, name, description ) INTERNAL_CATCH_METHOD_AS_TEST_CASE( method, name, description )
    #define SECTION( name, description ) INTERNAL_CATCH_SECTION( name, description )
    #define FAIL( msg ) INTERNAL_CATCH_MSG( Catch::ResultWas::ExplicitFailure, Catch::ResultDisposition::Normal, "FAIL", msg )
    #define SUCCEED( msg ) INTERNAL_CATCH_MSG( Catch::ResultWas::Ok, Catch::ResultDisposition::ContinueOnFailure, "SUCCEED", msg )
#endif
#define ANON_TEST_CASE() INTERNAL_CATCH_TESTCASE( "", "" )

#define REGISTER_REPORTER( name, reporterType ) INTERNAL_CATCH_REGISTER_REPORTER( name, reporterType )
#define REGISTER_LEGACY_REPORTER( name, reporterType ) INTERNAL_CATCH_REGISTER_LEGACY_REPORTER( name, reporterType )

#define GENERATE( expr) INTERNAL_CATCH_GENERATE( expr )

#endif

#define CATCH_TRANSLATE_EXCEPTION( signature ) INTERNAL_CATCH_TRANSLATE_EXCEPTION( signature )

// "BDD-style" convenience wrappers
#ifdef CATCH_CONFIG_VARIADIC_MACROS
#define SCENARIO( ... ) TEST_CASE( "Scenario: " __VA_ARGS__ )
#define SCENARIO_METHOD( className, ... ) INTERNAL_CATCH_TEST_CASE_METHOD( className, "Scenario: " __VA_ARGS__ )
#else
#define SCENARIO( name, tags ) TEST_CASE( "Scenario: " name, tags )
#define SCENARIO_METHOD( className, name, tags ) INTERNAL_CATCH_TEST_CASE_METHOD( className, "Scenario: " name, tags )
#endif
#define GIVEN( desc ) SECTION( " Given: " desc, "" )
#define WHEN( desc ) SECTION( " When: " desc, "" )
#define AND_WHEN( desc ) SECTION( "And when: " desc, "" )
#define THEN( desc ) SECTION( " Then: " desc, "" )
#define AND_THEN( desc ) SECTION( " And: " desc, "" )

using Catch::Detail::Approx;

// #included from: internal/catch_reenable_warnings.h

#define TWOBLUECUBES_CATCH_REENABLE_WARNINGS_H_INCLUDED

#ifdef __clang__
#pragma clang diagnostic pop
#elif defined __GNUC__
#pragma GCC diagnostic pop
#endif

#endif // TWOBLUECUBES_SINGLE_INCLUDE_CATCH_HPP_INCLUDED

mothur-1.36.1/TestMothur/main.cpp000066400000000000000000000004021255543666200167000ustar00rootroot00000000000000
//
//  main.cpp
//  TestMothur
//
//  Created by Sarah Westcott on 3/23/15.
//  Copyright (c) 2015 Schloss Lab. All rights reserved.
//

#define CATCH_CONFIG_MAIN  // This tells Catch to provide a main() - only do this in one cpp file
#include "catch.hpp"
mothur-1.36.1/TestMothur/testcommands/000077500000000000000000000000001255543666200177555ustar00rootroot00000000000000
mothur-1.36.1/TestMothur/testcommands/testsetseedcommand.cpp000066400000000000000000000011261255543666200243540ustar00rootroot00000000000000
//
//  testsetseedcommand.cpp
//  Mothur
//
//  Created by Sarah Westcott on 3/24/15.
//  Copyright (c) 2015 Schloss Lab. All rights reserved.
//

#include "catch.hpp"
#include "setseedcommand.h"

TEST_CASE("Testing set.seed command") {
    string optionString = "seed=12345";
    Command* setseed = new SetSeedCommand(optionString);

    SECTION("Testing random seed") {
        INFO("Using seed=12345") // Only appears on a FAIL
        setseed->execute();
        int randValue = rand()%100 + 1;
        CHECK(randValue == 25);
    }

    delete setseed;
}
mothur-1.36.1/TestMothur/testdatastructures/000077500000000000000000000000001255543666200212315ustar00rootroot00000000000000
mothur-1.36.1/TestMothur/testdatastructures/testsequence.cpp000066400000000000000000000033311255543666200244450ustar00rootroot00000000000000
//
//  testsequence.cpp
//  Mothur
//
//  Created by Sarah Westcott on 3/23/15.
//  Copyright (c) 2015 Schloss Lab. All rights reserved.
//

#include "catch.hpp"
#include "sequence.hpp"

TEST_CASE("Testing Sequence Class") {
    Sequence seq;

    SECTION("test constructor - string, string") {
        INFO("Using TestSeq, atgcatgc") // Only appears on a FAIL
        Sequence seq2("TestSeq", "atgcatgc");
        CAPTURE(seq2.getInlineSeq()); // Displays this variable on a FAIL
        CHECK(seq2.getInlineSeq() == "TestSeq\tatgcatgc");
    }

    SECTION("setting / getting name") {
        INFO("Using TestSeq") // Only appears on a FAIL
        seq.setName("TestSeq");
        CAPTURE(seq.getName()); // Displays this variable on a FAIL
        CHECK(seq.getName() == "TestSeq");
    }

    SECTION("test setAligned / get Aligned") {
        INFO("Using ..atgc--atgc..") // Only appears on a FAIL
        seq.setAligned("..atgc--atgc..");
        CAPTURE(seq.getAligned()); // Displays this variable on a FAIL
        CHECK(seq.getAligned() == "..atgc--atgc..");
    }

    SECTION("test setUnaligned / getUnaligned") {
        INFO("Using ..atgc--atgc..") // Only appears on a FAIL
        seq.setUnaligned("..atgc--atgc..");
        CAPTURE(seq.getUnaligned()); // Displays this variable on a FAIL
        CHECK(seq.getUnaligned() == "atgcatgc");
    }

    SECTION("test initialize") {
        INFO("No data") // Only appears on a FAIL
        seq.initialize();
        CAPTURE(seq.getUnaligned()); // Displays this variable on a FAIL
        CHECK(seq.getUnaligned() == "");
    }

    //more tests need to be added - just a start to set up testing project and model
}
mothur-1.36.1/makefile000066400000000000000000000060771255543666200146500ustar00rootroot00000000000000
###################################################
#
# Makefile for mothur
#
###################################################

#
# Macros
#

USEMPI ?= no
64BIT_VERSION ?= yes
USEREADLINE ?= yes
USECOMPRESSION ?= no
USEBOOST ?= yes
MOTHUR_FILES="\"Enter_your_default_path_here\""
RELEASE_DATE = "\"7/27/2015\""
VERSION = "\"1.36.1\""

# Optimize to level 3:
CXXFLAGS += -O3

ifeq ($(strip $(64BIT_VERSION)),yes)
    #if you are a mac user use the following line
    TARGET_ARCH += -arch x86_64

    #if you are using cygwin to build for Windows, use the following lines
    #CXX = x86_64-w64-mingw32-g++
    #CC = x86_64-w64-mingw32-g++
    #TARGET_ARCH += -m64 -static

    #if you are a linux user use the following line
    #CXXFLAGS += -mtune=native -march=native

    CXXFLAGS += -DBIT_VERSION
endif

CXXFLAGS += -DRELEASE_DATE=${RELEASE_DATE} -DVERSION=${VERSION}

ifeq ($(strip $(MOTHUR_FILES)),"\"Enter_your_default_path_here\"")
else
    CXXFLAGS += -DMOTHUR_FILES=${MOTHUR_FILES}
endif

# if you do not want to use the readline library, set this to no.
# make sure you have the library installed

ifeq ($(strip $(USEREADLINE)),yes)
    CXXFLAGS += -DUSE_READLINE
    LIBS = \
      -lreadline\
      -lncurses
endif

ifeq ($(strip $(USEMPI)),yes)
    CXX = mpic++
    CXXFLAGS += -DUSE_MPI
endif

#The boost libraries allow you to read gz files.
ifeq ($(strip $(USEBOOST)),yes)
    BOOST_INCLUDE_DIR="/usr/local/include"
    BOOST_LIBRARY_DIR="/usr/local/lib"

    CXXFLAGS += -DUSE_BOOST

    LIBS += \
      ${BOOST_LIBRARY_DIR}/libboost_iostreams.a \
      ${BOOST_LIBRARY_DIR}/zlib.a

    #if linux or windows then ${BOOST_LIBRARY_DIR}/libz.a
endif

# if you want to enable reading and writing of compressed files, set to yes.
# The default is no. this may only work on unix-like systems, not for windows.

ifeq ($(strip $(USECOMPRESSION)),yes)
    CXXFLAGS += -DUSE_COMPRESSION
endif

#
# INCLUDE directories for mothur
#
#
VPATH=source/calculators:source/chimera:source/classifier:source/clearcut:source/commands:source/communitytype:source/datastructures:source/metastats:source/randomforest:source/read:source/svm
skipUchime := source/uchime_src/
subdirs := $(sort $(dir $(filter-out $(skipUchime), $(wildcard source/*/))))
subDirIncludes = $(patsubst %, -I %, $(subdirs))
subDirLinking = $(patsubst %, -L%, $(subdirs))
CXXFLAGS += -I. $(subDirIncludes)
LDFLAGS += $(subDirLinking)

#
# Get the list of all .cpp files, rename to .o files
#
OBJECTS=$(patsubst %.cpp,%.o,$(wildcard $(addsuffix *.cpp,$(subdirs))))
OBJECTS+=$(patsubst %.c,%.o,$(wildcard $(addsuffix *.c,$(subdirs))))
OBJECTS+=$(patsubst %.cpp,%.o,$(wildcard *.cpp))
OBJECTS+=$(patsubst %.c,%.o,$(wildcard *.c))

mothur : $(OBJECTS) uchime
	$(CXX) $(LDFLAGS) $(TARGET_ARCH) -o $@ $(OBJECTS) $(LIBS)
	strip mothur

uchime:
	cd source/uchime_src && ./mk && mv uchime ../../ && cd ..

install : mothur

%.o : %.c %.h
	$(COMPILE.c) $(OUTPUT_OPTION) $<
%.o : %.cpp %.h
	$(COMPILE.cpp) $(OUTPUT_OPTION) $<
%.o : %.cpp %.hpp
	$(COMPILE.cpp) $(OUTPUT_OPTION) $<

clean :
	@rm -f $(OBJECTS)
	@rm -f uchime
mothur-1.36.1/source/000077500000000000000000000000001255543666200144365ustar00rootroot00000000000000
mothur-1.36.1/source/averagelinkage.cpp000066400000000000000000000030651255543666200201130ustar00rootroot00000000000000
#ifndef AVERAGE_H
#define AVERAGE_H

//test

#include "mothur.h"
#include "cluster.hpp"
#include "rabundvector.hpp"
#include "sparsedistancematrix.h"

/* This class implements the average UPGMA, average neighbor clustering algorithm */

/***********************************************************************/

AverageLinkage::AverageLinkage(RAbundVector* rav, ListVector* lv, SparseDistanceMatrix* dm, float c, string s, float a) :
Cluster(rav, lv, dm, c, s, a)
{
    saveRow = -1;
    saveCol = -1;
}

/***********************************************************************/
//This function returns the tag of the method.
string AverageLinkage::getTag() {
    return("an");
}

/***********************************************************************/
//This function updates the distance based on the average linkage method.
bool AverageLinkage::updateDistance(PDistCell& colCell, PDistCell& rowCell) { try { if ((saveRow != smallRow) || (saveCol != smallCol)) { rowBin = rabund->get(smallRow); colBin = rabund->get(smallCol); totalBin = rowBin + colBin; saveRow = smallRow; saveCol = smallCol; } //cout << "colcell.dist = " << colCell.dist << '\t' << smallRow << '\t' << smallCol << '\t' << rowCell.dist << endl; colCell.dist = (colBin * colCell.dist + rowBin * rowCell.dist) / totalBin; return(true); } catch(exception& e) { m->errorOut(e, "AverageLinkage", "updateDistance"); exit(1); } } /***********************************************************************/ /***********************************************************************/ #endif mothur-1.36.1/source/calcsparcc.cpp000066400000000000000000000256471255543666200172560ustar00rootroot00000000000000// // runSparcc.cpp // PDSSparCC // // Created by Patrick Schloss on 10/31/12. // Copyright (c) 2012 University of Michigan. All rights reserved. // #include "calcsparcc.h" #include "linearalgebra.h" /**************************************************************************************************/ CalcSparcc::CalcSparcc(vector > sharedVector, int maxIterations, int numSamplings, string method){ try { m = MothurOut::getInstance(); numOTUs = (int)sharedVector[0].size(); numGroups = (int)sharedVector.size(); normalizationMethod = method; int numOTUs = (int)sharedVector[0].size(); addPseudoCount(sharedVector); vector > > allCorrelations(numSamplings); // float cycClockStart = clock(); // unsigned long long cycTimeStart = time(NULL); for(int i=0;icontrol_pressed) { break; } vector logFractions = getLogFractions(sharedVector, method); getT_Matrix(logFractions); //this step is slow... getT_Vector(); getD_Matrix(); vector basisVariances = getBasisVariances(); //this step is slow... 
vector > correlation = getBasisCorrelations(basisVariances); excluded.resize(numOTUs); for(int j=0;j 0.10 && iter < maxIterations){ maxRho = getExcludedPairs(correlation, excludeRow, excludeColumn); excludeValues(excludeRow, excludeColumn); vector excludedBasisVariances = getBasisVariances(); correlation = getBasisCorrelations(excludedBasisVariances); iter++; } allCorrelations[i] = correlation; } if (!m->control_pressed) { if(numSamplings > 1){ getMedian(allCorrelations); } else{ median = allCorrelations[0]; } } // cout << median[0][3] << '\t' << median[0][6] << endl; } catch(exception& e) { m->errorOut(e, "CalcSparcc", "CalcSparcc"); exit(1); } } /**************************************************************************************************/ void CalcSparcc::addPseudoCount(vector >& sharedVector){ try { for(int i=0;icontrol_pressed) { return; } for(int j=0;jerrorOut(e, "CalcSparcc", "addPseudoCount"); exit(1); } } /**************************************************************************************************/ vector CalcSparcc::getLogFractions(vector > sharedVector, string method){ //dirichlet by default try { vector logSharedFractions(numGroups * numOTUs, 0); if(method == "dirichlet"){ vector alphas(numGroups); for(int i=0;icontrol_pressed) { return logSharedFractions; } alphas = RNG.randomDirichlet(sharedVector[i]); for(int j=0;jcontrol_pressed) { return logSharedFractions; } float total = 0.0; for(int j=0;jerrorOut(e, "CalcSparcc", "addPseudoCount"); exit(1); } } /**************************************************************************************************/ void CalcSparcc::getT_Matrix(vector sharedFractions){ try { tMatrix.resize(numOTUs * numOTUs, 0); vector diff(numGroups); for(int j1=0;j1control_pressed) { return; } float mean = 0.0; for(int i=0;ierrorOut(e, "CalcSparcc", "getT_Matrix"); exit(1); } } /**************************************************************************************************/ void CalcSparcc::getT_Vector(){ try { 
tVector.assign(numOTUs, 0); for(int j1=0;j1control_pressed) { return; } for(int j2=0;j2errorOut(e, "CalcSparcc", "getT_Vector"); exit(1); } } /**************************************************************************************************/ void CalcSparcc::getD_Matrix(){ try { float d = numOTUs - 1.0; dMatrix.resize(numOTUs); for(int i=0;icontrol_pressed) { return; } dMatrix[i].resize(numOTUs, 1); dMatrix[i][i] = d; } } catch(exception& e) { m->errorOut(e, "CalcSparcc", "getD_Matrix"); exit(1); } } /**************************************************************************************************/ vector CalcSparcc::getBasisVariances(){ try { LinearAlgebra LA; vector variances = LA.solveEquations(dMatrix, tVector); for(int i=0;icontrol_pressed) { return variances; } if(variances[i] < 0){ variances[i] = 1e-4; } } return variances; } catch(exception& e) { m->errorOut(e, "CalcSparcc", "getBasisVariances"); exit(1); } } /**************************************************************************************************/ vector > CalcSparcc::getBasisCorrelations(vector basisVariance){ try { vector > rho(numOTUs); for(int i=0;icontrol_pressed) { return rho; } float var_j = basisVariance[j]; rho[i][j] = (var_i + var_j - tMatrix[i * numOTUs + j]) / (2.0 * sqrt_var_i * sqrt(var_j)); if(rho[i][j] > 1.0) { rho[i][j] = 1.0; } else if(rho[i][j] < -1.0) { rho[i][j] = -1.0; } rho[j][i] = rho[i][j]; } } return rho; } catch(exception& e) { m->errorOut(e, "CalcSparcc", "getBasisCorrelations"); exit(1); } } /**************************************************************************************************/ float CalcSparcc::getExcludedPairs(vector > rho, int& maxRow, int& maxColumn){ try { float maxRho = 0; maxRow = -1; maxColumn = -1; for(int i=0;icontrol_pressed) { return maxRho; } float tester = abs(rho[i][j]); if(tester > maxRho && excluded[i][j] != 1){ maxRho = tester; maxRow = i; maxColumn = j; } } } return maxRho; } catch(exception& e) { m->errorOut(e, "CalcSparcc", 
"getExcludedPairs"); exit(1); } } /**************************************************************************************************/ void CalcSparcc::excludeValues(int excludeRow, int excludeColumn){ try { tVector[excludeRow] -= tMatrix[excludeRow * numOTUs + excludeColumn]; tVector[excludeColumn] -= tMatrix[excludeRow * numOTUs + excludeColumn]; dMatrix[excludeRow][excludeColumn] = 0; dMatrix[excludeColumn][excludeRow] = 0; dMatrix[excludeRow][excludeRow]--; dMatrix[excludeColumn][excludeColumn]--; excluded[excludeRow][excludeColumn] = 1; excluded[excludeColumn][excludeRow] = 1; } catch(exception& e) { m->errorOut(e, "CalcSparcc", "excludeValues"); exit(1); } } /**************************************************************************************************/ void CalcSparcc::getMedian(vector > > allCorrelations){ try { int numSamples = (int)allCorrelations.size(); median.resize(numOTUs); for(int i=0;i hold(numSamples); for(int i=0;icontrol_pressed) { return; } for(int k=0;kerrorOut(e, "CalcSparcc", "getMedian"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/calcsparcc.h000066400000000000000000000027541255543666200167150ustar00rootroot00000000000000 #ifndef PDSSparCC_runSparcc_h #define PDSSparCC_runSparcc_h // // runSparcc.h // PDSSparCC // // Created by Patrick Schloss on 10/31/12. // Copyright (c) 2012 University of Michigan. All rights reserved. 
// /**************************************************************************************************/ //#include "sparcc.h" #include "randomnumber.h" #include "mothurout.h" /**************************************************************************************************/ class CalcSparcc { public: CalcSparcc(vector >, int, int, string); vector > getRho() { return median; } private: MothurOut* m; void addPseudoCount(vector >&); vector getLogFractions(vector >, string); void getT_Matrix(vector); void getT_Vector(); void getD_Matrix(); vector getBasisVariances(); vector > getBasisCorrelations(vector); float getExcludedPairs(vector >, int&, int&); void excludeValues(int, int); void getMedian(vector > >); vector tMatrix; vector > dMatrix; vector tVector; vector > excluded; vector > median; int numOTUs; int numGroups; string normalizationMethod; RandomNumberGenerator RNG; }; #endif /**************************************************************************************************/ mothur-1.36.1/source/calculators/000077500000000000000000000000001255543666200167525ustar00rootroot00000000000000mothur-1.36.1/source/calculators/ace.cpp000066400000000000000000000077631255543666200202230ustar00rootroot00000000000000/* * ace.cpp * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "ace.h" /***********************************************************************/ EstOutput Ace::getValues(SAbundVector* rank) { try { data.resize(3,0); double ace, acelci, acehci; double nrare = 0; double srare = 0; double sabund = 0; double Cace, term1, gamace; double numsum = 0; double maxRank = (double)rank->getMaxRank(); for(int i=1;i<=maxRank;i++){ if(i<=abund){ srare += rank->get(i); nrare += i*rank->get(i); numsum += (i-1)*i*rank->get(i); } else if(i>abund) {sabund += rank->get(i);} } double sobs = srare + sabund; if (nrare == 0){ Cace = 0.0000; } else { Cace = 1.0000 -(double)rank->get(1)/(double)nrare; } double denom = Cace * (double)(nrare * (nrare-1)); if(denom <= 0.0){ term1=0.0000; } else { term1 = (double)(srare * numsum)/(double)denom - 1.0; } if(term1 >= 0.0){ gamace = term1; } else { gamace = 0.0; } if(gamace >= 0.64){ gamace = gamace * (1 + (nrare * (1 - Cace) * numsum) / denom); if(gamace<0){ gamace = 0; } } if(Cace == 0.0){ ace = 0.00;}//ace else{ ace = (double)sabund+((double)srare+(double)rank->get(1)*gamace)/Cace;//ace } /* The following code was obtained from Anne Chao for calculating the SE for her ACE estimator My modification was to reset the frequencies so that a singleton is found in rank[1] insted of in rank[0], etc. I have also added the forumlae to calculate the 95% confidence intervals. */ double j,D_s=0,nn=0,ww=0; int Max_Index=rank->getMaxRank()+1; double pp, temp1, temp2; vector Part_N_Part_F(Max_Index+1,0.0); for (j=1; jget(j); for (j=1; jget(j) * j; ww += rank->get(j) * j * ( j - 1); } } double C_hat = 1.-rank->get(1)/double(nn); double Gamma = ( D_s * ww) / ( C_hat * nn * ( nn - 1.)) - 1.; temp1 = double(nn - rank->get(1)); temp2 = double(nn - 1.); if ( Gamma > 0.){ Part_N_Part_F[1] = ( D_s + nn) * ( 1. + rank->get(1) * ww / temp1 / temp2) / temp1 + nn * D_s * ww * ( temp1 - 1.) 
/ ( temp1 * temp1 * temp2 * temp2) - ( nn + rank->get(1)) / temp1; for ( j=2; j<=Max_Index; j++){ if(j<=abund){ Part_N_Part_F[j] = ( nn * temp1 - j * rank->get(1) * D_s) / temp1 / temp1 * ( 1. + rank->get(1) * ww / temp1 / temp2) + j * rank->get(1) * D_s * nn * ( ( j - 1.) * temp1 * temp2 - ww * ( temp1 + temp2)) / temp1 / temp1 / temp1 / temp2 / temp2 + j * rank->get(1) * rank->get(1) / temp1 / temp1; } } } else{ Part_N_Part_F[1] = ( nn + D_s ) / temp1; for ( j=2; j<=Max_Index; j++){ if(j<=abund){ Part_N_Part_F[j-1] = ( nn * temp1 - j * rank->get(1) * D_s ) / temp1 / temp1; } } } if(Max_Index>abund){ for ( j=abund+1; j<=Max_Index; j++){ Part_N_Part_F[j-1] = 1.; } } for ( temp1=0., temp2=0., j=0; jget(j); temp2 += pp * pp * rank->get(j); } double se = temp2 - temp1 * temp1 / ace; if(toString(se) == "nan"){ acelci = ace; acehci = ace; } else if(ace==0.000){ acelci = ace; acehci = ace; } else if(ace==sobs){ double ci = 1.96*pow(se,0.5); acelci = ace-ci; //ace lci acehci = ace+ci; //ace hci }else{ double denom = pow(ace-sobs,2); double c = exp(1.96*pow((log(1+se/denom)),0.5)); acelci = sobs+(ace-sobs)/c; //ace lci acehci = sobs+(ace-sobs)*c; //ace hci } data[0] = ace; data[1] = acelci; data[2] = acehci; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Ace", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/ace.h000066400000000000000000000013611255543666200176540ustar00rootroot00000000000000#ifndef ACE_H #define ACE_H /* * ace.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the Ace estimator on a single group. It is a child of the calculator class. 
*/

#include "calculator.h"

/***********************************************************************/

class Ace : public Calculator {

public:
	Ace(int n) : abund(n), Calculator("ace", 3, false) {};
	EstOutput getValues(SAbundVector*);
	EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
	string getCitation() { return "http://www.mothur.org/wiki/Ace"; }
private:
	int abund;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/bergerparker.cpp000066400000000000000000000012411255543666200221270ustar00rootroot00000000000000
/*
 *  ssbp.cpp
 *  Mothur
 *
 *  Created by Thomas Ryabin on 3/6/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "bergerparker.h"

/***************************************************************/

EstOutput BergerParker::getValues(SAbundVector* rank){
	try {
		data.resize(1,0);

		//Berger-Parker index
		double BP = (double)rank->getMaxRank() / (double)rank->getNumSeqs();

		data[0] = BP;
		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "BergerParker", "getValues");
		exit(1);
	}
}

/***********************************************************************/
mothur-1.36.1/source/calculators/bergerparker.h000066400000000000000000000014241255543666200215770ustar00rootroot00000000000000
#ifndef BERGERPARKER_H
#define BERGERPARKER_H
/*
 *  bergerparker.h
 *  Mothur
 *
 *  Created by Thomas Ryabin on 3/6/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/*This class implements the SSBP estimator on single group.
It is a child of the calculator class.*/ /***********************************************************************/ class BergerParker : public Calculator { public: BergerParker() : Calculator("bergerparker", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Bergerparker"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/boneh.cpp000066400000000000000000000033001255543666200205450ustar00rootroot00000000000000/* * boneh.cpp * Mothur * * Created by Thomas Ryabin on 5/13/09. * Copyright 2009Schloss Lab UMASS Amherst. All rights reserved. * */ #include "boneh.h" #include /***********************************************************************/ //This solves for the value of 'v' using a binary search. double Boneh::getV(double f1, double n, double rs) { if(rs == 0) return 0; double accuracy = .0001; double v = 100000.0; double step = v/2; double ls = v * (1 - pow((1 - f1/(n*v)), n)); while(abs(ls - rs) > accuracy) { if(ls > rs) { v -= step; } else { v += step; } ls = v * (1 - pow((1 - f1/(n * v)), n)); step /= 2; } return v; } /***********************************************************************/ EstOutput Boneh::getValues(SAbundVector* sabund){ try { data.resize(1,0); bool valid = false; double sum = 0; double n = (double)sabund->getNumSeqs(); if(f==0){ f=n; } double f1 = (double)sabund->get(1); for(int i = 1; i < sabund->size(); i++){ sum += (double)sabund->get(i) * exp(-i); } if(sabund->get(1) > sum) valid = true; sum = 0; if(valid) { for(int j = 1; j < sabund->size(); j++){ sum += sabund->get(j) * pow((1 - (double)j / n), n); } double v = getV(f1, n, sum); sum = 0; for(int j = 1; j < sabund->size(); j++) { for (int i = 0; i < sabund->get(j); i++) { sum += pow(1 - j / n, n) * (1 - pow(1 - j / n, f)); } } sum += v * pow(1 - f1/(n*v), n) * (1 - pow(1 - f1/(n*v), f)); } data[0] = sum; 
		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "Boneh", "getValues");
		exit(1);
	}
}

/***********************************************************************/
mothur-1.36.1/source/calculators/boneh.h000066400000000000000000000014501255543666200202160ustar00rootroot00000000000000
#ifndef BONEH_H
#define BONEH_H

/*
 *  boneh.h
 *  Mothur
 *
 *  Created by Thomas Ryabin on 5/13/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/* This class implements the boneh calculator on a single group.
It is a child of the calculator class. */

/***********************************************************************/

class Boneh : public Calculator {

public:
	Boneh(int size) : f(size), Calculator("boneh", 1, false) {};
	EstOutput getValues(SAbundVector*);
	EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
	string getCitation() { return "http://www.mothur.org/wiki/Boneh"; }
private:
	double getV(double, double, double);
	int f;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/bootstrap.cpp000066400000000000000000000015701255543666200214760ustar00rootroot00000000000000
/*
 *  bootstrap.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "bootstrap.h" /***********************************************************************/ EstOutput Bootstrap::getValues(SAbundVector* rank){ try { //vector bootData(3,0); data.resize(1,0); double maxRank = (double)rank->getMaxRank(); double sampled = rank->getNumSeqs(); double sobs = rank->getNumBins(); double boot = (double)sobs; for(int i=1;i<=maxRank;i++){ boot += (double)rank->get(i)*pow((1.0-(double)i/(double)sampled),sampled); } data[0] = boot; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Bootstrap", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/bootstrap.h000066400000000000000000000013751255543666200211460ustar00rootroot00000000000000#ifndef BOOTSTRAP_H #define BOOTSTRAP_H /* * bootstrap.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the Bootstrap estimator on single group. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class Bootstrap : public Calculator { public: Bootstrap() : Calculator("bootstrap", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Bootstrap"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/bstick.cpp000066400000000000000000000041701255543666200207370ustar00rootroot00000000000000/* * bstick.cpp * Mothur * * Created by Thomas Ryabin on 3/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "bstick.h" /***********************************************************************/ double BStick::invSum(int index, double numSpec) { double sum = 0; for(int i = index; i <= numSpec; i++) sum += 1/(double)i; return sum; } /***********************************************************************/ RAbundVector BStick::getRAbundVector(SAbundVector* rank){ vector rData; int mr = 1; int nb = 0; int ns = 0; for(int i = rank->size()-1; i > 0; i--) { double cur = rank->get(i); if(mr == 1 && cur > 0) mr = i; nb += cur; ns += i*cur; for(int j = 0; j < cur; j++) rData.push_back(i); } RAbundVector rav = RAbundVector(rData, mr, nb, ns); rav.setLabel(rank->getLabel()); return rav; } /***************************************************************************/ /***************************************************************************/ EstOutput BStick::getValues(SAbundVector* rank){ try { data.resize(3,0); rdata = getRAbundVector(rank); double numInd = (double)rdata.getNumSeqs(); double numSpec = (double)rdata.getNumBins(); double sumExp = 0; double sumObs = 0; double maxDiff = 0; for(int i = 0; i < rdata.size(); i++) { sumObs += rdata.get(i); sumExp += numInd/numSpec*invSum(i+1,numSpec); double diff = fabs(sumObs-sumExp); if(diff > maxDiff) maxDiff = diff; } data[0] = maxDiff/numInd; data[1] = 0.886/sqrt(rdata.size()); data[2] = 1.031/sqrt(rdata.size()); /*m->mothurOut(critVal); m->mothurOutEndLine(); m->mothurOut("If D-Statistic is less than the critical value then the data fits the Broken Stick model w/ 95% confidence.\n");*/ if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "BStick", "getValues"); exit(1); } } /***********************************************************************/ 
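The broken-stick comparison in `BStick::getValues` above can be sketched in isolation as follows. This is a minimal illustration, not mothur code: `brokenStickExpected` and `brokenStickD` are hypothetical helpers, and the input is assumed to be a plain vector of OTU abundances sorted from most to least abundant (standing in for the `RAbundVector` built by `getRAbundVector`). As in the source, the expected abundance of rank i is numInd/numSpec times the sum of 1/k for k from i to numSpec, and the statistic is the largest absolute gap between the cumulative observed and cumulative expected abundances, divided by numInd.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mothur): expected abundance of each rank
// under the broken-stick model for numInd individuals and numSpec species.
// expected[i-1] = numInd / numSpec * sum_{k=i}^{numSpec} 1/k
std::vector<double> brokenStickExpected(double numInd, int numSpec) {
    std::vector<double> expected(numSpec > 0 ? numSpec : 0, 0.0);
    double tail = 0.0;  // running value of sum_{k=i}^{numSpec} 1/k
    for (int i = numSpec; i >= 1; --i) {
        tail += 1.0 / i;
        expected[i - 1] = numInd / numSpec * tail;
    }
    return expected;
}

// Hypothetical helper (not part of mothur): maximum absolute difference
// between cumulative observed and cumulative expected abundances, scaled by
// the number of individuals. `observed` is assumed sorted in descending order.
double brokenStickD(const std::vector<int>& observed) {
    double numInd = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) { numInd += observed[i]; }
    if (observed.empty() || numInd == 0.0) { return 0.0; }

    std::vector<double> expected =
        brokenStickExpected(numInd, static_cast<int>(observed.size()));

    double sumObs = 0.0, sumExp = 0.0, maxDiff = 0.0;
    for (std::size_t i = 0; i < observed.size(); ++i) {
        sumObs += observed[i];
        sumExp += expected[i];
        double diff = std::fabs(sumObs - sumExp);
        if (diff > maxDiff) { maxDiff = diff; }
    }
    return maxDiff / numInd;
}
```

For two OTUs with abundances 7 and 3, the expected abundances are 7.5 and 2.5, so the statistic is 0.5/10 = 0.05. The source compares this value against 0.886/sqrt(n) and 1.031/sqrt(n), where n is the number of OTUs; the sketch leaves that comparison to the caller.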
mothur-1.36.1/source/calculators/bstick.h000066400000000000000000000015111255543666200204000ustar00rootroot00000000000000
#ifndef BSTICK_H
#define BSTICK_H
/*
 *  bstick.h
 *  Mothur
 *
 *  Created by Thomas Ryabin on 3/6/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/*This class implements the BStick estimator on single group.
It is a child of the calculator class.*/

/***********************************************************************/

class BStick : public Calculator {

public:
	BStick() : Calculator("bstick", 3, false) {};
	EstOutput getValues(SAbundVector*);
	EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
	string getCitation() { return "http://www.mothur.org/wiki/Bstick"; }

private:
	double invSum(int, double);
	RAbundVector getRAbundVector(SAbundVector*);
	RAbundVector rdata;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/calculator.cpp000066400000000000000000000072351255543666200216160ustar00rootroot00000000000000
/*
 *  calculator.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 11/18/08.
 *  Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "calculator.h" /***********************************************************************/ int VecCalc::sumElements(vector vec){ return sumElements(vec,0); } /***********************************************************************/ int VecCalc::sumElements(vector vec, int index){ int sum = 0; for(int i = index; i < vec.size(); i++) sum += vec.at(i); return sum; } /***********************************************************************/ double VecCalc::sumElements(vector vec){ double sum = 0; for(int i = 0; i < vec.size(); i++) sum += vec.at(i); return sum; } /***********************************************************************/ double VecCalc::sumElements(vector vec, int index){ double sum = 0; for(int i = index; i < vec.size(); i++) sum += vec.at(i); return sum; } /***********************************************************************/ int VecCalc::numNZ(vector vec){ int numNZ = 0; for(int i = 0; i < vec.size(); i++) if(vec.at(i) != 0) numNZ++; return numNZ; } /***********************************************************************/ double VecCalc::numNZ(vector vec){ double numNZ = 0; for(int i = 0; i < vec.size(); i++) if(vec.at(i) != 0) numNZ++; return numNZ; } /***********************************************************************/ double TDTable::getConfLimit(int row, int col) //Rows are the degrees of freedom { //Found on http://www.vgtu.lt/leidiniai/elektroniniai/Probability.pdf/Table%203.pdf //Confidence Level .90 .95 .975 .99 .995 .999 .9995 double values[30][7] = {{3.078, 6.314, 12.706, 31.821, 63.656, 318.289, 636.578}, {1.886, 2.920, 4.303, 6.965, 9.925, 22.328, 31.600}, {1.638, 2.353, 3.182, 4.541, 5.841, 10.214, 12.924}, {1.533, 2.132, 2.776, 3.747, 4.604, 7.173, 8.610}, {1.476, 2.015, 2.571, 3.365, 4.032, 5.894, 6.869}, {1.440, 1.943, 2.447, 3.143, 3.707, 5.208, 5.959}, {1.415, 1.895, 2.365, 2.998, 3.499, 4.785, 5.408}, {1.397, 1.860, 2.306, 2.896, 3.355, 4.501, 5.041}, {1.383, 1.833, 2.262, 2.821, 3.250, 4.297, 4.781}, {1.372, 
1.812, 2.228, 2.764, 3.169, 4.144, 4.587}, {1.363, 1.796, 2.201, 2.718, 3.106, 4.025, 4.437}, {1.356, 1.782, 2.179, 2.681, 3.055, 3.930, 4.318}, {1.350, 1.771, 2.160, 2.650, 3.012, 3.852, 4.221}, {1.345, 1.761, 2.145, 2.624, 2.977, 3.787, 4.140}, {1.341, 1.753, 2.131, 2.602, 2.947, 3.733, 4.073}, {1.337, 1.746, 2.120, 2.583, 2.921, 3.686, 4.015}, {1.333, 1.740, 2.110, 2.567, 2.898, 3.646, 3.965}, {1.330, 1.734, 2.101, 2.552, 2.878, 3.610, 3.922}, {1.328, 1.729, 2.093, 2.539, 2.861, 3.579, 3.883}, {1.325, 1.725, 2.086, 2.528, 2.845, 3.552, 3.850}, {1.323, 1.721, 2.080, 2.518, 2.831, 3.527, 3.819}, {1.321, 1.717, 2.074, 2.508, 2.819, 3.505, 3.792}, {1.319, 1.714, 2.069, 2.500, 2.807, 3.485, 3.768}, {1.318, 1.711, 2.064, 2.492, 2.797, 3.467, 3.745}, {1.316, 1.708, 2.060, 2.485, 2.787, 3.450, 3.725}, {1.315, 1.706, 2.056, 2.479, 2.779, 3.435, 3.707}, {1.314, 1.703, 2.052, 2.473, 2.771, 3.421, 3.689}, {1.313, 1.701, 2.048, 2.467, 2.763, 3.408, 3.674}, {1.311, 1.699, 2.045, 2.462, 2.756, 3.396, 3.660}, {1.310, 1.697, 2.042, 2.457, 2.750, 3.385, 3.646}}; return values[row][col]; } /***********************************************************************/ mothur-1.36.1/source/calculators/calculator.h000066400000000000000000000073571255543666200212700ustar00rootroot00000000000000#ifndef CALCULATOR_H #define CALCULATOR_H #include "mothur.h" #include "sabundvector.hpp" #include "sharedsabundvector.h" #include "rabundvector.hpp" #include "uvest.h" #include "mothurout.h" /* The calculator class is the parent class for all the different estimators implemented in mothur except the tree calculators. It has 2 pure functions EstOutput getValues(SAbundVector*), which works on a single group, and EstOutput getValues(SharedRAbundVector* shared1, SharedRAbundVector* shared2), which compares 2 groups. 
*/ typedef vector EstOutput; /***********************************************************************/ class Calculator { public: Calculator(){ m = MothurOut::getInstance(); needsAll = false; } virtual ~Calculator(){}; Calculator(string n, int c, bool f) : name(n), cols(c), multiple(f) { m = MothurOut::getInstance(); needsAll = false; }; Calculator(string n, int c, bool f, bool a) : name(n), cols(c), multiple(f), needsAll(a) { m = MothurOut::getInstance(); }; virtual EstOutput getValues(SAbundVector*) = 0; virtual EstOutput getValues(vector) = 0; //optional calc that returns the otus labels of shared otus virtual EstOutput getValues(vector sv , vector&) { data = getValues(sv); return data; } virtual void print(ostream& f) { f.setf(ios::fixed, ios::floatfield); f.setf(ios::showpoint); f << data[0]; for(int i=1;imothurOut(getCitation()); m->mothurOutEndLine(); } protected: MothurOut* m; EstOutput data; string name; int cols; bool multiple; bool needsAll; }; /**************************************************************************************************/ /*This Class holds all of the methods that manipulate vectors. These methods are used in the other classes. This class must be included if any of the other classes are to be used. */ class VecCalc { // The methods seen in the order here is how they are ordered throughout the class. public: VecCalc(){}; //void printElements(vector); //This prints the values of the vector on one line with a space between each value. //void printElements(vector); //This prints the values of the vector on one line with a space between each value. //int findString(vector, string);//This returns the index of the given string in the given vector, if the string does not exist in the vector it returns -1. //double mean(vector); //This returns the mean value of the vector. //double stError(vector); //This returns the standard error of the vector. 
int sumElements(vector, int); int sumElements(vector); double sumElements(vector); //This returns the sum of all the values in the vector. double sumElements(vector, int); //This returns the sum of all the values in the vector excluding those whose index is before the given index. //double findMax(vector); //This returns the maximum value in the vector. int numNZ(vector); //This returns the number of non-zero values in the vector. double numNZ(vector); //This returns the number of non-zero values in the vector. }; /**************************************************************************************************/ //This Class stores the table of the confidence limits of the Student-T distribution. class TDTable { public: double getConfLimit(int,int); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/calculators/canberra.cpp000066400000000000000000000017421255543666200212370ustar00rootroot00000000000000/* * canberra.cpp * Mothur * * Created by westcott on 12/14/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "canberra.h"

/***********************************************************************/

EstOutput Canberra::getValues(vector<SharedRAbundVector*> shared) {
	try {
		data.resize(1,0);

		int numSharedOTUS = 0;

		double sum = 0.0;

		for (int i = 0; i < shared[0]->getNumBins(); i++) {

			int Aij = shared[0]->getAbundance(i);
			int Bij = shared[1]->getAbundance(i);

			//is this otu shared
			if ((Aij != 0) && (Bij != 0)) { numSharedOTUS++; }

			if ((Aij + Bij) != 0) {
				sum += ((abs(Aij - Bij)) / (float) (Aij + Bij));
			}
		}

		data[0] = (1 / (float) shared[0]->getNumBins()) * sum;

		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "Canberra", "getValues");
		exit(1);
	}
}

/***********************************************************************/
mothur-1.36.1/source/calculators/canberra.h000066400000000000000000000012041255543666200206750ustar00rootroot00000000000000
#ifndef CANBERRA_H
#define CANBERRA_H

/*
 *  canberra.h
 *  Mothur
 *
 *  Created by westcott on 12/14/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class Canberra : public Calculator {

public:
	Canberra() : Calculator("canberra", 1, false) {};
	EstOutput getValues(SAbundVector*) {return data;};
	EstOutput getValues(vector<SharedRAbundVector*>);
	string getCitation() { return "http://www.mothur.org/wiki/Canberra"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/chao1.cpp000066400000000000000000000042651255543666200204600ustar00rootroot00000000000000
/*
 *  chao1.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "chao1.h"

/***********************************************************************/

EstOutput Chao1::getValues(SAbundVector* rank){
	try {
		data.resize(3,0);
		double sobs = (double)rank->getNumBins();

		//this is a modification due to a vector fill error that occurs when an empty sharedRabund creates a sabund
		//in that case there is no 1 or 2.
		double singles;
		if (rank->size() > 1) {
			singles = (double)rank->get(1);
		}else{
			singles = 0.0;
		}

		double doubles;
		if (rank->size() > 2) {
			doubles = (double)rank->get(2);
		}else{
			doubles = 0.0;
		}

		double chaovar = 0.0000;
		//cout << "singles = " << singles << " doubles = " << doubles << " sobs = " << sobs << endl;

		double chao = sobs + singles*(singles-1)/(2*(doubles+1));

		if(singles > 0 && doubles > 0){
			chaovar = singles*(singles-1)/(2*(doubles+1))
					+ singles*pow(2*singles-1, 2)/(4*pow(doubles+1,2))
					+ pow(singles, 2)*doubles*pow(singles-1, 2)/(4*pow(doubles+1,4));
		}
		else if(singles > 0 && doubles == 0){
			chaovar = singles*(singles-1)/2 + singles*pow(2*singles-1, 2)/4 - pow(singles, 4)/(4*chao);
		}
		else if(singles == 0){
			chaovar = sobs*exp(-1*rank->getNumSeqs()/sobs)*(1-exp(-1*rank->getNumSeqs()/sobs));
		}

		double chaohci, chaolci;

		if(singles>0){
			double denom = pow(chao-sobs,2);
			double c = exp(1.96*pow((log(1+chaovar/denom)),0.5));
			chaolci = sobs+(chao-sobs)/c;//chao lci
			chaohci = sobs+(chao-sobs)*c;//chao hci
		}
		else{
			double p = exp(-1*rank->getNumSeqs()/sobs);
			chaolci = sobs/(1-p)-1.96*pow(sobs*p/(1-p), 0.5);
			chaohci = sobs/(1-p)+1.96*pow(sobs*p/(1-p), 0.5);
			if(chaolci < sobs){ chaolci = sobs; }
		}

		data[0] = chao;
		data[1] = chaolci;
		data[2] = chaohci;

		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }
		if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; }
		if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "Chao1", "getValues");
		exit(1);
	}
}

/***********************************************************************/
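The point estimate computed in `Chao1::getValues` above can be sketched on its own as follows. This is a hedged illustration rather than mothur's API: `chao1Estimate` is a hypothetical helper, and the input is assumed to be a plain vector in which `sabund[i]` holds the number of OTUs observed exactly `i` times (index 0 unused), mirroring how the source reads `rank->get(1)` and `rank->get(2)`. It uses the same bias-corrected form as the source, sobs + f1(f1-1)/(2(f2+1)), and the same fallback of treating missing singleton or doubleton counts as zero. The variance and confidence-interval branches are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mothur): bias-corrected Chao1 richness
// estimate. sabund[i] = number of OTUs observed exactly i times; sabund[0]
// is ignored.
double chao1Estimate(const std::vector<int>& sabund) {
    // Observed richness is the total number of OTUs across all abundance classes.
    double sobs = 0.0;
    for (std::size_t i = 1; i < sabund.size(); ++i) { sobs += sabund[i]; }

    // Guard against short vectors, as the source does for an empty sabund.
    double singles = sabund.size() > 1 ? static_cast<double>(sabund[1]) : 0.0;
    double doubles = sabund.size() > 2 ? static_cast<double>(sabund[2]) : 0.0;

    return sobs + singles * (singles - 1.0) / (2.0 * (doubles + 1.0));
}
```

With four singletons, two doubletons, and one OTU seen three times, sobs is 7 and the estimate is 7 + 4*3/(2*3) = 9. When there are no singletons the correction term vanishes and the estimate equals the observed richness.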
mothur-1.36.1/source/calculators/chao1.h000066400000000000000000000013311255543666200201140ustar00rootroot00000000000000
#ifndef CHAO1_H
#define CHAO1_H
/*
 *  chao1.h
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/* This class implements the Chao1 estimator on a single group.
It is a child of the calculator class. */

/***********************************************************************/

class Chao1 : public Calculator {

public:
	Chao1() : Calculator("chao", 3, false) {};
	EstOutput getValues(SAbundVector*);
	EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
	string getCitation() { return "http://www.mothur.org/wiki/Chao"; }
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/coverage.cpp000066400000000000000000000010551255543666200212520ustar00rootroot00000000000000
/*
 *  coverage.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 4/22/09.
 *  Copyright 2009 Patrick D. Schloss. All rights reserved.
 *
 */

#include "coverage.h"

/***********************************************************************/

EstOutput Coverage::getValues(SAbundVector* rank){
	try {
		data.resize(1,0);

		data[0] = 1. - rank->get(1) / (double)rank->getNumSeqs();

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "Coverage", "getValues");
		exit(1);
	}
}

/***********************************************************************/
mothur-1.36.1/source/calculators/coverage.h000066400000000000000000000013571255543666200207240ustar00rootroot00000000000000
#ifndef COVERAGE_H
#define COVERAGE_H

/*
 *  coverage.h
 *  Mothur
 *
 *  Created by Pat Schloss on 4/22/09.
 *  Copyright 2009 Patrick D. Schloss. All rights reserved.
 *
 */

#include "calculator.h"

/* This class implements the coverage estimator on a single group.
It is a child of the calculator class.
*/ /***********************************************************************/ class Coverage : public Calculator { public: Coverage() : Calculator("coverage", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Coverage"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/dayhoff.h000066400000000000000000000101701255543666200205420ustar00rootroot00000000000000/* * dayhoff.h * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_NJ_DAYHOFF_H_ #define _INC_NJ_DAYHOFF_H_ 1 /* * As sequence divergence increases, we need to correct for multiple hits * when using Kimura distance correction method for amino acid sequences. * * This matrix of values represents the estimated "Accepted Point Mutations" * or PAMs for a range of amino acid sequence divergence, starting at 75% * up through 93% (in 0.1% increments). * * This model is derived from Dayhoff (1978). 
* * This Dayhoff matrix and the shortcut methods for dealing with Kimura * correction at high sequence divergence (> 75%) are derived from similar * work in Clustal W: * * Thompson, J.D., Higgins, D.G., Gibson, T.J., "CLUSTAL W: * improving the sensitivity of progressive multiple sequence * alignment through sequence weighting, position-specific gap * penalties and weight matrix choice.", * Nucleic Acids Research, 22:4673-4680, 1994 * */ int NJ_dayhoff[]={ 195, 196, 197, 198, 199, 200, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 226, 227, 228, 229, 230, 231, 232, 233, 234, 236, 237, 238, 239, 240, 241, 243, 244, 245, 246, 248, 249, 250, 252, 253, 254, 255, 257, 258, 260, 261, 262, 264, 265, 267, 268, 270, 271, 273, 274, 276, 277, 279, 281, 282, 284, 285, 287, 289, 291, 292, 294, 296, 298, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 328, 330, 332, 335, 337, 339, 342, 344, 347, 349, 352, 354, 357, 360, 362, 365, 368, 371, 374, 377, 380, 383, 386, 389, 393, 396, 399, 403, 407, 410, 414, 418, 422, 426, 430, 434, 438, 442, 447, 451, 456, 461, 466, 471, 476, 482, 487, 493, 498, 504, 511, 517, 524, 531, 538, 545, 553, 560, 569, 577, 586, 595, 605, 615, 626, 637, 649, 661, 675, 688, 703, 719, 736, 754, 775, 796, 819, 845, 874, 907, 945, 988 }; #endif /* _INC_NJ_DAYHOFF_H_ */ mothur-1.36.1/source/calculators/dist.h000066400000000000000000000013221255543666200200640ustar00rootroot00000000000000#ifndef DIST_H #define DIST_H /* * dist.h * Mothur * * Created by Sarah Westcott on 5/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "mothur.h" #include "sequence.hpp" /**************************************************************************************************/ class Dist { public: Dist(){ dist = 0; m = MothurOut::getInstance(); } Dist(const Dist& d) : dist(d.dist) { m = MothurOut::getInstance(); } virtual ~Dist() {} virtual void calcDist(Sequence, Sequence) = 0; double getDist() { return dist; } protected: double dist; MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/calculators/eachgapdist.h000066400000000000000000000022751255543666200214050ustar00rootroot00000000000000#ifndef EACHGAPDIST_H #define EACHGAPDIST_H /* * eachgapdist.h * Mothur * * Created by Sarah Westcott on 5/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "dist.h" /**************************************************************************************************/ class eachGapDist : public Dist { public: eachGapDist() {} void calcDist(Sequence A, Sequence B){ int diff = 0; int length = 0; int start = 0; string seqA = A.getAligned(); string seqB = B.getAligned(); int alignLength = seqA.length(); for(int i=0; i=0;i--){ if(seqA[i] != '.' && seqB[i] != '.' && seqA[i] != '-' && seqB[i] != '-' ){ end = i; // cout << "end: " << end << endl; overlap = true; break; } } for(int i=start;i<=end;i++){ if(seqA[i] == '.' || seqB[i] == '.'){ break; } else if(seqA[i] != '-' || seqB[i] != '-'){ if(seqA[i] != seqB[i]){ diff++; } length++; } } //non-overlapping sequences if (!overlap) { length = 0; } if(length == 0) { dist = 1.0000; } else { dist = ((double)diff / (double)length); } } }; /**************************************************************************************************/ #endif mothur-1.36.1/source/calculators/efron.cpp000066400000000000000000000013301255543666200205640ustar00rootroot00000000000000/* * efron.cpp * Mothur * * Created by Thomas Ryabin on 5/13/09. 
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "efron.h"

/***********************************************************************/
EstOutput Efron::getValues(SAbundVector* rank){
    try {
        data.resize(1,0);

        double n = (double)rank->getNumSeqs();
        if(f > n || f == 0) { f = n; }

        double sum = 0;
        for(int i = 1; i < rank->size(); i++){
            sum += pow(-1., i+1) * pow(((double)f / n), i) * (double)(rank->get(i));
        }
        data[0] = sum;

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Efron", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/efron.h000066400000000000000000000014021255543666200202310ustar00rootroot00000000000000
#ifndef EFRON_H
#define EFRON_H

/*
 *  efron.h
 *  Mothur
 *
 *  Created by Thomas Ryabin on 5/13/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/* This class implements the efron calculator on a single group.
   It is a child of the calculator class. */

/***********************************************************************/

class Efron : public Calculator {

public:
    //the base class is always initialized before members, so it is listed first
    Efron(int size) : Calculator("efron", 1, false), f(size) {};
    EstOutput getValues(SAbundVector*);
    EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
    string getCitation() { return "http://www.mothur.org/wiki/Efron"; }

private:
    int f;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/filters.h000066400000000000000000000047211255543666200205770ustar00rootroot00000000000000
#ifndef FILTERS_H
#define FILTERS_H

/*
 *  filters.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 6/29/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "mothur.h" #include "sequence.hpp" /***********************************************************************/ class Filters { public: Filters() { m = MothurOut::getInstance(); }; ~Filters(){}; string getFilter() { return filter; } void setFilter(string s) { filter = s; } void setLength(int l) { alignmentLength = l; } void setSoft(float s) { soft = s; } void setTrump(float t) { trump = t; } void setNumSeqs(int num) { numSeqs = num; } vector a, t, g, c, gap; void initialize() { a.assign(alignmentLength, 0); t.assign(alignmentLength, 0); g.assign(alignmentLength, 0); c.assign(alignmentLength, 0); gap.assign(alignmentLength, 0); } void doSoft() { int threshold = int (soft * numSeqs); for(int i=0;iopenInputFile(hard, fileHandle); fileHandle >> filter; fileHandle.close(); if (filter.length() != alignmentLength) { m->mothurOut("[ERROR]: Sequences are not all the same length as the filter, please correct.\n"); m->control_pressed = true; } } void getFreqs(Sequence seq) { string curAligned = seq.getAligned(); for(int j=0;j rData; int mr = 1; int nb = 0; int ns = 0; for(int i = rank->size()-1; i > 0; i--) { int cur = rank->get(i); if(mr == 1 && cur > 0) mr = i; nb += cur; ns += i*cur; for(int j = 0; j < cur; j++) rData.push_back(i); } RAbundVector rav = RAbundVector(rData, mr, nb, ns); rav.setLabel(rank->getLabel()); return rav; } /***********************************************************************************/ /***********************************************************************************/ EstOutput Geom::getValues(SAbundVector* rank){ try { data.resize(3,0); rdata = getRAbundVector(rank); double numInd = rdata.getNumSeqs(); double numSpec = rdata.getNumBins(); double min = rdata.get(rdata.size()-1); double k = .5; double step = .49999; while(fabs(min - numInd*kEq(k, (double)numSpec)) > .0001) { //This uses a binary search to find the value of k. 
if(numInd*kEq(k, numSpec) > min) k += step; else k -= step; step /= 2; } double cK = 1/(1-pow(1-k, numSpec)); double sumExp = 0; double sumObs = 0; double maxDiff = 0; for(int i = 0; i < numSpec; i++) { sumObs += rdata.get(i); sumExp += numInd*cK*k*pow(1-k, i); double diff = fabs(sumObs-sumExp); if(diff > maxDiff) { maxDiff = diff; } } data[0] = maxDiff/numInd; data[1] = 0.886/sqrt(numSpec); data[2] = 1.031/sqrt(numSpec); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Geom", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/geom.h000066400000000000000000000015221255543666200200520ustar00rootroot00000000000000#ifndef GEOM_H #define GEOM_H /* * geom.h * Mothur * * Created by Thomas Ryabin on 2/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /* This class implements the geometric estimator on single group. It is a child of the calculator class. */ /***********************************************************************/ class Geom : public Calculator { public: Geom() : Calculator("geometric", 3, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Geometric"; } private: double kEq(double, double); RAbundVector getRAbundVector(SAbundVector*); RAbundVector rdata; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/goodscoverage.cpp000066400000000000000000000013361255543666200223100ustar00rootroot00000000000000/* * goodscoverage.cpp * Mothur * * Created by Thomas Ryabin on 4/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 *
 */

#include "goodscoverage.h"
#include "calculator.h"

/**********************************************************/
EstOutput GoodsCoverage::getValues(SAbundVector* rank){
    try {
        data.resize(1,0);

        double numSingletons = rank->get(1);
        double totalIndividuals = rank->getNumSeqs();

        data[0] = 1 - numSingletons/totalIndividuals;

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "GoodsCoverage", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/goodscoverage.h000066400000000000000000000014341255543666200217540ustar00rootroot00000000000000
#ifndef GOODSCOVERAGE_H
#define GOODSCOVERAGE_H

/*
 *  goodscoverage.h
 *  Mothur
 *
 *  Created by Thomas Ryabin on 4/8/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/* This class implements the Good's coverage estimator on a single group.
   It is a child of the calculator class. */

/***********************************************************************/

class GoodsCoverage : public Calculator {

public:
    GoodsCoverage() : Calculator("goodscoverage", 1, false) {};
    EstOutput getValues(SAbundVector*);
    EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
    string getCitation() { return "http://www.mothur.org/wiki/GoodsCoverage"; }

private:
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/gower.cpp000066400000000000000000000026141255543666200206040ustar00rootroot00000000000000
/*
 *  gower.cpp
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "gower.h"

/***********************************************************************/
EstOutput Gower::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        vector<int> maxOtus; maxOtus.resize(shared[0]->getNumBins());
        vector<int> minOtus; minOtus.resize(shared[0]->getNumBins());

        //for each otu
        for (int i = 0; i < shared[0]->getNumBins(); i++) {

            //set otus min and max to first one
            minOtus[i] = shared[0]->getAbundance(i);
            maxOtus[i] = shared[0]->getAbundance(i);

            //for each group
            for (int j = 1; j < shared.size(); j++) {
                maxOtus[i] = max(shared[j]->getAbundance(i), maxOtus[i]);
                minOtus[i] = min(shared[j]->getAbundance(i), minOtus[i]);
            }
        }

        double sum = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int A = shared[0]->getAbundance(i);
            int B = shared[1]->getAbundance(i);

            double numerator = abs(A - B);
            double denominator = maxOtus[i] - minOtus[i];

            if (denominator != 0) {
                sum += (numerator / denominator);
            }
        }

        data[0] = sum;

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Gower", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/gower.h000066400000000000000000000013031255543666200202430ustar00rootroot00000000000000
#ifndef GOWER_H
#define GOWER_H

/*
 *  gower.h
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class Gower : public Calculator {

public:
    Gower() : Calculator("gower", 1, false, true) {}; //the true means this calculator needs all groups to calculate the pair value
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Gower"; }

private:
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/hamming.cpp000066400000000000000000000016441255543666200211030ustar00rootroot00000000000000
/*
 *  hamming.cpp
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "hamming.h"

/***********************************************************************/
EstOutput Hamming::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        int numA = 0;
        int numB = 0;
        int numShared = 0;

        //count the otus present in each group and the otus present in both
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int A = shared[0]->getAbundance(i);
            int B = shared[1]->getAbundance(i);

            if (A != 0) { numA++; }
            if (B != 0) { numB++; }
            if ((A != 0) && (B != 0)) { numShared++; }
        }

        data[0] = numA + numB - (2 * numShared);

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Hamming", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/hamming.h000066400000000000000000000011751255543666200205470ustar00rootroot00000000000000
#ifndef HAMMING_H
#define HAMMING_H

/*
 *  hamming.h
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class Hamming : public Calculator {

public:
    Hamming() : Calculator("hamming", 1, false) {};
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Hamming"; }

private:
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/heip.cpp000066400000000000000000000013711255543666200204050ustar00rootroot00000000000000
/*
 *  heip.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 8/21/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "heip.h"
#include "shannon.h"

/***********************************************************************/
EstOutput Heip::getValues(SAbundVector* rank){
    try {
        data.resize(1,0.0000);
        vector<double> shanData(3,0);
        Shannon* shannon = new Shannon();
        shanData = shannon->getValues(rank);

        long int sobs = rank->getNumBins();
        if(sobs > 1){
            data[0] = (exp(shanData[0])-1) / (sobs - 1);
        }
        else{
            data[0] = 1;
        }

        delete shannon;
        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Heip", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/heip.h000066400000000000000000000011271255543666200200510ustar00rootroot00000000000000
#ifndef HEIP
#define HEIP

/*
 *  heip.h
 *  Mothur
 *
 *  Created by Pat Schloss on 8/21/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class Heip : public Calculator {

public:
    Heip() : Calculator("heip", 1, false) {};
    EstOutput getValues(SAbundVector*);
    EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
    string getCitation() { return "http://www.mothur.org/wiki/Heip"; }
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/hellinger.cpp000066400000000000000000000021411255543666200214250ustar00rootroot00000000000000
/*
 *  hellinger.cpp
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "hellinger.h"

/***********************************************************************/
EstOutput Hellinger::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        double sumA = 0.0;
        double sumB = 0.0;

        //calc the 2 denominators
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            sumA += shared[0]->getAbundance(i);
            sumB += shared[1]->getAbundance(i);
        }

        //calc sum
        double sum = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int Aij = shared[0]->getAbundance(i);
            int Bij = shared[1]->getAbundance(i);

            double term1 = sqrt((Aij / sumA));
            double term2 = sqrt((Bij / sumB));

            sum += ((term1 - term2) * (term1 - term2));
        }

        data[0] = sqrt(sum);

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Hellinger", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/hellinger.h000066400000000000000000000012131255543666200210710ustar00rootroot00000000000000
#ifndef HELLINGER_H
#define HELLINGER_H

/*
 *  hellinger.h
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "calculator.h" /***********************************************************************/ class Hellinger : public Calculator { public: Hellinger() : Calculator("hellinger", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Hellinger"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/ignoregaps.h000066400000000000000000000025321255543666200212630ustar00rootroot00000000000000#ifndef IGNOREGAPS_H #define IGNOREGAPS_H /* * ignoregaps.h * Mothur * * Created by Sarah Westcott on 5/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "dist.h" /**************************************************************************************************/ // this class calculates distances by ignoring all gap characters. so if seq a has an "A" and seq // b has a '-', there is no penalty class ignoreGaps : public Dist { public: ignoreGaps() {} void calcDist(Sequence A, Sequence B){ int diff = 0; int length = 0; int start = 0; bool overlap = false; string seqA = A.getAligned(); string seqB = B.getAligned(); int alignLength = seqA.length(); for(int i=0;i simpsonData(3,0); data.resize(3,0); vector simpData(3,0); Simpson* simp = new Simpson(); simpData = simp->getValues(rank); if(simpData[0] != 0){ data[0] = 1/simpData[0]; data[1] = 1/simpData[2]; data[2] = 1/simpData[1]; } else{ data.assign(3,1); } delete simp; return data; } catch(exception& e) { m->errorOut(e, "InvSimpson", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/invsimpson.h000066400000000000000000000012031255543666200213240ustar00rootroot00000000000000#ifndef INVSIMPSON #define INVSIMPSON /* * invsimpson.h * Mothur * * Created by Pat Schloss on 8/20/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class InvSimpson : public Calculator { public: InvSimpson() : Calculator("invsimpson", 3, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/InvSimpson"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/jackknife.cpp000066400000000000000000000063771255543666200214200ustar00rootroot00000000000000/* * jacknife.cpp * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "jackknife.h" /***********************************************************************/ void Jackknife::getAMatrix(void){ try { vector > B = m->binomial(maxOrder); aMat.resize(maxOrder+1); for(int i=0;i<=maxOrder;i++){ aMat[i].resize(maxOrder+1); for(int j=1;j<=maxOrder;j++){ aMat[i][j] = 1 + B[i][j] * (int)(pow(-1.0,j+1)); } } } catch(exception& e) { m->errorOut(e, "Jackknife", "getAMatrix"); exit(1); } } /**************************************************************************************************/ double Jackknife::CN(double z){ try { if(z>6.0) { return 0.0; } if(z<-6.0) { return 0.0; } const double b1= 0.31938153; const double b2= -0.356563782; const double b3= 1.781477937; const double b4= -1.821255978; const double b5= 1.330274429; const double p= 0.2316419; const double c2= 0.3989423; double a=abs(z); double t=1.0/(1.0+a*p); double b=c2*exp((-z)*(z/2.0)); double n=((((b5*t+b4)*t+b3)*t+b2)*t+b1)*t; n = 2*b*n; return n; } catch(exception& e) { m->errorOut(e, "Jackknife", "CN"); exit(1); } } /***********************************************************************/ EstOutput Jackknife::getValues(SAbundVector* rank){ try { //EstOutput jackData(3,0); data.resize(3,0); double jack, jacklci, jackhci; int maxRank = (double)rank->getMaxRank(); int S = 
rank->getNumBins(); double N[maxOrder+1]; double variance[maxOrder+1]; double p[maxOrder+1]; int k = 0; for(int i=0;i<=maxOrder;i++){ N[i]=0.0000; variance[i]=0.0000; for(int j=1;j<=maxRank;j++){ if(j<=i){ N[i] += aMat[i][j]*rank->get(j); variance[i] += aMat[i][j]*aMat[i][j]*rank->get(j); } else{ N[i] += rank->get(j); variance[i] += rank->get(j); } } variance[i] = variance[i]-N[i]; double var = 0.0000; if(i>0){ for(int j=1;j<=maxRank;j++){ if(j<=i){ var += rank->get(j)*pow((aMat[i][j]-aMat[i-1][j]),2.0); } else { var += 0.0000; } } var -= ((N[i]-N[i-1])*(N[i]-N[i-1]))/S; var = var * S / (S-1); double T = (N[i]-N[i-1])/sqrt(var); if(T<=0.00){ p[i-1] = 1.00000; } else{ p[i-1] = CN(T); } if(p[i-1]>=0.05){ k = i-1; break; } } if(i == maxOrder){ k=1; } } double ci = 0; if(k>1){ double c = (0.05-p[k-1])/(p[k]-p[k-1]); ci = 0.0000; jack = c*N[k]+(1-c)*N[k-1]; for(int j=1;j<=maxRank;j++){ if(j<=k){ ci += rank->get(j)*pow((c*aMat[k][j]+(1-c)*aMat[k-1][j]),2.0); } else { ci += rank->get(j); } } ci = 1.96 * sqrt(ci - jack); } else if(k==1){ jack = N[1]; ci = 1.96*sqrt(variance[1]); }else{ jack = 0.0; ci = 0.0; } jacklci = jack-ci; jackhci = jack+ci; data[0] = jack; data[1] = jacklci; data[2] = jackhci; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Jackknife", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/jackknife.h000066400000000000000000000015741255543666200210570ustar00rootroot00000000000000#ifndef JACKKNIFE_H #define JACKKNIFE_H /* * jacknife.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /* This class implements the JackKnife estimator on single group. It is a child of the calculator class. 
*/ /***********************************************************************/ class Jackknife : public Calculator { public: Jackknife() : Calculator("jackknife", 3, false) { getAMatrix(); }; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Jackknife"; } private: static const int maxOrder = 30; vector > aMat; void getAMatrix(); double CN(double); }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/logsd.cpp000066400000000000000000000131731255543666200205730ustar00rootroot00000000000000/* * logsd.cpp * Mothur * * Created by Thomas Ryabin on 2/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "logsd.h" /***********************************************************************/ double LogSD::logS(double x){ return -(1-x)*log(1-x)/x; } /***********************************************************************/ EstOutput LogSD::getValues(SAbundVector* rank){ try { /*test data VVV int dstring[] = 
{0,37,22,12,12,11,11,6,4,3,5,2,4,2,3,2,2,4,2,0,4,4,1,1,0,1,0,0,2,2,0,0,0,2,2,0,0,0,1,1,3,0,2,0,0,0,0,0,2,0,0,1,1,1,0,0,0,0,1,0,0,1,0,0,2,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}; vector dvec; for(int i = 0; i < 1800; i++) dvec.push_back(dstring[i]); int mr = 1799; int nb = 197; int ns = 6815; SAbundVector rankw = SAbundVector(dvec, mr,nb,ns); SAbundVector *rank = &rankw;*/ data.resize(3,0); double numInd = rank->getNumSeqs(); double numSpec = rank->getNumBins(); double snRatio = (double)numSpec/numInd; double x = .5; double step = .4999999999; while(fabs(snRatio 
- logS(x)) > .00001) { //This uses a binary search to find the value of x. if(logS(x) > snRatio) x += step; else x -= step; step /= 2; } double alpha = numInd*(1-x)/x; double oct = 1; double octSumObs = 0; double sumObs = 0; double octSumExp = 0; double sumExp = 0; double maxDiff = 0; for(int y = 1; y < rank->size(); y++) { if(y - .5 < pow(2.0, oct)) { octSumObs += rank->get(y); octSumExp += alpha*pow(x,y)/(y); } else { sumObs += octSumObs; octSumObs = rank->get(y); sumExp += octSumExp; octSumExp = alpha*pow(x,y)/(y); oct++; } if(y == rank->size()-1) { sumObs += octSumObs; sumExp += octSumExp; } double diff = fabs(sumObs - .5 - sumExp); if(diff > maxDiff) maxDiff = diff; } data[0] = (maxDiff + .5)/numSpec; data[1] = 0.886/sqrt(numSpec); data[2] = 1.031/sqrt(numSpec); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "LogSD", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/logsd.h000066400000000000000000000014351255543666200202360ustar00rootroot00000000000000#ifndef LOGSD_H #define LOGSD_H /* * logsd.h * Mothur * * Created by Thomas Ryabin on 2/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /*This class implements the LogSD estimator on single group. 
It is a child of the calculator class.*/

/***********************************************************************/

class LogSD : public Calculator {

public:
    LogSD() : Calculator("logseries", 3, false) {};
    EstOutput getValues(SAbundVector*);
    EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
    string getCitation() { return "http://www.mothur.org/wiki/LogSeries"; }

private:
    double logS(double);
    RAbundVector rdata;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/manhattan.cpp000066400000000000000000000014241255543666200214320ustar00rootroot00000000000000
/*
 *  manhattan.cpp
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "manhattan.h"

/***********************************************************************/
EstOutput Manhattan::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        double sum = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int Aij = shared[0]->getAbundance(i);
            int Bij = shared[1]->getAbundance(i);

            sum += abs((Aij - Bij));
        }

        data[0] = sum;

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Manhattan", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/manhattan.h000066400000000000000000000012111255543666200210710ustar00rootroot00000000000000
#ifndef MANHATTAN_H
#define MANHATTAN_H

/*
 *  manhattan.h
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class Manhattan : public Calculator {

public:
    Manhattan() : Calculator("manhattan", 1, false) {};
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Manhattan"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/memchi2.cpp000066400000000000000000000026051255543666200210050ustar00rootroot00000000000000
/*
 *  memchi2.cpp
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "memchi2.h"

/***********************************************************************/
EstOutput MemChi2::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        int nonZeroA = 0;
        int nonZeroB = 0;
        int totalOtus = shared[0]->getNumBins();
        //int totalGroups = shared.size();

        //for each otu
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            if (shared[0]->getAbundance(i) != 0) { nonZeroA++; }
            if (shared[1]->getAbundance(i) != 0) { nonZeroB++; }
        }

        double sum = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int A = shared[0]->getAbundance(i);
            int B = shared[1]->getAbundance(i);

            if (A > 0) { A = 1; }
            if (B > 0) { B = 1; }

            double Aterm = A / (float) nonZeroA;
            double Bterm = B / (float) nonZeroB;

            int incidence = 0;
            for(int j=0;j<shared.size();j++){
                if(shared[j]->getAbundance(i) != 0){ incidence++; }
            }

            if(incidence != 0){
                sum += (((Aterm-Bterm)*(Aterm-Bterm))/incidence);
            }
        }

        data[0] = sqrt(totalOtus * sum);

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "MemChi2", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/memchi2.h000066400000000000000000000013211255543666200204440ustar00rootroot00000000000000
#ifndef MEMCHI2_H
#define MEMCHI2_H

/*
 *  memchi2.h
 *  Mothur
 *
 *  Created by
westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class MemChi2 : public Calculator {

public:
    MemChi2() : Calculator("memchi2", 1, false, true) {}; //the true means this calculator needs all groups to calculate the pair value
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Memchi2"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/memchord.cpp000066400000000000000000000022761255543666200212630ustar00rootroot00000000000000
/*
 *  memchord.cpp
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "memchord.h"

/***********************************************************************/
EstOutput MemChord::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        double nonZeroA = 0;
        double nonZeroB = 0;

        //for each otu
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            if (shared[0]->getAbundance(i) != 0) { nonZeroA++; }
            if (shared[1]->getAbundance(i) != 0) { nonZeroB++; }
        }

        nonZeroA = sqrt(nonZeroA);
        nonZeroB = sqrt(nonZeroB);

        double sum = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int A = shared[0]->getAbundance(i);
            int B = shared[1]->getAbundance(i);

            if (A > 0) { A = 1; }
            if (B > 0) { B = 1; }

            double Aterm = A / nonZeroA;
            double Bterm = B / nonZeroB;

            sum += ((Aterm-Bterm)*(Aterm-Bterm));
        }

        data[0] = sqrt(sum);

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "MemChord", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/memchord.h000066400000000000000000000012051255543666200207170ustar00rootroot00000000000000
#ifndef MEMCHORD_H
#define MEMCHORD_H

/*
 *  memchord.h
 *  Mothur
 *
 *
Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class MemChord : public Calculator {

public:
    MemChord() : Calculator("memchord", 1, false) {};
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Memchord"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/memeuclidean.cpp000066400000000000000000000015161255543666200221110ustar00rootroot00000000000000
/*
 *  memeuclidean.cpp
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "memeuclidean.h"

/***********************************************************************/
EstOutput MemEuclidean::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        double sum = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int A = shared[0]->getAbundance(i);
            int B = shared[1]->getAbundance(i);

            if (A > 0) { A = 1; }
            if (B > 0) { B = 1; }

            sum += ((A-B)*(A-B));
        }

        data[0] = sqrt(sum);

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "MemEuclidean", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/memeuclidean.h000066400000000000000000000012401255543666200215500ustar00rootroot00000000000000
#ifndef MEMEUCLIDEAN_H
#define MEMEUCLIDEAN_H

/*
 *  memeuclidean.h
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class MemEuclidean : public Calculator {

public:
    MemEuclidean() : Calculator("memeuclidean", 1, false) {};
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Memeuclidean"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/mempearson.cpp000066400000000000000000000030761255543666200216320ustar00rootroot00000000000000
/*
 *  mempearson.cpp
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "mempearson.h"

/***********************************************************************/
EstOutput MemPearson::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        int nonZeroA = 0;
        int nonZeroB = 0;
        int numOTUS = shared[0]->getNumBins();

        //for each otu
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            if (shared[0]->getAbundance(i) != 0) { nonZeroA++; }
            if (shared[1]->getAbundance(i) != 0) { nonZeroB++; }
        }

        double numTerm = 0.0;
        double denomTerm1 = 0.0;
        double denomTerm2 = 0.0;

        double averageA = nonZeroA / (float) numOTUS;
        double averageB = nonZeroB / (float) numOTUS;

        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int Aij = shared[0]->getAbundance(i);
            int Bij = shared[1]->getAbundance(i);

            if (Aij > 0) { Aij = 1; }
            if (Bij > 0) { Bij = 1; }

            numTerm += ((Aij - averageA) * (Bij - averageB));

            denomTerm1 += ((Aij - averageA) * (Aij - averageA));
            denomTerm2 += ((Bij - averageB) * (Bij - averageB));
        }

        denomTerm1 = sqrt(denomTerm1);
        denomTerm2 = sqrt(denomTerm2);

        double denom = denomTerm1 * denomTerm2;

        if (denom != 0) {
            data[0] = (numTerm / denom);
        }else {
            data[0] = 1.0;
        }

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "MemPearson", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/mempearson.h000066400000000000000000000012231255543666200212670ustar00rootroot00000000000000
#ifndef MEMPEARSON_H
#define MEMPEARSON_H

/*
 *  mempearson.h
 *  Mothur
 *
 *  Created by westcott on 12/17/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class MemPearson : public Calculator {

public:
    MemPearson() : Calculator("mempearson", 1, false) {};
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Mempearson"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/npshannon.cpp000066400000000000000000000020711255543666200214600ustar00rootroot00000000000000
/*
 *  npshannon.cpp
 *  Dotur
 *
 *  Created by John Westcott on 1/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "npshannon.h"

/***********************************************************************/

EstOutput NPShannon::getValues(SAbundVector* rank){
    try {
        data.resize(1,0);
        float npShannon = 0.0000;

        double maxRank = (double)rank->getMaxRank();
        double sampled = rank->getNumSeqs();

        double Chat = 1.0000 - (double)rank->get(1)/(double)sampled;

        if(Chat>0) {
            for(int i=1;i<=maxRank;i++){
                double pi = ((double) i)/((double)sampled);
                double ChatPi = Chat*pi;
                if(ChatPi>0){
                    npShannon += rank->get(i) * ChatPi*log(ChatPi)/(1-pow(1-ChatPi,(double)sampled));
                }
            }
            npShannon = -npShannon;
        }
        else{
            npShannon = 0.000;
        }

        data[0] = npShannon;

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "NPShannon", "getValues");
        exit(1);
    }
}

/***********************************************************************/
mothur-1.36.1/source/calculators/npshannon.h000066400000000000000000000014061255543666200211260ustar00rootroot00000000000000
#ifndef NPSHANNON_H
#define NPSHANNON_H

/*
 *  npshannon.h
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

/* This class implements the NPShannon estimator on a single group.
It is a child of the calculator class. */

#include "calculator.h"

/***********************************************************************/

class NPShannon : public Calculator {

public:
    NPShannon() : Calculator("npshannon", 1, false) {};
    EstOutput getValues(SAbundVector*);
    EstOutput getValues(vector<SharedRAbundVector*>) {return data;};
    string getCitation() { return "http://www.mothur.org/wiki/Npshannon"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/nseqs.h000066400000000000000000000023141255543666200202540ustar00rootroot00000000000000
#ifndef NSEQS_H
#define NSEQS_H

/*
 *  nseqs.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 3/16/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class NSeqs : public Calculator {

public:
    NSeqs() : Calculator("nseqs", 1, false) {};
    EstOutput getValues(SAbundVector* rank){
        data.resize(1,0);
        data[0] = (double)rank->getNumSeqs();
        return data;
    }

    EstOutput getValues(vector<SharedRAbundVector*> shared) { //return number of sequences in the sharedotus

        int numGroups = shared.size();
        data.clear(); data.resize(numGroups,0);

        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            //get bin values and set sharedByAll
            bool sharedByAll = true;
            for (int j = 0; j < numGroups; j++) {
                if (shared[j]->getAbundance(i) == 0) { sharedByAll = false; }
            }

            //they are shared
            if (sharedByAll == true) { for (int j = 0; j < numGroups; j++) { data[j] += shared[j]->getAbundance(i); } }
        }

        return data;
    }

    string getCitation() { return "http://www.mothur.org/wiki/Nseqs"; }
};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/odum.cpp000066400000000000000000000015111255543666200204200ustar00rootroot00000000000000
/*
 *  odum.cpp
 *  Mothur
 *
 *  Created by westcott on 12/14/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "odum.h"

/***********************************************************************/
EstOutput Odum::getValues(vector<SharedRAbundVector*> shared) {
    try {
        data.resize(1,0);

        double sumNum = 0.0;
        double sumDenom = 0.0;
        for (int i = 0; i < shared[0]->getNumBins(); i++) {
            int Aij = shared[0]->getAbundance(i);
            int Bij = shared[1]->getAbundance(i);

            sumNum += abs(Aij - Bij);
            sumDenom += (Aij + Bij);
        }

        data[0] = sumNum / sumDenom;

        if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Odum", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/odum.h000066400000000000000000000011501255543666200200640ustar00rootroot00000000000000
#ifndef ODUM_H
#define ODUM_H

/*
 *  odum.h
 *  Mothur
 *
 *  Created by westcott on 12/14/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class Odum : public Calculator {

public:
    Odum() : Calculator("odum", 1, false) {};
    EstOutput getValues(SAbundVector*) {return data;};
    EstOutput getValues(vector<SharedRAbundVector*>);
    string getCitation() { return "http://www.mothur.org/wiki/Odum"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/onegapdist.h000066400000000000000000000033471255543666200212670ustar00rootroot00000000000000
#ifndef ONEGAPDIST_H
#define ONEGAPDIST_H
/*
 *  onegapdist.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 5/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "dist.h"

/**************************************************************************************************/

class oneGapDist : public Dist {

public:

    oneGapDist() {}

    void calcDist(Sequence A, Sequence B){

        int difference = 0;
        int minLength = 0;
        int openGapA = 0;
        int openGapB = 0;
        int start = 0;

        string seqA = A.getAligned();
        string seqB = B.getAligned();
        int alignLength = seqA.length();
        bool overlap = false;

        for(int i=0;i<alignLength;i++){
            if(seqA[i] != '.' && seqB[i] != '.' && seqA[i] != '-' && seqB[i] != '-' ){
                start = i;
                overlap = true;
                break;
            }
        }
        int end = 0;
        for(int i=alignLength-1;i>=0;i--){
            if(seqA[i] != '.' && seqB[i] != '.' && seqA[i] != '-' && seqB[i] != '-' ){
                end = i;
                // cout << "end: " << end << endl;
                overlap = true;
                break;
            }
        }

        for(int i=start;i<=end;i++){
            if(seqA[i] == '-' && seqB[i] == '-'){ ; }
            else if(seqB[i] != '-' && seqA[i] == '-'){
                if(openGapA == 0){
                    difference++;
                    minLength++;
                    openGapA = 1;
                    openGapB = 0;
                }
            }
            else if(seqA[i] != '-' && seqB[i] == '-'){
                if(openGapB == 0){
                    difference++;
                    minLength++;
                    openGapA = 0;
                    openGapB = 1;
                }
            }
            else if(seqA[i] != '-' && seqB[i] != '-'){
                if(seqA[i] != seqB[i]){
                    difference++;
                }
                minLength++;
                openGapA = 0;
                openGapB = 0;
            }
        }

        //non-overlapping sequences
        if (!overlap) { minLength = 0; }

        if(minLength == 0) { dist = 1.0000; }
        else { dist = (double)difference / minLength; }
    }

};

/**************************************************************************************************/

#endif
mothur-1.36.1/source/calculators/parsimony.cpp000066400000000000000000000255501255543666200215060ustar00rootroot00000000000000
/*
 *  parsimony.cpp
 *  Mothur
 *
 *  Created by Sarah Westcott on 1/26/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "parsimony.h"

/**************************************************************************************************/

EstOutput Parsimony::getValues(Tree* t, int p, string o) {
    try {
        processors = p;
        outputDir = o;
        CountTable* ct = t->getCountTable();

        //if the user enters no groups then give them the score of all groups
        vector<string> mGroups = m->getGroups();
        int numGroups = mGroups.size();

        //calculate number of comparisons
        int numComp = 0;
        vector< vector<string> > namesOfGroupCombos;
        for (int r=0; r<numGroups; r++) {
            for (int l = 0; l < r; l++) {
                numComp++;
                vector<string> groups; groups.push_back(mGroups[r]); groups.push_back(mGroups[l]);
                //cout << globaldata->Groups[r] << '\t' << globaldata->Groups[l] << endl;
                namesOfGroupCombos.push_back(groups);
            }
        }

        //numComp+1 for AB, AC, BC, ABC
        if (numComp != 1) {
            vector<string> groups;
            if (numGroups == 0) {
                //get score for all users groups
                vector<string> tGroups = ct->getNamesOfGroups();
                for (int i = 0; i < tGroups.size(); i++) {
                    if (tGroups[i] != "xxx") {
                        groups.push_back(tGroups[i]);
                        //cout << tmap->namesOfGroups[i] << endl;
                    }
                }
                namesOfGroupCombos.push_back(groups);
            }else {
                for (int i = 0; i < mGroups.size(); i++) {
                    groups.push_back(mGroups[i]);
                    //cout << globaldata->Groups[i] << endl;
                }
                namesOfGroupCombos.push_back(groups);
            }
        }

        lines.clear();
        int remainingPairs = namesOfGroupCombos.size();
        int startIndex = 0;
        for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) {
            int numPairs = remainingPairs; //case for last processor
            if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); }
            lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs
            startIndex = startIndex + numPairs;
            remainingPairs = remainingPairs - numPairs;
        }

        data = createProcesses(t, namesOfGroupCombos, ct);

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Parsimony", "getValues");
        exit(1);
    }
}
/**************************************************************************************************/

EstOutput Parsimony::createProcesses(Tree* t, vector< vector<string> >
namesOfGroupCombos, CountTable* ct) {
    try {
        int process = 1;
        vector<int> processIDS;
        bool recalc = false;

        EstOutput results;

#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
        //loop through and create all the processes you want
        while (process != processors) {
            pid_t pid = fork();

            if (pid > 0) {
                processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                EstOutput myresults;
                myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, ct);

                if (m->control_pressed) { exit(0); }

                //pass numSeqs to parent
                ofstream out;
                string tempFile = outputDir + m->mothurGetpid(process) + ".pars.results.temp";
                m->openOutputFile(tempFile, out);
                out << myresults.size() << endl;
                for (int i = 0; i < myresults.size(); i++) { out << myresults[i] << '\t'; } out << endl;
                out.close();

                exit(0);
            }else {
                m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                //wait to die
                for (int i=0;i<processIDS.size();i++) {
                    int temp = processIDS[i];
                    wait(&temp);
                }
                m->control_pressed = false;
                for (int i=0;i<processIDS.size();i++) {
                    m->mothurRemove(outputDir + (toString(processIDS[i]) + ".pars.results.temp"));
                }
                recalc = true;
                break;
            }
        }

        if (recalc) {
            //test line, also set recalc to true.
            //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) {m->mothurRemove(outputDir + (toString(processIDS[i]) + ".pars.results.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n");

            lines.clear();
            int remainingPairs = namesOfGroupCombos.size();
            int startIndex = 0;
            for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) {
                int numPairs = remainingPairs; //case for last processor
                if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); }
                lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs
                startIndex = startIndex + numPairs;
                remainingPairs = remainingPairs - numPairs;
            }

            processIDS.resize(0);
            process = 1;

            //loop through and create all the processes you want
            while (process != processors) {
                pid_t pid = fork();

                if (pid > 0) {
                    processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                    process++;
                }else if (pid == 0){
                    EstOutput myresults;
                    myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, ct);

                    if (m->control_pressed) { exit(0); }

                    //pass numSeqs to parent
                    ofstream out;
                    string tempFile = outputDir + m->mothurGetpid(process) + ".pars.results.temp";
                    m->openOutputFile(tempFile, out);
                    out << myresults.size() << endl;
                    for (int i = 0; i < myresults.size(); i++) { out << myresults[i] << '\t'; } out << endl;
                    out.close();

                    exit(0);
                }else {
                    m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine();
                    for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                    exit(0);
                }
            }
        }

        results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, ct);

        //force parent to wait until all the processes are done
        for (int i=0;i<(processors-1);i++) {
            int temp = processIDS[i];
            wait(&temp);
        }

        if (m->control_pressed) { return results; }

        //get data created by processes
        for (int
i=0;i<(processors-1);i++) {
            ifstream in;
            string s = outputDir + toString(processIDS[i]) + ".pars.results.temp";
            m->openInputFile(s, in);

            //get scores
            if (!in.eof()) {
                int num;
                in >> num; m->gobble(in);

                if (m->control_pressed) { break; }

                double w;
                for (int j = 0; j < num; j++) {
                    in >> w;
                    results.push_back(w);
                }
                m->gobble(in);
            }
            in.close();
            m->mothurRemove(s);
        }
#else
        //fill in functions
        vector<parsData*> pDataArray;
        DWORD dwThreadIdArray[processors-1];
        HANDLE hThreadArray[processors-1];
        vector<CountTable*> cts;
        vector<Tree*> trees;

        //Create processor worker threads.
        for( int i=1; i<processors; i++ ){
            CountTable* copyCount = new CountTable();
            copyCount->copy(ct);
            Tree* copyTree = new Tree(copyCount);
            copyTree->getCopy(t);

            cts.push_back(copyCount);
            trees.push_back(copyTree);

            parsData* temppars = new parsData(m, lines[i].start, lines[i].num, namesOfGroupCombos, copyTree, copyCount);
            pDataArray.push_back(temppars);
            processIDS.push_back(i);

            hThreadArray[i-1] = CreateThread(NULL, 0, MyParsimonyThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]);
        }

        results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, ct);

        //Wait until all threads have terminated.
        WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE);

        //Close all thread handles and free memory allocations.
        for(int i=0; i < pDataArray.size(); i++){
            for (int j = 0; j < pDataArray[i]->results.size(); j++) { results.push_back(pDataArray[i]->results[j]); }
            delete cts[i];
            delete trees[i];
            CloseHandle(hThreadArray[i]);
            delete pDataArray[i];
        }

#endif
        return results;
    }
    catch(exception& e) {
        m->errorOut(e, "Parsimony", "createProcesses");
        exit(1);
    }
}
/**************************************************************************************************/
EstOutput Parsimony::driver(Tree* t, vector< vector<string> > namesOfGroupCombos, int start, int num, CountTable* ct) {
    try {

        EstOutput results; results.resize(num);

        Tree* copyTree = new Tree(ct);
        int count = 0;

        for (int h = start; h < (start+num); h++) {

            if (m->control_pressed) { delete copyTree; return results; }

            int score = 0;

            //groups in this combo
            vector<string> groups = namesOfGroupCombos[h];

            //copy the user's tree so that you can redo pgroups
            copyTree->getCopy(t);

            //create pgroups that reflect the groups the user wants to use
            for(int i=copyTree->getNumLeaves();i<copyTree->getNumNodes();i++){
                copyTree->tree[i].pGroups = (copyTree->mergeUserGroups(i, groups));
            }

            for(int i=copyTree->getNumLeaves();i<copyTree->getNumNodes();i++){

                //free the working copy before bailing out, and return this call's results rather than the member data
                if (m->control_pressed) { delete copyTree; return results; }

                int lc = copyTree->tree[i].getLChild();
                int rc = copyTree->tree[i].getRChild();

                int iSize = copyTree->tree[i].pGroups.size();
                int rcSize = copyTree->tree[rc].pGroups.size();
                int lcSize = copyTree->tree[lc].pGroups.size();

                //if iSize is 0 then that branch is to be ignored
                if (iSize == 0) { }
                else if ((rcSize == 0) || (lcSize == 0)) { }
                //if you have more groups than either of your kids then there's been a change.
                else if(iSize > rcSize || iSize > lcSize){
                    score++;
                }
            }

            results[count] = score;
            count++;
        }

        delete copyTree;

        return results;
    }
    catch(exception& e) {
        m->errorOut(e, "Parsimony", "driver");
        exit(1);
    }
}

/**************************************************************************************************/
mothur-1.36.1/source/calculators/parsimony.h000066400000000000000000000063211255543666200211460ustar00rootroot00000000000000
#ifndef PARSIMONY_H
#define PARSIMONY_H

/*
 *  parsimony.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 1/26/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "treecalculator.h"
#include "counttable.h"

/***********************************************************************/

class Parsimony : public TreeCalculator {

    public:
        Parsimony() {};
        ~Parsimony() {};
        EstOutput getValues(Tree*, int, string);

    private:
        struct linePair {
            int start;
            int num;
            linePair(int i, int j) : start(i), num(j) {}
        };
        vector<linePair> lines;

        EstOutput data;
        int processors;
        string outputDir;

        EstOutput driver(Tree*, vector< vector<string> >, int, int, CountTable*);
        EstOutput createProcesses(Tree*, vector< vector<string> >, CountTable*);
};
/***********************************************************************/
struct parsData {
    int start;
    int num;
    MothurOut* m;
    EstOutput results;
    vector< vector<string> > namesOfGroupCombos;
    Tree* t;
    CountTable* ct;

    parsData(){}
    parsData(MothurOut* mout, int st, int en, vector< vector<string> > ngc, Tree* tree, CountTable* count) {
        m = mout;
        start = st;
        num = en;
        namesOfGroupCombos = ngc;
        t = tree;
        ct = count;
    }
};

/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MyParsimonyThreadFunction(LPVOID lpParam){
    parsData* pDataArray;
    pDataArray = (parsData*)lpParam;
    try {

        pDataArray->results.resize(pDataArray->num);

        Tree* copyTree = new Tree(pDataArray->ct);
        int count = 0;

        for (int h =
pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) {

            if (pDataArray->m->control_pressed) { delete copyTree; return 0; }

            int score = 0;

            //groups in this combo
            vector<string> groups = pDataArray->namesOfGroupCombos[h];

            //copy the user's tree so that you can redo pgroups
            copyTree->getCopy(pDataArray->t);

            //create pgroups that reflect the groups the user wants to use
            for(int i=copyTree->getNumLeaves();i<copyTree->getNumNodes();i++){
                copyTree->tree[i].pGroups = (copyTree->mergeUserGroups(i, groups));
            }

            for(int i=copyTree->getNumLeaves();i<copyTree->getNumNodes();i++){

                //free the working copy before bailing out
                if (pDataArray->m->control_pressed) { delete copyTree; return 0; }

                int lc = copyTree->tree[i].getLChild();
                int rc = copyTree->tree[i].getRChild();

                int iSize = copyTree->tree[i].pGroups.size();
                int rcSize = copyTree->tree[rc].pGroups.size();
                int lcSize = copyTree->tree[lc].pGroups.size();

                //if iSize is 0 then that branch is to be ignored
                if (iSize == 0) { }
                else if ((rcSize == 0) || (lcSize == 0)) { }
                //if you have more groups than either of your kids then there's been a change.
                else if(iSize > rcSize || iSize > lcSize){
                    score++;
                }
            }

            pDataArray->results[count] = score;
            count++;
        }

        delete copyTree;

        return 0;
    }
    catch(exception& e) {
        pDataArray->m->errorOut(e, "Parsimony", "MyParsimonyThreadFunction");
        exit(1);
    }
}
#endif

#endif
mothur-1.36.1/source/calculators/prng.cpp000066400000000000000000000144541255543666200204340ustar00rootroot00000000000000
/*
   A C-program for MT19937, with initialization improved 2002/1/26.
   Coded by Takuji Nishimura and Makoto Matsumoto.

   Before using, initialize the state by using init_genrand(seed)
   or init_by_array(init_key, key_length).

   Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions
   are met:

     1. Redistributions of source code must retain the above copyright
        notice, this list of conditions and the following disclaimer.

     2.
        Redistributions in binary form must reproduce the above copyright
        notice, this list of conditions and the following disclaimer in the
        documentation and/or other materials provided with the distribution.

     3. The names of its contributors may not be used to endorse or promote
        products derived from this software without specific prior written
        permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
   CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
   EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
   PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
   PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
   LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
   NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
   SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

   Any feedback is very welcome.
   http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
   email: m-mat @ math.sci.hiroshima-u.ac.jp (remove space)
*/

#include <stdio.h>
#include "prng.h"

/* Period parameters */
#define N 624
#define M 397
#define MATRIX_A 0x9908b0dfUL   /* constant vector a */
#define UPPER_MASK 0x80000000UL /* most significant w-r bits */
#define LOWER_MASK 0x7fffffffUL /* least significant r bits */
#define NJ_RAND_MAX 0x7fffffffUL

static unsigned long mt[N]; /* the array for the state vector */
static int mti=N+1; /* mti==N+1 means mt[N] is not initialized */

/* initializes mt[N] with a seed */
void init_genrand(unsigned long s)
{
    mt[0]= s & 0xffffffffUL;
    for (mti=1; mti<N; mti++) {
        mt[mti] =
        (1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
        /* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier. */
        /* In the previous versions, MSBs of the seed affect */
        /* only MSBs of the array mt[].
*/
        /* 2002/01/09 modified by Makoto Matsumoto */
        mt[mti] &= 0xffffffffUL;
        /* for >32 bit machines */
    }
}

/* initialize by an array with array-length */
/* init_key is the array for initializing keys */
/* key_length is its length */
/* slight change for C++, 2004/2/26 */
void init_by_array(unsigned long init_key[], int key_length)
{
    int i, j, k;
    init_genrand(19650218UL);
    i=1; j=0;
    k = (N>key_length ? N : key_length);
    for (; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL))
          + init_key[j] + j; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++; j++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
        if (j>=key_length) j=0;
    }
    for (k=N-1; k; k--) {
        mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
          - i; /* non linear */
        mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
        i++;
        if (i>=N) { mt[0] = mt[N-1]; i=1; }
    }

    mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
}

/* generates a random number on [0,0xffffffff]-interval */
unsigned long genrand_int32(void)
{
    unsigned long y;
    static unsigned long mag01[2]={0x0UL, MATRIX_A};
    /* mag01[x] = x * MATRIX_A for x=0,1 */

    if (mti >= N) { /* generate N words at one time */
        int kk;

        if (mti == N+1)   /* if init_genrand() has not been called, */
            init_genrand(5489UL); /* a default initial seed is used */

        for (kk=0;kk<N-M;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        for (;kk<N-1;kk++) {
            y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
            mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
        }
        y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
        mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];

        mti = 0;
    }

    y = mt[mti++];

    /* Tempering */
    y ^= (y >> 11);
    y ^= (y << 7) & 0x9d2c5680UL;
    y ^= (y << 15) & 0xefc60000UL;
    y ^= (y >> 18);

    return y;
}

/* generates a random number on [0,0x7fffffff]-interval */
long int genrand_int31(void)
{
    return (long)(genrand_int32()>>1);
}

/* These real versions are due to Isaku Wada, 2002/01/09 added */

/* generates a random number on [0,1]-real-interval */
double genrand_real1(void)
{
    return genrand_int32()*(1.0/4294967295.0);
    /* divided by 2^32-1 */
}

/* generates a
random number on [0,1)-real-interval */ double genrand_real2(void) { return genrand_int32()*(1.0/4294967296.0); /* divided by 2^32 */ } /* generates a random number on (0,1)-real-interval */ double genrand_real3(void) { return (((double)genrand_int32()) + 0.5)*(1.0/4294967296.0); /* divided by 2^32 */ } /* generates a random number on [0,1) with 53-bit resolution*/ double genrand_res53(void) { unsigned long a=genrand_int32()>>5, b=genrand_int32()>>6; return(a*67108864.0+b)*(1.0/9007199254740992.0); } /* * NJ_genrand_int31_top() - Returns an int in the range 0..top * * This function attempts to remove bias in selecting random * integers in a range. * */ long int NJ_genrand_int31_top(long int top) { long int overflow; long int r; long int retval; if(top <= 0) { return(0); } else { overflow = (NJ_RAND_MAX / top) * top; } while(1) { r = genrand_int31(); if(r < overflow) { break; } } retval = r % top; return(retval); } mothur-1.36.1/source/calculators/prng.h000066400000000000000000000047331255543666200201000ustar00rootroot00000000000000/* * prng.h * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * Some function prototypes for the Mersenne Twister PRNG * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_PRNG_H_ #define _INC_PRNG_H_ 1 #ifdef __cplusplus extern "C" { #endif #define NJ_RAND_MAX 0x7fffffffUL /* some function prototypes */ void init_genrand(unsigned long s); void init_by_array(unsigned long init_key[], int key_length); unsigned long genrand_int32(void); long int genrand_int31(void); double genrand_real1(void); double genrand_real2(void); double genrand_real3(void); double genrand_res53(void); long int NJ_genrand_int31_top(long int top); #ifdef __cplusplus } #endif #endif /* _INC_PRNG_H_ */ mothur-1.36.1/source/calculators/qstat.cpp000066400000000000000000000033661255543666200206220ustar00rootroot00000000000000/* * qstat.cpp * Mothur * * Created by Thomas Ryabin on 3/4/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "qstat.h" /***********************************************************************/ EstOutput QStat::getValues(SAbundVector* rank){ try { /*test data VVV int dstring[] = {0,0,1,4,2,0,2,1,1,1,1,1,0,1,1,2,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1}; vector dvec; for(int i = 0; i < 171; i++) dvec.push_back(dstring[i]); int mr = 170; int nb = 29; int ns = 884; SAbundVector rankw = SAbundVector(dvec, mr,nb,ns); SAbundVector *rank = &rankw;*/ data.resize(1,0); int numSpec = rank->getNumBins(); int r1 = -1; int r3 = -1; int r1Ind = 0; int r3Ind = 0; double sumSpec = 0; double iqSum = 0; for(int i = 1; i < rank->size(); i++) { if(r1 != -1 && r3 != -1) i = rank->size(); sumSpec += rank->get(i); if(r1 == -1 && sumSpec >= numSpec*.25) { r1 = rank->get(i); r1Ind = i; } else if(r3 == -1 && sumSpec >= numSpec*.75) { r3 = rank->get(i); r3Ind = i; } else if(sumSpec >= numSpec*.25 && sumSpec < numSpec*.75) iqSum += rank->get(i); } double qstat = (.5*r1 + iqSum + .5*r3)/log((double)r3Ind/r1Ind); data[0] = qstat; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "QStat", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/qstat.h000066400000000000000000000013651255543666200202640ustar00rootroot00000000000000#ifndef QSTAT_H #define QSTAT_H /* * qstat.h * Mothur * * Created by Thomas Ryabin on 3/4/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /*This class implements the q statistic on single group. 
It is a child of the calculator class.*/ /***********************************************************************/ class QStat : public Calculator { public: QStat() : Calculator("qstat", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Qstat"; } private: RAbundVector rdata; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/shannon.cpp000066400000000000000000000026071255543666200211270ustar00rootroot00000000000000/* * shannon.cpp * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "shannon.h" /***********************************************************************/ EstOutput Shannon::getValues(SAbundVector* rank){ try { //vector shannonData(3,0); data.resize(3,0); double shannon = 0.0000; //hprime double hvara=0.0000; double maxRank = rank->getMaxRank(); int sampled = rank->getNumSeqs(); int sobs = rank->getNumBins(); for(int i=1;i<=maxRank;i++){ double p = ((double) i)/((double)sampled); shannon += (double)rank->get(i)*p*log(p); //hprime hvara += (double)rank->get(i)*p*pow(log(p),2); } shannon = -shannon; double hvar = (hvara-pow(shannon,2))/(double)sampled+(double)(sobs-1)/(double)(2*sampled*sampled); double ci = 0; if(hvar>0){ ci = 1.96*pow(hvar,0.5); } double shannonhci = shannon + ci; double shannonlci = shannon - ci; data[0] = shannon; data[1] = shannonlci; data[2] = shannonhci; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Shannon", "getValues"); exit(1); } } /***********************************************************************/ 
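
/*
 * Illustrative sketch only, not part of mothur. It restates the arithmetic of
 * Shannon::getValues() above on a plain std::vector so the index and its 95%
 * confidence interval can be checked in isolation. The names shannonSketch and
 * ShannonSketchResult are hypothetical; the input layout (element i holds the
 * number of OTUs observed exactly i times, element 0 unused) is an assumption
 * modelled on how rank->get(i) is used above.
 */
#include <cassert>
#include <cmath>
#include <vector>

struct ShannonSketchResult {
    double h;   // Shannon index H'
    double lci; // lower 95% confidence bound
    double hci; // upper 95% confidence bound
};

static ShannonSketchResult shannonSketch(const std::vector<int>& otusWithAbundance) {
    long sampled = 0; // total number of sequences
    long sobs = 0;    // number of observed OTUs
    for (std::size_t i = 1; i < otusWithAbundance.size(); i++) {
        sampled += (long)i * otusWithAbundance[i];
        sobs += otusWithAbundance[i];
    }
    assert(sampled > 0); // an empty sample has no defined index

    double sum = 0.0;   // accumulates sum of p*log(p), negated below
    double hvara = 0.0; // accumulates sum of p*log(p)^2
    for (std::size_t i = 1; i < otusWithAbundance.size(); i++) {
        double p = (double)i / (double)sampled;
        sum += otusWithAbundance[i] * p * std::log(p);
        hvara += otusWithAbundance[i] * p * std::pow(std::log(p), 2);
    }
    double h = -sum;

    // Same variance expression as in Shannon::getValues() above.
    double hvar = (hvara - h * h) / (double)sampled
                + (double)(sobs - 1) / (2.0 * (double)sampled * (double)sampled);
    double ci = (hvar > 0) ? 1.96 * std::sqrt(hvar) : 0.0;

    ShannonSketchResult result;
    result.h = h;
    result.lci = h - ci;
    result.hci = h + ci;
    return result;
}
/* Example: two OTUs with two sequences each give p = 0.5 for both, so H' = ln 2. */
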
mothur-1.36.1/source/calculators/shannon.h000066400000000000000000000013731255543666200205730ustar00rootroot00000000000000#ifndef SHANNON_H #define SHANNON_H /* * shannon.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the Shannon estimator on single group. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class Shannon : public Calculator { public: Shannon() : Calculator("shannon", 3, false) {}; EstOutput getValues(SAbundVector* rank); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Shannon"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/shannoneven.cpp000066400000000000000000000014521255543666200220020ustar00rootroot00000000000000/* * shannoneven.cpp * Mothur * * Created by Pat Schloss on 8/21/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "shannoneven.h" #include "shannon.h" /***********************************************************************/ EstOutput ShannonEven::getValues(SAbundVector* rank){ try { //vector simpsonData(3,0); data.resize(1,0); vector shanData(3,0); Shannon* shannon = new Shannon(); shanData = shannon->getValues(rank); long int sobs = rank->getNumBins(); if(sobs > 1){ data[0] = shanData[0] / log(sobs); } else{ data[0] = 1; } delete shannon; return data; } catch(exception& e) { m->errorOut(e, "ShannonEven", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/shannoneven.h000066400000000000000000000012101255543666200214370ustar00rootroot00000000000000#ifndef SHANNONEVEN #define SHANNONEVEN /* * shannoneven.h * Mothur * * Created by Pat Schloss on 8/21/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class ShannonEven : public Calculator { public: ShannonEven() : Calculator("shannoneven", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Shannoneven"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/shannonrange.cpp000066400000000000000000000060611255543666200221420ustar00rootroot00000000000000// // shannonrange.cpp // Mothur // // Created by SarahsWork on 1/3/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. // #include "shannonrange.h" /***********************************************************************/ EstOutput RangeShannon::getValues(SAbundVector* rank){ try { data.resize(3,0); double commSize = 1e20; double sampleSize = rank->getNumSeqs(); vector freqx; vector freqy; for (int i = 1; i <=rank->getMaxRank(); i++) { int abund = rank->get(i); if (abund != 0) { freqx.push_back(i); freqy.push_back(abund); } } double aux = ceil(pow((sampleSize+1), (1/(double)3))); double est0 = max(freqy[0]+1, aux); vector ests; double numr = 0.0; double denr = 0.0; for (int i = 0; i < freqx.size()-1; i++) { if (m->control_pressed) { break; } if (freqx[i+1] == freqx[i]+1) { numr = max(freqy[i+1]+1, aux); } else { numr = aux; } denr = max(freqy[i], aux); ests.push_back((freqx[i]+1)*numr/(double)denr); } numr = aux; denr = max(freqy[freqy.size()-1], aux); ests.push_back((freqx[freqx.size()-1]+1)*numr/(double)denr); double sum = 0.0; for (int i = 0; i < freqy.size(); i++) { sum += (ests[i]*freqy[i]); } double nfac = est0 + sum; est0 /= nfac; for (int i = 0; i < ests.size(); i++) { ests[i] /= nfac; } double abunup = 1 / commSize; double nbrup = est0 / abunup; double abunlow = ests[0]; double nbrlow = est0 / abunlow; if (alpha == 1) { double sum = 0.0; for (int i = 0; i < freqy.size(); i++) { if 
(m->control_pressed) { break; } sum += (freqy[i] * ests[i] * log(ests[i])); } data[0] = -sum; data[1] = exp(data[0]+nbrlow*(-abunlow*log(abunlow))); data[2] = exp(data[0]+nbrup*(-abunup*log(abunup))); }else { for (int i = 0; i < freqy.size(); i++) { if (m->control_pressed) { break; } data[0] += (freqy[i] * (pow(ests[i],alpha))); } data[1] = pow(data[0]+nbrup*pow(abunup,alpha), (1/(1-alpha))); data[2] = pow(data[0]+nbrlow*pow(abunlow,alpha), (1/(1-alpha))); } //this calc has no data[0], just a lower and upper estimate. set data[0] to lower estimate. data[0] = data[1]; if (data[1] > data[2]) { data[1] = data[2]; data[2] = data[0]; } data[0] = -1.0; //no value if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "RangeShannon", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/shannonrange.h000066400000000000000000000030651255543666200216100ustar00rootroot00000000000000// // shannonrange.h // Mothur // // Created by SarahsWork on 1/3/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. // /* 1] Haegeman, B., Hamelin, J., Moriarty, J., Neal, P., Dushoff, J., & Weitz, J. S. (2013). Robust estimation of microbial diversity in theory and in practice. The ISME journal, 7(6), 1092–1101. [2] Hill, M. O. (1973). Diversity and evenness: A unifying notation and its consequences. Ecology, 54(2), 427–432. [3] Orlitsky, A., Santhanam, N. P., & Zhang, J. (2003). Always Good Turing: Asymptoti- cally optimal probability estimation. Science, 302(5644), 427–431. [4] Roesch, L. F., Fulthorpe, R. R., Riva, A., Casella, G., Hadwin, A. K., Kent, A. D., et al. (2007). Pyrosequencing enumerates and contrasts soil microbial diversity. The ISME Journal, 1(4), 283–290. 
*/ #ifndef Mothur_shannonrange_h #define Mothur_shannonrange_h #include "calculator.h" /***********************************************************************/ class RangeShannon : public Calculator { public: RangeShannon(int a) : alpha(a), Calculator("rangeshannon", 3, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "Haegeman, B., Hamelin, J., Moriarty, J., Neal, P., Dushoff, J., & Weitz, J. S. (2013). Robust estimation of microbial diversity in theory and in practice. The ISME journal, 7(6), 1092–1101., http://www.mothur.org/wiki/rangeshannon"; } private: int alpha; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedace.cpp000066400000000000000000000073501255543666200214020ustar00rootroot00000000000000 /* * sharedace.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedace.h" /***********************************************************************/ EstOutput SharedAce::getValues(vector shared) { try { data.resize(1,0); string label; label = shared[0]->getLabel(); double fARare, fBRare, S12Rare, S12Abund, S12, f11, tempA, tempB, t10, t01, t11, t21, t12, t22, C12Numerator; fARare = 0; fBRare = 0; S12Rare = 0; S12Abund = 0; S12 = 0; f11 = 0; t10 = 0; t01 = 0; t11= 0; t21= 0; t12= 0; t22= 0; C12Numerator = 0; double Sharedace, C12, part1, part2, part3, part4, part5, Gamma1, Gamma2, Gamma3; /*fARare = number of OTUs with one individual found in A and less than or equal to 10 in B. fBRare = number of OTUs with one individual found in B and less than or equal to 10 in A. arare = number of sequences from A that contain less than 10 sequences. brare = number of sequences from B that contain less than 10 sequences. 
S12Rare = number of shared OTUs where both of the communities are represented by less than or equal to 10 sequences S12Abund = number of shared OTUs where at least one of the communities is represented by more than 10 sequences S12 = number of shared OTUs in A and B This estimator was changed to reflect Caldwell's changes, eliminating the nrare / nrare - 1 */ for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); if ((tempA != 0) && (tempB != 0)) {//they are shared S12++; //do both A and B have one if ((tempA == 1) && (tempB == 1)) { f11++; } //is A one and B rare if ((tempA == 1) && (tempB <= abund)) { fARare++; } //is B one and A rare if ((tempB == 1) && (tempA <= abund)) { fBRare++; } if ((tempA <= abund) && (tempB <= abund)) { //shared and both rare S12Rare++; t10 += tempA; //Sum Xi t01 += tempB; //Sum Yi //calculate top of C12 // YiI(Xi = 1) if (tempA == 1) { C12Numerator += tempB; } //XiI(Yi = 1) if (tempB == 1) { C12Numerator += tempA; } //-I(Xi=Yi=1) if ((tempA == 1) && (tempB == 1)) { C12Numerator--; } //calculate t11 - Sum of XiYi t11 += tempA * tempB; //calculate t21 - Sum of Xi(Xi - 1)Yi t21 += tempA * (tempA - 1) * tempB; //calculate t12 - Sum of Xi(Yi - 1)Yi t12 += tempA * (tempB - 1) * tempB; //calculate t22 - Sum of Xi(Xi - 1)Yi(Yi - 1) t22 += tempA * (tempA - 1) * tempB * (tempB - 1); } if ((tempA > 10) || (tempB > 10)) { S12Abund++; } } } C12 = 1 - (C12Numerator /(float) t11); part1 = S12Rare / (float)C12; part2 = 1 / (float)C12; //calculate gammas Gamma1 = ((S12Rare * t21) / (float)((C12 * t10 * t11)) - 1); Gamma2 = ((S12Rare * t12) / (float)((C12 * t01 * t11)) - 1); Gamma3 = ((S12Rare / C12) * (S12Rare / C12)) * ( t22 / (float)(t10 * t01 * t11)); Gamma3 = Gamma3 - ((S12Rare * t11) / (float)(C12 * t01 * t10)) - Gamma1 - Gamma2; if (isnan(Gamma1) || isinf(Gamma1)) { Gamma1 = 0; } if (isnan(Gamma2) || isinf(Gamma2)) { 
Gamma2 = 0; } if (isnan(Gamma3) || isinf(Gamma3)) { Gamma3 = 0; } if (isnan(part1) || isinf(part1)) { part1 = 0; } if (isnan(part2) || isinf(part2)) { part2 = 0; } part3 = fARare * Gamma1; part4 = fBRare * Gamma2; part5 = f11 * Gamma3; Sharedace = S12Abund + part1 + (part2 * (part3 + part4 + part5)); data[0] = Sharedace; return data; } catch(exception& e) { m->errorOut(e, "SharedAce", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedace.h000066400000000000000000000014401255543666200210410ustar00rootroot00000000000000#ifndef SHAREDACE_H #define SHAREDACE_H /* * sharedace.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedAce estimator on two groups. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class SharedAce : public Calculator { public: SharedAce(int n=10) : abund(n), Calculator("sharedace", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/SharedAce"; } private: int abund; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedanderbergs.cpp000066400000000000000000000022651255543666200227660ustar00rootroot00000000000000/* * sharedanderberg.cpp * Mothur * * Created by Sarah Westcott on 3/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedanderbergs.h" /***********************************************************************/ EstOutput Anderberg::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } data[0] = 1.0 - S12 / ((float)((2 * S1) + (2 * S2) - (3 * S12))); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Anderberg", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedanderbergs.h000066400000000000000000000012451255543666200224300ustar00rootroot00000000000000#ifndef ANDERBERG_H #define ANDERBERG_H /* * sharedanderberg.h * Mothur * * Created by Sarah Westcott on 3/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class Anderberg : public Calculator { public: Anderberg() : Calculator("anderberg", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Anderberg"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedbraycurtis.cpp000066400000000000000000000026521255543666200230410ustar00rootroot00000000000000/* * sharedbraycurtis.cpp * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */

#include "sharedbraycurtis.h"

/***********************************************************************/
//This is used by SharedJAbund and SharedSorAbund
EstOutput BrayCurtis::getValues(vector<SharedRAbundVector*> shared) {
	try {
		data.resize(1,0);

		double sumSharedA, sumSharedB, sumSharedAB, tempA, tempB;
		sumSharedA = 0; sumSharedB = 0; sumSharedAB = 0;

		/*Xi, Yi = abundance of the ith OTU in A and B
		sumSharedA = the sum of the abundances of all OTUs in A
		sumSharedB = the sum of the abundances of all OTUs in B
		sumSharedAB = the sum over all OTUs of the smaller of Xi and Yi. */

		for (int i = 0; i < shared[0]->getNumBins(); i++) {
			//store in temps to avoid multiple repetitive function calls
			tempA = shared[0]->getAbundance(i);
			tempB = shared[1]->getAbundance(i);

			sumSharedA += tempA;
			sumSharedB += tempB;

			//sum the min of tempA and tempB
			if (tempA < tempB) { sumSharedAB += tempA; }
			else { sumSharedAB += tempB; }
		}

		data[0] = 1.0 - (2 * sumSharedAB) / (float)( sumSharedA + sumSharedB);

		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "BrayCurtis", "getValues");
		exit(1);
	}
}
/***********************************************************************/
mothur-1.36.1/source/calculators/sharedbraycurtis.h000066400000000000000000000012501255543666200224770ustar00rootroot00000000000000#ifndef BRAYCURTIS_H
#define BRAYCURTIS_H
/*
 * sharedbraycurtis.h
 * Mothur
 *
 * Created by Sarah Westcott on 3/24/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "calculator.h" /***********************************************************************/ class BrayCurtis : public Calculator { public: BrayCurtis() : Calculator("braycurtis", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Braycurtis"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedchao1.cpp000066400000000000000000000165371255543666200216540ustar00rootroot00000000000000/* * sharedchao1.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedchao1.h" /***********************************************************************/ EstOutput SharedChao1::getValues(vector shared){ try { data.resize(1,0); vector temp; int numGroups = shared.size(); float Chao = 0.0; float leftvalue, rightvalue; // IntNode is defined in mothur.h // The tree used here is a binary tree used to represent the f1+++, f+1++, f++1+, f+++1, f11++, f1+1+... // combinations required to solve the chao estimator equation for any number of groups. Conceptually, think // of each node as having a 1 and a + value, or for f2 values a 2 and a + value, and 2 pointers to intnodes, and 2 coeffient values. // The coeffient value is how many times you chose branch 1 to get to that fvalue. // If you choose left you are selecting the 1 or 2 value and right means the + value. For instance, to find // the number of bins that have f1+1+ you would start at the root, go left, right, left, and select the rightvalue. // the coeffient is 2. Note: we only set the coeffient in f2 values. //create and initialize trees to 0. 
		initialTree(numGroups);

		for (int i = 0; i < shared[0]->getNumBins(); i++) {
			//get bin values and calc shared
			bool sharedByAll = true;
			temp.clear();
			for (int j = 0; j < numGroups; j++) {
				temp.push_back(shared[j]->getAbundance(i));
				if (temp[j] == 0) { sharedByAll = false; }
			}

			//they are shared
			if (sharedByAll == true) {
				//find f1 and f2values
				updateTree(temp);
			}
		}

		//calculate chao1, (numleaves-1) because numleaves contains the ++ values.
		bool bias = false;
		for(int i=0;i<numLeaves;i++){
			if (f2leaves[i]->lvalue == 0 || f2leaves[i]->rvalue == 0) { bias = true;}// break;}
		}

		if(bias){
			for (int i = 0; i < numLeaves; i++) {
				leftvalue = (float)(f1leaves[i]->lvalue * (f1leaves[i]->lvalue - 1)) / (float)((pow(2, (float)f2leaves[i]->lcoef)) * (f2leaves[i]->lvalue + 1));
				if (i != (numLeaves-1)) {
					rightvalue = (float)(f1leaves[i]->rvalue * (f1leaves[i]->rvalue - 1)) / (float)((pow(2, (float)f2leaves[i]->rcoef)) * (f2leaves[i]->rvalue + 1));
				}else{
					//add in sobs
					rightvalue = (float)(f1leaves[i]->rvalue);
				}
				Chao += leftvalue + rightvalue;
			}
		}
		else{
			for (int i = 0; i < numLeaves; i++) {
				leftvalue = (float)(f1leaves[i]->lvalue * f1leaves[i]->lvalue) / (float)((pow(2, (float)f2leaves[i]->lcoef)) * f2leaves[i]->lvalue);
				if (i != (numLeaves-1)) {
					rightvalue = (float)(f1leaves[i]->rvalue * f1leaves[i]->rvalue) / (float)((pow(2, (float)f2leaves[i]->rcoef)) * f2leaves[i]->rvalue);
				}else{
					//add in sobs
					rightvalue = (float)(f1leaves[i]->rvalue);
				}
				Chao += leftvalue + rightvalue;
			}
		}

		for (int i = 0; i < numNodes; i++) {
			delete f1leaves[i];
			delete f2leaves[i];
		}

		data[0] = Chao;
		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedChao1", "getValues");
		exit(1);
	}
}
/***********************************************************************/
//builds trees structure with n leaf nodes initialized to 0.
void SharedChao1::initialTree(int n) {
	try {
		// (2^n) / 2. Divide by 2 because each leaf node contains 2 values. One for + and one for 1 or 2.
numLeaves = pow(2, (float)n) / 2; numNodes = 2*numLeaves - 1; int countleft = 0; int countright = 1; f1leaves.resize(numNodes); f2leaves.resize(numNodes); //initialize leaf values for (int i = 0; i < numLeaves; i++) { f1leaves[i] = new IntNode(0, 0, NULL, NULL); f2leaves[i] = new IntNode(0, 0, NULL, NULL); } //set pointers to children for (int j = numLeaves; j < numNodes; j++) { f1leaves[j] = new IntNode(); f1leaves[j]->left = f1leaves[countleft]; f1leaves[j]->right = f1leaves[countright]; f2leaves[j] = new IntNode(); f2leaves[j]->left = f2leaves[countleft]; f2leaves[j]->right =f2leaves[countright]; countleft = countleft + 2; countright = countright + 2; } //point to root f1root = f1leaves[numNodes-1]; //point to root f2root = f2leaves[numNodes-1]; //set coeffients setCoef(f2root, 0); } catch(exception& e) { if ((toString(e.what()) == "vector::_M_fill_insert") || (toString(e.what()) == "St9bad_alloc")) { m->mothurOut("You are using " + toString(n) + " groups which creates 2^" + toString(n+1) + " nodes. Try reducing the number of groups you selected. "); m->mothurOutEndLine(); exit(1); } m->errorOut(e, "SharedChao1", "initialTree"); exit(1); } } /***********************************************************************/ //take vector containing the abundance info. for a bin and updates trees. 
void SharedChao1::updateTree(vector bin) { try { updateBranchf1(f1root, bin, 0); updateBranchf2(f2root, bin, 0); } catch(exception& e) { m->errorOut(e, "SharedChao1", "updateTree"); exit(1); } } /***********************************************************************/ void SharedChao1::updateBranchf1(IntNode* node, vector bin, int index) { try { //if you have more than one group if (index == (bin.size()-1)) { if (bin[index] == 1) { node->lvalue++; node->rvalue++; } else { node->rvalue++; } }else { if (bin[index] == 1) { //follow path as if you are 1 updateBranchf1(node->left, bin, index+1); } //follow path as if you are + updateBranchf1(node->right, bin, index+1); } } catch(exception& e) { m->errorOut(e, "SharedChao1", "updateBranchf1"); exit(1); } } /***********************************************************************/ void SharedChao1::updateBranchf2(IntNode* node, vector bin, int index) { try { //if you have more than one group if (index == (bin.size()-1)) { if (bin[index] == 2) { node->lvalue++; node->rvalue++; } else { node->rvalue++; } }else { if (bin[index] == 2) { //follow path as if you are 1 updateBranchf2(node->left, bin, index+1); } //follow path as if you are + updateBranchf2(node->right, bin, index+1); } } catch(exception& e) { m->errorOut(e, "SharedChao1", "updateBranchf2"); exit(1); } } /***********************************************************************/ void SharedChao1::setCoef(IntNode* node, int coef) { try { if (node->left != NULL) { setCoef(node->left, coef+1); setCoef(node->right, coef); }else { node->lcoef = coef+1; node->rcoef = coef; } } catch(exception& e) { m->errorOut(e, "SharedChao1", "setCoef"); exit(1); } } /***********************************************************************/ //for debugging purposes void SharedChao1::printTree() { m->mothurOut("F1 leaves"); m->mothurOutEndLine(); printBranch(f1root); m->mothurOut("F2 leaves"); m->mothurOutEndLine(); printBranch(f2root); } 
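
/*
 * Illustrative sketch only, not part of mothur. It restates the sizing rule used by
 * SharedChao1::initialTree() with integer arithmetic: n groups give 2^n / 2 leaves,
 * because each leaf stores two values, and a full binary tree over those leaves has
 * 2*leaves - 1 nodes. The names chaoTreeSizeSketch and ChaoTreeSizeSketch are
 * hypothetical, and the upper bound on numGroups is an assumption that keeps the
 * shift inside a 32-bit int.
 */
#include <cassert>

struct ChaoTreeSizeSketch {
    int numLeaves; // leaf nodes per tree (the f1 tree and the f2 tree are the same size)
    int numNodes;  // leaves plus internal nodes per tree
};

static ChaoTreeSizeSketch chaoTreeSizeSketch(int numGroups) {
    assert(numGroups >= 1 && numGroups < 31);
    ChaoTreeSizeSketch size;
    size.numLeaves = 1 << (numGroups - 1); // 2^n / 2
    size.numNodes = 2 * size.numLeaves - 1;
    return size;
}
/* Two groups need 2 leaves and 3 nodes per tree; each extra group doubles the leaf count. */
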
/*****************************************************************/ void SharedChao1::printBranch(IntNode* node) { try { // you are not a leaf if (node->left != NULL) { printBranch(node->left); printBranch(node->right); }else { //you are a leaf m->mothurOut(toString(node->lvalue)); m->mothurOutEndLine(); m->mothurOut(toString(node->rvalue)); m->mothurOutEndLine(); } } catch(exception& e) { m->errorOut(e, "SharedChao1", "printBranch"); exit(1); } } /*****************************************************************/ mothur-1.36.1/source/calculators/sharedchao1.h000066400000000000000000000026321255543666200213100ustar00rootroot00000000000000#ifndef SHAREDCHAO1_H #define SHAREDCHAO1_H /* * sharedchao1.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the Sharedchao1 estimator on two groups. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class SharedChao1 : public Calculator { public: SharedChao1() : Calculator("sharedchao", 1, true) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Sharedchao"; } private: IntNode* f1root; IntNode* f2root; vector f1leaves; vector f2leaves; int numLeaves; int numNodes; void initialTree(int); //builds trees structure with n leaf nodes initialized to 0. void setCoef(IntNode*, int); void updateTree(vector); //take vector containing the abundance info. for a bin and updates trees. 
void updateBranchf1(IntNode*, vector, int); //pointer, vector of abundance values, index into vector void updateBranchf2(IntNode*, vector, int); //pointer, vector of abundance values, index into vector //for debugging void printTree(); void printBranch(IntNode*); }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedjabund.cpp000066400000000000000000000014721255543666200221140ustar00rootroot00000000000000/* * sharedjabund.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedjabund.h" /***********************************************************************/ EstOutput JAbund::getValues(vector shared) { try { EstOutput UVest; UVest.resize(2,0); data.resize(1,0); UVest = uv->getUVest(shared); //UVest[0] is Uest UVest[1] is Vest data[0] = 1.0-(UVest[0] * UVest[1]) / ((float)(UVest[0] + UVest[1] - (UVest[0] * UVest[1]))); if(data[0] > 1){data[0] = 0; } if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "JAbund", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedjabund.h000066400000000000000000000014051255543666200215550ustar00rootroot00000000000000#ifndef JABUND_H #define JABUND_H /* * sharedjabund.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedJAbund estimator on two groups. It is a child of the calculator class. 
*/ #include "calculator.h" /***********************************************************************/ class JAbund : public Calculator { public: JAbund() : Calculator("jabund", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Jabund"; } private: UVEst* uv; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedjackknife.cpp000066400000000000000000000073601255543666200226000ustar00rootroot00000000000000/* * sharedjackknife.cpp * Mothur * * Created by Thomas Ryabin on 3/30/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedjackknife.h" /*************************************************************************************** ***************************************************************************************/ double SharedJackknife::simpson(vector abunds, double numInd, int numBins){ double denom = numInd*(numInd-1); double sum = 0; for(int i = 0; i < numBins; i++) sum += (double)abunds[i]*((double)abunds[i]-1)/denom; return sum; } /*****************************************************************************************/ double* SharedJackknife::jackknife(){ int numBins = groups.at(0)->getNumBins()-1; vector cArray(numBins); for(int i = 0; i < numBins; i++) cArray[i] = 0; double numInd = 0; for(int i = 0; i < numGroups; i++) for(int j = 0; j < numBins; j++) { int curAbund = groups.at(i)->get(j+1).abundance; cArray[j] += curAbund; numInd += (double)curAbund; } double baseD = 1/simpson(cArray, numInd, numBins); vector pseudoVals(numBins); double jackknifeEstimate = 0; for(int i = 0; i < numGroups; i++) { for(int j = 0; j < numBins-1; j++) { int abundDiff = -groups.at(i)->get(j+1).abundance; if(i > 0) abundDiff += groups.at(i-1)->get(j+1).abundance; cArray[j] += abundDiff; numInd += abundDiff; } double curD = 1/simpson(cArray, numInd, numBins); pseudoVals[i] = 
(double)numGroups*(baseD - curD) + curD; jackknifeEstimate += pseudoVals[i]; } jackknifeEstimate /= (double)numGroups; double variance = 0; for(int i = 0; i < numGroups; i++) variance += pow(pseudoVals[i]-jackknifeEstimate, 2); variance /= (double)numGroups*((double)numGroups-1); double stErr = sqrt(variance); TDTable table; double confLimit = 0; if(numGroups <= 30) confLimit = table.getConfLimit(numGroups-1, 1); else confLimit = 1.645; confLimit *= stErr; double* rdata = new double[3]; rdata[0] = baseD; rdata[1] = jackknifeEstimate - confLimit; rdata[2] = jackknifeEstimate + confLimit; return rdata; } /************************************************************************************************ ************************************************************************************************/ EstOutput SharedJackknife::getValues(vector vectorShared){ //Fix this for collect, mistake was that it was made with summary in mind. try { SharedRAbundVector* shared1 = vectorShared[0]; SharedRAbundVector* shared2 = vectorShared[1]; if(numGroups == -1) { numGroups = m->getNumGroups(); } if(callCount == numGroups*(numGroups-1)/2) { currentCallDone = true; callCount = 0; } callCount++; if(currentCallDone) { groups.clear(); currentCallDone = false; } if(groups.size() != numGroups) { if(groups.size() == 0) groups.push_back(shared1); groups.push_back(shared2); } if(groups.size() == numGroups && callCount < numGroups) { data.resize(3,0); double* rdata = jackknife(); data[0] = rdata[0]; data[1] = rdata[1]; data[2] = rdata[2]; delete[] rdata; /* release the buffer allocated by jackknife() */ if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } data.resize(3,0); data[0] = 0; data[1] = 0; data[2] = 0; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e,
"SharedJackknife", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedjackknife.h000066400000000000000000000020041255543666200222330ustar00rootroot00000000000000#ifndef SHAREDJACKKNIFE_H #define SHAREDJACKKNIFE_H /* * sharedjackknife.h * Mothur * * Created by Thomas Ryabin on 3/30/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /*This class implements the SharedJackknife estimator. It is a child of the calculator class.*/ /***********************************************************************/ class SharedJackknife : public Calculator { public: SharedJackknife() : numGroups(-1), callCount(0), count(0), currentCallDone(true), Calculator("sharedjackknife", 3, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Sharedjackknife"; } private: int numGroups, callCount, count; bool currentCallDone; vector groups; double simpson(vector, double, int); double* jackknife(); }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedjclass.cpp000066400000000000000000000023051255543666200221240ustar00rootroot00000000000000/* * sharedjclass.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedjclass.h" /***********************************************************************/ EstOutput Jclass::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); //find number of bins in shared1 and shared2 if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } data[0] = 1.0 - S12 / (float)(S1 + S2 - S12); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Jclass", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedjclass.h000066400000000000000000000013701255543666200215720ustar00rootroot00000000000000#ifndef JCLASS_H #define JCLASS_H /* * sharedjclass.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedJclass estimator on two groups. It is a child of the calculator class. 
*/ #include "calculator.h" /***********************************************************************/ class Jclass : public Calculator { public: Jclass() : Calculator("jclass", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Jclass"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedjest.cpp000066400000000000000000000033401255543666200216120ustar00rootroot00000000000000/* * sharedjest.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedchao1.h" #include "chao1.h" #include "sharedjest.h" /***********************************************************************/ EstOutput Jest::getValues(vector shared) { try { EstOutput S1, S2, S12; S12.resize(1,0); S1.resize(3,0); S2.resize(3,0); /*S1, S2 = number of OTUs estimated in A and B using the Chao estimator S12 = estimated number of OTUs shared between A and B using the SharedChao estimator*/ data.resize(1,0); SharedChao1* sharedChao = new SharedChao1(); Chao1* chaoS1 = new Chao1(); Chao1* chaoS2 = new Chao1(); SAbundVector* chaoS1Sabund = new SAbundVector(); SAbundVector* chaoS2Sabund = new SAbundVector(); *chaoS1Sabund = shared[0]->getSAbundVector(); *chaoS2Sabund = shared[1]->getSAbundVector(); //chaoS1Sabund->print(cout); //chaoS2Sabund->print(cout); S12 = sharedChao->getValues(shared); S1 = chaoS1->getValues(chaoS1Sabund); S2 = chaoS2->getValues(chaoS2Sabund); //cout << S12[0] << '\t' << S1[0] << '\t' << S2[0] << endl; data[0] = 1.0 - S12[0] / (float)(S1[0] + S2[0] - S12[0]); //cout << data[0] << endl; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (data[0] < 0) { data[0] = 0; } if (data[0] > 1) { data[0] = 1; } delete sharedChao; delete chaoS1; delete chaoS2; delete chaoS1Sabund; delete chaoS2Sabund; return data; } catch(exception& e) { 
m->errorOut(e, "Jest", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedjest.h000066400000000000000000000013511255543666200212570ustar00rootroot00000000000000#ifndef JEST_H #define JEST_H /* * sharedjest.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedJest estimator on two groups. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class Jest : public Calculator { public: Jest() : Calculator("jest", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Jest"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedjsd.cpp000066400000000000000000000032201255543666200214220ustar00rootroot00000000000000// // sharedjsd.cpp // Mothur // // Created by SarahsWork on 12/9/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "sharedjsd.h" /***********************************************************************/ //KLD <- function(x,y) sum(x *log(x/y)) //JSD<- function(x,y) 0.5 * KLD(x, (x+y)/2) + 0.5 * KLD(y, (x+y)/2) EstOutput JSD::getValues(vector shared) { try { data.resize(1,0); double KLD1 = 0.0; double KLD2 = 0.0; vector countA = shared[0]->getAbundances(); vector countB = shared[1]->getAbundances(); double totalA = 0; double totalB = 0; for (int i = 0; i < shared[0]->getNumBins(); i++) { totalA += countA[i]; totalB += countB[i]; } for (int i = 0; i < shared[0]->getNumBins(); i++) { double tempA = countA[i] / totalA; double tempB = countB[i] / totalB; if (tempA == 0) { tempA = 0.000001; } if (tempB == 0) { tempB = 0.000001; } double denom = (tempA+tempB)/(double)2.0; if (tempA != 0) { KLD1 += tempA * log(tempA/denom); } //KLD(x,m) if (tempB != 0) { KLD2 += tempB * log(tempB/denom); } //KLD(y,m) } data[0] = ((0.5*KLD1) + (0.5*KLD2)); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "JSD", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedjsd.h000066400000000000000000000012411255543666200210700ustar00rootroot00000000000000// // sharedjsd.h // Mothur // // Created by SarahsWork on 12/9/13. // Copyright (c) 2013 Schloss Lab. All rights reserved.
// #ifndef Mothur_sharedjsd_h #define Mothur_sharedjsd_h #include "calculator.h" /***********************************************************************/ //Jensen-Shannon divergence (JSD) class JSD : public Calculator { public: JSD() : Calculator("jsd", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/JSD"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedkstest.cpp000066400000000000000000000032241255543666200221630ustar00rootroot00000000000000/* * kstest.cpp * Mothur * * Created by Thomas Ryabin on 3/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedkstest.h" /***********************************************************************/ EstOutput KSTest::getValues(vector shared){ try { data.resize(3,0); //Must return shared1 and shared2 to original order at conclusion of kstest vector initData1 = shared[0]->getData(); vector initData2 = shared[1]->getData(); shared[0]->sortD(); shared[1]->sortD(); int numNZ1 = shared[0]->numNZ(); int numNZ2 = shared[1]->numNZ(); double numInd1 = (double)shared[0]->getNumSeqs(); double numInd2 = (double)shared[1]->getNumSeqs(); double maxDiff = -1; double sum1 = 0; double sum2 = 0; for(int i = 1; i < shared[0]->getNumBins(); i++) { sum1 += shared[0]->get(i).abundance; sum2 += shared[1]->get(i).abundance; double diff = fabs((double)sum1/numInd1 - (double)sum2/numInd2); if(diff > maxDiff) maxDiff = diff; } double DStatistic = maxDiff*numNZ1*numNZ2; double a = pow((double)(numNZ1 + numNZ2)/(numNZ1*numNZ2),.5); //double pVal = exp(-2*pow(maxDiff/a,2)); double critVal = 1.36*a*numNZ1*numNZ2; shared[0]->setData(initData1); shared[1]->setData(initData2); data[0] = DStatistic; data[1] = critVal; data[2] = 0; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } 
return data; } catch(exception& e) { m->errorOut(e, "KSTest", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedkstest.h000066400000000000000000000013451255543666200216320ustar00rootroot00000000000000#ifndef KSTEST_H #define KSTEST_H /* * kstest.h * Mothur * * Created by Thomas Ryabin on 3/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /*This class implements the KSTest estimator on 2 groups. It is a child of the calculator class.*/ /***********************************************************************/ class KSTest : public Calculator { public: KSTest() : Calculator("kstest", 3, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Kstest"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedkulczynski.cpp000066400000000000000000000022521255543666200230540ustar00rootroot00000000000000/* * sharedkulczynski.cpp * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedkulczynski.h" /***********************************************************************/ EstOutput Kulczynski::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } data[0] = 1.0 - S12 / (float)(S1 + S2 - (2 * S12)); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Kulczynski", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedkulczynski.h000066400000000000000000000012471255543666200225240ustar00rootroot00000000000000#ifndef KULCZYNSKI_H #define KULCZYNSKI_H /* * sharedkulczynski.h * Mothur * * Created by John Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class Kulczynski : public Calculator { public: Kulczynski() : Calculator("kulczynski", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Kulczynski"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedkulczynskicody.cpp000066400000000000000000000023051255543666200237320ustar00rootroot00000000000000/* * sharedkulczynskicody.cpp * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedkulczynskicody.h" /***********************************************************************/ EstOutput KulczynskiCody::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } data[0] = 1.0 - 0.5 * ((S12 / (float)S1) + (S12 / (float)S2)); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "KulczynskiCody", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedkulczynskicody.h000066400000000000000000000013051255543666200233760ustar00rootroot00000000000000#ifndef KULCZYNSKICODY_H #define KULCZYNSKICODY_H /* * sharedkulczynskicody.h * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class KulczynskiCody : public Calculator { public: KulczynskiCody() : Calculator("kulczynskicody", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Kulczynskicody"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedlennon.cpp000066400000000000000000000024131255543666200221360ustar00rootroot00000000000000/* * sharedlennon.cpp * Mothur * * Created by Sarah Westcott on 3/24/09. 
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedlennon.h" /***********************************************************************/ EstOutput Lennon::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB, min; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; min = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } tempA = S1 - S12; tempB = S2 - S12; if (tempA < tempB) { min = tempA; } else { min = tempB; } data[0] = 1.0 - S12 / (float)(S12 + min); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Lennon", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedlennon.h000066400000000000000000000012151255543666200216020ustar00rootroot00000000000000#ifndef LENNON_H #define LENNON_H /* * sharedlennon.h * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class Lennon : public Calculator { public: Lennon() : Calculator("lennon", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Lennon"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedmarczewski.cpp000066400000000000000000000017431255543666200230310ustar00rootroot00000000000000/* * sharedmarczewski.cpp * Mothur * * Created by Thomas Ryabin on 4/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedmarczewski.h" EstOutput SharedMarczewski::getValues(vector vectorShared){ try { SharedRAbundVector* shared1 = vectorShared[0]; SharedRAbundVector* shared2 = vectorShared[1]; data.resize(1,0); double a = 0; double b = 0; double c = 0; for(int i = 1; i < shared1->size(); i++) { int abund1 = shared1->get(i).abundance; int abund2 = shared2->get(i).abundance; if(abund1 > 0 && abund2 > 0) a++; else if(abund1 > 0 && abund2 == 0) b++; else if(abund1 == 0 && abund2 > 0) c++; } data[0] = (b+c)/(a+b+c); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "SharedMarczewski", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedmarczewski.h000066400000000000000000000013071255543666200224720ustar00rootroot00000000000000#ifndef SHAREDMARCZEWSKI_H #define SHAREDMARCZEWSKI_H /* * sharedmarczewski.h * Mothur * * Created by Thomas Ryabin on 4/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class SharedMarczewski : public Calculator { public: SharedMarczewski() : Calculator("sharedmarczewski", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Sharedmarczewski"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedmorisitahorn.cpp000066400000000000000000000027431255543666200233710ustar00rootroot00000000000000/* * sharedmorisitahorn.cpp * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedmorisitahorn.h" /***********************************************************************/ EstOutput MorHorn::getValues(vector shared) { try { data.resize(1,0); double Atotal, Btotal, tempA, tempB; Atotal = 0; Btotal = 0; double morhorn, sumSharedA, sumSharedB, a, b, d; morhorn = 0.0; sumSharedA = 0.0; sumSharedB = 0.0; a = 0.0; b = 0.0; d = 0.0; //get the total values we need to calculate the theta denominator sums for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls Atotal += shared[0]->getAbundance(i); Btotal += shared[1]->getAbundance(i); } //calculate the denominator sums for (int j = 0; j < shared[0]->getNumBins(); j++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(j); tempB = shared[1]->getAbundance(j); float relA = tempA / Atotal; float relB = tempB / Btotal; a += relA * relA; b += relB * relB; d += relA * relB; } morhorn = 1- (2 * d) / (a + b); if (isnan(morhorn) || isinf(morhorn)) { morhorn = 1; } data[0] = morhorn; return data; } catch(exception& e) { m->errorOut(e, "MorHorn", "getValues"); exit(1); } } /***********************************************************************/ 
mothur-1.36.1/source/calculators/sharedmorisitahorn.h000066400000000000000000000012431255543666200230300ustar00rootroot00000000000000#ifndef MORHORN_H #define MORHORN_H /* * sharedmorisitahorn.h * Mothur * * Created by Sarah Westcott on 3/24/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class MorHorn : public Calculator { public: MorHorn() : Calculator("morisitahorn", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Morisitahorn"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharednseqs.h000066400000000000000000000014351255543666200214460ustar00rootroot00000000000000#ifndef SHAREDNSEQS_H #define SHAREDNSEQS_H /* * sharednseqs.h * Mothur * * Created by Sarah Westcott on 3/16/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class SharedNSeqs : public Calculator { public: SharedNSeqs() : Calculator("sharednseqs", 1, false) {}; EstOutput getValues(SAbundVector* rank){ return data; }; EstOutput getValues(vector shared) { data.resize(1,0); data[0] = (double)shared[0]->getNumSeqs() + (double)shared[1]->getNumSeqs(); return data; } string getCitation() { return "http://www.mothur.org/wiki/Sharednseqs"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedochiai.cpp000066400000000000000000000022241255543666200221010ustar00rootroot00000000000000/* * sharedochiai.cpp * Mothur * * Created by Sarah Westcott on 3/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedochiai.h" /***********************************************************************/ EstOutput Ochiai::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } data[0] = S12 / ((float)pow((S1 * S2), 0.5)); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Ochiai", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedochiai.h000066400000000000000000000012141255543666200215440ustar00rootroot00000000000000#ifndef OCHIAI_H #define OCHIAI_H /* * sharedochiai.h * Mothur * * Created by Sarah Westcott on 3/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class Ochiai : public Calculator { public: Ochiai() : Calculator("ochiai", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/ochiai"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedrjsd.cpp000066400000000000000000000032141255543666200216070ustar00rootroot00000000000000// // sharedrjsd.cpp // Mothur // // Created by Sarah Westcott on 1/21/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. 
// #include "sharedrjsd.h" /***********************************************************************/ //KLD <- function(x,y) sum(x *log(x/y)) //JSD<- function(x,y) sqrt(0.5 * KLD(x, (x+y)/2) + 0.5 * KLD(y, (x+y)/2)) EstOutput RJSD::getValues(vector shared) { try { data.resize(1,0); double KLD1 = 0.0; double KLD2 = 0.0; vector countA = shared[0]->getAbundances(); vector countB = shared[1]->getAbundances(); double totalA = 0; double totalB = 0; for (int i = 0; i < shared[0]->getNumBins(); i++) { totalA += countA[i]; totalB += countB[i]; } for (int i = 0; i < shared[0]->getNumBins(); i++) { double tempA = countA[i] / totalA; double tempB = countB[i] / totalB; if (tempA == 0) { tempA = 0.000001; } if (tempB == 0) { tempB = 0.000001; } double denom = (tempA+tempB)/(double)2.0; if (tempA != 0) { KLD1 += tempA * log(tempA/denom); } //KLD(x,m) if (tempB != 0) { KLD2 += tempB * log(tempB/denom); } //KLD(y,m) } data[0] = sqrt((0.5*KLD1) + (0.5*KLD2)); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "RJSD", "getValues"); exit(1); } } mothur-1.36.1/source/calculators/sharedrjsd.h000066400000000000000000000012551255543666200212570ustar00rootroot00000000000000// // sharedrjsd.h // Mothur // // Created by Sarah Westcott on 1/21/14. // Copyright (c) 2014 Schloss Lab. All rights reserved.
// #ifndef Mothur_sharedrjsd_h #define Mothur_sharedrjsd_h #include "calculator.h" /***********************************************************************/ //Root Jensen-Shannon divergence (RJSD): the square root of the JSD class RJSD : public Calculator { public: RJSD() : Calculator("rjsd", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/RJSD"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedsobs.cpp000066400000000000000000000017431255543666200216200ustar00rootroot00000000000000/* * sharedsobs.cpp * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedsobs.h" /***********************************************************************/ //This returns the number of unique species observed in several groups. //The shared vector is each group's sharedrabundvector. EstOutput SharedSobs::getValues(vector shared){ try { data.resize(1,0); double observed = 0; //loop through the species in each group for (int k = 0; k < shared[0]->getNumBins(); k++) { //if you have found a new species if (shared[0]->getAbundance(k) != 0) { observed++; } else if ((shared[0]->getAbundance(k) == 0) && (shared[1]->getAbundance(k) != 0)) { observed++; } } data[0] = observed; return data; } catch(exception& e) { m->errorOut(e, "SharedSobs", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedsobs.h000066400000000000000000000014501255543666200212600ustar00rootroot00000000000000#ifndef SHAREDSOBS_H #define SHAREDSOBS_H /* * sharedsobs.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedSobs estimator on two groups for the shared rarefaction command.
It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class SharedSobs : public Calculator { public: SharedSobs() : Calculator("sharedsobs", 1, false) {}; EstOutput getValues(SAbundVector* rank){ return data; }; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/sharedsobs"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedsobscollectsummary.cpp000066400000000000000000000037011255543666200246000ustar00rootroot00000000000000/* * sharedsobscollectsummary.cpp * Mothur * * Created by Sarah Westcott on 2/12/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedsobscollectsummary.h" /***********************************************************************/ //This returns the number of shared species observed in several groups. //The shared vector is each groups sharedrabundvector. EstOutput SharedSobsCS::getValues(vector shared){ try { data.resize(1,0); double observed = 0; int numGroups = shared.size(); for (int i = 0; i < shared[0]->getNumBins(); i++) { //get bin values and set sharedByAll bool sharedByAll = true; for (int j = 0; j < numGroups; j++) { if (shared[j]->getAbundance(i) == 0) { sharedByAll = false; } } //they are shared if (sharedByAll == true) { observed++; } } data[0] = observed; return data; } catch(exception& e) { m->errorOut(e, "SharedSobsCS", "getValues"); exit(1); } } /***********************************************************************/ //This returns the number of shared species observed in several groups. //The shared vector is each groups sharedrabundvector. 
EstOutput SharedSobsCS::getValues(vector shared, vector& labels){ try { data.resize(1,0); double observed = 0; int numGroups = shared.size(); labels.clear(); for (int i = 0; i < shared[0]->getNumBins(); i++) { //get bin values and set sharedByAll bool sharedByAll = true; for (int j = 0; j < numGroups; j++) { if (shared[j]->getAbundance(i) == 0) { sharedByAll = false; } } //they are shared if (sharedByAll == true) { observed++; labels.push_back(m->currentSharedBinLabels[i]); } } data[0] = observed; return data; } catch(exception& e) { m->errorOut(e, "SharedSobsCS", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedsobscollectsummary.h000066400000000000000000000015331255543666200242460ustar00rootroot00000000000000#ifndef SHAREDSOBSCOLLECTSUMMARY_H #define SHAREDSOBSCOLLECTSUMMARY_H /* * sharedsobscollectsummary.h * Mothur * * Created by Sarah Westcott on 2/12/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This calculator returns the number of shared species between 2 groups. */ #include "calculator.h" /***********************************************************************/ class SharedSobsCS : public Calculator { public: SharedSobsCS() : Calculator("sharedsobs", 1, true) {}; EstOutput getValues(SAbundVector* rank){ return data; }; EstOutput getValues(vector); EstOutput getValues(vector, vector&); string getCitation() { return "http://www.mothur.org/wiki/Sharedsobs"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedsorabund.cpp000066400000000000000000000014451255543666200224660ustar00rootroot00000000000000/* * sharedsorabund.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedsorabund.h" /***********************************************************************/ EstOutput SorAbund::getValues(vector shared) { try { EstOutput UVest; UVest.resize(2,0); data.resize(1,0); UVest = uv->getUVest(shared); //UVest[0] is Uest, UVest[1] is Vest data[0] = (2 * UVest[0] * UVest[1]) / ((float)(UVest[0] + UVest[1])); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } data[0] = 1-data[0]; return data; } catch(exception& e) { m->errorOut(e, "SorAbund", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedsorabund.h000066400000000000000000000014241255543666200221300ustar00rootroot00000000000000#ifndef SORABUND_H #define SORABUND_H /* * sharedsorabund.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedSorAbund estimator on two groups. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class SorAbund : public Calculator { public: SorAbund() : Calculator("sorabund", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Sorabund"; } private: UVEst* uv; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedsorclass.cpp000066400000000000000000000023101255543666200224720ustar00rootroot00000000000000/* * sharedsorclass.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedsorclass.h" /***********************************************************************/ EstOutput SorClass::getValues(vector shared) { try { double S1, S2, S12, tempA, tempB; S1 = 0; S2 = 0; S12 = 0; tempA = 0; tempB = 0; /*S1, S2 = number of OTUs observed or estimated in A and B S12=number of OTUs shared between A and B */ data.resize(1,0); for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); //find number of bins in shared1 and shared2 if (tempA != 0) { S1++; } if (tempB != 0) { S2++; } //they are shared if ((tempA != 0) && (tempB != 0)) { S12++; } } data[0] = 1.0-(2 * S12) / (float)(S1 + S2); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "SorClass", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedsorclass.h000066400000000000000000000014101255543666200221370ustar00rootroot00000000000000#ifndef SORCLASS_H #define SORCLASS_H /* * sharedsorclass.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedSorClass estimator on two groups. It is a child of the calculator class. 
*/ #include "calculator.h" /***********************************************************************/ class SorClass : public Calculator { public: SorClass() : Calculator("sorclass", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Sorclass"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedsorest.cpp000066400000000000000000000027021255543666200221650ustar00rootroot00000000000000/* * sharedsorest.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedsorest.h" #include "chao1.h" #include "sharedchao1.h" /***********************************************************************/ EstOutput SorEst::getValues(vector shared) { try { EstOutput S1, S2, S12; S12.resize(1,0); S1.resize(3,0); S2.resize(3,0); /*S1, S2 = number of OTUs estimated in A and B using the Chao estimator S12 = estimated number of OTUs shared between A and B using the SharedChao estimator*/ data.resize(1,0); SharedChao1* sharedChao = new SharedChao1(); Chao1* chaoS1 = new Chao1(); Chao1* chaoS2 = new Chao1(); SAbundVector* chaoS1Sabund = new SAbundVector(); SAbundVector* chaoS2Sabund = new SAbundVector(); *chaoS1Sabund = shared[0]->getSAbundVector(); *chaoS2Sabund = shared[1]->getSAbundVector(); S12 = sharedChao->getValues(shared); S1 = chaoS1->getValues(chaoS1Sabund); S2 = chaoS2->getValues(chaoS2Sabund); data[0] = 1.0-(2 * S12[0]) / (float)(S1[0] + S2[0]); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } delete sharedChao; delete chaoS1; delete chaoS2; delete chaoS1Sabund; delete chaoS2Sabund; return data; } catch(exception& e) { m->errorOut(e, "SorEst", "getValues"); exit(1); } } /***********************************************************************/ 
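The SorClass and SorEst calculators above share one formula, 1 - 2*S12 / (S1 + S2), and differ only in where the richness values come from (observed counts versus Chao-based estimates). The following is a minimal standalone sketch of that formula and its NaN/inf guard, not mothur code: `sorensenDissimilarity` and `observedSorensen` are hypothetical names, and plain `std::vector<int>` abundance lists stand in for mothur's shared abundance vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mothur): converts richness values into a
// Sorensen-type dissimilarity, 1 - 2*S12 / (S1 + S2).
// s1, s2 = richness of groups A and B; s12 = richness shared by both.
double sorensenDissimilarity(double s1, double s2, double s12) {
    double value = 1.0 - (2.0 * s12) / (s1 + s2);
    // Two empty groups give 0/0 (NaN); report 0 in that case,
    // mirroring the isnan/isinf guard used by the calculators above.
    if (std::isnan(value) || std::isinf(value)) { value = 0.0; }
    return value;
}

// Observed-richness variant: counts non-zero bins per group and non-zero
// bins common to both groups, then applies the formula.
double observedSorensen(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // both groups must use the same OTU bins
    double s1 = 0.0, s2 = 0.0, s12 = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        if (a[i] != 0) { s1++; }
        if (b[i] != 0) { s2++; }
        if ((a[i] != 0) && (b[i] != 0)) { s12++; }
    }
    return sorensenDissimilarity(s1, s2, s12);
}
```

An estimator-based variant would pass estimated richness values to `sorensenDissimilarity` instead of the observed counts; the guard behaves the same way in both cases.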
mothur-1.36.1/source/calculators/sharedsorest.h000066400000000000000000000013711255543666200216330ustar00rootroot00000000000000#ifndef SOREST_H #define SOREST_H /* * sharedsorest.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedSorEst estimator on two groups. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class SorEst : public Calculator { public: SorEst() : Calculator("sorest", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Sorest"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedthetan.cpp000066400000000000000000000032121255543666200221260ustar00rootroot00000000000000/* * sharedthetan.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedthetan.h" /***********************************************************************/ EstOutput ThetaN::getValues(vector shared) { try { data.resize(1,0); double Atotal, Btotal, tempA, tempB; Atotal = 0; Btotal = 0; double numerator, denominator, thetaN, sumSharedA, sumSharedB, a, b, d; numerator = 0.0; denominator = 0.0; thetaN = 0.0; sumSharedA = 0.0; sumSharedB = 0.0; a = 0.0; b = 0.0; d = 0.0; //get the total values we need to calculate the theta denominator sums for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls Atotal += shared[0]->getAbundance(i); Btotal += shared[1]->getAbundance(i); } //calculate the theta denominator sums for (int j = 0; j < shared[0]->getNumBins(); j++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(j); tempB = shared[1]->getAbundance(j); //they are shared if ((tempA != 0) && (tempB != 0)) { if (Atotal != 0) { sumSharedA = (tempA / (float)Atotal); } if (Btotal != 0) { sumSharedB = (tempB / (float)Btotal); } a += sumSharedA; b += sumSharedB; } } thetaN = (a * b) / (a + b - (a * b)); if (isnan(thetaN) || isinf(thetaN)) { thetaN = 0; } data[0] = 1.0 - thetaN; return data; } catch(exception& e) { m->errorOut(e, "ThetaN", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedthetan.h000066400000000000000000000013711255543666200215770ustar00rootroot00000000000000#ifndef THETAN_H #define THETAN_H /* * sharedthetan.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedThetaN estimator on two groups. It is a child of the calculator class. 
*/ #include "calculator.h" /***********************************************************************/ class ThetaN : public Calculator { public: ThetaN() : Calculator("thetan", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Thetan"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sharedthetayc.cpp000066400000000000000000000047041255543666200223130ustar00rootroot00000000000000/* * sharedthetayc.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedthetayc.h" /***********************************************************************/ EstOutput ThetaYC::getValues(vector shared) { try { data.resize(3,0.0000); double Atotal = 0; double Btotal = 0; double thetaYC = 0; double pi = 0; double qi = 0; double a = 0; double b = 0; double d = 0; double sumPcubed = 0; double sumQcubed = 0; double sumPQsq = 0; double sumPsqQ = 0; //get the total values we need to calculate the theta denominator sums for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls Atotal += (double)shared[0]->getAbundance(i); Btotal += (double)shared[1]->getAbundance(i); } //calculate the theta denominator sums for (int j = 0; j < shared[0]->getNumBins(); j++) { //store in temps to avoid multiple repetitive function calls pi = shared[0]->getAbundance(j) / Atotal; qi = shared[1]->getAbundance(j) / Btotal; a += pi * pi; b += qi * qi; d += pi * qi; sumPcubed += pi * pi * pi; sumQcubed += qi * qi * qi; sumPQsq += pi * qi * qi; sumPsqQ += pi * pi * qi; } thetaYC = d / (a + b - d); if (isnan(thetaYC) || isinf(thetaYC)) { thetaYC = 0; } double varA = 4 / Atotal * (sumPcubed - a * a); double varB = 4 / Btotal * (sumQcubed - b * b); double varD = sumPQsq / Atotal + sumPsqQ / Btotal - d * d * 
(1/Atotal + 1/Btotal); double covAD = 2 / Atotal * (sumPsqQ - a * d); double covBD = 2 / Btotal * (sumPQsq - b* d); double varT = d * d * (varA + varB) / pow(a + b - d, (double)4.0) + pow(a+b, (double)2.0) * varD / pow(a+b-d, (double)4.0) - 2.0 * (a + b) * d / pow(a + b - d, (double)4.0) * (covAD + covBD); double ci = 1.95 * sqrt(varT); data[0] = thetaYC; data[1] = thetaYC - ci; data[2] = thetaYC + ci; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } data[0] = 1.0 - data[0]; double hold = data[1]; data[1] = 1.0 - data[2]; data[2] = 1.0 - hold; return data; } catch(exception& e) { m->errorOut(e, "ThetaYC", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/sharedthetayc.h000066400000000000000000000014021255543666200217500ustar00rootroot00000000000000#ifndef THETAYC_H #define THETAYC_H /* * sharedthetayc.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the SharedThetaYC estimator on two groups. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class ThetaYC : public Calculator { public: ThetaYC() : Calculator("thetayc", 3, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Thetayc"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/shen.cpp000066400000000000000000000014061255543666200204140ustar00rootroot00000000000000/* * shen.cpp * Mothur * * Created by Thomas Ryabin on 5/18/09. * Copyright 2009Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "shen.h" #include "ace.h" /***********************************************************************/ EstOutput Shen::getValues(SAbundVector* rank){ try { data.resize(1,0); double n = (double)rank->getNumSeqs(); double f1 = (double)rank->get(1); Ace* calc = new Ace(abund); EstOutput ace = calc->getValues(rank); double f0 = ace[0]-rank->getNumBins(); data[0] = f0 * (1 - pow(1 - f1/n/f0, f)); delete calc; return data; } catch(exception& e) { m->errorOut(e, "Shen", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/shen.h000066400000000000000000000014271255543666200200640ustar00rootroot00000000000000#ifndef SHEN_H #define SHEN_H /* * shen.h * Mothur * * Created by Thomas Ryabin on 5/18/09. * Copyright 2009Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /* This class implements the shen calculator on single group. It is a child of the calculator class. */ /***********************************************************************/ class Shen : public Calculator { public: Shen(int size, int n) : f(size), abund(n), Calculator("shen", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Shen"; } private: int f; int abund; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/simpson.cpp000066400000000000000000000030611255543666200211460ustar00rootroot00000000000000/* * simpson.cpp * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "simpson.h" /***********************************************************************/ EstOutput Simpson::getValues(SAbundVector* rank){ try { //vector simpsonData(3,0); data.resize(3,0); double simpson = 0.0000; double ci = 0; double maxRank = (double)rank->getMaxRank(); double sampled = (double)rank->getNumSeqs(); double sobs = (double)rank->getNumBins(); double firstTerm = 0; double secondTerm = 0; if(sobs != 0){ double simnum=0.0000; for(unsigned long long i=1;i<=maxRank;i++){ simnum += (double)(rank->get(i)*i*(i-1)); } simpson = simnum / (sampled*(sampled-1)); for(unsigned long long i=1;i<=maxRank;i++){ double piI = (double) i / (double)sampled; firstTerm += rank->get(i) * pow(piI, 3); secondTerm += rank->get(i) * pow(piI, 2); } double var = (4.0 / sampled) * (firstTerm - secondTerm*secondTerm); ci = 1.95 * pow(var, 0.5); } double simpsonlci = simpson - ci; double simpsonhci = simpson + ci; data[0] = simpson; data[1] = simpsonlci; data[2] = simpsonhci; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } if (isnan(data[1]) || isinf(data[1])) { data[1] = 0; } if (isnan(data[2]) || isinf(data[2])) { data[2] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Simpson", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/simpson.h000066400000000000000000000013521255543666200206140ustar00rootroot00000000000000#ifndef SIMPSON_H #define SIMPSON_H /* * simpson.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the Simpson estimator on single group. It is a child of the calculator class. 
*/ #include "calculator.h" /***********************************************************************/ class Simpson : public Calculator { public: Simpson() : Calculator("simpson", 3, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector<SharedRAbundVector*>) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Simpson"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/simpsoneven.cpp000066400000000000000000000012521255543666200220240ustar00rootroot00000000000000/* * simpsoneven.cpp * Mothur * * Created by Pat Schloss on 8/21/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "simpsoneven.h" #include "invsimpson.h" /***********************************************************************/ EstOutput SimpsonEven::getValues(SAbundVector* rank){ try { data.resize(1,0); InvSimpson* simp = new InvSimpson(); EstOutput invSimpData = simp->getValues(rank); delete simp; data[0] = invSimpData[0] / double(rank->getNumBins()); return data; } catch(exception& e) { m->errorOut(e, "SimpsonEven", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/simpsoneven.h000066400000000000000000000012151255543666200214700ustar00rootroot00000000000000#ifndef SIMPSONEVEN_H #define SIMPSONEVEN_H /* * simpsoneven.h * Mothur * * Created by Pat Schloss on 8/21/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class SimpsonEven : public Calculator { public: SimpsonEven() : Calculator("simpsoneven", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector<SharedRAbundVector*>) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Simpsoneven"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/smithwilson.cpp000066400000000000000000000017071255543666200220430ustar00rootroot00000000000000/* * smithwilson.cpp * Mothur * * Created by Pat Schloss on 8/21/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "smithwilson.h" /***********************************************************************/ EstOutput SmithWilson::getValues(SAbundVector* rank){ try { data.resize(1,0); double maxRank = rank->getMaxRank(); double sobs = rank->getNumBins(); double innerSum = 0; for(int i=1;i<=maxRank;i++){ innerSum += rank->get(i) * log(i); } innerSum /= sobs; double outerSum = 0; for(int i=1;i<=maxRank;i++){ outerSum += rank->get(i) * (log(i) - innerSum) * (log(i) - innerSum); } outerSum /= sobs; if(outerSum > 0){ data[0] = 1.0000 - 2.0000 / (3.14159 * atan(outerSum)); } else{ data[0] = 1.0000; } return data; } catch(exception& e) { m->errorOut(e, "SmithWilson", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/smithwilson.h000066400000000000000000000012101255543666200214750ustar00rootroot00000000000000#ifndef SMITHWILSON #define SMITHWILSON /* * smithwilson.h * Mothur * * Created by Pat Schloss on 8/21/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class SmithWilson : public Calculator { public: SmithWilson() : Calculator("smithwilson", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Smithwilson"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/sobs.h000066400000000000000000000014451255543666200200750ustar00rootroot00000000000000#ifndef SOBS_H #define SOBS_H /* * sobs.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the Sobs estimator on single group. It is a child of the calculator class. */ #include "calculator.h" /***********************************************************************/ class Sobs : public Calculator { public: Sobs() : Calculator("sobs", 1, false) {}; EstOutput getValues(SAbundVector* rank){ data.resize(1,0); data[0] = (double)rank->getNumBins(); return data; } EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Sobs"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/soergel.cpp000066400000000000000000000015561255543666200211250ustar00rootroot00000000000000/* * soergel.cpp * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "soergel.h" /***********************************************************************/ EstOutput Soergel::getValues(vector shared) { try { data.resize(1,0); double sumNum = 0.0; double sumMax = 0.0; //calc the 2 denominators for (int i = 0; i < shared[0]->getNumBins(); i++) { int Aij = shared[0]->getAbundance(i); int Bij = shared[1]->getAbundance(i); sumNum += abs((Aij - Bij)); sumMax += max(Aij, Bij); } data[0] = sumNum / sumMax; if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Soergel", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/soergel.h000066400000000000000000000011761255543666200205700ustar00rootroot00000000000000#ifndef SOERGEL_H #define SOERGEL_H /* * soergel.h * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class Soergel : public Calculator { public: Soergel() : Calculator("soergel", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Soergel"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/solow.cpp000066400000000000000000000012541255543666200206230ustar00rootroot00000000000000/* * solow.cpp * Mothur * * Created by Thomas Ryabin on 5/13/09. * Copyright 2009Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "solow.h" #include /***********************************************************************/ EstOutput Solow::getValues(SAbundVector* rank){ try { data.resize(1,0); double n = (double)rank->getNumSeqs(); double f1 = (double)rank->get(1); double f2 = (double)rank->get(2); data[0] = f1*f1/2/f2 * (1 - pow(1 - 2*f2/n/f1, f)); return data; } catch(exception& e) { m->errorOut(e, "Solow", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/solow.h000066400000000000000000000014021255543666200202630ustar00rootroot00000000000000#ifndef SOLOW_H #define SOLOW_H /* * solow.h * Mothur * * Created by Thomas Ryabin on 5/13/09. * Copyright 2009Schloss Lab UMASS Amherst. All rights reserved. * */ #include "calculator.h" /* This class implements the solow calculator on single group. It is a child of the calculator class. */ /***********************************************************************/ class Solow : public Calculator { public: Solow(int size) : f(size), Calculator("solow", 1, false) {}; EstOutput getValues(SAbundVector*); EstOutput getValues(vector) {return data;}; string getCitation() { return "http://www.mothur.org/wiki/Solow"; } private: int f; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/spearman.cpp000066400000000000000000000036661255543666200212770ustar00rootroot00000000000000/* * spearman.cpp * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "spearman.h" /***********************************************************************/ EstOutput Spearman::getValues(vector shared) { try { data.resize(1,0); SAbundVector savA = shared[0]->getSAbundVector(); SAbundVector savB = shared[1]->getSAbundVector(); double sumRanks = 0.0; int numOTUS = shared[0]->getNumBins(); vector rankVectorA(savA.getMaxRank()+1, 0); int currentRankA = 0; for(int i=savA.getMaxRank();i>0;i--){ int numWithAbundanceI = savA.get(i); if(numWithAbundanceI > 1){ rankVectorA[i] = (currentRankA + 1 + currentRankA + numWithAbundanceI) / 2.0; } else { rankVectorA[i] = currentRankA+numWithAbundanceI; } currentRankA += numWithAbundanceI; } rankVectorA[0] = (numOTUS + currentRankA + 1) / 2.0; vector rankVectorB(savB.getMaxRank()+1, 0); int currentRankB = 0; for(int i=savB.getMaxRank();i>0;i--){ int numWithAbundanceI = savB.get(i); if(numWithAbundanceI > 1){ rankVectorB[i] = (currentRankB + 1 + currentRankB + numWithAbundanceI) / 2.0; } else { rankVectorB[i] = currentRankB+numWithAbundanceI; } currentRankB += numWithAbundanceI; } rankVectorB[0] = (numOTUS + currentRankB + 1) / 2.0; for (int i = 0; i < shared[0]->getNumBins(); i++) { int Aij = shared[0]->getAbundance(i); int Bij = shared[1]->getAbundance(i); float rankA = rankVectorA[Aij]; float rankB = rankVectorB[Bij]; sumRanks += ((rankA - rankB) * (rankA - rankB)); } data[0] = 1.0 - ((6 * sumRanks) / (float) (numOTUS * ((numOTUS*numOTUS)-1))); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "Spearman", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/spearman.h000066400000000000000000000012101255543666200207230ustar00rootroot00000000000000#ifndef SPEARMAN_H #define SPEARMAN_H /* * spearman.h * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class Spearman : public Calculator { public: Spearman() : Calculator("spearman", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Spearman"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/speciesprofile.cpp000066400000000000000000000020041255543666200224660ustar00rootroot00000000000000/* * speciesprofile.cpp * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "speciesprofile.h" /***********************************************************************/ EstOutput SpeciesProfile::getValues(vector shared) { try { data.resize(1,0); double sumA = 0.0; double sumB = 0.0; for (int i = 0; i < shared[0]->getNumBins(); i++) { sumA += shared[0]->getAbundance(i); sumB += shared[1]->getAbundance(i); } double sum = 0.0; for (int i = 0; i < shared[0]->getNumBins(); i++) { int A = shared[0]->getAbundance(i); int B = shared[1]->getAbundance(i); sum += (((A / sumA) - (B / sumB)) * ((A / sumA) - (B / sumB))); } data[0] = sqrt(sum); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "SpeciesProfile", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/speciesprofile.h000066400000000000000000000012561255543666200221430ustar00rootroot00000000000000#ifndef SPECIESPROFILE_H #define SPECIESPROFILE_H /* * speciesprofile.h * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "calculator.h" /***********************************************************************/ class SpeciesProfile : public Calculator { public: SpeciesProfile() : Calculator("speciesprofile", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Speciesprofile"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/structchi2.cpp000066400000000000000000000025621255543666200215550ustar00rootroot00000000000000/* * structchi2.cpp * Mothur * * Created by westcott on 12/17/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "structchi2.h" /***********************************************************************/ EstOutput StructChi2::getValues(vector shared) { try { data.resize(1,0); double sumA = shared[0]->getNumSeqs(); double sumB = shared[1]->getNumSeqs(); double totalSum = 0.0; for (int i = 0; i < shared.size(); i++) { totalSum += shared[i]->getNumSeqs(); } vector sumOtus; sumOtus.resize(shared[0]->getNumBins(), 0); //for each otu for (int i = 0; i < shared[0]->getNumBins(); i++) { //for each group for (int j = 0; j < shared.size(); j++) { sumOtus[i] += shared[j]->getAbundance(i); } } double sum = 0.0; for (int i = 0; i < shared[0]->getNumBins(); i++) { int A = shared[0]->getAbundance(i); int B = shared[1]->getAbundance(i); double totalTerm = 1 / (float) sumOtus[i]; double Aterm = A / sumA; double Bterm = B / sumB; sum += (totalTerm * ((Aterm-Bterm)*(Aterm-Bterm))); } data[0] = sqrt((totalSum * sum)); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "StructChi2", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/structchi2.h000066400000000000000000000013511255543666200212150ustar00rootroot00000000000000#ifndef STRUCTCHI2_H #define 
STRUCTCHI2_H /* * structchi2.h * Mothur * * Created by westcott on 12/17/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "calculator.h" /***********************************************************************/ class StructChi2 : public Calculator { public: StructChi2() : Calculator("structchi2", 1, false, true) {}; //the true means this calculator needs all groups to calculate the pair value EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Structchi2"; } private: }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/structchord.cpp000066400000000000000000000023001255543666200220150ustar00rootroot00000000000000/* * structchord.cpp * Mothur * * Created by westcott on 12/15/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "structchord.h" /***********************************************************************/ EstOutput StructChord::getValues(vector shared) { try { data.resize(1,0); double sumAj2 = 0.0; double sumBj2 = 0.0; //calc the 2 denominators for (int i = 0; i < shared[0]->getNumBins(); i++) { int Aij = shared[0]->getAbundance(i); int Bij = shared[1]->getAbundance(i); //(Aij) ^ 2 sumAj2 += (Aij * Aij); sumBj2 += (Bij * Bij); } sumAj2 = sqrt(sumAj2); sumBj2 = sqrt(sumBj2); //calc sum double sum = 0.0; for (int i = 0; i < shared[0]->getNumBins(); i++) { int Aij = shared[0]->getAbundance(i); int Bij = shared[1]->getAbundance(i); sum += (((Aij / sumAj2) - (Bij / sumBj2)) * ((Aij / sumAj2) - (Bij / sumBj2))); } data[0] = sqrt(sum); if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; } return data; } catch(exception& e) { m->errorOut(e, "StructChord", "getValues"); exit(1); } } /***********************************************************************/ 
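The chord distance computed by StructChord above scales each abundance vector to unit Euclidean length and then takes the Euclidean distance between the scaled vectors. The sketch below restates that calculation outside mothur, under two assumptions: `chordDistance` is a hypothetical free function, and `std::vector<int>` replaces the shared abundance vectors. It accumulates squares in `double` rather than `int`, which is a deliberate difference from the original loop meant to avoid integer overflow on very large counts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical standalone version of the chord distance (not mothur's API).
double chordDistance(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());  // both groups must use the same OTU bins

    // Euclidean norm of each abundance vector
    double normA = 0.0, normB = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        normA += static_cast<double>(a[i]) * a[i];
        normB += static_cast<double>(b[i]) * b[i];
    }
    normA = std::sqrt(normA);
    normB = std::sqrt(normB);

    // Euclidean distance between the unit-length vectors
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); i++) {
        double diff = (a[i] / normA) - (b[i] / normB);
        sum += diff * diff;
    }

    double distance = std::sqrt(sum);
    // An all-zero group yields 0/0 (NaN); report 0 as the calculators above do
    if (std::isnan(distance) || std::isinf(distance)) { distance = 0.0; }
    return distance;
}
```

Because both vectors are normalized first, the result depends only on relative abundances and ranges from 0 (same proportions) to sqrt(2) (no OTUs in common).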
mothur-1.36.1/source/calculators/structchord.h000066400000000000000000000012321255543666200214650ustar00rootroot00000000000000
#ifndef STRUCTCHORD_H
#define STRUCTCHORD_H

/*
 *  structchord.h
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class StructChord : public Calculator {

public:
	StructChord() : Calculator("structchord", 1, false) {};
	EstOutput getValues(SAbundVector*) {return data;};
	EstOutput getValues(vector<SharedRAbundVector*>);
	string getCitation() { return "http://www.mothur.org/wiki/Structchord"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/structeuclidean.cpp000066400000000000000000000015231255543666200226550ustar00rootroot00000000000000
/*
 *  structeuclidean.cpp
 *  Mothur
 *
 *  Created by westcott on 12/14/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "structeuclidean.h"

/***********************************************************************/
EstOutput StructEuclidean::getValues(vector<SharedRAbundVector*> shared) {
	try {
		data.resize(1,0);

		double sum = 0.0;
		for (int i = 0; i < shared[0]->getNumBins(); i++) {

			int Aij = shared[0]->getAbundance(i);
			int Bij = shared[1]->getAbundance(i);

			//(Aij - Bij) ^ 2
			sum += ((Aij - Bij) * (Aij - Bij));
		}

		data[0] = sqrt(sum);

		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "StructEuclidean", "getValues");
		exit(1);
	}
}
/***********************************************************************/
mothur-1.36.1/source/calculators/structeuclidean.h000066400000000000000000000012661255543666200223260ustar00rootroot00000000000000
#ifndef STRUCTEUCLIDEAN_H
#define STRUCTEUCLIDEAN_H

/*
 *  structeuclidean.h
 *  Mothur
 *
 *  Created by westcott on 12/14/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class StructEuclidean : public Calculator {

public:
	StructEuclidean() : Calculator("structeuclidean", 1, false) {};
	EstOutput getValues(SAbundVector*) {return data;};
	EstOutput getValues(vector<SharedRAbundVector*>);
	string getCitation() { return "http://www.mothur.org/wiki/Structeuclidean"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/structkulczynski.cpp000066400000000000000000000016411255543666200231330ustar00rootroot00000000000000
/*
 *  structkulczynski.cpp
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "structkulczynski.h"

/***********************************************************************/
EstOutput StructKulczynski::getValues(vector<SharedRAbundVector*> shared) {
	try {
		data.resize(1,0);

		double sumA = 0.0;
		double sumB = 0.0;
		double sumMin = 0.0;

		for (int i = 0; i < shared[0]->getNumBins(); i++) {

			int A = shared[0]->getAbundance(i);
			int B = shared[1]->getAbundance(i);

			sumA += A;
			sumB += B;
			sumMin += min(A, B);
		}

		data[0] = 1.0 - (0.5 * ((sumMin / sumA) + (sumMin / sumB)));

		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "StructKulczynski", "getValues");
		exit(1);
	}
}
/***********************************************************************/
mothur-1.36.1/source/calculators/structkulczynski.h000066400000000000000000000012751255543666200226030ustar00rootroot00000000000000
#ifndef STRUCTKULCZYNSKI_H
#define STRUCTKULCZYNSKI_H

/*
 *  structkulczynski.h
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class StructKulczynski : public Calculator {

public:
	StructKulczynski() : Calculator("structkulczynski", 1, false) {};
	EstOutput getValues(SAbundVector*) {return data;};
	EstOutput getValues(vector<SharedRAbundVector*>);
	string getCitation() { return "http://www.mothur.org/wiki/Structkulczynski"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/structpearson.cpp000066400000000000000000000024241255543666200223740ustar00rootroot00000000000000
/*
 *  structpearson.cpp
 *  Mothur
 *
 *  Created by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "structpearson.h"

/***********************************************************************/
EstOutput StructPearson::getValues(vector<SharedRAbundVector*> shared) {
	try {
		data.resize(1,0);

		int numOTUS = shared[0]->getNumBins();
		double averageA = shared[0]->getNumSeqs() / (float) numOTUS;
		double averageB = shared[1]->getNumSeqs() / (float) numOTUS;

		double numTerm = 0.0;
		double denomTerm1 = 0.0;
		double denomTerm2 = 0.0;

		for (int i = 0; i < shared[0]->getNumBins(); i++) {

			int Aij = shared[0]->getAbundance(i);
			int Bij = shared[1]->getAbundance(i);

			numTerm += ((Aij - averageA) * (Bij - averageB));

			denomTerm1 += ((Aij - averageA) * (Aij - averageA));
			denomTerm2 += ((Bij - averageB) * (Bij - averageB));
		}

		denomTerm1 = sqrt(denomTerm1);
		denomTerm2 = sqrt(denomTerm2);

		double denom = denomTerm1 * denomTerm2;

		data[0] = (numTerm / denom);

		if (isnan(data[0]) || isinf(data[0])) { data[0] = 0; }

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "StructPearson", "getValues");
		exit(1);
	}
}
/***********************************************************************/
mothur-1.36.1/source/calculators/structpearson.h000066400000000000000000000012501255543666200220350ustar00rootroot00000000000000
#ifndef STRUCTPEARSON_H
#define STRUCTPEARSON_H

/*
 *  structpearson.h
 *  Mothur
 *
 *  Created
by westcott on 12/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "calculator.h"

/***********************************************************************/

class StructPearson : public Calculator {

public:
	StructPearson() : Calculator("structpearson", 1, false) {};
	EstOutput getValues(SAbundVector*) {return data;};
	EstOutput getValues(vector<SharedRAbundVector*>);
	string getCitation() { return "http://www.mothur.org/wiki/Structpearson"; }
private:

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/treecalculator.h000066400000000000000000000021001255543666200221250ustar00rootroot00000000000000
#ifndef TREECALCULATOR_H
#define TREECALCULATOR_H

/*
 *  treecalculator.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 1/26/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "tree.h"
#include "mothurout.h"

/* The tree calculator class is the parent class for tree calculators in mothur. */

typedef vector<double> EstOutput;

/***********************************************************************/

class TreeCalculator {

public:
	TreeCalculator(){ m = MothurOut::getInstance(); }
	TreeCalculator(string n) : name(n) {};
	virtual ~TreeCalculator(){};

	virtual EstOutput getValues(Tree*) { return data; }
	virtual EstOutput getValues(Tree*, int, string) { return data; }
	virtual EstOutput getValues(Tree*, string, string) { return data; }
	virtual EstOutput getValues(Tree*, string, string, vector<double>&) { return data; }

	virtual string getName() { return name; }

protected:
	EstOutput data;
	string name;
	MothurOut* m;

};

/***********************************************************************/

#endif
mothur-1.36.1/source/calculators/unweighted.cpp000066400000000000000000000713521255543666200216310ustar00rootroot00000000000000
/*
 *  unweighted.cpp
 *  Mothur
 *
 *  Created by Sarah Westcott on 2/9/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "unweighted.h" /**************************************************************************************************/ EstOutput Unweighted::getValues(Tree* t, int p, string o) { try { processors = p; outputDir = o; CountTable* ct = t->getCountTable(); //if the users enters no groups then give them the score of all groups int numGroups = m->getNumGroups(); //calculate number of comparsions int numComp = 0; vector< vector > namesOfGroupCombos; for (int r=0; r groups; groups.push_back((m->getGroups())[r]); groups.push_back((m->getGroups())[l]); namesOfGroupCombos.push_back(groups); } } if (numComp != 1) { vector groups; if (numGroups == 0) { //get score for all users groups for (int i = 0; i < (ct->getNamesOfGroups()).size(); i++) { if ((ct->getNamesOfGroups())[i] != "xxx") { groups.push_back((ct->getNamesOfGroups())[i]); } } namesOfGroupCombos.push_back(groups); }else { for (int i = 0; i < m->getNumGroups(); i++) { groups.push_back((m->getGroups())[i]); } namesOfGroupCombos.push_back(groups); } } lines.clear(); int remainingPairs = namesOfGroupCombos.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } data = createProcesses(t, namesOfGroupCombos, ct); lines.clear(); return data; } catch(exception& e) { m->errorOut(e, "Unweighted", "getValues"); exit(1); } } /**************************************************************************************************/ EstOutput Unweighted::createProcesses(Tree* t, vector< vector > namesOfGroupCombos, CountTable* ct) { try { int process = 1; vector processIDS; bool recalc = false; EstOutput results; #if defined (__APPLE__) || (__MACH__) || (linux) || 
(__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ EstOutput myresults; myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, ct); if (m->control_pressed) { exit(0); } //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".unweighted.results.temp"; m->openOutputFile(tempFile, out); out << myresults.size() << endl; for (int i = 0; i < myresults.size(); i++) { out << myresults[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".unweighted.results.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".unweighted.results.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //if the users enters no groups then give them the score of all groups int numGroups = m->getNumGroups(); //calculate number of comparsions int numComp = 0; vector< vector > namesOfGroupCombos; for (int r=0; r groups; groups.push_back((m->getGroups())[r]); groups.push_back((m->getGroups())[l]); namesOfGroupCombos.push_back(groups); } } if (numComp != 1) { vector groups; if (numGroups == 0) { //get score for all users groups for (int i = 0; i < (ct->getNamesOfGroups()).size(); i++) { if ((ct->getNamesOfGroups())[i] != "xxx") { groups.push_back((ct->getNamesOfGroups())[i]); } } namesOfGroupCombos.push_back(groups); }else { for (int i = 0; i < m->getNumGroups(); i++) { groups.push_back((m->getGroups())[i]); } namesOfGroupCombos.push_back(groups); } } lines.clear(); int remainingPairs = namesOfGroupCombos.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } results.clear(); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ EstOutput myresults; myresults = driver(t, namesOfGroupCombos, lines[process].start, 
lines[process].num, ct); if (m->control_pressed) { exit(0); } //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".unweighted.results.temp"; m->openOutputFile(tempFile, out); out << myresults.size() << endl; for (int i = 0; i < myresults.size(); i++) { out << myresults[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, ct); //force parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } if (m->control_pressed) { return results; } //get data created by processes for (int i=0;i<(processors-1);i++) { ifstream in; string s = outputDir + toString(processIDS[i]) + ".unweighted.results.temp"; m->openInputFile(s, in); //get quantiles if (!in.eof()) { int num; in >> num; m->gobble(in); if (m->control_pressed) { break; } double w; for (int j = 0; j < num; j++) { in >> w; results.push_back(w); } m->gobble(in); } in.close(); m->mothurRemove(s); } #else //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; vector cts; vector trees; //Create processor worker threads. for( int i=1; icopy(ct); Tree* copyTree = new Tree(copyCount); copyTree->getCopy(t); cts.push_back(copyCount); trees.push_back(copyTree); unweightedData* tempweighted = new unweightedData(m, lines[i].start, lines[i].num, namesOfGroupCombos, copyTree, copyCount, includeRoot); pDataArray.push_back(tempweighted); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyUnWeightedThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, ct); //Wait until all threads have terminated. 
WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ for (int j = 0; j < pDataArray[i]->results.size(); j++) { results.push_back(pDataArray[i]->results[j]); } delete cts[i]; delete trees[i]; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif return results; } catch(exception& e) { m->errorOut(e, "Unweighted", "createProcesses"); exit(1); } } /**************************************************************************************************/ EstOutput Unweighted::driver(Tree* t, vector< vector > namesOfGroupCombos, int start, int num, CountTable* ct) { try { EstOutput results; results.resize(num); int count = 0; //int total = num; for (int h = start; h < (start+num); h++) { if (m->control_pressed) { return results; } double UniqueBL=0.0000; //a branch length is unique if it's chidren are from the same group double totalBL = 0.00; //all branch lengths double UW = 0.00; //Unweighted Value = UniqueBL / totalBL; //find a node that belongs to one of the groups in this combo int nodeBelonging = -1; for (int g = 0; g < namesOfGroupCombos[h].size(); g++) { if (t->groupNodeInfo[namesOfGroupCombos[h][g]].size() != 0) { nodeBelonging = t->groupNodeInfo[namesOfGroupCombos[h][g]][0]; break; } } //sanity check if (nodeBelonging == -1) { m->mothurOut("[WARNING]: cannot find a nodes in the tree from grouping "); for (int g = 0; g < namesOfGroupCombos[h].size()-1; g++) { m->mothurOut(namesOfGroupCombos[h][g] + "-"); } m->mothurOut(namesOfGroupCombos[h][namesOfGroupCombos[h].size()-1]); m->mothurOut(", skipping."); m->mothurOutEndLine(); results[count] = UW; }else{ //cout << "trying to get root" << endl; //if including the root this clears rootForGrouping[namesOfGroupCombos[h]] getRoot(t, nodeBelonging, namesOfGroupCombos[h]); //cout << "here" << endl; for(int i=0;igetNumNodes();i++){ if (m->control_pressed) { return data; } //cout << i << endl; //pcountSize = 0, 
they are from a branch that is entirely from a group the user doesn't want //pcountSize = 2, not unique to one group //pcountSize = 1, unique to one group int pcountSize = 0; for (int j = 0; j < namesOfGroupCombos[h].size(); j++) { map::iterator itGroup = t->tree[i].pcount.find(namesOfGroupCombos[h][j]); if (itGroup != t->tree[i].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } } } //unique calc if (pcountSize == 0) { } else if ((t->tree[i].getBranchLength() != -1) && (pcountSize == 1) && (rootForGrouping[namesOfGroupCombos[h]].count(i) == 0)) { //you have a unique branch length and you are not the root UniqueBL += abs(t->tree[i].getBranchLength()); } //total calc if (pcountSize == 0) { } else if ((t->tree[i].getBranchLength() != -1) && (pcountSize != 0) && (rootForGrouping[namesOfGroupCombos[h]].count(i) == 0)) { //you have a branch length and you are not the root totalBL += abs(t->tree[i].getBranchLength()); } } //cout << UniqueBL << '\t' << totalBL << endl; UW = (UniqueBL / totalBL); if (isnan(UW) || isinf(UW)) { UW = 0; } results[count] = UW; } count++; } return results; } catch(exception& e) { m->errorOut(e, "Unweighted", "driver"); exit(1); } } /**************************************************************************************************/ EstOutput Unweighted::getValues(Tree* t, string groupA, string groupB, int p, string o) { try { processors = p; outputDir = o; CountTable* ct = t->getCountTable(); //if the users enters no groups then give them the score of all groups int numGroups = m->getNumGroups(); //calculate number of comparsions int numComp = 0; vector< vector > namesOfGroupCombos; for (int r=0; r groups; groups.push_back((m->getGroups())[r]); groups.push_back((m->getGroups())[l]); namesOfGroupCombos.push_back(groups); } } if (numComp != 1) { vector groups; if (numGroups == 0) { //get score for all users groups for (int i = 0; i < (ct->getNamesOfGroups()).size(); i++) { if ((ct->getNamesOfGroups())[i] != "xxx") { 
groups.push_back((ct->getNamesOfGroups())[i]); } } namesOfGroupCombos.push_back(groups); }else { for (int i = 0; i < m->getNumGroups(); i++) { groups.push_back((m->getGroups())[i]); } namesOfGroupCombos.push_back(groups); } } lines.clear(); int numPairs = namesOfGroupCombos.size(); int numPairsPerProcessor = ceil(numPairs / processors); for (int i = 0; i < processors; i++) { int startPos = i * numPairsPerProcessor; if(i == processors - 1){ numPairsPerProcessor = numPairs - i * numPairsPerProcessor; } lines.push_back(linePair(startPos, numPairsPerProcessor)); } data = createProcesses(t, namesOfGroupCombos, true, ct); lines.clear(); return data; } catch(exception& e) { m->errorOut(e, "Unweighted", "getValues"); exit(1); } } /**************************************************************************************************/ EstOutput Unweighted::createProcesses(Tree* t, vector< vector > namesOfGroupCombos, bool usingGroups, CountTable* ct) { try { int process = 1; vector processIDS; bool recalc = false; EstOutput results; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ EstOutput myresults; myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, usingGroups, ct); if (m->control_pressed) { exit(0); } //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".unweighted.results.temp"; m->openOutputFile(tempFile, out); out << myresults.size() << endl; for (int i = 0; i < myresults.size(); i++) { out << myresults[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); 
processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".unweighted.results.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".unweighted.results.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //if the users enters no groups then give them the score of all groups int numGroups = m->getNumGroups(); //calculate number of comparsions int numComp = 0; vector< vector > namesOfGroupCombos; for (int r=0; r groups; groups.push_back((m->getGroups())[r]); groups.push_back((m->getGroups())[l]); namesOfGroupCombos.push_back(groups); } } if (numComp != 1) { vector groups; if (numGroups == 0) { //get score for all users groups for (int i = 0; i < (ct->getNamesOfGroups()).size(); i++) { if ((ct->getNamesOfGroups())[i] != "xxx") { groups.push_back((ct->getNamesOfGroups())[i]); } } namesOfGroupCombos.push_back(groups); }else { for (int i = 0; i < m->getNumGroups(); i++) { groups.push_back((m->getGroups())[i]); } namesOfGroupCombos.push_back(groups); } } lines.clear(); int numPairs = namesOfGroupCombos.size(); int numPairsPerProcessor = ceil(numPairs / processors); for (int i = 0; i < processors; i++) { int startPos = i * numPairsPerProcessor; if(i == processors - 1){ numPairsPerProcessor = numPairs - i * numPairsPerProcessor; } lines.push_back(linePair(startPos, numPairsPerProcessor)); } results.clear(); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map 
from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ EstOutput myresults; myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, usingGroups, ct); if (m->control_pressed) { exit(0); } //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".unweighted.results.temp"; m->openOutputFile(tempFile, out); out << myresults.size() << endl; for (int i = 0; i < myresults.size(); i++) { out << myresults[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, usingGroups, ct); //force parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } if (m->control_pressed) { return results; } //get data created by processes for (int i=0;i<(processors-1);i++) { ifstream in; string s = outputDir + toString(processIDS[i]) + ".unweighted.results.temp"; m->openInputFile(s, in); //get quantiles if (!in.eof()) { int num; in >> num; m->gobble(in); if (m->control_pressed) { break; } double w; for (int j = 0; j < num; j++) { in >> w; results.push_back(w); } m->gobble(in); } in.close(); m->mothurRemove(s); } #else //for some reason it doesn't seem to be calculating hte random trees scores. all scores are the same even though copytree appears to be randomized. /* //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; vector cts; vector trees; //Create processor worker threads. 
for( int i=1; icopy(ct); Tree* copyTree = new Tree(copyCount); copyTree->getCopy(t); cts.push_back(copyCount); trees.push_back(copyTree); unweightedData* tempweighted = new unweightedData(m, lines[i].start, lines[i].num, namesOfGroupCombos, copyTree, copyCount, includeRoot); pDataArray.push_back(tempweighted); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyUnWeightedRandomThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, usingGroups, ct); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ for (int j = 0; j < pDataArray[i]->results.size(); j++) { results.push_back(pDataArray[i]->results[j]); } delete cts[i]; delete trees[i]; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } */ results = driver(t, namesOfGroupCombos, 0, namesOfGroupCombos.size(), usingGroups, ct); #endif return results; } catch(exception& e) { m->errorOut(e, "Unweighted", "createProcesses"); exit(1); } } /**************************************************************************************************/ EstOutput Unweighted::driver(Tree* t, vector< vector > namesOfGroupCombos, int start, int num, bool usingGroups, CountTable* ct) { try { EstOutput results; results.resize(num); int count = 0; Tree* copyTree = new Tree(ct); for (int h = start; h < (start+num); h++) { if (m->control_pressed) { return results; } //copy random tree passed in copyTree->getCopy(t); //swap labels in the groups you want to compare copyTree->assembleRandomUnifracTree(namesOfGroupCombos[h]); double UniqueBL=0.0000; //a branch length is unique if it's chidren are from the same group double totalBL = 0.00; //all branch lengths double UW = 0.00; //Unweighted Value = UniqueBL / totalBL; //find a node that belongs to one of the groups in this combo int nodeBelonging = -1; 
for (int g = 0; g < namesOfGroupCombos[h].size(); g++) { if (copyTree->groupNodeInfo[namesOfGroupCombos[h][g]].size() != 0) { nodeBelonging = copyTree->groupNodeInfo[namesOfGroupCombos[h][g]][0]; break; } } //sanity check if (nodeBelonging == -1) { m->mothurOut("[WARNING]: cannot find a nodes in the tree from grouping "); for (int g = 0; g < namesOfGroupCombos[h].size()-1; g++) { m->mothurOut(namesOfGroupCombos[h][g] + "-"); } m->mothurOut(namesOfGroupCombos[h][namesOfGroupCombos[h].size()-1]); m->mothurOut(", skipping."); m->mothurOutEndLine(); results[count] = UW; }else{ //if including the root this clears rootForGrouping[namesOfGroupCombos[h]] getRoot(copyTree, nodeBelonging, namesOfGroupCombos[h]); for(int i=0;igetNumNodes();i++){ if (m->control_pressed) { return data; } //pcountSize = 0, they are from a branch that is entirely from a group the user doesn't want //pcountSize = 2, not unique to one group //pcountSize = 1, unique to one group int pcountSize = 0; for (int j = 0; j < namesOfGroupCombos[h].size(); j++) { map::iterator itGroup = copyTree->tree[i].pcount.find(namesOfGroupCombos[h][j]); if (itGroup != copyTree->tree[i].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } } } //unique calc if (pcountSize == 0) { } else if ((copyTree->tree[i].getBranchLength() != -1) && (pcountSize == 1) && (rootForGrouping[namesOfGroupCombos[h]].count(i) == 0)) { //you have a unique branch length and you are not the root UniqueBL += abs(copyTree->tree[i].getBranchLength()); } //total calc if (pcountSize == 0) { } else if ((copyTree->tree[i].getBranchLength() != -1) && (pcountSize != 0) && (rootForGrouping[namesOfGroupCombos[h]].count(i) == 0)) { //you have a branch length and you are not the root totalBL += abs(copyTree->tree[i].getBranchLength()); } } //cout << UniqueBL << '\t' << totalBL << endl; UW = (UniqueBL / totalBL); if (isnan(UW) || isinf(UW)) { UW = 0; } results[count] = UW; } count++; } delete copyTree; return results; } catch(exception& e) { 
		m->errorOut(e, "Unweighted", "driver");
		exit(1);
	}
}
/**************************************************************************************************/

int Unweighted::getRoot(Tree* t, int v, vector<string> grouping) {
	try {
		//you are a leaf so get your parent
		int index = t->tree[v].getParent();

		if (includeRoot) {
			rootForGrouping[grouping].clear();
		}else {

			//my parent is a potential root
			rootForGrouping[grouping].insert(index);

			//while you aren't at root
			while(t->tree[index].getParent() != -1){
				//cout << index << endl;
				if (m->control_pressed) { return 0; }

				//am I the root for this grouping? if so I want to stop "early"
				//does my sibling have descendants from the users groups?
				//if so I am not the root
				int parent = t->tree[index].getParent();
				int lc = t->tree[parent].getLChild();
				int rc = t->tree[parent].getRChild();

				int sib = lc;
				if (lc == index) { sib = rc; }

				map<string, int>::iterator itGroup;
				int pcountSize = 0;
				for (int j = 0; j < grouping.size(); j++) {
					map<string, int>::iterator itGroup = t->tree[sib].pcount.find(grouping[j]);
					if (itGroup != t->tree[sib].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } }
				}

				//if yes, I am not the root
				if (pcountSize != 0) {
					rootForGrouping[grouping].clear();
					rootForGrouping[grouping].insert(parent);
				}

				index = parent;
			}

			//get all nodes above the root to add so we don't add their u values above
			index = *(rootForGrouping[grouping].begin());
			while(t->tree[index].getParent() != -1){
				int parent = t->tree[index].getParent();
				rootForGrouping[grouping].insert(parent);
				//cout << parent << " in root" << endl;
				index = parent;
			}
		}

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Unweighted", "getRoot");
		exit(1);
	}
}
/**************************************************************************************************/
mothur-1.36.1/source/calculators/unweighted.h000066400000000000000000000351721255543666200212760ustar00rootroot00000000000000
#ifndef UNWEIGHTED_H
#define UNWEIGHTED_H

/*
 *  unweighted.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 2/9/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "treecalculator.h"
#include "counttable.h"

/***********************************************************************/

class Unweighted : public TreeCalculator {

	public:
		Unweighted(bool r) : includeRoot(r) {};
		~Unweighted() {};
		EstOutput getValues(Tree*, int, string);
		EstOutput getValues(Tree*, string, string, int, string);

	private:
		struct linePair {
			int start;
			int num;
			linePair(int i, int j) : start(i), num(j) {}
		};
		vector<linePair> lines;
		EstOutput data;
		int processors;
		string outputDir;
		map< vector<string>, set<int> > rootForGrouping; //maps a grouping combo to the roots for that combo
		bool includeRoot;

		EstOutput driver(Tree*, vector< vector<string> >, int, int, CountTable*);
		EstOutput createProcesses(Tree*, vector< vector<string> >, CountTable*);
		EstOutput driver(Tree*, vector< vector<string> >, int, int, bool, CountTable*);
		EstOutput createProcesses(Tree*, vector< vector<string> >, bool, CountTable*);
		int getRoot(Tree*, int, vector<string>);
};

/***********************************************************************/
struct unweightedData {
	int start;
	int num;
	MothurOut* m;
	EstOutput results;
	vector< vector<string> > namesOfGroupCombos;
	Tree* t;
	CountTable* ct;
	bool includeRoot;

	unweightedData(){}
	unweightedData(MothurOut* mout, int st, int en, vector< vector<string> > ngc, Tree* tree, CountTable* count, bool ir) {
		m = mout;
		start = st;
		num = en;
		namesOfGroupCombos = ngc;
		t = tree;
		ct = count;
		includeRoot = ir;
	}
};

/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MyUnWeightedThreadFunction(LPVOID lpParam){
	unweightedData* pDataArray;
	pDataArray = (unweightedData*)lpParam;
	try {
		pDataArray->results.resize(pDataArray->num);
		map< vector<string>, set<int> > rootForGrouping;

		int count = 0;

		for (int h = pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) {

			if
(pDataArray->m->control_pressed) { return 0; } double UniqueBL=0.0000; //a branch length is unique if it's chidren are from the same group double totalBL = 0.00; //all branch lengths double UW = 0.00; //Unweighted Value = UniqueBL / totalBL; //find a node that belongs to one of the groups in this combo int nodeBelonging = -1; for (int g = 0; g < pDataArray->namesOfGroupCombos[h].size(); g++) { if (pDataArray->t->groupNodeInfo[pDataArray->namesOfGroupCombos[h][g]].size() != 0) { nodeBelonging = pDataArray->t->groupNodeInfo[pDataArray->namesOfGroupCombos[h][g]][0]; break; } } //sanity check if (nodeBelonging == -1) { pDataArray->m->mothurOut("[WARNING]: cannot find a nodes in the tree from grouping "); for (int g = 0; g < pDataArray->namesOfGroupCombos[h].size()-1; g++) { pDataArray->m->mothurOut(pDataArray->namesOfGroupCombos[h][g] + "-"); } pDataArray->m->mothurOut(pDataArray->namesOfGroupCombos[h][pDataArray->namesOfGroupCombos[h].size()-1]); pDataArray->m->mothurOut(", skipping."); pDataArray->m->mothurOutEndLine(); pDataArray->results[count] = UW; }else{ //if including the root this clears rootForGrouping[namesOfGroupCombos[h]] //getRoot(t, nodeBelonging, namesOfGroupCombos[h]); ///////////////////////////////////////////////////////////////////////////// //you are a leaf so get your parent vector grouping = pDataArray->namesOfGroupCombos[h]; int index = pDataArray->t->tree[nodeBelonging].getParent(); if (pDataArray->includeRoot) { rootForGrouping[grouping].clear(); }else { //my parent is a potential root rootForGrouping[grouping].insert(index); //while you aren't at root while(pDataArray->t->tree[index].getParent() != -1){ //cout << index << endl; if (pDataArray->m->control_pressed) { return 0; } //am I the root for this grouping? if so I want to stop "early" //does my sibling have descendants from the users groups? 
//if so I am not the root int parent = pDataArray->t->tree[index].getParent(); int lc = pDataArray->t->tree[parent].getLChild(); int rc = pDataArray->t->tree[parent].getRChild(); int sib = lc; if (lc == index) { sib = rc; } map<string, int>::iterator itGroup; int pcountSize = 0; for (int j = 0; j < grouping.size(); j++) { map<string, int>::iterator itGroup = pDataArray->t->tree[sib].pcount.find(grouping[j]); if (itGroup != pDataArray->t->tree[sib].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } } } //if yes, I am not the root if (pcountSize != 0) { rootForGrouping[grouping].clear(); rootForGrouping[grouping].insert(parent); } index = parent; } //get all nodes above the root to add so we don't add their u values above index = *(rootForGrouping[grouping].begin()); while(pDataArray->t->tree[index].getParent() != -1){ int parent = pDataArray->t->tree[index].getParent(); rootForGrouping[grouping].insert(parent); //cout << parent << " in root" << endl; index = parent; } } ///////////////////////////////////////////////////////////////////////////// for(int i=0;i<pDataArray->t->getNumNodes();i++){ if (pDataArray->m->control_pressed) { return 0; } //cout << i << endl; //pcountSize = 0, they are from a branch that is entirely from a group the user doesn't want //pcountSize = 2, not unique to one group //pcountSize = 1, unique to one group int pcountSize = 0; for (int j = 0; j < pDataArray->namesOfGroupCombos[h].size(); j++) { map<string, int>::iterator itGroup = pDataArray->t->tree[i].pcount.find(pDataArray->namesOfGroupCombos[h][j]); if (itGroup != pDataArray->t->tree[i].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } } } //unique calc if (pcountSize == 0) { } else if ((pDataArray->t->tree[i].getBranchLength() != -1) && (pcountSize == 1) && (rootForGrouping[pDataArray->namesOfGroupCombos[h]].count(i) == 0)) { //you have a unique branch length and you are not the root UniqueBL += abs(pDataArray->t->tree[i].getBranchLength()); } //total calc if (pcountSize == 0) { } else if
((pDataArray->t->tree[i].getBranchLength() != -1) && (pcountSize != 0) && (rootForGrouping[pDataArray->namesOfGroupCombos[h]].count(i) == 0)) { //you have a branch length and you are not the root totalBL += abs(pDataArray->t->tree[i].getBranchLength()); } } //cout << UniqueBL << '\t' << totalBL << endl; UW = (UniqueBL / totalBL); if (isnan(UW) || isinf(UW)) { UW = 0; } pDataArray->results[count] = UW; } count++; } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "UnWeighted", "MyUnWeightedThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MyUnWeightedRandomThreadFunction(LPVOID lpParam){ unweightedData* pDataArray; pDataArray = (unweightedData*)lpParam; try { pDataArray->results.resize(pDataArray->num); int count = 0; Tree* copyTree = new Tree(pDataArray->ct); for (int h = pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) { if (pDataArray->m->control_pressed) { return 0; } map< vector, set > rootForGrouping; //copy random tree passed in copyTree->getCopy(pDataArray->t); //swap labels in the groups you want to compare copyTree->assembleRandomUnifracTree(pDataArray->namesOfGroupCombos[h]); double UniqueBL=0.0000; //a branch length is unique if it's chidren are from the same group double totalBL = 0.00; //all branch lengths double UW = 0.00; //Unweighted Value = UniqueBL / totalBL; //find a node that belongs to one of the groups in this combo int nodeBelonging = -1; for (int g = 0; g < pDataArray->namesOfGroupCombos[h].size(); g++) { if (copyTree->groupNodeInfo[pDataArray->namesOfGroupCombos[h][g]].size() != 0) { nodeBelonging = copyTree->groupNodeInfo[pDataArray->namesOfGroupCombos[h][g]][0]; break; } } //sanity check if (nodeBelonging == -1) { pDataArray->m->mothurOut("[WARNING]: cannot find a nodes in the tree from grouping "); for (int g = 0; g < pDataArray->namesOfGroupCombos[h].size()-1; g++) { 
pDataArray->m->mothurOut(pDataArray->namesOfGroupCombos[h][g] + "-"); } pDataArray->m->mothurOut(pDataArray->namesOfGroupCombos[h][pDataArray->namesOfGroupCombos[h].size()-1]); pDataArray->m->mothurOut(", skipping."); pDataArray->m->mothurOutEndLine(); pDataArray->results[count] = UW; }else{ //if including the root this clears rootForGrouping[namesOfGroupCombos[h]] //getRoot(copyTree, nodeBelonging, namesOfGroupCombos[h]); ///////////////////////////////////////////////////////////////////////////// //you are a leaf so get your parent vector grouping = pDataArray->namesOfGroupCombos[h]; int index = copyTree->tree[nodeBelonging].getParent(); if (pDataArray->includeRoot) { rootForGrouping[grouping].clear(); }else { //my parent is a potential root rootForGrouping[grouping].insert(index); //while you aren't at root while(copyTree->tree[index].getParent() != -1){ //cout << index << endl; if (pDataArray->m->control_pressed) { return 0; } //am I the root for this grouping? if so I want to stop "early" //does my sibling have descendants from the users groups? 
//if so I am not the root int parent = copyTree->tree[index].getParent(); int lc = copyTree->tree[parent].getLChild(); int rc = copyTree->tree[parent].getRChild(); int sib = lc; if (lc == index) { sib = rc; } map<string, int>::iterator itGroup; int pcountSize = 0; for (int j = 0; j < grouping.size(); j++) { map<string, int>::iterator itGroup = copyTree->tree[sib].pcount.find(grouping[j]); if (itGroup != copyTree->tree[sib].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } } } //if yes, I am not the root if (pcountSize != 0) { rootForGrouping[grouping].clear(); rootForGrouping[grouping].insert(parent); } index = parent; } //get all nodes above the root to add so we don't add their u values above index = *(rootForGrouping[grouping].begin()); while(copyTree->tree[index].getParent() != -1){ int parent = copyTree->tree[index].getParent(); rootForGrouping[grouping].insert(parent); //cout << parent << " in root" << endl; index = parent; } } ///////////////////////////////////////////////////////////////////////////// for(int i=0;i<copyTree->getNumNodes();i++){ if (pDataArray->m->control_pressed) { return 0; } //pcountSize = 0, they are from a branch that is entirely from a group the user doesn't want //pcountSize = 2, not unique to one group //pcountSize = 1, unique to one group int pcountSize = 0; for (int j = 0; j < pDataArray->namesOfGroupCombos[h].size(); j++) { map<string, int>::iterator itGroup = copyTree->tree[i].pcount.find(pDataArray->namesOfGroupCombos[h][j]); if (itGroup != copyTree->tree[i].pcount.end()) { pcountSize++; if (pcountSize > 1) { break; } } } //unique calc if (pcountSize == 0) { } else if ((copyTree->tree[i].getBranchLength() != -1) && (pcountSize == 1) && (rootForGrouping[pDataArray->namesOfGroupCombos[h]].count(i) == 0)) { //you have a unique branch length and you are not the root UniqueBL += abs(copyTree->tree[i].getBranchLength()); } //total calc if (pcountSize == 0) { } else if ((copyTree->tree[i].getBranchLength() != -1) && (pcountSize != 0) &&
(rootForGrouping[pDataArray->namesOfGroupCombos[h]].count(i) == 0)) { //you have a branch length and you are not the root totalBL += abs(copyTree->tree[i].getBranchLength()); } } //cout << h << '\t' << UniqueBL << '\t' << totalBL << endl; UW = (UniqueBL / totalBL); if (isnan(UW) || isinf(UW)) { UW = 0; } pDataArray->results[count] = UW; //cout << h << '\t' << UW << endl; } count++; } delete copyTree; return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "UnWeighted", "MyUnWeightedRandomThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/calculators/uvest.cpp000066400000000000000000000054121255543666200206260ustar00rootroot00000000000000/* * uvest.cpp * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "uvest.h" /***********************************************************************/ //This is used by SharedJAbund and SharedSorAbund EstOutput UVEst::getUVest(vector shared) { try { EstOutput results; results.resize(2,0); int S12, Atotal, Btotal, f1A, f2A, f1B, f2B, sumSharedA, sumSharedB, sumSharedA1, sumSharedB1, tempA, tempB; S12 = 0; Atotal = 0; Btotal = 0; f1A = 0; f2A = 0; f1B = 0; f2B = 0; sumSharedA = 0; sumSharedB = 0; sumSharedA1 = 0; sumSharedB1 = 0; float Upart1, Upart2, Upart3, Vpart1, Vpart2, Vpart3, Uest, Vest; Upart1 = 0.0; Upart2 = 0.0; Upart3 = 0.0; Vpart1 = 0.0; Vpart2 = 0.0; Vpart3 = 0.0; /*Xi, Yi = abundance of the ith shared OTU in A and B ntotal, mtotal = total number of sequences sampled in A and B I(•) = if the argument, •, is true then I(•) is 1; otherwise it is 0. 
sumSharedA = the sum of all shared otus in A sumSharedB = the sum of all shared otus in B sumSharedA1 = the sum of all shared otus in A where B = 1 sumSharedB1 = the sum of all shared otus in B where A = 1 */ for (int i = 0; i < shared[0]->getNumBins(); i++) { //store in temps to avoid multiple repetitive function calls tempA = shared[0]->getAbundance(i); tempB = shared[1]->getAbundance(i); Atotal += tempA; Btotal += tempB; if ((tempA != 0) && (tempB != 0)) {//they are shared sumSharedA += tempA; sumSharedB += tempB; //does A have one or two if (tempA == 1) { f1A++; sumSharedB1 += tempB;} else if (tempA == 2) { f2A++; } //does B have one or two if (tempB == 1) { f1B++; sumSharedA1 += tempA;} else if (tempB == 2) { f2B++; } } } Upart1 = sumSharedA / (float) Atotal; Upart2 = ((Btotal - 1) * f1B) / (float) (Btotal * 2 * f2B); Upart3 = sumSharedA1 / (float) Atotal; if (isnan(Upart1) || isinf(Upart1)) { Upart1 = 0; } if (isnan(Upart2) || isinf(Upart2)) { Upart2 = 0; } if (isnan(Upart3) || isinf(Upart3)) { Upart3 = 0; } Uest = Upart1 + (Upart2 * Upart3); Vpart1 = sumSharedB / (float) Btotal; Vpart2 = ((Atotal - 1) * f1A) / (float) (Atotal * 2 * f2A); Vpart3 = sumSharedB1 / (float) Btotal; if (isnan(Vpart1) || isinf(Vpart1)) { Vpart1 = 0; } if (isnan(Vpart2) || isinf(Vpart2)) { Vpart2 = 0; } if (isnan(Vpart3) || isinf(Vpart3)) { Vpart3 = 0; } Vest = Vpart1 + (Vpart2 * Vpart3); if (Uest > 1) { Uest = 1; } if (Vest > 1) { Vest = 1; } results[0] = Uest; results[1] = Vest; return results; } catch(exception& e) { m->errorOut(e, "UVEst", "getUVest"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/calculators/uvest.h000066400000000000000000000012601255543666200202700ustar00rootroot00000000000000#ifndef UVEST_H #define UVEST_H /* * uvest.h * Dotur * * Created by Sarah Westcott on 1/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class implements the UVEst estimator on two groups. 
It is used by SharedJAbund and SharedSorAbund. */ #include "mothur.h" #include "sharedrabundvector.h" typedef vector<double> EstOutput; /***********************************************************************/ class UVEst { public: UVEst() { m = MothurOut::getInstance(); } EstOutput getUVest(vector<SharedRAbundVector*>); private: MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/calculators/weighted.cpp000066400000000000000000000442611255543666200212650ustar00rootroot00000000000000/* * weighted.cpp * Mothur * * Created by Sarah Westcott on 2/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "weighted.h" /**************************************************************************************************/ EstOutput Weighted::getValues(Tree* t, int p, string o) { try { data.clear(); //clear out old values int numGroups; vector<double> D; processors = p; outputDir = o; CountTable* ct = t->getCountTable(); numGroups = m->getNumGroups(); if (m->control_pressed) { return data; } //calculate number of comparisons i.e.
with groups A,B,C = AB, AC, BC = 3; vector< vector<string> > namesOfGroupCombos; for (int i=0; i<numGroups; i++) { for (int l = 0; l < i; l++) { //WScore[globaldata->Groups[i]+globaldata->Groups[l]] = 0.0; vector<string> groups; groups.push_back((m->getGroups())[i]); groups.push_back((m->getGroups())[l]); namesOfGroupCombos.push_back(groups); } } int remainingPairs = namesOfGroupCombos.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } data = createProcesses(t, namesOfGroupCombos, ct); lines.clear(); return data; } catch(exception& e) { m->errorOut(e, "Weighted", "getValues"); exit(1); } } /**************************************************************************************************/ EstOutput Weighted::createProcesses(Tree* t, vector< vector<string> > namesOfGroupCombos, CountTable* ct) { try { vector<int> processIDS; EstOutput results; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ EstOutput Myresults; Myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, ct); //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".weighted.results.temp"; m->openOutputFile(tempFile, out); out << Myresults.size() << endl; for (int i = 0; i < Myresults.size(); i++) { out << Myresults[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number
of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".weightedcommand.results.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".weightedcommand.results.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); //calculate number of comparisons i.e. with groups A,B,C = AB, AC, BC = 3; int numGroups = m->getNumGroups(); vector< vector > namesOfGroupCombos; for (int i=0; i groups; groups.push_back((m->getGroups())[i]); groups.push_back((m->getGroups())[l]); namesOfGroupCombos.push_back(groups); } } int remainingPairs = namesOfGroupCombos.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } results.clear(); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ EstOutput Myresults; Myresults = driver(t, namesOfGroupCombos, lines[process].start, lines[process].num, ct); //pass numSeqs to parent 
ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".weighted.results.temp"; m->openOutputFile(tempFile, out); out << Myresults.size() << endl; for (int i = 0; i < Myresults.size(); i++) { out << Myresults[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, ct); //force parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } if (m->control_pressed) { return results; } //get data created by processes for (int i=0;i<(processors-1);i++) { ifstream in; string s = outputDir + toString(processIDS[i]) + ".weighted.results.temp"; m->openInputFile(s, in); //get quantiles while (!in.eof()) { int num; in >> num; m->gobble(in); if (m->control_pressed) { break; } double w; for (int j = 0; j < num; j++) { in >> w; results.push_back(w); } m->gobble(in); } in.close(); m->mothurRemove(s); } #else //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; vector cts; vector trees; //Create processor worker threads. for( int i=1; icopy(ct); Tree* copyTree = new Tree(copyCount); copyTree->getCopy(t); cts.push_back(copyCount); trees.push_back(copyTree); weightedData* tempweighted = new weightedData(m, lines[i].start, lines[i].num, namesOfGroupCombos, copyTree, copyCount, includeRoot); pDataArray.push_back(tempweighted); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyWeightedThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } results = driver(t, namesOfGroupCombos, lines[0].start, lines[0].num, ct); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. 
for(int i=0; i < pDataArray.size(); i++){ for (int j = 0; j < pDataArray[i]->results.size(); j++) { results.push_back(pDataArray[i]->results[j]); } delete cts[i]; delete trees[i]; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif return results; } catch(exception& e) { m->errorOut(e, "Weighted", "createProcesses"); exit(1); } } /**************************************************************************************************/ EstOutput Weighted::driver(Tree* t, vector< vector > namesOfGroupCombos, int start, int num, CountTable* ct) { try { EstOutput results; vector D; int count = 0; for (int h = start; h < (start+num); h++) { if (m->control_pressed) { return results; } //initialize weighted score string groupA = namesOfGroupCombos[h][0]; string groupB = namesOfGroupCombos[h][1]; set validBranches; WScore[groupA+groupB] = 0.0; D.push_back(0.0000); //initialize a spot in D for each combination //adding the wieghted sums from group i for (int j = 0; j < t->groupNodeInfo[groupA].size(); j++) { //the leaf nodes that have seqs from group i map::iterator it = t->tree[t->groupNodeInfo[groupA][j]].pcount.find(groupA); int numSeqsInGroupI = it->second; double sum = getLengthToRoot(t, t->groupNodeInfo[groupA][j], groupA, groupB); double weightedSum = ((numSeqsInGroupI * sum) / (double)ct->getGroupCount(groupA)); D[count] += weightedSum; } //adding the wieghted sums from group l for (int j = 0; j < t->groupNodeInfo[groupB].size(); j++) { //the leaf nodes that have seqs from group l map::iterator it = t->tree[t->groupNodeInfo[groupB][j]].pcount.find(groupB); int numSeqsInGroupL = it->second; double sum = getLengthToRoot(t, t->groupNodeInfo[groupB][j], groupA, groupB); double weightedSum = ((numSeqsInGroupL * sum) / (double)ct->getGroupCount(groupB)); D[count] += weightedSum; } count++; } //calculate u for the group comb for (int h = start; h < (start+num); h++) { //report progress //m->mothurOut("Processing combo: " + toString(h)); m->mothurOutEndLine(); string 
groupA = namesOfGroupCombos[h][0]; string groupB = namesOfGroupCombos[h][1]; //calculate u for the group comb for(int i=0;igetNumNodes();i++){ if (m->control_pressed) { return data; } double u; //int pcountSize = 0; //does this node have descendants from groupA it = t->tree[i].pcount.find(groupA); //if it does u = # of its descendants with a certain group / total number in tree with a certain group if (it != t->tree[i].pcount.end()) { u = (double) t->tree[i].pcount[groupA] / (double) ct->getGroupCount(groupA); }else { u = 0.00; } //does this node have descendants from group l it = t->tree[i].pcount.find(groupB); //if it does subtract their percentage from u if (it != t->tree[i].pcount.end()) { u -= (double) t->tree[i].pcount[groupB] / (double) ct->getGroupCount(groupB); } if (includeRoot) { if (t->tree[i].getBranchLength() != -1) { u = abs(u * t->tree[i].getBranchLength()); WScore[(groupA+groupB)] += u; } }else { //if this is not the root then add it if (rootForGrouping[namesOfGroupCombos[h]].count(i) == 0) { if (t->tree[i].getBranchLength() != -1) { u = abs(u * t->tree[i].getBranchLength()); WScore[(groupA+groupB)] += u; } } } } } /********************************************************/ //calculate weighted score for the group combination double UN; count = 0; for (int h = start; h < (start+num); h++) { UN = (WScore[namesOfGroupCombos[h][0]+namesOfGroupCombos[h][1]] / D[count]); if (isnan(UN) || isinf(UN)) { UN = 0; } results.push_back(UN); count++; } return results; } catch(exception& e) { m->errorOut(e, "Weighted", "driver"); exit(1); } } /**************************************************************************************************/ EstOutput Weighted::getValues(Tree* t, string groupA, string groupB) { try { data.clear(); //clear out old values CountTable* ct = t->getCountTable(); if (m->control_pressed) { return data; } //initialize weighted score WScore[(groupA+groupB)] = 0.0; double D = 0.0; set validBranches; vector groups; groups.push_back(groupA); 
groups.push_back(groupB); //adding the wieghted sums from group i for (int j = 0; j < t->groupNodeInfo[groups[0]].size(); j++) { //the leaf nodes that have seqs from group i map::iterator it = t->tree[t->groupNodeInfo[groups[0]][j]].pcount.find(groups[0]); int numSeqsInGroupI = it->second; double sum = getLengthToRoot(t, t->groupNodeInfo[groups[0]][j], groups[0], groups[1]); double weightedSum = ((numSeqsInGroupI * sum) / (double)ct->getGroupCount(groups[0])); D += weightedSum; } //adding the wieghted sums from group l for (int j = 0; j < t->groupNodeInfo[groups[1]].size(); j++) { //the leaf nodes that have seqs from group l map::iterator it = t->tree[t->groupNodeInfo[groups[1]][j]].pcount.find(groups[1]); int numSeqsInGroupL = it->second; double sum = getLengthToRoot(t, t->groupNodeInfo[groups[1]][j], groups[0], groups[1]); double weightedSum = ((numSeqsInGroupL * sum) / (double)ct->getGroupCount(groups[1])); D += weightedSum; } //calculate u for the group comb for(int i=0;igetNumNodes();i++){ if (m->control_pressed) { return data; } double u; //int pcountSize = 0; //does this node have descendants from groupA it = t->tree[i].pcount.find(groupA); //if it does u = # of its descendants with a certain group / total number in tree with a certain group if (it != t->tree[i].pcount.end()) { u = (double) t->tree[i].pcount[groupA] / (double) ct->getGroupCount(groupA); }else { u = 0.00; } //does this node have descendants from group l it = t->tree[i].pcount.find(groupB); //if it does subtract their percentage from u if (it != t->tree[i].pcount.end()) { u -= (double) t->tree[i].pcount[groupB] / (double) ct->getGroupCount(groupB); } if (includeRoot) { if (t->tree[i].getBranchLength() != -1) { u = abs(u * t->tree[i].getBranchLength()); WScore[(groupA+groupB)] += u; } }else{ //if this is not the root then add it if (rootForGrouping[groups].count(i) == 0) { if (t->tree[i].getBranchLength() != -1) { u = abs(u * t->tree[i].getBranchLength()); WScore[(groupA+groupB)] += u; } } } } 
/********************************************************/ //calculate weighted score for the group combination double UN; UN = (WScore[(groupA+groupB)] / D); if (isnan(UN) || isinf(UN)) { UN = 0; } data.push_back(UN); return data; } catch(exception& e) { m->errorOut(e, "Weighted", "getValues"); exit(1); } } /**************************************************************************************************/ double Weighted::getLengthToRoot(Tree* t, int v, string groupA, string groupB) { try { double sum = 0.0; int index = v; //you are a leaf if(t->tree[index].getBranchLength() != -1){ sum += abs(t->tree[index].getBranchLength()); } double tempTotal = 0.0; index = t->tree[index].getParent(); vector<string> grouping; grouping.push_back(groupA); grouping.push_back(groupB); rootForGrouping[grouping].insert(index); //while you aren't at root while(t->tree[index].getParent() != -1){ if (m->control_pressed) { return sum; } int parent = t->tree[index].getParent(); if (includeRoot) { //add everyone if(t->tree[index].getBranchLength() != -1){ sum += abs(t->tree[index].getBranchLength()); } }else { //am I the root for this grouping? if so I want to stop "early" //does my sibling have descendants from the users groups?
int lc = t->tree[parent].getLChild(); int rc = t->tree[parent].getRChild(); int sib = lc; if (lc == index) { sib = rc; } map<string, int>::iterator itGroup; int pcountSize = 0; itGroup = t->tree[sib].pcount.find(groupA); if (itGroup != t->tree[sib].pcount.end()) { pcountSize++; } itGroup = t->tree[sib].pcount.find(groupB); if (itGroup != t->tree[sib].pcount.end()) { pcountSize++; } //if yes, I am not the root so add me if (pcountSize != 0) { if (t->tree[index].getBranchLength() != -1) { sum += abs(t->tree[index].getBranchLength()) + tempTotal; tempTotal = 0.0; }else { sum += tempTotal; tempTotal = 0.0; } rootForGrouping[grouping].clear(); rootForGrouping[grouping].insert(parent); }else { //if no, I may be the root so add my br to tempTotal until I am proven innocent if (t->tree[index].getBranchLength() != -1) { tempTotal += abs(t->tree[index].getBranchLength()); } } } index = parent; } //get all nodes above the root to add so we don't add their u values above index = *(rootForGrouping[grouping].begin()); while(t->tree[index].getParent() != -1){ int parent = t->tree[index].getParent(); rootForGrouping[grouping].insert(parent); index = parent; } return sum; } catch(exception& e) { m->errorOut(e, "Weighted", "getLengthToRoot"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/calculators/weighted.h000066400000000000000000000333331255543666200207300ustar00rootroot00000000000000#ifndef WEIGHTED_H #define WEIGHTED_H /* * weighted.h * Mothur * * Created by Sarah Westcott on 2/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "treecalculator.h" #include "counttable.h" /***********************************************************************/ class Weighted : public TreeCalculator { public: Weighted( bool r) : includeRoot(r) {}; ~Weighted() {}; EstOutput getValues(Tree*, string, string); EstOutput getValues(Tree*, int, string); private: struct linePair { int start; int num; linePair(int i, int j) : start(i), num(j) {} }; vector<linePair> lines; EstOutput data; map<string, int>::iterator it; map<string, double> WScore; //a score for each group combination i.e. AB, AC, BC. int processors; string outputDir; map< vector<string>, set<int> > rootForGrouping; //maps a grouping combo to the root for that combo bool includeRoot; EstOutput driver(Tree*, vector< vector<string> >, int, int, CountTable*); EstOutput createProcesses(Tree*, vector< vector<string> >, CountTable*); double getLengthToRoot(Tree*, int, string, string); }; /***********************************************************************/ struct weightedData { int start; int num; MothurOut* m; EstOutput results; vector< vector<string> > namesOfGroupCombos; Tree* t; CountTable* ct; bool includeRoot; weightedData(){} weightedData(MothurOut* mout, int st, int en, vector< vector<string> > ngc, Tree* tree, CountTable* count, bool ir) { m = mout; start = st; num = en; namesOfGroupCombos = ngc; t = tree; ct = count; includeRoot = ir; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyWeightedThreadFunction(LPVOID lpParam){ weightedData* pDataArray; pDataArray = (weightedData*)lpParam; try { map<string, int>::iterator it; vector<double> D; int count = 0; map< vector<string>, set<int> > rootForGrouping; map<string, double> WScore; for (int h = pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) { if (pDataArray->m->control_pressed) { return 0; } //initialize weighted score string groupA = pDataArray->namesOfGroupCombos[h][0]; string groupB =
pDataArray->namesOfGroupCombos[h][1]; set validBranches; WScore[groupA+groupB] = 0.0; D.push_back(0.0000); //initialize a spot in D for each combination //adding the wieghted sums from group i for (int j = 0; j < pDataArray->t->groupNodeInfo[groupA].size(); j++) { //the leaf nodes that have seqs from group i map::iterator it = pDataArray->t->tree[pDataArray->t->groupNodeInfo[groupA][j]].pcount.find(groupA); int numSeqsInGroupI = it->second; //double sum = getLengthToRoot(pDataArray->t, pDataArray->t->groupNodeInfo[groupA][j], groupA, groupB); /*************************************************************************************/ double sum = 0.0; int index = pDataArray->t->groupNodeInfo[groupA][j]; //you are a leaf if(pDataArray->t->tree[index].getBranchLength() != -1){ sum += abs(pDataArray->t->tree[index].getBranchLength()); } double tempTotal = 0.0; index = pDataArray->t->tree[index].getParent(); vector grouping; grouping.push_back(groupA); grouping.push_back(groupB); rootForGrouping[grouping].insert(index); //while you aren't at root while(pDataArray->t->tree[index].getParent() != -1){ if (pDataArray->m->control_pressed) { return 0; } int parent = pDataArray->t->tree[index].getParent(); if (pDataArray->includeRoot) { //add everyone if(pDataArray->t->tree[index].getBranchLength() != -1){ sum += abs(pDataArray->t->tree[index].getBranchLength()); } }else { //am I the root for this grouping? if so I want to stop "early" //does my sibling have descendants from the users groups? 
int lc = pDataArray->t->tree[parent].getLChild(); int rc = pDataArray->t->tree[parent].getRChild(); int sib = lc; if (lc == index) { sib = rc; } map::iterator itGroup; int pcountSize = 0; itGroup = pDataArray->t->tree[sib].pcount.find(groupA); if (itGroup != pDataArray->t->tree[sib].pcount.end()) { pcountSize++; } itGroup = pDataArray->t->tree[sib].pcount.find(groupB); if (itGroup != pDataArray->t->tree[sib].pcount.end()) { pcountSize++; } //if yes, I am not the root so add me if (pcountSize != 0) { if (pDataArray->t->tree[index].getBranchLength() != -1) { sum += abs(pDataArray->t->tree[index].getBranchLength()) + tempTotal; tempTotal = 0.0; }else { sum += tempTotal; tempTotal = 0.0; } rootForGrouping[grouping].clear(); rootForGrouping[grouping].insert(parent); }else { //if no, I may be the root so add my br to tempTotal until I am proven innocent if (pDataArray->t->tree[index].getBranchLength() != -1) { tempTotal += abs(pDataArray->t->tree[index].getBranchLength()); } } } index = parent; } //get all nodes above the root to add so we don't add their u values above index = *(rootForGrouping[grouping].begin()); while(pDataArray->t->tree[index].getParent() != -1){ int parent = pDataArray->t->tree[index].getParent(); rootForGrouping[grouping].insert(parent); index = parent; } /*************************************************************************************/ double weightedSum = ((numSeqsInGroupI * sum) / (double)pDataArray->ct->getGroupCount(groupA)); D[count] += weightedSum; } //adding the wieghted sums from group l for (int j = 0; j < pDataArray->t->groupNodeInfo[groupB].size(); j++) { //the leaf nodes that have seqs from group l map::iterator it = pDataArray->t->tree[pDataArray->t->groupNodeInfo[groupB][j]].pcount.find(groupB); int numSeqsInGroupL = it->second; //double sum = getLengthToRoot(pDataArray->t, pDataArray->t->groupNodeInfo[groupB][j], groupA, groupB); /*************************************************************************************/ double sum 
= 0.0; int index = pDataArray->t->groupNodeInfo[groupB][j]; //you are a leaf if(pDataArray->t->tree[index].getBranchLength() != -1){ sum += abs(pDataArray->t->tree[index].getBranchLength()); } double tempTotal = 0.0; index = pDataArray->t->tree[index].getParent(); vector grouping; grouping.push_back(groupA); grouping.push_back(groupB); rootForGrouping[grouping].insert(index); //while you aren't at root while(pDataArray->t->tree[index].getParent() != -1){ if (pDataArray->m->control_pressed) { return 0; } int parent = pDataArray->t->tree[index].getParent(); if (pDataArray->includeRoot) { //add everyone if(pDataArray->t->tree[index].getBranchLength() != -1){ sum += abs(pDataArray->t->tree[index].getBranchLength()); } }else { //am I the root for this grouping? if so I want to stop "early" //does my sibling have descendants from the users groups? int lc = pDataArray->t->tree[parent].getLChild(); int rc = pDataArray->t->tree[parent].getRChild(); int sib = lc; if (lc == index) { sib = rc; } map::iterator itGroup; int pcountSize = 0; itGroup = pDataArray->t->tree[sib].pcount.find(groupA); if (itGroup != pDataArray->t->tree[sib].pcount.end()) { pcountSize++; } itGroup = pDataArray->t->tree[sib].pcount.find(groupB); if (itGroup != pDataArray->t->tree[sib].pcount.end()) { pcountSize++; } //if yes, I am not the root so add me if (pcountSize != 0) { if (pDataArray->t->tree[index].getBranchLength() != -1) { sum += abs(pDataArray->t->tree[index].getBranchLength()) + tempTotal; tempTotal = 0.0; }else { sum += tempTotal; tempTotal = 0.0; } rootForGrouping[grouping].clear(); rootForGrouping[grouping].insert(parent); }else { //if no, I may be the root so add my br to tempTotal until I am proven innocent if (pDataArray->t->tree[index].getBranchLength() != -1) { tempTotal += abs(pDataArray->t->tree[index].getBranchLength()); } } } index = parent; } //get all nodes above the root to add so we don't add their u values above index = *(rootForGrouping[grouping].begin()); 
                while(pDataArray->t->tree[index].getParent() != -1){
                    int parent = pDataArray->t->tree[index].getParent();
                    rootForGrouping[grouping].insert(parent);
                    index = parent;
                }
                /*************************************************************************************/
                double weightedSum = ((numSeqsInGroupL * sum) / (double)pDataArray->ct->getGroupCount(groupB));
                D[count] += weightedSum;
            }
            count++;
        }

        //calculate u for the group comb
        for (int h = pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) {
            //report progress
            //pDataArray->m->mothurOut("Processing combo: " + toString(h)); pDataArray->m->mothurOutEndLine();
            string groupA = pDataArray->namesOfGroupCombos[h][0];
            string groupB = pDataArray->namesOfGroupCombos[h][1];

            //calculate u for the group comb
            for(int i=0;i<pDataArray->t->getNumNodes();i++){
                if (pDataArray->m->control_pressed) { return 0; }
                double u;
                //int pcountSize = 0;
                //does this node have descendants from groupA
                it = pDataArray->t->tree[i].pcount.find(groupA);
                //if it does, u = # of its descendants in groupA / total number in the tree in groupA
                if (it != pDataArray->t->tree[i].pcount.end()) {
                    u = (double) pDataArray->t->tree[i].pcount[groupA] / (double) pDataArray->ct->getGroupCount(groupA);
                }else { u = 0.00; }

                //does this node have descendants from groupB
                it = pDataArray->t->tree[i].pcount.find(groupB);
                //if it does, subtract their percentage from u
                if (it != pDataArray->t->tree[i].pcount.end()) {
                    u -= (double) pDataArray->t->tree[i].pcount[groupB] / (double) pDataArray->ct->getGroupCount(groupB);
                }

                if (pDataArray->includeRoot) {
                    if (pDataArray->t->tree[i].getBranchLength() != -1) {
                        u = abs(u * pDataArray->t->tree[i].getBranchLength());
                        WScore[(groupA+groupB)] += u;
                    }
                }else {
                    //if this is not the root then add it
                    if (rootForGrouping[pDataArray->namesOfGroupCombos[h]].count(i) == 0) {
                        if (pDataArray->t->tree[i].getBranchLength() != -1) {
                            u = abs(u * pDataArray->t->tree[i].getBranchLength());
                            WScore[(groupA+groupB)] += u;
                        }
                    }
                }
            }
        }
        /********************************************************/
        //calculate weighted score for the group combination
        double UN;
        count = 0;
        for (int h = pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) {
            UN = (WScore[pDataArray->namesOfGroupCombos[h][0]+pDataArray->namesOfGroupCombos[h][1]] / D[count]);
            if (isnan(UN) || isinf(UN)) { UN = 0; }
            pDataArray->results.push_back(UN);
            count++;
        }

        return 0;
    }
    catch(exception& e) {
        pDataArray->m->errorOut(e, "Weighted", "MyWeightedThreadFunction");
        exit(1);
    }
}
#endif
#endif
mothur-1.36.1/source/calculators/whittaker.cpp
/*
 *  whittaker.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 4/23/09.
 *  Copyright 2009 Patrick D. Schloss. All rights reserved.
 *
 */

#include "whittaker.h"

/***********************************************************************/
EstOutput Whittaker::getValues(vector<SharedRAbundVector*> shared){
    try{
        data.resize(1);

        int countA = 0;
        int countB = 0;
        int sTotal = shared[0]->getNumBins();

        //count the bins that are occupied in each of the two groups
        for(int i=0;i<sTotal;i++){
            if(shared[0]->getAbundance(i) != 0){ countA++; }
            if(shared[1]->getAbundance(i) != 0){ countB++; }
        }

        data[0] = 2-2*sTotal/(float)(countA+countB);
        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "Whittaker", "getValues");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/calculators/whittaker.h
#ifndef WHITTAKER_H
#define WHITTAKER_H
/*
 *  whittaker.h
 *  Mothur
 *
 *  Created by Thomas Ryabin on 3/13/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "calculator.h"

/*This class implements the Whittaker estimator on 2 groups.
It is a child of the calculator class.*/ /***********************************************************************/ class Whittaker : public Calculator { public: Whittaker() : Calculator("whittaker", 1, false) {}; EstOutput getValues(SAbundVector*) {return data;}; EstOutput getValues(vector); string getCitation() { return "http://www.mothur.org/wiki/Whittaker"; } }; /***********************************************************************/ #endif mothur-1.36.1/source/chimera/000077500000000000000000000000001255543666200160465ustar00rootroot00000000000000mothur-1.36.1/source/chimera/bellerophon.cpp000066400000000000000000000662751255543666200211030ustar00rootroot00000000000000/* * bellerophon.cpp * Mothur * * Created by Sarah Westcott on 7/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "bellerophon.h" #include "eachgapdist.h" #include "ignoregaps.h" #include "onegapdist.h" /***************************************************************************************************************/ Bellerophon::Bellerophon(string name, bool filterSeqs, bool c, int win, int inc, int p, string o) : Chimera() { try { fastafile = name; correction = c; outputDir = o; window = win; increment = inc; processors = p; //read in sequences seqs = readSeqs(fastafile); numSeqs = seqs.size(); if (numSeqs == 0) { m->mothurOut("Error in reading you sequences."); m->mothurOutEndLine(); exit(1); } //do soft filter if (filterSeqs) { createFilter(seqs, 0.5); for (int i = 0; i < seqs.size(); i++) { runFilter(seqs[i]); } } distCalculator = new eachGapDist(); //set default window to 25% of sequence length string seq0 = seqs[0]->getAligned(); if (window == 0) { window = seq0.length() / 4; } else if (window > (seq0.length() / 2)) { m->mothurOut("Your sequence length is = " + toString(seq0.length()) + ". You have selected a window size greater than the length of half your aligned sequence. 
I will run it with a window size of " + toString((seq0.length() / 2))); m->mothurOutEndLine(); window = (seq0.length() / 2); } if (increment > (seqs[0]->getAlignLength() - (2*window))) { if (increment != 10) { m->mothurOut("You have selected a increment that is too large. I will use the default."); m->mothurOutEndLine(); increment = 10; if (increment > (seqs[0]->getAlignLength() - (2*window))) { increment = 0; } }else{ increment = 0; } } if (increment == 0) { iters = 1; } else { iters = ((seqs[0]->getAlignLength() - (2*window)) / increment); } //initialize pref pref.resize(iters); for (int i = 0; i < iters; i++) { Preference temp; for (int j = 0; j < numSeqs; j++) { pref[i].push_back(temp); } } } catch(exception& e) { m->errorOut(e, "Bellerophon", "Bellerophon"); exit(1); } } //*************************************************************************************************************** int Bellerophon::print(ostream& out, ostream& outAcc, string s) { try { int above1 = 0; //sorted "best" preference scores for all seqs vector best = getBestPref(); if (m->control_pressed) { return numSeqs; } out << "Name\tScore\tLeft\tRight\t" << endl; //output prefenence structure to .chimeras file for (int i = 0; i < best.size(); i++) { if (m->control_pressed) { return numSeqs; } out << best[i].name << '\t' << setprecision(3) << best[i].score << '\t' << best[i].leftParent << '\t' << best[i].rightParent << endl; //calc # of seqs with preference above 1.0 if (best[i].score > 1.0) { above1++; outAcc << best[i].name << endl; m->mothurOut(best[i].name + " is a suspected chimera at breakpoint " + toString(best[i].midpoint)); m->mothurOutEndLine(); m->mothurOut("It's score is " + toString(best[i].score) + " with suspected left parent " + best[i].leftParent + " and right parent " + best[i].rightParent); m->mothurOutEndLine(); } } //output results to screen m->mothurOutEndLine(); m->mothurOut("Sequence with preference score above 1.0: " + toString(above1)); m->mothurOutEndLine(); int 
spot; spot = best.size()-1; m->mothurOut("Minimum:\t" + toString(best[spot].score)); m->mothurOutEndLine(); spot = best.size() * 0.975; m->mothurOut("2.5%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine(); spot = best.size() * 0.75; m->mothurOut("25%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine(); spot = best.size() * 0.50; m->mothurOut("Median: \t" + toString(best[spot].score)); m->mothurOutEndLine(); spot = best.size() * 0.25; m->mothurOut("75%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine(); spot = best.size() * 0.025; m->mothurOut("97.5%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine(); spot = 0; m->mothurOut("Maximum:\t" + toString(best[spot].score)); m->mothurOutEndLine(); return numSeqs; } catch(exception& e) { m->errorOut(e, "Bellerophon", "print"); exit(1); } } #ifdef USE_MPI //*************************************************************************************************************** int Bellerophon::print(MPI_File& out, MPI_File& outAcc, string s) { try { int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { string outString = ""; //sorted "best" preference scores for all seqs vector best = getBestPref(); int above1 = 0; int ninetyfive = best.size() * 0.05; float cutoffScore = best[ninetyfive].score; if (m->control_pressed) { return numSeqs; } outString = "Name\tScore\tLeft\tRight\n"; MPI_Status status; int olength = outString.length(); char* buf5 = new char[olength]; memcpy(buf5, outString.c_str(), olength); MPI_File_write_shared(out, buf5, olength, MPI_CHAR, &status); delete buf5; //output prefenence structure to .chimeras file for (int i = 0; i < best.size(); i++) { if (m->control_pressed) { return numSeqs; } outString = best[i].name + "\t" + toString(best[i].score) + "\t" + best[i].leftParent + "\t" + best[i].rightParent + "\n"; MPI_Status status; int length = outString.length(); char* buf2 = new char[length]; memcpy(buf2, outString.c_str(), length); 
                MPI_File_write_shared(out, buf2, length, MPI_CHAR, &status);
                delete[] buf2; //allocated with new[], so release with delete[]

                //calc # of seqs with preference above 95%tile
                if (best[i].score >= cutoffScore) {
                    above1++;
                    string outAccString = "";
                    outAccString += best[i].name + "\n";

                    MPI_Status statusAcc;
                    length = outAccString.length();
                    char* buf = new char[length];
                    memcpy(buf, outAccString.c_str(), length);

                    MPI_File_write_shared(outAcc, buf, length, MPI_CHAR, &statusAcc);
                    delete[] buf; //allocated with new[], so release with delete[]

                    cout << best[i].name << " is a suspected chimera at breakpoint " << toString(best[i].midpoint) << endl;
                    cout << "Its score is " << toString(best[i].score) << " with suspected left parent " << best[i].leftParent << " and right parent " << best[i].rightParent << endl;
                }
            }

            //output results to screen
            m->mothurOutEndLine();
            m->mothurOut("Sequence with preference score above " + toString(cutoffScore) + ": " + toString(above1)); m->mothurOutEndLine();
            int spot;
            //best is sorted from highest to lowest score, so low indexes hold the high percentiles
            spot = best.size()-1;
            m->mothurOut("Minimum:\t" + toString(best[spot].score)); m->mothurOutEndLine();
            spot = best.size() * 0.975;
            m->mothurOut("2.5%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine();
            spot = best.size() * 0.75;
            m->mothurOut("25%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine();
            spot = best.size() * 0.50;
            m->mothurOut("Median: \t" + toString(best[spot].score)); m->mothurOutEndLine();
            spot = best.size() * 0.25;
            m->mothurOut("75%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine();
            spot = best.size() * 0.025;
            m->mothurOut("97.5%-tile:\t" + toString(best[spot].score)); m->mothurOutEndLine();
            spot = 0;
            m->mothurOut("Maximum:\t" + toString(best[spot].score)); m->mothurOutEndLine();
        }

        return numSeqs;
    }
    catch(exception& e) {
        m->errorOut(e, "Bellerophon", "print");
        exit(1);
    }
}
#endif
//********************************************************************************************************************
//sorts highest score to lowest
inline bool comparePref(Preference left, Preference right){
    return (left.score > right.score);
}
//***************************************************************************************************************
int Bellerophon::getChimeras() {
    try {
        //create breaking points
        vector<int> midpoints;
        midpoints.resize(iters, window);
        for (int i = 1; i < iters; i++) { midpoints[i] = midpoints[i-1] + increment; }

#ifdef USE_MPI
        int pid, numSeqsPerProcessor;

        MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are
        MPI_Comm_size(MPI_COMM_WORLD, &processors);

        numSeqsPerProcessor = iters / processors;

        //each process hits this only once
        unsigned long long startPos = pid * numSeqsPerProcessor;
        //the last process also takes the windows left over by the integer division
        if(pid == processors - 1){ numSeqsPerProcessor = iters - pid * numSeqsPerProcessor; }
        lines.push_back(linePair(startPos, numSeqsPerProcessor));

        //fill pref with scores
        driverChimeras(midpoints, lines[0]);

        if (m->control_pressed) { return 0; }

        //each process must send its parts back to pid 0
        if (pid == 0) {

            //receive results
            for (int j = 1; j < processors; j++) {

                vector<string> MPIBestSend;
                for (int i = 0; i < numSeqs; i++) {

                    if (m->control_pressed) { return 0; }

                    MPI_Status status;
                    //receive string
                    int length;
                    MPI_Recv(&length, 1, MPI_INT, j, 2001, MPI_COMM_WORLD, &status);

                    char* buf = new char[length];
                    MPI_Recv(&buf[0], length, MPI_CHAR, j, 2001, MPI_COMM_WORLD, &status);

                    //buf is not null-terminated, so build the string from the explicit length
                    string temp(buf, length);
                    delete[] buf;

                    MPIBestSend.push_back(temp);
                }

                fillPref(j, MPIBestSend);

                if (m->control_pressed) { return 0; }
            }

        }else {
            //takes best window for each sequence and turns Preference to string that can be parsed by pid 0.
            //played with this a bit, but it may be better to try user-defined datatypes with set string lengths??
vector MPIBestSend = getBestWindow(lines[0]); pref.clear(); //send your result to parent for (int i = 0; i < numSeqs; i++) { if (m->control_pressed) { return 0; } int bestLength = MPIBestSend[i].length(); char* buf = new char[bestLength]; memcpy(buf, MPIBestSend[i].c_str(), bestLength); MPI_Send(&bestLength, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD); MPI_Send(buf, bestLength, MPI_CHAR, 0, 2001, MPI_COMM_WORLD); delete buf; } MPIBestSend.clear(); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else //divide breakpoints between processors #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if(processors == 1){ lines.push_back(linePair(0, iters)); //fill pref with scores driverChimeras(midpoints, lines[0]); }else{ int numSeqsPerProcessor = iters / processors; for (int i = 0; i < processors; i++) { unsigned long long startPos = i * numSeqsPerProcessor; if(i == processors - 1){ numSeqsPerProcessor = iters - i * numSeqsPerProcessor; } lines.push_back(linePair(startPos, numSeqsPerProcessor)); } createProcesses(midpoints); } #else lines.push_back(linePair(0, iters)); ///fill pref with scores driverChimeras(midpoints, lines[0]); #endif #endif return 0; } catch(exception& e) { m->errorOut(e, "Bellerophon", "getChimeras"); exit(1); } } /**************************************************************************************************/ int Bellerophon::createProcesses(vector mid) { try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 0; int exitCommand = 1; vector processIDS; bool recalc = false; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ exitCommand = driverChimeras(mid, lines[process]); string tempOut = outputDir + 
toString(m->mothurGetpid(process)) + ".temp"; writePrefs(tempOut, lines[process]); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } m->control_pressed = false; //wait to die for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); int numSeqsPerProcessor = iters / processors; for (int i = 0; i < processors; i++) { unsigned long long startPos = i * numSeqsPerProcessor; if(i == processors - 1){ numSeqsPerProcessor = iters - i * numSeqsPerProcessor; } lines.push_back(linePair(startPos, numSeqsPerProcessor)); } processIDS.clear(); process = 0; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ exitCommand = driverChimeras(mid, lines[process]); string tempOut = outputDir + toString(m->mothurGetpid(process)) + ".temp"; writePrefs(tempOut, lines[process]); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //force parent to wait until all the processes are done for (int i=0;ierrorOut(e, "AlignCommand", "createProcesses"); exit(1); } } //*************************************************************************************************************** int Bellerophon::driverChimeras(vector midpoints, linePair line) { try { for (int h = line.start; h < (line.start + line.num); h++) { count = h; int midpoint = midpoints[h]; //initialize pref[count] for (int i = 0; i < numSeqs; i++ ) { pref[count][i].name = seqs[i]->getName(); 
pref[count][i].midpoint = midpoint; } if (m->control_pressed) { return 0; } //create 2 vectors of sequences, 1 for left side and one for right side vector left; vector right; for (int i = 0; i < seqs.size(); i++) { if (m->control_pressed) { return 0; } //cout << "midpoint = " << midpoint << "\twindow = " << window << endl; //cout << "whole = " << seqs[i]->getAligned().length() << endl; //save left side string seqLeft = seqs[i]->getAligned().substr(midpoint-window, window); Sequence tempLeft; tempLeft.setName(seqs[i]->getName()); tempLeft.setAligned(seqLeft); left.push_back(tempLeft); //cout << "left = " << tempLeft.getAligned().length() << endl; //save right side string seqRight = seqs[i]->getAligned().substr(midpoint, window); Sequence tempRight; tempRight.setName(seqs[i]->getName()); tempRight.setAligned(seqRight); right.push_back(tempRight); //cout << "right = " << seqRight.length() << endl; } //this should be parallelized //perference = sum of (| distance of my left to sequence j's left - distance of my right to sequence j's right | ) //create a matrix containing the distance from left to left and right to right //calculate distances SparseMatrix* SparseLeft = new SparseMatrix(); SparseMatrix* SparseRight = new SparseMatrix(); createSparseMatrix(0, left.size(), SparseLeft, left); if (m->control_pressed) { delete SparseLeft; delete SparseRight; return 0; } createSparseMatrix(0, right.size(), SparseRight, right); if (m->control_pressed) { delete SparseLeft; delete SparseRight; return 0; } left.clear(); right.clear(); vector distMapRight; vector distMapLeft; // Create a data structure to quickly access the distance information. //this is from thallingers reimplementation on get.oturep // It consists of a vector of distance maps, where each map contains // all distances of a certain sequence. 
            // Vector and maps are accessed
            // via the index of a sequence in the distance matrix
            distMapRight = vector<SeqMap>(numSeqs);
            distMapLeft = vector<SeqMap>(numSeqs);

            //cout << "left" << endl << endl;
            for (MatData currentCell = SparseLeft->begin(); currentCell != SparseLeft->end(); currentCell++) {
                distMapLeft[currentCell->row][currentCell->column] = currentCell->dist;
                if (m->control_pressed) { delete SparseLeft; delete SparseRight; return 0; }
                //cout << " i = " << currentCell->row << " j = " << currentCell->column << " dist = " << currentCell->dist << endl;
            }
            //cout << "right" << endl << endl;
            for (MatData currentCell = SparseRight->begin(); currentCell != SparseRight->end(); currentCell++) {
                distMapRight[currentCell->row][currentCell->column] = currentCell->dist;
                if (m->control_pressed) { delete SparseLeft; delete SparseRight; return 0; }
                //cout << " i = " << currentCell->row << " j = " << currentCell->column << " dist = " << currentCell->dist << endl;
            }

            delete SparseLeft;
            delete SparseRight;

            //fill preference structure
            generatePreferences(distMapLeft, distMapRight, midpoint);

            if (m->control_pressed) { return 0; }

            //report progress
            if((h+1) % 10 == 0){ m->mothurOutJustToScreen("Processing sliding window: " + toString(h+1) + "\n") ; }
        }

        //report progress
        if((line.start + line.num) % 10 != 0){ m->mothurOutJustToScreen("Processing sliding window: " + toString(line.start + line.num) + "\n") ; }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Bellerophon", "driverChimeras");
        exit(1);
    }
}
/***************************************************************************************************************/
int Bellerophon::createSparseMatrix(int startSeq, int endSeq, SparseMatrix* sparse, vector<Sequence> s){
    try {
        //fill the lower triangle: one cell for every pair (i, j) with j < i
        for(int i=startSeq; i<endSeq; i++){
            for(int j=0;j<i;j++){
                if (m->control_pressed) { return 0; }

                distCalculator->calcDist(s[i], s[j]);
                float dist = distCalculator->getDist();

                PCell temp(i, j, dist);
                sparse->addCell(temp);
            }
        }
        return 1;
    }
    catch(exception& e) {
        m->errorOut(e, "Bellerophon", "createSparseMatrix");
        exit(1);
    }
}
/***************************************************************************************************************/ int Bellerophon::generatePreferences(vector left, vector right, int mid){ try { SeqMap::iterator itR; SeqMap::iterator itL; for (int i = 0; i < left.size(); i++) { SeqMap currentLeft = left[i]; //example i = 3; currentLeft is a map of 0 to the distance of sequence 3 to sequence 0, // 1 to the distance of sequence 3 to sequence 1, // 2 to the distance of sequence 3 to sequence 2. SeqMap currentRight = right[i]; // same as left but with distances on the right side. for (int j = 0; j < i; j++) { if (m->control_pressed) { return 0; } itL = currentLeft.find(j); itR = currentRight.find(j); //cout << " i = " << i << " j = " << j << " distLeft = " << itL->second << endl; //cout << " i = " << i << " j = " << j << " distright = " << itR->second << endl; //if you can find this entry update the preferences if ((itL != currentLeft.end()) && (itR != currentRight.end())) { if (!correction) { pref[count][i].score += abs((itL->second - itR->second)); pref[count][j].score += abs((itL->second - itR->second)); //cout << "left " << i << " " << j << " = " << itL->second << " right " << i << " " << j << " = " << itR->second << endl; //cout << "abs = " << abs((itL->second - itR->second)) << endl; //cout << i << " score = " << pref[i].score[1] << endl; //cout << j << " score = " << pref[j].score[1] << endl; }else { pref[count][i].score += abs((sqrt(itL->second) - sqrt(itR->second))); pref[count][j].score += abs((sqrt(itL->second) - sqrt(itR->second))); //cout << "left " << i << " " << j << " = " << itL->second << " right " << i << " " << j << " = " << itR->second << endl; //cout << "abs = " << abs((sqrt(itL->second) - sqrt(itR->second))) << endl; //cout << i << " score = " << pref[i].score[1] << endl; //cout << j << " score = " << pref[j].score[1] << endl; } //cout << "pref[" << i << "].closestLeft[1] = " << pref[i].closestLeft[1] << " parent = " << pref[i].leftParent[1] << 
endl; //are you the closest left sequence if (itL->second < pref[count][i].closestLeft) { pref[count][i].closestLeft = itL->second; pref[count][i].leftParent = seqs[j]->getName(); //cout << "updating closest left to " << pref[i].leftParent[1] << endl; } //cout << "pref[" << j << "].closestLeft[1] = " << pref[j].closestLeft[1] << " parent = " << pref[j].leftParent[1] << endl; if (itL->second < pref[count][j].closestLeft) { pref[count][j].closestLeft = itL->second; pref[count][j].leftParent = seqs[i]->getName(); //cout << "updating closest left to " << pref[j].leftParent[1] << endl; } //are you the closest right sequence if (itR->second < pref[count][i].closestRight) { pref[count][i].closestRight = itR->second; pref[count][i].rightParent = seqs[j]->getName(); } if (itR->second < pref[count][j].closestRight) { pref[count][j].closestRight = itR->second; pref[count][j].rightParent = seqs[i]->getName(); } } } } return 1; } catch(exception& e) { m->errorOut(e, "Bellerophon", "generatePreferences"); exit(1); } } /**************************************************************************************************/ vector Bellerophon::getBestPref() { try { vector best; //for each sequence for (int i = 0; i < numSeqs; i++) { //set best pref score to first one Preference temp = pref[0][i]; if (m->control_pressed) { return best; } //for each window for (int j = 1; j < pref.size(); j++) { //is this a better score if (pref[j][i].score > temp.score) { temp = pref[j][i]; } } best.push_back(temp); } //rank preference score to eachother float dme = 0.0; float expectedPercent = 1 / (float) (best.size()); for (int i = 0; i < best.size(); i++) { dme += best[i].score; } for (int i = 0; i < best.size(); i++) { if (m->control_pressed) { return best; } //gives the actual percentage of the dme this seq adds best[i].score = best[i].score / dme; //how much higher or lower is this than expected best[i].score = best[i].score / expectedPercent; } //sort Preferences highest to lowest 
        sort(best.begin(), best.end(), comparePref);

        return best;
    }
    catch(exception& e) {
        m->errorOut(e, "Bellerophon", "getBestPref");
        exit(1);
    }
}
/**************************************************************************************************/
int Bellerophon::writePrefs(string file, linePair tempLine) {
    try {
        ofstream outTemp;
        m->openOutputFile(file, outTemp);

        //lets you know what part of the pref matrix you are writing
        outTemp << tempLine.start << '\t' << tempLine.num << endl;

        for (int i = tempLine.start; i < (tempLine.start + tempLine.num); i++) {
            for (int j = 0; j < numSeqs; j++) {
                if (m->control_pressed) { outTemp.close(); m->mothurRemove(file); return 0; }

                outTemp << pref[i][j].name << '\t' << pref[i][j].leftParent << '\t' << pref[i][j].rightParent << '\t';
                outTemp << pref[i][j].score << '\t' << pref[i][j].closestLeft << '\t' << pref[i][j].closestRight << '\t' << pref[i][j].midpoint << endl;
            }
        }

        outTemp.close();
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Bellerophon", "writePrefs");
        exit(1);
    }
}
/**************************************************************************************************/
int Bellerophon::readPrefs(string file) {
    try {
        ifstream inTemp;
        m->openInputFile(file, inTemp);

        int start, num;

        //lets you know what part of the pref matrix you are reading
        inTemp >> start >> num; m->gobble(inTemp);

        //writePrefs stores rows start..start+num-1, so read back the same range
        for (int i = start; i < (start + num); i++) {
            for (int j = 0; j < numSeqs; j++) {
                if (m->control_pressed) { inTemp.close(); m->mothurRemove(file); return 0; }

                inTemp >> pref[i][j].name >> pref[i][j].leftParent >> pref[i][j].rightParent;
                inTemp >> pref[i][j].score >> pref[i][j].closestLeft >> pref[i][j].closestRight >> pref[i][j].midpoint;
                m->gobble(inTemp);
            }
        }

        inTemp.close();
        m->mothurRemove(file);
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Bellerophon", "readPrefs");
        exit(1);
    }
}
/**************************************************************************************************/
vector<string> Bellerophon::getBestWindow(linePair line) {
    try {
        vector<string> best;
//for each sequence for (int i = 0; i < numSeqs; i++) { //set best pref score to first one Preference temp = pref[line.start][i]; if (m->control_pressed) { return best; } //for each window for (int j = (line.start+1); j < (line.start+line.num); j++) { //is this a better score if (pref[j][i].score > temp.score) { temp = pref[j][i]; } } string tempString = temp.name + '\t' + temp.leftParent + '\t' + temp.rightParent + '\t' + toString(temp.score); best.push_back(tempString); } return best; } catch(exception& e) { m->errorOut(e, "Bellerophon", "getBestWindow"); exit(1); } } /**************************************************************************************************/ int Bellerophon::fillPref(int process, vector& best) { try { //figure out where you start so you can put the best scores there int numSeqsPerProcessor = iters / processors; int start = process * numSeqsPerProcessor; for (int i = 0; i < best.size(); i++) { if (m->control_pressed) { return 0; } istringstream iss (best[i],istringstream::in); string tempScore; iss >> pref[start][i].name >> pref[start][i].leftParent >> pref[start][i].rightParent >> tempScore; convert(tempScore, pref[start][i].score); } return 0; } catch(exception& e) { m->errorOut(e, "Bellerophon", "fillPref"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/chimera/bellerophon.h000066400000000000000000000034461255543666200205370ustar00rootroot00000000000000#ifndef BELLEROPHON_H #define BELLEROPHON_H /* * bellerophon.h * Mothur * * Created by Sarah Westcott on 7/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 *
 */

#include "chimera.h"
#include "sparsematrix.hpp"
#include "sequence.hpp"
#include "dist.h"

typedef list<PCell>::iterator MatData;
typedef map<int, float> SeqMap; //maps a sequence index to all distances for that sequence

/***********************************************************/
class Bellerophon : public Chimera {

    public:
        Bellerophon(string, bool, bool, int, int, int, string); //fastafile, filter, correction, window, increment, processors, outputDir
        ~Bellerophon() { delete distCalculator; for (int i = 0; i < seqs.size(); i++) { delete seqs[i]; } seqs.clear(); }

        int getChimeras();
        int print(ostream&, ostream&, string);

#ifdef USE_MPI
        int print(MPI_File&, MPI_File&, string);
#endif

    private:
        struct linePair {
            unsigned long long start;
            int num;
            linePair(unsigned long long i, int j) : start(i), num(j) {}
        };

        vector<linePair> lines;

        Dist* distCalculator;
        vector<Sequence*> seqs;
        vector< vector<Preference> > pref; //pref[0] = preference scores for all seqs in window 0.
        string fastafile;
        int iters, count, window, increment, numSeqs, processors; //iters = number of windows
        bool correction;

        int generatePreferences(vector<SeqMap>, vector<SeqMap>, int);
        int createSparseMatrix(int, int, SparseMatrix*, vector<Sequence>);
        vector<Preference> getBestPref();
        int driverChimeras(vector<int>, linePair);
        int createProcesses(vector<int>);
        int writePrefs(string, linePair);
        int readPrefs(string);
        vector<string> getBestWindow(linePair line);
        int fillPref(int, vector<string>&);
};
/***********************************************************/

#endif
mothur-1.36.1/source/chimera/ccode.cpp
/*
 *  ccode.cpp
 *  Mothur
 *
 *  Created by westcott on 8/24/09.
 *  Copyright 2009 Schloss Lab. All rights reserved.
 *
 */

#include "ccode.h"
#include "ignoregaps.h"
#include "eachgapdist.h"

//***************************************************************************************************************
Ccode::Ccode(string filename, string temp, bool f, string mask, int win, int numW, string o) : Chimera() {
	try {
		fastafile = filename;
		outputDir = o;
		templateFileName = temp;
		templateSeqs = readSeqs(temp);
		setMask(mask);
		filter = f;
		window = win;
		numWanted = numW;
		
		distCalc = new eachGapDist();
		decalc = new DeCalculator();
		
		mapInfo = outputDir + m->getRootName(m->getSimpleName(fastafile)) + "mapinfo";
		
		#ifdef USE_MPI
			
			//char* inFileName = new char[mapInfo.length()];
			//memcpy(inFileName, mapInfo.c_str(), mapInfo.length());
			
			char inFileName[1024];
			strcpy(inFileName, mapInfo.c_str());
			
			int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY;
			
			MPI_File_open(MPI_COMM_WORLD, inFileName, outMode, MPI_INFO_NULL, &outMap);  //comm, filename, mode, info, filepointer
			
			//delete inFileName;
			
			int pid;
			MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are
			
			if (pid == 0) {
				string outString = "Place in masked, filtered and trimmed sequence\tPlace in original alignment\n";
				
				MPI_Status status;
				int length = outString.length();
				char* buf2 = new char[length];
				memcpy(buf2, outString.c_str(), length);
				
				MPI_File_write_shared(outMap, buf2, length, MPI_CHAR, &status);
				delete [] buf2;  //allocated with new[], so it must be released with delete[]
			}
		#else
			
			ofstream out2;
			m->openOutputFile(mapInfo, out2);
			
			out2 << "Place in masked, filtered and trimmed sequence\tPlace in original alignment" << endl;
			out2.close();
		#endif
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "Ccode");
		exit(1);
	}
}
//***************************************************************************************************************
Ccode::~Ccode() {
	delete distCalc;
	delete decalc;
	
	#ifdef USE_MPI
		MPI_File_close(&outMap);
	#endif
}
//***************************************************************************************************************
Sequence Ccode::print(ostream& out, ostream& outAcc) {
	try {
		
		ofstream out2;
m->openOutputFileAppend(mapInfo, out2); out2 << querySeq->getName() << endl; for (it = spotMap.begin(); it!= spotMap.end(); it++) { out2 << it->first << '\t' << it->second << endl; } out2.close(); out << querySeq->getName() << endl << endl << "Reference sequences used and distance to query:" << endl; for (int j = 0; j < closest.size(); j++) { out << closest[j].seq->getName() << '\t' << closest[j].dist << endl; } out << endl << endl; //for each window //window mapping info. out << "Mapping information: "; //you mask and did not filter if ((seqMask != "") && (!filter)) { out << "mask and trim."; } //you filtered and did not mask if ((seqMask == "") && (filter)) { out << "filter and trim."; } //you masked and filtered if ((seqMask != "") && (filter)) { out << "mask, filter and trim."; } out << endl << "Window\tStartPos\tEndPos" << endl; it = trim.begin(); for (int k = 0; k < windows.size()-1; k++) { out << k+1 << '\t' << spotMap[windows[k]-it->first] << '\t' << spotMap[windows[k]-it->first+windowSizes] << endl; } out << windows.size() << '\t' << spotMap[windows[windows.size()-1]-it->first] << '\t' << spotMap[it->second-it->first-1] << endl; out << endl; out << "Window\tAvgQ\t(sdQ)\tAvgR\t(sdR)\tRatio\tAnova" << endl; for (int k = 0; k < windows.size(); k++) { float ds = averageQuery[k] / averageRef[k]; out << k+1 << '\t' << averageQuery[k] << '\t' << sdQuery[k] << '\t' << averageRef[k] << '\t'<< sdRef[k] << '\t' << ds << '\t' << anova[k] << endl; } out << endl; //varRef //varQuery /* F test for differences among variances. 
		 * varQuery is expected to be higher than or similar to varRef */
		//float fs = varQuery[query] / varRef[query];	/* F-Snedecor, test for differences of variances */
		
		bool results = false;
		
		//confidence limit, t - Student, anova
		out << "Window\tConfidenceLimit\tt-Student\tAnova" << endl;
		
		for (int k = 0; k < windows.size(); k++) {
			string temp = "";
			if (isChimericConfidence[k]) {  temp += "*\t"; }
			else { temp += "\t"; }
			
			if (isChimericTStudent[k]) {  temp += "*\t"; }
			else { temp += "\t"; }
			
			if (isChimericANOVA[k]) {  temp += "*\t"; }
			else { temp += "\t"; }
			
			out << k+1 << '\t' << temp << endl;
			
			//a window is reported only when all three tests flag it
			if (temp == "*\t*\t*\t") {  results = true;  }
		}
		out << endl;
		
		if (results) {
			m->mothurOut(querySeq->getName() + " was found to have at least one chimeric window."); m->mothurOutEndLine();
			outAcc << querySeq->getName() << endl;
		}
		
		//free memory
		for (int i = 0; i < closest.size(); i++) {  delete closest[i].seq;  }
		
		return *querySeq;
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "print");
		exit(1);
	}
}
#ifdef USE_MPI
//***************************************************************************************************************
Sequence Ccode::print(MPI_File& out, MPI_File& outAcc) {
	try {
		
		string outMapString = "";
		
		outMapString += querySeq->getName() + "\n";
		for (it = spotMap.begin(); it!= spotMap.end(); it++) {
			outMapString += toString(it->first) + "\t" + toString(it->second) + "\n";
		}
		printMapping(outMapString);
		outMapString = "";
		
		string outString = "";
		string outAccString = "";
		
		outString += querySeq->getName() + "\n\nReference sequences used and distance to query:\n";
		
		for (int j = 0; j < closest.size(); j++) {
			outString += closest[j].seq->getName() + "\t" + toString(closest[j].dist) + "\n";
		}
		outString += "\n\nMapping information: ";
		
		//for each window
		//window mapping info.
//you mask and did not filter if ((seqMask != "") && (!filter)) { outString += "mask and trim."; } //you filtered and did not mask if ((seqMask == "") && (filter)) { outString += "filter and trim."; } //you masked and filtered if ((seqMask != "") && (filter)) { outString += "mask, filter and trim."; } outString += "\nWindow\tStartPos\tEndPos\n"; it = trim.begin(); for (int k = 0; k < windows.size()-1; k++) { outString += toString(k+1) + "\t" + toString(spotMap[windows[k]-it->first]) + "\t" + toString(spotMap[windows[k]-it->first+windowSizes]) + "\n"; } outString += toString(windows.size()) + "\t" + toString(spotMap[windows[windows.size()-1]-it->first]) + "\t" + toString(spotMap[it->second-it->first-1]) + "\n\n"; outString += "Window\tAvgQ\t(sdQ)\tAvgR\t(sdR)\tRatio\tAnova\n"; for (int k = 0; k < windows.size(); k++) { float ds = averageQuery[k] / averageRef[k]; outString += toString(k+1) + "\t" + toString(averageQuery[k]) + "\t" + toString(sdQuery[k]) + "\t" + toString(averageRef[k]) + "\t" + toString(sdRef[k]) + "\t" + toString(ds) + "\t" + toString(anova[k]) + "\n"; } //varRef //varQuery /* F test for differences among variances. 
		 * varQuery is expected to be higher than or similar to varRef */
		//float fs = varQuery[query] / varRef[query];	/* F-Snedecor, test for differences of variances */
		
		bool results = false;
		
		//confidence limit, t - Student, anova
		outString += "\nWindow\tConfidenceLimit\tt-Student\tAnova\n";
		
		for (int k = 0; k < windows.size(); k++) {
			string temp = "";
			if (isChimericConfidence[k]) {  temp += "*\t"; }
			else { temp += "\t"; }
			
			if (isChimericTStudent[k]) {  temp += "*\t"; }
			else { temp += "\t"; }
			
			if (isChimericANOVA[k]) {  temp += "*\t"; }
			else { temp += "\t"; }
			
			outString += toString(k+1) + "\t" + temp + "\n";
			
			//a window is reported only when all three tests flag it
			if (temp == "*\t*\t*\t") {  results = true;  }
		}
		outString += "\n";
		
		MPI_Status status;
		int length = outString.length();
		char* buf2 = new char[length];
		memcpy(buf2, outString.c_str(), length);
		
		MPI_File_write_shared(out, buf2, length, MPI_CHAR, &status);
		delete [] buf2;  //allocated with new[], so it must be released with delete[]
		
		if (results) {
			m->mothurOut(querySeq->getName() + " was found to have at least one chimeric window."); m->mothurOutEndLine();
			outAccString += querySeq->getName() + "\n";
			
			MPI_Status statusAcc;
			length = outAccString.length();
			char* buf = new char[length];
			memcpy(buf, outAccString.c_str(), length);
			
			MPI_File_write_shared(outAcc, buf, length, MPI_CHAR, &statusAcc);
			delete [] buf;
		}
		
		//free memory
		for (int i = 0; i < closest.size(); i++) {  delete closest[i].seq;  }
		
		return *querySeq;
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "print");
		exit(1);
	}
}
//***************************************************************************************************************
int Ccode::printMapping(string& output) {
	try {
		MPI_Status status;
		int length = output.length();
		char* buf = new char[length];
		memcpy(buf, output.c_str(), length);
		
		MPI_File_write_shared(outMap, buf, length, MPI_CHAR, &status);
		delete [] buf;
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "printMapping");
		exit(1);
	}
}
#endif
//***************************************************************************************************************
int
Ccode::getChimeras(Sequence* query) { try { closest.clear(); refCombo = 0; sumRef.clear(); varRef.clear(); varQuery.clear(); sdRef.clear(); sdQuery.clear(); sumQuery.clear(); sumSquaredRef.clear(); sumSquaredQuery.clear(); averageRef.clear(); averageQuery.clear(); anova.clear(); isChimericConfidence.clear(); isChimericTStudent.clear(); isChimericANOVA.clear(); trim.clear(); spotMap.clear(); windowSizes = window; windows.clear(); querySeq = query; //find closest matches to query closest = findClosest(query, numWanted); if (m->control_pressed) { return 0; } //initialize spotMap for (int i = 0; i < query->getAligned().length(); i++) { spotMap[i] = i; } //mask sequences if the user wants to if (seqMask != "") { decalc->setMask(seqMask); decalc->runMask(query); //mask closest for (int i = 0; i < closest.size(); i++) { decalc->runMask(closest[i].seq); } spotMap = decalc->getMaskMap(); } if (filter) { vector temp; for (int i = 0; i < closest.size(); i++) { temp.push_back(closest[i].seq); } temp.push_back(query); createFilter(temp, 0.5); for (int i = 0; i < temp.size(); i++) { if (m->control_pressed) { return 0; } runFilter(temp[i]); } //update spotMap map newMap; int spot = 0; for (int i = 0; i < filterString.length(); i++) { if (filterString[i] == '1') { //add to newMap newMap[spot] = spotMap[i]; spot++; } } spotMap = newMap; } //trim sequences - this follows ccodes remove_extra_gaps trimSequences(query); if (m->control_pressed) { return 0; } //windows are equivalent to words - ccode paper recommends windows are between 5% and 20% on alignment length(). 
		//Our default will be 10% and we will warn if the user tries to use a window above or below these recommendations
		windows = findWindows();
		
		if (m->control_pressed) {  return 0;  }
		
		//remove sequences that are more than 20% different or less than 0.5% different - may want to allow user to specify this later
		removeBadReferenceSeqs(closest);
		
		if (m->control_pressed) {  return 0;  }
		
		//find the averages for each query's references
		getAverageRef(closest);  //fills sumRef, averageRef, sumSquaredRef and refCombo.
		getAverageQuery(closest, query);  //fills sumQuery, averageQuery, sumSquaredQuery.
		
		if (m->control_pressed) {  return 0;  }
		
		//find the variance among each query's references
		findVarianceRef();  //fills varRef and sdRef also sets minimum error rate to 0.001 to avoid divide by 0.
		
		if (m->control_pressed) {  return 0;  }
		
		//find the variance for the query
		findVarianceQuery();  //fills varQuery and sdQuery also sets minimum error rate to 0.001 to avoid divide by 0.
		
		if (m->control_pressed) {  return 0;  }
		
		determineChimeras();  //fills anova, isChimericConfidence, isChimericTStudent and isChimericANOVA.
		
		if (m->control_pressed) {  return 0;  }
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "getChimeras");
		exit(1);
	}
}
/***************************************************************************************************************/
//the ccode algorithm says it does this to "Removes the initial and final gaps to avoid biases due to incomplete sequences."
void Ccode::trimSequences(Sequence* query) { try { int frontPos = 0; //should contain first position in all seqs that is not a gap character int rearPos = query->getAligned().length(); //********find first position in closest seqs that is a non gap character***********// //find first position all query seqs that is a non gap character for (int i = 0; i < closest.size(); i++) { string aligned = closest[i].seq->getAligned(); int pos = 0; //find first spot in this seq for (int j = 0; j < aligned.length(); j++) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos > frontPos) { frontPos = pos; } } //find first position all querySeq[query] that is a non gap character string aligned = query->getAligned(); int pos = 0; //find first spot in this seq for (int j = 0; j < aligned.length(); j++) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos > frontPos) { frontPos = pos; } //********find last position in closest seqs that is a non gap character***********// for (int i = 0; i < closest.size(); i++) { string aligned = closest[i].seq->getAligned(); int pos = aligned.length(); //find first spot in this seq for (int j = aligned.length()-1; j >= 0; j--) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos < rearPos) { rearPos = pos; } } //find last position all querySeqs[query] that is a non gap character aligned = query->getAligned(); pos = aligned.length(); //find first spot in this seq for (int j = aligned.length()-1; j >= 0; j--) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos < rearPos) { rearPos = pos; } //check to make sure that is not whole seq if ((rearPos - frontPos - 1) <= 0) { m->mothurOut("Error, when I trim your sequences, the entire sequence is trimmed."); m->mothurOutEndLine(); exit(1); } map tempTrim; tempTrim[frontPos] = rearPos; //save trimmed locations trim = tempTrim; //update 
		//spotMap
		map<int, int> newMap;
		int spot = 0;
		
		for (int i = frontPos; i < rearPos; i++) {
			//add to newMap
			newMap[spot] = spotMap[i];
			spot++;
		}
		spotMap = newMap;
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "trimSequences");
		exit(1);
	}
}
/***************************************************************************************************************/
vector<int> Ccode::findWindows() {
	try {
		
		vector<int> win;
		it = trim.begin();
		
		int length = it->second - it->first;
		
		//default is wanted = 10% of total length
		if (windowSizes > length) {
			m->mothurOut("You have selected a window larger than your sequence length after all filters, masks and trims have been done. I will use the default 10% of sequence length."); m->mothurOutEndLine();
			windowSizes = length / 10;
		}else if (windowSizes == 0) { windowSizes = length / 10;  }
		else if (windowSizes > (length * 0.20)) {
			m->mothurOut("You have selected a window that is larger than 20% of your sequence length. This is not recommended, but I will continue anyway."); m->mothurOutEndLine();
		}else if (windowSizes < (length * 0.05)) {
			m->mothurOut("You have selected a window that is smaller than 5% of your sequence length. This is not recommended, but I will continue anyway."); m->mothurOutEndLine();
		}
		
		//save starting points of each window
		for (int m = it->first; m < (it->second-windowSizes); m+=windowSizes) {  win.push_back(m);  }
		
		//save last window
		if (win[win.size()-1] < (it->first+length)) {
			win.push_back(win[win.size()-1]+windowSizes); // ex.
string length is 115, window is 25, without this you would get 0, 25, 50, 75 } //with this you would get 1,25,50,75,100 return win; } catch(exception& e) { m->errorOut(e, "Ccode", "findWindows"); exit(1); } } //*************************************************************************************************************** int Ccode::getDiff(string seqA, string seqB) { try { int numDiff = 0; for (int i = 0; i < seqA.length(); i++) { //if you are both not gaps //if (isalpha(seqA[i]) && isalpha(seqA[i])) { //are you different if (seqA[i] != seqB[i]) { int ok; /* ok=1 means equivalent base. Checks for degenerate bases */ /* the char in base_a and base_b have been checked and they are different */ if ((seqA[i] == 'N') && (seqB[i] != '-')) ok = 1; else if ((seqB[i] == 'N') && (seqA[i] != '-')) ok = 1; else if ((seqA[i] == 'Y') && ((seqB[i] == 'C') || (seqB[i] == 'T'))) ok = 1; else if ((seqB[i] == 'Y') && ((seqA[i] == 'C') || (seqA[i] == 'T'))) ok = 1; else if ((seqA[i] == 'R') && ((seqB[i] == 'G') || (seqB[i] == 'A'))) ok = 1; else if ((seqB[i] == 'R') && ((seqA[i] == 'G') || (seqA[i] == 'A'))) ok = 1; else if ((seqA[i] == 'S') && ((seqB[i] == 'C') || (seqB[i] == 'G'))) ok = 1; else if ((seqB[i] == 'S') && ((seqA[i] == 'C') || (seqA[i] == 'G'))) ok = 1; else if ((seqA[i] == 'W') && ((seqB[i] == 'T') || (seqB[i] == 'A'))) ok = 1; else if ((seqB[i] == 'W') && ((seqA[i] == 'T') || (seqA[i] == 'A'))) ok = 1; else if ((seqA[i] == 'M') && ((seqB[i] == 'A') || (seqB[i] == 'C'))) ok = 1; else if ((seqB[i] == 'M') && ((seqA[i] == 'A') || (seqA[i] == 'C'))) ok = 1; else if ((seqA[i] == 'K') && ((seqB[i] == 'T') || (seqB[i] == 'G'))) ok = 1; else if ((seqB[i] == 'K') && ((seqA[i] == 'T') || (seqA[i] == 'G'))) ok = 1; else if ((seqA[i] == 'V') && ((seqB[i] == 'C') || (seqB[i] == 'A') || (seqB[i] == 'G'))) ok = 1; else if ((seqB[i] == 'V') && ((seqA[i] == 'C') || (seqA[i] == 'A') || (seqA[i] == 'G'))) ok = 1; else if ((seqA[i] == 'H') && ((seqB[i] == 'T') || (seqB[i] == 'A') || 
(seqB[i] == 'C'))) ok = 1; else if ((seqB[i] == 'H') && ((seqA[i] == 'T') || (seqA[i] == 'A') || (seqA[i] == 'C'))) ok = 1; else if ((seqA[i] == 'D') && ((seqB[i] == 'T') || (seqB[i] == 'A') || (seqB[i] == 'G'))) ok = 1; else if ((seqB[i] == 'D') && ((seqA[i] == 'T') || (seqA[i] == 'A') || (seqA[i] == 'G'))) ok = 1; else if ((seqA[i] == 'B') && ((seqB[i] == 'C') || (seqB[i] == 'T') || (seqB[i] == 'G'))) ok = 1; else if ((seqB[i] == 'B') && ((seqA[i] == 'C') || (seqA[i] == 'T') || (seqA[i] == 'G'))) ok = 1; else ok = 0; /* the bases are different and not equivalent */ //check if they are both blanks if ((seqA[i] == '.') && (seqB[i] == '-')) ok = 1; else if ((seqB[i] == '.') && (seqA[i] == '-')) ok = 1; if (ok == 0) { numDiff++; } } //} } return numDiff; } catch(exception& e) { m->errorOut(e, "Ccode", "getDiff"); exit(1); } } //*************************************************************************************************************** //tried to make this look most like ccode original implementation void Ccode::removeBadReferenceSeqs(vector& seqs) { try { vector< vector > numDiffBases; numDiffBases.resize(seqs.size()); //initialize to 0 for (int i = 0; i < numDiffBases.size(); i++) { numDiffBases[i].resize(seqs.size(),0); } it = trim.begin(); int length = it->second - it->first; //calc differences from each sequence to everyother seq in the set for (int i = 0; i < seqs.size(); i++) { string seqA = seqs[i].seq->getAligned().substr(it->first, length); //so you don't calc i to j and j to i since they are the same for (int j = 0; j < i; j++) { string seqB = seqs[j].seq->getAligned().substr(it->first, length); //compare strings int numDiff = getDiff(seqA, seqB); numDiffBases[i][j] = numDiff; numDiffBases[j][i] = numDiff; } } //initailize remove to 0 vector remove; remove.resize(seqs.size(), 0); float top = ((20*length) / (float) 100); float bottom = ((0.5*length) / (float) 100); //check each numDiffBases and if any are higher than threshold set remove to 1 so you can 
		//remove those seqs from the closest set
		for (int i = 0; i < numDiffBases.size(); i++) {
			for (int j = 0; j < i; j++) {
				//are you more than 20% different
				if (numDiffBases[i][j] > top)		{  remove[j] = 1;  }
				//are you less than 0.5% different
				if (numDiffBases[i][j] < bottom)	{  remove[j] = 1;  }
			}
		}
		
		int numSeqsLeft = 0;
		
		//count seqs that are not going to be removed
		for (int i = 0; i < remove.size(); i++) {
			if (remove[i] == 0)  {  numSeqsLeft++;  }
		}
		
		//if you have enough then remove bad ones
		if (numSeqsLeft >= 3) {
			vector<SeqDist> goodSeqs;
			//remove bad seqs
			for (int i = 0; i < remove.size(); i++) {
				if (remove[i] == 0) {
					goodSeqs.push_back(seqs[i]);
				}
			}
			
			seqs = goodSeqs;
			
		}else { //warn, but don't remove any
			m->mothurOut(querySeq->getName() + " does not have an adequate number of reference sequences that are between 0.5% and 20% different from each other. I will continue, but please check."); m->mothurOutEndLine();
		}
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "removeBadReferenceSeqs");
		exit(1);
	}
}
//***************************************************************************************************************
//makes copy of templateseq for filter
vector<SeqDist> Ccode::findClosest(Sequence* q, int numWanted) {
	try{
		
		vector<SeqDist> topMatches;
		
		Sequence query = *(q);
		
		//calc distance to each sequence in template seqs
		for (int i = 0; i < templateSeqs.size(); i++) {
			
			Sequence ref = *(templateSeqs[i]);
			
			//find overall dist
			distCalc->calcDist(query, ref);
			float dist = distCalc->getDist();
			
			//save distance
			SeqDist temp;
			temp.seq = new Sequence(templateSeqs[i]->getName(), templateSeqs[i]->getAligned());
			temp.dist = dist;
			
			topMatches.push_back(temp);
		}
		
		sort(topMatches.begin(), topMatches.end(), compareSeqDist);
		
		//free the matches that were not wanted
		for (int i = numWanted; i < topMatches.size(); i++) {  delete topMatches[i].seq;  }
		
		//only shrink; growing would add entries whose seq pointer was never allocated
		if (numWanted < topMatches.size()) {  topMatches.resize(numWanted);  }
		
		return topMatches;
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "findClosest");
		exit(1);
	}
}
/**************************************************************************************************/
//find the
//distances from each reference sequence to every other reference sequence for each window for this query
void Ccode::getAverageRef(vector<SeqDist> ref) {
	try {
		
		vector< vector< vector<int> > > diffs;  //diffs[0][1][2] is the number of differences between ref seq 0 and ref seq 1 at window 2.
		
		//initialize diffs vector
		diffs.resize(ref.size());
		for (int i = 0; i < diffs.size(); i++) {
			diffs[i].resize(ref.size());
			for (int j = 0; j < diffs[i].size(); j++) {
				diffs[i][j].resize(windows.size(), 0);
			}
		}
		
		it = trim.begin();
		
		//find the distances from each reference sequence to every other reference sequence for each window for this query
		for (int i = 0; i < ref.size(); i++) {
			
			string refI = ref[i].seq->getAligned();
			
			//j<i, so you don't find the distance from i to j and then again from j to i
			for (int j = 0; j < i; j++) {
				
				string refJ = ref[j].seq->getAligned();
				
				for (int k = 0; k < windows.size(); k++) {
					
					string refIWindowk, refJWindowk;
					
					if (k < windows.size()-1) {
						//get window strings
						refIWindowk = refI.substr(windows[k], windowSizes);
						refJWindowk = refJ.substr(windows[k], windowSizes);
					}else { //last window may be smaller than rest - see findWindows
						//get window strings
						refIWindowk = refI.substr(windows[k], (it->second-windows[k]));
						refJWindowk = refJ.substr(windows[k], (it->second-windows[k]));
					}
					
					//find differences
					int diff = getDiff(refIWindowk, refJWindowk);
					
					//save differences in [i][j][k] and [j][i][k] since they are the same
					diffs[i][j][k] = diff;
					diffs[j][i][k] = diff;
					
				}//k
				
			}//j
			
		}//i
		
		//initialize sumRef for this query
		sumRef.resize(windows.size(), 0);
		sumSquaredRef.resize(windows.size(), 0);
		averageRef.resize(windows.size(), 0);
		
		//find the sum of the differences for the reference sequences
		for (int i = 0; i < diffs.size(); i++) {
			for (int j = 0; j < i; j++) {
				
				//increment this query's count of reference sequence combos
				refCombo++;
				
				for (int k = 0; k < diffs[i][j].size(); k++) {
					sumRef[k] += diffs[i][j][k];
					sumSquaredRef[k] += (diffs[i][j][k]*diffs[i][j][k]);
				}//k
				
			}//j
		}//i
		
		//find the average of the differences for the references for each window
		for (int i = 0; i < windows.size(); i++) {
			averageRef[i] = sumRef[i] / (float)
			refCombo;
		}
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "getAverageRef");
		exit(1);
	}
}
/**************************************************************************************************/
void Ccode::getAverageQuery (vector<SeqDist> ref, Sequence* query) {
	try {
		
		vector< vector<int> > diffs;  //diffs[1][2] is the number of differences between querySeqs[query] and ref seq 1 at window 2.
		
		//initialize diffs vector
		diffs.resize(ref.size());
		for (int j = 0; j < diffs.size(); j++) {
			diffs[j].resize(windows.size(), 0);
		}
		
		it = trim.begin();
		
		string refQuery = query->getAligned();
		
		//j<ref.size(), so the query is compared to every reference sequence
		for (int j = 0; j < ref.size(); j++) {
			
			string refJ = ref[j].seq->getAligned();
			
			for (int k = 0; k < windows.size(); k++) {
				
				string QueryWindowk, refJWindowk;
				
				if (k < windows.size()-1) {
					//get window strings
					QueryWindowk = refQuery.substr(windows[k], windowSizes);
					refJWindowk = refJ.substr(windows[k], windowSizes);
				}else { //last window may be smaller than rest - see findWindows
					//get window strings
					QueryWindowk = refQuery.substr(windows[k], (it->second-windows[k]));
					refJWindowk = refJ.substr(windows[k], (it->second-windows[k]));
				}
				
				//find differences
				int diff = getDiff(QueryWindowk, refJWindowk);
				
				//save differences
				diffs[j][k] = diff;
				
			}//k
		}//j
		
		//initialize sumQuery for this query
		sumQuery.resize(windows.size(), 0);
		sumSquaredQuery.resize(windows.size(), 0);
		averageQuery.resize(windows.size(), 0);
		
		//find the sum of the differences
		for (int j = 0; j < diffs.size(); j++) {
			for (int k = 0; k < diffs[j].size(); k++) {
				sumQuery[k] += diffs[j][k];
				sumSquaredQuery[k] += (diffs[j][k]*diffs[j][k]);
			}//k
		}//j
		
		//find the average of the differences between the query and the references for each window
		for (int i = 0; i < windows.size(); i++) {
			averageQuery[i] = sumQuery[i] / (float) ref.size();
		}
	}
	catch(exception& e) {
		m->errorOut(e, "Ccode", "getAverageQuery");
		exit(1);
	}
}
/**************************************************************************************************/
void Ccode::findVarianceRef() {
	try {
		
		varRef.resize(windows.size(), 0);
		sdRef.resize(windows.size(), 0);
		
		//for each window
		for (int i
= 0; i < windows.size(); i++) { varRef[i] = (sumSquaredRef[i] - ((sumRef[i]*sumRef[i])/(float)refCombo)) / (float)(refCombo-1); sdRef[i] = sqrt(varRef[i]); //set minimum error rate to 0.001 - to avoid potential divide by zero - not sure if this is necessary but it follows ccode implementation if (averageRef[i] < 0.001) { averageRef[i] = 0.001; } if (sumRef[i] < 0.001) { sumRef[i] = 0.001; } if (varRef[i] < 0.001) { varRef[i] = 0.001; } if (sumSquaredRef[i] < 0.001) { sumSquaredRef[i] = 0.001; } if (sdRef[i] < 0.001) { sdRef[i] = 0.001; } } } catch(exception& e) { m->errorOut(e, "Ccode", "findVarianceRef"); exit(1); } } /**************************************************************************************************/ void Ccode::findVarianceQuery() { try { varQuery.resize(windows.size(), 0); sdQuery.resize(windows.size(), 0); //for each window for (int i = 0; i < windows.size(); i++) { varQuery[i] = (sumSquaredQuery[i] - ((sumQuery[i]*sumQuery[i])/(float) closest.size())) / (float) (closest.size()-1); sdQuery[i] = sqrt(varQuery[i]); //set minimum error rate to 0.001 - to avoid potential divide by zero - not sure if this is necessary but it follows ccode implementation if (averageQuery[i] < 0.001) { averageQuery[i] = 0.001; } if (sumQuery[i] < 0.001) { sumQuery[i] = 0.001; } if (varQuery[i] < 0.001) { varQuery[i] = 0.001; } if (sumSquaredQuery[i] < 0.001) { sumSquaredQuery[i] = 0.001; } if (sdQuery[i] < 0.001) { sdQuery[i] = 0.001; } } } catch(exception& e) { m->errorOut(e, "Ccode", "findVarianceQuery"); exit(1); } } /**************************************************************************************************/ void Ccode::determineChimeras() { try { isChimericConfidence.resize(windows.size(), false); isChimericTStudent.resize(windows.size(), false); isChimericANOVA.resize(windows.size(), false); anova.resize(windows.size()); //for each window for (int i = 0; i < windows.size(); i++) { //get confidence limits float t = getT(closest.size()-1); //how many seqs 
you are comparing to this querySeq float dsUpper = (averageQuery[i] + (t * sdQuery[i])) / averageRef[i]; float dsLower = (averageQuery[i] - (t * sdQuery[i])) / averageRef[i]; if ((dsUpper > 1.0) && (dsLower > 1.0) && (averageQuery[i] > averageRef[i])) { /* range does not include 1 */ isChimericConfidence[i] = true; /* significantly higher at P<0.05 */ } //student t test int degreeOfFreedom = refCombo + closest.size() - 2; float denomForT = (((refCombo-1) * varQuery[i] + (closest.size() - 1) * varRef[i]) / (float) degreeOfFreedom) * ((refCombo + closest.size()) / (float) (refCombo * closest.size())); /* denominator, without sqrt(), for ts calculations */ float ts = fabs((averageQuery[i] - averageRef[i]) / (sqrt(denomForT))); /* value of ts for t-student test */ t = getT(degreeOfFreedom); if ((ts >= t) && (averageQuery[i] > averageRef[i])) { isChimericTStudent[i] = true; /* significantly higher at P<0.05 */ } //anova test float value1 = sumQuery[i] + sumRef[i]; float value2 = sumSquaredQuery[i] + sumSquaredRef[i]; float value3 = ((sumQuery[i]*sumQuery[i]) / (float) (closest.size())) + ((sumRef[i] * sumRef[i]) / (float) refCombo); float value4 = (value1 * value1) / ( (float) (closest.size() + refCombo) ); float value5 = value2 - value4; float value6 = value3 - value4; float value7 = value5 - value6; float value8 = value7 / ((float) degreeOfFreedom); float anovaValue = value6 / value8; float f = getF(degreeOfFreedom); if ((anovaValue >= f) && (averageQuery[i] > averageRef[i])) { isChimericANOVA[i] = true; /* significant P<0.05 */ } if (isnan(anovaValue) || isinf(anovaValue)) { anovaValue = 0.0; } anova[i] = anovaValue; } } catch(exception& e) { m->errorOut(e, "Ccode", "determineChimeras"); exit(1); } } /**************************************************************************************************/ float Ccode::getT(int numseq) { try { float tvalue = 0; /* t-student critical values for different degrees of freedom and alpha 0.1 in one-tail tests (equivalent to 0.05) 
*/ if (numseq > 120) tvalue = 1.645; else if (numseq > 60) tvalue = 1.658; else if (numseq > 40) tvalue = 1.671; else if (numseq > 30) tvalue = 1.684; else if (numseq > 29) tvalue = 1.697; else if (numseq > 28) tvalue = 1.699; else if (numseq > 27) tvalue = 1.701; else if (numseq > 26) tvalue = 1.703; else if (numseq > 25) tvalue = 1.706; else if (numseq > 24) tvalue = 1.708; else if (numseq > 23) tvalue = 1.711; else if (numseq > 22) tvalue = 1.714; else if (numseq > 21) tvalue = 1.717; else if (numseq > 20) tvalue = 1.721; else if (numseq > 19) tvalue = 1.725; else if (numseq > 18) tvalue = 1.729; else if (numseq > 17) tvalue = 1.734; else if (numseq > 16) tvalue = 1.740; else if (numseq > 15) tvalue = 1.746; else if (numseq > 14) tvalue = 1.753; else if (numseq > 13) tvalue = 1.761; else if (numseq > 12) tvalue = 1.771; else if (numseq > 11) tvalue = 1.782; else if (numseq > 10) tvalue = 1.796; else if (numseq > 9) tvalue = 1.812; else if (numseq > 8) tvalue = 1.833; else if (numseq > 7) tvalue = 1.860; else if (numseq > 6) tvalue = 1.895; else if (numseq > 5) tvalue = 1.943; else if (numseq > 4) tvalue = 2.015; else if (numseq > 3) tvalue = 2.132; else if (numseq > 2) tvalue = 2.353; else if (numseq > 1) tvalue = 2.920; else if (numseq <= 1) { m->mothurOut("Two or more reference sequences are required, your data will be flawed.\n"); m->mothurOutEndLine(); } return tvalue; } catch(exception& e) { m->errorOut(e, "Ccode", "getT"); exit(1); } } /**************************************************************************************************/ float Ccode::getF(int numseq) { try { float fvalue = 0; /* F-Snedecor critical values for v1=1 and different degrees of freedom v2 and alpha 0.05 */ if (numseq > 120) fvalue = 3.84; else if (numseq > 60) fvalue = 3.92; else if (numseq > 40) fvalue = 4.00; else if (numseq > 30) fvalue = 4.08; else if (numseq > 29) fvalue = 4.17; else if (numseq > 28) fvalue = 4.18; else if (numseq > 27) fvalue = 4.20; else if (numseq > 26) 
fvalue = 4.21; else if (numseq > 25) fvalue = 4.23; else if (numseq > 24) fvalue = 4.24; else if (numseq > 23) fvalue = 4.26; else if (numseq > 22) fvalue = 4.28; else if (numseq > 21) fvalue = 4.30; else if (numseq > 20) fvalue = 4.32; else if (numseq > 19) fvalue = 4.35; else if (numseq > 18) fvalue = 4.38; else if (numseq > 17) fvalue = 4.41; else if (numseq > 16) fvalue = 4.45; else if (numseq > 15) fvalue = 4.49; else if (numseq > 14) fvalue = 4.54; else if (numseq > 13) fvalue = 4.60; else if (numseq > 12) fvalue = 4.67; else if (numseq > 11) fvalue = 4.75; else if (numseq > 10) fvalue = 4.84; else if (numseq > 9) fvalue = 4.96; else if (numseq > 8) fvalue = 5.12; else if (numseq > 7) fvalue = 5.32; else if (numseq > 6) fvalue = 5.59; else if (numseq > 5) fvalue = 5.99; else if (numseq > 4) fvalue = 6.61; else if (numseq > 3) fvalue = 7.71; else if (numseq > 2) fvalue = 10.1; else if (numseq > 1) fvalue = 18.5; else if (numseq > 0) fvalue = 161; else if (numseq <= 0) { m->mothurOut("Two or more reference sequences are required, your data will be flawed.\n"); m->mothurOutEndLine(); } return fvalue; } catch(exception& e) { m->errorOut(e, "Ccode", "getF"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/chimera/ccode.h000066400000000000000000000101471255543666200172770ustar00rootroot00000000000000#ifndef CCODE_H #define CCODE_H /* * ccode.h * Mothur * * Created by westcott on 8/24/09. * Copyright 2009 Schloss LAB. All rights reserved. * */ #include "chimera.h" #include "dist.h" #include "decalc.h" /***********************************************************/ //This class was created using the algorithms described in the // "Evaluating putative chimeric sequences from PCR-amplified products" paper //by Juan M. Gonzalez, Johannes Zimmerman and Cesareo Saiz-Jimenez. 
/***********************************************************/ class Ccode : public Chimera { public: Ccode(string, string, bool, string, int, int, string); //fasta, template, filter, mask, window, numWanted, outputDir ~Ccode(); int getChimeras(Sequence* query); Sequence print(ostream&, ostream&); #ifdef USE_MPI Sequence print(MPI_File&, MPI_File&); #endif private: Dist* distCalc; DeCalculator* decalc; int iters, window, numWanted; string fastafile, mapInfo; Sequence* querySeq; map spotMap; map::iterator it; vector windows; //windows is the vector of window breaks for query int windowSizes; //windowSizes is the size of the windows for query map trim; //trim is the map containing the starting and ending positions for query vector closest; //closest is a vector of sequences that are closest to query vector averageRef; //averageRef is the average distance at each window for the references for query vector averageQuery; //averageQuery is the average distance at each window for the query for query vector sumRef; //sumRef is the sum of distances at each window for the references for query vector sumSquaredRef; //sumSquaredRef is the sum of squared distances at each window for the references for query vector sumQuery; //sumQuery is the sum of distances at each window for the comparison of query to references for query vector sumSquaredQuery; //sumSquaredQuery is the sum of squared distances at each window for the comparison of query to references for query vector varRef; //varRef is the variance among reference seqs at each window for query vector varQuery; //varQuery is the variance among references and query at each window vector sdRef; //sdRef is the standard deviation of reference seqs at each window for query vector sdQuery; //sdQuery is the standard deviation of references and query at each window vector anova; //anova is the vector of anova scores for each window for query int refCombo; //refCombo is the number of reference sequence combinations for query vector 
isChimericConfidence; //isChimericConfidence indicates whether query is chimeric at a given window according to the confidence limits vector isChimericTStudent; //isChimericTStudent indicates whether query is chimeric at a given window according to the Student's t test vector isChimericANOVA; //isChimericANOVA indicates whether query is chimeric at a given window according to the ANOVA test vector findClosest(Sequence*, int); void removeBadReferenceSeqs(vector&); //removes sequences from closest that are too different from or too similar to each other. void trimSequences(Sequence*); vector findWindows(); void getAverageRef(vector); //fills sumRef, averageRef, sumSquaredRef and refCombo. void getAverageQuery (vector, Sequence*); //fills sumQuery, averageQuery, sumSquaredQuery. void findVarianceRef (); //fills varRef and sdRef, also sets minimum error rate to 0.001 to avoid divide by 0. void findVarianceQuery (); //fills varQuery and sdQuery void determineChimeras (); //fills anova, isChimericConfidence, isChimericTStudent and isChimericANOVA. int getDiff(string, string); //returns number of mismatched bases, a gap to base is not counted as a mismatch float getT(int); float getF(int); #ifdef USE_MPI int printMapping(string&); MPI_File outMap; #endif }; /***********************************************************/ #endif mothur-1.36.1/source/chimera/chimera.cpp000066400000000000000000000444251255543666200201730ustar00rootroot00000000000000/* * chimera.cpp * Mothur * * Created by Sarah Westcott on 8/11/09. * Copyright 2009 Schloss Lab Umass Amherst. All rights reserved. 
* */ #include "chimera.h" #include "referencedb.h" //*************************************************************************************************************** //this is a vertical soft filter string Chimera::createFilter(vector seqs, float t) { try { filterString = ""; int threshold = int (t * seqs.size()); //cout << "threshhold = " << threshold << endl; vector gaps; gaps.resize(seqs[0]->getAligned().length(), 0); vector a; a.resize(seqs[0]->getAligned().length(), 0); vector t; t.resize(seqs[0]->getAligned().length(), 0); vector g; g.resize(seqs[0]->getAligned().length(), 0); vector c; c.resize(seqs[0]->getAligned().length(), 0); filterString = (string(seqs[0]->getAligned().length(), '1')); //for each sequence for (int i = 0; i < seqs.size(); i++) { if (m->control_pressed) { return filterString; } string seqAligned = seqs[i]->getAligned(); if (seqAligned.length() != filterString.length()) { m->mothurOut(seqs[i]->getName() + " is not the same length as the template sequences. Aborting!\n"); exit(1); } for (int j = 0; j < seqAligned.length(); j++) { //if this spot is a gap if ((seqAligned[j] == '-') || (seqAligned[j] == '.')) { gaps[j]++; } else if (toupper(seqAligned[j]) == 'A') { a[j]++; } else if (toupper(seqAligned[j]) == 'T') { t[j]++; } else if (toupper(seqAligned[j]) == 'G') { g[j]++; } else if (toupper(seqAligned[j]) == 'C') { c[j]++; } } } //zero out spot where all sequences have blanks int numColRemoved = 0; for(int i = 0;i < seqs[0]->getAligned().length(); i++){ if (m->control_pressed) { return filterString; } if(gaps[i] == seqs.size()) { filterString[i] = '0'; numColRemoved++; } else if (((a[i] < threshold) && (t[i] < threshold) && (g[i] < threshold) && (c[i] < threshold))) { filterString[i] = '0'; numColRemoved++; } //cout << "a = " << a[i] << " t = " << t[i] << " g = " << g[i] << " c = " << c[i] << endl; } if (threshold != 0.0) { m->mothurOut("Filter removed " + toString(numColRemoved) + " columns."); m->mothurOutEndLine(); } return filterString; 
} catch(exception& e) { m->errorOut(e, "Chimera", "createFilter"); exit(1); } } //*************************************************************************************************************** map Chimera::runFilter(Sequence* seq) { try { map maskMap; string seqAligned = seq->getAligned(); string newAligned = ""; int count = 0; for (int j = 0; j < seqAligned.length(); j++) { //if this spot is a gap if (filterString[j] == '1') { newAligned += seqAligned[j]; maskMap[count] = j; count++; } } seq->setAligned(newAligned); return maskMap; } catch(exception& e) { m->errorOut(e, "Chimera", "runFilter"); exit(1); } } //*************************************************************************************************************** vector Chimera::readSeqs(string file) { try { vector container; int count = 0; length = 0; unaligned = false; ReferenceDB* rdb = ReferenceDB::getInstance(); if (file == "saved") { m->mothurOutEndLine(); m->mothurOut("Using sequences from " + rdb->getSavedReference() + " that are saved in memory."); m->mothurOutEndLine(); for (int i = 0; i < rdb->referenceSeqs.size(); i++) { Sequence* temp = new Sequence(rdb->referenceSeqs[i].getName(), rdb->referenceSeqs[i].getAligned()); if (count == 0) { length = temp->getAligned().length(); count++; } //gets first seqs length else if (length != temp->getAligned().length()) { unaligned = true; } if (temp->getName() != "") { container.push_back(temp); } } templateFileName = rdb->getSavedReference(); }else { m->mothurOut("Reading sequences from " + file + "..."); cout.flush(); #ifdef USE_MPI int pid, processors; vector positions; int numSeqs; int tag = 2001; MPI_File inMPI; MPI_Status status; if (byGroup) { char inFileName[1024]; strcpy(inFileName, file.c_str()); MPI_File_open(MPI_COMM_SELF, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer positions = m->setFilePosFasta(file, numSeqs); //fills MPIPos, returns numSeqs //read file for(int i=0;icontrol_pressed) { 
MPI_File_close(&inMPI); return container; } //read next sequence int seqlength = positions[i+1] - positions[i]; char* buf4 = new char[seqlength]; MPI_File_read_at(inMPI, positions[i], buf4, seqlength, MPI_CHAR, &status); string tempBuf = buf4; if (tempBuf.length() > seqlength) { tempBuf = tempBuf.substr(0, seqlength); } delete[] buf4; istringstream iss (tempBuf,istringstream::in); Sequence* current = new Sequence(iss); if (current->getName() != "") { if (count == 0) { length = current->getAligned().length(); count++; } //gets first seqs length else if (length != current->getAligned().length()) { unaligned = true; } container.push_back(current); if (rdb->save) { rdb->referenceSeqs.push_back(*current); } } } MPI_File_close(&inMPI); }else { MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); //char* inFileName = new char[file.length()]; //memcpy(inFileName, file.c_str(), file.length()); char inFileName[1024]; strcpy(inFileName, file.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer //delete inFileName; if (pid == 0) { positions = m->setFilePosFasta(file, numSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&positions[0], (numSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } }else{ MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); positions.resize(numSeqs+1); MPI_Recv(&positions[0], (numSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); } //read file for(int i=0;i<numSeqs;i++){ if (m->control_pressed) { MPI_File_close(&inMPI); return container; } //read next sequence int seqlength = positions[i+1] - positions[i]; char* buf4 = new char[seqlength]; MPI_File_read_at(inMPI, positions[i], buf4, seqlength, MPI_CHAR, &status); string tempBuf = buf4; if (tempBuf.length() > seqlength) { tempBuf = tempBuf.substr(0, seqlength); 
} delete[] buf4; istringstream iss (tempBuf,istringstream::in); Sequence* current = new Sequence(iss); if (current->getName() != "") { if (count == 0) { length = current->getAligned().length(); count++; } //gets first seqs length else if (length != current->getAligned().length()) { unaligned = true; } container.push_back(current); if (rdb->save) { rdb->referenceSeqs.push_back(*current); } } } MPI_File_close(&inMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case } #else ifstream in; m->openInputFile(file, in); //read in seqs and store in vector while(!in.eof()){ if (m->control_pressed) { return container; } Sequence* current = new Sequence(in); m->gobble(in); if (count == 0) { length = current->getAligned().length(); count++; } //gets first seqs length else if (length != current->getAligned().length()) { unaligned = true; } if (current->getName() != "") { container.push_back(current); if (rdb->save) { rdb->referenceSeqs.push_back(*current); } } } in.close(); #endif m->mothurOut("Done."); m->mothurOutEndLine(); filterString = (string(container[0]->getAligned().length(), '1')); } return container; } catch(exception& e) { m->errorOut(e, "Chimera", "readSeqs"); exit(1); } } //*************************************************************************************************************** void Chimera::setMask(string filename) { try { if (filename == "default") { //default is from wigeon 236627 EU009184.1 Shigella dysenteriae str. 
FBD013 seqMask = ".....................................................................................................AAATTGAAGAGTTT-GA--T-CA-T-G-GCTC-AG-AT-TGAA-C-GC--TGG-C--G-GC-A-GG--C----C-T--AACACA-T-GC-A-AGT-CGA-A-CG----------G-TAA-CA-G----------------------------GAAG-A-AG----------------------------------------------------CTT-G----------------------------------------------------------------------------------CT-TCTTT----------------G-CT--G--AC--G--AG-T-GG-C-GG-A--C-------------GGG-TGAGT-A--AT-GT-C-T-G-GG---A-A--A-CT-G--C-C-TGA--TG-G------------------------------------------------------------------A-GG----GGG-AT-AA-CTA-------------------------C-T-G-----------------------GAA-A---CGG-TAG-CTAA-TA---CC-G--C-AT-A----------A--------------------C-------------------------------------GT-C-----------------------------------------------------------------------------------------------------------------------G-CA-A--------------------------------------------------------------------------------------------------------------------------------------G-A-C---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------CAAA--G-A-G-GG-----G--GA-C-CT--------------------------------------------------------------------------------------------------------------------TCG-G----------------------------------------------------------------------------------------------------------------------G----CC-TC--T---T-G--------------C----C-A---T-CG-G---AT---G-T-----G-CCC-AGA--T-GGG--A------TT--A--G-CT-A----G---TAGG-T-G-GG-G-T----AAC-GG-C-T-C-ACCT--A-GG-C-G--A-CG-A------------TCC-C-T------AG-CT-G-G-TCT-G-AG----A--GG-AT--G-AC-C-AG-CCAC-A-CTGGA--A-C-TG-A-GA-C-AC-G-G-TCCAGA-CTCC-TAC-G--G-G-A-G-GC-A-GC-A-G-TG---GG-G-A-ATA-TTGCA-C-AA-T-GG--GC-GC-A----A-G-CC-T-GA-TG-CA-GCCA-TGCC-G-CG-T--
-G-T-A--T--GA-A-G--A--A-G-G-CC-----TT-CG---------G-G-T-T-G-T--A---AA-G-TAC--------TT-TC-A-G--C-GGG----GA-G--G---AA-GGGA---GTAA-AG----T--T--AA-T---A----C-----CT-T-TGC-TCA-TT-GA-CG-TT-A-C-CC-G-CA-G---------AA-----------GAAGC-ACC-GG-C-TAA---C--T-CCGT--GCCA--G-C---A--GCCG---C-GG--TA-AT--AC---GG-AG-GGT-GCA-A-G-CG-TTAA-T-CGG-AA-TT-A--C-T--GGGC-GTA----AA-GCGC-AC--G-CA-G-G-C-G------------G--T-TT-G-T-T-AA----G-T-C-A---G-ATG-TG-A-AA-TC--CC-CGG-G--------------------------------------------------------------------CT-C-AA-------------------------------------------------------------------------CC-T-G-GG-AA-C----T-G-C-A-T-C--------T--GA-T-A-C-T-G-GCA--A-G-C---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------T-T-G-A-G-T-C-----T-CG--TA-G-A------------G-GG-G-GG-T----AG--AATT-CCA-G-GT--GT-A-GCG-GTGAAA-TG-CGT-AGAG-A-TC-T-GGA--GG-A-AT-A-CC-GG--T--G--GC-GAA-G--G-C---G----G--C-C-CCCTG------G-AC-GA--------------------------------------------------------------AG-A-C-T--GA--
CG-----CT-CA-GG--T-G-CGA--AA-G-C--------------G-TGGG-GAG-C-A-AACA--GG-ATTA-G-ATA-C-----CC-T-G-GTA-G-T----C-CA--C-G-CCG-T-AAA--C-GATG-TC--GA-CT---------T-GG--A--G-G-TT-G-TG-C--C--------------------------------------------------------------------------------------CTT-GA--------------------------------------------------------------------------------------------------------------------------------------------------G-G-C-GT--G-G-C-T-TC-C------GG--A----GC-TAA--CG-C-G-T--T--AA-GT--C----G-ACC-GCC-T-G-GG-GAG-TA---CGG-----C-C--G-C-A-A-GGT-T--AAA-ACTC-AAA---------TGAA-TTG-ACGGG-G-G-CCCG----C-A--C-A-A-GCG-GT-G--G--AG-CA-T--GT-GGT-TT-AATT-C-G-ATG-CAAC-G-CG-A-AG-A-A-CC-TT-A-CC-TGGTC-TT-G-AC-A-T-C--------------CAC-G-G-------------A-AG-T-T-T--TC--A-GA-G-A-T--G-A-G--A-A-T-G--T-G-----CC-------------------------------------T--TC-G------------------------------------------GG----A----A---CC-GTG---A--GA---------------------------------------------------C-A-G-G-T-GCTG-CA-TGG-CT--GTC-GTC-A-GC-TC---G-TG-TT-G--TGA-AA-TGT-T-GG-G-TT-AA-GT-CCCGC-AA--------C-GAG-CGC-A-ACC-C-T-TA--TC--C-TTTG--T-T-G-C-C---AG-C-G-----G-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------TCC------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------GG---C----C-G------------G----G---A-A--CT---------------C-A-A-A-G-GA-G--AC-T-G-CCA--G-T------------------------------------G-A---TAA----------------------------------A-C-T-G--G-A-GG-A--AGG-T--GGGG-A-TGAC-GTC--AAGT-C---ATC-A-T-G-G-C-C-CTT----AC-G--AC-C-A-GG-GC-TA-CAC-ACGTG-C--TA--CAATG---G-CGCA-T-A--C-AAA-GA-GA------------------------------------------
--------------------------------------------------------A-G-C-G-A--C-CTCG-C--G---------------------------------------A-GA-G-C-----------A--A-G-CG---G----------A--CCT-C------A-T-AAAGT-GC-G-T-C-G-TAG-TCC--------GGA-T-TGGAG-TC--T-GCAA-CT-C-------------------------------------------------------------------------------------------------G-ACTCC-A-T-G-AA-G-TC-GGAAT-CG-C-TA--G-TA-AT-C-G-T----GGA-TC-A-G--A------AT--GCC-AC-G-GT-G-AAT-ACGT-T-CCCGGGCCT-TGTA----CACACCG-CCC-GTC-----A---CA--CCA-TG-GG-A--G---TGG-G-TT-GC-AAA--A-GAA------G--T-AGG-TA-G-C-T-T-AA-C-C--------------------------------------------------------------TT----C-------------------------------------------------------------------------------------------------G--GG-A--GG-G--C---GC-TTA--CC--ACT-T----T-GTG-AT-TCA------------------------TG--ACT-GGGG-TG-AAG-TCGTAACAA-GGTAA-CCGT-AGGGGAA-CCTG-CGGT-TGGATCACCTCCTTA................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................"; }else if (filename == "") { //do nothing seqMask = ""; }else{ #ifdef USE_MPI MPI_File inMPI; MPI_Offset size; MPI_Status status; //char* inFileName = new char[filename.length()]; //memcpy(inFileName, filename.c_str(), filename.length()); char inFileName[1024]; strcpy(inFileName, 
filename.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_get_size(inMPI, &size); //delete inFileName; char* buffer = new char[size]; MPI_File_read(inMPI, buffer, size, MPI_CHAR, &status); string tempBuf = buffer; if (tempBuf.length() > size) { tempBuf = tempBuf.substr(0, size); } istringstream iss (tempBuf,istringstream::in); delete[] buffer; if (!iss.eof()) { Sequence temp(iss); seqMask = temp.getAligned(); }else { m->mothurOut("Problem with mask."); m->mothurOutEndLine(); seqMask = ""; } MPI_File_close(&inMPI); #else ifstream infile; m->openInputFile(filename, infile); if (!infile.eof()) { Sequence temp(infile); seqMask = temp.getAligned(); }else { m->mothurOut("Problem with mask."); m->mothurOutEndLine(); seqMask = ""; } infile.close(); #endif } } catch(exception& e) { m->errorOut(e, "Chimera", "setMask"); exit(1); } } //*************************************************************************************************************** Sequence* Chimera::getSequence(string name) { try{ Sequence* temp; //look through templateSeqs until you find it int spot = -1; for (int i = 0; i < templateSeqs.size(); i++) { if (name == templateSeqs[i]->getName()) { spot = i; break; } } if(spot == -1) { m->mothurOut("Error: Could not find sequence."); m->mothurOutEndLine(); return NULL; } temp = new Sequence(templateSeqs[spot]->getName(), templateSeqs[spot]->getAligned()); return temp; } catch(exception& e) { m->errorOut(e, "Chimera", "getSequence"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/chimera/chimera.h000066400000000000000000000126621255543666200176360ustar00rootroot00000000000000#ifndef CHIMERA_H #define CHIMERA_H /* * chimera.h * Mothur * * Created by Sarah Westcott on 7/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "mothur.h" #include "sequence.hpp" /***********************************************************************/ struct data_struct { float divr_qla_qrb; float divr_qlb_qra; float qla_qrb; float qlb_qra; float qla; float qrb; float ab; float qa; float qb; float lab; float rab; float qra; float qlb; int winLStart; int winLEnd; int winRStart; int winREnd; Sequence querySeq; Sequence parentA; Sequence parentB; float bsa; float bsb; float bsMax; float chimeraMax; }; /***********************************************************************/ struct data_results { vector results; string flag; Sequence trimQuery; //results malignerResults; data_results(vector d, string f, map s, Sequence t) : results(d), flag(f), trimQuery(t) {} data_results() {} }; /***********************************************************************/ //sorts lowest to highest first by bsMax, then if tie by chimeraMax inline bool compareDataStruct(data_struct left, data_struct right){ if (left.bsMax < right.bsMax) { return true; } else if (left.bsMax == right.bsMax) { return (left.chimeraMax < right.chimeraMax); }else { return false; } } /***********************************************************************/ struct Preference { string name; string leftParent; //keep the name of closest left string rightParent; //keep the name of closest float score; //preference score float closestLeft; //keep the closest left float closestRight; //keep the closest right int midpoint; Preference() { name = ""; leftParent = ""; rightParent = ""; score = 0.0; closestLeft = 10000.0; closestRight = 10000.0; midpoint = 0; } ~Preference() {} }; /***********************************************************************/ struct score_struct { int prev; int score; int row; int col; // int mismatches; }; /***********************************************************************/ struct trace_struct { int col; int oldCol; int row; }; /***********************************************************************/ struct results { 
int regionStart; int regionEnd; int nastRegionStart; int nastRegionEnd; string parent; string parentAligned; float queryToParent; float queryToParentLocal; float divR; }; /***********************************************************************/ struct SeqDist { Sequence* seq; float dist; int index; }; /***********************************************************************/ struct SeqCompare { Sequence seq; float dist; int index; }; //******************************************************************************************************************** //sorts lowest to highest inline bool compareRegionStart(results left, results right){ return (left.nastRegionStart < right.nastRegionStart); } //******************************************************************************************************************** //sorts lowest to highest inline bool compareSeqDist(SeqDist left, SeqDist right){ return (left.dist < right.dist); } //******************************************************************************************************************** //sorts lowest to highest inline bool compareSeqCompare(SeqCompare left, SeqCompare right){ return (left.dist < right.dist); } //******************************************************************************************************************** struct sim { string leftParent; string rightParent; float score; int midpoint; }; /***********************************************************************/ class Chimera { public: Chimera(){ m = MothurOut::getInstance(); length = 0; unaligned = false; byGroup = false; } virtual ~Chimera(){ for (int i = 0; i < templateSeqs.size(); i++) { delete templateSeqs[i]; } for (int i = 0; i < filteredTemplateSeqs.size(); i++) { delete filteredTemplateSeqs[i]; } }; virtual bool getUnaligned() { return unaligned; } virtual int getLength() { return length; } virtual vector readSeqs(string); virtual void setMask(string); virtual map runFilter(Sequence*); virtual string createFilter(vector, float); 
virtual void printHeader(ostream&){}; virtual int getChimeras(Sequence*){ return 0; } virtual int getChimeras(){ return 0; } virtual Sequence print(ostream&, ostream&){ Sequence temp; return temp; } virtual Sequence print(ostream&, ostream&, data_results, data_results) { Sequence temp; return temp; } virtual int print(ostream&, ostream&, string){ return 0; } virtual int getNumNoParents(){ return 0; } virtual data_results getResults() { data_results results; return results; } #ifdef USE_MPI virtual Sequence print(MPI_File&, MPI_File&){ Sequence temp; return temp; } virtual Sequence print(MPI_File&, MPI_File&, data_results, data_results, bool&){ Sequence temp; return temp; } virtual int print(MPI_File&, MPI_File&, string){ return 0; } #endif protected: vector templateSeqs; vector filteredTemplateSeqs; bool filter, unaligned, byGroup; int length; string seqMask, filterString, outputDir, templateFileName; Sequence* getSequence(string); //find sequence from name MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/chimera/chimeracheckrdp.cpp000066400000000000000000000370621255543666200216760ustar00rootroot00000000000000/* * chimeracheckrdp.cpp * Mothur * * Created by westcott on 9/8/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "chimeracheckrdp.h" //*************************************************************************************************************** ChimeraCheckRDP::ChimeraCheckRDP(string filename, string temp, string n, bool s, int inc, int k, string o) : Chimera() { try { fastafile = filename; templateFileName = temp; name = n; svg = s; increment = inc; kmerSize = k; outputDir = o; templateDB = new AlignmentDB(templateFileName, "kmer", kmerSize, 0.0,0.0,0.0,0.0, rand()); m->mothurOutEndLine(); kmer = new Kmer(kmerSize); if (name != "") { readName(name); //fills name map with names of seqs the user wants to have .svg for. 
} } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "ChimeraCheckRDP"); exit(1); } } //*************************************************************************************************************** ChimeraCheckRDP::~ChimeraCheckRDP() { try { delete templateDB; delete kmer; } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "~ChimeraCheckRDP"); exit(1); } } //*************************************************************************************************************** Sequence ChimeraCheckRDP::print(ostream& out, ostream& outAcc) { try { m->mothurOut("Processing: " + querySeq->getName()); m->mothurOutEndLine(); out << querySeq->getName() << endl; out << "IS scores: " << '\t'; for (int k = 0; k < IS.size(); k++) { out << IS[k].score << '\t'; } out << endl; if (svg) { if (name != "") { //if user has specific names map::iterator it = names.find(querySeq->getName()); if (it != names.end()) { //user wants pic of this makeSVGpic(IS); //zeros out negative results } }else{//output them all makeSVGpic(IS); //zeros out negative results } } return *querySeq; } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "print"); exit(1); } } #ifdef USE_MPI //*************************************************************************************************************** Sequence ChimeraCheckRDP::print(MPI_File& out, MPI_File& outAcc) { try { cout << "Processing: " << querySeq->getName() << endl; string outString = ""; outString += querySeq->getName() + "\nIS scores: \t"; for (int k = 0; k < IS.size(); k++) { outString += toString(IS[k].score) + "\t"; } outString += "\n"; MPI_Status status; int length = outString.length(); char* buf = new char[length]; memcpy(buf, outString.c_str(), length); MPI_File_write_shared(out, buf, length, MPI_CHAR, &status); delete[] buf; if (svg) { if (name != "") { //if user has specific names map::iterator it = names.find(querySeq->getName()); if (it != names.end()) { //user wants pic of this makeSVGpic(IS); //zeros out negative results } 
}else{//output them all makeSVGpic(IS); //zeros out negative results } } return *querySeq; } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "print"); exit(1); } } #endif //*************************************************************************************************************** int ChimeraCheckRDP::getChimeras(Sequence* query) { try { IS.clear(); querySeq = query; closest = templateDB->findClosestSequence(query); IS = findIS(); //determine chimera report cutoff - window score above 95% //getCutoff(); - not a very accurate predictor return 0; } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "getChimeras"); exit(1); } } //*************************************************************************************************************** vector ChimeraCheckRDP::findIS() { try { vector< map > queryKmerInfo; //vector of maps - each entry in the vector is a map of the kmers up to that spot in the unaligned seq //example: seqKmerInfo[50] = map containing the kmers found in the first 50 + kmersize characters of ecoli. //I chose to store the kmer numbers in a map so you wouldn't have to check for duplicate entries and could easily find the //kmers 2 seqs had in common. There may be a better way to do this, that's why I am leaving so many comments... 
vector< map > subjectKmerInfo; vector isValues; string queryName = querySeq->getName(); string seq = querySeq->getUnaligned(); queryKmerInfo = kmer->getKmerCounts(seq); subjectKmerInfo = kmer->getKmerCounts(closest.getUnaligned()); //find total kmers you have in common with closest[query] by looking at the last entry in the vector of maps for each int nTotal = calcKmers(queryKmerInfo[(queryKmerInfo.size()-1)], subjectKmerInfo[(subjectKmerInfo.size()-1)]); //you don't want the starting point to be virtually at the end so move it in 10% int start = seq.length() / 10; //for each window for (int f = start; f < (seq.length() - start); f+=increment) { if ((f - kmerSize) < 0) { m->mothurOut("[ERROR]: Sequence " + querySeq->getName() + " is too short for your kmerSize, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (m->control_pressed) { return isValues; } sim temp; string fragLeft = seq.substr(0, f); //left side of breakpoint string fragRight = seq.substr(f); //right side of breakpoint //make a sequence of the left side and right side Sequence* left = new Sequence(queryName, fragLeft); Sequence* right = new Sequence(queryName, fragRight); //find seqs closest to each fragment Sequence closestLeft = templateDB->findClosestSequence(left); Sequence closestRight = templateDB->findClosestSequence(right); //get kmerinfo for the closest left vector< map > closeLeftKmerInfo = kmer->getKmerCounts(closestLeft.getUnaligned()); //get kmerinfo for the closest right vector< map > closeRightKmerInfo = kmer->getKmerCounts(closestRight.getUnaligned()); //right side is tricky - since the counts grow on eachother to find the correct counts of only the right side you must subtract the counts of the left side //iterate through left sides map to subtract the number of times you saw things before you got the the right side map rightside = queryKmerInfo[queryKmerInfo.size()-1]; for (map::iterator itleft = queryKmerInfo[f-kmerSize].begin(); itleft != 
queryKmerInfo[f-kmerSize].end(); itleft++) { int howManyTotal = queryKmerInfo[queryKmerInfo.size()-1][itleft->first]; //times that kmer was seen in total //itleft->second is times it was seen in left side, so howmanytotal - leftside should give you right side int howmanyright = howManyTotal - itleft->second; //if any were seen just on the left erase if (howmanyright == 0) { rightside.erase(itleft->first); } } map closerightside = closeRightKmerInfo[closeRightKmerInfo.size()-1]; for (map::iterator itright = closeRightKmerInfo[f-kmerSize].begin(); itright != closeRightKmerInfo[f-kmerSize].end(); itright++) { int howManyTotal = closeRightKmerInfo[(closeRightKmerInfo.size()-1)][itright->first]; //times that kmer was seen in total //itleft->second is times it was seen in left side, so howmanytotal - leftside should give you right side int howmanyright = howManyTotal - itright->second; //if any were seen just on the left erase if (howmanyright == 0) { closerightside.erase(itright->first); } } int nLeft = calcKmers(closeLeftKmerInfo[f-kmerSize], queryKmerInfo[f-kmerSize]); int nRight = calcKmers(closerightside, rightside); int is = nLeft + nRight - nTotal; //save IS, leftparent, rightparent, breakpoint temp.leftParent = closestLeft.getName(); temp.rightParent = closestRight.getName(); temp.score = is; temp.midpoint = f; isValues.push_back(temp); delete left; delete right; } return isValues; } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "findIS"); exit(1); } } //*************************************************************************************************************** void ChimeraCheckRDP::readName(string namefile) { try{ string name; #ifdef USE_MPI MPI_File inMPI; MPI_Offset size; MPI_Status status; //char* inFileName = new char[namefile.length()]; //memcpy(inFileName, namefile.c_str(), namefile.length()); char inFileName[1024]; strcpy(inFileName, namefile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); 
		MPI_File_get_size(inMPI, &size);
		
		//delete inFileName;
		
		char* buffer = new char[size];
		MPI_File_read(inMPI, buffer, size, MPI_CHAR, &status);
		
		string tempBuf(buffer, size);  //buffer is not null terminated, so pass its length explicitly
		istringstream iss (tempBuf,istringstream::in);
		delete[] buffer;  //allocated with new[], so release with delete[]
		
		while(!iss.eof()) {
			iss >> name; m->gobble(iss);
			names[name] = name;
		}
		
		MPI_File_close(&inMPI);
		
	#else
		
		ifstream in;
		m->openInputFile(namefile, in);
		
		while (!in.eof()) {
			in >> name; m->gobble(in);
			names[name] = name;
		}
		in.close();
		
	#endif
		
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCheckRDP", "readName");
		exit(1);
	}
}

//***************************************************************************************************************
//find the smaller map and iterate through it and count kmers in common
int ChimeraCheckRDP::calcKmers(map<int, int> query, map<int, int> subject) {
	try{
		
		int common = 0;
		map<int, int>::iterator smallone;
		map<int, int>::iterator largeone;
		
		if (query.size() < subject.size()) {
			
			for (smallone = query.begin(); smallone != query.end(); smallone++) {
				largeone = subject.find(smallone->first);
				
				//if you found it they have that kmer in common
				if (largeone != subject.end()) {	common++;	}
			}
			
		}else {
			
			for (smallone = subject.begin(); smallone != subject.end(); smallone++) {
				largeone = query.find(smallone->first);
				
				//if you found it they have that kmer in common
				if (largeone != query.end()) {	common++;	}
			}
		}
		
		return common;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCheckRDP", "calcKmers");
		exit(1);
	}
}
#ifdef USE_MPI
//***************************************************************************************************************
void ChimeraCheckRDP::makeSVGpic(vector<sim> info) {
	try{
		
		string file = outputDir + querySeq->getName() + ".chimeracheck.svg";
		
		MPI_File outSVG;
		int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY;
		
		//char* FileName = new char[file.length()];
		//memcpy(FileName, file.c_str(), file.length());
		
		char FileName[1024];
		strcpy(FileName, file.c_str());
		
		MPI_File_open(MPI_COMM_SELF, FileName, outMode, 
MPI_INFO_NULL, &outSVG);  //comm, filename, mode, info, filepointer
		
		//delete FileName;
		
		int width = (info.size()*5) + 150;
		
		string outString = "";
		
		outString += "\n";
		outString += "\n";
		outString += "Plotted IS values for " + querySeq->getName() + "\n";
		outString += "\n";
		outString += "\n";
		
		outString += "" + toString(info[0].midpoint) + "\n";
		outString += "" + toString(info[info.size()-1].midpoint) + "\n";
		outString += "Base Positions\n";
		
		outString += "0\n";
		
		outString += "IS\n";
		
		//find max IS score
		float biggest = 0.0;
		for (int i = 0; i < info.size(); i++) {
			if (info[i].score > biggest)  {
				biggest = info[i].score;
			}
		}
		
		outString += "" + toString(biggest) + "\n";
		
		int scaler2 = 500 / biggest;
		
		outString += " ";
		//for each window
		for (int i = 0; i < info.size(); i++) {
			if(info[i].score < 0) { info[i].score = 0; }
			outString += toString(((i*5) + 75)) + "," + toString((600 - (info[i].score * scaler2))) + " ";
		}
		
		outString += "\"/> ";
		outString += "\n\n";
		
		MPI_Status status;
		int length = outString.length();
		char* buf2 = new char[length];
		memcpy(buf2, outString.c_str(), length);
		
		MPI_File_write(outSVG, buf2, length, MPI_CHAR, &status);
		delete[] buf2;  //allocated with new[], so release with delete[]
		
		MPI_File_close(&outSVG);
		
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCheckRDP", "makeSVGpic");
		exit(1);
	}
}
#else
//***************************************************************************************************************
void ChimeraCheckRDP::makeSVGpic(vector<sim> info) {
	try{
		
		string file = outputDir + querySeq->getName() + ".chimeracheck.svg";
		ofstream outsvg;
		m->openOutputFile(file, outsvg);
		
		int width = (info.size()*5) + 150;
		
		outsvg << "\n";
		outsvg << "\n";
		outsvg << "Plotted IS values for " + querySeq->getName() + "\n";
		outsvg << "\n";
		outsvg << "\n";
		
		outsvg << "" + toString(info[0].midpoint) + "\n";
		outsvg << "" + toString(info[info.size()-1].midpoint) + "\n";
		outsvg << "Base Positions\n";
		
		outsvg << "0\n";
		
		outsvg << "IS\n";
		
		//find max IS score
		float biggest = 0.0;
		for (int i = 0; i < info.size(); i++) {
			if (info[i].score > biggest)  {
				biggest = 
info[i].score; } } outsvg << "" + toString(biggest) + "\n"; int scaler2 = 500 / biggest; outsvg << " "; for (int i = 0; i < info.size(); i++) { if(info[i].score < 0) { info[i].score = 0; } outsvg << ((i*5) + 75) << "," << (600 - (info[i].score * scaler2)) << " "; } outsvg << "\"/> "; outsvg << "\n\n"; outsvg.close(); } catch(exception& e) { m->errorOut(e, "ChimeraCheckRDP", "makeSVGpic"); exit(1); } } #endif //***************************************************************************************************************/ mothur-1.36.1/source/chimera/chimeracheckrdp.h000066400000000000000000000025511255543666200213360ustar00rootroot00000000000000#ifndef CHIMERACHECK_H #define CHIMERACHECK_H /* * chimeracheckrdp.h * Mothur * * Created by westcott on 9/8/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "chimera.h" #include "kmer.hpp" #include "kmerdb.hpp" #include "alignmentdb.h" /***********************************************************/ //This class was created using the algorithms described in //CHIMERA_CHECK version 2.7 written by Niels Larsen. 
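/*
Illustrative sketch, not part of mothur: the IS value that ChimeraCheckRDP
reports for a breakpoint is (kmers shared on the left) + (kmers shared on the
right) - (kmers shared over the whole query). The names sketchKmerCounts,
sketchSharedKmers, sketchTail and sketchIS are hypothetical, string keys stand
in for the kmer numbers that the Kmer class produces, and the fragments are cut
directly at the breakpoint, which simplifies the f-kmerSize bookkeeping used in
findIS().

```cpp
#include <cassert>
#include <map>
#include <string>

// Count every k-mer in seq (string keys are a simplification for this sketch).
std::map<std::string, int> sketchKmerCounts(const std::string& seq, int k) {
    assert(k > 0);
    std::map<std::string, int> counts;
    for (std::size_t i = 0; i + k <= seq.length(); i++) { counts[seq.substr(i, k)]++; }
    return counts;
}

// Number of distinct k-mers present in both maps; walks the smaller map.
int sketchSharedKmers(const std::map<std::string, int>& a, const std::map<std::string, int>& b) {
    const std::map<std::string, int>& fewer = (a.size() < b.size()) ? a : b;
    const std::map<std::string, int>& more = (a.size() < b.size()) ? b : a;
    int common = 0;
    for (std::map<std::string, int>::const_iterator it = fewer.begin(); it != fewer.end(); ++it) {
        if (more.count(it->first) != 0) { common++; }
    }
    return common;
}

// substr(pos) throws when pos > size(), so return an empty tail instead.
std::string sketchTail(const std::string& s, std::size_t pos) {
    return (pos < s.length()) ? s.substr(pos) : std::string();
}

// IS for one breakpoint: nLeft + nRight - nTotal.
int sketchIS(const std::string& query, const std::string& closestOverall,
             const std::string& closestLeft, const std::string& closestRight,
             std::size_t breakpoint, int k) {
    int nTotal = sketchSharedKmers(sketchKmerCounts(query, k), sketchKmerCounts(closestOverall, k));
    int nLeft = sketchSharedKmers(sketchKmerCounts(query.substr(0, breakpoint), k),
                                  sketchKmerCounts(closestLeft.substr(0, breakpoint), k));
    int nRight = sketchSharedKmers(sketchKmerCounts(sketchTail(query, breakpoint), k),
                                   sketchKmerCounts(sketchTail(closestRight, breakpoint), k));
    return nLeft + nRight - nTotal;
}
```

A query whose two halves match different references better than any single
reference matches the whole query gives a positive IS; a query identical to its
closest reference gives a value at or below zero.
*/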
/***********************************************************/ class ChimeraCheckRDP : public Chimera { public: ChimeraCheckRDP(string, string, string, bool, int, int, string); //fasta, template, name, svg, increment, ksize, outputDir ~ChimeraCheckRDP(); int getChimeras(Sequence*); Sequence print(ostream&, ostream&); #ifdef USE_MPI Sequence print(MPI_File&, MPI_File&); #endif private: Sequence* querySeq; AlignmentDB* templateDB; Kmer* kmer; Sequence closest; //closest is the closest overall seq to query vector IS; //IS is the vector of IS values for each window for query string fastafile; map names; string name; bool svg; int kmerSize, increment; vector findIS(); int calcKmers(map, map); void makeSVGpic(vector); void readName(string); }; /***********************************************************/ #endif mothur-1.36.1/source/chimera/chimerarealigner.cpp000066400000000000000000000236431255543666200220630ustar00rootroot00000000000000/* * chimerarealigner.cpp * Mothur * * Created by westcott on 2/12/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "chimerarealigner.h" #include "needlemanoverlap.hpp" #include "nast.hpp" //*************************************************************************************************************** ChimeraReAligner::ChimeraReAligner() { m = MothurOut::getInstance(); } //*************************************************************************************************************** ChimeraReAligner::~ChimeraReAligner() {} //*************************************************************************************************************** void ChimeraReAligner::reAlign(Sequence* query, vector parents) { try { // if (parents.size() != 0) { // vector queryParts; // vector parentParts; //queryParts[0] relates to parentParts[0] // // string qAligned = query->getAligned(); // string newQuery = ""; // // //sort parents by region start // sort(parents.begin(), parents.end(), compareRegionStart); // // //make sure you don't cutoff beginning of query // if (parents[0].nastRegionStart > 0) { newQuery += qAligned.substr(0, parents[0].nastRegionStart); } // int longest = 0; // // //take query and break apart into pieces using breakpoints given by results of parents // for (int i = 0; i < parents.size(); i++) { // int length = parents[i].nastRegionEnd - parents[i].nastRegionStart+1; // string q = qAligned.substr(parents[i].nastRegionStart, length); // // Sequence* queryFrag = new Sequence(query->getName(), q); // queryParts.push_back(queryFrag); // // string p = parents[i].parentAligned; // p = p.substr(parents[i].nastRegionStart, length); // Sequence* parent = new Sequence(parents[i].parent, p); // parentParts.push_back(parent); // // if (queryFrag->getUnaligned().length() > longest) { longest = queryFrag->getUnaligned().length(); } // if (parent->getUnaligned().length() > longest) { longest = parent->getUnaligned().length(); } // } // // //align each peice to correct parent from results // for (int i = 0; i < queryParts.size(); i++) { // if ((queryParts[i]->getUnaligned() == "") 
|| (parentParts[i]->getUnaligned() == "")) {;} // else { // Alignment* alignment = new NeedlemanOverlap(-2.0, 1.0, -1.0, longest+1); //default gapopen, match, mismatch, longestbase // // Nast nast(alignment, queryParts[i], parentParts[i]); // delete alignment; // } // } // // //recombine pieces to form new query sequence // for (int i = 0; i < queryParts.size(); i++) { // //sometimes the parent regions do not meet, for example region 1 may end at 1000 and region 2 starts at 1100. // //we don't want to loose length so in this case we will leave query alone // if (i != 0) { // int space = parents[i].nastRegionStart - parents[i-1].nastRegionEnd - 1; // if (space > 0) { //they don't meet and we need to add query piece // string q = qAligned.substr(parents[i-1].nastRegionEnd+1, space); // newQuery += q; // } // } // // newQuery += queryParts[i]->getAligned(); // } // // //make sure you don't cutoff end of query // if (parents[parents.size()-1].nastRegionEnd < (qAligned.length()-1)) { newQuery += qAligned.substr(parents[parents.size()-1].nastRegionEnd+1); } // // query->setAligned(newQuery); // // //free memory // for (int i = 0; i < queryParts.size(); i++) { delete queryParts[i]; } // for (int i = 0; i < parentParts.size(); i++) { delete parentParts[i]; } // // } //else leave query alone, you have bigger problems... 
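/*
Illustrative sketch, not part of mothur: the active realignment code in this
file scores each query base against a per-column profile of the parent
sequences, using MATCH = 5 and MISMATCH = -4 as in calcMatchScore(). The names
SketchColumn and sketchMatchScore are hypothetical. One deliberate difference:
here an unrecognised base counts as a mismatch against every parent, whereas
calcMatchScore() scores it as if it were 'A'.

```cpp
#include <cassert>

// Per-column base counts across the parents (mirrors the fields of 'bases').
struct SketchColumn {
    int A, T, G, C, Gap;
    SketchColumn() : A(0), T(0), G(0), C(0), Gap(0) {}
};

// +5 for every parent that agrees with q, -4 for every parent that
// disagrees or has a gap in this column.
int sketchMatchScore(const SketchColumn& p, char q) {
    const int MATCH = 5;
    const int MISMATCH = -4;
    const int total = p.A + p.T + p.G + p.C + p.Gap;
    assert(total >= 0);
    int same = 0;
    switch (q) {
        case 'A': same = p.A; break;
        case 'T': same = p.T; break;
        case 'G': same = p.G; break;
        case 'C': same = p.C; break;
        default:  same = 0;   break;  // assumption: unknown bases match nothing
    }
    return MATCH * same + MISMATCH * (total - same);
}
```

With three parents showing A, A and G in a column, an 'A' in the query scores
2*5 - 4 = 6 and a 'G' scores 5 - 2*4 = -3.
*/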
if(parents.size() != 0){ alignmentLength = query->getAlignLength(); //x int queryUnalignedLength = query->getNumBases(); //y buildTemplateProfile(parents); createAlignMatrix(queryUnalignedLength, alignmentLength); fillAlignMatrix(query->getUnaligned()); query->setAligned(getNewAlignment(query->getUnaligned())); } } catch(exception& e) { m->errorOut(e, "ChimeraReAligner", "reAlign"); exit(1); } } /***************************************************************************************************************/ void ChimeraReAligner::buildTemplateProfile(vector parents) { try{ int numParents = parents.size(); profile.resize(alignmentLength); for(int i=0;ierrorOut(e, "ChimeraReAligner", "buildTemplateProfile"); exit(1); } } /***************************************************************************************************************/ void ChimeraReAligner::createAlignMatrix(int queryUnalignedLength, int alignmentLength){ try{ alignMatrix.resize(alignmentLength+1); for(int i=0;i<=alignmentLength;i++){ alignMatrix[i].resize(queryUnalignedLength+1); } for(int i=1;i<=alignmentLength;i++) { alignMatrix[i][0].direction = 'l'; } for(int j=1;j<=queryUnalignedLength;j++){ alignMatrix[0][j].direction = 'u'; } } catch(exception& e) { m->errorOut(e, "ChimeraReAligner", "createAlignMatrix"); exit(1); } } /***************************************************************************************************************/ void ChimeraReAligner::fillAlignMatrix(string query){ try{ int GAP = -4; int nrows = alignMatrix.size()-1; int ncols = alignMatrix[0].size()-1; for(int i=1;i<=nrows;i++){ bases p = profile[i-1]; int numChars = p.Chars; for(int j=1;j<=ncols;j++){ char q = query[j-1]; // score it for if there was a match int maxScore = calcMatchScore(p, q) + alignMatrix[i-1][j-1].score; int maxDirection = 'd'; // score it for if there was a gap in the query int score = alignMatrix[i-1][j].score + (numChars * GAP); if (score > maxScore) { maxScore = score; maxDirection = 'l'; } 
alignMatrix[i][j].score = maxScore; alignMatrix[i][j].direction = maxDirection; } } } catch(exception& e) { m->errorOut(e, "ChimeraReAligner", "fillAlignMatrix"); exit(1); } } /***************************************************************************************************************/ int ChimeraReAligner::calcMatchScore(bases p, char q){ try{ int MATCH = 5; int MISMATCH = -4; int score = 0; if(q == 'G') { score = (MATCH * p.G + MISMATCH * (p.A + p.T + p.C + p.Gap)); } else if(q == 'A') { score = (MATCH * p.A + MISMATCH * (p.G + p.T + p.C + p.Gap)); } else if(q == 'T') { score = (MATCH * p.T + MISMATCH * (p.G + p.A + p.C + p.Gap)); } else if(q == 'C') { score = (MATCH * p.C + MISMATCH * (p.G + p.A + p.T + p.Gap)); } else { score = (MATCH * p.A + MISMATCH * (p.G + p.T + p.C + p.Gap)); } return score; } catch(exception& e) { m->errorOut(e, "ChimeraReAligner", "calcMatchScore"); exit(1); } } /***************************************************************************************************************/ string ChimeraReAligner::getNewAlignment(string query){ try{ string queryAlignment(alignmentLength, '.'); string referenceAlignment(alignmentLength, '.'); int maxScore = -99999999; int nrows = alignMatrix.size()-1; int ncols = alignMatrix[0].size()-1; int bestCol = -1; int bestRow = -1; for(int i=1;i<=nrows;i++){ int score = alignMatrix[i][ncols].score; if (score > maxScore) { maxScore = score; bestRow = i; bestCol = ncols; } } for(int j=1;j<=ncols;j++){ int score = alignMatrix[nrows][j].score; if (score > maxScore) { maxScore = score; bestRow = nrows; bestCol = j; } } int currentRow = bestRow; int currentCol = bestCol; int alignmentPosition = 0; if(currentRow < alignmentLength){ for(int i=alignmentLength;i>currentRow;i--){ alignmentPosition++; } } AlignCell c = alignMatrix[currentRow][currentCol]; while(c.direction != 'x'){ char q; if(c.direction == 'd'){ q = query[currentCol-1]; currentCol--; currentRow--; } else if (c.direction == 'u') { break; } else 
if(c.direction == 'l'){ char gapChar; if(currentCol == 0) { gapChar = '.'; } else { gapChar = '-'; } q = gapChar; currentRow--; } else{ cout << "you shouldn't be here..." << endl; } queryAlignment[alignmentPosition] = q; alignmentPosition++; c = alignMatrix[currentRow][currentCol]; } // need to reverse the string string flipSeq = ""; for(int i=alignmentLength-1;i>=0;i--){ flipSeq += queryAlignment[i]; } return flipSeq; } catch(exception& e) { m->errorOut(e, "ChimeraReAligner", "getNewAlignment"); exit(1); } } /***************************************************************************************************************/ // Sequence* ChimeraReAligner::getSequence(string name) { // try{ // Sequence* temp; // // //look through templateSeqs til you find it // int spot = -1; // for (int i = 0; i < templateSeqs.size(); i++) { // if (name == templateSeqs[i]->getName()) { // spot = i; // break; // } // } // // if(spot == -1) { m->mothurOut("Error: Could not find sequence."); m->mothurOutEndLine(); return NULL; } // // temp = new Sequence(templateSeqs[spot]->getName(), templateSeqs[spot]->getAligned()); // // return temp; // } // catch(exception& e) { // m->errorOut(e, "ChimeraReAligner", "getSequence"); // exit(1); // } //} //***************************************************************************************************************/ mothur-1.36.1/source/chimera/chimerarealigner.h000066400000000000000000000021441255543666200215210ustar00rootroot00000000000000#ifndef CHIMERAREALIGNER_H #define CHIMERAREALIGNER_H /* * chimerarealigner.h * Mothur * * Created by westcott on 2/12/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "chimera.h" #include "alignment.hpp" /***********************************************************/ struct AlignCell { int score; char direction; AlignCell() : score(0), direction('x') {}; }; /***********************************************************/ struct bases { int A, T, G, C, Gap, Chars; bases() : A(0), T(0), G(0), C(0), Gap(0), Chars(0){}; }; /***********************************************************/ class ChimeraReAligner { public: ChimeraReAligner(); ~ChimeraReAligner(); void reAlign(Sequence*, vector); private: void buildTemplateProfile(vector); void createAlignMatrix(int, int); void fillAlignMatrix(string); int calcMatchScore(bases, char); string getNewAlignment(string); int alignmentLength; vector profile; vector > alignMatrix; MothurOut* m; }; /***********************************************************/ #endif mothur-1.36.1/source/chimera/chimeraslayer.cpp000066400000000000000000001364031255543666200214110ustar00rootroot00000000000000/* * chimeraslayer.cpp * Mothur * * Created by westcott on 9/25/09. * Copyright 2009 Pschloss Lab. All rights reserved. 
* */ #include "chimeraslayer.h" #include "chimerarealigner.h" #include "kmerdb.hpp" #include "blastdb.hpp" //*************************************************************************************************************** ChimeraSlayer::ChimeraSlayer(string file, string temp, bool trim, string mode, int k, int ms, int mms, int win, float div, int minsim, int mincov, int minbs, int minsnp, int par, int it, int inc, int numw, bool r, string blas, int tid) : Chimera() { try { fastafile = file; templateFileName = temp; templateSeqs = readSeqs(temp); searchMethod = mode; kmerSize = k; match = ms; misMatch = mms; window = win; divR = div; minSim = minsim; minCov = mincov; minBS = minbs; minSNP = minsnp; parents = par; iters = it; increment = inc; numWanted = numw; realign = r; trimChimera = trim; numNoParents = 0; blastlocation = blas; threadID = tid; doPrep(); } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "ChimeraSlayer"); exit(1); } } //*************************************************************************************************************** //template=self, byGroup parameter used for mpienabled version to read the template as MPI_COMM_SELF instead of MPI_COMM_WORLD ChimeraSlayer::ChimeraSlayer(string file, string temp, bool trim, map& prior, string mode, int k, int ms, int mms, int win, float div, int minsim, int mincov, int minbs, int minsnp, int par, int it, int inc, int numw, bool r, string blas, int tid, bool bg) : Chimera() { try { byGroup = bg; fastafile = file; templateSeqs = readSeqs(fastafile); templateFileName = temp; searchMethod = mode; kmerSize = k; match = ms; misMatch = mms; window = win; divR = div; minSim = minsim; minCov = mincov; minBS = minbs; minSNP = minsnp; parents = par; iters = it; increment = inc; numWanted = numw; realign = r; trimChimera = trim; priority = prior; numNoParents = 0; blastlocation = blas; threadID = tid; createFilter(templateSeqs, 0.0); //just removed columns where all seqs have a gap if (searchMethod == 
"distance") { //createFilter(templateSeqs, 0.0); //just removed columns where all seqs have a gap //run filter on template copying templateSeqs into filteredTemplateSeqs for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { break; } Sequence* newSeq = new Sequence(templateSeqs[i]->getName(), templateSeqs[i]->getAligned()); runFilter(newSeq); filteredTemplateSeqs.push_back(newSeq); } } } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "ChimeraSlayer"); exit(1); } } //*************************************************************************************************************** //template=self ChimeraSlayer::ChimeraSlayer(string file, string temp, bool trim, map& prior, string mode, int k, int ms, int mms, int win, float div, int minsim, int mincov, int minbs, int minsnp, int par, int it, int inc, int numw, bool r, string blas, int tid) : Chimera() { try { fastafile = file; templateSeqs = readSeqs(fastafile); templateFileName = temp; searchMethod = mode; kmerSize = k; match = ms; misMatch = mms; window = win; divR = div; minSim = minsim; minCov = mincov; minBS = minbs; minSNP = minsnp; parents = par; iters = it; increment = inc; numWanted = numw; realign = r; trimChimera = trim; priority = prior; numNoParents = 0; blastlocation = blas; threadID = tid; createFilter(templateSeqs, 0.0); //just removed columns where all seqs have a gap if (searchMethod == "distance") { //createFilter(templateSeqs, 0.0); //just removed columns where all seqs have a gap //run filter on template copying templateSeqs into filteredTemplateSeqs for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { break; } Sequence* newSeq = new Sequence(templateSeqs[i]->getName(), templateSeqs[i]->getAligned()); runFilter(newSeq); filteredTemplateSeqs.push_back(newSeq); } } } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "ChimeraSlayer"); exit(1); } } 
//*************************************************************************************************************** int ChimeraSlayer::doPrep() { try { if (searchMethod == "distance") { //read in all query seqs vector tempQuerySeqs = readSeqs(fastafile); vector temp = templateSeqs; for (int i = 0; i < tempQuerySeqs.size(); i++) { temp.push_back(tempQuerySeqs[i]); } createFilter(temp, 0.0); //just removed columns where all seqs have a gap for (int i = 0; i < tempQuerySeqs.size(); i++) { delete tempQuerySeqs[i]; } if (m->control_pressed) { return 0; } //run filter on template copying templateSeqs into filteredTemplateSeqs for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } Sequence* newSeq = new Sequence(templateSeqs[i]->getName(), templateSeqs[i]->getAligned()); runFilter(newSeq); filteredTemplateSeqs.push_back(newSeq); } } string kmerDBNameLeft; string kmerDBNameRight; //generate the kmerdb to pass to maligner if (searchMethod == "kmer") { string templatePath = m->hasPath(templateFileName); string rightTemplateFileName = templatePath + "right." + m->getRootName(m->getSimpleName(templateFileName)); databaseRight = new KmerDB(rightTemplateFileName, kmerSize); string leftTemplateFileName = templatePath + "left." 
+ m->getRootName(m->getSimpleName(templateFileName)); databaseLeft = new KmerDB(leftTemplateFileName, kmerSize); #ifdef USE_MPI for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } string leftFrag = templateSeqs[i]->getUnaligned(); leftFrag = leftFrag.substr(0, int(leftFrag.length() * 0.33)); Sequence leftTemp(templateSeqs[i]->getName(), leftFrag); databaseLeft->addSequence(leftTemp); } databaseLeft->generateDB(); databaseLeft->setNumSeqs(templateSeqs.size()); for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } string rightFrag = templateSeqs[i]->getUnaligned(); rightFrag = rightFrag.substr(int(rightFrag.length() * 0.66)); Sequence rightTemp(templateSeqs[i]->getName(), rightFrag); databaseRight->addSequence(rightTemp); } databaseRight->generateDB(); databaseRight->setNumSeqs(templateSeqs.size()); #else //leftside kmerDBNameLeft = leftTemplateFileName.substr(0,leftTemplateFileName.find_last_of(".")+1) + char('0'+ kmerSize) + "mer"; ifstream kmerFileTestLeft(kmerDBNameLeft.c_str()); bool needToGenerateLeft = true; if(kmerFileTestLeft){ bool GoodFile = m->checkReleaseVersion(kmerFileTestLeft, m->getVersion()); if (GoodFile) { needToGenerateLeft = false; } } if(needToGenerateLeft){ for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } string leftFrag = templateSeqs[i]->getUnaligned(); leftFrag = leftFrag.substr(0, int(leftFrag.length() * 0.33)); Sequence leftTemp(templateSeqs[i]->getName(), leftFrag); databaseLeft->addSequence(leftTemp); } databaseLeft->generateDB(); }else { databaseLeft->readKmerDB(kmerFileTestLeft); } kmerFileTestLeft.close(); databaseLeft->setNumSeqs(templateSeqs.size()); //rightside kmerDBNameRight = rightTemplateFileName.substr(0,rightTemplateFileName.find_last_of(".")+1) + char('0'+ kmerSize) + "mer"; ifstream kmerFileTestRight(kmerDBNameRight.c_str()); bool needToGenerateRight = true; if(kmerFileTestRight){ bool GoodFile = 
m->checkReleaseVersion(kmerFileTestRight, m->getVersion()); if (GoodFile) { needToGenerateRight = false; } } if(needToGenerateRight){ for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } string rightFrag = templateSeqs[i]->getUnaligned(); rightFrag = rightFrag.substr(int(rightFrag.length() * 0.66)); Sequence rightTemp(templateSeqs[i]->getName(), rightFrag); databaseRight->addSequence(rightTemp); } databaseRight->generateDB(); }else { databaseRight->readKmerDB(kmerFileTestRight); } kmerFileTestRight.close(); databaseRight->setNumSeqs(templateSeqs.size()); #endif }else if (searchMethod == "blast") { //generate blastdb databaseLeft = new BlastDB(m->getRootName(m->getSimpleName(fastafile)), -1.0, -1.0, 1, -3, blastlocation, threadID); if (m->control_pressed) { return 0; } for (int i = 0; i < templateSeqs.size(); i++) { databaseLeft->addSequence(*templateSeqs[i]); } databaseLeft->generateDB(); databaseLeft->setNumSeqs(templateSeqs.size()); } return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "doprep"); exit(1); } } //*************************************************************************************************************** vector ChimeraSlayer::getTemplate(Sequence q, vector& userTemplateFiltered) { try { //when template=self, the query file is sorted from most abundance to least abundant //userTemplate grows as the query file is processed by adding sequences that are not chimeric and more abundant vector userTemplate; int myAbund = priority[q.getName()]; for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return userTemplate; } //have I reached a sequence with the same abundance as myself? 
if (!(priority[templateSeqs[i]->getName()] > myAbund)) { break; } //if its am not chimeric add it if (chimericSeqs.count(templateSeqs[i]->getName()) == 0) { userTemplate.push_back(templateSeqs[i]); if (searchMethod == "distance") { userTemplateFiltered.push_back(filteredTemplateSeqs[i]); } } } //avoids nuisance error from formatdb for making blank blast database if (userTemplate.size() == 0) { return userTemplate; } string kmerDBNameLeft; string kmerDBNameRight; //generate the kmerdb to pass to maligner if (searchMethod == "kmer") { string templatePath = m->hasPath(templateFileName); string rightTemplateFileName = templatePath + "right." + m->getRootName(m->getSimpleName(templateFileName)); databaseRight = new KmerDB(rightTemplateFileName, kmerSize); string leftTemplateFileName = templatePath + "left." + m->getRootName(m->getSimpleName(templateFileName)); databaseLeft = new KmerDB(leftTemplateFileName, kmerSize); #ifdef USE_MPI for (int i = 0; i < userTemplate.size(); i++) { if (m->control_pressed) { return userTemplate; } string leftFrag = userTemplate[i]->getUnaligned(); leftFrag = leftFrag.substr(0, int(leftFrag.length() * 0.33)); Sequence leftTemp(userTemplate[i]->getName(), leftFrag); databaseLeft->addSequence(leftTemp); } databaseLeft->generateDB(); databaseLeft->setNumSeqs(userTemplate.size()); for (int i = 0; i < userTemplate.size(); i++) { if (m->control_pressed) { return userTemplate; } string rightFrag = userTemplate[i]->getUnaligned(); rightFrag = rightFrag.substr(int(rightFrag.length() * 0.66)); Sequence rightTemp(userTemplate[i]->getName(), rightFrag); databaseRight->addSequence(rightTemp); } databaseRight->generateDB(); databaseRight->setNumSeqs(userTemplate.size()); #else for (int i = 0; i < userTemplate.size(); i++) { if (m->control_pressed) { return userTemplate; } string leftFrag = userTemplate[i]->getUnaligned(); leftFrag = leftFrag.substr(0, int(leftFrag.length() * 0.33)); Sequence leftTemp(userTemplate[i]->getName(), leftFrag); 
databaseLeft->addSequence(leftTemp); } databaseLeft->generateDB(); databaseLeft->setNumSeqs(userTemplate.size()); for (int i = 0; i < userTemplate.size(); i++) { if (m->control_pressed) { return userTemplate; } string rightFrag = userTemplate[i]->getUnaligned(); rightFrag = rightFrag.substr(int(rightFrag.length() * 0.66)); Sequence rightTemp(userTemplate[i]->getName(), rightFrag); databaseRight->addSequence(rightTemp); } databaseRight->generateDB(); databaseRight->setNumSeqs(userTemplate.size()); #endif }else if (searchMethod == "blast") { //generate blastdb databaseLeft = new BlastDB(m->getRootName(m->getSimpleName(templateFileName)), -1.0, -1.0, 1, -3, blastlocation, threadID); if (m->control_pressed) { return userTemplate; } for (int i = 0; i < userTemplate.size(); i++) { if (m->control_pressed) { return userTemplate; } databaseLeft->addSequence(*userTemplate[i]); } databaseLeft->generateDB(); databaseLeft->setNumSeqs(userTemplate.size()); } return userTemplate; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getTemplate"); exit(1); } } //*************************************************************************************************************** ChimeraSlayer::~ChimeraSlayer() { if (templateFileName != "self") { if (searchMethod == "kmer") { delete databaseRight; delete databaseLeft; } else if (searchMethod == "blast") { delete databaseLeft; } } } //*************************************************************************************************************** void ChimeraSlayer::printHeader(ostream& out) { m->mothurOutEndLine(); m->mothurOut("Only reporting sequence supported by " + toString(minBS) + "% of bootstrapped results."); m->mothurOutEndLine(); out << "Name\tLeftParent\tRightParent\tDivQLAQRB\tPerIDQLAQRB\tBootStrapA\tDivQLBQRA\tPerIDQLBQRA\tBootStrapB\tFlag\tLeftWindow\tRightWindow\n"; } //*************************************************************************************************************** Sequence ChimeraSlayer::print(ostream& 
out, ostream& outAcc) { try { Sequence trim; if (trimChimera) { trim.setName(trimQuery.getName()); trim.setAligned(trimQuery.getAligned()); } if (chimeraFlags == "yes") { string chimeraFlag = "no"; if( (chimeraResults[0].bsa >= minBS && chimeraResults[0].divr_qla_qrb >= divR) || (chimeraResults[0].bsb >= minBS && chimeraResults[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; } if (chimeraFlag == "yes") { if ((chimeraResults[0].bsa >= minBS) || (chimeraResults[0].bsb >= minBS)) { m->mothurOut(querySeq.getName() + "\tyes"); m->mothurOutEndLine(); outAcc << querySeq.getName() << endl; if (templateFileName == "self") { chimericSeqs.insert(querySeq.getName()); } if (trimChimera) { int lengthLeft = chimeraResults[0].winLEnd - chimeraResults[0].winLStart; int lengthRight = chimeraResults[0].winREnd - chimeraResults[0].winRStart; string newAligned = trim.getAligned(); if (lengthLeft > lengthRight) { //trim right for (int i = (chimeraResults[0].winRStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; } }else { //trim left for (int i = 0; i < chimeraResults[0].winLEnd; i++) { newAligned[i] = '.'; } } trim.setAligned(newAligned); } } } printBlock(chimeraResults[0], chimeraFlag, out); }else { out << querySeq.getName() << "\tno" << endl; } return trim; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "print"); exit(1); } } //*************************************************************************************************************** Sequence ChimeraSlayer::print(ostream& out, ostream& outAcc, data_results leftPiece, data_results rightPiece) { try { Sequence trim; if (trimChimera) { string aligned = leftPiece.trimQuery.getAligned() + rightPiece.trimQuery.getAligned(); trim.setName(leftPiece.trimQuery.getName()); trim.setAligned(aligned); } if ((leftPiece.flag == "yes") || (rightPiece.flag == "yes")) { string chimeraFlag = "no"; if (leftPiece.flag == "yes") { if( (leftPiece.results[0].bsa >= minBS && leftPiece.results[0].divr_qla_qrb >= divR) || 
(leftPiece.results[0].bsb >= minBS && leftPiece.results[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; } } if (rightPiece.flag == "yes") { if ( (rightPiece.results[0].bsa >= minBS && rightPiece.results[0].divr_qla_qrb >= divR) || (rightPiece.results[0].bsb >= minBS && rightPiece.results[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; } } bool rightChimeric = false; bool leftChimeric = false; if (chimeraFlag == "yes") { //which peice is chimeric or are both if (rightPiece.flag == "yes") { if ((rightPiece.results[0].bsa >= minBS) || (rightPiece.results[0].bsb >= minBS)) { rightChimeric = true; } } if (leftPiece.flag == "yes") { if ((leftPiece.results[0].bsa >= minBS) || (leftPiece.results[0].bsb >= minBS)) { leftChimeric = true; } } if (rightChimeric || leftChimeric) { m->mothurOut(querySeq.getName() + "\tyes"); m->mothurOutEndLine(); outAcc << querySeq.getName() << endl; if (templateFileName == "self") { chimericSeqs.insert(querySeq.getName()); } if (trimChimera) { string newAligned = trim.getAligned(); //right side is fine so keep that if ((leftChimeric) && (!rightChimeric)) { for (int i = 0; i < leftPiece.results[0].winREnd; i++) { newAligned[i] = '.'; } }else if ((!leftChimeric) && (rightChimeric)) { //leftside is fine so keep that for (int i = (rightPiece.results[0].winLStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; } }else { //both sides are chimeric, keep longest piece int lengthLeftLeft = leftPiece.results[0].winLEnd - leftPiece.results[0].winLStart; int lengthLeftRight = leftPiece.results[0].winREnd - leftPiece.results[0].winRStart; int longest = 1; // leftleft = 1, leftright = 2, rightleft = 3 rightright = 4 int length = lengthLeftLeft; if (lengthLeftLeft < lengthLeftRight) { longest = 2; length = lengthLeftRight; } int lengthRightLeft = rightPiece.results[0].winLEnd - rightPiece.results[0].winLStart; int lengthRightRight = rightPiece.results[0].winREnd - rightPiece.results[0].winRStart; if (lengthRightLeft > length) { longest = 3; 
length = lengthRightLeft; } if (lengthRightRight > length) { longest = 4; } if (longest == 1) { //leftleft for (int i = (leftPiece.results[0].winRStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; } }else if (longest == 2) { //leftright //get rid of leftleft for (int i = (leftPiece.results[0].winLStart-1); i < (leftPiece.results[0].winLEnd-1); i++) { newAligned[i] = '.'; } //get rid of right for (int i = (rightPiece.results[0].winLStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; } }else if (longest == 3) { //rightleft //get rid of left for (int i = 0; i < leftPiece.results[0].winREnd; i++) { newAligned[i] = '.'; } //get rid of rightright for (int i = (rightPiece.results[0].winRStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; } }else { //rightright //get rid of left for (int i = 0; i < leftPiece.results[0].winREnd; i++) { newAligned[i] = '.'; } //get rid of rightleft for (int i = (rightPiece.results[0].winLStart-1); i < (rightPiece.results[0].winLEnd-1); i++) { newAligned[i] = '.'; } } } trim.setAligned(newAligned); } } } printBlock(leftPiece, rightPiece, leftChimeric, rightChimeric, chimeraFlag, out); }else { out << querySeq.getName() << "\tno" << endl; } return trim; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "print"); exit(1); } } #ifdef USE_MPI //*************************************************************************************************************** Sequence ChimeraSlayer::print(MPI_File& out, MPI_File& outAcc, data_results leftPiece, data_results rightPiece, bool& chimFlag) { try { MPI_Status status; bool results = false; string outAccString = ""; string outputString = ""; chimFlag = false; Sequence trim; if (trimChimera) { string aligned = leftPiece.trimQuery.getAligned() + rightPiece.trimQuery.getAligned(); trim.setName(leftPiece.trimQuery.getName()); trim.setAligned(aligned); } if ((leftPiece.flag == "yes") || (rightPiece.flag == "yes")) { string chimeraFlag = "no"; if (leftPiece.flag == "yes") { 
				if( (leftPiece.results[0].bsa >= minBS && leftPiece.results[0].divr_qla_qrb >= divR) || (leftPiece.results[0].bsb >= minBS && leftPiece.results[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; }
			}
			if (rightPiece.flag == "yes") {
				if ( (rightPiece.results[0].bsa >= minBS && rightPiece.results[0].divr_qla_qrb >= divR) || (rightPiece.results[0].bsb >= minBS && rightPiece.results[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; }
			}
			bool rightChimeric = false;
			bool leftChimeric = false;
			cout << endl;
			if (chimeraFlag == "yes") {
				//which piece is chimeric or are both
				if (rightPiece.flag == "yes") { if ((rightPiece.results[0].bsa >= minBS) || (rightPiece.results[0].bsb >= minBS)) { rightChimeric = true; } }
				if (leftPiece.flag == "yes") { if ((leftPiece.results[0].bsa >= minBS) || (leftPiece.results[0].bsb >= minBS)) { leftChimeric = true; } }
				if (rightChimeric || leftChimeric) {
					cout << querySeq.getName() << "\tyes" << endl;
					outAccString += querySeq.getName() + "\n";
					results = true;
					if (templateFileName == "self") { chimericSeqs.insert(querySeq.getName()); }
					//write to accnos file
					int length = outAccString.length();
					char* buf2 = new char[length];
					memcpy(buf2, outAccString.c_str(), length);
					MPI_File_write_shared(outAcc, buf2, length, MPI_CHAR, &status);
					chimFlag = true;
					delete[] buf2; //buffer was allocated with new[]
					if (trimChimera) {
						string newAligned = trim.getAligned();
						//right side is fine so keep that
						if ((leftChimeric) && (!rightChimeric)) {
							for (int i = 0; i < leftPiece.results[0].winREnd; i++) { newAligned[i] = '.'; }
						}else if ((!leftChimeric) && (rightChimeric)) { //leftside is fine so keep that
							for (int i = (rightPiece.results[0].winLStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; }
						}else { //both sides are chimeric, keep longest piece
							int lengthLeftLeft = leftPiece.results[0].winLEnd - leftPiece.results[0].winLStart;
							int lengthLeftRight = leftPiece.results[0].winREnd - leftPiece.results[0].winRStart;
							int longest = 1; // leftleft = 1, leftright = 2, rightleft = 3 rightright = 4
							int length = lengthLeftLeft;
							if (lengthLeftLeft < lengthLeftRight) { longest = 2; length = lengthLeftRight; }
							int lengthRightLeft = rightPiece.results[0].winLEnd - rightPiece.results[0].winLStart;
							int lengthRightRight = rightPiece.results[0].winREnd - rightPiece.results[0].winRStart;
							if (lengthRightLeft > length) { longest = 3; length = lengthRightLeft; }
							if (lengthRightRight > length) { longest = 4; }
							if (longest == 1) { //leftleft
								for (int i = (leftPiece.results[0].winRStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; }
							}else if (longest == 2) { //leftright
								//get rid of leftleft
								for (int i = (leftPiece.results[0].winLStart-1); i < (leftPiece.results[0].winLEnd-1); i++) { newAligned[i] = '.'; }
								//get rid of right
								for (int i = (rightPiece.results[0].winLStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; }
							}else if (longest == 3) { //rightleft
								//get rid of left
								for (int i = 0; i < leftPiece.results[0].winREnd; i++) { newAligned[i] = '.'; }
								//get rid of rightright
								for (int i = (rightPiece.results[0].winRStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; }
							}else { //rightright
								//get rid of left
								for (int i = 0; i < leftPiece.results[0].winREnd; i++) { newAligned[i] = '.'; }
								//get rid of rightleft
								for (int i = (rightPiece.results[0].winLStart-1); i < (rightPiece.results[0].winLEnd-1); i++) { newAligned[i] = '.'; }
							}
						}
						trim.setAligned(newAligned);
					}
				}
			}
			outputString = getBlock(leftPiece, rightPiece, leftChimeric, rightChimeric, chimeraFlag);
			//write to output file
			int length = outputString.length();
			char* buf = new char[length];
			memcpy(buf, outputString.c_str(), length);
			MPI_File_write_shared(out, buf, length, MPI_CHAR, &status);
			delete[] buf; //buffer was allocated with new[]
		}else {
			outputString += querySeq.getName() + "\tno\n";
			//write to output file
			int length = outputString.length();
			char* buf = new char[length];
			memcpy(buf, outputString.c_str(), length);
			MPI_File_write_shared(out, buf, length, MPI_CHAR, &status);
			delete[] buf; //buffer was allocated with new[]
		}
		return trim;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraSlayer", "print");
		exit(1);
	}
}
//***************************************************************************************************************
Sequence ChimeraSlayer::print(MPI_File& out, MPI_File& outAcc) {
	try {
		MPI_Status status;
		bool results = false;
		string outAccString = "";
		string outputString = "";
		Sequence trim;
		if (trimChimera) {
			trim.setName(trimQuery.getName());
			trim.setAligned(trimQuery.getAligned());
		}
		if (chimeraFlags == "yes") {
			string chimeraFlag = "no";
			if( (chimeraResults[0].bsa >= minBS && chimeraResults[0].divr_qla_qrb >= divR) || (chimeraResults[0].bsb >= minBS && chimeraResults[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; }
			if (chimeraFlag == "yes") {
				if ((chimeraResults[0].bsa >= minBS) || (chimeraResults[0].bsb >= minBS)) {
					cout << querySeq.getName() << "\tyes" << endl;
					outAccString += querySeq.getName() + "\n";
					results = true;
					if (templateFileName == "self") { chimericSeqs.insert(querySeq.getName()); }
					//write to accnos file
					int length = outAccString.length();
					char* buf2 = new char[length];
					memcpy(buf2, outAccString.c_str(), length);
					MPI_File_write_shared(outAcc, buf2, length, MPI_CHAR, &status);
					delete[] buf2; //buffer was allocated with new[]
					if (trimChimera) {
						int lengthLeft = chimeraResults[0].winLEnd - chimeraResults[0].winLStart;
						int lengthRight = chimeraResults[0].winREnd - chimeraResults[0].winRStart;
						string newAligned = trim.getAligned();
						if (lengthLeft > lengthRight) { //trim right
							for (int i = (chimeraResults[0].winRStart-1); i < newAligned.length(); i++) { newAligned[i] = '.'; }
						}else { //trim left
							for (int i = 0; i < (chimeraResults[0].winLEnd-1); i++) { newAligned[i] = '.'; }
						}
						trim.setAligned(newAligned);
					}
				}
			}
			outputString = getBlock(chimeraResults[0], chimeraFlag);
			//write to output file
			int length = outputString.length();
			char* buf = new char[length];
			memcpy(buf, outputString.c_str(), length);
			MPI_File_write_shared(out, buf, length, MPI_CHAR, &status);
			delete[] buf; //buffer was allocated with new[]
		}else {
			outputString += querySeq.getName() + "\tno\n";
			//write to output file
			int length = outputString.length();
			char* buf = new char[length];
			memcpy(buf, outputString.c_str(), length);
			MPI_File_write_shared(out, buf, length, MPI_CHAR, &status);
			delete[] buf; //buffer was allocated with new[]
		}
		return trim;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraSlayer", "print");
		exit(1);
	}
}
#endif
//***************************************************************************************************************
int ChimeraSlayer::getChimeras(Sequence* query) {
	try {
		trimQuery.setName(query->getName());
		trimQuery.setAligned(query->getAligned());
		printResults.trimQuery = trimQuery;
		chimeraFlags = "no";
		printResults.flag = "no";
		querySeq = *query;
		//you must create a template
		vector<Sequence*> thisTemplate;
		vector<Sequence*> thisFilteredTemplate;
		if (templateFileName != "self") { thisTemplate = templateSeqs; thisFilteredTemplate = filteredTemplateSeqs; }
		else { thisTemplate = getTemplate(*query, thisFilteredTemplate); } //fills this template and creates the databases
		if (m->control_pressed) { return 0; }
		if (thisTemplate.size() == 0) { return 0; } //not chimeric
		//moved this out of maligner - 4/29/11
		vector<Sequence> refSeqs = getRefSeqs(*query, thisTemplate, thisFilteredTemplate);
		Maligner maligner(refSeqs, match, misMatch, divR, minSim, minCov);
		Slayer slayer(window, increment, minSim, divR, iters, minSNP, minBS);
		if (templateFileName == "self") {
			if (searchMethod == "kmer") { delete databaseRight; delete databaseLeft; }
			else if (searchMethod == "blast") { delete databaseLeft; }
		}
		if (m->control_pressed) { return 0; }
		string chimeraFlag = maligner.getResults(*query, decalc);
		if (m->control_pressed) { return 0; }
		vector<results> Results = maligner.getOutput();
		//for (int i = 0; i < refSeqs.size(); i++) { delete refSeqs[i]; }
		if (chimeraFlag == "yes") {
			if (realign) {
				vector<string> parents;
				for (int i = 0; i < Results.size(); i++) {
					parents.push_back(Results[i].parentAligned);
				}
				ChimeraReAligner realigner;
				realigner.reAlign(query, parents);
			}
			// cout << query->getAligned() << endl;
			//get sequences that were given from maligner results
			vector<SeqCompare> seqs;
			map<string, float> removeDups;
			map<string, float>::iterator itDup;
			map<string, string> parentNameSeq;
			map<string, string>::iterator itSeq;
			for (int j = 0; j < Results.size(); j++) {
				float dist = (Results[j].regionEnd - Results[j].regionStart + 1) * Results[j].queryToParentLocal;
				//only add if you are not a duplicate
				// cout << Results[j].parent << '\t' << Results[j].regionEnd << '\t' << Results[j].regionStart << '\t' << Results[j].regionEnd - Results[j].regionStart +1 << '\t' << Results[j].queryToParentLocal << '\t' << dist << endl;
				if(Results[j].queryToParentLocal >= 90){ //local match has to be over 90% similarity
					itDup = removeDups.find(Results[j].parent);
					if (itDup == removeDups.end()) { //this is not duplicate
						removeDups[Results[j].parent] = dist;
						parentNameSeq[Results[j].parent] = Results[j].parentAligned;
					}else if (dist > itDup->second) { //is this a stronger number for this parent
						removeDups[Results[j].parent] = dist;
						parentNameSeq[Results[j].parent] = Results[j].parentAligned;
					}
				}
			}
			for (itDup = removeDups.begin(); itDup != removeDups.end(); itDup++) {
				itSeq = parentNameSeq.find(itDup->first);
				Sequence seq(itDup->first, itSeq->second);
				SeqCompare member;
				member.seq = seq;
				member.dist = itDup->second;
				seqs.push_back(member);
			}
			//limit number of parents to explore - default 3
			if (Results.size() > parents) {
				//sort by distance
				sort(seqs.begin(), seqs.end(), compareSeqCompare);
				//prioritize larger more similar sequence fragments
				reverse(seqs.begin(), seqs.end());
				//for (int k = seqs.size()-1; k > (parents-1); k--) {
				//	delete seqs[k].seq;
				//seqs.pop_back();
				//}
			}
			//put seqs into vector to send to slayer
			// cout << query->getAligned() << endl;
			vector<Sequence> seqsForSlayer;
			for (int k = 0; k < seqs.size(); k++) {
				// cout << seqs[k].seq->getAligned() << endl;
				seqsForSlayer.push_back(seqs[k].seq);
				// cout << seqs[k].seq->getName() << endl;
			}
			if (m->control_pressed) { return 0; }
			//send to slayer
			chimeraFlags = slayer.getResults(*query, seqsForSlayer);
			if (m->control_pressed) { return 0; }
chimeraResults = slayer.getOutput(); printResults.flag = chimeraFlags; printResults.results = chimeraResults; //free memory //for (int k = 0; k < seqs.size(); k++) { delete seqs[k].seq; } } //cout << endl << endl; return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getChimeras"); exit(1); } } //*************************************************************************************************************** void ChimeraSlayer::printBlock(data_struct data, string flag, ostream& out){ try { out << querySeq.getName(); out << '\t' << data.parentA.getName() << "\t" << data.parentB.getName(); out << '\t' << data.divr_qla_qrb << '\t' << data.qla_qrb << '\t' << data.bsa; out << '\t' << data.divr_qlb_qra << '\t' << data.qlb_qra << '\t' << data.bsb ; out << '\t' << flag << '\t' << data.winLStart << "-" << data.winLEnd << '\t' << data.winRStart << "-" << data.winREnd << '\n'; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "printBlock"); exit(1); } } //*************************************************************************************************************** void ChimeraSlayer::printBlock(data_results leftdata, data_results rightdata, bool leftChimeric, bool rightChimeric, string flag, ostream& out){ try { if ((leftChimeric) && (!rightChimeric)) { //print left out << querySeq.getName(); out << '\t' << leftdata.results[0].parentA.getName() << "\t" << leftdata.results[0].parentB.getName(); out << '\t' << leftdata.results[0].divr_qla_qrb << '\t' << leftdata.results[0].qla_qrb << '\t' << leftdata.results[0].bsa; out << '\t' << leftdata.results[0].divr_qlb_qra << '\t' << leftdata.results[0].qlb_qra << '\t' << leftdata.results[0].bsb; out << '\t' << flag << '\t' << leftdata.results[0].winLStart << "-" << leftdata.results[0].winLEnd << '\t' << leftdata.results[0].winRStart << "-" << leftdata.results[0].winREnd << endl; }else if ((!leftChimeric) && (rightChimeric)) { //print right out << querySeq.getName(); out << '\t' << rightdata.results[0].parentA.getName() 
<< "\t" << rightdata.results[0].parentB.getName(); out << '\t' << rightdata.results[0].divr_qla_qrb << '\t' << rightdata.results[0].qla_qrb << '\t' << rightdata.results[0].bsa; out << '\t' << rightdata.results[0].divr_qlb_qra << '\t' << rightdata.results[0].qlb_qra << '\t' << rightdata.results[0].bsb; out << '\t' << flag << '\t' << rightdata.results[0].winLStart << "-" << rightdata.results[0].winLEnd << '\t' << rightdata.results[0].winRStart << "-" << rightdata.results[0].winREnd << endl; }else { //print both results if (leftdata.flag == "yes") { out << querySeq.getName() + "_LEFT"; out << '\t' << leftdata.results[0].parentA.getName() << "\t" << leftdata.results[0].parentB.getName(); out << '\t' << leftdata.results[0].divr_qla_qrb << '\t' << leftdata.results[0].qla_qrb << '\t' << leftdata.results[0].bsa; out << '\t' << leftdata.results[0].divr_qlb_qra << '\t' << leftdata.results[0].qlb_qra << '\t' << leftdata.results[0].bsb; out << '\t' << flag << '\t' << leftdata.results[0].winLStart << "-" << leftdata.results[0].winLEnd << '\t' << leftdata.results[0].winRStart << "-" << leftdata.results[0].winREnd << endl; } if (rightdata.flag == "yes") { out << querySeq.getName() + "_RIGHT"; out << '\t' << rightdata.results[0].parentA.getName() << "\t" << rightdata.results[0].parentB.getName(); out << '\t' << rightdata.results[0].divr_qla_qrb << '\t' << rightdata.results[0].qla_qrb << '\t' << rightdata.results[0].bsa; out << '\t' << rightdata.results[0].divr_qlb_qra << '\t' << rightdata.results[0].qlb_qra << '\t' << rightdata.results[0].bsb; out << '\t' << flag << '\t' << rightdata.results[0].winLStart << "-" << rightdata.results[0].winLEnd << '\t' << rightdata.results[0].winRStart << "-" << rightdata.results[0].winREnd << '\n'; } } } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "printBlock"); exit(1); } } //*************************************************************************************************************** string ChimeraSlayer::getBlock(data_results 
leftdata, data_results rightdata, bool leftChimeric, bool rightChimeric, string flag){ try { string out = ""; if ((leftChimeric) && (!rightChimeric)) { //get left out += querySeq.getName(); out += "\t" + leftdata.results[0].parentA.getName() + "\t" + leftdata.results[0].parentB.getName(); out += "\t" + toString(leftdata.results[0].divr_qla_qrb) + "\t" + toString(leftdata.results[0].qla_qrb) + "\t" + toString(leftdata.results[0].bsa); out += "\t" + toString(leftdata.results[0].divr_qlb_qra) + "\t" + toString(leftdata.results[0].qlb_qra) + "\t" + toString(leftdata.results[0].bsb); out += "\t" + flag + "\t" + toString(leftdata.results[0].winLStart) + "-" + toString(leftdata.results[0].winLEnd) + "\t" + toString(leftdata.results[0].winRStart) + "-" + toString(leftdata.results[0].winREnd) + "\n"; }else if ((!leftChimeric) && (rightChimeric)) { //print right out += querySeq.getName(); out += "\t" + rightdata.results[0].parentA.getName() + "\t" + rightdata.results[0].parentB.getName(); out += "\t" + toString(rightdata.results[0].divr_qla_qrb) + "\t" + toString(rightdata.results[0].qla_qrb) + "\t" + toString(rightdata.results[0].bsa); out += "\t" + toString(rightdata.results[0].divr_qlb_qra) + "\t" + toString(rightdata.results[0].qlb_qra) + "\t" + toString(rightdata.results[0].bsb); out += "\t" + flag + "\t" + toString(rightdata.results[0].winLStart) + "-" + toString(rightdata.results[0].winLEnd) + "\t" + toString(rightdata.results[0].winRStart) + "-" + toString(rightdata.results[0].winREnd) + "\n"; }else { //print both results if (leftdata.flag == "yes") { out += querySeq.getName() + "_LEFT"; out += "\t" + leftdata.results[0].parentA.getName() + "\t" + leftdata.results[0].parentB.getName(); out += "\t" + toString(leftdata.results[0].divr_qla_qrb) + "\t" + toString(leftdata.results[0].qla_qrb) + "\t" + toString(leftdata.results[0].bsa); out += "\t" + toString(leftdata.results[0].divr_qlb_qra) + "\t" + toString(leftdata.results[0].qlb_qra) + "\t" + 
toString(leftdata.results[0].bsb); out += "\t" + flag + "\t" + toString(leftdata.results[0].winLStart) + "-" + toString(leftdata.results[0].winLEnd) + "\t" + toString(leftdata.results[0].winRStart) + "-" + toString(leftdata.results[0].winREnd) + "\n"; } if (rightdata.flag == "yes") { out += querySeq.getName() + "_RIGHT"; out += "\t" + rightdata.results[0].parentA.getName() + "\t" + rightdata.results[0].parentB.getName(); out += "\t" + toString(rightdata.results[0].divr_qla_qrb) + "\t" + toString(rightdata.results[0].qla_qrb) + "\t" + toString(rightdata.results[0].bsa); out += "\t" + toString(rightdata.results[0].divr_qlb_qra) + "\t" + toString(rightdata.results[0].qlb_qra) + "\t" + toString(rightdata.results[0].bsb); out += "\t" + flag + "\t" + toString(rightdata.results[0].winLStart) + "-" + toString(rightdata.results[0].winLEnd) + "\t" + toString(rightdata.results[0].winRStart) + "-" + toString(rightdata.results[0].winREnd) + "\n"; } } return out; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getBlock"); exit(1); } } //*************************************************************************************************************** string ChimeraSlayer::getBlock(data_struct data, string flag){ try { string outputString = ""; outputString += querySeq.getName(); outputString += "\t" + data.parentA.getName() + "\t" + data.parentB.getName(); outputString += "\t" + toString(data.divr_qla_qrb) + "\t" + toString(data.qla_qrb) + "\t" + toString(data.bsa); outputString += "\t" + toString(data.divr_qlb_qra) + "\t" + toString(data.qlb_qra) + "\t" + toString(data.bsb); outputString += "\t" + flag + "\t" + toString(data.winLStart) + "-" + toString(data.winLEnd) + "\t" + toString(data.winRStart) + "-" + toString(data.winREnd) + "\n"; return outputString; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getBlock"); exit(1); } } //*************************************************************************************************************** vector 
ChimeraSlayer::getRefSeqs(Sequence q, vector& thisTemplate, vector& thisFilteredTemplate){ try { vector refSeqs; if (searchMethod == "distance") { //find closest seqs to query in template - returns copies of seqs so trim does not destroy - remember to deallocate Sequence* newSeq = new Sequence(q.getName(), q.getAligned()); runFilter(newSeq); refSeqs = decalc.findClosest(*newSeq, thisTemplate, thisFilteredTemplate, numWanted, minSim); delete newSeq; }else if (searchMethod == "blast") { refSeqs = getBlastSeqs(q, thisTemplate, numWanted); //fills indexes }else if (searchMethod == "kmer") { refSeqs = getKmerSeqs(q, thisTemplate, numWanted); //fills indexes }else { m->mothurOut("not valid search."); exit(1); } //should never get here return refSeqs; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getRefSeqs"); exit(1); } } //***************************************************************************************************************/ vector ChimeraSlayer::getBlastSeqs(Sequence q, vector& db, int num) { try { vector refResults; //get parts of query string queryUnAligned = q.getUnaligned(); string leftQuery = queryUnAligned.substr(0, int(queryUnAligned.length() * 0.33)); //first 1/3 of the sequence string rightQuery = queryUnAligned.substr(int(queryUnAligned.length() * 0.66)); //last 1/3 of the sequence //cout << "whole length = " << queryUnAligned.length() << '\t' << "left length = " << leftQuery.length() << '\t' << "right length = "<< rightQuery.length() << endl; Sequence* queryLeft = new Sequence(q.getName(), leftQuery); Sequence* queryRight = new Sequence(q.getName(), rightQuery); vector tempIndexesLeft = databaseLeft->findClosestMegaBlast(queryLeft, num+1, minSim); vector tempIndexesRight = databaseLeft->findClosestMegaBlast(queryRight, num+1, minSim); //cout << q->getName() << '\t' << leftQuery << '\t' << "leftMatches = " << tempIndexesLeft.size() << '\t' << rightQuery << " rightMatches = " << tempIndexesRight.size() << endl; // vector smaller; // vector 
larger; // // if (tempIndexesRight.size() < tempIndexesLeft.size()) { smaller = tempIndexesRight; larger = tempIndexesLeft; } // else { smaller = tempIndexesLeft; larger = tempIndexesRight; } //merge results map seen; map::iterator it; vector mergedResults; int index = 0; // for (int i = 0; i < smaller.size(); i++) { while(index < tempIndexesLeft.size() && index < tempIndexesRight.size()){ if (m->control_pressed) { delete queryRight; delete queryLeft; return refResults; } //add left if you havent already it = seen.find(tempIndexesLeft[index]); if (it == seen.end()) { mergedResults.push_back(tempIndexesLeft[index]); seen[tempIndexesLeft[index]] = tempIndexesLeft[index]; } //add right if you havent already it = seen.find(tempIndexesRight[index]); if (it == seen.end()) { mergedResults.push_back(tempIndexesRight[index]); seen[tempIndexesRight[index]] = tempIndexesRight[index]; } index++; } for (int i = index; i < tempIndexesLeft.size(); i++) { if (m->control_pressed) { delete queryRight; delete queryLeft; return refResults; } //add right if you havent already it = seen.find(tempIndexesLeft[i]); if (it == seen.end()) { mergedResults.push_back(tempIndexesLeft[i]); seen[tempIndexesLeft[i]] = tempIndexesLeft[i]; } } for (int i = index; i < tempIndexesRight.size(); i++) { if (m->control_pressed) { delete queryRight; delete queryLeft; return refResults; } //add right if you havent already it = seen.find(tempIndexesRight[i]); if (it == seen.end()) { mergedResults.push_back(tempIndexesRight[i]); seen[tempIndexesRight[i]] = tempIndexesRight[i]; } } //string qname = q->getName().substr(0, q->getName().find_last_of('_')); //cout << qname << endl; if (mergedResults.size() == 0) { numNoParents++; } for (int i = 0; i < mergedResults.size(); i++) { //cout << q->getName() << mergedResults[i] << '\t' << db[mergedResults[i]]->getName() << endl; if (db[mergedResults[i]]->getName() != q.getName()) { Sequence temp(db[mergedResults[i]]->getName(), db[mergedResults[i]]->getAligned()); 
refResults.push_back(temp); } } //cout << endl << endl; delete queryRight; delete queryLeft; return refResults; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getBlastSeqs"); exit(1); } } //*************************************************************************************************************** vector ChimeraSlayer::getKmerSeqs(Sequence q, vector& db, int num) { try { vector refResults; //get parts of query string queryUnAligned = q.getUnaligned(); string leftQuery = queryUnAligned.substr(0, int(queryUnAligned.length() * 0.33)); //first 1/3 of the sequence string rightQuery = queryUnAligned.substr(int(queryUnAligned.length() * 0.66)); //last 1/3 of the sequence Sequence* queryLeft = new Sequence(q.getName(), leftQuery); Sequence* queryRight = new Sequence(q.getName(), rightQuery); vector tempIndexesLeft = databaseLeft->findClosestSequences(queryLeft, num); vector tempIndexesRight = databaseRight->findClosestSequences(queryRight, num); //merge results map seen; map::iterator it; vector mergedResults; int index = 0; // for (int i = 0; i < smaller.size(); i++) { while(index < tempIndexesLeft.size() && index < tempIndexesRight.size()){ if (m->control_pressed) { delete queryRight; delete queryLeft; return refResults; } //add left if you havent already it = seen.find(tempIndexesLeft[index]); if (it == seen.end()) { mergedResults.push_back(tempIndexesLeft[index]); seen[tempIndexesLeft[index]] = tempIndexesLeft[index]; } //add right if you havent already it = seen.find(tempIndexesRight[index]); if (it == seen.end()) { mergedResults.push_back(tempIndexesRight[index]); seen[tempIndexesRight[index]] = tempIndexesRight[index]; } index++; } for (int i = index; i < tempIndexesLeft.size(); i++) { if (m->control_pressed) { delete queryRight; delete queryLeft; return refResults; } //add right if you havent already it = seen.find(tempIndexesLeft[i]); if (it == seen.end()) { mergedResults.push_back(tempIndexesLeft[i]); seen[tempIndexesLeft[i]] = tempIndexesLeft[i]; } 
} for (int i = index; i < tempIndexesRight.size(); i++) { if (m->control_pressed) { delete queryRight; delete queryLeft; return refResults; } //add right if you havent already it = seen.find(tempIndexesRight[i]); if (it == seen.end()) { mergedResults.push_back(tempIndexesRight[i]); seen[tempIndexesRight[i]] = tempIndexesRight[i]; } } for (int i = 0; i < mergedResults.size(); i++) { //cout << mergedResults[i] << '\t' << db[mergedResults[i]]->getName() << endl; if (db[mergedResults[i]]->getName() != q.getName()) { Sequence temp(db[mergedResults[i]]->getName(), db[mergedResults[i]]->getAligned()); refResults.push_back(temp); } } //cout << endl; delete queryRight; delete queryLeft; return refResults; } catch(exception& e) { m->errorOut(e, "ChimeraSlayer", "getKmerSeqs"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/chimera/chimeraslayer.h000066400000000000000000000052131255543666200210500ustar00rootroot00000000000000#ifndef CHIMERASLAYER_H #define CHIMERASLAYER_H /* * chimeraslayer.h * Mothur * * Created by westcott on 9/25/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "chimera.h" #include "maligner.h" #include "slayer.h" //***********************************************************************/ //This class was modeled after the chimeraSlayer written by the Broad Institute /***********************************************************************/ class ChimeraSlayer : public Chimera { public: ChimeraSlayer(string, string, bool, string, int, int, int, int, float, int, int, int, int, int, int, int, int, bool, string, int); ChimeraSlayer(string, string, bool, map&, string, int, int, int, int, float, int, int, int, int, int, int, int, int, bool, string, int); ChimeraSlayer(string, string, bool, map&, string, int, int, int, int, float, int, int, int, int, int, int, int, int, bool, string, int, bool); ~ChimeraSlayer(); int getChimeras(Sequence*); Sequence print(ostream&, ostream&); Sequence print(ostream&, ostream&, data_results, data_results); void printHeader(ostream&); int doPrep(); int getNumNoParents() { return numNoParents; } data_results getResults() { return printResults; } #ifdef USE_MPI Sequence print(MPI_File&, MPI_File&); Sequence print(MPI_File&, MPI_File&, data_results, data_results, bool&); #endif private: Sequence querySeq; Sequence trimQuery; DeCalculator decalc; Database* databaseRight; Database* databaseLeft; map priority; //for template=self, seqname, seqAligned, abundance set chimericSeqs; //for template=self, so we don't add chimeric sequences to the userTemplate set int numNoParents, threadID; vector chimeraResults; data_results printResults; string chimeraFlags, searchMethod, fastafile, blastlocation; bool realign, trimChimera; int window, numWanted, kmerSize, match, misMatch, minSim, minCov, minBS, minSNP, parents, iters, increment; float divR; void printBlock(data_struct, string, ostream&); void printBlock(data_results, data_results, bool, bool, string, ostream&); string getBlock(data_struct, string); string getBlock(data_results, data_results, bool, bool, string); //int readNameFile(string); 
vector getTemplate(Sequence, vector&); vector getRefSeqs(Sequence, vector&, vector&); vector getBlastSeqs(Sequence, vector&, int); vector getKmerSeqs(Sequence, vector&, int); }; /************************************************************************/ #endif mothur-1.36.1/source/chimera/decalc.cpp000066400000000000000000000717241255543666200200000ustar00rootroot00000000000000/* * decalc.cpp * Mothur * * Created by Sarah Westcott on 7/22/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "decalc.h" #include "chimera.h" #include "dist.h" #include "eachgapdist.h" #include "ignoregaps.h" #include "eachgapdist.h" //*************************************************************************************************************** void DeCalculator::setMask(string ms) { try { seqMask = ms; int count = 0; maskMap.clear(); if (seqMask.length() != 0) { //whereever there is a base in the mask, save that value is query and subject for (int i = 0; i < seqMask.length(); i++) { if (isalpha(seqMask[i])) { h.insert(i); maskMap[count] = i; count++; } } }else { for (int i = 0; i < alignLength; i++) { h.insert(i); maskMap[count] = i; count++; } } } catch(exception& e) { m->errorOut(e, "DeCalculator", "setMask"); exit(1); } } //*************************************************************************************************************** void DeCalculator::runMask(Sequence* seq) { try{ string q = seq->getAligned(); string tempQuery = ""; //whereever there is a base in the mask, save that value is query and subject set::iterator setit; for ( setit=h.begin() ; setit != h.end(); setit++ ) { tempQuery += q[*setit]; } //save masked values seq->setAligned(tempQuery); seq->setUnaligned(tempQuery); } catch(exception& e) { m->errorOut(e, "DeCalculator", "runMask"); exit(1); } } //*************************************************************************************************************** //num is query's spot in querySeqs void DeCalculator::trimSeqs(Sequence* 
query, Sequence* subject, map& trim) { try { string q = query->getAligned(); string s = subject->getAligned(); int front = 0; for (int i = 0; i < q.length(); i++) { //cout << "query = " << q[i] << " subject = " << s[i] << endl; if (isalpha(q[i]) && isalpha(s[i])) { front = i; break; } } //cout << endl << endl; int back = 0; /* start at the last valid index, not one past the end */ for (int i = q.length()-1; i >= 0; i--) { //cout << "query = " << q[i] << " subject = " << s[i] << endl; if (isalpha(q[i]) && isalpha(s[i])) { back = i; break; } } trim[front] = back; } catch(exception& e) { m->errorOut(e, "DeCalculator", "trimSeqs"); exit(1); } } //*************************************************************************************************************** vector DeCalculator::findWindows(Sequence* query, int front, int back, int& size, int increment) { try { vector win; int cutoff = back - front; //back - front //if window is set to default if (size == 0) { if (cutoff > 1200) { size = 300; } else{ size = (cutoff / 4); } } else if (size > (cutoff / 4)) { m->mothurOut("You have selected too large a window size for sequence " + query->getName() + ". I will choose an appropriate window size."); m->mothurOutEndLine(); size = (cutoff / 4); } /* string seq = query->getAligned().substr(front, cutoff); //count bases int numBases = 0; for (int l = 0; l < seq.length(); l++) { if (isalpha(seq[l])) { numBases++; } } //cout << "num Bases = " << numBases << endl; //save start of seq win.push_back(front); //cout << front << '\t'; //move ahead increment bases at a time until all bases are in a window int countBases = 0; int totalBases = 0; //used to eliminate window of blanks at end of sequence seq = query->getAligned(); for (int m = front; m < (back - size) ; m++) { //count number of bases you see if (isalpha(seq[m])) { countBases++; } //if you have seen enough bases to make a new window if (countBases >= increment) { //total bases is the number of bases in a window already. 
totalBases += countBases; //cout << "total bases = " << totalBases << endl; win.push_back(m); //save spot in alignment //cout << m << '\t'; countBases = 0; //reset bases you've seen in this window } //no need to continue if all your bases are in a window if (totalBases == numBases) { break; } } //get last window if needed if (totalBases < numBases) { win.push_back(back-size); } //cout << endl << endl; */ //this follows wigeon, but we may want to consider that it chops off the end values if the sequence cannot be evenly divided into steps for (int i = front; i < (back - size) ; i+=increment) { win.push_back(i); } return win; } catch(exception& e) { m->errorOut(e, "DeCalculator", "findWindows"); exit(1); } } //*************************************************************************************************************** vector DeCalculator::calcObserved(Sequence* query, Sequence* subject, vector window, int size) { try { vector temp; //int gaps = 0; for (int i = 0; i < window.size(); i++) { string seqFrag = query->getAligned().substr(window[i], size); string seqFragsub = subject->getAligned().substr(window[i], size); int diff = 0; for (int b = 0; b < seqFrag.length(); b++) { //if at least one is a base and they are not equal if( (isalpha(seqFrag[b]) || isalpha(seqFragsub[b])) && (seqFrag[b] != seqFragsub[b]) ) { diff++; } //ignore gaps //if((!isalpha(seqFrag[b])) && (!isalpha(seqFragsub[b]))) { gaps++; } } //percentage of mismatched bases float dist; //if the whole fragment is 0 distance = 0 //if ((seqFrag.length()-gaps) == 0) { dist = 0.0; } //percentage of mismatched bases //else { dist = diff / (float) (seqFrag.length()-gaps) * 100; } dist = diff / (float) (seqFrag.length()) * 100; temp.push_back(dist); } return temp; } catch(exception& e) { m->errorOut(e, "DeCalculator", "calcObserved"); exit(1); } } //*************************************************************************************************************** float DeCalculator::calcDist(Sequence* query, 
Sequence* subject, int front, int back) { try { //so you only look at the trimmed part of the sequence int cutoff = back - front; int gaps = 0; //from first startpoint with length back-front string seqFrag = query->getAligned().substr(front, cutoff); string seqFragsub = subject->getAligned().substr(front, cutoff); int diff = 0; for (int b = 0; b < seqFrag.length(); b++) { //ignore gaps if((!isalpha(seqFrag[b])) && (!isalpha(seqFragsub[b]))) { gaps++; } if (seqFrag[b] != seqFragsub[b]) { diff++; } } //if the whole fragment is 0 distance = 0 if ((seqFrag.length()-gaps) == 0) { return 0.0; } //percentage of mismatched bases float dist = diff / (float) (seqFrag.length()-gaps) * 100; return dist; } catch(exception& e) { m->errorOut(e, "DeCalculator", "calcDist"); exit(1); } } //*************************************************************************************************************** vector DeCalculator::calcExpected(vector qav, float coef) { try { //for each window vector queryExpected; for (int j = 0; j < qav.size(); j++) { float expected = qav[j] * coef; queryExpected.push_back(expected); } return queryExpected; } catch(exception& e) { m->errorOut(e, "DeCalculator", "calcExpected"); exit(1); } } //*************************************************************************************************************** float DeCalculator::calcDE(vector obs, vector exp) { try { //for each window float sum = 0.0; //sum = sum from 1 to i of (oi-ei)^2 int numZeros = 0; for (int j = 0; j < obs.size(); j++) { //if (obs[j] != 0.0) { sum += ((obs[j] - exp[j]) * (obs[j] - exp[j])); //}else { numZeros++; } } float de = sqrt((sum / (obs.size() - 1 - numZeros))); return de; } catch(exception& e) { m->errorOut(e, "DeCalculator", "calcDE"); exit(1); } } //*************************************************************************************************************** vector DeCalculator::calcFreq(vector seqs, string filename) { try { vector prob; string freqfile = m->getRootName(filename) + 
"freq"; ofstream outFreq; m->openOutputFile(freqfile, outFreq); outFreq << "#" << m->getVersion() << endl; string length = toString(seqs.size()); //if there are 5000 seqs in the template then set precision to 3 int precision = length.length() - 1; //format output outFreq.setf(ios::fixed, ios::floatfield); outFreq.setf(ios::showpoint); //at each position in the sequence for (int i = 0; i < seqs[0]->getAligned().length(); i++) { vector freq; freq.resize(4,0); int gaps = 0; //find the frequency of each nucleotide for (int j = 0; j < seqs.size(); j++) { char value = seqs[j]->getAligned()[i]; if(toupper(value) == 'A') { freq[0]++; } else if(toupper(value) == 'T' || toupper(value) == 'U') { freq[1]++; } else if(toupper(value) == 'G') { freq[2]++; } else if(toupper(value) == 'C') { freq[3]++; } else { gaps++; } } //find base with highest frequency int highest = 0; for (int j = 0; j < freq.size(); j++) { if (freq[j] > highest) { highest = freq[j]; } } float highFreq = highest / (float) (seqs.size()); float Pi; Pi = (highFreq - 0.25) / 0.75; //cannot have probability less than 0. 
if (Pi < 0) { Pi = 0.0; } //saves this for later outFreq << setprecision(precision) << i << '\t' << highFreq << endl; if (h.count(i) > 0) { prob.push_back(Pi); } } outFreq.close(); return prob; } catch(exception& e) { m->errorOut(e, "DeCalculator", "calcFreq"); exit(1); } } //*************************************************************************************************************** vector DeCalculator::findQav(vector window, int size, vector probabilityProfile) { try { vector averages; //for each window find average for (int i = 0; i < window.size(); i++) { float average = 0.0; //while you are in the window for this sequence int count = 0; for (int j = window[i]; j < (window[i]+size); j++) { average += probabilityProfile[j]; count++; } average = average / count; //save this windows average averages.push_back(average); } return averages; } catch(exception& e) { m->errorOut(e, "DeCalculator", "findQav"); exit(1); } } //*************************************************************************************************************** //seqs have already been masked vector< vector > DeCalculator::getQuantiles(vector seqs, vector windowSizesTemplate, int window, vector probProfile, int increment, int start, int end) { try { vector< vector > quan; //percentage of mismatched pairs 1 to 100 quan.resize(100); //for each sequence for(int i = start; i < end; i++){ m->mothurOut("Processing sequence " + toString(i)); m->mothurOutEndLine(); Sequence* query = new Sequence(seqs[i]->getName(), seqs[i]->getAligned()); //compare to every other sequence in template for(int j = 0; j < i; j++){ Sequence* subject = new Sequence(seqs[j]->getName(), seqs[j]->getAligned()); if (m->control_pressed) { delete query; delete subject; return quan; } map trim; map::iterator it; trimSeqs(query, subject, trim); it = trim.begin(); int front = it->first; int back = it->second; //reset window for each new comparison windowSizesTemplate[i] = window; vector win = findWindows(query, front, back, 
windowSizesTemplate[i], increment); vector obsi = calcObserved(query, subject, win, windowSizesTemplate[i]); vector q = findQav(win, windowSizesTemplate[i], probProfile); float alpha = getCoef(obsi, q); vector exp = calcExpected(q, alpha); float de = calcDE(obsi, exp); float dist = calcDist(query, subject, front, back); //cout << i << '\t' << j << '\t' << dist << '\t' << de << endl; dist = ceil(dist); //quanMember newScore(de, i, j); quan[dist].push_back(de); delete subject; } delete query; } return quan; } catch(exception& e) { m->errorOut(e, "DeCalculator", "getQuantiles"); exit(1); } } //******************************************************************************************************************** //sorts lowest to highest inline bool compareQuanMembers(quanMember left, quanMember right){ return (left.score < right.score); } //*************************************************************************************************************** //this was going to be used by pintail to increase the sensitivity of the chimera detection, but it wasn't quite right. may want to revisit in the future... 
void DeCalculator::removeObviousOutliers(vector< vector >& quantiles, int num) { try { for (int i = 0; i < quantiles.size(); i++) { //find mean of this quantile score sort(quantiles[i].begin(), quantiles[i].end()); vector temp; if (quantiles[i].size() != 0) { float high = quantiles[i][int(quantiles[i].size() * 0.99)]; float low = quantiles[i][int(quantiles[i].size() * 0.01)]; //look at each value in quantiles to see if it is an outlier for (int j = 0; j < quantiles[i].size(); j++) { //is this score between 1 and 99% if ((quantiles[i][j] > low) && (quantiles[i][j] < high)) { temp.push_back(quantiles[i][j]); } } } quantiles[i] = temp; } /* //find contributer with most offending score related to it int largestContrib = findLargestContrib(seen); //while you still have guys to eliminate while (contributions.size() > 0) { m->mothurOut("Removing scores contributed by sequence " + toString(largestContrib) + " in your template file."); m->mothurOutEndLine(); //remove from quantiles all scores that were made using this largestContrib for (int i = 0; i < quantiles.size(); i++) { //cout << "before remove " << quantiles[i].size() << '\t'; removeContrib(largestContrib, quantiles[i]); //cout << "after remove " << quantiles[i].size() << endl; } //cout << count << " largest contrib = " << largestContrib << endl; count++; //remove from contributions all scores that were made using this largestContrib removeContrib(largestContrib, contributions); //"erase" largestContrib seen[largestContrib] = -1; //get next largestContrib largestContrib = findLargestContrib(seen); } ABOVE IS ATTEMPT #1 ************************************************************************************************** BELOW IS ATTEMPT #2 vector marked = returnObviousOutliers(quantiles, num); vector copyMarked = marked; //find the 99% of marked sort(copyMarked.begin(), copyMarked.end()); int high = copyMarked[int(copyMarked.size() * 0.99)]; cout << "high = " << high << endl; for(int i = 0; i < marked.size(); i++) { if 
(marked[i] > high) { m->mothurOut("Removing scores contributed by sequence " + toString(marked[i]) + " in your template file."); m->mothurOutEndLine(); for (int i = 0; i < quantiles.size(); i++) { removeContrib(marked[i], quantiles[i]); } } } //adjust quantiles for (int i = 0; i < quantiles.size(); i++) { vector temp; if (quantiles[i].size() == 0) { //in case this is not a distance found in your template files for (int g = 0; g < 6; g++) { temp.push_back(0.0); } }else{ sort(quantiles[i].begin(), quantiles[i].end(), compareQuanMembers); //save 10% temp.push_back(quantiles[i][int(quantiles[i].size() * 0.10)].score); //save 25% temp.push_back(quantiles[i][int(quantiles[i].size() * 0.25)].score); //save 50% temp.push_back(quantiles[i][int(quantiles[i].size() * 0.5)].score); //save 75% temp.push_back(quantiles[i][int(quantiles[i].size() * 0.75)].score); //save 95% temp.push_back(quantiles[i][int(quantiles[i].size() * 0.95)].score); //save 99% temp.push_back(quantiles[i][int(quantiles[i].size() * 0.99)].score); } quan[i] = temp; } */ } catch(exception& e) { m->errorOut(e, "DeCalculator", "removeObviousOutliers"); exit(1); } } //*************************************************************************************************************** //put quanMember in the vector based on how far they are from the 99% or 1%. Biggest offenders in front. 
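// Illustrative sketch, not part of mothur: removeObviousOutliers above sorts each quantile bin and keeps
// only the scores that lie strictly between the values found at the 1% and 99% positions.
// The helper name sketchTrimOutliers is hypothetical, and it works on one bin of plain floats.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Keep only the scores strictly between the 1% and 99% cut points of one bin.
// Assumption: a bin is a plain vector of float scores, as in the active code path.
static std::vector<float> sketchTrimOutliers(std::vector<float> scores) {
    std::vector<float> kept;
    if (scores.empty()) { return kept; }

    std::sort(scores.begin(), scores.end());

    // Same index arithmetic as removeObviousOutliers: truncate size * fraction to an int.
    float high = scores[int(scores.size() * 0.99)];
    float low  = scores[int(scores.size() * 0.01)];

    for (size_t j = 0; j < scores.size(); j++) {
        if ((scores[j] > low) && (scores[j] < high)) { kept.push_back(scores[j]); }
    }
    return kept;
}
```

// Note that for small bins the comparison is strict on both sides, so the smallest and largest
// scores are always dropped, along with any ties with them.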
/*vector DeCalculator::sortContrib(map quan) { try{ vector newQuan; map::iterator it; while (quan.size() > 0) { map::iterator largest = quan.begin(); //find biggest member for (it = quan.begin(); it != quan.end(); it++) { if (it->second > largest->second) { largest = it; } } cout << largest->second << '\t' << largest->first->score << '\t' << largest->first->member1 << '\t' << largest->first->member2 << endl; //save it newQuan.push_back(*(largest->first)); //erase from quan quan.erase(largest); } return newQuan; } catch(exception& e) { m->errorOut(e, "DeCalculator", "sortContrib"); exit(1); } } *************************************************************************************************************** //used by removeObviousOutliers which was attempt to increase sensitivity of chimera detection...not currently used... int DeCalculator::findLargestContrib(vector seen) { try{ int largest = 0; int largestContribs; for (int i = 0; i < seen.size(); i++) { if (seen[i] > largest) { largestContribs = i; largest = seen[i]; } } return largestContribs; } catch(exception& e) { m->errorOut(e, "DeCalculator", "findLargestContribs"); exit(1); } } *************************************************************************************************************** void DeCalculator::removeContrib(int bad, vector& quan) { try{ vector newQuan; for (int i = 0; i < quan.size(); i++) { if ((quan[i].member1 != bad) && (quan[i].member2 != bad) ) { newQuan.push_back(quan[i]); } } quan = newQuan; } catch(exception& e) { m->errorOut(e, "DeCalculator", "removeContrib"); exit(1); } } */ //*************************************************************************************************************** float DeCalculator::findAverage(vector myVector) { try{ float total = 0.0; for (int i = 0; i < myVector.size(); i++) { total += myVector[i]; } float average = total / (float) myVector.size(); return average; } catch(exception& e) { m->errorOut(e, "DeCalculator", "findAverage"); exit(1); } } 
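// Illustrative sketch, not part of mothur: how findAverage, getCoef, calcExpected and calcDE fit together.
// The coefficient is mean(observed) / mean(Qav), the expected value of each window is Qav * coefficient,
// and DE is sqrt( sum((obs - exp)^2) / (n - 1) ). All sketch* names are hypothetical, and the guard for
// fewer than two windows is an assumption of this sketch that the original calcDE does not make.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Mean of a list of values (mirrors DeCalculator::findAverage).
static float sketchAverage(const std::vector<float>& values) {
    float total = 0.0f;
    for (size_t i = 0; i < values.size(); i++) { total += values[i]; }
    return total / (float) values.size();
}

// Scaling coefficient between observed differences and Qav (mirrors getCoef).
static float sketchCoef(const std::vector<float>& obs, const std::vector<float>& qav) {
    return sketchAverage(obs) / sketchAverage(qav);
}

// Expected percent difference for each window (mirrors calcExpected).
static std::vector<float> sketchExpected(const std::vector<float>& qav, float coef) {
    std::vector<float> expected;
    for (size_t j = 0; j < qav.size(); j++) { expected.push_back(qav[j] * coef); }
    return expected;
}

// Deviation from expectation over all windows (mirrors calcDE with numZeros == 0).
static float sketchDE(const std::vector<float>& obs, const std::vector<float>& expected) {
    if (obs.size() < 2) { return 0.0f; } // assumption: avoid dividing by zero
    float sum = 0.0f;
    for (size_t j = 0; j < obs.size(); j++) {
        float d = obs[j] - expected[j];
        sum += d * d;
    }
    return std::sqrt(sum / (float) (obs.size() - 1));
}
```

// With observed values {2, 4, 6} and a flat Qav of {1, 1, 1}, the coefficient is 4, every expected
// value is 4, the squared deviations sum to 8, and DE = sqrt(8 / 2) = 2.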
//*************************************************************************************************************** float DeCalculator::getCoef(vector obs, vector qav) { try { //find average prob for this seqs windows float probAverage = findAverage(qav); //find observed average float obsAverage = findAverage(obs); float coef = obsAverage / probAverage; return coef; } catch(exception& e) { m->errorOut(e, "DeCalculator", "getCoef"); exit(1); } } //*************************************************************************************************************** //gets closest matches to each end, since chimeras will most likely have different parents on each end vector DeCalculator::findClosest(Sequence querySeq, vector& thisTemplate, vector& thisFilteredTemplate, int numWanted, int minSim) { try { //indexes.clear(); vector seqsMatches; vector distsLeft; vector distsRight; Dist* distcalculator = new eachGapDist(); string queryUnAligned = querySeq.getUnaligned(); int numBases = int(queryUnAligned.length() * 0.33); string leftQuery = ""; //first 1/3 of the sequence string rightQuery = ""; //last 1/3 of the sequence string queryAligned = querySeq.getAligned(); //left side bool foundFirstBase = false; int baseCount = 0; int leftSpot = 0; int firstBaseSpot = 0; for (int i = 0; i < queryAligned.length(); i++) { //if you are a base if (isalpha(queryAligned[i])) { baseCount++; if (!foundFirstBase) { foundFirstBase = true; firstBaseSpot = i; } } //eliminate opening .'s if (foundFirstBase) { leftQuery += queryAligned[i]; } //if you have 1/3 if (baseCount >= numBases) { leftSpot = i; break; } //first 1/3 } //right side - count through another 1/3, so you are at last third baseCount = 0; int rightSpot = 0; for (int i = leftSpot; i < queryAligned.length(); i++) { //if you are a base if (isalpha(queryAligned[i])) { baseCount++; } //if you have 1/3 if (baseCount > numBases + 1) { rightSpot = i; break; } //last 1/3 } //trim end //find last position in query that is a non gap character 
int lastBaseSpot = queryAligned.length()-1; for (int j = queryAligned.length()-1; j >= 0; j--) { if (isalpha(queryAligned[j])) { lastBaseSpot = j; break; } } rightQuery = queryAligned.substr(rightSpot, (lastBaseSpot-rightSpot+1)); //sequence from pos spot to end Sequence queryLeft(querySeq.getName(), leftQuery); Sequence queryRight(querySeq.getName(), rightQuery); //cout << querySeq->getName() << '\t' << leftSpot << '\t' << rightSpot << '\t' << firstBaseSpot << '\t' << lastBaseSpot << endl; //cout << queryUnAligned.length() << '\t' << queryLeft.getUnaligned().length() << '\t' << queryRight.getUnaligned().length() << endl; for(int j = 0; j < thisFilteredTemplate.size(); j++){ string dbAligned = thisFilteredTemplate[j]->getAligned(); string leftDB = dbAligned.substr(firstBaseSpot, (leftSpot-firstBaseSpot+1)); //first 1/3 of the sequence string rightDB = dbAligned.substr(rightSpot, (lastBaseSpot-rightSpot+1)); //last 1/3 of the sequence Sequence dbLeft(thisFilteredTemplate[j]->getName(), leftDB); Sequence dbRight(thisFilteredTemplate[j]->getName(), rightDB); distcalculator->calcDist(queryLeft, dbLeft); float distLeft = distcalculator->getDist(); distcalculator->calcDist(queryRight, dbRight); float distRight = distcalculator->getDist(); SeqDist subjectLeft; subjectLeft.seq = NULL; subjectLeft.dist = distLeft; subjectLeft.index = j; distsLeft.push_back(subjectLeft); SeqDist subjectRight; subjectRight.seq = NULL; subjectRight.dist = distRight; subjectRight.index = j; distsRight.push_back(subjectRight); } delete distcalculator; //sort by smallest distance sort(distsRight.begin(), distsRight.end(), compareSeqDist); sort(distsLeft.begin(), distsLeft.end(), compareSeqDist); //merge results map seen; map::iterator it; vector dists; float lastRight = distsRight[0].dist; float lastLeft = distsLeft[0].dist; float maxDist = 1.0 - (minSim / 100.0); for (int i = 0; i < numWanted+1; i++) { if (m->control_pressed) { return seqsMatches; } //add left if you havent already it = 
seen.find(thisTemplate[distsLeft[i].index]->getName()); if (it == seen.end() && distsLeft[i].dist <= maxDist) { dists.push_back(distsLeft[i]); seen[thisTemplate[distsLeft[i].index]->getName()] = thisTemplate[distsLeft[i].index]->getName(); lastLeft = distsLeft[i].dist; // cout << "loop-left\t" << db[distsLeft[i].index]->getName() << '\t' << distsLeft[i].dist << endl; } //add right if you havent already it = seen.find(thisTemplate[distsRight[i].index]->getName()); if (it == seen.end() && distsRight[i].dist <= maxDist) { dists.push_back(distsRight[i]); seen[thisTemplate[distsRight[i].index]->getName()] = thisTemplate[distsRight[i].index]->getName(); lastRight = distsRight[i].dist; // cout << "loop-right\t" << db[distsRight[i].index]->getName() << '\t' << distsRight[i].dist << endl; } if (i == numWanted) { break; } } //are we still above the minimum similarity cutoff if ((lastLeft >= minSim) || (lastRight >= minSim)) { //add in ties from left int i = numWanted; while (i < distsLeft.size()) { if (distsLeft[i].dist == lastLeft) { dists.push_back(distsLeft[i]); } else { break; } i++; } //add in ties from right i = numWanted; while (i < distsRight.size()) { if (distsRight[i].dist == lastRight) { dists.push_back(distsRight[i]); } else { break; } i++; } } //cout << numWanted << endl; for (int i = 0; i < dists.size(); i++) { // cout << db[dists[i].index]->getName() << '\t' << dists[i].dist << endl; if ((thisTemplate[dists[i].index]->getName() != querySeq.getName()) && (((1.0-dists[i].dist)*100) >= minSim)) { Sequence temp(thisTemplate[dists[i].index]->getName(), thisTemplate[dists[i].index]->getAligned()); //have to make a copy so you can trim and filter without stepping on eachother. 
//cout << querySeq->getName() << '\t' << thisTemplate[dists[i].index]->getName() << '\t' << dists[i].dist << endl; seqsMatches.push_back(temp); } } return seqsMatches; } catch(exception& e) { m->errorOut(e, "DeCalculator", "findClosest"); exit(1); } } //*************************************************************************************************************** Sequence* DeCalculator::findClosest(Sequence* querySeq, vector db) { try { Sequence* seqsMatch; Dist* distcalculator = new eachGapDist(); int index = 0; float smallest = 1000000; /* float, so that storing a fractional distance does not truncate it to 0 */ for(int j = 0; j < db.size(); j++){ distcalculator->calcDist(*querySeq, *db[j]); float dist = distcalculator->getDist(); if (dist < smallest) { smallest = dist; index = j; } } delete distcalculator; seqsMatch = new Sequence(db[index]->getName(), db[index]->getAligned()); //have to make a copy so you can trim and filter without stepping on each other. return seqsMatch; } catch(exception& e) { m->errorOut(e, "DeCalculator", "findClosest"); exit(1); } } /***************************************************************************************************************/ map DeCalculator::trimSeqs(Sequence& query, vector& topMatches) { try { int frontPos = 0; //should contain first position in all seqs that is not a gap character int rearPos = query.getAligned().length(); //********find first position in topMatches that is a non gap character***********// //find first position in all seqs that is a non gap character for (int i = 0; i < topMatches.size(); i++) { string aligned = topMatches[i].getAligned(); int pos = 0; //find first spot in this seq for (int j = 0; j < aligned.length(); j++) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos > frontPos) { frontPos = pos; } } string aligned = query.getAligned(); int pos = 0; //find first position in query that is a non gap character for (int j = 0; j < aligned.length(); j++) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is 
the farthest if (pos > frontPos) { frontPos = pos; } //********find last position in topMatches that is a non gap character***********// for (int i = 0; i < topMatches.size(); i++) { string aligned = topMatches[i].getAligned(); int pos = aligned.length(); //find first spot in this seq for (int j = aligned.length()-1; j >= 0; j--) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos < rearPos) { rearPos = pos; } } aligned = query.getAligned(); pos = aligned.length(); //find last position in query that is a non gap character for (int j = aligned.length()-1; j >= 0; j--) { if (isalpha(aligned[j])) { pos = j; break; } } //save this spot if it is the farthest if (pos < rearPos) { rearPos = pos; } map trimmedPos; //check to make sure that is not whole seq if ((rearPos - frontPos - 1) <= 0) { query.setAligned(""); //trim topMatches for (int i = 0; i < topMatches.size(); i++) { topMatches[i].setAligned(""); } }else { //trim query string newAligned = query.getAligned(); newAligned = newAligned.substr(frontPos, (rearPos-frontPos+1)); query.setAligned(newAligned); //trim topMatches for (int i = 0; i < topMatches.size(); i++) { newAligned = topMatches[i].getAligned(); newAligned = newAligned.substr(frontPos, (rearPos-frontPos+1)); topMatches[i].setAligned(newAligned); } for (int i = 0; i < newAligned.length(); i++) { trimmedPos[i] = i+frontPos; } } return trimmedPos; } catch(exception& e) { m->errorOut(e, "DeCalculator", "trimSequences"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/chimera/decalc.h000066400000000000000000000054761255543666200174460ustar00rootroot00000000000000#ifndef DECALC_H #define DECALC_H /* * decalc.h * Mothur * * Created by Sarah Westcott on 7/22/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "mothur.h" #include "sequence.hpp" /***********************************************************************/ //This class was created using the algorithms described in the // "At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies" paper //by Kevin E. Ashelford 1, Nadia A. Chuzhanova 3, John C. Fry 1, Antonia J. Jones 2 and Andrew J. Weightman 1. /***********************************************************************/ //this structure is necessary to determine the sequence that contributed to the outliers when we remove them //this way we can remove all scores that are contributed by outlier sequences. struct quanMember { float score; int member1; int member2; quanMember (float s, int m, int n) : score(s), member1(m), member2(n) {} quanMember() {} }; //******************************************************************************************************************** class DeCalculator { public: DeCalculator() { m = MothurOut::getInstance(); } ~DeCalculator() {}; vector findClosest(Sequence, vector&, vector&, int, int); //takes querySeq, a reference db, filteredRefDB, numWanted, minSim Sequence* findClosest(Sequence*, vector); set getPos() { return h; } void setMask(string); void setAlignmentLength(int l) { alignLength = l; } void runMask(Sequence*); void trimSeqs(Sequence*, Sequence*, map&); map trimSeqs(Sequence&, vector&); void removeObviousOutliers(vector< vector >&, int); vector calcFreq(vector, string); vector findWindows(Sequence*, int, int, int&, int); vector calcObserved(Sequence*, Sequence*, vector, int); vector calcExpected(vector, float); vector findQav(vector, int, vector); float calcDE(vector, vector); float calcDist(Sequence*, Sequence*, int, int); float getCoef(vector, vector); vector< vector > getQuantiles(vector, vector, int, vector, int, int, int); vector returnObviousOutliers(vector< vector >, int); map getMaskMap() { return maskMap; } private: //vector 
sortContrib(map); //used by mallard float findAverage(vector); //int findLargestContrib(vector); //void removeContrib(int, vector&); string seqMask; set h; int alignLength; map maskMap; MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/chimera/maligner.cpp000066400000000000000000000533631255543666200203620ustar00rootroot00000000000000/* * maligner.cpp * Mothur * * Created by westcott on 9/23/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "maligner.h" /***********************************************************************/ //int num, int match, int misMatch, , string mode, Database* dataLeft, Database* dataRight Maligner::Maligner(vector temp, int match, int misMatch, float div, int ms, int minCov) : db(temp), matchScore(match), misMatchPenalty(misMatch), minDivR(div), minSimilarity(ms), minCoverage(minCov) { //numWanted(num), , searchMethod(mode), databaseLeft(dataLeft), databaseRight(dataRight) m = MothurOut::getInstance(); } /***********************************************************************/ string Maligner::getResults(Sequence q, DeCalculator decalc) { try { outputResults.clear(); //make copy so trimming doesn't destroy query from calling class - remember to deallocate query.setName(q.getName()); query.setAligned(q.getAligned()); string chimera; //copy refSeqs so that filter does not effect original for(int i = 0; i < db.size(); i++) { Sequence newSeq(db[i].getName(), db[i].getAligned()); refSeqs.push_back(newSeq); } refSeqs = minCoverageFilter(refSeqs); if (refSeqs.size() < 2) { //for (int i = 0; i < refSeqs.size(); i++) { delete refSeqs[i]; } percentIdenticalQueryChimera = 0.0; return "unknown"; } int chimeraPenalty = computeChimeraPenalty(); //fills outputResults chimera = chimeraMaligner(chimeraPenalty, decalc); if (m->control_pressed) { return chimera; } //free memory //delete query; //for (int i = 0; i < refSeqs.size(); i++) { delete refSeqs[i]; } return 
chimera; } catch(exception& e) { m->errorOut(e, "Maligner", "getResults"); exit(1); } } /***********************************************************************/ string Maligner::chimeraMaligner(int chimeraPenalty, DeCalculator decalc) { try { string chimera; //trims seqs to first non gap char in all seqs and last non gap char in all seqs spotMap = decalc.trimSeqs(query, refSeqs); //you trimmed the whole sequence, skip if (query.getAligned() == "") { return "no"; } vector temp = refSeqs; temp.push_back(query); temp = verticalFilter(temp); query = temp[temp.size()-1]; for (int i = 0; i < temp.size()-1; i++) { refSeqs[i] = temp[i]; } //for (int i = 0; i < refSeqs.size(); i++) { cout << refSeqs[i]->getName() << endl ; }//<< refSeqs[i]->getAligned() << endl vector< vector > matrix = buildScoreMatrix(query.getAligned().length(), refSeqs.size()); //builds and initializes if (m->control_pressed) { return chimera; } fillScoreMatrix(matrix, refSeqs, chimeraPenalty); vector path = extractHighestPath(matrix); if (m->control_pressed) { return chimera; } vector trace = mapTraceRegionsToAlignment(path); if (trace.size() > 1) { chimera = "yes"; } else { chimera = "no"; return chimera; } int traceStart = path[0].col; int traceEnd = path[path.size()-1].col; string queryInRange = query.getAligned(); queryInRange = queryInRange.substr(traceStart, (traceEnd-traceStart+1)); // cout << queryInRange << endl; string chimeraSeq = constructChimericSeq(trace, refSeqs); // cout << chimeraSeq << endl; // cout << queryInRange.length() << endl; // cout << chimeraSeq.length() << endl; percentIdenticalQueryChimera = computePercentID(queryInRange, chimeraSeq); // cout << percentIdenticalQueryChimera << endl; /* vector trace = extractHighestPath(matrix); //cout << "traces\n"; //for(int i=0;igetName() << endl; //} if (trace.size() > 1) { chimera = "yes"; } else { chimera = "no"; return chimera; } int traceStart = trace[0].col; int traceEnd = trace[trace.size()-1].oldCol; string queryInRange = 
query->getAligned(); queryInRange = queryInRange.substr(traceStart, (traceEnd-traceStart));*/ if (m->control_pressed) { return chimera; } //save output results for (int i = 0; i < trace.size(); i++) { int regionStart = trace[i].col; int regionEnd = trace[i].oldCol; int seqIndex = trace[i].row; // cout << regionStart << '\t' << regionEnd << '\t' << seqIndex << endl; results temp; temp.parent = refSeqs[seqIndex].getName(); temp.parentAligned = db[seqIndex].getAligned(); temp.nastRegionStart = spotMap[regionStart]; temp.nastRegionEnd = spotMap[regionEnd]; temp.regionStart = unalignedMap[regionStart]; temp.regionEnd = unalignedMap[regionEnd]; string parentInRange = refSeqs[seqIndex].getAligned(); parentInRange = parentInRange.substr(traceStart, (traceEnd-traceStart+1)); temp.queryToParent = computePercentID(queryInRange, parentInRange); temp.divR = (percentIdenticalQueryChimera / temp.queryToParent); string queryInRegion = query.getAligned(); queryInRegion = queryInRegion.substr(regionStart, (regionEnd-regionStart+1)); string parentInRegion = refSeqs[seqIndex].getAligned(); parentInRegion = parentInRegion.substr(regionStart, (regionEnd-regionStart+1)); temp.queryToParentLocal = computePercentID(queryInRegion, parentInRegion); //cout << query->getName() << '\t' << temp.parent << '\t' << "NAST:" << temp.nastRegionStart << '-' << temp.nastRegionEnd << " G:" << temp.queryToParent << " L:" << temp.queryToParentLocal << ", " << temp.divR << endl; outputResults.push_back(temp); } return chimera; } catch(exception& e) { m->errorOut(e, "Maligner", "chimeraMaligner"); exit(1); } } /***********************************************************************/ //removes top matches that do not have minimum coverage with query. 
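// Illustrative sketch, not part of mothur: the coverage test used by minCoverageFilter. For every aligned
// column where the query has a base, it checks whether the reference also has a base, and keeps the
// reference only if the truncated percentage is strictly above the minimum. The names sketchCoverage and
// sketchPassesCoverage are hypothetical, the sequences are plain aligned strings of equal length, and
// returning 0 for a query without bases is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Percentage of query bases that are covered by a base in the reference, truncated to an int.
static int sketchCoverage(const std::string& queryAligned, const std::string& refAligned) {
    int numBases = 0;
    int numCovered = 0;
    for (size_t j = 0; j < queryAligned.length() && j < refAligned.length(); j++) {
        if (isalpha((unsigned char) queryAligned[j])) {
            numBases++;
            if (isalpha((unsigned char) refAligned[j])) { numCovered++; }
        }
    }
    if (numBases == 0) { return 0; } // assumption: no bases means no coverage
    return (int) ((numCovered / (float) numBases) * 100);
}

// A reference is kept only when its coverage is strictly greater than minCoverage.
static bool sketchPassesCoverage(const std::string& queryAligned, const std::string& refAligned, int minCoverage) {
    return sketchCoverage(queryAligned, refAligned) > minCoverage;
}
```

// For the query "AC-GT" and the reference "A--GT", three of the four query bases are covered, giving 75.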
vector Maligner::minCoverageFilter(vector ref){ try { vector newRefs; string queryAligned = query.getAligned(); for (int i = 0; i < ref.size(); i++) { string refAligned = ref[i].getAligned(); int numBases = 0; int numCovered = 0; //calculate coverage for (int j = 0; j < queryAligned.length(); j++) { if (isalpha(queryAligned[j])) { numBases++; if (isalpha(refAligned[j])) { numCovered++; } } } int coverage = ((numCovered/(float)numBases)*100); //if coverage above minimum if (coverage > minCoverage) { newRefs.push_back(ref[i]); }//else { //delete ref[i]; //} } return newRefs; } catch(exception& e) { m->errorOut(e, "Maligner", "minCoverageFilter"); exit(1); } } /***********************************************************************/ // a breakpoint should yield fewer mismatches than this number with respect to the best parent sequence. int Maligner::computeChimeraPenalty() { try { int numAllowable = ((1.0 - (1.0/minDivR)) * query.getNumBases()); // if(numAllowable < 1){ numAllowable = 1; } int penalty = int(numAllowable + 1) * misMatchPenalty; return penalty; } catch(exception& e) { m->errorOut(e, "Maligner", "computeChimeraPenalty"); exit(1); } } /***********************************************************************/ //this is a vertical filter vector Maligner::verticalFilter(vector seqs) { try { vector gaps; gaps.resize(query.getAligned().length(), 0); string filterString = (string(query.getAligned().length(), '1')); //for each sequence for (int i = 0; i < seqs.size(); i++) { string seqAligned = seqs[i].getAligned(); for (int j = 0; j < seqAligned.length(); j++) { //if this spot is a gap if ((seqAligned[j] == '-') || (seqAligned[j] == '.')) { gaps[j]++; } } } //zero out spot where all sequences have blanks int numColRemoved = 0; for(int i = 0; i < seqs[0].getAligned().length(); i++){ if(gaps[i] == seqs.size()) { filterString[i] = '0'; numColRemoved++; } } map newMap; //for each sequence for (int i = 0; i < seqs.size(); i++) { string seqAligned = 
seqs[i].getAligned(); string newAligned = ""; int count = 0; for (int j = 0; j < seqAligned.length(); j++) { //if this spot is not a gap if (filterString[j] == '1') { newAligned += seqAligned[j]; newMap[count] = spotMap[j]; count++; } } seqs[i].setAligned(newAligned); } string query = seqs[seqs.size()-1].getAligned(); int queryLength = query.length(); unalignedMap.resize(queryLength, 0); for(int i=1;ierrorOut(e, "Maligner", "verticalFilter"); exit(1); } } //*************************************************************************************************************** vector< vector > Maligner::buildScoreMatrix(int cols, int rows) { try{ vector< vector > m(rows); for (int i = 0; i < rows; i++) { for (int j = 0; j < cols; j++) { //initialize each cell score_struct temp; temp.prev = -1; temp.score = -9999999; temp.col = j; temp.row = i; m[i].push_back(temp); } } return m; } catch(exception& e) { m->errorOut(e, "Maligner", "buildScoreMatrix"); exit(1); } } //*************************************************************************************************************** void Maligner::fillScoreMatrix(vector >& ms, vector seqs, int penalty) { try{ //get matrix dimensions int numCols = query.getAligned().length(); int numRows = seqs.size(); // cout << numRows << endl; //initialize first col string queryAligned = query.getAligned(); for (int i = 0; i < numRows; i++) { string subjectAligned = seqs[i].getAligned(); //are you both gaps? 
if ((!isalpha(queryAligned[0])) && (!isalpha(subjectAligned[0]))) { ms[i][0].score = 0; // ms[i][0].mismatches = 0; }else if (queryAligned[0] == subjectAligned[0]) { //|| subjectAligned[0] == 'N') ms[i][0].score = matchScore; // ms[i][0].mismatches = 0; }else{ ms[i][0].score = 0; // ms[i][0].mismatches = 1; } } //fill rest of matrix for (int j = 1; j < numCols; j++) { //iterate through matrix columns // for (int i = 0; i < 1; i++) { //iterate through matrix rows for (int i = 0; i < numRows; i++) { //iterate through matrix rows string subjectAligned = seqs[i].getAligned(); int matchMisMatchScore = 0; //are you both gaps? if ((!isalpha(queryAligned[j])) && (!isalpha(subjectAligned[j]))) { //leave the same }else if ((toupper(queryAligned[j]) == 'N') || (toupper(subjectAligned[j]) == 'N')) { //leave the same }else if (queryAligned[j] == subjectAligned[j]) { matchMisMatchScore = matchScore; }else if (queryAligned[j] != subjectAligned[j]) { matchMisMatchScore = misMatchPenalty; } //compute score based on previous columns scores for (int prevIndex = 0; prevIndex < numRows; prevIndex++) { //iterate through rows int sumScore = matchMisMatchScore + ms[prevIndex][j-1].score; //you are not at yourself if (prevIndex != i) { sumScore += penalty; } if (sumScore < 0) { sumScore = 0; } if (sumScore > ms[i][j].score) { ms[i][j].score = sumScore; ms[i][j].prev = prevIndex; } } // cout << i << '\t' << j << '\t' << queryAligned[j] << '\t' << subjectAligned[j] << '\t' << matchMisMatchScore << '\t' << ms[i][j].score << endl; } } // cout << numRows << '\t' << numCols << endl; // for(int i=0;igetName(); // for(int j=0;jgetName(); // for(int j=0;jerrorOut(e, "Maligner", "fillScoreMatrix"); exit(1); } } //*************************************************************************************************************** vector Maligner::extractHighestPath(vector > ms) { try { //get matrix dimensions int numCols = query.getAligned().length(); int numRows = ms.size(); //find highest score scoring 
matrix score_struct highestStruct; int highestScore = 0; for (int i = 0; i < numRows; i++) { for (int j = 0; j < numCols; j++) { if (ms[i][j].score > highestScore) { highestScore = ms[i][j].score; highestStruct = ms[i][j]; } } } vector path; int rowIndex = highestStruct.row; int pos = highestStruct.col; int score = highestStruct.score; // cout << rowIndex << '\t' << pos << '\t' << score << endl; while (pos >= 0 && score > 0) { score_struct temp = ms[rowIndex][pos]; score = temp.score; if (score > 0) { path.push_back(temp); } rowIndex = temp.prev; pos--; } reverse(path.begin(), path.end()); return path; } catch(exception& e) { m->errorOut(e, "Maligner", "extractHighestPath"); exit(1); } } //*************************************************************************************************************** vector Maligner::mapTraceRegionsToAlignment(vector path) { try { vector trace; int region_index = path[0].row; int region_start = path[0].col; for (int i = 1; i < path.size(); i++) { int next_region_index = path[i].row; //cout << i << '\t' << next_region_index << endl; if (next_region_index != region_index) { // add trace region int col_index = path[i].col; trace_struct temp; temp.col = region_start; temp.oldCol = col_index-1; temp.row = region_index; trace.push_back(temp); region_index = path[i].row; region_start = col_index; } } // get last one trace_struct temp; temp.col = region_start; temp.oldCol = path[path.size()-1].col; temp.row = region_index; trace.push_back(temp); return trace; } catch(exception& e) { m->errorOut(e, "Maligner", "mapTraceRegionsToAlignment"); exit(1); } } /*************************************************************************************************************** vector Maligner::extractHighestPath(vector > ms) { try { //get matrix dimensions int numCols = query->getAligned().length(); int numRows = ms.size(); //find highest score scoring matrix vector highestStruct; int highestScore = 0; for (int i = 0; i < numRows; i++) { for (int j = 0; 
j < numCols; j++) { if (ms[i][j].score > highestScore) { highestScore = ms[i][j].score; highestStruct.resize(0); highestStruct.push_back(ms[i][j]); } else if(ms[i][j].score == highestScore){ highestStruct.push_back(ms[i][j]); } } } //cout << endl << highestScore << '\t' << highestStruct.size() << '\t' << highestStruct[0].row << endl; vector maxTrace; double maxPercentIdenticalQueryAntiChimera = 0; for(int i=0;i path; int rowIndex = highestStruct[i].row; int pos = highestStruct[i].col; int score = highestStruct[i].score; while (pos >= 0 && score > 0) { score_struct temp = ms[rowIndex][pos]; score = temp.score; if (score > 0) { path.push_back(temp); } rowIndex = temp.prev; pos--; } reverse(path.begin(), path.end()); vector trace = mapTraceRegionsToAlignment(path, refSeqs); //cout << "traces\n"; //for(int j=0;jgetName() << endl; //} int traceStart = path[0].col; int traceEnd = path[path.size()-1].col; // cout << "traceStart/End\t" << traceStart << '\t' << traceEnd << endl; string queryInRange = query->getAligned(); queryInRange = queryInRange.substr(traceStart, (traceEnd-traceStart+1)); // cout << "here" << endl; string chimeraSeq = constructChimericSeq(trace, refSeqs); string antiChimeraSeq = constructAntiChimericSeq(trace, refSeqs); percentIdenticalQueryChimera = computePercentID(queryInRange, chimeraSeq); double percentIdenticalQueryAntiChimera = computePercentID(queryInRange, antiChimeraSeq); // cout << i << '\t' << percentIdenticalQueryChimera << '\t' << percentIdenticalQueryAntiChimera << endl; if(percentIdenticalQueryAntiChimera > maxPercentIdenticalQueryAntiChimera){ maxPercentIdenticalQueryAntiChimera = percentIdenticalQueryAntiChimera; maxTrace = trace; } } // cout << maxPercentIdenticalQueryAntiChimera << endl; return maxTrace; } catch(exception& e) { m->errorOut(e, "Maligner", "extractHighestPath"); exit(1); } } *************************************************************************************************************** vector 
Maligner::mapTraceRegionsToAlignment(vector path, vector seqs) { try { vector trace; int region_index = path[0].row; int region_start = path[0].col; for (int i = 1; i < path.size(); i++) { int next_region_index = path[i].row; if (next_region_index != region_index) { // add trace region int col_index = path[i].col; trace_struct temp; temp.col = region_start; temp.oldCol = col_index-1; temp.row = region_index; trace.push_back(temp); region_index = path[i].row; region_start = col_index; } } // get last one trace_struct temp; temp.col = region_start; temp.oldCol = path[path.size()-1].col; temp.row = region_index; trace.push_back(temp); // cout << endl; // cout << trace.size() << endl; // for(int i=0;igetName() << endl; // } // cout << endl; return trace; } catch(exception& e) { m->errorOut(e, "Maligner", "mapTraceRegionsToAlignment"); exit(1); } } */ //*************************************************************************************************************** string Maligner::constructChimericSeq(vector trace, vector seqs) { try { string chimera = ""; for (int i = 0; i < trace.size(); i++) { // cout << i << '\t' << trace[i].row << '\t' << trace[i].col << '\t' << trace[i].oldCol << endl; string seqAlign = seqs[trace[i].row].getAligned(); seqAlign = seqAlign.substr(trace[i].col, (trace[i].oldCol-trace[i].col+1)); chimera += seqAlign; } // cout << chimera << endl; // if (chimera != "") { chimera = chimera.substr(0, (chimera.length()-1)); } //this was introducing a fence post error // cout << chimera << endl; return chimera; } catch(exception& e) { m->errorOut(e, "Maligner", "constructChimericSeq"); exit(1); } } //*************************************************************************************************************** string Maligner::constructAntiChimericSeq(vector trace, vector seqs) { try { string antiChimera = ""; for (int i = 0; i < trace.size(); i++) { // cout << i << '\t' << (trace.size() - i - 1) << '\t' << trace[i].row << '\t' << trace[i].col << '\t' << 
trace[i].oldCol << endl;
			
			int oppositeIndex = trace.size() - i - 1;
			
			string seqAlign = seqs[trace[oppositeIndex].row].getAligned();
			seqAlign = seqAlign.substr(trace[i].col, (trace[i].oldCol-trace[i].col+1));
			antiChimera += seqAlign;
		}
		
		return antiChimera;
	}
	catch(exception& e) {
		m->errorOut(e, "Maligner", "constructAntiChimericSeq");
		exit(1);
	}
}
//***************************************************************************************************************
float Maligner::computePercentID(string queryAlign, string chimera) {
	try {
		
		if (queryAlign.length() != chimera.length()) {
			m->mothurOut("Error, alignment strings are of different lengths: "); m->mothurOutEndLine();
			m->mothurOut(toString(queryAlign.length())); m->mothurOutEndLine();
			m->mothurOut(toString(chimera.length())); m->mothurOutEndLine();
			return -1.0;
		}
		
	//	cout << queryAlign.length() << endl;
		int numIdentical = 0;
		int countA = 0;
		int countB = 0;
		for (int i = 0; i < queryAlign.length(); i++) {
			//skip columns where either sequence holds something other than A, C, G, T or a gap character
			if (((queryAlign[i] != 'G') && (queryAlign[i] != 'T') && (queryAlign[i] != 'A') && (queryAlign[i] != 'C') && (queryAlign[i] != '.') && (queryAlign[i] != '-')) ||
				((chimera[i] != 'G') && (chimera[i] != 'T') && (chimera[i] != 'A') && (chimera[i] != 'C') && (chimera[i] != '.') && (chimera[i] != '-'))) {}
			else {
				
				bool charA = false; bool charB = false;
				if ((queryAlign[i] == 'G') || (queryAlign[i] == 'T') || (queryAlign[i] == 'A') || (queryAlign[i] == 'C')) { charA = true; }
				if ((chimera[i] == 'G') || (chimera[i] == 'T') || (chimera[i] == 'A') || (chimera[i] == 'C')) { charB = true; }
				
				if (charA || charB) {
					
					if (charA) { countA++; }
					if (charB) { countB++; }
					
					if (queryAlign[i] == chimera[i]) {
						numIdentical++;
					}
				}
	//			cout << queryAlign[i] << '\t' << chimera[i] << '\t' << countA << '\t' << countB << endl;
			}
		}
		
	//	cout << "pat\t" << countA << '\t' << countB << '\t' << numIdentical << endl;
		
		//the denominator is the mean number of bases in the two sequences
		float numBases = (countA + countB) /(float) 2;
		
		if (numBases == 0) { return 0; }
		
	//	cout << numIdentical << '\t' << numBases << endl;
		
		float
percentIdentical = (numIdentical/(float)numBases) * 100;
	//	cout << percentIdentical << endl;
		
		return percentIdentical;
		
	}
	catch(exception& e) {
		m->errorOut(e, "Maligner", "computePercentID");
		exit(1);
	}
}
//***************************************************************************************************************
mothur-1.36.1/source/chimera/maligner.h000066400000000000000000000035641255543666200200250ustar00rootroot00000000000000
#ifndef MALIGNER_H
#define MALIGNER_H
/*
 *  maligner.h
 *  Mothur
 *
 *  Created by westcott on 9/23/09.
 *  Copyright 2009 Schloss Lab. All rights reserved.
 *
 */

#include "decalc.h"
#include "chimera.h"
#include "database.hpp"

/***********************************************************************/
//This class was modeled after the chimeraMaligner written by the Broad Institute
/**********************************************************************/
class Maligner {

	public:
		
		Maligner(vector<Sequence>, int, int, float, int, int); //int, int, int, , string, Database*, Database*
		~Maligner() {};
		
		string getResults(Sequence, DeCalculator);
		float getPercentID() { return percentIdenticalQueryChimera; }
		vector<results> getOutput() { return outputResults; }
		
	private:
		Sequence query;
		vector<Sequence> refSeqs;
		vector<Sequence> db;
		int minCoverage, minSimilarity, matchScore, misMatchPenalty;
		float minDivR, percentIdenticalQueryChimera;
		vector<results> outputResults;
		map<int, int> spotMap;
		vector<int> unalignedMap;
		
		vector<Sequence> minCoverageFilter(vector<Sequence>); //removes top matches that do not have minimum coverage with query.
int computeChimeraPenalty(); vector verticalFilter(vector); vector< vector > buildScoreMatrix(int, int); void fillScoreMatrix(vector >&, vector, int); vector extractHighestPath(vector >); vector mapTraceRegionsToAlignment(vector); string constructChimericSeq(vector, vector); string constructAntiChimericSeq(vector, vector); float computePercentID(string, string); string chimeraMaligner(int, DeCalculator); MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/chimera/myPerseus.cpp000066400000000000000000000626561255543666200205650ustar00rootroot00000000000000/* * myPerseus.cpp * * * Created by Pat Schloss on 9/5/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. * */ #include "myPerseus.h" /**************************************************************************************************/ int PERSEUSMAXINT = numeric_limits::max(); /**************************************************************************************************/ vector > Perseus::binomial(int maxOrder){ try { vector > binomial(maxOrder+1); for(int i=0;i<=maxOrder;i++){ binomial[i].resize(maxOrder+1); binomial[i][0]=1; binomial[0][i]=0; } binomial[0][0]=1; binomial[1][0]=1; binomial[1][1]=1; for(int i=2;i<=maxOrder;i++){ binomial[1][i]=0; } for(int i=2;i<=maxOrder;i++){ for(int j=1;j<=maxOrder;j++){ if(i==j){ binomial[i][j]=1; } if(j>i) { binomial[i][j]=0; } else { binomial[i][j]=binomial[i-1][j-1]+binomial[i-1][j]; } } } return binomial; } catch(exception& e) { m->errorOut(e, "Perseus", "binomial"); exit(1); } } /**************************************************************************************************/ double Perseus::basicPairwiseAlignSeqs(string query, string reference, string& qAlign, string& rAlign, pwModel model){ try { double GAP = model.GAP_OPEN; double MATCH = model.MATCH; double MISMATCH = model.MISMATCH; int queryLength = query.size(); int refLength = reference.size(); vector > alignMatrix(queryLength + 
1); vector > alignMoves(queryLength + 1); for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i].resize(refLength + 1, 0); alignMoves[i].resize(refLength + 1, 'x'); } for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i][0] = GAP * i; alignMoves[i][0] = 'u'; } for(int i=0;i<=refLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[0][i] = GAP * i; alignMoves[0][i] = 'l'; } for(int i=1;i<=queryLength;i++){ if (m->control_pressed) { return 0; } for(int j=1;j<=refLength;j++){ double nogapScore; if(query[i-1] == reference[j-1]){ nogapScore = alignMatrix[i-1][j-1] + MATCH; } else { nogapScore = alignMatrix[i-1][j-1] + MISMATCH; } double leftScore; if(i == queryLength) { leftScore = alignMatrix[i][j-1]; } else { leftScore = alignMatrix[i][j-1] + GAP; } double upScore; if(j == refLength) { upScore = alignMatrix[i-1][j]; } else { upScore = alignMatrix[i-1][j] + GAP; } if(nogapScore > leftScore){ if(nogapScore > upScore){ alignMoves[i][j] = 'd'; alignMatrix[i][j] = nogapScore; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = upScore; } } else{ if(leftScore > upScore){ alignMoves[i][j] = 'l'; alignMatrix[i][j] = leftScore; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = upScore; } } } } int i = queryLength; int j = refLength; qAlign = ""; rAlign = ""; int diffs = 0; int length = 0; while(i > 0 && j > 0){ if (m->control_pressed) { return 0; } if(alignMoves[i][j] == 'd'){ qAlign = query[i-1] + qAlign; rAlign = reference[j-1] + rAlign; if(query[i-1] != reference[j-1]){ diffs++; } length++; i--; j--; } else if(alignMoves[i][j] == 'u'){ qAlign = query[i-1] + qAlign; if(j != refLength) { rAlign = '-' + rAlign; diffs++; length++; } else { rAlign = '.' + rAlign; } i--; } else if(alignMoves[i][j] == 'l'){ rAlign = reference[j-1] + rAlign; if(i != queryLength){ qAlign = '-' + qAlign; diffs++; length++; } else { qAlign = '.' + qAlign; } j--; } } while(i>0){ if (m->control_pressed) { return 0; } rAlign = '.' 
+ rAlign; qAlign = query[i-1] + qAlign; i--; } while(j>0){ if (m->control_pressed) { return 0; } rAlign = reference[j-1] + rAlign; qAlign = '.' + qAlign; j--; } return double(diffs)/double(length); } catch(exception& e) { m->errorOut(e, "Perseus", "basicPairwiseAlignSeqs"); exit(1); } } /**************************************************************************************************/ int Perseus::getDiffs(string qAlign, string rAlign, vector& leftDiffs, vector& leftMap, vector& rightDiffs, vector& rightMap){ try { int alignLength = qAlign.length(); int lDiffs = 0; int lCount = 0; for(int l=0;lcontrol_pressed) { return 0; } if(qAlign[l] == '-'){ lDiffs++; } else if(qAlign[l] != '.'){ if(rAlign[l] == '-'){ lDiffs++; } else if(qAlign[l] != rAlign[l] && rAlign[l] != '.'){ lDiffs++; } leftDiffs[lCount] = lDiffs; leftMap[lCount] = l; lCount++; } } int rDiffs = 0; int rCount = 0; for(int l=alignLength-1;l>=0;l--){ if (m->control_pressed) { return 0; } if(qAlign[l] == '-'){ rDiffs++; } else if(qAlign[l] != '.'){ if(rAlign[l] == '-'){ rDiffs++; } else if(qAlign[l] != rAlign[l] && rAlign[l] != '.'){ rDiffs++; } rightDiffs[rCount] = rDiffs; rightMap[rCount] = l; rCount++; } } return 0; } catch(exception& e) { m->errorOut(e, "Perseus", "getDiffs"); exit(1); } } /**************************************************************************************************/ int Perseus::getLastMatch(char direction, vector >& alignMoves, int i, int j, string& seqA, string& seqB){ try { char nullReturn = -1; while(i>=1 && j>=1){ if (m->control_pressed) { return 0; } if(direction == 'd'){ if(seqA[i-1] == seqB[j-1]) { return seqA[i-1]; } else { return nullReturn; } } else if(direction == 'l') { j--; } else { i--; } direction = alignMoves[i][j]; } return nullReturn; } catch(exception& e) { m->errorOut(e, "Perseus", "getLastMatch"); exit(1); } } /**************************************************************************************************/ int Perseus::toInt(char b){ try { if(b == 'A') { 
return 0; } else if(b == 'C') { return 1; } else if(b == 'T') { return 2; } else if(b == 'G') { return 3; } else { m->mothurOut("[ERROR]: " + toString(b) + " is not ATGC."); m->mothurOutEndLine(); return -1; } } catch(exception& e) { m->errorOut(e, "Perseus", "toInt"); exit(1); } } /**************************************************************************************************/ double Perseus::modeledPairwiseAlignSeqs(string query, string reference, string& qAlign, string& rAlign, vector >& correctMatrix){ try { int queryLength = query.size(); int refLength = reference.size(); vector > alignMatrix(queryLength + 1); vector > alignMoves(queryLength + 1); for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i].resize(refLength + 1, 0); alignMoves[i].resize(refLength + 1, 'x'); } for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i][0] = 15.0 * i; alignMoves[i][0] = 'u'; } for(int i=0;i<=refLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[0][i] = 15.0 * i; alignMoves[0][i] = 'l'; } for(int i=1;i<=queryLength;i++){ if (m->control_pressed) { return 0; } for(int j=1;j<=refLength;j++){ double nogap; nogap = alignMatrix[i-1][j-1] + correctMatrix[toInt(query[i-1])][toInt(reference[j-1])]; double gap; double left; if(i == queryLength){ //terminal gap left = alignMatrix[i][j-1]; } else{ if(reference[j-1] == getLastMatch('l', alignMoves, i, j, query, reference)){ gap = 4.0; } else{ gap = 15.0; } left = alignMatrix[i][j-1] + gap; } double up; if(j == refLength){ //terminal gap up = alignMatrix[i-1][j]; } else{ if(query[i-1] == getLastMatch('u', alignMoves, i, j, query, reference)){ gap = 4.0; } else{ gap = 15.0; } up = alignMatrix[i-1][j] + gap; } if(nogap < left){ if(nogap < up){ alignMoves[i][j] = 'd'; alignMatrix[i][j] = nogap; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = up; } } else{ if(left < up){ alignMoves[i][j] = 'l'; alignMatrix[i][j] = left; } else{ alignMoves[i][j] = 'u'; 
alignMatrix[i][j] = up; } } } } int i = queryLength; int j = refLength; int alignLength = 0; while(i > 0 && j > 0){ if (m->control_pressed) { return 0; } if(alignMoves[i][j] == 'd'){ qAlign = query[i-1] + qAlign; rAlign = reference[j-1] + rAlign; alignLength++; i--; j--; } else if(alignMoves[i][j] == 'u'){ if(j != refLength){ qAlign = query[i-1] + qAlign; rAlign = '-' + rAlign; alignLength++; } i--; } else if(alignMoves[i][j] == 'l'){ if(i != queryLength){ qAlign = '-' + qAlign; rAlign = reference[j-1] + rAlign; alignLength++; } j--; } } return alignMatrix[queryLength][refLength] / (double)alignLength; } catch(exception& e) { m->errorOut(e, "Perseus", "modeledPairwiseAlignSeqs"); exit(1); } } /**************************************************************************************************/ int Perseus::getAlignments(int curSequenceIndex, vector sequences, vector& alignments, vector >& leftDiffs, vector >& leftMaps, vector >& rightDiffs, vector >& rightMaps, int& bestRefSeq, int& bestRefDiff, vector& restricted){ try { int numSeqs = sequences.size(); //int bestSequenceMismatch = PERSEUSMAXINT; string curSequence = sequences[curSequenceIndex].sequence; int curFrequency = sequences[curSequenceIndex].frequency; bestRefSeq = -1; int bestIndex = -1; int bestDiffs = PERSEUSMAXINT; int comparisons = 0; pwModel model(0, -1, -1.5); for(int i=0;icontrol_pressed) { return 0; } if(i != curSequenceIndex && restricted[i] != 1 && sequences[i].frequency >= 2 * curFrequency){ string refSequence = sequences[i].sequence; leftDiffs[i].assign(curSequence.length(), 0); leftMaps[i].assign(curSequence.length(), 0); rightDiffs[i].assign(curSequence.length(), 0); rightMaps[i].assign(curSequence.length(), 0); basicPairwiseAlignSeqs(curSequence, refSequence, alignments[i].query, alignments[i].reference, model); getDiffs(alignments[i].query, alignments[i].reference, leftDiffs[i], leftMaps[i], rightDiffs[i], rightMaps[i]); int diffs = rightDiffs[i][curSequence.length()-1]; if(diffs < 
bestDiffs){ bestDiffs = diffs; bestIndex = i; } comparisons++; restricted[i] = 0; } else{ restricted[i] = 1; } } bestRefSeq = bestIndex; bestRefDiff = bestDiffs; return comparisons; } catch(exception& e) { m->errorOut(e, "Perseus", "getAlignments"); exit(1); } } /**************************************************************************************************/ int Perseus::getChimera(vector sequences, vector >& leftDiffs, vector >& rightDiffs, int& leftParent, int& rightParent, int& breakPoint, vector& singleLeft, vector& bestLeft, vector& singleRight, vector& bestRight, vector restricted){ try { int numRefSeqs = restricted.size(); int seqLength = leftDiffs[0].size(); singleLeft.resize(seqLength, PERSEUSMAXINT); bestLeft.resize(seqLength, -1); for(int l=0;lcontrol_pressed) { return 0; } for(int i=0;i sequences[bestLeft[l]].frequency))){ singleLeft[l] = leftDiffs[i][l]; bestLeft[l] = i; } } } } singleRight.resize(seqLength, PERSEUSMAXINT); bestRight.resize(seqLength, -1); for(int l=0;lcontrol_pressed) { return 0; } for(int i=0;i sequences[bestRight[l]].frequency))){ singleRight[l] = rightDiffs[i][l]; bestRight[l] = i; } } } } int bestChimeraMismatches = PERSEUSMAXINT; leftParent = -1; rightParent = -1; breakPoint = -1; for(int l=0;lcontrol_pressed) { return 0; } int chimera = singleLeft[l] + singleRight[seqLength - l - 2]; if(chimera < bestChimeraMismatches){ bestChimeraMismatches = chimera; breakPoint = l; leftParent = bestLeft[l]; rightParent = bestRight[seqLength - l - 2]; } } return bestChimeraMismatches; } catch(exception& e) { m->errorOut(e, "Perseus", "getChimera"); exit(1); } } /**************************************************************************************************/ string Perseus::stitchBimera(vector& alignments, int leftParent, int rightParent, int breakPoint, vector >& leftMaps, vector >& rightMaps){ try { int breakLeft = leftMaps[leftParent][breakPoint]; int breakRight = rightMaps[rightParent][rightMaps[rightParent].size() - breakPoint - 2]; 
string left = alignments[leftParent].reference; string right = alignments[rightParent].reference; string chimera = ""; for(int i=0;i<=breakLeft;i++){ if (m->control_pressed) { return chimera; } if(left[i] != '-' && left[i] != '.'){ chimera += left[i]; } } for(int i=breakRight;i<right.length();i++){ if (m->control_pressed) { return chimera; } if(right[i] != '-' && right[i] != '.'){ chimera += right[i]; } } return chimera; } catch(exception& e) { m->errorOut(e, "Perseus", "stitchBimera"); exit(1); } } /**************************************************************************************************/ int Perseus::getTrimera(vector& sequences, vector >& leftDiffs, int& leftParent, int& middleParent, int& rightParent, int& breakPointA, int& breakPointB, vector& singleLeft, vector& bestLeft, vector& singleRight, vector& bestRight, vector restricted){ try { int numRefSeqs = leftDiffs.size(); int alignLength = leftDiffs[0].size(); int bestTrimeraMismatches = PERSEUSMAXINT; leftParent = -1; middleParent = -1; rightParent = -1; breakPointA = -1; breakPointB = -1; vector > minDelta(alignLength); vector > minDeltaSeq(alignLength); for(int i=0;icontrol_pressed) { return 0; } minDelta[i].assign(alignLength, PERSEUSMAXINT); minDeltaSeq[i].assign(alignLength, -1); } for(int x=0;xcontrol_pressed) { return 0; } if(restricted[i] == 0){ int delta = leftDiffs[i][y] - leftDiffs[i][x]; if(delta < minDelta[x][y] || (delta == minDelta[x][y] && sequences[i].frequency > sequences[minDeltaSeq[x][y]].frequency)){ minDelta[x][y] = delta; minDeltaSeq[x][y] = i; } } } minDelta[x][y] += singleLeft[x] + singleRight[alignLength - y - 2]; if(minDelta[x][y] < bestTrimeraMismatches){ bestTrimeraMismatches = minDelta[x][y]; breakPointA = x; breakPointB = y; leftParent = bestLeft[x]; middleParent = minDeltaSeq[x][y]; rightParent = bestRight[alignLength - y - 2]; } } } return bestTrimeraMismatches; } catch(exception& e) { m->errorOut(e, "Perseus", "getTrimera"); exit(1); } } 
/**************************************************************************************************/ string Perseus::stitchTrimera(vector alignments, int leftParent, int middleParent, int rightParent, int breakPointA, int breakPointB, vector >& leftMaps, vector >& rightMaps){ try { int p1SplitPoint = leftMaps[leftParent][breakPointA]; int p2SplitPoint = leftMaps[middleParent][breakPointB]; int p3SplitPoint = rightMaps[rightParent][rightMaps[rightParent].size() - breakPointB - 2]; string chimeraRefSeq; for(int i=0;i<=p1SplitPoint;i++){ if (m->control_pressed) { return chimeraRefSeq; } if(alignments[leftParent].reference[i] != '-' && alignments[leftParent].reference[i] != '.'){ chimeraRefSeq += alignments[leftParent].reference[i]; } } for(int i=p1SplitPoint+1;i<=p2SplitPoint;i++){ if (m->control_pressed) { return chimeraRefSeq; } if(alignments[middleParent].reference[i] != '-' && alignments[middleParent].reference[i] != '.'){ chimeraRefSeq += alignments[middleParent].reference[i]; } } for(int i=p3SplitPoint;icontrol_pressed) { return chimeraRefSeq; } if(alignments[rightParent].reference[i] != '-' && alignments[rightParent].reference[i] != '.'){ chimeraRefSeq += alignments[rightParent].reference[i]; } } return chimeraRefSeq; } catch(exception& e) { m->errorOut(e, "Perseus", "stitchTrimera"); exit(1); } } /**************************************************************************************************/ int Perseus::threeWayAlign(string query, string parent1, string parent2, string& qAlign, string& aAlign, string& bAlign){ try { pwModel model(1.0, -1.0, -5.0); string qL, rL; string qR, rR; basicPairwiseAlignSeqs(query, parent1, qL, rL, model); basicPairwiseAlignSeqs(query, parent2, qR, rR, model); int lLength = qL.length(); int rLength = qR.length(); string qLNew, rLNew; string qRNew, rRNew; int lIndex = 0; int rIndex = 0; while(lIndexcontrol_pressed) { return 0; } if(qL[lIndex] == qR[rIndex]){ qLNew += qL[lIndex]; rLNew += rL[lIndex]; lIndex++; qRNew += qR[rIndex]; 
rRNew += rR[rIndex]; rIndex++; } else if(qL[lIndex] == '-' || qL[lIndex] == '.'){ //insert a gap into the right sequences qLNew += qL[lIndex]; rLNew += rL[lIndex]; lIndex++; if(rIndex != rLength){ qRNew += '-'; rRNew += '-'; } else{ qRNew += '.'; rRNew += '.'; } } else if(qR[rIndex] == '-' || qR[rIndex] == '.'){ //insert a gap into the left sequences qRNew += qR[rIndex]; rRNew += rR[rIndex]; rIndex++; if(lIndex != lLength){ qLNew += '-'; rLNew += '-'; } else{ qLNew += '.'; rLNew += '.'; } } } qAlign = qLNew; aAlign = rLNew; bAlign = rRNew; bool qStart = 0; bool aStart = 0; bool bStart = 0; for(int i=0;icontrol_pressed) { return 0; } if(qStart == 0){ if(qAlign[i] == '-') { qAlign[i] = '.'; } else { qStart = 1; } } if(aStart == 0){ if(aAlign[i] == '-') { aAlign[i] = '.'; } else { aStart = 1; } } if(bStart == 0){ if(bAlign[i] == '-') { bAlign[i] = '.'; } else { bStart = 1; } } if(aStart == 1 && bStart == 1 && qStart == 1){ break; } } return 0; } catch(exception& e) { m->errorOut(e, "Perseus", "threeWayAlign"); exit(1); } } /**************************************************************************************************/ double Perseus::calcLoonIndex(string query, string parent1, string parent2, int breakPoint, vector >& binMatrix){ try { string queryAln, leftParentAln, rightParentAln; threeWayAlign(query, parent1, parent2, queryAln, leftParentAln, rightParentAln); int alignLength = queryAln.length(); int endPos = alignLength; for(int i=alignLength-1;i>=0; i--){ if(queryAln[i] != '.' && leftParentAln[i] != '.' 
&& rightParentAln[i] != '.'){ endPos = i + 1; break; } } int diffToLeftCount = 0; vector diffToLeftMap(alignLength, 0); int diffToRightCount = 0; vector diffToRightMap(alignLength, 0); for(int i=0;icontrol_pressed) { return 0; } if(queryAln[i] != leftParentAln[i]){ diffToLeftMap[diffToLeftCount] = i; diffToLeftCount++; } if(queryAln[i] != rightParentAln[i]){ diffToRightMap[diffToRightCount] = i; diffToRightCount++; } } diffToLeftMap[diffToLeftCount] = endPos; diffToRightMap[diffToRightCount] = endPos; int indexL = 0; int indexR = 0; int indexS = 0; vector diffs; vector splits; splits.push_back(-1); diffs.push_back(diffToRightCount); indexS++; while(indexL < diffToLeftCount || indexR < diffToRightCount){ if (m->control_pressed) { return 0; } if(diffToLeftMap[indexL] <= diffToRightMap[indexR]){ diffs.push_back(diffs[indexS - 1] + 1); splits.push_back(diffToLeftMap[indexL]); indexL++; indexS++; } else if(diffToLeftMap[indexL] > diffToRightMap[indexR]) { diffs.push_back(diffs[indexS - 1] - 1); splits.push_back(diffToRightMap[indexR]); indexR++; indexS++; } } int minDiff = PERSEUSMAXINT; int minIndex = -1; for(int i=0;icontrol_pressed) { return 0; } if(diffs[i] < minDiff){ minDiff = diffs[i]; minIndex = i; } } int splitPos = endPos; if(minIndex < indexS - 1){ splitPos = (splits[minIndex]+splits[minIndex+1]) / 2; } int diffToChimera = 0; int leftDiffToP1 = 0; int rightDiffToP1 = 0; int leftDiffToP2 = 0; int rightDiffToP2 = 0; for(int i=0;icontrol_pressed) { return 0; } char bQuery = queryAln[i]; char bP1 = leftParentAln[i]; char bP2 = rightParentAln[i]; char bConsensus = bQuery; if(bP1 == bP2){ bConsensus = bP1; } if(bConsensus != bQuery){ diffToChimera++; } if(bConsensus != bP1){ if(i <= splitPos){ leftDiffToP1++; } else{ rightDiffToP1++; } } if(bConsensus != bP2){ if(i <= splitPos){ leftDiffToP2++; } else{ rightDiffToP2++; } } } int diffToClosestParent, diffToFurtherParent; int xA, xB, yA, yB; double aFraction, bFraction; if(diffToLeftCount <= diffToRightCount){ //if 
parent 1 is closer diffToClosestParent = leftDiffToP1 + rightDiffToP1; xA = leftDiffToP1; xB = rightDiffToP1; diffToFurtherParent = leftDiffToP2 + rightDiffToP2; yA = leftDiffToP2; yB = rightDiffToP2; aFraction = double(splitPos + 1)/(double) endPos; bFraction = 1 - aFraction; } else{ //if parent 2 is closer diffToClosestParent = leftDiffToP2 + rightDiffToP2; xA = rightDiffToP2; xB = leftDiffToP2; diffToFurtherParent = leftDiffToP1 + rightDiffToP1; yA = rightDiffToP1; yB = leftDiffToP1; bFraction = double(splitPos + 1)/(double) endPos; aFraction = 1 - bFraction; } double loonIndex = 0; int totalDifference = diffToClosestParent + diffToChimera; if(totalDifference > 0){ double prob = 0; for(int i=diffToClosestParent;i<=totalDifference;i++){ prob += binMatrix[totalDifference][i] * pow(0.50, i) * pow(0.50, totalDifference - i); } loonIndex += -log(prob); } if(diffToFurtherParent > 0){ double prob = 0; for(int i=yA;i<=diffToFurtherParent;i++){ prob += binMatrix[diffToFurtherParent][i] * pow(aFraction, i) * pow(1-aFraction, diffToFurtherParent - i); } loonIndex += -log(prob); } if(diffToClosestParent > 0){ double prob = 0; for(int i=xB;i<=diffToClosestParent;i++){ prob += binMatrix[diffToClosestParent][i] * pow(bFraction, i) * pow(1-bFraction, diffToClosestParent - i); } loonIndex += -log(prob); } return loonIndex; } catch(exception& e) { m->errorOut(e, "Perseus", "calcLoonIndex"); exit(1); } } /**************************************************************************************************/ double Perseus::calcBestDistance(string query, string reference){ try { int alignLength = query.length(); int mismatch = 0; int counter = 0; for(int i=0;i<alignLength;i++){ if (m->control_pressed) { return 0; } if((query[i] != '.' 
|| reference[i] != '.') && (query[i] != '-' && reference[i] != '-')){ if(query[i] != reference[i]){ mismatch++; } counter++; } } return (double)mismatch / (double)counter; } catch(exception& e) { m->errorOut(e, "Perseus", "calcBestDistance"); exit(1); } } /**************************************************************************************************/ double Perseus::classifyChimera(double singleDist, double cIndex, double loonIndex, double alpha, double beta){ try { double difference = cIndex - singleDist; //y double probability; if(cIndex >= 0.15 || difference > 0.00){ probability = 0.0000; } else{ probability = 1.0 / (1.0 + exp(-(alpha + beta * loonIndex))); } return probability; } catch(exception& e) { m->errorOut(e, "Perseus", "classifyChimera"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/chimera/myPerseus.h000066400000000000000000000055321255543666200202200ustar00rootroot00000000000000#ifndef MOTHURPERSEUS #define MOTHURPERSEUS /* * myPerseus.h * * * Created by Pat Schloss on 9/5/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. 
* */ #include "mothurout.h" /**************************************************************************************************/ struct seqData { seqData(string name, string seq, int freq) : seqName(name), sequence(seq), frequency(freq) { } bool operator<( seqData const& rhs ) const { bool verdict = 0; if(frequency < rhs.frequency){ verdict = 1; } else if(frequency == rhs.frequency){ verdict = (seqName > rhs.seqName); } return verdict; } string seqName; string sequence; int frequency; }; /**************************************************************************************************/ struct pwModel { pwModel(double m, double mm, double g): MATCH(m), MISMATCH(mm), GAP_OPEN(g) {;} double MATCH; double MISMATCH; double GAP_OPEN; }; /**************************************************************************************************/ struct pwAlign { pwAlign(): query(""), reference(""){} pwAlign(string q, string r): query(q), reference(r){} string query; string reference; }; /**************************************************************************************************/ class Perseus { public: Perseus() { m = MothurOut::getInstance(); } ~Perseus() {} vector > binomial(int); double modeledPairwiseAlignSeqs(string, string, string&, string&, vector >&); int getAlignments(int, vector, vector&, vector >& , vector >&, vector >&, vector >&, int&, int&, vector&); int getChimera(vector,vector >&, vector >&,int&, int&, int&,vector&, vector&, vector&, vector&, vector); string stitchBimera(vector&, int, int, int, vector >&, vector >&); int getTrimera(vector&, vector >&, int&, int&, int&, int&, int&, vector&, vector&, vector&, vector&, vector); string stitchTrimera(vector, int, int, int, int, int, vector >&, vector >&); double calcLoonIndex(string, string, string, int, vector >&); double classifyChimera(double, double, double, double, double); private: MothurOut* m; int toInt(char); double basicPairwiseAlignSeqs(string, string, string&, string&, pwModel); int getDiffs(string, 
string, vector&, vector&, vector&, vector&); int getLastMatch(char, vector >&, int, int, string&, string&); int threeWayAlign(string, string, string, string&, string&, string&); double calcBestDistance(string, string); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/chimera/pintail.cpp000066400000000000000000000614261255543666200202230ustar00rootroot00000000000000/* * pintail.cpp * Mothur * * Created by Sarah Westcott on 7/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "pintail.h" #include "ignoregaps.h" #include "eachgapdist.h" //******************************************************************************************************************** //sorts lowest to highest inline bool compareQuanMembers(quanMember left, quanMember right){ return (left.score < right.score); } //*************************************************************************************************************** Pintail::Pintail(string filename, string temp, bool f, int p, string mask, string cons, string q, int win, int inc, string o) : Chimera() { try { fastafile = filename; templateFileName = temp; templateSeqs = readSeqs(temp); filter = f; processors = p; setMask(mask); consfile = cons; quanfile = q; window = win; increment = inc; outputDir = o; distcalculator = new eachGapDist(); decalc = new DeCalculator(); doPrep(); } catch(exception& e) { m->errorOut(e, "Pintail", "Pintail"); exit(1); } } //*************************************************************************************************************** Pintail::~Pintail() { try { delete distcalculator; delete decalc; } catch(exception& e) { m->errorOut(e, "Pintail", "~Pintail"); exit(1); } } //*************************************************************************************************************** int Pintail::doPrep() { try { mergedFilterString = ""; windowSizesTemplate.resize(templateSeqs.size(), window); 
quantiles.resize(100); //one for every percent mismatch quantilesMembers.resize(100); //one for every percent mismatch //if the user does not enter a mask then you want to keep all the spots in the alignment if (seqMask.length() == 0) { decalc->setAlignmentLength(templateSeqs[0]->getAligned().length()); } else { decalc->setAlignmentLength(seqMask.length()); } decalc->setMask(seqMask); #ifdef USE_MPI //do nothing #else #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //find breakup of templatefile for quantiles if (processors == 1) { templateLines.push_back(new linePair(0, templateSeqs.size())); } else { for (int i = 0; i < processors; i++) { templateLines.push_back(new linePair()); templateLines[i]->start = int (sqrt(float(i)/float(processors)) * templateSeqs.size()); templateLines[i]->end = int (sqrt(float(i+1)/float(processors)) * templateSeqs.size()); } } #else templateLines.push_back(new linePair(0, templateSeqs.size())); #endif #endif m->mothurOut("Getting conservation... "); cout.flush(); if (consfile == "") { m->mothurOut("Calculating probability of conservation for your template sequences. This can take a while... I will output the frequency of the highest base in each position to a .freq file so that you can input them using the conservation parameter next time you run this command. Providing the .freq file will improve speed. 
"); cout.flush(); probabilityProfile = decalc->calcFreq(templateSeqs, templateFileName); if (m->control_pressed) { return 0; } m->mothurOut("Done."); m->mothurOutEndLine(); }else { probabilityProfile = readFreq(); m->mothurOut("Done."); } m->mothurOutEndLine(); //make P into Q for (int i = 0; i < probabilityProfile.size(); i++) { probabilityProfile[i] = 1 - probabilityProfile[i]; } // bool reRead = false; //create filter if needed for later if (filter) { //read in all query seqs vector tempQuerySeqs = readSeqs(fastafile); vector temp; //merge query seqs and template seqs temp = templateSeqs; for (int i = 0; i < tempQuerySeqs.size(); i++) { temp.push_back(tempQuerySeqs[i]); } if (seqMask != "") { reRead = true; //mask templates for (int i = 0; i < temp.size(); i++) { if (m->control_pressed) { for (int i = 0; i < tempQuerySeqs.size(); i++) { delete tempQuerySeqs[i]; } return 0; } decalc->runMask(temp[i]); } } mergedFilterString = createFilter(temp, 0.5); if (m->control_pressed) { for (int i = 0; i < tempQuerySeqs.size(); i++) { delete tempQuerySeqs[i]; } return 0; } //reread template seqs for (int i = 0; i < tempQuerySeqs.size(); i++) { delete tempQuerySeqs[i]; } } //quantiles are used to determine whether the de values found indicate a chimera //if you have to calculate them, its time intensive because you are finding the de and deviation values for each //combination of sequences in the template if (quanfile != "") { quantiles = readQuantiles(); }else { if ((!filter) && (seqMask != "")) { //if you didn't filter but you want to mask. if you filtered then you did mask first above. reRead = true; //mask templates for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } decalc->runMask(templateSeqs[i]); } } if (filter) { reRead = true; for (int i = 0; i < templateSeqs.size(); i++) { if (m->control_pressed) { return 0; } runFilter(templateSeqs[i]); } } m->mothurOut("Calculating quantiles for your template. This can take a while... 
I will output the quantiles to a .quan file that you can input them using the quantiles parameter next time you run this command. Providing the .quan file will dramatically improve speed. "); cout.flush(); if (processors == 1) { quantilesMembers = decalc->getQuantiles(templateSeqs, windowSizesTemplate, window, probabilityProfile, increment, 0, templateSeqs.size()); }else { createProcessesQuan(); } if (m->control_pressed) { return 0; } string noOutliers, outliers; if ((!filter) && (seqMask == "")) { noOutliers = m->getRootName(m->getSimpleName(templateFileName)) + "pintail.quan"; }else if ((!filter) && (seqMask != "")) { noOutliers =m->getRootName(m->getSimpleName(templateFileName)) + "pintail.masked.quan"; }else if ((filter) && (seqMask != "")) { noOutliers = m->getRootName(m->getSimpleName(templateFileName)) + "pintail.filtered." + m->getSimpleName(m->getRootName(fastafile)) + "masked.quan"; }else if ((filter) && (seqMask == "")) { noOutliers = m->getRootName(m->getSimpleName(templateFileName)) + "pintail.filtered." 
+ m->getSimpleName(m->getRootName(fastafile)) + "quan"; } decalc->removeObviousOutliers(quantilesMembers, templateSeqs.size()); if (m->control_pressed) { return 0; } string outputString = "#" + m->getVersion() + "\n"; //adjust quantiles for (int i = 0; i < quantilesMembers.size(); i++) { vector temp; if (quantilesMembers[i].size() == 0) { //in case this is not a distance found in your template files for (int g = 0; g < 6; g++) { temp.push_back(0.0); } }else{ sort(quantilesMembers[i].begin(), quantilesMembers[i].end()); //save 10% temp.push_back(quantilesMembers[i][int(quantilesMembers[i].size() * 0.10)]); //save 25% temp.push_back(quantilesMembers[i][int(quantilesMembers[i].size() * 0.25)]); //save 50% temp.push_back(quantilesMembers[i][int(quantilesMembers[i].size() * 0.5)]); //save 75% temp.push_back(quantilesMembers[i][int(quantilesMembers[i].size() * 0.75)]); //save 95% temp.push_back(quantilesMembers[i][int(quantilesMembers[i].size() * 0.95)]); //save 99% temp.push_back(quantilesMembers[i][int(quantilesMembers[i].size() * 0.99)]); } //output quan value outputString += toString(i+1); for (int u = 0; u < temp.size(); u++) { outputString += "\t" + toString(temp[u]); } outputString += "\n"; quantiles[i] = temp; } printQuanFile(noOutliers, outputString); //free memory quantilesMembers.clear(); m->mothurOut("Done."); m->mothurOutEndLine(); } if (reRead) { for (int i = 0; i < templateSeqs.size(); i++) { delete templateSeqs[i]; } templateSeqs.clear(); templateSeqs = readSeqs(templateFileName); } //free memory for (int i = 0; i < templateLines.size(); i++) { delete templateLines[i]; } return 0; } catch(exception& e) { m->errorOut(e, "Pintail", "doPrep"); exit(1); } } //*************************************************************************************************************** Sequence Pintail::print(ostream& out, ostream& outAcc) { try { int index = ceil(deviation); //is your DE value higher than the 95% string chimera; if (index != 0) { //if index is 0 then its an 
exact match to a template seq if (quantiles[index][4] == 0.0) { chimera = "Your template does not include sequences that provide quantile values at distance " + toString(index); }else { if (DE > quantiles[index][4]) { chimera = "Yes"; } else { chimera = "No"; } } }else{ chimera = "No"; } out << querySeq->getName() << '\t' << "div: " << deviation << "\tstDev: " << DE << "\tchimera flag: " << chimera << endl; if (chimera == "Yes") { m->mothurOut(querySeq->getName() + "\tdiv: " + toString(deviation) + "\tstDev: " + toString(DE) + "\tchimera flag: " + chimera); m->mothurOutEndLine(); outAcc << querySeq->getName() << endl; } out << "Observed"; for (int j = 0; j < obsDistance.size(); j++) { out << '\t' << obsDistance[j]; } out << endl; out << "Expected"; for (int m = 0; m < expectedDistance.size(); m++) { out << '\t' << expectedDistance[m] ; } out << endl; return *querySeq; } catch(exception& e) { m->errorOut(e, "Pintail", "print"); exit(1); } } #ifdef USE_MPI //*************************************************************************************************************** Sequence Pintail::print(MPI_File& out, MPI_File& outAcc) { try { string outputString = ""; int index = ceil(deviation); //is your DE value higher than the 95% string chimera; if (index != 0) { //if index is 0 then its an exact match to a template seq if (quantiles[index][4] == 0.0) { chimera = "Your template does not include sequences that provide quantile values at distance " + toString(index); }else { if (DE > quantiles[index][4]) { chimera = "Yes"; } else { chimera = "No"; } } }else{ chimera = "No"; } outputString += querySeq->getName() + "\tdiv: " + toString(deviation) + "\tstDev: " + toString(DE) + "\tchimera flag: " + chimera + "\n"; if (chimera == "Yes") { cout << querySeq->getName() << "\tdiv: " << toString(deviation) << "\tstDev: " << toString(DE) << "\tchimera flag: " << chimera << endl; string outAccString = querySeq->getName() + "\n"; MPI_Status statusAcc; int length = outAccString.length(); 
char* buf = new char[length]; memcpy(buf, outAccString.c_str(), length); MPI_File_write_shared(outAcc, buf, length, MPI_CHAR, &statusAcc); delete[] buf; return *querySeq; } outputString += "Observed"; for (int j = 0; j < obsDistance.size(); j++) { outputString += "\t" + toString(obsDistance[j]); } outputString += "\n"; outputString += "Expected"; for (int m = 0; m < expectedDistance.size(); m++) { outputString += "\t" + toString(expectedDistance[m]); } outputString += "\n"; MPI_Status status; int length = outputString.length(); char* buf2 = new char[length]; memcpy(buf2, outputString.c_str(), length); MPI_File_write_shared(out, buf2, length, MPI_CHAR, &status); delete[] buf2; return *querySeq; } catch(exception& e) { m->errorOut(e, "Pintail", "print"); exit(1); } } #endif //*************************************************************************************************************** int Pintail::getChimeras(Sequence* query) { try { querySeq = query; trimmed.clear(); windowSizes = window; //find pairs has to be done before a mask bestfit = findPairs(query); if (m->control_pressed) { return 0; } //if they mask if (seqMask != "") { decalc->runMask(query); decalc->runMask(bestfit); } if (filter) { //must be done after a mask runFilter(query); runFilter(bestfit); } //trim seq decalc->trimSeqs(query, bestfit, trimmed); //find windows it = trimmed.begin(); windowsForeachQuery = decalc->findWindows(query, it->first, it->second, windowSizes, increment); //find observed distance obsDistance = decalc->calcObserved(query, bestfit, windowsForeachQuery, windowSizes); if (m->control_pressed) { return 0; } Qav = decalc->findQav(windowsForeachQuery, windowSizes, probabilityProfile); if (m->control_pressed) { return 0; } //find alpha seqCoef = decalc->getCoef(obsDistance, Qav); //calculating expected distance expectedDistance = decalc->calcExpected(Qav, seqCoef); if (m->control_pressed) { return 0; } //finding de DE = decalc->calcDE(obsDistance, expectedDistance); if 
(m->control_pressed) { return 0; } //find distance between query and closest match it = trimmed.begin(); deviation = decalc->calcDist(query, bestfit, it->first, it->second); delete bestfit; return 0; } catch(exception& e) { m->errorOut(e, "Pintail", "getChimeras"); exit(1); } } //*************************************************************************************************************** vector<float> Pintail::readFreq() { try { //read in probabilities and store in vector int pos; float num; vector<float> prob; set<int> h = decalc->getPos(); //positions of bases in masking sequence #ifdef USE_MPI MPI_File inMPI; MPI_Offset size; MPI_Status status; //char* inFileName = new char[consfile.length()]; //memcpy(inFileName, consfile.c_str(), consfile.length()); char inFileName[1024]; strcpy(inFileName, consfile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); MPI_File_get_size(inMPI, &size); //delete inFileName; char* buffer = new char[size]; MPI_File_read(inMPI, buffer, size, MPI_CHAR, &status); string tempBuf(buffer, size); delete[] buffer; if (tempBuf.length() > size) { tempBuf = tempBuf.substr(0, size); } istringstream iss (tempBuf,istringstream::in); //read version string line = m->getline(iss); m->gobble(iss); while(!iss.eof()) { iss >> pos >> num; if (h.count(pos) > 0) { float Pi; Pi = (num - 0.25) / 0.75; //cannot have probability less than 0. if (Pi < 0) { Pi = 0.0; } //do you want this spot prob.push_back(Pi); } m->gobble(iss); } MPI_File_close(&inMPI); #else ifstream in; m->openInputFile(consfile, in); //read version string line = m->getline(in); m->gobble(in); while(!in.eof()){ in >> pos >> num; if (h.count(pos) > 0) { float Pi; Pi = (num - 0.25) / 0.75; //cannot have probability less than 0. 
if (Pi < 0) { Pi = 0.0; } //do you want this spot prob.push_back(Pi); } m->gobble(in); } in.close(); #endif return prob; } catch(exception& e) { m->errorOut(e, "Pintail", "readFreq"); exit(1); } } //*************************************************************************************************************** //calculate the distances from each query sequence to all sequences in the template to find the closest sequence Sequence* Pintail::findPairs(Sequence* q) { try { Sequence* seqsMatches; seqsMatches = decalc->findClosest(q, templateSeqs); return seqsMatches; } catch(exception& e) { m->errorOut(e, "Pintail", "findPairs"); exit(1); } } //************************************************************************************************** void Pintail::createProcessesQuan() { try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; vector processIDS; bool recalc = false; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ quantilesMembers = decalc->getQuantiles(templateSeqs, windowSizesTemplate, window, probabilityProfile, increment, templateLines[process]->start, templateLines[process]->end); //write out data to file so parent can read it ofstream out; string s = m->mothurGetpid(process) + ".temp"; m->openOutputFile(s, out); //output observed distances for (int i = 0; i < quantilesMembers.size(); i++) { out << quantilesMembers[i].size(); for (int j = 0; j < quantilesMembers[i].size(); j++) { out << '\t' << quantilesMembers[i][j]; } out << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int 
i=0;imothurRemove((toString(processIDS[i]) + ".temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide for (int i = 0; i < templateLines.size(); i++) { delete templateLines[i]; } templateLines.clear(); for (int i = 0; i < processors; i++) { templateLines.push_back(new linePair()); templateLines[i]->start = int (sqrt(float(i)/float(processors)) * templateSeqs.size()); templateLines[i]->end = int (sqrt(float(i+1)/float(processors)) * templateSeqs.size()); } processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ quantilesMembers = decalc->getQuantiles(templateSeqs, windowSizesTemplate, window, probabilityProfile, increment, templateLines[process]->start, templateLines[process]->end); //write out data to file so parent can read it ofstream out; string s = m->mothurGetpid(process) + ".temp"; m->openOutputFile(s, out); //output observed distances for (int i = 0; i < quantilesMembers.size(); i++) { out << quantilesMembers[i].size(); for (int j = 0; j < quantilesMembers[i].size(); j++) { out << '\t' << quantilesMembers[i][j]; } out << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent does its part quantilesMembers = decalc->getQuantiles(templateSeqs, windowSizesTemplate, window, probabilityProfile, increment, templateLines[0]->start, templateLines[0]->end); //force 
parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } //get data created by processes for (int i=0;i<(processors-1);i++) { ifstream in; string s = toString(processIDS[i]) + ".temp"; m->openInputFile(s, in); vector< vector > quan; quan.resize(100); //get quantiles for (int h = 0; h < quan.size(); h++) { int num; in >> num; m->gobble(in); vector q; float w; for (int j = 0; j < num; j++) { in >> w; q.push_back(w); } quan[h] = q; m->gobble(in); } //save quan in quantiles for (int j = 0; j < quan.size(); j++) { //put all values of q[i] into quan[i] for (int l = 0; l < quan[j].size(); l++) { quantilesMembers[j].push_back(quan[j][l]); } //quantilesMembers[j].insert(quantilesMembers[j].begin(), quan[j].begin(), quan[j].end()); } in.close(); m->mothurRemove(s); } #else quantilesMembers = decalc->getQuantiles(templateSeqs, windowSizesTemplate, window, probabilityProfile, increment, 0, templateSeqs.size()); #endif } catch(exception& e) { m->errorOut(e, "Pintail", "createProcessesQuan"); exit(1); } } //*************************************************************************************************************** vector< vector > Pintail::readQuantiles() { try { int num; float ten, twentyfive, fifty, seventyfive, ninetyfive, ninetynine; vector< vector > quan; vector temp; temp.resize(6, 0); //to fill 0 quan.push_back(temp); #ifdef USE_MPI MPI_File inMPI; MPI_Offset size; MPI_Status status; //char* inFileName = new char[quanfile.length()]; //memcpy(inFileName, quanfile.c_str(), quanfile.length()); char inFileName[1024]; strcpy(inFileName, quanfile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); MPI_File_get_size(inMPI, &size); //delete inFileName; char* buffer = new char[size]; MPI_File_read(inMPI, buffer, size, MPI_CHAR, &status); string tempBuf = buffer; if (tempBuf.length() > size) { tempBuf = tempBuf.substr(0, size); } istringstream iss 
(tempBuf,istringstream::in); delete[] buffer; //read version string line = m->getline(iss); m->gobble(iss); while(!iss.eof()) { iss >> num >> ten >> twentyfive >> fifty >> seventyfive >> ninetyfive >> ninetynine; temp.clear(); temp.push_back(ten); temp.push_back(twentyfive); temp.push_back(fifty); temp.push_back(seventyfive); temp.push_back(ninetyfive); temp.push_back(ninetynine); quan.push_back(temp); m->gobble(iss); } MPI_File_close(&inMPI); #else ifstream in; m->openInputFile(quanfile, in); //read version string line = m->getline(in); m->gobble(in); while(!in.eof()){ in >> num >> ten >> twentyfive >> fifty >> seventyfive >> ninetyfive >> ninetynine; temp.clear(); temp.push_back(ten); temp.push_back(twentyfive); temp.push_back(fifty); temp.push_back(seventyfive); temp.push_back(ninetyfive); temp.push_back(ninetynine); quan.push_back(temp); m->gobble(in); } in.close(); #endif return quan; } catch(exception& e) { m->errorOut(e, "Pintail", "readQuantiles"); exit(1); } } //***************************************************************************************************************/ void Pintail::printQuanFile(string file, string outputString) { try { #ifdef USE_MPI MPI_File outQuan; MPI_Status status; int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; //char* FileName = new char[file.length()]; //memcpy(FileName, file.c_str(), file.length()); char FileName[1024]; strcpy(FileName, file.c_str()); if (pid == 0) { MPI_File_open(MPI_COMM_SELF, FileName, outMode, MPI_INFO_NULL, &outQuan); //comm, filename, mode, info, filepointer int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write(outQuan, buf, length, MPI_CHAR, &status); delete[] buf; MPI_File_close(&outQuan); } //delete FileName; #else ofstream outQuan; m->openOutputFile(file, outQuan); outQuan << outputString; outQuan.close(); #endif } catch(exception& e) { m->errorOut(e, "Pintail", 
"printQuanFile"); exit(1); } } //***************************************************************************************************************/ mothur-1.36.1/source/chimera/pintail.h000066400000000000000000000061351255543666200176640ustar00rootroot00000000000000#ifndef PINTAIL_H #define PINTAIL_H /* * pintail.h * Mothur * * Created by Sarah Westcott on 7/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "chimera.h" #include "dist.h" #include "decalc.h" /***********************************************************/ //This class was created using the algorithms described in the // "At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies" paper //by Kevin E. Ashelford 1, Nadia A. Chuzhanova 3, John C. Fry 1, Antonia J. Jones 2 and Andrew J. Weightman 1. /***********************************************************/ class Pintail : public Chimera { public: Pintail(string, string, bool, int, string, string, string, int, int, string); //fastafile, templatefile, filter, processors, mask, conservation, quantile, window, increment, outputDir) ~Pintail(); int getChimeras(Sequence*); Sequence print(ostream&, ostream&); void setCons(string c) { consfile = c; } void setQuantiles(string q) { quanfile = q; } #ifdef USE_MPI Sequence print(MPI_File&, MPI_File&); #endif private: Dist* distcalculator; DeCalculator* decalc; int iters, window, increment, processors; string fastafile, quanfile, consfile; vector templateLines; Sequence* querySeq; Sequence* bestfit; //closest match to query in template vector obsDistance; //obsDistance is the vector of observed distances for query vector expectedDistance; //expectedDistance is the vector of expected distances for query float deviation; //deviation is the percentage of mismatched pairs over the whole seq between query and its best match. 
vector<int> windowsForeachQuery; // windowsForeachQuery is a vector containing the starting spot in query aligned sequence for each window. //this is needed so you can move by bases and not just spots in the alignment int windowSizes; //windowSizes = window size of query vector<int> windowSizesTemplate; //windowSizesTemplate[0] = window size of templateSeqs[0] map<int, int> trimmed; //trimmed = start and stop of trimmed sequences for query map<int, int>::iterator it; vector<float> Qav; //Qav is the vector of average variability for query float seqCoef; //seqCoef is the coeff for query float DE; //DE is the deviation for query vector<float> probabilityProfile; vector< vector<float> > quantiles; //quantiles[0] is the vector of deviations with ceiling score of 1, quantiles[1] is the vector of deviations with ceiling score of 2... vector< vector<float> > quantilesMembers; //quantilesMembers[0] is the vector of deviations with ceiling score of 1, quantilesMembers[1] is the vector of deviations with ceiling score of 2... set<int> h; string mergedFilterString; vector< vector<float> > readQuantiles(); vector<float> readFreq(); Sequence* findPairs(Sequence*); void createProcessesQuan(); int doPrep(); void printQuanFile(string, string); }; /***********************************************************/ #endif mothur-1.36.1/source/chimera/slayer.cpp000066400000000000000000000442601255543666200200570ustar00rootroot00000000000000/* * slayer.cpp * Mothur * * Created by westcott on 9/25/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "slayer.h" /***********************************************************************/ Slayer::Slayer(int win, int increment, int parentThreshold, float div, int i, int snp, int mi) : minBS(mi), windowSize(win), windowStep(increment), parentFragmentThreshold(parentThreshold), divRThreshold(div), iters(i), percentSNPSample(snp){ m = MothurOut::getInstance(); } /***********************************************************************/ string Slayer::getResults(Sequence query, vector refSeqs) { try { vector all; all.clear(); myQuery = query; for (int i = 0; i < refSeqs.size(); i++) { for (int j = i+1; j < refSeqs.size(); j++) { if (m->control_pressed) { return "no"; } //make copies of query and each parent because runBellerophon removes gaps and messes them up Sequence q(query.getName(), query.getAligned()); Sequence leftParent(refSeqs[i].getName(), refSeqs[i].getAligned()); Sequence rightParent(refSeqs[j].getName(), refSeqs[j].getAligned()); //cout << q->getName() << endl << q->getAligned() << endl << endl; //cout << leftParent.getName() << '\t' << leftParent.getAligned().length() << endl << endl; //cout << rightParent.getName() << '\t' << rightParent.getAligned().length() << endl << endl; //cout << q.getName() << '\t' << q.getAligned().length() << endl << endl; //cout << rightParent->getName() << endl << rightParent->getAligned() << endl << endl; //cout << " length = " << rightParent->getAligned().length() << endl; map spots; //map from spot in original sequence to spot in filtered sequence for query and both parents vector divs = runBellerophon(q, leftParent, rightParent, spots); if (m->control_pressed) { return "no"; } // cout << "examining:\t" << refSeqs[i]->getName() << '\t' << refSeqs[j]->getName() << endl; vector selectedDivs; for (int k = 0; k < divs.size(); k++) { vector snpsLeft = getSNPS(divs[k].parentA.getAligned(), divs[k].querySeq.getAligned(), divs[k].parentB.getAligned(), divs[k].winLStart, divs[k].winLEnd); vector snpsRight = 
getSNPS(divs[k].parentA.getAligned(), divs[k].querySeq.getAligned(), divs[k].parentB.getAligned(), divs[k].winRStart, divs[k].winREnd); if (m->control_pressed) { return "no"; } int numSNPSLeft = snpsLeft.size(); int numSNPSRight = snpsRight.size(); // cout << numSNPSLeft << '\t' << numSNPSRight << endl; //require at least 4 SNPs on each side of the break if ((numSNPSLeft >= 4) && (numSNPSRight >= 4)) { float BS_A, BS_B; bootstrapSNPS(snpsLeft, snpsRight, BS_A, BS_B, iters); if (m->control_pressed) { return "no"; } divs[k].bsa = BS_A; divs[k].bsb = BS_B; divs[k].bsMax = max(BS_A, BS_B); divs[k].chimeraMax = max(divs[k].qla_qrb, divs[k].qlb_qra); //are we within 10 points of the bootstrap cutoff? // if ((divs[k].bsMax >= (minBS-10)) && (iters < 1000)) { // bootstrapSNPS(snpsLeft, snpsRight, BS_A, BS_B, 1000); // // if (m->control_pressed) { delete q; delete leftParent; delete rightParent; return "no"; } // // divs[k].bsa = BS_A; // divs[k].bsb = BS_B; // divs[k].bsMax = max(BS_A, BS_B); // divs[k].chimeraMax = max(divs[k].qla_qrb, divs[k].qlb_qra); // } //so results reflect orignal alignment divs[k].winLStart = spots[divs[k].winLStart]; divs[k].winLEnd = spots[divs[k].winLEnd]; divs[k].winRStart = spots[divs[k].winRStart]; divs[k].winREnd = spots[divs[k].winREnd]; selectedDivs.push_back(divs[k]); } } //save selected for (int mi = 0; mi < selectedDivs.size(); mi++) { all.push_back(selectedDivs[mi]); } } } // compute bootstrap support if (all.size() > 0) { //sort them sort(all.begin(), all.end(), compareDataStruct); reverse(all.begin(), all.end()); outputResults = all; return "yes"; }else { outputResults = all; return "no"; } } catch(exception& e) { m->errorOut(e, "Slayer", "getResults"); exit(1); } } /***********************************************************************/ vector Slayer::runBellerophon(Sequence q, Sequence pA, Sequence pB, map& spots) { try{ vector data; //vertical filter //cout << q.getName() << endl << q.getAligned() << endl << endl; //cout << 
pA.getName() << endl << pA.getUnaligned() << endl << endl; //cout << pB.getName() << endl << pB.getUnaligned() << endl << endl; //maps spot in new alignment to spot in alignment before filter spots = verticalFilter(q, pA, pB); //fills baseSpots //get these to avoid numerous function calls string query = q.getAligned(); string parentA = pA.getAligned(); string parentB = pB.getAligned(); int length = query.length(); //cout << q.getName() << endl << q.getAligned() << endl << endl; //cout << pA.getName() << endl << pA.getUnaligned() << endl << endl; //cout << pB.getName() << endl << pB.getUnaligned() << endl << endl; //cout << " length = " << length << endl; //check window size if (length < (2*windowSize+windowStep)) { // m->mothurOut("Your window size is too large for " + q->getName() + ". I will make the window size " + toString(length/4) + " which is 1/4 the filtered length."); m->mothurOutEndLine(); windowSize = length / 4; } for (int i = windowSize-1; i <= (length - windowSize); i += windowStep) { if (m->control_pressed) { return data; } int breakpoint = i; int leftLength = breakpoint + 1; int rightLength = length - leftLength; float QLA = computePercentID(query, parentA, 0, breakpoint); float QRB = computePercentID(query, parentB, breakpoint+1, length-1); float QLB = computePercentID(query, parentB, 0, breakpoint); float QRA = computePercentID(query, parentA, breakpoint+1, length-1); float LAB = computePercentID(parentA, parentB, 0, breakpoint); float RAB = computePercentID(parentA, parentB, breakpoint+1, length-1); float AB = ((LAB*leftLength) + (RAB*rightLength)) / (float) length; float QA = ((QLA*leftLength) + (QRA*rightLength)) / (float) length; float QB = ((QLB*leftLength) + (QRB*rightLength)) / (float) length; float QLA_QRB = ((QLA*leftLength) + (QRB*rightLength)) / (float) length; float QLB_QRA = ((QLB*leftLength) + (QRA*rightLength)) / (float) length; //in original and not used //float avgQA_QB = ((QA*leftLength) + (QB*rightLength)) / (float) length; 
float divR_QLA_QRB = min((QLA_QRB/QA), (QLA_QRB/QB)); float divR_QLB_QRA = min((QLB_QRA/QA), (QLB_QRA/QB)); //cout << q->getName() << '\t'; //cout << pA->getName() << '\t'; //cout << pB->getName() << '\t'; //cout << "bp: " << breakpoint << " CHIM_TYPE_A\t" << divR_QLA_QRB << "\tQLA: " << QLA << "\tQRB: " << QRB << "\tQLA_QRB: " << QLA_QRB; //cout << "\tCHIM_TYPE_B\t" << divR_QLB_QRA << "\tQLB: " << QLB << "\tQRA: " << QRA << "\tQLB_QRA: " << QLB_QRA << endl; //cout << leftLength << '\t' << rightLength << '\t' << QLA << '\t' << QRB << '\t' << QLB << '\t' << QRA << '\t' << LAB << '\t' << RAB << '\t' << AB << '\t' << QA << '\t' << QB << '\t' << QLA_QRB << '\t' << QLB_QRA << endl; //cout << divRThreshold << endl; //cout << breakpoint << '\t' << divR_QLA_QRB << '\t' << divR_QLB_QRA << endl; //is one of them above the if (divR_QLA_QRB >= divRThreshold || divR_QLB_QRA >= divRThreshold) { if (((QLA_QRB > QA) && (QLA_QRB > QB) && (QLA >= parentFragmentThreshold) && (QRB >= parentFragmentThreshold)) || ((QLB_QRA > QA) && (QLB_QRA > QB) && (QLB >=parentFragmentThreshold) && (QRA >= parentFragmentThreshold))) { data_struct member; member.divr_qla_qrb = divR_QLA_QRB; member.divr_qlb_qra = divR_QLB_QRA; member.qla_qrb = QLA_QRB; member.qlb_qra = QLB_QRA; member.qla = QLA; member.qrb = QRB; member.ab = AB; member.qa = QA; member.qb = QB; member.lab = LAB; member.rab = RAB; member.qra = QRA; member.qlb = QLB; member.winLStart = 0; member.winLEnd = breakpoint; member.winRStart = breakpoint+1; member.winREnd = length-1; member.querySeq = q; member.parentA = pA; member.parentB = pB; member.bsa = 0; member.bsb = 0; member.bsMax = 0; member.chimeraMax = 0; data.push_back(member); }//if }//if }//for return data; } catch(exception& e) { m->errorOut(e, "Slayer", "runBellerophon"); exit(1); } } /***********************************************************************/ vector Slayer::getSNPS(string parentA, string query, string parentB, int left, int right) { try { vector data; for (int i = 
left; i <= right; i++) {

			char A = parentA[i];
			char Q = query[i];
			char B = parentB[i];

			if ((A != Q) || (B != Q)) {

				//ensure not neighboring a gap. change to 12/09 release of chimeraSlayer - not sure what this adds, but it eliminates a lot of SNPS
				if (
					//did query lose a base here during filter??
					( i == 0 || abs (baseSpots[0][i] - baseSpots[0][i-1]) == 1) &&
					( i == query.length()-1 || abs (baseSpots[0][i] - baseSpots[0][i+1]) == 1) &&
					//did parentA lose a base here during filter??
					( i == 0 || abs (baseSpots[1][i] - baseSpots[1][i-1]) == 1) &&
					( i == parentA.length()-1 || abs (baseSpots[1][i] - baseSpots[1][i+1]) == 1) &&
					//did parentB lose a base here during filter??
					( i == 0 || abs (baseSpots[2][i] - baseSpots[2][i-1]) == 1) &&
					( i == parentB.length()-1 || abs (baseSpots[2][i] - baseSpots[2][i+1]) == 1)
					) {
					snps member;
					member.queryChar = Q;
					member.parentAChar = A;
					member.parentBChar = B;
					data.push_back(member);
				}
			}
		}

		return data;
	}
	catch(exception& e) {
		m->errorOut(e, "Slayer", "getSNPS");
		exit(1);
	}
}
/***********************************************************************/
int Slayer::bootstrapSNPS(vector<snps> left, vector<snps> right, float& BSA, float& BSB, int numIters) {
	try {

		srand((unsigned)time( NULL ));

		int count_A = 0; // scenario QLA,QRB supported
		int count_B = 0; // scenario QLB,QRA supported

		int numLeft = max(1, int(left.size() * percentSNPSample/(float)100 + 0.5));
		int numRight = max(1, int(right.size() * percentSNPSample/(float)100 + 0.5));

		for (int i = 0; i < numIters; i++) {
			//random sampling with replacement.
if (m->control_pressed) { return 0; } vector selectedLeft; for (int j = 0; j < numLeft; j++) { int index = int(rand() % left.size()); selectedLeft.push_back(left[index]); } vector selectedRight; for (int j = 0; j < numRight; j++) { int index = int(rand() % right.size()); selectedRight.push_back(right[index]); } /* A ------------------------------------------ # QLA QRA # Q ------------------------------------------ # | # | # Q ------------------------------------------ # QLB QRB # B ------------------------------------------ */ float QLA = snpQA(selectedLeft); float QRA = snpQA(selectedRight); float QLB = snpQB(selectedLeft); float QRB = snpQB(selectedRight); //in original - not used - not sure why? //float ALB = snpAB(selectedLeft); //float ARB = snpAB(selectedRight); if ((QLA > QLB) && (QRB > QRA)) { count_A++; } if ((QLB > QLA) && (QRA > QRB)) { count_B++; } //cout << "selected left snp: \n"; //for (int j = 0; j < selectedLeft.size(); j++) { cout << selectedLeft[j].parentAChar; } //cout << endl; //for (int j = 0; j < selectedLeft.size(); j++) { cout << selectedLeft[j].queryChar; } //cout << endl; //for (int j = 0; j < selectedLeft.size(); j++) { cout << selectedLeft[j].parentBChar; } //cout << endl; //cout << "selected right snp: \n"; //for (int j = 0; j < selectedRight.size(); j++) { cout << selectedRight[j].parentAChar; } //cout << endl; //for (int i = 0; i < selectedRight.size(); i++) { cout << selectedRight[i].queryChar; } //cout << endl; //for (int i = 0; i < selectedRight.size(); i++) { cout << selectedRight[i].parentBChar; } //cout << endl; } //cout << count_A << '\t' << count_B << endl; BSA = (float) count_A / (float) numIters * 100; BSB = (float) count_B / (float) numIters * 100; //cout << "bsa = " << BSA << " bsb = " << BSB << endl; return 0; } catch(exception& e) { m->errorOut(e, "Slayer", "bootstrapSNPS"); exit(1); } } /***********************************************************************/ float Slayer::snpQA(vector data) { try { int numIdentical = 
0; for (int i = 0; i < data.size(); i++) { if (data[i].parentAChar == data[i].queryChar) { numIdentical++; } } float percentID = (numIdentical / (float) data.size()) * 100; return percentID; } catch(exception& e) { m->errorOut(e, "Slayer", "snpQA"); exit(1); } } /***********************************************************************/ float Slayer::snpQB(vector data) { try { int numIdentical = 0; for (int i = 0; i < data.size(); i++) { if (data[i].parentBChar == data[i].queryChar) { numIdentical++; } } float percentID = (numIdentical / (float) data.size()) * 100; return percentID; } catch(exception& e) { m->errorOut(e, "Slayer", "snpQB"); exit(1); } } /***********************************************************************/ float Slayer::snpAB(vector data) { try { int numIdentical = 0; for (int i = 0; i < data.size(); i++) { if (data[i].parentAChar == data[i].parentBChar) { numIdentical++; } } float percentID = (numIdentical / (float) data.size()) * 100; return percentID; } catch(exception& e) { m->errorOut(e, "Slayer", "snpAB"); exit(1); } } /***********************************************************************/ float Slayer::computePercentID(string queryAlign, string chimera, int left, int right) { try { int numIdentical = 0; int countA = 0; int countB = 0; for (int i = left; i <= right; i++) { if (((queryAlign[i] != 'G') && (queryAlign[i] != 'T') && (queryAlign[i] != 'A') && (queryAlign[i] != 'C')&& (queryAlign[i] != '.') && (queryAlign[i] != '-')) || ((chimera[i] != 'G') && (chimera[i] != 'T') && (chimera[i] != 'A') && (chimera[i] != 'C')&& (chimera[i] != '.') && (chimera[i] != '-'))) {} else { bool charA = false; bool charB = false; if ((queryAlign[i] == 'G') || (queryAlign[i] == 'T') || (queryAlign[i] == 'A') || (queryAlign[i] == 'C')) { charA = true; } if ((chimera[i] == 'G') || (chimera[i] == 'T') || (chimera[i] == 'A') || (chimera[i] == 'C')) { charB = true; } if (charA || charB) { if (charA) { countA++; } if (charB) { countB++; } if (queryAlign[i] == 
chimera[i]) { numIdentical++; } } } } float numBases = (countA + countB) /(float) 2; if (numBases == 0) { return 0; } float percentIdentical = (numIdentical/(float)numBases) * 100; return percentIdentical; } catch(exception& e) { m->errorOut(e, "Slayer", "computePercentID"); exit(1); } } /***********************************************************************/ //remove columns that contain any gaps map Slayer::verticalFilter(Sequence& q, Sequence& pA, Sequence& pB) { try { //find baseSpots baseSpots.clear(); baseSpots.resize(3); //query, parentA, parentB vector gaps; gaps.resize(q.getAligned().length(), 0); string filterString = (string(q.getAligned().length(), '1')); string seqAligned = q.getAligned(); for (int j = 0; j < seqAligned.length(); j++) { //if this spot is a gap if ((seqAligned[j] == '-') || (seqAligned[j] == '.') || (toupper(seqAligned[j]) == 'N')) { gaps[j]++; } } seqAligned = pA.getAligned(); for (int j = 0; j < seqAligned.length(); j++) { //if this spot is a gap if ((seqAligned[j] == '-') || (seqAligned[j] == '.') || (toupper(seqAligned[j]) == 'N')) { gaps[j]++; } } seqAligned = pB.getAligned(); for (int j = 0; j < seqAligned.length(); j++) { //if this spot is a gap if ((seqAligned[j] == '-') || (seqAligned[j] == '.') || (toupper(seqAligned[j]) == 'N')) { gaps[j]++; } } //zero out spot where any sequences have blanks int numColRemoved = 0; int count = 0; map maskMap; maskMap.clear(); for(int i = 0; i < q.getAligned().length(); i++){ if(gaps[i] != 0) { filterString[i] = '0'; numColRemoved++; } else { maskMap[count] = i; count++; } } seqAligned = q.getAligned(); string newAligned = ""; int baseCount = 0; count = 0; for (int j = 0; j < seqAligned.length(); j++) { //are you a base if ((seqAligned[j] != '-') && (seqAligned[j] != '.') && (toupper(seqAligned[j]) != 'N')) { baseCount++; } //if this spot is not a gap if (filterString[j] == '1') { newAligned += seqAligned[j]; baseSpots[0][count] = baseCount; count++; } } q.setAligned(newAligned); seqAligned = 
pA.getAligned(); newAligned = ""; baseCount = 0; count = 0; for (int j = 0; j < seqAligned.length(); j++) { //are you a base if ((seqAligned[j] != '-') && (seqAligned[j] != '.') && (toupper(seqAligned[j]) != 'N')) { baseCount++; } //if this spot is not a gap if (filterString[j] == '1') { newAligned += seqAligned[j]; baseSpots[1][count] = baseCount; count++; } } pA.setAligned(newAligned); seqAligned = pB.getAligned(); newAligned = ""; baseCount = 0; count = 0; for (int j = 0; j < seqAligned.length(); j++) { //are you a base if ((seqAligned[j] != '-') && (seqAligned[j] != '.') && (toupper(seqAligned[j]) != 'N')) { baseCount++; } //if this spot is not a gap if (filterString[j] == '1') { newAligned += seqAligned[j]; baseSpots[2][count] = baseCount; count++; } } pB.setAligned(newAligned); return maskMap; } catch(exception& e) { m->errorOut(e, "Slayer", "verticalFilter"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/chimera/slayer.h000066400000000000000000000030171255543666200175170ustar00rootroot00000000000000#ifndef SLAYER_H #define SLAYER_H /* * slayer.h * Mothur * * Created by westcott on 9/25/09. * Copyright 2009 Schloss Lab. All rights reserved. 
 *
 */

#include "sequence.hpp"
#include "chimera.h"

/***********************************************************************/
//This class was modeled after the chimeraSlayer written by the Broad Institute
/***********************************************************************/
struct snps {
	char queryChar;
	char parentAChar;
	char parentBChar;
};

/***********************************************************************/

class Slayer {

	public:

		Slayer(int, int, int, float, int, int, int);
		~Slayer() {};

		string getResults(Sequence, vector<Sequence>);
		vector<data_struct> getOutput()	{	return outputResults;	}

	private:

		int windowSize, windowStep, parentFragmentThreshold, iters, percentSNPSample, minBS;
		float divRThreshold;
		vector<data_struct> outputResults;
		vector< map<int, int> > baseSpots;
		Sequence myQuery;

		map<int, int> verticalFilter(Sequence&, Sequence&, Sequence&);
		float computePercentID(string, string, int, int);

		vector<data_struct> runBellerophon(Sequence, Sequence, Sequence, map<int, int>&);
		vector<snps> getSNPS(string, string, string, int, int);
		int bootstrapSNPS(vector<snps>, vector<snps>, float&, float&, int);
		float snpQA(vector<snps>);
		float snpQB(vector<snps>);
		float snpAB(vector<snps>);
		MothurOut* m;

};

/***********************************************************************/

#endif

mothur-1.36.1/source/classifier/000077500000000000000000000000001255543666200165625ustar00rootroot00000000000000mothur-1.36.1/source/classifier/alignnode.cpp000066400000000000000000000212531255543666200212310ustar00rootroot00000000000000
/*
 *  alignNode.cpp
 *  bayesian
 *
 *  Created by Pat Schloss on 10/11/11.
 *  Copyright 2011 Patrick D. Schloss. All rights reserved.
* */ #include "alignnode.h" #include "taxonomynode.h" #include "bayesian.h" /**************************************************************************************************/ AlignNode::AlignNode(string n, int l): TaxonomyNode(n, l){ alignLength = 0; } /**************************************************************************************************/ void AlignNode::printTheta(){ try { m->mothurOut("A:\t"); for(int i=0;imothurOut(toString(theta[i].A)+ '\t'); } m->mothurOutEndLine(); m->mothurOut("T:\t"); for(int i=0;imothurOut(toString(theta[i].T)+ '\t'); } m->mothurOutEndLine(); m->mothurOut("G:\t"); for(int i=0;imothurOut(toString(theta[i].G)+ '\t'); } m->mothurOutEndLine(); m->mothurOut("C:\t"); for(int i=0;imothurOut(toString(theta[i].C)+ '\t'); } m->mothurOutEndLine(); m->mothurOut("I:\t"); for(int i=0;imothurOut(toString(theta[i].gap)+ '\t'); } m->mothurOutEndLine(); } catch(exception& e) { m->errorOut(e, "AlignNode", "printTheta"); exit(1); } } /**************************************************************************************************/ int AlignNode::loadSequence(string& sequence){ try { alignLength = (int)sequence.length(); // this function runs through the alignment and increments the frequency // of each base for a particular taxon. 
we are building the thetas if(theta.size() == 0){ theta.resize(alignLength); columnCounts.resize(alignLength, 0); } for(int i=0;icontrol_pressed) { return 0; } char base = sequence[i]; if(base == 'A') { theta[i].A++; columnCounts[i]++; } // our thetas will be alignLength x 5 else if(base == 'T'){ theta[i].T++; columnCounts[i]++; } // and we ignore any position that has else if(base == 'G'){ theta[i].G++; columnCounts[i]++; } // an ambiguous base call else if(base == 'C'){ theta[i].C++; columnCounts[i]++; } else if(base == '-'){ theta[i].gap++; columnCounts[i]++; } else if(base == 'U'){ theta[i].T++; columnCounts[i]++; } } numSeqs++; return 0; } catch(exception& e) { m->errorOut(e, "AlignNode", "loadSequence"); exit(1); } } /**************************************************************************************************/ int AlignNode::checkTheta(){ try { for(int i=0;icontrol_pressed) { return 0; } if(theta[i].gap == columnCounts[i]){ columnCounts[i] = 0; } // else{ // int maxCount = theta[i].A; // // if(theta[i].T > maxCount) { maxCount = theta[i].T; } // if(theta[i].G > maxCount) { maxCount = theta[i].T; } // if(theta[i].C > maxCount) { maxCount = theta[i].T; } // if(theta[i].gap > maxCount) { maxCount = theta[i].T; } // // if(maxCount < columnCounts[i] * 0.25){// || maxCount == columnCounts[i]){ //remove any column where the maximum frequency is <50% // columnCounts[i] = 0; // } // } } return 0; } catch(exception& e) { m->errorOut(e, "AlignNode", "checkTheta"); exit(1); } } /**************************************************************************************************/ int AlignNode::addThetas(vector newTheta, int newNumSeqs){ try { if(alignLength == 0){ alignLength = (int)newTheta.size(); theta.resize(alignLength); columnCounts.resize(alignLength); } for(int i=0;icontrol_pressed) { return 0; } theta[i].A += newTheta[i].A; columnCounts[i] += newTheta[i].A; theta[i].T += newTheta[i].T; columnCounts[i] += newTheta[i].T; theta[i].G += newTheta[i].G; 
columnCounts[i] += newTheta[i].G; theta[i].C += newTheta[i].C; columnCounts[i] += newTheta[i].C; theta[i].gap += newTheta[i].gap; columnCounts[i] += newTheta[i].gap; } numSeqs += newNumSeqs; return 0; } catch(exception& e) { m->errorOut(e, "AlignNode", "addThetas"); exit(1); } } /**************************************************************************************************/ double AlignNode::getSimToConsensus(string& query){ try { double similarity = 0; int length = 0; for(int i=0;icontrol_pressed) { return similarity; } char base = query[i]; if(base != '.' && base != 'N' && columnCounts[i] != 0){ double fraction = 0; if(base == 'A'){ fraction = (int) theta[i].A / (double) columnCounts[i]; similarity += fraction; length++; } else if(base == 'T'){ fraction = (int) theta[i].T / (double) columnCounts[i]; similarity += fraction; length++; } else if(base == 'G'){ fraction = (int) theta[i].G / (double) columnCounts[i]; similarity += fraction; length++; } else if(base == 'C'){ fraction = (int) theta[i].C / (double) columnCounts[i]; similarity += fraction; length++; } else if(base == '-'){ fraction = (int) theta[i].gap / (double) columnCounts[i]; similarity += fraction; length++; } } } if(length != 0){ similarity /= double(length); } else { similarity = 0; } return similarity; } catch(exception& e) { m->errorOut(e, "AlignNode", "getSimToConsensus"); exit(1); } } /**************************************************************************************************/ double AlignNode::getPxGivenkj_D_j(string& query){ //P(x | k_j, D, j) try { double PxGivenkj_D_j = 0; int count = 0; double alpha = 1 / (double)totalSeqs; //flat prior for(int s=0;scontrol_pressed) { return PxGivenkj_D_j; } char base = query[s]; thetaAlign thetaS = theta[s]; if(base != '.' 
&& base != 'N' && columnCounts[s] != 0){ double Nkj_s = (double)columnCounts[s]; double nkj_si = 0; if(base == 'A') { nkj_si = (double)thetaS.A; } else if(base == 'T'){ nkj_si = (double)thetaS.T; } else if(base == 'G'){ nkj_si = (double)thetaS.G; } else if(base == 'C'){ nkj_si = (double)thetaS.C; } else if(base == '-'){ nkj_si = (double)thetaS.gap; } else if(base == 'U'){ nkj_si = (double)thetaS.T; } // double alpha = pow(0.2, double(Nkj_s)) + 0.0001; //need to make 1e-4 a variable in future; this is the non-flat prior // if(columnCounts[s] != nkj_si){ //deal only with segregating sites... double numerator = nkj_si + alpha; double denomenator = Nkj_s + 5.0 * alpha; PxGivenkj_D_j += log(numerator) - log(denomenator); count++; // } } if(base != '.' && columnCounts[s] == 0 && thetaS.gap == 0){ count = 0; break; } } if(count == 0){ PxGivenkj_D_j = -1e10; } return PxGivenkj_D_j; } catch(exception& e) { m->errorOut(e, "AlignNode", "getPxGivenkj_D_j"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/classifier/alignnode.h000066400000000000000000000021321255543666200206710ustar00rootroot00000000000000#ifndef ALIGNNODE #define ALIGNNODE /* * alignNode.h * bayesian * * Created by Pat Schloss on 10/11/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. 
 *
 */

#include "taxonomynode.h"

/**************************************************************************************************/

struct thetaAlign {
	thetaAlign() : A(0), T(0), G(0), C(0), gap(0){}
	unsigned int A;
	unsigned int T;
	unsigned int G;
	unsigned int C;
	unsigned int gap;
};

/**************************************************************************************************/

class AlignNode : public TaxonomyNode {

public:
	AlignNode(string, int);
	int loadSequence(string&);
	int checkTheta();
	void printTheta();
	double getPxGivenkj_D_j(string& query);	//P(x | k_j, D, j)
	double getSimToConsensus(string& query);
	vector<thetaAlign> getTheta()	{	return theta;	}
	int addThetas(vector<thetaAlign>, int);

private:
	vector<thetaAlign> theta;
	vector<unsigned int> columnCounts;
	int alignLength;
};

/**************************************************************************************************/

#endif

mothur-1.36.1/source/classifier/aligntree.cpp000066400000000000000000000327101255543666200212430ustar00rootroot00000000000000
//
//  alignTree.cpp
//  pdsBayesian
//
//  Created by Patrick Schloss on 4/3/12.
//  Copyright (c) 2012 University of Michigan. All rights reserved.
// #include "alignnode.h" #include "aligntree.h" /**************************************************************************************************/ AlignTree::AlignTree(string referenceFileName, string taxonomyFileName, int cutoff) : Classify(), confidenceThreshold(cutoff){ try { AlignNode* newNode = new AlignNode("Root", 0); tree.push_back(newNode); // the tree is stored as a vector of elements of type TaxonomyNode string refTaxonomy; readTaxonomy(taxonomyFileName); ifstream referenceFile; m->openInputFile(referenceFileName, referenceFile); bool error = false; map lengths; while(!referenceFile.eof()){ if (m->control_pressed) { break; } Sequence seq(referenceFile); m->gobble(referenceFile); if (seq.getName() != "") { map::iterator it = taxonomy.find(seq.getName()); if (it != taxonomy.end()) { refTaxonomy = it->second; // lookup the taxonomy string for the current reference sequence string aligned = seq.getAligned(); lengths[aligned.length()] = 1; if (lengths.size() > 1) { error = true; m->mothurOut("[ERROR]: reference sequences must be aligned to use the align method, quitting.\n"); break; } addTaxonomyToTree(seq.getName(), refTaxonomy, aligned); }else { m->mothurOut(seq.getName() + " is in your reference file, but not in your taxonomy file, please correct.\n"); error = true; } } } referenceFile.close(); length = (lengths.begin())->first; if (error) { m->control_pressed = true; } numTaxa = (int)tree.size(); numLevels = 0; for(int i=0;igetLevel(); if(level > numLevels){ numLevels = level; } } numLevels++; aggregateThetas(); int dbSize = tree[0]->getNumSeqs(); for(int i=0;icheckTheta(); tree[i]->setTotalSeqs(dbSize); } } catch(exception& e) { m->errorOut(e, "AlignTree", "AlignTree"); exit(1); } } /**************************************************************************************************/ AlignTree::~AlignTree(){ try { for(int i=0;ierrorOut(e, "AlignTree", "~AlignTree"); exit(1); } } 
/**************************************************************************************************/ int AlignTree::addTaxonomyToTree(string seqName, string& taxonomy, string& sequence){ try { AlignNode* newNode; string taxonName = ""; int treePosition = 0; // the root is element 0 int level = 1; for(int i=0;icontrol_pressed) { break; } if(taxonomy[i] == ';'){ // looking for semicolons... if (taxonName == "") { m->mothurOut(seqName + " has an error in the taxonomy. This may be due to a ;;"); m->mothurOutEndLine(); m->control_pressed = true; } int newIndex = tree[treePosition]->getChildIndex(taxonName); // look to see if your current node already // has a child with the new taxonName if(newIndex != -1) { treePosition = newIndex; } // if you've seen it before, jump to that else { // position in the tree int newChildIndex = (int)tree.size(); // otherwise, we'll have to create one... tree[treePosition]->makeChild(taxonName, newChildIndex); newNode = new AlignNode(taxonName, level); newNode->setParent(treePosition); tree.push_back(newNode); treePosition = newChildIndex; } // sequence data to that node to update that node's theta - seems slow... 
taxonName = ""; // clear out the taxon name that we will build as we look level++; } // for a semicolon else{ taxonName += taxonomy[i]; // keep adding letters until we reach a semicolon } } tree[treePosition]->loadSequence(sequence); // now that we've gotten to the correct node, add the return 0; } catch(exception& e) { m->errorOut(e, "AlignTree", "addTaxonomyToTree"); exit(1); } } /**************************************************************************************************/ int AlignTree::aggregateThetas(){ try { vector > levelMatrix(numLevels+1); for(int i=0;icontrol_pressed) { return 0; } levelMatrix[tree[i]->getLevel()].push_back(i); } for(int i=numLevels-1;i>0;i--){ if (m->control_pressed) { return 0; } for(int j=0;jgetParent()]->addThetas(holder->getTheta(), holder->getNumSeqs()); } } return 0; } catch(exception& e) { m->errorOut(e, "AlignTree", "aggregateThetas"); exit(1); } } /**************************************************************************************************/ double AlignTree::getOutlierLogProbability(string& sequence){ try { double count = 0; for(int i=0;ierrorOut(e, "AlignTree", "getOutlierLogProbability"); exit(1); } } /**************************************************************************************************/ int AlignTree::getMinRiskIndexAlign(string& sequence, vector& taxaIndices, vector& probabilities){ try { int numProbs = (int)probabilities.size(); vector G(numProbs, 0.2); //a random sequence will, on average, be 20% similar to any other sequence vector risk(numProbs, 0); for(int i=1;icontrol_pressed) { return 0; } G[i] = tree[taxaIndices[i]]->getSimToConsensus(sequence); } double minRisk = 1e6; int minRiskIndex = 0; for(int i=0;icontrol_pressed) { return 0; } for(int j=0;jerrorOut(e, "AlignTree", "getMinRiskIndexAlign"); exit(1); } } /**************************************************************************************************/ int AlignTree::sanityCheck(vector >& indices, vector& maxIndices){ try { int 
finalLevel = (int)indices.size()-1; for(int position=1;positioncontrol_pressed) { return 0; } int predictedParent = tree[indices[position][maxIndices[position]]]->getParent(); int actualParent = indices[position-1][maxIndices[position-1]]; if(predictedParent != actualParent){ finalLevel = position - 1; return finalLevel; } } return finalLevel; } catch(exception& e) { m->errorOut(e, "AlignTree", "sanityCheck"); exit(1); } } /**************************************************************************************************/ string AlignTree::getTaxonomy(Sequence* seq){ try { string seqName = seq->getName(); string querySequence = seq->getAligned(); string taxonProbabilityString = ""; if (querySequence.length() != length) { m->mothurOut("[ERROR]: " + seq->getName() + " has length " + toString(querySequence.length()) + ", reference sequences length is " + toString(length) + ". Are your sequences aligned? Sequences must be aligned to use the align search method.\n"); m->control_pressed = true; return ""; } double logPOutlier = getOutlierLogProbability(querySequence); vector > pXgivenKj_D_j(numLevels); vector > indices(numLevels); for(int i=0;icontrol_pressed) { return taxonProbabilityString; } pXgivenKj_D_j[i].push_back(logPOutlier); indices[i].push_back(-1); } for(int i=0;igetName() << '\t' << tree[i]->getLevel() << '\t' << tree[i]->getPxGivenkj_D_j(querySequence) << endl; if (m->control_pressed) { return taxonProbabilityString; } pXgivenKj_D_j[tree[i]->getLevel()].push_back(tree[i]->getPxGivenkj_D_j(querySequence)); indices[tree[i]->getLevel()].push_back(i); } vector sumLikelihood(numLevels, 0); vector bestPosterior(numLevels, 0); vector maxIndex(numLevels, 0); int maxPosteriorIndex; //cout << "before best level" << endl; //let's find the best level and taxa within that level for(int i=0;icontrol_pressed) { return taxonProbabilityString; } int numTaxaInLevel = (int)indices[i].size(); //cout << "numTaxaInLevel:\t" << numTaxaInLevel << endl; vector 
posteriors(numTaxaInLevel, 0); sumLikelihood[i] = getLogExpSum(pXgivenKj_D_j[i], maxPosteriorIndex); maxPosteriorIndex = 0; for(int j=0;j posteriors[maxPosteriorIndex]){ maxPosteriorIndex = j; } } maxIndex[i] = getMinRiskIndexAlign(querySequence, indices[i], posteriors); maxIndex[i] = maxPosteriorIndex; bestPosterior[i] = posteriors[maxIndex[i]]; } // vector pX_level(numLevels, 0); // // for(int i=0;igetNumSeqs(); // } // // int max_pLevel_X_index = -1; // double pX_level_sum = getLogExpSum(pX_level, max_pLevel_X_index); // double max_pLevel_X = exp(pX_level[max_pLevel_X_index] - pX_level_sum); // // vector pLevel_X(numLevels, 0); // for(int i=0;icontrol_pressed) { return taxonProbabilityString; } int confidenceScore = (int) (bestPosterior[i] * 100); if (confidenceScore >= confidenceThreshold) { if(indices[i][maxIndex[i]] != -1){ taxonProbabilityString += tree[indices[i][maxIndex[i]]]->getName() + '(' + toString(confidenceScore) + ");"; simpleTax += tree[indices[i][maxIndex[i]]]->getName() + ";"; // levelProbabilityOutput << tree[indices[i][maxIndex[i]]]->getName() << '(' << setprecision(6) << pLevel_X[i] << ");"; } else{ taxonProbabilityString + "unclassified" + '(' + toString(confidenceScore) + ");"; // levelProbabilityOutput << "unclassified" << '(' << setprecision(6) << pLevel_X[i] << ");"; simpleTax += "unclassified;"; } }else { break; } savedspot = i; } for(int i=savedspot+1;icontrol_pressed) { return taxonProbabilityString; } taxonProbabilityString + "unclassified(0);"; simpleTax += "unclassified;"; } return taxonProbabilityString; } catch(exception& e) { m->errorOut(e, "AlignTree", "getTaxonomy"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/classifier/aligntree.h000066400000000000000000000013151255543666200207050ustar00rootroot00000000000000// // alignTree.h // pdsBayesian // // Created by Patrick Schloss on 4/3/12. // Copyright (c) 2012 University of Michigan. 
// All rights reserved.
//

#ifndef pdsBayesian_alignTree_h
#define pdsBayesian_alignTree_h

#include "classify.h"

class AlignNode;

class AlignTree : public Classify {

public:
    AlignTree(string, string, int);
    ~AlignTree();

    string getTaxonomy(Sequence*);

private:
    int addTaxonomyToTree(string, string&, string&);
    double getOutlierLogProbability(string&);
    int getMinRiskIndexAlign(string&, vector<int>&, vector<double>&);
    int aggregateThetas();
    int sanityCheck(vector<vector<int> >&, vector<int>&);

    int numSeqs, confidenceThreshold, length;
    vector<AlignNode*> tree;
};

#endif
mothur-1.36.1/source/classifier/bayesian.cpp000066400000000000000000000573741255543666200211010ustar00rootroot00000000000000
/*
 *  bayesian.cpp
 *  Mothur
 *
 *  Created by westcott on 11/3/09.
 *  Copyright 2009 Schloss Lab. All rights reserved.
 *
 */

#include "bayesian.h"
#include "kmer.hpp"
#include "phylosummary.h"
#include "referencedb.h"
/**************************************************************************************************/
Bayesian::Bayesian(string tfile, string tempFile, string method, int ksize, int cutoff, int i, int tid, bool f, bool sh) :
Classify(), kmerSize(ksize), confidenceThreshold(cutoff), iters(i) {
    try {
        ReferenceDB* rdb = ReferenceDB::getInstance();

        threadID = tid;
        flip = f;
        shortcuts = sh;

        // "saved" means: reuse the reference / taxonomy already held in memory by ReferenceDB
        string baseName = tempFile;
        if (baseName == "saved") { baseName = rdb->getSavedReference(); }

        string baseTName = tfile;
        if (baseTName == "saved") { baseTName = rdb->getSavedTaxonomy(); }

        /************calculate the probability that each word will be in a specific taxonomy*************/
        string tfileroot = m->getFullPathName(baseTName.substr(0,baseTName.find_last_of(".")+1));
        string tempfileroot = m->getRootName(m->getSimpleName(baseName));
        string phyloTreeName = tfileroot + "tree.train";
        string phyloTreeSumName = tfileroot + "tree.sum";
        string probFileName = tfileroot + tempfileroot + char('0'+ kmerSize) + "mer.prob";
        string probFileName2 = tfileroot + tempfileroot + char('0'+ kmerSize) + "mer.numNonZero";

        ofstream out;
        ofstream out2;
ifstream phyloTreeTest(phyloTreeName.c_str()); ifstream probFileTest2(probFileName2.c_str()); ifstream probFileTest(probFileName.c_str()); ifstream probFileTest3(phyloTreeSumName.c_str()); int start = time(NULL); //if they are there make sure they were created after this release date bool FilesGood = false; if(probFileTest && probFileTest2 && phyloTreeTest && probFileTest3){ FilesGood = checkReleaseDate(probFileTest, probFileTest2, phyloTreeTest, probFileTest3); } //if you want to save, but you dont need to calculate then just read if (rdb->save && probFileTest && probFileTest2 && phyloTreeTest && probFileTest3 && FilesGood && (tempFile != "saved")) { ifstream saveIn; m->openInputFile(tempFile, saveIn); while (!saveIn.eof()) { Sequence temp(saveIn); m->gobble(saveIn); rdb->referenceSeqs.push_back(temp); } saveIn.close(); } if(probFileTest && probFileTest2 && phyloTreeTest && probFileTest3 && FilesGood){ if (tempFile == "saved") { m->mothurOutEndLine(); m->mothurOut("Using sequences from " + rdb->getSavedReference() + " that are saved in memory."); m->mothurOutEndLine(); } m->mothurOut("Reading template taxonomy... "); cout.flush(); phyloTree = new PhyloTree(phyloTreeTest, phyloTreeName); m->mothurOut("DONE."); m->mothurOutEndLine(); genusNodes = phyloTree->getGenusNodes(); genusTotals = phyloTree->getGenusTotals(); if (tfile == "saved") { m->mothurOutEndLine(); m->mothurOut("Using probabilties from " + rdb->getSavedTaxonomy() + " that are saved in memory... "); cout.flush();; wordGenusProb = rdb->wordGenusProb; WordPairDiffArr = rdb->WordPairDiffArr; }else { m->mothurOut("Reading template probabilities... 
"); cout.flush(); readProbFile(probFileTest, probFileTest2, probFileName, probFileName2); } //save probabilities if (rdb->save) { rdb->wordGenusProb = wordGenusProb; rdb->WordPairDiffArr = WordPairDiffArr; } }else{ //create search database and names vector generateDatabaseAndNames(tfile, tempFile, method, ksize, 0.0, 0.0, 0.0, 0.0); //prevents errors caused by creating shortcut files if you had an error in the sanity check. if (m->control_pressed) { m->mothurRemove(phyloTreeName); m->mothurRemove(probFileName); m->mothurRemove(probFileName2); } else{ genusNodes = phyloTree->getGenusNodes(); genusTotals = phyloTree->getGenusTotals(); m->mothurOut("Calculating template taxonomy tree... "); cout.flush(); phyloTree->printTreeNodes(phyloTreeName); m->mothurOut("DONE."); m->mothurOutEndLine(); m->mothurOut("Calculating template probabilities... "); cout.flush(); numKmers = database->getMaxKmer() + 1; //initialze probabilities wordGenusProb.resize(numKmers); WordPairDiffArr.resize(numKmers); for (int j = 0; j < wordGenusProb.size(); j++) { wordGenusProb[j].resize(genusNodes.size()); } ofstream out; ofstream out2; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif if (shortcuts) { m->openOutputFile(probFileName, out); //output mothur version out << "#" << m->getVersion() << endl; out << numKmers << endl; m->openOutputFile(probFileName2, out2); //output mothur version out2 << "#" << m->getVersion() << endl; } #ifdef USE_MPI } #endif //for each word for (int i = 0; i < numKmers; i++) { //m->mothurOut("[DEBUG]: kmer = " + toString(i) + "\n"); if (m->control_pressed) { break; } #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif if (shortcuts) { out << i << '\t'; } #ifdef USE_MPI } #endif vector seqsWithWordi = database->getSequencesWithKmer(i); //for each sequence with that word vector count; count.resize(genusNodes.size(), 0); for (int j = 0; j < seqsWithWordi.size(); j++) { 
int temp = phyloTree->getGenusIndex(names[seqsWithWordi[j]]); count[temp]++; //increment count of seq in this genus who have this word } //probabilityInTemplate = (# of seqs with that word in template + 0.50) / (total number of seqs in template + 1); float probabilityInTemplate = (seqsWithWordi.size() + 0.50) / (float) (names.size() + 1); diffPair tempProb(log(probabilityInTemplate), 0.0); WordPairDiffArr[i] = tempProb; int numNotZero = 0; for (int k = 0; k < genusNodes.size(); k++) { //probabilityInThisTaxonomy = (# of seqs with that word in this taxonomy + probabilityInTemplate) / (total number of seqs in this taxonomy + 1); wordGenusProb[i][k] = log((count[k] + probabilityInTemplate) / (float) (genusTotals[k] + 1)); if (count[k] != 0) { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif if (shortcuts) { out << k << '\t' << wordGenusProb[i][k] << '\t' ; } #ifdef USE_MPI } #endif numNotZero++; } } #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif if (shortcuts) { out << endl; out2 << probabilityInTemplate << '\t' << numNotZero << '\t' << log(probabilityInTemplate) << endl; } #ifdef USE_MPI } #endif } #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif if (shortcuts) { out.close(); out2.close(); } #ifdef USE_MPI } #endif //read in new phylotree with less info. 
- its faster ifstream phyloTreeTest(phyloTreeName.c_str()); delete phyloTree; phyloTree = new PhyloTree(phyloTreeTest, phyloTreeName); //save probabilities if (rdb->save) { rdb->wordGenusProb = wordGenusProb; rdb->WordPairDiffArr = WordPairDiffArr; } } } if (m->debug) { m->mothurOut("[DEBUG]: about to generateWordPairDiffArr\n"); } generateWordPairDiffArr(); if (m->debug) { m->mothurOut("[DEBUG]: done generateWordPairDiffArr\n"); } //save probabilities if (rdb->save) { rdb->wordGenusProb = wordGenusProb; rdb->WordPairDiffArr = WordPairDiffArr; } m->mothurOut("DONE."); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " seconds get probabilities. "); m->mothurOutEndLine(); } catch(exception& e) { m->errorOut(e, "Bayesian", "Bayesian"); exit(1); } } /**************************************************************************************************/ Bayesian::~Bayesian() { try { if (phyloTree != NULL) { delete phyloTree; } if (database != NULL) { delete database; } } catch(exception& e) { m->errorOut(e, "Bayesian", "~Bayesian"); exit(1); } } /**************************************************************************************************/ string Bayesian::getTaxonomy(Sequence* seq) { try { string tax = ""; Kmer kmer(kmerSize); flipped = false; //get words contained in query //getKmerString returns a string where the index in the string is hte kmer number //and the character at that index can be converted to be the number of times that kmer was seen string queryKmerString = kmer.getKmerString(seq->getUnaligned()); vector queryKmers; for (int i = 0; i < queryKmerString.length()-1; i++) { // the -1 is to ignore any kmer with an N in it if (queryKmerString[i] != '!') { //this kmer is in the query queryKmers.push_back(i); } } //if user wants to test reverse compliment and its reversed use that instead if (flip) { if (isReversed(queryKmers)) { flipped = true; seq->reverseComplement(); queryKmerString = kmer.getKmerString(seq->getUnaligned()); 
queryKmers.clear(); for (int i = 0; i < queryKmerString.length()-1; i++) { // the -1 is to ignore any kmer with an N in it if (queryKmerString[i] != '!') { //this kmer is in the query queryKmers.push_back(i); } } } } if (queryKmers.size() == 0) { m->mothurOut(seq->getName() + " is bad. It has no kmers of length " + toString(kmerSize) + "."); m->mothurOutEndLine(); simpleTax = "unknown;"; return "unknown;"; } int index = getMostProbableTaxonomy(queryKmers); if (m->control_pressed) { return tax; } //bootstrap - to set confidenceScore int numToSelect = queryKmers.size() / 8; if (m->debug) { m->mothurOut(seq->getName() + "\t"); } tax = bootstrapResults(queryKmers, index, numToSelect); if (m->debug) { m->mothurOut("\n"); } return tax; } catch(exception& e) { m->errorOut(e, "Bayesian", "getTaxonomy"); exit(1); } } /**************************************************************************************************/ string Bayesian::bootstrapResults(vector kmers, int tax, int numToSelect) { try { map confidenceScores; //initialize confidences to 0 int seqIndex = tax; TaxNode seq = phyloTree->get(tax); confidenceScores[tax] = 0; while (seq.level != 0) { //while you are not at the root seqIndex = seq.parent; confidenceScores[seqIndex] = 0; seq = phyloTree->get(seq.parent); } map::iterator itBoot; map::iterator itBoot2; map::iterator itConvert; for (int i = 0; i < iters; i++) { if (m->control_pressed) { return "control"; } vector temp; for (int j = 0; j < numToSelect; j++) { int index = int(rand() % kmers.size()); //add word to temp temp.push_back(kmers[index]); } //get taxonomy int newTax = getMostProbableTaxonomy(temp); //int newTax = 1; TaxNode taxonomyTemp = phyloTree->get(newTax); //add to confidence results while (taxonomyTemp.level != 0) { //while you are not at the root itBoot2 = confidenceScores.find(newTax); //is this a classification we already have a count on if (itBoot2 != confidenceScores.end()) { //this is a classification we need a confidence for 
(itBoot2->second)++; } newTax = taxonomyTemp.parent; taxonomyTemp = phyloTree->get(newTax); } } string confidenceTax = ""; simpleTax = ""; int seqTaxIndex = tax; TaxNode seqTax = phyloTree->get(tax); while (seqTax.level != 0) { //while you are not at the root itBoot2 = confidenceScores.find(seqTaxIndex); //is this a classification we already have a count on int confidence = 0; if (itBoot2 != confidenceScores.end()) { //already in confidence scores confidence = itBoot2->second; } if (m->debug) { m->mothurOut(seqTax.name + "(" + toString(((confidence/(float)iters) * 100)) + ");"); } if (((confidence/(float)iters) * 100) >= confidenceThreshold) { confidenceTax = seqTax.name + "(" + toString(((confidence/(float)iters) * 100)) + ");" + confidenceTax; simpleTax = seqTax.name + ";" + simpleTax; } seqTaxIndex = seqTax.parent; seqTax = phyloTree->get(seqTax.parent); } if (confidenceTax == "") { confidenceTax = "unknown;"; simpleTax = "unknown;"; } return confidenceTax; } catch(exception& e) { m->errorOut(e, "Bayesian", "bootstrapResults"); exit(1); } } /**************************************************************************************************/ int Bayesian::getMostProbableTaxonomy(vector queryKmer) { try { int indexofGenus = 0; double maxProbability = -1000000.0; //find taxonomy with highest probability that this sequence is from it // cout << genusNodes.size() << endl; for (int k = 0; k < genusNodes.size(); k++) { //for each taxonomy calc its probability double prob = 0.0000; for (int i = 0; i < queryKmer.size(); i++) { prob += wordGenusProb[queryKmer[i]][k]; } // cout << phyloTree->get(genusNodes[k]).name << '\t' << prob << endl; //is this the taxonomy with the greatest probability? 
            if (prob > maxProbability) {
                indexofGenus = genusNodes[k];
                maxProbability = prob;
            }
        }

        return indexofGenus;
    }
    catch(exception& e) {
        m->errorOut(e, "Bayesian", "getMostProbableTaxonomy");
        exit(1);
    }
}
//********************************************************************************************************************
//if it is more probable that the reverse complement kmers are in the template, then we assume the sequence is reversed.
bool Bayesian::isReversed(vector<int>& queryKmers){
    try{
        bool reversed = false;
        float prob = 0;
        float reverseProb = 0;

        for (int i = 0; i < queryKmers.size(); i++){
            int kmer = queryKmers[i];
            if (kmer >= 0){
                prob += WordPairDiffArr[kmer].prob;
                reverseProb += WordPairDiffArr[kmer].reverseProb;
            }
        }

        if (reverseProb > prob){ reversed = true; }

        return reversed;
    }
    catch(exception& e) {
        m->errorOut(e, "Bayesian", "isReversed");
        exit(1);
    }
}
//********************************************************************************************************************
int Bayesian::generateWordPairDiffArr(){
    try{
        Kmer kmer(kmerSize);
        //for each kmer, look up the probability of its reverse complement
        for (int i = 0; i < WordPairDiffArr.size(); i++) {
            int reversedWord = kmer.getReverseKmerNumber(i);
            WordPairDiffArr[i].reverseProb = WordPairDiffArr[reversedWord].prob;
        }

        return 0;
    }catch(exception& e) {
        m->errorOut(e, "Bayesian", "generateWordPairDiffArr");
        exit(1);
    }
}
/*************************************************************************************************
map<string, int> Bayesian::parseTaxMap(string newTax) {
    try{

        map<string, int> parsed;

        newTax = newTax.substr(0, newTax.length()-1);  //get rid of last ';'

        //parse taxonomy
        string individual;
        while (newTax.find_first_of(';') != -1) {
            individual = newTax.substr(0,newTax.find_first_of(';'));
            newTax = newTax.substr(newTax.find_first_of(';')+1, newTax.length());
            parsed[individual] = 1;
        }

        //get last one
        parsed[newTax] = 1;

        return parsed;

    }
    catch(exception& e) {
        m->errorOut(e, "Bayesian", "parseTax");
        exit(1);
    }
}
**************************************************************************************************/ void Bayesian::readProbFile(ifstream& in, ifstream& inNum, string inName, string inNumName) { try{ #ifdef USE_MPI int pid, num, num2, processors; vector positions; vector positions2; MPI_Status status; MPI_File inMPI; MPI_File inMPI2; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); int tag = 2001; char inFileName[1024]; strcpy(inFileName, inNumName.c_str()); char inFileName2[1024]; strcpy(inFileName2, inName.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, inFileName2, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI2); //comm, filename, mode, info, filepointer if (pid == 0) { positions = m->setFilePosEachLine(inNumName, num); positions2 = m->setFilePosEachLine(inName, num2); for(int i = 1; i < processors; i++) { MPI_Send(&num, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&positions[0], (num+1), MPI_LONG, i, tag, MPI_COMM_WORLD); MPI_Send(&num2, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&positions2[0], (num2+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } }else{ MPI_Recv(&num, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); positions.resize(num+1); MPI_Recv(&positions[0], (num+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&num2, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); positions2.resize(num2+1); MPI_Recv(&positions2[0], (num2+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); } //read version int length = positions2[1] - positions2[0]; char* buf5 = new char[length]; MPI_File_read_at(inMPI2, positions2[0], buf5, length, MPI_CHAR, &status); delete buf5; //read numKmers length = positions2[2] - positions2[1]; char* buf = new char[length]; MPI_File_read_at(inMPI2, positions2[1], buf, length, MPI_CHAR, &status); string tempBuf = buf; if (tempBuf.length() > length) { tempBuf = tempBuf.substr(0, length); } 
delete buf; istringstream iss (tempBuf,istringstream::in); iss >> numKmers; //initialze probabilities wordGenusProb.resize(numKmers); for (int j = 0; j < wordGenusProb.size(); j++) { wordGenusProb[j].resize(genusNodes.size()); } int kmer, name; vector numbers; numbers.resize(numKmers); float prob; vector zeroCountProb; zeroCountProb.resize(numKmers); WordPairDiffArr.resize(numKmers); //read version length = positions[1] - positions[0]; char* buf6 = new char[length]; MPI_File_read_at(inMPI2, positions[0], buf6, length, MPI_CHAR, &status); delete buf6; //read file for(int i=1;i length) { tempBuf = tempBuf.substr(0, length); } delete buf4; istringstream iss (tempBuf,istringstream::in); float probTemp; iss >> zeroCountProb[i] >> numbers[i] >> probTemp; WordPairDiffArr[i].prob = probTemp; } MPI_File_close(&inMPI); for(int i=2;i length) { tempBuf = tempBuf.substr(0, length); } delete buf4; istringstream iss (tempBuf,istringstream::in); iss >> kmer; //set them all to zero value for (int i = 0; i < genusNodes.size(); i++) { wordGenusProb[kmer][i] = log(zeroCountProb[kmer] / (float) (genusTotals[i]+1)); } //get probs for nonzero values for (int i = 0; i < numbers[kmer]; i++) { iss >> name >> prob; wordGenusProb[kmer][name] = prob; } } MPI_File_close(&inMPI2); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else //read version string line = m->getline(in); m->gobble(in); in >> numKmers; m->gobble(in); //cout << threadID << '\t' << line << '\t' << numKmers << &in << '\t' << &inNum << '\t' << genusNodes.size() << endl; //initialze probabilities wordGenusProb.resize(numKmers); for (int j = 0; j < wordGenusProb.size(); j++) { wordGenusProb[j].resize(genusNodes.size()); } int kmer, name, count; count = 0; vector num; num.resize(numKmers); float prob; vector zeroCountProb; zeroCountProb.resize(numKmers); WordPairDiffArr.resize(numKmers); //read version string line2 = m->getline(inNum); m->gobble(inNum); float probTemp; //cout << threadID << '\t' << line2 << '\t' 
<< this << endl; while (inNum) { inNum >> zeroCountProb[count] >> num[count] >> probTemp; WordPairDiffArr[count].prob = probTemp; count++; m->gobble(inNum); //cout << threadID << '\t' << count << endl; } inNum.close(); //cout << threadID << '\t' << "here1 " << &wordGenusProb << '\t' << &num << endl; // //cout << threadID << '\t' << &genusTotals << '\t' << endl; //cout << threadID << '\t' << genusNodes.size() << endl; while(in) { in >> kmer; //cout << threadID << '\t' << kmer << endl; //set them all to zero value for (int i = 0; i < genusNodes.size(); i++) { wordGenusProb[kmer][i] = log(zeroCountProb[kmer] / (float) (genusTotals[i]+1)); } //cout << threadID << '\t' << num[kmer] << "here" << endl; //get probs for nonzero values for (int i = 0; i < num[kmer]; i++) { in >> name >> prob; wordGenusProb[kmer][name] = prob; } m->gobble(in); } in.close(); //cout << threadID << '\t' << "here" << endl; #endif } catch(exception& e) { m->errorOut(e, "Bayesian", "readProbFile"); exit(1); } } /**************************************************************************************************/ bool Bayesian::checkReleaseDate(ifstream& file1, ifstream& file2, ifstream& file3, ifstream& file4) { try { bool good = true; vector lines; lines.push_back(m->getline(file1)); lines.push_back(m->getline(file2)); lines.push_back(m->getline(file3)); lines.push_back(m->getline(file4)); //before we added this check if ((lines[0][0] != '#') || (lines[1][0] != '#') || (lines[2][0] != '#') || (lines[3][0] != '#')) { good = false; } else { //rip off # for (int i = 0; i < lines.size(); i++) { lines[i] = lines[i].substr(1); } //get mothurs current version string version = m->getVersion(); vector versionVector; m->splitAtChar(version, versionVector, '.'); //check each files version for (int i = 0; i < lines.size(); i++) { vector linesVector; m->splitAtChar(lines[i], linesVector, '.'); if (versionVector.size() != linesVector.size()) { good = false; break; } else { for (int j = 0; j < 
versionVector.size(); j++) {
                        int num1, num2;
                        convert(versionVector[j], num1);
                        convert(linesVector[j], num2);

                        //if mothur's version is newer than this file's version, then we want to remake it
                        if (num1 > num2) { good = false; break; }
                    }
                }

                if (!good) { break; }
            }
        }

        if (!good) { file1.close(); file2.close(); file3.close(); file4.close(); }
        else { file1.seekg(0); file2.seekg(0); file3.seekg(0); file4.seekg(0); }

        return good;
    }
    catch(exception& e) {
        m->errorOut(e, "Bayesian", "checkReleaseDate");
        exit(1);
    }
}
/**************************************************************************************************/
mothur-1.36.1/source/classifier/bayesian.h000066400000000000000000000024711255543666200205320ustar00rootroot00000000000000
#ifndef BAYESIAN_H
#define BAYESIAN_H

/*
 *  bayesian.h
 *  Mothur
 *
 *  Created by westcott on 11/3/09.
 *  Copyright 2009 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "classify.h"

/**************************************************************************************************/

class Bayesian : public Classify {

public:
    Bayesian(string, string, string, int, int, int, int, bool, bool);
    ~Bayesian();

    string getTaxonomy(Sequence*);

private:
    vector< vector<float> > wordGenusProb;  //for each kmer, the log probability of seeing it in each genus
    //wordGenusProb[0][392] = log probability that a sequence in the genus at index 392 of genusNodes would contain kmer 0;

    vector<int> genusTotals;
    vector<int> genusNodes;  //indexes in phyloTree where the genera are located
    vector<diffPair> WordPairDiffArr;

    int kmerSize, numKmers, confidenceThreshold, iters;

    string bootstrapResults(vector<int>, int, int);
    int getMostProbableTaxonomy(vector<int>);
    void readProbFile(ifstream&, ifstream&, string, string);
    bool checkReleaseDate(ifstream&, ifstream&, ifstream&, ifstream&);
    bool isReversed(vector<int>&);
    vector<int> createWordIndexArr(Sequence*);
    int generateWordPairDiffArr();
};

/**************************************************************************************************/

#endif
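/*
 * Illustrative sketch only, not part of the mothur sources. It restates the two
 * smoothing formulas quoted in the comments of bayesian.cpp (the template-wide
 * word prior and the per-genus word probability) and the "sum the logs, keep the
 * best genus" scoring loop. All function names below are hypothetical, and the
 * table is indexed as logProb[kmer][genus] by assumption. Unlike
 * Bayesian::getMostProbableTaxonomy, the scorer returns a position in the genus
 * list rather than a phyloTree node index.
 */

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Prior for one k-mer over the whole template:
// (# of reference sequences containing the word + 0.5) / (total reference sequences + 1)
double wordPriorSketch(int seqsWithWord, int totalSeqs) {
    return (seqsWithWord + 0.5) / (totalSeqs + 1.0);
}

// Log probability of the word within one genus:
// log((# of sequences in the genus containing the word + prior) / (genus size + 1))
double logWordGivenGenusSketch(int countInGenus, int genusTotal, double prior) {
    return std::log((countInGenus + prior) / (genusTotal + 1.0));
}

// Score every genus by summing the log probabilities of the query k-mers and
// return the position of the best-scoring genus. Assumes numGenera > 0 and that
// every entry of queryKmers is a valid row of logProb.
int mostProbableGenusSketch(const std::vector<std::vector<double> >& logProb,
                            const std::vector<int>& queryKmers,
                            std::size_t numGenera) {
    int best = 0;
    double bestScore = 0.0;
    for (std::size_t g = 0; g < numGenera; g++) {
        double score = 0.0;
        for (std::size_t i = 0; i < queryKmers.size(); i++) {
            score += logProb[queryKmers[i]][g];
        }
        if (g == 0 || score > bestScore) { bestScore = score; best = (int)g; }
    }
    return best;
}
```

/*
 * Working in log space keeps the product of many small per-word probabilities
 * from underflowing, which is why the scores are summed rather than multiplied.
 */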
mothur-1.36.1/source/classifier/classify.cpp000066400000000000000000000315011255543666200211030ustar00rootroot00000000000000/* * classify.cpp * Mothur * * Created by westcott on 11/3/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "classify.h" #include "sequence.hpp" #include "kmerdb.hpp" #include "suffixdb.hpp" #include "blastdb.hpp" #include "distancedb.hpp" #include "referencedb.h" /**************************************************************************************************/ void Classify::generateDatabaseAndNames(string tfile, string tempFile, string method, int kmerSize, float gapOpen, float gapExtend, float match, float misMatch) { try { ReferenceDB* rdb = ReferenceDB::getInstance(); if (tfile == "saved") { tfile = rdb->getSavedTaxonomy(); } taxFile = tfile; int numSeqs = 0; if (tempFile == "saved") { int start = time(NULL); m->mothurOutEndLine(); m->mothurOut("Using sequences from " + rdb->getSavedReference() + " that are saved in memory."); m->mothurOutEndLine(); numSeqs = rdb->referenceSeqs.size(); templateFile = rdb->getSavedReference(); tempFile = rdb->getSavedReference(); bool needToGenerate = true; string kmerDBName; if(method == "kmer") { database = new KmerDB(tempFile, kmerSize); kmerDBName = tempFile.substr(0,tempFile.find_last_of(".")+1) + char('0'+ kmerSize) + "mer"; ifstream kmerFileTest(kmerDBName.c_str()); if(kmerFileTest){ bool GoodFile = m->checkReleaseVersion(kmerFileTest, m->getVersion()); if (GoodFile) { needToGenerate = false; } } } else if(method == "suffix") { database = new SuffixDB(numSeqs); } else if(method == "blast") { database = new BlastDB(tempFile.substr(0,tempFile.find_last_of(".")+1), gapOpen, gapExtend, match, misMatch, "", threadID); } else if(method == "distance") { database = new DistanceDB(); } else { m->mothurOut(method + " is not a valid search option. 
I will run the command using kmer, ksize=8."); m->mothurOutEndLine(); database = new KmerDB(tempFile, 8); } if (needToGenerate) { for (int k = 0; k < rdb->referenceSeqs.size(); k++) { Sequence temp(rdb->referenceSeqs[k].getName(), rdb->referenceSeqs[k].getAligned()); names.push_back(temp.getName()); database->addSequence(temp); } if ((method == "kmer") && (!shortcuts)) {;} //don't print else {database->generateDB(); } }else if ((method == "kmer") && (!needToGenerate)) { ifstream kmerFileTest(kmerDBName.c_str()); database->readKmerDB(kmerFileTest); for (int k = 0; k < rdb->referenceSeqs.size(); k++) { names.push_back(rdb->referenceSeqs[k].getName()); } } database->setNumSeqs(numSeqs); m->mothurOut("It took " + toString(time(NULL) - start) + " to load " + toString(rdb->referenceSeqs.size()) + " sequences and generate the search databases.");m->mothurOutEndLine(); }else { templateFile = tempFile; int start = time(NULL); m->mothurOut("Generating search database... "); cout.flush(); #ifdef USE_MPI int pid, processors; vector positions; int tag = 2001; MPI_Status status; MPI_File inMPI; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); //char* inFileName = new char[tempFile.length()]; //memcpy(inFileName, tempFile.c_str(), tempFile.length()); char inFileName[1024]; strcpy(inFileName, tempFile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer //delete inFileName; if (pid == 0) { //only one process needs to scan file positions = m->setFilePosFasta(tempFile, numSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&positions[0], (numSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } }else{ MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); positions.resize(numSeqs+1); MPI_Recv(&positions[0], (numSeqs+1), 
MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); } //create database if(method == "kmer") { database = new KmerDB(tempFile, kmerSize); } else if(method == "suffix") { database = new SuffixDB(numSeqs); } else if(method == "blast") { database = new BlastDB(tempFile.substr(0,tempFile.find_last_of(".")+1), gapOpen, gapExtend, match, misMatch, "", pid); } else if(method == "distance") { database = new DistanceDB(); } else { m->mothurOut(method + " is not a valid search option. I will run the command using kmer, ksize=8."); m->mothurOutEndLine(); database = new KmerDB(tempFile, 8); } //read file for(int i=0;i length) { tempBuf = tempBuf.substr(0, length); } delete buf4; istringstream iss (tempBuf,istringstream::in); Sequence temp(iss); if (temp.getName() != "") { if (rdb->save) { rdb->referenceSeqs.push_back(temp); } names.push_back(temp.getName()); database->addSequence(temp); } } database->generateDB(); MPI_File_close(&inMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else //need to know number of template seqs for suffixdb if (method == "suffix") { ifstream inFASTA; m->openInputFile(tempFile, inFASTA); m->getNumSeqs(inFASTA, numSeqs); inFASTA.close(); } bool needToGenerate = true; string kmerDBName; if(method == "kmer") { database = new KmerDB(tempFile, kmerSize); kmerDBName = tempFile.substr(0,tempFile.find_last_of(".")+1) + char('0'+ kmerSize) + "mer"; ifstream kmerFileTest(kmerDBName.c_str()); if(kmerFileTest){ bool GoodFile = m->checkReleaseVersion(kmerFileTest, m->getVersion()); if (GoodFile) { needToGenerate = false; } } } else if(method == "suffix") { database = new SuffixDB(numSeqs); } else if(method == "blast") { database = new BlastDB(tempFile.substr(0,tempFile.find_last_of(".")+1), gapOpen, gapExtend, match, misMatch, "", threadID); } else if(method == "distance") { database = new DistanceDB(); } else { m->mothurOut(method + " is not a valid search option. 
I will run the command using kmer, ksize=8."); m->mothurOutEndLine(); database = new KmerDB(tempFile, 8); } if (needToGenerate) { ifstream fastaFile; m->openInputFile(tempFile, fastaFile); while (!fastaFile.eof()) { Sequence temp(fastaFile); m->gobble(fastaFile); if (rdb->save) { rdb->referenceSeqs.push_back(temp); } names.push_back(temp.getName()); database->addSequence(temp); } fastaFile.close(); if ((method == "kmer") && (!shortcuts)) {;} //don't print else {database->generateDB(); } }else if ((method == "kmer") && (!needToGenerate)) { ifstream kmerFileTest(kmerDBName.c_str()); database->readKmerDB(kmerFileTest); ifstream fastaFile; m->openInputFile(tempFile, fastaFile); while (!fastaFile.eof()) { Sequence temp(fastaFile); m->gobble(fastaFile); if (rdb->save) { rdb->referenceSeqs.push_back(temp); } names.push_back(temp.getName()); } fastaFile.close(); } #endif database->setNumSeqs(names.size()); m->mothurOut("DONE."); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " seconds generate search database. 
"); m->mothurOutEndLine(); } readTaxonomy(taxFile); //sanity check bool okay = phyloTree->ErrorCheck(names); if (!okay) { m->control_pressed = true; } } catch(exception& e) { m->errorOut(e, "Classify", "generateDatabaseAndNames"); exit(1); } } /**************************************************************************************************/ Classify::Classify() { m = MothurOut::getInstance(); database = NULL; phyloTree=NULL; flipped=false; } /**************************************************************************************************/ int Classify::readTaxonomy(string file) { try { phyloTree = new PhyloTree(); string name, taxInfo; m->mothurOutEndLine(); m->mothurOut("Reading in the " + file + " taxonomy...\t"); cout.flush(); if (m->debug) { m->mothurOut("[DEBUG]: Taxonomies read in...\n"); } #ifdef USE_MPI int pid, num, processors; vector positions; int tag = 2001; MPI_Status status; MPI_File inMPI; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); char inFileName[1024]; strcpy(inFileName, file.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer //delete inFileName; if (pid == 0) { positions = m->setFilePosEachLine(file, num); //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&num, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&positions[0], (num+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } }else{ MPI_Recv(&num, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); positions.resize(num+1); MPI_Recv(&positions[0], (num+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); } //read file for(int i=0;i length) { tempBuf = tempBuf.substr(0, length); } delete buf4; istringstream iss (tempBuf,istringstream::in); iss >> name; m->gobble(iss); iss >> taxInfo; if (m->debug) { m->mothurOut("[DEBUG]: name = " + name + " tax = " + taxInfo + "\n"); } //commented out to save time with large templates. 
10/7/13 //if (m->inUsersGroups(name, names)) { taxonomy[name] = taxInfo; phyloTree->addSeqToTree(name, taxInfo); //}else { // m->mothurOut("[WARNING]: " + name + " is in your taxonomy file and not in your reference file, ignoring.\n"); //} } MPI_File_close(&inMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else taxonomy.clear(); m->readTax(file, taxonomy, true); //commented out to save time with large templates. 6/12/13 //map tempTaxonomy; for (map::iterator itTax = taxonomy.begin(); itTax != taxonomy.end(); itTax++) { //if (m->inUsersGroups(itTax->first, names)) { phyloTree->addSeqToTree(itTax->first, itTax->second); if (m->control_pressed) { break; } //tempTaxonomy[itTax->first] = itTax->second; // }else { // m->mothurOut("[WARNING]: " + itTax->first + " is in your taxonomy file and not in your reference file, ignoring.\n"); //} } //taxonomy = tempTaxonomy; #endif phyloTree->assignHeirarchyIDs(0); phyloTree->setUp(file); m->mothurOut("DONE."); m->mothurOutEndLine(); cout.flush(); return phyloTree->getNumSeqs(); } catch(exception& e) { m->errorOut(e, "Classify", "readTaxonomy"); exit(1); } } /**************************************************************************************************/ vector Classify::parseTax(string tax) { try { vector taxons; m->splitAtChar(tax, taxons, ';'); return taxons; } catch(exception& e) { m->errorOut(e, "Classify", "parseTax"); exit(1); } } /**************************************************************************************************/ double Classify::getLogExpSum(vector probabilities, int& maxIndex){ try { // http://jblevins.org/notes/log-sum-exp double maxProb = probabilities[0]; maxIndex = 0; int numProbs = (int)probabilities.size(); for(int i=1;i= maxProb){ maxProb = probabilities[i]; maxIndex = i; } } double probSum = 0.0000; for(int i=0;ierrorOut(e, "Classify", "getLogExpSum"); exit(1); } } /**************************************************************************************************/ 
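/*
Illustrative sketch, not part of mothur. getLogExpSum above cites the
log-sum-exp note at jblevins.org; the standalone function below shows that
technique in isolation. The name logSumExp and its signature are hypothetical
and chosen only for this example. As in getLogExpSum, ties for the maximum
resolve to the last index because the comparison uses >=.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a mothur function).
// Returns log(sum_i exp(logProbs[i])) and stores the index of the largest
// entry in maxIndex. Subtracting the maximum before exponentiating keeps
// exp() from underflowing when every log-probability is very negative.
double logSumExp(const std::vector<double>& logProbs, int& maxIndex) {
    assert(!logProbs.empty());

    // Find the largest log-probability; >= keeps the last index on ties.
    maxIndex = 0;
    for (std::size_t i = 1; i < logProbs.size(); ++i) {
        if (logProbs[i] >= logProbs[maxIndex]) {
            maxIndex = static_cast<int>(i);
        }
    }
    const double maxLog = logProbs[maxIndex];

    // Sum the shifted terms; each exponent is <= 0, so each term is in (0, 1].
    double sum = 0.0;
    for (double lp : logProbs) {
        sum += std::exp(lp - maxLog);
    }

    // Undo the shift.
    return std::log(sum) + maxLog;
}
```

A caller could recover a normalized posterior for entry i as
exp(logProbs[i] - logSumExp(logProbs, maxIndex)).
*/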
mothur-1.36.1/source/classifier/classify.h000066400000000000000000000026211255543666200205510ustar00rootroot00000000000000#ifndef CLASSIFY_H #define CLASSIFY_H /* * classify.h * Mothur * * Created by westcott on 11/3/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ /* This class is a parent to bayesian, knn. */ #include "mothur.h" #include "database.hpp" #include "phylotree.h" class Sequence; /**************************************************************************************************/ class Classify { public: Classify(); virtual ~Classify(){}; virtual string getTaxonomy(Sequence*) = 0; virtual string getSimpleTax() { return simpleTax; } virtual bool getFlipped() { return flipped; } virtual void generateDatabaseAndNames(string, string, string, int, float, float, float, float); virtual void setDistName(string s) {} //for knn, so if distance method is selected with knn you can create the smallest distance file in the right place. protected: map taxonomy; //name maps to taxonomy map::iterator itTax; map::iterator it; Database* database; PhyloTree* phyloTree; string taxFile, templateFile, simpleTax; vector names; int threadID, numLevels, numTaxa; bool flip, flipped, shortcuts; int readTaxonomy(string); vector parseTax(string); double getLogExpSum(vector, int&); MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/classifier/kmernode.cpp000066400000000000000000000153261255543666200211010ustar00rootroot00000000000000/* * kmerNode.cpp * bayesian * * Created by Pat Schloss on 10/11/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. 
* */ #include "kmernode.h" /**********************************************************************************************************************/ KmerNode::KmerNode(string s, int l, int n) : TaxonomyNode(s, l), kmerSize(n) { try { int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; numPossibleKmers = power4s[kmerSize]; numUniqueKmers = 0; kmerVector.assign(numPossibleKmers, 0); } catch(exception& e) { m->errorOut(e, "KmerNode", "KmerNode"); exit(1); } } /**********************************************************************************************************************/ void KmerNode::loadSequence(vector& kmerProfile){ try { for(int i=0;icontrol_pressed) { break; } if(kmerVector[i] == 0 && kmerProfile[i] != 0) { numUniqueKmers++; } kmerVector[i] += kmerProfile[i]; } numSeqs++; } catch(exception& e) { m->errorOut(e, "KmerNode", "loadSequence"); exit(1); } } /**********************************************************************************************************************/ string KmerNode::getKmerBases(int kmerNumber){ try { // Here we convert the kmer number into the kmer in terms of bases. 
// // Example: Score = 915 (for a 6-mer) // Base6 = (915 / 4^0) % 4 = 915 % 4 = 3 => T [T] // Base5 = (915 / 4^1) % 4 = 228 % 4 = 0 => A [AT] // Base4 = (915 / 4^2) % 4 = 57 % 4 = 1 => C [CAT] // Base3 = (915 / 4^3) % 4 = 14 % 4 = 2 => G [GCAT] // Base2 = (915 / 4^4) % 4 = 3 % 4 = 3 => T [TGCAT] // Base1 = (915 / 4^5) % 4 = 0 % 4 = 0 => A [ATGCAT] -> this checks out with the previous method int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; string kmer = ""; if(kmerNumber == power4s[kmerSize]){//pow(4.,7)){ // if the kmer number is the same as the maxKmer then it must for(int i=0;icontrol_pressed) { return kmer; } int nt = (int)(kmerNumber / (float)power4s[i]) % 4; // the '%' operator returns the remainder if(nt == 0) { kmer = 'A' + kmer; } // from int-based division ] else if(nt == 1){ kmer = 'C' + kmer; } else if(nt == 2){ kmer = 'G' + kmer; } else if(nt == 3){ kmer = 'T' + kmer; } } } return kmer; } catch(exception& e) { m->errorOut(e, "KmerNode", "getKmerBases"); exit(1); } } /**************************************************************************************************/ void KmerNode::addThetas(vector newTheta, int newNumSeqs){ try { for(int i=0;icontrol_pressed) { break; } kmerVector[i] += newTheta[i]; } // if(alignLength == 0){ // alignLength = (int)newTheta.size(); // theta.resize(alignLength); // columnCounts.resize(alignLength); // } // // for(int i=0;ierrorOut(e, "KmerNode", "addThetas"); exit(1); } } /**********************************************************************************************************************/ int KmerNode::getNumUniqueKmers(){ try { if(numUniqueKmers == 0){ for(int i=0;icontrol_pressed) { return numUniqueKmers; } if(kmerVector[i] != 0){ numUniqueKmers++; } } } return numUniqueKmers; } catch(exception& e) { m->errorOut(e, "KmerNode", "getNumUniqueKmers"); exit(1); } } 
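/*
Illustrative sketch, not part of mothur. The worked example in the
getKmerBases comment (915 decodes to ATGCAT for a 6-mer) can be reproduced
with the standalone functions below. The names kmerToBases and basesToKmer
are hypothetical. The sketch assumes the encoding A=0, C=1, G=2, T=3 used in
that comment and does not handle the special index equal to 4^kmerSize that
getKmerBases tests for.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a mothur function).
// Decodes a k-mer index into its bases by reading the index as a base-4
// number whose least significant digit is the last base of the k-mer.
std::string kmerToBases(int kmerNumber, int kmerSize) {
    // 4^13 is the largest power in the power4s table and still fits in an int.
    assert(kmerSize > 0 && kmerSize <= 13 && kmerNumber >= 0);

    static const char bases[] = {'A', 'C', 'G', 'T'};
    std::string kmer(kmerSize, 'A');
    for (int pos = kmerSize - 1; pos >= 0; --pos) {
        kmer[pos] = bases[kmerNumber % 4];  // remainder selects the base
        kmerNumber /= 4;                    // integer division drops that digit
    }
    return kmer;
}

// Hypothetical inverse: encodes a base string as its k-mer index.
// Returns -1 if the string contains anything other than A, C, G or T.
int basesToKmer(const std::string& kmer) {
    int number = 0;
    for (char base : kmer) {
        int digit = 0;
        switch (base) {
            case 'A': digit = 0; break;
            case 'C': digit = 1; break;
            case 'G': digit = 2; break;
            case 'T': digit = 3; break;
            default:  return -1;
        }
        number = number * 4 + digit;
    }
    return number;
}
```

Using integer division and the remainder operator directly avoids the
float cast that appears in getKmerBases.
*/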
/**********************************************************************************************************************/ void KmerNode::printTheta(){ try { m->mothurOut(name + "\n"); for(int i=0;imothurOut(getKmerBases(i) + '\t' + toString(kmerVector[i]) + "\n"); } } m->mothurOutEndLine(); } catch(exception& e) { m->errorOut(e, "KmerNode", "printTheta"); exit(1); } } /**************************************************************************************************/ double KmerNode::getSimToConsensus(vector& queryKmerProfile){ try { double present = 0; for(int i=0;icontrol_pressed) { return present; } if(queryKmerProfile[i] != 0 && kmerVector[i] != 0){ present++; } } return present / double(queryKmerProfile.size() - kmerSize + 1); } catch(exception& e) { m->errorOut(e, "KmerNode", "getSimToConsensus"); exit(1); } } /**********************************************************************************************************************/ double KmerNode::getPxGivenkj_D_j(vector& queryKmerProfile) { try { double sumLogProb = 0.0000; double alpha = 1.0 / (double)totalSeqs; //flat prior // double alpha = pow((1.0 / (double)numUniqueKmers), numSeqs)+0.0001; //non-flat prior for(int i=0;icontrol_pressed) { return sumLogProb; } if(queryKmerProfile[i] != 0){ //numUniqueKmers needs to be the value from Root; sumLogProb += log((kmerVector[i] + alpha) / (numSeqs + numUniqueKmers * alpha)); } } return sumLogProb; } catch(exception& e) { m->errorOut(e, "KmerNode", "getPxGivenkj_D_j"); exit(1); } } /**********************************************************************************************************************/ mothur-1.36.1/source/classifier/kmernode.h000066400000000000000000000022461255543666200205430ustar00rootroot00000000000000#ifndef KMERNODE #define KMERNODE /* * kmerNode.h * bayesian * * Created by Pat Schloss on 10/11/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. 
* */ #include "taxonomynode.h" /**********************************************************************************************************************/ class KmerNode : public TaxonomyNode { public: KmerNode(string, int, int); void loadSequence(vector&); void printTheta(); double getPxGivenkj_D_j(vector&); double getSimToConsensus(vector&); void checkTheta(){}; void setNumUniqueKmers(int num) { numUniqueKmers = num; } int getNumUniqueKmers(); void addThetas(vector, int); vector getTheta() { return kmerVector; } private: string getKmerBases(int); int kmerSize; // value of k int numPossibleKmers; // 4^kmerSize int numUniqueKmers; // number of unique kmers seen in a group ~ O_kj int numKmers; // number of kmers in a sequence vector kmerVector; // counts of kmers across all sequences in a node }; /**********************************************************************************************************************/ #endif mothur-1.36.1/source/classifier/kmertree.cpp000066400000000000000000000344741255543666200211200ustar00rootroot00000000000000// // kmerTree.cpp // pdsBayesian // // Created by Patrick Schloss on 4/3/12. // Copyright (c) 2012 University of Michigan. All rights reserved. 
// #include "kmernode.h" #include "kmertree.h" /**************************************************************************************************/ KmerTree::KmerTree(string referenceFileName, string taxonomyFileName, int k, int cutoff) : Classify(), confidenceThreshold(cutoff), kmerSize(k){ try { KmerNode* newNode = new KmerNode("Root", 0, kmerSize); tree.push_back(newNode); // the tree is stored as a vector of elements of type TaxonomyNode int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; numPossibleKmers = power4s[kmerSize]; string refTaxonomy; readTaxonomy(taxonomyFileName); ifstream referenceFile; m->openInputFile(referenceFileName, referenceFile); bool error = false; while(!referenceFile.eof()){ if (m->control_pressed) { break; } Sequence seq(referenceFile); m->gobble(referenceFile); if (seq.getName() != "") { map::iterator it = taxonomy.find(seq.getName()); if (it != taxonomy.end()) { refTaxonomy = it->second; // lookup the taxonomy string for the current reference sequence vector kmerProfile = ripKmerProfile(seq.getUnaligned()); //convert to kmer vector addTaxonomyToTree(seq.getName(), refTaxonomy, kmerProfile); }else { m->mothurOut(seq.getName() + " is in your reference file, but not in your taxonomy file, please correct.\n"); error = true; } } } referenceFile.close(); if (error) { m->control_pressed = true; } numTaxa = (int)tree.size(); numLevels = 0; for(int i=0;igetLevel(); if(level > numLevels){ numLevels = level; } } numLevels++; aggregateThetas(); int dbSize = tree[0]->getNumSeqs(); for(int i=0;icheckTheta(); tree[i]->setNumUniqueKmers(tree[0]->getNumUniqueKmers()); tree[i]->setTotalSeqs(dbSize); } } catch(exception& e) { m->errorOut(e, "KmerTree", "KmerTree"); exit(1); } } /**************************************************************************************************/ KmerTree::~KmerTree(){ for(int i=0;i KmerTree::ripKmerProfile(string sequence){ try { // assume all input sequences 
are unaligned int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; int nKmers = (int)sequence.length() - kmerSize + 1; vector kmerProfile(numPossibleKmers + 1, 0); for(int i=0;icontrol_pressed) { break; } int kmer = 0; for(int j=0;jerrorOut(e, "KmerTree", "ripKmerProfile"); exit(1); } } /**************************************************************************************************/ int KmerTree::addTaxonomyToTree(string seqName, string taxonomy, vector& sequence){ try { KmerNode* newNode; string taxonName = ""; int treePosition = 0; // the root is element 0 int level = 1; for(int i=0;icontrol_pressed) { break; } if(taxonomy[i] == ';'){ // looking for semicolons... if (taxonName == "") { m->mothurOut(seqName + " has an error in the taxonomy. This may be due to a ;;"); m->mothurOutEndLine(); m->control_pressed = true; } int newIndex = tree[treePosition]->getChildIndex(taxonName);// look to see if your current node already // has a child with the new taxonName if(newIndex != -1) { treePosition = newIndex; } // if you've seen it before, jump to that else { // position in the tree int newChildIndex = (int)tree.size(); // otherwise, we'll have to create one... tree[treePosition]->makeChild(taxonName, newChildIndex); newNode = new KmerNode(taxonName, level, kmerSize); newNode->setParent(treePosition); tree.push_back(newNode); treePosition = newChildIndex; } // sequence data to that node to update that node's theta - seems slow... 
taxonName = ""; // clear out the taxon name that we will build as we look level++; } // for a semicolon else{ taxonName += taxonomy[i]; // keep adding letters until we reach a semicolon } } tree[treePosition]->loadSequence(sequence); // now that we've gotten to the correct node, add the return 0; } catch(exception& e) { m->errorOut(e, "KmerTree", "addTaxonomyToTree"); exit(1); } } /**************************************************************************************************/ int KmerTree::aggregateThetas(){ try { vector > levelMatrix(numLevels+1); for(int i=0;icontrol_pressed) { return 0; } levelMatrix[tree[i]->getLevel()].push_back(i); } for(int i=numLevels-1;i>0;i--) { if (m->control_pressed) { return 0; } for(int j=0;jgetParent()]->addThetas(holder->getTheta(), holder->getNumSeqs()); } } return 0; } catch(exception& e) { m->errorOut(e, "KmerTree", "aggregateThetas"); exit(1); } } /**************************************************************************************************/ int KmerTree::getMinRiskIndexKmer(vector& sequence, vector& taxaIndices, vector& probabilities){ try { int numProbs = (int)probabilities.size(); vector G(numProbs, 0.2); //a random sequence will, on average, be 20% similar to any other sequence; not sure that this holds up for kmers; whatever. 
vector risk(numProbs, 0); for(int i=1;icontrol_pressed) { return 0; } G[i] = tree[taxaIndices[i]]->getSimToConsensus(sequence); } double minRisk = 1e6; int minRiskIndex = 0; for(int i=0;icontrol_pressed) { return 0; } for(int j=0;jerrorOut(e, "KmerTree", "getMinRiskIndexKmer"); exit(1); } } /**************************************************************************************************/ int KmerTree::sanityCheck(vector >& indices, vector& maxIndices){ try { int finalLevel = (int)indices.size()-1; for(int position=1;positioncontrol_pressed) { return 0; } int predictedParent = tree[indices[position][maxIndices[position]]]->getParent(); int actualParent = indices[position-1][maxIndices[position-1]]; if(predictedParent != actualParent){ finalLevel = position - 1; return finalLevel; } } return finalLevel; } catch(exception& e) { m->errorOut(e, "KmerTree", "sanityCheck"); exit(1); } } /**************************************************************************************************/ string KmerTree::getTaxonomy(Sequence* thisSeq){ try { string seqName = thisSeq->getName(); string querySequence = thisSeq->getAligned(); string taxonProbabilityString = ""; string unalignedSeq = thisSeq->getUnaligned(); double logPOutlier = (querySequence.length() - kmerSize + 1) * log(1.0/(double)tree[0]->getNumUniqueKmers()); vector queryProfile = ripKmerProfile(unalignedSeq); //convert to kmer vector vector > pXgivenKj_D_j(numLevels); vector > indices(numLevels); for(int i=0;icontrol_pressed) { return taxonProbabilityString; } pXgivenKj_D_j[i].push_back(logPOutlier); indices[i].push_back(-1); } for(int i=0;icontrol_pressed) { return taxonProbabilityString; } pXgivenKj_D_j[tree[i]->getLevel()].push_back(tree[i]->getPxGivenkj_D_j(queryProfile)); indices[tree[i]->getLevel()].push_back(i); } vector sumLikelihood(numLevels, 0); vector bestPosterior(numLevels, 0); vector maxIndex(numLevels, 0); int maxPosteriorIndex; //let's find the best level and taxa within that level for(int 
i=0;icontrol_pressed) { return taxonProbabilityString; } int numTaxaInLevel = (int)indices[i].size(); vector posteriors(numTaxaInLevel, 0); sumLikelihood[i] = getLogExpSum(pXgivenKj_D_j[i], maxPosteriorIndex); maxPosteriorIndex = 0; for(int j=0;j posteriors[maxPosteriorIndex]){ maxPosteriorIndex = j; } } maxIndex[i] = getMinRiskIndexKmer(queryProfile, indices[i], posteriors); maxIndex[i] = maxPosteriorIndex; bestPosterior[i] = posteriors[maxIndex[i]]; } // vector pX_level(numLevels, 0); // // for(int i=0;igetNumSeqs(); // } // // int max_pLevel_X_index = -1; // double pX_level_sum = getLogExpSum(pX_level, max_pLevel_X_index); // double max_pLevel_X = exp(pX_level[max_pLevel_X_index] - pX_level_sum); // // vector pLevel_X(numLevels, 0); // for(int i=0;icontrol_pressed) { return taxonProbabilityString; } int confidenceScore = (int) (bestPosterior[i] * 100); if (confidenceScore >= confidenceThreshold) { if(indices[i][maxIndex[i]] != -1){ taxonProbabilityString += tree[indices[i][maxIndex[i]]]->getName() + "(" + toString(confidenceScore) + ");"; simpleTax += tree[indices[i][maxIndex[i]]]->getName() + ";"; // levelProbabilityOutput << tree[indices[i][maxIndex[i]]]->getName() << '(' << setprecision(6) << pLevel_X[i] << ");"; } else{ taxonProbabilityString += "unclassified(" + toString(confidenceScore) + ");"; // levelProbabilityOutput << "unclassified" << '(' << setprecision(6) << pLevel_X[i] << ");"; simpleTax += "unclassified;"; } }else { break; } savedspot = i; } for(int i=savedspot+1;icontrol_pressed) { return taxonProbabilityString; } taxonProbabilityString += "unclassified(0);"; simpleTax += "unclassified;"; } return taxonProbabilityString; } catch(exception& e) { m->errorOut(e, "KmerTree", "getTaxonomy"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/classifier/kmertree.h000066400000000000000000000013421255543666200205510ustar00rootroot00000000000000// // kmerTree.h // 
pdsBayesian // // Created by Patrick Schloss on 4/3/12. // Copyright (c) 2012 University of Michigan. All rights reserved. // #ifndef pdsBayesian_kmerTree_h #define pdsBayesian_kmerTree_h #include "classify.h" class KmerNode; class KmerTree : public Classify { public: KmerTree(string, string, int, int); ~KmerTree(); string getTaxonomy(Sequence*); private: int addTaxonomyToTree(string, string, vector&); vector ripKmerProfile(string); int getMinRiskIndexKmer(vector&, vector&, vector&); int aggregateThetas(); int sanityCheck(vector >&, vector&); int kmerSize; int numPossibleKmers, confidenceThreshold; vector tree; }; #endif mothur-1.36.1/source/classifier/knn.cpp000066400000000000000000000130521255543666200200550ustar00rootroot00000000000000/* * knn.cpp * Mothur * * Created by westcott on 11/4/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "knn.h" /**************************************************************************************************/ Knn::Knn(string tfile, string tempFile, string method, int kmerSize, float gapOpen, float gapExtend, float match, float misMatch, int n, int tid) : Classify(), num(n), search(method) { try { threadID = tid; shortcuts = true; //create search database and names vector generateDatabaseAndNames(tfile, tempFile, method, kmerSize, gapOpen, gapExtend, match, misMatch); } catch(exception& e) { m->errorOut(e, "Knn", "Knn"); exit(1); } } /**************************************************************************************************/ void Knn::setDistName(string s) { try { outDistName = s; ofstream outDistance; m->openOutputFile(outDistName, outDistance); outDistance << "Name\tBestMatch\tDistance" << endl; outDistance.close(); } catch(exception& e) { m->errorOut(e, "Knn", "setDistName"); exit(1); } } /**************************************************************************************************/ Knn::~Knn() { try { delete phyloTree; if (database != NULL) { delete database; } } catch(exception& e) { 
m->errorOut(e, "Knn", "~Knn"); exit(1); } } /**************************************************************************************************/ string Knn::getTaxonomy(Sequence* seq) { try { string tax; //use database to find closest seq vector closest = database->findClosestSequences(seq, num); if (search == "distance") { ofstream outDistance; m->openOutputFileAppend(outDistName, outDistance); outDistance << seq->getName() << '\t' << database->getName(closest[0]) << '\t' << database->getSearchScore() << endl; outDistance.close(); } if (m->control_pressed) { return tax; } vector closestNames; for (int i = 0; i < closest.size(); i++) { //find that sequences taxonomy in map it = taxonomy.find(names[closest[i]]); //is this sequence in the taxonomy file if (it == taxonomy.end()) { //error not in file m->mothurOut("Error: sequence " + names[closest[i]] + " is not in the taxonomy file. It will be eliminated as a match to sequence " + seq->getName() + "."); m->mothurOutEndLine(); }else{ closestNames.push_back(it->first); } } if (closestNames.size() == 0) { m->mothurOut("Error: All the matches for sequence " + seq->getName() + " have been eliminated. "); m->mothurOutEndLine(); tax = "unknown;"; }else{ tax = findCommonTaxonomy(closestNames); if (tax == "") { m->mothurOut("There are no common levels for sequence " + seq->getName() + ". "); m->mothurOutEndLine(); tax = "unknown;"; } } simpleTax = tax; return tax; } catch(exception& e) { m->errorOut(e, "Knn", "getTaxonomy"); exit(1); } } /**************************************************************************************************/ string Knn::findCommonTaxonomy(vector closest) { try { /*vector< vector > taxons; //taxon[0] = vector of taxonomy info for closest[0]. //so if closest[0] taxonomy is Bacteria;Alphaproteobacteria;Rhizobiales;Azorhizobium_et_rel.;Methylobacterium_et_rel.;Bosea; //taxon[0][0] = Bacteria, taxon[0][1] = Alphaproteobacteria.... 
taxons.resize(closest.size()); int smallest = 100; for (int i = 0; i < closest.size(); i++) { if (m->control_pressed) { return "control"; } string tax = taxonomy[closest[i]]; //we know its there since we checked in getTaxonomy cout << tax << endl; taxons[i] = parseTax(tax); //figure out who has the shortest taxonomy info. so you can start comparing there if (taxons[i].size() < smallest) { smallest = taxons[i].size(); } } //start at the highest level all the closest seqs have string common = ""; for (int i = (smallest-1); i >= 0; i--) { if (m->control_pressed) { return "control"; } string thistax = taxons[0][i]; int num = 0; for (int j = 1; j < taxons.size(); j++) { if (taxons[j][i] != thistax) { break; } num = j; } if (num == (taxons.size()-1)) { //they all match at this level for (int k = 0; k <= i; k++) { common += taxons[0][k] + ';'; } break; } }*/ string conTax; //create a tree containing sequences from this bin PhyloTree* p = new PhyloTree(); for (int i = 0; i < closest.size(); i++) { p->addSeqToTree(closest[i], taxonomy[closest[i]]); } //build tree p->assignHeirarchyIDs(0); TaxNode currentNode = p->get(0); //at each level while (currentNode.children.size() != 0) { //you still have more to explore TaxNode bestChild; int bestChildSize = 0; //go through children for (map::iterator itChild = currentNode.children.begin(); itChild != currentNode.children.end(); itChild++) { TaxNode temp = p->get(itChild->second); //select child with largest accessions - most seqs assigned to it if (temp.accessions.size() > bestChildSize) { bestChild = p->get(itChild->second); bestChildSize = temp.accessions.size(); } } if (bestChildSize == closest.size()) { //if yes, add it conTax += bestChild.name + ";"; }else{ //if no, quit break; } //move down a level currentNode = bestChild; } delete p; return conTax; } catch(exception& e) { m->errorOut(e, "Knn", "findCommonTaxonomy"); exit(1); } } 
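/*
Illustrative sketch, not part of mothur. findCommonTaxonomy above builds a
PhyloTree from the closest matches and walks down while a single child holds
every sequence. The standalone functions below show a simpler stand-in
technique, the longest shared prefix of semicolon-delimited taxonomy strings,
which is not the tree walk used above. The names splitTaxonomy and
commonTaxonomy are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a mothur function).
// Splits "Bacteria;Firmicutes;" into {"Bacteria", "Firmicutes"}.
std::vector<std::string> splitTaxonomy(const std::string& tax) {
    std::vector<std::string> taxons;
    std::string current;
    for (char c : tax) {
        if (c == ';') {
            if (!current.empty()) { taxons.push_back(current); }
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) { taxons.push_back(current); }
    return taxons;
}

// Hypothetical helper (not a mothur function).
// Returns the leading taxonomy levels shared by every input, each followed
// by ';'. Returns "" when there are no inputs or they differ at level one.
std::string commonTaxonomy(const std::vector<std::string>& taxonomies) {
    if (taxonomies.empty()) { return ""; }

    std::vector<std::string> common = splitTaxonomy(taxonomies[0]);
    for (std::size_t i = 1; i < taxonomies.size(); ++i) {
        const std::vector<std::string> other = splitTaxonomy(taxonomies[i]);
        std::size_t shared = 0;
        while (shared < common.size() && shared < other.size() &&
               common[shared] == other[shared]) {
            ++shared;
        }
        common.resize(shared);  // keep only the levels both agree on
    }

    std::string result;
    for (const std::string& taxon : common) { result += taxon + ";"; }
    return result;
}
```

An empty result corresponds to the case where Knn::getTaxonomy reports no
common levels and falls back to "unknown;".
*/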
/**************************************************************************************************/ mothur-1.36.1/source/classifier/knn.h000066400000000000000000000012671255543666200175270ustar00rootroot00000000000000#ifndef KNN_H #define KNN_H /* * knn.h * Mothur * * Created by westcott on 11/4/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "classify.h" /**************************************************************************************************/ class Knn : public Classify { public: Knn(string, string, string, int, float, float, float, float, int, int); ~Knn(); void setDistName(string s); string getTaxonomy(Sequence*); private: int num; string findCommonTaxonomy(vector); string search, outDistName; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/classifier/phylosummary.cpp000066400000000000000000000602671255543666200220520ustar00rootroot00000000000000/* * rawTrainingDataMaker.cpp * Mothur * * Created by westcott on 4/21/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "phylosummary.h" #include "referencedb.h" /**************************************************************************************************/ PhyloSummary::PhyloSummary(string refTfile, CountTable* c, bool r){ try { m = MothurOut::getInstance(); maxLevel = 0; ignore = false; numSeqs = 0; relabund = r; ct = c; groupmap = NULL; //check for necessary files if (refTfile == "saved") { ReferenceDB* rdb = ReferenceDB::getInstance(); refTfile = rdb->getSavedTaxonomy(); } string taxFileNameTest = m->getFullPathName((refTfile.substr(0,refTfile.find_last_of(".")+1) + "tree.sum")); ifstream FileTest(taxFileNameTest.c_str()); if (!FileTest) { m->mothurOut("Error: can't find " + taxFileNameTest + "."); m->mothurOutEndLine(); exit(1); }else{ readTreeStruct(FileTest); } tree[0].rank = "0"; assignRank(0); } catch(exception& e) { m->errorOut(e, "PhyloSummary", "PhyloSummary"); exit(1); } } /**************************************************************************************************/ PhyloSummary::PhyloSummary(CountTable* c, bool r){ try { m = MothurOut::getInstance(); maxLevel = 0; ignore = true; numSeqs = 0; relabund = r; ct = c; groupmap = NULL; tree.push_back(rawTaxNode("Root")); tree[0].rank = "0"; } catch(exception& e) { m->errorOut(e, "PhyloSummary", "PhyloSummary"); exit(1); } } /**************************************************************************************************/ PhyloSummary::PhyloSummary(string refTfile, GroupMap* g, bool r){ try { m = MothurOut::getInstance(); maxLevel = 0; ignore = false; numSeqs = 0; relabund = r; groupmap = g; ct = NULL; //check for necessary files if (refTfile == "saved") { ReferenceDB* rdb = ReferenceDB::getInstance(); refTfile = rdb->getSavedTaxonomy(); } string taxFileNameTest = m->getFullPathName((refTfile.substr(0,refTfile.find_last_of(".")+1) + "tree.sum")); ifstream FileTest(taxFileNameTest.c_str()); if (!FileTest) { m->mothurOut("Error: can't find " + taxFileNameTest + "."); m->mothurOutEndLine(); exit(1); 
}else{ readTreeStruct(FileTest); } tree[0].rank = "0"; assignRank(0); } catch(exception& e) { m->errorOut(e, "PhyloSummary", "PhyloSummary"); exit(1); } } /**************************************************************************************************/ PhyloSummary::PhyloSummary(GroupMap* g, bool r){ try { m = MothurOut::getInstance(); maxLevel = 0; ignore = true; numSeqs = 0; relabund = r; groupmap = g; ct = NULL; tree.push_back(rawTaxNode("Root")); tree[0].rank = "0"; } catch(exception& e) { m->errorOut(e, "PhyloSummary", "PhyloSummary"); exit(1); } } /**************************************************************************************************/ int PhyloSummary::summarize(string userTfile){ try { map temp; m->readTax(userTfile, temp, true); for (map::iterator itTemp = temp.begin(); itTemp != temp.end();) { addSeqToTree(itTemp->first, itTemp->second); temp.erase(itTemp++); } return numSeqs; } catch(exception& e) { m->errorOut(e, "PhyloSummary", "summarize"); exit(1); } } /**************************************************************************************************/ string PhyloSummary::getNextTaxon(string& heirarchy){ try { string currentLevel = ""; if(heirarchy != ""){ int pos = heirarchy.find_first_of(';'); currentLevel=heirarchy.substr(0,pos); if (pos != (heirarchy.length()-1)) { heirarchy=heirarchy.substr(pos+1); } else { heirarchy = ""; } } return currentLevel; } catch(exception& e) { m->errorOut(e, "PhyloSummary", "getNextTaxon"); exit(1); } } /**************************************************************************************************/ int PhyloSummary::addSeqToTree(string seqName, string seqTaxonomy){ try { numSeqs++; map::iterator childPointer; int currentNode = 0; string taxon; int level = 0; //are there confidence scores, if so remove them if (seqTaxonomy.find_first_of('(') != -1) { m->removeConfidences(seqTaxonomy); } while (seqTaxonomy != "") { if (m->control_pressed) { return 0; } //somehow the parent is getting one too many 
accnos //use print to reassign the taxa id taxon = getNextTaxon(seqTaxonomy); childPointer = tree[currentNode].children.find(taxon); if(childPointer != tree[currentNode].children.end()){ //if the node already exists, update count and move on int thisCount = 1; if (groupmap != NULL) { //find out the sequences group string group = groupmap->getGroup(seqName); if (group == "not found") { m->mothurOut("[WARNING]: " + seqName + " is not in your groupfile, and will be included in the overall total, but not any group total."); m->mothurOutEndLine(); } //do you have a count for this group? map::iterator itGroup = tree[childPointer->second].groupCount.find(group); //if yes, increment it - there should not be a case where we can't find it since we load group in read if (itGroup != tree[childPointer->second].groupCount.end()) { tree[childPointer->second].groupCount[group]++; } }else if (ct != NULL) { if (ct->hasGroupInfo()) { vector groupCounts = ct->getGroupCounts(seqName); vector groups = ct->getNamesOfGroups(); for (int i = 0; i < groups.size(); i++) { if (groupCounts[i] != 0) { //do you have a count for this group? 
map::iterator itGroup = tree[childPointer->second].groupCount.find(groups[i]); //if yes, increment it - there should not be a case where we can't find it since we load group in read if (itGroup != tree[childPointer->second].groupCount.end()) { tree[childPointer->second].groupCount[groups[i]] += groupCounts[i]; } } } } thisCount = ct->getNumSeqs(seqName); } tree[childPointer->second].total += thisCount; currentNode = childPointer->second; }else{ if (ignore) { tree.push_back(rawTaxNode(taxon)); int index = tree.size() - 1; tree[index].parent = currentNode; tree[index].level = (level+1); tree[currentNode].children[taxon] = index; int thisCount = 1; //initialize groupcounts if (groupmap != NULL) { vector mGroups = groupmap->getNamesOfGroups(); for (int j = 0; j < mGroups.size(); j++) { tree[index].groupCount[mGroups[j]] = 0; } //find out the sequences group string group = groupmap->getGroup(seqName); if (group == "not found") { m->mothurOut("[WARNING]: " + seqName + " is not in your groupfile, and will be included in the overall total, but not any group total."); m->mothurOutEndLine(); } //do you have a count for this group? map::iterator itGroup = tree[index].groupCount.find(group); //if yes, increment it - there should not be a case where we can't find it since we load group in read if (itGroup != tree[index].groupCount.end()) { tree[index].groupCount[group]++; } }else if (ct != NULL) { if (ct->hasGroupInfo()) { vector mGroups = ct->getNamesOfGroups(); for (int j = 0; j < mGroups.size(); j++) { tree[index].groupCount[mGroups[j]] = 0; } vector groupCounts = ct->getGroupCounts(seqName); vector groups = ct->getNamesOfGroups(); for (int i = 0; i < groups.size(); i++) { if (groupCounts[i] != 0) { //do you have a count for this group? 
map::iterator itGroup = tree[index].groupCount.find(groups[i]); //if yes, increment it - there should not be a case where we can't find it since we load group in read if (itGroup != tree[index].groupCount.end()) { tree[index].groupCount[groups[i]]+=groupCounts[i]; } } } } thisCount = ct->getNumSeqs(seqName); } tree[index].total = thisCount; currentNode = index; }else{ //otherwise, error m->mothurOut("Warning: cannot find taxon " + taxon + " in reference taxonomy tree at level " + toString(tree[currentNode].level) + " for " + seqName + ". This may cause totals of daughter levels not to add up in summary file."); m->mothurOutEndLine(); break; } } level++; if ((seqTaxonomy == "") && (level < maxLevel)) { //if you think you are done and you are not. for (int k = level; k < maxLevel; k++) { seqTaxonomy += "unclassified;"; } } } return 0; } catch(exception& e) { m->errorOut(e, "PhyloSummary", "addSeqToTree"); exit(1); } } /**************************************************************************************************/ int PhyloSummary::addSeqToTree(string seqTaxonomy, map containsGroup){ try { numSeqs++; map::iterator childPointer; int currentNode = 0; string taxon; int level = 0; //are there confidence scores, if so remove them if (seqTaxonomy.find_first_of('(') != -1) { m->removeConfidences(seqTaxonomy); } while (seqTaxonomy != "") { if (m->control_pressed) { return 0; } //somehow the parent is getting one too many accnos //use print to reassign the taxa id taxon = getNextTaxon(seqTaxonomy); childPointer = tree[currentNode].children.find(taxon); if(childPointer != tree[currentNode].children.end()){ //if the node already exists, update count and move on for (map::iterator itGroup = containsGroup.begin(); itGroup != containsGroup.end(); itGroup++) { if (itGroup->second == true) { tree[childPointer->second].groupCount[itGroup->first]++; } } tree[childPointer->second].total++; currentNode = childPointer->second; }else{ if (ignore) { tree.push_back(rawTaxNode(taxon)); 
int index = tree.size() - 1; tree[index].parent = currentNode; tree[index].level = (level+1); tree[index].total = 1; tree[currentNode].children[taxon] = index; for (map::iterator itGroup = containsGroup.begin(); itGroup != containsGroup.end(); itGroup++) { if (itGroup->second == true) { tree[index].groupCount[itGroup->first]++; } } currentNode = index; }else{ //otherwise, error m->mothurOut("Warning: cannot find taxon " + taxon + " in reference taxonomy tree at level " + toString(tree[currentNode].level) + ". This may cause totals of daughter levels not to add up in summary file."); m->mothurOutEndLine(); break; } } level++; if ((seqTaxonomy == "") && (level < maxLevel)) { //if you think you are done and you are not. for (int k = level; k < maxLevel; k++) { seqTaxonomy += "unclassified;"; } } } return 0; } catch(exception& e) { m->errorOut(e, "PhyloSummary", "addSeqToTree"); exit(1); } } /**************************************************************************************************/ void PhyloSummary::assignRank(int index){ try { map::iterator it; int counter = 1; for(it=tree[index].children.begin();it!=tree[index].children.end();it++){ tree[it->second].rank = tree[index].rank + '.' + toString(counter); counter++; assignRank(it->second); } } catch(exception& e) { m->errorOut(e, "PhyloSummary", "assignRank"); exit(1); } } /**************************************************************************************************/ void PhyloSummary::print(ofstream& out){ try { if (ignore) { assignRank(0); } vector mGroups; //print labels out << "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal"; if (groupmap != NULL) { //so the labels match the counts below, since the map sorts them automatically... 
//sort(groupmap->namesOfGroups.begin(), groupmap->namesOfGroups.end()); mGroups = groupmap->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << mGroups[i]; } }else if (ct != NULL) { if (ct->hasGroupInfo()) { mGroups = ct->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << mGroups[i]; } } } out << endl; int totalChildrenInTree = 0; map::iterator itGroup; map::iterator it; for(it=tree[0].children.begin();it!=tree[0].children.end();it++){ if (tree[it->second].total != 0) { totalChildrenInTree++; tree[0].total += tree[it->second].total; if (groupmap != NULL) { for (int i = 0; i < mGroups.size(); i++) { tree[0].groupCount[mGroups[i]] += tree[it->second].groupCount[mGroups[i]]; } }else if ( ct != NULL) { if (ct->hasGroupInfo()) { for (int i = 0; i < mGroups.size(); i++) { tree[0].groupCount[mGroups[i]] += tree[it->second].groupCount[mGroups[i]]; } } } } } //print root if (relabund) { out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out << tree[0].level << "\t" << tree[0].rank << "\t" << tree[0].name << "\t" << totalChildrenInTree << "\t" << (tree[0].total/(double) tree[0].total); if (groupmap != NULL) { for (int i = 0; i < mGroups.size(); i++) { double thisNum = tree[0].groupCount[mGroups[i]]; thisNum /= (double) groupmap->getNumSeqs(mGroups[i]); out << '\t' << thisNum; } }else if ( ct != NULL) { if (ct->hasGroupInfo()) { for (int i = 0; i < mGroups.size(); i++) { double thisNum = tree[0].groupCount[mGroups[i]]; thisNum /= (double) ct->getGroupCount(mGroups[i]); out << '\t' << thisNum; } } } out << endl; }else { out << tree[0].level << "\t" << tree[0].rank << "\t" << tree[0].name << "\t" << totalChildrenInTree << "\t" << tree[0].total; if (groupmap != NULL) { for (int i = 0; i < mGroups.size(); i++) { out << '\t'<< tree[0].groupCount[mGroups[i]]; } }else if ( ct != NULL) { if (ct->hasGroupInfo()) { for (int i = 0; i < mGroups.size(); i++) { out << '\t' << tree[0].groupCount[mGroups[i]]; } } } out << 
endl; } //print rest print(0, out); } catch(exception& e) { m->errorOut(e, "PhyloSummary", "print"); exit(1); } } /**************************************************************************************************/ void PhyloSummary::print(ofstream& out, bool relabund){ try { if (ignore) { assignRank(0); } int totalChildrenInTree = 0; map::iterator itGroup; map::iterator it; for(it=tree[0].children.begin();it!=tree[0].children.end();it++){ if (tree[it->second].total != 0) { totalChildrenInTree++; tree[0].total += tree[it->second].total; if (groupmap != NULL) { vector mGroups = groupmap->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { tree[0].groupCount[mGroups[i]] += tree[it->second].groupCount[mGroups[i]]; } }else if ( ct != NULL) { vector mGroups = ct->getNamesOfGroups(); if (ct->hasGroupInfo()) { for (int i = 0; i < mGroups.size(); i++) { tree[0].groupCount[mGroups[i]] += tree[it->second].groupCount[mGroups[i]]; } } } } } //print root out << tree[0].name << "\t" << "1.0000"; //root relative abundance is 1, everyone classifies to root /* if (groupmap != NULL) { for (int i = 0; i < mGroups.size(); i++) { out << tree[0].groupCount[mGroups[i]] << '\t'; } }else if ( ct != NULL) { if (ct->hasGroupInfo()) { for (int i = 0; i < mGroups.size(); i++) { out << tree[0].groupCount[mGroups[i]] << '\t'; } } }*/ if (groupmap != NULL) { vector mGroups = groupmap->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << "1.0000"; } }else if ( ct != NULL) { vector mGroups = ct->getNamesOfGroups(); if (ct->hasGroupInfo()) { for (int i = 0; i < mGroups.size(); i++) { out << '\t' << "1.0000"; } } } out << endl; //print rest print(0, out, relabund); } catch(exception& e) { m->errorOut(e, "PhyloSummary", "print"); exit(1); } } /**************************************************************************************************/ void PhyloSummary::print(int i, ofstream& out){ try { map::iterator it; 
for(it=tree[i].children.begin();it!=tree[i].children.end();it++){ if (tree[it->second].total != 0) { int totalChildrenInTree = 0; map::iterator it2; for(it2=tree[it->second].children.begin();it2!=tree[it->second].children.end();it2++){ if (tree[it2->second].total != 0) { totalChildrenInTree++; } } if (relabund) { out << tree[it->second].level << "\t" << tree[it->second].rank << "\t" << tree[it->second].name << "\t" << totalChildrenInTree << "\t" << (tree[it->second].total/(double) tree[0].total); }else { out << tree[it->second].level << "\t" << tree[it->second].rank << "\t" << tree[it->second].name << "\t" << totalChildrenInTree << "\t" << tree[it->second].total; } if (relabund) { map::iterator itGroup; if (groupmap != NULL) { vector mGroups = groupmap->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << (tree[it->second].groupCount[mGroups[i]]/(double)groupmap->getNumSeqs(mGroups[i])); } }else if (ct != NULL) { if (ct->hasGroupInfo()) { vector mGroups = ct->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << (tree[it->second].groupCount[mGroups[i]]/(double)ct->getGroupCount(mGroups[i])); } } } }else { map::iterator itGroup; if (groupmap != NULL) { vector mGroups = groupmap->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << tree[it->second].groupCount[mGroups[i]]; } }else if (ct != NULL) { if (ct->hasGroupInfo()) { vector mGroups = ct->getNamesOfGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << tree[it->second].groupCount[mGroups[i]]; } } } } out << endl; } print(it->second, out); } } catch(exception& e) { m->errorOut(e, "PhyloSummary", "print"); exit(1); } } /**************************************************************************************************/ void PhyloSummary::print(int i, ofstream& out, bool relabund){ try { map::iterator it; for(it=tree[i].children.begin();it!=tree[i].children.end();it++){ if (tree[it->second].total != 0) { int totalChildrenInTree = 
0; map::iterator it2; for(it2=tree[it->second].children.begin();it2!=tree[it->second].children.end();it2++){ if (tree[it2->second].total != 0) { totalChildrenInTree++; } } string nodeName = ""; int thisNode = it->second; while (tree[thisNode].rank != "0") { //while you are not at top if (m->control_pressed) { break; } nodeName = tree[thisNode].name + "|" + nodeName; thisNode = tree[thisNode].parent; } if (nodeName != "") { nodeName = nodeName.substr(0, nodeName.length()-1); } out << nodeName << "\t" << (tree[it->second].total / (float)tree[i].total); map::iterator itGroup; if (groupmap != NULL) { vector mGroups = groupmap->getNamesOfGroups(); for (int j = 0; j < mGroups.size(); j++) { if (tree[i].groupCount[mGroups[j]] == 0) { out << '\t' << 0; }else { out << '\t' << (tree[it->second].groupCount[mGroups[j]] / (float)tree[i].groupCount[mGroups[j]]); } } }else if (ct != NULL) { if (ct->hasGroupInfo()) { vector mGroups = ct->getNamesOfGroups(); for (int j = 0; j < mGroups.size(); j++) { if (tree[i].groupCount[mGroups[j]] == 0) { out << '\t' << 0 ; }else { out << '\t' << (tree[it->second].groupCount[mGroups[j]] / (float)tree[i].groupCount[mGroups[j]]); } } } } out << endl; } print(it->second, out, relabund); } } catch(exception& e) { m->errorOut(e, "PhyloSummary", "print"); exit(1); } } /**************************************************************************************************/ void PhyloSummary::readTreeStruct(ifstream& in){ try { //read version string line = m->getline(in); m->gobble(in); int num; in >> num; m->gobble(in); tree.resize(num); in >> maxLevel; m->gobble(in); //read the tree file for (int i = 0; i < tree.size(); i++) { in >> tree[i].level >> tree[i].name >> num; //num contains the number of children tree[i] has //set children string childName; int childIndex; for (int j = 0; j < num; j++) { in >> childName >> childIndex; tree[i].children[childName] = childIndex; } //initialize groupcounts if (groupmap != NULL) { for (int j = 0; j < 
(groupmap->getNamesOfGroups()).size(); j++) { tree[i].groupCount[(groupmap->getNamesOfGroups())[j]] = 0; } }else if (ct != NULL) { if (ct->hasGroupInfo()) { for (int j = 0; j < (ct->getNamesOfGroups()).size(); j++) { tree[i].groupCount[(ct->getNamesOfGroups())[j]] = 0; } } } tree[i].total = 0; m->gobble(in); //if (tree[i].level > maxLevel) { maxLevel = tree[i].level; } } } catch(exception& e) { m->errorOut(e, "PhyloSummary", "readTreeStruct"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/classifier/phylosummary.h000066400000000000000000000032551255543666200215110ustar00rootroot00000000000000#ifndef RAWTRAININGDATAMAKER_H #define RAWTRAININGDATAMAKER_H /* * rawTrainingDataMaker.h * Mothur * * Created by westcott on 4/21/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "mothurout.h" #include "groupmap.h" #include "counttable.h" /**************************************************************************************************/ struct rawTaxNode { map children; //childs name to index in tree int parent, level; string name, rank; map groupCount; int total; rawTaxNode(string n) : name(n), level(0), parent(-1), total(0) {} rawTaxNode(){} }; /**************************************************************************************************/ //doesn't use MPI ifdefs since only pid 0 uses this class class PhyloSummary { public: PhyloSummary(GroupMap*, bool); PhyloSummary(string, GroupMap*, bool); PhyloSummary(CountTable*, bool); PhyloSummary(string, CountTable*, bool); ~PhyloSummary() {} int summarize(string); //pass it a taxonomy file and a group file and it makes the tree int addSeqToTree(string, string); int addSeqToTree(string, map); void print(ofstream&); void print(ofstream&, bool); int getMaxLevel() { return maxLevel; } private: string getNextTaxon(string&); vector tree; void print(int, ofstream&); void print(int, ofstream&, bool); void 
assignRank(int); void readTreeStruct(ifstream&); GroupMap* groupmap; CountTable* ct; bool ignore, relabund; int numNodes; int numSeqs; int maxLevel; MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/classifier/phylotree.cpp000066400000000000000000000501731255543666200213070ustar00rootroot00000000000000/* * doTaxonomy.cpp * * * Created by Pat Schloss on 6/17/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. * */ #include "phylotree.h" /**************************************************************************************************/ PhyloTree::PhyloTree(){ try { m = MothurOut::getInstance(); numNodes = 1; numSeqs = 0; tree.push_back(TaxNode("Root")); tree[0].heirarchyID = "0"; maxLevel = 0; calcTotals = true; addSeqToTree("unknown", "unknown;"); } catch(exception& e) { m->errorOut(e, "PhyloTree", "PhyloTree"); exit(1); } } /**************************************************************************************************/ PhyloTree::PhyloTree(ifstream& in, string filename){ try { m = MothurOut::getInstance(); calcTotals = false; numNodes = 0; numSeqs = 0; #ifdef USE_MPI MPI_File inMPI; MPI_Offset size; MPI_Status status; char inFileName[1024]; strcpy(inFileName, filename.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); MPI_File_get_size(inMPI, &size); char* buffer = new char[size]; MPI_File_read(inMPI, buffer, size, MPI_CHAR, &status); string tempBuf = buffer; if (tempBuf.length() > size) { tempBuf = tempBuf.substr(0, size); } istringstream iss (tempBuf,istringstream::in); delete[] buffer; //read version m->getline(iss); m->gobble(iss); iss >> numNodes; m->gobble(iss); tree.resize(numNodes); for (int i = 0; i < tree.size(); i++) { iss >> tree[i].name >> tree[i].level >> tree[i].parent; m->gobble(iss); } //read genus nodes int numGenus = 0; iss >> numGenus; m->gobble(iss); int gnode, gsize; totals.clear(); for (int i = 
0; i < numGenus; i++) { iss >> gnode >> gsize; m->gobble(iss); uniqueTaxonomies.insert(gnode); totals.push_back(gsize); } MPI_File_close(&inMPI); #else //read version string line = m->getline(in); m->gobble(in); in >> numNodes; m->gobble(in); tree.resize(numNodes); for (int i = 0; i < tree.size(); i++) { in >> tree[i].name >> tree[i].level >> tree[i].parent; m->gobble(in); } //read genus nodes int numGenus = 0; in >> numGenus; m->gobble(in); int gnode, gsize; totals.clear(); for (int i = 0; i < numGenus; i++) { in >> gnode >> gsize; m->gobble(in); uniqueTaxonomies.insert(gnode); totals.push_back(gsize); } in.close(); #endif } catch(exception& e) { m->errorOut(e, "PhyloTree", "PhyloTree"); exit(1); } } /**************************************************************************************************/ PhyloTree::PhyloTree(string tfile){ try { m = MothurOut::getInstance(); numNodes = 1; numSeqs = 0; tree.push_back(TaxNode("Root")); tree[0].heirarchyID = "0"; maxLevel = 0; calcTotals = true; string name, tax; #ifdef USE_MPI int pid, num, processors; vector positions; MPI_Status status; MPI_File inMPI; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); char inFileName[1024]; strcpy(inFileName, tfile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer if (pid == 0) { positions = m->setFilePosEachLine(tfile, num); //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&num, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&positions[0], (num+1), MPI_LONG, i, 2001, MPI_COMM_WORLD); } }else{ MPI_Recv(&num, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); positions.resize(num+1); MPI_Recv(&positions[0], (num+1), MPI_LONG, 0, 2001, MPI_COMM_WORLD, &status); } //read file for(int i=0;i length) { tempBuf = tempBuf.substr(0, length); } delete buf4; istringstream iss (tempBuf,istringstream::in); iss >> name >> tax; 
addSeqToTree(name, tax); } MPI_File_close(&inMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else map temp; m->readTax(tfile, temp, true); for (map::iterator itTemp = temp.begin(); itTemp != temp.end();) { addSeqToTree(itTemp->first, itTemp->second); temp.erase(itTemp++); } #endif assignHeirarchyIDs(0); string unknownTax = "unknown;"; //add the last taxon until you get the desired level for (int i = 1; i < maxLevel; i++) { unknownTax += "unclassified;"; } addSeqToTree("unknown", unknownTax); //create file for summary if needed setUp(tfile); } catch(exception& e) { m->errorOut(e, "PhyloTree", "PhyloTree"); exit(1); } } /**************************************************************************************************/ string PhyloTree::getNextTaxon(string& heirarchy, string seqname){ try { string currentLevel = ""; if(heirarchy != ""){ int pos = heirarchy.find_first_of(';'); if (pos == -1) { //you can't find another ; currentLevel = heirarchy; heirarchy = ""; m->mothurOut(seqname + " is missing a ;, please check for other errors."); m->mothurOutEndLine(); }else{ currentLevel=heirarchy.substr(0,pos); if (pos != (heirarchy.length()-1)) { heirarchy=heirarchy.substr(pos+1); } else { heirarchy = ""; } } } return currentLevel; } catch(exception& e) { m->errorOut(e, "PhyloTree", "getNextTaxon"); exit(1); } } /**************************************************************************************************/ vector PhyloTree::getSeqs(string seqTaxonomy){ try { string taxCopy = seqTaxonomy; vector names; map::iterator childPointer; int currentNode = 0; m->removeConfidences(seqTaxonomy); string taxon; while(seqTaxonomy != ""){ if (m->control_pressed) { return names; } taxon = getNextTaxon(seqTaxonomy, ""); if (m->debug) { m->mothurOut(taxon +'\n'); } if (taxon == "") { m->mothurOut(taxCopy + " has an error in the taxonomy. 
This may be due to a ;;"); m->mothurOutEndLine(); break; } childPointer = tree[currentNode].children.find(taxon); if(childPointer != tree[currentNode].children.end()){ //if the node already exists, move on currentNode = childPointer->second; } else{ //otherwise, error this taxonomy is not in tree m->mothurOut("[ERROR]: " + taxCopy + " is not in taxonomy tree, please correct."); m->mothurOutEndLine(); m->control_pressed = true; return names; } if (seqTaxonomy == "") { names = tree[currentNode].accessions; } } return names; } catch(exception& e) { m->errorOut(e, "PhyloTree", "getSeqs"); exit(1); } } /**************************************************************************************************/ int PhyloTree::addSeqToTree(string seqName, string seqTaxonomy){ try { numSeqs++; map::iterator childPointer; int currentNode = 0; int level = 1; tree[0].accessions.push_back(seqName); m->removeConfidences(seqTaxonomy); string taxon;// = getNextTaxon(seqTaxonomy); while(seqTaxonomy != ""){ level++; if (m->control_pressed) { return 0; } //somehow the parent is getting one too many accnos //use print to reassign the taxa id taxon = getNextTaxon(seqTaxonomy, seqName); if (m->debug) { m->mothurOut(seqName +'\t' + taxon +'\n'); } if (taxon == "") { m->mothurOut(seqName + " has an error in the taxonomy. 
This may be due to a ;;"); m->mothurOutEndLine(); if (currentNode != 0) { uniqueTaxonomies.insert(currentNode); } break; } childPointer = tree[currentNode].children.find(taxon); if(childPointer != tree[currentNode].children.end()){ //if the node already exists, move on currentNode = childPointer->second; tree[currentNode].accessions.push_back(seqName); name2Taxonomy[seqName] = currentNode; } else{ //otherwise, create it tree.push_back(TaxNode(taxon)); numNodes++; tree[currentNode].children[taxon] = numNodes-1; tree[numNodes-1].parent = currentNode; currentNode = tree[currentNode].children[taxon]; tree[currentNode].accessions.push_back(seqName); name2Taxonomy[seqName] = currentNode; } if (seqTaxonomy == "") { uniqueTaxonomies.insert(currentNode); } } return 0; } catch(exception& e) { m->errorOut(e, "PhyloTree", "addSeqToTree"); exit(1); } } /**************************************************************************************************/ vector PhyloTree::getGenusNodes() { try { genusIndex.clear(); //generate genusIndexes set::iterator it2; map temp; for (it2=uniqueTaxonomies.begin(); it2!=uniqueTaxonomies.end(); it2++) { genusIndex.push_back(*it2); temp[*it2] = genusIndex.size()-1; } for (map::iterator itName = name2Taxonomy.begin(); itName != name2Taxonomy.end(); itName++) { map::iterator itTemp = temp.find(itName->second); if (itTemp != temp.end()) { name2GenusNodeIndex[itName->first] = itTemp->second; } else { m->mothurOut("[ERROR]: trouble making name2GenusNodeIndex, aborting.\n"); m->control_pressed = true; } } return genusIndex; } catch(exception& e) { m->errorOut(e, "PhyloTree", "getGenusNodes"); exit(1); } } /**************************************************************************************************/ vector PhyloTree::getGenusTotals() { try { if (calcTotals) { totals.clear(); //reset counts because we are on a new word for (int j = 0; j < genusIndex.size(); j++) { totals.push_back(tree[genusIndex[j]].accessions.size()); } return totals; }else{ 
return totals; } } catch(exception& e) { m->errorOut(e, "PhyloTree", "getGenusTotals"); exit(1); } } /**************************************************************************************************/ void PhyloTree::assignHeirarchyIDs(int index){ try { map::iterator it; int counter = 1; for(it=tree[index].children.begin();it!=tree[index].children.end();it++){ if (m->debug) { m->mothurOut(toString(index) +'\t' + tree[it->second].name +'\n'); } tree[it->second].heirarchyID = tree[index].heirarchyID + '.' + toString(counter); counter++; tree[it->second].level = tree[index].level + 1; //save maxLevel for binning the unclassified seqs if (tree[it->second].level > maxLevel) { maxLevel = tree[it->second].level; } assignHeirarchyIDs(it->second); } } catch(exception& e) { m->errorOut(e, "PhyloTree", "assignHeirarchyIDs"); exit(1); } } /**************************************************************************************************/ void PhyloTree::setUp(string tfile){ try{ string taxFileNameTest = tfile.substr(0,tfile.find_last_of(".")+1) + "tree.sum"; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { binUnclassified(taxFileNameTest); } #else binUnclassified(taxFileNameTest); #endif } catch(exception& e) { m->errorOut(e, "PhyloTree", "setUp"); exit(1); } } /**************************************************************************************************/ void PhyloTree::binUnclassified(string file){ try { ofstream out; m->openOutputFile(file, out); map::iterator itBin; map::iterator childPointer; vector copy = tree; //fill out tree fillOutTree(0, copy); //get leaf nodes that may need extension for (int i = 0; i < copy.size(); i++) { if (copy[i].children.size() == 0) { leafNodes[i] = i; } } if (m->debug) { m->mothurOut("maxLevel = " + toString(maxLevel) +'\n'); } int copyNodes = copy.size(); //go through the seqs and if a sequence's finest taxon is not the same level as the most finely defined taxon then classify it as 
unclassified where necessary map::iterator itLeaf; for (itLeaf = leafNodes.begin(); itLeaf != leafNodes.end(); itLeaf++) { if (m->control_pressed) { out.close(); break; } int level = copy[itLeaf->second].level; int currentNode = itLeaf->second; if (m->debug) { m->mothurOut(copy[currentNode].name +'\n'); } //this sequence is unclassified at some levels while(level < maxLevel){ level++; if (m->debug) { m->mothurOut("level = " + toString(level) +'\n'); } string taxon = "unclassified"; //does the parent have a child named 'unclassified'? childPointer = copy[currentNode].children.find(taxon); if(childPointer != copy[currentNode].children.end()){ //if the node already exists, move on currentNode = childPointer->second; //currentNode becomes 'unclassified' } else{ //otherwise, create it copy.push_back(TaxNode(taxon)); copyNodes++; copy[currentNode].children[taxon] = copyNodes-1; copy[copyNodes-1].parent = currentNode; copy[copyNodes-1].level = copy[currentNode].level + 1; currentNode = copy[currentNode].children[taxon]; } } } if (!m->control_pressed) { //print copy tree print(out, copy); } } catch(exception& e) { m->errorOut(e, "PhyloTree", "binUnclassified"); exit(1); } } /**************************************************************************************************/ void PhyloTree::fillOutTree(int index, vector& copy) { try { map::iterator it; it = copy[index].children.find("unclassified"); if (it == copy[index].children.end()) { //no unclassified at this level string taxon = "unclassified"; copy.push_back(TaxNode(taxon)); copy[index].children[taxon] = copy.size()-1; copy[copy.size()-1].parent = index; copy[copy.size()-1].level = copy[index].level + 1; } if (tree[index].level < maxLevel) { for(it=tree[index].children.begin();it!=tree[index].children.end();it++){ //check your children fillOutTree(it->second, copy); } } } catch(exception& e) { m->errorOut(e, "PhyloTree", "fillOutTree"); exit(1); } } 
/**************************************************************************************************/ string PhyloTree::getFullTaxonomy(string seqName) { try { string tax = ""; int currentNode = name2Taxonomy[seqName]; while (tree[currentNode].parent != -1) { tax = tree[currentNode].name + ";" + tax; currentNode = tree[currentNode].parent; } return tax; } catch(exception& e) { m->errorOut(e, "PhyloTree", "getFullTaxonomy"); exit(1); } } /**************************************************************************************************/ void PhyloTree::print(ofstream& out, vector& copy){ try { //output mothur version out << "#" << m->getVersion() << endl; out << copy.size() << endl; out << maxLevel << endl; for (int i = 0; i < copy.size(); i++) { out << copy[i].level << '\t'<< copy[i].name << '\t' << copy[i].children.size(); map::iterator it; for(it=copy[i].children.begin();it!=copy[i].children.end();it++){ out << '\t' << it->first << '\t' << it->second; } out << endl; } out.close(); } catch(exception& e) { m->errorOut(e, "PhyloTree", "print"); exit(1); } } /**************************************************************************************************/ void PhyloTree::printTreeNodes(string treefilename) { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif ofstream outTree; m->openOutputFile(treefilename, outTree); //output mothur version outTree << "#" << m->getVersion() << endl; //print treenodes outTree << tree.size() << endl; for (int i = 0; i < tree.size(); i++) { outTree << tree[i].name << '\t' << tree[i].level << '\t' << tree[i].parent << endl; } //print genus nodes outTree << endl << uniqueTaxonomies.size() << endl; set::iterator it2; for (it2=uniqueTaxonomies.begin(); it2!=uniqueTaxonomies.end(); it2++) { outTree << *it2 << '\t' << tree[*it2].accessions.size() << endl; } outTree << endl; outTree.close(); #ifdef USE_MPI } #endif } catch(exception& e) { m->errorOut(e, "PhyloTree", 
"printTreeNodes"); exit(1); } } /**************************************************************************************************/ TaxNode PhyloTree::get(int i ){ try { if (i < tree.size()) { return tree[i]; } else { cout << i << '\t' << tree.size() << endl ; m->mothurOut("Mismatch with taxonomy and template files. Cannot continue."); m->mothurOutEndLine(); exit(1); } } catch(exception& e) { m->errorOut(e, "PhyloTree", "get"); exit(1); } } /**************************************************************************************************/ TaxNode PhyloTree::get(string seqName){ try { map::iterator itFind = name2Taxonomy.find(seqName); if (itFind != name2Taxonomy.end()) { return tree[name2Taxonomy[seqName]]; } else { m->mothurOut("Cannot find " + seqName + ". Mismatch with taxonomy and template files. Cannot continue."); m->mothurOutEndLine(); exit(1);} } catch(exception& e) { m->errorOut(e, "PhyloTree", "get"); exit(1); } } /**************************************************************************************************/ string PhyloTree::getName(int i ){ try { if (i < tree.size()) { return tree[i].name; } else { m->mothurOut("Mismatch with taxonomy and template files. Cannot continue."); m->mothurOutEndLine(); exit(1); } } catch(exception& e) { m->errorOut(e, "PhyloTree", "getName"); exit(1); } } /**************************************************************************************************/ int PhyloTree::getGenusIndex(string seqName){ try { map::iterator itFind = name2GenusNodeIndex.find(seqName); if (itFind != name2GenusNodeIndex.end()) { return itFind->second; } else { m->mothurOut("Cannot find " + seqName + ". Could be a mismatch with taxonomy and template files. 
Cannot continue."); m->mothurOutEndLine(); exit(1);} } catch(exception& e) { m->errorOut(e, "PhyloTree", "getGenusIndex"); exit(1); } } /**************************************************************************************************/ bool PhyloTree::ErrorCheck(vector templateFileNames){ try { bool okay = true; templateFileNames.push_back("unknown"); map::iterator itFind; map taxonomyFileNames = name2Taxonomy; if (m->debug) { m->mothurOut("[DEBUG]: in error check. Numseqs in template = " + toString(templateFileNames.size()) + ". Numseqs in taxonomy = " + toString(taxonomyFileNames.size()) + ".\n"); } for (int i = 0; i < templateFileNames.size(); i++) { itFind = taxonomyFileNames.find(templateFileNames[i]); if (itFind != taxonomyFileNames.end()) { //found it so erase it taxonomyFileNames.erase(itFind); }else { m->mothurOut("'" +templateFileNames[i] + "' is in your template file and is not in your taxonomy file. Please correct."); m->mothurOutEndLine(); okay = false; } //templateFileNames.erase(templateFileNames.begin()+i); //i--; } templateFileNames.clear(); if (taxonomyFileNames.size() > 0) { //there are names in tax file that are not in template okay = false; for (itFind = taxonomyFileNames.begin(); itFind != taxonomyFileNames.end(); itFind++) { m->mothurOut(itFind->first + " is in your taxonomy file and is not in your template file. Please correct."); m->mothurOutEndLine(); } } return okay; } catch(exception& e) { m->errorOut(e, "PhyloTree", "ErrorCheck"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/classifier/phylotree.h000066400000000000000000000050601255543666200207470ustar00rootroot00000000000000#ifndef DOTAXONOMY_H #define DOTAXONOMY_H /* * phylotree.h * * * Created by Pat Schloss on 6/17/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "mothur.h" #include "mothurout.h" /**************************************************************************************************/ struct TaxNode { vector accessions; //names of seqs in this branch of tree map children; //childs name to index in tree int parent, childNumber, level; string name, heirarchyID; TaxNode(string n) : name(n), level(0), parent(-1) { } TaxNode(){} }; /**************************************************************************************************/ class PhyloTree { public: PhyloTree(); PhyloTree(string); //pass it a taxonomy file and it makes the tree PhyloTree(ifstream&, string); //pass it a taxonomy file and it makes the train.tree ~PhyloTree() {}; int addSeqToTree(string, string); void assignHeirarchyIDs(int); void printTreeNodes(string); //used by bayesian to save time vector getGenusNodes(); vector getGenusTotals(); void setUp(string); //used to create file needed for summary file if you use () constructor and add seqs manually instead of passing taxonomyfile TaxNode get(int i); TaxNode get(string seqName); string getName(int i); int getGenusIndex(string seqName); string getFullTaxonomy(string); //pass a sequence name return taxonomy vector getSeqs(string); //returns names of sequences in given taxonomy int getMaxLevel() { return maxLevel; } int getNumSeqs() { return numSeqs; } int getNumNodes() { return tree.size(); } bool ErrorCheck(vector); private: string getNextTaxon(string&, string); void print(ofstream&, vector&); //used to create static reference taxonomy file void fillOutTree(int, vector&); //used to create static reference taxonomy file void binUnclassified(string); vector tree; vector genusIndex; //holds the indexes in tree where the genus level taxonomies are stored vector totals; //holds the numSeqs at each genus level taxonomy map name2Taxonomy; //maps name to index in tree map name2GenusNodeIndex; set uniqueTaxonomies; //map of unique taxonomies map leafNodes; //used to create static reference 
taxonomy file //void print(int, ofstream&); int numNodes; int numSeqs; int maxLevel; bool calcTotals; MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/classifier/taxonomyequalizer.cpp000066400000000000000000000102601255543666200230650ustar00rootroot00000000000000/* * taxonomyequalizer.cpp * Mothur * * Created by westcott on 11/20/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "taxonomyequalizer.h" /**************************************************************************************************/ TaxEqualizer::TaxEqualizer(string tfile, int c, string o) : cutoff(c), outputDir(o) { try { m = MothurOut::getInstance(); containsConfidence = false; ifstream inTax; m->openInputFile(tfile, inTax); highestLevel = getHighestLevel(inTax); if (!m->control_pressed) { //if the user has specified a cutoff and it's smaller than the highest level if ((cutoff != -1) && (cutoff < highestLevel)) { highestLevel = cutoff; }else if (cutoff > highestLevel) { m->mothurOut("The highest level taxonomy you have is " + toString(highestLevel) + " and your cutoff is " + toString(cutoff) + ". I will set the cutoff to " + toString(highestLevel)); m->mothurOutEndLine(); } inTax.close(); ifstream in; m->openInputFile(tfile, in); ofstream out; equalizedFile = outputDir + m->getRootName(m->getSimpleName(tfile)) + "equalized.taxonomy"; m->openOutputFile(equalizedFile, out); string name, tax; while (in) { if (m->control_pressed) { break; } in >> name >> tax; m->gobble(in); if (containsConfidence) { m->removeConfidences(tax); } //is this a taxonomy that needs to be extended? 
if (seqLevels[name] < highestLevel) { extendTaxonomy(name, tax, highestLevel); }else if (seqLevels[name] > highestLevel) { //this can happen if the user enters a cutoff truncateTaxonomy(name, tax, highestLevel); } out << name << '\t' << tax << endl; } in.close(); out.close(); if (m->control_pressed) { m->mothurRemove(equalizedFile); } }else { inTax.close(); } } catch(exception& e) { m->errorOut(e, "TaxEqualizer", "TaxEqualizer"); exit(1); } } /**************************************************************************************************/ int TaxEqualizer::getHighestLevel(ifstream& in) { try { int level = 0; string name, tax; while (in) { in >> name >> tax; m->gobble(in); //count levels in this taxonomy int thisLevel = 0; for (int i = 0; i < tax.length(); i++) { if (tax[i] == ';') { thisLevel++; } } //save sequences level seqLevels[name] = thisLevel; //is this the longest taxonomy? if (thisLevel > level) { level = thisLevel; testTax = tax; //testTax is used to figure out if this file has confidences we need to strip out } } int pos = testTax.find_first_of('('); //if there are '(' then there are confidences we need to take out if (pos != -1) { containsConfidence = true; } return level; } catch(exception& e) { m->errorOut(e, "TaxEqualizer", "getHighestLevel"); exit(1); } } /**************************************************************************************************/ void TaxEqualizer::extendTaxonomy(string name, string& tax, int desiredLevel) { try { //get last taxon tax = tax.substr(0, tax.length()-1); //take off final ";" int pos = tax.find_last_of(';'); string lastTaxon = tax.substr(pos+1); lastTaxon += ";"; //add back on delimiting char tax += ";"; int currentLevel = seqLevels[name]; //added last taxon until you get desired level for (int i = currentLevel; i < desiredLevel; i++) { tax += lastTaxon; } } catch(exception& e) { m->errorOut(e, "TaxEqualizer", "extendTaxonomy"); exit(1); } } 
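/*
 * Illustrative sketch, not part of mothur: the padding step performed by
 * extendTaxonomy() above, written as free functions so it can be tried in
 * isolation. The names countLevels and extendTaxonomySketch are hypothetical,
 * and the sketch assumes every taxonomy string ends with ';'.
 */

```cpp
#include <cassert>
#include <string>

// Count taxonomic levels as the number of ';' delimiters,
// mirroring the counting loop in getHighestLevel().
int countLevels(const std::string& tax) {
    int levels = 0;
    for (std::string::size_type i = 0; i < tax.length(); i++) {
        if (tax[i] == ';') { levels++; }
    }
    return levels;
}

// Repeat the last taxon until the string has desiredLevel levels.
// Strings that are already long enough (or empty) are returned unchanged.
std::string extendTaxonomySketch(std::string tax, int desiredLevel) {
    int currentLevel = countLevels(tax);
    if (currentLevel == 0 || currentLevel >= desiredLevel) { return tax; }

    std::string body = tax.substr(0, tax.length() - 1);   // drop the final ';'
    std::string::size_type pos = body.find_last_of(';');
    std::string lastTaxon = (pos == std::string::npos) ? body : body.substr(pos + 1);
    lastTaxon += ";";                                      // add the delimiter back

    for (int i = currentLevel; i < desiredLevel; i++) { tax += lastTaxon; }
    return tax;
}
```

/*
 * Unlike the member function, the sketch recounts the level from the string
 * instead of reading seqLevels[name], which keeps it free of class state.
 */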
/**************************************************************************************************/ void TaxEqualizer::truncateTaxonomy(string name, string& tax, int desiredLevel) { try { int currentLevel = seqLevels[name]; tax = tax.substr(0, tax.length()-1); //take off final ";" //remove a taxon until you get to desired level for (int i = currentLevel; i > desiredLevel; i--) { tax = tax.substr(0, tax.find_last_of(';')); } tax += ";"; } catch(exception& e) { m->errorOut(e, "TaxEqualizer", "truncateTaxonomy"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/classifier/taxonomyequalizer.h000066400000000000000000000024241255543666200225350ustar00rootroot00000000000000#ifndef TAXONOMYEQUALIZER_H #define TAXONOMYEQUALIZER_H /* * taxonomyequalizer.h * Mothur * * Created by westcott on 11/20/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "mothurout.h" //reads in taxonomy file and makes all the taxonomies the same length //by appending the last taxon to a given taxonomy as many times as needed to //make it as long as the longest taxonomy in the file /**************************************************************************************************/ class TaxEqualizer { public: TaxEqualizer(string, int, string); ~TaxEqualizer() {}; string getEqualizedTaxFile() { return equalizedFile; } int getHighestLevel() { return highestLevel; } private: string equalizedFile, testTax, outputDir; bool containsConfidence; int cutoff, highestLevel; map<string, int> seqLevels; //maps name to level of taxonomy int getHighestLevel(ifstream&); //scans taxonomy file to find taxonomy with highest level void extendTaxonomy(string, string&, int); //name, taxonomy, desired level void truncateTaxonomy(string, string&, int); //name, taxonomy, desired level MothurOut* m; }; /**************************************************************************************************/ #endif
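/*
 * Illustrative sketch, not part of mothur: the trimming loop used by
 * truncateTaxonomy(), as a free function. truncateTaxonomySketch is a
 * hypothetical name; the caller passes the current level explicitly, where
 * the class would look it up in seqLevels. The sketch assumes the taxonomy
 * string ends with ';'.
 */

```cpp
#include <cassert>
#include <string>

// Drop trailing taxa until only desiredLevel levels remain.
std::string truncateTaxonomySketch(std::string tax, int currentLevel, int desiredLevel) {
    if (tax.empty() || currentLevel <= desiredLevel) { return tax; }

    tax = tax.substr(0, tax.length() - 1);            // take off the final ';'
    for (int i = currentLevel; i > desiredLevel; i--) {
        // cut at the last remaining delimiter, removing one taxon per pass
        tax = tax.substr(0, tax.find_last_of(';'));
    }
    tax += ";";                                       // restore the closing delimiter
    return tax;
}
```

/*
 * The early return is an addition of the sketch; the member function is only
 * called when the sequence's level exceeds the requested one.
 */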
mothur-1.36.1/source/classifier/taxonomynode.cpp000066400000000000000000000047511255543666200220210ustar00rootroot00000000000000/* * taxonomynode.cpp * * * Created by Pat Schloss on 7/8/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. * */ /**************************************************************************************************/ #include "taxonomynode.h" /**************************************************************************************************/ TaxonomyNode::TaxonomyNode(string n, int l): name(n), level(l){ m = MothurOut::getInstance(); parent = -1; numChildren = 0; numSeqs = 0; } /**************************************************************************************************/ void TaxonomyNode::setName(string n) { name = n; } /**************************************************************************************************/ string TaxonomyNode::getName() { return name; } /**************************************************************************************************/ void TaxonomyNode::setParent(int p) { parent = p; } /**************************************************************************************************/ int TaxonomyNode::getParent() { return parent; } /**************************************************************************************************/ void TaxonomyNode::makeChild(string c, int i) { children[c] = i; } /**************************************************************************************************/ map<string, int> TaxonomyNode::getChildren() { return children; } /**************************************************************************************************/ int TaxonomyNode::getChildIndex(string c){ map<string, int>::iterator it = children.find(c); if(it != children.end()) { return it->second; } else { return -1; } } /**************************************************************************************************/ int TaxonomyNode::getNumKids() { return (int)children.size(); }
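/*
 * Illustrative sketch, not part of mothur: the child lookup that
 * makeChild() and getChildIndex() implement above, reduced to a small
 * stand-alone struct. NodeSketch is a hypothetical name. The point it shows
 * is that find() reports a missing child as -1 without inserting a new
 * entry, whereas operator[] would create one.
 */

```cpp
#include <cassert>
#include <map>
#include <string>

struct NodeSketch {
    std::map<std::string, int> children;   // child name -> index in the tree vector

    void makeChild(const std::string& name, int index) { children[name] = index; }

    // Returns the child's index, or -1 when no child has that name.
    int getChildIndex(const std::string& name) const {
        std::map<std::string, int>::const_iterator it = children.find(name);
        return (it != children.end()) ? it->second : -1;
    }

    int getNumKids() const { return (int)children.size(); }
};
```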
/**************************************************************************************************/ int TaxonomyNode::getNumSeqs() { return numSeqs; } /**************************************************************************************************/ void TaxonomyNode::setTotalSeqs(int n) { totalSeqs = n; } /**************************************************************************************************/ int TaxonomyNode::getLevel() { return level; } /**************************************************************************************************/ mothur-1.36.1/source/classifier/taxonomynode.h000066400000000000000000000017611255543666200214640ustar00rootroot00000000000000#ifndef TAXONOMYNODE #define TAXONOMYNODE /* * taxonomynode.h * * * Created by Pat Schloss on 7/8/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. * */ /**************************************************************************************************/ #include "mothurout.h" /**************************************************************************************************/ class TaxonomyNode { public: TaxonomyNode(); TaxonomyNode(string, int); void setName(string); string getName(); void setParent(int); int getParent(); void makeChild(string, int); map<string, int> getChildren(); int getChildIndex(string); int getNumKids(); int getNumSeqs(); void setTotalSeqs(int); int getLevel(); private: int parent; map<string, int> children; int numChildren; int level; protected: MothurOut* m; int numSeqs; int totalSeqs; string name; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/clearcut/000077500000000000000000000000001255543666200162405ustar00rootroot00000000000000mothur-1.36.1/source/clearcut/clearcut.cpp000066400000000000000000001433451255543666200205600ustar00rootroot00000000000000 /* * clearcut.c * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All
rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * An implementation of the Relaxed Neighbor-Joining algorithm * of Evans, J., Sheneman, L., and Foster, J. * * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #include #include #include #include #include #include #include #include #include "distclearcut.h" #include "dmat.h" #include "fasta.h" #include "cmdargs.h" #include "common.h" #include "clearcut.h" #include "prng.h" /* * main() - * * The entry point to the program. 
* */ int clearcut_main(int argc, char *argv[]) { DMAT *dmat; /* The working distance matrix */ DMAT *dmat_backup = NULL;/* A backup distance matrix */ NJ_TREE *tree; /* The phylogenetic tree */ NJ_ARGS *nj_args; /* Structure for holding command-line arguments */ long int i; /* some variables for tracking time */ struct timeval tv; unsigned long long startUs, endUs; /* check and parse supplied command-line arguments */ nj_args = NJ_handle_args(argc, argv); if(!nj_args) { fprintf(stderr, "Clearcut: Error processing command-line arguments.\n"); exit(-1); } /* for verbose reporting, print the random number seed to stdout */ if(nj_args->verbose_flag) { printf("PRNG SEED: %d\n", nj_args->seed); } /* Initialize Mersenne Twister PRNG */ init_genrand(nj_args->seed); switch(nj_args->input_mode) { /* If the input type is a distance matrix */ case NJ_INPUT_MODE_DISTANCE: /* parse the distance matrix */ dmat = NJ_parse_distance_matrix(nj_args); if(!dmat) { exit(-1); } break; /* If the input type is a multiple sequence alignment */ case NJ_INPUT_MODE_ALIGNED_SEQUENCES: /* build a distance matrix from a multiple sequence alignment */ dmat = NJ_build_distance_matrix(nj_args); if(!dmat) { fprintf(stderr, "Clearcut: Failed to build distance matrix from alignment.\n"); exit(-1); } break; default: fprintf(stderr, "Clearcut: Could not determine how to process input\n"); exit(-1); } /* * Output the computed distance matrix, * if the user specified one. */ if(nj_args->matrixout) { NJ_output_matrix(nj_args, dmat); } /* * If we are going to generate multiple trees from * the same distance matrix, we need to make a backup * of the original distance matrix. 
*/ if(nj_args->ntrees > 1) { dmat_backup = NJ_dup_dmat(dmat); } /* process n trees */ for(i=0;i<nj_args->ntrees;i++) { /* * If the user has specified matrix shuffling, we need * to randomize the distance matrix */ if(nj_args->shuffle) { NJ_shuffle_distance_matrix(dmat); } /* RECORD THE PRECISE TIME OF THE START OF THE NEIGHBOR-JOINING */ gettimeofday(&tv, NULL); startUs = ((unsigned long long) tv.tv_sec * 1000000ULL) + ((unsigned long long) tv.tv_usec); /* * Invoke either the Relaxed Neighbor-Joining algorithm (default) * or the "traditional" Neighbor-Joining algorithm */ if(nj_args->neighbor) { tree = NJ_neighbor_joining(nj_args, dmat); } else { tree = NJ_relaxed_nj(nj_args, dmat); } if(!tree) { fprintf(stderr, "Clearcut: Failed to construct tree.\n"); exit(-1); } /* RECORD THE PRECISE TIME OF THE END OF THE NEIGHBOR-JOINING */ gettimeofday(&tv, NULL); endUs = ((unsigned long long) tv.tv_sec * 1000000ULL) + ((unsigned long long) tv.tv_usec); /* print the time taken to perform the neighbor join */ if(nj_args->verbose_flag) { if(nj_args->neighbor) { fprintf(stderr, "NJ tree built in %llu.%06llu secs\n", (endUs - startUs) / 1000000ULL, (endUs - startUs) % 1000000ULL); } else { fprintf(stderr, "RNJ tree built in %llu.%06llu secs\n", (endUs - startUs) / 1000000ULL, (endUs - startUs) % 1000000ULL); } } /* Output the neighbor joining tree here */ NJ_output_tree(nj_args, tree, dmat, i); NJ_free_tree(tree); /* Free the tree */ NJ_free_dmat(dmat); /* Free the working distance matrix */ /* * If we need to do another iteration, let's re-initialize * our working distance matrix.
*/ if(nj_args->ntrees > 1 && i<(nj_args->ntrees-1) ) { dmat = NJ_dup_dmat(dmat_backup); } } /* Free the backup distance matrix */ if(nj_args->ntrees > 1) { NJ_free_dmat(dmat_backup); } /* If verbose, describe where the tree output is */ if(nj_args->verbose_flag) { if(nj_args->neighbor) { printf("NJ tree(s) in %s\n", nj_args->outfilename); } else { printf("Relaxed NJ tree(s) in %s\n", nj_args->outfilename); } } return 0; } /* * NJ_find_hmin() - Find minimum transformed values along horizontal * * * INPUTS: * ------- * dmat -- The distance matrix * a -- The index of the specific taxon in the distance matrix * * RETURNS: * -------- * float -- The value of the selected minimum * min -- Used to transport the index of the minimum out * of the function (by reference) * hmincount -- Return the number of minima along the horizontal * (by reference) * * * DESCRIPTION: * ------------ * * A fast, inline function to find the smallest transformed value * along the "horizontal" portion of an entry in a distance matrix. * * Distance matrices are stored internally as contiguously allocated * upper-diagonal structures. With the exception of the taxa at * row 0 of this upper-diagonal matrix, all taxa have both a horizontal * and vertical component in the distance matrix. This function * scans the horizontal portion of the entry in the distance matrix * for the specified taxon and finds the minimum transformed value * along that horizontal component. * * Since multiple minima can exist along the horizontal portion * of the entry, I consider all minima and break ties * stochastically to help avoid systematic bias. * * Just searching along the horizontal portion of a row is very fast * since the data is stored linearly and contiguously in memory and * cache locality is exploited in the distance matrix representation. * * Look at nj.h for more information on how the distance matrix * is architected.
* */ static inline float NJ_find_hmin(DMAT *dmat, long int a, long int *min, long int *hmincount) { long int i; /* index variable for looping */ int size; /* current size of distance matrix */ int mindex = 0; /* holds the current index to the chosen minimum */ float curval; /* used to hold current transformed values */ float hmin; /* the value of the transformed minimum */ float *ptr, *r2, *val; /* pointers used to reduce dereferencing in inner loop */ /* values used for stochastic selection among multiple minima */ float p, x; long int smallcnt; /* initialize the min to something large */ hmin = (float)HUGE_VAL; /* setup some pointers to limit dereferencing later */ r2 = dmat->r2; val = dmat->val; size = dmat->size; /* initialize values associated with minima tie breaking */ p = 1.0; smallcnt = 0; ptr = &(val[NJ_MAP(a, a+1, size)]); /* at the start of the horiz. part */ for(i=a+1;i -- The value of the selected minimum * min -- Used to transport the index of the minima out * of the function (by reference) * vmincount -- The number of minima along the vertical * return by reference. * * DESCRIPTION: * ------------ * * A fast, inline function to find the smallest transformed value * along the "vertical" portion of an entry in a distance matrix. * * Distance matrices are stored internally as continguously-allocated * upper-diagonal matrices. With the exception of the taxa at * row 0 of this upper-diagonal matrix, all taxa have both a horizontal * and vertical component in the distance matrix. This function * scans the vertical portion of the entry in the distance matrix * for the specified taxon and finds the minimum transformed value * along that vertical component. * * Since multiple minima can exist along the vertical portion * of the entry, I consider all minima and break ties * stochastically to help avoid systematic bias. 
* * Due to cache locality reasons, searching along the vertical * component is going to be considerably slower than searching * along the horizontal. * * Look at nj.h for more information on how the distance matrix * is architected. * */ static inline float NJ_find_vmin(DMAT *dmat, long int a, long int *min, long int *vmincount) { long int i; /* index variable used for looping */ long int size; /* track the size of the matrix */ long int mindex = 0;/* track the index to the minimum */ float curval; /* track value of current transformed distance */ float vmin; /* the index to the smallest "vertical" minimum */ /* pointers which are used to reduce pointer dereferencing in inner loop */ float *ptr, *r2, *val; /* values used in stochastically breaking ties */ float p, x; long int smallcnt; /* initialize the vertical min to something really big */ vmin = (float)HUGE_VAL; /* save off some values to limit dereferencing later */ r2 = dmat->r2; val = dmat->val; size = dmat->size; p = 1.0; smallcnt = 0; /* start on the first row and work down */ ptr = &(val[NJ_MAP(0, a, size)]); for(i=0;isize; val = dmat->val; r = dmat->r+a+1; /* * Loop through the rows and decrement the stored r values * by the distances stored in the rows and columns of the distance * matrix which are being removed post-join. * * We do the rows altogether in order to benefit from cache locality. 
*/ ptrx = &(val[NJ_MAP(a, a+1, size)]); ptry = &(val[NJ_MAP(b, b+1, size)]); for(i=a+1;ib) { *r -= *(ptry++); } r++; } /* Similar to the above loop, we now do the columns */ ptrx = &(val[NJ_MAP(0, a, size)]); ptry = &(val[NJ_MAP(0, b, size)]); r = dmat->r; for(i=0;isize-1) { /* if we can't do a row here, lets do a column */ if(a==0) { if(b==1) { target = 2; } else { target = 1; } } else { target = 0; } } else { target = b+1; } /* distance between a and the root of clade (a,b) */ a2clade = ( (dmat->val[NJ_MAP(a, b, dmat->size)]) + (dmat->r2[a] - dmat->r2[b]) ) / 2.0; /* distance between b and the root of clade (a,b) */ b2clade = ( (dmat->val[NJ_MAP(a, b, dmat->size)]) + (dmat->r2[b] - dmat->r2[a]) ) / 2.0; /* distance between the clade (a,b) and the target taxon */ if(bval[NJ_MAP(a, target, dmat->size)] - a2clade) + (dmat->val[NJ_MAP(b, target, dmat->size)] - b2clade) ) / 2.0; /* * Check to see that distance from clade root to target + distance from * b to clade root are equal to the distance from b to the target */ if(NJ_FLT_EQ(dmat->val[NJ_MAP(b, target, dmat->size)], (clade_dist + b2clade))) { return(1); /* join is legitimate */ } else { return(0); /* join is illigitimate */ } } else { /* compute the distance from the clade root to the target */ clade_dist = ( (dmat->val[NJ_MAP(target, a, dmat->size)] - a2clade) + (dmat->val[NJ_MAP(target, b, dmat->size)] - b2clade) ) / 2.0; /* * Check to see that distance from clade root to target + distance from * b to clade root are equal to the distance from b to the target */ if(NJ_FLT_EQ(dmat->val[NJ_MAP(target, b, dmat->size)], (clade_dist + b2clade))) { return(1); /* join is legitimate */ } else { return(0); /* join is illegitimate */ } } } /* * NJ_check() - Check to see if two taxa can be joined * * INPUTS: * ------- * nj_args -- Pointer to the data structure holding command-line args * dmat -- distance matrix * a -- index into dmat for one of the rows to be joined * b -- index into dmat for another row to be joined * 
min -- the minimum value found * additivity -- a flag (0 = not additive mode, 1 = additive mode) * * OUTPUTS: * -------- * int 1 if join is okay * 0 if join is not okay * * DESCRIPTION: * ------------ * * This function ultimately takes two rows and makes sure that the * intersection of those two rows, which has a transformed distance of * "min", is actually the smallest (or equal to the smallest) * transformed distance for both rows (a, b). If so, it returns * 1, else it returns 0. * * Basically, we want to join two rows only if the minimum * transformed distance on either row is at the intersection of * those two rows. * */ static inline int NJ_check(NJ_ARGS *nj_args, DMAT *dmat, long int a, long int b, float min, int additivity) { long int i, size; float *ptr, *val, *r2; /* some aliases for speed and readability reasons */ val = dmat->val; r2 = dmat->r2; size = dmat->size; /* now determine if joining a, b will result in broken distances */ if(additivity) { if(!NJ_check_additivity(dmat, a, b)) { return(0); } } /* scan the horizontal of row b, punt if anything < min */ ptr = &(val[NJ_MAP(b, b+1, size)]); for(i=b+1;inorandom) { /* if we are doing random joins, we checked this */ ptr = val + a; for(i=0;i reduce pointer dereferencing */ float a2clade; /* distance from a to the new node that joins a and b */ float b2clade; /* distance from b to the new node that joins a and b */ float cval; /* stores distance information during loop */ float *vptr; /* pointer to elements in first row of dist matrix */ float *ptra; /* pointer to elements in row a of distance matrix */ float *ptrb; /* pointer to elements in row b of distance matrix */ float *val, *r, *r2; /* simply used to limit pointer dereferencing */ /* We must assume that a < b */ if(a >= b) { fprintf(stderr, "Clearcut: (aval; r = dmat->r; r2 = dmat->r2; size = dmat->size; /* compute the distance from the clade components (a, b) to the new node */ a2clade = ( (val[NJ_MAP(a, b, size)]) + (dmat->r2[a] - dmat->r2[b]) ) 
/ 2.0; b2clade = ( (val[NJ_MAP(a, b, size)]) + (dmat->r2[b] - dmat->r2[a]) ) / 2.0; r[a] = 0.0; /* we are removing row a, so clear dist. in r */ /* * Fill the horizontal part of the "a" row and finish computing r and r2 * we handle the horizontal component first to maximize cache locality */ ptra = &(val[NJ_MAP(a, a+1, size)]); /* start ptra at the horiz. of a */ ptrb = &(val[NJ_MAP(a+1, b, size)]); /* start ptrb at comparable place */ for(i=a+1;ir = r+1; /* * Collapse r2 here by copying contents of r2[0] into r2[b] and * incrementing pointer to the beginning of r2 by one row */ r2[b] = r2[0]; dmat->r2 = r2+1; /* increment dmat pointer to next row */ dmat->val += size; /* decrement the total size of the distance matrix by one row */ dmat->size--; return; } /* * NJ_neighbor_joining() - Perform a traditional Neighbor-Joining * * * INPUTS: * ------- * nj_args -- A pointer to a structure containing the command-line arguments * dmat -- A pointer to the distance matrix * * RETURNS: * -------- * NJ_TREE * -- A pointer to the Neighbor-Joining tree. * * DESCRIPTION: * ------------ * * This function performs a traditional Neighbor-Joining operation in which * the distance matrix is exhaustively searched for the global minimum * transformed distance. The two nodes which intersect at the global * minimum transformed distance are then joined and the distance * matrix is collapsed. This process continues until there are only * two nodes left, at which point those nodes are joined. 
* */ NJ_TREE * NJ_neighbor_joining(NJ_ARGS *nj_args, DMAT *dmat) { NJ_TREE *tree = NULL; NJ_VERTEX *vertex = NULL; long int a, b; float min; /* initialize the r and r2 vectors */ NJ_init_r(dmat); /* allocate and initialize our vertex vector used for tree construction */ vertex = NJ_init_vertex(dmat); if(!vertex) { fprintf(stderr, "Clearcut: Could not initialize vertex in NJ_neighbor_joining()\n"); return(NULL); } /* we iterate until the working distance matrix has only 2 entries */ while(vertex->nactive > 2) { /* * Find the global minimum transformed distance from the distance matrix */ min = NJ_min_transform(dmat, &a, &b); /* * Build the tree by removing nodes a and b from the vertex array * and inserting a new internal node which joins a and b. Collapse * the vertex array similarly to how the distance matrix and r and r2 * are compacted. */ NJ_decompose(dmat, vertex, a, b, 0); /* decrement the r and r2 vectors by the distances corresponding to a, b */ NJ_compute_r(dmat, a, b); /* compact the distance matrix and the r and r2 vectors */ NJ_collapse(dmat, vertex, a, b); } /* Properly join the last two nodes on the vertex list */ tree = NJ_decompose(dmat, vertex, 0, 1, NJ_LAST); /* return the computed tree to the calling function */ return(tree); } /* * NJ_relaxed_nj() - Construct a tree using the Relaxed Neighbor-Joining * * INPUTS: * ------- * nj_args -- A pointer to a data structure containing the command-line args * dmat -- A pointer to the distance matrix * * RETURNS: * -------- * * NJ_TREE * -- A pointer to a Relaxed Neighbor-Joining tree * * DESCRIPTION: * ------------ * * This function implements the Relaxed Neighbor-Joining algorithm of * Evans, J., Sheneman, L., and Foster, J. * * Relaxed Neighbor-Joining works by choosing a local minimum transformed * distance when determining when to join two nodes. (Traditional * Neighbor-Joining chooses a global minimum transformed distance). 
* * The algorithm shares the property with traditional NJ that if the * input distances are additive (self-consistent), then the algorithm * will manage to construct the true tree consistent with the additive * distances. Additivity state is tracked and every proposed join is checked * to make sure it maintains additivity constraints. If no * additivity-preserving join is possible in a single pass, then the distance * matrix is non-additive, and additivity checking is abandoned. * * The algorithm will either attempt joins randomly, or it will perform joins * in a particular order. The default behavior is to perform joins randomly, * but this can be switched off with a command-line switch. * * For randomized joins, all attempts are made to alleviate systematic bias * for the choice of rows to join. All tie breaking is done in a way which * is virtually free of bias. * * To perform randomized joins, a random permutation is constructed which * specifies the order in which to attempt joins. I iterate through the * random permutation, and for each row in the random permutation, I find * the minimum transformed distance for that row. If there are multiple * minima, I break ties evenly. For the row which intersects our * randomly chosen row at the chosen minimum, if we are still in * additivity mode, I check to see if joining the two rows will break * our additivity constraints. If not, I check to see if there exists * a transformed distance which is smaller than the minimum found on the * original row. If there is, then we proceed through the random permutation * trying additional rows in the random order specified in the permutation. * If there is no smaller minimum transformed distance on either of the * two rows, then we join them, collapse the distance matrix, and compute * a new random permutation.
* * If the entire random permutation is traversed and no joins are possible * due to additivity constraints, then the distance matrix is not * additive, and additivity constraint-checking is disabled. * */ NJ_TREE * NJ_relaxed_nj(NJ_ARGS *nj_args, DMAT *dmat) { NJ_TREE *tree; NJ_VERTEX *vertex; long int a, b, t, bh, bv, i; float hmin, vmin, hvmin; float p, q, x; int join_flag; int additivity_mode; long int hmincount, vmincount; long int *permutation = NULL; /* initialize the r and r2 vectors */ NJ_init_r(dmat); additivity_mode = 1; /* allocate the permutation vector, if we are in randomize mode */ if(!nj_args->norandom) { permutation = (long int *)calloc(dmat->size, sizeof(long int)); if(!permutation) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_relaxed_nj()\n"); return(NULL); } } /* allocate and initialize our vertex vector used for tree construction */ vertex = NJ_init_vertex(dmat); /* loop until there are only 2 nodes left to join */ while(vertex->nactive > 2) { switch(nj_args->norandom) { /* RANDOMIZED JOINS */ case 0: join_flag = 0; NJ_permute(permutation, dmat->size-1); for(i=0;i<dmat->size-1 && (vertex->nactive>2) ;i++) { a = permutation[i]; /* find min trans dist along horiz. of row a */ hmin = NJ_find_hmin(dmat, a, &bh, &hmincount); if(a) { /* find min trans dist along vert. of row a */ vmin = NJ_find_vmin(dmat, a, &bv, &vmincount); } else { vmin = hmin; bv = bh; vmincount = 0; } if(NJ_FLT_EQ(hmin, vmin)) { /* * The minima along the vertical and horizontal are * the same. Compute the proportion of minima along * the horizontal (p) and the proportion of minima * along the vertical (q). * * If the same minima exist along the horizontal and * vertical, we break the tie in a way which is * non-biased. That is, we break the tie based on the * proportion of horiz. minima versus vertical minima.
           *
           */
          p = (float)hmincount / ((float)hmincount + (float)vmincount);
          q = 1.0 - p;
          x = genrand_real2();

          if(x < p) {
            hvmin = hmin;
            b = bh;
          } else {
            hvmin = vmin;
            b = bv;
          }
        } else if(NJ_FLT_LT(hmin, vmin) ) {
          hvmin = hmin;
          b = bh;
        } else {
          hvmin = vmin;
          b = bv;
        }

        if(NJ_check(nj_args, dmat, a, b, hvmin, additivity_mode)) {

          /* swap a and b, if necessary, to make sure a < b */
          if(b < a) {
            t = a;
            a = b;
            b = t;
          }

          join_flag = 1;

          /* join taxa from rows a and b */
          NJ_decompose(dmat, vertex, a, b, 0);

          /* collapse matrix */
          NJ_compute_r(dmat, a, b);
          NJ_collapse(dmat, vertex, a, b);

          NJ_permute(permutation, dmat->size-1);
        }
      }

      /* turn off additivity if we go through an entire cycle without joining */
      if(!join_flag) {
        additivity_mode = 0;
      }

      break;

    /* DETERMINISTIC JOINS */
    case 1:

      join_flag = 0;

      for(a=0;a<dmat->size-1 && (vertex->nactive > 2) ;) {

        /* find the min along the horizontal of row a */
        hmin = NJ_find_hmin(dmat, a, &b, &hmincount);

        if(NJ_check(nj_args, dmat, a, b, hmin, additivity_mode)) {

          join_flag = 1;

          /* join taxa from rows a and b */
          NJ_decompose(dmat, vertex, a, b, 0);

          /* collapse matrix */
          NJ_compute_r(dmat, a, b);
          NJ_collapse(dmat, vertex, a, b);

          /* step back one row, since the matrix has just shrunk */
          if(a) {
            a--;
          }

        } else {
          a++;
        }
      }

      /* turn off additivity if we go through an entire cycle without joining */
      if(!join_flag) {
        additivity_mode = 0;
      }

      break;
    }

  } /* WHILE */

  /* Join the last two nodes on the vertex list */
  tree = NJ_decompose(dmat, vertex, 0, 1, NJ_LAST);

  if(nj_args->verbose_flag) {
    if(additivity_mode) {
      printf("Tree is additive\n");
    } else {
      printf("Tree is not additive\n");
    }
  }

  if(vertex) {
    NJ_free_vertex(vertex);
  }

  if(!nj_args->norandom && permutation) {
    free(permutation);
  }

  return(tree);
}

/*
 * NJ_print_distance_matrix() -
 *
 * Print a distance matrix
 *
 */
void
NJ_print_distance_matrix(DMAT *dmat) {

  long int i, j;

  printf("ntaxa: %ld\n", dmat->ntaxa);
  printf(" size: %ld\n", dmat->size);

  for(i=0;i<dmat->size;i++) {

    for(j=0;j<dmat->size;j++) {
      if(j>i) {
        printf(" %0.4f", dmat->val[NJ_MAP(i, j, dmat->size)]);
      } else {
        printf(" -");
      }
    }
if(dmat->r && dmat->r2) { printf("\t\t%0.4f", dmat->r[i]); printf("\t%0.4f", dmat->r2[i]); printf("\n"); for(j=0;jsize;j++) { if(j>i) { printf(" %0.4f", dmat->val[NJ_MAP(i, j, dmat->size)] - (dmat->r2[i] + dmat->r2[j])); } else { printf(" "); } } printf("\n"); } } printf("\n\n"); return; } /* * NJ_output_tree() - * * A wrapper for the function that really prints the tree, * basically to get a newline in there conveniently. :-) * * Print n trees, as specified in command-args * using "count" variable from 0 to (n-1) * */ void NJ_output_tree(NJ_ARGS *nj_args, NJ_TREE *tree, DMAT *dmat, long int count) { FILE *fp; if(nj_args->stdout_flag) { fp = stdout; } else { if(count == 0) { fp = fopen(nj_args->outfilename, "w"); /* open for writing */ } else { fp = fopen(nj_args->outfilename, "a"); /* open for appending */ } if(!fp) { fprintf(stderr, "Clearcut: Failed to open outfile %s\n", nj_args->outfilename); exit(-1); } } NJ_output_tree2(fp, nj_args, tree, tree, dmat); fprintf(fp, ";\n"); if(!nj_args->stdout_flag) { fclose(fp); } return; } /* * NJ_output_tree2() - * * */ void NJ_output_tree2(FILE *fp, NJ_ARGS *nj_args, NJ_TREE *tree, NJ_TREE *root, DMAT *dmat) { if(!tree) { return; } if(tree->taxa_index != NJ_INTERNAL_NODE) { if(nj_args->expblen) { fprintf(fp, "%s:%e", dmat->taxaname[tree->taxa_index], tree->dist); } else { fprintf(fp, "%s:%f", dmat->taxaname[tree->taxa_index], tree->dist); } } else { if(tree->left && tree->right) { fprintf(fp, "("); } if(tree->left) { NJ_output_tree2(fp, nj_args, tree->left, root, dmat); } if(tree->left && tree->right) { fprintf(fp, ","); } if(tree->right) { NJ_output_tree2(fp, nj_args, tree->right, root, dmat); } if(tree != root->left) { if(tree->left && tree->right) { if(tree != root) { if(nj_args->expblen) { fprintf(fp, "):%e", tree->dist); } else { fprintf(fp, "):%f", tree->dist); } } else { fprintf(fp, ")"); } } } else { fprintf(fp, ")"); } } return; } /* * NJ_init_r() * * This function computes the r column in our matrix * */ void 
NJ_init_r(DMAT *dmat) { long int i, j, size; long int index; float *r, *r2, *val; long int size1; float size2; r = dmat->r; r2 = dmat->r2; val = dmat->val; size = dmat->size; size1 = size-1; size2 = (float)(size-2); index = 0; for(i=0;inodes = (NJ_TREE **)calloc(dmat->ntaxa, sizeof(NJ_TREE *)); vertex->nodes_handle = vertex->nodes; /* initialize our size and active variables */ vertex->nactive = dmat->ntaxa; vertex->size = dmat->ntaxa; /* initialize the nodes themselves */ for(i=0;intaxa;i++) { vertex->nodes[i] = (NJ_TREE *)calloc(1, sizeof(NJ_TREE)); vertex->nodes[i]->left = NULL; vertex->nodes[i]->right = NULL; vertex->nodes[i]->taxa_index = i; } return(vertex); } /* * NJ_decompose() - * * This function decomposes the star by creating new internal nodes * and joining two existing tree nodes to it * */ NJ_TREE * NJ_decompose(DMAT *dmat, NJ_VERTEX *vertex, long int x, long int y, int last_flag) { NJ_TREE *new_node; float x2clade, y2clade; /* compute the distance from the clade components to the new node */ if(last_flag) { x2clade = (dmat->val[NJ_MAP(x, y, dmat->size)]); } else { x2clade = (dmat->val[NJ_MAP(x, y, dmat->size)])/2 + ((dmat->r2[x] - dmat->r2[y])/2); } vertex->nodes[x]->dist = x2clade; if(last_flag) { y2clade = (dmat->val[NJ_MAP(x, y, dmat->size)]); } else { y2clade = (dmat->val[NJ_MAP(x, y, dmat->size)])/2 + ((dmat->r2[y] - dmat->r2[x])/2); } vertex->nodes[y]->dist = y2clade; /* allocate new node to connect two sub-clades */ new_node = (NJ_TREE *)calloc(1, sizeof(NJ_TREE)); new_node->left = vertex->nodes[x]; new_node->right = vertex->nodes[y]; new_node->taxa_index = NJ_INTERNAL_NODE; /* this is not a terminal node, no taxa index */ if(last_flag) { return(new_node); } vertex->nodes[x] = new_node; vertex->nodes[y] = vertex->nodes[0]; vertex->nodes = &(vertex->nodes[1]); vertex->nactive--; return(new_node); } /* * NJ_print_vertex() - * * For debugging, print the contents of the vertex * */ void NJ_print_vertex(NJ_VERTEX *vertex) { long int i; 
printf("Number of active nodes: %ld\n", vertex->nactive); for(i=0;inactive;i++) { printf("%ld ", vertex->nodes[i]->taxa_index); } printf("\n"); return; } /* * NJ_print_r() - * */ void NJ_print_r(DMAT *dmat) { long int i; printf("\n"); for(i=0;isize;i++) { printf("r[%ld] = %0.2f\n", i, dmat->r[i]); } printf("\n"); return; } /* * NJ_print_taxanames() - * * Print taxa names here * */ void NJ_print_taxanames(DMAT *dmat) { long int i; printf("Number of taxa: %ld\n", dmat->ntaxa); for(i=0;intaxa;i++) { printf("%ld) %s\n", i, dmat->taxaname[i]); } printf("\n"); return; } /* * NJ_shuffle_distance_matrix() - * * Randomize a distance matrix here * */ void NJ_shuffle_distance_matrix(DMAT *dmat) { long int *perm = NULL; char **tmp_taxaname = NULL; float *tmp_val = NULL; long int i, j; /* alloc the random permutation and a new matrix to hold the shuffled vals */ perm = (long int *)calloc(dmat->size, sizeof(long int)); tmp_taxaname = (char **)calloc(dmat->size, sizeof(char *)); tmp_val = (float *)calloc(NJ_NCELLS(dmat->ntaxa), sizeof(float)); if(!tmp_taxaname || !perm || !tmp_val) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_shuffle_distance_matrix()\n"); exit(-1); } /* compute a permutation which will describe how to shuffle the matrix */ NJ_permute(perm, dmat->size); for(i=0;isize;i++) { for(j=i+1;jsize;j++) { if(perm[j] < perm[i]) { tmp_val[NJ_MAP(i, j, dmat->size)] = dmat->val[NJ_MAP(perm[j], perm[i], dmat->size)]; } else { tmp_val[NJ_MAP(i, j, dmat->size)] = dmat->val[NJ_MAP(perm[i], perm[j], dmat->size)]; } } tmp_taxaname[i] = dmat->taxaname[perm[i]]; } /* free our random permutation */ if(perm) { free(perm); } /* free the old value matrix */ if(dmat->val) { free(dmat->val); } /* re-assign the value matrix pointers */ dmat->val = tmp_val; dmat->valhandle = dmat->val; /* * Free our old taxaname with its particular ordering * and re-assign to the new. 
*/ if(dmat->taxaname) { free(dmat->taxaname); } dmat->taxaname = tmp_taxaname; return; } /* * NJ_free_tree() - * * Free a given NJ tree */ void NJ_free_tree(NJ_TREE *node) { if(!node) { return; } if(node->left) { NJ_free_tree(node->left); } if(node->right) { NJ_free_tree(node->right); } free(node); return; } /* * NJ_print_permutation() * * Print a permutation * */ void NJ_print_permutation(long int *perm, long int size) { long int i; for(i=0;intaxa = src->ntaxa; dest->size = src->size; /* allocate space for array of pointers to taxanames */ dest->taxaname = (char **)calloc(dest->ntaxa, sizeof(char *)); if(!dest->taxaname) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_dup_dmat()\n"); goto XIT_BAD; } /* allocate space for the taxanames themselves */ for(i=0;intaxa;i++) { dest->taxaname[i] = (char *)calloc(strlen(src->taxaname[i])+1, sizeof(char)); if(!dest->taxaname[i]) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_dup_dmat()\n"); goto XIT_BAD; } } /* allocate space for the distance values */ dest->val = (float *)calloc(NJ_NCELLS(src->ntaxa), sizeof(float)); if(!dest->val) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_dup_dmat()\n"); goto XIT_BAD; } /* allocate space for the r and r2 vectors */ dest->r = (float *)calloc(src->ntaxa, sizeof(float)); dest->r2 = (float *)calloc(src->ntaxa, sizeof(float)); /* copy titles */ for(i=0;intaxa;i++) { strcpy(dest->taxaname[i], src->taxaname[i]); } /* copy values */ memcpy(dest->val, src->valhandle, NJ_NCELLS(src->ntaxa)*sizeof(float)); /* copy r and r2 */ memcpy(dest->r, src->rhandle, src->ntaxa*sizeof(float)); memcpy(dest->r2, src->r2handle, src->ntaxa*sizeof(float)); /* track some memory addresses */ dest->valhandle = dest->val; dest->rhandle = dest->r; dest->r2handle = dest->r2; return(dest); XIT_BAD: /* free what we may have allocated */ NJ_free_dmat(dest); return(NULL); } /* * NJ_free_dmat() - */ void NJ_free_dmat(DMAT *dmat) { long int i; if(dmat) { if(dmat->taxaname) { 
      for(i=0;i<dmat->ntaxa;i++) {
        if(dmat->taxaname[i]) {
          free(dmat->taxaname[i]);
        }
      }
      free(dmat->taxaname);
    }

    if(dmat->valhandle) {
      free(dmat->valhandle);
    }

    if(dmat->rhandle) {
      free(dmat->rhandle);
    }

    if(dmat->r2handle) {
      free(dmat->r2handle);
    }

    free(dmat);
  }

  return;
}

/*
 * NJ_free_vertex() -
 *
 * Free the vertex data structure
 *
 */
void
NJ_free_vertex(NJ_VERTEX *vertex) {

  if(vertex) {
    if(vertex->nodes_handle) {
      free(vertex->nodes_handle);
    }
    free(vertex);
  }

  return;
}

/*
 *
 * NJ_min_transform() - Find the smallest transformed value to identify
 *                      which nodes to join.
 *
 * INPUTS:
 * -------
 *   dmat -- The distance matrix
 *
 * RETURNS:
 * --------
 *   -- The minimum transformed distance
 *   ret_i -- The row of the smallest transformed distance (by reference)
 *   ret_j -- The col of the smallest transformed distance (by reference)
 *
 *
 * DESCRIPTION:
 * ------------
 *
 * Used only with traditional Neighbor-Joining, this function checks the entire
 * working distance matrix and identifies the smallest transformed distance.
 * This requires traversing the entire upper-diagonal matrix, which is itself
 * an O(N^2) operation.
 *
 */
float
NJ_min_transform(DMAT *dmat,
                 long int *ret_i,
                 long int *ret_j) {

  long int i, j;      /* indices used for looping */
  long int tmp_i = 0; /* to limit pointer dereferencing */
  long int tmp_j = 0; /* to limit pointer dereferencing */
  float smallest;     /* track the smallest trans. dist */
  float curval;       /* the current trans.
dist in loop */
  float *ptr;         /* pointer into distance matrix */
  float *r2;          /* pointer to r2 vector for computing transformed dists */

  smallest = (float)HUGE_VAL;

  /* track these here to limit pointer dereferencing in inner loop */
  ptr = dmat->val;
  r2 = dmat->r2;

  /* for every row */
  for(i=0;i<dmat->size;i++) {
    ptr++; /* skip diagonal */
    for(j=i+1;j<dmat->size;j++) { /* for every column */

      /* find transformed distance in matrix at i, j */
      curval = *(ptr++) - (r2[i] + r2[j]);

      /* if the transformed distance is less than the known minimum */
      if(curval < smallest) {
        smallest = curval;
        tmp_i = i;
        tmp_j = j;
      }
    }
  }

  /* pass back (by reference) the coords of the min. transformed distance */
  *ret_i = tmp_i;
  *ret_j = tmp_j;

  return(smallest); /* return the min transformed distance */
}
mothur-1.36.1/source/clearcut/clearcut.h000066400000000000000000000165701255543666200202240ustar00rootroot00000000000000
/*
 * clearcut.h
 *
 * $Id$
 *
 *****************************************************************************
 *
 * Copyright (c) 2004, Luke Sheneman
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  + Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  + Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *  + The names of its contributors may not be used to endorse or promote
 *    products derived from this software without specific prior
 *    written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_CLEARCUT_H_ #define _INC_CLEARCUT_H_ 1 extern "C" { #include "common.h" #include "cmdargs.h" #define NJ_VERSION "1.0.9" #define NJ_INTERNAL_NODE -1 #define NJ_LAST 101 #define NJ_INPUT_MODE_UNKNOWN 0 #define NJ_INPUT_MODE_DISTANCE 100 #define NJ_INPUT_MODE_UNALIGNED_SEQUENCES 101 #define NJ_INPUT_MODE_ALIGNED_SEQUENCES 102 #define NJ_MODEL_NONE 100 #define NJ_MODEL_JUKES 101 #define NJ_MODEL_KIMURA 102 /* * DMAT - Distance Matrix * * This is arguably the most important structure in the * program. This is the distance matrix, and it is used * by many functions throughout the application. * * The matrix is architected as a contiguously allocated * upper-diagonal matrix of floats which include the * diagonal. * * Example: * * 0 1 2 3 4 5 * 0 0.0 1.0 0.3 0.2 0.1 0.3 * 1 0.0 0.3 0.2 0.1 0.8 * 2 0.0 0.1 0.3 0.5 * 3 0.0 0.2 0.1 * 4 0.0 0.2 * 5 0.0 * * The distance matrix shrinks with every join operation, * so I track the original and working size of the matrix * inside the matrix. * * One fast optimization to shrink the distance matrix * involves incrementing the "val" pointer. Thus, in * addition to tracking the pointer to the distances, * I also track the original pointer to that I can * free the memory associated with the working distance * matrix. 
* * This also applies to the r and r2 vectors which are * used to compute the transformed distances in the * matrix. * */ typedef struct _STRUCT_DMAT { long int ntaxa; /* the original size of the distance matrix */ long int size; /* the current/effective size of the distance matrix */ char **taxaname; /* a pointer to an array of taxa name strings */ float *val; /* the distances */ float *valhandle; /* to track the orig. pointer to free memory */ float *r, *r2; /* r and r2 vectors (used to compute transformed dists) */ float *rhandle, *r2handle; /* track orig. pointers to free memory */ } DMAT; /* * NJ_TREE - The Tree Data Structure * * * The tree is represented internally as a rooted * binary tree. Each internal node has a left and a right child. * * Additionally, I track the distance between the current node * and that node's parent (i.e. the branch length). * * Finally, I track the index of the taxa for leaf nodes. * */ typedef struct _STRUCT_NJ_TREE { struct _STRUCT_NJ_TREE *left; /* left child */ struct _STRUCT_NJ_TREE *right; /* right child */ float dist; /* branch length. i.e. dist from node to parent */ long int taxa_index; /* for terminal nodes, track the taxon index */ } NJ_TREE; /* * NJ_VERTEX * * This structure is used for building trees. It is a vector * which, represents the center of the star when building the RNJ/NJ * tree through star-decomposition. * * It contains a vector of tree (node) pointers. These pointers * get joined together by a new internal node, and the new internal * node is placed back into the vector of nodes (which is now smaller). * * To keep this vector in sync. with the shrinking matrix, parts of * the vector are shuffled around, and so a pointer to the originally * allocated vector is stored such that it can be freed from memory * later. * * The original and working sizes of the vector are also tracked. 
* */ typedef struct _STRUCT_NJ_VERTEX { NJ_TREE **nodes; NJ_TREE **nodes_handle; /* original memory handle for freeing */ long int nactive; /* number of active nodes in the list */ long int size; /* the total size of the vertex */ } NJ_VERTEX; /* some function prototypes */ int clearcut_main(int, char**); /* core function for performing Relaxed Neighbor Joining */ NJ_TREE * NJ_relaxed_nj(NJ_ARGS *nj_args, DMAT *dmat); /* function for performing traditional Neighbor-Joining */ NJ_TREE * NJ_neighbor_joining(NJ_ARGS *nj_args, DMAT *dmat); /* print the distance matrix (for debugging) */ void NJ_print_distance_matrix(DMAT *dmat); /* output the computed tree to stdout or to the specified file */ void NJ_output_tree(NJ_ARGS *nj_args, NJ_TREE *tree, DMAT *dmat, long int count); /* the recursive function for outputting trees */ void NJ_output_tree2(FILE *fp, NJ_ARGS *nj_args, NJ_TREE *tree, NJ_TREE *root, DMAT *dmat); /* initialize vertex */ NJ_VERTEX * NJ_init_vertex(DMAT *dmat); /* used to decompose the star topology and build the tree */ NJ_TREE * NJ_decompose(DMAT *dmat, NJ_VERTEX *vertex, long int x, long int y, int last_flag); /* print the vertex vector (for debugging) */ void NJ_print_vertex(NJ_VERTEX *vertex); /* print taxa names (for debugging) */ void NJ_print_taxanames(DMAT *dmat); /* initialize r-vector prior to RNJ/NJ */ void NJ_init_r(DMAT *dmat); /* print the r-vector (for debugging) */ void NJ_print_r(DMAT *dmat); /* shuffle the distance matrix, usually after reading in input */ void NJ_shuffle_distance_matrix(DMAT *dmat); /* free memory from the tree */ void NJ_free_tree(NJ_TREE *node); /* print permutations (for debugging) */ void NJ_print_permutation(long int *perm, long int size); /* duplicate a distance matrix for multiple iterations */ DMAT * NJ_dup_dmat(DMAT *src); /* free the distance matrix */ void NJ_free_dmat(DMAT *dmat); /* free the vertex vector */ void NJ_free_vertex(NJ_VERTEX *vertex); /* for computing the global minimum transformed distance 
in traditional NJ */ float NJ_min_transform(DMAT *dmat, long int *ret_i, long int *ret_j); } #endif /* _INC_CLEARCUT_H_ */ mothur-1.36.1/source/clearcut/cmdargs.cpp000066400000000000000000000351121255543666200203660ustar00rootroot00000000000000/* * cmdargs.c * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. 
 *
 *****************************************************************************
 *
 * AUTHOR:
 *
 *   Luke Sheneman
 *   sheneman@cs.uidaho.edu
 *
 */

#include <stdio.h>   /* printf, fprintf */
#include <stdlib.h>  /* atoi, exit */
#include <string.h>  /* strcmp */
#include <time.h>    /* time */

#ifdef USE_GNU
#include <getopt.h>
#else
#include "getopt_long.h"
#endif /* USE_GNU*/

#include "clearcut.h"
#include "cmdargs.h"

/*
 * NJ_handle_args() -
 *
 */
NJ_ARGS *
NJ_handle_args(int argc,
               char *argv[]) {

  static NJ_ARGS nj_args;
  int option_index, c;

  optind = 0; /* necessary to re-read arguments if this code is run more than once */

  struct option NJ_long_options[] = {

    /* These options don't set a flag */
    {"in",        required_argument, NULL, 'i'},
    {"out",       required_argument, NULL, 'o'},
    {"seed",      required_argument, NULL, 's'},
    {"matrixout", required_argument, NULL, 'm'},
    {"ntrees",    required_argument, NULL, 'n'},

    /* These options set a flag */
    {"verbose",   no_argument, &(nj_args.verbose_flag), 1},
    {"quiet",     no_argument, &(nj_args.quiet_flag),   1},
    {"distance",  no_argument, &(nj_args.input_mode),   NJ_INPUT_MODE_DISTANCE},
    {"alignment", no_argument, &(nj_args.input_mode),   NJ_INPUT_MODE_ALIGNED_SEQUENCES},
    {"help",      no_argument, &(nj_args.help),         1},
    {"version",   no_argument, &(nj_args.version),      1},
    {"norandom",  no_argument, &(nj_args.norandom),     1},
    {"shuffle",   no_argument, &(nj_args.shuffle),      1},
    {"stdin",     no_argument, &(nj_args.stdin_flag),   1},
    {"stdout",    no_argument, &(nj_args.stdout_flag),  1},
    {"dna",       no_argument, &(nj_args.dna_flag),     1},
    {"DNA",       no_argument, &(nj_args.dna_flag),     1},
    {"protein",   no_argument, &(nj_args.protein_flag), 1},
    {"neighbor",  no_argument, &(nj_args.neighbor),     1},
    {"expblen",   no_argument, &(nj_args.expblen),      1},
    {"expdist",   no_argument, &(nj_args.expdist),      1},
    {"jukes",     no_argument, &(nj_args.jukes_flag),   1},
    {"kimura",    no_argument, &(nj_args.kimura_flag),  1},

    {0, 0, 0, 0}
  };

  /* initialize options to their defaults */
  nj_args.infilename = NULL;
  nj_args.outfilename = NULL;
  nj_args.matrixout = NULL;
  nj_args.seed = time(0);
  nj_args.verbose_flag = 0;
  nj_args.quiet_flag = 0;
  nj_args.input_mode =
NJ_INPUT_MODE_DISTANCE;
  nj_args.help = 0;
  nj_args.version = 0;
  nj_args.norandom = 0;
  nj_args.shuffle = 0;
  nj_args.stdin_flag = 0;
  nj_args.stdout_flag = 0;
  nj_args.dna_flag = 0;
  nj_args.protein_flag = 0;
  nj_args.correction_model = NJ_MODEL_NONE;
  nj_args.jukes_flag = 0;
  nj_args.kimura_flag = 0;
  nj_args.neighbor = 0;
  nj_args.ntrees = 1;
  nj_args.expblen = 0;
  nj_args.expdist = 0;

  while(1) {

    c = getopt_long(argc,
                    argv,
                    "i:o:s:m:n:vqduahVSIOrDPjkNeE",
                    NJ_long_options,
                    &option_index);
    if(c == -1) {
      break;
    }

    //printf("%d\t%d\n", option_index, argc);
    //for (int red = 0; red < argc; red++) { printf("%s\n", argv[red]); }

    switch(c) {

    case 0:
      /* long option that already set its flag: nothing more to do */
      if(NJ_long_options[option_index].flag) {
        break;
      }

      printf("option %s", NJ_long_options[option_index].name);
      if(optarg) {
        printf(" with arg %s", optarg);
      }
      printf("\n");
      break;

    case 'i':
      nj_args.infilename = optarg;
      break;

    case 'o':
      nj_args.outfilename = optarg;
      break;

    case 's':
      nj_args.seed = atoi(optarg);
      break;

    case 'm':
      nj_args.matrixout = optarg;
      break;

    case 'n':
      nj_args.ntrees = atoi(optarg);
      break;

    case 'v':
      nj_args.verbose_flag = 1;
      break;

    case 'q':
      nj_args.quiet_flag = 1;
      break;

    case 'd':
      nj_args.input_mode = NJ_INPUT_MODE_DISTANCE;
      break;

    case 'a':
      nj_args.input_mode = NJ_INPUT_MODE_ALIGNED_SEQUENCES;
      break;

    case 'h':
      nj_args.help = 1;
      break;

    case 'V':
      nj_args.version = 1;
      break;

    case 'S':
      nj_args.shuffle = 1;
      break;

    case 'I':
      nj_args.stdin_flag = 1;
      break;

    case 'O':
      /* -O selects STDOUT, matching the --stdout long option */
      nj_args.stdout_flag = 1;
      break;

    case 'r':
      nj_args.norandom = 1;
      break;

    case 'D':
      nj_args.dna_flag = 1;
      break;

    case 'P':
      nj_args.protein_flag = 1;
      break;

    case 'j':
      nj_args.jukes_flag = 1;
      break;

    case 'k':
      nj_args.kimura_flag = 1;
      break;

    case 'N':
      nj_args.neighbor = 1;
      break;

    case 'e':
      nj_args.expblen = 1;
      break;

    case 'E':
      nj_args.expdist = 1;
      break;

    default:
      NJ_usage();
      exit(-1);
    }
  }

  if(optind < argc) {
    fprintf(stderr, "Clearcut: Unknown command-line argument:\n --> %s\n", argv[optind]);
    NJ_usage();
    exit(-1);
  }

  if(nj_args.version) {
    printf("Clearcut Version: %s\n",
NJ_VERSION); //exit(0); } if(nj_args.help) { NJ_usage(); //exit(0); } /* if stdin & explicit filename are specified for input */ if(nj_args.stdin_flag) { if(nj_args.infilename) { fprintf(stderr, "Clearcut: Ambiguous input source specified. Specify input filename OR stdin.\n"); NJ_usage(); exit(-1); } } /* if stdout & explicit filename are specified for output */ if(nj_args.stdout_flag) { if(nj_args.outfilename) { fprintf(stderr, "Clearcut: Ambiguous output specified. Specify output filename OR stdout.\n"); NJ_usage(); exit(-1); } } /* if user did not specify stdin or filename, default to stdin */ if(!nj_args.stdin_flag) { if(!nj_args.infilename) { fprintf(stderr, "Clearcut: No input file specified. Using stdin.\n"); nj_args.stdin_flag = 1; } } /* if user did not specify stdout or filename, default to stdout */ if(!nj_args.stdout_flag) { if(!nj_args.outfilename) { fprintf(stderr, "Clearcut: No output file specified. Using stdout.\n"); nj_args.stdout_flag = 1; } } /* User must specify distance matrix or alignment */ if(nj_args.input_mode == NJ_INPUT_MODE_UNKNOWN) { fprintf(stderr, "Clearcut: Must specify input type (--distance | --alignment)\n"); NJ_usage(); exit(-1); } /* do not allow protein or DNA options for distance matrix input */ if(nj_args.input_mode == NJ_INPUT_MODE_DISTANCE) { if(nj_args.dna_flag || nj_args.protein_flag) { fprintf(stderr, "Clearcut: Ambiguous arguments. 
(--protein | --DNA) do not apply to distance \n"); NJ_usage(); exit(-1); } } /* make sure different filenames were specified for input and output */ if(!nj_args.stdin_flag && !nj_args.stdout_flag) { if(!strcmp(nj_args.infilename, nj_args.outfilename)) { fprintf(stderr, "Clearcut: Input filename and output filename must be unique.\n"); NJ_usage(); exit(-1); } } /* make sure that user specifies DNA or Protein if dealing with alignment input */ if(nj_args.input_mode == NJ_INPUT_MODE_ALIGNED_SEQUENCES) { if(!nj_args.dna_flag && !nj_args.protein_flag) { fprintf(stderr, "Clearcut: Must specify protein or DNA for alignment input.\n"); NJ_usage(); exit(-1); } } /* make sure that user does not specify both protein and DNA when dealing with alignment input */ if(nj_args.input_mode == NJ_INPUT_MODE_ALIGNED_SEQUENCES) { if(nj_args.dna_flag && nj_args.protein_flag) { fprintf(stderr, "Clearcut: Specifying protein and DNA sequences are mutually exclusive options\n"); NJ_usage(); exit(-1); } } /* make sure verbose and quiet were not specified together */ if(nj_args.verbose_flag && nj_args.quiet_flag) { fprintf(stderr, "Clearcut: Verbose and Quiet mode are mutually exclusive.\n"); NJ_usage(); exit(-1); } /* make sure that a correction model was specified only when providing an alignment */ if(nj_args.input_mode == NJ_INPUT_MODE_DISTANCE) { if(nj_args.jukes_flag || nj_args.kimura_flag) { fprintf(stderr, "Clearcut: Only specify correction model for alignment input.\n"); NJ_usage(); exit(-1); } } else { if(nj_args.jukes_flag && nj_args.kimura_flag) { fprintf(stderr, "Clearcut: Only specify one correction model\n"); NJ_usage(); exit(-1); } else { if(nj_args.jukes_flag && !nj_args.kimura_flag) { nj_args.correction_model = NJ_MODEL_JUKES; } else if(nj_args.kimura_flag && !nj_args.jukes_flag) { nj_args.correction_model = NJ_MODEL_KIMURA; } else { nj_args.correction_model = NJ_MODEL_NONE; /* DEFAULT */ } } } /* make sure that the number of output trees is reasonable */ if(nj_args.ntrees <= 
0) { fprintf(stderr, "Clearcut: Number of output trees must be a positive integer.\n"); NJ_usage(); exit(-1); } /* * make sure that if exponential distances are specified, * we are dealing with alignment input */ if(nj_args.expdist && nj_args.input_mode != NJ_INPUT_MODE_ALIGNED_SEQUENCES) { fprintf(stderr, "Clearcut: Exponential notation for distance matrix output requires that input be an alignment\n"); NJ_usage(); exit(-1); } return(&nj_args); } /* * NJ_print_args() - * */ void NJ_print_args(NJ_ARGS *nj_args) { char input_mode[32]; switch (nj_args->input_mode) { case NJ_INPUT_MODE_DISTANCE: sprintf(input_mode, "Distance Matrix"); break; case NJ_INPUT_MODE_UNALIGNED_SEQUENCES: sprintf(input_mode, "Unaligned Sequences"); break; case NJ_INPUT_MODE_ALIGNED_SEQUENCES: sprintf(input_mode, "Aligned Sequences"); break; default: sprintf(input_mode, "UNKNOWN"); break; } printf("\n*** Command Line Arguments ***\n"); printf("Input Mode: %s\n", input_mode); if(nj_args->stdin_flag) { printf("Input from STDIN\n"); } else { printf("Input File: %s\n", nj_args->infilename); } if(nj_args->stdout_flag) { printf("Output from STDOUT\n"); } else { printf("Output File: %s\n", nj_args->outfilename); } if(nj_args->input_mode != NJ_INPUT_MODE_DISTANCE) { if(nj_args->aligned_flag) { printf("Input Sequences Aligned: YES\n"); } else { printf("Input Sequences Aligned: NO\n"); } } if(nj_args->verbose_flag) { printf("Verbose Mode: ON\n"); } else { printf("Verbose Mode: OFF\n"); } if(nj_args->quiet_flag) { printf("Quiet Mode: ON\n"); } else { printf("Quiet Mode: OFF\n"); } if(nj_args->seed) { printf("Random Seed: %d\n", nj_args->seed); } printf("\n*******\n"); return; } /* * NJ_usage() - * * Print a usage message * */ void NJ_usage(void) { printf("Usage: clearcut --in= --out= [options]...\n"); printf("GENERAL OPTIONS:\n"); printf(" -h, --help Display this information.\n"); printf(" -V, --version Print the version of this program.\n"); printf(" -v, --verbose More output. 
(Default: OFF)\n"); printf(" -q, --quiet Silent operation. (Default: ON)\n"); printf(" -s, --seed= Explicitly set the PRNG seed to a specific value.\n"); printf(" -r, --norandom Attempt joins deterministically. (Default: OFF)\n"); printf(" -S, --shuffle Randomly shuffle the distance matrix. (Default: OFF)\n"); printf(" -N, --neighbor Use traditional Neighbor-Joining algorithm. (Default: OFF)\n"); printf("\n"); printf("INPUT OPTIONS:\n"); printf(" -I, --stdin Read input from STDIN.\n"); printf(" -d, --distance Input file is a distance matrix. (Default: ON)\n"); printf(" -a, --alignment Input file is a set of aligned sequences. (Default: OFF)\n"); printf(" -D, --DNA Input alignment are DNA sequences.\n"); printf(" -P, --protein Input alignment are protein sequences.\n"); printf("\n"); printf("CORRECTION MODEL FOR COMPUTING DISTANCE MATRIX (Default: NO Correction):\n"); printf(" -j, --jukes Use Jukes-Cantor correction for computing distance matrix.\n"); printf(" -k, --kimura Use Kimura correction for distance matrix.\n"); printf("\n"); printf("OUTPUT OPTIONS:\n"); printf(" -O, --stdout Output tree to STDOUT.\n"); printf(" -m, --matrixout= Output distance matrix to specified file.\n"); printf(" -n, --ntrees= Output n trees. (Default: 1)\n"); printf(" -e, --expblen Exponential notation for branch lengths. (Default: OFF)\n"); printf(" -E, --expdist Exponential notation in distance output. 
(Default: OFF)\n");
  printf("\n");

  printf("EXAMPLES:\n");
  printf(" Compute tree by supplying distance matrix via stdin:\n");
  printf(" clearcut --distance < distances.txt > treefile.tre\n");
  printf("\n");
  printf(" Compute tree by supplying an alignment of DNA sequences from a file:\n");
  printf(" clearcut --alignment --DNA --in=alignment.txt --out=treefile.tre\n");

  return;
}
mothur-1.36.1/source/clearcut/cmdargs.h000066400000000000000000000057021255543666200200350ustar00rootroot00000000000000
/*
 * cmdargs.h
 *
 * $Id$
 *
 *****************************************************************************
 *
 * Copyright (c) 2004, Luke Sheneman
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  + Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  + Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *  + The names of its contributors may not be used to endorse or promote
 *    products derived from this software without specific prior
 *    written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_NJ_CMDARGS_H_ #define _INC_NJ_CMDARGS_H_ 1 #include "clearcut.h" /* some datatypes */ typedef struct _STRUCT_NJ_ARGS { char *infilename; /* the name of the input file */ char *outfilename; /* the name of the output tree */ char *matrixout; /* the name of the distance matrix output file */ int input_mode; int aligned_flag; int verbose_flag; int quiet_flag; int stdin_flag; int stdout_flag; int help; int version; int norandom; int shuffle; int dna_flag; int protein_flag; int seed; /* correction models for distance */ int correction_model; int jukes_flag; int kimura_flag; /* flag for using traditional neighbor-joining */ int neighbor; /* number of trees to output */ int ntrees; /* exponential notation output */ int expblen; /* exp notation for tree branch lengths */ int expdist; /* exp notation for distances in matrix output */ } NJ_ARGS; /* some function prototypes */ NJ_ARGS * NJ_handle_args(int argc, char *argv[]); void NJ_print_args(NJ_ARGS *nj_args); void NJ_usage(void); #endif /* _INC_NJ_CMDARGS_H_ */ mothur-1.36.1/source/clearcut/common.h000066400000000000000000000060161255543666200177040ustar00rootroot00000000000000/* * common.h * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. 
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * A header file filled with common definitions and simple inline functions * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_NJ_COMMON_H_ #define _INC_NJ_COMMON_H_ 1 #include <math.h> #include <float.h> #define NJ_AMBIGUITY_CHAR 63 /* ? 
character */ /* * this macro defines the number of cells in the diagonal matrix * based on the number of taxa involved * */ #define NJ_NCELLS(a) ( ((a)*((a)+1))/2 ) /* * NJ_MAP() - * * This function maps i, j coordinates to the correct offset into * the distance matrix * */ static inline long int NJ_MAP(long int i, long int j, long int ntaxa) { return((i*(2*ntaxa-i-1))/2 + j); } static inline int NJ_FLT_EQ(float x, float y) { if(fabs(x - y) < FLT_EPSILON) { return(1); } else { return(0); } } static inline int NJ_FLT_LT(float x, float y) { if(NJ_FLT_EQ(x, y)) { return(0); } else { if(x < y) { return(1); } else { return(0); } } } static inline int NJ_FLT_GT(float x, float y) { if(NJ_FLT_EQ(x, y)) { return(0); } else { if(x > y) { return(1); } else { return(0); } } } #endif /* _INC_NJ_COMMON_H_ */ mothur-1.36.1/source/clearcut/distclearcut.cpp000066400000000000000000000341401255543666200214340ustar00rootroot00000000000000/* * dist.c * * $Id$ * * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * Compute a distance matrix given a set of sequences * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <math.h> #include <ctype.h> #include "common.h" #include "dayhoff.h" #include "fasta.h" #include "distclearcut.h" /* * NJ_build_distance_matrix() - * * Given a filename for an alignment, read the alignment * into memory and then compute the distance matrix * using the appropriate correction model */ DMAT * NJ_build_distance_matrix(NJ_ARGS *nj_args) { DMAT *dmat; NJ_alignment *alignment; /* Read an alignment in FASTA format */ alignment = NJ_read_fasta(nj_args); if(!alignment) { return(NULL); } /* * Given a global multiple sequence alignment (MSA) and * a specified distance correction model, compute a * corrected distance matrix * * For proteins, we may want to allow users to specify * a substitution matrix (feature) */ dmat = NJ_compute_dmat(nj_args, alignment); /* NJ_print_taxanames(dmat); */ if(!dmat) { fprintf(stderr, "Clearcut: Error computing distance matrix\n"); } /* now free the memory associated with the alignment */ NJ_free_alignment(alignment); return(dmat); } /* * NJ_compute_dmat() - * * Given an alignment and a correction model, compute the * distance matrix and return it * */ DMAT * NJ_compute_dmat(NJ_ARGS *nj_args, NJ_alignment *alignment) { DMAT 
*dmat; long int i; /* allocate distance matrix here */ dmat = (DMAT *)calloc(1, sizeof(DMAT)); if(!dmat) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_compute_dmat()\n"); return(NULL); } dmat->ntaxa = alignment->nseq; dmat->size = alignment->nseq; /* allocate memory to hold the taxa names */ dmat->taxaname = (char **)calloc(alignment->nseq, sizeof(char *)); if(!dmat->taxaname) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_compute_dmat()\n"); return(NULL); } /* copy sequence titles */ for(i=0;i < alignment->nseq;i++) { dmat->taxaname[i] = (char *)calloc(strlen(alignment->titles[i])+1, sizeof(char)); if(!dmat->taxaname[i]) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_compute_dmat()\n"); return(NULL); } strncpy(dmat->taxaname[i], alignment->titles[i], strlen(alignment->titles[i])); } /* allocate val matrix in dmat */ dmat->val = (float *)calloc(dmat->ntaxa*dmat->ntaxa, sizeof(float)); if(!dmat->val) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_compute_dmat()\n"); return(NULL); } /* now lets allocate space for the r and r2 columns */ dmat->r = (float *)calloc(dmat->ntaxa, sizeof(float)); dmat->r2 = (float *)calloc(dmat->ntaxa, sizeof(float)); /* track some memory addresses */ dmat->rhandle = dmat->r; dmat->r2handle = dmat->r2; dmat->valhandle = dmat->val; /* apply model correction to matrix */ switch(nj_args->correction_model) { case NJ_MODEL_JUKES: if(nj_args->dna_flag) { NJ_DNA_jc_correction(dmat, alignment); } else if(nj_args->protein_flag) { NJ_PROTEIN_jc_correction(dmat, alignment); } else { fprintf(stderr, "Clearcut: Need to know sequence type for Jukes-Cantor model correction.\n"); return(NULL); } break; case NJ_MODEL_KIMURA: if(nj_args->dna_flag) { NJ_DNA_k2p_correction(dmat, alignment); } else if(nj_args->protein_flag) { NJ_PROTEIN_kimura_correction(dmat, alignment); } else { fprintf(stderr, "Clearcut: Need to know sequence type for Kimura model correction.\n"); return(NULL); } break; case NJ_MODEL_NONE: 
NJ_no_correction(dmat, alignment); break; default: fprintf(stderr, "Clearcut: Invalid distance correction model.\n"); return(NULL); } return(dmat); } /* * NJ_no_correction() - * * Compute the distance matrix without correction * (straight percent ID) * * Resolve ambiguities in sequence data by skipping * those nucleotides/residues * */ void NJ_no_correction(DMAT *dmat, NJ_alignment *alignment) { long int i, j; float pdiff; /* compute pairwise percent identity */ for(i=0;i < dmat->size;i++) { for(j=i+1;j < dmat->size;j++) { pdiff = 1.0 - NJ_pw_percentid(alignment, i, j); dmat->val[NJ_MAP(i, j, dmat->size)] = pdiff; } } return; } /* * NJ_DNA_jc_correction() - * * Compute the distance matrix with jukes-cantor correction * and assign high distance if sequence divergence exceeds * 0.75 * * Jukes, T.H. (1969), Evolution of protein molecules. In H.N. Munro (Ed.), * Mammalian Protein Metabolism, Volume III, Chapter 24, pp. 21-132. * New York: Academic Press * */ void NJ_DNA_jc_correction(DMAT *dmat, NJ_alignment *alignment) { long int i, j; long int k; float d, cutoff, dist; long int residues; cutoff = 0.75; for(i=0;i < dmat->size;i++) { for(j=i+1;j < dmat->size;j++) { k = NJ_pw_differences(alignment, i, j, &residues); d = 1.0 - NJ_pw_percentid(alignment, i, j); if(d > cutoff) { dist = NJ_BIGDIST; } else { dist = (-0.75) * log(1.0 - (4.0/3.0)*d); } if(fabs(dist) < FLT_EPSILON) { dmat->val[NJ_MAP(i, j, dmat->size)] = 0.0; } else { dmat->val[NJ_MAP(i, j, dmat->size)] = dist; } } } return; } /* * NJ_PROTEIN_jc_correction() - * * This function performs modified jukes/cantor correction on * a protein alignment * * Jukes, T.H. (1969), Evolution of protein molecules. In H.N. Munro (Ed.), * Mammalian Protein Metabolism, Volume III, Chapter 24, pp. 21-132. 
* New York: Academic Press * */ void NJ_PROTEIN_jc_correction(DMAT *dmat, NJ_alignment *alignment) { long int i, j; long int residues; long int diff; float dist, x; for(i=0;i < dmat->size;i++) { for(j=i+1;j < dmat->size;j++) { diff = NJ_pw_differences(alignment, i, j, &residues); if(!diff || !residues) { dist = 0.0; } else { dist = (float)diff/(float)residues; x = ((20.0/19.0)*dist); if(NJ_FLT_GT(x, 1.0)) { dist = NJ_BIGDIST; } else { dist = -(19.0/20.0) * log(1.0 - x); } } dmat->val[NJ_MAP(i, j, dmat->size)] = dist; } } return; } /* * NJ_DNA_k2p_correction() - * * Correct a distance matrix using k2p correction using * cutoffs to avoid problems with logarithms. * * dist = -0.5ln(1-2P-Q) - 0.25ln(1-2Q) * * But due to the logarithms, this is only valid when * * (2P+Q <= 1) && * (2Q <= 1) * * So assign arbitrary distances when these constraints are * not strictly followed. * * Kimura, M. (1980), A simple method for estimating evolutionary * rates of base substitutions through comparative studies of * nucleotide sequences. J. Mol. 
Evol., 16, 111-120 * */ void NJ_DNA_k2p_correction(DMAT *dmat, NJ_alignment *alignment) { long int i, j; float P; /* proportion of transitions */ float Q; /* proportion of transversions */ long int nucleotides; long int transitions, transversions; float dist; float log_x = 0.0; /* the params for the first log */ float log_y = 0.0; /* the params for the second log */ int blowup; /* a flag to specify if we have a log blowup */ for(i=0;i < dmat->size;i++) { for(j=i+1;j < dmat->size;j++) { blowup = 0; /* count the number of transitions and transversions */ NJ_DNA_count_tt(alignment, i, j, &transitions, &transversions, &nucleotides); if(!nucleotides) { /* sequences have no non-ambiguous overlap in alignment */ P = 0.0; Q = 0.0; } else { P = (float)transitions / (float)nucleotides; Q = (float)transversions / (float)nucleotides; } /* the first log blows up if 2*P+Q = 1.0 */ if(NJ_FLT_EQ((2.0 * P + Q), 1.0)) { blowup = 1; } else { if( NJ_FLT_LT(1.0 - 2.0*P - Q, 0.0) ) { blowup = 1; } else { log_x = log(1.0 - 2.0*P - Q); } } /* the second log blows up if 2*Q >= 1.0 */ if( NJ_FLT_EQ((2.0 * Q), 1.0) || NJ_FLT_GT((2.0 * Q), 1.0) ) { blowup = 1; } else { log_y = log(1.0 - 2.0*Q); } /* if our logarithms blow up, we just set the distance to the max */ if(blowup) { dist = NJ_BIGDIST; } else { dist = (-0.5)*log_x - 0.25*log_y; } if(fabs(dist) < FLT_EPSILON) { dmat->val[NJ_MAP(i, j, dmat->size)] = 0.0; } else { dmat->val[NJ_MAP(i, j, dmat->size)] = dist; } } } return; } /* * NJ_PROTEIN_kimura_correction() - * * Perform Kimura correction for distances derived from protein * alignments. * * Kimura, M. (1983), The Neutral Theory of Molecular Evolution. * p. 
75., Cambridge University Press, Cambridge, England * */ void NJ_PROTEIN_kimura_correction(DMAT *dmat, NJ_alignment *alignment) { long int i, j; long int residues; long int diff; float dist; printf("NJ_PROTEIN_kimura_correction()\n"); for(i=0;i < dmat->size;i++) { for(j=i+1;j < dmat->size;j++) { diff = NJ_pw_differences(alignment, i, j, &residues); if(!diff || !residues) { dist = 0.0; } else { dist = (float)diff/(float)residues; } if(NJ_FLT_LT(dist, 0.75)) { if(NJ_FLT_GT(dist, 0.0) ) { dist = -log(1.0 - dist - (dist * dist/5.0) ); } } else { if(NJ_FLT_GT(dist, 0.93) ) { dist = 10.0; } else { dist = (float)NJ_dayhoff[ (int)((dist*1000.0)-750.0) ] / 100.0 ; } } dmat->val[NJ_MAP(i, j, dmat->size)] = dist; } } return; } /* * NJ_DNA_count_tt() - * * Count the number of transitions and transversions * between two aligned DNA sequences * * This routine automatically skips ambiguities when * counting transitions and transversions. * */ void NJ_DNA_count_tt(NJ_alignment *alignment, long int x, long int y, long int *transitions, long int *transversions, long int *residues) { long int tmp_transitions = 0; long int tmp_transversions = 0; long int tmp_residues = 0; char a, b; long int i; for(i=0;i < alignment->length;i++) { a = toupper(alignment->data[x*alignment->length+i]); b = toupper(alignment->data[y*alignment->length+i]); if( (a == 'A' && b == 'T') || (a == 'T' && b == 'A') || (a == 'A' && b == 'C') || (a == 'C' && b == 'A') || (a == 'T' && b == 'G') || (a == 'G' && b == 'T') || (a == 'C' && b == 'G') || (a == 'G' && b == 'C') ) { tmp_transversions++; } if( (a == 'C' && b == 'T') || (a == 'T' && b == 'C') || (a == 'G' && b == 'A') || (a == 'A' && b == 'G') ) { tmp_transitions++; } /* count the number of residues */ if(a != NJ_AMBIGUITY_CHAR && b != NJ_AMBIGUITY_CHAR ) { tmp_residues++; } } *transitions = tmp_transitions; *transversions = tmp_transversions; if(residues) { *residues = tmp_residues; } return; } /* * NJ_pw_percentid() - * * Given an alignment and a specification * for two rows, compute the 
pairwise * percent identity between the two * */ float NJ_pw_percentid(NJ_alignment *alignment, long int x, long int y) { float pid; long int i; long int residues; long int same; char c1, c2; residues = 0; same = 0; for(i=0;i < alignment->length;i++) { c1 = alignment->data[x*alignment->length+i]; c2 = alignment->data[y*alignment->length+i]; if( c1 != NJ_AMBIGUITY_CHAR || c2 != NJ_AMBIGUITY_CHAR ) { residues++; if(c1 == c2) { same++; } } } pid = (float)same/(float)residues; return(pid); } /* * NJ_pw_differences() - * * Given an alignment and a specification * for two rows in the alignment, compute the * number of differences between the two sequences * * With respect to ambiguity codes, we will want to * disregard those sites entirely in our count. * */ long int NJ_pw_differences(NJ_alignment *alignment, long int x, long int y, long int *residues) { long int i; long int diff; char c1, c2; long int tmp_residues; diff = 0; tmp_residues = 0; for(i=0;i < alignment->length;i++) { c1 = alignment->data[x*alignment->length+i]; c2 = alignment->data[y*alignment->length+i]; if( c1 != NJ_AMBIGUITY_CHAR || c2 != NJ_AMBIGUITY_CHAR ) { tmp_residues++; if(c1 != c2) { diff++; } } } *residues = tmp_residues; return(diff); } mothur-1.36.1/source/clearcut/distclearcut.h000066400000000000000000000061251255543666200211030ustar00rootroot00000000000000/* * dist.h * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. 
* + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * Compute a distance matrix given a set of sequences * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_DIST_H_ #define _INC_DIST_H_ 1 #ifdef __cplusplus extern "C" { #endif #include "fasta.h" #include "clearcut.h" /* * An arbitrarily large distance to represent distances * which are too great to accurately correct. 
*/ #define NJ_BIGDIST 10.0 /* some function prototypes */ DMAT * NJ_build_distance_matrix(NJ_ARGS *nj_args); DMAT * NJ_compute_dmat(NJ_ARGS *nj_args, NJ_alignment *alignment); float NJ_pw_percentid(NJ_alignment *alignment, long int x, long int y); long int NJ_pw_differences(NJ_alignment *alignment, long int x, long int y, long int *residues); void NJ_no_correction(DMAT *dmat, NJ_alignment *alignment); void NJ_DNA_jc_correction(DMAT *dmat, NJ_alignment *alignment); void NJ_PROTEIN_jc_correction(DMAT *dmat, NJ_alignment *alignment); void NJ_DNA_k2p_correction(DMAT *dmat, NJ_alignment *alignment); void NJ_PROTEIN_kimura_correction(DMAT *dmat, NJ_alignment *alignment); void NJ_DNA_count_tt(NJ_alignment *alignment, long int x, long int y, long int *transitions, long int *transversions, long int *residues); #ifdef __cplusplus } #endif #endif /* _INC_DIST_H_ */ mothur-1.36.1/source/clearcut/dmat.cpp000066400000000000000000000445341255543666200177030ustar00rootroot00000000000000/* * dmat.c * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * Distance matrix parser * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <errno.h> #include "common.h" #include "clearcut.h" #include "dmat.h" /* * * NJ_is_alpha() - determine if character is an alphabetic character * * INPUT: * ------ * c -- character to test * * RETURN: * ------- * int -- 1 if character is alphabetic (A-Z || a-z) * 0 if character is NOT alphabetic * */ static inline int NJ_is_alpha(char c) { if( (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ) { return(1); } else { return(0); } } /* * * NJ_is_whitespace() - determine if character is a whitespace character * * INPUT: * ------ * c -- character to test * * RETURN: * ------- * int -- 1 if character is whitespace (space, tab, CR, LF) * 0 if character is NOT whitespace * */ static inline int NJ_is_whitespace(char c) { if( c == ' ' || /* space */ c == '\n' || /* newline */ c == '\r' || /* carriage-return */ c == '\v' || /* vertical tab */ c == '\f' || /* form feed */ c == '\t' ) { /* horizontal tab */ return(1); } else { return(0); 
} } /* * * NJ_is_number() - determine if character is a number * * INPUT: * ------ * c -- character to test * * RETURN: * ------- * int -- 1 if character is a number (0-9) * 0 if character is NOT a number * */ static inline int NJ_is_number(char c) { if(c >= '0' && c <= '9') { return(1); } else { return(0); } } /* * NJ_is_distance() - check if string is a properly formatted distance value * */ static inline int NJ_is_distance(char *token) { int i; char c; int exponent_state; int expsign_state; int dpoint_state; /* if token is NULL return failure */ if(!token) { return(0); } exponent_state = 0; expsign_state = 0; dpoint_state = 0; /* The first character must be a number, a decimal point or a sign */ c = token[0]; if(!NJ_is_number(c) && c != '.' && c != '-' && c != '+' ) { goto BAD; } /* * if the first character is not a number, and string is only one * character long, then we return failure. */ if(strlen(token) == 1) { if(!NJ_is_number(c)) { goto BAD; } } for(i=0;i < (int)strlen(token);i++) { c = token[i]; /* make sure that all chars are numbers, '.', '-', '+', 'e' or 'E' */ if(!NJ_is_number(c) && c != '.' && c != '-' && c != '+' && c != 'e' && c != 'E' ) { goto BAD; } /* not the first char and we are not in the exponent state, but read a sign */ if(i>0 && !exponent_state) { if(c == '-' || c == '+') { goto BAD; } } /* if we are in the exponent state, and we've already seen a sign */ if(exponent_state && expsign_state) { if(c == '-' || c == '+') { goto BAD; } } /* if we are in the exponent state and we see a decimal point */ if(exponent_state) { if(c == '.') { goto BAD; } } /* if we are in the exponent state and see another e or E */ if(exponent_state) { if(c == 'e' || c == 'E') { goto BAD; } } /* if we are dpoint_state and see another decimal point */ if(dpoint_state) { if(c == '.') { goto BAD; } } /* enter the exponent state if we need to */ if(!exponent_state) { if(c == 'e' || c == 'E') { exponent_state = 1; } } /* enter the expsign_state if we need to */ if(exponent_state && !expsign_state) { if(c == '-' || c == '+') { expsign_state = 1; } } /* if not in dpoint state and we see a dpoint */ if(!dpoint_state) { if(c == '.') { dpoint_state = 1; } } } /* the token must end in a number char */ if(!NJ_is_number(token[strlen(token)-1])) { goto BAD; } /* 
token is a valid numerical distance */ return(1); BAD: /* token is invalid distance format */ return(0); } /* * NJ_is_label() - * * Simply, if token is not a valid number, then it is a name * */ static inline int NJ_is_label(char *token) { if(NJ_is_distance(token)) { return(0); } else { return(1); } } /* * NJ_get_token() - get a token from an input stream * */ static inline int NJ_get_token(FILE *fp, NJ_DIST_TOKEN *token) { char c; int index; c = fgetc(fp); if(feof(fp)) { token->type = NJ_EOF_STATE; return(token->type); } if(NJ_is_whitespace(c)) { token->buf[0] = c; token->buf[1] = '\0'; token->type = NJ_WS_STATE; return NJ_WS_STATE; } index = 0; while(!NJ_is_whitespace(c)) { /* reallocate our buffer if necessary */ if(index >= token->bufsize) { token->bufsize *= 2; token->buf = (char *)realloc(token->buf, token->bufsize*sizeof(char)); if(!token->buf) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_get_token()\n"); exit(-1); } } token->buf[index++] = c; c = fgetc(fp); if(feof(fp)) { token->type = NJ_EOF_STATE; break; } } token->buf[index] = '\0'; if(token->type != NJ_EOF_STATE) { if(NJ_is_distance(token->buf)) { token->type = NJ_FLOAT_STATE; } else { token->type = NJ_NAME_STATE; } } return(token->type); } /* * NJ_parse_distance_matrix() -- Takes a filename and returns a distance matrix * * * INPUT: * ------ * nj_args -- a pointer to a structure containing the command-line arguments * * OUTPUT: * ------- * -- NULL (failure) * -- A pointer to a populated distance matrix * * DESCRIPTION: * ------------ * This function implements a simple state machine to parse a distance matrix * in approximate PHYLIP format. This function auto-detects whether the * distance matrix is in upper, lower, or fully-symmetric format and handles * it accordingly. For full/symmetric matrices, values must be symmetric * around the diagonal, which is required to be zeroes. Names and values must * be separated by whitespace (space, tab, newlines, etc.). 
Taxon labels can * include numbers, but must start with non-numerical symbols. * * * *** UPPER FORMAT EXAMPLE *** * * 4 * seq1 0.2 0.3 0.1 * seq2 0.2 0.3 * seq3 0.1 * seq4 * * *** LOWER FORMAT EXAMPLE *** * * 4 * seq1 * seq2 0.3 * seq3 0.2 0.4 * seq4 0.3 0.1 0.3 * * *** SYMMETRIC (FULL) EXAMPLE *** * * 4 * seq1 0.0 0.3 0.5 0.3 * seq2 0.3 0.0 0.1 0.2 * seq3 0.5 0.1 0.0 0.9 * seq4 0.3 0.2 0.9 0.0 * * Values in the distance matrix can be positive or negative, integers or * real values. Values can also be parsed in exponential notation form. * */ DMAT * NJ_parse_distance_matrix(NJ_ARGS *nj_args) { DMAT *dmat = NULL; FILE *fp = NULL; NJ_DIST_TOKEN *token = NULL; int state, dmat_type; int row; int fltcnt; int x, y, i; int numvalread; int expectedvalues = -1; float val; int first_state = 0; /* allocate our distance matrix and token structure */ dmat = (DMAT *)calloc(1, sizeof(DMAT)); token = (NJ_DIST_TOKEN *)calloc(1, sizeof(NJ_DIST_TOKEN)); if(token) { token->bufsize = NJ_INITIAL_BUFSIZE; token->buf = (char *)calloc(token->bufsize, sizeof(char)); } if(!dmat || !token || !token->buf) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_parse_distance_matrix()\n"); goto XIT_BAD; } /* open distance matrix file here */ if(nj_args->stdin_flag) { fp = stdin; } else { fp = fopen(nj_args->infilename, "r"); if(fp==NULL) { fprintf(stderr, "Clearcut: Could not open distance matrix: %s\n", nj_args->infilename); perror("Clearcut"); goto XIT_BAD; } } /* get the number of taxa in this file */ fscanf(fp, "%ld\n", &dmat->ntaxa); if(dmat->ntaxa < 2) { fprintf(stderr, "Clearcut: Invalid number of taxa in distance matrix\n"); goto XIT_BAD; } /* set our initial working size according to the # of taxa */ dmat->size = dmat->ntaxa; /* allocate space for the distance matrix values here */ dmat->val = (float *)calloc(NJ_NCELLS(dmat->ntaxa), sizeof(float)); if(!dmat->val) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_parse_distance_matrix()\n"); goto XIT_BAD; } /* taxa names */ 
dmat->taxaname = (char **)calloc(dmat->ntaxa, sizeof(char *)); if(!dmat->taxaname) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_parse_distance_matrix()\n"); goto XIT_BAD; } /* set the initial state of our state machine */ dmat_type = NJ_PARSE_UNKNOWN; row = -1; fltcnt = 0; numvalread = 0; /* read the input one character at a time to drive simple state machine */ state = NJ_get_token(fp, token); while(state != NJ_EOF_STATE) { switch(state) { case NJ_NAME_STATE: if(first_state == 0) { first_state = 1; } row++; if(row > 0 && dmat_type == NJ_PARSE_UNKNOWN) { if(fltcnt == dmat->ntaxa) { dmat_type = NJ_PARSE_SYMMETRIC; expectedvalues = dmat->ntaxa * dmat->ntaxa; } else if (fltcnt == dmat->ntaxa-1) { dmat_type = NJ_PARSE_UPPER; expectedvalues = ((dmat->ntaxa) * (dmat->ntaxa-1)) / 2; /* shift everything in first row by one char */ for(i=dmat->ntaxa-2;i>=0;i--) { dmat->val[i+1] = dmat->val[i]; } } else if (fltcnt == 0) { dmat_type = NJ_PARSE_LOWER; expectedvalues = ((dmat->ntaxa) * (dmat->ntaxa-1)) / 2; } else { goto XIT_BAD; } } if(row >= dmat->ntaxa) { goto XIT_BAD; } /* allocate space for this taxon label */ dmat->taxaname[row] = (char *)calloc(strlen(token->buf)+1, sizeof(char)); if(!dmat->taxaname[row]) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_parse_distance_matrix()\n"); goto XIT_BAD; } strcpy(dmat->taxaname[row], token->buf); fltcnt = 0; break; case NJ_FLOAT_STATE: if(first_state == 0) { goto XIT_BAD; } /* fprintf(stdout, "the token buf is %s", token->buf); */ /* fprintf(stdout, "the token buf is %f", errno); */ /* clear errno first so a stale value is not mistaken for a conversion error */ errno = 0; val = atof(token->buf); /* fprintf(stdout, "the token buf is %f", errno); */ if(errno) { fprintf(stderr, "Clearcut: Distance value out-of-range.\n"); goto XIT_BAD; } x = row; y = fltcnt; switch(dmat_type) { case NJ_PARSE_UNKNOWN: dmat->val[NJ_MAP(x, y, dmat->size)] = val; break; case NJ_PARSE_SYMMETRIC: if(fltcnt >= dmat->ntaxa) { fprintf(stderr, "Clearcut: Incorrect number of distance values on row, %s. 
Expected %ld, and found %d.\n", dmat->taxaname[row], (dmat->ntaxa), fltcnt); goto XIT_BAD; } if(x < y) { dmat->val[NJ_MAP(x, y, dmat->size)] = val; } else if(x > y) { if(!NJ_FLT_EQ(val, dmat->val[NJ_MAP(y, x, dmat->size)])) { fprintf(stderr, "Clearcut: Full matrices must be symmetric.\n"); goto XIT_BAD; } } else { if(!NJ_FLT_EQ(val, 0.0)) { fprintf(stderr, "Clearcut: Values along the diagonal in a symmetric matrix must be zero.\n"); goto XIT_BAD; } } break; case NJ_PARSE_UPPER: if(fltcnt > dmat->ntaxa-row) { fprintf(stderr, "Clearcut: Incorrect number of distance values on row, %s. Expected %ld, and found %d.\n", dmat->taxaname[row], (dmat->ntaxa-row), fltcnt); goto XIT_BAD; } dmat->val[NJ_MAP(x, x+y+1, dmat->size)] = val; break; case NJ_PARSE_LOWER: if(fltcnt > row-1) { fprintf(stderr, "Clearcut: Incorrect number of distance values on row, %s. Expected %d, and found %d.\n", dmat->taxaname[row], (row-1), fltcnt); goto XIT_BAD; } dmat->val[NJ_MAP(y, x, dmat->size)] = val; break; default: goto XIT_BAD; break; } fltcnt++; numvalread++; break; case NJ_WS_STATE: break; case NJ_EOF_STATE: if(first_state == 0) { goto XIT_BAD; } break; default: fprintf(stderr, "Clearcut: Unknown state in distance matrix parser.\n"); break; } /* get next token from stream */ state = NJ_get_token(fp, token); } /* * At the end, if we have not read the number of values that we predicted * we would need, then there was a problem and we need to punt. */ if(numvalread != expectedvalues) { fprintf(stderr, "Clearcut: Incorrect number of values in the distance matrix. 
Expected %d, and found %d.\n", numvalread, expectedvalues); goto XIT_BAD; } /* special check to make sure first value read is 0.0 */ if(dmat_type == NJ_PARSE_SYMMETRIC) { if(!NJ_FLT_EQ(dmat->val[NJ_MAP(0, 0, dmat->size)], 0.0)) { fprintf(stderr, "Clearcut: Values along the diagonal in a symmetric matrix must be zero.\n"); goto XIT_BAD; } } /* now lets allocate space for the r and r2 columns */ dmat->r = (float *)calloc(dmat->ntaxa, sizeof(float)); dmat->r2 = (float *)calloc(dmat->ntaxa, sizeof(float)); if(!dmat->r || !dmat->r2) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_parse_distance_matrix()\n"); goto XIT_BAD; } /* track some memory addresses */ dmat->rhandle = dmat->r; dmat->r2handle = dmat->r2; dmat->valhandle = dmat->val; /* close matrix file here */ if(!nj_args->stdin_flag) { fclose(fp); } if(token) { if(token->buf) { free(token->buf); } free(token); } return(dmat); /* clean up our partial progress */ XIT_BAD: if(fp) { fprintf(stderr, "Clearcut: Syntax error in distance matrix at offset %ld.\n", ftell(fp)); } /* close matrix file here */ if(!nj_args->stdin_flag) { if(fp) { fclose(fp); } } /* if we have a valid dmat (partial or complete), we need to free it */ if(dmat) { NJ_free_dmat(dmat); } if(token) { if(token->buf) { free(token->buf); } free(token); } return(NULL); } /* * NJ_output_matrix() - Output a distance matrix to the specified file * * * INPUTS: * ------- * nj_args -- a pointer to a data structure holding the command-line args * dmat -- a pointer to a distance matrix * * * RETURNS: * -------- * NOTHING * * * DESCRIPTION: * ------------ * If the appropriate flag was specified in the command-line, this function * now outputs the parsed or computed distance matrix to a file. This * can be useful if generating a distance matrix was the primary goal of * running the program, or if one wanted to debug and/or verify the * correctness of the program. * * Currently this function outputs full/symmetric matrices only. 
 *
 */
void
NJ_output_matrix(NJ_ARGS *nj_args, DMAT *dmat) {

  FILE *fp = NULL;
  long int i, j;

  /* if we haven't specified matrixout, return immediately */
  if(!nj_args->matrixout) {
    return;
  }

  /* open the specified matrix file for writing */
  fp = fopen(nj_args->matrixout, "w");
  if(!fp) {
    fprintf(stderr, "Clearcut: Could not open matrix file %s for output.\n", nj_args->matrixout);
    return;
  }

  /* output the number of taxa in the matrix */
  fprintf(fp, " %ld\n", dmat->size);

  /* print the first taxon name outside of the main loop */
  fprintf(fp, "%s\n", dmat->taxaname[0]);

  for(i=1;i<dmat->size;i++) {

    /* output taxaname */
    fprintf(fp, "%s\t", dmat->taxaname[i]);

    for(j=0;j<i;j++) {
      if(nj_args->expdist) { /* exponential notation (or not) */
        fprintf(fp, "%e ", dmat->val[NJ_MAP(j,i,dmat->size)]);
      } else {
        fprintf(fp, "%f ", dmat->val[NJ_MAP(j,i,dmat->size)]);
      }
    }

    fprintf(fp, "\n");
  }

#ifdef FULL_SYMMETRIC_MATRIX

  /* output the number of taxa in the matrix */
  fprintf(fp, " %ld\n", dmat->size);

  for(i=0;i<dmat->size;i++) {

    /* output taxaname */
    fprintf(fp, "%s\t", dmat->taxaname[i]);

    for(j=0;j<dmat->size;j++) {
      if(i>j) {
        if(nj_args->expdist) { /* exponential notation (or not) */
          fprintf(fp, "%e ", dmat->val[NJ_MAP(j,i,dmat->size)]);
        } else {
          fprintf(fp, "%f ", dmat->val[NJ_MAP(j,i,dmat->size)]);
        }
      } else if(i<j) {
        if(nj_args->expdist) { /* exponential notation (or not) */
          fprintf(fp, "%e ", dmat->val[NJ_MAP(i,j,dmat->size)]);
        } else {
          fprintf(fp, "%f ", dmat->val[NJ_MAP(i,j,dmat->size)]);
        }
      } else {
        if(nj_args->expdist) { /* exponential notation (or not) */
          fprintf(fp, "%e ", 0.0);
        } else {
          fprintf(fp, "%f ", 0.0);
        }
      }
    }

    fprintf(fp, "\n");
  }

#endif // FULL_SYMMETRIC_MATRIX

  /* close the file here */
  if(fp) {
    fclose(fp);
  }

  return;
}
mothur-1.36.1/source/clearcut/dmat.h000066400000000000000000000051411255543666200173370ustar00rootroot00000000000000
/*
 * dmat.h
 *
 * $Id$
 *
 *****************************************************************************
 *
 * Copyright (c) 2004, Luke Sheneman
 * All rights reserved.
* * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. 
* ***************************************************************************** * * Distance matrix parser header file * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu */ #ifndef _INC_DMAT_H_ #define _INC_DMAT_H_ 1 #ifdef __cplusplus extern "C" { #endif #include "clearcut.h" #define NJ_INITIAL_BUFSIZE 32 #define NJ_NAME_STATE 100 #define NJ_FLOAT_STATE 101 #define NJ_WS_STATE 102 #define NJ_EOF_STATE 103 #define NJ_PARSE_SYMMETRIC 100 #define NJ_PARSE_LOWER 101 #define NJ_PARSE_UPPER 102 #define NJ_PARSE_UNKNOWN 103 /* some data structures */ typedef struct _NJ_DIST_TOKEN_STRUCT { char *buf; long int bufsize; int type; } NJ_DIST_TOKEN; /* some function prototypes */ DMAT * NJ_parse_distance_matrix(NJ_ARGS *nj_args); void NJ_output_matrix(NJ_ARGS *nj_args, DMAT *dmat); #ifdef __cplusplus } #endif #endif /* _INC_DMAT_H_ */ mothur-1.36.1/source/clearcut/fasta.cpp000066400000000000000000000377261255543666200200610ustar00rootroot00000000000000/* * fasta.c * * $Id$ * ***************************************************************************** * * Copyright (c) 2004, Luke Sheneman * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * + Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * + Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. 
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 * POSSIBILITY OF SUCH DAMAGE.
 *
 *****************************************************************************
 *
 * Functions for parsing FASTA formatted alignment files
 *
 *****************************************************************************
 *
 * AUTHOR:
 *
 *   Luke Sheneman
 *   sheneman@cs.uidaho.edu
 *
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

#include "clearcut.h"
#include "common.h"
#include "fasta.h"

#define NJ_NUM_DNA_AMBIGUITY_SYMS 14
static const char NJ_dna_ambiguity_syms[NJ_NUM_DNA_AMBIGUITY_SYMS] = {
  'M', 'R', 'W', 'S', 'Y', 'K', 'V', 'H', 'D', 'B', 'X', 'N',
  '-', '.'
};

#define NJ_NUM_PROTEIN_AMBIGUITY_SYMS 6
static const char NJ_protein_ambiguity_syms[NJ_NUM_PROTEIN_AMBIGUITY_SYMS] = {
  'X', 'B', 'Z', '*', '-', '.'
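};

#define NJ_NUM_DNA_SYMS 5
static const char NJ_dna_syms[NJ_NUM_DNA_SYMS] = {
  'A', 'G', 'C', 'T', 'U'
};

#define NJ_NUM_PROTEIN_SYMS 20
static const char NJ_protein_syms[NJ_NUM_PROTEIN_SYMS] = {
  'A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I',
  'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V'
};

/*
 * NJ_is_whitespace() - Check to see if character is whitespace
 *
 * INPUTS:
 * -------
 *  c -- character to check
 *
 * RETURNS:
 * --------
 *  int -- 0 if not whitespace
 *         1 if whitespace
 */
static inline
int
NJ_is_whitespace(char c) {
  if( c == ' '  ||   /* space */
      c == '\n' ||   /* newline */
      c == '\r' ||   /* carriage-return */
      c == '\v' ||   /* vertical tab */
      c == '\f' ||   /* form feed */
      c == '\t' ) {  /* horizontal tab */
    return(1);
  } else {
    return(0);
  }
}

/*
 * NJ_is_dna() -
 *
 * Determines if the given symbol is DNA
 *
 * RETURNS: 1 if DNA
 *          0 if not DNA
 *
 */
static inline
int
NJ_is_dna(char c) {

  int i;
  char up_c;

  up_c = toupper(c);

  for(i=0;i<NJ_NUM_DNA_SYMS;i++) {
    if(up_c == NJ_dna_syms[i]) {
      return(1);
    }
  }

  return(0);
}

/*
 * NJ_is_dna_ambiguity() -
 *
 * Determines if the given symbol is a DNA ambiguity code
 *
 * RETURNS: 1 if DNA ambiguity code
 *          0 if not
 *
 */
static inline
int
NJ_is_dna_ambiguity(char c) {

  int i;
  char up_c;

  up_c = toupper(c);

  for(i=0;i<NJ_NUM_DNA_AMBIGUITY_SYMS;i++) {
    if(up_c == NJ_dna_ambiguity_syms[i]) {
      return(1);
    }
  }

  return(0);
}

/*
 * NJ_is_protein() -
 *
 * Determines if the given symbol is an amino acid
 *
 * RETURNS: 1 if amino acid
 *          0 if not
 *
 */
static inline
int
NJ_is_protein(char c) {

  int i;
  char up_c;

  up_c = toupper(c);

  for(i=0;i<NJ_NUM_PROTEIN_SYMS;i++) {
    if(up_c == NJ_protein_syms[i]) {
      return(1);
    }
  }

  return(0);
}

/*
 * NJ_is_protein_ambiguity() -
 *
 * Determines if the given symbol is a protein ambiguity code
 *
 * RETURNS: 1 if protein ambiguity code
 *          0 if not
 *
 */
static inline
int
NJ_is_protein_ambiguity(char c) {

  int i;
  char up_c;

  up_c = toupper(c);

  for(i=0;i<NJ_NUM_PROTEIN_AMBIGUITY_SYMS;i++) {
    if(up_c == NJ_protein_ambiguity_syms[i]) {
      return(1);
    }
  }

  return(0);
}

/*
 * NJ_read_fasta() - Read a FASTA-formatted multiple sequence alignment
 *
 * INPUTS:
 * -------
 *  nj_args -- a pointer to a data structure holding the command-line args
 *
 * RETURNS:
 * --------
 *  NJ_alignment * -- a pointer to the parsed alignment, or NULL on error
 *
 * DESCRIPTION:
 * ------------
 * Example input:
 *
 *   > sequence1
 *   ATAGATATAGATTAGAATAT----TATAGATAT----ATATAT-TTT-
 *   > sequence2
 *   --ATAGATA---ATATATATATTTT--GTCTCATAGT---ATATGCTT
 *   > sequence3
 *   TTATAGATA---ATATATATATTTTAAGTCTCATAGT-A-ATATGC--
 *
 * This function will parse alignments for DNA or protein, and will do
 * so mindful of ambiguity codes for these kinds of sequences. All
 * ambiguity codes are ignored by this program for the purposes of
 * computing a distance matrix from a multiple alignment. By design,
 * this program does not auto-detect DNA vs. Protein, and requires that
 * the user explicitly specify that on the command-line.
 *
 * Gaps can be represented either with the '-' or '.' characters.
 *
 * Newlines and other whitespace are allowed to be interspersed
 * throughout the sequences.
 *
 * Taxon labels are required to be unique, and they must start with
 * an alphabetic character (not a number, etc.). The parser will read
 * the first token after the > character in the description line up until the
 * first whitespace and use that for the taxon label.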
* * For example, in the line "> taxon1 is homo sapien", the taxon label will be * "taxon1" * */ NJ_alignment * NJ_read_fasta(NJ_ARGS *nj_args) { FILE *fp = NULL; char *buf = NULL; char *ptr = NULL; NJ_alignment *alignment = NULL; char c; int state; long int index, x, seq; long int i; long int bufsize, nseqs = NJ_INITIAL_NSEQS; int first_sequence_flag; /* * In this function, we implement a FASTA alignment parser which * reads in an alignment character-by-character, maintaining state * information which guides the parser. * * The program reads either DNA or Protein alignments. All title lines * and sequences can be arbitrarily long. Gaps can be represented by * "-" or "." characters. * * Ambiguity codes are also handled. * */ /* * We can't handle reading fasta input unless the user explicity * specifies the input type...just to be sure. */ if( (!nj_args->dna_flag && !nj_args->protein_flag) || (nj_args->dna_flag && nj_args->protein_flag) ) { fprintf(stderr, "Clearcut: Explicitly specify protein or DNA\n"); goto XIT_BAD; } /* open specified fasta file here */ if(nj_args->stdin_flag) { fp = stdin; } else { fp = fopen(nj_args->infilename, "r"); if(!fp) { fprintf(stderr, "Clearcut: Failed to open input FASTA file: %s\n", nj_args->infilename); perror("Clearcut"); goto XIT_BAD; } } /* allocate the initial buffer */ bufsize = NJ_INITIAL_BUFFER_SIZE; buf = (char *)calloc(bufsize, sizeof(char)); /* allocate the alignment container here */ alignment = (NJ_alignment *)calloc(1, sizeof(NJ_alignment)); /* allocate initial title array */ // printf("allocating initial title array\n"); alignment->titles = (char **)calloc(NJ_INITIAL_NSEQS, sizeof(char *)); /* make sure that we successfully allocated memory */ if(!buf || !alignment || !alignment->titles) { fprintf(stderr, "Clearcut: Memory allocation error in NJ_read_fasta()\n"); goto XIT_BAD; } /* a flag */ first_sequence_flag = 1; index = 0; /* tracks the position in buffer */ x = 0; /* tracks the position on sequence */ seq = 0; /* 
tracks the active sequence */

  /* initial state of state machine */
  state = NJ_FASTA_MODE_UNKNOWN;

  while(1) {

    /* get the next character */
    c = fgetc(fp);
    if(feof(fp)) {
      if(state == NJ_FASTA_MODE_SEQUENCE) {
        buf[index+1] = '\0';

        /* copy buf to alignment */
        for(i=1;i<=alignment->length;i++) {
          alignment->data[seq*alignment->length+i-1] = buf[i];
        }
      }
      break;
    }

    /* make sure our dynamic buffer is big enough */
    if(index >= bufsize) {
      bufsize *= 2;
      buf = (char *)realloc(buf, bufsize);
      if(!buf) {
        fprintf(stderr, "Clearcut: Memory allocation error in NJ_read_fasta()\n");
        goto XIT_BAD;
      }
    }

    switch(state) {

    case NJ_FASTA_MODE_UNKNOWN:

      if(!NJ_is_whitespace(c)) {
        if(c == '>') {
          state = NJ_FASTA_MODE_TITLE;
          buf[0] = '>';
        } else {
          goto XIT_BAD;
        }
      }

      break;

    case NJ_FASTA_MODE_TITLE:

      if( c == '\n' ||
          c == '\r' ) {

        buf[index] = '\0';
        state = NJ_FASTA_MODE_SEQUENCE;
        index = 0;
        x = -1;

        /* make sure we've allocated enough space for titles and sequences */
        if(seq == nseqs) {

          // printf("realloc(). seq = %d, nseqs = %d\n", seq, nseqs);

          nseqs *= 2;

          alignment->titles = (char **)realloc(alignment->titles, nseqs*sizeof(char *));
          if(!alignment->titles) {
            fprintf(stderr, "Clearcut: Memory allocation error in NJ_read_fasta()\n");
            goto XIT_BAD;
          }

          alignment->data = (char *)realloc(alignment->data, alignment->length*nseqs*sizeof(char));
          if(!alignment->data) {
            fprintf(stderr, "Clearcut: Allocation error in NJ_read_fasta()\n");
            goto XIT_BAD;
          }
        }

        // printf("Allocating %d bytes for title %d: %s\n", (int)strlen(buf), (int)seq, buf);

        /*
         * Allocate strlen(buf)+1 bytes: for the first sequence buf holds
         * the title without the leading '>', so the title can be as long
         * as buf itself and still needs room for the terminating '\0'.
         */
        alignment->titles[seq] = (char *)calloc(strlen(buf)+1, sizeof(char));
        if(!alignment->titles[seq]) {
          fprintf(stderr, "Clearcut: Memory allocation error in NJ_read_fasta()\n");
          goto XIT_BAD;
        }

        /* let's forward to the first non-space (space/tab) character after the '>' */
        if(first_sequence_flag) {
          ptr = buf;
        } else {
          ptr = &buf[1];
        }
        while(*ptr == '\t' || *ptr == ' ') {
          ptr++;
        }
        sscanf(ptr, "%s", alignment->titles[seq]); /* get the first word and use as the title */

        alignment->nseq++;
} buf[index++] = c; break; case NJ_FASTA_MODE_SEQUENCE: if(c == '>') { if(first_sequence_flag) { first_sequence_flag = 0; /* allocate our alignment data section here */ alignment->length = index-1; nseqs = NJ_INITIAL_NSEQS; alignment->data = (char *)calloc(alignment->length*nseqs, sizeof(char)); if(!alignment->data) { fprintf(stderr, "Clearcut: Allocation error in NJ_read_fasta()\n"); goto XIT_BAD; } } if(!first_sequence_flag) { if(index-1 < alignment->length) { fprintf(stderr, "Clearcut: Sequences must be of uniform length in alignment at sequence %ld\n", seq); goto XIT_BAD; } } /* re-allocate if necessary */ /* if(seq >= nseqs) { nseqs *= 2; alignment->data = (char *)realloc(alignment->data, alignment->length*nseqs*sizeof(char)); if(!alignment->data) { fprintf(stderr, "Clearcut: Allocation error in NJ_read_fasta()\n"); goto XIT_BAD; } } */ /* copy buf to alignment */ for(i=1;i<=alignment->length;i++) { alignment->data[seq*alignment->length+i-1] = buf[i]; } state = NJ_FASTA_MODE_TITLE; index = 1; x = 1; buf[0] = c; seq++; } else { if(NJ_is_whitespace(c)) { break; } if(!first_sequence_flag) { if(index-1 >= alignment->length) { fprintf(stderr, "Clearcut: Sequences must be of uniform length in alignment at sequence %ld\n", seq); goto XIT_BAD; } } /* * Here we check to make sure that the symbol read is appropriate * for the type of data the user specified. (dna or protein). * We also handle ambiguity codes by converting them to a specific * assigned ambiguity code character. 
 * Ambiguity codes are ignored
         * when computing distances
         */
        if(nj_args->dna_flag) {
          if(NJ_is_dna(c)) {
            buf[index++] = toupper(c);
          } else {
            if(NJ_is_dna_ambiguity(c)) {
              buf[index++] = NJ_AMBIGUITY_CHAR;
            } else {
              fprintf(stderr, "Clearcut: Unknown symbol '%c' in nucleotide sequence %ld.\n", c, seq);
              goto XIT_BAD;
            }
          }
        } else if(nj_args->protein_flag) {
          if(NJ_is_protein(c)) {
            buf[index++] = toupper(c);
          } else {
            if(NJ_is_protein_ambiguity(c)) {
              buf[index++] = NJ_AMBIGUITY_CHAR;
            } else {
              fprintf(stderr, "Clearcut: Unknown symbol '%c' in protein sequence %ld.\n", c, seq);
              goto XIT_BAD;
            }
          }
        }

      }

      break;

    default:

      goto XIT_BAD;

      break;

    }
  }

  if(index-1 != alignment->length) {
    fprintf(stderr, "Clearcut: Sequences must be of uniform length in alignment at sequence %ld\n", seq);
    goto XIT_BAD;
  }

  /* check for duplicate taxon labels */
  if(!NJ_taxaname_unique(alignment)) {
    goto XIT_BAD;
  }

  return(alignment);

 XIT_BAD:

  if(fp) {
    fprintf(stderr, "Clearcut: Fatal error parsing FASTA file at file offset %ld.\n", ftell(fp));
  }

  if(buf) {
    free(buf);
  }

  NJ_free_alignment(alignment);

  return(NULL);
}

/*
 * NJ_print_alignment() - Print multiple sequence alignment (for debugging)
 *
 * INPUTS:
 * -------
 *  alignment -- A pointer to the alignment
 *
 * RETURNS:
 * --------
 *  NONE
 *
 */
void
NJ_print_alignment(NJ_alignment *alignment) {

  long int i, j;

  printf("nseq = %ld, length = %ld\n", alignment->nseq, alignment->length);

  for(i=0;i<alignment->nseq;i++) {

    printf("> %s\n", alignment->titles[i]);

    for(j=0;j<alignment->length;j++) {
      printf("%c", alignment->data[i*alignment->length+j]);
    }

    printf("\n");
  }

  return;
}

/*
 *
 * NJ_free_alignment() - Free all of the memory allocated for the
 *                       multiple sequence alignment
 *
 * INPUTS:
 * -------
 *  alignment -- A pointer to the multiple sequence alignment
 *
 * RETURNS:
 * --------
 *  NONE
 *
 */
void
NJ_free_alignment(NJ_alignment *alignment) {

  long int i;

  if(alignment) {

    /* free the allocated titles */
    if(alignment->titles) {
      for(i=0;i<alignment->nseq;i++) {
        if(alignment->titles[i]) {
          free(alignment->titles[i]);
        }
      }
      free(alignment->titles);
    }

    /* free the alignment data */
    if(alignment->data) {
      free(alignment->data);
    }

    /* free the alignment itself */
    free(alignment);
  }

  return;
}

/*
 * NJ_taxaname_unique() - Check to see if taxanames are unique in alignment
 *
 * INPUTS:
 * -------
 *  alignment -- a pointer to a multiple sequence alignment
 *
 * OUTPUTS:
 * --------
 *  int -- 1 if all taxanames in alignment are unique
 *         0 if the taxanames in alignment are NOT all unique
 *
 *
 * DESCRIPTION:
 * ------------
 *
 * Check to see if the taxanames in the alignment are unique. It
 * will be impossible to make sense of the final tree if the taxon
 * labels are not unique.
 *
 */
int
NJ_taxaname_unique(NJ_alignment *alignment) {

  long int i, j;

  for(i=0;i<alignment->nseq;i++) {
    for(j=i+1;j<alignment->nseq;j++) {
      if(!strcmp(alignment->titles[i], alignment->titles[j])) {
        fprintf(stderr, "Clearcut: Taxa %ld and %ld (%s) do not have unique taxon labels.\n",
                i, j, alignment->titles[i]);
        return(0);
      }
    }
  }

  return(1);
}

void
NJ_print_titles(NJ_alignment *alignment) {

  int i;

  for(i=0;i<alignment->nseq;i++) {
    printf("%d: %s\n", i, alignment->titles[i]);
  }

  return;
}
mothur-1.36.1/source/clearcut/fasta.h000066400000000000000000000050051255543666200175070ustar00rootroot00000000000000
/*
 * fasta.h
 *
 * $Id$
 *
 *****************************************************************************
 *
 * Copyright (c) 2004, Luke Sheneman
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 *  + Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  + Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
* + The names of its contributors may not be used to endorse or promote * products derived from this software without specific prior * written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * ***************************************************************************** * * AUTHOR: * * Luke Sheneman * sheneman@cs.uidaho.edu * */ #ifndef _INC_NJ_FASTA_H_ #define _INC_NJ_FASTA_H_ 1 #ifdef __cplusplus extern "C" { #endif #include "clearcut.h" #define NJ_INITIAL_BUFFER_SIZE 512 #define NJ_INITIAL_NSEQS 64 #define NJ_FASTA_MODE_TITLE 100 #define NJ_FASTA_MODE_SEQUENCE 101 #define NJ_FASTA_MODE_NEWLINE 102 #define NJ_FASTA_MODE_UNKNOWN 103 typedef struct _STRUCT_NJ_ALIGNMENT { long int nseq; long int length; char **titles; char *data; } NJ_alignment; NJ_alignment * NJ_read_fasta(NJ_ARGS *nj_args); void NJ_print_alignment(NJ_alignment *alignment); void NJ_free_alignment(NJ_alignment *alignment); int NJ_taxaname_unique(NJ_alignment *alignment); #ifdef __cplusplus } #endif #endif /* _INC_NJ_FASTA_H_ */ mothur-1.36.1/source/clearcut/getopt_long.cpp000066400000000000000000001565551255543666200213060ustar00rootroot00000000000000/* This getopt_long() is compatible with GNU's, however, added original extention (short 1 byte option). 
Copyright (c) 2004 Koji Arai Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Compilation for Test: GNU: cc -DUSE_GNU -DDEBUG getopt_long.c -o test_getopt_long_gnu not GNU: cc -I. -DDEBUG getopt_long.c -o test_getopt_long ./test_getopt_long ./test_getopt_long_gnu BUGS: * not implemented any features for getopt() and getopt_long(). 
*/ #include #include #include #if DEBUG static int puts_argv(char **argv) { int i; for (i = 0; argv[i]; i++) { if (i) printf(" "); printf("%s", argv[i]); } printf("\n"); return 0; } #endif #ifndef USE_GNU #include #include "getopt_long.h" char *optarg; int optind; int opterr; int optopt; /* return value 0: no option (include '-') 1: short option like '-x' 2: long option like '--xxx' and just '--' */ static int is_option(char *arg) { if (arg[0] == '-') { switch (arg[1]) { case 0: /* just "-" */ return 0; case '-': /* long option (include just "--")*/ return 2; default: /* short option */ return 1; } } return 0; } static int insert_argv(char **argv, int src, int dest) { int i; char *tmp = argv[src]; if (src > dest) { for (i = src; i > dest; i--) argv[i] = argv[i-1]; //printf("%s\n", argv[i]); } if (src < dest) { for (i = src; i < dest; i++) argv[i] = argv[i+1]; //printf("%s\n", argv[i]); } argv[dest] = tmp; //printf("%s\n", argv[dest]); return 0; } static int search_longopt(char *arg, struct option *longopts) { int i, found = -1; int len; for (len = 0; arg[len] && arg[len] != '='; len++) ; for (i = 0; longopts[i].name; i++) { //printf("%d\t%s\t", i, longopts[i].name); //printf("%s\n", arg); if (strncmp(arg, longopts[i].name, len) == 0) { if (found != -1) return -1; /* found some candidate */ found = i; //printf("found = %d\n", found); } } return found; } /* * implemented my extention feature. * optional 1 byte argument with [...] * e.g.) 
shortopts = "a[0123]b" * accepts "-a0 -a1b" (same as "-a0 -a1 -b") */ static int has_argument_short(char *arg, const char *shortopts) { int i; int open_bracket = 0; for (i = 0; shortopts[i]; i++) { switch (shortopts[i]) { case '[': open_bracket++; continue; case ']': if (open_bracket <= 0) { fprintf(stderr, "getopt_long() -- unbalanced bracket in short options"); return -1; } open_bracket--; continue; } if (open_bracket) continue; if (*arg != shortopts[i]) continue; switch (shortopts[i+1]) { case ':': if (shortopts[i+2] != ':') { if (arg[1]) return 1; /* following string is argument */ else return 2; /* next argv is argument */ } else { /* '::' means optional argument (GNU extention) */ if (arg[1]) return 1; else return 0; /* no argument */ } case '[': if (arg[1] == '\0') return 0; /* no argument */ /* my extention */ for (i++; shortopts[i] && shortopts[i] != ']'; i++) { if (arg[1] == shortopts[i]) return 3; /* has 1 byte argument */ } if (!shortopts[i]) { fprintf(stderr, "getopt_long() -- unbalanced bracket in short options"); return -1; } break; default: return 0; /* no argument */ } } /* Invalid option */ return -1; } static int has_argument_long(char *arg, struct option *longopts) { int i; i = search_longopt(arg, longopts); if (i == -1) { /* Invalid option */ return -1; } else { int len = strlen(arg); char *p = strchr(arg, '='); if (p) { len = p - arg; } switch (longopts[i].has_arg) { case no_argument: return 0; case required_argument: if (arg[len] == '=') return 1; else return 2; case optional_argument: if (arg[len] == '=') return 1; else return 0; default: assert(0); } } } /* -1: no option 0: no argument 1: has argument in this argv 2: has argument in next argv 3: has 1 byte argument in this argv */ static int has_argument(char *arg, const char *shortopts, struct option *longopts) { int i, n; switch (is_option(arg)) { case 0: /* no option */ return -1; case 1: /* short option */ n = -1; for (i = 1; arg[i]; i++) { n = has_argument_short(arg+i, shortopts); if 
(n == 0 && arg[i+1]) continue; if (n == 3 && arg[i+2]) { i++; continue; } break; } return n; case 2: /* long option */ return has_argument_long(arg+2, longopts); break; default: assert(0); } } int getopt_long(int argc, char **argv, const char *shortopts, struct option *longopts, int *indexptr) { char *opt; int i; static int shortoptind; static int no_optind = 0; if (optind == 0) { /* skip first argument (command name) */ optind++; no_optind = 0; shortoptind = 0; } optarg = 0; if (no_optind && !shortoptind) { while (!is_option(argv[no_optind])) insert_argv(argv, no_optind, optind-1); if (has_argument(argv[no_optind], shortopts, longopts) == 2) no_optind += 2; else no_optind++; if (argv[optind] && strcmp(argv[optind], "--") == 0) { while (!is_option(argv[no_optind])) insert_argv(argv, no_optind, optind); optind = no_optind; no_optind = 0; } } if (optind >= argc) goto end_of_option; retry: /* puts_argv(&argv[optind]); */ opt = argv[optind]; if (shortoptind == 0 && is_option(opt) == 1) { shortoptind++; } if (shortoptind) { /* short option */ char *p = &opt[shortoptind]; if (*p == '\0') assert(0); switch (has_argument_short(p, shortopts)) { case 0: /* no argument */ optarg = 0; shortoptind++; if (opt[shortoptind] == '\0') optind++, shortoptind = 0; return *p; case 1: /* following character is argument */ optind++, shortoptind = 0; optarg = &p[1]; return *p; case 2: /* next argv is argument */ optind++, shortoptind = 0; optarg = argv[optind++]; return *p; case 3: /* has 1 byte argument */ optarg = &p[1]; if (p[2] == 0) optind++, shortoptind = 0; else shortoptind += 2; return *p; default: /* Invalid option */ if (opterr) fprintf(stderr, "%s: invalid option -- %c\n", argv[0], *p); optind++, shortoptind = 0; optopt = *p; return '?'; } } else if (opt[0] == '-' && opt[1] == '-') { /* long option */ if (opt[2] == '\0') { /* end of command line switch */ optind++; return -1; } opt += 2; i = search_longopt(opt, longopts); if (i == -1) { optind++; optopt = 0; return '?'; } else { 
int len = strlen(opt); char *p = strchr(opt, '='); if (p) { len = p - opt; } switch (longopts[i].has_arg) { case no_argument: break; case required_argument: if (opt[len] == '=') optarg = opt + len + 1; else { optind++; optarg = argv[optind]; if (optarg == 0) { if (opterr) fprintf(stderr, "%s: option `--%s' requires an argument\n", argv[0], opt); optopt = 0; return '?'; /* no argument */ } } break; case optional_argument: if (opt[len] == '=') optarg = opt + len + 1; else { optarg = 0; } break; default: break; } *indexptr = i; optind++; if (longopts[i].flag) { *longopts[i].flag = longopts[i].val; return 0; } else { return longopts[i].val; } } optind++; optopt = 0; return '?'; } /* not option */ if (no_optind == 0) no_optind = optind; for (i = optind; argv[i]; i++) { if (is_option(argv[i])) { optind = i; goto retry; } } end_of_option: if (no_optind) { optind = no_optind; no_optind = 0; } return -1; } #endif /* USE_GNU */ #if DEBUG #include #include #include #if USE_GNU #include /* use GNU getopt_long() */ #endif static int verbose_flag; static int option_index; int argc; char *argv[50]; char **p; int c; static struct option long_options[] = { {"verbose", no_argument, &verbose_flag, 1}, {"brief", no_argument, &verbose_flag, 0}, {"add", required_argument, 0, 'a'}, {"append", no_argument, 0, 0}, {"delete", required_argument, 0, 0}, {"create", optional_argument, 0, 0}, {"change", optional_argument, 0, 0}, {0, 0, 0, 0} }; int call_getopt_long(int argc, char **argv, const char *shortopts, struct option *longopts, int *indexptr) { int c; c = getopt_long(argc, argv, shortopts, longopts, indexptr); puts_argv(argv); printf("ret=%d(%c) option_index=%d ", c, c, option_index); printf("optind=%d optarg=[%s] opterr=%d optopt=%d(%c)\n", optind, optarg, opterr, optopt, optopt); if (c == 0) { struct option *opt; opt = &longopts[*indexptr]; printf("long option: --%s has_arg=%d\n", opt->name, opt->has_arg); if (opt->flag) printf(" flag=[%8p] val=%d\n", opt->flag, *opt->flag); } return c; 
} void test1() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-bcd"; argc++; *p++ = "-d"; argc++; *p++ = "-e"; argc++; *p++ = "-f"; argc++; *p++ = "-g"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 'a'); assert(option_index == 0); assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 'b'); assert(option_index == 0); assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 'c'); assert(option_index == 0); assert(optind == 3); assert(optarg == &argv[2][3]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 'd'); assert(option_index == 0); assert(optind == 5); assert(optarg == argv[4]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == '?'); assert(option_index == 0); assert(optind == 6); assert(optarg == 0); assert(optopt == 'f'); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == '?'); assert(option_index == 0); assert(optind == 7); assert(optarg == 0); assert(optopt == 'g'); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == -1); assert(option_index == 0); assert(optind == 7); assert(optarg == 0); assert(optopt == 'g'); /* no changed */ } void test2() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "--verbose"; argc++; *p++ = "--brief"; argc++; *p++ = "--add"; argc++; *p++ = "add_argument"; argc++; *p++ = "--add=add_argument"; argc++; *p++ = "--append"; argc++; *p++ = 
"--delete=del_argument"; argc++; *p++ = "--create=cre_argument"; argc++; *p++ = "--create"; argc++; *p++ = "files..."; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 0); assert(option_index == 0); assert(optind == 2); assert(optarg == 0); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "verbose") == 0); assert(*long_options[option_index].flag == 1); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 0); assert(option_index == 1); assert(optind == 3); assert(optarg == 0); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "brief") == 0); assert(*long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 'a'); assert(option_index == 2); assert(optind == 5); assert(optarg == argv[4]); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "add") == 0); assert(long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 'a'); assert(option_index == 2); assert(optind == 6); assert(optarg == argv[5]+6); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "add") == 0); assert(long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 0); assert(option_index == 3); assert(optind == 7); assert(optarg == 0); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "append") == 0); assert(long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 0); assert(option_index == 4); assert(optind == 8); assert(optarg 
== argv[7]+9); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "delete") == 0); assert(long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 0); assert(option_index == 5); assert(optind == 9); assert(optarg == argv[8]+9); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "create") == 0); assert(long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == 0); assert(option_index == 5); assert(optind == 10); assert(optarg == 0); assert(optopt == 'g'); /* no changed */ assert(strcmp(long_options[option_index].name, "create") == 0); assert(long_options[option_index].flag == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 10); assert(optarg == 0); assert(optopt == 'g'); /* no changed */ assert(strcmp(argv[optind], "files...") == 0); } void test3() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "--delete"; /* required argument has no argument */ *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == '?'); assert(option_index == 5); /* no changed */ assert(optind == 2); /* changed */ assert(optarg == 0); assert(optopt == 0); /* changed */ assert(argv[optind] == 0); /* */ optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "--file"; /* not option */ *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); assert(c == '?'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); assert(argv[optind] == 0); } void test4() { optind = 0; argc = 0; p = argv; 
argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "a1"; argc++; *p++ = "a2"; argc++; *p++ = "-b"; argc++; *p++ = "b"; argc++; *p++ = "-efg"; /* some options in a argument */ argc++; *p++ = "g"; argc++; *p++ = "-c"; argc++; *p++ = "c"; argc++; *p++ = "d"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'b'); assert(option_index == 5); /* no changed */ assert(optind == 5); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); 
assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'e'); assert(option_index == 5); /* no changed */ assert(optind == 6); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'f'); assert(option_index == 5); /* no changed */ assert(optind == 6); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'g'); assert(option_index == 5); /* no changed */ assert(optind == 8); assert(optarg == argv[7]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 5); /* no changed 
*/ assert(optind == 10); assert(optarg == argv[9]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:efg:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "-efg") == 0); assert(strcmp(*p++, "g") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "a1") == 0); assert(strcmp(*p++, "a2") == 0); assert(strcmp(*p++, "b") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 7); assert(optarg == 0); assert(optopt == 0); } void test5() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-"; argc++; *p++ = "-b"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-b") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-b") == 0); assert(*p == 0); assert(c == 'b'); assert(option_index == 5); /* no changed */ assert(optind == 4); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "-") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 3); 
assert(optarg == 0); assert(optopt == 0); } void test6() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-"; argc++; *p++ = "-"; argc++; *p++ = "-b"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-b") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-b") == 0); assert(*p == 0); assert(c == 'b'); assert(option_index == 5); /* no changed */ assert(optind == 5); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-b") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 3); assert(optarg == 0); assert(optopt == 0); } void test7() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-"; argc++; *p++ = "-"; argc++; *p++ = "-c"; argc++; *p++ = "c"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); 
assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 5); /* no changed */ assert(optind == 6); assert(optarg == argv[5]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 4); assert(optarg == 0); assert(optopt == 0); } void test8() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-c"; argc++; *p++ = "c"; argc++; *p++ = "--"; argc++; *p++ = "-d"; argc++; *p++ = "d"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); 
assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 5); /* no changed */ assert(optind == 4); assert(optarg == argv[3]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 5); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 5); /* no changed */ assert(optind == 7); assert(optarg == argv[6]); assert(optopt == 0); } void test9() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-"; argc++; *p++ = "-"; argc++; *p++ = "-c"; argc++; *p++ = "c"; argc++; *p++ = "--"; argc++; *p++ = "-d"; argc++; *p++ = "d"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* 
no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 5); /* no changed */ assert(optind == 6); assert(optarg == argv[5]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 5); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc:d:", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "c") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 5); /* no changed */ assert(optind == 9); assert(optarg == argv[8]); assert(optopt == 0); } void test10() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-cc"; argc++; *p++ = "-d"; argc++; *p++ = "d"; argc++; *p++ = "-c"; /* no argument */ argc++; *p++ = "-d"; /* at last */ *p = 
0; /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-cc") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 5); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-cc") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 5); /* no changed */ assert(optind == 3); assert(optarg == argv[2]+2); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-cc") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 5); /* no changed */ assert(optind == 4); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-cc") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 5); /* no changed */ assert(optind == 6); assert(optarg == 0); assert(optopt == 0); 
/*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-cc") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 5); /* no changed */ assert(optind == 7); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-cc") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "-c") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 5); /* no changed */ assert(optind == 6); assert(optarg == 0); assert(optopt == 0); } void test11() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "--verbose"; argc++; *p++ = "--create=c"; argc++; *p++ = "--change"; argc++; *p++ = "d"; argc++; *p++ = "--create"; /* no argument */ argc++; *p++ = "--change"; /* at last */ *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 0); assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, 
"--create=c") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 5); assert(optind == 3); assert(optarg == argv[2]+9); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 6); assert(optind == 4); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 5); assert(optind == 6); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 6); assert(optind == 7); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); 
assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 6); assert(optind == 6); assert(optarg == 0); assert(optopt == 0); } void test12() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "--verbose"; argc++; *p++ = "--create=c"; argc++; *p++ = "files..."; argc++; *p++ = "--delete"; /* required argument */ argc++; *p++ = "d"; argc++; *p++ = "--create"; /* no argument */ argc++; *p++ = "--change"; /* at last */ *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 0); assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 5); assert(optind == 3); assert(optarg == argv[2]+9); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); 
assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 4); assert(optind == 6); assert(optarg == argv[5]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 5); assert(optind == 7); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 6); assert(optind == 8); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--create") == 0); assert(strcmp(*p++, "--change") == 0); assert(strcmp(*p++, "files...") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 6); assert(optind == 7); assert(optarg == 0); assert(optopt == 
0); } void test13() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "--verbose"; argc++; *p++ = "--create=c"; argc++; *p++ = "files..."; argc++; *p++ = "--delete"; argc++; *p++ = "d"; argc++; *p++ = "--"; /* option terminator */ argc++; *p++ = "--change"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 0); assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 5); assert(optind == 3); assert(optarg == argv[2]+9); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == 0); assert(option_index == 4); assert(optind == 6); assert(optarg == argv[5]); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, 
"abc::d::", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "--verbose") == 0); assert(strcmp(*p++, "--create=c") == 0); assert(strcmp(*p++, "--delete") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "--") == 0); assert(strcmp(*p++, "files...") == 0); assert(strcmp(*p++, "--change") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 4); assert(optind == 6); assert(optarg == 0); assert(optopt == 0); } void test14() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-o5"; argc++; *p++ = "files..."; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "o[567]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-o5") == 0); assert(strcmp(*p++, "files...") == 0); assert(*p == 0); assert(c == 'o'); assert(option_index == 4); /* no changed */ assert(optind == 2); assert(optarg == argv[1]+2); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-o5") == 0); assert(strcmp(*p++, "files...") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 4); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); } void test15() { optind = 0; argc = 0; p = argv; argc++; *p++ = "command_name"; argc++; *p++ = "-a"; argc++; *p++ = "-ccd"; argc++; *p++ = "-ce"; argc++; *p++ = "-d"; argc++; *p++ = "d"; argc++; *p++ = "-cdd"; argc++; *p++ = "-d"; *p = 0; /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, 
"-d") == 0); assert(*p == 0); assert(c == 'a'); assert(option_index == 4); /* no changed */ assert(optind == 2); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 4); /* no changed */ assert(optind == 2); assert(optarg == argv[2]+2); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 4); /* no changed */ assert(optind == 3); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 4); /* no changed */ assert(optind == 4); assert(optarg == argv[3]+2); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); 
assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 4); /* no changed */ assert(optind == 5); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'c'); assert(option_index == 4); /* no changed */ assert(optind == 6); assert(optarg == argv[6]+2); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 4); /* no changed */ assert(optind == 7); assert(optarg == 0); assert(optopt == 0); /*************************/ c = call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "d") == 0); assert(strcmp(*p++, "-d") == 0); assert(*p == 0); assert(c == 'd'); assert(option_index == 4); /* no changed */ assert(optind == 8); assert(optarg == 0); assert(optopt == 0); /*************************/ c 
= call_getopt_long(argc, argv, "abc[cde]d[fgh]", long_options, &option_index); p = argv; assert(strcmp(*p++, "command_name") == 0); assert(strcmp(*p++, "-a") == 0); assert(strcmp(*p++, "-ccd") == 0); assert(strcmp(*p++, "-ce") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "-cdd") == 0); assert(strcmp(*p++, "-d") == 0); assert(strcmp(*p++, "d") == 0); assert(*p == 0); assert(c == -1); assert(option_index == 4); /* no changed */ assert(optind == 7); assert(optarg == 0); assert(optopt == 0); } /* int main() { opterr = 0; optopt = 0; test1(); test2(); test3(); test4(); test5(); test6(); test7(); test8(); test9(); test10(); test11(); test12(); test13(); #ifndef USE_GNU test14(); test15(); #endif return 0; } */ #endif mothur-1.36.1/source/clearcut/getopt_long.h000066400000000000000000000032771255543666200207430ustar00rootroot00000000000000/* This getopt_long() is compatible with GNU's, however, added original extention (short 1 byte option). Copyright (c) 2004 Koji Arai Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. */ #ifndef _GETOPT_H #ifdef __cplusplus extern "C" { #endif struct option { const char *name; int has_arg; /* values of has_arg */ #define no_argument 0 #define required_argument 1 #define optional_argument 2 int *flag; int val; }; extern char *optarg; extern int optind; int getopt_long(int argc, char **argv, const char *shortopts, struct option *longopts, int *indexptr); #ifdef __cplusplus } #endif #endif /* _GETOPT_H */ mothur-1.36.1/source/cluster.cpp000066400000000000000000000152101255543666200166220ustar00rootroot00000000000000/* * cluster.cpp * * * Created by Pat Schloss on 8/14/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "cluster.hpp" #include "rabundvector.hpp" #include "listvector.hpp" /***********************************************************************/ Cluster::Cluster(RAbundVector* rav, ListVector* lv, SparseDistanceMatrix* dm, float c, string f, float cs) : rabund(rav), list(lv), dMatrix(dm), method(f), adjust(cs) { try { mapWanted = false; //set to true by mgcluster to speed up overlap merge //save so you can modify as it changes in average neighbor cutoff = c; m = MothurOut::getInstance(); } catch(exception& e) { m->errorOut(e, "Cluster", "Cluster"); exit(1); } } /***********************************************************************/ void Cluster::clusterBins(){ try { rabund->set(smallCol, rabund->get(smallRow)+rabund->get(smallCol)); rabund->set(smallRow, 0); rabund->setLabel(toString(smallDist)); } catch(exception& e) { m->errorOut(e, "Cluster", "clusterBins"); exit(1); } } /***********************************************************************/ void Cluster::clusterNames(){ try { if (mapWanted) { updateMap(); } list->set(smallCol, 
list->get(smallRow)+','+list->get(smallCol));
        list->set(smallRow, "");
        list->setLabel(toString(smallDist));
    }
    catch(exception& e) {
        m->errorOut(e, "Cluster", "clusterNames");
        exit(1);
    }
}
/***********************************************************************/
void Cluster::update(double& cutOFF){
    try {
        smallCol = dMatrix->getSmallestCell(smallRow);
        nColCells = dMatrix->seqVec[smallCol].size();
        nRowCells = dMatrix->seqVec[smallRow].size();

        vector<int> foundCol(nColCells, 0);
        //cout << dMatrix->getNNodes() << " small cell: " << smallRow << '\t' << smallCol << endl;
        int search;
        bool changed;

        for (int i=nRowCells-1;i>=0;i--) {
            if (m->control_pressed) { break; }

            //if you are not the smallCell
            if (dMatrix->seqVec[smallRow][i].index != smallCol) {
                search = dMatrix->seqVec[smallRow][i].index;

                bool merged = false;
                for (int j=0;j<nColCells;j++) {
                    if (dMatrix->seqVec[smallCol][j].index != smallRow) { //if you are not the smallest distance
                        if (dMatrix->seqVec[smallCol][j].index == search) {
                            foundCol[j] = 1;
                            merged = true;
                            changed = updateDistance(dMatrix->seqVec[smallCol][j], dMatrix->seqVec[smallRow][i]);
                            dMatrix->updateCellCompliment(smallCol, j);
                            break;
                        }else if (dMatrix->seqVec[smallCol][j].index < search) { //we don't have a distance for this cell
                            if (adjust != -1.0) { //adjust
                                merged = true;
                                PDistCell value(search, adjust); //create a distance for the missing value
                                int location = dMatrix->addCellSorted(smallCol, value);
                                changed = updateDistance(dMatrix->seqVec[smallCol][location], dMatrix->seqVec[smallRow][i]);
                                dMatrix->updateCellCompliment(smallCol, location);
                                nColCells++;
                                foundCol.push_back(0); //add a new found column
                                //adjust value
                                for (int k = foundCol.size()-1; k > location; k--) { foundCol[k] = foundCol[k-1]; }
                                foundCol[location] = 1;
                            }
                            j+=nColCells;
                        }
                    }
                }
                //if the cell was not merged, its distance is needed for the cutoff warning
                if ((!merged) && (method == "average" || method == "weighted")) {
                    if (cutOFF > dMatrix->seqVec[smallRow][i].dist) {
                        cutOFF = dMatrix->seqVec[smallRow][i].dist;
                        //cout << "changing cutoff to " << cutOFF
                        //<< endl;
                    }
                }
                dMatrix->rmCell(smallRow, i);
            }
        }
        clusterBins();
        clusterNames();

        // Special handling for singlelinkage case, not sure whether this
        // could be avoided
        for (int i=nColCells-1;i>=0;i--) {
            if (foundCol[i] == 0) {
                if (adjust != -1.0) { //adjust
                    PDistCell value(smallCol, adjust); //create a distance for the missing value
                    changed = updateDistance(dMatrix->seqVec[smallCol][i], value);
                    dMatrix->updateCellCompliment(smallCol, i);
                }else {
                    if (method == "average" || method == "weighted") {
                        if (dMatrix->seqVec[smallCol][i].index != smallRow) { //if you are not the smallest distance
                            if (cutOFF > dMatrix->seqVec[smallCol][i].dist) {
                                cutOFF = dMatrix->seqVec[smallCol][i].dist;
                            }
                        }
                    }
                }
                dMatrix->rmCell(smallCol, i);
            }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "Cluster", "update");
        exit(1);
    }
}
/***********************************************************************/
void Cluster::setMapWanted(bool f) {
    try {
        mapWanted = f;

        //initialize map
        for (int k = 0; k < list->getNumBins(); k++) {
            string names = list->get(k);

            //parse bin
            string individual = "";
            int binNameslength = names.size();
            for(int j=0;j<binNameslength;j++){
                if(names[j] == ','){
                    seq2Bin[individual] = k;
                    individual = "";
                }
                else{ individual += names[j]; }
            }
            //get last name
            seq2Bin[individual] = k;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "Cluster", "setMapWanted");
        exit(1);
    }
}
/***********************************************************************/
void Cluster::updateMap() {
    try {
        //update location of seqs in smallRow since they move to smallCol now
        string names = list->get(smallRow);

        string individual = "";
        int binNameslength = names.size();
        for(int j=0;j<binNameslength;j++){
            if(names[j] == ','){
                seq2Bin[individual] = smallCol;
                individual = "";
            }
            else{ individual += names[j]; }
        }
        //get last name
        seq2Bin[individual] = smallCol;
    }
    catch(exception& e) {
        m->errorOut(e, "Cluster", "updateMap");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/cluster.hpp000066400000000000000000000045141255543666200166340ustar00rootroot00000000000000
#ifndef CLUSTER_H
#define CLUSTER_H

//test change

#include "mothur.h"
#include "sparsedistancematrix.h"
#include "mothurout.h"

class RAbundVector;
class ListVector;

class Cluster {

public:
    Cluster(RAbundVector*, ListVector*, SparseDistanceMatrix*, float, string, float);
    virtual ~Cluster() {}
    virtual void
update(double&);
    virtual string getTag() = 0;
    virtual void setMapWanted(bool m);
    virtual map<string, int> getSeqtoBin() { return seq2Bin; }

protected:
    virtual bool updateDistance(PDistCell& colCell, PDistCell& rowCell) = 0;

    virtual void clusterBins();
    virtual void clusterNames();
    virtual void updateMap();

    RAbundVector* rabund;
    ListVector* list;
    SparseDistanceMatrix* dMatrix;

    ull smallRow;
    ull smallCol;
    float smallDist, adjust;
    bool mapWanted;
    float cutoff;
    map<string, int> seq2Bin;
    string method;

    ull nRowCells;
    ull nColCells;
    MothurOut* m;
};

/***********************************************************************/

class CompleteLinkage : public Cluster {
public:
    CompleteLinkage(RAbundVector*, ListVector*, SparseDistanceMatrix*, float, string, float);
    bool updateDistance(PDistCell& colCell, PDistCell& rowCell);
    string getTag();

private:

};

/***********************************************************************/

class SingleLinkage : public Cluster {
public:
    SingleLinkage(RAbundVector*, ListVector*, SparseDistanceMatrix*, float, string, float);
    //void update(double&);
    bool updateDistance(PDistCell& colCell, PDistCell& rowCell);
    string getTag();

private:

};

/***********************************************************************/

class AverageLinkage : public Cluster {
public:
    AverageLinkage(RAbundVector*, ListVector*, SparseDistanceMatrix*, float, string, float);
    bool updateDistance(PDistCell& colCell, PDistCell& rowCell);
    string getTag();

private:
    int saveRow;
    int saveCol;
    int rowBin;
    int colBin;
    int totalBin;

};

/***********************************************************************/

class WeightedLinkage : public Cluster {
public:
    WeightedLinkage(RAbundVector*, ListVector*, SparseDistanceMatrix*, float, string, float);
    bool updateDistance(PDistCell& colCell, PDistCell& rowCell);
    string getTag();

private:
    int saveRow;
    int saveCol;
};

/***********************************************************************/

#endif
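/*
 * Illustrative sketch, not part of the mothur sources: the seq2Bin map kept by
 * Cluster records which bin each sequence name currently lives in. Assuming,
 * as the ListVector usage above suggests, that a bin is a comma-separated
 * string of names, the bookkeeping could look roughly like the code below.
 * buildSeqToBin and moveBin are hypothetical helper names chosen for this
 * example, and plain std::vector<std::string> stands in for ListVector.
 */

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper, not part of mothur: map each sequence name found in
// bins[k] (a comma-separated list of names) to the bin index k.
std::map<std::string, int> buildSeqToBin(const std::vector<std::string>& bins) {
    std::map<std::string, int> seq2Bin;
    for (std::size_t k = 0; k < bins.size(); k++) {
        std::string individual;
        for (std::size_t j = 0; j < bins[k].size(); j++) {
            if (bins[k][j] == ',') {
                if (!individual.empty()) { seq2Bin[individual] = static_cast<int>(k); }
                individual.clear();
            } else {
                individual += bins[k][j];
            }
        }
        // the last name in a bin has no trailing comma
        if (!individual.empty()) { seq2Bin[individual] = static_cast<int>(k); }
    }
    return seq2Bin;
}

// Hypothetical helper: after bin `from` has been merged into bin `to`,
// repoint every name that used to live in `from`.
void moveBin(std::map<std::string, int>& seq2Bin, int from, int to) {
    for (std::map<std::string, int>::iterator it = seq2Bin.begin(); it != seq2Bin.end(); ++it) {
        if (it->second == from) { it->second = to; }
    }
}
```

/*
 * Unlike this sketch, which scans the whole map, the class only re-parses the
 * names of the bin being merged, which is cheaper when bins are small
 * relative to the data set.
 */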
mothur-1.36.1/source/clusterclassic.cpp000066400000000000000000000512211255543666200201660ustar00rootroot00000000000000
/*
 *  clusterclassic.cpp
 *  Mothur
 *
 *  Created by westcott on 10/29/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "clusterclassic.h"
#include "progress.hpp"

/***********************************************************************/
ClusterClassic::ClusterClassic(float c, string f, bool s) : method(f), smallDist(1e6), nseqs(0), sim(s) {
    try {
        mapWanted = false; //set to true by mgcluster to speed up overlap merge

        //save so you can modify as it changes in average neighbor
        cutoff = c;
        aboveCutoff = cutoff + 10000.0;
        m = MothurOut::getInstance();

        if(method == "furthest") { tag = "fn"; }
        else if (method == "average") { tag = "an"; }
        else if (method == "weighted") { tag = "wn"; }
        else if (method == "nearest") { tag = "nn"; }
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "ClusterClassic");
        exit(1);
    }
}
/***********************************************************************/
int ClusterClassic::readPhylipFile(string filename, NameAssignment* nameMap) {
    try {
        double distance;
        int square;
        string name;
        vector<string> matrixNames;

        ifstream fileHandle;
        m->openInputFile(filename, fileHandle);

        string numTest;
        fileHandle >> numTest >> name;

        if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); }
        else { convert(numTest, nseqs); }

        matrixNames.push_back(name);

        if(nameMap == NULL){
            list = new ListVector(nseqs);
            list->set(0, name);
        }
        else{
            list = new ListVector(nameMap->getListVector());
            if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); }
        }

        //initialize distance matrix to cutoff
        dMatrix.resize(nseqs);
        //colDist temp(0, 0, aboveCutoff);
        //rowSmallDists.resize(nseqs, temp);
        for (int i = 1; i < nseqs; i++) {
            dMatrix[i].resize(i, aboveCutoff);
        }

        char d;
        while((d=fileHandle.get()) != EOF){
            if(isalnum(d)){
                square = 1;
                fileHandle.putback(d);
                for(int i=0;i<nseqs;i++){
                    fileHandle >> distance;
                }
                break;
            }
            if(d == '\n'){
                square = 0;
                break;
            }
        }

        Progress* reading;

        if(square == 0){
            reading = new Progress("Reading matrix: ", nseqs * (nseqs - 1) / 2);

            int index = 0;

            for(int i=1;i<nseqs;i++){
                if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

                fileHandle >> name;
                matrixNames.push_back(name);

                //there's A LOT of repeated code throughout this method...
                if(nameMap == NULL){
                    list->set(i, name);

                    for(int j=0;j<i;j++){
                        if (m->control_pressed) { delete reading; fileHandle.close(); return 0; }

                        fileHandle >> distance;

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.

                        //if(distance < cutoff){
                            dMatrix[i][j] = distance;
                            if (distance < smallDist) { smallDist = distance; }
                            //if (rowSmallDists[i].dist > distance) { rowSmallDists[i].dist = distance; rowSmallDists[i].col = j; rowSmallDists[i].row = i; }
                            //if (rowSmallDists[j].dist > distance) { rowSmallDists[j].dist = distance; rowSmallDists[j].col = i; rowSmallDists[j].row = j; }
                        //}
                        index++;
                        reading->update(index);
                    }
                }
                else{
                    if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); }

                    for(int j=0;j<i;j++){
                        fileHandle >> distance;

                        if (m->control_pressed) { delete reading; fileHandle.close(); return 0; }

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.
                        //if(distance < cutoff){
                            if (distance < smallDist) { smallDist = distance; }

                            int row = nameMap->get(matrixNames[i]);
                            int col = nameMap->get(matrixNames[j]);

                            if (row < col) { dMatrix[col][row] = distance; }
                            else { dMatrix[row][col] = distance; }
                            //if (rowSmallDists[row].dist > distance) { rowSmallDists[row].dist = distance; rowSmallDists[row].col = col; rowSmallDists[row].row = row; }
                            //if (rowSmallDists[col].dist > distance) { rowSmallDists[col].dist = distance; rowSmallDists[col].col = row; rowSmallDists[col].row = col; }
                        //}
                        index++;
                        reading->update(index);
                    }
                }
            }
        }
        else{
            reading = new Progress("Reading matrix: ", nseqs * nseqs);

            int index = nseqs;

            for(int i=1;i<nseqs;i++){
                fileHandle >> name;
                matrixNames.push_back(name);

                if(nameMap == NULL){
                    list->set(i, name);
                    for(int j=0;j<nseqs;j++){
                        fileHandle >> distance;

                        if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.

                        if(j < i){
                            if (distance < smallDist) { smallDist = distance; }

                            dMatrix[i][j] = distance;
                            //if (rowSmallDists[i].dist > distance) { rowSmallDists[i].dist = distance; rowSmallDists[i].col = j; rowSmallDists[i].row = i; }
                            //if (rowSmallDists[j].dist > distance) { rowSmallDists[j].dist = distance; rowSmallDists[j].col = i; rowSmallDists[j].row = j; }
                        }
                        index++;
                        reading->update(index);
                    }
                }
                else{
                    if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); }

                    for(int j=0;j<nseqs;j++){
                        fileHandle >> distance;

                        if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.
                        if(j < i){
                            if (distance < smallDist) { smallDist = distance; }

                            int row = nameMap->get(matrixNames[i]);
                            int col = nameMap->get(matrixNames[j]);

                            if (row < col) { dMatrix[col][row] = distance; }
                            else { dMatrix[row][col] = distance; }
                            //if (rowSmallDists[row].dist > distance) { rowSmallDists[row].dist = distance; rowSmallDists[row].col = col; rowSmallDists[row].row = row; }
                            //if (rowSmallDists[col].dist > distance) { rowSmallDists[col].dist = distance; rowSmallDists[col].col = row; rowSmallDists[col].row = col; }
                        }
                        index++;
                        reading->update(index);
                    }
                }
            }
        }

        if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

        reading->finish();
        delete reading;

        list->setLabel("0");
        rabund = new RAbundVector(list->getRAbundVector());

        fileHandle.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "readPhylipFile");
        exit(1);
    }
}
/***********************************************************************/
int ClusterClassic::readPhylipFile(string filename, CountTable* countTable) {
    try {
        double distance;
        int square;
        string name;
        vector<string> matrixNames;

        ifstream fileHandle;
        m->openInputFile(filename, fileHandle);

        string numTest;
        fileHandle >> numTest >> name;

        if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); }
        else { convert(numTest, nseqs); }

        matrixNames.push_back(name);

        if(countTable == NULL){
            list = new ListVector(nseqs);
            list->set(0, name);
        }
        else{ list = new ListVector(countTable->getListVector()); }

        //initialize distance matrix to cutoff
        dMatrix.resize(nseqs);
        //rowSmallDists.resize(nseqs, temp);
        for (int i = 1; i < nseqs; i++) {
            dMatrix[i].resize(i, aboveCutoff);
        }

        char d;
        while((d=fileHandle.get()) != EOF){
            if(isalnum(d)){
                square = 1;
                fileHandle.putback(d);
                for(int i=0;i<nseqs;i++){
                    fileHandle >> distance;
                }
                break;
            }
            if(d == '\n'){
                square = 0;
                break;
            }
        }

        Progress* reading;

        if(square == 0){
            reading = new Progress("Reading matrix: ", nseqs * (nseqs - 1) / 2);

            int index =
            0;

            for(int i=1;i<nseqs;i++){
                if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

                fileHandle >> name;
                matrixNames.push_back(name);

                //there's A LOT of repeated code throughout this method...
                if(countTable == NULL){
                    list->set(i, name);

                    for(int j=0;j<i;j++){
                        if (m->control_pressed) { delete reading; fileHandle.close(); return 0; }

                        fileHandle >> distance;

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.

                        //if(distance < cutoff){
                            dMatrix[i][j] = distance;
                            if (distance < smallDist) { smallDist = distance; }
                            //if (rowSmallDists[i].dist > distance) { rowSmallDists[i].dist = distance; rowSmallDists[i].col = j; rowSmallDists[i].row = i; }
                            //if (rowSmallDists[j].dist > distance) { rowSmallDists[j].dist = distance; rowSmallDists[j].col = i; rowSmallDists[j].row = j; }
                        //}
                        index++;
                        reading->update(index);
                    }
                }
                else{
                    for(int j=0;j<i;j++){
                        fileHandle >> distance;

                        if (m->control_pressed) { delete reading; fileHandle.close(); return 0; }

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.

                        if (distance < smallDist) { smallDist = distance; }

                        int row = countTable->get(matrixNames[i]);
                        int col = countTable->get(matrixNames[j]);

                        if (row < col) { dMatrix[col][row] = distance; }
                        else { dMatrix[row][col] = distance; }

                        index++;
                        reading->update(index);
                    }
                }
            }
        }
        else{
            reading = new Progress("Reading matrix: ", nseqs * nseqs);

            int index = nseqs;

            for(int i=1;i<nseqs;i++){
                fileHandle >> name;
                matrixNames.push_back(name);

                if(countTable == NULL){
                    list->set(i, name);
                    for(int j=0;j<nseqs;j++){
                        fileHandle >> distance;

                        if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.
                        if(j < i){
                            if (distance < smallDist) { smallDist = distance; }

                            dMatrix[i][j] = distance;
                        }
                        index++;
                        reading->update(index);
                    }
                }
                else{
                    for(int j=0;j<nseqs;j++){
                        fileHandle >> distance;

                        if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

                        if (distance == -1) { distance = 1000000; }
                        else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert.

                        if(j < i){
                            if (distance < smallDist) { smallDist = distance; }

                            int row = countTable->get(matrixNames[i]);
                            int col = countTable->get(matrixNames[j]);

                            if (row < col) { dMatrix[col][row] = distance; }
                            else { dMatrix[row][col] = distance; }
                        }
                        index++;
                        reading->update(index);
                    }
                }
            }
        }

        if (m->control_pressed) { fileHandle.close(); delete reading; return 0; }

        reading->finish();
        delete reading;

        list->setLabel("0");
        rabund = new RAbundVector();
        rabund->setLabel(list->getLabel());

        for(int i = 0; i < list->getNumBins(); i++) {
            if (m->control_pressed) { break; }
            vector<string> binNames;
            string bin = list->get(i);
            m->splitAtComma(bin, binNames);
            int total = 0;
            for (int j = 0; j < binNames.size(); j++) { total += countTable->getNumSeqs(binNames[j]); }
            rabund->push_back(total);
        }

        fileHandle.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "readPhylipFile");
        exit(1);
    }
}
/***********************************************************************/
//sets smallCol and smallRow, returns distance
double ClusterClassic::getSmallCell() {
    try {
        smallDist = aboveCutoff;
        smallRow = 1;
        smallCol = 0;

        vector<colDist> mins;

        for(int i=1;i<nseqs;i++){
            for(int j=0;j<i;j++){
                if (dMatrix[i][j] < smallDist) {
                    mins.clear();
                    colDist temp(i, j, dMatrix[i][j]);
                    mins.push_back(temp);
                    smallDist = dMatrix[i][j];
                }else if (dMatrix[i][j] == smallDist) {
                    colDist temp(i, j, dMatrix[i][j]);
                    mins.push_back(temp);
                }
            }
        }

        if (mins.size() > 0) {
            int zrand = 0;
            if (mins.size() > 1) {
                //pick random number between 0 and mins.size()
                zrand = (int)((float)(rand()) / (RAND_MAX / (mins.size()-1) + 1));
            }

            smallRow = mins[zrand].row;
            smallCol = mins[zrand].col;
        }
        //cout << smallRow << '\t' << smallCol << '\t' << smallDist << endl;

        //eliminate smallCell
        if (smallRow < smallCol) { dMatrix[smallCol][smallRow] = aboveCutoff; }
        else { dMatrix[smallRow][smallCol] = aboveCutoff; }

        return smallDist;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "getSmallCell");
        exit(1);
    }
}
/***********************************************************************/
void ClusterClassic::clusterBins(){
    try {
        // cout << smallCol << '\t' << smallRow << '\t' << smallDist << '\t' << rabund->get(smallRow) << '\t' << rabund->get(smallCol);

        rabund->set(smallRow, rabund->get(smallRow)+rabund->get(smallCol));
        rabund->set(smallCol, 0);
        /*for (int i = smallCol+1; i < rabund->size(); i++) {
            rabund->set((i-1), rabund->get(i));
        }
        rabund->resize((rabund->size()-1));*/
        rabund->setLabel(toString(smallDist));

        // cout << '\t' << rabund->get(smallRow) << '\t' << rabund->get(smallCol) << endl;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "clusterBins");
        exit(1);
    }
}
/***********************************************************************/
void ClusterClassic::clusterNames(){
    try {
        // cout << smallCol << '\t' << smallRow << '\t' << smallDist << '\t' << list->get(smallRow) << '\t' << list->get(smallCol);
        if (mapWanted) { updateMap(); }

        list->set(smallRow, list->get(smallRow)+','+list->get(smallCol));
        list->set(smallCol, "");
        /*for (int i = smallCol+1; i < list->size(); i++) {
            list->set((i-1), list->get(i));
        }
        list->resize((list->size()-1));*/
        list->setLabel(toString(smallDist));

        // cout << '\t' << list->get(smallRow) << '\t' << list->get(smallCol) << endl;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "clusterNames");
        exit(1);
    }
}
/***********************************************************************/
void ClusterClassic::update(double& cutOFF){
    try {
        //print();
        getSmallCell();

        int r, c;
        r = smallRow;
        c = smallCol;

        for(int i=0;i<nseqs;i++){
            if(i != r && i != c){
                double distRow, distCol, newDist;
                if (i > r) { distRow = dMatrix[i][r]; }
                else { distRow = dMatrix[r][i]; }

                if (i > c) { distCol = dMatrix[i][c]; dMatrix[i][c] = aboveCutoff; } //like removeCell
                else { distCol = dMatrix[c][i]; dMatrix[c][i] = aboveCutoff; }

                if(method == "furthest"){
                    newDist = max(distRow, distCol);
                }
                else if (method == "average"){
                    int rowBin = rabund->get(r);
                    int colBin = rabund->get(c);
                    newDist = (colBin * distCol + rowBin * distRow) / (rowBin + colBin);
                }
                else if (method == "weighted"){
                    newDist = (distCol + distRow) / 2.0;
                }
                else if (method == "nearest"){
                    newDist = min(distRow, distCol);
                }
                //cout << "newDist = " << newDist << endl;
                if (i > r) { dMatrix[i][r] = newDist; }
                else { dMatrix[r][i] = newDist; }
            }
        }

        clusterBins();
        clusterNames();
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "update");
        exit(1);
    }
}
/***********************************************************************/
void ClusterClassic::setMapWanted(bool f) {
    try {
        mapWanted = f;

        //initialize map
        for (int i = 0; i < list->getNumBins(); i++) {
            //parse bin
            string names = list->get(i);
            vector<string> binnames;
            m->splitAtComma(names, binnames);
            for (int j = 0; j < binnames.size(); j++) {
                //save name and bin number
                seq2Bin[binnames[j]] = i;
            }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "setMapWanted");
        exit(1);
    }
}
/***********************************************************************/
void ClusterClassic::updateMap() {
    try {
        //update location of seqs in smallRow since they move to smallCol now
        string names = list->get(smallRow);
        vector<string> binnames;
        m->splitAtComma(names, binnames);
        for (int j = 0; j < binnames.size(); j++) {
            //save name and bin number
            seq2Bin[binnames[j]] = smallCol;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "updateMap");
        exit(1);
    }
}
/***********************************************************************/
void ClusterClassic::print() {
    try {
        //print the lower-triangle distance matrix, one row per line
        for (int i = 0; i < dMatrix.size(); i++) {
            m->mothurOut("row = " + toString(i) + "\t");
            for (int j = 0; j < dMatrix[i].size(); j++) {
                m->mothurOut(toString(dMatrix[i][j]) + "\t");
            }
            m->mothurOutEndLine();
        }
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterClassic", "print");
        exit(1);
    }
}
/***********************************************************************/
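/*
 * Illustrative sketch, not part of the mothur sources: ClusterClassic::update
 * recomputes the distance from every remaining bin to the bin formed by
 * merging smallRow and smallCol. The four linkage rules it applies can be
 * isolated in one small function. mergedDistance is a hypothetical name used
 * only in this example; rowBin and colBin are assumed to be the abundances of
 * the two bins before they are merged, as in the code above.
 */

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>

// Distance from an outside bin to the union of the "row" and "col" bins,
// given its distances to each of them before the merge.
double mergedDistance(const std::string& method,
                      double distRow, double distCol,
                      int rowBin, int colBin) {
    if (method == "furthest") {            // complete linkage
        return std::max(distRow, distCol);
    }
    if (method == "nearest") {             // single linkage
        return std::min(distRow, distCol);
    }
    if (method == "average") {             // size-weighted mean of the two distances
        return (colBin * distCol + rowBin * distRow) / (rowBin + colBin);
    }
    if (method == "weighted") {            // plain mean, ignoring bin sizes
        return (distCol + distRow) / 2.0;
    }
    // Assumption for this sketch only: unknown methods leave the row distance unchanged.
    return distRow;
}
```

/*
 * Keeping the rule in a pure function like this makes each linkage easy to
 * check in isolation, independent of the lower-triangle matrix bookkeeping.
 */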
mothur-1.36.1/source/clusterclassic.h000066400000000000000000000024731255543666200176400ustar00rootroot00000000000000
#ifndef CLUSTERCLASSIC_H
#define CLUSTERCLASSIC_H

#include "mothurout.h"
#include "listvector.hpp"
#include "rabundvector.hpp"
#include "nameassignment.hpp"
#include "counttable.h"

/*
 *  clusterclassic.h
 *  Mothur
 *
 *  Created by westcott on 10/29/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

class ClusterClassic {

public:
    ClusterClassic(float, string, bool);
    int readPhylipFile(string, NameAssignment*);
    int readPhylipFile(string, CountTable*);
    void update(double&);
    double getSmallDist() { return smallDist; }
    int getNSeqs() { return nseqs; }
    ListVector* getListVector() { return list; }
    RAbundVector* getRAbundVector() { return rabund; }
    string getTag() { return tag; }
    void setMapWanted(bool m);
    map<string, int> getSeqtoBin() { return seq2Bin; }

private:
    double getSmallCell();
    void clusterBins();
    void clusterNames();
    void updateMap();
    void print();

    struct colDist {
        int col;
        int row;
        float dist;
        colDist(int r, int c, double d) : row(r), col(c), dist(d) {}
    };

    RAbundVector* rabund;
    ListVector* list;
    vector< vector<double> > dMatrix;
    //vector<colDist> rowSmallDists;

    int smallRow;
    int smallCol, nseqs;
    double smallDist;
    bool mapWanted, sim;
    double cutoff, aboveCutoff;
    map<string, int> seq2Bin;
    string method, tag;

    MothurOut* m;
};

#endif
mothur-1.36.1/source/collect.cpp000066400000000000000000000205331255543666200165720ustar00rootroot00000000000000
/*
 *  collect.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 11/18/08.
 *  Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "collect.h"

/***********************************************************************/
int Collect::getCurve(float percentFreq = 0.01){
    try {
        RAbundVector* lookup = new RAbundVector(order->getNumBins());
        SAbundVector* rank = new SAbundVector(order->getMaxRank()+1);

        CollectorsCurveData* ccd = new CollectorsCurveData();

        for(int i=0;i<displays.size();i++){
            ccd->registerDisplay(displays[i]); //adds displays[i] to ccd
            displays[i]->init(label);          //sets displays label
        }

        //convert freq percentage to number
        int increment = 1;
        if (percentFreq < 1.0) { increment = numSeqs * percentFreq; }
        else { increment = percentFreq; }

        for(int i=0;i<numSeqs;i++){
            if (m->control_pressed) { delete lookup; delete rank; delete ccd; return 1; }

            int binNumber = order->get(i);
            int abundance = lookup->get(binNumber);

            rank->set(abundance, rank->get(abundance)-1);

            abundance++;

            lookup->set(binNumber, abundance);
            rank->set(abundance, rank->get(abundance)+1); //increment rank(abundance)

            if((i == 0) || (i+1) % increment == 0){
                ccd->updateRankData(rank);
            }
        }

        if(numSeqs % increment != 0){
            ccd->updateRankData(rank);
        }

        for(int i=0;i<displays.size();i++){
            displays[i]->reset();
        }

        delete lookup;
        delete rank;
        delete ccd;

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Collect", "getCurve");
        exit(1);
    }
}
/***********************************************************************/
int Collect::getSharedCurve(float percentFreq = 0.01){
    try {
        vector<SharedRAbundVector*> lookup;
        vector<SharedRAbundVector*> subset;

        //create and initialize vector of sharedvectors, one for each group
        vector<string> mGroups = m->getGroups();
        for (int i = 0; i < mGroups.size(); i++) {
            SharedRAbundVector* temp = new SharedRAbundVector(sharedorder->getNumBins());
            temp->setLabel(sharedorder->getLabel());
            temp->setGroup(mGroups[i]);
            lookup.push_back(temp);
        }

        SharedCollectorsCurveData* ccd = new SharedCollectorsCurveData();

        //initialize labels for output
        //makes 'uniqueAB uniqueAC uniqueBC' if your groups are A, B, C
        getGroupComb();

        for(int i=0;i<displays.size();i++){
            ccd->registerDisplay(displays[i]); //adds displays[i] to ccd
            bool hasLciHci = displays[i]->hasLciHci();
            groupLabel = "";
            for
            (int s = 0; s < groupComb.size(); s++) {
                if (hasLciHci) { groupLabel = groupLabel + label + groupComb[s] + "\t" + label + groupComb[s] + "lci\t" + label + groupComb[s] + "hci\t"; }
                else{ groupLabel = groupLabel + label + groupComb[s] + "\t"; }
            }

            string groupLabelAll = groupLabel + label + "all\t";
            if ((displays[i]->isCalcMultiple() == true) && (displays[i]->getAll() == true)) { displays[i]->init(groupLabelAll); }
            else { displays[i]->init(groupLabel); }
        }

        //convert freq percentage to number
        int increment = 1;
        if (percentFreq < 1.0) { increment = numSeqs * percentFreq; }
        else { increment = percentFreq; }

        //sample all the members
        for(int i=0;i<numSeqs;i++){
            if (m->control_pressed) { for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } delete ccd; return 1; }

            //get first sample
            individual chosen = sharedorder->get(i);
            int abundance;

            //set info for sharedvector in chosens group
            for (int j = 0; j < lookup.size(); j++) {
                if (chosen.group == lookup[j]->getGroup()) {
                    abundance = lookup[j]->getAbundance(chosen.bin);
                    lookup[j]->set(chosen.bin, (abundance + 1), chosen.group);
                    break;
                }
            }

            //calculate at 0 and the given increment
            if((i == 0) || (i+1) % increment == 0){
                //how many comparisons to make i.e. for group a, b, c = ab, ac, bc.
                int n = 1;
                bool pair = true;
                for (int k = 0; k < (lookup.size() - 1); k++) { // pass ccd each set of groups to compare
                    for (int l = n; l < lookup.size(); l++) {
                        subset.clear(); //clear out old pair of sharedrabunds
                        //add new pair of sharedrabund vectors
                        subset.push_back(lookup[k]);
                        subset.push_back(lookup[l]);

                        //load subset with rest of lookup for those calcs that need everyone to calc for a pair
                        for (int w = 0; w < lookup.size(); w++) {
                            if ((w != k) && (w != l)) { subset.push_back(lookup[w]); }
                        }

                        ccd->updateSharedData(subset, i+1, m->getNumGroups(), pair);
                    }
                    n++;
                }

                //if this is a calculator that can do multiples then do them
                pair = false;
                ccd->updateSharedData(lookup, i+1, m->getNumGroups(), pair);
            }
            totalNumSeq = i+1;
        }

        //calculate last label if you haven't already
        if(numSeqs % increment != 0){
            //how many comparisons to make i.e. for group a, b, c = ab, ac, bc.
            int n = 1;
            bool pair = true;
            for (int k = 0; k < (lookup.size() - 1); k++) { // pass ccd each set of groups to compare
                for (int l = n; l < lookup.size(); l++) {
                    subset.clear(); //clear out old pair of sharedrabunds
                    //add new pair of sharedrabund vectors
                    subset.push_back(lookup[k]);
                    subset.push_back(lookup[l]);

                    //load subset with rest of lookup for those calcs that need everyone to calc for a pair
                    for (int w = 0; w < lookup.size(); w++) {
                        if ((w != k) && (w != l)) { subset.push_back(lookup[w]); }
                    }

                    ccd->updateSharedData(subset, totalNumSeq, m->getNumGroups(), pair);
                }
                n++;
            }

            //if this is a calculator that can do multiples then do them
            pair = false;
            ccd->updateSharedData(lookup, totalNumSeq, m->getNumGroups(), pair);
        }

        //resets output files
        for(int i=0;i<displays.size();i++){
            displays[i]->reset();
        }

        //memory cleanup
        delete ccd;
        for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Collect", "getSharedCurve");
        exit(1);
    }
}
/**************************************************************************************/
void Collect::getGroupComb() {
    string group;

    numGroupComb = 0;

    int n = 1;
vector<string> mGroups = m->getGroups(); for (int i = 0; i < (m->getNumGroups() - 1); i++) { for (int l = n; l < m->getNumGroups(); l++) { group = mGroups[i] + mGroups[l]; groupComb.push_back(group); numGroupComb++; } n++; } } /**************************************************************************************/ mothur-1.36.1/source/collect.h000066400000000000000000000021131255543666200162310ustar00rootroot00000000000000#ifndef COLLECT_H #define COLLECT_H #include "collectorscurvedata.h" #include "display.h" #include "ordervector.hpp" #include "sabundvector.hpp" #include "rabundvector.hpp" #include "sharedordervector.h" #include "datavector.hpp" #include "mothurout.h" /***********************************************************************/ class Collect { public: Collect(OrderVector* order, vector<Display*> disp) : numSeqs(order->getNumSeqs()), order(order), displays(disp), label(order->getLabel()) { m = MothurOut::getInstance(); }; Collect(SharedOrderVector* sharedorder, vector<Display*> disp) : numSeqs(sharedorder->getNumSeqs()), sharedorder(sharedorder), displays(disp), label(sharedorder->getLabel()) { m = MothurOut::getInstance(); } ~Collect(){ }; int getCurve(float); int getSharedCurve(float); private: SharedOrderVector* sharedorder; OrderVector* order; vector<Display*> displays; int numSeqs, numGroupComb, totalNumSeq; string label, groupLabel; void getGroupComb(); vector<string> groupComb; bool validGroup(vector<string>, string); MothurOut* m; }; #endif mothur-1.36.1/source/collectdisplay.h000066400000000000000000000060261255543666200176260ustar00rootroot00000000000000#ifndef COLLECTDISPLAY_H #define COLLECTDISPLAY_H #include "sabundvector.hpp" #include "sharedsabundvector.h" #include "calculator.h" #include "fileoutput.h" #include "display.h" /***********************************************************************/ class CollectDisplay : public Display { public: CollectDisplay(Calculator* calc, FileOutput* file) : estimate(calc), output(file) {timesCalled = 0;}; ~CollectDisplay() { delete estimate; delete 
output; } void update(SAbundVector* rank){ nSeqs=rank->getNumSeqs(); data = estimate->getValues(rank); output->output(nSeqs, data); }; void update(vector<SharedRAbundVector*> shared, int numSeqs, int numGroups){ timesCalled++; data = estimate->getValues(shared); //passes estimators a shared vector from each group to be compared //figure out what groups are being compared in getValues //because we randomize the order we need to put the results in the correct column in the output file int group1Index, group2Index, pos; vector<string> mGroups = m->getGroups(); for (int i = 0; i < mGroups.size(); i++) { if (shared[0]->getGroup() == mGroups[i]) { group1Index = i; } if (shared[1]->getGroup() == mGroups[i]) { group2Index = i; } } numGroupComb = 0; int n = 1; for (int i = 0; i < (numGroups - 1); i++) { for (int l = n; l < numGroups; l++) { if ((group1Index == i) && (group2Index == l)) { pos = numGroupComb; //pos tells you which column in the output file you are in }else if ((group1Index == l) && (group2Index == i)) { pos = numGroupComb; } numGroupComb++; } n++; } if ((estimate->getMultiple() == true) && all) { numGroupComb++; groupData.resize((numGroupComb*data.size()), 0); //is this the time it's called with all values if ((timesCalled % numGroupComb) == 0) { //last spot pos = ((groupData.size()-1) * data.size()); } //fills groupData with data's info for (int i = 0; i < data.size(); i++) { groupData[pos+i] = data[i]; } }else { groupData.resize((numGroupComb*data.size()), 0); //fills groupData with data's info for (int i = 0; i < data.size(); i++) { groupData[pos+i] = data[i]; } } //when you get all your groups' info then output if ((timesCalled % numGroupComb) == 0) { output->output(numSeqs, groupData); } }; void init(string s) { output->initFile(s); }; void reset() { output->resetFile(); }; void close() { output->resetFile(); }; void setAll(bool a) { all = a; } bool getAll() { return all; } bool isCalcMultiple() { return estimate->getMultiple(); } bool calcNeedsAll() { return estimate->getNeedsAll(); } 
bool hasLciHci() { if (estimate->getCols() == 3) { return true; } else{ return false; } } string getName() { return estimate->getName(); } private: Calculator* estimate; FileOutput* output; int nSeqs, timesCalled, numGroupComb; vector<double> data; vector<double> groupData; bool all; }; /***********************************************************************/ #endif mothur-1.36.1/source/collectorscurvedata.h000066400000000000000000000042661255543666200206670ustar00rootroot00000000000000#ifndef COLLECTORSCURVEDATA_H #define COLLECTORSCURVEDATA_H #include "sabundvector.hpp" #include "sharedrabundvector.h" #include "display.h" #include "observable.h" /***********************************************************************/ class CollectorsCurveData : public Observable { public: CollectorsCurveData() : rank(0) {}; void registerDisplay(Display* o) { displays.insert(o); }; void removeDisplay(Display* o) { displays.erase(o); delete o; }; SAbundVector* getRankData() { return rank; }; void rankDataChanged() { notifyDisplays(); }; void updateRankData(SAbundVector* rv) { rank = rv; rankDataChanged(); }; void notifyDisplays(){ for(set<Display*>::iterator pos=displays.begin();pos!=displays.end();pos++){ (*pos)->update(rank); } }; private: set<Display*> displays; SAbundVector* rank; }; /***********************************************************************/ class SharedCollectorsCurveData : public Observable { public: SharedCollectorsCurveData() { }; //: shared1(0), shared2(0) void registerDisplay(Display* o) { displays.insert(o); }; void removeDisplay(Display* o) { displays.erase(o); delete o; }; void SharedDataChanged() { notifyDisplays(); }; void updateSharedData(vector<SharedRAbundVector*> s, int numSeqs, int numGroupComb, bool p) { pairs = p; shared = s; NumSeqs = numSeqs; NumGroupComb = numGroupComb; SharedDataChanged(); }; void notifyDisplays(){ for(set<Display*>::iterator pos=displays.begin();pos!=displays.end();pos++){ if ((*pos)->calcNeedsAll() == true) { (*pos)->update(shared, NumSeqs, NumGroupComb); }else{ if ( 
((*pos)->isCalcMultiple() == true) && ((*pos)->getAll() == true) && (!pairs) ) { (*pos)->update(shared, NumSeqs, NumGroupComb); }else { vector<SharedRAbundVector*> temp; temp.push_back(shared[0]); temp.push_back(shared[1]); shared = temp; (*pos)->update(shared, NumSeqs, NumGroupComb); } } } }; private: set<Display*> displays; vector<Display*> multiDisplays; vector<SharedRAbundVector*> shared; int NumSeqs, NumGroupComb; bool pairs; }; /***********************************************************************/ #endif mothur-1.36.1/source/commandfactory.cpp000066400000000000000000001777011255543666200201650ustar00rootroot00000000000000/* * commandfactory.cpp * * * Created by Pat Schloss on 10/25/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "command.hpp" #include "clustercommand.h" #include "collectcommand.h" #include "collectsharedcommand.h" #include "getgroupcommand.h" #include "getlabelcommand.h" #include "rarefactcommand.h" #include "summarycommand.h" #include "summarysharedcommand.h" #include "rarefactsharedcommand.h" #include "quitcommand.h" #include "helpcommand.h" #include "commandfactory.hpp" #include "deconvolutecommand.h" #include "parsimonycommand.h" #include "unifracunweightedcommand.h" #include "unifracweightedcommand.h" #include "libshuffcommand.h" #include "heatmapcommand.h" #include "heatmapsimcommand.h" #include "filterseqscommand.h" #include "venncommand.h" #include "nocommands.h" #include "binsequencecommand.h" #include "getoturepcommand.h" #include "treegroupscommand.h" #include "distancecommand.h" #include "aligncommand.h" #include "matrixoutputcommand.h" #include "getsabundcommand.h" #include "getrabundcommand.h" #include "seqsummarycommand.h" #include "screenseqscommand.h" #include "reversecommand.h" #include "trimseqscommand.h" #include "mergefilecommand.h" #include "listseqscommand.h" #include "getseqscommand.h" #include "removeseqscommand.h" #include "systemcommand.h" #include "secondarystructurecommand.h" #include "getsharedotucommand.h" #include "getlistcountcommand.h" 
#include "hclustercommand.h" #include "classifyseqscommand.h" #include "phylotypecommand.h" #include "mgclustercommand.h" #include "preclustercommand.h" #include "pcoacommand.h" #include "otuhierarchycommand.h" #include "setdircommand.h" #include "parselistscommand.h" #include "chimeraccodecommand.h" #include "chimeracheckcommand.h" #include "chimeraslayercommand.h" #include "chimerapintailcommand.h" #include "chimerabellerophoncommand.h" #include "chimerauchimecommand.h" #include "setlogfilecommand.h" #include "phylodiversitycommand.h" #include "makegroupcommand.h" #include "chopseqscommand.h" #include "clearcutcommand.h" #include "catchallcommand.h" #include "splitabundcommand.h" #include "clustersplitcommand.h" #include "classifyotucommand.h" #include "degapseqscommand.h" #include "getrelabundcommand.h" #include "sensspeccommand.h" #include "sffinfocommand.h" #include "seqerrorcommand.h" #include "normalizesharedcommand.h" #include "metastatscommand.h" #include "splitgroupscommand.h" #include "clusterfragmentscommand.h" #include "getlineagecommand.h" #include "removelineagecommand.h" #include "parsefastaqcommand.h" #include "pipelinepdscommand.h" #include "deuniqueseqscommand.h" #include "pairwiseseqscommand.h" #include "clusterdoturcommand.h" #include "subsamplecommand.h" #include "removegroupscommand.h" #include "getgroupscommand.h" #include "getotuscommand.h" #include "removeotuscommand.h" #include "indicatorcommand.h" #include "consensusseqscommand.h" #include "trimflowscommand.h" #include "corraxescommand.h" #include "shhhercommand.h" #include "pcacommand.h" #include "nmdscommand.h" #include "removerarecommand.h" #include "mergegroupscommand.h" #include "amovacommand.h" #include "homovacommand.h" #include "mantelcommand.h" #include "makefastqcommand.h" #include "anosimcommand.h" #include "getcurrentcommand.h" #include "setcurrentcommand.h" #include "sharedcommand.h" #include "getcommandinfocommand.h" #include "deuniquetreecommand.h" #include 
"countseqscommand.h" #include "countgroupscommand.h" #include "clearmemorycommand.h" #include "summarytaxcommand.h" #include "chimeraperseuscommand.h" #include "shhhseqscommand.h" #include "summaryqualcommand.h" #include "otuassociationcommand.h" #include "sortseqscommand.h" #include "classifytreecommand.h" #include "cooccurrencecommand.h" #include "pcrseqscommand.h" #include "createdatabasecommand.h" #include "makebiomcommand.h" #include "getcoremicrobiomecommand.h" #include "listotulabelscommand.h" #include "getotulabelscommand.h" #include "removeotulabelscommand.h" #include "makecontigscommand.h" #include "loadlogfilecommand.h" #include "sffmultiplecommand.h" #include "classifysvmsharedcommand.h" #include "classifyrfsharedcommand.h" #include "filtersharedcommand.h" #include "primerdesigncommand.h" #include "getdistscommand.h" #include "removedistscommand.h" #include "mergetaxsummarycommand.h" #include "getmetacommunitycommand.h" #include "sparcccommand.h" #include "makelookupcommand.h" #include "renameseqscommand.h" #include "makelefsecommand.h" #include "lefsecommand.h" #include "kruskalwalliscommand.h" #include "sracommand.h" #include "mergesfffilecommand.h" #include "getmimarkspackagecommand.h" #include "mimarksattributescommand.h" #include "setseedcommand.h" #include "makefilecommand.h" //needed for testing project //CommandFactory* CommandFactory::_uniqueInstance; /*******************************************************/ /******************************************************/ CommandFactory* CommandFactory::getInstance() { if( _uniqueInstance == 0) { _uniqueInstance = new CommandFactory(); } return _uniqueInstance; } /***********************************************************/ /***********************************************************/ //note: This class is resposible for knowing which commands are mpiEnabled, //If a command is not enabled only process 0 will execute the command. 
//This avoids redundant outputs on pieces of code we have not parallelized. //If you add mpi code to an existing command you need to modify the list below or the code will hang on MPI blocking commands like File_open. //example: commands["dist.seqs"] = "MPIEnabled"; CommandFactory::CommandFactory(){ string s = ""; m = MothurOut::getInstance(); command = new NoCommand(s); shellcommand = new NoCommand(s); pipecommand = new NoCommand(s); outputDir = ""; inputDir = ""; logFileName = ""; append = false; //initialize list of valid commands commands["make.shared"] = "make.shared"; commands["bin.seqs"] = "bin.seqs"; commands["get.oturep"] = "get.oturep"; commands["cluster"] = "cluster"; commands["unique.seqs"] = "unique.seqs"; commands["dist.shared"] = "dist.shared"; commands["collect.single"] = "collect.single"; commands["collect.shared"] = "collect.shared"; commands["rarefaction.single"] = "rarefaction.single"; commands["rarefaction.shared"] = "rarefaction.shared"; commands["summary.single"] = "summary.single"; commands["summary.shared"] = "summary.shared"; commands["parsimony"] = "parsimony"; commands["unifrac.weighted"] = "unifrac.weighted"; commands["unifrac.unweighted"] = "unifrac.unweighted"; commands["libshuff"] = "libshuff"; commands["tree.shared"] = "tree.shared"; commands["heatmap.bin"] = "heatmap.bin"; commands["heatmap.sim"] = "heatmap.sim"; commands["venn"] = "venn"; commands["get.group"] = "get.group"; commands["get.label"] = "get.label"; commands["get.sabund"] = "get.sabund"; commands["get.rabund"] = "get.rabund"; commands["help"] = "help"; commands["reverse.seqs"] = "reverse.seqs"; commands["trim.seqs"] = "trim.seqs"; commands["trim.flows"] = "trim.flows"; commands["list.seqs"] = "list.seqs"; commands["get.seqs"] = "get.seqs"; commands["remove.seqs"] = "remove.seqs"; commands["system"] = "system"; commands["align.check"] = "align.check"; commands["get.sharedseqs"] = "get.sharedseqs"; commands["get.otulist"] = "get.otulist"; commands["hcluster"] = 
"hcluster"; commands["phylotype"] = "phylotype"; commands["mgcluster"] = "mgcluster"; commands["pre.cluster"] = "pre.cluster"; commands["pcoa"] = "pcoa"; commands["otu.hierarchy"] = "otu.hierarchy"; commands["set.dir"] = "MPIEnabled"; commands["merge.files"] = "merge.files"; commands["parse.list"] = "parse.list"; commands["set.logfile"] = "set.logfile"; commands["phylo.diversity"] = "phylo.diversity"; commands["make.group"] = "make.group"; commands["chop.seqs"] = "chop.seqs"; commands["clearcut"] = "clearcut"; commands["catchall"] = "catchall"; commands["split.abund"] = "split.abund"; commands["classify.otu"] = "classify.otu"; commands["degap.seqs"] = "degap.seqs"; commands["get.relabund"] = "get.relabund"; commands["sffinfo"] = "sffinfo"; commands["normalize.shared"] = "normalize.shared"; commands["metastats"] = "metastats"; commands["split.groups"] = "split.groups"; commands["cluster.fragments"] = "cluster.fragments"; commands["get.lineage"] = "get.lineage"; commands["remove.lineage"] = "remove.lineage"; commands["fastq.info"] = "fastq.info"; commands["deunique.seqs"] = "deunique.seqs"; commands["cluster.classic"] = "cluster.classic"; commands["sub.sample"] = "sub.sample"; commands["remove.groups"] = "remove.groups"; commands["get.groups"] = "get.groups"; commands["get.otus"] = "get.otus"; commands["remove.otus"] = "remove.otus"; commands["indicator"] = "indicator"; commands["consensus.seqs"] = "consensus.seqs"; commands["corr.axes"] = "corr.axes"; commands["pca"] = "pca"; commands["nmds"] = "nmds"; commands["remove.rare"] = "remove.rare"; commands["amova"] = "amova"; commands["homova"] = "homova"; commands["mantel"] = "mantel"; commands["anosim"] = "anosim"; commands["make.fastq"] = "make.fastq"; commands["merge.groups"] = "merge.groups"; commands["get.current"] = "MPIEnabled"; commands["set.current"] = "MPIEnabled"; commands["get.commandinfo"] = "get.commandinfo"; commands["deunique.tree"] = "deunique.tree"; commands["count.seqs"] = "count.seqs"; 
commands["count.groups"] = "count.groups"; commands["clear.memory"] = "clear.memory"; commands["pairwise.seqs"] = "MPIEnabled"; commands["pipeline.pds"] = "MPIEnabled"; commands["classify.seqs"] = "MPIEnabled"; commands["dist.seqs"] = "MPIEnabled"; commands["filter.seqs"] = "MPIEnabled"; commands["align.seqs"] = "MPIEnabled"; commands["chimera.ccode"] = "MPIEnabled"; commands["chimera.check"] = "MPIEnabled"; commands["chimera.slayer"] = "MPIEnabled"; commands["chimera.uchime"] = "chimera.uchime"; commands["chimera.perseus"] = "chimera.perseus"; commands["chimera.pintail"] = "MPIEnabled"; commands["chimera.bellerophon"] = "MPIEnabled"; commands["screen.seqs"] = "MPIEnabled"; commands["summary.seqs"] = "summary.seqs"; commands["cluster.split"] = "MPIEnabled"; commands["shhh.flows"] = "MPIEnabled"; commands["sens.spec"] = "sens.spec"; commands["seq.error"] = "seq.error"; commands["summary.tax"] = "summary.tax"; commands["summary.qual"] = "summary.qual"; commands["shhh.seqs"] = "shhh.seqs"; commands["otu.association"] = "otu.association"; commands["sort.seqs"] = "sort.seqs"; commands["classify.tree"] = "classify.tree"; commands["cooccurrence"] = "cooccurrence"; commands["pcr.seqs"] = "pcr.seqs"; commands["create.database"] = "create.database"; commands["make.biom"] = "make.biom"; commands["get.coremicrobiome"] = "get.coremicrobiome"; commands["list.otulabels"] = "list.otulabels"; commands["get.otulabels"] = "get.otulabels"; commands["remove.otulabels"] = "remove.otulabels"; commands["make.contigs"] = "make.contigs"; commands["load.logfile"] = "load.logfile"; commands["make.table"] = "make.table"; commands["sff.multiple"] = "sff.multiple"; commands["quit"] = "MPIEnabled"; commands["classify.rf"] = "classify.rf"; commands["classify.svm"] = "classify.svm"; commands["filter.shared"] = "filter.shared"; commands["primer.design"] = "primer.design"; commands["get.dists"] = "get.dists"; commands["remove.dists"] = "remove.dists"; commands["merge.taxsummary"] = 
"merge.taxsummary"; commands["get.communitytype"] = "get.communitytype"; commands["sparcc"] = "sparcc"; commands["make.lookup"] = "make.lookup"; commands["rename.seqs"] = "rename.seqs"; commands["make.lefse"] = "make.lefse"; commands["lefse"] = "lefse"; commands["kruskal.wallis"] = "kruskal.wallis"; commands["make.sra"] = "make.sra"; commands["merge.sfffiles"] = "merge.sfffiles"; commands["get.mimarkspackage"] = "get.mimarkspackage"; commands["mimarks.attributes"] = "mimarks.attributes"; commands["make.file"] = "make.file"; commands["set.seed"] = "set.seed"; } /***********************************************************/ /***********************************************************/ bool CommandFactory::MPIEnabled(string commandName) { bool mpi = false; it = commands.find(commandName); if (it != commands.end()) { if (it->second == "MPIEnabled") { return true; } } return mpi; } /***********************************************************/ /***********************************************************/ CommandFactory::~CommandFactory(){ _uniqueInstance = 0; delete command; delete shellcommand; delete pipecommand; } /***********************************************************/ /***********************************************************/ int CommandFactory::checkForRedirects(string optionString) { try { int pos = optionString.find("outputdir"); if (pos != string::npos) { //user has set outputdir in command option string string outputOption = ""; bool foundEquals = false; for(int i=pos;imkDir(outputOption)){ setOutputDirectory(outputOption); m->mothurOut("Setting output directory to: " + outputOption); m->mothurOutEndLine(); } } pos = optionString.find("inputdir"); if (pos != string::npos) { //user has set inputdir in command option string string intputOption = ""; bool foundEquals = false; for(int i=pos;idirCheck(intputOption)){ setInputDirectory(intputOption); m->mothurOut("Setting input directory to: " + intputOption); m->mothurOutEndLine(); } } pos = 
optionString.find("seed"); if (pos != string::npos) { //user has set inputdir in command option string string intputOption = ""; bool foundEquals = false; for(int i=pos;iisInteger(intputOption)) { m->mothurConvert(intputOption, random); seed=true; } else { m->mothurOut("[ERROR]: Seed must be an integer."); m->mothurOutEndLine(); seed = false;} } if (seed) { srand(random); m->mothurOut("Setting random seed to " + toString(random) + ".\n\n"); } } return 0; } catch(exception& e) { m->errorOut(e, "CommandFactory", "getCommand"); exit(1); } } /***********************************************************/ /***********************************************************/ //This function calls the appropriate command fucntions based on user input. Command* CommandFactory::getCommand(string commandName, string optionString){ try { delete command; //delete the old command checkForRedirects(optionString); //user has opted to redirect output from dir where input files are located to some other place if (outputDir != "") { if (optionString != "") { optionString += ", outputdir=" + outputDir; } else { optionString += "outputdir=" + outputDir; } } //user has opted to redirect input from dir where mothur.exe is located to some other place if (inputDir != "") { if (optionString != "") { optionString += ", inputdir=" + inputDir; } else { optionString += "inputdir=" + inputDir; } } if(commandName == "cluster") { command = new ClusterCommand(optionString); } else if(commandName == "unique.seqs") { command = new DeconvoluteCommand(optionString); } else if(commandName == "parsimony") { command = new ParsimonyCommand(optionString); } else if(commandName == "help") { command = new HelpCommand(optionString); } else if(commandName == "quit") { command = new QuitCommand(optionString); } else if(commandName == "collect.single") { command = new CollectCommand(optionString); } else if(commandName == "collect.shared") { command = new CollectSharedCommand(optionString); } else if(commandName == 
"rarefaction.single") { command = new RareFactCommand(optionString); } else if(commandName == "rarefaction.shared") { command = new RareFactSharedCommand(optionString); } else if(commandName == "summary.single") { command = new SummaryCommand(optionString); } else if(commandName == "summary.shared") { command = new SummarySharedCommand(optionString); } else if(commandName == "unifrac.weighted") { command = new UnifracWeightedCommand(optionString); } else if(commandName == "unifrac.unweighted") { command = new UnifracUnweightedCommand(optionString); } else if(commandName == "get.group") { command = new GetgroupCommand(optionString); } else if(commandName == "get.label") { command = new GetlabelCommand(optionString); } else if(commandName == "get.sabund") { command = new GetSAbundCommand(optionString); } else if(commandName == "get.rabund") { command = new GetRAbundCommand(optionString); } else if(commandName == "libshuff") { command = new LibShuffCommand(optionString); } else if(commandName == "heatmap.bin") { command = new HeatMapCommand(optionString); } else if(commandName == "heatmap.sim") { command = new HeatMapSimCommand(optionString); } else if(commandName == "filter.seqs") { command = new FilterSeqsCommand(optionString); } else if(commandName == "venn") { command = new VennCommand(optionString); } else if(commandName == "bin.seqs") { command = new BinSeqCommand(optionString); } else if(commandName == "get.oturep") { command = new GetOTURepCommand(optionString); } else if(commandName == "tree.shared") { command = new TreeGroupCommand(optionString); } else if(commandName == "dist.shared") { command = new MatrixOutputCommand(optionString); } else if(commandName == "dist.seqs") { command = new DistanceCommand(optionString); } else if(commandName == "align.seqs") { command = new AlignCommand(optionString); } else if(commandName == "summary.seqs") { command = new SeqSummaryCommand(optionString); } else if(commandName == "screen.seqs") { command = new 
ScreenSeqsCommand(optionString); } else if(commandName == "reverse.seqs") { command = new ReverseSeqsCommand(optionString); } else if(commandName == "trim.seqs") { command = new TrimSeqsCommand(optionString); } else if(commandName == "trim.flows") { command = new TrimFlowsCommand(optionString); } else if(commandName == "shhh.flows") { command = new ShhherCommand(optionString); } else if(commandName == "list.seqs") { command = new ListSeqsCommand(optionString); } else if(commandName == "get.seqs") { command = new GetSeqsCommand(optionString); } else if(commandName == "remove.seqs") { command = new RemoveSeqsCommand(optionString); } else if(commandName == "merge.files") { command = new MergeFileCommand(optionString); } else if(commandName == "system") { command = new SystemCommand(optionString); } else if(commandName == "align.check") { command = new AlignCheckCommand(optionString); } else if(commandName == "get.sharedseqs") { command = new GetSharedOTUCommand(optionString); } else if(commandName == "get.otulist") { command = new GetListCountCommand(optionString); } else if(commandName == "hcluster") { command = new HClusterCommand(optionString); } else if(commandName == "classify.seqs") { command = new ClassifySeqsCommand(optionString); } else if(commandName == "chimera.ccode") { command = new ChimeraCcodeCommand(optionString); } else if(commandName == "chimera.check") { command = new ChimeraCheckCommand(optionString); } else if(commandName == "chimera.slayer") { command = new ChimeraSlayerCommand(optionString); } else if(commandName == "chimera.uchime") { command = new ChimeraUchimeCommand(optionString); } else if(commandName == "chimera.pintail") { command = new ChimeraPintailCommand(optionString); } else if(commandName == "chimera.bellerophon") { command = new ChimeraBellerophonCommand(optionString); } else if(commandName == "phylotype") { command = new PhylotypeCommand(optionString); } else if(commandName == "mgcluster") { command = new 
MGClusterCommand(optionString); } else if(commandName == "pre.cluster") { command = new PreClusterCommand(optionString); } else if(commandName == "pcoa") { command = new PCOACommand(optionString); } else if(commandName == "pca") { command = new PCACommand(optionString); } else if(commandName == "nmds") { command = new NMDSCommand(optionString); } else if(commandName == "otu.hierarchy") { command = new OtuHierarchyCommand(optionString); } else if(commandName == "set.dir") { command = new SetDirectoryCommand(optionString); } else if(commandName == "set.logfile") { command = new SetLogFileCommand(optionString); } else if(commandName == "parse.list") { command = new ParseListCommand(optionString); } else if(commandName == "phylo.diversity") { command = new PhyloDiversityCommand(optionString); } else if(commandName == "make.group") { command = new MakeGroupCommand(optionString); } else if(commandName == "chop.seqs") { command = new ChopSeqsCommand(optionString); } else if(commandName == "clearcut") { command = new ClearcutCommand(optionString); } else if(commandName == "catchall") { command = new CatchAllCommand(optionString); } else if(commandName == "split.abund") { command = new SplitAbundCommand(optionString); } else if(commandName == "cluster.split") { command = new ClusterSplitCommand(optionString); } else if(commandName == "classify.otu") { command = new ClassifyOtuCommand(optionString); } else if(commandName == "degap.seqs") { command = new DegapSeqsCommand(optionString); } else if(commandName == "get.relabund") { command = new GetRelAbundCommand(optionString); } else if(commandName == "sens.spec") { command = new SensSpecCommand(optionString); } else if(commandName == "seq.error") { command = new SeqErrorCommand(optionString); } else if(commandName == "sffinfo") { command = new SffInfoCommand(optionString); } else if(commandName == "normalize.shared") { command = new NormalizeSharedCommand(optionString); } else if(commandName == "metastats") { command = new 
MetaStatsCommand(optionString); } else if(commandName == "split.groups") { command = new SplitGroupCommand(optionString); } else if(commandName == "cluster.fragments") { command = new ClusterFragmentsCommand(optionString); } else if(commandName == "get.lineage") { command = new GetLineageCommand(optionString); } else if(commandName == "remove.lineage") { command = new RemoveLineageCommand(optionString); } else if(commandName == "get.groups") { command = new GetGroupsCommand(optionString); } else if(commandName == "remove.groups") { command = new RemoveGroupsCommand(optionString); } else if(commandName == "get.otus") { command = new GetOtusCommand(optionString); } else if(commandName == "remove.otus") { command = new RemoveOtusCommand(optionString); } else if(commandName == "fastq.info") { command = new ParseFastaQCommand(optionString); } else if(commandName == "pipeline.pds") { command = new PipelineCommand(optionString); } else if(commandName == "deunique.seqs") { command = new DeUniqueSeqsCommand(optionString); } else if(commandName == "pairwise.seqs") { command = new PairwiseSeqsCommand(optionString); } else if(commandName == "cluster.classic") { command = new ClusterDoturCommand(optionString); } else if(commandName == "sub.sample") { command = new SubSampleCommand(optionString); } else if(commandName == "indicator") { command = new IndicatorCommand(optionString); } else if(commandName == "consensus.seqs") { command = new ConsensusSeqsCommand(optionString); } else if(commandName == "corr.axes") { command = new CorrAxesCommand(optionString); } else if(commandName == "remove.rare") { command = new RemoveRareCommand(optionString); } else if(commandName == "merge.groups") { command = new MergeGroupsCommand(optionString); } else if(commandName == "amova") { command = new AmovaCommand(optionString); } else if(commandName == "homova") { command = new HomovaCommand(optionString); } else if(commandName == "mantel") { command = new MantelCommand(optionString); } else 
if(commandName == "make.fastq") { command = new MakeFastQCommand(optionString); } else if(commandName == "get.current") { command = new GetCurrentCommand(optionString); } else if(commandName == "set.current") { command = new SetCurrentCommand(optionString); } else if(commandName == "anosim") { command = new AnosimCommand(optionString); } else if(commandName == "make.shared") { command = new SharedCommand(optionString); } else if(commandName == "get.commandinfo") { command = new GetCommandInfoCommand(optionString); } else if(commandName == "deunique.tree") { command = new DeuniqueTreeCommand(optionString); } else if((commandName == "count.seqs") || (commandName == "make.table")) { command = new CountSeqsCommand(optionString); } else if(commandName == "count.groups") { command = new CountGroupsCommand(optionString); } else if(commandName == "clear.memory") { command = new ClearMemoryCommand(optionString); } else if(commandName == "summary.tax") { command = new SummaryTaxCommand(optionString); } else if(commandName == "summary.qual") { command = new SummaryQualCommand(optionString); } else if(commandName == "chimera.perseus") { command = new ChimeraPerseusCommand(optionString); } else if(commandName == "shhh.seqs") { command = new ShhhSeqsCommand(optionString); } else if(commandName == "otu.association") { command = new OTUAssociationCommand(optionString); } else if(commandName == "sort.seqs") { command = new SortSeqsCommand(optionString); } else if(commandName == "classify.tree") { command = new ClassifyTreeCommand(optionString); } else if(commandName == "cooccurrence") { command = new CooccurrenceCommand(optionString); } else if(commandName == "pcr.seqs") { command = new PcrSeqsCommand(optionString); } else if(commandName == "create.database") { command = new CreateDatabaseCommand(optionString); } else if(commandName == "make.biom") { command = new MakeBiomCommand(optionString); } else if(commandName == "get.coremicrobiome") { command = new 
GetCoreMicroBiomeCommand(optionString); } else if(commandName == "list.otulabels") { command = new ListOtuLabelsCommand(optionString); } else if(commandName == "get.otulabels") { command = new GetOtuLabelsCommand(optionString); } else if(commandName == "remove.otulabels") { command = new RemoveOtuLabelsCommand(optionString); } else if(commandName == "make.contigs") { command = new MakeContigsCommand(optionString); } else if(commandName == "load.logfile") { command = new LoadLogfileCommand(optionString); } else if(commandName == "sff.multiple") { command = new SffMultipleCommand(optionString); } else if(commandName == "classify.svm") { command = new ClassifySvmSharedCommand(optionString); } else if(commandName == "classify.rf") { command = new ClassifyRFSharedCommand(optionString); } else if(commandName == "filter.shared") { command = new FilterSharedCommand(optionString); } else if(commandName == "primer.design") { command = new PrimerDesignCommand(optionString); } else if(commandName == "get.dists") { command = new GetDistsCommand(optionString); } else if(commandName == "remove.dists") { command = new RemoveDistsCommand(optionString); } else if(commandName == "merge.taxsummary") { command = new MergeTaxSummaryCommand(optionString); } else if(commandName == "get.communitytype") { command = new GetMetaCommunityCommand(optionString); } else if(commandName == "sparcc") { command = new SparccCommand(optionString); } else if(commandName == "make.lookup") { command = new MakeLookupCommand(optionString); } else if(commandName == "rename.seqs") { command = new RenameSeqsCommand(optionString); } else if(commandName == "make.lefse") { command = new MakeLefseCommand(optionString); } else if(commandName == "lefse") { command = new LefseCommand(optionString); } else if(commandName == "kruskal.wallis") { command = new KruskalWallisCommand(optionString); } else if(commandName == "make.sra") { command = new SRACommand(optionString); } else if(commandName == "merge.sfffiles") { 
command = new MergeSfffilesCommand(optionString); }
		else if(commandName == "get.mimarkspackage") { command = new GetMIMarksPackageCommand(optionString); }
		else if(commandName == "mimarks.attributes") { command = new MimarksAttributesCommand(optionString); }
		else if(commandName == "set.seed") { command = new SetSeedCommand(optionString); }
		else if(commandName == "make.file") { command = new MakeFileCommand(optionString); }
		else { command = new NoCommand(optionString); }

		return command;
	}
	catch(exception& e) {
		m->errorOut(e, "CommandFactory", "getCommand");
		exit(1);
	}
}
/***********************************************************/

/***********************************************************/
//This function calls the appropriate command functions based on user input.
Command* CommandFactory::getCommand(string commandName, string optionString, string mode){
	try {
		delete pipecommand; //delete the old command

		checkForRedirects(optionString);

		//user has opted to redirect output from dir where input files are located to some other place
		if (outputDir != "") {
			if (optionString != "") { optionString += ", outputdir=" + outputDir; }
			else { optionString += "outputdir=" + outputDir; }
		}

		//user has opted to redirect input from dir where mothur.exe is located to some other place
		if (inputDir != "") {
			if (optionString != "") { optionString += ", inputdir=" + inputDir; }
			else { optionString += "inputdir=" + inputDir; }
		}

		if(commandName == "cluster") { pipecommand = new ClusterCommand(optionString); }
		else if(commandName == "unique.seqs") { pipecommand = new DeconvoluteCommand(optionString); }
		else if(commandName == "parsimony") { pipecommand = new ParsimonyCommand(optionString); }
		else if(commandName == "help") { pipecommand = new HelpCommand(optionString); }
		else if(commandName == "quit") { pipecommand = new QuitCommand(optionString); }
		else if(commandName == "collect.single") { pipecommand = new CollectCommand(optionString); }
		else if(commandName == "collect.shared") {
pipecommand = new CollectSharedCommand(optionString); } else if(commandName == "rarefaction.single") { pipecommand = new RareFactCommand(optionString); } else if(commandName == "rarefaction.shared") { pipecommand = new RareFactSharedCommand(optionString); } else if(commandName == "summary.single") { pipecommand = new SummaryCommand(optionString); } else if(commandName == "summary.shared") { pipecommand = new SummarySharedCommand(optionString); } else if(commandName == "unifrac.weighted") { pipecommand = new UnifracWeightedCommand(optionString); } else if(commandName == "unifrac.unweighted") { pipecommand = new UnifracUnweightedCommand(optionString); } else if(commandName == "get.group") { pipecommand = new GetgroupCommand(optionString); } else if(commandName == "get.label") { pipecommand = new GetlabelCommand(optionString); } else if(commandName == "get.sabund") { pipecommand = new GetSAbundCommand(optionString); } else if(commandName == "get.rabund") { pipecommand = new GetRAbundCommand(optionString); } else if(commandName == "libshuff") { pipecommand = new LibShuffCommand(optionString); } else if(commandName == "heatmap.bin") { pipecommand = new HeatMapCommand(optionString); } else if(commandName == "heatmap.sim") { pipecommand = new HeatMapSimCommand(optionString); } else if(commandName == "filter.seqs") { pipecommand = new FilterSeqsCommand(optionString); } else if(commandName == "venn") { pipecommand = new VennCommand(optionString); } else if(commandName == "bin.seqs") { pipecommand = new BinSeqCommand(optionString); } else if(commandName == "get.oturep") { pipecommand = new GetOTURepCommand(optionString); } else if(commandName == "tree.shared") { pipecommand = new TreeGroupCommand(optionString); } else if(commandName == "dist.shared") { pipecommand = new MatrixOutputCommand(optionString); } else if(commandName == "dist.seqs") { pipecommand = new DistanceCommand(optionString); } else if(commandName == "align.seqs") { pipecommand = new 
AlignCommand(optionString); } else if(commandName == "summary.seqs") { pipecommand = new SeqSummaryCommand(optionString); } else if(commandName == "screen.seqs") { pipecommand = new ScreenSeqsCommand(optionString); } else if(commandName == "reverse.seqs") { pipecommand = new ReverseSeqsCommand(optionString); } else if(commandName == "trim.seqs") { pipecommand = new TrimSeqsCommand(optionString); } else if(commandName == "trim.flows") { pipecommand = new TrimFlowsCommand(optionString); } else if(commandName == "shhh.flows") { pipecommand = new ShhherCommand(optionString); } else if(commandName == "list.seqs") { pipecommand = new ListSeqsCommand(optionString); } else if(commandName == "get.seqs") { pipecommand = new GetSeqsCommand(optionString); } else if(commandName == "remove.seqs") { pipecommand = new RemoveSeqsCommand(optionString); } else if(commandName == "merge.files") { pipecommand = new MergeFileCommand(optionString); } else if(commandName == "system") { pipecommand = new SystemCommand(optionString); } else if(commandName == "align.check") { pipecommand = new AlignCheckCommand(optionString); } else if(commandName == "get.sharedseqs") { pipecommand = new GetSharedOTUCommand(optionString); } else if(commandName == "get.otulist") { pipecommand = new GetListCountCommand(optionString); } else if(commandName == "hcluster") { pipecommand = new HClusterCommand(optionString); } else if(commandName == "classify.seqs") { pipecommand = new ClassifySeqsCommand(optionString); } else if(commandName == "chimera.ccode") { pipecommand = new ChimeraCcodeCommand(optionString); } else if(commandName == "chimera.check") { pipecommand = new ChimeraCheckCommand(optionString); } else if(commandName == "chimera.uchime") { pipecommand = new ChimeraUchimeCommand(optionString); } else if(commandName == "chimera.slayer") { pipecommand = new ChimeraSlayerCommand(optionString); } else if(commandName == "chimera.pintail") { pipecommand = new ChimeraPintailCommand(optionString); } else 
if(commandName == "chimera.bellerophon") { pipecommand = new ChimeraBellerophonCommand(optionString); } else if(commandName == "phylotype") { pipecommand = new PhylotypeCommand(optionString); } else if(commandName == "mgcluster") { pipecommand = new MGClusterCommand(optionString); } else if(commandName == "pre.cluster") { pipecommand = new PreClusterCommand(optionString); } else if(commandName == "pcoa") { pipecommand = new PCOACommand(optionString); } else if(commandName == "pca") { pipecommand = new PCACommand(optionString); } else if(commandName == "nmds") { pipecommand = new NMDSCommand(optionString); } else if(commandName == "otu.hierarchy") { pipecommand = new OtuHierarchyCommand(optionString); } else if(commandName == "set.dir") { pipecommand = new SetDirectoryCommand(optionString); } else if(commandName == "set.logfile") { pipecommand = new SetLogFileCommand(optionString); } else if(commandName == "parse.list") { pipecommand = new ParseListCommand(optionString); } else if(commandName == "phylo.diversity") { pipecommand = new PhyloDiversityCommand(optionString); } else if(commandName == "make.group") { pipecommand = new MakeGroupCommand(optionString); } else if(commandName == "chop.seqs") { pipecommand = new ChopSeqsCommand(optionString); } else if(commandName == "clearcut") { pipecommand = new ClearcutCommand(optionString); } else if(commandName == "catchall") { pipecommand = new CatchAllCommand(optionString); } else if(commandName == "split.abund") { pipecommand = new SplitAbundCommand(optionString); } else if(commandName == "cluster.split") { pipecommand = new ClusterSplitCommand(optionString); } else if(commandName == "classify.otu") { pipecommand = new ClassifyOtuCommand(optionString); } else if(commandName == "degap.seqs") { pipecommand = new DegapSeqsCommand(optionString); } else if(commandName == "get.relabund") { pipecommand = new GetRelAbundCommand(optionString); } else if(commandName == "sens.spec") { pipecommand = new 
SensSpecCommand(optionString); } else if(commandName == "seq.error") { pipecommand = new SeqErrorCommand(optionString); } else if(commandName == "sffinfo") { pipecommand = new SffInfoCommand(optionString); } else if(commandName == "normalize.shared") { pipecommand = new NormalizeSharedCommand(optionString); } else if(commandName == "metastats") { pipecommand = new MetaStatsCommand(optionString); } else if(commandName == "split.groups") { pipecommand = new SplitGroupCommand(optionString); } else if(commandName == "cluster.fragments") { pipecommand = new ClusterFragmentsCommand(optionString); } else if(commandName == "get.lineage") { pipecommand = new GetLineageCommand(optionString); } else if(commandName == "get.groups") { pipecommand = new GetGroupsCommand(optionString); } else if(commandName == "remove.lineage") { pipecommand = new RemoveLineageCommand(optionString); } else if(commandName == "remove.groups") { pipecommand = new RemoveGroupsCommand(optionString); } else if(commandName == "get.otus") { pipecommand = new GetOtusCommand(optionString); } else if(commandName == "remove.otus") { pipecommand = new RemoveOtusCommand(optionString); } else if(commandName == "fastq.info") { pipecommand = new ParseFastaQCommand(optionString); } else if(commandName == "deunique.seqs") { pipecommand = new DeUniqueSeqsCommand(optionString); } else if(commandName == "pairwise.seqs") { pipecommand = new PairwiseSeqsCommand(optionString); } else if(commandName == "cluster.classic") { pipecommand = new ClusterDoturCommand(optionString); } else if(commandName == "sub.sample") { pipecommand = new SubSampleCommand(optionString); } else if(commandName == "indicator") { pipecommand = new IndicatorCommand(optionString); } else if(commandName == "consensus.seqs") { pipecommand = new ConsensusSeqsCommand(optionString); } else if(commandName == "corr.axes") { pipecommand = new CorrAxesCommand(optionString); } else if(commandName == "remove.rare") { pipecommand = new 
RemoveRareCommand(optionString); } else if(commandName == "merge.groups") { pipecommand = new MergeGroupsCommand(optionString); } else if(commandName == "amova") { pipecommand = new AmovaCommand(optionString); } else if(commandName == "homova") { pipecommand = new HomovaCommand(optionString); } else if(commandName == "mantel") { pipecommand = new MantelCommand(optionString); } else if(commandName == "anosim") { pipecommand = new AnosimCommand(optionString); } else if(commandName == "make.fastq") { pipecommand = new MakeFastQCommand(optionString); } else if(commandName == "get.current") { pipecommand = new GetCurrentCommand(optionString); } else if(commandName == "set.current") { pipecommand = new SetCurrentCommand(optionString); } else if(commandName == "make.shared") { pipecommand = new SharedCommand(optionString); } else if(commandName == "get.commandinfo") { pipecommand = new GetCommandInfoCommand(optionString); } else if(commandName == "deunique.tree") { pipecommand = new DeuniqueTreeCommand(optionString); } else if((commandName == "count.seqs") || (commandName == "make.table")) { pipecommand = new CountSeqsCommand(optionString); } else if(commandName == "count.groups") { pipecommand = new CountGroupsCommand(optionString); } else if(commandName == "clear.memory") { pipecommand = new ClearMemoryCommand(optionString); } else if(commandName == "summary.tax") { pipecommand = new SummaryTaxCommand(optionString); } else if(commandName == "summary.qual") { pipecommand = new SummaryQualCommand(optionString); } else if(commandName == "chimera.perseus") { pipecommand = new ChimeraPerseusCommand(optionString); } else if(commandName == "shhh.seqs") { pipecommand = new ShhhSeqsCommand(optionString); } else if(commandName == "otu.association") { pipecommand = new OTUAssociationCommand(optionString); } else if(commandName == "sort.seqs") { pipecommand = new SortSeqsCommand(optionString); } else if(commandName == "classify.tree") { pipecommand = new 
ClassifyTreeCommand(optionString); }
		else if(commandName == "cooccurrence") { pipecommand = new CooccurrenceCommand(optionString); }
		else if(commandName == "pcr.seqs") { pipecommand = new PcrSeqsCommand(optionString); }
		else if(commandName == "create.database") { pipecommand = new CreateDatabaseCommand(optionString); }
		else if(commandName == "make.biom") { pipecommand = new MakeBiomCommand(optionString); }
		else if(commandName == "get.coremicrobiome") { pipecommand = new GetCoreMicroBiomeCommand(optionString); }
		else if(commandName == "list.otulabels") { pipecommand = new ListOtuLabelsCommand(optionString); }
		else if(commandName == "get.otulabels") { pipecommand = new GetOtuLabelsCommand(optionString); }
		else if(commandName == "remove.otulabels") { pipecommand = new RemoveOtuLabelsCommand(optionString); }
		else if(commandName == "make.contigs") { pipecommand = new MakeContigsCommand(optionString); }
		else if(commandName == "load.logfile") { pipecommand = new LoadLogfileCommand(optionString); }
		else if(commandName == "sff.multiple") { pipecommand = new SffMultipleCommand(optionString); }
		else if(commandName == "classify.rf") { pipecommand = new ClassifyRFSharedCommand(optionString); }
		else if(commandName == "filter.shared") { pipecommand = new FilterSharedCommand(optionString); }
		else if(commandName == "primer.design") { pipecommand = new PrimerDesignCommand(optionString); }
		else if(commandName == "get.dists") { pipecommand = new GetDistsCommand(optionString); }
		else if(commandName == "remove.dists") { pipecommand = new RemoveDistsCommand(optionString); }
		else if(commandName == "merge.taxsummary") { pipecommand = new MergeTaxSummaryCommand(optionString); }
		else if(commandName == "get.communitytype") { pipecommand = new GetMetaCommunityCommand(optionString); }
		else if(commandName == "sparcc") { pipecommand = new SparccCommand(optionString); }
		else if(commandName == "make.lookup") { pipecommand = new MakeLookupCommand(optionString); }
		else if(commandName == "rename.seqs") { pipecommand = new RenameSeqsCommand(optionString); }
		else if(commandName == "make.lefse") { pipecommand = new MakeLefseCommand(optionString); }
		else if(commandName == "lefse") { pipecommand = new LefseCommand(optionString); }
		else if(commandName == "kruskal.wallis") { pipecommand = new KruskalWallisCommand(optionString); }
		else if(commandName == "make.sra") { pipecommand = new SRACommand(optionString); }
		else if(commandName == "merge.sfffiles") { pipecommand = new MergeSfffilesCommand(optionString); }
		else if(commandName == "classify.svm") { pipecommand = new ClassifySvmSharedCommand(optionString); }
		else if(commandName == "get.mimarkspackage") { pipecommand = new GetMIMarksPackageCommand(optionString); }
		else if(commandName == "mimarks.attributes") { pipecommand = new MimarksAttributesCommand(optionString); }
		else if(commandName == "set.seed") { pipecommand = new SetSeedCommand(optionString); }
		else if(commandName == "make.file") { pipecommand = new MakeFileCommand(optionString); }
		else { pipecommand = new NoCommand(optionString); }

		return pipecommand;
	}
	catch(exception& e) {
		m->errorOut(e, "CommandFactory", "getCommand");
		exit(1);
	}
}
/***********************************************************/

/***********************************************************/
//This function calls the appropriate command functions based on user input. It is used by the pipeline command to check a user's pipeline for errors before running.
Command* CommandFactory::getCommand(string commandName){
	try {
		delete shellcommand; //delete the old command

		if(commandName == "cluster") { shellcommand = new ClusterCommand(); }
		else if(commandName == "unique.seqs") { shellcommand = new DeconvoluteCommand(); }
		else if(commandName == "parsimony") { shellcommand = new ParsimonyCommand(); }
		else if(commandName == "help") { shellcommand = new HelpCommand(); }
		else if(commandName == "quit") { shellcommand = new QuitCommand(); }
		else if(commandName == "collect.single") { shellcommand =
new CollectCommand(); } else if(commandName == "collect.shared") { shellcommand = new CollectSharedCommand(); } else if(commandName == "rarefaction.single") { shellcommand = new RareFactCommand(); } else if(commandName == "rarefaction.shared") { shellcommand = new RareFactSharedCommand(); } else if(commandName == "summary.single") { shellcommand = new SummaryCommand(); } else if(commandName == "summary.shared") { shellcommand = new SummarySharedCommand(); } else if(commandName == "unifrac.weighted") { shellcommand = new UnifracWeightedCommand(); } else if(commandName == "unifrac.unweighted") { shellcommand = new UnifracUnweightedCommand(); } else if(commandName == "get.group") { shellcommand = new GetgroupCommand(); } else if(commandName == "get.label") { shellcommand = new GetlabelCommand(); } else if(commandName == "get.sabund") { shellcommand = new GetSAbundCommand(); } else if(commandName == "get.rabund") { shellcommand = new GetRAbundCommand(); } else if(commandName == "libshuff") { shellcommand = new LibShuffCommand(); } else if(commandName == "heatmap.bin") { shellcommand = new HeatMapCommand(); } else if(commandName == "heatmap.sim") { shellcommand = new HeatMapSimCommand(); } else if(commandName == "filter.seqs") { shellcommand = new FilterSeqsCommand(); } else if(commandName == "venn") { shellcommand = new VennCommand(); } else if(commandName == "bin.seqs") { shellcommand = new BinSeqCommand(); } else if(commandName == "get.oturep") { shellcommand = new GetOTURepCommand(); } else if(commandName == "tree.shared") { shellcommand = new TreeGroupCommand(); } else if(commandName == "dist.shared") { shellcommand = new MatrixOutputCommand(); } else if(commandName == "dist.seqs") { shellcommand = new DistanceCommand(); } else if(commandName == "align.seqs") { shellcommand = new AlignCommand(); } else if(commandName == "summary.seqs") { shellcommand = new SeqSummaryCommand(); } else if(commandName == "screen.seqs") { shellcommand = new ScreenSeqsCommand(); } else 
if(commandName == "reverse.seqs") { shellcommand = new ReverseSeqsCommand(); } else if(commandName == "trim.seqs") { shellcommand = new TrimSeqsCommand(); } else if(commandName == "trim.flows") { shellcommand = new TrimFlowsCommand(); } else if(commandName == "shhh.flows") { shellcommand = new ShhherCommand(); } else if(commandName == "list.seqs") { shellcommand = new ListSeqsCommand(); } else if(commandName == "get.seqs") { shellcommand = new GetSeqsCommand(); } else if(commandName == "remove.seqs") { shellcommand = new RemoveSeqsCommand(); } else if(commandName == "merge.files") { shellcommand = new MergeFileCommand(); } else if(commandName == "system") { shellcommand = new SystemCommand(); } else if(commandName == "align.check") { shellcommand = new AlignCheckCommand(); } else if(commandName == "get.sharedseqs") { shellcommand = new GetSharedOTUCommand(); } else if(commandName == "get.otulist") { shellcommand = new GetListCountCommand(); } else if(commandName == "hcluster") { shellcommand = new HClusterCommand(); } else if(commandName == "classify.seqs") { shellcommand = new ClassifySeqsCommand(); } else if(commandName == "chimera.ccode") { shellcommand = new ChimeraCcodeCommand(); } else if(commandName == "chimera.check") { shellcommand = new ChimeraCheckCommand(); } else if(commandName == "chimera.slayer") { shellcommand = new ChimeraSlayerCommand(); } else if(commandName == "chimera.uchime") { shellcommand = new ChimeraUchimeCommand(); } else if(commandName == "chimera.pintail") { shellcommand = new ChimeraPintailCommand(); } else if(commandName == "chimera.bellerophon") { shellcommand = new ChimeraBellerophonCommand(); } else if(commandName == "phylotype") { shellcommand = new PhylotypeCommand(); } else if(commandName == "mgcluster") { shellcommand = new MGClusterCommand(); } else if(commandName == "pre.cluster") { shellcommand = new PreClusterCommand(); } else if(commandName == "pcoa") { shellcommand = new PCOACommand(); } else if(commandName == "pca") { 
shellcommand = new PCACommand(); } else if(commandName == "nmds") { shellcommand = new NMDSCommand(); } else if(commandName == "otu.hierarchy") { shellcommand = new OtuHierarchyCommand(); } else if(commandName == "set.dir") { shellcommand = new SetDirectoryCommand(); } else if(commandName == "set.logfile") { shellcommand = new SetLogFileCommand(); } else if(commandName == "parse.list") { shellcommand = new ParseListCommand(); } else if(commandName == "phylo.diversity") { shellcommand = new PhyloDiversityCommand(); } else if(commandName == "make.group") { shellcommand = new MakeGroupCommand(); } else if(commandName == "chop.seqs") { shellcommand = new ChopSeqsCommand(); } else if(commandName == "clearcut") { shellcommand = new ClearcutCommand(); } else if(commandName == "catchall") { shellcommand = new CatchAllCommand(); } else if(commandName == "split.abund") { shellcommand = new SplitAbundCommand(); } else if(commandName == "cluster.split") { shellcommand = new ClusterSplitCommand(); } else if(commandName == "classify.otu") { shellcommand = new ClassifyOtuCommand(); } else if(commandName == "degap.seqs") { shellcommand = new DegapSeqsCommand(); } else if(commandName == "get.relabund") { shellcommand = new GetRelAbundCommand(); } else if(commandName == "sens.spec") { shellcommand = new SensSpecCommand(); } else if(commandName == "seq.error") { shellcommand = new SeqErrorCommand(); } else if(commandName == "sffinfo") { shellcommand = new SffInfoCommand(); } else if(commandName == "normalize.shared") { shellcommand = new NormalizeSharedCommand(); } else if(commandName == "metastats") { shellcommand = new MetaStatsCommand(); } else if(commandName == "split.groups") { shellcommand = new SplitGroupCommand(); } else if(commandName == "cluster.fragments") { shellcommand = new ClusterFragmentsCommand(); } else if(commandName == "get.lineage") { shellcommand = new GetLineageCommand(); } else if(commandName == "remove.lineage") { shellcommand = new RemoveLineageCommand(); } 
else if(commandName == "get.groups") { shellcommand = new GetGroupsCommand(); } else if(commandName == "remove.groups") { shellcommand = new RemoveGroupsCommand(); } else if(commandName == "get.otus") { shellcommand = new GetOtusCommand(); } else if(commandName == "remove.otus") { shellcommand = new RemoveOtusCommand(); } else if(commandName == "fastq.info") { shellcommand = new ParseFastaQCommand(); } else if(commandName == "deunique.seqs") { shellcommand = new DeUniqueSeqsCommand(); } else if(commandName == "pairwise.seqs") { shellcommand = new PairwiseSeqsCommand(); } else if(commandName == "cluster.classic") { shellcommand = new ClusterDoturCommand(); } else if(commandName == "sub.sample") { shellcommand = new SubSampleCommand(); } else if(commandName == "indicator") { shellcommand = new IndicatorCommand(); } else if(commandName == "consensus.seqs") { shellcommand = new ConsensusSeqsCommand(); } else if(commandName == "corr.axes") { shellcommand = new CorrAxesCommand(); } else if(commandName == "remove.rare") { shellcommand = new RemoveRareCommand(); } else if(commandName == "merge.groups") { shellcommand = new MergeGroupsCommand(); } else if(commandName == "amova") { shellcommand = new AmovaCommand(); } else if(commandName == "homova") { shellcommand = new HomovaCommand(); } else if(commandName == "mantel") { shellcommand = new MantelCommand(); } else if(commandName == "anosim") { shellcommand = new AnosimCommand(); } else if(commandName == "make.fastq") { shellcommand = new MakeFastQCommand(); } else if(commandName == "get.current") { shellcommand = new GetCurrentCommand(); } else if(commandName == "set.current") { shellcommand = new SetCurrentCommand(); } else if(commandName == "make.shared") { shellcommand = new SharedCommand(); } else if(commandName == "get.commandinfo") { shellcommand = new GetCommandInfoCommand(); } else if(commandName == "deunique.tree") { shellcommand = new DeuniqueTreeCommand(); } else if((commandName == "count.seqs") || (commandName 
== "make.table")) { shellcommand = new CountSeqsCommand(); } else if(commandName == "count.groups") { shellcommand = new CountGroupsCommand(); } else if(commandName == "clear.memory") { shellcommand = new ClearMemoryCommand(); } else if(commandName == "summary.tax") { shellcommand = new SummaryTaxCommand(); } else if(commandName == "summary.qual") { shellcommand = new SummaryQualCommand(); } else if(commandName == "chimera.perseus") { shellcommand = new ChimeraPerseusCommand(); } else if(commandName == "shhh.seqs") { shellcommand = new ShhhSeqsCommand(); } else if(commandName == "otu.association") { shellcommand = new OTUAssociationCommand(); } else if(commandName == "sort.seqs") { shellcommand = new SortSeqsCommand(); } else if(commandName == "classify.tree") { shellcommand = new ClassifyTreeCommand(); } else if(commandName == "cooccurrence") { shellcommand = new CooccurrenceCommand(); } else if(commandName == "pcr.seqs") { shellcommand = new PcrSeqsCommand(); } else if(commandName == "create.database") { shellcommand = new CreateDatabaseCommand(); } else if(commandName == "make.biom") { shellcommand = new MakeBiomCommand(); } else if(commandName == "get.coremicrobiome") { shellcommand = new GetCoreMicroBiomeCommand(); } else if(commandName == "list.otulabels") { shellcommand = new ListOtuLabelsCommand(); } else if(commandName == "get.otulabels") { shellcommand = new GetOtuLabelsCommand(); } else if(commandName == "remove.otulabels") { shellcommand = new RemoveOtuLabelsCommand(); } else if(commandName == "make.contigs") { shellcommand = new MakeContigsCommand(); } else if(commandName == "load.logfile") { shellcommand = new LoadLogfileCommand(); } else if(commandName == "sff.multiple") { shellcommand = new SffMultipleCommand(); } else if(commandName == "classify.rf") { shellcommand = new ClassifyRFSharedCommand(); } else if(commandName == "filter.shared") { shellcommand = new FilterSharedCommand(); } else if(commandName == "primer.design") { shellcommand = new 
PrimerDesignCommand(); } else if(commandName == "get.dists") { shellcommand = new GetDistsCommand(); } else if(commandName == "remove.dists") { shellcommand = new RemoveDistsCommand(); } else if(commandName == "merge.taxsummary") { shellcommand = new MergeTaxSummaryCommand(); } else if(commandName == "get.communitytype") { shellcommand = new GetMetaCommunityCommand(); } else if(commandName == "sparcc") { shellcommand = new SparccCommand(); } else if(commandName == "make.lookup") { shellcommand = new MakeLookupCommand(); } else if(commandName == "rename.seqs") { shellcommand = new RenameSeqsCommand(); } else if(commandName == "make.lefse") { shellcommand = new MakeLefseCommand(); } else if(commandName == "lefse") { shellcommand = new LefseCommand(); } else if(commandName == "kruskal.wallis") { shellcommand = new KruskalWallisCommand(); } else if(commandName == "classify.svm") { shellcommand = new ClassifySvmSharedCommand(); } else if(commandName == "make.sra") { shellcommand = new SRACommand(); } else if(commandName == "merge.sfffiles") { shellcommand = new MergeSfffilesCommand(); } else if(commandName == "get.mimarkspackage") { shellcommand = new GetMIMarksPackageCommand(); } else if(commandName == "mimarks.attributes") { shellcommand = new MimarksAttributesCommand(); } else if(commandName == "set.seed") { shellcommand = new SetSeedCommand(); } else if(commandName == "make.file") { shellcommand = new MakeFileCommand(); } else { shellcommand = new NoCommand(); } return shellcommand; } catch(exception& e) { m->errorOut(e, "CommandFactory", "getCommand"); exit(1); } } /*********************************************************** //This function is used to interrupt a command Command* CommandFactory::getCommand(){ try { delete command; //delete the old command string s = ""; command = new NoCommand(s); return command; } catch(exception& e) { m->errorOut(e, "CommandFactory", "getCommand"); exit(1); } } 
***********************************************************************/
bool CommandFactory::isValidCommand(string command) {
	try {

		//is the command in the map
		if ((commands.find(command)) != (commands.end())) {
			return true;
		}else{
			m->mothurOut(command + " is not a valid command in Mothur. Valid commands are ");
			for (it = commands.begin(); it != commands.end(); it++) {
				m->mothurOut(it->first + ", ");
			}
			m->mothurOutEndLine();
			return false;
		}

	}
	catch(exception& e) {
		m->errorOut(e, "CommandFactory", "isValidCommand");
		exit(1);
	}
}
/***********************************************************************/
bool CommandFactory::isValidCommand(string command, string noError) {
	try {

		//is the command in the map
		if ((commands.find(command)) != (commands.end())) {
			return true;
		}else{
			return false;
		}

	}
	catch(exception& e) {
		m->errorOut(e, "CommandFactory", "isValidCommand");
		exit(1);
	}
}
/***********************************************************************/
void CommandFactory::printCommands(ostream& out) {
	try {
		out << "Valid commands are: ";
		for (it = commands.begin(); it != commands.end(); it++) {
			out << it->first << ",";
		}
		out << endl;
	}
	catch(exception& e) {
		m->errorOut(e, "CommandFactory", "printCommands");
		exit(1);
	}
}
/***********************************************************************/
void CommandFactory::printCommandsCategories(ostream& out) {
	try {
		map<string, string> commands = getListCommands();
		map<string, string>::iterator it;

		map<string, string> categories;
		map<string, string>::iterator itCat;
		//loop through each command outputting info
		for (it = commands.begin(); it != commands.end(); it++) {

			Command* thisCommand = getCommand(it->first);

			//don't add hidden commands
			if (thisCommand->getCommandCategory() != "Hidden") {
				itCat = categories.find(thisCommand->getCommandCategory());
				if (itCat == categories.end()) {
					categories[thisCommand->getCommandCategory()] = thisCommand->getCommandName();
				}else {
					categories[thisCommand->getCommandCategory()] += ", " + thisCommand->getCommandName();
				}
			}
		}

		for (itCat = categories.begin(); itCat != categories.end(); itCat++) {
			out << itCat->first << " commands include: " << itCat->second << endl;
		}

	}
	catch(exception& e) {
		m->errorOut(e, "CommandFactory", "printCommandsCategories");
		exit(1);
	}
}
/***********************************************************************/
mothur-1.36.1/source/commandfactory.hpp000066400000000000000000000033641255543666200201630ustar00rootroot00000000000000
#ifndef COMMANDFACTORY_HPP
#define COMMANDFACTORY_HPP

/*
 *  commandfactory.h
 *
 *
 *  Created by Pat Schloss on 10/25/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 */

#include "mothur.h"
#include "mothurout.h"
#include "currentfile.h"

class Command;

class CommandFactory {
public:
	static CommandFactory* getInstance();
	Command* getCommand(string, string, string);
	Command* getCommand(string, string);
	Command* getCommand(string);
	//Command* getCommand();
	bool isValidCommand(string);
	bool isValidCommand(string, string);
	void printCommands(ostream&);
	void printCommandsCategories(ostream&);
	void setOutputDirectory(string o) { if(m->dirCheck(o) || (o == "")) { outputDir = o; m->setOutputDir(o); } }
	void setInputDirectory(string i) { if(m->dirCheck(i) || (i == "")) { inputDir = i; } }
	void setLogfileName(string n, bool a) { logFileName = n; append = a; }
	string getLogfileName() { return logFileName; }
	bool getAppend() { return append; }
	string getOutputDir() { return outputDir; }
	string getInputDir() { return inputDir; }
	bool MPIEnabled(string);
	map<string, string> getListCommands() { return commands; }

private:
	Command* command;
	Command* shellcommand;
	Command* pipecommand;

	MothurOut* m;
	CurrentFile* currentFile;

	map<string, string> commands;
	map<string, string>::iterator it;
	string outputDir, inputDir, logFileName;
	bool append;

	int checkForRedirects(string);

	static CommandFactory* _uniqueInstance;
	CommandFactory( const CommandFactory& ); // Disable copy constructor
	void operator=( const CommandFactory& ); // Disable assignment operator
	CommandFactory();
	~CommandFactory();

};

#endif
mothur-1.36.1/source/commandoptionparser.cpp000066400000000000000000000036631255543666200212360ustar00rootroot00000000000000
/*
 *  commandoptionparser.cpp
 *
 *
 *  Created by Pat Schloss on 10/23/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 */

#include "commandoptionparser.hpp"

//**********************************************************************************************************************
//This function parses the command line and pulls out the command name and the option string.

CommandOptionParser::CommandOptionParser(string input){
	try {
		m = MothurOut::getInstance();

		int openParen = input.find_first_of('(');
		int closeParen = input.find_last_of(')');
		optionString = "";
		commandString = "";

		if(openParen != -1 && closeParen != -1){
			//gobble extra spaces
			int spot = 0;
			for (int i = 0; i < input.length(); i++) {
				if (!(isspace(input[i]))) { spot = i; break; }
			}
			if (spot > openParen) { spot = 0; }
			commandString = input.substr(spot, openParen-spot); //commandString contains everything before "("
			optionString = input.substr((openParen+1), (closeParen-openParen-1)); //optionString contains everything between "(" and ")".
} else if (openParen == -1) { m->mothurOut("[ERROR]: You are missing ("); m->mothurOutEndLine(); } else if (closeParen == -1) { m->mothurOut("[ERROR]: You are missing )"); m->mothurOutEndLine(); } } catch(exception& e) { m->errorOut(e, "CommandOptionParser", "CommandOptionParser"); exit(1); } } //********************************************************************************************************************** string CommandOptionParser::getCommandString() { return commandString; } //********************************************************************************************************************** string CommandOptionParser::getOptionString() { return optionString; } //********************************************************************************************************************** mothur-1.36.1/source/commandoptionparser.hpp000066400000000000000000000010441255543666200212320ustar00rootroot00000000000000#ifndef COMMANDOPTIONPARSER_HPP #define COMMANDOPTIONPARSER_HPP #include "mothur.h" #include "mothurout.h" //********************************************************************************************************************** class CommandOptionParser { public: CommandOptionParser(string); string getCommandString(); string getOptionString(); private: string commandString, optionString; MothurOut* m; }; //********************************************************************************************************************** #endif mothur-1.36.1/source/commandparameter.h000066400000000000000000000072161255543666200201340ustar00rootroot00000000000000#ifndef COMMANDPARAMETER_H #define COMMANDPARAMETER_H /* * commandparameter.h * Mothur * * Created by westcott on 3/23/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "mothur.h" //********************************************************************************************************************** class CommandParameter { public: CommandParameter() { name = ""; type = ""; options = ""; optionsDefault = ""; chooseOnlyOneGroup = ""; chooseAtLeastOneGroup = ""; linkedGroup = ""; multipleSelectionAllowed = false; required = false; important = false; outputTypes = ""; } CommandParameter(string n, string t, string o, string d, string only, string atLeast, string linked, string opt, bool m, bool r, bool i) : name(n), type(t), options(o), optionsDefault(d), chooseOnlyOneGroup(only), chooseAtLeastOneGroup(atLeast), linkedGroup(linked), outputTypes(opt),multipleSelectionAllowed(m), required(r), important(i) {} CommandParameter(string n, string t, string o, string d, string only, string atLeast, string linked, string opt, bool m, bool r) : name(n), type(t), options(o), optionsDefault(d), chooseOnlyOneGroup(only), chooseAtLeastOneGroup(atLeast), linkedGroup(linked), outputTypes(opt), multipleSelectionAllowed(m), required(r) { important = false; } ~CommandParameter() {} string name; //something like fasta, processors, method string type; //must be set to "Boolean", "Multiple", "Number", "String", "InputTypes" - InputTypes is for file inputs string options; //if the parameter has specific options allowed, used for parameters of type "Multiple", something like "furthest-nearest-average", or "sobs-chao...", leave blank for command that do not required specific options string optionsDefault; //the default for this parameter, could be something like "F" for a boolean or "100" for a number or "sobs-chao" for multiple //for chooseOnlyOneGroup, chooseAtLeastOneGroup and linkedGroup if no group is needed set to "none". 
string chooseOnlyOneGroup; //for file inputs: if a command has several options for input files but you can only choose one then put them in a group //for instance in the read.dist command you can use a phylip or column file but not both so set chooseOnlyOneGroup for both parameters to something like "DistanceFileGroup" string chooseAtLeastOneGroup; //for file inputs: if a command has several options for input files and you want to make sure one is choosen then put them in a group //for instance in the read.dist command you must provide a phylip or column file so set chooseAtLeastOneGroup for both parameters to something like "DistanceFileGroup" string linkedGroup; //for file inputs: if a command has a file option were if you provide one you must provide another you can put them in a group //for instance in the cluster command if you provide a column file you must provide a name file so set linkedGroup for both parameters to something like "ColumnNameGroup" bool multipleSelectionAllowed; //for "Multiple" type to say whether you can select multiple options, for instance for calc parameter set to true, but for method set to false bool required; //is this parameter required bool important; //is this parameter important. The gui will put "important" parameters first in the option panel. string outputTypes; //types on files created by the command if this parameter is given. ie. get.seqs command fasta parameter makes a fasta file. can be multiple values split by dashes. private: }; //********************************************************************************************************************** #endif mothur-1.36.1/source/commands/000077500000000000000000000000001255543666200162375ustar00rootroot00000000000000mothur-1.36.1/source/commands/aligncommand.cpp000066400000000000000000001362031255543666200214010ustar00rootroot00000000000000/* * aligncommand.cpp * Mothur * * Created by Sarah Westcott on 5/15/09. * Copyright 2009 Schloss Lab UMASS Amherst. 
All rights reserved. * * This version of nast does everything I think that the greengenes nast server does and then some. I have added the * feature of allowing users to define their database, kmer size for searching, alignment penalty values and alignment * method. This latter feature is perhaps most significant. nastPlus enables a user to use either a Needleman-Wunsch * (non-affine gap penalty) or Gotoh (affine gap penalty) pairwise alignment algorithm. This is significant because it * allows for a global alignment and not the local alignment provided by bLAst. Furthermore, it has the potential to * provide a better alignment because of the banding method employed by blast (I'm not sure about this). * */ #include "aligncommand.h" #include "referencedb.h" //********************************************************************************************************************** vector AlignCommand::setParameters(){ try { CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pcandidate("fasta", "InputTypes", "", "", "none", "none", "none","fasta-alignreport-accnos",false,true,true); parameters.push_back(pcandidate); CommandParameter psearch("search", "Multiple", "kmer-blast-suffix", "kmer", "", "", "","",false,false,true); parameters.push_back(psearch); CommandParameter pksize("ksize", "Number", "", "8", "", "", "","",false,false); parameters.push_back(pksize); CommandParameter pmatch("match", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pmatch); CommandParameter palign("align", "Multiple", "needleman-gotoh-blast-noalign", "needleman", "", "", "","",false,false,true); parameters.push_back(palign); CommandParameter pmismatch("mismatch", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pmismatch); CommandParameter pgapopen("gapopen", "Number", "", "-5.0", "", "", "","",false,false); parameters.push_back(pgapopen); 
CommandParameter pgapextend("gapextend", "Number", "", "-2.0", "", "", "","",false,false); parameters.push_back(pgapextend); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pflip("flip", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pflip); CommandParameter psave("save", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psave); CommandParameter pthreshold("threshold", "Number", "", "0.50", "", "", "","",false,false); parameters.push_back(pthreshold); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "AlignCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string AlignCommand::getHelpString(){ try { string helpString = ""; helpString += "The align.seqs command reads a file containing sequences and creates an alignment file and a report file."; helpString += "The align.seqs command parameters are reference, fasta, search, ksize, align, match, mismatch, gapopen, gapextend and processors."; helpString += "The reference and fasta parameters are required. You may leave fasta blank if you have a valid fasta file. You may enter multiple fasta files by separating their names with dashes. ie. fasta=abrecovery.fasta-amzon.fasta."; helpString += "The search parameter allows you to specify the method to find most similar template. Your options are: suffix, kmer and blast. 
The default is kmer."; helpString += "The align parameter allows you to specify the alignment method to use. Your options are: gotoh, needleman, blast and noalign. The default is needleman."; helpString += "The ksize parameter allows you to specify the kmer size for finding most similar template to candidate. The default is 8."; helpString += "The match parameter allows you to specify the bonus for having the same base. The default is 1.0."; helpString += "The mismatch parameter allows you to specify the penalty for having different bases. The default is -1.0."; helpString += "The gapopen parameter allows you to specify the penalty for opening a gap in an alignment. The default is -5.0."; helpString += "The gapextend parameter allows you to specify the penalty for extending a gap in an alignment. The default is -2.0."; helpString += "The flip parameter is used to specify whether or not you want mothur to try the reverse complement if a sequence falls below the threshold. The default is false."; helpString += "The threshold is used to specify a cutoff at which an alignment is deemed 'bad' and the reverse complement may be tried. The default threshold is 0.50, meaning 50% of the bases are removed in the alignment."; helpString += "If the flip parameter is set to true the reverse complement of the sequence is aligned and the better alignment is reported."; helpString += "If the save parameter is set to true the reference sequences will be saved in memory, to clear them later you can use the clear.memory command. 
Default=f."; helpString += "The default for the threshold parameter is 0.50, meaning at least 50% of the bases must remain or the sequence is reported as potentially reversed."; helpString += "The align.seqs command should be in the following format:"; helpString += "align.seqs(reference=yourTemplateFile, fasta=yourCandidateFile, align=yourAlignmentMethod, search=yourSearchmethod, ksize=yourKmerSize, match=yourMatchBonus, mismatch=yourMismatchpenalty, gapopen=yourGapopenPenalty, gapextend=yourGapExtendPenalty)"; helpString += "Example align.seqs(fasta=candidate.fasta, reference=core.filtered, align=gotoh, search=kmer, ksize=8, match=2.0, mismatch=-3.0, gapopen=-2.0, gapextend=-1.0)"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourCandidateFile)."; return helpString; } catch(exception& e) { m->errorOut(e, "AlignCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string AlignCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],align"; } //makes file like: amazon.align else if (type == "alignreport") { pattern = "[filename],align.report"; } else if (type == "accnos") { pattern = "[filename],flip.accnos"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "AlignCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** AlignCommand::AlignCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["accnos"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "AlignCommand", "AlignCommand"); exit(1); } } 
//********************************************************************************************************************** AlignCommand::AlignCommand(string option) { try { abort = false; calledHelp = false; ReferenceDB* rdb = ReferenceDB::getInstance(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true;} else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("align.seqs"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["accnos"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["reference"] = inputDir + it->second; } } } candidateFileName = validParameter.validFile(parameters, "fasta", false); if (candidateFileName == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { candidateFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fasta file and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(candidateFileName, candidateFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < candidateFileNames.size(); i++) { //candidateFileNames[i] = m->getFullPathName(candidateFileNames[i]); bool ignore = false; if (candidateFileNames[i] == "current") { candidateFileNames[i] = m->getFastaFile(); if (candidateFileNames[i] != "") { m->mothurOut("Using " + candidateFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fasta file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list candidateFileNames.erase(candidateFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(candidateFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { candidateFileNames[i] = inputDir + candidateFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(candidateFileNames[i], in, "noerror"); in.close(); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(candidateFileNames[i]); m->mothurOut("Unable to open " + candidateFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); candidateFileNames[i] = tryPath; } } //if you can't open it, try output location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(candidateFileNames[i]); m->mothurOut("Unable to open " + candidateFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); candidateFileNames[i] = tryPath; } } if (ableToOpen == 1) { m->mothurOut("Unable to open " + candidateFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list candidateFileNames.erase(candidateFileNames.begin()+i); i--; }else { m->setFastaFile(candidateFileNames[i]); } } } //make sure there is at least one valid file left if (candidateFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
string temp; temp = validParameter.validFile(parameters, "ksize", false); if (temp == "not found"){ temp = "8"; } m->mothurConvert(temp, kmerSize); temp = validParameter.validFile(parameters, "match", false); if (temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, match); temp = validParameter.validFile(parameters, "mismatch", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, misMatch); temp = validParameter.validFile(parameters, "gapopen", false); if (temp == "not found"){ temp = "-5.0"; } m->mothurConvert(temp, gapOpen); temp = validParameter.validFile(parameters, "gapextend", false); if (temp == "not found"){ temp = "-2.0"; } m->mothurConvert(temp, gapExtend); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "flip", false); if (temp == "not found"){ temp = "f"; } flip = m->isTrue(temp); temp = validParameter.validFile(parameters, "save", false); if (temp == "not found"){ temp = "f"; } save = m->isTrue(temp); rdb->save = save; if (save) { //clear out old references rdb->clearMemory(); } //this has to go after save so that if the user sets save=t and provides no reference we abort templateFileName = validParameter.validFile(parameters, "reference", true); if (templateFileName == "not found") { //check for saved reference sequences if (rdb->referenceSeqs.size() != 0) { templateFileName = "saved"; }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required for the align.seqs command."); m->mothurOutEndLine(); abort = true; } }else if (templateFileName == "not open") { abort = true; } else { if (save) { rdb->setSavedReference(templateFileName); } } temp = validParameter.validFile(parameters, "threshold", false); if (temp == "not found"){ temp = "0.50"; } m->mothurConvert(temp, threshold); search = 
validParameter.validFile(parameters, "search", false); if (search == "not found"){ search = "kmer"; } if ((search != "suffix") && (search != "kmer") && (search != "blast")) { m->mothurOut("invalid search option: choices are kmer, suffix or blast."); m->mothurOutEndLine(); abort=true; } align = validParameter.validFile(parameters, "align", false); if (align == "not found"){ align = "needleman"; } if ((align != "needleman") && (align != "gotoh") && (align != "blast") && (align != "noalign")) { m->mothurOut("invalid align option: choices are needleman, gotoh, blast or noalign."); m->mothurOutEndLine(); abort=true; } } } catch(exception& e) { m->errorOut(e, "AlignCommand", "AlignCommand"); exit(1); } } //********************************************************************************************************************** AlignCommand::~AlignCommand(){ if (abort == false) { for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); delete templateDB; } } //********************************************************************************************************************** int AlignCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } templateDB = new AlignmentDB(templateFileName, search, kmerSize, gapOpen, gapExtend, match, misMatch, rand()); for (int s = 0; s < candidateFileNames.size(); s++) { if (m->control_pressed) { outputTypes.clear(); return 0; } m->mothurOut("Aligning sequences from " + candidateFileNames[s] + " ..." 
); m->mothurOutEndLine(); if (outputDir == "") { outputDir += m->hasPath(candidateFileNames[s]); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(candidateFileNames[s])); string alignFileName = getOutputFileName("fasta", variables); string reportFileName = getOutputFileName("alignreport", variables); string accnosFileName = getOutputFileName("accnos", variables); bool hasAccnos = true; int numFastaSeqs = 0; for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); int start = time(NULL); #ifdef USE_MPI int pid, numSeqsPerProcessor; int tag = 2001; vector MPIPos; MPIWroteAccnos = false; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_File inMPI; MPI_File outMPIAlign; MPI_File outMPIReport; MPI_File outMPIAccnos; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outAlignFilename[1024]; strcpy(outAlignFilename, alignFileName.c_str()); char outReportFilename[1024]; strcpy(outReportFilename, reportFileName.c_str()); char outAccnosFilename[1024]; strcpy(outAccnosFilename, accnosFileName.c_str()); char inFileName[1024]; strcpy(inFileName, candidateFileNames[s].c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outAlignFilename, outMode, MPI_INFO_NULL, &outMPIAlign); MPI_File_open(MPI_COMM_WORLD, outReportFilename, outMode, MPI_INFO_NULL, &outMPIReport); MPI_File_open(MPI_COMM_WORLD, outAccnosFilename, outMode, MPI_INFO_NULL, &outMPIAccnos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPIAlign); MPI_File_close(&outMPIReport); MPI_File_close(&outMPIAccnos); outputTypes.clear(); return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(candidateFileNames[s], numFastaSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < 
processors; i++) { MPI_Send(&numFastaSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (numFastaSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } //figure out how many sequences you have to align numSeqsPerProcessor = numFastaSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPIAlign, outMPIReport, outMPIAccnos, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPIAlign); MPI_File_close(&outMPIReport); MPI_File_close(&outMPIAccnos); outputTypes.clear(); return 0; } for (int i = 1; i < processors; i++) { int tempResult = 0; /* received as MPI_INT, so the buffer must be an int, not a bool */ MPI_Recv(&tempResult, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &status); if (tempResult != 0) { MPIWroteAccnos = true; } } }else{ //you are a child process MPI_Recv(&numFastaSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(numFastaSeqs+1); MPI_Recv(&MPIPos[0], (numFastaSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = numFastaSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPIAlign, outMPIReport, outMPIAccnos, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPIAlign); MPI_File_close(&outMPIReport); MPI_File_close(&outMPIAccnos); outputTypes.clear(); return 0; } int wroteAccnos = MPIWroteAccnos ? 1 : 0; /* send the flag as a real int to match MPI_INT */ MPI_Send(&wroteAccnos, 1, MPI_INT, 0, tag, MPI_COMM_WORLD); } //close files MPI_File_close(&inMPI); MPI_File_close(&outMPIAlign); MPI_File_close(&outMPIReport); MPI_File_close(&outMPIAccnos); //delete accnos file if blank if (pid == 0) { //delete accnos file if it's blank else report to user if (MPIWroteAccnos) { m->mothurOut("Some of your sequences generated alignments that eliminated 
too many bases, a list is provided in " + accnosFileName + "."); if (!flip) { m->mothurOut(" If you set the flip parameter to true mothur will try aligning the reverse complement as well."); }else{ m->mothurOut(" If the reverse complement proved to be better it was reported."); } m->mothurOutEndLine(); }else { //MPI_Info info; //MPI_File_delete(outAccnosFilename, info); hasAccnos = false; m->mothurRemove(accnosFileName); } } #else vector<unsigned long long> positions; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(candidateFileNames[s], processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } #else if (processors == 1) { lines.push_back(new linePair(0, 1000)); }else { positions = m->setFilePosFasta(candidateFileNames[s], numFastaSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(new linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif if(processors == 1){ numFastaSeqs = driver(lines[0], alignFileName, reportFileName, accnosFileName, candidateFileNames[s]); }else{ numFastaSeqs = createProcesses(alignFileName, reportFileName, accnosFileName, candidateFileNames[s]); } if (m->control_pressed) { m->mothurRemove(accnosFileName); m->mothurRemove(alignFileName); m->mothurRemove(reportFileName); outputTypes.clear(); return 0; } //delete accnos file if it's blank else report to user if (m->isBlank(accnosFileName)) { m->mothurRemove(accnosFileName); hasAccnos = false; } else { m->mothurOut("Some of your sequences generated alignments that eliminated too many bases, a list is provided in " + accnosFileName + "."); if 
(!flip) { m->mothurOut(" If you set the flip parameter to true mothur will try aligning the reverse complement as well."); }else{ m->mothurOut(" If the reverse complement proved to be better it was reported."); } m->mothurOutEndLine(); } #endif #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif outputNames.push_back(alignFileName); outputTypes["fasta"].push_back(alignFileName); outputNames.push_back(reportFileName); outputTypes["alignreport"].push_back(reportFileName); if (hasAccnos) { outputNames.push_back(accnosFileName); outputTypes["accnos"].push_back(accnosFileName); } #ifdef USE_MPI } #endif m->mothurOut("It took " + toString(time(NULL) - start) + " secs to align " + toString(numFastaSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); } //set align file as new current fastafile string currentFasta = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { currentFasta = (itTypes->second)[0]; m->setFastaFile(currentFasta); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "AlignCommand", "execute"); exit(1); } } //********************************************************************************************************************** int AlignCommand::driver(linePair* filePos, string alignFName, string reportFName, string accnosFName, string filename){ try { ofstream alignmentFile; m->openOutputFile(alignFName, alignmentFile); ofstream accnosFile; m->openOutputFile(accnosFName, accnosFile); NastReport report(reportFName); ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos->start); bool done = false; int count = 0; //moved this into driver to avoid deep copies in windows parallelized version 
Alignment* alignment; int longestBase = templateDB->getLongestBase(); if (m->debug) { m->mothurOut("[DEBUG]: template longest base = " + toString(templateDB->getLongestBase()) + " \n"); } if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } while (!done) { if (m->control_pressed) { break; } Sequence* candidateSeq = new Sequence(inFASTA); m->gobble(inFASTA); report.setCandidate(candidateSeq); int origNumBases = candidateSeq->getNumBases(); string originalUnaligned = candidateSeq->getUnaligned(); int numBasesNeeded = origNumBases * threshold; if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getUnaligned().length()+1 > alignment->getnRows()) { if (m->debug) { m->mothurOut("[DEBUG]: " + candidateSeq->getName() + " " + toString(candidateSeq->getUnaligned().length()) + " " + toString(alignment->getnRows()) + " \n"); } alignment->resize(candidateSeq->getUnaligned().length()+2); } Sequence temp = templateDB->findClosestSequence(candidateSeq); Sequence* templateSeq = new Sequence(temp.getName(), temp.getAligned()); float searchScore = templateDB->getSearchScore(); Nast* nast = new Nast(alignment, candidateSeq, templateSeq); Sequence* copy; Nast* nast2; bool needToDeleteCopy = false; //this is needed in case you have you enter the ifs below //since nast does not make a copy of hte sequence passed, and it is used by the reporter below //you can't delete the copy sequence til after you report, but you 
may choose not to create it in the first place //so this bool tells you if you need to delete it //if there is a possibility that this sequence should be reversed if (candidateSeq->getNumBases() < numBasesNeeded) { string wasBetter = ""; //if the user wants you to try the reverse if (flip) { //get reverse compliment copy = new Sequence(candidateSeq->getName(), originalUnaligned); copy->reverseComplement(); if (m->debug) { m->mothurOut("[DEBUG]: flipping " + candidateSeq->getName() + " \n"); } //rerun alignment Sequence temp2 = templateDB->findClosestSequence(copy); Sequence* templateSeq2 = new Sequence(temp2.getName(), temp2.getAligned()); if (m->debug) { m->mothurOut("[DEBUG]: closest template " + temp2.getName() + " \n"); } searchScore = templateDB->getSearchScore(); nast2 = new Nast(alignment, copy, templateSeq2); if (m->debug) { m->mothurOut("[DEBUG]: completed Nast2 " + candidateSeq->getName() + " flipped numBases = " + toString(copy->getNumBases()) + " old numbases = " + toString(candidateSeq->getNumBases()) +" \n"); } //check if any better if (copy->getNumBases() > candidateSeq->getNumBases()) { candidateSeq->setAligned(copy->getAligned()); //use reverse compliments alignment since its better delete templateSeq; templateSeq = templateSeq2; delete nast; nast = nast2; needToDeleteCopy = true; wasBetter = "\treverse complement produced a better alignment, so mothur used the reverse complement."; }else{ wasBetter = "\treverse complement did NOT produce a better alignment so it was not used, please check sequence."; delete nast2; delete templateSeq2; delete copy; } if (m->debug) { m->mothurOut("[DEBUG]: done.\n"); } } //create accnos file with names accnosFile << candidateSeq->getName() << wasBetter << endl; } report.setTemplate(templateSeq); report.setSearchParameters(search, searchScore); report.setAlignmentParameters(align, alignment); report.setNastParameters(*nast); alignmentFile << '>' << candidateSeq->getName() << '\n' << candidateSeq->getAligned() << 
endl; report.print(); delete nast; delete templateSeq; if (needToDeleteCopy) { delete copy; } count++; } delete candidateSeq; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos->end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen(toString(count) + "\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen(toString(count) + "\n"); } delete alignment; alignmentFile.close(); inFASTA.close(); accnosFile.close(); return count; } catch(exception& e) { m->errorOut(e, "AlignCommand", "driver"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int AlignCommand::driverMPI(int start, int num, MPI_File& inMPI, MPI_File& alignFile, MPI_File& reportFile, MPI_File& accnosFile, vector<unsigned long long>& MPIPos){ try { string outputString = ""; MPI_Status statusReport; MPI_Status statusAlign; MPI_Status statusAccnos; MPI_Status status; int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are NastReport report; if (pid == 0) { outputString = report.getHeaders(); int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write_shared(reportFile, buf, length, MPI_CHAR, &statusReport); delete[] buf; } Alignment* alignment; int longestBase = templateDB->getLongestBase(); if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } for(int i=0;i<num;i++){ if (m->control_pressed) { delete alignment; return 0; } //read next sequence int length = MPIPos[start+i+1] - MPIPos[start+i]; char* buf4 = new char[length]; //memcpy(buf4, outputString.c_str(), length); MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status); //buf4 is not null-terminated, so build the string from an explicit length string tempBuf(buf4, length); delete[] buf4; istringstream iss (tempBuf,istringstream::in); Sequence* candidateSeq = new Sequence(iss); report.setCandidate(candidateSeq); int origNumBases = candidateSeq->getNumBases(); string originalUnaligned = candidateSeq->getUnaligned(); int numBasesNeeded = origNumBases * threshold; if (candidateSeq->getName() != "") { //in case there is a commented sequence at the end of a file if (candidateSeq->getUnaligned().length() > alignment->getnRows()) { alignment->resize(candidateSeq->getUnaligned().length()+1); } Sequence temp = templateDB->findClosestSequence(candidateSeq); Sequence* templateSeq = new Sequence(temp.getName(), temp.getAligned()); float searchScore = templateDB->getSearchScore(); Nast* nast = new Nast(alignment, candidateSeq, templateSeq); Sequence* copy; Nast* nast2; bool needToDeleteCopy = false; //this is needed in case you enter the ifs below //since nast does not make a copy of the sequence passed, and it is used by the reporter below //you can't delete the copy sequence until after you report, but you may choose not to create it in the first place //so this bool tells you if you need to delete it //if there is a possibility that this sequence should be reversed if (candidateSeq->getNumBases() < numBasesNeeded) { string wasBetter = ""; //if the user wants you to try the reverse if (flip) { //get reverse complement copy = new Sequence(candidateSeq->getName(), originalUnaligned); copy->reverseComplement(); //rerun alignment 
Sequence temp2 = templateDB->findClosestSequence(copy); Sequence* templateSeq2 = new Sequence(temp2.getName(), temp2.getAligned()); searchScore = templateDB->getSearchScore(); nast2 = new Nast(alignment, copy, templateSeq2); //check if any better if (copy->getNumBases() > candidateSeq->getNumBases()) { candidateSeq->setAligned(copy->getAligned()); //use the reverse complement's alignment since it's better delete templateSeq; templateSeq = templateSeq2; delete nast; nast = nast2; needToDeleteCopy = true; wasBetter = "\treverse complement produced a better alignment, so mothur used the reverse complement."; }else{ wasBetter = "\treverse complement did NOT produce a better alignment, please check sequence."; delete nast2; delete templateSeq2; delete copy; } } //create accnos file with names outputString = candidateSeq->getName() + wasBetter + "\n"; //send results to parent int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write_shared(accnosFile, buf, length, MPI_CHAR, &statusAccnos); delete[] buf; MPIWroteAccnos = true; } report.setTemplate(templateSeq); report.setSearchParameters(search, searchScore); report.setAlignmentParameters(align, alignment); report.setNastParameters(*nast); outputString = ">" + candidateSeq->getName() + "\n" + candidateSeq->getAligned() + "\n"; //send results to parent int length = outputString.length(); char* buf2 = new char[length]; memcpy(buf2, outputString.c_str(), length); MPI_File_write_shared(alignFile, buf2, length, MPI_CHAR, &statusAlign); delete[] buf2; outputString = report.getReport(); //send results to parent length = outputString.length(); char* buf3 = new char[length]; memcpy(buf3, outputString.c_str(), length); MPI_File_write_shared(reportFile, buf3, length, MPI_CHAR, &statusReport); delete[] buf3; delete nast; delete templateSeq; if (needToDeleteCopy) { delete copy; } } delete candidateSeq; //report progress if((i+1) % 100 == 0){ cout << (toString(i+1)) << endl; } } 
//report progress if((num) % 100 != 0){ cout << (toString(num)) << endl; } return 1; } catch(exception& e) { m->errorOut(e, "AlignCommand", "driverMPI"); exit(1); } } #endif /**************************************************************************************************/ int AlignCommand::createProcesses(string alignFileName, string reportFileName, string accnosFName, string filename) { try { int num = 0; processIDS.resize(0); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], alignFileName + toString(m->mothurGetpid(process)) + ".temp", reportFileName + toString(m->mothurGetpid(process)) + ".temp", accnosFName + m->mothurGetpid(process) + ".temp", filename); //pass numSeqs to parent ofstream out; string tempFile = alignFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector positions; positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 1; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], alignFileName + toString(m->mothurGetpid(process)) + ".temp", reportFileName + toString(m->mothurGetpid(process)) + ".temp", accnosFName + m->mothurGetpid(process) + ".temp", filename); //pass numSeqs to parent ofstream out; string tempFile = alignFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part num = driver(lines[0], alignFileName, reportFileName, accnosFName, filename); //force parent to wait until all the processes are done for (int i=0;i nonBlankAccnosFiles; if (!(m->isBlank(accnosFName))) { nonBlankAccnosFiles.push_back(accnosFName); } else { m->mothurRemove(accnosFName); } //remove so other files can be renamed to it for (int i = 0; i < processIDS.size(); i++) { ifstream in; string tempFile = alignFileName + toString(processIDS[i]) + ".num.temp"; m->openInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); 
m->appendFiles((alignFileName + toString(processIDS[i]) + ".temp"), alignFileName); m->mothurRemove((alignFileName + toString(processIDS[i]) + ".temp")); appendReportFiles((reportFileName + toString(processIDS[i]) + ".temp"), reportFileName); m->mothurRemove((reportFileName + toString(processIDS[i]) + ".temp")); if (!(m->isBlank(accnosFName + toString(processIDS[i]) + ".temp"))) { nonBlankAccnosFiles.push_back(accnosFName + toString(processIDS[i]) + ".temp"); }else { m->mothurRemove((accnosFName + toString(processIDS[i]) + ".temp")); } } //append accnos files if (nonBlankAccnosFiles.size() != 0) { rename(nonBlankAccnosFiles[0].c_str(), accnosFName.c_str()); for (int h=1; h < nonBlankAccnosFiles.size(); h++) { m->appendFiles(nonBlankAccnosFiles[h], accnosFName); m->mothurRemove(nonBlankAccnosFiles[h]); } }else { //recreate the accnosfile if needed ofstream out; m->openOutputFile(accnosFName, out); out.close(); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the alignData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; istart, lines[i]->end, flip, match, misMatch, gapOpen, gapExtend, threshold, i); pDataArray.push_back(tempalign); processIDS.push_back(i); //MySeqSumThreadFunction is in header. It must be global or static to work with the threads. 
//default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i] = CreateThread(NULL, 0, MyAlignThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]); } //need to check for line ending error ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(lines[processors-1]->start-1); char c = inFASTA.peek(); if (c != '>') { //we need to move back lines[processors-1]->start--; } //using the main process as a worker saves time and memory //do my part - do last piece because windows is looking for eof num = driver(lines[processors-1], (alignFileName + toString(processors-1) + ".temp"), (reportFileName + toString(processors-1) + ".temp"), (accnosFName + toString(processors-1) + ".temp"), filename); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } num += pDataArray[i]->count; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } vector nonBlankAccnosFiles; if (!(m->isBlank(accnosFName))) { nonBlankAccnosFiles.push_back(accnosFName); } else { m->mothurRemove(accnosFName); } //remove so other files can be renamed to it for (int i = 1; i < processors; i++) { m->appendFiles((alignFileName + toString(i) + ".temp"), alignFileName); m->mothurRemove((alignFileName + toString(i) + ".temp")); appendReportFiles((reportFileName + toString(i) + ".temp"), reportFileName); m->mothurRemove((reportFileName + toString(i) + ".temp")); if (!(m->isBlank(accnosFName + toString(i) + ".temp"))) { nonBlankAccnosFiles.push_back(accnosFName + toString(i) + ".temp"); }else { m->mothurRemove((accnosFName + toString(i) + ".temp")); } } //append accnos files if (nonBlankAccnosFiles.size() != 0) { rename(nonBlankAccnosFiles[0].c_str(), accnosFName.c_str()); for (int h=1; h < nonBlankAccnosFiles.size(); h++) { m->appendFiles(nonBlankAccnosFiles[h], accnosFName); m->mothurRemove(nonBlankAccnosFiles[h]); } }else { //recreate the accnosfile if needed ofstream out; m->openOutputFile(accnosFName, out); out.close(); } #endif return num; } catch(exception& e) { m->errorOut(e, "AlignCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** void AlignCommand::appendReportFiles(string temp, string filename) { try{ ofstream output; ifstream input; m->openOutputFileAppend(filename, output); m->openInputFile(temp, input); while (!input.eof()) { char c = input.get(); if (c == 10 || c == 13){ break; } } // get header line char buffer[4096]; while (!input.eof()) { input.read(buffer, 4096); output.write(buffer, input.gcount()); } input.close(); output.close(); } catch(exception& e) { m->errorOut(e, "AlignCommand", "appendReportFiles"); exit(1); } } 
//********************************************************************************************************************** mothur-1.36.1/source/commands/aligncommand.h000066400000000000000000000226241255543666200210470ustar00rootroot00000000000000#ifndef ALIGNCOMMAND_H #define ALIGNCOMMAND_H /* * aligncommand.h * Mothur * * Created by Sarah Westcott on 5/15/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "database.hpp" #include "alignment.hpp" #include "alignmentdb.h" #include "sequence.hpp" #include "gotohoverlap.hpp" #include "needlemanoverlap.hpp" #include "blastalign.hpp" #include "noalign.hpp" #include "nast.hpp" #include "nastreport.hpp" //test class AlignCommand : public Command { public: AlignCommand(string); AlignCommand(); ~AlignCommand(); vector setParameters(); string getCommandName() { return "align.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "DeSantis TZ, Jr., Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, Phan R, Andersen GL (2006). NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res 34: W394-9.\nSchloss PD (2009). A high-throughput DNA sequence aligner for microbial ecology studies. PLoS ONE 4: e8230.\nSchloss PD (2010). The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies. 
PLoS Comput Biol 6: e1000844.\nhttp://www.mothur.org/wiki/Align.seqs http://www.mothur.org/wiki/Align.seqs"; } string getDescription() { return "align sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector<int> processIDS; //processid vector<linePair*> lines; bool MPIWroteAccnos; AlignmentDB* templateDB; int driver(linePair*, string, string, string, string); int createProcesses(string, string, string, string); void appendReportFiles(string, string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, MPI_File&, MPI_File&, MPI_File&, vector<unsigned long long>&); #endif string candidateFileName, templateFileName, distanceFileName, search, align, outputDir; float match, misMatch, gapOpen, gapExtend, threshold; int processors, kmerSize; vector<string> candidateFileNames; vector<string> outputNames; bool abort, flip, calledHelp, save; }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct alignData { string templateFileName; string alignFName; string reportFName; string accnosFName; string filename; string align; string search; bool flip; unsigned long long start; unsigned long long end; MothurOut* m; //AlignmentDB* templateDB; float match, misMatch, gapOpen, gapExtend, threshold; int count, kmerSize, threadID; alignData(){} alignData(string te, string a, string r, string ac, string f, string al, string se, int ks, MothurOut* mout, unsigned long long st, unsigned long long en, bool fl, float ma, float misMa, float gapO, float gapE, float thr, int tid) { templateFileName = te; alignFName = a; reportFName = r; accnosFName = ac; filename = f; flip = fl; m = mout; start = st; end = en; //templateDB = tdb; match = ma; misMatch = misMa; gapOpen = gapO; gapExtend = gapE; threshold = thr; align = al; search = se; count = 0; kmerSize = ks; threadID = tid; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyAlignThreadFunction(LPVOID lpParam){ alignData* pDataArray; pDataArray = (alignData*)lpParam; try { ofstream alignmentFile; pDataArray->m->openOutputFile(pDataArray->alignFName, alignmentFile); ofstream accnosFile; pDataArray->m->openOutputFile(pDataArray->accnosFName, accnosFile); NastReport report(pDataArray->reportFName); ifstream inFASTA; pDataArray->m->openInputFile(pDataArray->filename, inFASTA); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { inFASTA.seekg(0); pDataArray->m->zapGremlins(inFASTA); }else { //this accounts for the difference in line endings. 
inFASTA.seekg(pDataArray->start-1); pDataArray->m->gobble(inFASTA); } AlignmentDB* templateDB = new AlignmentDB(pDataArray->templateFileName, pDataArray->search, pDataArray->kmerSize, pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, pDataArray->threadID); //moved this into driver to avoid deep copies in windows paralellized version Alignment* alignment; int longestBase = templateDB->getLongestBase(); if(pDataArray->align == "gotoh") { alignment = new GotohOverlap(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, longestBase); } else if(pDataArray->align == "needleman") { alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, longestBase); } else if(pDataArray->align == "blast") { alignment = new BlastAlignment(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch); } else if(pDataArray->align == "noalign") { alignment = new NoAlign(); } else { pDataArray->m->mothurOut(pDataArray->align + " is not a valid alignment option. 
I will run the command using needleman."); pDataArray->m->mothurOutEndLine(); alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, longestBase); } pDataArray->count = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { break; } Sequence* candidateSeq = new Sequence(inFASTA); pDataArray->m->gobble(inFASTA); report.setCandidate(candidateSeq); int origNumBases = candidateSeq->getNumBases(); string originalUnaligned = candidateSeq->getUnaligned(); int numBasesNeeded = origNumBases * pDataArray->threshold; if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getUnaligned().length() > alignment->getnRows()) { alignment->resize(candidateSeq->getUnaligned().length()+1); } Sequence temp = templateDB->findClosestSequence(candidateSeq); Sequence* templateSeq = new Sequence(temp.getName(), temp.getAligned()); float searchScore = templateDB->getSearchScore(); Nast* nast = new Nast(alignment, candidateSeq, templateSeq); Sequence* copy; Nast* nast2; bool needToDeleteCopy = false; //this is needed in case you have you enter the ifs below //since nast does not make a copy of hte sequence passed, and it is used by the reporter below //you can't delete the copy sequence til after you report, but you may choose not to create it in the first place //so this bool tells you if you need to delete it //if there is a possibility that this sequence should be reversed if (candidateSeq->getNumBases() < numBasesNeeded) { string wasBetter = ""; //if the user wants you to try the reverse if (pDataArray->flip) { //get reverse compliment copy = new Sequence(candidateSeq->getName(), originalUnaligned); copy->reverseComplement(); //rerun alignment Sequence temp2 = templateDB->findClosestSequence(copy); Sequence* templateSeq2 = new Sequence(temp2.getName(), temp2.getAligned()); searchScore = templateDB->getSearchScore(); 
nast2 = new Nast(alignment, copy, templateSeq2); //check if any better if (copy->getNumBases() > candidateSeq->getNumBases()) { candidateSeq->setAligned(copy->getAligned()); //use reverse compliments alignment since its better delete templateSeq; templateSeq = templateSeq2; delete nast; nast = nast2; needToDeleteCopy = true; wasBetter = "\treverse complement produced a better alignment, so mothur used the reverse complement."; }else{ wasBetter = "\treverse complement did NOT produce a better alignment so it was not used, please check sequence."; delete nast2; delete templateSeq2; delete copy; } } //create accnos file with names accnosFile << candidateSeq->getName() << wasBetter << endl; } report.setTemplate(templateSeq); report.setSearchParameters(pDataArray->search, searchScore); report.setAlignmentParameters(pDataArray->align, alignment); report.setNastParameters(*nast); alignmentFile << '>' << candidateSeq->getName() << '\n' << candidateSeq->getAligned() << endl; report.print(); delete nast; delete templateSeq; if (needToDeleteCopy) { delete copy; } pDataArray->count++; } delete candidateSeq; //report progress if((pDataArray->count) % 100 == 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); } } //report progress if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); } delete alignment; delete templateDB; alignmentFile.close(); inFASTA.close(); accnosFile.close(); } catch(exception& e) { pDataArray->m->errorOut(e, "AlignCommand", "MyAlignThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/amovacommand.cpp000066400000000000000000000426151255543666200214150ustar00rootroot00000000000000/* * amovacommand.cpp * mothur * * Created by westcott on 2/7/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "amovacommand.h" #include "readphylipvector.h" #include "designmap.h" #include "sharedutilities.h" //********************************************************************************************************************** vector AmovaCommand::setParameters(){ try { CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","amova",false,true,true); parameters.push_back(pdesign); CommandParameter psets("sets", "String", "", "", "", "", "","",false,false); parameters.push_back(psets); CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","amova",false,true,true); parameters.push_back(pphylip); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter palpha("alpha", "Number", "", "0.05", "", "", "","",false,false); parameters.push_back(palpha); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "AmovaCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string AmovaCommand::getHelpString(){ try { string helpString = ""; helpString += "Referenced: Anderson MJ (2001). A new method for non-parametric multivariate analysis of variance. Austral Ecol 26: 32-46."; helpString += "The amova command outputs a .amova file."; helpString += "The amova command parameters are phylip, iters, sets and alpha. 
The phylip and design parameters are required, unless you have valid current files."; helpString += "The design parameter allows you to assign your samples to groups when you are running amova. It is required."; helpString += "The design file looks like the group file. It is a 2 column tab delimited file, where the first column is the sample name and the second column is the group the sample belongs to."; helpString += "The sets parameter allows you to specify which of the sets in your designfile you would like to analyze. The set names are separated by dashes. THe default is all sets in the designfile.\n"; helpString += "The iters parameter allows you to set number of randomization for the P value. The default is 1000."; helpString += "The amova command should be in the following format: amova(phylip=file.dist, design=file.design)."; helpString += "Note: No spaces between parameter labels (i.e. iters), '=' and parameters (i.e. 1000)."; return helpString; } catch(exception& e) { m->errorOut(e, "AmovaCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string AmovaCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "amova") { pattern = "[filename],amova"; } //makes file like: amazon.align else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "AmovaCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** AmovaCommand::AmovaCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["amova"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "AmovaCommand", "AmovaCommand"); exit(1); } } 
//********************************************************************************************************************** AmovaCommand::AmovaCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command map::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["amova"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["phylip"] = inputDir + it->second; } } } phylipFileName = validParameter.validFile(parameters, "phylip", true); if (phylipFileName == "not open") { phylipFileName = ""; abort = true; } else if (phylipFileName == "not found") { //if there is a current phylip file, use it phylipFileName = m->getPhylipFile(); if (phylipFileName != "") { m->mothurOut("Using " + phylipFileName + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current phylip file and the phylip parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setPhylipFile(phylipFileName); } //check for required parameters designFileName = validParameter.validFile(parameters, "design", true); if (designFileName == "not open") { designFileName = ""; abort = true; } else if (designFileName == "not found") { //if there is a current design file, use it designFileName = m->getDesignFile(); if (designFileName != "") { m->mothurOut("Using " + designFileName + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current design file and the design parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setDesignFile(designFileName); } string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "alpha", false); if (temp == "not found") { temp = "0.05"; } m->mothurConvert(temp, experimentwiseAlpha); string sets = validParameter.validFile(parameters, "sets", false); if (sets == "not found") { sets = ""; } else { m->splitAtDash(sets, Sets); } } } catch(exception& e) { m->errorOut(e, "AmovaCommand", "AmovaCommand"); exit(1); } } //********************************************************************************************************************** int AmovaCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } 
return 2; } //read design file designMap = new DesignMap(designFileName); if (outputDir == "") { outputDir = m->hasPath(phylipFileName); } //read in distance matrix and square it ReadPhylipVector readMatrix(phylipFileName); vector sampleNames = readMatrix.read(distanceMatrix); if (Sets.size() != 0) { //user selected sets, so we want to remove the samples not in those sets SharedUtil util; vector dGroups = designMap->getCategory(); util.setGroups(Sets, dGroups); for(int i=0;icontrol_pressed) { delete designMap; return 0; } string group = designMap->get(sampleNames[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + sampleNames[i] + " is not in your design file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else if (!m->inUsersGroups(group, Sets)){ //not in set we want remove it //remove from all other rows for(int j=0;j > origGroupSampleMap; for(int i=0;iget(sampleNames[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + sampleNames[i] + " is not in your design file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else { origGroupSampleMap[group].push_back(i); } } int numGroups = origGroupSampleMap.size(); if (m->control_pressed) { delete designMap; return 0; } //create a new filename ofstream AMOVAFile; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipFileName)); string AMOVAFileName = getOutputFileName("amova", variables); m->openOutputFile(AMOVAFileName, AMOVAFile); outputNames.push_back(AMOVAFileName); outputTypes["amova"].push_back(AMOVAFileName); double fullANOVAPValue = runAMOVA(AMOVAFile, origGroupSampleMap, experimentwiseAlpha); if(fullANOVAPValue <= experimentwiseAlpha && numGroups > 2){ int numCombos = numGroups * (numGroups-1) / 2; double pairwiseAlpha = experimentwiseAlpha / (double) numCombos; map >::iterator itA; map >::iterator itB; for(itA=origGroupSampleMap.begin();itA!=origGroupSampleMap.end();itA++){ itB = itA;itB++; 
for(itB;itB!=origGroupSampleMap.end();itB++){ map > pairwiseGroupSampleMap; pairwiseGroupSampleMap[itA->first] = itA->second; pairwiseGroupSampleMap[itB->first] = itB->second; runAMOVA(AMOVAFile, pairwiseGroupSampleMap, pairwiseAlpha); } } m->mothurOut("Experiment-wise error rate: " + toString(experimentwiseAlpha) + '\n'); m->mothurOut("Pair-wise error rate (Bonferroni): " + toString(pairwiseAlpha) + '\n'); } else{ m->mothurOut("Experiment-wise error rate: " + toString(experimentwiseAlpha) + '\n'); } m->mothurOut("If you have borderline P-values, you should try increasing the number of iterations\n"); AMOVAFile.close(); delete designMap; m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "AmovaCommand", "execute"); exit(1); } } //********************************************************************************************************************** double AmovaCommand::runAMOVA(ofstream& AMOVAFile, map > groupSampleMap, double alpha) { try { map >::iterator it; int numGroups = groupSampleMap.size(); int totalNumSamples = 0; for(it = groupSampleMap.begin();it!=groupSampleMap.end();it++){ totalNumSamples += it->second.size(); } double ssTotalOrig = calcSSTotal(groupSampleMap); double ssWithinOrig = calcSSWithin(groupSampleMap); double ssAmongOrig = ssTotalOrig - ssWithinOrig; double counter = 0; for(int i=0;i > randomizedGroup = getRandomizedGroups(groupSampleMap); double ssWithinRand = calcSSWithin(randomizedGroup); if(ssWithinRand <= ssWithinOrig){ counter++; } } double pValue = (double)counter / (double) iters; string pString = ""; if(pValue < 1/(double)iters){ pString = '<' + toString(1/(double)iters); } else { pString = toString(pValue); } //print anova table it = groupSampleMap.begin(); AMOVAFile << it->first; m->mothurOut(it->first); it++; 
for(it;it!=groupSampleMap.end();it++){ AMOVAFile << '-' << it->first; m->mothurOut('-' + it->first); } AMOVAFile << "\tAmong\tWithin\tTotal" << endl; m->mothurOut("\tAmong\tWithin\tTotal\n"); AMOVAFile << "SS\t" << ssAmongOrig << '\t' << ssWithinOrig << '\t' << ssTotalOrig << endl; m->mothurOut("SS\t" + toString(ssAmongOrig) + '\t' + toString(ssWithinOrig) + '\t' + toString(ssTotalOrig) + '\n'); int dfAmong = numGroups - 1; double MSAmong = ssAmongOrig / (double) dfAmong; int dfWithin = totalNumSamples - numGroups; double MSWithin = ssWithinOrig / (double) dfWithin; int dfTotal = totalNumSamples - 1; double Fs = MSAmong / MSWithin; AMOVAFile << "df\t" << dfAmong << '\t' << dfWithin << '\t' << dfTotal << endl; m->mothurOut("df\t" + toString(dfAmong) + '\t' + toString(dfWithin) + '\t' + toString(dfTotal) + '\n'); AMOVAFile << "MS\t" << MSAmong << '\t' << MSWithin << endl << endl; m->mothurOut("MS\t" + toString(MSAmong) + '\t' + toString(MSWithin) + "\n\n"); AMOVAFile << "Fs:\t" << Fs << endl; m->mothurOut("Fs:\t" + toString(Fs) + '\n'); AMOVAFile << "p-value: " << pString; m->mothurOut("p-value: " + pString); if(pValue < alpha){ AMOVAFile << "*"; m->mothurOut("*"); } AMOVAFile << endl << endl; m->mothurOutEndLine();m->mothurOutEndLine(); return pValue; } catch(exception& e) { m->errorOut(e, "AmovaCommand", "runAMOVA"); exit(1); } } //********************************************************************************************************************** map > AmovaCommand::getRandomizedGroups(map > origMapping){ try{ vector sampleIndices; vector samplesPerGroup; map >::iterator it; for(it=origMapping.begin();it!=origMapping.end();it++){ vector indices = it->second; samplesPerGroup.push_back(indices.size()); sampleIndices.insert(sampleIndices.end(), indices.begin(), indices.end()); } random_shuffle(sampleIndices.begin(), sampleIndices.end()); int index = 0; map > randomizedGroups = origMapping; for(it=randomizedGroups.begin();it!=randomizedGroups.end();it++){ for(int 
i=0;isecond.size();i++){ it->second[i] = sampleIndices[index++]; } } return randomizedGroups; } catch (exception& e) { m->errorOut(e, "AmovaCommand", "getRandomizedGroups"); exit(1); } } //********************************************************************************************************************** double AmovaCommand::calcSSTotal(map >& groupSampleMap) { try { vector indices; map >::iterator it; for(it=groupSampleMap.begin();it!=groupSampleMap.end();it++){ indices.insert(indices.end(), it->second.begin(), it->second.end()); } sort(indices.begin(), indices.end()); int numIndices =indices.size(); double ssTotal = 0.0; for(int i=1;ierrorOut(e, "AmovaCommand", "calcSSTotal"); exit(1); } } //********************************************************************************************************************** double AmovaCommand::calcSSWithin(map >& groupSampleMap) { try { double ssWithin = 0.0; map >::iterator it; for(it=groupSampleMap.begin();it!=groupSampleMap.end();it++){ double withinGroup = 0; vector samples = it->second; for(int i=0;ierrorOut(e, "AmovaCommand", "calcSSWithin"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/amovacommand.h000066400000000000000000000024771255543666200210640ustar00rootroot00000000000000#ifndef AMOVACOMMAND_H #define AMOVACOMMAND_H /* * amovacommand.h * mothur * * Created by westcott on 2/7/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" class DesignMap; class AmovaCommand : public Command { public: AmovaCommand(string); AmovaCommand(); ~AmovaCommand() {} vector setParameters(); string getCommandName() { return "amova"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Anderson MJ (2001). A new method for non-parametric multivariate analysis of variance. 
Austral Ecol 26: 32-46.\nhttp://www.mothur.org/wiki/Amova"; }
    string getDescription()     { return "analysis of molecular variance";  }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    double runAMOVA(ofstream&, map<string, vector<int> >, double);
    double calcSSWithin(map<string, vector<int> >&);
    double calcSSTotal(map<string, vector<int> >&);
    map<string, vector<int> > getRandomizedGroups(map<string, vector<int> >);

    bool abort;
    vector<string> outputNames, Sets;

    string outputDir, inputDir, designFileName, phylipFileName;
    DesignMap* designMap;
    vector< vector<double> > distanceMatrix;
    int iters;
    double experimentwiseAlpha;
};

#endif
mothur-1.36.1/source/commands/anosimcommand.cpp
/*
 *  anosimcommand.cpp
 *  mothur
 *
 *  Created by westcott on 2/14/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "anosimcommand.h"
#include "inputdata.h"
#include "readphylipvector.h"
#include "designmap.h"

//**********************************************************************************************************************
vector<string> AnosimCommand::setParameters(){
    try {
        CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","anosim",false,true,true); parameters.push_back(pdesign);
        CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","anosim",false,true,true); parameters.push_back(pphylip);
        CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters);
        CommandParameter palpha("alpha", "Number", "", "0.05", "", "", "","",false,false); parameters.push_back(palpha);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) {
            myArray.push_back(parameters[i].name);
        }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "AnosimCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string AnosimCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "Referenced: Clarke, K. R. (1993). Non-parametric multivariate analysis of changes in community structure. _Australian Journal of Ecology_ 18, 117-143.\n";
        helpString += "The anosim command outputs a .anosim file. \n";
        helpString += "The anosim command parameters are phylip, design, iters, and alpha. The phylip and design parameters are required, unless you have valid current files.\n";
        helpString += "The design parameter allows you to assign your samples to groups when you are running anosim. It is required. \n";
        helpString += "The design file looks like the group file. It is a 2 column tab delimited file, where the first column is the sample name and the second column is the group the sample belongs to.\n";
        helpString += "The iters parameter allows you to set the number of randomizations used to calculate the P-value. The default is 1000. \n";
        helpString += "The alpha parameter allows you to set the experiment-wise error rate. The default is 0.05. \n";
        helpString += "The anosim command should be in the following format: anosim(phylip=file.dist, design=file.design).\n";
        helpString += "Note: No spaces between parameter labels (i.e. iters), '=' and parameters (i.e. 1000).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "AnosimCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string AnosimCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "anosim") { pattern = "[filename],anosim"; } //makes file like: amazon.anosim
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "AnosimCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
AnosimCommand::AnosimCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["anosim"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "AnosimCommand", "AnosimCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
AnosimCommand::AnosimCommand(string option) {
    try {
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string,string> parameters = parser.getParameters();

            ValidParameters validParameter;

            //check to make sure all parameters are valid for command
            map<string,string>::iterator it;
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //initialize outputTypes
            vector<string> tempOutNames;
            outputTypes["anosim"] = tempOutNames;

            //if the user changes the output directory command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir",
false);     if (outputDir == "not found"){ outputDir = ""; }

            //if the user changes the input directory command factory will send this info to us in the inputdir parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {
                string path;
                it = parameters.find("design");
                //user has given a design file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["design"] = inputDir + it->second; }
                }

                it = parameters.find("phylip");
                //user has given a phylip file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["phylip"] = inputDir + it->second; }
                }
            }

            //check for required parameters
            phylipFileName = validParameter.validFile(parameters, "phylip", true);
            if (phylipFileName == "not open") { phylipFileName = ""; abort = true; }
            else if (phylipFileName == "not found") {
                //if there is a current phylip file, use it
                phylipFileName = m->getPhylipFile();
                if (phylipFileName != "") { m->mothurOut("Using " + phylipFileName + " as input file for the phylip parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current phylip file and the phylip parameter is required."); m->mothurOutEndLine(); abort = true; }
            }else { m->setPhylipFile(phylipFileName); }

            designFileName = validParameter.validFile(parameters, "design", true);
            if (designFileName == "not open") { designFileName = ""; abort = true; }
            else if (designFileName == "not found") {
                //if there is a current design file, use it
                designFileName = m->getDesignFile();
                if (designFileName != "") { m->mothurOut("Using " + designFileName + " as input file for the design parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current design file and the design parameter is required."); m->mothurOutEndLine(); abort =
true; } }else { m->setDesignFile(designFileName); } string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "alpha", false); if (temp == "not found") { temp = "0.05"; } m->mothurConvert(temp, experimentwiseAlpha); } } catch(exception& e) { m->errorOut(e, "AnosimCommand", "AnosimCommand"); exit(1); } } //********************************************************************************************************************** int AnosimCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //read design file designMap = new DesignMap(designFileName); if (outputDir == "") { outputDir = m->hasPath(phylipFileName); } //read in distance matrix and square it ReadPhylipVector readMatrix(phylipFileName); vector sampleNames = readMatrix.read(distanceMatrix); for(int i=0;i > origGroupSampleMap; for(int i=0;iget(sampleNames[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + sampleNames[i] + " is not in your design file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else { origGroupSampleMap[group].push_back(i); } } int numGroups = origGroupSampleMap.size(); if (m->control_pressed) { delete designMap; return 0; } //create a new filename ofstream ANOSIMFile; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipFileName)); string ANOSIMFileName = getOutputFileName("anosim", variables); m->openOutputFile(ANOSIMFileName, ANOSIMFile); outputNames.push_back(ANOSIMFileName); outputTypes["anosim"].push_back(ANOSIMFileName); m->mothurOut("\ncomparison\tR-value\tP-value\n"); ANOSIMFile << "comparison\tR-value\tP-value\n"; double fullANOSIMPValue = runANOSIM(ANOSIMFile, distanceMatrix, origGroupSampleMap, experimentwiseAlpha); if(fullANOSIMPValue <= experimentwiseAlpha && numGroups > 2){ int numCombos = numGroups * (numGroups-1) / 2; double pairwiseAlpha = 
experimentwiseAlpha / (double) numCombos; for(map >::iterator itA=origGroupSampleMap.begin();itA!=origGroupSampleMap.end();itA++){ map >::iterator itB = itA; itB++; for(itB;itB!=origGroupSampleMap.end();itB++){ map > subGroupSampleMap; subGroupSampleMap[itA->first] = itA->second; string groupA = itA->first; subGroupSampleMap[itB->first] = itB->second; string groupB = itB->first; vector subIndices; for(map >::iterator it=subGroupSampleMap.begin();it!=subGroupSampleMap.end();it++){ subIndices.insert(subIndices.end(), it->second.begin(), it->second.end()); } int subNumSamples = subIndices.size(); sort(subIndices.begin(), subIndices.end()); vector > subDistMatrix(distanceMatrix.size()); for(int i=0;imothurOut("\nExperiment-wise error rate: " + toString(experimentwiseAlpha) + '\n'); m->mothurOut("Pair-wise error rate (Bonferroni): " + toString(pairwiseAlpha) + '\n'); } else{ m->mothurOut("\nExperiment-wise error rate: " + toString(experimentwiseAlpha) + '\n'); } m->mothurOut("If you have borderline P-values, you should try increasing the number of iterations\n"); ANOSIMFile.close(); delete designMap; m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "AnosimCommand", "execute"); exit(1); } } //********************************************************************************************************************** double AnosimCommand::runANOSIM(ofstream& ANOSIMFile, vector > dMatrix, map > groupSampleMap, double alpha) { try { vector > rankMatrix = convertToRanks(dMatrix); double RValue = calcR(rankMatrix, groupSampleMap); int pCount = 0; for(int i=0;i > randGroupSampleMap = getRandomizedGroups(groupSampleMap); double RValueRand = calcR(rankMatrix, randGroupSampleMap); if(RValue <= RValueRand){ pCount++; } } double pValue = (double)pCount / (double) iters; string pString 
= "";
        if(pValue < 1/(double)iters){   pString = '<' + toString(1/(double)iters);  }
        else                        {   pString = toString(pValue);                 }

        //print the comparison label, e.g. groupA-groupB
        map<string, vector<int> >::iterator it=groupSampleMap.begin();
        m->mothurOut(it->first);
        ANOSIMFile << it->first;
        it++;
        for(;it!=groupSampleMap.end();it++){
            m->mothurOut('-' + it->first);
            ANOSIMFile << '-' << it->first;
        }
        m->mothurOut('\t' + toString(RValue) + '\t' + pString);
        ANOSIMFile << '\t' << RValue << '\t' << pString;

        //flag comparisons that are significant at the given alpha
        if(pValue < alpha){
            ANOSIMFile << "*";
            m->mothurOut("*");
        }
        ANOSIMFile << endl;
        m->mothurOutEndLine();

        return pValue;
    }
    catch(exception& e) {
        m->errorOut(e, "AnosimCommand", "runANOSIM");
        exit(1);
    }
}
//**********************************************************************************************************************
double AnosimCommand::calcR(vector<vector<double> > rankMatrix, map<string, vector<int> > groupSampleMap){
    try {
        int numSamples = 0;
        for(map<string, vector<int> >::iterator it=groupSampleMap.begin();it!=groupSampleMap.end();it++){
            numSamples += it->second.size();
        }

        //mean rank of the within-group comparisons; the rank matrix is lower triangular
        double within = 0.0;
        int numWithinComps = 0;

        for(map<string, vector<int> >::iterator it=groupSampleMap.begin();it!=groupSampleMap.end();it++){
            vector<int> indices = it->second;
            for(int i=0;i<indices.size();i++){
                for(int j=0;j<i;j++){
                    if(indices[i] > indices[j]) { within += rankMatrix[indices[i]][indices[j]]; }
                    else                        { within += rankMatrix[indices[j]][indices[i]]; }
                    numWithinComps++;
                }
            }
        }
        within /= (float) numWithinComps;

        //mean rank of the between-group comparisons
        double between = 0.0;
        int numBetweenComps = 0;

        for(map<string, vector<int> >::iterator itA=groupSampleMap.begin();itA!=groupSampleMap.end();itA++){
            for(int i=0;i<itA->second.size();i++){
                int A = itA->second[i];
                map<string, vector<int> >::iterator itB = itA; itB++;
                for(;itB!=groupSampleMap.end();itB++){
                    for(int j=0;j<itB->second.size();j++){
                        int B = itB->second[j];
                        if(A>B) { between += rankMatrix[A][B]; }
                        else    { between += rankMatrix[B][A]; }
                        numBetweenComps++;
                    }
                }
            }
        }
        between /= (float) numBetweenComps;

        double Rvalue = (between - within)/(numSamples * (numSamples-1) / 4.0);

        return Rvalue;
    }
    catch(exception& e) {
        m->errorOut(e, "AnosimCommand", "calcR");
        exit(1);
    }
}
//********************************************************************************************************************** vector > AnosimCommand::convertToRanks(vector > dist) { try { vector cells; vector > ranks = dist; for (int i = 0; i < dist.size(); i++) { for (int j = 0; j < i; j++) { if(dist[i][j] != -1){ seqDist member(i, j, dist[i][j]); cells.push_back(member); } } } //sort distances sort(cells.begin(), cells.end(), compareSequenceDistance); //find ranks of distances int index = 0; int indexSum = 0; for(int i=0;ierrorOut(e, "AnosimCommand", "convertToRanks"); exit(1); } } //********************************************************************************************************************** map > AnosimCommand::getRandomizedGroups(map > origMapping){ try{ vector sampleIndices; vector samplesPerGroup; map >::iterator it; for(it=origMapping.begin();it!=origMapping.end();it++){ vector indices = it->second; samplesPerGroup.push_back(indices.size()); sampleIndices.insert(sampleIndices.end(), indices.begin(), indices.end()); } random_shuffle(sampleIndices.begin(), sampleIndices.end()); int index = 0; map > randomizedGroups = origMapping; for(it=randomizedGroups.begin();it!=randomizedGroups.end();it++){ for(int i=0;isecond.size();i++){ it->second[i] = sampleIndices[index++]; } } return randomizedGroups; } catch (exception& e) { m->errorOut(e, "AnosimCommand", "randomizeGroups"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/anosimcommand.h000066400000000000000000000027121255543666200212370ustar00rootroot00000000000000#ifndef ANOSIMCOMMAND_H #define ANOSIMCOMMAND_H /* * anosimcommand.h * mothur * * Created by westcott on 2/14/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 *
 */

#include "command.hpp"

class DesignMap;

class AnosimCommand : public Command {

public:
    AnosimCommand(string);
    AnosimCommand();
    ~AnosimCommand(){}

    vector<string> setParameters();
    string getCommandName()         { return "anosim";              }
    string getCommandCategory()     { return "Hypothesis Testing";  }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "Clarke, K. R. (1993). Non-parametric multivariate analysis of changes in community structure. _Australian Journal of Ecology_ 18, 117-143.\nhttp://www.mothur.org/wiki/Anosim"; }
    string getDescription()         { return "analysis of similarity"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort;
    DesignMap* designMap;
    string outputDir, inputDir, designFileName, phylipFileName;

    vector<vector<double> > convertToRanks(vector<vector<double> >);
    double calcR(vector<vector<double> >, map<string, vector<int> >);
    map<string, vector<int> > getRandomizedGroups(map<string, vector<int> >);
    double runANOSIM(ofstream&, vector<vector<double> >, map<string, vector<int> >, double);

    vector< vector<double> > distanceMatrix;
    vector<string> outputNames;
    int iters;
    double experimentwiseAlpha;
    vector< vector<string> > namesOfGroupCombos;
};

#endif
mothur-1.36.1/source/commands/binsequencecommand.cpp
/*
 *  binsequencecommand.cpp
 *  Mothur
 *
 *  Created by Sarah Westcott on 4/3/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "binsequencecommand.h" //********************************************************************************************************************** vector BinSeqCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(plist); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "BinSeqCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string BinSeqCommand::getHelpString(){ try { string helpString = ""; helpString += "The bin.seqs command parameters are list, fasta, name, count, label and group. 
The fasta and list parameters are required, unless you have valid current list and fasta files.\n";
        helpString += "The label parameter allows you to select the distance levels you would like output files created for. Separate multiple labels with dashes.\n";
        helpString += "The bin.seqs command should be in the following format: bin.seqs(list=yourListFile, fasta=yourFastaFile, name=yourNamesFile, group=yourGroupFile, label=yourLabels).\n";
        helpString += "Example bin.seqs(fasta=amazon.fasta, group=amazon.groups, name=amazon.names).\n";
        helpString += "The default value for label is all lines in your inputfile.\n";
        helpString += "The bin.seqs command outputs a .fasta file for each distance you specify, appending the OTU number to each name.\n";
        helpString += "If you provide a groupfile, then it also appends the sequence's group to the name.\n";
        helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFastaFile).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "BinSeqCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string BinSeqCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fasta") { pattern = "[filename],[distance],fasta"; } //makes file like: amazon.0.03.fasta
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "BinSeqCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
BinSeqCommand::BinSeqCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["fasta"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "BinSeqCommand", "BinSeqCommand");
        exit(1);
    }
}
//********************************************************************************************************************** BinSeqCommand::BinSeqCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; labels.clear(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
                    if (path == "") { parameters["name"] = inputDir + it->second; }
                }

                it = parameters.find("group");
                //user has given a group file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["group"] = inputDir + it->second; }
                }

                it = parameters.find("count");
                //user has given a count file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["count"] = inputDir + it->second; }
                }
            }

            //check for required parameters
            fastafile = validParameter.validFile(parameters, "fasta", true);
            if (fastafile == "not found") {
                //if there is a current fasta file, use it
                fastafile = m->getFastaFile();
                if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current fasta file and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; }
            }
            else if (fastafile == "not open") { abort = true; }
            else { m->setFastaFile(fastafile); }

            listfile = validParameter.validFile(parameters, "list", true);
            if (listfile == "not found") {
                //if there is a current list file, use it
                listfile = m->getListFile();
                if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current list file and the list parameter is required."); m->mothurOutEndLine(); abort = true; }
            }
            else if (listfile == "not open") { listfile = ""; abort = true; }
            else { m->setListFile(listfile); }

            //if the user changes the output directory command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){
                outputDir = "";
                outputDir += m->hasPath(listfile); //if user entered a file with a path then preserve it
            }

            //check for optional parameter and set
defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } namesfile = validParameter.validFile(parameters, "name", true); if (namesfile == "not open") { namesfile = ""; abort = true; } else if (namesfile == "not found") { namesfile = ""; } else { m->setNameFile(namesfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namesfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } if (countfile == "") { if (namesfile == ""){ vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "BinSeqCommand", "BinSeqCommand"); exit(1); } } //********************************************************************************************************************** BinSeqCommand::~BinSeqCommand(){} //********************************************************************************************************************** int BinSeqCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int error = 0; fasta = new FastaMap(); if (groupfile != "") { groupMap = new GroupMap(groupfile); groupMap->readMap(); } //read fastafile 
fasta->readFastaFile(fastafile); //if user gave a namesfile then use it if (namesfile != "") { readNamesFile(); } if (countfile != "") { ct.readTable(countfile, true, false); } input = new InputData(listfile, "list"); list = input->getListVector(); string lastLabel = list->getLabel(); if (m->control_pressed) { delete input; delete fasta; if (groupfile != "") { delete groupMap; } return 0; } //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. set<string> processedLabels; set<string> userLabels = labels; while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; delete fasta; if (groupfile != "") { delete groupMap; } return 0; } if(allLines == 1 || labels.count(list->getLabel()) == 1){ error = process(list); if (error == 1) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; delete fasta; if (groupfile != "") { delete groupMap; } return 0; } processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); error = process(list); if (error == 1) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; delete fasta; if (groupfile != "") { delete groupMap; } return 0; } processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input->getListVector(); } if(m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; delete fasta; if (groupfile != "") { delete groupMap; } return 0; } //output error messages about any 
remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input->getListVector(lastLabel); error = process(list); if (error == 1) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; delete fasta; if (groupfile != "") { delete groupMap; } return 0; } delete list; } delete input; delete fasta; if (groupfile != "") { delete groupMap; } if(m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set align file as new current fastafile string currentFasta = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { currentFasta = (itTypes->second)[0]; m->setFastaFile(currentFasta); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "BinSeqCommand", "execute"); exit(1); } } //********************************************************************************************************************** void BinSeqCommand::readNamesFile() { try { vector<string> dupNames; m->openInputFile(namesfile, inNames); string name, names, sequence; while(inNames){ inNames >> name; //read from first column A inNames >> names; //read from second column A,B,C,D dupNames.clear(); //parse names into vector m->splitAtComma(names, dupNames); //store names in fasta map sequence = 
fasta->getSequence(name); for (int i = 0; i < dupNames.size(); i++) { fasta->push_back(dupNames[i], sequence); } m->gobble(inNames); } inNames.close(); } catch(exception& e) { m->errorOut(e, "BinSeqCommand", "readNamesFile"); exit(1); } } //********************************************************************************************************************** //return 1 if error, 0 otherwise int BinSeqCommand::process(ListVector* list) { try { map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[distance]"] = list->getLabel(); string outputFileName = getOutputFileName("fasta", variables); m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); //for each bin in the list vector vector<string> binLabels = list->getLabels(); for (int i = 0; i < list->size(); i++) { if (m->control_pressed) { return 1; } string binnames = list->get(i); vector<string> names; m->splitAtComma(binnames, names); for (int j = 0; j < names.size(); j++) { string name = names[j]; //do work for that name string sequence = fasta->getSequence(name); if (countfile != "") { if (sequence != "not found") { if (ct.hasGroupInfo()) { vector<string> groups = ct.getGroups(name); string groupInfo = ""; for (int k = 0; k < (int)groups.size()-1; k++) { groupInfo += groups[k] + "-"; } if (groups.size() != 0) { groupInfo += groups[groups.size()-1]; } else { groupInfo = "not found"; } name = name + "\t" + groupInfo + "\t" + binLabels[i] + "\tNumRep=" + toString(ct.getNumSeqs(name)); out << ">" << name << endl; out << sequence << endl; }else { name = name + "\t" + binLabels[i] + "\tNumRep=" + toString(ct.getNumSeqs(name)); out << ">" << name << endl; out << sequence << endl; } }else { m->mothurOut(name + " is missing from your fasta. 
Does your list file contain all sequence names or just the uniques?"); m->mothurOutEndLine(); return 1; } }else { if (sequence != "not found") { //if you don't have groups if (groupfile == "") { name = name + "\t" + binLabels[i]; out << ">" << name << endl; out << sequence << endl; }else {//if you do have groups string group = groupMap->getGroup(name); if (group == "not found") { m->mothurOut(name + " is missing from your group file. Please correct. "); m->mothurOutEndLine(); return 1; }else{ name = name + "\t" + group + "\t" + binLabels[i]; out << ">" << name << endl; out << sequence << endl; } } }else { m->mothurOut(name + " is missing from your fasta or name file. Please correct. "); m->mothurOutEndLine(); return 1; } } } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "BinSeqCommand", "process"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/binsequencecommand.h000066400000000000000000000025601255543666200222530ustar00rootroot00000000000000#ifndef BINSEQCOMMAND_H #define BINSEQCOMMAND_H /* * binsequencecommand.h * Mothur * * Created by Sarah Westcott on 4/3/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* The bin.seqs command outputs a .fasta file for each distance you specify appending the OTU number to each name. 
*/ #include "command.hpp" #include "inputdata.h" #include "listvector.hpp" #include "fastamap.h" #include "groupmap.h" #include "counttable.h" class BinSeqCommand : public Command { public: BinSeqCommand(string); BinSeqCommand(); ~BinSeqCommand(); vector<string> setParameters(); string getCommandName() { return "bin.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Bin.seqs"; } string getDescription() { return "maps sequences to otus"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CountTable ct; ListVector* list; InputData* input; FastaMap* fasta; GroupMap* groupMap; bool abort, allLines; set<string> labels; //holds labels to be used string filename, fastafile, listfile, namesfile, groupfile, countfile, label, outputDir; ofstream out; ifstream in, inNames; vector<string> outputNames; void readNamesFile(); int process(ListVector*); }; #endif mothur-1.36.1/source/commands/catchallcommand.cpp000066400000000000000000000762671255543666200220770ustar00rootroot00000000000000/* * catchallcommand.cpp * Mothur * * Created by westcott on 5/11/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "catchallcommand.h" //********************************************************************************************************************** vector<string> CatchAllCommand::setParameters(){ try { CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); //can choose shared or sabund not both, so put them in the same chooseOnlyOneGroup CommandParameter pshared("shared", "InputTypes", "", "", "catchallInputs", "catchallInputs", "none","analysis-bestanalysis-models-bubble-summary",false,false,true); parameters.push_back(pshared); CommandParameter psabund("sabund", "InputTypes", "", "", "catchallInputs", "catchallInputs", "none","analysis-bestanalysis-models-bubble-summary",false,false,true); parameters.push_back(psabund); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CatchAllCommand::getHelpString(){ try { string helpString = ""; helpString += "The catchall command interfaces mothur with the catchall program written by Linda Woodard, Sean Connolly and John Bunge.\n"; helpString += "For more information about catchall refer to http://www.northeastern.edu/catchall/index.html \n"; helpString += "The catchall executable must be in the same folder as your mothur executable. \n"; helpString += "If you are a MAC or Linux user you must also have installed mono, a link to mono is on the webpage. 
\n"; helpString += "The catchall command parameters are shared, sabund and label. shared or sabund is required. \n"; helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The catchall command should be in the following format: \n"; helpString += "catchall(sabund=yourSabundFile) \n"; helpString += "Example: catchall(sabund=abrecovery.fn.sabund) \n"; return helpString; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string CatchAllCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "analysis") { pattern = "[filename],_Analysis.csv"; } else if (type == "bestanalysis") { pattern = "[filename],_BestModelsAnalysis.csv"; } else if (type == "models") { pattern = "[filename],_BestModelsAnalysis.csv"; } else if (type == "bubble") { pattern = "[filename],_BubblePlot.csv"; } else if (type == "summary") { pattern = "[filename],catchall.summary"; } else if (type == "sabund") { pattern = "[filename],[distance],csv"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** CatchAllCommand::CatchAllCommand(){ try { abort = true; calledHelp = true; setParameters(); //initialize outputTypes vector<string> tempOutNames; outputTypes["analysis"] = tempOutNames; outputTypes["bestanalysis"] = tempOutNames; outputTypes["models"] = tempOutNames; outputTypes["bubble"] = tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["sabund"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "CatchAllCommand"); exit(1); } } 
/**************************************************************************************/ CatchAllCommand::CatchAllCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); ValidParameters validParameter; map<string,string>::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["analysis"] = tempOutNames; outputTypes["bestanalysis"] = tempOutNames; outputTypes["models"] = tempOutNames; outputTypes["bubble"] = tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["sabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("sabund"); //user has given a sabund file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a shared file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } //check for required parameters sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { m->setSabundFile(sabundfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } if ((sharedfile == "") && (sabundfile == "")) { //is there a current file available for either of these? //give priority to shared, then sabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { sabundfile = m->getSabundFile(); if (sabundfile != "") { m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a sabund or shared file before you can use the catchall command."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ if (sabundfile != "") { outputDir = m->hasPath(sabundfile); } else { outputDir = m->hasPath(sharedfile); } } } } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "CatchAllCommand"); exit(1); } } /**************************************************************************************/ int CatchAllCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get location of catchall path = m->argv; path = path.substr(0, (path.find_last_of("othur")-5)); path = m->getFullPathName(path); if (m->debug) { m->mothurOut("[DEBUG]: mothur's path = " + path + "\n"); } savedOutputDir = outputDir; string catchAllCommandExe = ""; string catchAllTest = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (outputDir == "") { outputDir = "./"; } //force full pathname to be created for catchall, this is necessary because if catchall is in the path it will look for input file wherever the exe is and not the cwd. catchAllTest = path + "CatchAllcmdL.exe"; #else if (outputDir == "") { outputDir = ".\\"; } //force full pathname to be created for catchall, this is necessary because if catchall is in the path it will look for input file wherever the exe is and not the cwd. catchAllTest = path + "CatchAllcmdW.exe"; #endif //test to make sure catchall exists ifstream in; catchAllTest = m->getFullPathName(catchAllTest); int ableToOpen = m->openInputFile(catchAllTest, in, "no error"); in.close(); if(ableToOpen == 1) { m->mothurOut(catchAllTest + " file does not exist. Checking path... \n"); //check to see if catchall is in the path 
string programName = "CatchAllcmdW.exe"; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) programName = "CatchAllcmdL.exe"; #endif string cLocation = m->findProgramPath(programName); ifstream in2; ableToOpen = m->openInputFile(cLocation, in2, "no error"); in2.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + cLocation + " file does not exist. mothur requires the catchall executable."); m->mothurOutEndLine(); return 0; } else { m->mothurOut("Found catchall in your path, using " + cLocation + "\n"); catchAllTest = cLocation; } } catchAllTest = m->getFullPathName(catchAllTest); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) catchAllCommandExe += "mono \"" + catchAllTest + "\" "; #else catchAllCommandExe += "\"" + catchAllTest + "\" "; #endif //prepare full output directory outputDir = m->getFullPathName(outputDir); if (m->debug) { m->mothurOut("[DEBUG]: catchall location = " + catchAllCommandExe + "\n[DEBUG]: outputDir = " + outputDir + "\n"); } vector<string> inputFileNames; if (sharedfile != "") { inputFileNames = parseSharedFile(sharedfile); } else { inputFileNames.push_back(sabundfile); } for (int p = 0; p < inputFileNames.size(); p++) { if (inputFileNames.size() > 1) { m->mothurOutEndLine(); m->mothurOut("Processing group " + groups[p]); m->mothurOutEndLine(); m->mothurOutEndLine(); } InputData input(inputFileNames[p], "sabund"); SAbundVector* sabund = input.getSAbundVector(); string lastLabel = sabund->getLabel(); set<string> processedLabels; set<string> userLabels = labels; map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileNames[p])); string summaryfilename = getOutputFileName("summary", variables); summaryfilename = m->getFullPathName(summaryfilename); if (m->debug) { m->mothurOut("[DEBUG]: Input File = " + inputFileNames[p] + ".\n[DEBUG]: inputdata address = " + toString(&input) + ".\n[DEBUG]: sabund address = " + 
toString(&sabund) + ".\n"); } ofstream out; m->openOutputFile(summaryfilename, out); out << "label\tmodel\testimate\tlci\tuci" << endl; if (m->debug) { string open = "no"; if (out.is_open()) { open = "yes"; } m->mothurOut("[DEBUG]: output stream is open = " + open + ".\n"); } //for each label the user selected while((sabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(allLines == 1 || labels.count(sabund->getLabel()) == 1){ m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); //create catchall input file from mothur's inputfile string filename = process(sabund, inputFileNames[p]); string outputPath = m->getPathName(filename); //create system command string catchAllCommand = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) catchAllCommand += catchAllCommandExe + "\"" + filename + "\" \"" + outputPath + "\" 1"; #else //removes extra '\\' catchall doesn't like that vector<string> tempNames; string tempFilename = filename; m->splitAtDash(tempFilename, tempNames); tempFilename = tempNames[0]; tempNames.clear(); string tempOutputPath = outputPath; m->splitAtDash(tempOutputPath, tempNames); tempOutputPath = tempNames[0]; if (tempOutputPath.length() > 0) { tempOutputPath = tempOutputPath.substr(0, tempOutputPath.length()-1); } catchAllCommand += catchAllCommandExe + "\"" + tempFilename + "\" \"" + tempOutputPath + "\" 1"; catchAllCommand = "\"" + catchAllCommand + "\""; #endif if (m->debug) { m->mothurOut("[DEBUG]: catchall command = " + catchAllCommand + ". About to call system.\n"); } //run catchall system(catchAllCommand.c_str()); if (m->debug) { m->mothurOut("[DEBUG]: back from system call. Keeping file: " + filename + ".\n"); } if (!m->debug) { m->mothurRemove(filename); } filename = m->getRootName(filename); filename = filename.substr(0, filename.length()-1); //rip off extra . 
if (savedOutputDir == "") { filename = m->getSimpleName(filename); } variables["[filename]"] = filename; outputNames.push_back(getOutputFileName("analysis", variables)); outputTypes["analysis"].push_back(getOutputFileName("analysis", variables)); outputNames.push_back(getOutputFileName("bestanalysis", variables)); outputTypes["bestanalysis"].push_back(getOutputFileName("bestanalysis", variables)); outputNames.push_back(getOutputFileName("models", variables)); outputTypes["models"].push_back(getOutputFileName("models", variables)); outputNames.push_back(getOutputFileName("bubble", variables)); outputTypes["bubble"].push_back(getOutputFileName("bubble", variables)); if (m->debug) { m->mothurOut("[DEBUG]: About to create summary file for: " + filename + ".\n[DEBUG]: sabund label = " + sabund->getLabel() + ".\n"); } createSummaryFile(filename + "_BestModelsAnalysis.csv", sabund->getLabel(), out); if (m->debug) { m->mothurOut("[DEBUG]: Done creating summary file.\n"); } if (m->control_pressed) { out.close(); for (int i = 0; i < outputNames.size(); i++) {m->mothurRemove(outputNames[i]); } delete sabund; return 0; } processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); } if ((m->anyLabelsToProcess(sabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = sabund->getLabel(); delete sabund; sabund = (input.getSAbundVector(lastLabel)); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); //create catchall input file from mothur's inputfile string filename = process(sabund, inputFileNames[p]); string outputPath = m->getPathName(filename); //create system command string catchAllCommand = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) catchAllCommand += catchAllCommandExe + "\"" + filename + "\" \"" + outputPath + "\" 1"; #else //removes extra '\\' catchall doesn't like that vector<string> tempNames; string tempFilename = filename; 
m->splitAtDash(tempFilename, tempNames); tempFilename = tempNames[0]; tempNames.clear(); string tempOutputPath = outputPath; m->splitAtDash(tempOutputPath, tempNames); tempOutputPath = tempNames[0]; if (tempOutputPath.length() > 0) { tempOutputPath = tempOutputPath.substr(0, tempOutputPath.length()-1); } catchAllCommand += catchAllCommandExe + "\"" + tempFilename + "\" \"" + tempOutputPath + "\" 1"; catchAllCommand = "\"" + catchAllCommand + "\""; #endif if (m->debug) { m->mothurOut("[DEBUG]: catchall command = " + catchAllCommand + ". About to call system.\n"); } //run catchall system(catchAllCommand.c_str()); if (m->debug) { m->mothurOut("[DEBUG]: back from system call. Keeping file: " + filename + ".\n"); } if (!m->debug) { m->mothurRemove(filename); } filename = m->getRootName(filename); filename = filename.substr(0, filename.length()-1); //rip off extra . if (savedOutputDir == "") { filename = m->getSimpleName(filename); } variables["[filename]"] = filename; outputNames.push_back(getOutputFileName("analysis", variables)); outputTypes["analysis"].push_back(getOutputFileName("analysis", variables)); outputNames.push_back(getOutputFileName("bestanalysis", variables)); outputTypes["bestanalysis"].push_back(getOutputFileName("bestanalysis", variables)); outputNames.push_back(getOutputFileName("models", variables)); outputTypes["models"].push_back(getOutputFileName("models", variables)); outputNames.push_back(getOutputFileName("bubble", variables)); outputTypes["bubble"].push_back(getOutputFileName("bubble", variables)); if (m->debug) { m->mothurOut("[DEBUG]: About to create summary file for: " + filename + ".\n[DEBUG]: sabund label = " + sabund->getLabel() + ".\n"); } createSummaryFile(filename + "_BestModelsAnalysis.csv", sabund->getLabel(), out); if (m->debug) { m->mothurOut("[DEBUG]: Done creating summary file.\n"); } if (m->control_pressed) { out.close(); for (int i = 0; i < outputNames.size(); i++) {m->mothurRemove(outputNames[i]); } delete sabund; return 0; } 
processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); //restore real lastlabel to save below sabund->setLabel(saveLabel); } lastLabel = sabund->getLabel(); delete sabund; sabund = (input.getSAbundVector()); } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (sabund != NULL) { delete sabund; } sabund = (input.getSAbundVector(lastLabel)); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); //create catchall input file from mothur's inputfile string filename = process(sabund, inputFileNames[p]); string outputPath = m->getPathName(filename); //create system command string catchAllCommand = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) catchAllCommand += catchAllCommandExe + "\"" + filename + "\" \"" + outputPath + "\" 1"; #else //removes extra '\\' catchall doesn't like that vector<string> tempNames; string tempFilename = filename; m->splitAtDash(tempFilename, tempNames); tempFilename = tempNames[0]; tempNames.clear(); string tempOutputPath = outputPath; m->splitAtDash(tempOutputPath, tempNames); tempOutputPath = tempNames[0]; if (tempOutputPath.length() > 0) { tempOutputPath = tempOutputPath.substr(0, tempOutputPath.length()-1); } catchAllCommand += catchAllCommandExe + "\"" + tempFilename + "\" \"" + tempOutputPath + "\" 1"; catchAllCommand = "\"" + catchAllCommand + "\""; #endif if (m->debug) { m->mothurOut("[DEBUG]: catchall command = " + catchAllCommand + ". 
About to call system.\n"); } //run catchall system(catchAllCommand.c_str()); if (m->debug) { m->mothurOut("[DEBUG]: back from system call. Keeping file: " + filename + ".\n"); } if (!m->debug) { m->mothurRemove(filename); } filename = m->getRootName(filename); filename = filename.substr(0, filename.length()-1); //rip off extra . if (savedOutputDir == "") { filename = m->getSimpleName(filename); } variables["[filename]"] = filename; outputNames.push_back(getOutputFileName("analysis", variables)); outputTypes["analysis"].push_back(getOutputFileName("analysis", variables)); outputNames.push_back(getOutputFileName("bestanalysis", variables)); outputTypes["bestanalysis"].push_back(getOutputFileName("bestanalysis", variables)); outputNames.push_back(getOutputFileName("models", variables)); outputTypes["models"].push_back(getOutputFileName("models", variables)); outputNames.push_back(getOutputFileName("bubble", variables)); outputTypes["bubble"].push_back(getOutputFileName("bubble", variables)); if (m->debug) { m->mothurOut("[DEBUG]: About to create summary file for: " + filename + ".\n[DEBUG]: sabund label = " + sabund->getLabel() + ".\n"); } createSummaryFile(filename + "_BestModelsAnalysis.csv", sabund->getLabel(), out); if (m->debug) { m->mothurOut("[DEBUG]: Done creating summary file.\n"); } delete sabund; } out.close(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) {m->mothurRemove(outputNames[i]); } return 0; } } if (sharedfile == "") { string summaryfilename = savedOutputDir + m->getRootName(m->getSimpleName(inputFileNames[0])) + "catchall.summary"; summaryfilename = m->getFullPathName(summaryfilename); outputNames.push_back(summaryfilename); outputTypes["summary"].push_back(summaryfilename); }else { //combine summaries vector<string> sumNames; for (int i = 0; i < inputFileNames.size(); i++) { sumNames.push_back(m->getFullPathName(outputDir + m->getRootName(m->getSimpleName(inputFileNames[i])) + "catchall.summary")); } string summaryfilename = 
combineSummmary(sumNames); outputNames.push_back(summaryfilename); outputTypes["summary"].push_back(summaryfilename); } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "execute"); exit(1); } } //********************************************************************************************************************** string CatchAllCommand::process(SAbundVector* sabund, string file1) { try { map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(file1)); variables["[distance]"] = sabund->getLabel(); string filename = getOutputFileName("sabund", variables); filename = m->getFullPathName(filename); ofstream out; m->openOutputFile(filename, out); if (m->debug) { m->mothurOut("[DEBUG]: Creating " + filename + " file for catchall, shown below.\n\n"); } for (int i = 1; i <= sabund->getMaxRank(); i++) { int temp = sabund->get(i); if (temp != 0) { out << i << "," << temp << endl; if (m->debug) { m->mothurOut(toString(i) + "," + toString(temp) + "\n"); } } } out.close(); if (m->debug) { m->mothurOut("[DEBUG]: Done creating " + filename + " file for catchall, shown above.\n\n"); } return filename; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "process"); exit(1); } } //********************************************************************************************************************* string CatchAllCommand::combineSummmary(vector<string>& outputNames) { try { ofstream out; map<string, string> variables; variables["[filename]"] = savedOutputDir + m->getRootName(m->getSimpleName(sharedfile)); string combineFileName = getOutputFileName("summary", variables); //open combined file m->openOutputFile(combineFileName, out); out << "label\tgroup\tmodel\testimate\tlci\tuci" << endl; //open each groups summary file string newLabel = ""; int 
numLines = 0; map<string, vector<string> > files; for (int i=0; i<outputNames.size(); i++) { vector<string> thisFilesLines; ifstream temp; m->openInputFile(outputNames[i], temp); //read through first line - labels m->getline(temp); m->gobble(temp); //for each label while (!temp.eof()) { string thisLine = ""; string tempLabel; for (int j = 0; j < 5; j++) { temp >> tempLabel; //save for later if (j == 1) { thisLine += groups[i] + "\t" + tempLabel + "\t"; } else{ thisLine += tempLabel + "\t"; } } thisLine += "\n"; thisFilesLines.push_back(thisLine); m->gobble(temp); } files[outputNames[i]] = thisFilesLines; numLines = thisFilesLines.size(); temp.close(); m->mothurRemove(outputNames[i]); } //for each label for (int k = 0; k < numLines; k++) { //grab summary data for each group for (int i=0; i<outputNames.size(); i++) { out << files[outputNames[i]][k]; } } out.close(); return combineFileName; } catch(exception& e) { m->errorOut(e, "CatchAllCommand", "combineSummmary"); exit(1); } } //********************************************************************************************************************** int CatchAllCommand::createSummaryFile(string file1, string label, ofstream& out) { try { ifstream in; int able = m->openInputFile(file1, in, "noerror"); if (able == 1) { m->mothurOut("[ERROR]: the catchall program did not run properly. 
Please check to make sure it is located in the same folder as your mothur executable.");m->mothurOutEndLine(); m->control_pressed = true; return 0; }

		if (!in.eof()) {

			string header = m->getline(in); m->gobble(in);

			int pos = header.find("Total Number of Observed Species =");
			string numString = "";

			if (pos == string::npos) { m->mothurOut("[ERROR]: cannot parse " + file1); m->mothurOutEndLine(); }
			else {
				//pos will be the position of the T in total, so we want to count to the position of =
				pos += 34;
				char c=header[pos];
				while (c != ','){
					if (c != ' ') {
						numString += c;
					}
					pos++;
					c=header[pos];

					//sanity check
					if (pos > header.length()) { m->mothurOut("Cannot find number of OTUs in " + file1); m->mothurOutEndLine(); in.close(); return 0; }
				}
			}

			string firstline = m->getline(in); m->gobble(in);
			vector<string> values;
			m->splitAtComma(firstline, values);

			values.pop_back(); //last value is always a blank string since the last character in the line is always a ','

			if (values.size() == 1) { //grab next line if firstline didn't have what you wanted
				string secondline = m->getline(in); m->gobble(in);
				values.clear();
				m->splitAtComma(secondline, values);

				values.pop_back(); //last value is always a blank string since the last character in the line is always a ','
			}

			if (values.size() == 1) { //still not what we wanted fill values with numOTUs
				values.resize(8, "");
				values[1] = "Sobs";
				values[4] = numString;
				values[6] = numString;
				values[7] = numString;
			}

			if (values.size() < 8) { values.resize(8, ""); }

			out << label << '\t' << values[1] << '\t' << values[4] << '\t' << values[6] << '\t' << values[7] << endl;
		}

		in.close();

		return 0;

	}
	catch(exception& e) {
		m->errorOut(e, "CatchAllCommand", "createSummaryFile");
		exit(1);
	}
}
//**********************************************************************************************************************
vector<string> CatchAllCommand::parseSharedFile(string filename) {
	try {
		vector<string> filenames;

		//read first line
		InputData input(filename, "sharedfile");
		vector<SharedRAbundVector*> lookup
= input.getSharedRAbundVectors();

		string sharedFileRoot = outputDir + m->getRootName(m->getSimpleName(filename));

		//clears file before we start to write to it below
		for (int i=0; i<lookup.size(); i++) {
			m->mothurRemove((sharedFileRoot + lookup[i]->getGroup() + ".sabund"));
			filenames.push_back((sharedFileRoot + lookup[i]->getGroup() + ".sabund"));
			groups.push_back(lookup[i]->getGroup());
		}

		while(lookup[0] != NULL) {

			for (int i = 0; i < lookup.size(); i++) {
				SAbundVector sav = lookup[i]->getSAbundVector();
				ofstream out;
				m->openOutputFileAppend(sharedFileRoot + lookup[i]->getGroup() + ".sabund", out);
				sav.print(out);
				out.close();
			}

			for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
			lookup = input.getSharedRAbundVectors();
		}

		return filenames;
	}
	catch(exception& e) {
		m->errorOut(e, "CatchAllCommand", "parseSharedFile");
		exit(1);
	}
}
/**************************************************************************************/

mothur-1.36.1/source/commands/catchallcommand.h

#ifndef CATCHALLCOMMAND_H
#define CATCHALLCOMMAND_H

/*
 *  catchallcommand.h
 *  Mothur
 *
 *  Created by westcott on 5/11/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "inputdata.h"
#include "sabundvector.hpp"

/*
 citation goes here
 */

/****************************************************************************/

class CatchAllCommand : public Command {

public:

	CatchAllCommand(string);
	CatchAllCommand();
	~CatchAllCommand() {}

	vector<string> setParameters();
	string getCommandName()			{ return "catchall"; }
	string getCommandCategory()		{ return "OTU-Based Approaches"; }

	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Bunge J, Woodard L, Bohning D, Foster JA, Connolly S, Allen HK (2012). Estimating population diversity with CatchAll. 
Bioinformatics 28:1045.\nhttp://www.northeastern.edu/catchall/index.html\nhttp://www.mothur.org/wiki/Catchall"; }
	string getDescription()		{ return "estimate number of species"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:

	string outputDir, sharedfile, sabundfile, format, path, savedOutputDir;
	bool abort, allLines;
	set<string> labels;
	vector<string> outputNames;
	vector<string> groups;

	string process(SAbundVector*, string);
	int createSummaryFile(string, string, ofstream&);
	vector<string> parseSharedFile(string);
	string combineSummmary(vector<string>&);
};

/****************************************************************************/

#endif

mothur-1.36.1/source/commands/chimerabellerophoncommand.cpp

/*
 *  chimerabellerophoncommand.cpp
 *  Mothur
 *
 *  Created by westcott on 4/1/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "chimerabellerophoncommand.h"
#include "bellerophon.h"

//**********************************************************************************************************************
vector<string> ChimeraBellerophonCommand::setParameters(){
	try {
		CommandParameter pfasta("fasta", "InputTypes", "", "", "none","none","none","chimera-accnos",false,true,true); parameters.push_back(pfasta);
		CommandParameter pfilter("filter", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pfilter);
		CommandParameter pcorrection("correction", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pcorrection);
		CommandParameter pwindow("window", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pwindow);
		CommandParameter pincrement("increment", "Number", "", "25", "", "", "","",false,false); parameters.push_back(pincrement);
		CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false);
parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraBellerophonCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string ChimeraBellerophonCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The chimera.bellerophon command reads a fastafile and creates a list of potentially chimeric sequences.\n";
		helpString += "The chimera.bellerophon command parameters are fasta, filter, correction, processors, window, increment. The fasta parameter is required, unless you have a valid current file.\n";
		helpString += "The fasta parameter is required. You may enter multiple fasta files by separating their names with dashes. ie. fasta=abrecovery.fasta-amzon.fasta \n";
		helpString += "The filter parameter allows you to specify if you would like to apply a vertical and 50% soft filter, default=false. \n";
		helpString += "The correction parameter allows you to put more emphasis on the distance between highly similar sequences and less emphasis on the differences between remote homologs, default=true.\n";
		helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n";
		#ifdef USE_MPI
		helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. \n";
		#endif
		helpString += "The window parameter allows you to specify the window size for searching for chimeras, default is 1/4 sequence length. 
\n";
		helpString += "The increment parameter allows you to specify how far you move each window while finding chimeric sequences, default is 25.\n";
		helpString += "chimera.bellerophon(fasta=yourFastaFile, filter=yourFilter, correction=yourCorrection, processors=yourProcessors) \n";
		helpString += "Example: chimera.bellerophon(fasta=AD.align, filter=True, correction=true, window=200) \n";
		helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFastaFile).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraBellerophonCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string ChimeraBellerophonCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "chimera") { pattern = "[filename],bellerophon.chimeras"; }
		else if (type == "accnos") { pattern = "[filename],bellerophon.accnos"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraBellerophonCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
ChimeraBellerophonCommand::ChimeraBellerophonCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["chimera"] = tempOutNames;
		outputTypes["accnos"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraBellerophonCommand", "ChimeraBellerophonCommand");
		exit(1);
	}
}
//***************************************************************************************************************
ChimeraBellerophonCommand::ChimeraBellerophonCommand(string option) {
	try {
		abort = false; calledHelp = false;

		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option ==
"citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("chimera.bellerophon"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = 
m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } string temp; temp = validParameter.validFile(parameters, "filter", false); if (temp == "not found") { temp = "F"; } filter = m->isTrue(temp); temp = validParameter.validFile(parameters, "correction", false); if (temp == "not found") { temp = "T"; } correction = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "window", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, window); temp = validParameter.validFile(parameters, "increment", false); if (temp == "not found") { temp = "25"; } m->mothurConvert(temp, increment); } } catch(exception& e) { m->errorOut(e, "ChimeraBellerophonCommand", "ChimeraBellerophonCommand"); exit(1); } } //*************************************************************************************************************** int ChimeraBellerophonCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int i = 0; i < fastaFileNames.size(); i++) { m->mothurOut("Checking sequences from " + fastaFileNames[i] + " ..." 
); m->mothurOutEndLine();

			int start = time(NULL);

			chimera = new Bellerophon(fastaFileNames[i], filter, correction, window, increment, processors, outputDir);

			if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[i]); }//if user entered a file with a path then preserve it
			map<string, string> variables;
			variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[i]));
			string outputFileName = getOutputFileName("chimera", variables);
			string accnosFileName = getOutputFileName("accnos", variables);

			chimera->getChimeras();

			if (m->control_pressed) { delete chimera; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; }

		#ifdef USE_MPI
			MPI_File outMPI;
			MPI_File outMPIAccnos;

			int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY;

			char outFilename[1024];
			strcpy(outFilename, accnosFileName.c_str());

			char FileName[1024];
			strcpy(FileName, outputFileName.c_str());

			MPI_File_open(MPI_COMM_WORLD, FileName, outMode, MPI_INFO_NULL, &outMPI); //comm, filename, mode, info, filepointer
			MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPIAccnos);

			numSeqs = chimera->print(outMPI, outMPIAccnos, "");

			MPI_File_close(&outMPI);
			MPI_File_close(&outMPIAccnos);

		#else

			ofstream out;
			m->openOutputFile(outputFileName, out);

			ofstream out2;
			m->openOutputFile(accnosFileName, out2);

			numSeqs = chimera->print(out, out2, "");
			out.close();
			out2.close();

		#endif

			if (m->control_pressed) { m->mothurRemove(accnosFileName); m->mothurRemove(outputFileName); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); delete chimera; return 0; }

			m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine();

			outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName);
			outputNames.push_back(accnosFileName);
outputTypes["accnos"].push_back(accnosFileName);

			delete chimera;
		}

		//set accnos file as new current accnosfile
		string current = "";
		itTypes = outputTypes.find("accnos");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); }
		}

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		return 0;

	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraBellerophonCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************

mothur-1.36.1/source/commands/chimerabellerophoncommand.h

#ifndef CHIMERABELLEROPHONCOMMAND_H
#define CHIMERABELLEROPHONCOMMAND_H

/*
 *  chimerabellerophoncommand.h
 *  Mothur
 *
 *  Created by westcott on 4/1/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "chimera.h"

/***********************************************************/

class ChimeraBellerophonCommand : public Command {
public:
	ChimeraBellerophonCommand(string);
	ChimeraBellerophonCommand();
	~ChimeraBellerophonCommand(){}

	vector<string> setParameters();
	string getCommandName()			{ return "chimera.bellerophon"; }
	string getCommandCategory()		{ return "Sequence Processing"; }

	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Huber T, Faulkner G, Hugenholtz P (2004). Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics 20: 2317-9. 
\nhttp://www.mothur.org/wiki/Chimera.bellerophon"; }
	string getDescription()		{ return "detect chimeric sequences"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:

	bool abort, filter, correction;
	string fastafile, outputDir;
	int processors, window, increment, numSeqs;
	Chimera* chimera;
	vector<string> outputNames;
	vector<string> fastaFileNames;
};

/***********************************************************/

#endif

mothur-1.36.1/source/commands/chimeraccodecommand.cpp

/*
 *  chimeraccodecommand.cpp
 *  Mothur
 *
 *  Created by westcott on 3/30/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "chimeraccodecommand.h"
#include "ccode.h"
#include "referencedb.h"
//**********************************************************************************************************************
vector<string> ChimeraCcodeCommand::setParameters(){
	try {
		CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate);
		CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","chimera-mapinfo-accnos",false,true,true); parameters.push_back(pfasta);
		CommandParameter pfilter("filter", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pfilter);
		CommandParameter pwindow("window", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pwindow);
		CommandParameter pnumwanted("numwanted", "Number", "", "20", "", "", "","",false,false); parameters.push_back(pnumwanted);
		CommandParameter pmask("mask", "String", "", "", "", "", "","",false,false); parameters.push_back(pmask);
		CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "",
"","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter psave("save", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psave); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ChimeraCcodeCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ChimeraCcodeCommand::getHelpString(){ try { string helpString = ""; helpString += "The chimera.ccode command reads a fastafile and referencefile and outputs potentially chimeric sequences.\n"; helpString += "This command was created using the algorithms described in the 'Evaluating putative chimeric sequences from PCR-amplified products' paper by Juan M. Gonzalez, Johannes Zimmerman and Cesareo Saiz-Jimenez.\n"; helpString += "The chimera.ccode command parameters are fasta, reference, filter, mask, processors, window and numwanted.\n"; helpString += "The fasta parameter allows you to enter the fasta file containing your potentially chimeric sequences, and is required unless you have a valid current fasta file. \n"; helpString += "You may enter multiple fasta files by separating their names with dashes. ie. fasta=abrecovery.fasta-amzon.fasta \n"; helpString += "The reference parameter allows you to enter a reference file containing known non-chimeric sequences, and is required. \n"; helpString += "The filter parameter allows you to specify if you would like to apply a vertical and 50% soft filter. \n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n"; #ifdef USE_MPI helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. 
\n";
		#endif
		helpString += "The mask parameter allows you to specify a file containing one sequence you wish to use as a mask for your sequences. \n";
		helpString += "The window parameter allows you to specify the window size for searching for chimeras. \n";
		helpString += "The numwanted parameter allows you to specify how many sequences you would like each query sequence compared with.\n";
		helpString += "If the save parameter is set to true the reference sequences will be saved in memory, to clear them later you can use the clear.memory command. Default=f.\n";
		helpString += "The chimera.ccode command should be in the following format: \n";
		helpString += "chimera.ccode(fasta=yourFastaFile, reference=yourTemplate) \n";
		helpString += "Example: chimera.ccode(fasta=AD.align, reference=core_set_aligned.imputed.fasta) \n";
		helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFastaFile).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCcodeCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string ChimeraCcodeCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "chimera") { pattern = "[filename],[tag],ccode.chimeras-[filename],ccode.chimeras"; }
		else if (type == "accnos") { pattern = "[filename],[tag],ccode.accnos-[filename],ccode.accnos"; }
		else if (type == "mapinfo") { pattern = "[filename],mapinfo"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCcodeCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
ChimeraCcodeCommand::ChimeraCcodeCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
outputTypes["chimera"] = tempOutNames; outputTypes["mapinfo"] = tempOutNames; outputTypes["accnos"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ChimeraCcodeCommand", "ChimeraCcodeCommand"); exit(1); } } //*************************************************************************************************************** ChimeraCcodeCommand::ChimeraCcodeCommand(string option) { try { abort = false; calledHelp = false; ReferenceDB* rdb = ReferenceDB::getInstance(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("chimera.ccode"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["mapinfo"] = tempOutNames; outputTypes["accnos"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["reference"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } maskfile = validParameter.validFile(parameters, "mask", false); if (maskfile == "not found") { maskfile = ""; } else if (maskfile != "default") { if (inputDir != "") { string path = m->hasPath(maskfile); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { maskfile = inputDir + maskfile; } } ifstream in; int ableToOpen = m->openInputFile(maskfile, in); if (ableToOpen == 1) { abort = true; } in.close(); } string temp; temp = validParameter.validFile(parameters, "filter", false); if (temp == "not found") { temp = "F"; } filter = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "window", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, window); temp = validParameter.validFile(parameters, "numwanted", false); if (temp == "not found") { temp = "20"; } m->mothurConvert(temp, numwanted); temp = validParameter.validFile(parameters, "save", false); if (temp == "not found"){ temp = "f"; } save = m->isTrue(temp); rdb->save = save; if (save) { //clear out old references rdb->clearMemory(); } //this has to go after save so that if the user sets save=t and provides no reference we abort templatefile = validParameter.validFile(parameters, "reference", true); if (templatefile == "not found") { //check for saved reference sequences if (rdb->referenceSeqs.size() != 0) { templatefile = "saved"; }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is a required."); m->mothurOutEndLine(); abort = true; } }else if (templatefile == "not open") { abort = true; } else { if (save) { rdb->setSavedReference(templatefile); } } } } catch(exception& e) { m->errorOut(e, "ChimeraCcodeCommand", "ChimeraCcodeCommand"); exit(1); } } //*************************************************************************************************************** int ChimeraCcodeCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int s = 0; s < fastaFileNames.size(); s++) { m->mothurOut("Checking sequences from " + fastaFileNames[s] + " ..." 
); m->mothurOutEndLine(); int start = time(NULL); //set user options if (maskfile == "default") { m->mothurOut("I am using the default 236627 EU009184.1 Shigella dysenteriae str. FBD013."); m->mothurOutEndLine(); } chimera = new Ccode(fastaFileNames[s], templatefile, filter, maskfile, window, numwanted, outputDir); //is your template aligned? if (chimera->getUnaligned()) { m->mothurOut("Your template sequences are different lengths, please correct."); m->mothurOutEndLine(); delete chimera; return 0; } templateSeqsLength = chimera->getLength(); if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[s]); }//if user entered a file with a path then preserve it string outputFileName, accnosFileName; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); string mapInfo = getOutputFileName("mapinfo", variables); if (maskfile != "") { variables["[tag]"] = maskfile; } outputFileName = getOutputFileName("chimera", variables); accnosFileName = getOutputFileName("accnos", variables); if (m->control_pressed) { delete chimera; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); return 0; } #ifdef USE_MPI int pid, numSeqsPerProcessor; int tag = 2001; vector MPIPos; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_File inMPI; MPI_File outMPI; MPI_File outMPIAccnos; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outFilename[1024]; strcpy(outFilename, outputFileName.c_str()); char outAccnosFilename[1024]; strcpy(outAccnosFilename, accnosFileName.c_str()); char inFileName[1024]; strcpy(inFileName, fastaFileNames[s].c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPI); MPI_File_open(MPI_COMM_WORLD, outAccnosFilename, outMode, 
MPI_INFO_NULL, &outMPIAccnos);

			if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); delete chimera; return 0; }

			if (pid == 0) { //you are the root process
				string outTemp = "For full window mapping info refer to " + mapInfo + "\n";

				//print header
				int length = outTemp.length();
				char* buf2 = new char[length];
				memcpy(buf2, outTemp.c_str(), length);

				MPI_File_write_shared(outMPI, buf2, length, MPI_CHAR, &status);
				delete[] buf2; //allocated with new[], so it must be released with delete[]

				MPIPos = m->setFilePosFasta(fastaFileNames[s], numSeqs); //fills MPIPos, returns numSeqs

				//send file positions to all processes
				for(int i = 1; i < processors; i++) {
					MPI_Send(&numSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD);
					MPI_Send(&MPIPos[0], (numSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD);
				}

				//figure out how many sequences you have to check
				numSeqsPerProcessor = numSeqs / processors;
				int startIndex = pid * numSeqsPerProcessor;
				if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; }

				//check your part
				driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPI, outMPIAccnos, MPIPos);

				if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); delete chimera; return 0; }

			}else{ //you are a child process
				MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
				MPIPos.resize(numSeqs+1);
				MPI_Recv(&MPIPos[0], (numSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status);

				//figure out how many sequences you have to check
				numSeqsPerProcessor = numSeqs / processors;
				int startIndex = pid * numSeqsPerProcessor;
				if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; }

				//check your part
				driverMPI(startIndex, numSeqsPerProcessor,
inMPI, outMPI, outMPIAccnos, MPIPos);

				if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); delete chimera; return 0; }
			}

			//close files
			MPI_File_close(&inMPI);
			MPI_File_close(&outMPI);
			MPI_File_close(&outMPIAccnos);
			MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case

		#else
			ofstream outHeader;
			string tempHeader = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])) + maskfile + "ccode.chimeras.tempHeader";
			m->openOutputFile(tempHeader, outHeader);

			outHeader << "For full window mapping info refer to " << mapInfo << endl << endl;

			outHeader.close();

			//break up file
			#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
				vector<unsigned long long> positions = m->divideFile(fastaFileNames[s], processors);

				for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); }

				if(processors == 1){
					numSeqs = driver(lines[0], outputFileName, fastaFileNames[s], accnosFileName);

					if (m->control_pressed) { m->mothurRemove(outputFileName); m->mothurRemove(tempHeader); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } for (int i = 0; i < lines.size(); i++) { delete lines[i]; } outputTypes.clear(); lines.clear(); delete chimera; return 0; }

				}else{
					processIDS.resize(0);

					numSeqs = createProcesses(outputFileName, fastaFileNames[s], accnosFileName);

					rename((outputFileName + toString(processIDS[0]) + ".temp").c_str(), outputFileName.c_str());
					rename((accnosFileName + toString(processIDS[0]) + ".temp").c_str(), accnosFileName.c_str());

					//append output files
					for(int i=1;i<processors;i++){
						m->appendFiles((outputFileName + toString(processIDS[i]) + ".temp"), outputFileName);
						m->mothurRemove((outputFileName + toString(processIDS[i]) + ".temp"));
					}

					//append output files
					for(int
i=1;i<processors;i++){
						m->appendFiles((accnosFileName + toString(processIDS[i]) + ".temp"), accnosFileName);
						m->mothurRemove((accnosFileName + toString(processIDS[i]) + ".temp"));
					}

					if (m->control_pressed) { m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); delete chimera; return 0; }
				}

			#else
				lines.push_back(new linePair(0, 1000));
				numSeqs = driver(lines[0], outputFileName, fastaFileNames[s], accnosFileName);

				if (m->control_pressed) { m->mothurRemove(outputFileName); m->mothurRemove(tempHeader); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } for (int i = 0; i < lines.size(); i++) { delete lines[i]; } outputTypes.clear(); lines.clear(); delete chimera; return 0; }
			#endif

			m->appendFiles(outputFileName, tempHeader);

			m->mothurRemove(outputFileName);
			rename(tempHeader.c_str(), outputFileName.c_str());
		#endif

			delete chimera;

			outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName);
			outputNames.push_back(mapInfo); outputTypes["mapinfo"].push_back(mapInfo);
			outputNames.push_back(accnosFileName); outputTypes["accnos"].push_back(accnosFileName);

			for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear();

			m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences."); m->mothurOutEndLine();
		}

		//set accnos file as new current accnosfile
		string current = "";
		itTypes = outputTypes.find("accnos");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); }
		}

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraCcodeCommand", "execute"); exit(1); } } //********************************************************************************************************************** int ChimeraCcodeCommand::driver(linePair* filePos, string outputFName, string filename, string accnos){ try { ofstream out; m->openOutputFile(outputFName, out); ofstream out2; m->openOutputFile(accnos, out2); ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos->start); bool done = false; int count = 0; while (!done) { if (m->control_pressed) { return 1; } Sequence* candidateSeq = new Sequence(inFASTA); m->gobble(inFASTA); if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getAligned().length() != templateSeqsLength) { m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. Skipping."); m->mothurOutEndLine(); }else{ //find chimeras chimera->getChimeras(candidateSeq); if (m->control_pressed) { delete candidateSeq; return 1; } //print results chimera->print(out, out2); } count++; } delete candidateSeq; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos->end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); } out.close(); out2.close(); inFASTA.close(); return count; } catch(exception& e) { m->errorOut(e, "ChimeraCcodeCommand", "driver"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int ChimeraCcodeCommand::driverMPI(int start, int num, 
MPI_File& inMPI, MPI_File& outMPI, MPI_File& outAccMPI, vector<unsigned long long>& MPIPos){
	try {
		MPI_Status status;
		int pid;
		MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are

		for(int i=0;i<num;i++){

			if (m->control_pressed) { return 0; }

			//read next sequence
			int length = MPIPos[start+i+1] - MPIPos[start+i];

			char* buf4 = new char[length];
			MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status);

			//buf4 is not null-terminated, so build the string from an explicit length
			string tempBuf(buf4, length);
			istringstream iss (tempBuf,istringstream::in);
			delete[] buf4; //allocated with new[], so it must be released with delete[]

			Sequence* candidateSeq = new Sequence(iss); m->gobble(iss);

			if (candidateSeq->getName() != "") { //in case there is a commented sequence at the end of a file

				if (candidateSeq->getAligned().length() != templateSeqsLength) {
					m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. Skipping."); m->mothurOutEndLine();
				}else{
					//find chimeras
					chimera->getChimeras(candidateSeq);

					if (m->control_pressed) { delete candidateSeq; return 1; }

					//print results
					chimera->print(outMPI, outAccMPI);
				}
			}
			delete candidateSeq;

			//report progress
			if((i+1) % 100 == 0){ cout << "Processing sequence: " << (i+1) << endl; m->mothurOutJustToLog("Processing sequence: " + toString(i+1) + "\n"); }
		}
		//report progress
		if(num % 100 != 0){ cout << "Processing sequence: " << num << endl; m->mothurOutJustToLog("Processing sequence: " + toString(num) + "\n"); }

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCcodeCommand", "driverMPI");
		exit(1);
	}
}
#endif

/**************************************************************************************************/

int ChimeraCcodeCommand::createProcesses(string outputFileName, string filename, string accnos) {
	try {
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
		int process = 0;
		int num = 0;
		bool recalc = false;

		//loop through and create all the processes you want
		while (process != processors) {
			pid_t pid = fork();

			if (pid > 0) {
processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename, accnos + toString(m->mothurGetpid(process)) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector positions; positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 0; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename, accnos + toString(m->mothurGetpid(process)) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); 
}else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); } return num; #endif } catch(exception& e) { m->errorOut(e, "ChimeraCcodeCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/chimeraccodecommand.h000066400000000000000000000031401255543666200223530ustar00rootroot00000000000000#ifndef CHIMERACCODECOMMAND_H #define CHIMERACCODECOMMAND_H /* * chimeraccodecommand.h * Mothur * * Created by westcott on 3/30/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "command.hpp" #include "chimera.h" /***********************************************************/ class ChimeraCcodeCommand : public Command { public: ChimeraCcodeCommand(string); ChimeraCcodeCommand(); ~ChimeraCcodeCommand(){} vector setParameters(); string getCommandName() { return "chimera.ccode"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Gonzalez JM, Zimmermann J, Saiz-Jimenez C (2005). Evaluating putative chimeric sequences from PCR-amplified products. Bioinformatics 21: 333-7. 
\nhttp://www.mothur.org/wiki/Chimera.ccode"; } string getDescription() { return "detect chimeric sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector processIDS; //processid vector lines; int driver(linePair*, string, string, string); int createProcesses(string, string, string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, MPI_File&, MPI_File&, vector&); #endif bool abort, filter, save; string fastafile, templatefile, outputDir, maskfile; int processors, window, numwanted, numSeqs, templateSeqsLength; Chimera* chimera; vector fastaFileNames; vector outputNames; }; /***********************************************************/ #endif mothur-1.36.1/source/commands/chimeracheckcommand.cpp000066400000000000000000000717021255543666200227170ustar00rootroot00000000000000/* * chimeracheckcommand.cpp * Mothur * * Created by westcott on 3/31/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "chimeracheckcommand.h" #include "referencedb.h" //********************************************************************************************************************** vector ChimeraCheckCommand::setParameters(){ try { CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","chimera",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter psvg("svg", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psvg); CommandParameter pincrement("increment", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pincrement); CommandParameter pksize("ksize", "Number", "", "7", "", "", "","",false,false); parameters.push_back(pksize); CommandParameter pprocessors("processors", "Number", "", "1", "", "", 
"","",false,false,true); parameters.push_back(pprocessors); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter psave("save", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psave); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ChimeraCheckCommand::getHelpString(){ try { string helpString = ""; helpString += "The chimera.check command reads a fastafile and referencefile and outputs potentially chimeric sequences.\n"; helpString += "This command was created using the algorithms described in CHIMERA_CHECK version 2.7 written by Niels Larsen. \n"; helpString += "The chimera.check command parameters are fasta, reference, processors, ksize, increment, svg and name.\n"; helpString += "The fasta parameter allows you to enter the fasta file containing your potentially chimeric sequences, and is required unless you have a valid current fasta file. \n"; helpString += "You may enter multiple fasta files by separating their names with dashes. ie. fasta=abrecovery.fasta-amzon.fasta \n"; helpString += "The reference parameter allows you to enter a reference file containing known non-chimeric sequences, and is required. \n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n"; #ifdef USE_MPI helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. 
\n"; #endif helpString += "The increment parameter allows you to specify how far you move each window while finding chimeric sequences, default is 10.\n"; helpString += "The ksize parameter allows you to input kmersize, default is 7. \n"; helpString += "The svg parameter allows you to specify whether or not you would like a svg file outputted for each query sequence, default is False.\n"; helpString += "The name parameter allows you to enter a file containing names of sequences you would like .svg files for.\n"; helpString += "You may enter multiple name files by separating their names with dashes. ie. fasta=abrecovery.svg.names-amzon.svg.names \n"; helpString += "If the save parameter is set to true the reference sequences will be saved in memory, to clear them later you can use the clear.memory command. Default=f."; helpString += "The chimera.check command should be in the following format: \n"; helpString += "chimera.check(fasta=yourFastaFile, reference=yourTemplateFile, processors=yourProcessors, ksize=yourKmerSize) \n"; helpString += "Example: chimera.check(fasta=AD.fasta, reference=core_set_aligned,imputed.fasta, processors=4, ksize=8) \n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ChimeraCheckCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "chimera") { pattern = "[filename],chimeracheck.chimeras"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ChimeraCheckCommand::ChimeraCheckCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["chimera"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "ChimeraCheckCommand"); exit(1); } } //*************************************************************************************************************** ChimeraCheckCommand::ChimeraCheckCommand(string option) { try { abort = false; calledHelp = false; ReferenceDB* rdb = ReferenceDB::getInstance(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("chimera.check"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["chimera"] = tempOutNames; //if the user changes the input directory command factory will 
send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ string path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["reference"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] +". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } namefile = validParameter.validFile(parameters, "name", false); if (namefile == "not found") { namefile = ""; } else { m->splitAtDash(namefile, nameFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < nameFileNames.size(); i++) { bool ignore = false; if (nameFileNames[i] == "current") { nameFileNames[i] = m->getNameFile(); if (nameFileNames[i] != "") { m->mothurOut("Using " + nameFileNames[i] + " as input file for the name parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current namefile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(nameFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { nameFileNames[i] = inputDir + nameFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(nameFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + nameFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; }else { m->setNameFile(nameFileNames[i]); } } } //make sure there is at least one valid file left if (nameFileNames.size() != 0) { if (nameFileNames.size() != fastaFileNames.size()) { m->mothurOut("Different number of valid name files and fasta files, aborting command."); m->mothurOutEndLine(); abort = true; } } } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "save", false); if (temp == "not found"){ temp = "f"; } save = m->isTrue(temp); rdb->save = save; if (save) { //clear out old references rdb->clearMemory(); } //this has to go after save so that if the user sets save=t and provides no reference we abort templatefile = validParameter.validFile(parameters, "reference", true); if (templatefile == "not found") { //check for saved reference sequences if (rdb->referenceSeqs.size() != 0) { templatefile = "saved"; }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is a required."); m->mothurOutEndLine(); abort = true; } }else if (templatefile 
== "not open") { abort = true; } else { if (save) { rdb->setSavedReference(templatefile); } } temp = validParameter.validFile(parameters, "ksize", false); if (temp == "not found") { temp = "7"; } m->mothurConvert(temp, ksize); temp = validParameter.validFile(parameters, "svg", false); if (temp == "not found") { temp = "F"; } svg = m->isTrue(temp); if (nameFileNames.size() != 0) { svg = true; } temp = validParameter.validFile(parameters, "increment", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, increment); } } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "ChimeraCheckCommand"); exit(1); } } //*************************************************************************************************************** int ChimeraCheckCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int i = 0; i < fastaFileNames.size(); i++) { m->mothurOut("Checking sequences from " + fastaFileNames[i] + " ..." ); m->mothurOutEndLine(); int start = time(NULL); string thisNameFile = ""; if (nameFileNames.size() != 0) { thisNameFile = nameFileNames[i]; } chimera = new ChimeraCheckRDP(fastaFileNames[i], templatefile, thisNameFile, svg, increment, ksize, outputDir); if (m->control_pressed) { delete chimera; return 0; } if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[i]); }//if user entered a file with a path then preserve it map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[i])); string outputFileName = getOutputFileName("chimera", variables); outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName); #ifdef USE_MPI int pid, numSeqsPerProcessor; int tag = 2001; vector MPIPos; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_File inMPI; MPI_File outMPI; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outFilename[1024]; 
strcpy(outFilename, outputFileName.c_str()); char inFileName[1024]; strcpy(inFileName, fastaFileNames[i].c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPI); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); delete chimera; return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(fastaFileNames[i], numSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int j = 1; j < processors; j++) { MPI_Send(&numSeqs, 1, MPI_INT, j, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (numSeqs+1), MPI_LONG, j, tag, MPI_COMM_WORLD); } //figure out how many sequences you have to align numSeqsPerProcessor = numSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPI, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); delete chimera; return 0; } //wait on chidren for(int j = 1; j < processors; j++) { char buf[5]; MPI_Recv(buf, 5, MPI_CHAR, j, tag, MPI_COMM_WORLD, &status); } }else{ //you are a child process MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(numSeqs+1); MPI_Recv(&MPIPos[0], (numSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = numSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, 
outMPI, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); delete chimera; return 0; } //tell parent you are done. char buf[5]; strcpy(buf, "done"); MPI_Send(buf, 5, MPI_CHAR, 0, tag, MPI_COMM_WORLD); } //close files MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else //break up file #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) vector positions = m->divideFile(fastaFileNames[i], processors); for (int s = 0; s < (positions.size()-1); s++) { lines.push_back(new linePair(positions[s], positions[(s+1)])); } if(processors == 1){ numSeqs = driver(lines[0], outputFileName, fastaFileNames[i]); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } for (int j = 0; j < lines.size(); j++) { delete lines[j]; } outputTypes.clear(); lines.clear(); delete chimera; return 0; } }else{ processIDS.resize(0); numSeqs = createProcesses(outputFileName, fastaFileNames[i]); rename((outputFileName + toString(processIDS[0]) + ".temp").c_str(), outputFileName.c_str()); //append output files for(int j=1;jappendFiles((outputFileName + toString(processIDS[j]) + ".temp"), outputFileName); m->mothurRemove((outputFileName + toString(processIDS[j]) + ".temp")); } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); for (int j = 0; j < lines.size(); j++) { delete lines[j]; } lines.clear(); delete chimera; return 0; } } #else lines.push_back(new linePair(0, 1000)); numSeqs = driver(lines[0], outputFileName, fastaFileNames[i]); if (m->control_pressed) { for (int j = 0; j < lines.size(); j++) { delete lines[j]; } lines.clear(); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } 
outputTypes.clear(); delete chimera; return 0; } #endif #endif delete chimera; for (int j = 0; j < lines.size(); j++) { delete lines[j]; } lines.clear(); m->mothurOutEndLine(); m->mothurOut("This method does not determine if a sequence is chimeric, but allows you to make that determination based on the IS values."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "execute"); exit(1); } } //********************************************************************************************************************** int ChimeraCheckCommand::driver(linePair* filePos, string outputFName, string filename){ try { ofstream out; m->openOutputFile(outputFName, out); ofstream out2; ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos->start); bool done = false; int count = 0; while (!done) { if (m->control_pressed) { return 1; } Sequence* candidateSeq = new Sequence(inFASTA); m->gobble(inFASTA); if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file //find chimeras chimera->getChimeras(candidateSeq); if (m->control_pressed) { delete candidateSeq; return 1; } //print results chimera->print(out, out2); count++; } delete candidateSeq; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos->end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); } } 
		//report progress
		if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); }

		out.close();
		inFASTA.close();

		return count;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCheckCommand", "driver");
		exit(1);
	}
}
//**********************************************************************************************************************
#ifdef USE_MPI
int ChimeraCheckCommand::driverMPI(int start, int num, MPI_File& inMPI, MPI_File& outMPI, vector<unsigned long long>& MPIPos){
	try {
		MPI_File outAccMPI;
		MPI_Status status;
		int pid;
		MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are

		for(int i=0;i<num;i++){

			if (m->control_pressed) { return 0; }

			//read next sequence
			int length = MPIPos[start+i+1] - MPIPos[start+i];

			char* buf4 = new char[length];
			MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status);

			string tempBuf = buf4;
			if (tempBuf.length() > length) { tempBuf = tempBuf.substr(0, length); }
			istringstream iss (tempBuf,istringstream::in);
			delete[] buf4; //buf4 was allocated with new[], so it must be released with delete[]

			Sequence* candidateSeq = new Sequence(iss);
			m->gobble(iss);

			if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file
				//find chimeras
				chimera->getChimeras(candidateSeq);

				//print results
				chimera->print(outMPI, outAccMPI);
			}
			delete candidateSeq;

			//report progress
			if((i+1) % 100 == 0){ cout << "Processing sequence: " << (i+1) << endl; }
		}
		//report progress
		if(num % 100 != 0){ cout << "Processing sequence: " << num << endl; }

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraCheckCommand", "driverMPI");
		exit(1);
	}
}
#endif
/**************************************************************************************************/
int ChimeraCheckCommand::createProcesses(string outputFileName, string filename) {
	try {
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
		int process = 0;
		int num = 0;
		bool recalc = false;

		//loop through and create all the processes you want
		while (process != processors) {
			pid_t pid = 
fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector positions = m->divideFile(filename, processors); for (int s = 0; s < (positions.size()-1); s++) { lines.push_back(new linePair(positions[s], positions[(s+1)])); } num = 0; processIDS.resize(0); process = 0; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); 
for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); } return num; #endif } catch(exception& e) { m->errorOut(e, "ChimeraCheckCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/commands/chimeracheckcommand.h000066400000000000000000000031221255543666200223530ustar00rootroot00000000000000#ifndef CHIMERACHECKCOMMAND_H #define CHIMERACHECKCOMMAND_H /* * chimeracheckcommand.h * Mothur * * Created by westcott on 3/31/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "command.hpp" #include "chimera.h" #include "chimeracheckrdp.h" /***********************************************************/ class ChimeraCheckCommand : public Command { public: ChimeraCheckCommand(string); ChimeraCheckCommand(); ~ChimeraCheckCommand(){} vector setParameters(); string getCommandName() { return "chimera.check"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "CHIMERA_CHECK version 2.7 written by Niels Larsen (http://wdcm.nig.ac.jp/RDP/docs/chimera_doc.html) \nhttp://www.mothur.org/wiki/Chimera.check"; } string getDescription() { return "detect chimeric sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector processIDS; //processid vector lines; int driver(linePair*, string, string); int createProcesses(string, string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, MPI_File&, vector&); #endif bool abort, svg, save; string fastafile, templatefile, namefile, outputDir; int processors, increment, ksize, numSeqs, templateSeqsLength; Chimera* chimera; vector 
fastaFileNames; vector nameFileNames; vector outputNames; }; /***********************************************************/ #endif mothur-1.36.1/source/commands/chimeraperseuscommand.cpp000066400000000000000000001760541255543666200233360ustar00rootroot00000000000000/* * chimeraperseuscommand.cpp * Mothur * * Created by westcott on 10/26/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "chimeraperseuscommand.h" #include "deconvolutecommand.h" #include "sequence.hpp" #include "counttable.h" #include "sequencecountparser.h" //********************************************************************************************************************** vector ChimeraPerseusCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","chimera-accnos",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "NameCount", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "NameCount", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pdups("dereplicate", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pdups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter pcutoff("cutoff", "Number", "", "0.5", "", "", "","",false,false); parameters.push_back(pcutoff); 
		CommandParameter palpha("alpha", "Number", "", "-5.54", "", "", "","",false,false); parameters.push_back(palpha);
		CommandParameter pbeta("beta", "Number", "", "0.33", "", "", "","",false,false); parameters.push_back(pbeta);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraPerseusCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string ChimeraPerseusCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The chimera.perseus command reads a fastafile and namefile or countfile and outputs potentially chimeric sequences.\n";
		helpString += "The chimera.perseus command parameters are fasta, name, count, group, cutoff, processors, dereplicate, alpha and beta.\n";
		helpString += "The fasta parameter allows you to enter the fasta file containing your potentially chimeric sequences, and is required, unless you have a valid current fasta file. \n";
		helpString += "The name parameter allows you to provide a name file associated with your fasta file.\n";
		helpString += "The count parameter allows you to provide a count file associated with your fasta file. A count or name file is required. When you use a count file with group info and dereplicate=T, mothur will create a *.pick.count_table file containing sequences after chimeras are removed.\n";
		helpString += "You may enter multiple fasta files by separating their names with dashes, e.g. fasta=abrecovery.fasta-amazon.fasta \n";
		helpString += "The group parameter allows you to provide a group file. When checking sequences, only sequences from the same group as the query sequence will be used as the reference. \n";
		helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. 
\n";
		helpString += "If the dereplicate parameter is false, then if one group finds the sequence to be chimeric, then all groups find it to be chimeric, default=f.\n";
		helpString += "The alpha parameter .... The default is -5.54. \n";
		helpString += "The beta parameter .... The default is 0.33. \n";
		helpString += "The cutoff parameter .... The default is 0.50. \n";
		helpString += "The chimera.perseus command should be in the following format: \n";
		helpString += "chimera.perseus(fasta=yourFastaFile, name=yourNameFile) \n";
		helpString += "Example: chimera.perseus(fasta=AD.align, name=AD.names) \n";
		helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFastaFile).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraPerseusCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string ChimeraPerseusCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "chimera") { pattern = "[filename],perseus.chimeras"; }
		else if (type == "accnos") { pattern = "[filename],perseus.accnos"; }
		else if (type == "count") { pattern = "[filename],perseus.pick.count_table"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraPerseusCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
ChimeraPerseusCommand::ChimeraPerseusCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["chimera"] = tempOutNames;
		outputTypes["accnos"] = tempOutNames;
		outputTypes["count"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraPerseusCommand", "ChimeraPerseusCommand");
		exit(1);
	}
}
//*************************************************************************************************************** ChimeraPerseusCommand::ChimeraPerseusCommand(string option) { try { abort = false; calledHelp = false; hasCount = false; hasName = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("chimera.perseus"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { 
m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("[ERROR]: no valid files."); m->mothurOutEndLine(); abort = true; } } //check for required parameters namefile = validParameter.validFile(parameters, "name", false); if (namefile == "not found") { namefile = ""; } else { m->splitAtDash(namefile, nameFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < nameFileNames.size(); i++) { bool ignore = false; if (nameFileNames[i] == "current") { nameFileNames[i] = m->getNameFile(); if (nameFileNames[i] != "") { m->mothurOut("Using " + nameFileNames[i] + " as input file for the name parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current namefile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(nameFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { nameFileNames[i] = inputDir + nameFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(nameFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + nameFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; }else { m->setNameFile(nameFileNames[i]); } } } } if (nameFileNames.size() != 0) { hasName = true; } //check for required parameters vector countfileNames; countfile = validParameter.validFile(parameters, "count", false); if (countfile == "not found") { countfile = ""; }else { m->splitAtDash(countfile, countfileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < countfileNames.size(); i++) { bool ignore = false; if (countfileNames[i] == "current") { countfileNames[i] = m->getCountTableFile(); if (countfileNames[i] != "") { m->mothurOut("Using " + countfileNames[i] + " as input file for the count parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current count file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(countfileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { countfileNames[i] = inputDir + countfileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(countfileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + countfileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; }else { m->setCountTableFile(countfileNames[i]); } } } } if (countfileNames.size() != 0) { hasCount = true; } //make sure there is at least one valid file left if (hasName && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if (!hasName && !hasCount) { //if there is a current name file, use it, else look for current count file string filename = m->getNameFile(); if (filename != "") { hasName = true; nameFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the name parameter."); m->mothurOutEndLine(); } else { filename = m->getCountTableFile(); if (filename != "") { hasCount = true; countfileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("[ERROR]: You must provide a count or name file."); m->mothurOutEndLine(); abort = true; } } } if (!hasName && hasCount) { nameFileNames = countfileNames; } if (nameFileNames.size() != fastaFileNames.size()) { m->mothurOut("[ERROR]: The number of name or count files does not match the number of fastafiles, please correct."); m->mothurOutEndLine(); abort=true; } bool hasGroup = true; groupfile = validParameter.validFile(parameters, "group", false); if (groupfile == "not found") { groupfile = ""; hasGroup = false; } else { m->splitAtDash(groupfile, groupFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < groupFileNames.size(); i++) { bool ignore = false; if (groupFileNames[i] == "current") { groupFileNames[i] = m->getGroupFile(); if (groupFileNames[i] != "") { m->mothurOut("Using " + groupFileNames[i] + " as input file for the group parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current 
groupfile, ignoring current."); m->mothurOutEndLine(); ignore=true;
							//erase from file list
							groupFileNames.erase(groupFileNames.begin()+i);
							i--;
						}
					}

					if (!ignore) {

						if (inputDir != "") {
							string path = m->hasPath(groupFileNames[i]);
							//if the user has not given a path then, add inputdir. else leave path alone.
							if (path == "") { groupFileNames[i] = inputDir + groupFileNames[i]; }
						}

						int ableToOpen;
						ifstream in;

						ableToOpen = m->openInputFile(groupFileNames[i], in, "noerror");

						//if you can't open it, try default location
						if (ableToOpen == 1) {
							if (m->getDefaultPath() != "") { //default path is set
								string tryPath = m->getDefaultPath() + m->getSimpleName(groupFileNames[i]);
								m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine();
								ifstream in2;
								ableToOpen = m->openInputFile(tryPath, in2, "noerror");
								in2.close();
								groupFileNames[i] = tryPath;
							}
						}

						//if you still can't open it, try the output directory
						if (ableToOpen == 1) {
							if (m->getOutputDir() != "") { //output directory is set
								string tryPath = m->getOutputDir() + m->getSimpleName(groupFileNames[i]);
								m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine();
								ifstream in2;
								ableToOpen = m->openInputFile(tryPath, in2, "noerror");
								in2.close();
								groupFileNames[i] = tryPath;
							}
						}

						in.close();

						if (ableToOpen == 1) {
							m->mothurOut("Unable to open " + groupFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine();
							//erase from file list
							groupFileNames.erase(groupFileNames.begin()+i);
							i--;
						}else {
							m->setGroupFile(groupFileNames[i]);
						}
					}
				}

				//make sure there is at least one valid file left
				if (groupFileNames.size() == 0) { m->mothurOut("[ERROR]: no valid group files."); m->mothurOutEndLine(); abort = true; }
			}

			if (hasGroup && (groupFileNames.size() != fastaFileNames.size())) { m->mothurOut("[ERROR]: The number of groupfiles does not match the number of fastafiles, please correct."); m->mothurOutEndLine(); abort=true; }

			if (hasGroup && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or group."); m->mothurOutEndLine(); abort = true; }

			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; }

			string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); }
			m->setProcessors(temp);
			m->mothurConvert(temp, processors);

			temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found"){ temp = "0.50"; }
			m->mothurConvert(temp, cutoff);

			temp = validParameter.validFile(parameters, "alpha", false); if (temp == "not found"){ temp = "-5.54"; }
			m->mothurConvert(temp, alpha);

			//read the beta parameter (previously this looked up "cutoff" again, so a user-supplied beta was ignored)
			temp = validParameter.validFile(parameters, "beta", false); if (temp == "not found"){ temp = "0.33"; }
			m->mothurConvert(temp, beta);

			temp = validParameter.validFile(parameters, "dereplicate", false);
			if (temp == "not found") { temp = "false"; }
			dups = m->isTrue(temp);
		}
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraPerseusCommand", "ChimeraPerseusCommand");
		exit(1);
	}
}
//***************************************************************************************************************
int ChimeraPerseusCommand::execute(){
	try{
		if (abort == true) { if (calledHelp) { return 0; } return 2; }

//process each file for (int s = 0; s < fastaFileNames.size(); s++) { m->mothurOut("Checking sequences from " + fastaFileNames[s] + " ..." ); m->mothurOutEndLine(); int start = time(NULL); if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[s]); }//if user entered a file with a path then preserve it map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); string outputFileName = getOutputFileName("chimera", variables); string accnosFileName = getOutputFileName("accnos", variables); string newCountFile = ""; //string newFasta = m->getRootName(fastaFileNames[s]) + "temp"; //you provided a groupfile string groupFile = ""; if (groupFileNames.size() != 0) { groupFile = groupFileNames[s]; } string nameFile = ""; if (nameFileNames.size() != 0) { //you provided a namefile and we don't need to create one nameFile = nameFileNames[s]; }else { nameFile = getNamesFile(fastaFileNames[s]); } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } int numSeqs = 0; int numChimeras = 0; if (hasCount) { CountTable* ct = new CountTable(); ct->readTable(nameFile, true, false); if (ct->hasGroupInfo()) { cparser = new SequenceCountParser(fastaFileNames[s], *ct); variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(nameFile)); newCountFile = getOutputFileName("count", variables); vector groups = cparser->getNamesOfGroups(); if (m->control_pressed) { delete ct; delete cparser; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } //clears files ofstream out, out1, out2; m->openOutputFile(outputFileName, out); out.close(); m->openOutputFile(accnosFileName, out1); out1.close(); if(processors == 1) { numSeqs = driverGroups(outputFileName, accnosFileName, newCountFile, 0, groups.size(), groups); if (dups) { CountTable c; c.readTable(nameFile, true, false); if (!m->isBlank(newCountFile)) { ifstream in2; 
m->openInputFile(newCountFile, in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); c.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(newCountFile); c.printTable(newCountFile); } } else { numSeqs = createProcessesGroups(outputFileName, accnosFileName, newCountFile, groups, groupFile, fastaFileNames[s], nameFile); } if (m->control_pressed) { delete ct; delete cparser; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } map uniqueNames = cparser->getAllSeqsMap(); if (!dups) { numChimeras = deconvoluteResults(uniqueNames, outputFileName, accnosFileName); }else { set doNotRemove; CountTable c; c.readTable(newCountFile, true, true); vector namesInTable = c.getNamesOfSeqs(); for (int i = 0; i < namesInTable.size(); i++) { int temp = c.getNumSeqs(namesInTable[i]); if (temp == 0) { c.remove(namesInTable[i]); } else { doNotRemove.insert((namesInTable[i])); } } //remove names we want to keep from accnos file. set accnosNames = m->readAccnos(accnosFileName); ofstream out2; m->openOutputFile(accnosFileName, out2); for (set::iterator it = accnosNames.begin(); it != accnosNames.end(); it++) { if (doNotRemove.count(*it) == 0) { out2 << (*it) << endl; } } out2.close(); c.printTable(newCountFile); outputNames.push_back(newCountFile); outputTypes["count"].push_back(newCountFile); } delete cparser; m->mothurOut("The number of sequences checked may be larger than the number of unique sequences because some sequences are found in several samples."); m->mothurOutEndLine(); if (m->control_pressed) { delete ct; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } }else { if (processors != 1) { m->mothurOut("Your count file does not contain group information, mothur can only use 1 processor, continuing."); m->mothurOutEndLine(); processors = 1; } //read sequences and store sorted by frequency vector sequences = readFiles(fastaFileNames[s], ct); if 
(m->control_pressed) { delete ct; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } numSeqs = driver(outputFileName, sequences, accnosFileName, numChimeras); } delete ct; }else { if (groupFile != "") { //Parse sequences by group parser = new SequenceParser(groupFile, fastaFileNames[s], nameFile); vector groups = parser->getNamesOfGroups(); if (m->control_pressed) { delete parser; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } //clears files ofstream out, out1, out2; m->openOutputFile(outputFileName, out); out.close(); m->openOutputFile(accnosFileName, out1); out1.close(); if(processors == 1) { numSeqs = driverGroups(outputFileName, accnosFileName, "", 0, groups.size(), groups); } else { numSeqs = createProcessesGroups(outputFileName, accnosFileName, "", groups, groupFile, fastaFileNames[s], nameFile); } if (m->control_pressed) { delete parser; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } map uniqueNames = parser->getAllSeqsMap(); if (!dups) { numChimeras = deconvoluteResults(uniqueNames, outputFileName, accnosFileName); } delete parser; m->mothurOut("The number of sequences checked may be larger than the number of unique sequences because some sequences are found in several samples."); m->mothurOutEndLine(); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } }else{ if (processors != 1) { m->mothurOut("Without a groupfile, mothur can only use 1 processor, continuing."); m->mothurOutEndLine(); processors = 1; } //read sequences and store sorted by frequency vector sequences = readFiles(fastaFileNames[s], nameFile); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } numSeqs = driver(outputFileName, sequences, accnosFileName, numChimeras); } } if (m->control_pressed) { for (int j = 0; j < 
outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences. " + toString(numChimeras) + " chimeras were found."); m->mothurOutEndLine(); outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName); outputNames.push_back(accnosFileName); outputTypes["accnos"].push_back(accnosFileName); } //set accnos file as new current accnosfile string current = ""; itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "execute"); exit(1); } } //********************************************************************************************************************** string ChimeraPerseusCommand::getNamesFile(string& inputFile){ try { string nameFile = ""; m->mothurOutEndLine(); m->mothurOut("No namesfile given, running unique.seqs command to generate one."); m->mothurOutEndLine(); m->mothurOutEndLine(); //use unique.seqs to create new name and fastafile string inputString = "fasta=" + inputFile; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: unique.seqs(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* uniqueCommand = new DeconvoluteCommand(inputString); uniqueCommand->execute(); map > filenames = uniqueCommand->getOutputFiles(); delete 
uniqueCommand; m->mothurCalling = false; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); nameFile = filenames["name"][0]; inputFile = filenames["fasta"][0]; return nameFile; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "getNamesFile"); exit(1); } } //********************************************************************************************************************** int ChimeraPerseusCommand::driverGroups(string outputFName, string accnos, string countlist, int start, int end, vector groups){ try { int totalSeqs = 0; int numChimeras = 0; ofstream outCountList; if (hasCount && dups) { m->openOutputFile(countlist, outCountList); } for (int i = start; i < end; i++) { m->mothurOutEndLine(); m->mothurOut("Checking sequences from group " + groups[i] + "..."); m->mothurOutEndLine(); int start = time(NULL); if (m->control_pressed) { return 0; } vector sequences = loadSequences(groups[i]); if (m->control_pressed) { return 0; } int numSeqs = driver((outputFName + groups[i]), sequences, (accnos+groups[i]), numChimeras); totalSeqs += numSeqs; if (m->control_pressed) { return 0; } if (dups) { if (!m->isBlank(accnos+groups[i])) { ifstream in; m->openInputFile(accnos+groups[i], in); string name; if (hasCount) { while (!in.eof()) { in >> name; m->gobble(in); outCountList << name << '\t' << groups[i] << endl; } in.close(); }else { map thisnamemap = parser->getNameMap(groups[i]); map::iterator itN; ofstream out; m->openOutputFile(accnos+groups[i]+".temp", out); while (!in.eof()) { in >> name; m->gobble(in); itN = thisnamemap.find(name); if (itN != thisnamemap.end()) { vector tempNames; m->splitAtComma(itN->second, tempNames); for (int j = 0; j < tempNames.size(); j++) { out << tempNames[j] << endl; } }else { m->mothurOut("[ERROR]: parsing cannot find " + name + ".\n"); m->control_pressed = true; } } out.close(); in.close(); m->renameFile(accnos+groups[i]+".temp", accnos+groups[i]); } } } //append files 
m->appendFiles((outputFName+groups[i]), outputFName); m->mothurRemove((outputFName+groups[i])); m->appendFiles((accnos+groups[i]), accnos); m->mothurRemove((accnos+groups[i])); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences from group " + groups[i] + "."); m->mothurOutEndLine(); } if (hasCount && dups) { outCountList.close(); } return totalSeqs; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "driverGroups"); exit(1); } } //********************************************************************************************************************** vector ChimeraPerseusCommand::loadSequences(string group){ try { bool error = false; alignLength = 0; vector sequences; if (hasCount) { vector thisGroupsSeqs = cparser->getSeqs(group); map counts = cparser->getCountTable(group); map::iterator it; for (int i = 0; i < thisGroupsSeqs.size(); i++) { if (m->control_pressed) { return sequences; } it = counts.find(thisGroupsSeqs[i].getName()); if (it == counts.end()) { error = true; m->mothurOut("[ERROR]: " + thisGroupsSeqs[i].getName() + " is in your fasta file and not in your count file, please correct."); m->mothurOutEndLine(); } else { thisGroupsSeqs[i].setAligned(removeNs(thisGroupsSeqs[i].getUnaligned())); sequences.push_back(seqData(thisGroupsSeqs[i].getName(), thisGroupsSeqs[i].getUnaligned(), it->second)); if (thisGroupsSeqs[i].getUnaligned().length() > alignLength) { alignLength = thisGroupsSeqs[i].getUnaligned().length(); } } } }else{ vector thisGroupsSeqs = parser->getSeqs(group); map nameMap = parser->getNameMap(group); map::iterator it; for (int i = 0; i < thisGroupsSeqs.size(); i++) { if (m->control_pressed) { return sequences; } it = nameMap.find(thisGroupsSeqs[i].getName()); if (it == nameMap.end()) { error = true; m->mothurOut("[ERROR]: " + thisGroupsSeqs[i].getName() + " is in your fasta file and not in your namefile, please correct."); m->mothurOutEndLine(); } else { 
int num = m->getNumNames(it->second); thisGroupsSeqs[i].setAligned(removeNs(thisGroupsSeqs[i].getUnaligned())); sequences.push_back(seqData(thisGroupsSeqs[i].getName(), thisGroupsSeqs[i].getUnaligned(), num)); if (thisGroupsSeqs[i].getUnaligned().length() > alignLength) { alignLength = thisGroupsSeqs[i].getUnaligned().length(); } } } } if (error) { m->control_pressed = true; } //sort by frequency sort(sequences.rbegin(), sequences.rend()); return sequences; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "loadSequences"); exit(1); } } //********************************************************************************************************************** vector ChimeraPerseusCommand::readFiles(string inputFile, string name){ try { map::iterator it; map nameMap = m->readNames(name); //read fasta file and create sequenceData structure - checking for file mismatches vector sequences; bool error = false; ifstream in; m->openInputFile(inputFile, in); alignLength = 0; while (!in.eof()) { if (m->control_pressed) { in.close(); return sequences; } Sequence temp(in); m->gobble(in); it = nameMap.find(temp.getName()); if (it == nameMap.end()) { error = true; m->mothurOut("[ERROR]: " + temp.getName() + " is in your fasta file and not in your namefile, please correct."); m->mothurOutEndLine(); } else { temp.setAligned(removeNs(temp.getUnaligned())); sequences.push_back(seqData(temp.getName(), temp.getUnaligned(), it->second)); if (temp.getUnaligned().length() > alignLength) { alignLength = temp.getUnaligned().length(); } } } in.close(); if (error) { m->control_pressed = true; } //sort by frequency sort(sequences.rbegin(), sequences.rend()); return sequences; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "readFiles"); exit(1); } } //********************************************************************************************************************** string ChimeraPerseusCommand::removeNs(string seq){ try { string newSeq = ""; for (int i = 0; i < 
seq.length(); i++) {
            if (seq[i] != 'N') { newSeq += seq[i]; }
        }

        return newSeq;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraPerseusCommand", "removeNs");
        exit(1);
    }
}
//**********************************************************************************************************************
vector<seqData> ChimeraPerseusCommand::readFiles(string inputFile, CountTable* ct){
    try {
        //read fasta file and create sequenceData structure - checking for file mismatches
        vector<seqData> sequences;
        ifstream in;
        m->openInputFile(inputFile, in);
        alignLength = 0;

        while (!in.eof()) {
            Sequence temp(in); m->gobble(in);

            int count = ct->getNumSeqs(temp.getName());

            if (m->control_pressed) { break; }
            else {
                temp.setAligned(removeNs(temp.getUnaligned()));
                sequences.push_back(seqData(temp.getName(), temp.getUnaligned(), count));
                if (temp.getUnaligned().length() > alignLength) { alignLength = temp.getUnaligned().length(); }
            }
        }
        in.close();

        //sort by frequency
        sort(sequences.rbegin(), sequences.rend());

        return sequences;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraPerseusCommand", "readFiles");
        exit(1);
    }
}
//**********************************************************************************************************************
int ChimeraPerseusCommand::driver(string chimeraFileName, vector<seqData>& sequences, string accnosFileName, int& numChimeras){
    try {
        vector<vector<double> > correctModel(4); //could be an option in the future to input own model matrix
        for(int i=0;i<4;i++){ correctModel[i].resize(4); }

        correctModel[0][0] = 0.000000; //AA
        correctModel[1][0] = 11.619259; //CA
        correctModel[2][0] = 11.694004; //TA
        correctModel[3][0] = 7.748623; //GA

        correctModel[1][1] = 0.000000; //CC
        correctModel[2][1] = 7.619657; //TC
        correctModel[3][1] = 12.852562; //GC

        correctModel[2][2] = 0.000000; //TT
        correctModel[3][2] = 10.964048; //GT

        correctModel[3][3] = 0.000000; //GG

        //copy the lower triangle into the upper triangle so the model is symmetric
        for(int i=0;i<4;i++){
            for(int j=0;j<i;j++){
                correctModel[j][i] = correctModel[i][j];
            }
        }

        int numSeqs = sequences.size();

        ofstream chimeraFile;
        ofstream accnosFile;
        m->openOutputFile(chimeraFileName, chimeraFile);
        m->openOutputFile(accnosFileName, accnosFile);

        Perseus myPerseus;
        vector<vector<double> > binMatrix
= myPerseus.binomial(alignLength);

        chimeraFile << "SequenceIndex\tName\tDiffsToBestMatch\tBestMatchIndex\tBestMatchName\tDiffstToChimera\tIndexofLeftParent\tIndexOfRightParent\tNameOfLeftParent\tNameOfRightParent\tDistanceToBestMatch\tcIndex\t(cIndex - singleDist)\tloonIndex\tMismatchesToChimera\tMismatchToTrimera\tChimeraBreakPoint\tLogisticProbability\tTypeOfSequence\n";

        vector<bool> chimeras(numSeqs, 0);

        for(int i=0;i<numSeqs;i++){
            if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; }

            vector<bool> restricted = chimeras;

            vector<vector<int> > leftDiffs(numSeqs);
            vector<vector<int> > leftMaps(numSeqs);
            vector<vector<int> > rightDiffs(numSeqs);
            vector<vector<int> > rightMaps(numSeqs);

            vector<int> singleLeft, bestLeft;
            vector<int> singleRight, bestRight;

            int bestSingleIndex, bestSingleDiff;
            vector<pwAlign> alignments(numSeqs);

            int comparisons = myPerseus.getAlignments(i, sequences, alignments, leftDiffs, leftMaps, rightDiffs, rightMaps, bestSingleIndex, bestSingleDiff, restricted);
            if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; }

            int minMismatchToChimera, leftParentBi, rightParentBi, breakPointBi;

            string dummyA, dummyB;

            if (sequences[i].sequence.size() < 3) {
                chimeraFile << i << '\t' << sequences[i].seqName << "\t0\t0\tNull\t0\t0\t0\tNull\tNull\t0.0\t0.0\t0.0\t0\t0\t0\t0.0\t0.0\tgood" << endl;
            }else if(comparisons >= 2){
                minMismatchToChimera = myPerseus.getChimera(sequences, leftDiffs, rightDiffs, leftParentBi, rightParentBi, breakPointBi, singleLeft, bestLeft, singleRight, bestRight, restricted);
                if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; }

                int minMismatchToTrimera = numeric_limits<int>::max();
                int leftParentTri, middleParentTri, rightParentTri, breakPointTriA, breakPointTriB;

                if(minMismatchToChimera >= 3 && comparisons >= 3){
                    minMismatchToTrimera = myPerseus.getTrimera(sequences,
leftDiffs, leftParentTri, middleParentTri, rightParentTri, breakPointTriA, breakPointTriB, singleLeft, bestLeft, singleRight, bestRight, restricted); if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; } } double singleDist = myPerseus.modeledPairwiseAlignSeqs(sequences[i].sequence, sequences[bestSingleIndex].sequence, dummyA, dummyB, correctModel); if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; } string type; string chimeraRefSeq; if(minMismatchToChimera - minMismatchToTrimera >= 3){ type = "trimera"; chimeraRefSeq = myPerseus.stitchTrimera(alignments, leftParentTri, middleParentTri, rightParentTri, breakPointTriA, breakPointTriB, leftMaps, rightMaps); } else{ type = "chimera"; chimeraRefSeq = myPerseus.stitchBimera(alignments, leftParentBi, rightParentBi, breakPointBi, leftMaps, rightMaps); } if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; } double chimeraDist = myPerseus.modeledPairwiseAlignSeqs(sequences[i].sequence, chimeraRefSeq, dummyA, dummyB, correctModel); if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; } double cIndex = chimeraDist;//modeledPairwiseAlignSeqs(sequences[i].sequence, chimeraRefSeq); double loonIndex = myPerseus.calcLoonIndex(sequences[i].sequence, sequences[leftParentBi].sequence, sequences[rightParentBi].sequence, breakPointBi, binMatrix); if (m->control_pressed) { chimeraFile.close(); m->mothurRemove(chimeraFileName); accnosFile.close(); m->mothurRemove(accnosFileName); return 0; } chimeraFile << i << '\t' << sequences[i].seqName << '\t' << bestSingleDiff << '\t' << bestSingleIndex << '\t' << sequences[bestSingleIndex].seqName << '\t'; chimeraFile << 
minMismatchToChimera << '\t' << leftParentBi << '\t' << rightParentBi << '\t' << sequences[leftParentBi].seqName << '\t' << sequences[rightParentBi].seqName << '\t'; chimeraFile << singleDist << '\t' << cIndex << '\t' << (cIndex - singleDist) << '\t' << loonIndex << '\t'; chimeraFile << minMismatchToChimera << '\t' << minMismatchToTrimera << '\t' << breakPointBi << '\t'; double probability = myPerseus.classifyChimera(singleDist, cIndex, loonIndex, alpha, beta); chimeraFile << probability << '\t'; if(probability > cutoff){ chimeraFile << type << endl; accnosFile << sequences[i].seqName << endl; chimeras[i] = 1; numChimeras++; } else{ chimeraFile << "good" << endl; } } else{ chimeraFile << i << '\t' << sequences[i].seqName << "\t0\t0\tNull\t0\t0\t0\tNull\tNull\t0.0\t0.0\t0.0\t0\t0\t0\t0.0\t0.0\tgood" << endl; } //report progress if((i+1) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(i+1) + "\n"); } } if((numSeqs) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(numSeqs) + "\n"); } chimeraFile.close(); accnosFile.close(); return numSeqs; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "driver"); exit(1); } } /**************************************************************************************************/ int ChimeraPerseusCommand::createProcessesGroups(string outputFName, string accnos, string newCountFile, vector groups, string group, string fasta, string name) { try { vector processIDS; int process = 1; int num = 0; bool recalc = false; CountTable newCount; if (hasCount && dups) { newCount.readTable(name, true, false); } //sanity check if (groups.size() < processors) { processors = groups.size(); } //divide the groups between the processors vector lines; int remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs 
= ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverGroups(outputFName + toString(m->mothurGetpid(process)) + ".temp", accnos + toString(m->mothurGetpid(process)) + ".temp", accnos + ".byCount." + toString(m->mothurGetpid(process)) + ".temp", lines[process].start, lines[process].end, groups); //pass numSeqs to parent ofstream out; string tempFile = outputFName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } num = 0; processIDS.resize(0); process = 1; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverGroups(outputFName + toString(m->mothurGetpid(process)) + ".temp", accnos + toString(m->mothurGetpid(process)) + ".temp", accnos + ".byCount." 
+ toString(m->mothurGetpid(process)) + ".temp", lines[process].start, lines[process].end, groups); //pass numSeqs to parent ofstream out; string tempFile = outputFName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part num = driverGroups(outputFName, accnos, accnos + ".byCount", lines[0].start, lines[0].end, groups); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the preClusterData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. 
for( int i=1; icount; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //read my own if (hasCount && dups) { if (!m->isBlank(accnos + ".byCount")) { ifstream in2; m->openInputFile(accnos + ".byCount", in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); newCount.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(accnos + ".byCount"); } //append output files for(int i=0;iappendFiles((outputFName + toString(processIDS[i]) + ".temp"), outputFName); m->mothurRemove((outputFName + toString(processIDS[i]) + ".temp")); m->appendFiles((accnos + toString(processIDS[i]) + ".temp"), accnos); m->mothurRemove((accnos + toString(processIDS[i]) + ".temp")); if (hasCount && dups) { if (!m->isBlank(accnos + ".byCount." + toString(processIDS[i]) + ".temp")) { ifstream in2; m->openInputFile(accnos + ".byCount." + toString(processIDS[i]) + ".temp", in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); newCount.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(accnos + ".byCount." 
+ toString(processIDS[i]) + ".temp"); } } //print new *.pick.count_table if (hasCount && dups) { newCount.printTable(newCountFile); } return num; } catch(exception& e) { m->errorOut(e, "ChimeraPerseusCommand", "createProcessesGroups"); exit(1); } } //********************************************************************************************************************** int ChimeraPerseusCommand::deconvoluteResults(map& uniqueNames, string outputFileName, string accnosFileName){ try { map::iterator itUnique; int total = 0; //edit accnos file ifstream in2; m->openInputFile(accnosFileName, in2); ofstream out2; m->openOutputFile(accnosFileName+".temp", out2); string name; set namesInFile; //this is so if a sequence is found to be chimera in several samples we dont write it to the results file more than once set::iterator itNames; set chimerasInFile; set::iterator itChimeras; while (!in2.eof()) { if (m->control_pressed) { in2.close(); out2.close(); m->mothurRemove(outputFileName); m->mothurRemove((accnosFileName+".temp")); return 0; } in2 >> name; m->gobble(in2); //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing accnos results. 
Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { itChimeras = chimerasInFile.find((itUnique->second)); if (itChimeras == chimerasInFile.end()) { out2 << itUnique->second << endl; chimerasInFile.insert((itUnique->second)); total++; } } } in2.close(); out2.close(); m->mothurRemove(accnosFileName); rename((accnosFileName+".temp").c_str(), accnosFileName.c_str()); //edit chimera file ifstream in; m->openInputFile(outputFileName, in); ofstream out; m->openOutputFile(outputFileName+".temp", out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); int DiffsToBestMatch, BestMatchIndex, DiffstToChimera, IndexofLeftParent, IndexOfRightParent; float temp1,temp2, temp3, temp4, temp5, temp6, temp7, temp8; string index, BestMatchName, parent1, parent2, flag; name = ""; namesInFile.clear(); //assumptions - in file each read will always look like /* SequenceIndex Name DiffsToBestMatch BestMatchIndex BestMatchName DiffstToChimera IndexofLeftParent IndexOfRightParent NameOfLeftParent NameOfRightParent DistanceToBestMatch cIndex (cIndex - singleDist) loonIndex MismatchesToChimera MismatchToTrimera ChimeraBreakPoint LogisticProbability TypeOfSequence 0 F01QG4L02JVBQY 0 0 Null 0 0 0 Null Null 0.0 0.0 0.0 0.0 0 0 0 0.0 0.0 good 1 F01QG4L02ICTC6 0 0 Null 0 0 0 Null Null 0.0 0.0 0.0 0.0 0 0 0 0.0 0.0 good 2 F01QG4L02JZOEC 48 0 F01QG4L02JVBQY 47 0 0 F01QG4L02JVBQY F01QG4L02JVBQY 2.0449 2.03545 -0.00944493 0 47 2147483647 138 0 good 3 F01QG4L02G7JEC 42 0 F01QG4L02JVBQY 40 1 0 F01QG4L02ICTC6 F01QG4L02JVBQY 1.87477 1.81113 -0.0636404 5.80145 40 2147483647 25 0 good */ //get and print headers BestMatchName = m->getline(in); m->gobble(in); out << BestMatchName << endl; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove((outputFileName+".temp")); return 0; } bool print = false; in >> index; m->gobble(in); if (index != "SequenceIndex") { //if you are not a header line, there will be a header line for each 
group if group file is given in >> name; m->gobble(in); in >> DiffsToBestMatch; m->gobble(in); in >> BestMatchIndex; m->gobble(in); in >> BestMatchName; m->gobble(in); in >> DiffstToChimera; m->gobble(in); in >> IndexofLeftParent; m->gobble(in); in >> IndexOfRightParent; m->gobble(in); in >> parent1; m->gobble(in); in >> parent2; m->gobble(in); in >> temp1 >> temp2 >> temp3 >> temp4 >> temp5 >> temp6 >> temp7 >> temp8 >> flag; m->gobble(in); //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { name = itUnique->second; //is this name already in the file itNames = namesInFile.find((name)); if (itNames == namesInFile.end()) { //no not in file if (flag == "good") { //are you really a no?? //is this sequence really not chimeric?? itChimeras = chimerasInFile.find(name); //then you really are a no so print, otherwise skip if (itChimeras == chimerasInFile.end()) { print = true; } }else{ print = true; } } } if (print) { out << index << '\t' << name << '\t' << DiffsToBestMatch << '\t' << BestMatchIndex << '\t'; namesInFile.insert(name); if (BestMatchName != "Null") { itUnique = uniqueNames.find(BestMatchName); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find BestMatchName "+ BestMatchName + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { out << itUnique->second << '\t'; } }else { out << "Null" << '\t'; } out << DiffstToChimera << '\t' << IndexofLeftParent << '\t' << IndexOfRightParent << '\t'; if (parent1 != "Null") { itUnique = uniqueNames.find(parent1); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. 
Cannot find parent1 "+ parent1 + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                        else { out << itUnique->second << '\t'; }
                    }else { out << "Null" << '\t'; }

                    if (parent2 != "Null") {
                        itUnique = uniqueNames.find(parent2);
                        if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find parent2 "+ parent2 + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                        else { out << itUnique->second << '\t'; }
                    }else { out << "Null" << '\t'; }

                    out << temp1 << '\t' << temp2 << '\t' << temp3 << '\t' << temp4 << '\t' << temp5 << '\t' << temp6 << '\t' << temp7 << '\t' << temp8 << '\t' << flag << endl;
                }
            }else { index = m->getline(in); m->gobble(in); }
        }
        in.close();
        out.close();

        m->mothurRemove(outputFileName);
        rename((outputFileName+".temp").c_str(), outputFileName.c_str());

        return total;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraPerseusCommand", "deconvoluteResults");
        exit(1);
    }
}
//**********************************************************************************************************************
// mothur-1.36.1/source/commands/chimeraperseuscommand.h
#ifndef CHIMERAPERSEUSCOMMAND_H
#define CHIMERAPERSEUSCOMMAND_H

/*
 *  chimeraperseuscommand.h
 *  Mothur
 *
 *  Created by westcott on 10/26/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "sequenceparser.h"
#include "sequencecountparser.h"
#include "myPerseus.h"
#include "counttable.h"

/***********************************************************/
class ChimeraPerseusCommand : public Command {
public:
    ChimeraPerseusCommand(string);
    ChimeraPerseusCommand();
    ~ChimeraPerseusCommand() {}

    vector<string> setParameters();
    string getCommandName() { return "chimera.perseus"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "Quince C, Lanzen A, Davenport RJ, Turnbaugh PJ (2011). Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12:38.\nEdgar,R.C., Haas,B.J., Clemente,J.C., Quince,C. and Knight,R. (2011), UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27:2194.\nhttp://www.mothur.org/wiki/Chimera.perseus\n"; }
    string getDescription() { return "detect chimeric sequences"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort, hasName, hasCount, dups;
    string fastafile, groupfile, countfile, outputDir, namefile;
    int processors, alignLength;
    double cutoff, alpha, beta;
    SequenceParser* parser;
    SequenceCountParser* cparser;

    vector<string> outputNames;
    vector<string> fastaFileNames;
    vector<string> nameFileNames;
    vector<string> groupFileNames;

    string getNamesFile(string&);
    int driver(string, vector<seqData>&, string, int&);
    vector<seqData> readFiles(string, string);
    vector<seqData> readFiles(string inputFile, CountTable* ct);
    vector<seqData> loadSequences(string);
    int deconvoluteResults(map<string, string>&, string, string);
    int driverGroups(string, string, string, int, int, vector<string>);
    int createProcessesGroups(string, string, string, vector<string>, string, string, string);
    string removeNs(string);
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct perseusData {
    string fastafile;
    string namefile;
    string groupfile;
    string outputFName;
    string accnos;
    string countlist;
    MothurOut* m;
    int start;
    int end;
    bool hasName, hasCount, dups;
    int threadID, count, numChimeras;
    double alpha, beta, cutoff;
    vector<string> groups;

    perseusData(){}
    perseusData(bool dps, bool hn, bool hc, double a, double b, double c, string o, string f, string n, string g, string ac, string ctlist, vector<string> gr, MothurOut* mout, int st, int en, int tid) {
        alpha = a;
        beta = b;
        cutoff = c;
        fastafile = f;
        namefile = n;
        groupfile = g;
        outputFName = o;
        countlist = ctlist;
        accnos = ac;
        m = mout;
        start = st;
        end = en;
        threadID = tid;
        groups = gr;
        hasName = hn;
        hasCount = hc;
        dups = dps;
        count = 0;
        numChimeras = 0;
    }
};
/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MyPerseusThreadFunction(LPVOID lpParam){
    perseusData* pDataArray;
    pDataArray = (perseusData*)lpParam;

    try {
        //clears files
        ofstream out, out1, out2;
        pDataArray->m->openOutputFile(pDataArray->outputFName, out); out.close();
        pDataArray->m->openOutputFile(pDataArray->accnos, out1); out1.close();

        //parse fasta and name file by group
        SequenceParser* parser;
        SequenceCountParser* cparser;
        if (pDataArray->hasCount) {
            CountTable* ct = new CountTable();
            ct->readTable(pDataArray->namefile, true, false);
            cparser = new SequenceCountParser(pDataArray->fastafile, *ct);
            delete ct;
        }else {
            if (pDataArray->namefile != "") { parser = new SequenceParser(pDataArray->groupfile, pDataArray->fastafile, pDataArray->namefile); }
            else { parser = new SequenceParser(pDataArray->groupfile, pDataArray->fastafile); }
        }

        int totalSeqs = 0;
        int numChimeras = 0;

        ofstream outCountList;
        if (pDataArray->hasCount && pDataArray->dups) {
pDataArray->m->openOutputFile(pDataArray->countlist, outCountList); }

        for (int u = pDataArray->start; u < pDataArray->end; u++) {

            int start = time(NULL);

            //only one of the two parsers is allocated, so delete only that one
            if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } else { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); return 0; }

            pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Checking sequences from group " + pDataArray->groups[u] + "..."); pDataArray->m->mothurOutEndLine();

            //vector<seqData> sequences = loadSequences(parser, groups[i]); - same function below
            ////////////////////////////////////////////////////////////////////////////////////////
            bool error = false;
            int alignLength = 0;
            vector<seqData> sequences;
            if (pDataArray->hasCount) {
                vector<Sequence> thisGroupsSeqs = cparser->getSeqs(pDataArray->groups[u]);
                map<string, int> counts = cparser->getCountTable(pDataArray->groups[u]);
                map<string, int>::iterator it;

                for (int i = 0; i < thisGroupsSeqs.size(); i++) {

                    if (pDataArray->m->control_pressed) { break; }

                    it = counts.find(thisGroupsSeqs[i].getName());
                    if (it == counts.end()) { error = true; pDataArray->m->mothurOut("[ERROR]: " + thisGroupsSeqs[i].getName() + " is in your fasta file and not in your count file, please correct."); pDataArray->m->mothurOutEndLine(); }
                    else {
                        //strip N's from the unaligned sequence
                        string newSeq = "";
                        string tempSeq = thisGroupsSeqs[i].getUnaligned();
                        for (int j = 0; j < tempSeq.length(); j++) {
                            if (tempSeq[j] != 'N') { newSeq += tempSeq[j]; }
                        }
                        thisGroupsSeqs[i].setAligned(newSeq);
                        sequences.push_back(seqData(thisGroupsSeqs[i].getName(), thisGroupsSeqs[i].getUnaligned(), it->second));
                        if (thisGroupsSeqs[i].getUnaligned().length() > alignLength) { alignLength = thisGroupsSeqs[i].getUnaligned().length(); }
                    }
                }
            }else{
                vector<Sequence> thisGroupsSeqs = parser->getSeqs(pDataArray->groups[u]);
                map<string, string> nameMap = parser->getNameMap(pDataArray->groups[u]);
                map<string, string>::iterator it;

                for (int i = 0; i < thisGroupsSeqs.size(); i++) {

                    if (pDataArray->m->control_pressed) { break; }

                    it =
nameMap.find(thisGroupsSeqs[i].getName()); if (it == nameMap.end()) { error = true; pDataArray->m->mothurOut("[ERROR]: " + thisGroupsSeqs[i].getName() + " is in your fasta file and not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); } else { int num = pDataArray->m->getNumNames(it->second); string newSeq = ""; string tempSeq = thisGroupsSeqs[i].getUnaligned(); for (int j = 0; j < tempSeq.length(); j++) { if (tempSeq[j] != 'N') { newSeq += tempSeq[j]; } } thisGroupsSeqs[i].setAligned(newSeq); sequences.push_back(seqData(thisGroupsSeqs[i].getName(), thisGroupsSeqs[i].getUnaligned(), num)); if (thisGroupsSeqs[i].getUnaligned().length() > alignLength) { alignLength = thisGroupsSeqs[i].getUnaligned().length(); } } } } if (error) { pDataArray->m->control_pressed = true; } //sort by frequency sort(sequences.rbegin(), sequences.rend()); //////////////////////////////////////////////////////////////////////////////////////// if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); return 0; } //int numSeqs = driver((outputFName + groups[i]), sequences, (accnos+groups[i]), numChimeras); - same function below //////////////////////////////////////////////////////////////////////////////////////// string chimeraFileName = pDataArray->outputFName+pDataArray->groups[u]; string accnosFileName = pDataArray->accnos+pDataArray->groups[u]; vector > correctModel(4); //could be an option in the future to input own model matrix for(int j=0;j<4;j++){ correctModel[j].resize(4); } correctModel[0][0] = 0.000000; //AA correctModel[1][0] = 11.619259; //CA correctModel[2][0] = 11.694004; //TA correctModel[3][0] = 7.748623; //GA correctModel[1][1] = 0.000000; //CC correctModel[2][1] = 7.619657; //TC correctModel[3][1] = 12.852562; //GC correctModel[2][2] = 0.000000; //TT correctModel[3][2] = 10.964048; //TG 
correctModel[3][3] = 0.000000; //GG for(int k=0;k<4;k++){ for(int j=0;j<k;j++){ correctModel[j][k] = correctModel[k][j]; } } int numSeqs = sequences.size(); ofstream chimeraFile; ofstream accnosFile; pDataArray->m->openOutputFile(chimeraFileName, chimeraFile); pDataArray->m->openOutputFile(accnosFileName, accnosFile); Perseus myPerseus; vector > binMatrix = myPerseus.binomial(alignLength); chimeraFile << "SequenceIndex\tName\tDiffsToBestMatch\tBestMatchIndex\tBestMatchName\tDiffstToChimera\tIndexofLeftParent\tIndexOfRightParent\tNameOfLeftParent\tNameOfRightParent\tDistanceToBestMatch\tcIndex\t(cIndex - singleDist)\tloonIndex\tMismatchesToChimera\tMismatchToTrimera\tChimeraBreakPoint\tLogisticProbability\tTypeOfSequence\n"; vector chimeras(numSeqs, 0); for(int j=0;j<numSeqs;j++){ if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } vector restricted = chimeras; vector > leftDiffs(numSeqs); vector > leftMaps(numSeqs); vector > rightDiffs(numSeqs); vector > rightMaps(numSeqs); vector singleLeft, bestLeft; vector singleRight, bestRight; int bestSingleIndex, bestSingleDiff; vector alignments(numSeqs); int comparisons = myPerseus.getAlignments(j, sequences, alignments, leftDiffs, leftMaps, rightDiffs, rightMaps, bestSingleIndex, bestSingleDiff, restricted); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } int minMismatchToChimera, leftParentBi, rightParentBi, breakPointBi; string dummyA, dummyB; if(comparisons >= 2){ minMismatchToChimera = myPerseus.getChimera(sequences, leftDiffs, rightDiffs, leftParentBi, rightParentBi, breakPointBi, singleLeft, 
bestLeft, singleRight, bestRight, restricted); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } int minMismatchToTrimera = numeric_limits::max(); int leftParentTri, middleParentTri, rightParentTri, breakPointTriA, breakPointTriB; if(minMismatchToChimera >= 3 && comparisons >= 3){ minMismatchToTrimera = myPerseus.getTrimera(sequences, leftDiffs, leftParentTri, middleParentTri, rightParentTri, breakPointTriA, breakPointTriB, singleLeft, bestLeft, singleRight, bestRight, restricted); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } } double singleDist = myPerseus.modeledPairwiseAlignSeqs(sequences[j].sequence, sequences[bestSingleIndex].sequence, dummyA, dummyB, correctModel); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } string type; string chimeraRefSeq; if(minMismatchToChimera - minMismatchToTrimera >= 3){ type = "trimera"; chimeraRefSeq = myPerseus.stitchTrimera(alignments, leftParentTri, middleParentTri, rightParentTri, breakPointTriA, breakPointTriB, leftMaps, rightMaps); } else{ type = "chimera"; chimeraRefSeq = myPerseus.stitchBimera(alignments, leftParentBi, 
rightParentBi, breakPointBi, leftMaps, rightMaps); } if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; }; pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } double chimeraDist = myPerseus.modeledPairwiseAlignSeqs(sequences[j].sequence, chimeraRefSeq, dummyA, dummyB, correctModel); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } double cIndex = chimeraDist;//modeledPairwiseAlignSeqs(sequences[j].sequence, chimeraRefSeq); double loonIndex = myPerseus.calcLoonIndex(sequences[j].sequence, sequences[leftParentBi].sequence, sequences[rightParentBi].sequence, breakPointBi, binMatrix); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); chimeraFile.close(); pDataArray->m->mothurRemove(chimeraFileName); accnosFile.close(); pDataArray->m->mothurRemove(accnosFileName); return 0; } chimeraFile << j << '\t' << sequences[j].seqName << '\t' << bestSingleDiff << '\t' << bestSingleIndex << '\t' << sequences[bestSingleIndex].seqName << '\t'; chimeraFile << minMismatchToChimera << '\t' << leftParentBi << '\t' << rightParentBi << '\t' << sequences[leftParentBi].seqName << '\t' << sequences[rightParentBi].seqName << '\t'; chimeraFile << singleDist << '\t' << cIndex << '\t' << (cIndex - singleDist) << '\t' << loonIndex << '\t'; chimeraFile << minMismatchToChimera << '\t' << minMismatchToTrimera << 
'\t' << breakPointBi << '\t'; double probability = myPerseus.classifyChimera(singleDist, cIndex, loonIndex, pDataArray->alpha, pDataArray->beta); chimeraFile << probability << '\t'; if(probability > pDataArray->cutoff){ chimeraFile << type << endl; accnosFile << sequences[j].seqName << endl; chimeras[j] = 1; numChimeras++; } else{ chimeraFile << "good" << endl; } } else{ chimeraFile << j << '\t' << sequences[j].seqName << "\t0\t0\tNull\t0\t0\t0\tNull\tNull\t0.0\t0.0\t0.0\t0\t0\t0\t0.0\t0.0\tgood" << endl; } //report progress if((j+1) % 100 == 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(j+1) + "\n"); } } if((numSeqs) % 100 != 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(numSeqs) + "\n"); } chimeraFile.close(); accnosFile.close(); //////////////////////////////////////////////////////////////////////////////////////// totalSeqs += numSeqs; if (pDataArray->dups) { if (!pDataArray->m->isBlank(accnosFileName)) { ifstream in; pDataArray->m->openInputFile(accnosFileName, in); string name; if (pDataArray->hasCount) { while (!in.eof()) { in >> name; pDataArray->m->gobble(in); outCountList << name << '\t' << pDataArray->groups[u] << endl; } in.close(); }else { map thisnamemap = parser->getNameMap(pDataArray->groups[u]); map::iterator itN; ofstream out; pDataArray->m->openOutputFile(accnosFileName+".temp", out); while (!in.eof()) { in >> name; pDataArray->m->gobble(in); itN = thisnamemap.find(name); if (itN != thisnamemap.end()) { vector tempNames; pDataArray->m->splitAtComma(itN->second, tempNames); for (int j = 0; j < tempNames.size(); j++) { out << tempNames[j] << endl; } }else { pDataArray->m->mothurOut("[ERROR]: parsing cannot find " + name + ".\n"); pDataArray->m->control_pressed = true; } } out.close(); in.close(); pDataArray->m->renameFile(accnosFileName+".temp", accnosFileName); } } } //append files pDataArray->m->appendFiles(chimeraFileName, pDataArray->outputFName); 
pDataArray->m->mothurRemove(chimeraFileName); pDataArray->m->appendFiles(accnosFileName, pDataArray->accnos); pDataArray->m->mothurRemove(accnosFileName); pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences from group " + pDataArray->groups[u] + "."); pDataArray->m->mothurOutEndLine(); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } pDataArray->m->mothurRemove(pDataArray->outputFName); pDataArray->m->mothurRemove(pDataArray->accnos); return 0; } } if (pDataArray->hasCount && pDataArray->dups) { outCountList.close(); } pDataArray->count = totalSeqs; if (pDataArray->hasCount) { delete cparser; } { delete parser; } return totalSeqs; } catch(exception& e) { pDataArray->m->errorOut(e, "ChimeraUchimeCommand", "MyPerseusThreadFunction"); exit(1); } } /**************************************************************************************************/ #endif #endif mothur-1.36.1/source/commands/chimerapintailcommand.cpp000066400000000000000000001056241255543666200233030ustar00rootroot00000000000000/* * chimerapintailcommand.cpp * Mothur * * Created by westcott on 4/1/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "chimerapintailcommand.h" #include "pintail.h" //********************************************************************************************************************** vector ChimeraPintailCommand::setParameters(){ try { CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","chimera-accnos",false,true,true); parameters.push_back(pfasta); CommandParameter pconservation("conservation", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pconservation); CommandParameter pquantile("quantile", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pquantile); CommandParameter pfilter("filter", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pfilter); CommandParameter pwindow("window", "Number", "", "0", "", "", "","","",false,false); parameters.push_back(pwindow); CommandParameter pincrement("increment", "Number", "", "25", "", "", "","",false,false); parameters.push_back(pincrement); CommandParameter pmask("mask", "String", "", "", "", "", "","",false,false); parameters.push_back(pmask); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter psave("save", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psave); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, 
"ChimeraPintailCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ChimeraPintailCommand::getHelpString(){ try { string helpString = ""; helpString += "The chimera.pintail command reads a fastafile and referencefile and outputs potentially chimeric sequences.\n"; helpString += "This command was created using the algorithms described in the 'At Least 1 in 20 16S rRNA Sequence Records Currently Held in the Public Repositories is Estimated To Contain Substantial Anomalies' paper by Kevin E. Ashelford, Nadia A. Chuzhanova, John C. Fry, Antonia J. Jones and Andrew J. Weightman.\n"; helpString += "The chimera.pintail command parameters are fasta, reference, filter, mask, processors, window, increment, conservation, quantile and save.\n"; helpString += "The fasta parameter allows you to enter the fasta file containing your potentially chimeric sequences, and is required unless you have a valid current fasta file. \n"; helpString += "You may enter multiple fasta files by separating their names with dashes, e.g. fasta=abrecovery.fasta-amzon.fasta \n"; helpString += "The reference parameter allows you to enter a reference file containing known non-chimeric sequences, and is required. \n"; helpString += "The filter parameter allows you to specify if you would like to apply a vertical and 50% soft filter. \n"; helpString += "The mask parameter allows you to specify a file containing one sequence you wish to use as a mask for your sequences; by default no mask is applied. You can apply an ecoli mask by typing mask=default. \n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n"; #ifdef USE_MPI helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. 
\n"; #endif helpString += "If the save parameter is set to true, the reference sequences will be saved in memory; to clear them later you can use the clear.memory command. Default=f. \n"; helpString += "The window parameter allows you to specify the window size for searching for chimeras, default=300. \n"; helpString += "The increment parameter allows you to specify how far you move each window while finding chimeric sequences, default=25.\n"; helpString += "The conservation parameter allows you to enter a frequency file containing the highest base frequency at each position in the alignment.\n"; helpString += "The quantile parameter allows you to enter a file containing quantiles for a template file's sequences. If you use the filter, the quantile file generated becomes unique to the fasta file you used.\n"; helpString += "The chimera.pintail command should be in the following format: \n"; helpString += "chimera.pintail(fasta=yourFastaFile, reference=yourTemplate) \n"; helpString += "Example: chimera.pintail(fasta=AD.align, reference=silva.bacteria.fasta) \n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ChimeraPintailCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "chimera") { pattern = "[filename],[tag],pintail.chimeras-[filename],pintail.chimeras"; } else if (type == "accnos") { pattern = "[filename],[tag],pintail.accnos-[filename],pintail.accnos"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ChimeraPintailCommand::ChimeraPintailCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "ChimeraPintailCommand"); exit(1); } } //*************************************************************************************************************** ChimeraPintailCommand::ChimeraPintailCommand(string option) { try { abort = false; calledHelp = false; rdb = ReferenceDB::getInstance(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("chimera.pintail"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if 
(validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["reference"] = inputDir + it->second; } } it = parameters.find("conservation"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["conservation"] = inputDir + it->second; } } it = parameters.find("quantile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["quantile"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } string temp; temp = validParameter.validFile(parameters, "filter", false); if (temp == "not found") { temp = "F"; } filter = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "window", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, window); temp = validParameter.validFile(parameters, "increment", false); if (temp == "not found") { temp = "25"; } m->mothurConvert(temp, increment); temp = validParameter.validFile(parameters, "save", false); if (temp == "not found"){ temp = "f"; } save = m->isTrue(temp); rdb->save = save; if (save) { //clear out old references rdb->clearMemory(); } //this has to go after save so that if the user sets save=t and provides no reference we abort templatefile = validParameter.validFile(parameters, "reference", true); if (templatefile == "not found") { //check for saved 
reference sequences if (rdb->referenceSeqs.size() != 0) { templatefile = "saved"; }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required."); m->mothurOutEndLine(); abort = true; } }else if (templatefile == "not open") { abort = true; } else { if (save) { rdb->setSavedReference(templatefile); } } maskfile = validParameter.validFile(parameters, "mask", false); if (maskfile == "not found") { maskfile = ""; } else if (maskfile != "default") { if (inputDir != "") { string path = m->hasPath(maskfile); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { maskfile = inputDir + maskfile; } } ifstream in; int ableToOpen = m->openInputFile(maskfile, in, "noerror"); if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(maskfile); m->mothurOut("Unable to open " + maskfile + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); maskfile = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //output directory is set string tryPath = m->getOutputDir() + m->getSimpleName(maskfile); m->mothurOut("Unable to open " + maskfile + ". 
Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); maskfile = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + maskfile + "."); m->mothurOutEndLine(); abort = true; } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } consfile = validParameter.validFile(parameters, "conservation", true); if (consfile == "not open") { abort = true; } else if (consfile == "not found") { consfile = ""; //check for consfile string tempConsFile = m->getRootName(inputDir + m->getSimpleName(templatefile)) + "freq"; ifstream FileTest(tempConsFile.c_str()); if(FileTest){ bool GoodFile = m->checkReleaseVersion(FileTest, m->getVersion()); if (GoodFile) { m->mothurOut("I found " + tempConsFile + " in your input file directory. I will use it to save time."); m->mothurOutEndLine(); consfile = tempConsFile; FileTest.close(); } }else { string tempConsFile = m->getDefaultPath() + m->getRootName(m->getSimpleName(templatefile)) + "freq"; ifstream FileTest2(tempConsFile.c_str()); if(FileTest2){ bool GoodFile = m->checkReleaseVersion(FileTest2, m->getVersion()); if (GoodFile) { m->mothurOut("I found " + tempConsFile + " in your input file directory. 
I will use it to save time."); m->mothurOutEndLine(); consfile = tempConsFile; FileTest2.close(); } } } } quanfile = validParameter.validFile(parameters, "quantile", true); if (quanfile == "not open") { abort = true; } else if (quanfile == "not found") { quanfile = ""; } } } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "ChimeraPintailCommand"); exit(1); } } //*************************************************************************************************************** int ChimeraPintailCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int s = 0; s < fastaFileNames.size(); s++) { m->mothurOut("Checking sequences from " + fastaFileNames[s] + " ..." ); m->mothurOutEndLine(); int start = time(NULL); //set user options if (maskfile == "default") { m->mothurOut("I am using the default 236627 EU009184.1 Shigella dysenteriae str. FBD013."); m->mothurOutEndLine(); } //check for quantile to save the time string baseName = templatefile; if (templatefile == "saved") { baseName = rdb->getSavedReference(); } string tempQuan = ""; if ((!filter) && (maskfile == "")) { tempQuan = inputDir + m->getRootName(m->getSimpleName(baseName)) + "pintail.quan"; }else if ((!filter) && (maskfile != "")) { tempQuan = inputDir + m->getRootName(m->getSimpleName(baseName)) + "pintail.masked.quan"; }else if ((filter) && (maskfile != "")) { tempQuan = inputDir + m->getRootName(m->getSimpleName(baseName)) + "pintail.filtered." + m->getSimpleName(m->getRootName(fastaFileNames[s])) + "masked.quan"; }else if ((filter) && (maskfile == "")) { tempQuan = inputDir + m->getRootName(m->getSimpleName(baseName)) + "pintail.filtered." + m->getSimpleName(m->getRootName(fastaFileNames[s])) + "quan"; } ifstream FileTest(tempQuan.c_str()); if(FileTest){ bool GoodFile = m->checkReleaseVersion(FileTest, m->getVersion()); if (GoodFile) { m->mothurOut("I found " + tempQuan + " in your input file directory. 
I will use it to save time."); m->mothurOutEndLine(); quanfile = tempQuan; FileTest.close(); } }else { string tryPath = m->getDefaultPath(); string tempQuan = ""; if ((!filter) && (maskfile == "")) { tempQuan = tryPath + m->getRootName(m->getSimpleName(baseName)) + "pintail.quan"; }else if ((!filter) && (maskfile != "")) { tempQuan = tryPath + m->getRootName(m->getSimpleName(baseName)) + "pintail.masked.quan"; }else if ((filter) && (maskfile != "")) { tempQuan = tryPath + m->getRootName(m->getSimpleName(baseName)) + "pintail.filtered." + m->getSimpleName(m->getRootName(fastaFileNames[s])) + "masked.quan"; }else if ((filter) && (maskfile == "")) { tempQuan = tryPath + m->getRootName(m->getSimpleName(baseName)) + "pintail.filtered." + m->getSimpleName(m->getRootName(fastaFileNames[s])) + "quan"; } ifstream FileTest2(tempQuan.c_str()); if(FileTest2){ bool GoodFile = m->checkReleaseVersion(FileTest2, m->getVersion()); if (GoodFile) { m->mothurOut("I found " + tempQuan + " in your input file directory. 
I will use it to save time."); m->mothurOutEndLine(); quanfile = tempQuan; FileTest2.close(); } } } chimera = new Pintail(fastaFileNames[s], templatefile, filter, processors, maskfile, consfile, quanfile, window, increment, outputDir); if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[s]); }//if user entered a file with a path then preserve it string outputFileName, accnosFileName; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); if (maskfile != "") { variables["[tag]"] = m->getSimpleName(m->getRootName(maskfile)); } outputFileName = getOutputFileName("chimera", variables); accnosFileName = getOutputFileName("accnos", variables); if (m->control_pressed) { delete chimera; for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } if (chimera->getUnaligned()) { m->mothurOut("Your template sequences are different lengths, please correct."); m->mothurOutEndLine(); delete chimera; return 0; } templateSeqsLength = chimera->getLength(); #ifdef USE_MPI int pid, numSeqsPerProcessor; int tag = 2001; vector MPIPos; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_File inMPI; MPI_File outMPI; MPI_File outMPIAccnos; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outFilename[1024]; strcpy(outFilename, outputFileName.c_str()); char outAccnosFilename[1024]; strcpy(outAccnosFilename, accnosFileName.c_str()); char inFileName[1024]; strcpy(inFileName, fastaFileNames[s].c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPI); MPI_File_open(MPI_COMM_WORLD, outAccnosFilename, outMode, MPI_INFO_NULL, &outMPIAccnos); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&inMPI); MPI_File_close(&outMPI); 
MPI_File_close(&outMPIAccnos); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } delete chimera; return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(fastaFileNames[s], numSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (numSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } //figure out how many sequences you have to align numSeqsPerProcessor = numSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; } //do your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPI, outMPIAccnos, MPIPos); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } delete chimera; return 0; } }else{ //you are a child process MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(numSeqs+1); MPI_Recv(&MPIPos[0], (numSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = numSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; } //do your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPI, outMPIAccnos, MPIPos); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } delete chimera; return 0; } } //close files MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); MPI_Barrier(MPI_COMM_WORLD); //make 
everyone wait - just in case #else //break up file #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) vector<unsigned long long> positions = m->divideFile(fastaFileNames[s], processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } if(processors == 1){ numSeqs = driver(lines[0], outputFileName, fastaFileNames[s], accnosFileName); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); delete chimera; return 0; } }else{ processIDS.resize(0); numSeqs = createProcesses(outputFileName, fastaFileNames[s], accnosFileName); rename((outputFileName + toString(processIDS[0]) + ".temp").c_str(), outputFileName.c_str()); rename((accnosFileName + toString(processIDS[0]) + ".temp").c_str(), accnosFileName.c_str()); //append output files for(int i=1;i<processors;i++){ m->appendFiles((outputFileName + toString(processIDS[i]) + ".temp"), outputFileName); m->mothurRemove((outputFileName + toString(processIDS[i]) + ".temp")); } //append accnos files for(int i=1;i<processors;i++){ m->appendFiles((accnosFileName + toString(processIDS[i]) + ".temp"), accnosFileName); m->mothurRemove((accnosFileName + toString(processIDS[i]) + ".temp")); } if (m->control_pressed) { m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } outputTypes.clear(); for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); delete chimera; return 0; } } #else lines.push_back(new linePair(0, 1000)); numSeqs = driver(lines[0], outputFileName, fastaFileNames[s], accnosFileName); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { 
m->mothurRemove(outputNames[j]); } for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); delete chimera; return 0; } #endif #endif delete chimera; for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName); outputNames.push_back(accnosFileName); outputTypes["accnos"].push_back(accnosFileName); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); } //set accnos file as new current accnosfile string current = ""; itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "execute"); exit(1); } } //********************************************************************************************************************** int ChimeraPintailCommand::driver(linePair* filePos, string outputFName, string filename, string accnos){ try { ofstream out; m->openOutputFile(outputFName, out); ofstream out2; m->openOutputFile(accnos, out2); ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos->start); bool done = false; int count = 0; while (!done) { if (m->control_pressed) { return 1; } Sequence* candidateSeq = new Sequence(inFASTA); m->gobble(inFASTA); if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getAligned().length() != templateSeqsLength) { //chimeracheck does not require seqs to be aligned m->mothurOut(candidateSeq->getName() + " is not the same 
length as the template sequences. Skipping."); m->mothurOutEndLine(); }else{ //find chimeras chimera->getChimeras(candidateSeq); if (m->control_pressed) { delete candidateSeq; return 1; } //print results chimera->print(out, out2); } count++; } delete candidateSeq; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos->end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); } out.close(); out2.close(); inFASTA.close(); return count; } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "driver"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int ChimeraPintailCommand::driverMPI(int start, int num, MPI_File& inMPI, MPI_File& outMPI, MPI_File& outAccMPI, vector<unsigned long long>& MPIPos){ try { MPI_Status status; int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are for(int i=0;i<num;i++){ if (m->control_pressed) { return 1; } //read next sequence int length = MPIPos[start+i+1] - MPIPos[start+i]; char* buf4 = new char[length]; MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status); string tempBuf(buf4, length); istringstream iss (tempBuf,istringstream::in); delete[] buf4; Sequence* candidateSeq = new Sequence(iss); m->gobble(iss); if (candidateSeq->getName() != "") { //in case there is a commented sequence at the end of a file if (candidateSeq->getAligned().length() != templateSeqsLength) { //chimeracheck does not require seqs to be aligned m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. 
Skipping."); m->mothurOutEndLine(); }else{ //find chimeras chimera->getChimeras(candidateSeq); if (m->control_pressed) { delete candidateSeq; return 1; } //print results chimera->print(outMPI, outAccMPI); } } delete candidateSeq; //report progress if((i+1) % 100 == 0){ cout << "Processing sequence: " << (i+1) << endl; } } //report progress if(num % 100 != 0){ cout << "Processing sequence: " << num << endl; } return 0; } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "driverMPI"); exit(1); } } #endif /**************************************************************************************************/ int ChimeraPintailCommand::createProcesses(string outputFileName, string filename, string accnos) { try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 0; int num = 0; bool recalc = false; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename, accnos + toString(m->mothurGetpid(process)) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector<unsigned long long> positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 0; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename, accnos + toString(m->mothurGetpid(process)) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //force parent to wait until all the processes are done for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { ifstream in; string tempFile = outputFileName + toString(processIDS[i]) + ".num.temp"; m->openInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); } return num; #endif } catch(exception& e) { m->errorOut(e, "ChimeraPintailCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/commands/chimerapintailcommand.h000066400000000000000000000037411255543666200227450ustar00rootroot00000000000000#ifndef CHIMERAPINTAILCOMMAND_H #define CHIMERAPINTAILCOMMAND_H /* * chimerapintailcommand.h * Mothur * * Created by westcott on 
4/1/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "command.hpp" #include "chimera.h" #include "referencedb.h" /***********************************************************/ class ChimeraPintailCommand : public Command { public: ChimeraPintailCommand(string); ChimeraPintailCommand(); ~ChimeraPintailCommand(){} vector<string> setParameters(); string getCommandName() { return "chimera.pintail"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Ashelford KE, Chuzhanova NA, Fry JC, Jones AJ, Weightman AJ (2005). At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Appl Environ Microbiol 71: 7724-36. \nAshelford KE, Chuzhanova NA, Fry JC, Jones AJ, Weightman AJ (2006). New screening software shows that most recent large 16S rRNA gene clone libraries contain chimeras. Appl Environ Microbiol 72: 5734-41. \nhttp://www.mothur.org/wiki/Chimera.pintail"; } string getDescription() { return "detect chimeric sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: ReferenceDB* rdb; vector<int> processIDS; //processid vector<linePair*> lines; int driver(linePair*, string, string, string); int createProcesses(string, string, string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, MPI_File&, MPI_File&, vector<unsigned long long>&); #endif bool abort, filter, save; string fastafile, templatefile, consfile, quanfile, maskfile, outputDir, inputDir; int processors, window, increment, numSeqs, templateSeqsLength; Chimera* chimera; vector<string> outputNames; vector<string> fastaFileNames; }; /***********************************************************/ #endif mothur-1.36.1/source/commands/chimeraslayercommand.cpp000066400000000000000000003200671255543666200231420ustar00rootroot00000000000000/* * chimeraslayercommand.cpp * Mothur * * Created by westcott on 3/31/10. * Copyright 2010 Schloss Lab. 
All rights reserved. * */ #include "chimeraslayercommand.h" #include "deconvolutecommand.h" #include "referencedb.h" #include "sequenceparser.h" #include "counttable.h" //********************************************************************************************************************** vector ChimeraSlayerCommand::setParameters(){ try { CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","chimera-accnos",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pwindow("window", "Number", "", "50", "", "", "","",false,false); parameters.push_back(pwindow); CommandParameter pksize("ksize", "Number", "", "7", "", "", "","",false,false); parameters.push_back(pksize); CommandParameter pmatch("match", "Number", "", "5.0", "", "", "","",false,false); parameters.push_back(pmatch); CommandParameter pmismatch("mismatch", "Number", "", "-4.0", "", "", "","",false,false); parameters.push_back(pmismatch); CommandParameter pminsim("minsim", "Number", "", "90", "", "", "","",false,false); parameters.push_back(pminsim); CommandParameter pmincov("mincov", "Number", "", "70", "", "", "","",false,false); parameters.push_back(pmincov); CommandParameter pminsnp("minsnp", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pminsnp); CommandParameter pminbs("minbs", "Number", "", "90", "", "", "","",false,false); parameters.push_back(pminbs); CommandParameter psearch("search", "Multiple", 
"kmer-blast", "blast", "", "", "","",false,false); parameters.push_back(psearch); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter prealign("realign", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(prealign); CommandParameter ptrim("trim", "Boolean", "", "F", "", "", "","fasta",false,false); parameters.push_back(ptrim); CommandParameter psplit("split", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psplit); CommandParameter pnumwanted("numwanted", "Number", "", "15", "", "", "","",false,false); parameters.push_back(pnumwanted); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pdivergence("divergence", "Number", "", "1.007", "", "", "","",false,false); parameters.push_back(pdivergence); CommandParameter pdups("dereplicate", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pdups); CommandParameter pparents("parents", "Number", "", "3", "", "", "","",false,false); parameters.push_back(pparents); CommandParameter pincrement("increment", "Number", "", "5", "", "", "","",false,false); parameters.push_back(pincrement); CommandParameter pblastlocation("blastlocation", "String", "", "", "", "", "","",false,false); parameters.push_back(pblastlocation); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter psave("save", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psave); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) 
{ m->errorOut(e, "ChimeraSlayerCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ChimeraSlayerCommand::getHelpString(){ try { string helpString = ""; helpString += "The chimera.slayer command reads a fastafile and referencefile and outputs potentially chimeric sequences.\n"; helpString += "This command was modeled after the chimeraSlayer written by the Broad Institute.\n"; helpString += "The chimera.slayer command parameters are fasta, name, group, reference, processors, dereplicate, trim, ksize, window, match, mismatch, divergence, minsim, mincov, minbs, minsnp, parents, search, iters, increment, numwanted, blastlocation and realign.\n"; helpString += "The fasta parameter allows you to enter the fasta file containing your potentially chimeric sequences, and is required, unless you have a valid current fasta file. \n"; helpString += "The name parameter allows you to provide a name file, if you are using reference=self. \n"; helpString += "The group parameter allows you to provide a group file. The group file can be used with a namesfile and reference=self. When checking sequences, only sequences from the same group as the query sequence will be used as the reference. \n"; helpString += "The count parameter allows you to provide a count file. The count file can be used with reference=self. If your count file contains group information, when checking sequences, only sequences from the same group as the query sequence will be used as the reference. When you use a count file with group info and dereplicate=T, mothur will create a *.pick.count_table file containing sequences after chimeras are removed. \n"; helpString += "You may enter multiple fasta files by separating their names with dashes, e.g. fasta=abrecovery.fasta-amazon.fasta \n"; helpString += "The reference parameter allows you to enter a reference file containing known non-chimeric sequences, and is required. 
You may also set reference=self, in this case the abundant sequences will be used as potential parents. \n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n"; #ifdef USE_MPI helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. \n"; #endif helpString += "If the dereplicate parameter is false, then if one group finds the sequence to be chimeric, then all groups find it to be chimeric, default=f.\n"; helpString += "The trim parameter allows you to output a new fasta file containing your sequences with the chimeric ones trimmed to include only their longest piece, default=F. \n"; helpString += "The split parameter allows you to check both pieces of non-chimeric sequence for chimeras, thus looking for trimeras and quadmeras. default=F. \n"; helpString += "The window parameter allows you to specify the window size for searching for chimeras, default=50. \n"; helpString += "The increment parameter allows you to specify how far you move each window while finding chimeric sequences, default=5.\n"; helpString += "The numwanted parameter allows you to specify how many sequences you would like each query sequence compared with, default=15.\n"; helpString += "The ksize parameter allows you to input kmersize, default is 7, used if search is kmer. \n"; helpString += "The match parameter allows you to reward matched bases in blast search, default is 5. \n"; helpString += "The parents parameter allows you to select the number of potential parents to investigate from the numwanted best matches after rating them, default is 3. \n"; helpString += "The mismatch parameter allows you to penalize mismatched bases in blast search, default is -4. \n"; helpString += "The divergence parameter allows you to set a cutoff for chimera determination, default is 1.007. 
\n"; helpString += "The iters parameter allows you to specify the number of bootstrap iterations to do with the chimeraslayer method, default=1000.\n"; helpString += "The minsim parameter allows you to specify a minimum similarity with the parent fragments, default=90. \n"; helpString += "The mincov parameter allows you to specify minimum coverage by closest matches found in template. Default is 70, meaning 70%. \n"; helpString += "The minbs parameter allows you to specify minimum bootstrap support for calling a sequence chimeric. Default is 90, meaning 90%. \n"; helpString += "The minsnp parameter allows you to specify percent of SNPs to sample on each side of breakpoint for computing bootstrap support (default: 10) \n"; helpString += "The search parameter allows you to specify search method for finding the closest parent. Choices are blast and kmer, default blast. \n"; helpString += "The realign parameter allows you to realign the query to the potential parents. Choices are true or false, default true. \n"; helpString += "The blastlocation parameter allows you to specify the location of your blast executable. By default mothur will look in ./blast/bin relative to mothur's executable. \n"; helpString += "If the save parameter is set to true the reference sequences will be saved in memory; to clear them later you can use the clear.memory command. Default=f. \n"; helpString += "The chimera.slayer command should be in the following format: \n"; helpString += "chimera.slayer(fasta=yourFastaFile, reference=yourTemplate, search=yourSearch) \n"; helpString += "Example: chimera.slayer(fasta=AD.align, reference=core_set_aligned.imputed.fasta, search=kmer) \n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ChimeraSlayerCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "chimera") { pattern = "[filename],slayer.chimeras"; } else if (type == "accnos") { pattern = "[filename],slayer.accnos"; } else if (type == "fasta") { pattern = "[filename],slayer.fasta"; } else if (type == "count") { pattern = "[filename],slayer.pick.count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ChimeraSlayerCommand::ChimeraSlayerCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "ChimeraSlayerCommand"); exit(1); } } //*************************************************************************************************************** ChimeraSlayerCommand::ChimeraSlayerCommand(string option) { try { abort = false; calledHelp = false; ReferenceDB* rdb = ReferenceDB::getInstance(); hasCount = false; hasName = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters 
validParameter("chimera.slayer"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("[ERROR]: no valid files."); m->mothurOutEndLine(); abort = true; } } //check for required parameters namefile = validParameter.validFile(parameters, "name", false); if (namefile == "not found") { namefile = ""; } else { m->splitAtDash(namefile, nameFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < nameFileNames.size(); i++) { bool ignore = false; if (nameFileNames[i] == "current") { nameFileNames[i] = m->getNameFile(); if (nameFileNames[i] != "") { m->mothurOut("Using " + nameFileNames[i] + " as input file for the name parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current namefile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(nameFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { nameFileNames[i] = inputDir + nameFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(nameFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + nameFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; }else { m->setNameFile(nameFileNames[i]); } } } } if (nameFileNames.size() != 0) { hasName = true; } //check for required parameters vector<string> countfileNames; countfile = validParameter.validFile(parameters, "count", false); if (countfile == "not found") { countfile = ""; }else { m->splitAtDash(countfile, countfileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < countfileNames.size(); i++) { bool ignore = false; if (countfileNames[i] == "current") { countfileNames[i] = m->getCountTableFile(); if (countfileNames[i] != "") { m->mothurOut("Using " + countfileNames[i] + " as input file for the count parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current count file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(countfileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { countfileNames[i] = inputDir + countfileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(countfileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + countfileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; }else { m->setCountTableFile(countfileNames[i]); } } } } if (countfileNames.size() != 0) { hasCount = true; } //name and count files are mutually exclusive if (hasName && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if (!hasName && hasCount) { nameFileNames = countfileNames; } if ((hasCount || hasName) && (nameFileNames.size() != fastaFileNames.size())) { m->mothurOut("[ERROR]: The number of name or count files does not match the number of fastafiles, please correct."); m->mothurOutEndLine(); abort=true; } bool hasGroup = true; groupfile = validParameter.validFile(parameters, "group", false); if (groupfile == "not found") { groupfile = ""; hasGroup = false; } else { m->splitAtDash(groupfile, groupFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < groupFileNames.size(); i++) { bool ignore = false; if (groupFileNames[i] == "current") { groupFileNames[i] = m->getGroupFile(); if (groupFileNames[i] != "") { m->mothurOut("Using " + groupFileNames[i] + " as input file for the group parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list groupFileNames.erase(groupFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(groupFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { groupFileNames[i] = inputDir + groupFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(groupFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(groupFileNames[i]); m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); groupFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(groupFileNames[i]); m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); groupFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + groupFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list groupFileNames.erase(groupFileNames.begin()+i); i--; }else { m->setGroupFile(groupFileNames[i]); } } } //make sure there is at least one valid file left if (groupFileNames.size() == 0) { m->mothurOut("[ERROR]: no valid group files."); m->mothurOutEndLine(); abort = true; } } if (hasGroup && (groupFileNames.size() != fastaFileNames.size())) { m->mothurOut("[ERROR]: The number of groupfiles does not match the number of fastafiles, please correct."); m->mothurOutEndLine(); abort=true; } if (hasGroup && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or group."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "save", false); if (temp == "not found"){ temp = "f"; } save = m->isTrue(temp); rdb->save = save; if (save) { //clear out old references rdb->clearMemory(); } string path; it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ if (it->second == "self") { templatefile = "self"; if (save) { m->mothurOut("[WARNING]: You can't save reference=self, ignoring save."); m->mothurOutEndLine(); save = false; } } else { path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["reference"] = inputDir + it->second; } templatefile = validParameter.validFile(parameters, "reference", true); if (templatefile == "not open") { abort = true; } else if (templatefile == "not found") { //check for saved reference sequences if (rdb->referenceSeqs.size() != 0) { templatefile = "saved"; }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required."); m->mothurOutEndLine(); abort = true; } }else { if (save) { rdb->setSavedReference(templatefile); } } } }else if (hasName) { templatefile = "self"; if (save) { m->mothurOut("[WARNING]: You can't save reference=self, ignoring save."); m->mothurOutEndLine(); save = false; } }else if (hasCount) { templatefile = "self"; if (save) { m->mothurOut("[WARNING]: You can't save reference=self, ignoring save."); m->mothurOutEndLine(); save = false; } } else { if (rdb->referenceSeqs.size() != 0) { templatefile = "saved"; }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required."); m->mothurOutEndLine(); templatefile = ""; abort = true; } } temp = validParameter.validFile(parameters, "ksize", false); if (temp == "not found") { temp = "7"; } m->mothurConvert(temp, ksize); temp = validParameter.validFile(parameters, "window", false); if (temp == "not found") { temp = "50"; } m->mothurConvert(temp, window); temp = validParameter.validFile(parameters, "match", false); if (temp == "not found") { temp = "5"; } m->mothurConvert(temp, match); temp = validParameter.validFile(parameters, "mismatch", false); if (temp == "not found") { temp = "-4"; } m->mothurConvert(temp, mismatch); temp = validParameter.validFile(parameters, "divergence", false); if (temp == "not found") { temp = "1.007"; } m->mothurConvert(temp, divR); temp = validParameter.validFile(parameters, "minsim", false); if (temp == "not found") { temp = "90"; } m->mothurConvert(temp, minSimilarity); temp = 
validParameter.validFile(parameters, "mincov", false); if (temp == "not found") { temp = "70"; } m->mothurConvert(temp, minCoverage); temp = validParameter.validFile(parameters, "minbs", false); if (temp == "not found") { temp = "90"; } m->mothurConvert(temp, minBS); temp = validParameter.validFile(parameters, "minsnp", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, minSNP); temp = validParameter.validFile(parameters, "parents", false); if (temp == "not found") { temp = "3"; } m->mothurConvert(temp, parents); temp = validParameter.validFile(parameters, "realign", false); if (temp == "not found") { temp = "t"; } realign = m->isTrue(temp); temp = validParameter.validFile(parameters, "trim", false); if (temp == "not found") { temp = "f"; } trim = m->isTrue(temp); temp = validParameter.validFile(parameters, "split", false); if (temp == "not found") { temp = "f"; } trimera = m->isTrue(temp); search = validParameter.validFile(parameters, "search", false); if (search == "not found") { search = "blast"; } temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "increment", false); if (temp == "not found") { temp = "5"; } m->mothurConvert(temp, increment); temp = validParameter.validFile(parameters, "numwanted", false); if (temp == "not found") { temp = "15"; } m->mothurConvert(temp, numwanted); temp = validParameter.validFile(parameters, "dereplicate", false); if (temp == "not found") { temp = "false"; } dups = m->isTrue(temp); blastlocation = validParameter.validFile(parameters, "blastlocation", false); if (blastlocation == "not found") { blastlocation = ""; } else { //add / to name if needed string lastChar = blastlocation.substr(blastlocation.length()-1); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (lastChar != "/") { blastlocation += "/"; } #else if (lastChar 
!= "\\") { blastlocation += "\\"; } #endif blastlocation = m->getFullPathName(blastlocation); string formatdbCommand = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) formatdbCommand = blastlocation + "formatdb"; #else formatdbCommand = blastlocation + "formatdb.exe"; #endif //test to make sure formatdb exists ifstream in; formatdbCommand = m->getFullPathName(formatdbCommand); int ableToOpen = m->openInputFile(formatdbCommand, in, "no error"); in.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + formatdbCommand + " file does not exist. mothur requires formatdb.exe to run chimera.slayer."); m->mothurOutEndLine(); abort = true; } string blastCommand = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) blastCommand = blastlocation + "megablast"; #else blastCommand = blastlocation + "megablast.exe"; #endif //test to make sure megablast exists ifstream in2; blastCommand = m->getFullPathName(blastCommand); ableToOpen = m->openInputFile(blastCommand, in2, "no error"); in2.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + blastCommand + " file does not exist. mothur requires megablast.exe to run chimera.slayer."); m->mothurOutEndLine(); abort = true; } } if ((search != "blast") && (search != "kmer")) { m->mothurOut(search + " is not a valid search."); m->mothurOutEndLine(); abort = true; } if ((hasName || hasCount) && (templatefile != "self")) { m->mothurOut("You have provided a namefile or countfile and the reference parameter is not set to self. I am not sure what reference you are trying to use, aborting."); m->mothurOutEndLine(); abort=true; } if (hasGroup && (templatefile != "self")) { m->mothurOut("You have provided a group file and the reference parameter is not set to self. 
I am not sure what reference you are trying to use, aborting."); m->mothurOutEndLine(); abort=true; } //until we resolve the issue 10-18-11 #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else //processors=1; #endif } } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "ChimeraSlayerCommand"); exit(1); } } //*************************************************************************************************************** int ChimeraSlayerCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int s = 0; s < fastaFileNames.size(); s++) { m->mothurOut("Checking sequences from " + fastaFileNames[s] + " ..." ); m->mothurOutEndLine(); int start = time(NULL); if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[s]); }//if user entered a file with a path then preserve it map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); string outputFileName = getOutputFileName("chimera", variables); string accnosFileName = getOutputFileName("accnos", variables); string trimFastaFileName = getOutputFileName("fasta", variables); string newCountFile = ""; //clears files ofstream out, out1, out2; m->openOutputFile(outputFileName, out); out.close(); m->openOutputFile(accnosFileName, out1); out1.close(); if (trim) { m->openOutputFile(trimFastaFileName, out2); out2.close(); } outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName); outputNames.push_back(accnosFileName); outputTypes["accnos"].push_back(accnosFileName); if (trim) { outputNames.push_back(trimFastaFileName); outputTypes["fasta"].push_back(trimFastaFileName); } //maps a filename to priority map. 
//if no groupfile this is fastafileNames[s] -> prioirity //if groupfile then this is each groups seqs -> priority map > fileToPriority; map >::iterator itFile; map fileGroup; fileToPriority[fastaFileNames[s]] = priority; //default fileGroup[fastaFileNames[s]] = "noGroup"; map uniqueNames; int totalChimeras = 0; lines.clear(); if (templatefile == "self") { if (hasCount) { SequenceCountParser* parser = NULL; setUpForSelfReference(parser, fileGroup, fileToPriority, s); if (parser != NULL) { uniqueNames = parser->getAllSeqsMap(); delete parser; } }else { SequenceParser* parser = NULL; setUpForSelfReference(parser, fileGroup, fileToPriority, s); if (parser != NULL) { uniqueNames = parser->getAllSeqsMap(); delete parser; } } } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } if (fileToPriority.size() == 1) { //you running without a groupfile itFile = fileToPriority.begin(); string thisFastaName = itFile->first; map thisPriority = itFile->second; #ifdef USE_MPI MPIExecute(thisFastaName, outputFileName, accnosFileName, trimFastaFileName, thisPriority); #else //break up file vector positions; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(thisFastaName, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else if (processors == 1) { lines.push_back(linePair(0, 1000)); } else { positions = m->setFilePosFasta(thisFastaName, numSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif 
if(processors == 1){ numSeqs = driver(lines[0], outputFileName, thisFastaName, accnosFileName, trimFastaFileName, thisPriority); } else{ numSeqs = createProcesses(outputFileName, thisFastaName, accnosFileName, trimFastaFileName, thisPriority); } if (m->control_pressed) { outputTypes.clear(); if (trim) { m->mothurRemove(trimFastaFileName); } m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } #endif }else { //you have provided a groupfile string countFile = ""; if (hasCount) { countFile = nameFileNames[s]; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(nameFileNames[s])); newCountFile = getOutputFileName("count", variables); } #ifdef USE_MPI MPIExecuteGroups(outputFileName, accnosFileName, trimFastaFileName, fileToPriority, fileGroup, newCountFile, countFile); #else if (processors == 1) { numSeqs = driverGroups(outputFileName, accnosFileName, trimFastaFileName, fileToPriority, fileGroup, newCountFile); if (hasCount && dups) { CountTable c; c.readTable(nameFileNames[s], true, false); if (!m->isBlank(newCountFile)) { ifstream in2; m->openInputFile(newCountFile, in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); c.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(newCountFile); c.printTable(newCountFile); } } else { numSeqs = createProcessesGroups(outputFileName, accnosFileName, trimFastaFileName, fileToPriority, fileGroup, newCountFile, countFile); } //destroys fileToPriority #endif #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are if (pid == 0) { #endif if (!dups) { totalChimeras = deconvoluteResults(uniqueNames, outputFileName, accnosFileName, trimFastaFileName); m->mothurOutEndLine(); m->mothurOut(toString(totalChimeras) + " chimera found."); m->mothurOutEndLine(); }else { if (hasCount) { set doNotRemove; CountTable c; c.readTable(newCountFile, true, true); 
vector namesInTable = c.getNamesOfSeqs(); for (int i = 0; i < namesInTable.size(); i++) { int temp = c.getNumSeqs(namesInTable[i]); if (temp == 0) { c.remove(namesInTable[i]); } else { doNotRemove.insert((namesInTable[i])); } } //remove names we want to keep from accnos file. set accnosNames = m->readAccnos(accnosFileName); ofstream out2; m->openOutputFile(accnosFileName, out2); for (set::iterator it = accnosNames.begin(); it != accnosNames.end(); it++) { if (doNotRemove.count(*it) == 0) { out2 << (*it) << endl; } } out2.close(); c.printTable(newCountFile); outputNames.push_back(newCountFile); outputTypes["count"].push_back(newCountFile); } } #ifdef USE_MPI } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait #endif } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); } //set accnos file as new current accnosfile string current = ""; itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } if (trim) { itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "execute"); exit(1); } } //********************************************************************************************************************** int ChimeraSlayerCommand::MPIExecuteGroups(string outputFileName, string accnosFileName, string 
trimFastaFileName, map >& fileToPriority, map& fileGroup, string countlist, string countfile){ try { #ifdef USE_MPI int pid; int tag = 2001; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); //put filenames in a vector, then pass each process a starting and ending point in the vector //all processes already have the fileToPriority and fileGroup, they just need to know which files to process map >::iterator itFile; vector filenames; for(itFile = fileToPriority.begin(); itFile != fileToPriority.end(); itFile++) { filenames.push_back(itFile->first); } int numGroupsPerProcessor = ceil(filenames.size() / (double) processors); int startIndex = pid * numGroupsPerProcessor; int endIndex = (pid+1) * numGroupsPerProcessor; if(pid == (processors - 1)){ endIndex = filenames.size(); } vector MPIPos; MPI_File outMPI; MPI_File outMPIAccnos; MPI_File outMPIFasta; MPI_File outMPICount; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outFilename[1024]; strcpy(outFilename, outputFileName.c_str()); char outAccnosFilename[1024]; strcpy(outAccnosFilename, accnosFileName.c_str()); char outFastaFilename[1024]; strcpy(outFastaFilename, trimFastaFileName.c_str()); char outCountFilename[1024]; strcpy(outCountFilename, countlist.c_str()); MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPI); MPI_File_open(MPI_COMM_WORLD, outAccnosFilename, outMode, MPI_INFO_NULL, &outMPIAccnos); if (trim) { MPI_File_open(MPI_COMM_WORLD, outFastaFilename, outMode, MPI_INFO_NULL, &outMPIFasta); } if (hasCount && dups) { MPI_File_open(MPI_COMM_WORLD, outCountFilename, outMode, MPI_INFO_NULL, &outMPICount); } if (m->control_pressed) { MPI_File_close(&outMPI); if (trim) { MPI_File_close(&outMPIFasta); } MPI_File_close(&outMPIAccnos); if (hasCount && dups) { MPI_File_close(&outMPICount); } return 0; } //print headers if (pid == 0) { //you are the root process m->mothurOutEndLine(); 
m->mothurOut("Only reporting sequence supported by " + toString(minBS) + "% of bootstrapped results."); m->mothurOutEndLine(); string outTemp = "Name\tLeftParent\tRightParent\tDivQLAQRB\tPerIDQLAQRB\tBootStrapA\tDivQLBQRA\tPerIDQLBQRA\tBootStrapB\tFlag\tLeftWindow\tRightWindow\n"; //print header int length = outTemp.length(); char* buf2 = new char[length]; memcpy(buf2, outTemp.c_str(), length); MPI_File_write_shared(outMPI, buf2, length, MPI_CHAR, &status); delete[] buf2; } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait for (int i = startIndex; i < endIndex; i++) { int start = time(NULL); int num = 0; string thisFastaName = filenames[i]; map<string, int> thisPriority = fileToPriority[thisFastaName]; char inFileName[1024]; strcpy(inFileName, thisFastaName.c_str()); MPI_File inMPI; MPI_File_open(MPI_COMM_SELF, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPIPos = m->setFilePosFasta(thisFastaName, num); //fills MPIPos, returns numSeqs cout << endl << "Checking sequences from group: " << fileGroup[thisFastaName] << "." 
<< endl; set<string> cnames; driverMPI(0, num, inMPI, outMPI, outMPIAccnos, outMPIFasta, cnames, MPIPos, thisFastaName, thisPriority, true); numSeqs += num; MPI_File_close(&inMPI); m->mothurRemove(thisFastaName); if (dups) { if (cnames.size() != 0) { if (hasCount) { for (set<string>::iterator it = cnames.begin(); it != cnames.end(); it++) { string outputString = (*it) + "\t" + fileGroup[thisFastaName] + "\n"; int length = outputString.length(); char* buf2 = new char[length]; memcpy(buf2, outputString.c_str(), length); MPI_File_write_shared(outMPICount, buf2, length, MPI_CHAR, &status); delete[] buf2; } }else { map<string, map<string, string> >::iterator itGroupNameMap = group2NameMap.find(fileGroup[thisFastaName]); if (itGroupNameMap != group2NameMap.end()) { map<string, string> thisnamemap = itGroupNameMap->second; map<string, string>::iterator itN; for (set<string>::iterator it = cnames.begin(); it != cnames.end(); it++) { itN = thisnamemap.find(*it); if (itN != thisnamemap.end()) { vector<string> tempNames; m->splitAtComma(itN->second, tempNames); for (int j = 0; j < tempNames.size(); j++) { //write to accnos file string outputString = tempNames[j] + "\n"; int length = outputString.length(); char* buf2 = new char[length]; memcpy(buf2, outputString.c_str(), length); MPI_File_write_shared(outMPIAccnos, buf2, length, MPI_CHAR, &status); delete[] buf2; } }else { m->mothurOut("[ERROR]: parsing cannot find " + *it + ".\n"); m->control_pressed = true; } } }else { m->mothurOut("[ERROR]: parsing cannot find " + fileGroup[thisFastaName] + ".\n"); m->control_pressed = true; } } } } cout << endl << "It took " << toString(time(NULL) - start) << " secs to check " + toString(num) + " sequences from group " << fileGroup[thisFastaName] << "." 
<< endl; } if (pid == 0) { for(int i = 1; i < processors; i++) { int temp = 0; MPI_Recv(&temp, 1, MPI_INT, i, 2001, MPI_COMM_WORLD, &status); numSeqs += temp; } }else{ MPI_Send(&numSeqs, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD); } MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); if (trim) { MPI_File_close(&outMPIFasta); } if (hasCount && dups) { MPI_File_close(&outMPICount); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait #endif return 0; }catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "MPIExecuteGroups"); exit(1); } } //********************************************************************************************************************** int ChimeraSlayerCommand::MPIExecute(string inputFile, string outputFileName, string accnosFileName, string trimFastaFileName, map& priority){ try { #ifdef USE_MPI int pid, numSeqsPerProcessor; int tag = 2001; vector MPIPos; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_File inMPI; MPI_File outMPI; MPI_File outMPIAccnos; MPI_File outMPIFasta; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outFilename[1024]; strcpy(outFilename, outputFileName.c_str()); char outAccnosFilename[1024]; strcpy(outAccnosFilename, accnosFileName.c_str()); char outFastaFilename[1024]; strcpy(outFastaFilename, trimFastaFileName.c_str()); char inFileName[1024]; strcpy(inFileName, inputFile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPI); MPI_File_open(MPI_COMM_WORLD, outAccnosFilename, outMode, MPI_INFO_NULL, &outMPIAccnos); if (trim) { MPI_File_open(MPI_COMM_WORLD, outFastaFilename, outMode, MPI_INFO_NULL, &outMPIFasta); } if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); if (trim) { MPI_File_close(&outMPIFasta); } MPI_File_close(&outMPIAccnos); return 
0; } if (pid == 0) { //you are the root process m->mothurOutEndLine(); m->mothurOut("Only reporting sequence supported by " + toString(minBS) + "% of bootstrapped results."); m->mothurOutEndLine(); string outTemp = "Name\tLeftParent\tRightParent\tDivQLAQRB\tPerIDQLAQRB\tBootStrapA\tDivQLBQRA\tPerIDQLBQRA\tBootStrapB\tFlag\tLeftWindow\tRightWindow\n"; //print header int length = outTemp.length(); char* buf2 = new char[length]; memcpy(buf2, outTemp.c_str(), length); MPI_File_write_shared(outMPI, buf2, length, MPI_CHAR, &status); delete[] buf2; MPIPos = m->setFilePosFasta(inputFile, numSeqs); //fills MPIPos, returns numSeqs if (templatefile != "self") { //if template=self we can only use 1 processor //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (numSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } } //figure out how many sequences you have to align numSeqsPerProcessor = numSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; } if (templatefile == "self") { //if template=self we can only use 1 processor startIndex = 0; numSeqsPerProcessor = numSeqs; } //do your part set<string> cnames; driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPI, outMPIAccnos, outMPIFasta, cnames, MPIPos, inputFile, priority, false); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); if (trim) { MPI_File_close(&outMPIFasta); } MPI_File_close(&outMPIAccnos); return 0; } }else{ //you are a child process if (templatefile != "self") { //if template=self we can only use 1 processor MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(numSeqs+1); MPI_Recv(&MPIPos[0], (numSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = numSeqs / processors; int startIndex = pid * numSeqsPerProcessor; 
if(pid == (processors - 1)){ numSeqsPerProcessor = numSeqs - pid * numSeqsPerProcessor; } //do your part set cnames; driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPI, outMPIAccnos, outMPIFasta, cnames, MPIPos, inputFile, priority, false); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); if (trim) { MPI_File_close(&outMPIFasta); } MPI_File_close(&outMPIAccnos); return 0; } } } //close files MPI_File_close(&inMPI); MPI_File_close(&outMPI); MPI_File_close(&outMPIAccnos); if (trim) { MPI_File_close(&outMPIFasta); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif return numSeqs; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "MPIExecute"); exit(1); } } //********************************************************************************************************************** int ChimeraSlayerCommand::deconvoluteResults(map& uniqueNames, string outputFileName, string accnosFileName, string trimFileName){ try { map::iterator itUnique; int total = 0; if (trimera) { //add in more potential uniqueNames map newUniqueNames = uniqueNames; for (map::iterator it = uniqueNames.begin(); it != uniqueNames.end(); it++) { newUniqueNames[(it->first)+"_LEFT"] = (it->first)+"_LEFT"; newUniqueNames[(it->first)+"_RIGHT"] = (it->first)+"_RIGHT"; } uniqueNames = newUniqueNames; newUniqueNames.clear(); } //edit accnos file ifstream in2; m->openInputFile(accnosFileName, in2, "no error"); ofstream out2; m->openOutputFile(accnosFileName+".temp", out2); string name; name = ""; set chimerasInFile; set::iterator itChimeras; while (!in2.eof()) { if (m->control_pressed) { in2.close(); out2.close(); m->mothurRemove(outputFileName); m->mothurRemove((accnosFileName+".temp")); return 0; } in2 >> name; m->gobble(in2); //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing accnos results. 
Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { itChimeras = chimerasInFile.find((itUnique->second)); if (itChimeras == chimerasInFile.end()) { out2 << itUnique->second << endl; chimerasInFile.insert((itUnique->second)); total++; } } } in2.close(); out2.close(); m->mothurRemove(accnosFileName); rename((accnosFileName+".temp").c_str(), accnosFileName.c_str()); //edit chimera file ifstream in; m->openInputFile(outputFileName, in); ofstream out; m->openOutputFile(outputFileName+".temp", out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); string rest, parent1, parent2, line; set namesInFile; //this is so if a sequence is found to be chimera in several samples we dont write it to the results file more than once set::iterator itNames; //assumptions - in file each read will always look like... /* F11Fcsw_92754 no F11Fcsw_63104 F11Fcsw_33372 F11Fcsw_37007 0.89441 80.4469 0.2 1.03727 93.2961 52.2 no 0-241 243-369 */ //get header line if (!in.eof()) { line = m->getline(in); m->gobble(in); out << line << endl; } //for the chimera file, we want to make sure if any group finds a sequence to be chimeric then all groups do, //so if this is a report that did not find it to be chimeric, but it appears in the accnos file, //then ignore this report and continue until we find the report that found it to be chimeric while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove((outputFileName+".temp")); return 0; } in >> name; m->gobble(in); in >> parent1; m->gobble(in); if (name == "Name") { //name = "Name" because we append the header line each time we add results from the groups line = m->getline(in); m->gobble(in); }else { if (parent1 == "no") { //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. 
Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { //is this sequence really not chimeric?? itChimeras = chimerasInFile.find(itUnique->second); if (itChimeras == chimerasInFile.end()) { //is this sequence not already in the file itNames = namesInFile.find((itUnique->second)); if (itNames == namesInFile.end()) { out << itUnique->second << '\t' << "no" << endl; namesInFile.insert(itUnique->second); } } } }else { //read the rest of the line double DivQLAQRB,PerIDQLAQRB,BootStrapA,DivQLBQRA,PerIDQLBQRA,BootStrapB; string flag, range1, range2; bool print = false; in >> parent2 >> DivQLAQRB >> PerIDQLAQRB >> BootStrapA >> DivQLBQRA >> PerIDQLBQRA >> BootStrapB >> flag >> range1 >> range2; m->gobble(in); //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { name = itUnique->second; //is this name already in the file itNames = namesInFile.find((name)); if (itNames == namesInFile.end()) { //no not in file if (flag == "no") { //are you really a no?? //is this sequence really not chimeric?? itChimeras = chimerasInFile.find(name); //then you really are a no so print, otherwise skip if (itChimeras == chimerasInFile.end()) { print = true; } }else{ print = true; } } } if (print) { out << name << '\t'; namesInFile.insert(name); //output parent1's name itUnique = uniqueNames.find(parent1); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find parentA "+ parent1 + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { out << itUnique->second << '\t'; } //output parent2's name itUnique = uniqueNames.find(parent2); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. 
Cannot find parentB "+ parent2 + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { out << itUnique->second << '\t'; } out << DivQLAQRB << '\t' << PerIDQLAQRB << '\t' << BootStrapA << '\t' << DivQLBQRA << '\t' << PerIDQLBQRA << '\t' << BootStrapB << '\t' << flag << '\t' << range1 << '\t' << range2 << endl; } } } } in.close(); out.close(); m->mothurRemove(outputFileName); rename((outputFileName+".temp").c_str(), outputFileName.c_str()); //edit fasta file if (trim) { ifstream in3; m->openInputFile(trimFileName, in3); ofstream out3; m->openOutputFile(trimFileName+".temp", out3); namesInFile.clear(); while (!in3.eof()) { if (m->control_pressed) { in3.close(); out3.close(); m->mothurRemove(outputFileName); m->mothurRemove(accnosFileName); m->mothurRemove((trimFileName+".temp")); return 0; } Sequence seq(in3); m->gobble(in3); if (seq.getName() != "") { //find unique name itUnique = uniqueNames.find(seq.getName()); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing fasta results. 
Cannot find "+ seq.getName() + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { itNames = namesInFile.find((itUnique->second)); if (itNames == namesInFile.end()) { seq.printSequence(out3); } } } } in3.close(); out3.close(); m->mothurRemove(trimFileName); rename((trimFileName+".temp").c_str(), trimFileName.c_str()); } return total; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "deconvoluteResults"); exit(1); } } //********************************************************************************************************************** int ChimeraSlayerCommand::setUpForSelfReference(SequenceParser*& parser, map& fileGroup, map >& fileToPriority, int s){ try { fileGroup.clear(); fileToPriority.clear(); string nameFile = ""; if (nameFileNames.size() != 0) { //you provided a namefile and we don't need to create one nameFile = nameFileNames[s]; }else { nameFile = getNamesFile(fastaFileNames[s]); } //you provided a groupfile string groupFile = ""; if (groupFileNames.size() != 0) { groupFile = groupFileNames[s]; } if (groupFile == "") { if (processors != 1) { m->mothurOut("When using template=self, mothur can only use 1 processor, continuing."); m->mothurOutEndLine(); processors = 1; } //sort fastafile by abundance, returns new sorted fastafile name m->mothurOut("Sorting fastafile according to abundance..."); cout.flush(); priority = sortFastaFile(fastaFileNames[s], nameFile); m->mothurOut("Done."); m->mothurOutEndLine(); fileToPriority[fastaFileNames[s]] = priority; fileGroup[fastaFileNames[s]] = "noGroup"; }else { //Parse sequences by group parser = new SequenceParser(groupFile, fastaFileNames[s], nameFile); vector groups = parser->getNamesOfGroups(); for (int i = 0; i < groups.size(); i++) { vector thisGroupsSeqs = parser->getSeqs(groups[i]); map thisGroupsMap = parser->getNameMap(groups[i]); group2NameMap[groups[i]] = thisGroupsMap; string newFastaFile = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])) + groups[i] + 
"-sortedTemp.fasta"; priority = sortFastaFile(thisGroupsSeqs, thisGroupsMap, newFastaFile); fileToPriority[newFastaFile] = priority; fileGroup[newFastaFile] = groups[i]; } } return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "setUpForSelfReference"); exit(1); } } //********************************************************************************************************************** int ChimeraSlayerCommand::setUpForSelfReference(SequenceCountParser*& parser, map& fileGroup, map >& fileToPriority, int s){ try { fileGroup.clear(); fileToPriority.clear(); string nameFile = ""; if (nameFileNames.size() != 0) { //you provided a namefile and we don't need to create one nameFile = nameFileNames[s]; }else { m->control_pressed = true; return 0; } CountTable ct; if (!ct.testGroups(nameFile)) { if (processors != 1) { m->mothurOut("When using template=self, mothur can only use 1 processor, continuing."); m->mothurOutEndLine(); processors = 1; } //sort fastafile by abundance, returns new sorted fastafile name m->mothurOut("Sorting fastafile according to abundance..."); cout.flush(); priority = sortFastaFile(fastaFileNames[s], nameFile); m->mothurOut("Done."); m->mothurOutEndLine(); fileToPriority[fastaFileNames[s]] = priority; fileGroup[fastaFileNames[s]] = "noGroup"; }else { //Parse sequences by group parser = new SequenceCountParser(nameFile, fastaFileNames[s]); vector groups = parser->getNamesOfGroups(); for (int i = 0; i < groups.size(); i++) { vector thisGroupsSeqs = parser->getSeqs(groups[i]); map thisGroupsMap = parser->getCountTable(groups[i]); string newFastaFile = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])) + groups[i] + "-sortedTemp.fasta"; sortFastaFile(thisGroupsSeqs, thisGroupsMap, newFastaFile); fileToPriority[newFastaFile] = thisGroupsMap; fileGroup[newFastaFile] = groups[i]; } } return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "setUpForSelfReference"); exit(1); } } 
//**********************************************************************************************************************
string ChimeraSlayerCommand::getNamesFile(string& inputFile){
	try {
		string nameFile = "";
		
		m->mothurOutEndLine(); m->mothurOut("No namesfile given, running unique.seqs command to generate one."); m->mothurOutEndLine(); m->mothurOutEndLine();
		
		//use unique.seqs to create new name and fastafile
		string inputString = "fasta=" + inputFile;
		m->mothurOut("/******************************************/"); m->mothurOutEndLine();
		m->mothurOut("Running command: unique.seqs(" + inputString + ")"); m->mothurOutEndLine();
		m->mothurCalling = true;
		
		Command* uniqueCommand = new DeconvoluteCommand(inputString);
		uniqueCommand->execute();
		
		map<string, vector<string> > filenames = uniqueCommand->getOutputFiles();
		
		delete uniqueCommand;
		m->mothurCalling = false;
		m->mothurOut("/******************************************/"); m->mothurOutEndLine();
		
		nameFile = filenames["name"][0];
		inputFile = filenames["fasta"][0];
		
		return nameFile;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraSlayerCommand", "getNamesFile");
		exit(1);
	}
}
//**********************************************************************************************************************
int ChimeraSlayerCommand::driverGroups(string outputFName, string accnos, string fasta, map<string, map<string, int> >& fileToPriority, map<string, string>& fileGroup, string countlist){
	try {
		int totalSeqs = 0;
		ofstream outCountList;
		if (hasCount && dups) { m->openOutputFile(countlist, outCountList); }
		
		for (map<string, map<string, int> >::iterator itFile = fileToPriority.begin(); itFile != fileToPriority.end(); itFile++) {
			
			if (m->control_pressed) { return 0; }
			
			int start = time(NULL);
			string thisFastaName = itFile->first;
			map<string, int> thisPriority = itFile->second;
			string thisoutputFileName = outputDir + m->getRootName(m->getSimpleName(thisFastaName)) + fileGroup[thisFastaName] + "slayer.chimera";
			string thisaccnosFileName = outputDir + m->getRootName(m->getSimpleName(thisFastaName)) + fileGroup[thisFastaName] + "slayer.accnos";
			string
thistrimFastaFileName = outputDir + m->getRootName(m->getSimpleName(thisFastaName)) + fileGroup[thisFastaName] + "slayer.fasta";
			
			m->mothurOutEndLine(); m->mothurOut("Checking sequences from group: " + fileGroup[thisFastaName] + "."); m->mothurOutEndLine();
			
			lines.clear();
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
			int proc = 1;
			vector<unsigned long long> positions = m->divideFile(thisFastaName, proc);
			lines.push_back(linePair(positions[0], positions[1]));
#else
			lines.push_back(linePair(0, 1000));
#endif
			int numSeqs = driver(lines[0], thisoutputFileName, thisFastaName, thisaccnosFileName, thistrimFastaFileName, thisPriority);
			
			//if we provided a count file with group info and set dereplicate=t, then we want to create a *.pick.count_table
			//This table will zero out group counts for seqs determined to be chimeric by that group.
			if (dups) {
				if (!m->isBlank(thisaccnosFileName)) {
					ifstream in;
					m->openInputFile(thisaccnosFileName, in);
					string name;
					if (hasCount) {
						while (!in.eof()) {
							in >> name; m->gobble(in);
							outCountList << name << '\t' << fileGroup[thisFastaName] << endl;
						}
						in.close();
					}else {
						map<string, map<string, string> >::iterator itGroupNameMap = group2NameMap.find(fileGroup[thisFastaName]);
						if (itGroupNameMap != group2NameMap.end()) {
							map<string, string> thisnamemap = itGroupNameMap->second;
							map<string, string>::iterator itN;
							ofstream out;
							m->openOutputFile(thisaccnosFileName+".temp", out);
							while (!in.eof()) {
								in >> name; m->gobble(in);
								itN = thisnamemap.find(name);
								if (itN != thisnamemap.end()) {
									vector<string> tempNames; m->splitAtComma(itN->second, tempNames);
									for (int j = 0; j < tempNames.size(); j++) { out << tempNames[j] << endl; }
								}else { m->mothurOut("[ERROR]: parsing cannot find " + name + ".\n"); m->control_pressed = true; }
							}
							out.close();
							in.close();
							m->renameFile(thisaccnosFileName+".temp", thisaccnosFileName);
						}else { m->mothurOut("[ERROR]: parsing cannot find " + fileGroup[thisFastaName] + ".\n"); m->control_pressed = true; }
					}
				}
			}
			
			//append files
m->appendFiles(thisoutputFileName, outputFName); m->mothurRemove(thisoutputFileName); m->appendFiles(thisaccnosFileName, accnos); m->mothurRemove(thisaccnosFileName); if (trim) { m->appendFiles(thistrimFastaFileName, fasta); m->mothurRemove(thistrimFastaFileName); } m->mothurRemove(thisFastaName); totalSeqs += numSeqs; m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences from group " + fileGroup[thisFastaName] + "."); m->mothurOutEndLine(); } if (hasCount && dups) { outCountList.close(); } return totalSeqs; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "driverGroups"); exit(1); } } /**************************************************************************************************/ int ChimeraSlayerCommand::createProcessesGroups(string outputFName, string accnos, string fasta, map >& fileToPriority, map& fileGroup, string countlist, string countFile) { try { int process = 1; int num = 0; processIDS.clear(); bool recalc = false; map > copyFileToPriority; copyFileToPriority = fileToPriority; if (fileToPriority.size() < processors) { processors = fileToPriority.size(); } CountTable newCount; if (hasCount && dups) { newCount.readTable(countFile, true, false); } int groupsPerProcessor = fileToPriority.size() / processors; int remainder = fileToPriority.size() % processors; vector< map > > breakUp; for (int i = 0; i < processors; i++) { map > thisFileToPriority; map >::iterator itFile; int count = 0; int enough = groupsPerProcessor; if (i == 0) { enough = groupsPerProcessor + remainder; } for (itFile = fileToPriority.begin(); itFile != fileToPriority.end();) { thisFileToPriority[itFile->first] = itFile->second; fileToPriority.erase(itFile++); count++; if (count == enough) { break; } } breakUp.push_back(thisFileToPriority); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want 
while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverGroups(outputFName + toString(m->mothurGetpid(process)) + ".temp", accnos + m->mothurGetpid(process) + ".temp", fasta + toString(m->mothurGetpid(process)) + ".temp", breakUp[process], fileGroup, accnos + toString(m->mothurGetpid(process)) + ".byCount"); //pass numSeqs to parent ofstream out; string tempFile = outputFName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); groupsPerProcessor = copyFileToPriority.size() / processors; remainder = copyFileToPriority.size() % processors; breakUp.clear(); for (int i = 0; i < processors; i++) { map > thisFileToPriority; map >::iterator itFile; int count = 0; int enough = groupsPerProcessor; if (i == 0) { enough = groupsPerProcessor + remainder; } for (itFile = copyFileToPriority.begin(); itFile != copyFileToPriority.end();) { thisFileToPriority[itFile->first] = itFile->second; copyFileToPriority.erase(itFile++); count++; if (count == enough) { break; } } breakUp.push_back(thisFileToPriority); } num = 0; processIDS.resize(0); process = 1; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverGroups(outputFName + toString(m->mothurGetpid(process)) + ".temp", accnos + m->mothurGetpid(process) + ".temp", fasta + toString(m->mothurGetpid(process)) + ".temp", breakUp[process], fileGroup, accnos + toString(m->mothurGetpid(process)) + ".byCount"); //pass numSeqs to parent ofstream out; string tempFile = outputFName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driverGroups(outputFName, accnos, fasta, breakUp[0], fileGroup, accnos + ".byCount"); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += 
tempNum; } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the slayerData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for(int i=1; ifileToPriority.size() != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->end) + " of " + toString(pDataArray[i]->fileToPriority.size()) + " groups assigned to it, quitting. \n"); m->control_pressed = true; } num += pDataArray[i]->count; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //read my own if (hasCount && dups) { if (!m->isBlank(accnos + ".byCount")) { ifstream in2; m->openInputFile(accnos + ".byCount", in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); newCount.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(accnos + ".byCount"); } //append output files for(int i=0;iappendFiles((outputFName + toString(processIDS[i]) + ".temp"), outputFName); m->mothurRemove((outputFName + toString(processIDS[i]) + ".temp")); m->appendFiles((accnos + toString(processIDS[i]) + ".temp"), accnos); m->mothurRemove((accnos + toString(processIDS[i]) + ".temp")); if (trim) { m->appendFiles((fasta + toString(processIDS[i]) + ".temp"), fasta); m->mothurRemove((fasta + toString(processIDS[i]) + ".temp")); } if (hasCount && dups) { if (!m->isBlank(accnos + toString(processIDS[i]) + ".byCount")) { ifstream in2; m->openInputFile(accnos + toString(processIDS[i]) + ".byCount", in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); newCount.setAbund(name, 
group, 0); } in2.close(); } m->mothurRemove(accnos + toString(processIDS[i]) + ".byCount"); } } //print new *.pick.count_table if (hasCount && dups) { newCount.printTable(countlist); } return num; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "createProcessesGroups"); exit(1); } } //********************************************************************************************************************** int ChimeraSlayerCommand::driver(linePair filePos, string outputFName, string filename, string accnos, string fasta, map& priority){ try { if (m->debug) { m->mothurOut("[DEBUG]: filename = " + filename + "\n"); } Chimera* chimera; if (templatefile != "self") { //you want to run slayer with a reference template chimera = new ChimeraSlayer(filename, templatefile, trim, search, ksize, match, mismatch, window, divR, minSimilarity, minCoverage, minBS, minSNP, parents, iters, increment, numwanted, realign, blastlocation, rand()); }else { chimera = new ChimeraSlayer(filename, templatefile, trim, priority, search, ksize, match, mismatch, window, divR, minSimilarity, minCoverage, minBS, minSNP, parents, iters, increment, numwanted, realign, blastlocation, rand()); } if (m->control_pressed) { delete chimera; return 0; } if (chimera->getUnaligned()) { delete chimera; m->mothurOut("Your template sequences are different lengths, please correct."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } templateSeqsLength = chimera->getLength(); ofstream out; m->openOutputFile(outputFName, out); ofstream out2; m->openOutputFile(accnos, out2); ofstream out3; if (trim) { m->openOutputFile(fasta, out3); } ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos.start); if (filePos.start == 0) { chimera->printHeader(out); } bool done = false; int count = 0; while (!done) { if (m->control_pressed) { delete chimera; out.close(); out2.close(); if (trim) { out3.close(); } inFASTA.close(); return 1; } Sequence* candidateSeq = new Sequence(inFASTA); 
m->gobble(inFASTA); string candidateAligned = candidateSeq->getAligned(); if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getAligned().length() != templateSeqsLength) { m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. Skipping."); m->mothurOutEndLine(); }else{ //find chimeras chimera->getChimeras(candidateSeq); if (m->control_pressed) { delete chimera; delete candidateSeq; return 1; } //if you are not chimeric, then check each half data_results wholeResults = chimera->getResults(); //determine if we need to split bool isChimeric = false; if (wholeResults.flag == "yes") { string chimeraFlag = "no"; if( (wholeResults.results[0].bsa >= minBS && wholeResults.results[0].divr_qla_qrb >= divR) || (wholeResults.results[0].bsb >= minBS && wholeResults.results[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; } if (chimeraFlag == "yes") { if ((wholeResults.results[0].bsa >= minBS) || (wholeResults.results[0].bsb >= minBS)) { isChimeric = true; } } } if ((!isChimeric) && trimera) { //split sequence in half by bases string leftQuery, rightQuery; Sequence tempSeq(candidateSeq->getName(), candidateAligned); divideInHalf(tempSeq, leftQuery, rightQuery); //run chimeraSlayer on each piece Sequence* left = new Sequence(candidateSeq->getName(), leftQuery); Sequence* right = new Sequence(candidateSeq->getName(), rightQuery); //find chimeras chimera->getChimeras(left); data_results leftResults = chimera->getResults(); chimera->getChimeras(right); data_results rightResults = chimera->getResults(); //if either piece is chimeric then report Sequence trimmed = chimera->print(out, out2, leftResults, rightResults); if (trim) { trimmed.printSequence(out3); } delete left; delete right; }else { //already chimeric //print results Sequence trimmed = chimera->print(out, out2); if (trim) { trimmed.printSequence(out3); } } } count++; } #if defined (__APPLE__) || (__MACH__) || (linux) || 
(__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (inFASTA.eof()) { break; } #endif delete candidateSeq; //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) + "\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count)+ "\n"); } int numNoParents = chimera->getNumNoParents(); if (numNoParents == count) { m->mothurOut("[WARNING]: megablast returned 0 potential parents for all your sequences. This could be due to formatdb.exe not being setup properly, please check formatdb.log for errors."); m->mothurOutEndLine(); } out.close(); out2.close(); if (trim) { out3.close(); } inFASTA.close(); delete chimera; return count; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "driver"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int ChimeraSlayerCommand::driverMPI(int start, int num, MPI_File& inMPI, MPI_File& outMPI, MPI_File& outAccMPI, MPI_File& outFastaMPI, set& cnames, vector& MPIPos, string filename, map& priority, bool byGroup){ try { MPI_Status status; int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are Chimera* chimera; if (templatefile != "self") { //you want to run slayer with a reference template chimera = new ChimeraSlayer(filename, templatefile, trim, search, ksize, match, mismatch, window, divR, minSimilarity, minCoverage, minBS, minSNP, parents, iters, increment, numwanted, realign, blastlocation, rand()); }else { chimera = new ChimeraSlayer(filename, templatefile, trim, priority, search, ksize, match, mismatch, window, divR, minSimilarity, minCoverage, minBS, minSNP, parents, iters, increment, numwanted, realign, blastlocation, rand(), byGroup); } if (m->control_pressed) { delete chimera; return 0; } if 
(chimera->getUnaligned()) { delete chimera; m->mothurOut("Your template sequences are different lengths, please correct."); m->mothurOutEndLine(); m->control_pressed = true; return 0; }
		templateSeqsLength = chimera->getLength();
		
		for(int i=0;i<num;i++){
			
			if (m->control_pressed) { delete chimera; return 1; }
			
			//read next sequence
			int length = MPIPos[start+i+1] - MPIPos[start+i];
			
			char* buf4 = new char[length];
			MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status);
			
			//buf4 is not null terminated, so copy exactly length bytes
			string tempBuf(buf4, length);
			istringstream iss (tempBuf,istringstream::in);
			delete[] buf4;
			
			Sequence* candidateSeq = new Sequence(iss); m->gobble(iss);
			string candidateAligned = candidateSeq->getAligned();
			
			if (candidateSeq->getName() != "") { //in case there is a commented sequence at the end of a file
				
				if (candidateSeq->getAligned().length() != templateSeqsLength) {
					m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. Skipping."); m->mothurOutEndLine();
				}else{
					//find chimeras
					chimera->getChimeras(candidateSeq);
					
					if (m->control_pressed) { delete chimera; delete candidateSeq; return 1; }
					
					//if you are not chimeric, then check each half
					data_results wholeResults = chimera->getResults();
					
					//determine if we need to split
					bool isChimeric = false;
					
					if (wholeResults.flag == "yes") {
						string chimeraFlag = "no";
						if( (wholeResults.results[0].bsa >= minBS && wholeResults.results[0].divr_qla_qrb >= divR)
						   ||
						   (wholeResults.results[0].bsb >= minBS && wholeResults.results[0].divr_qlb_qra >= divR) ) { chimeraFlag = "yes"; }
						
						if (chimeraFlag == "yes") {
							if ((wholeResults.results[0].bsa >= minBS) || (wholeResults.results[0].bsb >= minBS)) { isChimeric = true; }
						}
					}
					
					if ((!isChimeric) && trimera) {
						//split sequence in half by bases
						string leftQuery, rightQuery;
						Sequence tempSeq(candidateSeq->getName(), candidateAligned);
						divideInHalf(tempSeq, leftQuery, rightQuery);
						
						//run chimeraSlayer on each piece
						Sequence* left = new Sequence(candidateSeq->getName(), leftQuery);
						Sequence* right = new Sequence(candidateSeq->getName(), rightQuery);
						
						//find chimeras
						chimera->getChimeras(left);
						data_results leftResults = chimera->getResults();
						
						chimera->getChimeras(right);
						data_results rightResults = chimera->getResults();
						
						//if either piece is chimeric then report
						bool flag = false;
						Sequence trimmed = chimera->print(outMPI, outAccMPI, leftResults, rightResults, flag);
						if (flag) { cnames.insert(candidateSeq->getName()); }
						
						if (trim) {
							string outputString = ">" + trimmed.getName() + "\n" + trimmed.getAligned() + "\n";
							
							//write to fasta file
							int length = outputString.length();
							char* buf2 = new char[length];
							memcpy(buf2, outputString.c_str(), length);
							
							MPI_File_write_shared(outFastaMPI, buf2, length, MPI_CHAR, &status);
							delete[] buf2;
						}
						
						delete left; delete right;
						
					}else {
						//print results
						Sequence trimmed = chimera->print(outMPI, outAccMPI);
						cnames.insert(candidateSeq->getName());
						
						if (trim) {
							string outputString = ">" + trimmed.getName() + "\n" + trimmed.getAligned() + "\n";
							
							//write to fasta file
							int length = outputString.length();
							char* buf2 = new char[length];
							memcpy(buf2, outputString.c_str(), length);
							
							MPI_File_write_shared(outFastaMPI, buf2, length, MPI_CHAR, &status);
							delete[] buf2;
						}
					}
				}
			}
			delete candidateSeq;
			
			//report progress
			if((i+1) % 100 == 0){ cout << "Processing sequence: " << (i+1) << endl; }
		}
		//report progress
		if(num % 100 != 0){ cout << "Processing sequence: " << num << endl; }
		
		int numNoParents = chimera->getNumNoParents();
		if (numNoParents == num) { cout << "[WARNING]: megablast returned 0 potential parents for all your sequences. This could be due to formatdb.exe not being setup properly, please check formatdb.log for errors."
<< endl; } delete chimera; return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "driverMPI"); exit(1); } } #endif /**************************************************************************************************/ int ChimeraSlayerCommand::createProcesses(string outputFileName, string filename, string accnos, string fasta, map& thisPriority) { try { int process = 0; int num = 0; processIDS.clear(); bool recalc = false; if (m->debug) { m->mothurOut("[DEBUG]: filename = " + filename + "\n"); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename, accnos + toString(m->mothurGetpid(process)) + ".temp", fasta + toString(m->mothurGetpid(process)) + ".temp", thisPriority); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); vector positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 0; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], outputFileName + toString(m->mothurGetpid(process)) + ".temp", filename, accnos + toString(m->mothurGetpid(process)) + ".temp", fasta + toString(m->mothurGetpid(process)) + ".temp", thisPriority); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the slayerData struct. 
//Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors]; HANDLE hThreadArray[processors]; //Create processor worker threads. for( int i=0; icount; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif rename((outputFileName + toString(processIDS[0]) + ".temp").c_str(), outputFileName.c_str()); rename((accnos + toString(processIDS[0]) + ".temp").c_str(), accnos.c_str()); if (trim) { rename((fasta + toString(processIDS[0]) + ".temp").c_str(), fasta.c_str()); } //append output files for(int i=1;iappendFiles((outputFileName + toString(processIDS[i]) + ".temp"), outputFileName); m->mothurRemove((outputFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((accnos + toString(processIDS[i]) + ".temp"), accnos); m->mothurRemove((accnos + toString(processIDS[i]) + ".temp")); if (trim) { m->appendFiles((fasta + toString(processIDS[i]) + ".temp"), fasta); m->mothurRemove((fasta + toString(processIDS[i]) + ".temp")); } } return num; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ int ChimeraSlayerCommand::divideInHalf(Sequence querySeq, string& leftQuery, string& rightQuery) { try { string queryUnAligned = querySeq.getUnaligned(); int numBases = int(queryUnAligned.length() * 0.5); string queryAligned = querySeq.getAligned(); leftQuery = querySeq.getAligned(); rightQuery = querySeq.getAligned(); int baseCount = 0; int leftSpot = 0; for (int i = 0; i < queryAligned.length(); i++) { //if you are a base if (isalpha(queryAligned[i])) { baseCount++; } //if you have half if (baseCount >= numBases) { leftSpot = i; break; } //first half } //blank out right side for (int i = leftSpot; i < leftQuery.length(); i++) { leftQuery[i] = '.'; } 
//blank out left side
		for (int i = 0; i < leftSpot; i++) { rightQuery[i] = '.'; }
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraSlayerCommand", "divideInHalf");
		exit(1);
	}
}
/**************************************************************************************************/
map<string, int> ChimeraSlayerCommand::sortFastaFile(string fastaFile, string nameFile) {
	try {
		map<string, int> nameAbund;
		
		//read through fastafile and store info
		map<string, string> seqs;
		ifstream in;
		m->openInputFile(fastaFile, in);
		
		while (!in.eof()) {
			
			if (m->control_pressed) { in.close(); return nameAbund; }
			
			Sequence seq(in); m->gobble(in);
			seqs[seq.getName()] = seq.getAligned();
		}
		
		in.close();
		
		//read namefile or countfile
		vector<seqPriorityNode> nameMapCount;
		int error = 0; //initialized so the check below is defined when no sequence is missing from the count file
		if (hasCount) {
			CountTable ct;
			ct.readTable(nameFile, true, false);
			
			for(map<string, string>::iterator it = seqs.begin(); it != seqs.end(); it++) {
				int num = ct.getNumSeqs(it->first);
				if (num == 0) { error = 1; }
				else {
					seqPriorityNode temp(num, it->second, it->first);
					nameMapCount.push_back(temp);
				}
			}
		}else { error = m->readNames(nameFile, nameMapCount, seqs); }
		
		if (m->control_pressed) { return nameAbund; }
		
		if (error == 1) { m->control_pressed = true; return nameAbund; }
		if (seqs.size() != nameMapCount.size()) { m->mothurOut( "The number of sequences in your fastafile does not match the number of sequences in your namefile, aborting."); m->mothurOutEndLine(); m->control_pressed = true; return nameAbund; }
		
		sort(nameMapCount.begin(), nameMapCount.end(), compareSeqPriorityNodes);
		
		string newFasta = fastaFile + ".temp";
		ofstream out;
		m->openOutputFile(newFasta, out);
		
		//print new file in order of
		for (int i = 0; i < nameMapCount.size(); i++) {
			out << ">" << nameMapCount[i].name << endl << nameMapCount[i].seq << endl;
			nameAbund[nameMapCount[i].name] = nameMapCount[i].numIdentical;
		}
		out.close();
		
		rename(newFasta.c_str(), fastaFile.c_str());
		
		return nameAbund;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraSlayerCommand", "sortFastaFile");
		exit(1);
	}
}
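
// Illustrative sketch only, not part of mothur: sortFastaFile above rewrites a fasta
// file so that the most abundant unique sequences come first. The struct and the
// comparator below are hypothetical stand-ins for seqPriorityNode and
// compareSeqPriorityNodes, whose definitions are not shown in this file. The sketch
// assumes a descending sort on abundance with ties broken by name; the real
// comparator may break ties differently.
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

namespace chimeraSlayerSketch {

// Hypothetical record: one unique sequence and the number of reads it represents.
struct SketchSeq {
	int numIdentical;
	std::string seq;
	std::string name;
	SketchSeq(int n, const std::string& s, const std::string& nm) : numIdentical(n), seq(s), name(nm) {}
};

// Assumed ordering: larger abundance first, then alphabetical by name.
inline bool sketchMoreAbundant(const SketchSeq& left, const SketchSeq& right) {
	if (left.numIdentical != right.numIdentical) { return left.numIdentical > right.numIdentical; }
	return left.name < right.name;
}

// Returns the fasta text that would be written, in sorted order.
inline std::string sketchSortedFasta(std::vector<SketchSeq> seqs) {
	std::sort(seqs.begin(), seqs.end(), sketchMoreAbundant);
	std::string fasta;
	for (std::size_t i = 0; i < seqs.size(); i++) {
		fasta += ">" + seqs[i].name + "\n" + seqs[i].seq + "\n";
	}
	return fasta;
}

} // namespace chimeraSlayerSketch
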
/**************************************************************************************************/
map<string, int> ChimeraSlayerCommand::sortFastaFile(vector<Sequence>& thisseqs, map<string, string>& nameMap, string newFile) {
	try {
		map<string, int> nameAbund;
		vector<seqPriorityNode> nameVector;
		
		//read through fastafile and store info
		map<string, string> seqs;
		
		for (int i = 0; i < thisseqs.size(); i++) {
			
			if (m->control_pressed) { return nameAbund; }
			
			map<string, string>::iterator itNameMap = nameMap.find(thisseqs[i].getName());
			
			if (itNameMap == nameMap.end()){
				m->control_pressed = true;
				m->mothurOut("[ERROR]: " + thisseqs[i].getName() + " is in your fastafile, but is not in your namesfile, please correct."); m->mothurOutEndLine();
			}else {
				int num = m->getNumNames(itNameMap->second);
				
				seqPriorityNode temp(num, thisseqs[i].getAligned(), thisseqs[i].getName());
				nameVector.push_back(temp);
			}
		}
		
		//sort by num represented
		sort(nameVector.begin(), nameVector.end(), compareSeqPriorityNodes);
		
		if (m->control_pressed) { return nameAbund; }
		
		if (thisseqs.size() != nameVector.size()) { m->mothurOut( "The number of sequences in your fastafile does not match the number of sequences in your namefile, aborting."); m->mothurOutEndLine(); m->control_pressed = true; return nameAbund; }
		
		ofstream out;
		m->openOutputFile(newFile, out);
		
		//print new file in order of
		for (int i = 0; i < nameVector.size(); i++) {
			out << ">" << nameVector[i].name << endl << nameVector[i].seq << endl;
			nameAbund[nameVector[i].name] = nameVector[i].numIdentical;
		}
		out.close();
		
		return nameAbund;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraSlayerCommand", "sortFastaFile");
		exit(1);
	}
}
/**************************************************************************************************/
int ChimeraSlayerCommand::sortFastaFile(vector<Sequence>& thisseqs, map<string, int>& countMap, string newFile) {
	try {
		vector<seqPriorityNode> nameVector;
		
		//read through fastafile and store info
		map<string, string> seqs;
		
		for (int i = 0; i < thisseqs.size(); i++) {
			
			if (m->control_pressed) { return 0; }
			
			map<string, int>::iterator itCountMap = countMap.find(thisseqs[i].getName());
			
			if (itCountMap ==
countMap.end()){ m->control_pressed = true; m->mothurOut("[ERROR]: " + thisseqs[i].getName() + " is in your fastafile, but is not in your count file, please correct."); m->mothurOutEndLine(); }else { seqPriorityNode temp(itCountMap->second, thisseqs[i].getAligned(), thisseqs[i].getName()); nameVector.push_back(temp); } } //sort by num represented sort(nameVector.begin(), nameVector.end(), compareSeqPriorityNodes); if (m->control_pressed) { return 0; } if (thisseqs.size() != nameVector.size()) { m->mothurOut( "The number of sequences in your fastafile does not match the number of sequences in your count file, aborting."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } ofstream out; m->openOutputFile(newFile, out); //print new file in order of for (int i = 0; i < nameVector.size(); i++) { out << ">" << nameVector[i].name << endl << nameVector[i].seq << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraSlayerCommand", "sortFastaFile"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/commands/chimeraslayercommand.h000066400000000000000000000603471255543666200226110ustar00rootroot00000000000000#ifndef CHIMERASLAYERCOMMAND_H #define CHIMERASLAYERCOMMAND_H /* * chimeraslayercommand.h * Mothur * * Created by westcott on 3/31/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "mothur.h" #include "command.hpp" #include "chimera.h" #include "chimeraslayer.h" #include "sequenceparser.h" #include "sequencecountparser.h" /***********************************************************/ class ChimeraSlayerCommand : public Command { public: ChimeraSlayerCommand(string); ChimeraSlayerCommand(); ~ChimeraSlayerCommand() {} vector setParameters(); string getCommandName() { return "chimera.slayer"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Haas BJ, Gevers D, Earl A, Feldgarden M, Ward DV, Giannokous G, Ciulla D, Tabbaa D, Highlander SK, Sodergren E, Methe B, Desantis TZ, Petrosino JF, Knight R, Birren BW (2011). Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21:494.\nhttp://www.mothur.org/wiki/Chimera.slayer"; } string getDescription() { return "detect chimeric sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector processIDS; //processid vector lines; int driver(linePair, string, string, string, string, map&); int createProcesses(string, string, string, string, map&); int divideInHalf(Sequence, string&, string&); map sortFastaFile(string, string); map sortFastaFile(vector&, map&, string newFile); int sortFastaFile(vector&, map&, string newFile); string getNamesFile(string&); //int setupChimera(string,); int MPIExecute(string, string, string, string, map&); int deconvoluteResults(map&, string, string, string); map priority; int setUpForSelfReference(SequenceParser*&, map&, map >&, int); int setUpForSelfReference(SequenceCountParser*&, map&, map >&, int); int driverGroups(string, string, string, map >&, map&, string); int createProcessesGroups(string, string, string, map >&, map&, string, string); int MPIExecuteGroups(string, string, string, map >&, map&, string, string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, MPI_File&, 
MPI_File&, MPI_File&, set&, vector&, string, map&, bool); #endif bool abort, realign, trim, trimera, save, hasName, hasCount, dups; string fastafile, groupfile, templatefile, outputDir, search, namefile, countfile, blastlocation; int processors, window, iters, increment, numwanted, ksize, match, mismatch, parents, minSimilarity, minCoverage, minBS, minSNP, numSeqs, templateSeqsLength; float divR; map > group2NameMap; vector outputNames; vector fastaFileNames; vector nameFileNames; vector groupFileNames; }; /***********************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). struct slayerData { string outputFName; string fasta; string accnos; string filename, countlist; string templatefile; string search; string blastlocation; bool trimera; bool trim, realign, dups, hasCount; unsigned long long start; unsigned long long end; int ksize, match, mismatch, window, minSimilarity, minCoverage, minBS, minSNP, parents, iters, increment, numwanted; MothurOut* m; float divR; map priority; int count; int numNoParents; int threadId; map > fileToPriority; map fileGroup; map > group2NameMap; slayerData(){} slayerData(string o, string fa, string ac, string f, string te, string se, string bl, bool tri, bool trm, bool re, MothurOut* mout, unsigned long long st, unsigned long long en, int ks, int ma, int mis, int win, int minS, int minC, int miBS, int minSN, int par, int it, int inc, int numw, float div, map prior, int tid) { outputFName = o; fasta = fa; accnos = ac; filename = f; templatefile = te; search = se; blastlocation = bl; trimera = tri; trim = trm; realign = re; m = mout; start = st; end = en; ksize = ks; match = ma; mismatch = mis; window = win; minSimilarity = minS; minCoverage = minC; minBS = miBS; minSNP = minSN; parents = par; iters = it; increment = inc; numwanted = numw; divR = div; priority = prior; threadId = 
tid; count = 0; numNoParents = 0; } slayerData(map > g2n, bool hc, bool dps, string cl, string o, string fa, string ac, string te, string se, string bl, bool tri, bool trm, bool re, MothurOut* mout, map >& fPriority, map& fileG, int ks, int ma, int mis, int win, int minS, int minC, int miBS, int minSN, int par, int it, int inc, int numw, float div, map prior, int tid) { outputFName = o; fasta = fa; accnos = ac; templatefile = te; search = se; blastlocation = bl; countlist = cl; dups = dps; hasCount = hc; group2NameMap = g2n; trimera = tri; trim = trm; realign = re; m = mout; fileGroup = fileG; fileToPriority = fPriority; ksize = ks; match = ma; mismatch = mis; window = win; minSimilarity = minS; minCoverage = minC; minBS = miBS; minSNP = minSN; parents = par; iters = it; increment = inc; numwanted = numw; divR = div; priority = prior; threadId = tid; count = 0; numNoParents = 0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MySlayerThreadFunction(LPVOID lpParam){ slayerData* pDataArray; pDataArray = (slayerData*)lpParam; try { ofstream out; pDataArray->m->openOutputFile(pDataArray->outputFName, out); ofstream out2; pDataArray->m->openOutputFile(pDataArray->accnos, out2); ofstream out3; if (pDataArray->trim) { pDataArray->m->openOutputFile(pDataArray->fasta, out3); } ifstream inFASTA; pDataArray->m->openInputFile(pDataArray->filename, inFASTA); Chimera* chimera; if (pDataArray->templatefile != "self") { //you want to run slayer with a reference template chimera = new ChimeraSlayer(pDataArray->filename, pDataArray->templatefile, pDataArray->trim, pDataArray->search, pDataArray->ksize, pDataArray->match, pDataArray->mismatch, pDataArray->window, pDataArray->divR, pDataArray->minSimilarity, pDataArray->minCoverage, pDataArray->minBS, pDataArray->minSNP, pDataArray->parents, 
pDataArray->iters, pDataArray->increment, pDataArray->numwanted, pDataArray->realign, pDataArray->blastlocation, pDataArray->threadId); }else { chimera = new ChimeraSlayer(pDataArray->filename, pDataArray->templatefile, pDataArray->trim, pDataArray->priority, pDataArray->search, pDataArray->ksize, pDataArray->match, pDataArray->mismatch, pDataArray->window, pDataArray->divR, pDataArray->minSimilarity, pDataArray->minCoverage, pDataArray->minBS, pDataArray->minSNP, pDataArray->parents, pDataArray->iters, pDataArray->increment, pDataArray->numwanted, pDataArray->realign, pDataArray->blastlocation, pDataArray->threadId); } //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { chimera->printHeader(out); inFASTA.seekg(0); pDataArray->m->zapGremlins(inFASTA); }else { //this accounts for the difference in line endings. inFASTA.seekg(pDataArray->start-1); pDataArray->m->gobble(inFASTA); } if (pDataArray->m->control_pressed) { out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 0; } if (chimera->getUnaligned()) { pDataArray->m->mothurOut("Your template sequences are different lengths, please correct."); pDataArray->m->mothurOutEndLine(); out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 0; } int templateSeqsLength = chimera->getLength(); if (pDataArray->start == 0) { chimera->printHeader(out); } pDataArray->count = 0; for(int i = 0; i < pDataArray->end; i++){ if (pDataArray->m->control_pressed) { out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 1; } Sequence* candidateSeq = new Sequence(inFASTA); pDataArray->m->gobble(inFASTA); string candidateAligned = candidateSeq->getAligned(); if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getAligned().length() != templateSeqsLength) { 
pDataArray->m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. Skipping."); pDataArray->m->mothurOutEndLine(); }else{ //find chimeras chimera->getChimeras(candidateSeq); if (pDataArray->m->control_pressed) { delete candidateSeq; delete chimera; return 1; } //if you are not chimeric, then check each half data_results wholeResults = chimera->getResults(); //determine if we need to split bool isChimeric = false; if (wholeResults.flag == "yes") { string chimeraFlag = "no"; if( (wholeResults.results[0].bsa >= pDataArray->minBS && wholeResults.results[0].divr_qla_qrb >= pDataArray->divR) || (wholeResults.results[0].bsb >= pDataArray->minBS && wholeResults.results[0].divr_qlb_qra >= pDataArray->divR) ) { chimeraFlag = "yes"; } if (chimeraFlag == "yes") { if ((wholeResults.results[0].bsa >= pDataArray->minBS) || (wholeResults.results[0].bsb >= pDataArray->minBS)) { isChimeric = true; } } } if ((!isChimeric) && pDataArray->trimera) { //split sequence in half by bases string leftQuery, rightQuery; Sequence tempSeq(candidateSeq->getName(), candidateAligned); //divideInHalf(tempSeq, leftQuery, rightQuery); string queryUnAligned = tempSeq.getUnaligned(); int numBases = int(queryUnAligned.length() * 0.5); string queryAligned = tempSeq.getAligned(); leftQuery = tempSeq.getAligned(); rightQuery = tempSeq.getAligned(); int baseCount = 0; int leftSpot = 0; for (int i = 0; i < queryAligned.length(); i++) { //if you are a base if (isalpha(queryAligned[i])) { baseCount++; } //if you have half if (baseCount >= numBases) { leftSpot = i; break; } //first half } //blank out right side for (int i = leftSpot; i < leftQuery.length(); i++) { leftQuery[i] = '.'; } //blank out left side for (int i = 0; i < leftSpot; i++) { rightQuery[i] = '.'; } //run chimeraSlayer on each piece Sequence* left = new Sequence(candidateSeq->getName(), leftQuery); Sequence* right = new Sequence(candidateSeq->getName(), rightQuery); //find chimeras chimera->getChimeras(left); 
data_results leftResults = chimera->getResults(); chimera->getChimeras(right); data_results rightResults = chimera->getResults(); //if either piece is chimeric then report Sequence trimmed = chimera->print(out, out2, leftResults, rightResults); if (pDataArray->trim) { trimmed.printSequence(out3); } delete left; delete right; }else { //already chimeric //print results Sequence trimmed = chimera->print(out, out2); if (pDataArray->trim) { trimmed.printSequence(out3); } } } pDataArray->count++; } delete candidateSeq; //report progress if((pDataArray->count) % 100 == 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(pDataArray->count) +"\n"); } } //report progress if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(pDataArray->count)+"\n"); } pDataArray->numNoParents = chimera->getNumNoParents(); if (pDataArray->numNoParents == pDataArray->count) { pDataArray->m->mothurOut("[WARNING]: megablast returned 0 potential parents for all your sequences. 
This could be due to formatdb.exe not being setup properly, please check formatdb.log for errors.\n"); } out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ChimeraSlayerCommand", "MySlayerThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MySlayerGroupThreadFunction(LPVOID lpParam){ slayerData* pDataArray; pDataArray = (slayerData*)lpParam; try { ofstream outCountList; if (pDataArray->hasCount && pDataArray->dups) { pDataArray->m->openOutputFile(pDataArray->countlist, outCountList); } int totalSeqs = 0; pDataArray->end = 0; for (map >::iterator itFile = pDataArray->fileToPriority.begin(); itFile != pDataArray->fileToPriority.end(); itFile++) { if (pDataArray->m->control_pressed) { return 0; } int start = time(NULL); string thisFastaName = itFile->first; map thisPriority = itFile->second; string thisoutputFileName = pDataArray->m->getRootName(pDataArray->m->getSimpleName(thisFastaName)) + pDataArray->fileGroup[thisFastaName] + "slayer.chimera"; string thisaccnosFileName = pDataArray->m->getRootName(pDataArray->m->getSimpleName(thisFastaName)) + pDataArray->fileGroup[thisFastaName] + "slayer.accnos"; string thistrimFastaFileName = pDataArray->m->getRootName(pDataArray->m->getSimpleName(thisFastaName)) + pDataArray->fileGroup[thisFastaName] + "slayer.fasta"; pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Checking sequences from group: " + pDataArray->fileGroup[thisFastaName] + "."); pDataArray->m->mothurOutEndLine(); //int numSeqs = driver(lines[0], thisoutputFileName, thisFastaName, thisaccnosFileName, thistrimFastaFileName, thisPriority); //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// ofstream out; pDataArray->m->openOutputFile(thisoutputFileName, out); ofstream out2; 
pDataArray->m->openOutputFile(thisaccnosFileName, out2); ofstream out3; if (pDataArray->trim) { pDataArray->m->openOutputFile(thistrimFastaFileName, out3); } ifstream inFASTA; pDataArray->m->openInputFile(thisFastaName, inFASTA); Chimera* chimera; chimera = new ChimeraSlayer(thisFastaName, pDataArray->templatefile, pDataArray->trim, thisPriority, pDataArray->search, pDataArray->ksize, pDataArray->match, pDataArray->mismatch, pDataArray->window, pDataArray->divR, pDataArray->minSimilarity, pDataArray->minCoverage, pDataArray->minBS, pDataArray->minSNP, pDataArray->parents, pDataArray->iters, pDataArray->increment, pDataArray->numwanted, pDataArray->realign, pDataArray->blastlocation, pDataArray->threadId); chimera->printHeader(out); int numSeqs = 0; if (pDataArray->m->control_pressed) { out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 0; } if (chimera->getUnaligned()) { pDataArray->m->mothurOut("Your template sequences are different lengths, please correct."); pDataArray->m->mothurOutEndLine(); out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 0; } int templateSeqsLength = chimera->getLength(); bool done = false; while (!done) { if (pDataArray->m->control_pressed) { out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; return 1; } Sequence* candidateSeq = new Sequence(inFASTA); pDataArray->m->gobble(inFASTA); string candidateAligned = candidateSeq->getAligned(); if (candidateSeq->getName() != "") { //incase there is a commented sequence at the end of a file if (candidateSeq->getAligned().length() != templateSeqsLength) { pDataArray->m->mothurOut(candidateSeq->getName() + " is not the same length as the template sequences. 
Skipping."); pDataArray->m->mothurOutEndLine(); }else{ //find chimeras chimera->getChimeras(candidateSeq); if (pDataArray->m->control_pressed) { out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete candidateSeq; delete chimera; return 1; } //if you are not chimeric, then check each half data_results wholeResults = chimera->getResults(); //determine if we need to split bool isChimeric = false; if (wholeResults.flag == "yes") { string chimeraFlag = "no"; if( (wholeResults.results[0].bsa >= pDataArray->minBS && wholeResults.results[0].divr_qla_qrb >= pDataArray->divR) || (wholeResults.results[0].bsb >= pDataArray->minBS && wholeResults.results[0].divr_qlb_qra >= pDataArray->divR) ) { chimeraFlag = "yes"; } if (chimeraFlag == "yes") { if ((wholeResults.results[0].bsa >= pDataArray->minBS) || (wholeResults.results[0].bsb >= pDataArray->minBS)) { isChimeric = true; } } } if ((!isChimeric) && pDataArray->trimera) { //split sequence in half by bases string leftQuery, rightQuery; Sequence tempSeq(candidateSeq->getName(), candidateAligned); //divideInHalf(tempSeq, leftQuery, rightQuery); string queryUnAligned = tempSeq.getUnaligned(); int numBases = int(queryUnAligned.length() * 0.5); string queryAligned = tempSeq.getAligned(); leftQuery = tempSeq.getAligned(); rightQuery = tempSeq.getAligned(); int baseCount = 0; int leftSpot = 0; for (int i = 0; i < queryAligned.length(); i++) { //if you are a base if (isalpha(queryAligned[i])) { baseCount++; } //if you have half if (baseCount >= numBases) { leftSpot = i; break; } //first half } //blank out right side for (int i = leftSpot; i < leftQuery.length(); i++) { leftQuery[i] = '.'; } //blank out left side for (int i = 0; i < leftSpot; i++) { rightQuery[i] = '.'; } //run chimeraSlayer on each piece Sequence* left = new Sequence(candidateSeq->getName(), leftQuery); Sequence* right = new Sequence(candidateSeq->getName(), rightQuery); //find chimeras chimera->getChimeras(left); data_results 
leftResults = chimera->getResults(); chimera->getChimeras(right); data_results rightResults = chimera->getResults(); //if either piece is chimeric then report Sequence trimmed = chimera->print(out, out2, leftResults, rightResults); if (pDataArray->trim) { trimmed.printSequence(out3); } delete left; delete right; }else { //already chimeric //print results Sequence trimmed = chimera->print(out, out2); if (pDataArray->trim) { trimmed.printSequence(out3); } } } numSeqs++; } delete candidateSeq; if (inFASTA.eof()) { break; } //report progress if((numSeqs) % 100 == 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(numSeqs)+"\n"); pDataArray->m->mothurOutEndLine(); } } //report progress if((numSeqs) % 100 != 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(numSeqs)+"\n"); } pDataArray->numNoParents = chimera->getNumNoParents(); if (pDataArray->numNoParents == numSeqs) { pDataArray->m->mothurOut("[WARNING]: megablast returned 0 potential parents for all your sequences. This could be due to formatdb.exe not being setup properly, please check formatdb.log for errors.\n"); } out.close(); out2.close(); if (pDataArray->trim) { out3.close(); } inFASTA.close(); delete chimera; pDataArray->end++; //////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////// //if we provided a count file with group info and set dereplicate=t, then we want to create a *.pick.count_table //This table will zero out group counts for seqs determined to be chimeric by that group. 
if (pDataArray->dups) { if (!pDataArray->m->isBlank(thisaccnosFileName)) { ifstream in; pDataArray->m->openInputFile(thisaccnosFileName, in); string name; if (pDataArray->hasCount) { while (!in.eof()) { in >> name; pDataArray->m->gobble(in); outCountList << name << '\t' << pDataArray->fileGroup[thisFastaName] << endl; } in.close(); }else { map >::iterator itGroupNameMap = pDataArray->group2NameMap.find(pDataArray->fileGroup[thisFastaName]); if (itGroupNameMap != pDataArray->group2NameMap.end()) { map thisnamemap = itGroupNameMap->second; map::iterator itN; ofstream out; pDataArray->m->openOutputFile(thisaccnosFileName+".temp", out); while (!in.eof()) { in >> name; pDataArray->m->gobble(in); //pDataArray->m->mothurOut("here = " + name + '\t'); itN = thisnamemap.find(name); if (itN != thisnamemap.end()) { vector tempNames; pDataArray->m->splitAtComma(itN->second, tempNames); for (int j = 0; j < tempNames.size(); j++) { out << tempNames[j] << endl; } //pDataArray->m->mothurOut(itN->second + '\n'); }else { pDataArray->m->mothurOut("[ERROR]: parsing cannot find " + name + ".\n"); pDataArray->m->control_pressed = true; } } out.close(); in.close(); pDataArray->m->renameFile(thisaccnosFileName+".temp", thisaccnosFileName); }else { pDataArray->m->mothurOut("[ERROR]: parsing cannot find " + pDataArray->fileGroup[thisFastaName] + ".\n"); pDataArray->m->control_pressed = true; } } } } //append files pDataArray->m->appendFiles(thisoutputFileName, pDataArray->outputFName); pDataArray->m->mothurRemove(thisoutputFileName); pDataArray->m->appendFiles(thisaccnosFileName, pDataArray->accnos); pDataArray->m->mothurRemove(thisaccnosFileName); if (pDataArray->trim) { pDataArray->m->appendFiles(thistrimFastaFileName, pDataArray->fasta); pDataArray->m->mothurRemove(thistrimFastaFileName); } pDataArray->m->mothurRemove(thisFastaName); totalSeqs += numSeqs; pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + 
toString(numSeqs) + " sequences from group " + pDataArray->fileGroup[thisFastaName] + "."); pDataArray->m->mothurOutEndLine(); } pDataArray->count = totalSeqs; if (pDataArray->hasCount && pDataArray->dups) { outCountList.close(); } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ChimeraSlayerCommand", "MySlayerGroupThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/chimerauchimecommand.cpp000066400000000000000000002550111255543666200231110ustar00rootroot00000000000000/* * chimerauchimecommand.cpp * Mothur * * Created by westcott on 5/13/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "chimerauchimecommand.h" #include "deconvolutecommand.h" //#include "uc.h" #include "sequence.hpp" #include "referencedb.h" #include "systemcommand.h" //********************************************************************************************************************** vector ChimeraUchimeCommand::setParameters(){ try { CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","chimera-accnos",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pstrand("strand", "String", "", "", "", "", "","",false,false); 
parameters.push_back(pstrand); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter pabskew("abskew", "Number", "", "1.9", "", "", "","",false,false); parameters.push_back(pabskew); CommandParameter pchimealns("chimealns", "Boolean", "", "F", "", "", "","alns",false,false); parameters.push_back(pchimealns); CommandParameter pminh("minh", "Number", "", "0.3", "", "", "","",false,false); parameters.push_back(pminh); CommandParameter pmindiv("mindiv", "Number", "", "0.5", "", "", "","",false,false); parameters.push_back(pmindiv); CommandParameter pxn("xn", "Number", "", "8.0", "", "", "","",false,false); parameters.push_back(pxn); CommandParameter pdn("dn", "Number", "", "1.4", "", "", "","",false,false); parameters.push_back(pdn); CommandParameter pxa("xa", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pxa); CommandParameter pchunks("chunks", "Number", "", "4", "", "", "","",false,false); parameters.push_back(pchunks); CommandParameter pminchunk("minchunk", "Number", "", "64", "", "", "","",false,false); parameters.push_back(pminchunk); CommandParameter pidsmoothwindow("idsmoothwindow", "Number", "", "32", "", "", "","",false,false); parameters.push_back(pidsmoothwindow); CommandParameter pdups("dereplicate", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pdups); //CommandParameter pminsmoothid("minsmoothid", "Number", "", "0.95", "", "", "",false,false); parameters.push_back(pminsmoothid); CommandParameter pmaxp("maxp", "Number", "", "2", "", "", "","",false,false); parameters.push_back(pmaxp); CommandParameter pskipgaps("skipgaps", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pskipgaps); 
		CommandParameter pskipgaps2("skipgaps2", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pskipgaps2);
		CommandParameter pminlen("minlen", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pminlen);
		CommandParameter pmaxlen("maxlen", "Number", "", "10000", "", "", "","",false,false); parameters.push_back(pmaxlen);
		CommandParameter pucl("ucl", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pucl);
		CommandParameter pqueryfract("queryfract", "Number", "", "0.5", "", "", "","",false,false); parameters.push_back(pqueryfract);
		
		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "ChimeraUchimeCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string ChimeraUchimeCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The chimera.uchime command reads a fastafile and referencefile and outputs potentially chimeric sequences.\n";
		helpString += "This command is a wrapper for uchime written by Robert C. Edgar.\n";
		helpString += "The chimera.uchime command parameters are fasta, name, count, group, reference, processors, dereplicate, abskew, chimealns, minh, mindiv, xn, dn, xa, chunks, minchunk, idsmoothwindow, maxp, skipgaps, skipgaps2, minlen, maxlen, ucl, strand and queryfract.\n";
		helpString += "The fasta parameter allows you to enter the fasta file containing your potentially chimeric sequences, and is required, unless you have a valid current fasta file. \n";
		helpString += "The name parameter allows you to provide a name file, if you are using reference=self. \n";
		helpString += "The count parameter allows you to provide a count file, if you are using reference=self. 
When you use a count file with group info and dereplicate=T, mothur will create a *.pick.count_table file containing sequences after chimeras are removed. \n";
		helpString += "You may enter multiple fasta files by separating their names with dashes, i.e. fasta=abrecovery.fasta-amazon.fasta \n";
		helpString += "The group parameter allows you to provide a group file. The group file can be used with a namesfile and reference=self. When checking sequences, only sequences from the same group as the query sequence will be used as the reference. \n";
		helpString += "If the dereplicate parameter is false, then if one group finds the sequence to be chimeric, then all groups find it to be chimeric, default=f.\n";
		helpString += "The reference parameter allows you to enter a reference file containing known non-chimeric sequences, and is required. You may also set reference=self; in this case the abundant sequences will be used as potential parents. \n";
		helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n";
		helpString += "The abskew parameter can only be used with reference=self. Minimum abundance skew. Default 1.9. Abundance skew is: min [ abund(parent1), abund(parent2) ] / abund(query).\n";
		helpString += "The chimealns parameter allows you to indicate you would like a file containing multiple alignments of query sequences to parents in human readable format. Alignments show columns with differences that support or contradict a chimeric model.\n";
		helpString += "The minh parameter - minimum score to report chimera. Default 0.3. Values from 0.1 to 5 might be reasonable. Lower values increase sensitivity but may report more false positives. If you decrease xn you may need to increase minh, and vice versa.\n";
		helpString += "The mindiv parameter - minimum divergence ratio, default 0.5. Div ratio is 100%% - %%identity between query sequence and the closest candidate for being a parent. 
If you don't care about very close chimeras, then you could increase mindiv to, say, 1.0 or 2.0, and also decrease minh, say to 0.1, to increase sensitivity. How well this works will depend on your data. Best is to tune parameters on a good benchmark.\n";
		helpString += "The xn parameter - weight of a no vote. Default 8.0. Decreasing this weight to around 3 or 4 may give better performance on denoised data.\n";
		helpString += "The dn parameter - pseudo-count prior on number of no votes. Default 1.4. Probably no good reason to change this unless you can retune to a good benchmark for your data. Reasonable values are probably in the range from 0.2 to 2.\n";
		helpString += "The xa parameter - weight of an abstain vote. Default 1. So far, results do not seem to be very sensitive to this parameter, but if you have a good training set it might be worth trying. Reasonable values might range from 0.1 to 2.\n";
		helpString += "The chunks parameter is the number of chunks to extract from the query sequence when searching for parents. Default 4.\n";
		helpString += "The minchunk parameter is the minimum length of a chunk. Default 64.\n";
		helpString += "The idsmoothwindow parameter is the length of id smoothing window. Default 32.\n";
		//helpString += "The minsmoothid parameter - minimum fractional identity over smoothed window of candidate parent. Default 0.95.\n";
		helpString += "The maxp parameter - maximum number of candidate parents to consider. Default 2. In tests so far, increasing maxp gives only a very small improvement in sensitivity but tends to increase the error rate quite a bit.\n";
		helpString += "The skipgaps parameter controls how gapped columns affect counting of diffs. If skipgaps is set to T, columns containing gaps are not counted as diffs. Default = T.\n";
		helpString += "The skipgaps2 parameter controls how gapped columns affect counting of diffs. If skipgaps2 is set to T, a column that is immediately adjacent to a column containing a gap is not counted as a diff. 
Default = T.\n";
		helpString += "The minlen parameter is the minimum unaligned sequence length. Default 10. Applies to both query and reference sequences.\n";
		helpString += "The maxlen parameter is the maximum unaligned sequence length. Default 10000. Applies to both query and reference sequences.\n";
		helpString += "The ucl parameter - use local-X alignments. Default is global-X or false. On tests so far, global-X is always better; this option is retained because it just might work well on some future type of data.\n";
		helpString += "The queryfract parameter - minimum fraction of the query sequence that must be covered by a local-X alignment. Default 0.5. Applies only when ucl is true.\n";
#ifdef USE_MPI
		helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. \n";
#endif
		helpString += "The chimera.uchime command should be in the following format: \n";
		helpString += "chimera.uchime(fasta=yourFastaFile, reference=yourTemplate) \n";
		helpString += "Example: chimera.uchime(fasta=AD.align, reference=silva.gold.align) \n";
		helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ChimeraUchimeCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "chimera") { pattern = "[filename],[tag],uchime.chimeras"; } else if (type == "accnos") { pattern = "[filename],[tag],uchime.accnos"; } else if (type == "alns") { pattern = "[filename],[tag],uchime.alns"; } else if (type == "count") { pattern = "[filename],[tag],uchime.pick.count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ChimeraUchimeCommand::ChimeraUchimeCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["alns"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "ChimeraUchimeCommand"); exit(1); } } //*************************************************************************************************************** ChimeraUchimeCommand::ChimeraUchimeCommand(string option) { try { abort = false; calledHelp = false; hasName=false; hasCount=false; ReferenceDB* rdb = ReferenceDB::getInstance(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters 
validParameter("chimera.uchime"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["chimera"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["alns"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("[ERROR]: no valid files."); m->mothurOutEndLine(); abort = true; } } //check for required parameters namefile = validParameter.validFile(parameters, "name", false); if (namefile == "not found") { namefile = ""; } else { m->splitAtDash(namefile, nameFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < nameFileNames.size(); i++) { bool ignore = false; if (nameFileNames[i] == "current") { nameFileNames[i] = m->getNameFile(); if (nameFileNames[i] != "") { m->mothurOut("Using " + nameFileNames[i] + " as input file for the name parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current namefile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(nameFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { nameFileNames[i] = inputDir + nameFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(nameFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //output directory is set string tryPath = m->getOutputDir() + m->getSimpleName(nameFileNames[i]); m->mothurOut("Unable to open " + nameFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); nameFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + nameFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list nameFileNames.erase(nameFileNames.begin()+i); i--; }else { m->setNameFile(nameFileNames[i]); } } } } if (nameFileNames.size() != 0) { hasName = true; } //check for optional count files vector<string> countfileNames; countfile = validParameter.validFile(parameters, "count", false); if (countfile == "not found") { countfile = ""; }else { m->splitAtDash(countfile, countfileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < countfileNames.size(); i++) { bool ignore = false; if (countfileNames[i] == "current") { countfileNames[i] = m->getCountTableFile(); if (countfileNames[i] != "") { m->mothurOut("Using " + countfileNames[i] + " as input file for the count parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current count file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(countfileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { countfileNames[i] = inputDir + countfileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(countfileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + countfileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; }else { m->setCountTableFile(countfileNames[i]); } } } } if (countfileNames.size() != 0) { hasCount = true; } //name and count files are mutually exclusive if (hasName && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if (!hasName && hasCount) { nameFileNames = countfileNames; } if ((hasCount || hasName) && (nameFileNames.size() != fastaFileNames.size())) { m->mothurOut("[ERROR]: The number of name or count files does not match the number of fastafiles, please correct."); m->mothurOutEndLine(); abort=true; } bool hasGroup = true; groupfile = validParameter.validFile(parameters, "group", false); if (groupfile == "not found") { groupfile = ""; hasGroup = false; } else { m->splitAtDash(groupfile, groupFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < groupFileNames.size(); i++) { bool ignore = false; if (groupFileNames[i] == "current") { groupFileNames[i] = m->getGroupFile(); if (groupFileNames[i] != "") { m->mothurOut("Using " + groupFileNames[i] + " as input file for the group parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current group file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list groupFileNames.erase(groupFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(groupFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { groupFileNames[i] = inputDir + groupFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(groupFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(groupFileNames[i]); m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); groupFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(groupFileNames[i]); m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); groupFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + groupFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list groupFileNames.erase(groupFileNames.begin()+i); i--; }else { m->setGroupFile(groupFileNames[i]); } } } //make sure there is at least one valid file left if (groupFileNames.size() == 0) { m->mothurOut("[ERROR]: no valid group files."); m->mothurOutEndLine(); abort = true; } } if (hasGroup && (groupFileNames.size() != fastaFileNames.size())) { m->mothurOut("[ERROR]: The number of groupfiles does not match the number of fastafiles, please correct."); m->mothurOutEndLine(); abort=true; } if (hasGroup && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or group."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } string path; it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ if (it->second == "self") { templatefile = "self"; } else { path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["reference"] = inputDir + it->second; } templatefile = validParameter.validFile(parameters, "reference", true); if (templatefile == "not open") { abort = true; } else if (templatefile == "not found") { //check for saved reference sequences if (rdb->getSavedReference() != "") { templatefile = rdb->getSavedReference(); m->mothurOutEndLine(); m->mothurOut("Using sequences from " + rdb->getSavedReference() + "."); m->mothurOutEndLine(); }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required."); m->mothurOutEndLine(); abort = true; } } } }else if (hasName) { templatefile = "self"; } else if (hasCount) { templatefile = "self"; } else { if (rdb->getSavedReference() != "") { templatefile = rdb->getSavedReference(); m->mothurOutEndLine(); m->mothurOut("Using sequences from " + rdb->getSavedReference() + "."); m->mothurOutEndLine(); }else { m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required."); m->mothurOutEndLine(); templatefile = ""; abort = true; } } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); abskew = validParameter.validFile(parameters, "abskew", false); if (abskew == "not found"){ useAbskew = false; abskew = "1.9"; }else{ useAbskew = true; } if (useAbskew && templatefile != "self") { m->mothurOut("The abskew parameter is only valid with reference=self, ignoring."); m->mothurOutEndLine(); useAbskew = false; } temp = validParameter.validFile(parameters, "chimealns", false); if (temp == "not found") { temp = "f"; } chimealns = m->isTrue(temp); minh = validParameter.validFile(parameters, "minh", false); if (minh == "not found") { useMinH = false; minh = "0.3"; } else{ useMinH = true; } mindiv = validParameter.validFile(parameters, "mindiv", false); if (mindiv == "not found") { 
useMindiv = false; mindiv = "0.5"; } else{ useMindiv = true; } xn = validParameter.validFile(parameters, "xn", false); if (xn == "not found") { useXn = false; xn = "8.0"; } else{ useXn = true; } dn = validParameter.validFile(parameters, "dn", false); if (dn == "not found") { useDn = false; dn = "1.4"; } else{ useDn = true; } xa = validParameter.validFile(parameters, "xa", false); if (xa == "not found") { useXa = false; xa = "1"; } else{ useXa = true; } chunks = validParameter.validFile(parameters, "chunks", false); if (chunks == "not found") { useChunks = false; chunks = "4"; } else{ useChunks = true; } minchunk = validParameter.validFile(parameters, "minchunk", false); if (minchunk == "not found") { useMinchunk = false; minchunk = "64"; } else{ useMinchunk = true; } idsmoothwindow = validParameter.validFile(parameters, "idsmoothwindow", false); if (idsmoothwindow == "not found") { useIdsmoothwindow = false; idsmoothwindow = "32"; } else{ useIdsmoothwindow = true; } //minsmoothid = validParameter.validFile(parameters, "minsmoothid", false); if (minsmoothid == "not found") { useMinsmoothid = false; minsmoothid = "0.95"; } else{ useMinsmoothid = true; } maxp = validParameter.validFile(parameters, "maxp", false); if (maxp == "not found") { useMaxp = false; maxp = "2"; } else{ useMaxp = true; } minlen = validParameter.validFile(parameters, "minlen", false); if (minlen == "not found") { useMinlen = false; minlen = "10"; } else{ useMinlen = true; } maxlen = validParameter.validFile(parameters, "maxlen", false); if (maxlen == "not found") { useMaxlen = false; maxlen = "10000"; } else{ useMaxlen = true; } strand = validParameter.validFile(parameters, "strand", false); if (strand == "not found") { strand = ""; } temp = validParameter.validFile(parameters, "ucl", false); if (temp == "not found") { temp = "f"; } ucl = m->isTrue(temp); queryfract = validParameter.validFile(parameters, "queryfract", false); if (queryfract == "not found") { useQueryfract = false; queryfract = 
"0.5"; } else{ useQueryfract = true; } if (!ucl && useQueryfract) { m->mothurOut("queryfact may only be used when ucl=t, ignoring."); m->mothurOutEndLine(); useQueryfract = false; } temp = validParameter.validFile(parameters, "skipgaps", false); if (temp == "not found") { temp = "t"; } skipgaps = m->isTrue(temp); temp = validParameter.validFile(parameters, "skipgaps2", false); if (temp == "not found") { temp = "t"; } skipgaps2 = m->isTrue(temp); temp = validParameter.validFile(parameters, "dereplicate", false); if (temp == "not found") { temp = "false"; } dups = m->isTrue(temp); if (hasName && (templatefile != "self")) { m->mothurOut("You have provided a namefile and the reference parameter is not set to self. I am not sure what reference you are trying to use, aborting."); m->mothurOutEndLine(); abort=true; } if (hasCount && (templatefile != "self")) { m->mothurOut("You have provided a countfile and the reference parameter is not set to self. I am not sure what reference you are trying to use, aborting."); m->mothurOutEndLine(); abort=true; } if (hasGroup && (templatefile != "self")) { m->mothurOut("You have provided a group file and the reference parameter is not set to self. 
I am not sure what reference you are trying to use, aborting."); m->mothurOutEndLine(); abort=true; } //look for uchime exe path = m->argv; string tempPath = path; for (int i = 0; i < path.length(); i++) { tempPath[i] = tolower(path[i]); } path = path.substr(0, (tempPath.find_last_of('m'))); string uchimeCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) uchimeCommand = path + "uchime"; // format the database, -o option gives us the ability if (m->debug) { m->mothurOut("[DEBUG]: Uchime location using \"which uchime\" = "); Command* newCommand = new SystemCommand("which uchime"); m->mothurOutEndLine(); newCommand->execute(); delete newCommand; m->mothurOut("[DEBUG]: Mothur's location using \"which mothur\" = "); newCommand = new SystemCommand("which mothur"); m->mothurOutEndLine(); newCommand->execute(); delete newCommand; } #else uchimeCommand = path + "uchime.exe"; #endif //test to make sure uchime exists ifstream in; uchimeCommand = m->getFullPathName(uchimeCommand); int ableToOpen = m->openInputFile(uchimeCommand, in, "no error"); in.close(); if(ableToOpen == 1) { m->mothurOut(uchimeCommand + " file does not exist. Checking path... \n"); //check to see if uchime is in the path?? string uLocation = m->findProgramPath("uchime"); ifstream in2; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) ableToOpen = m->openInputFile(uLocation, in2, "no error"); in2.close(); #else ableToOpen = m->openInputFile((uLocation + ".exe"), in2, "no error"); in2.close(); #endif if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + uLocation + " file does not exist. 
mothur requires the uchime executable."); m->mothurOutEndLine(); abort = true; } else { m->mothurOut("Found uchime in your path, using " + uLocation + "\n");uchimeLocation = uLocation; } }else { uchimeLocation = uchimeCommand; } uchimeLocation = m->getFullPathName(uchimeLocation); } } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "ChimeraUchimeCommand"); exit(1); } } //*************************************************************************************************************** int ChimeraUchimeCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } m->mothurOut("\nuchime by Robert C. Edgar\nhttp://drive5.com/uchime\nThis code is donated to the public domain.\n\n"); for (int s = 0; s < fastaFileNames.size(); s++) { m->mothurOut("Checking sequences from " + fastaFileNames[s] + " ..." ); m->mothurOutEndLine(); int start = time(NULL); string nameFile = ""; if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[s]); }//if user entered a file with a path then preserve it map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); variables["[tag]"] = "denovo"; if (templatefile != "self") { variables["[tag]"] = "ref"; } string outputFileName = getOutputFileName("chimera", variables); string accnosFileName = getOutputFileName("accnos", variables); string alnsFileName = getOutputFileName("alns", variables); string newFasta = m->getRootName(fastaFileNames[s]) + "temp"; string newCountFile = ""; //you provided a groupfile string groupFile = ""; bool hasGroup = false; if (groupFileNames.size() != 0) { groupFile = groupFileNames[s]; hasGroup = true; } else if (hasCount) { CountTable ct; if (ct.testGroups(nameFileNames[s])) { hasGroup = true; } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(nameFileNames[s])); newCountFile = getOutputFileName("count", variables); } if ((templatefile == "self") && (!hasGroup)) { //you want to run uchime with a template=self 
and no groups if (processors != 1) { m->mothurOut("When using reference=self, mothur can only use 1 processor, continuing."); m->mothurOutEndLine(); processors = 1; } if (nameFileNames.size() != 0) { //you provided a namefile and we don't need to create one nameFile = nameFileNames[s]; }else { nameFile = getNamesFile(fastaFileNames[s]); } map<string, string> seqs; readFasta(fastaFileNames[s], seqs); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } //read namefile vector<seqPriorityNode> nameMapCount; int error = 0; if (hasCount) { CountTable ct; ct.readTable(nameFile, true, false); for(map<string, string>::iterator it = seqs.begin(); it != seqs.end(); it++) { int num = ct.getNumSeqs(it->first); if (num == 0) { error = 1; } else { seqPriorityNode temp(num, it->second, it->first); nameMapCount.push_back(temp); } } }else { error = m->readNames(nameFile, nameMapCount, seqs); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } } if (error == 1) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } if (seqs.size() != nameMapCount.size()) { m->mothurOut( "The number of sequences in your fastafile does not match the number of sequences in your namefile, aborting."); m->mothurOutEndLine(); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } printFile(nameMapCount, newFasta); fastaFileNames[s] = newFasta; } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } if (hasGroup) { if (nameFileNames.size() != 0) { //you provided a namefile and we don't need to create one nameFile = nameFileNames[s]; }else { nameFile = getNamesFile(fastaFileNames[s]); } //Parse sequences by group vector<string> groups; map<string, string> uniqueNames; if (hasCount) { cparser = new SequenceCountParser(nameFile, fastaFileNames[s]); groups = cparser->getNamesOfGroups(); uniqueNames = 
cparser->getAllSeqsMap(); }else{ sparser = new SequenceParser(groupFile, fastaFileNames[s], nameFile); groups = sparser->getNamesOfGroups(); uniqueNames = sparser->getAllSeqsMap(); } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } //clears files ofstream out, out1, out2; m->openOutputFile(outputFileName, out); out.close(); m->openOutputFile(accnosFileName, out1); out1.close(); if (chimealns) { m->openOutputFile(alnsFileName, out2); out2.close(); } int totalSeqs = 0; if(processors == 1) { totalSeqs = driverGroups(outputFileName, newFasta, accnosFileName, alnsFileName, newCountFile, 0, groups.size(), groups); if (hasCount && dups) { CountTable c; c.readTable(nameFile, true, false); if (!m->isBlank(newCountFile)) { ifstream in2; m->openInputFile(newCountFile, in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); c.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(newCountFile); c.printTable(newCountFile); } }else{ totalSeqs = createProcessesGroups(outputFileName, newFasta, accnosFileName, alnsFileName, newCountFile, groups, nameFile, groupFile, fastaFileNames[s]); } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } if (!dups) { int totalChimeras = deconvoluteResults(uniqueNames, outputFileName, accnosFileName, alnsFileName); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(totalSeqs) + " sequences. 
" + toString(totalChimeras) + " chimeras were found."); m->mothurOutEndLine(); m->mothurOut("The number of sequences checked may be larger than the number of unique sequences because some sequences are found in several samples."); m->mothurOutEndLine(); }else { if (hasCount) { set doNotRemove; CountTable c; c.readTable(newCountFile, true, true); vector namesInTable = c.getNamesOfSeqs(); for (int i = 0; i < namesInTable.size(); i++) { int temp = c.getNumSeqs(namesInTable[i]); if (temp == 0) { c.remove(namesInTable[i]); } else { doNotRemove.insert((namesInTable[i])); } } //remove names we want to keep from accnos file. set accnosNames = m->readAccnos(accnosFileName); ofstream out2; m->openOutputFile(accnosFileName, out2); for (set::iterator it = accnosNames.begin(); it != accnosNames.end(); it++) { if (doNotRemove.count(*it) == 0) { out2 << (*it) << endl; } } out2.close(); c.printTable(newCountFile); outputNames.push_back(newCountFile); outputTypes["count"].push_back(newCountFile); } } if (hasCount) { delete cparser; } else { delete sparser; } if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } }else{ if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } int numSeqs = 0; int numChimeras = 0; if(processors == 1){ numSeqs = driver(outputFileName, fastaFileNames[s], accnosFileName, alnsFileName, numChimeras); } else{ numSeqs = createProcesses(outputFileName, fastaFileNames[s], accnosFileName, alnsFileName, numChimeras); } //add headings ofstream out; m->openOutputFile(outputFileName+".temp", out); out << "Score\tQuery\tParentA\tParentB\tIdQM\tIdQA\tIdQB\tIdAB\tIdQT\tLY\tLN\tLA\tRY\tRN\tRA\tDiv\tYN\n"; out.close(); m->appendFiles(outputFileName, outputFileName+".temp"); m->mothurRemove(outputFileName); rename((outputFileName+".temp").c_str(), outputFileName.c_str()); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { 
m->mothurRemove(outputNames[j]); } return 0; } //remove file made for uchime if (templatefile == "self") { m->mothurRemove(fastaFileNames[s]); } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences. " + toString(numChimeras) + " chimeras were found."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["chimera"].push_back(outputFileName); outputNames.push_back(accnosFileName); outputTypes["accnos"].push_back(accnosFileName); if (chimealns) { outputNames.push_back(alnsFileName); outputTypes["alns"].push_back(alnsFileName); } } //set accnos file as new current accnosfile string current = ""; itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "execute"); exit(1); } } //********************************************************************************************************************** int ChimeraUchimeCommand::deconvoluteResults(map& uniqueNames, string outputFileName, string accnosFileName, string alnsFileName){ try { map::iterator itUnique; int total = 0; ofstream out2; m->openOutputFile(accnosFileName+".temp", out2); string name; set namesInFile; //this is so if a sequence is found to be chimera in several samples we dont write it to the results file more than once set::iterator itNames; set chimerasInFile; set::iterator itChimeras; if (!m->isBlank(accnosFileName)) { //edit accnos file 
ifstream in2; m->openInputFile(accnosFileName, in2); while (!in2.eof()) { if (m->control_pressed) { in2.close(); out2.close(); m->mothurRemove(outputFileName); m->mothurRemove((accnosFileName+".temp")); return 0; } in2 >> name; m->gobble(in2); //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing accnos results. Cannot find " + name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { itChimeras = chimerasInFile.find((itUnique->second)); if (itChimeras == chimerasInFile.end()) { out2 << itUnique->second << endl; chimerasInFile.insert((itUnique->second)); total++; } } } in2.close(); } out2.close(); m->mothurRemove(accnosFileName); rename((accnosFileName+".temp").c_str(), accnosFileName.c_str()); //edit chimera file ifstream in; m->openInputFile(outputFileName, in); ofstream out; m->openOutputFile(outputFileName+".temp", out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out << "Score\tQuery\tParentA\tParentB\tIdQM\tIdQA\tIdQB\tIdAB\tIdQT\tLY\tLN\tLA\tRY\tRN\tRA\tDiv\tYN\n"; float temp1; string parent1, parent2, temp2, temp3, temp4, temp5, temp6, temp7, temp8, temp9, temp10, temp11, temp12, temp13, flag; name = ""; namesInFile.clear(); //assumptions - in file each read will always look like - if uchime source is updated, revisit this code. 
/* 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0.000000 F11Fcsw_33372/ab=18/ * * * * * * * * * * * * * * N 0.018300 F11Fcsw_14980/ab=16/ F11Fcsw_1915/ab=35/ F11Fcsw_6032/ab=42/ 79.9 78.7 78.2 78.7 79.2 3 0 5 11 10 20 1.46 N */ while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove((outputFileName+".temp")); return 0; } bool print = false; in >> temp1; m->gobble(in); in >> name; m->gobble(in); in >> parent1; m->gobble(in); in >> parent2; m->gobble(in); in >> temp2 >> temp3 >> temp4 >> temp5 >> temp6 >> temp7 >> temp8 >> temp9 >> temp10 >> temp11 >> temp12 >> temp13 >> flag; m->gobble(in); //parse name - name will look like U68590/ab=1/ string restOfName = ""; int pos = name.find_first_of('/'); if (pos != string::npos) { restOfName = name.substr(pos); name = name.substr(0, pos); } //find unique name itUnique = uniqueNames.find(name); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; } else { name = itUnique->second; //is this name already in the file itNames = namesInFile.find((name)); if (itNames == namesInFile.end()) { //no not in file if (flag == "N") { //are you really a no?? //is this sequence really not chimeric?? itChimeras = chimerasInFile.find(name); //then you really are a no so print, otherwise skip if (itChimeras == chimerasInFile.end()) { print = true; } }else{ print = true; } } } if (print) { out << temp1 << '\t' << name << restOfName << '\t'; namesInFile.insert(name); //parse parent1 names if (parent1 != "*") { restOfName = ""; pos = parent1.find_first_of('/'); if (pos != string::npos) { restOfName = parent1.substr(pos); parent1 = parent1.substr(0, pos); } itUnique = uniqueNames.find(parent1); if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. 
Cannot find parentA "+ parent1 + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                    else { out << itUnique->second << restOfName << '\t'; }
                }else { out << parent1 << '\t'; }

                //parse parent2 names
                if (parent2 != "*") {
                    restOfName = "";
                    pos = parent2.find_first_of('/');
                    if (pos != string::npos) {
                        restOfName = parent2.substr(pos);
                        parent2 = parent2.substr(0, pos);
                    }

                    itUnique = uniqueNames.find(parent2);
                    if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing chimera results. Cannot find parentB "+ parent2 + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                    else { out << itUnique->second << restOfName << '\t'; }
                }else { out << parent2 << '\t'; }

                //temp12 (RA) and temp13 (Div) are separate columns, so each needs its own tab
                out << temp2 << '\t' << temp3 << '\t' << temp4 << '\t' << temp5 << '\t' << temp6 << '\t' << temp7 << '\t' << temp8 << '\t' << temp9 << '\t' << temp10 << '\t' << temp11 << '\t' << temp12 << '\t' << temp13 << '\t' << flag << endl;
            }
        }
        in.close();
        out.close();

        m->mothurRemove(outputFileName);
        rename((outputFileName+".temp").c_str(), outputFileName.c_str());

        //edit alns file
        //assumptions - in file each read will always look like - if uchime source is updated, revisit this code.
/* ------------------------------------------------------------------------ Query ( 179 nt) F21Fcsw_11639/ab=591/ ParentA ( 179 nt) F11Fcsw_6529/ab=1625/ ParentB ( 181 nt) F21Fcsw_12128/ab=1827/ A 1 AAGgAAGAtTAATACaagATGgCaTCatgAGtccgCATgTtcAcatGATTAAAG--gTaTtcCGGTagacGATGGGGATG 78 Q 1 AAGTAAGACTAATACCCAATGACGTCTCTAGAAGACATCTGAAAGAGATTAAAG--ATTTATCGGTGATGGATGGGGATG 78 B 1 AAGgAAGAtTAATcCaggATGggaTCatgAGttcACATgTccgcatGATTAAAGgtATTTtcCGGTagacGATGGGGATG 80 Diffs N N A N?N N N NNN N?NB N ?NaNNN B B NN NNNN Votes 0 0 + 000 0 0 000 000+ 0 00!000 + 00 0000 Model AAAAAAAAAAAAAAAAAAAAAAxBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB A 79 CGTtccATTAGaTaGTaGGCGGGGTAACGGCCCACCtAGtCttCGATggaTAGGGGTTCTGAGAGGAAGGTCCCCCACAT 158 Q 79 CGTCTGATTAGCTTGTTGGCGGGGTAACGGCCCACCAAGGCAACGATCAGTAGGGGTTCTGAGAGGAAGGTCCCCCACAT 158 B 81 CGTtccATTAGaTaGTaGGCGGGGTAACGGCCCACCtAGtCAACGATggaTAGGGGTTCTGAGAGGAAGGTCCCCCACAT 160 Diffs NNN N N N N N BB NNN Votes 000 0 0 0 0 0 ++ 000 Model BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB A 159 TGGAACTGAGACACGGTCCAA 179 Q 159 TGGAACTGAGACACGGTCCAA 179 B 161 TGGAACTGAGACACGGTCCAA 181 Diffs Votes Model BBBBBBBBBBBBBBBBBBBBB Ids. QA 76.6%, QB 77.7%, AB 93.7%, QModel 78.9%, Div. 
+1.5%
 Diffs Left 7: N 0, A 6, Y 1 (14.3%); Right 35: N 1, A 30, Y 4 (11.4%), Score 0.0047
 */
        if (chimealns) {
            ifstream in3;
            m->openInputFile(alnsFileName, in3);

            ofstream out3;
            m->openOutputFile(alnsFileName+".temp", out3); out3.setf(ios::fixed, ios::floatfield); out3.setf(ios::showpoint);

            name = "";
            namesInFile.clear();
            string line = "";

            while (!in3.eof()) {
                if (m->control_pressed) { in3.close(); out3.close(); m->mothurRemove(outputFileName); m->mothurRemove((accnosFileName)); m->mothurRemove((alnsFileName+".temp")); return 0; }

                line = "";
                line = m->getline(in3);
                string temp = "";

                if (line != "") {
                    istringstream iss(line);
                    iss >> temp;

                    //are you a name line
                    if ((temp == "Query") || (temp == "ParentA") || (temp == "ParentB")) {
                        int spot = 0;
                        for (int i = 0; i < line.length(); i++) {
                            spot = i;
                            if (line[i] == ')') { break; }
                            else { out3 << line[i]; }
                        }

                        if (spot == (line.length() - 1)) { m->mothurOut("[ERROR]: could not find sequence name in line " + line + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                        else if ((spot+2) > (line.length() - 1)) { m->mothurOut("[ERROR]: could not find sequence name in line " + line + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                        else {
                            //write to out3 (the alns temp file); out belongs to the chimera file and is already closed
                            out3 << line[spot] << line[spot+1];

                            name = line.substr(spot+2);

                            //parse name - name will either look like U68590/ab=1/ or U68590
                            string restOfName = "";
                            int pos = name.find_first_of('/');
                            if (pos != string::npos) {
                                restOfName = name.substr(pos);
                                name = name.substr(0, pos);
                            }

                            //find unique name
                            itUnique = uniqueNames.find(name);

                            if (itUnique == uniqueNames.end()) { m->mothurOut("[ERROR]: trouble parsing alns results. Cannot find "+ name + "."); m->mothurOutEndLine(); m->control_pressed = true; }
                            else {
                                //only limit repeats on query names
                                if (temp == "Query") {
                                    itNames = namesInFile.find((itUnique->second));

                                    if (itNames == namesInFile.end()) {
                                        out3 << itUnique->second << restOfName << endl;
                                        namesInFile.insert((itUnique->second));
                                    }
                                }else { out3 << itUnique->second << restOfName << endl; }
                            }
                        }
                    }else { //not need to alter line
                        out3 << line << endl;
                    }
                }else { out3 << endl; }
            }
            in3.close();
            out3.close();

            m->mothurRemove(alnsFileName);
            rename((alnsFileName+".temp").c_str(), alnsFileName.c_str());
        }

        return total;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraUchimeCommand", "deconvoluteResults");
        exit(1);
    }
}
//**********************************************************************************************************************
int ChimeraUchimeCommand::printFile(vector<seqPriorityNode>& nameMapCount, string filename){
    try {
        sort(nameMapCount.begin(), nameMapCount.end(), compareSeqPriorityNodes);

        ofstream out;
        m->openOutputFile(filename, out);

        //print new file in order of
        for (int i = 0; i < nameMapCount.size(); i++) {
            out << ">" << nameMapCount[i].name << "/ab=" << nameMapCount[i].numIdentical << "/" << endl << nameMapCount[i].seq << endl;
        }
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraUchimeCommand", "printFile");
        exit(1);
    }
}
//**********************************************************************************************************************
int ChimeraUchimeCommand::readFasta(string filename, map<string, string>& seqs){
    try {
        //create input file for uchime
        //read through fastafile and store info
        ifstream in;
        m->openInputFile(filename, in);

        while (!in.eof()) {
            if (m->control_pressed) { in.close(); return 0; }

            Sequence seq(in); m->gobble(in);
            seqs[seq.getName()] = seq.getAligned();
        }
        in.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraUchimeCommand", "readFasta");
        exit(1);
    }
}
//**********************************************************************************************************************
string ChimeraUchimeCommand::getNamesFile(string& inputFile){
    try {
        string nameFile = "";

        m->mothurOutEndLine(); m->mothurOut("No namesfile given, running unique.seqs command to generate one."); m->mothurOutEndLine(); m->mothurOutEndLine();

        //use unique.seqs to create new name and fastafile
        string inputString = "fasta=" + inputFile;
        m->mothurOut("/******************************************/"); m->mothurOutEndLine();
        m->mothurOut("Running command: unique.seqs(" + inputString + ")"); m->mothurOutEndLine();
        m->mothurCalling = true;

        Command* uniqueCommand = new DeconvoluteCommand(inputString);
        uniqueCommand->execute();

        map<string, vector<string> > filenames = uniqueCommand->getOutputFiles();

        delete uniqueCommand;
        m->mothurCalling = false;
        m->mothurOut("/******************************************/"); m->mothurOutEndLine();

        nameFile = filenames["name"][0];
        inputFile = filenames["fasta"][0];

        return nameFile;
    }
    catch(exception& e) {
        m->errorOut(e, "ChimeraUchimeCommand", "getNamesFile");
        exit(1);
    }
}
//**********************************************************************************************************************
int ChimeraUchimeCommand::driverGroups(string outputFName, string filename, string accnos, string alns, string countlist, int start, int end, vector<string> groups){
    try {
        int totalSeqs = 0;
        int numChimeras = 0;

        ofstream outCountList;
        if (hasCount && dups) { m->openOutputFile(countlist, outCountList); }

        for (int i = start; i < end; i++) {
            int start = time(NULL);
            if (m->control_pressed) { outCountList.close(); m->mothurRemove(countlist); return 0; }

            int error;
            if (hasCount) { error = cparser->getSeqs(groups[i], filename, true); if ((error == 1) || m->control_pressed) { return 0; } }
            else { error = sparser->getSeqs(groups[i], filename, true); if ((error == 1) || m->control_pressed) { return 0; } }

            int numSeqs = driver((outputFName + groups[i]), filename,
(accnos+groups[i]), (alns+ groups[i]), numChimeras); totalSeqs += numSeqs; if (m->control_pressed) { return 0; } //remove file made for uchime if (!m->debug) { m->mothurRemove(filename); } else { m->mothurOut("[DEBUG]: saving file: " + filename + ".\n"); } //if we provided a count file with group info and set dereplicate=t, then we want to create a *.pick.count_table //This table will zero out group counts for seqs determined to be chimeric by that group. if (dups) { if (!m->isBlank(accnos+groups[i])) { ifstream in; m->openInputFile(accnos+groups[i], in); string name; if (hasCount) { while (!in.eof()) { in >> name; m->gobble(in); outCountList << name << '\t' << groups[i] << endl; } in.close(); }else { map thisnamemap = sparser->getNameMap(groups[i]); map::iterator itN; ofstream out; m->openOutputFile(accnos+groups[i]+".temp", out); while (!in.eof()) { in >> name; m->gobble(in); itN = thisnamemap.find(name); if (itN != thisnamemap.end()) { vector tempNames; m->splitAtComma(itN->second, tempNames); for (int j = 0; j < tempNames.size(); j++) { out << tempNames[j] << endl; } }else { m->mothurOut("[ERROR]: parsing cannot find " + name + ".\n"); m->control_pressed = true; } } out.close(); in.close(); m->renameFile(accnos+groups[i]+".temp", accnos+groups[i]); } } } //append files m->appendFiles((outputFName+groups[i]), outputFName); m->mothurRemove((outputFName+groups[i])); m->appendFiles((accnos+groups[i]), accnos); m->mothurRemove((accnos+groups[i])); if (chimealns) { m->appendFiles((alns+groups[i]), alns); m->mothurRemove((alns+groups[i])); } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences from group " + groups[i] + "."); m->mothurOutEndLine(); } if (hasCount && dups) { outCountList.close(); } return totalSeqs; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "driverGroups"); exit(1); } } 
//********************************************************************************************************************** int ChimeraUchimeCommand::driver(string outputFName, string filename, string accnos, string alns, int& numChimeras){ try { outputFName = m->getFullPathName(outputFName); filename = m->getFullPathName(filename); alns = m->getFullPathName(alns); //to allow for spaces in the path outputFName = "\"" + outputFName + "\""; filename = "\"" + filename + "\""; alns = "\"" + alns + "\""; vector cPara; string uchimeCommand = uchimeLocation; uchimeCommand = "\"" + uchimeCommand + "\" "; char* tempUchime; tempUchime= new char[uchimeCommand.length()+1]; *tempUchime = '\0'; strncat(tempUchime, uchimeCommand.c_str(), uchimeCommand.length()); cPara.push_back(tempUchime); //are you using a reference file if (templatefile != "self") { string outputFileName = filename.substr(1, filename.length()-2) + ".uchime_formatted"; prepFile(filename.substr(1, filename.length()-2), outputFileName); filename = outputFileName; filename = "\"" + filename + "\""; //add reference file char* tempRef = new char[5]; //strcpy(tempRef, "--db"); *tempRef = '\0'; strncat(tempRef, "--db", 4); cPara.push_back(tempRef); char* tempR = new char[templatefile.length()+1]; //strcpy(tempR, templatefile.c_str()); *tempR = '\0'; strncat(tempR, templatefile.c_str(), templatefile.length()); cPara.push_back(tempR); } char* tempIn = new char[8]; *tempIn = '\0'; strncat(tempIn, "--input", 7); //strcpy(tempIn, "--input"); cPara.push_back(tempIn); char* temp = new char[filename.length()+1]; *temp = '\0'; strncat(temp, filename.c_str(), filename.length()); //strcpy(temp, filename.c_str()); cPara.push_back(temp); char* tempO = new char[12]; *tempO = '\0'; strncat(tempO, "--uchimeout", 11); //strcpy(tempO, "--uchimeout"); cPara.push_back(tempO); char* tempout = new char[outputFName.length()+1]; //strcpy(tempout, outputFName.c_str()); *tempout = '\0'; strncat(tempout, outputFName.c_str(), outputFName.length()); 
cPara.push_back(tempout); if (chimealns) { char* tempA = new char[13]; *tempA = '\0'; strncat(tempA, "--uchimealns", 12); //strcpy(tempA, "--uchimealns"); cPara.push_back(tempA); char* tempa = new char[alns.length()+1]; //strcpy(tempa, alns.c_str()); *tempa = '\0'; strncat(tempa, alns.c_str(), alns.length()); cPara.push_back(tempa); } if (strand != "") { char* tempA = new char[9]; *tempA = '\0'; strncat(tempA, "--strand", 8); cPara.push_back(tempA); char* tempa = new char[strand.length()+1]; *tempa = '\0'; strncat(tempa, strand.c_str(), strand.length()); cPara.push_back(tempa); } if (useAbskew) { char* tempskew = new char[9]; *tempskew = '\0'; strncat(tempskew, "--abskew", 8); //strcpy(tempskew, "--abskew"); cPara.push_back(tempskew); char* tempSkew = new char[abskew.length()+1]; //strcpy(tempSkew, abskew.c_str()); *tempSkew = '\0'; strncat(tempSkew, abskew.c_str(), abskew.length()); cPara.push_back(tempSkew); } if (useMinH) { char* tempminh = new char[7]; *tempminh = '\0'; strncat(tempminh, "--minh", 6); //strcpy(tempminh, "--minh"); cPara.push_back(tempminh); char* tempMinH = new char[minh.length()+1]; *tempMinH = '\0'; strncat(tempMinH, minh.c_str(), minh.length()); //strcpy(tempMinH, minh.c_str()); cPara.push_back(tempMinH); } if (useMindiv) { char* tempmindiv = new char[9]; *tempmindiv = '\0'; strncat(tempmindiv, "--mindiv", 8); //strcpy(tempmindiv, "--mindiv"); cPara.push_back(tempmindiv); char* tempMindiv = new char[mindiv.length()+1]; *tempMindiv = '\0'; strncat(tempMindiv, mindiv.c_str(), mindiv.length()); //strcpy(tempMindiv, mindiv.c_str()); cPara.push_back(tempMindiv); } if (useXn) { char* tempxn = new char[5]; //strcpy(tempxn, "--xn"); *tempxn = '\0'; strncat(tempxn, "--xn", 4); cPara.push_back(tempxn); char* tempXn = new char[xn.length()+1]; //strcpy(tempXn, xn.c_str()); *tempXn = '\0'; strncat(tempXn, xn.c_str(), xn.length()); cPara.push_back(tempXn); } if (useDn) { char* tempdn = new char[5]; //strcpy(tempdn, "--dn"); *tempdn = '\0'; strncat(tempdn, 
"--dn", 4); cPara.push_back(tempdn); char* tempDn = new char[dn.length()+1]; *tempDn = '\0'; strncat(tempDn, dn.c_str(), dn.length()); //strcpy(tempDn, dn.c_str()); cPara.push_back(tempDn); } if (useXa) { char* tempxa = new char[5]; //strcpy(tempxa, "--xa"); *tempxa = '\0'; strncat(tempxa, "--xa", 4); cPara.push_back(tempxa); char* tempXa = new char[xa.length()+1]; *tempXa = '\0'; strncat(tempXa, xa.c_str(), xa.length()); //strcpy(tempXa, xa.c_str()); cPara.push_back(tempXa); } if (useChunks) { char* tempchunks = new char[9]; //strcpy(tempchunks, "--chunks"); *tempchunks = '\0'; strncat(tempchunks, "--chunks", 8); cPara.push_back(tempchunks); char* tempChunks = new char[chunks.length()+1]; *tempChunks = '\0'; strncat(tempChunks, chunks.c_str(), chunks.length()); //strcpy(tempChunks, chunks.c_str()); cPara.push_back(tempChunks); } if (useMinchunk) { char* tempminchunk = new char[11]; //strcpy(tempminchunk, "--minchunk"); *tempminchunk = '\0'; strncat(tempminchunk, "--minchunk", 10); cPara.push_back(tempminchunk); char* tempMinchunk = new char[minchunk.length()+1]; *tempMinchunk = '\0'; strncat(tempMinchunk, minchunk.c_str(), minchunk.length()); //strcpy(tempMinchunk, minchunk.c_str()); cPara.push_back(tempMinchunk); } if (useIdsmoothwindow) { char* tempidsmoothwindow = new char[17]; *tempidsmoothwindow = '\0'; strncat(tempidsmoothwindow, "--idsmoothwindow", 16); //strcpy(tempidsmoothwindow, "--idsmoothwindow"); cPara.push_back(tempidsmoothwindow); char* tempIdsmoothwindow = new char[idsmoothwindow.length()+1]; *tempIdsmoothwindow = '\0'; strncat(tempIdsmoothwindow, idsmoothwindow.c_str(), idsmoothwindow.length()); //strcpy(tempIdsmoothwindow, idsmoothwindow.c_str()); cPara.push_back(tempIdsmoothwindow); } /*if (useMinsmoothid) { char* tempminsmoothid = new char[14]; //strcpy(tempminsmoothid, "--minsmoothid"); *tempminsmoothid = '\0'; strncat(tempminsmoothid, "--minsmoothid", 13); cPara.push_back(tempminsmoothid); char* tempMinsmoothid = new 
char[minsmoothid.length()+1];
            *tempMinsmoothid = '\0'; strncat(tempMinsmoothid, minsmoothid.c_str(), minsmoothid.length());
            //strcpy(tempMinsmoothid, minsmoothid.c_str());
            cPara.push_back(tempMinsmoothid);
        }*/

        if (useMaxp) {
            char* tempmaxp = new char[7];
            //strcpy(tempmaxp, "--maxp");
            *tempmaxp = '\0'; strncat(tempmaxp, "--maxp", 6);
            cPara.push_back(tempmaxp);
            char* tempMaxp = new char[maxp.length()+1];
            *tempMaxp = '\0'; strncat(tempMaxp, maxp.c_str(), maxp.length());
            //strcpy(tempMaxp, maxp.c_str());
            cPara.push_back(tempMaxp);
        }

        if (!skipgaps) {
            char* tempskipgaps = new char[13];
            //strcpy(tempskipgaps, "--[no]skipgaps");
            *tempskipgaps = '\0'; strncat(tempskipgaps, "--noskipgaps", 12);
            cPara.push_back(tempskipgaps);
        }

        if (!skipgaps2) {
            char* tempskipgaps2 = new char[14];
            //strcpy(tempskipgaps2, "--[no]skipgaps2");
            *tempskipgaps2 = '\0'; strncat(tempskipgaps2, "--noskipgaps2", 13);
            cPara.push_back(tempskipgaps2);
        }

        if (useMinlen) {
            char* tempminlen = new char[9];
            *tempminlen = '\0'; strncat(tempminlen, "--minlen", 8);
            //strcpy(tempminlen, "--minlen");
            cPara.push_back(tempminlen);
            char* tempMinlen = new char[minlen.length()+1];
            //strcpy(tempMinlen, minlen.c_str());
            *tempMinlen = '\0'; strncat(tempMinlen, minlen.c_str(), minlen.length());
            cPara.push_back(tempMinlen);
        }

        if (useMaxlen) {
            char* tempmaxlen = new char[9];
            //strcpy(tempmaxlen, "--maxlen");
            *tempmaxlen = '\0'; strncat(tempmaxlen, "--maxlen", 8);
            cPara.push_back(tempmaxlen);
            char* tempMaxlen = new char[maxlen.length()+1];
            *tempMaxlen = '\0'; strncat(tempMaxlen, maxlen.c_str(), maxlen.length());
            //strcpy(tempMaxlen, maxlen.c_str());
            cPara.push_back(tempMaxlen);
        }

        if (ucl) {
            //"--ucl" is 5 characters, so the buffer needs 6 bytes to hold the terminating '\0'
            char* tempucl = new char[6];
            strcpy(tempucl, "--ucl");
            cPara.push_back(tempucl);
        }

        if (useQueryfract) {
            char* tempqueryfract = new char[13];
            *tempqueryfract = '\0'; strncat(tempqueryfract, "--queryfract", 12);
            //strcpy(tempqueryfract, "--queryfract");
            cPara.push_back(tempqueryfract);
            char* tempQueryfract = new char[queryfract.length()+1];
            *tempQueryfract = '\0'; strncat(tempQueryfract, queryfract.c_str(), queryfract.length());
            //strcpy(tempQueryfract, queryfract.c_str());
            cPara.push_back(tempQueryfract);
        }

        char** uchimeParameters;
        uchimeParameters = new char*[cPara.size()];
        string commandString = "";
        for (int i = 0; i < cPara.size(); i++) { uchimeParameters[i] = cPara[i]; commandString += toString(cPara[i]) + " "; }
        //int numArgs = cPara.size();

        //uchime_main(numArgs, uchimeParameters);
        //cout << "commandString = " << commandString << endl;
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
        commandString = "\"" + commandString + "\"";
#endif
        if (m->debug) { m->mothurOut("[DEBUG]: uchime command = " + commandString + ".\n"); }
        system(commandString.c_str());

        //free memory - each parameter was allocated with new char[], so release it with delete[]
        for(int i = 0; i < cPara.size(); i++) { delete[] cPara[i]; }
        delete[] uchimeParameters;

        //remove "" from filenames
        outputFName = outputFName.substr(1, outputFName.length()-2);
        filename = filename.substr(1, filename.length()-2);
        alns = alns.substr(1, alns.length()-2);

        if (m->control_pressed) { return 0; }

        //create accnos file from uchime results
        ifstream in;
        m->openInputFile(outputFName, in);

        ofstream out;
        m->openOutputFile(accnos, out);

        int num = 0;
        numChimeras = 0;
        while(!in.eof()) {

            if (m->control_pressed) { break; }

            string name = "";
            string chimeraFlag = "";
            //in >> chimeraFlag >> name;

            string line = m->getline(in);
            vector<string> pieces = m->splitWhiteSpace(line);
            if (pieces.size() > 2) {
                name = pieces[1];
                //fix name if needed
                if (templatefile == "self") {
                    name = name.substr(0, name.length()-1); //rip off last /
                    name = name.substr(0, name.find_last_of('/'));
                }

                chimeraFlag = pieces[pieces.size()-1];
            }
            //for (int i = 0; i < 15; i++) { in >> chimeraFlag; }
            m->gobble(in);

            if (chimeraFlag == "Y") { out << name << endl; numChimeras++; }
            num++;
        }
        in.close();
        out.close();

        //if (templatefile != "self") { m->mothurRemove(filename); }

        return num;
    }
    catch(exception& e) {
        m->errorOut(e,
"ChimeraUchimeCommand", "driver"); exit(1); } } /**************************************************************************************************/ //uchime can't handle some of the things allowed in mothurs fasta files. This functions "cleans up" the file. int ChimeraUchimeCommand::prepFile(string filename, string output) { try { ifstream in; m->openInputFile(filename, in); ofstream out; m->openOutputFile(output, out); while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { seq.printSequence(out); } } in.close(); out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "prepFile"); exit(1); } } /**************************************************************************************************/ int ChimeraUchimeCommand::createProcesses(string outputFileName, string filename, string accnos, string alns, int& numChimeras) { try { processIDS.clear(); int process = 1; int num = 0; vector files; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //break up file into multiple files m->divideFile(filename, processors, files); if (m->control_pressed) { return 0; } //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(outputFileName + toString(m->mothurGetpid(process)) + ".temp", files[process], accnos + toString(m->mothurGetpid(process)) + ".temp", alns + toString(m->mothurGetpid(process)) + ".temp", numChimeras); //pass numSeqs to parent ofstream out; string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << numChimeras << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); 
m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } //do my part num = driver(outputFileName, files[0], accnos, alns, numChimeras); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; m->gobble(in); num += tempNum; in >> tempNum; numChimeras += tempNum; } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the preClusterData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// //divide file int count = 0; int spot = 0; map filehandles; map::iterator it3; ofstream* temp; for (int i = 0; i < processors; i++) { temp = new ofstream; filehandles[i] = temp; m->openOutputFile(filename+toString(i)+".temp", *(temp)); files.push_back(filename+toString(i)+".temp"); } ifstream in; m->openInputFile(filename, in); while(!in.eof()) { if (m->control_pressed) { in.close(); for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(it3->second)).close(); delete it3->second; } return 0; } Sequence tempSeq(in); m->gobble(in); if (tempSeq.getName() != "") { tempSeq.printSequence(*(filehandles[spot])); spot++; count++; if (spot == processors) { spot = 0; } } } in.close(); //delete memory for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(it3->second)).close(); delete it3->second; } //sanity check for number of processors if (count < processors) { processors = count; } vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; vector dummy; //used so that we can use the same struct for MyUchimeSeqsThreadFunction and MyUchimeThreadFunction //Create 
processor worker threads. for( int i=1; isetBooleans(dups, useAbskew, chimealns, useMinH, useMindiv, useXn, useDn, useXa, useChunks, useMinchunk, useIdsmoothwindow, useMinsmoothid, useMaxp, skipgaps, skipgaps2, useMinlen, useMaxlen, ucl, useQueryfract, hasCount); tempUchime->setVariables(abskew, minh, mindiv, xn, dn, xa, chunks, minchunk, idsmoothwindow, minsmoothid, maxp, minlen, maxlen, queryfract, strand); pDataArray.push_back(tempUchime); processIDS.push_back(i); //MySeqSumThreadFunction is in header. It must be global or static to work with the threads. //default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i-1] = CreateThread(NULL, 0, MyUchimeSeqsThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //using the main process as a worker saves time and memory num = driver(outputFileName, files[0], accnos, alns, numChimeras); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; numChimeras += pDataArray[i]->numChimeras; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append output files for(int i=0;iappendFiles((outputFileName + toString(processIDS[i]) + ".temp"), outputFileName); m->mothurRemove((outputFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((accnos + toString(processIDS[i]) + ".temp"), accnos); m->mothurRemove((accnos + toString(processIDS[i]) + ".temp")); if (chimealns) { m->appendFiles((alns + toString(processIDS[i]) + ".temp"), alns); m->mothurRemove((alns + toString(processIDS[i]) + ".temp")); } } //get rid of the file pieces. 
for (int i = 0; i < files.size(); i++) { m->mothurRemove(files[i]); } return num; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ int ChimeraUchimeCommand::createProcessesGroups(string outputFName, string filename, string accnos, string alns, string newCountFile, vector groups, string nameFile, string groupFile, string fastaFile) { try { processIDS.clear(); int process = 1; int num = 0; CountTable newCount; if (hasCount && dups) { newCount.readTable(nameFile, true, false); } //sanity check if (groups.size() < processors) { processors = groups.size(); } //divide the groups between the processors vector lines; int remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverGroups(outputFName + toString(m->mothurGetpid(process)) + ".temp", filename + toString(m->mothurGetpid(process)) + ".temp", accnos + toString(m->mothurGetpid(process)) + ".temp", alns + toString(m->mothurGetpid(process)) + ".temp", accnos + ".byCount." 
+ toString(m->mothurGetpid(process)) + ".temp", lines[process].start, lines[process].end, groups); //pass numSeqs to parent ofstream out; string tempFile = outputFName + toString(m->mothurGetpid(process)) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } m->mothurOut(toString( getpid() ) + " here\n"); //do my part num = driverGroups(outputFName, filename, accnos, alns, accnos + ".byCount", lines[0].start, lines[0].end, groups); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the uchimeData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; isetBooleans(dups, useAbskew, chimealns, useMinH, useMindiv, useXn, useDn, useXa, useChunks, useMinchunk, useIdsmoothwindow, useMinsmoothid, useMaxp, skipgaps, skipgaps2, useMinlen, useMaxlen, ucl, useQueryfract, hasCount); tempUchime->setVariables(abskew, minh, mindiv, xn, dn, xa, chunks, minchunk, idsmoothwindow, minsmoothid, maxp, minlen, maxlen, queryfract, strand); pDataArray.push_back(tempUchime); processIDS.push_back(i); //MyUchimeThreadFunction is in header. It must be global or static to work with the threads. 
//default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i-1] = CreateThread(NULL, 0, MyUchimeThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //using the main process as a worker saves time and memory num = driverGroups(outputFName, filename, accnos, alns, accnos + ".byCount", lines[0].start, lines[0].end, groups); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //read my own if (hasCount && dups) { if (!m->isBlank(accnos + ".byCount")) { ifstream in2; m->openInputFile(accnos + ".byCount", in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); newCount.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(accnos + ".byCount"); } //append output files for(int i=0;iappendFiles((outputFName + toString(processIDS[i]) + ".temp"), outputFName); m->mothurRemove((outputFName + toString(processIDS[i]) + ".temp")); m->appendFiles((accnos + toString(processIDS[i]) + ".temp"), accnos); m->mothurRemove((accnos + toString(processIDS[i]) + ".temp")); if (chimealns) { m->appendFiles((alns + toString(processIDS[i]) + ".temp"), alns); m->mothurRemove((alns + toString(processIDS[i]) + ".temp")); } if (hasCount && dups) { if (!m->isBlank(accnos + ".byCount." + toString(processIDS[i]) + ".temp")) { ifstream in2; m->openInputFile(accnos + ".byCount." + toString(processIDS[i]) + ".temp", in2); string name, group; while (!in2.eof()) { in2 >> name >> group; m->gobble(in2); newCount.setAbund(name, group, 0); } in2.close(); } m->mothurRemove(accnos + ".byCount." 
+ toString(processIDS[i]) + ".temp"); } } //print new *.pick.count_table if (hasCount && dups) { newCount.printTable(newCountFile); } return num; } catch(exception& e) { m->errorOut(e, "ChimeraUchimeCommand", "createProcessesGroups"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/commands/chimerauchimecommand.h000066400000000000000000001010421255543666200225500ustar00rootroot00000000000000#ifndef CHIMERAUCHIMECOMMAND_H #define CHIMERAUCHIMECOMMAND_H /* * chimerauchimecommand.h * Mothur * * Created by westcott on 5/13/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "command.hpp" #include "sequenceparser.h" #include "counttable.h" #include "sequencecountparser.h" /***********************************************************/ class ChimeraUchimeCommand : public Command { public: ChimeraUchimeCommand(string); ChimeraUchimeCommand(); ~ChimeraUchimeCommand() {} vector setParameters(); string getCommandName() { return "chimera.uchime"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "uchime by Robert C. Edgar\nhttp://drive5.com/usearch/manual/uchime_algo.html\nThis code was donated to the public domain.\nEdgar,R.C., Haas,B.J., Clemente,J.C., Quince,C. and Knight,R. (2011), UCHIME improves sensitivity and speed of chimera detection. 
Bioinformatics 27:2194.\nhttp://www.mothur.org/wiki/Chimera.uchime\n"; } string getDescription() { return "detect chimeric sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector processIDS; //processid int driver(string, string, string, string, int&); int createProcesses(string, string, string, string, int&); bool abort, useAbskew, chimealns, useMinH, useMindiv, useXn, useDn, useXa, useChunks, useMinchunk, useIdsmoothwindow, useMinsmoothid, useMaxp, skipgaps, skipgaps2, useMinlen, useMaxlen, ucl, useQueryfract, hasCount, hasName, dups; string fastafile, groupfile, templatefile, outputDir, namefile, countfile, abskew, minh, mindiv, xn, dn, xa, chunks, minchunk, idsmoothwindow, minsmoothid, maxp, minlen, maxlen, queryfract, uchimeLocation, strand; int processors; SequenceParser* sparser; SequenceCountParser* cparser; vector outputNames; vector fastaFileNames; vector nameFileNames; vector groupFileNames; string getNamesFile(string&); int readFasta(string, map&); int printFile(vector&, string); int deconvoluteResults(map&, string, string, string); int driverGroups(string, string, string, string, string, int, int, vector); int createProcessesGroups(string, string, string, string, string, vector, string, string, string); int prepFile(string filename, string); }; /***********************************************************/ /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct uchimeData { string fastafile; string namefile; string groupfile; string outputFName; string accnos, alns, filename, templatefile, uchimeLocation, countlist; MothurOut* m; int start; int end; int threadID, count, numChimeras; vector groups; bool dups, useAbskew, chimealns, useMinH, useMindiv, useXn, useDn, useXa, useChunks, useMinchunk, useIdsmoothwindow, useMinsmoothid, useMaxp, skipgaps, skipgaps2, useMinlen, useMaxlen, ucl, useQueryfract, hasCount; string abskew, minh, mindiv, xn, dn, xa, chunks, minchunk, idsmoothwindow, minsmoothid, maxp, minlen, maxlen, queryfract, strand; uchimeData(){} uchimeData(string o, string uloc, string t, string file, string f, string n, string g, string ac, string al, string nc, vector gr, MothurOut* mout, int st, int en, int tid) { fastafile = f; namefile = n; groupfile = g; filename = file; outputFName = o; templatefile = t; accnos = ac; alns = al; m = mout; start = st; end = en; threadID = tid; groups = gr; count = 0; numChimeras = 0; uchimeLocation = uloc; countlist = nc; } void setBooleans(bool dps, bool Abskew, bool calns, bool MinH, bool Mindiv, bool Xn, bool Dn, bool Xa, bool Chunks, bool Minchunk, bool Idsmoothwindow, bool Minsmoothid, bool Maxp, bool skipgap, bool skipgap2, bool Minlen, bool Maxlen, bool uc, bool Queryfract, bool hc) { useAbskew = Abskew; chimealns = calns; useMinH = MinH; useMindiv = Mindiv; useXn = Xn; useDn = Dn; useXa = Xa; useChunks = Chunks; useMinchunk = Minchunk; useIdsmoothwindow = Idsmoothwindow; useMinsmoothid = Minsmoothid; useMaxp = Maxp; skipgaps = skipgap; skipgaps2 = skipgap2; useMinlen = Minlen; useMaxlen = Maxlen; ucl = uc; useQueryfract = Queryfract; hasCount = hc; dups = dps; } void setVariables(string abske, string min, string mindi, string x, string d, string xa2, string chunk, string minchun, string idsmoothwindo, string minsmoothi, string max, string minle, string maxle, string queryfrac, string stra) { abskew = abske; minh = min; mindiv = mindi; strand = stra; xn = x; dn = 
d; xa = xa2; chunks = chunk; minchunk = minchun; idsmoothwindow = idsmoothwindo; minsmoothid = minsmoothi; maxp = max; minlen = minle; maxlen = maxle; queryfract = queryfrac; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyUchimeThreadFunction(LPVOID lpParam){ uchimeData* pDataArray; pDataArray = (uchimeData*)lpParam; try { pDataArray->outputFName = pDataArray->m->getFullPathName(pDataArray->outputFName); pDataArray->filename = pDataArray->m->getFullPathName(pDataArray->filename); pDataArray->alns = pDataArray->m->getFullPathName(pDataArray->alns); //clears files ofstream out, out1, out2; pDataArray->m->openOutputFile(pDataArray->outputFName, out); out.close(); pDataArray->m->openOutputFile(pDataArray->accnos, out1); out1.close(); if (pDataArray->chimealns) { pDataArray->m->openOutputFile(pDataArray->alns, out2); out2.close(); } //parse fasta and name file by group SequenceParser* parser; SequenceCountParser* cparser; if (pDataArray->hasCount) { CountTable* ct = new CountTable(); ct->readTable(pDataArray->namefile, true, false); cparser = new SequenceCountParser(pDataArray->fastafile, *ct); delete ct; }else { if (pDataArray->namefile != "") { parser = new SequenceParser(pDataArray->groupfile, pDataArray->fastafile, pDataArray->namefile); } else { parser = new SequenceParser(pDataArray->groupfile, pDataArray->fastafile); } } int totalSeqs = 0; int numChimeras = 0; ofstream outCountList; if (pDataArray->hasCount && pDataArray->dups) { pDataArray->m->openOutputFile(pDataArray->countlist, outCountList); } for (int i = pDataArray->start; i < pDataArray->end; i++) { int start = time(NULL); if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } return 0; } int error; if (pDataArray->hasCount) { error = 
cparser->getSeqs(pDataArray->groups[i], pDataArray->filename, true); if ((error == 1) || pDataArray->m->control_pressed) { delete cparser; return 0; } }else { error = parser->getSeqs(pDataArray->groups[i], pDataArray->filename, true); if ((error == 1) || pDataArray->m->control_pressed) { delete parser; return 0; } } //int numSeqs = driver((outputFName + groups[i]), filename, (accnos+ groups[i]), (alns+ groups[i]), numChimeras); //////////////////////////////////////////////////////////////////////////////////////////////////////////////// //to allow for spaces in the path string outputFName = "\"" + pDataArray->outputFName+pDataArray->groups[i] + "\""; string filename = "\"" + pDataArray->filename + "\""; string alns = "\"" + pDataArray->alns+pDataArray->groups[i] + "\""; string accnos = pDataArray->accnos+pDataArray->groups[i]; vector cPara; string uchimeCommand = pDataArray->uchimeLocation; uchimeCommand = "\"" + uchimeCommand + "\""; char* tempUchime; tempUchime= new char[uchimeCommand.length()+1]; *tempUchime = '\0'; strncat(tempUchime, uchimeCommand.c_str(), uchimeCommand.length()); cPara.push_back(tempUchime); char* tempIn = new char[8]; *tempIn = '\0'; strncat(tempIn, "--input", 7); //strcpy(tempIn, "--input"); cPara.push_back(tempIn); char* temp = new char[filename.length()+1]; *temp = '\0'; strncat(temp, filename.c_str(), filename.length()); //strcpy(temp, filename.c_str()); cPara.push_back(temp); char* tempO = new char[12]; *tempO = '\0'; strncat(tempO, "--uchimeout", 11); //strcpy(tempO, "--uchimeout"); cPara.push_back(tempO); char* tempout = new char[outputFName.length()+1]; //strcpy(tempout, outputFName.c_str()); *tempout = '\0'; strncat(tempout, outputFName.c_str(), outputFName.length()); cPara.push_back(tempout); if (pDataArray->chimealns) { char* tempA = new char[13]; *tempA = '\0'; strncat(tempA, "--uchimealns", 12); //strcpy(tempA, "--uchimealns"); cPara.push_back(tempA); char* tempa = new char[alns.length()+1]; //strcpy(tempa, alns.c_str()); 
*tempa = '\0'; strncat(tempa, alns.c_str(), alns.length()); cPara.push_back(tempa); } if (pDataArray->strand != "") { char* tempA = new char[9]; *tempA = '\0'; strncat(tempA, "--strand", 8); cPara.push_back(tempA); char* tempa = new char[pDataArray->strand.length()+1]; *tempa = '\0'; strncat(tempa, pDataArray->strand.c_str(), pDataArray->strand.length()); cPara.push_back(tempa); } if (pDataArray->useAbskew) { char* tempskew = new char[9]; *tempskew = '\0'; strncat(tempskew, "--abskew", 8); //strcpy(tempskew, "--abskew"); cPara.push_back(tempskew); char* tempSkew = new char[pDataArray->abskew.length()+1]; //strcpy(tempSkew, abskew.c_str()); *tempSkew = '\0'; strncat(tempSkew, pDataArray->abskew.c_str(), pDataArray->abskew.length()); cPara.push_back(tempSkew); } if (pDataArray->useMinH) { char* tempminh = new char[7]; *tempminh = '\0'; strncat(tempminh, "--minh", 6); //strcpy(tempminh, "--minh"); cPara.push_back(tempminh); char* tempMinH = new char[pDataArray->minh.length()+1]; *tempMinH = '\0'; strncat(tempMinH, pDataArray->minh.c_str(), pDataArray->minh.length()); //strcpy(tempMinH, minh.c_str()); cPara.push_back(tempMinH); } if (pDataArray->useMindiv) { char* tempmindiv = new char[9]; *tempmindiv = '\0'; strncat(tempmindiv, "--mindiv", 8); //strcpy(tempmindiv, "--mindiv"); cPara.push_back(tempmindiv); char* tempMindiv = new char[pDataArray->mindiv.length()+1]; *tempMindiv = '\0'; strncat(tempMindiv, pDataArray->mindiv.c_str(), pDataArray->mindiv.length()); //strcpy(tempMindiv, mindiv.c_str()); cPara.push_back(tempMindiv); } if (pDataArray->useXn) { char* tempxn = new char[5]; //strcpy(tempxn, "--xn"); *tempxn = '\0'; strncat(tempxn, "--xn", 4); cPara.push_back(tempxn); char* tempXn = new char[pDataArray->xn.length()+1]; //strcpy(tempXn, xn.c_str()); *tempXn = '\0'; strncat(tempXn, pDataArray->xn.c_str(), pDataArray->xn.length()); cPara.push_back(tempXn); } if (pDataArray->useDn) { char* tempdn = new char[5]; //strcpy(tempdn, "--dn"); *tempdn = '\0'; 
strncat(tempdn, "--dn", 4); cPara.push_back(tempdn); char* tempDn = new char[pDataArray->dn.length()+1]; *tempDn = '\0'; strncat(tempDn, pDataArray->dn.c_str(), pDataArray->dn.length()); //strcpy(tempDn, dn.c_str()); cPara.push_back(tempDn); } if (pDataArray->useXa) { char* tempxa = new char[5]; //strcpy(tempxa, "--xa"); *tempxa = '\0'; strncat(tempxa, "--xa", 4); cPara.push_back(tempxa); char* tempXa = new char[pDataArray->xa.length()+1]; *tempXa = '\0'; strncat(tempXa, pDataArray->xa.c_str(), pDataArray->xa.length()); //strcpy(tempXa, xa.c_str()); cPara.push_back(tempXa); } if (pDataArray->useChunks) { char* tempchunks = new char[9]; //strcpy(tempchunks, "--chunks"); *tempchunks = '\0'; strncat(tempchunks, "--chunks", 8); cPara.push_back(tempchunks); char* tempChunks = new char[pDataArray->chunks.length()+1]; *tempChunks = '\0'; strncat(tempChunks, pDataArray->chunks.c_str(), pDataArray->chunks.length()); //strcpy(tempChunks, chunks.c_str()); cPara.push_back(tempChunks); } if (pDataArray->useMinchunk) { char* tempminchunk = new char[11]; //strcpy(tempminchunk, "--minchunk"); *tempminchunk = '\0'; strncat(tempminchunk, "--minchunk", 10); cPara.push_back(tempminchunk); char* tempMinchunk = new char[pDataArray->minchunk.length()+1]; *tempMinchunk = '\0'; strncat(tempMinchunk, pDataArray->minchunk.c_str(), pDataArray->minchunk.length()); //strcpy(tempMinchunk, minchunk.c_str()); cPara.push_back(tempMinchunk); } if (pDataArray->useIdsmoothwindow) { char* tempidsmoothwindow = new char[17]; *tempidsmoothwindow = '\0'; strncat(tempidsmoothwindow, "--idsmoothwindow", 16); //strcpy(tempidsmoothwindow, "--idsmoothwindow"); cPara.push_back(tempidsmoothwindow); char* tempIdsmoothwindow = new char[pDataArray->idsmoothwindow.length()+1]; *tempIdsmoothwindow = '\0'; strncat(tempIdsmoothwindow, pDataArray->idsmoothwindow.c_str(), pDataArray->idsmoothwindow.length()); //strcpy(tempIdsmoothwindow, idsmoothwindow.c_str()); cPara.push_back(tempIdsmoothwindow); } if 
(pDataArray->useMaxp) {
                char* tempmaxp = new char[7];
                //strcpy(tempmaxp, "--maxp");
                *tempmaxp = '\0'; strncat(tempmaxp, "--maxp", 6);
                cPara.push_back(tempmaxp);
                char* tempMaxp = new char[pDataArray->maxp.length()+1];
                *tempMaxp = '\0'; strncat(tempMaxp, pDataArray->maxp.c_str(), pDataArray->maxp.length());
                //strcpy(tempMaxp, maxp.c_str());
                cPara.push_back(tempMaxp);
            }

            if (!pDataArray->skipgaps) {
                char* tempskipgaps = new char[13];
                //strcpy(tempskipgaps, "--[no]skipgaps");
                *tempskipgaps = '\0'; strncat(tempskipgaps, "--noskipgaps", 12);
                cPara.push_back(tempskipgaps);
            }

            if (!pDataArray->skipgaps2) {
                char* tempskipgaps2 = new char[14];
                //strcpy(tempskipgaps2, "--[no]skipgaps2");
                *tempskipgaps2 = '\0'; strncat(tempskipgaps2, "--noskipgaps2", 13);
                cPara.push_back(tempskipgaps2);
            }

            if (pDataArray->useMinlen) {
                char* tempminlen = new char[9];
                *tempminlen = '\0'; strncat(tempminlen, "--minlen", 8);
                //strcpy(tempminlen, "--minlen");
                cPara.push_back(tempminlen);
                char* tempMinlen = new char[pDataArray->minlen.length()+1];
                //strcpy(tempMinlen, minlen.c_str());
                *tempMinlen = '\0'; strncat(tempMinlen, pDataArray->minlen.c_str(), pDataArray->minlen.length());
                cPara.push_back(tempMinlen);
            }

            if (pDataArray->useMaxlen) {
                char* tempmaxlen = new char[9];
                //strcpy(tempmaxlen, "--maxlen");
                *tempmaxlen = '\0'; strncat(tempmaxlen, "--maxlen", 8);
                cPara.push_back(tempmaxlen);
                char* tempMaxlen = new char[pDataArray->maxlen.length()+1];
                *tempMaxlen = '\0'; strncat(tempMaxlen, pDataArray->maxlen.c_str(), pDataArray->maxlen.length());
                //strcpy(tempMaxlen, maxlen.c_str());
                cPara.push_back(tempMaxlen);
            }

            if (pDataArray->ucl) {
                //"--ucl" is 5 characters plus the terminating '\0', so the buffer needs 6 bytes
                char* tempucl = new char[6];
                strcpy(tempucl, "--ucl");
                cPara.push_back(tempucl);
            }

            if (pDataArray->useQueryfract) {
                char* tempqueryfract = new char[13];
                *tempqueryfract = '\0'; strncat(tempqueryfract, "--queryfract", 12);
                //strcpy(tempqueryfract, "--queryfract");
                cPara.push_back(tempqueryfract);
                char* tempQueryfract = new char[pDataArray->queryfract.length()+1];
                *tempQueryfract = '\0'; strncat(tempQueryfract, pDataArray->queryfract.c_str(), pDataArray->queryfract.length());
                //strcpy(tempQueryfract, queryfract.c_str());
                cPara.push_back(tempQueryfract);
            }

            char** uchimeParameters;
            uchimeParameters = new char*[cPara.size()];
            string commandString = "";
            for (int j = 0; j < cPara.size(); j++) { uchimeParameters[j] = cPara[j]; commandString += toString(cPara[j]) + " "; }
            //int numArgs = cPara.size();

            //uchime_main(numArgs, uchimeParameters);
            //cout << "commandString = " << commandString << endl;
            commandString = "\"" + commandString + "\"";
            if (pDataArray->m->debug) { pDataArray->m->mothurOut("[DEBUG]: uchime command = " + commandString + ".\n"); }
            system(commandString.c_str());

            //free memory (each argument was allocated with new char[], so it must be released with delete[])
            for(int j = 0; j < cPara.size(); j++) { delete[] cPara[j]; }
            delete[] uchimeParameters;

            //remove "" from filenames
            outputFName = outputFName.substr(1, outputFName.length()-2);
            filename = filename.substr(1, filename.length()-2);
            alns = alns.substr(1, alns.length()-2);

            //only the parser that was actually created may be deleted
            if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } else { delete parser; } return 0; }

            //create accnos file from uchime results
            ifstream in;
            pDataArray->m->openInputFile(outputFName, in);

            ofstream out;
            pDataArray->m->openOutputFile(accnos, out);

            int num = 0;
            numChimeras = 0;
            map<string, string> thisnamemap;
            map<string, string>::iterator itN;
            if (pDataArray->dups && !pDataArray->hasCount) { thisnamemap = parser->getNameMap(pDataArray->groups[i]); }

            while(!in.eof()) {

                if (pDataArray->m->control_pressed) { break; }

                string name = "";
                string chimeraFlag = "";
                in >> chimeraFlag >> name;

                //fix name
                name = name.substr(0, name.length()-1); //rip off last /
                name = name.substr(0, name.find_last_of('/'));

                for (int j = 0; j < 15; j++) { in >> chimeraFlag; }
                pDataArray->m->gobble(in);

                if (chimeraFlag == "Y") {
                    if (pDataArray->dups) {
                        if (!pDataArray->hasCount) { //output redundant names for each group
                            itN = thisnamemap.find(name);
                            if (itN != thisnamemap.end()) {
                                vector<string> tempNames;
pDataArray->m->splitAtComma(itN->second, tempNames); for (int j = 0; j < tempNames.size(); j++) { out << tempNames[j] << endl; } }else { pDataArray->m->mothurOut("[ERROR]: parsing cannot find " + name + ".\n"); pDataArray->m->control_pressed = true; } }else { out << name << endl; outCountList << name << '\t' << pDataArray->groups[i] << endl; } }else{ out << name << endl; } numChimeras++; } num++; } in.close(); out.close(); //////////////////////////////////////////////////////////////////////////////////////////////////////////////// totalSeqs += num; pDataArray->numChimeras += numChimeras; if (pDataArray->m->control_pressed) { if (pDataArray->hasCount) { delete cparser; } { delete parser; } return 0; } //remove file made for uchime pDataArray->m->mothurRemove(filename); //append files pDataArray->m->appendFiles(outputFName, pDataArray->outputFName); pDataArray->m->mothurRemove(outputFName); pDataArray->m->appendFiles(accnos, pDataArray->accnos); pDataArray->m->mothurRemove(accnos); if (pDataArray->chimealns) { pDataArray->m->appendFiles(alns, pDataArray->alns); pDataArray->m->mothurRemove(alns); } pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(num) + " sequences from group " + pDataArray->groups[i] + "."); pDataArray->m->mothurOutEndLine(); } if (pDataArray->hasCount && pDataArray->dups) { outCountList.close(); } pDataArray->count = totalSeqs; if (pDataArray->hasCount) { delete cparser; } { delete parser; } return totalSeqs; } catch(exception& e) { pDataArray->m->errorOut(e, "ChimeraUchimeCommand", "MyUchimeThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MyUchimeSeqsThreadFunction(LPVOID lpParam){ uchimeData* pDataArray; pDataArray = (uchimeData*)lpParam; try { pDataArray->outputFName = pDataArray->m->getFullPathName(pDataArray->outputFName); pDataArray->filename = 
pDataArray->m->getFullPathName(pDataArray->filename); pDataArray->alns = pDataArray->m->getFullPathName(pDataArray->alns); int totalSeqs = 0; int numChimeras = 0; int start = time(NULL); if (pDataArray->m->control_pressed) { return 0; } //to allow for spaces in the path string outputFName = "\"" + pDataArray->outputFName + "\""; string filename = "\"" + pDataArray->filename + "\""; string alns = "\"" + pDataArray->alns+ "\""; string templatefile = "\"" + pDataArray->templatefile + "\""; string accnos = pDataArray->accnos; vector cPara; string uchimeCommand = pDataArray->uchimeLocation; uchimeCommand = "\"" + uchimeCommand + "\""; char* tempUchime; tempUchime= new char[uchimeCommand.length()+1]; *tempUchime = '\0'; strncat(tempUchime, uchimeCommand.c_str(), uchimeCommand.length()); cPara.push_back(tempUchime); string outputFileName = filename.substr(1, filename.length()-2) + ".uchime_formatted"; //prepFile(filename.substr(1, filename.length()-2), outputFileName); //prepFile(filename, outputFileName); /******************************************/ ifstream in23; pDataArray->m->openInputFile((filename.substr(1, filename.length()-2)), in23); ofstream out23; pDataArray->m->openOutputFile(outputFileName, out23); int fcount = 0; while (!in23.eof()) { if (pDataArray->m->control_pressed) { break; } Sequence seq(in23); pDataArray->m->gobble(in23); if (seq.getName() != "") { seq.printSequence(out23); fcount++; } } in23.close(); out23.close(); /******************************************/ filename = outputFileName; filename = "\"" + filename + "\""; //add reference file char* tempRef = new char[5]; //strcpy(tempRef, "--db"); *tempRef = '\0'; strncat(tempRef, "--db", 4); cPara.push_back(tempRef); char* tempR = new char[templatefile.length()+1]; //strcpy(tempR, templatefile.c_str()); *tempR = '\0'; strncat(tempR, templatefile.c_str(), templatefile.length()); cPara.push_back(tempR); char* tempIn = new char[8]; *tempIn = '\0'; strncat(tempIn, "--input", 7); //strcpy(tempIn, 
"--input"); cPara.push_back(tempIn); char* temp = new char[filename.length()+1]; *temp = '\0'; strncat(temp, filename.c_str(), filename.length()); //strcpy(temp, filename.c_str()); cPara.push_back(temp); char* tempO = new char[12]; *tempO = '\0'; strncat(tempO, "--uchimeout", 11); //strcpy(tempO, "--uchimeout"); cPara.push_back(tempO); char* tempout = new char[outputFName.length()+1]; //strcpy(tempout, outputFName.c_str()); *tempout = '\0'; strncat(tempout, outputFName.c_str(), outputFName.length()); cPara.push_back(tempout); if (pDataArray->chimealns) { char* tempA = new char[13]; *tempA = '\0'; strncat(tempA, "--uchimealns", 12); //strcpy(tempA, "--uchimealns"); cPara.push_back(tempA); char* tempa = new char[alns.length()+1]; //strcpy(tempa, alns.c_str()); *tempa = '\0'; strncat(tempa, alns.c_str(), alns.length()); cPara.push_back(tempa); } if (pDataArray->strand != "") { char* tempA = new char[9]; *tempA = '\0'; strncat(tempA, "--strand", 8); cPara.push_back(tempA); char* tempa = new char[pDataArray->strand.length()+1]; *tempa = '\0'; strncat(tempa, pDataArray->strand.c_str(), pDataArray->strand.length()); cPara.push_back(tempa); } if (pDataArray->useAbskew) { char* tempskew = new char[9]; *tempskew = '\0'; strncat(tempskew, "--abskew", 8); //strcpy(tempskew, "--abskew"); cPara.push_back(tempskew); char* tempSkew = new char[pDataArray->abskew.length()+1]; //strcpy(tempSkew, abskew.c_str()); *tempSkew = '\0'; strncat(tempSkew, pDataArray->abskew.c_str(), pDataArray->abskew.length()); cPara.push_back(tempSkew); } if (pDataArray->useMinH) { char* tempminh = new char[7]; *tempminh = '\0'; strncat(tempminh, "--minh", 6); //strcpy(tempminh, "--minh"); cPara.push_back(tempminh); char* tempMinH = new char[pDataArray->minh.length()+1]; *tempMinH = '\0'; strncat(tempMinH, pDataArray->minh.c_str(), pDataArray->minh.length()); //strcpy(tempMinH, minh.c_str()); cPara.push_back(tempMinH); } if (pDataArray->useMindiv) { char* tempmindiv = new char[9]; *tempmindiv = '\0'; 
strncat(tempmindiv, "--mindiv", 8); //strcpy(tempmindiv, "--mindiv"); cPara.push_back(tempmindiv); char* tempMindiv = new char[pDataArray->mindiv.length()+1]; *tempMindiv = '\0'; strncat(tempMindiv, pDataArray->mindiv.c_str(), pDataArray->mindiv.length()); //strcpy(tempMindiv, mindiv.c_str()); cPara.push_back(tempMindiv); } if (pDataArray->useXn) { char* tempxn = new char[5]; //strcpy(tempxn, "--xn"); *tempxn = '\0'; strncat(tempxn, "--xn", 4); cPara.push_back(tempxn); char* tempXn = new char[pDataArray->xn.length()+1]; //strcpy(tempXn, xn.c_str()); *tempXn = '\0'; strncat(tempXn, pDataArray->xn.c_str(), pDataArray->xn.length()); cPara.push_back(tempXn); } if (pDataArray->useDn) { char* tempdn = new char[5]; //strcpy(tempdn, "--dn"); *tempdn = '\0'; strncat(tempdn, "--dn", 4); cPara.push_back(tempdn); char* tempDn = new char[pDataArray->dn.length()+1]; *tempDn = '\0'; strncat(tempDn, pDataArray->dn.c_str(), pDataArray->dn.length()); //strcpy(tempDn, dn.c_str()); cPara.push_back(tempDn); } if (pDataArray->useXa) { char* tempxa = new char[5]; //strcpy(tempxa, "--xa"); *tempxa = '\0'; strncat(tempxa, "--xa", 4); cPara.push_back(tempxa); char* tempXa = new char[pDataArray->xa.length()+1]; *tempXa = '\0'; strncat(tempXa, pDataArray->xa.c_str(), pDataArray->xa.length()); //strcpy(tempXa, xa.c_str()); cPara.push_back(tempXa); } if (pDataArray->useChunks) { char* tempchunks = new char[9]; //strcpy(tempchunks, "--chunks"); *tempchunks = '\0'; strncat(tempchunks, "--chunks", 8); cPara.push_back(tempchunks); char* tempChunks = new char[pDataArray->chunks.length()+1]; *tempChunks = '\0'; strncat(tempChunks, pDataArray->chunks.c_str(), pDataArray->chunks.length()); //strcpy(tempChunks, chunks.c_str()); cPara.push_back(tempChunks); } if (pDataArray->useMinchunk) { char* tempminchunk = new char[11]; //strcpy(tempminchunk, "--minchunk"); *tempminchunk = '\0'; strncat(tempminchunk, "--minchunk", 10); cPara.push_back(tempminchunk); char* tempMinchunk = new 
char[pDataArray->minchunk.length()+1]; *tempMinchunk = '\0'; strncat(tempMinchunk, pDataArray->minchunk.c_str(), pDataArray->minchunk.length()); //strcpy(tempMinchunk, minchunk.c_str()); cPara.push_back(tempMinchunk); } if (pDataArray->useIdsmoothwindow) { char* tempidsmoothwindow = new char[17]; *tempidsmoothwindow = '\0'; strncat(tempidsmoothwindow, "--idsmoothwindow", 16); //strcpy(tempidsmoothwindow, "--idsmoothwindow"); cPara.push_back(tempidsmoothwindow); char* tempIdsmoothwindow = new char[pDataArray->idsmoothwindow.length()+1]; *tempIdsmoothwindow = '\0'; strncat(tempIdsmoothwindow, pDataArray->idsmoothwindow.c_str(), pDataArray->idsmoothwindow.length()); //strcpy(tempIdsmoothwindow, idsmoothwindow.c_str()); cPara.push_back(tempIdsmoothwindow); } if (pDataArray->useMaxp) { char* tempmaxp = new char[7]; //strcpy(tempmaxp, "--maxp"); *tempmaxp = '\0'; strncat(tempmaxp, "--maxp", 6); cPara.push_back(tempmaxp); char* tempMaxp = new char[pDataArray->maxp.length()+1]; *tempMaxp = '\0'; strncat(tempMaxp, pDataArray->maxp.c_str(), pDataArray->maxp.length()); //strcpy(tempMaxp, maxp.c_str()); cPara.push_back(tempMaxp); } if (!pDataArray->skipgaps) { char* tempskipgaps = new char[13]; //strcpy(tempskipgaps, "--[no]skipgaps"); *tempskipgaps = '\0'; strncat(tempskipgaps, "--noskipgaps", 12); cPara.push_back(tempskipgaps); } if (!pDataArray->skipgaps2) { char* tempskipgaps2 = new char[14]; //strcpy(tempskipgaps2, "--[no]skipgaps2"); *tempskipgaps2 = '\0'; strncat(tempskipgaps2, "--noskipgaps2", 13); cPara.push_back(tempskipgaps2); } if (pDataArray->useMinlen) { char* tempminlen = new char[9]; *tempminlen = '\0'; strncat(tempminlen, "--minlen", 8); //strcpy(tempminlen, "--minlen"); cPara.push_back(tempminlen); char* tempMinlen = new char[pDataArray->minlen.length()+1]; //strcpy(tempMinlen, minlen.c_str()); *tempMinlen = '\0'; strncat(tempMinlen, pDataArray->minlen.c_str(), pDataArray->minlen.length()); cPara.push_back(tempMinlen); } if (pDataArray->useMaxlen) { char* 
tempmaxlen = new char[9];
            //strcpy(tempmaxlen, "--maxlen");
            *tempmaxlen = '\0'; strncat(tempmaxlen, "--maxlen", 8);
            cPara.push_back(tempmaxlen);
            char* tempMaxlen = new char[pDataArray->maxlen.length()+1];
            *tempMaxlen = '\0'; strncat(tempMaxlen, pDataArray->maxlen.c_str(), pDataArray->maxlen.length());
            //strcpy(tempMaxlen, maxlen.c_str());
            cPara.push_back(tempMaxlen);
        }

        if (pDataArray->ucl) {
            //"--ucl" is 5 characters plus the terminating '\0', so the buffer needs 6 bytes
            char* tempucl = new char[6];
            strcpy(tempucl, "--ucl");
            cPara.push_back(tempucl);
        }

        if (pDataArray->useQueryfract) {
            char* tempqueryfract = new char[13];
            *tempqueryfract = '\0'; strncat(tempqueryfract, "--queryfract", 12);
            //strcpy(tempqueryfract, "--queryfract");
            cPara.push_back(tempqueryfract);
            char* tempQueryfract = new char[pDataArray->queryfract.length()+1];
            *tempQueryfract = '\0'; strncat(tempQueryfract, pDataArray->queryfract.c_str(), pDataArray->queryfract.length());
            //strcpy(tempQueryfract, queryfract.c_str());
            cPara.push_back(tempQueryfract);
        }

        char** uchimeParameters;
        uchimeParameters = new char*[cPara.size()];
        string commandString = "";
        for (int j = 0; j < cPara.size(); j++) { uchimeParameters[j] = cPara[j]; commandString += toString(cPara[j]) + " "; }
        //int numArgs = cPara.size();

        commandString = "\"" + commandString + "\"";

        //uchime_main(numArgs, uchimeParameters);
        //cout << "commandString = " << commandString << endl;
        if (pDataArray->m->debug) { pDataArray->m->mothurOut("[DEBUG]: uchime command = " + commandString + ".\n"); }
        system(commandString.c_str());

        //free memory (each argument was allocated with new char[], so it must be released with delete[])
        for(int j = 0; j < cPara.size(); j++) { delete[] cPara[j]; }
        delete[] uchimeParameters;

        //remove "" from filenames
        outputFName = outputFName.substr(1, outputFName.length()-2);
        filename = filename.substr(1, filename.length()-2);
        alns = alns.substr(1, alns.length()-2);

        if (pDataArray->m->control_pressed) { return 0; }

        //create accnos file from uchime results
        ifstream in;
        pDataArray->m->openInputFile(outputFName, in);

        ofstream out;
        pDataArray->m->openOutputFile(accnos, out);

        numChimeras = 0;
        while(!in.eof()) {

            if (pDataArray->m->control_pressed) { break; }

            string name = "";
            string chimeraFlag = "";
            in >> chimeraFlag >> name;

            for (int j = 0; j < 15; j++) { in >> chimeraFlag; }
            pDataArray->m->gobble(in);

            if (chimeraFlag == "Y") { out << name << endl; numChimeras++; }
            totalSeqs++;
        }
        in.close();
        out.close();

        if (fcount != totalSeqs) { pDataArray->m->mothurOut("[ERROR]: process " + toString(pDataArray->threadID) + " only processed " + toString(pDataArray->count) + " of " + toString(pDataArray->end) + " sequences assigned to it, quitting. \n"); pDataArray->m->control_pressed = true; }

        if (pDataArray->m->control_pressed) { return 0; }

        pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(totalSeqs) + " sequences."); pDataArray->m->mothurOutEndLine();

        pDataArray->count = totalSeqs;
        pDataArray->numChimeras = numChimeras;
        return totalSeqs;

    }
    catch(exception& e) {
        pDataArray->m->errorOut(e, "ChimeraUchimeCommand", "MyUchimeSeqsThreadFunction");
        exit(1);
    }
}

#endif

/**************************************************************************************************/


#endif

mothur-1.36.1/source/commands/chopseqscommand.cpp
/*
 *  chopseqscommand.cpp
 *  Mothur
 *
 *  Created by westcott on 5/10/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "chopseqscommand.h" #include "sequence.hpp" #include "removeseqscommand.h" //********************************************************************************************************************** vector ChopSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "none", "none","qfile",false,false,true); parameters.push_back(pqfile); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pnumbases("numbases", "Number", "", "0", "", "", "","",false,true,true); parameters.push_back(pnumbases); CommandParameter pcountgaps("countgaps", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pcountgaps); CommandParameter pshort("short", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pshort); CommandParameter pkeep("keep", "Multiple", "front-back", "front", "", "", "","",false,false); parameters.push_back(pkeep); CommandParameter pkeepn("keepn", "Boolean", "", "f", "", "", "","",false,false); parameters.push_back(pkeepn); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); 
        parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "ChopSeqsCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string ChopSeqsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The chop.seqs command reads a fasta file and outputs a .chop.fasta containing the trimmed sequences. Note: If a sequence is completely 'chopped', an accnos file will be created with the names of the sequences removed. \n";
        helpString += "The chop.seqs command parameters are fasta, name, group, count, numbases, countgaps and keep. fasta is required unless you have a valid current fasta file. numbases is required.\n";
        helpString += "The chop.seqs command should be in the following format: chop.seqs(fasta=yourFasta, numbases=yourNum, keep=yourKeep).\n";
        helpString += "If you provide a name, group or count file, any sequences removed from the fasta file will also be removed from those files.\n";
        helpString += "The qfile parameter allows you to provide a quality file associated with the fastafile.\n";
        helpString += "The numbases parameter allows you to specify the number of bases you want to keep.\n";
        helpString += "The keep parameter allows you to specify whether you want to keep the front or the back of your sequence, default=front.\n";
        helpString += "The countgaps parameter allows you to specify whether you want to count gaps as bases, default=false.\n";
        helpString += "The short parameter allows you to specify whether you want to keep sequences that are too short to chop, default=false.\n";
        helpString += "The keepn parameter allows you to specify whether you want to keep ambiguous bases, default=false.\n";
        helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1.
\n"; helpString += "For example, if you ran chop.seqs with numbases=200 and short=t, if a sequence had 100 bases mothur would keep the sequence rather than eliminate it.\n"; helpString += "Example chop.seqs(fasta=amazon.fasta, numbases=200, keep=front).\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ChopSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],chop.fasta"; } else if (type == "qfile") { pattern = "[filename],chop.qual"; } else if (type == "name") { pattern = "[filename],chop.names"; } else if (type == "group") { pattern = "[filename],chop.groups"; } else if (type == "count") { pattern = "[filename],chop.count_table"; } else if (type == "accnos") { pattern = "[filename],chop.accnos"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ChopSeqsCommand::ChopSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "ChopSeqsCommand"); exit(1); } } //********************************************************************************************************************** 
ChopSeqsCommand::ChopSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["qfile"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { //if there is a current fasta file, use it fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { qualfile = ""; abort = true; } else if (qualfile == "not found") { qualfile = ""; } else { m->setQualFile(qualfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { 
				m->setCountTableFile(countfile);
			}
			
			if ((namefile != "") && (countfile != "")) {
				m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true;
			}
			
			if ((groupfile != "") && (countfile != "")) {
				m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort = true;
			}
			
			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false);
			if (outputDir == "not found"){	outputDir = "";	}
			
			string temp = validParameter.validFile(parameters, "numbases", false);
			if (temp == "not found") { temp = "0"; }
			m->mothurConvert(temp, numbases);
			
			temp = validParameter.validFile(parameters, "processors", false);
			if (temp == "not found"){	temp = m->getProcessors();	}
			m->setProcessors(temp);
			m->mothurConvert(temp, processors);
			
			temp = validParameter.validFile(parameters, "countgaps", false);
			if (temp == "not found") { temp = "f"; }
			countGaps = m->isTrue(temp);
			
			temp = validParameter.validFile(parameters, "short", false);
			if (temp == "not found") { temp = "f"; }
			Short = m->isTrue(temp);
			
			//keepn defaults to true when a quality file is provided, false otherwise
			temp = validParameter.validFile(parameters, "keepn", false);
			if (temp == "not found") { if (qualfile != "") { temp = "t"; }else { temp = "f"; } }
			keepN = m->isTrue(temp);
			
			if (((!keepN) && (qualfile != "")) || ((countGaps) && (qualfile != ""))){
				m->mothurOut("[ERROR]: When a quality file is provided, you cannot set keepn=false or countgaps=true."); m->mothurOutEndLine(); abort = true;
			}
			
			keep = validParameter.validFile(parameters, "keep", false);
			if (keep == "not found") { keep = "front"; }
			
			if (numbases == 0) {
				m->mothurOut("You must provide the number of bases you want to keep for the chop.seqs command."); m->mothurOutEndLine(); abort = true;
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "ChopSeqsCommand", "ChopSeqsCommand");
		exit(1);
	}
}
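//**********************************************************************************************************************
// Illustrative sketch only; it is not part of the original mothur source and is not called by the command.
// It shows, under simplifying assumptions, the idea behind keep=front with countgaps=false as described in the
// help text: count alphabetic characters (bases) from the front of an aligned sequence, skip gap characters
// while counting, and cut once numbases bases have been seen. The namespace chopSketch and the function
// chopFront are hypothetical names introduced for this example. N handling (keepn), keep=back, the quality
// file positions and some edge cases of getChopped are deliberately left out.
#include <cctype>
#include <string>

namespace chopSketch {
	
	// Returns the front of 'aligned' that holds the first numBases bases.
	// As in getChopped, a sequence is only chopped when it has more than numBases bases;
	// otherwise it is returned unchanged if keepShort is true, or as "" (removed) if keepShort is false.
	inline std::string chopFront(const std::string& aligned, int numBases, bool keepShort) {
		if (numBases <= 0) { return std::string(); }  //the command itself rejects numbases=0
		
		//count the bases, ignoring gap characters such as '.' and '-'
		int totalBases = 0;
		for (std::string::size_type i = 0; i < aligned.length(); i++) {
			if (std::isalpha(static_cast<unsigned char>(aligned[i]))) { totalBases++; }
		}
		
		//sequence too short to chop
		if (totalBases <= numBases) { return keepShort ? aligned : std::string(); }
		
		//walk from the front until numBases bases have been counted
		int counted = 0;
		for (std::string::size_type i = 0; i < aligned.length(); i++) {
			if (std::isalpha(static_cast<unsigned char>(aligned[i]))) { counted++; }
			if (counted >= numBases) { return aligned.substr(0, i + 1); }
		}
		
		return std::string();  //not reached, since totalBases > numBases
	}
}
// Example use of the sketch: chopSketch::chopFront("AC-GT..ACGT", 4, false) keeps "AC-GT",
// because the gap at position 2 is not counted as a base.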
//********************************************************************************************************************** int ChopSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } map variables; string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); string outputFileName = getOutputFileName("fasta", variables); outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName); string outputFileNameAccnos = getOutputFileName("accnos", variables); string fastafileTemp = ""; if (qualfile != "") { fastafileTemp = outputFileName + ".qualFile.Positions.temp"; } vector positions; vector lines; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else int numSeqs = 0; positions = m->setFilePosFasta(fastafile, numSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } #endif bool wroteAccnos = false; if(processors == 1) { wroteAccnos = driver(lines[0], fastafile, outputFileName, outputFileNameAccnos, fastafileTemp); } else { wroteAccnos = createProcesses(lines, fastafile, outputFileName, outputFileNameAccnos, fastafileTemp); } if (m->control_pressed) { return 0; } if (qualfile != "") { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(qualfile); } variables["[filename]"] = thisOutputDir + 
m->getRootName(m->getSimpleName(qualfile)); string outputQualFileName = getOutputFileName("qfile", variables); outputNames.push_back(outputQualFileName); outputTypes["qfile"].push_back(outputQualFileName); processQual(outputQualFileName, fastafileTemp); m->mothurRemove(fastafileTemp); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (wroteAccnos) { outputNames.push_back(outputFileNameAccnos); outputTypes["accnos"].push_back(outputFileNameAccnos); //use remove.seqs to create new name, group and count file if ((countfile != "") || (namefile != "") || (groupfile != "")) { string inputString = "accnos=" + outputFileNameAccnos; if (countfile != "") { inputString += ", count=" + countfile; } else{ if (namefile != "") { inputString += ", name=" + namefile; } if (groupfile != "") { inputString += ", group=" + groupfile; } } m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: remove.seqs(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* removeCommand = new RemoveSeqsCommand(inputString); removeCommand->execute(); map > filenames = removeCommand->getOutputFiles(); delete removeCommand; m->mothurCalling = false; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); if (groupfile != "") { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); string outGroup = getOutputFileName("group", variables); m->renameFile(filenames["group"][0], outGroup); outputNames.push_back(outGroup); outputTypes["group"].push_back(outGroup); } if (namefile != "") { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); string outName = 
getOutputFileName("name", variables); m->renameFile(filenames["name"][0], outName); outputNames.push_back(outName); outputTypes["name"].push_back(outName); } if (countfile != "") { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); string outCount = getOutputFileName("count", variables); m->renameFile(filenames["count"][0], outCount); outputNames.push_back(outCount); outputTypes["count"].push_back(outCount); } } } else { m->mothurRemove(outputFileNameAccnos); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } if (wroteAccnos) { //set accnos file as new current accnosfile itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "execute"); 
		exit(1);
	}
}
/**************************************************************************************************/
bool ChopSeqsCommand::createProcesses(vector<linePair> lines, string filename, string outFasta, string outAccnos, string fastafileTemp) {
	try {
		int process = 1;
		bool wroteAccnos = false;
		vector<int> processIDS;
		vector<string> nonBlankAccnosFiles;
		bool recalc = false;
		
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
		
		//loop through and create all the processes you want
		while (process != processors) {
			pid_t pid = fork();
			
			if (pid > 0) {
				processIDS.push_back(pid);  //create map from line number to pid so you can append files in correct order later
				process++;
			}else if (pid == 0){
				string fastafileTempThisProcess = fastafileTemp;
				if (fastafileTempThisProcess != "") { fastafileTempThisProcess = fastafileTempThisProcess + m->mothurGetpid(process) + ".temp"; }
				wroteAccnos = driver(lines[process], filename, outFasta + m->mothurGetpid(process) + ".temp", outAccnos + m->mothurGetpid(process) + ".temp", fastafileTempThisProcess);
				
				//pass wroteAccnos to parent
				ofstream out;
				string tempFile = fastafile + m->mothurGetpid(process) + ".bool.temp";
				m->openOutputFile(tempFile, out);
				out << wroteAccnos << endl;
				out.close();
				
				exit(0);
			}else {
				m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n");
				processors = process;
				for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
				//wait to die
				for (int i=0;i<processIDS.size();i++) {
					int temp = processIDS[i];
					wait(&temp);
				}
				m->control_pressed = false;
				recalc = true;
				break;
			}
		}
		
		if (recalc) {
			//test line, also set recalc to true.
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); vector positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } processIDS.resize(0); process = 1; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ string fastafileTempThisProcess = fastafileTemp; if (fastafileTempThisProcess != "") { fastafileTempThisProcess = fastafileTempThisProcess + m->mothurGetpid(process) + ".temp"; } wroteAccnos = driver(lines[process], filename, outFasta + m->mothurGetpid(process) + ".temp", outAccnos + m->mothurGetpid(process) + ".temp", fastafileTempThisProcess); //pass numSeqs to parent ofstream out; string tempFile = fastafile + m->mothurGetpid(process) + ".bool.temp"; m->openOutputFile(tempFile, out); out << wroteAccnos << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do your part wroteAccnos = driver(lines[0], filename, outFasta, outAccnos, fastafileTemp); //force parent to wait until all the processes are done for (int i=0;imothurRemove(outAccnos); } //remove so other files can be renamed to it //parent reads in and combine Filter info for (int i = 0; i < processIDS.size(); i++) { string tempFilename = fastafile + toString(processIDS[i]) + ".bool.temp"; ifstream in; m->openInputFile(tempFilename, in); bool temp; in >> temp; m->gobble(in); if (temp) { wroteAccnos = temp; nonBlankAccnosFiles.push_back(outAccnos + 
toString(processIDS[i]) + ".temp"); } else { m->mothurRemove((outAccnos + toString(processIDS[i]) + ".temp")); } in.close(); m->mothurRemove(tempFilename); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the seqSumData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; imothurRemove(outAccnos); } //remove so other files can be renamed to it //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ if (pDataArray[i]->wroteAccnos) { wroteAccnos = pDataArray[i]->wroteAccnos; nonBlankAccnosFiles.push_back(outAccnos + toString(processIDS[i]) + ".temp"); } else { m->mothurRemove((outAccnos + toString(processIDS[i]) + ".temp")); } //check to make sure the process finished if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif for (int i = 0; i < processIDS.size(); i++) { if (fastafileTemp != "") { m->appendFiles((fastafileTemp + toString(processIDS[i]) + ".temp"), fastafileTemp); m->mothurRemove((fastafileTemp + toString(processIDS[i]) + ".temp")); } m->appendFiles((outFasta + toString(processIDS[i]) + ".temp"), outFasta); m->mothurRemove((outFasta + toString(processIDS[i]) + ".temp")); } if (nonBlankAccnosFiles.size() != 0) { m->renameFile(nonBlankAccnosFiles[0], outAccnos); for (int h=1; h < nonBlankAccnosFiles.size(); h++) { m->appendFiles(nonBlankAccnosFiles[h], outAccnos); m->mothurRemove(nonBlankAccnosFiles[h]); } }else { //recreate the accnosfile if needed ofstream out; m->openOutputFile(outAccnos, out); out.close(); } return wroteAccnos; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "createProcesses"); exit(1); } } /**************************************************************************************/ bool ChopSeqsCommand::driver(linePair filePos, string filename, string outFasta, string outAccnos, string fastaFileTemp) { try { ofstream out; m->openOutputFile(outFasta, out); ofstream outAcc; m->openOutputFile(outAccnos, outAcc); ofstream outfTemp; if (fastaFileTemp != "") { m->openOutputFile(fastaFileTemp, outfTemp); } ifstream in; m->openInputFile(filename, in); in.seekg(filePos.start); //adjust if (filePos.start == 0) { m->zapGremlins(in); m->gobble(in); } bool done = false; bool wroteAccnos = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); out.close(); return 1; } Sequence seq(in); m->gobble(in); if (m->control_pressed) { in.close(); out.close(); outAcc.close(); m->mothurRemove(outFasta); m->mothurRemove(outAccnos); if (fastaFileTemp != "") { outfTemp.close(); m->mothurRemove(fastaFileTemp); } return 0; } if (seq.getName() != "") { string qualValues = ""; string newSeqString = getChopped(seq, qualValues); //output trimmed sequence if 
(newSeqString != "") { out << ">" << seq.getName() << endl << newSeqString << endl; }else{ outAcc << seq.getName() << endl; wroteAccnos = true; } if (fastaFileTemp != "") { outfTemp << qualValues << endl; } count++; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (in.eof()) { break; } #endif //report progress if((count) % 10000 == 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } } //report progress if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } in.close(); out.close(); outAcc.close(); if (fastaFileTemp != "") { outfTemp.close(); } return wroteAccnos; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "driver"); exit(1); } } //********************************************************************************************************************** string ChopSeqsCommand::getChopped(Sequence seq, string& qualValues) { try { string temp = seq.getAligned(); string tempUnaligned = seq.getUnaligned(); if (countGaps) { //if needed trim sequence if (keep == "front") {//you want to keep the beginning int tempLength = temp.length(); if (tempLength > numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = 0; i < temp.length(); i++) { //eliminate N's if (!keepN) { if (toupper(temp[i]) == 'N') { temp[i] = '.'; } } numBasesCounted++; if (numBasesCounted >= numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(0, stopSpot+1); } }else { if (!Short) { temp = ""; } //sequence too short } }else { //you are keeping the back int tempLength = temp.length(); if (tempLength > numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = (temp.length()-1); i >= 0; i--) { //eliminate N's if (!keepN) { if (toupper(temp[i]) == 'N') { temp[i] = '.'; } } 
numBasesCounted++; if (numBasesCounted >= numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(stopSpot+1); } }else { if (!Short) { temp = ""; } //sequence too short } } }else{ //if needed trim sequence if (keep == "front") {//you want to keep the beginning int tempLength = tempUnaligned.length(); if (tempLength > numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = 0; i < temp.length(); i++) { //eliminate N's if (!keepN) { if (toupper(temp[i]) == 'N') { temp[i] = '.'; tempLength--; if (tempLength < numbases) { stopSpot = 0; break; } } } if(isalpha(temp[i])) { numBasesCounted++; } if (numBasesCounted >= numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(0, stopSpot+1); } qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(stopSpot+1) + '\n'; }else { if (!Short) { temp = ""; qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(0) + '\n'; } //sequence too short else { qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(tempLength) + '\n'; } } }else { //you are keeping the back int tempLength = tempUnaligned.length(); if (tempLength > numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = (temp.length()-1); i >= 0; i--) { if (!keepN) { //eliminate N's if (toupper(temp[i]) == 'N') { temp[i] = '.'; tempLength--; if (tempLength < numbases) { stopSpot = 0; break; } } } if(isalpha(temp[i])) { numBasesCounted++; } if (numBasesCounted >= numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(stopSpot); } qualValues = seq.getName() +'\t' + toString(stopSpot) + '\t' + toString(temp.length()-1) + '\n'; }else { if (!Short) { temp = ""; qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(0) + '\n'; } //sequence too short else { qualValues = seq.getName() +'\t' + toString(0) + '\t' + 
toString(tempLength) + '\n'; } } } } return temp; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "getChopped"); exit(1); } } //********************************************************************************************************************** int ChopSeqsCommand::processQual(string outputFile, string inputFile) { try { ofstream out; m->openOutputFile(outputFile, out); ifstream in; m->openInputFile(inputFile, in); ifstream inQual; m->openInputFile(qualfile, inQual); m->mothurOut("Processing the quality file.\n"); int count = 0; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); return 0; } QualityScores qual(inQual); string name = ""; int start = 0; int end = 0; in >> name >> start >> end; m->gobble(in); if (qual.getName() != "") { if (qual.getName() != name) { start = 0; end = 0; } else if (start != 0) { qual.trimQScores(start, -1); qual.printQScores(out); }else if ((start == 0) && (end == 0)) {} else if ((start == 0) && (end != 0)) { qual.trimQScores(-1, end); qual.printQScores(out); } } count++; //report progress if((count) % 10000 == 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } } //report progress if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } in.close(); out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ChopSeqsCommand", "processQual"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/chopseqscommand.h000066400000000000000000000266521255543666200216070ustar00rootroot00000000000000#ifndef CHOPSEQSCOMMAND_H #define CHOPSEQSCOMMAND_H /* * chopseqscommand.h * Mothur * * Created by westcott on 5/10/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "command.hpp"
#include "sequence.hpp"
#include "qualityscores.h"

class ChopSeqsCommand : public Command {
	
	public:
	
		ChopSeqsCommand(string);
		ChopSeqsCommand();
		~ChopSeqsCommand(){};
	
		vector<string> setParameters();
		string getCommandName()			{ return "chop.seqs";		}
		string getCommandCategory()		{ return "Sequence Processing";		}
		
		string getHelpString();
		string getOutputPattern(string);
		string getCitation() { return "http://www.mothur.org/wiki/Chop.seqs"; }
		string getDescription()		{ return "trim sequence length"; }
	
		int execute();
		void help() { m->mothurOut(getHelpString()); }
	
	private:
		string fastafile, outputDir, keep, namefile, groupfile, countfile, qualfile;
		bool abort, countGaps, Short, keepN;
		int numbases, processors;
		vector<string> outputNames;
		
		string getChopped(Sequence, string&);
		bool driver (linePair, string, string, string, string);
		bool createProcesses(vector<linePair>, string, string, string, string);
		int processQual(string, string);
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct chopData { string filename; string outFasta, outAccnos, keep, qualfile, fastaFileTemp; unsigned long long start; unsigned long long end; int numbases, count; bool countGaps, Short, wroteAccnos, keepN; MothurOut* m; string namefile; map nameMap; chopData(){} chopData(string f, string ff, string a, MothurOut* mout, unsigned long long st, unsigned long long en, string k, bool cGaps, int nbases, bool S, bool kn, string qu, string ft) { filename = f; outFasta = ff; outAccnos = a; m = mout; start = st; end = en; keep = k; countGaps = cGaps; numbases = nbases; Short = S; wroteAccnos = false; keepN = kn; qualfile = qu; fastaFileTemp = ft; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyChopThreadFunction(LPVOID lpParam){ chopData* pDataArray; pDataArray = (chopData*)lpParam; try { ofstream out; pDataArray->m->openOutputFile(pDataArray->outFasta, out); ofstream outAcc; pDataArray->m->openOutputFile(pDataArray->outAccnos, outAcc); ifstream in; pDataArray->m->openInputFile(pDataArray->filename, in); ofstream outfTemp; if (pDataArray->fastaFileTemp != "") { pDataArray->m->openOutputFile(pDataArray->fastaFileTemp, outfTemp); } if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->zapGremlins(in); }else { //this accounts for the difference in line endings. 
in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } bool done = false; bool wroteAccnos = false; pDataArray->count = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { in.close(); out.close(); outAcc.close(); pDataArray->m->mothurRemove(pDataArray->outFasta); pDataArray->m->mothurRemove(pDataArray->outAccnos); if (pDataArray->fastaFileTemp != "") { outfTemp.close(); pDataArray->m->mothurRemove(pDataArray->fastaFileTemp); } return 0; } Sequence seq(in); pDataArray->m->gobble(in); if (seq.getName() != "") { //string newSeqString = getChopped(seq); /////////////////////////////////////////////////////////////////////// string qualValues = ""; string temp = seq.getAligned(); string tempUnaligned = seq.getUnaligned(); if (pDataArray->countGaps) { //if needed trim sequence if (pDataArray->keep == "front") {//you want to keep the beginning int tempLength = temp.length(); if (tempLength > pDataArray->numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = 0; i < temp.length(); i++) { //eliminate N's if (!pDataArray->keepN) { if (toupper(temp[i]) == 'N') { temp[i] = '.'; } } numBasesCounted++; if (numBasesCounted >= pDataArray->numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(0, stopSpot+1); } }else { if (!pDataArray->Short) { temp = ""; } //sequence too short } }else { //you are keeping the back int tempLength = temp.length(); if (tempLength > pDataArray->numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = (temp.length()-1); i >= 0; i--) { //eliminate N's if (!pDataArray->keepN) { if (toupper(temp[i]) == 'N') { temp[i] = '.'; } } numBasesCounted++; if (numBasesCounted >= pDataArray->numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(stopSpot+1); } }else { if (!pDataArray->Short) { temp = ""; } 
//sequence too short } } }else{ //if needed trim sequence if (pDataArray->keep == "front") {//you want to keep the beginning int tempLength = tempUnaligned.length(); if (tempLength > pDataArray->numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = 0; i < temp.length(); i++) { if (!pDataArray->keepN) { //eliminate N's if (toupper(temp[i]) == 'N') { temp[i] = '.'; tempLength--; if (tempLength < pDataArray->numbases) { stopSpot = 0; break; } } } if(isalpha(temp[i])) { numBasesCounted++; } if (numBasesCounted >= pDataArray->numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(0, stopSpot+1); } qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(stopSpot+1) + '\n'; }else { if (!pDataArray->Short) { temp = ""; qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(0) + '\n'; } //sequence too short else { qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(tempLength) + '\n'; } } }else { //you are keeping the back int tempLength = tempUnaligned.length(); if (tempLength > pDataArray->numbases) { //you have enough bases to remove some int stopSpot = 0; int numBasesCounted = 0; for (int i = (temp.length()-1); i >= 0; i--) { if (!pDataArray->keepN) { //eliminate N's if (toupper(temp[i]) == 'N') { temp[i] = '.'; tempLength--; if (tempLength < pDataArray->numbases) { stopSpot = 0; break; } } } if(isalpha(temp[i])) { numBasesCounted++; } if (numBasesCounted >= pDataArray->numbases) { stopSpot = i; break; } } if (stopSpot == 0) { temp = ""; } else { temp = temp.substr(stopSpot); } qualValues = seq.getName() +'\t' + toString(stopSpot) + '\t' + toString(temp.length()-1) + '\n'; }else { if (!pDataArray->Short) { temp = ""; qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(0) + '\n'; } //sequence too short else { qualValues = seq.getName() +'\t' + toString(0) + '\t' + toString(tempLength) + '\n'; } } } } string newSeqString = temp; 
/////////////////////////////////////////////////////////////////////// //output trimmed sequence if (newSeqString != "") { out << ">" << seq.getName() << endl << newSeqString << endl; }else{ outAcc << seq.getName() << endl; pDataArray->wroteAccnos = true; } if (pDataArray->fastaFileTemp != "") { outfTemp << qualValues << endl; } pDataArray->count++; } //report progress if((pDataArray->count) % 1000 == 0){ pDataArray->m->mothurOut(toString(pDataArray->count)); pDataArray->m->mothurOutEndLine(); } } //report progress if((pDataArray->count) % 1000 != 0){ pDataArray->m->mothurOut(toString(pDataArray->count)); pDataArray->m->mothurOutEndLine(); } in.close(); out.close(); outAcc.close(); if (pDataArray->fastaFileTemp != "") { outfTemp.close(); } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ChopsSeqsCommand", "MyChopThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/classifyotucommand.cpp000066400000000000000000001114201255543666200226460ustar00rootroot00000000000000/* * classifyotucommand.cpp * Mothur * * Created by westcott on 6/1/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "classifyotucommand.h" #include "phylotree.h" #include "phylosummary.h" #include "sharedutilities.h" //********************************************************************************************************************** vector ClassifyOtuCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "none","constaxonomy",false,true,true); parameters.push_back(ptaxonomy); CommandParameter preftaxonomy("reftaxonomy", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(preftaxonomy); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter ppersample("persample", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ppersample); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pbasis("basis", "Multiple", "otu-sequence", "otu", "", "", "","",false,false); parameters.push_back(pbasis); CommandParameter pcutoff("cutoff", "Number", "", "51", "", "", "","",false,true); parameters.push_back(pcutoff); CommandParameter pthreshold("threshold", "Number", "", "0", "", "", "","",false,true); parameters.push_back(pthreshold); CommandParameter pprobs("probs", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pprobs); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", 
"","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClassifyOtuCommand::getHelpString(){ try { string helpString = ""; helpString += "The classify.otu command parameters are list, taxonomy, reftaxonomy, name, group, count, persample, cutoff, label, basis and probs. The taxonomy and list parameters are required unless you have a valid current file.\n"; helpString += "The reftaxonomy parameter allows you give the name of the reference taxonomy file used when you classified your sequences. Providing it will keep the rankIDs in the summary file static.\n"; helpString += "The name parameter allows you add a names file with your taxonomy file.\n"; helpString += "The group parameter allows you provide a group file to use in creating the summary file breakdown.\n"; helpString += "The count parameter allows you add a count file associated with your list file. When using the count parameter mothur assumes your list file contains only uniques.\n"; helpString += "The basis parameter allows you indicate what you want the summary file to represent, options are otu and sequence. 
Default is otu.\n"; helpString += "For example, basis=sequence could give Clostridiales 3 105 16 43 46, where 105 is the total number of sequences whose otu classified to Clostridiales.\n"; helpString += "16 is the number of sequences in the otus from groupA, 43 is the number of sequences in the otus from groupB, and 46 is the number of sequences in the otus from groupC.\n"; helpString += "In contrast, basis=otu could give Clostridiales 3 7 6 1 2, where 7 is the number of otus that classified to Clostridiales.\n"; helpString += "6 is the number of otus containing sequences from groupA, 1 is the number of otus containing sequences from groupB, and 2 is the number of otus containing sequences from groupC.\n"; helpString += "The label parameter allows you to select the distance levels you would like output files created for; multiple labels are separated by dashes.\n"; helpString += "The persample parameter allows you to find a consensus taxonomy for each group. Default=f\n"; helpString += "The default value for label is all labels in your inputfile.\n"; helpString += "The cutoff parameter allows you to specify a consensus confidence threshold for your otu taxonomy output. The default is 51, meaning 51%. Cutoff cannot be below 51.\n"; helpString += "The probs parameter allows you to shut off the output of the consensus confidence results. The default is true, meaning you want the confidence to be shown.\n"; helpString += "The threshold parameter allows you to specify a cutoff for the taxonomy file that is used as input. Once the classification falls below the threshold, mothur will refer to it as unclassified when calculating the consensus. This feature is similar to adjusting the cutoff in classify.seqs. 
Default=0.\n"; helpString += "The classify.otu command should be in the following format: classify.otu(taxonomy=yourTaxonomyFile, list=yourListFile, name=yourNamesFile, label=yourLabels).\n"; helpString += "Example classify.otu(taxonomy=abrecovery.silva.full.taxonomy, list=abrecovery.fn.list, label=0.10).\n"; helpString += "Note: No spaces between parameter labels (i.e. list), '=' and parameters (i.e.yourListFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClassifyOtuCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "constaxonomy") { pattern = "[filename],[distance],cons.taxonomy"; } else if (type == "taxsummary") { pattern = "[filename],[distance],cons.tax.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClassifyOtuCommand::ClassifyOtuCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["taxsummary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "ClassifyOtuCommand"); exit(1); } } //********************************************************************************************************************** ClassifyOtuCommand::ClassifyOtuCommand(string option) { try{ abort = false; calledHelp = false; allLines = 1; labels.clear(); //allow user to run help if (option == "help") { help(); abort = true; calledHelp = true; }else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = 
setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["taxsummary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("reftaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["reftaxonomy"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. 
else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not found") { //if there is a current list file, use it listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current listfile and the list parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (listfile == "not open") { abort = true; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not found") { //if there is a current list file, use it taxfile = m->getTaxonomyFile(); if (taxfile != "") { m->mothurOut("Using " + taxfile + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current taxonomy file and the taxonomy parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (taxfile == "not open") { abort = true; } else { m->setTaxonomyFile(taxfile); } refTaxonomy = validParameter.validFile(parameters, "reftaxonomy", true); if (refTaxonomy == "not found") { refTaxonomy = ""; m->mothurOut("reftaxonomy is not required, but if given will keep the rankIDs in the summary file static."); m->mothurOutEndLine(); } else if (refTaxonomy == "not open") { abort = true; } namefile = 
validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; allLines = 1; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } basis = validParameter.validFile(parameters, "basis", false); if (basis == "not found") { basis = "otu"; } if ((basis != "otu") && (basis != "sequence")) { m->mothurOut("Invalid option for basis. 
basis options are otu and sequence, using otu."); m->mothurOutEndLine(); } string temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "51"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "threshold", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, threshold); temp = validParameter.validFile(parameters, "probs", false); if (temp == "not found"){ temp = "true"; } probs = m->isTrue(temp); temp = validParameter.validFile(parameters, "persample", false); if (temp == "not found"){ temp = "f"; } persample = m->isTrue(temp); if ((groupfile == "") && (countfile == "")) { if (persample) { m->mothurOut("persample is only valid with a group file, or count file with group information. Setting persample=f.\n"); persample = false; } } if (countfile != "") { CountTable cts; if (!cts.testGroups(countfile)) { if (persample) { m->mothurOut("persample is only valid with a group file, or count file with group information. 
Setting persample=f.\n"); persample = false; } } } if ((cutoff < 51) || (cutoff > 100)) { m->mothurOut("cutoff must be above 50, and no greater than 100."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if (namefile == ""){ vector files; files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "ClassifyOtuCommand"); exit(1); } } //********************************************************************************************************************** int ClassifyOtuCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //if user gave a namesfile then use it if (namefile != "") { m->readNames(namefile, nameMap, true); } if (groupfile != "") { groupMap = new GroupMap(groupfile); groupMap->readMap(); groups = groupMap->getNamesOfGroups(); } else { groupMap = NULL; } if (countfile != "") { ct = new CountTable(); ct->readTable(countfile, true, false); if (ct->hasGroupInfo()) { groups = ct->getNamesOfGroups(); } } else { ct = NULL; } //read taxonomy file and save in map for easy access in building bin trees bool removeConfidences = false; if (threshold == 0) { removeConfidences = true; } m->readTax(taxfile, taxMap, removeConfidences); if (threshold != 0) { processTaxMap(); } if (m->control_pressed) { return 0; } input = new InputData(listfile, "list"); list = input->getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } delete input; delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel() + "\t" + toString(list->size())); m->mothurOutEndLine(); process(list); if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } delete input; delete list; return 0; } processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); m->mothurOut(list->getLabel() + "\t" + toString(list->size())); m->mothurOutEndLine(); process(list); if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; delete list; return 0; } processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input->getListVector(); } //output error messages about any remaining user labels bool needToRun = false; for (set::iterator it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + (*it)); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input->getListVector(lastLabel); m->mothurOut(list->getLabel() + "\t" + toString(list->size())); m->mothurOutEndLine(); process(list); delete list; if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; return 0; } } delete input; if (groupMap != NULL) { delete groupMap; } if (ct != NULL) { delete ct; } if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "execute"); exit(1); } } //********************************************************************************************************************** vector<string> ClassifyOtuCommand::findConsensusTaxonomy(vector<string> names, int& size, string& conTax) { try{ conTax = ""; vector<string> allNames; map<string, string>::iterator it; map<string, string>::iterator it2; //create a tree containing sequences from this bin PhyloTree* phylo = new PhyloTree(); size = 0; for (int i = 0; i < names.size(); i++) { //if namesfile include the names if (namefile != "") { //is this sequence in the name file - namemap maps seqName -> repSeqName it2 = nameMap.find(names[i]); if (it2 == nameMap.end()) { //this name is not in name file, skip it m->mothurOut(names[i] + " is not in your name file. 
I will not include it in the consensus."); m->mothurOutEndLine(); }else{ //is this sequence in the taxonomy file - look for repSeqName since we are assuming the taxonomy file is unique it = taxMap.find(it2->second); if (it == taxMap.end()) { //this name is not in taxonomy file, skip it if (names[i] != it2->second) { m->mothurOut(names[i] + " is represented by " + it2->second + " and is not in your taxonomy file. I will not include it in the consensus."); m->mothurOutEndLine(); } else { m->mothurOut(names[i] + " is not in your taxonomy file. I will not include it in the consensus."); m->mothurOutEndLine(); } }else{ //add seq to tree phylo->addSeqToTree(names[i], it->second); size++; allNames.push_back(names[i]); } } }else{ //is this sequence in the taxonomy file - look for repSeqName since we are assuming the taxonomy file is unique it = taxMap.find(names[i]); if (it == taxMap.end()) { //this name is not in taxonomy file, skip it m->mothurOut(names[i] + " is not in your taxonomy file. I will not include it in the consensus."); m->mothurOutEndLine(); }else{ if (countfile != "") { int numDups = ct->getNumSeqs(names[i]); for (int j = 0; j < numDups; j++) { phylo->addSeqToTree(names[i], it->second); } size += numDups; }else{ //add seq to tree phylo->addSeqToTree(names[i], it->second); size++; } allNames.push_back(names[i]); } } if (m->control_pressed) { delete phylo; return allNames; } } //build tree phylo->assignHeirarchyIDs(0); TaxNode currentNode = phylo->get(0); int myLevel = 0; //at each level while (currentNode.children.size() != 0) { //you still have more to explore TaxNode bestChild; int bestChildSize = 0; //go through children for (map::iterator itChild = currentNode.children.begin(); itChild != currentNode.children.end(); itChild++) { TaxNode temp = phylo->get(itChild->second); //select child with largest accesions - most seqs assigned to it if (temp.accessions.size() > bestChildSize) { bestChild = phylo->get(itChild->second); bestChildSize = 
temp.accessions.size(); } } //phylotree adds an extra unknown so we want to remove that if (bestChild.name == "unknown") { bestChildSize--; } //is this taxonomy above cutoff int consensusConfidence = ceil((bestChildSize / (float) size) * 100); if (consensusConfidence >= cutoff) { //if yes, add it if (probs) { conTax += bestChild.name + "(" + toString(consensusConfidence) + ");"; }else{ conTax += bestChild.name + ";"; } myLevel++; }else{ //if no, quit break; } //move down a level currentNode = bestChild; } if (myLevel != phylo->getMaxLevel()) { while (myLevel != phylo->getMaxLevel()) { if (probs) { conTax += "unclassified(100);"; }else{ conTax += "unclassified;"; } myLevel++; } } if (conTax == "") { conTax = "no_consensus;"; } delete phylo; return allNames; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "findConsensusTaxonomy"); exit(1); } } //********************************************************************************************************************** int ClassifyOtuCommand::process(ListVector* processList) { try{ string conTax; int size; //create output file if (outputDir == "") { outputDir += m->hasPath(listfile); } ofstream out; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[distance]"] = processList->getLabel(); string outputFile = getOutputFileName("constaxonomy", variables); m->openOutputFile(outputFile, out); outputNames.push_back(outputFile); outputTypes["constaxonomy"].push_back(outputFile); ofstream outSum; string outputSumFile = getOutputFileName("taxsummary", variables); m->openOutputFile(outputSumFile, outSum); outputNames.push_back(outputSumFile); outputTypes["taxsummary"].push_back(outputSumFile); out << "OTU\tSize\tTaxonomy" << endl; PhyloSummary* taxaSum; if (countfile != "") { if (refTaxonomy != "") { taxaSum = new PhyloSummary(refTaxonomy, ct,false); } else { taxaSum = new PhyloSummary(ct,false); } }else { if (refTaxonomy != "") { taxaSum = new 
PhyloSummary(refTaxonomy, groupMap,false); } else { taxaSum = new PhyloSummary(groupMap,false); } } vector outSums; vector outs; vector taxaSums; map groupIndex; if (persample) { for (int i = 0; i < groups.size(); i++) { groupIndex[groups[i]] = i; ofstream* temp = new ofstream(); variables["[distance]"] = processList->getLabel() + "." + groups[i]; string outputFile = getOutputFileName("constaxonomy", variables); m->openOutputFile(outputFile, *temp); (*temp) << "OTU\tSize\tTaxonomy" << endl; outs.push_back(temp); outputNames.push_back(outputFile); outputTypes["constaxonomy"].push_back(outputFile); ofstream* tempSum = new ofstream(); string outputSumFile = getOutputFileName("taxsummary", variables); m->openOutputFile(outputSumFile, *tempSum); outSums.push_back(tempSum); outputNames.push_back(outputSumFile); outputTypes["taxsummary"].push_back(outputSumFile); PhyloSummary* taxaSumt; if (countfile != "") { if (refTaxonomy != "") { taxaSumt = new PhyloSummary(refTaxonomy, ct, false); } else { taxaSumt = new PhyloSummary(ct, false); } }else { if (refTaxonomy != "") { taxaSumt = new PhyloSummary(refTaxonomy, groupMap,false); } else { taxaSumt = new PhyloSummary(groupMap,false); } } taxaSums.push_back(taxaSumt); } } //for each bin in the list vector string snumBins = toString(processList->getNumBins()); vector binLabels = processList->getLabels(); for (int i = 0; i < processList->getNumBins(); i++) { if (m->control_pressed) { break; } vector names; string binnames = processList->get(i); vector thisNames; m->splitAtComma(binnames, thisNames); names = findConsensusTaxonomy(thisNames, size, conTax); if (m->control_pressed) { break; } out << binLabels[i] << '\t' << size << '\t' << conTax << endl; string noConfidenceConTax = conTax; m->removeConfidences(noConfidenceConTax); //add this bins taxonomy to summary if (basis == "sequence") { for(int j = 0; j < names.size(); j++) { //int numReps = 1; //if (countfile != "") { numReps = ct->getNumSeqs(names[j]); } //for(int k = 0; k < 
numReps; k++) { taxaSum->addSeqToTree(names[j], noConfidenceConTax); } taxaSum->addSeqToTree(names[j], noConfidenceConTax); } }else { //otu map containsGroup; if (countfile != "") { if (ct->hasGroupInfo()) { vector mGroups = ct->getNamesOfGroups(); for (int k = 0; k < names.size(); k++) { vector counts = ct->getGroupCounts(names[k]); for (int h = 0; h < counts.size(); h++) { if (counts[h] != 0) { containsGroup[mGroups[h]] = true; } } } } }else { if (groupfile != "") { vector mGroups = groupMap->getNamesOfGroups(); for (int j = 0; j < mGroups.size(); j++) { containsGroup[mGroups[j]] = false; } for (int k = 0; k < names.size(); k++) { //find out the sequences group string group = groupMap->getGroup(names[k]); if (group == "not found") { m->mothurOut("[WARNING]: " + names[k] + " is not in your groupfile, and will be included in the overall total, but not any group total."); m->mothurOutEndLine(); } else { containsGroup[group] = true; } } } } taxaSum->addSeqToTree(noConfidenceConTax, containsGroup); } if (persample) { //divide names by group map > parsedNames; map >::iterator itParsed; //parse names by group for (int j = 0; j < names.size(); j++) { if (groupfile != "") { string group = groupMap->getGroup(names[j]); itParsed = parsedNames.find(group); if (itParsed != parsedNames.end()) { itParsed->second.push_back(names[j]); } else { vector tempNames; tempNames.push_back(names[j]); parsedNames[group] = tempNames; } }else { //count file was used vector thisSeqsGroups = ct->getGroups(names[j]); for (int k = 0; k < thisSeqsGroups.size(); k++) { string group = thisSeqsGroups[k]; itParsed = parsedNames.find(group); if (itParsed != parsedNames.end()) { itParsed->second.push_back(names[j]); } else { vector tempNames; tempNames.push_back(names[j]); parsedNames[group] = tempNames; } } } } for (itParsed = parsedNames.begin(); itParsed != parsedNames.end(); itParsed++) { vector theseNames = findConsensusTaxonomy(itParsed->second, size, conTax); if (m->control_pressed) { break; } 
(*outs[groupIndex[itParsed->first]]) << binLabels[i] << '\t' << size << '\t' << conTax << endl; string noConfidenceConTax = conTax; m->removeConfidences(noConfidenceConTax); //add this bins taxonomy to summary if (basis == "sequence") { for(int j = 0; j < theseNames.size(); j++) { int numReps = 1; if (countfile != "") { numReps = ct->getGroupCount(theseNames[j], itParsed->first); } //get num seqs for this seq from this group for(int k = 0; k < numReps; k++) { (taxaSums[groupIndex[itParsed->first]])->addSeqToTree(theseNames[j], noConfidenceConTax); } } }else { //otu map containsGroup; containsGroup[itParsed->first] = true; (taxaSums[groupIndex[itParsed->first]])->addSeqToTree(noConfidenceConTax, containsGroup); } } } } out.close(); //print summary file taxaSum->print(outSum); outSum.close(); if (persample) { for (int i = 0; i < groups.size(); i++) { (*outs[i]).close(); taxaSums[i]->print(*outSums[i]); (*outSums[i]).close(); delete outs[i]; delete outSums[i]; delete taxaSums[i]; } } delete taxaSum; return 0; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "process"); exit(1); } } /**************************************************************************************************/ string ClassifyOtuCommand::addUnclassifieds(string tax, int maxlevel) { try{ string newTax, taxon; int level = 0; //keep what you have counting the levels while (tax.find_first_of(';') != -1) { //get taxon taxon = tax.substr(0,tax.find_first_of(';'))+';'; tax = tax.substr(tax.find_first_of(';')+1, tax.length()); newTax += taxon; level++; } //add "unclassified" until you reach maxLevel while (level < maxlevel) { newTax += "unclassified;"; level++; } return newTax; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "addUnclassifieds"); exit(1); } } /**************************************************************************************************/ int ClassifyOtuCommand::processTaxMap() { try{ for (map::iterator it = taxMap.begin(); it != taxMap.end(); it++) { if 
(m->control_pressed) { break; } vector<string> taxons; string tax = it->second; int taxLength = tax.length(); string taxon = ""; int spot = 0; for(int i = 0; i < taxLength; i++){ if(tax[i] == ';'){ int openParen = taxon.find_last_of('('); int closeParen = taxon.find_last_of(')'); string newtaxon, confidence; if ((openParen != string::npos) && (closeParen != string::npos)) { string confidenceScore = taxon.substr(openParen+1, (closeParen-(openParen+1))); if (m->isNumeric1(confidenceScore)) { //its a confidence newtaxon = taxon.substr(0, openParen); //rip off confidence confidence = taxon.substr((openParen+1), (closeParen-openParen-1)); }else { //its part of the taxon newtaxon = taxon; confidence = "0"; } }else{ newtaxon = taxon; confidence = "-1"; } float con = 0; convert(confidence, con); if (con == -1) { i += taxLength; } //not a confidence score, no confidence scores on this taxonomy else if ( con < threshold) { spot = i; break; } //below threshold, set all to unclassified else {} //acceptable, move on taxons.push_back(taxon); taxon = ""; } else{ taxon += tax[i]; } } if (spot != 0) { string newTax = ""; for (int i = 0; i < taxons.size(); i++) { newTax += taxons[i] + ";"; } for (int i = spot; i < taxLength; i++) { if(tax[i] == ';'){ newTax += "unclassified;"; } m->removeConfidences(newTax); it->second = newTax; } }else { m->removeConfidences(tax); it->second = tax; } //leave tax alone } return 0; } catch(exception& e) { m->errorOut(e, "ClassifyOtuCommand", "processTaxMap"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/classifyotucommand.h000066400000000000000000000032321255543666200223140ustar00rootroot00000000000000#ifndef CLASSIFYOTUSCOMMAND_H #define CLASSIFYOTUSCOMMAND_H /* * classifyotucommand.h * Mothur * * Created by westcott on 6/1/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "listvector.hpp" #include "inputdata.h" #include "counttable.h" class ClassifyOtuCommand : public Command { public: ClassifyOtuCommand(string); ClassifyOtuCommand(); ~ClassifyOtuCommand() {} vector<string> setParameters(); string getCommandName() { return "classify.otu"; } string getCommandCategory() { return "Phylotype Analysis"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Westcott SL (2011). Assessing and improving methods used in OTU-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 77:3219.\nhttp://www.mothur.org/wiki/Classify.otu"; } string getDescription() { return "find the consensus taxonomy for each OTU"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: GroupMap* groupMap; CountTable* ct; ListVector* list; InputData* input; string listfile, namefile, taxfile, label, outputDir, refTaxonomy, groupfile, basis, countfile; bool abort, allLines, probs, persample; int cutoff, threshold; set<string> labels; //holds labels to be used vector<string> outputNames, groups; map<string, string> nameMap; map<string, string> taxMap; int process(ListVector*); int processTaxMap(); string addUnclassifieds(string, int); vector<string> findConsensusTaxonomy(vector<string>, int&, string&); // returns the names of the sequences used for the given bin; the bin size and consensus taxonomy are returned through the reference parameters }; #endif mothur-1.36.1/source/commands/classifyrfsharedcommand.cpp000077500000000000000000000450371255543666200236520ustar00rootroot00000000000000// // classifysharedcommand.cpp // Mothur // // Created by Abu Zaher Md. Faridee on 8/13/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "classifyrfsharedcommand.h" #include "randomforest.hpp" #include "decisiontree.hpp" #include "rftreenode.hpp" //********************************************************************************************************************** vector ClassifyRFSharedCommand::setParameters(){ try { //CommandParameter pprocessors("processors", "Number", "", "1", "", "", "",false,false); parameters.push_back(pprocessors); CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pshared); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pdesign); CommandParameter potupersplit("otupersplit", "Multiple", "log2-squareroot", "log2", "", "", "","",false,false); parameters.push_back(potupersplit); CommandParameter psplitcriteria("splitcriteria", "Multiple", "gainratio-infogain", "gainratio", "", "", "","",false,false); parameters.push_back(psplitcriteria); CommandParameter pnumtrees("numtrees", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pnumtrees); // parameters related to pruning CommandParameter pdopruning("prune", "Boolean", "", "T", "", "", "", "", false, false); parameters.push_back(pdopruning); CommandParameter ppruneaggrns("pruneaggressiveness", "Number", "", "0.9", "", "", "", "", false, false); parameters.push_back(ppruneaggrns); CommandParameter pdiscardhetrees("discarderrortrees", "Boolean", "", "T", "", "", "", "", false, false); parameters.push_back(pdiscardhetrees); CommandParameter phetdiscardthreshold("errorthreshold", "Number", "", "0.4", "", "", "", "", false, false); parameters.push_back(phetdiscardthreshold); CommandParameter psdthreshold("stdthreshold", "Number", "", "0.0", "", "", "", "", false, false); parameters.push_back(psdthreshold); // pruning params end CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); 
CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClassifySharedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClassifyRFSharedCommand::getHelpString(){ try { string helpString = ""; helpString += "The classify.rf command allows you to ....\n"; helpString += "The classify.rf command parameters are: shared, design, label, groups, otupersplit.\n"; helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The groups parameter allows you to specify which of the groups in your designfile you would like analyzed.\n"; helpString += "The classify.rf should be in the following format: \n"; helpString += "classify.rf(shared=yourSharedFile, design=yourDesignFile)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClassifySharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClassifyRFSharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],[distance],summary"; } //makes file like: amazon.0.03.fasta else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { 
m->errorOut(e, "ClassifySharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClassifyRFSharedCommand::ClassifyRFSharedCommand() { try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClassifySharedCommand", "ClassifySharedCommand"); exit(1); } } //********************************************************************************************************************** ClassifyRFSharedCommand::ClassifyRFSharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a shared file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a design file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } } //check for parameters //get shared file, it is required sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //get design file, it is required designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { designfile = ""; abort = true; } else if (designfile == "not found") { //if there is a current design file, use it designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current designfile and the design parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setDesignFile(designfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } // NEW CODE for OTU per split selection criteria string temp = validParameter.validFile(parameters, "splitcriteria", false); if (temp == "not found") { temp = 
"gainratio"; } if ((temp == "gainratio") || (temp == "infogain")) { treeSplitCriterion = temp; } else { m->mothurOut("Not a valid tree splitting criterion. Valid tree splitting criteria are 'gainratio' and 'infogain'."); m->mothurOutEndLine(); abort = true; } temp = validParameter.validFile(parameters, "numtrees", false); if (temp == "not found"){ temp = "100"; } m->mothurConvert(temp, numDecisionTrees); // parameters for pruning temp = validParameter.validFile(parameters, "prune", false); if (temp == "not found") { temp = "f"; } doPruning = m->isTrue(temp); temp = validParameter.validFile(parameters, "pruneaggressiveness", false); if (temp == "not found") { temp = "0.9"; } m->mothurConvert(temp, pruneAggressiveness); temp = validParameter.validFile(parameters, "discarderrortrees", false); if (temp == "not found") { temp = "f"; } discardHighErrorTrees = m->isTrue(temp); temp = validParameter.validFile(parameters, "errorthreshold", false); if (temp == "not found") { temp = "0.4"; } m->mothurConvert(temp, highErrorTreeDiscardThreshold); temp = validParameter.validFile(parameters, "otupersplit", false); if (temp == "not found") { temp = "log2"; } if ((temp == "squareroot") || (temp == "log2")) { optimumFeatureSubsetSelectionCriteria = temp; } else { m->mothurOut("Not a valid OTU per split selection method. Valid OTU per split selection methods are 'log2' and 'squareroot'."); m->mothurOutEndLine(); abort = true; } temp = validParameter.validFile(parameters, "stdthreshold", false); if (temp == "not found") { temp = "0.0"; } m->mothurConvert(temp, featureStandardDeviationThreshold); // end of pruning params //Groups must be checked later to make sure they are valid. SharedUtilities has functions to check the validity; just make sure to call m->setGroups() after the checks. If you are using these with a shared file there is no need to check, because the SharedRAbundVector class will call SharedUtilities for you.
string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); //Commonly used to process list, rabund, sabund, shared and relabund files. Look at "smart distancing" examples below in the execute function. string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } } } catch(exception& e) { m->errorOut(e, "ClassifySharedCommand", "ClassifySharedCommand"); exit(1); } } //********************************************************************************************************************** int ClassifyRFSharedCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(sharedfile, "sharedfile"); vector<SharedRAbundVector*> lookup = input.getSharedRAbundVectors(); //read design file designMap.read(designfile); string lastLabel = lookup[0]->getLabel(); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processSharedAndDesignData(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processSharedAndDesignData(lookup); 
processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processSharedAndDesignData(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ClassifySharedCommand", "execute"); exit(1); } } //********************************************************************************************************************** void ClassifyRFSharedCommand::processSharedAndDesignData(vector lookup){ try { // for (int i = 0; i < designMap->getNamesOfGroups().size(); i++) { // string groupName = designMap->getNamesOfGroups()[i]; // cout << groupName << endl; // } // for (int i = 0; i < designMap->getNumSeqs(); i++) { // string sharedGroupName = 
designMap->getNamesSeqs()[i]; // string treatmentName = designMap->getGroup(sharedGroupName); // cout << sharedGroupName << " : " << treatmentName << endl; // } map treatmentToIntMap; map intToTreatmentMap; vector groups = designMap.getCategory(); for (int i = 0; i < groups.size(); i++) { string treatmentName = groups[i]; treatmentToIntMap[treatmentName] = i; intToTreatmentMap[i] = treatmentName; } int numSamples = lookup.size(); int numFeatures = lookup[0]->getNumBins(); int numRows = numSamples; int numColumns = numFeatures + 1; // extra one space needed for the treatment/outcome vector< vector > dataSet(numRows, vector(numColumns, 0)); vector names; for (int i = 0; i < lookup.size(); i++) { string sharedGroupName = lookup[i]->getGroup(); names.push_back(sharedGroupName); string treatmentName = designMap.get(sharedGroupName); int j = 0; for (; j < lookup[i]->getNumBins(); j++) { int otuCount = lookup[i]->getAbundance(j); dataSet[i][j] = otuCount; } dataSet[i][j] = treatmentToIntMap[treatmentName]; } RandomForest randomForest(dataSet, numDecisionTrees, treeSplitCriterion, doPruning, pruneAggressiveness, discardHighErrorTrees, highErrorTreeDiscardThreshold, optimumFeatureSubsetSelectionCriteria, featureStandardDeviationThreshold); randomForest.populateDecisionTrees(); randomForest.calcForrestErrorRate(); randomForest.printConfusionMatrix(intToTreatmentMap); map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "RF."; variables["[distance]"] = lookup[0]->getLabel(); string filename = getOutputFileName("summary", variables); outputNames.push_back(filename); outputTypes["summary"].push_back(filename); randomForest.calcForrestVariableImportance(filename); // map variable; variable["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "misclassifications."; variable["[distance]"] = lookup[0]->getLabel(); string mc_filename = getOutputFileName("summary", variable); outputNames.push_back(mc_filename); 
outputTypes["summary"].push_back(mc_filename); randomForest.getMissclassifications(mc_filename, intToTreatmentMap, names); // m->mothurOutEndLine(); } catch(exception& e) { m->errorOut(e, "ClassifySharedCommand", "processSharedAndDesignData"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/classifyrfsharedcommand.h000077500000000000000000000031511255543666200233060ustar00rootroot00000000000000// // classifysharedcommand.h // Mothur // // Created by Abu Zaher Md. Faridee on 8/13/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #ifndef __Mothur__classifyrfsharedcommand__ #define __Mothur__classifyrfsharedcommand__ #include "command.hpp" #include "inputdata.h" #include "designmap.h" class ClassifyRFSharedCommand : public Command { public: ClassifyRFSharedCommand(); ClassifyRFSharedCommand(string); ~ClassifyRFSharedCommand() {}; vector setParameters(); string getCommandName() { return "classify.rf"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Classify.rf\n"; } string getDescription() { return "implements the random forest machine learning algorithm to identify OTUs that can be used to differentiate between various groups of samples"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string outputDir; vector outputNames, Groups; string sharedfile, designfile; set labels; bool allLines; int processors; bool useTiming; DesignMap designMap; int numDecisionTrees; string treeSplitCriterion, optimumFeatureSubsetSelectionCriteria; bool doPruning, discardHighErrorTrees; double pruneAggressiveness, highErrorTreeDiscardThreshold, featureStandardDeviationThreshold; void processSharedAndDesignData(vector lookup); }; #endif /* defined(__Mothur__classifyrfsharedcommand__) */ 
mothur-1.36.1/source/commands/classifyseqscommand.cpp000066400000000000000000001645051255543666200230260ustar00rootroot00000000000000/* * classifyseqscommand.cpp * Mothur * * Created by westcott on 11/2/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "classifyseqscommand.h" //********************************************************************************************************************** vector ClassifySeqsCommand::setParameters(){ try { CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptaxonomy); CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","taxonomy",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter psearch("search", "Multiple", "kmer-blast-suffix-distance-align", "kmer", "", "", "","",false,false); parameters.push_back(psearch); CommandParameter pksize("ksize", "Number", "", "8", "", "", "","",false,false); parameters.push_back(pksize); CommandParameter pmethod("method", "Multiple", "wang-knn-zap", "wang", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pmatch("match", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pmatch); CommandParameter pmismatch("mismatch", "Number", "", "-1.0", "", "", 
"","",false,false); parameters.push_back(pmismatch); CommandParameter pgapopen("gapopen", "Number", "", "-2.0", "", "", "","",false,false); parameters.push_back(pgapopen); CommandParameter pgapextend("gapextend", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pgapextend); CommandParameter pcutoff("cutoff", "Number", "", "0", "", "", "","",false,true); parameters.push_back(pcutoff); CommandParameter pprobs("probs", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pprobs); CommandParameter piters("iters", "Number", "", "100", "", "", "","",false,true); parameters.push_back(piters); CommandParameter psave("save", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psave); CommandParameter pshortcuts("shortcuts", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pshortcuts); CommandParameter prelabund("relabund", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(prelabund); CommandParameter pnumwanted("numwanted", "Number", "", "10", "", "", "","",false,true); parameters.push_back(pnumwanted); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClassifySeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The classify.seqs command reads a fasta file containing sequences and creates a .taxonomy file and a .tax.summary file.\n"; 
helpString += "The classify.seqs command parameters are reference, fasta, name, group, count, search, ksize, method, taxonomy, processors, match, mismatch, gapopen, gapextend, numwanted, relabund and probs.\n"; helpString += "The reference, fasta and taxonomy parameters are required. You may enter multiple fasta files by separating their names with dashes. ie. fasta=abrecovery.fasta-amazon.fasta \n"; helpString += "The search parameter allows you to specify the method to find most similar template. Your options are: suffix, kmer, blast, align and distance. The default is kmer.\n"; helpString += "The name parameter allows you to add a names file with your fasta file, if you enter multiple fasta files, you must enter matching names files for them.\n"; helpString += "The group parameter allows you to add a group file so you can have the summary totals broken up by group.\n"; helpString += "The count parameter allows you to add a count file so you can have the summary totals broken up by group.\n"; helpString += "The method parameter allows you to specify classification method to use. Your options are: wang, knn and zap. The default is wang.\n"; helpString += "The ksize parameter allows you to specify the kmer size for finding most similar template to candidate. The default is 8.\n"; helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n"; #ifdef USE_MPI helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. \n"; #endif helpString += "If the save parameter is set to true the reference sequences will be saved in memory, to clear them later you can use the clear.memory command. Default=f.\n"; helpString += "The match parameter allows you to specify the bonus for having the same base. The default is 1.0.\n"; helpString += "The mismatch parameter allows you to specify the penalty for having different bases. 
The default is -1.0.\n"; helpString += "The gapopen parameter allows you to specify the penalty for opening a gap in an alignment. The default is -2.0.\n"; helpString += "The gapextend parameter allows you to specify the penalty for extending a gap in an alignment. The default is -1.0.\n"; helpString += "The numwanted parameter allows you to specify the number of sequence matches you want with the knn method. The default is 10.\n"; helpString += "The cutoff parameter allows you to specify a bootstrap confidence threshold for your taxonomy. The default is 0.\n"; helpString += "The probs parameter shuts off the bootstrapping results for the wang and zap method. The default is true, meaning you want the bootstrapping to be shown.\n"; helpString += "The relabund parameter allows you to indicate you want the summary file values to be relative abundances rather than raw abundances. Default=F. \n"; helpString += "The iters parameter allows you to specify how many iterations to do when calculating the bootstrap confidence score for your taxonomy with the wang method. The default is 100.\n"; //helpString += "The flip parameter allows you shut off mothur's The default is T.\n"; helpString += "The classify.seqs command should be in the following format: \n"; helpString += "classify.seqs(reference=yourTemplateFile, fasta=yourFastaFile, method=yourClassificationMethod, search=yourSearchmethod, ksize=yourKmerSize, taxonomy=yourTaxonomyFile, processors=yourProcessors) \n"; helpString += "Example classify.seqs(fasta=amazon.fasta, reference=core.filtered, method=knn, search=gotoh, ksize=8, processors=2)\n"; helpString += "The .taxonomy file consists of 2 columns: 1 = your sequence name, 2 = the taxonomy for your sequence. \n"; helpString += "The .tax.summary is a summary of the different taxonomies represented in your fasta file. \n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClassifySeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "taxonomy") { pattern = "[filename],[tag],[tag2],taxonomy"; } else if (type == "taxsummary") { pattern = "[filename],[tag],[tag2],tax.summary"; } else if (type == "accnos") { pattern = "[filename],[tag],[tag2],flip.accnos"; } else if (type == "matchdist") { pattern = "[filename],[tag],[tag2],match.dist"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClassifySeqsCommand::ClassifySeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["taxsummary"] = tempOutNames; outputTypes["matchdist"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "ClassifySeqsCommand"); exit(1); } } //********************************************************************************************************************** ClassifySeqsCommand::ClassifySeqsCommand(string option) { try { abort = false; calledHelp = false; rdb = ReferenceDB::getInstance(); hasName = false; hasCount=false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); 
ValidParameters validParameter("classify.seqs"); map<string, string>::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["taxsummary"] = tempOutNames; outputTypes["matchdist"] = tempOutNames; outputTypes["accnos"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["reference"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a taxonomy file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } } fastaFileName = validParameter.validFile(parameters, "fasta", false); if (fastaFileName == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else { m->splitAtDash(fastaFileName, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } namefile = validParameter.validFile(parameters, "name", false); if (namefile == "not found") { namefile = ""; } else { m->splitAtDash(namefile, namefileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < namefileNames.size(); i++) { bool ignore = false; if (namefileNames[i] == "current") { namefileNames[i] = m->getNameFile(); if (namefileNames[i] != "") { m->mothurOut("Using " + namefileNames[i] + " as input file for the name parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current namefile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list namefileNames.erase(namefileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(namefileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { namefileNames[i] = inputDir + namefileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(namefileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(namefileNames[i]); m->mothurOut("Unable to open " + namefileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); namefileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(namefileNames[i]); m->mothurOut("Unable to open " + namefileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); namefileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + namefileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); abort = true; //erase from file list namefileNames.erase(namefileNames.begin()+i); i--; }else { m->setNameFile(namefileNames[i]); } } } } if (namefileNames.size() != 0) { hasName = true; } if (namefile != "") { if (namefileNames.size() != fastaFileNames.size()) { abort = true; m->mothurOut("If you provide a name file, you must have one for each fasta file."); m->mothurOutEndLine(); } } //check for required parameters countfile = validParameter.validFile(parameters, "count", false); if (countfile == "not found") { countfile = ""; }else { m->splitAtDash(countfile, countfileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < countfileNames.size(); i++) { bool ignore = false; if (countfileNames[i] == "current") { countfileNames[i] = m->getCountTableFile(); if (countfileNames[i] != "") { m->mothurOut("Using " + countfileNames[i] + " as input file for the count parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current count file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(countfileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { countfileNames[i] = inputDir + countfileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(countfileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(countfileNames[i]); m->mothurOut("Unable to open " + countfileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); countfileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + countfileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list countfileNames.erase(countfileNames.begin()+i); i--; }else { m->setCountTableFile(countfileNames[i]); } } } } if (countfileNames.size() != 0) { hasCount = true; if (countfileNames.size() != fastaFileNames.size()) {m->mothurOut("If you provide a count file, you must have one for each fasta file."); m->mothurOutEndLine(); } } //make sure there is at least one valid file left if (hasName && hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } groupfile = validParameter.validFile(parameters, "group", false); if (groupfile == "not found") { groupfile = ""; } else { m->splitAtDash(groupfile, groupfileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < groupfileNames.size(); i++) { bool ignore = false; if (groupfileNames[i] == "current") { groupfileNames[i] = m->getGroupFile(); if (groupfileNames[i] != "") { m->mothurOut("Using " + groupfileNames[i] + " as input file for the group parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current group file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list groupfileNames.erase(groupfileNames.begin()+i); i--; } } if 
(!ignore) { if (inputDir != "") { string path = m->hasPath(groupfileNames[i]); //cout << path << '\t' << inputDir << endl; //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { groupfileNames[i] = inputDir + groupfileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(groupfileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(groupfileNames[i]); m->mothurOut("Unable to open " + groupfileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); groupfileNames[i] = tryPath; } } if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(groupfileNames[i]); m->mothurOut("Unable to open " + groupfileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); groupfileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + groupfileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list groupfileNames.erase(groupfileNames.begin()+i); i--; }else { m->setGroupFile(groupfileNames[i]); } } } } if (groupfile != "") { if (groupfileNames.size() != fastaFileNames.size()) { abort = true; m->mothurOut("If you provide a group file, you must have one for each fasta file."); m->mothurOutEndLine(); } if (hasCount) { m->mothurOut("[ERROR]: You must enter ONLY ONE of the following: count or group."); m->mothurOutEndLine(); abort = true; } }else { for (int i = 0; i < fastaFileNames.size(); i++) { groupfileNames.push_back(""); } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
        string temp;
        temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); }
        m->setProcessors(temp);
        m->mothurConvert(temp, processors);

        temp = validParameter.validFile(parameters, "save", false); if (temp == "not found"){ temp = "f"; }
        save = m->isTrue(temp);
        rdb->save = save;
        if (save) { //clear out old references
            rdb->clearMemory();
        }

        //this has to go after save so that if the user sets save=t and provides no reference we abort
        templateFileName = validParameter.validFile(parameters, "reference", true);
        if (templateFileName == "not found") {
            //check for saved reference sequences
            if (rdb->referenceSeqs.size() != 0) {
                templateFileName = "saved";
            }else {
                m->mothurOut("[ERROR]: You don't have any saved reference sequences and the reference parameter is required for the classify.seqs command.");
                m->mothurOutEndLine();
                abort = true;
            }
        }else if (templateFileName == "not open") { abort = true; }
        else { if (save) { rdb->setSavedReference(templateFileName); } }

        //this has to go after save so that if the user sets save=t and provides no taxonomy we abort
        taxonomyFileName = validParameter.validFile(parameters, "taxonomy", true);
        if (taxonomyFileName == "not found") {
            //check for saved taxonomy information
            if (rdb->wordGenusProb.size() != 0) {
                taxonomyFileName = "saved";
            }else {
                m->mothurOut("[ERROR]: You don't have any saved taxonomy information and the taxonomy parameter is required for the classify.seqs command.");
                m->mothurOutEndLine();
                abort = true;
            }
        }else if (taxonomyFileName == "not open") { abort = true; }
        else { if (save) { rdb->setSavedTaxonomy(taxonomyFileName); } }

        search = validParameter.validFile(parameters, "search", false); if (search == "not found"){ search = "kmer"; }

        method = validParameter.validFile(parameters, "method", false); if (method == "not found"){ method = "wang"; }

        temp = validParameter.validFile(parameters, "ksize", false);
        if (temp == "not found"){
            temp = "8";
            if (method == "zap") {
temp = "7"; } } m->mothurConvert(temp, kmerSize); temp = validParameter.validFile(parameters, "match", false); if (temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, match); temp = validParameter.validFile(parameters, "mismatch", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, misMatch); temp = validParameter.validFile(parameters, "gapopen", false); if (temp == "not found"){ temp = "-2.0"; } m->mothurConvert(temp, gapOpen); temp = validParameter.validFile(parameters, "gapextend", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, gapExtend); temp = validParameter.validFile(parameters, "numwanted", false); if (temp == "not found"){ temp = "10"; } m->mothurConvert(temp, numWanted); temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "probs", false); if (temp == "not found"){ temp = "true"; } probs = m->isTrue(temp); temp = validParameter.validFile(parameters, "relabund", false); if (temp == "not found"){ temp = "false"; } relabund = m->isTrue(temp); temp = validParameter.validFile(parameters, "shortcuts", false); if (temp == "not found"){ temp = "true"; } writeShortcuts = m->isTrue(temp); //temp = validParameter.validFile(parameters, "flip", false); if (temp == "not found"){ temp = "T"; } //flip = m->isTrue(temp); flip = true; temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, iters); if ((method == "wang") && (search != "kmer")) { m->mothurOut("The wang method requires the kmer search. " + search + " will be disregarded, and kmer will be used." ); m->mothurOutEndLine(); search = "kmer"; } if ((method == "zap") && ((search != "kmer") && (search != "align"))) { m->mothurOut("The zap method requires the kmer or align search. " + search + " will be disregarded, and kmer will be used." 
        ); m->mothurOutEndLine();
            search = "kmer";
        }

        if (!abort) {
            if (!hasCount) {
                if (namefileNames.size() == 0){
                    if (fastaFileNames.size() != 0) {
                        vector<string> files; files.push_back(fastaFileNames[fastaFileNames.size()-1]);
                        parser.getNameFile(files);
                    }
                }
            }
        }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "ClassifySeqsCommand", "ClassifySeqsCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
ClassifySeqsCommand::~ClassifySeqsCommand(){
    if (abort == false) {
        for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear();
    }
}
//**********************************************************************************************************************
int ClassifySeqsCommand::execute(){
    try {
        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        string outputMethodTag = method;
        if(method == "wang"){ classify = new Bayesian(taxonomyFileName, templateFileName, search, kmerSize, cutoff, iters, rand(), flip, writeShortcuts); }
        else if(method == "knn"){ classify = new Knn(taxonomyFileName, templateFileName, search, kmerSize, gapOpen, gapExtend, match, misMatch, numWanted, rand()); }
        else if(method == "zap"){
            outputMethodTag = search + "_" + outputMethodTag;
            if (search == "kmer") { classify = new KmerTree(templateFileName, taxonomyFileName, kmerSize, cutoff); }
            else { classify = new AlignTree(templateFileName, taxonomyFileName, cutoff); }
        }
        else {
            //report the invalid method name (not the search name) before falling back to wang
            m->mothurOut(method + " is not a valid method option. I will run the command using wang.");
            m->mothurOutEndLine();
            classify = new Bayesian(taxonomyFileName, templateFileName, search, kmerSize, cutoff, iters, rand(), flip, writeShortcuts);
        }

        if (m->control_pressed) { delete classify; return 0; }

        for (int s = 0; s < fastaFileNames.size(); s++) {

            m->mothurOut("Classifying sequences from " + fastaFileNames[s] + " ..."
            ); m->mothurOutEndLine();

            string baseTName = m->getSimpleName(taxonomyFileName);
            if (taxonomyFileName == "saved") { baseTName = rdb->getSavedTaxonomy(); }

            //set RippedTaxName to the dot-delimited field just before the file extension of the taxonomy file name
            string RippedTaxName = "";
            bool foundDot = false;
            for (int i = baseTName.length()-1; i >= 0; i--) {
                if (foundDot && (baseTName[i] != '.')) { RippedTaxName = baseTName[i] + RippedTaxName; }
                else if (foundDot && (baseTName[i] == '.')) { break; }
                else if (!foundDot && (baseTName[i] == '.')) { foundDot = true; }
            }
            //if (RippedTaxName != "") { RippedTaxName += "."; }

            if (outputDir == "") { outputDir += m->hasPath(fastaFileNames[s]); }
            map<string, string> variables;
            variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s]));
            variables["[tag]"] = RippedTaxName;
            variables["[tag2]"] = outputMethodTag;
            string newTaxonomyFile = getOutputFileName("taxonomy", variables);
            string newaccnosFile = getOutputFileName("accnos", variables);
            string tempTaxonomyFile = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])) + "taxonomy.temp";
            string taxSummary = getOutputFileName("taxsummary", variables);

            if ((method == "knn") && (search == "distance")) {
                string DistName = getOutputFileName("matchdist", variables);
                classify->setDistName(DistName); outputNames.push_back(DistName); outputTypes["matchdist"].push_back(DistName);
            }

            outputNames.push_back(newTaxonomyFile); outputTypes["taxonomy"].push_back(newTaxonomyFile);
            outputNames.push_back(taxSummary); outputTypes["taxsummary"].push_back(taxSummary);

            int start = time(NULL);
            int numFastaSeqs = 0;
            for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear();

#ifdef USE_MPI
            int pid, numSeqsPerProcessor;
            int tag = 2001;
            vector<unsigned long long> MPIPos;

            MPI_Status status;
            MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are
            MPI_Comm_size(MPI_COMM_WORLD, &processors);

            MPI_File inMPI;
            MPI_File outMPINewTax;
            MPI_File outMPITempTax;
            MPI_File outMPIAcc;

            int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY;
            int inMode=MPI_MODE_RDONLY;

            char outNewTax[1024];
strcpy(outNewTax, newTaxonomyFile.c_str()); char outTempTax[1024]; strcpy(outTempTax, tempTaxonomyFile.c_str()); char outAcc[1024]; strcpy(outAcc, newaccnosFile.c_str()); char inFileName[1024]; strcpy(inFileName, fastaFileNames[s].c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outNewTax, outMode, MPI_INFO_NULL, &outMPINewTax); MPI_File_open(MPI_COMM_WORLD, outTempTax, outMode, MPI_INFO_NULL, &outMPITempTax); MPI_File_open(MPI_COMM_WORLD, outAcc, outMode, MPI_INFO_NULL, &outMPIAcc); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&inMPI); MPI_File_close(&outMPINewTax); MPI_File_close(&outMPIAcc); MPI_File_close(&outMPITempTax); delete classify; return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(fastaFileNames[s], numFastaSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numFastaSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (numFastaSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } //figure out how many sequences you have to align numSeqsPerProcessor = numFastaSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPINewTax, outMPITempTax, outMPIAcc, MPIPos); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&inMPI); MPI_File_close(&outMPINewTax); MPI_File_close(&outMPIAcc); MPI_File_close(&outMPITempTax); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete classify; return 0; } for (int i = 1; i < processors; i++) { int done; MPI_Recv(&done, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &status); } }else{ //you are a child process MPI_Recv(&numFastaSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); 
                MPIPos.resize(numFastaSeqs+1);
                MPI_Recv(&MPIPos[0], (numFastaSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status);

                //figure out how many sequences you have to classify
                numSeqsPerProcessor = numFastaSeqs / processors;
                int startIndex = pid * numSeqsPerProcessor;
                if(pid == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - pid * numSeqsPerProcessor; }

                //classify your part
                driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPINewTax, outMPITempTax, outMPIAcc, MPIPos);

                if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&inMPI); MPI_File_close(&outMPINewTax); MPI_File_close(&outMPIAcc); MPI_File_close(&outMPITempTax); delete classify; return 0; }

                int done = 0;
                MPI_Send(&done, 1, MPI_INT, 0, tag, MPI_COMM_WORLD);
            }

            //close files
            MPI_File_close(&inMPI);
            MPI_File_close(&outMPINewTax);
            MPI_File_close(&outMPITempTax);
            MPI_File_close(&outMPIAcc);
            MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case
#else
            vector<unsigned long long> positions;
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
            positions = m->divideFile(fastaFileNames[s], processors);
            for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); }
#else
            if (processors == 1) {
                lines.push_back(new linePair(0, 1000));
            }else {
                positions = m->setFilePosFasta(fastaFileNames[s], numFastaSeqs);
                if (positions.size() < processors) { processors = positions.size(); }

                //figure out how many sequences you have to process
                int numSeqsPerProcessor = numFastaSeqs / processors;
                for (int i = 0; i < processors; i++) {
                    int startIndex = i * numSeqsPerProcessor;
                    if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; }
                    lines.push_back(new linePair(positions[startIndex], numSeqsPerProcessor));
                }
            }
#endif
            if(processors == 1){
                numFastaSeqs = driver(lines[0], newTaxonomyFile, tempTaxonomyFile, newaccnosFile, fastaFileNames[s]);
            }else{
                numFastaSeqs = createProcesses(newTaxonomyFile, tempTaxonomyFile,
newaccnosFile, fastaFileNames[s]); } #endif if (!m->isBlank(newaccnosFile)) { m->mothurOutEndLine(); m->mothurOut("[WARNING]: mothur reversed some your sequences for a better classification. If you would like to take a closer look, please check " + newaccnosFile + " for the list of the sequences."); m->mothurOutEndLine(); outputNames.push_back(newaccnosFile); outputTypes["accnos"].push_back(newaccnosFile); }else { m->mothurRemove(newaccnosFile); } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to classify " + toString(numFastaSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); start = time(NULL); #ifdef USE_MPI if (pid == 0) { //this part does not need to be paralellized if(namefile != "") { m->mothurOut("Reading " + namefileNames[s] + "..."); cout.flush(); MPIReadNamesFile(namefileNames[s]); m->mothurOut(" Done."); m->mothurOutEndLine(); } #else //read namefile if(namefile != "") { m->mothurOut("Reading " + namefileNames[s] + "..."); cout.flush(); nameMap.clear(); //remove old names m->readNames(namefileNames[s], nameMap); m->mothurOut(" Done."); m->mothurOutEndLine(); } #endif string group = ""; GroupMap* groupMap = NULL; CountTable* ct = NULL; PhyloSummary* taxaSum; if (hasCount) { ct = new CountTable(); ct->readTable(countfileNames[s], true, false); taxaSum = new PhyloSummary(taxonomyFileName, ct, relabund); taxaSum->summarize(tempTaxonomyFile); }else { if (groupfile != "") { group = groupfileNames[s]; groupMap = new GroupMap(group); groupMap->readMap(); } taxaSum = new PhyloSummary(taxonomyFileName, groupMap, relabund); if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } delete taxaSum; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete classify; return 0; } if (namefile == "") { taxaSum->summarize(tempTaxonomyFile); } else { ifstream in; m->openInputFile(tempTaxonomyFile, in); //read in users 
taxonomy file and add sequences to tree string name, taxon; while(!in.eof()){ if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } delete taxaSum; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete classify; return 0; } in >> name >> taxon; m->gobble(in); itNames = nameMap.find(name); if (itNames == nameMap.end()) { m->mothurOut(name + " is not in your name file please correct."); m->mothurOutEndLine(); exit(1); }else{ for (int i = 0; i < itNames->second.size(); i++) { taxaSum->addSeqToTree(itNames->second[i], taxon); //add it as many times as there are identical seqs } itNames->second.clear(); nameMap.erase(itNames->first); } } in.close(); } } m->mothurRemove(tempTaxonomyFile); if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete classify; return 0; } //print summary file ofstream outTaxTree; m->openOutputFile(taxSummary, outTaxTree); taxaSum->print(outTaxTree); outTaxTree.close(); //output taxonomy with the unclassified bins added ifstream inTax; m->openInputFile(newTaxonomyFile, inTax); ofstream outTax; string unclass = newTaxonomyFile + ".unclass.temp"; m->openOutputFile(unclass, outTax); //get maxLevel from phylotree so you know how many 'unclassified's to add int maxLevel = taxaSum->getMaxLevel(); //read taxfile - this reading and rewriting is done to preserve the confidence scores. 
string name, taxon; while (!inTax.eof()) { if (m->control_pressed) { outputTypes.clear(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } delete taxaSum; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->mothurRemove(unclass); delete classify; return 0; } inTax >> name >> taxon; m->gobble(inTax); string newTax = addUnclassifieds(taxon, maxLevel); outTax << name << '\t' << newTax << endl; } inTax.close(); outTax.close(); if (ct != NULL) { delete ct; } if (groupMap != NULL) { delete groupMap; } delete taxaSum; m->mothurRemove(newTaxonomyFile); rename(unclass.c_str(), newTaxonomyFile.c_str()); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to create the summary file for " + toString(numFastaSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); #ifdef USE_MPI } #endif } delete classify; m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set taxonomy file as new current taxonomyfile string current = ""; itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } current = ""; itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "execute"); exit(1); } } /**************************************************************************************************/ string ClassifySeqsCommand::addUnclassifieds(string tax, int maxlevel) { try{ string newTax, taxon; int level = 0; //keep what you have counting the levels while (tax.find_first_of(';') != -1) { //get taxon taxon = 
            tax.substr(0,tax.find_first_of(';'))+';';
            tax = tax.substr(tax.find_first_of(';')+1, tax.length());
            newTax += taxon;
            level++;
        }

        //add "unclassified" until you reach maxLevel
        while (level < maxlevel) {
            newTax += "unclassified;";
            level++;
        }

        return newTax;
    }
    catch(exception& e) {
        m->errorOut(e, "ClassifySeqsCommand", "addUnclassifieds");
        exit(1);
    }
}
/**************************************************************************************************/
int ClassifySeqsCommand::createProcesses(string taxFileName, string tempTaxFile, string accnos, string filename) {
    try {
        int num = 0;
        processIDS.clear();
        bool recalc = false;

#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
        int process = 1;

        //loop through and create all the processes you want
        while (process != processors) {
            pid_t pid = fork();

            if (pid > 0) {
                processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                num = driver(lines[process], taxFileName + m->mothurGetpid(process) + ".temp", tempTaxFile + m->mothurGetpid(process) + ".temp", accnos + m->mothurGetpid(process) + ".temp", filename);

                //pass numSeqs to parent
                ofstream out;
                string tempFile = filename + m->mothurGetpid(process) + ".num.temp";
                m->openOutputFile(tempFile, out);
                out << num << endl;
                out.close();

                exit(0);
            }else {
                m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                //wait to die
                for (int i=0;i<processIDS.size();i++) {
                    int temp = processIDS[i];
                    wait(&temp);
                }
                m->control_pressed = false;
                recalc = true;
                break;
            }
        }

        if (recalc) {
            //test line, also set recalc to true.
            //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n");

            for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear();
            vector<unsigned long long> positions = m->divideFile(filename, processors);
            for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); }

            num = 0;
            processIDS.resize(0);
            process = 1;

            while (process != processors) {
                pid_t pid = fork();

                if (pid > 0) {
                    processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                    process++;
                }else if (pid == 0){
                    num = driver(lines[process], taxFileName + m->mothurGetpid(process) + ".temp", tempTaxFile + m->mothurGetpid(process) + ".temp", accnos + m->mothurGetpid(process) + ".temp", filename);

                    //pass numSeqs to parent
                    ofstream out;
                    string tempFile = filename + m->mothurGetpid(process) + ".num.temp";
                    m->openOutputFile(tempFile, out);
                    out << num << endl;
                    out.close();

                    exit(0);
                }else {
                    m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine();
                    for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                    exit(0);
                }
            }
        }

        //parent does its part
        num = driver(lines[0], taxFileName, tempTaxFile, accnos, filename);

        //force parent to wait until all the processes are done
        for (int i=0;i<processIDS.size();i++) {
            int temp = processIDS[i];
            wait(&temp);
        }

        //add up the sequence counts the children wrote to their .num.temp files
        for (int i = 0; i < processIDS.size(); i++) {
            ifstream in;
            string tempFile = filename + toString(processIDS[i]) + ".num.temp";
            m->openInputFile(tempFile, in);
            if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; }
            in.close(); m->mothurRemove(m->getFullPathName(tempFile));
        }
#else
        //////////////////////////////////////////////////////////////////////////////////////////////////////
        //Windows version shared memory, so be careful when passing variables through the alignData struct.
//Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; istart, lines[i]->end, match, misMatch, gapOpen, gapExtend, cutoff, i, flip, writeShortcuts); pDataArray.push_back(tempclass); //MySeqSumThreadFunction is in header. It must be global or static to work with the threads. //default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i] = CreateThread(NULL, 0, MyClassThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]); } //parent does its part num = driver(lines[processors-1], taxFileName + toString(processors-1) + ".temp", tempTaxFile + toString(processors-1) + ".temp", accnos + toString(processors-1) + ".temp", filename); processIDS.push_back((processors-1)); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n");
                m->control_pressed = true;
            }
            CloseHandle(hThreadArray[i]);
            delete pDataArray[i];
        }
#endif
        vector<string> nonBlankAccnosFiles;
        if (!(m->isBlank(accnos))) { nonBlankAccnosFiles.push_back(accnos); }
        else { m->mothurRemove(accnos); } //remove so other files can be renamed to it

        for(int i=0;i<processIDS.size();i++){
            m->appendFiles((taxFileName + toString(processIDS[i]) + ".temp"), taxFileName);
            m->appendFiles((tempTaxFile + toString(processIDS[i]) + ".temp"), tempTaxFile);
            if (!(m->isBlank(accnos + toString(processIDS[i]) + ".temp"))) {
                nonBlankAccnosFiles.push_back(accnos + toString(processIDS[i]) + ".temp");
            }else { m->mothurRemove((accnos + toString(processIDS[i]) + ".temp")); }

            m->mothurRemove((m->getFullPathName(taxFileName) + toString(processIDS[i]) + ".temp"));
            m->mothurRemove((m->getFullPathName(tempTaxFile) + toString(processIDS[i]) + ".temp"));
        }

        //append accnos files
        if (nonBlankAccnosFiles.size() != 0) {
            rename(nonBlankAccnosFiles[0].c_str(), accnos.c_str());

            for (int h=1; h < nonBlankAccnosFiles.size(); h++) {
                m->appendFiles(nonBlankAccnosFiles[h], accnos);
                m->mothurRemove(nonBlankAccnosFiles[h]);
            }
        }else { //recreate the accnosfile if needed
            ofstream out;
            m->openOutputFile(accnos, out);
            out.close();
        }

        return num;
    }
    catch(exception& e) {
        m->errorOut(e, "ClassifySeqsCommand", "createProcesses");
        exit(1);
    }
}
//**********************************************************************************************************************
int ClassifySeqsCommand::driver(linePair* filePos, string taxFName, string tempTFName, string accnos, string filename){
    try {
        ofstream outTax;
        m->openOutputFile(taxFName, outTax);

        ofstream outTaxSimple;
        m->openOutputFile(tempTFName, outTaxSimple);

        ofstream outAcc;
        m->openOutputFile(accnos, outAcc);

        ifstream inFASTA;
        m->openInputFile(filename, inFASTA);

        string taxonomy;

        inFASTA.seekg(filePos->start);

        bool done = false;
        int count = 0;

        while (!done) {
            if (m->control_pressed) {
                inFASTA.close();
                outTax.close();
                outTaxSimple.close();
                outAcc.close(); return 0; }

            Sequence*
candidateSeq = new Sequence(inFASTA); m->gobble(inFASTA); if (candidateSeq->getName() != "") { taxonomy = classify->getTaxonomy(candidateSeq); if (m->control_pressed) { delete candidateSeq; return 0; } if (taxonomy == "unknown;") { m->mothurOut("[WARNING]: " + candidateSeq->getName() + " could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences."); m->mothurOutEndLine(); } //output confidence scores or not if (probs) { outTax << candidateSeq->getName() << '\t' << taxonomy << endl; }else{ outTax << candidateSeq->getName() << '\t' << classify->getSimpleTax() << endl; } if (classify->getFlipped()) { outAcc << candidateSeq->getName() << endl; } outTaxSimple << candidateSeq->getName() << '\t' << classify->getSimpleTax() << endl; count++; } delete candidateSeq; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos->end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count) +"\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count)+"\n"); } inFASTA.close(); outTax.close(); outTaxSimple.close(); outAcc.close(); return count; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "driver"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int ClassifySeqsCommand::driverMPI(int start, int num, MPI_File& inMPI, MPI_File& newFile, MPI_File& tempFile, MPI_File& accFile, vector& MPIPos){ try { MPI_Status statusNew; MPI_Status statusTemp; MPI_Status statusAcc; MPI_Status status; int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are string taxonomy; string outputString; for(int i=0;icontrol_pressed) { return 0; } //read 
next sequence int length = MPIPos[start+i+1] - MPIPos[start+i]; char* buf4 = new char[length]; MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status); string tempBuf(buf4, length); istringstream iss (tempBuf,istringstream::in); delete[] buf4; Sequence* candidateSeq = new Sequence(iss); if (candidateSeq->getName() != "") { taxonomy = classify->getTaxonomy(candidateSeq); if (taxonomy == "unknown;") { m->mothurOut("[WARNING]: " + candidateSeq->getName() + " could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences."); m->mothurOutEndLine(); } //output confidence scores or not if (probs) { outputString = candidateSeq->getName() + "\t" + taxonomy + "\n"; }else{ outputString = candidateSeq->getName() + "\t" + classify->getSimpleTax() + "\n"; } int length = outputString.length(); char* buf2 = new char[length]; memcpy(buf2, outputString.c_str(), length); MPI_File_write_shared(newFile, buf2, length, MPI_CHAR, &statusNew); delete[] buf2; outputString = candidateSeq->getName() + "\t" + classify->getSimpleTax() + "\n"; length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write_shared(tempFile, buf, length, MPI_CHAR, &statusTemp); delete[] buf; if (classify->getFlipped()) { outputString = candidateSeq->getName() + "\n"; length = outputString.length(); char* buf3 = new char[length]; memcpy(buf3, outputString.c_str(), length); MPI_File_write_shared(accFile, buf3, length, MPI_CHAR, &statusAcc); delete[] buf3; } } delete candidateSeq; if((i+1) % 100 == 0){ cout << "Classifying sequence " << (i+1) << endl; } } if(num % 100 != 0){ cout << "Classifying sequence " << (num) << endl; } return 1; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "driverMPI"); exit(1); } } //********************************************************************************************************************** 
int ClassifySeqsCommand::MPIReadNamesFile(string nameFilename){ try { nameMap.clear(); //remove old names MPI_File inMPI; MPI_Offset size; MPI_Status status; //char* inFileName = new char[nameFilename.length()]; //memcpy(inFileName, nameFilename.c_str(), nameFilename.length()); char inFileName[1024]; strcpy(inFileName, nameFilename.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); MPI_File_get_size(inMPI, &size); //delete inFileName; char* buffer = new char[size]; MPI_File_read(inMPI, buffer, size, MPI_CHAR, &status); string tempBuf(buffer, size); istringstream iss (tempBuf,istringstream::in); delete[] buffer; string firstCol, secondCol; while(!iss.eof()) { iss >> firstCol >> secondCol; m->gobble(iss); vector temp; m->splitAtComma(secondCol, temp); nameMap[firstCol] = temp; } MPI_File_close(&inMPI); return 1; } catch(exception& e) { m->errorOut(e, "ClassifySeqsCommand", "MPIReadNamesFile"); exit(1); } } #endif /**************************************************************************************************/ mothur-1.36.1/source/commands/classifyseqscommand.h000066400000000000000000000212051255543666200224600ustar00rootroot00000000000000#ifndef CLASSIFYSEQSCOMMAND_H #define CLASSIFYSEQSCOMMAND_H /* * classifyseqscommand.h * Mothur * * Created by westcott on 11/2/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "classify.h" #include "referencedb.h" #include "sequence.hpp" #include "bayesian.h" #include "phylotree.h" #include "phylosummary.h" #include "knn.h" #include "kmertree.h" #include "aligntree.h" //KNN and Wang methods modeled from algorithms in //Naïve Bayesian Classifier for Rapid Assignment of rRNA Sequences //into the New Bacterial Taxonomy //Qiong Wang,1 George M. Garrity,1,2 James M. Tiedje,1,2 and James R. 
Cole1* //Center for Microbial Ecology1 and Department of Microbiology and Molecular Genetics,2 Michigan State University, //East Lansing, Michigan 48824 //Received 10 January 2007/Accepted 18 June 2007 class ClassifySeqsCommand : public Command { public: ClassifySeqsCommand(string); ClassifySeqsCommand(); ~ClassifySeqsCommand(); vector setParameters(); string getCommandName() { return "classify.seqs"; } string getCommandCategory() { return "Phylotype Analysis"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Wang Q, Garrity GM, Tiedje JM, Cole JR (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-7. [ for Bayesian classifier ] \nAltschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389-402. [ for BLAST ] \nDeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, Huber T, Dalevi D, Hu P, Andersen GL (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72: 5069-72. 
[ for kmer ] \nhttp://www.mothur.org/wiki/Classify.seqs"; } string getDescription() { return "classify sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector processIDS; //processid vector lines; vector fastaFileNames; vector namefileNames; vector countfileNames; vector groupfileNames; vector outputNames; map > nameMap; map >::iterator itNames; Classify* classify; ReferenceDB* rdb; string fastaFileName, templateFileName, countfile, distanceFileName, namefile, search, method, taxonomyFileName, outputDir, groupfile; int processors, kmerSize, numWanted, cutoff, iters; float match, misMatch, gapOpen, gapExtend; bool abort, probs, save, flip, hasName, hasCount, writeShortcuts, relabund; int driver(linePair*, string, string, string, string); int createProcesses(string, string, string, string); string addUnclassifieds(string, int); int MPIReadNamesFile(string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, MPI_File&, MPI_File&, MPI_File&, vector&); #endif }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct classifyData { string taxFName; string tempTFName; string filename; string search, taxonomyFileName, templateFileName, method, accnos; unsigned long long start; unsigned long long end; MothurOut* m; float match, misMatch, gapOpen, gapExtend; int count, kmerSize, threadID, cutoff, iters, numWanted; bool probs, flip, writeShortcuts; classifyData(){} classifyData(string acc, bool p, string me, string te, string tx, string a, string r, string f, string se, int ks, int i, int numW, MothurOut* mout, unsigned long long st, unsigned long long en, float ma, float misMa, float gapO, float gapE, int cut, int tid, bool fli, bool wsh) { accnos = acc; taxonomyFileName = tx; templateFileName = te; taxFName = a; tempTFName = r; filename = f; search = se; method = me; m = mout; start = st; end = en; match = ma; misMatch = misMa; gapOpen = gapO; gapExtend = gapE; kmerSize = ks; cutoff = cut; iters = i; numWanted = numW; threadID = tid; probs = p; count = 0; flip = fli; writeShortcuts = wsh; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyClassThreadFunction(LPVOID lpParam){ classifyData* pDataArray; pDataArray = (classifyData*)lpParam; try { ofstream outTax; pDataArray->m->openOutputFile(pDataArray->taxFName, outTax); ofstream outTaxSimple; pDataArray->m->openOutputFile(pDataArray->tempTFName, outTaxSimple); ofstream outAcc; pDataArray->m->openOutputFile(pDataArray->accnos, outAcc); ifstream inFASTA; pDataArray->m->openInputFile(pDataArray->filename, inFASTA); string taxonomy; //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { inFASTA.seekg(0); pDataArray->m->zapGremlins(inFASTA); }else { //this accounts for the difference in line endings. 
inFASTA.seekg(pDataArray->start-1); pDataArray->m->gobble(inFASTA); } //make classify Classify* myclassify; string outputMethodTag = pDataArray->method + "."; if(pDataArray->method == "wang"){ myclassify = new Bayesian(pDataArray->taxonomyFileName, pDataArray->templateFileName, pDataArray->search, pDataArray->kmerSize, pDataArray->cutoff, pDataArray->iters, pDataArray->threadID, pDataArray->flip, pDataArray->writeShortcuts); } else if(pDataArray->method == "knn"){ myclassify = new Knn(pDataArray->taxonomyFileName, pDataArray->templateFileName, pDataArray->search, pDataArray->kmerSize, pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, pDataArray->numWanted, pDataArray->threadID); } else if(pDataArray->method == "zap"){ outputMethodTag = pDataArray->search + "_" + outputMethodTag; if (pDataArray->search == "kmer") { myclassify = new KmerTree(pDataArray->templateFileName, pDataArray->taxonomyFileName, pDataArray->kmerSize, pDataArray->cutoff); } else { myclassify = new AlignTree(pDataArray->templateFileName, pDataArray->taxonomyFileName, pDataArray->cutoff); } } else { pDataArray->m->mothurOut(pDataArray->method + " is not a valid method option. 
I will run the command using wang."); pDataArray->m->mothurOutEndLine(); myclassify = new Bayesian(pDataArray->taxonomyFileName, pDataArray->templateFileName, pDataArray->search, pDataArray->kmerSize, pDataArray->cutoff, pDataArray->iters, pDataArray->threadID, pDataArray->flip, pDataArray->writeShortcuts); } if (pDataArray->m->control_pressed) { delete myclassify; return 0; } pDataArray->count = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { delete myclassify; return 0; } Sequence* candidateSeq = new Sequence(inFASTA); pDataArray->m->gobble(inFASTA); if (candidateSeq->getName() != "") { taxonomy = myclassify->getTaxonomy(candidateSeq); if (pDataArray->m->control_pressed) { delete candidateSeq; return 0; } if (taxonomy == "unknown;") { pDataArray->m->mothurOut("[WARNING]: " + candidateSeq->getName() + " could not be classified. You can use the remove.lineage command with taxon=unknown; to remove such sequences."); pDataArray->m->mothurOutEndLine(); } //output confidence scores or not if (pDataArray->probs) { outTax << candidateSeq->getName() << '\t' << taxonomy << endl; }else{ outTax << candidateSeq->getName() << '\t' << myclassify->getSimpleTax() << endl; } outTaxSimple << candidateSeq->getName() << '\t' << myclassify->getSimpleTax() << endl; if (myclassify->getFlipped()) { outAcc << candidateSeq->getName() << endl; } pDataArray->count++; } delete candidateSeq; //report progress if((pDataArray->count) % 100 == 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(pDataArray->count)+"\n"); } } //report progress if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(pDataArray->count)+"\n"); } delete myclassify; inFASTA.close(); outTax.close(); outTaxSimple.close(); } catch(exception& e) { pDataArray->m->errorOut(e, "ClassifySeqsCommand", "MyClassThreadFunction"); exit(1); } } #endif #endif 
mothur-1.36.1/source/commands/classifysvmsharedcommand.cpp000066400000000000000000001055321255543666200240420ustar00rootroot00000000000000// // classifysvmsharedcommand.cpp // Mothur // // Created by Joshua Lynch on 6/28/2013. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "classifysvmsharedcommand.h" //********************************************************************************************************************** vector ClassifySvmSharedCommand::setParameters() { try { //CommandParameter pprocessors("processors", "Number", "", "1", "", "", "",false,false); parameters.push_back(pprocessors); CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none", "summary", false, true, true); parameters.push_back(pshared); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none", "", false, true, true); parameters.push_back(pdesign); // RFE or classification? // mode should be either 'rfe' or 'classify' CommandParameter mode("mode", "String", "", "", "", "", "", "", false, false); parameters.push_back(mode); // cross validation parameters CommandParameter evaluationFoldCountParam("evaluationfolds", "Number", "", "3", "", "", "", "", false, false); parameters.push_back(evaluationFoldCountParam); CommandParameter trainingFoldCountParam("trainingfolds", "Number", "", "10", "", "", "", "", false, false); parameters.push_back(trainingFoldCountParam); CommandParameter smoc("smoc", "Number", "", "3", "", "", "", "", false, false); parameters.push_back(smoc); // Support Vector Machine parameters CommandParameter kernelParam("kernel", "String", "", "", "", "", "", "", false, false); parameters.push_back(kernelParam); // data transformation parameters // transform should be 'zeroone' or 'zeromean' ('zeromean' is default) CommandParameter transformParam("transform", "String", "", "", "", "", "", "", false, false); parameters.push_back(transformParam); CommandParameter verbosityParam("verbose", "Number", "", "0", "", 
"", "", "", false, false); parameters.push_back(verbosityParam); //CommandParameter potupersplit("otupersplit", "Multiple", "log2-squareroot", "log2", "", "", "","",false,false); parameters.push_back(potupersplit); //CommandParameter psplitcriteria("splitcriteria", "Multiple", "gainratio-infogain", "gainratio", "", "", "","",false,false); parameters.push_back(psplitcriteria); //CommandParameter pnumtrees("numtrees", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pnumtrees); // parameters related to pruning //CommandParameter pdopruning("prune", "Boolean", "", "T", "", "", "", "", false, false); parameters.push_back(pdopruning); //CommandParameter ppruneaggrns("pruneaggressiveness", "Number", "", "0.9", "", "", "", "", false, false); parameters.push_back(ppruneaggrns); //CommandParameter pdiscardhetrees("discarderrortrees", "Boolean", "", "T", "", "", "", "", false, false); parameters.push_back(pdiscardhetrees); //CommandParameter phetdiscardthreshold("errorthreshold", "Number", "", "0.4", "", "", "", "", false, false); parameters.push_back(phetdiscardthreshold); // want this parameter to behave like the one in classify.rf CommandParameter pstdthreshold("stdthreshold", "Number", "", "0.0", "", "", "", "", false, false); parameters.push_back(pstdthreshold); // pruning params end CommandParameter pgroups("groups", "String", "", "", "", "", "", "", false, false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "", "", false, false); parameters.push_back(plabel); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "", "", false, false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "", "", false, false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "setParameters"); 
exit(1); } } //********************************************************************************************************************** string ClassifySvmSharedCommand::getHelpString() { try { string helpString = ""; helpString += "The classifysvm.shared command allows you to ....\n"; helpString += "The classifysvm.shared command parameters are: shared, design, label, groups.\n"; helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The groups parameter allows you to specify which of the groups in your designfile you would like analyzed.\n"; helpString += "The classifysvm.shared should be in the following format: \n"; helpString += "classifysvm.shared(shared=yourSharedFile, design=yourDesignFile)\n"; return helpString; } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClassifySvmSharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],[distance],summary"; } //makes file like: amazon.0.03.fasta else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClassifySvmSharedCommand::ClassifySvmSharedCommand() { try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "ClassifySvmSharedCommand"); exit(1); } } // here is a little function from StackExchange for splitting a string on a single character // allows return value optimization //vector& split(const string &s, char 
delim, vector& elems) { // stringstream ss(s); // string item; // while (getline(ss, item, delim)) { // elems.push_back(item); // } // return elems; //} //********************************************************************************************************************** ClassifySvmSharedCommand::ClassifySvmSharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if (option == "help") { help(); abort = true; calledHelp = true; } else if (option == "citation") { citation(); abort = true; calledHelp = true; } else { //valid parameters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found") { inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a shared file if (it != parameters.end()) { path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a design file if (it != parameters.end()) { path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["design"] = inputDir + it->second; } } } //check for parameters //get shared file, it is required sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } } else { m->setSharedFile(sharedfile); } //get design file, it is required designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { designfile = ""; abort = true; } else if (designfile == "not found") { //if there is a current design file, use it designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current designfile and the design parameter is required."); m->mothurOutEndLine(); abort = true; } } else { m->setDesignFile(designfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found") { outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } //Groups must be checked later to make sure they are valid. //SharedUtilities has functions to check the validity; just make sure to call m->setGroups() after the checks. //If you are using these with a shared file there is no need to check; the SharedRAbundVector class will call SharedUtilities for you, //kinda nice, huh? 
string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); //Commonly used to process list, rabund, sabund, shared and relabund files. //Look at "smart distancing" examples below in the execute function. string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if (label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } string modeOption = validParameter.validFile(parameters, "mode", false); if ( modeOption == "not found" || modeOption == "rfe" ) { mode = "rfe"; } else if ( modeOption == "classify" ) { mode = "classify"; } else { m->mothurOut("the mode option " + modeOption + " is not recognized -- must be 'rfe' or 'classify'"); m->mothurOutEndLine(); abort = true; } string ef = validParameter.validFile(parameters, "evaluationfolds", false); if ( ef == "not found") { evaluationFoldCount = 3; } else { m->mothurConvert(ef, evaluationFoldCount); } string tf = validParameter.validFile(parameters, "trainingfolds", false); if ( tf == "not found") { trainingFoldCount = 5; } else { m->mothurConvert(tf, trainingFoldCount); } string smocOption = validParameter.validFile(parameters, "smoc", false); smocList.clear(); if ( smocOption == "not found" ) { //smocOption = "0.001,0.01,0.1,1.0,10.0,100.0,1000.0"; } else { vector smocOptionList; //split(smocOption, ';', smocOptionList); m->splitAtDash(smocOption, smocOptionList); for (vector::iterator i = smocOptionList.begin(); i != smocOptionList.end(); i++) { smocList.push_back(atof(i->c_str())); } } // kernel specification // start with default parameter ranges for all kernels kernelParameterRangeMap.clear(); getDefaultKernelParameterRangeMap(kernelParameterRangeMap); // get the kernel option string kernelOption = validParameter.validFile(parameters, "kernel", false); // if the kernel option is "not found" then use all 
kernels with default parameter ranges // otherwise use only kernels listed in the kernelOption string if ( kernelOption == "not found" ) { } else { // if the kernel option has been specified then // remove kernel parameters from the kernel parameter map if // they are not listed in the kernel option // at this point the kernelParameterRangeMap looks like this: // linear_key : [ // smoc_key : smoc parameter range // linear_constant_key : linear constant range // ] // rbf_key : [ // smoc_key : smoc parameter range // rbf_gamma_key : rbf gamma range // ] // polynomial_key : [ // smoc_key : smoc parameter range // polynomial_degree_key : polynomial degree range // polynomial_constant_key : polynomial constant range // ] vector kernelList; vector unspecifiedKernelList; //split(kernelOption, '-', kernelList); m->splitAtDash(kernelOption, kernelList); set kernelSet(kernelList.begin(), kernelList.end()); // make a list of strings that are keys in the kernel parameter range map // but are not in the kernel list for (KernelParameterRangeMap::iterator i = kernelParameterRangeMap.begin(); i != kernelParameterRangeMap.end(); i++) { //cout << "looking for kernel " << *i << " in kernel option" << endl; //should be kernelList here string kernelKey = i->first; if ( kernelSet.find(kernelKey) == kernelSet.end() ) { unspecifiedKernelList.push_back(kernelKey); } } for (vector::iterator i = unspecifiedKernelList.begin(); i != unspecifiedKernelList.end(); i++) { m->mothurOut("removing kernel " + *i ); m->mothurOutEndLine(); kernelParameterRangeMap.erase(*i); } } // go through the kernel parameter range map and check for options for each kernel for (KernelParameterRangeMap::iterator i = kernelParameterRangeMap.begin(); i != kernelParameterRangeMap.end(); i++) { string kernelKey = i->first; ParameterRangeMap& kernelParameters = i->second; for (ParameterRangeMap::iterator j = kernelParameters.begin(); j != kernelParameters.end(); j++) { string parameterKey = j->first; ParameterRange& 
kernelParameterRange = j->second; // has an option for this kernel parameter been specified? string kernelParameterKey = kernelKey + parameterKey; //m->mothurOut("looking for option " << kernelParameterKey << endl; string kernelParameterOption = validParameter.validFile(parameters, kernelParameterKey, false); if (kernelParameterOption == "not found") { // we already have default values in the kernel parameter map } else { // replace the default parameters with the specified parameters kernelParameterRange.clear(); vector parameterList; //split(kernelParameterOption, ';', parameterList); m->splitAtDash(kernelParameterOption, parameterList); for (vector::iterator k = parameterList.begin(); k != parameterList.end(); k++) { kernelParameterRange.push_back(atof(k->c_str())); } } } } // get the normalization option string transformOption = validParameter.validFile(parameters, "transform", false); if ( transformOption == "not found" || transformOption == "unitmean") { transformName = "unitmean"; } else if ( transformOption == "zeroone" ) { transformName = "zeroone"; } else { m->mothurOut("the transform option " + transformOption + " is not recognized -- must be 'unitmean' or 'zeroone'"); m->mothurOutEndLine(); abort = true; } // get the verbosity option string verbosityOption = validParameter.validFile(parameters, "verbose", false); if ( verbosityOption == "not found") { verbosity = 0; } else { m->mothurConvert(verbosityOption, verbosity); if (verbosity < OutputFilter::QUIET || verbosity > OutputFilter::TRACE) { m->mothurOut("verbose set to unsupported value " + verbosityOption + " -- must be between 0 and 3"); } } // get the std threshold option string stdthresholdOption = validParameter.validFile(parameters, "stdthreshold", false); if ( stdthresholdOption == "not found" ) { stdthreshold = -1.0; } else { m->mothurConvert(stdthresholdOption, stdthreshold); if ( stdthreshold <= 0.0 ) { m->mothurOut("stdthreshold set to unsupported value " + stdthresholdOption + " -- must be greater than 
0.0"); } } } } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "ClassifySvmSharedCommand"); exit(1); } } //********************************************************************************************************************** int ClassifySvmSharedCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); //read design file designMap.read(designfile); string lastLabel = lookup[0]->getLabel(); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processSharedAndDesignData(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processSharedAndDesignData(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; 
for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processSharedAndDesignData(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "execute"); exit(1); } } //********************************************************************************************************************** // This static function is intended to read all the necessary information from // a pair of shared and design files needed for SVM classification. This information // is used to build a LabeledObservationVector. Each element of the LabeledObservationVector // looks like this: // LabeledObservationVector[0] = pair("label 0", &vector[10.0, 21.0, 13.0]) // where the vector in the second position of the pair records OTU abundances. 
void ClassifySvmSharedCommand::readSharedAndDesignFiles(const string& sharedFilePath, const string& designFilePath, LabeledObservationVector& labeledObservationVector, FeatureVector& featureVector) { InputData input(sharedFilePath, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); DesignMap designMap; designMap.read(designFilePath); while ( lookup[0] != NULL ) { readSharedRAbundVectors(lookup, designMap, labeledObservationVector, featureVector); lookup = input.getSharedRAbundVectors(); } } void ClassifySvmSharedCommand::readSharedRAbundVectors(vector& lookup, DesignMap& designMap, LabeledObservationVector& labeledObservationVector, FeatureVector& featureVector) { for ( int j = 0; j < lookup.size(); j++ ) { //i++; vector data = lookup[j]->getData(); Observation* observation = new Observation(data.size(), 0.0); string sharedGroupName = lookup[j]->getGroup(); string treatmentName = designMap.get(sharedGroupName); //cout << "shared group name: " << sharedGroupName << " treatment name: " << treatmentName << endl; //labeledObservationVector.push_back(make_pair(treatmentName, observation)); labeledObservationVector.push_back(LabeledObservation(j, treatmentName, observation)); //cout << " j=" << j << " label : " << lookup[j]->getLabel() << " group: " << lookup[j]->getGroup(); for (int k = 0; k < data.size(); k++) { //cout << " abundance " << data[k].abundance; observation->at(k) = double(data[k].abundance); if ( j == 0) { featureVector.push_back(Feature(k, m->currentSharedBinLabels[k])); } } //cout << endl; // let this happen later? 
//delete lookup[j]; } } void printPerformanceSummary(MultiClassSVM* s, ostream& output) { output << "multiclass SVM accuracy: " << s->getAccuracy() << endl; output << "two-class SVM performance" << endl; int labelFieldWidth = 2 + max_element(s->getLabels().begin(), s->getLabels().end())->size(); int performanceFieldWidth = 10; int performancePrecision = 3; output << setw(labelFieldWidth) << "class 1" << setw(labelFieldWidth) << "class 2" << setw(performanceFieldWidth) << "precision" << setw(performanceFieldWidth) << "recall" << setw(performanceFieldWidth) << "f" << setw(performanceFieldWidth) << "accuracy" << endl; for ( SvmVector::const_iterator svm = s->getSvmList().begin(); svm != s->getSvmList().end(); svm++ ) { SvmPerformanceSummary sps = s->getSvmPerformanceSummary(**svm); output << setw(labelFieldWidth) << setprecision(performancePrecision) << sps.getPositiveClassLabel() << setw(labelFieldWidth) << setprecision(performancePrecision) << sps.getNegativeClassLabel() << setw(performanceFieldWidth) << setprecision(performancePrecision) << sps.getPrecision() << setw(performanceFieldWidth) << setprecision(performancePrecision) << sps.getRecall() << setw(performanceFieldWidth) << setprecision(performancePrecision) << sps.getF() << setw(performanceFieldWidth) << setprecision(performancePrecision) << sps.getAccuracy() << endl; } } //********************************************************************************************************************** void ClassifySvmSharedCommand::processSharedAndDesignData(vector lookup) { try { OutputFilter outputFilter(verbosity); LabeledObservationVector labeledObservationVector; FeatureVector featureVector; readSharedRAbundVectors(lookup, designMap, labeledObservationVector, featureVector); // optionally remove features with low standard deviation if ( stdthreshold > 0.0 ) { FeatureVector removedFeatureVector = applyStdThreshold(stdthreshold, labeledObservationVector, featureVector); if (removedFeatureVector.size() > 0) { 
m->mothurOut(toString(removedFeatureVector.size()) + " OTUs were below the stdthreshold of " + toString(stdthreshold) + " and were removed"); m->mothurOutEndLine(); if ( outputFilter.debug() ) { m->mothurOut("the following OTUs were below the standard deviation threshold of " + toString(stdthreshold) ); m->mothurOutEndLine(); for (FeatureVector::iterator i = removedFeatureVector.begin(); i != removedFeatureVector.end(); i++) { m->mothurOut(" " + toString(i->getFeatureLabel()) ); m->mothurOutEndLine(); } } } } // apply [0,1] standardization if ( transformName == "zeroone") { m->mothurOut("transforming data to lie within range [0,1]"); m->mothurOutEndLine(); transformZeroOne(labeledObservationVector); } else { m->mothurOut("transforming data to have zero mean and unit variance"); m->mothurOutEndLine(); transformZeroMeanUnitVariance(labeledObservationVector); } SvmDataset svmDataset(labeledObservationVector, featureVector); OneVsOneMultiClassSvmTrainer trainer(svmDataset, evaluationFoldCount, trainingFoldCount, outputFilter); if ( mode == "rfe" ) { SvmRfe svmRfe; ParameterRange& linearKernelConstantRange = kernelParameterRangeMap["linear"]["constant"]; ParameterRange& linearKernelSmoCRange = kernelParameterRangeMap["linear"]["smoc"]; RankedFeatureList rankedFeatureList = svmRfe.getOrderedFeatureList(svmDataset, trainer, linearKernelConstantRange, linearKernelSmoCRange); map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = lookup[0]->getLabel(); string filename = getOutputFileName("summary", variables); outputNames.push_back(filename); outputTypes["summary"].push_back(filename); m->mothurOutEndLine(); ofstream outputFile(filename.c_str()); int n = 0; int rfeRoundCount = rankedFeatureList.front().getRank(); m->mothurOut("ordered features:" ); m->mothurOutEndLine(); m->mothurOut("index\tOTU\trank"); m->mothurOutEndLine(); outputFile << setw(5) << "index" << setw(12) << "OTU" << setw(5) << "rank" << 
endl; for (RankedFeatureList::iterator i = rankedFeatureList.begin(); i != rankedFeatureList.end(); i++) { n++; int rank = rfeRoundCount - i->getRank() + 1; outputFile << setw(5) << n << setw(12) << i->getFeature().getFeatureLabel() << setw(5) << rank << endl; if ( n <= 20 ) { m->mothurOut(toString(n) + "\t" + toString(i->getFeature().getFeatureLabel()) + "\t" + toString(rank) ); m->mothurOutEndLine(); } } outputFile.close(); } else { MultiClassSVM* mcsvm = trainer.train(kernelParameterRangeMap); map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = lookup[0]->getLabel(); string filename = getOutputFileName("summary", variables); outputNames.push_back(filename); outputTypes["summary"].push_back(filename); m->mothurOutEndLine(); ofstream outputFile(filename.c_str()); printPerformanceSummary(mcsvm, cout); printPerformanceSummary(mcsvm, outputFile); outputFile << "actual predicted" << endl; for ( LabeledObservationVector::const_iterator i = labeledObservationVector.begin(); i != labeledObservationVector.end(); i++ ) { Label actualLabel = i->getLabel(); outputFile << i->getDatasetIndex() << " " << actualLabel << " "; try { Label predictedLabel = mcsvm->classify(*(i->getObservation())); outputFile << predictedLabel << endl; } catch ( MultiClassSvmClassificationTie& e ) { outputFile << "tie" << endl; m->mothurOut("classification tie for observation " + toString(i->getDatasetIndex()) + " with label " + toString(actualLabel)); m->mothurOutEndLine(); } } outputFile.close(); delete mcsvm; } } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "processSharedAndDesignData"); exit(1); } } //********************************************************************************************************************** void ClassifySvmSharedCommand::trainSharedAndDesignData(vector<SharedRAbundVector*> lookup) { try { LabeledObservationVector labeledObservationVector; FeatureVector featureVector; readSharedRAbundVectors(lookup, 
designMap, labeledObservationVector, featureVector); SvmDataset svmDataset(labeledObservationVector, featureVector); int evaluationFoldCount = 3; int trainFoldCount = 5; OutputFilter outputFilter(2); OneVsOneMultiClassSvmTrainer t(svmDataset, evaluationFoldCount, trainFoldCount, outputFilter); KernelParameterRangeMap kernelParameterRangeMap; getDefaultKernelParameterRangeMap(kernelParameterRangeMap); MultiClassSVM* mcsvm = t.train(kernelParameterRangeMap); delete mcsvm; //the trained model is not used here, so free it to avoid a leak m->mothurOut("done training" ); m->mothurOutEndLine(); map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = lookup[0]->getLabel(); string filename = getOutputFileName("summary", variables); outputNames.push_back(filename); outputTypes["summary"].push_back(filename); m->mothurOutEndLine(); m->mothurOut("leaving trainSharedAndDesignData" ); m->mothurOutEndLine(); } catch (exception& e) { m->errorOut(e, "ClassifySvmSharedCommand", "trainSharedAndDesignData"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/classifysvmsharedcommand.h000066400000000000000000000063461255543666200235120ustar00rootroot00000000000000// // classifysvmsharedcommand.h // Mothur // // Created by Joshua Lynch on 6/28/2013. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// // This class is based on ClassifySharedCommand // #ifndef __Mothur__classifysvmsharedcommand__ #define __Mothur__classifysvmsharedcommand__ #include "command.hpp" #include "inputdata.h" #include "svm.hpp" #include "designmap.h" class ClassifySvmSharedCommand : public Command { public: ClassifySvmSharedCommand(); ClassifySvmSharedCommand(string); //~ClassifySvmSharedCommand() throw() {}; ~ClassifySvmSharedCommand() {}; vector setParameters(); string getCommandName() { return "classifysvm.shared"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/ClassifySvm.shared\n"; } string getDescription() { return "implements the support vector machine machine learning algorithm to identify OTUs that can be used to differentiate between various groups of samples"; } int execute(); void help() { m->mothurOut(getHelpString()); } void readSharedAndDesignFiles(const string&, const string&, LabeledObservationVector&, FeatureVector&); void readSharedRAbundVectors(vector&, DesignMap&, LabeledObservationVector&, FeatureVector&); //bool interruptTraining() { return m->control_pressed; } vector& getSmocList() { return smocList; } const KernelParameterRangeMap& getKernelParameterRangeMap() { return kernelParameterRangeMap; } //bool interruptTraining() { return m->control_pressed; } //std::vector& getSmocList() { return smocList; } //const KernelParameterRangeMap& getKernelParameterRangeMap() { return kernelParameterRangeMap; } private: bool abort; string outputDir; vector outputNames, Groups; string sharedfile, designfile; set labels; bool allLines; int processors; bool useTiming; DesignMap designMap; //void readSharedAndDesignFiles(const std::string&, const std::string&, LabeledObservationVector&, FeatureVector&); //void readSharedRAbundVectors(vector&, GroupMap&, LabeledObservationVector&, FeatureVector&); // mode is either "rfe" or "classify" string 
mode; int evaluationFoldCount; int trainingFoldCount; vector smocList; KernelParameterRangeMap kernelParameterRangeMap; string transformName; int verbosity; double stdthreshold; //int numDecisionTrees; //string treeSplitCriterion, optimumFeatureSubsetSelectionCriteria; //bool doPruning, discardHighErrorTrees; //double pruneAggressiveness, highErrorTreeDiscardThreshold, featureStandardDeviationThreshold; void processSharedAndDesignData(vector lookup); void trainSharedAndDesignData(vector lookup); void getParameterValue(int& target, string pstring, int defaultvalue) { if (pstring == "not found" or pstring == "") { target = defaultvalue; } else { m->mothurConvert(pstring, target); } } }; #endif /* defined(__Mothur__classifysvmsharedcommand__) */ mothur-1.36.1/source/commands/classifytreecommand.cpp000066400000000000000000000546671255543666200230210ustar00rootroot00000000000000// // classifytreecommand.cpp // Mothur // // Created by Sarah Westcott on 2/20/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "classifytreecommand.h" #include "phylotree.h" #include "treereader.h" //********************************************************************************************************************** vector ClassifyTreeCommand::setParameters(){ try { CommandParameter ptree("tree", "InputTypes", "", "", "", "", "none","tree-summary",false,true,true); parameters.push_back(ptree); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "", "", "none","",false,true,true); parameters.push_back(ptaxonomy); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pmethod("output", "Multiple", "node-taxon", "node", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter pcutoff("cutoff", "Number", "", "51", "", "", "","",false,true); parameters.push_back(pcutoff); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClassifyTreeCommand::getHelpString(){ try { string helpString = ""; helpString += "The classify.tree command reads a tree and taxonomy file and 
outputs the consensus taxonomy for each node on the tree. \n"; helpString += "If you provide a group file, the consensus for each group will also be provided. \n"; helpString += "The new tree contains labels at each internal node. The label is the node number so you can relate the tree to the summary file.\n"; helpString += "The count parameter allows you to add a count file so you can have the summary totals broken up by group.\n"; helpString += "The summary file lists the consensus taxonomy for the descendants of each node.\n"; helpString += "The classify.tree command parameters are tree, group, name, count, taxonomy, cutoff and output. The tree and taxonomy files are required.\n"; helpString += "The cutoff parameter allows you to specify a consensus confidence threshold for your taxonomy. The default is 51, meaning 51%. Cutoff cannot be below 51.\n"; helpString += "The output parameter allows you to specify whether you want the tree node number displayed on the tree, or the taxonomy displayed. Default=node. Options are node or taxon.\n"; helpString += "The classify.tree command should be used in the following format: classify.tree(tree=test.tre, group=test.group, taxonomy=test.taxonomy)\n"; helpString += "Note: No spaces between parameter labels (i.e. 
tree), '=' and parameters (i.e. yourTreefile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClassifyTreeCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],taxonomy.summary"; } //makes file like: amazon.taxonomy.summary else if (type == "tree") { pattern = "[filename],taxonomy.tre"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClassifyTreeCommand::ClassifyTreeCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["tree"] = tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "ClassifyTreeCommand"); exit(1); } } //********************************************************************************************************************** ClassifyTreeCommand::ClassifyTreeCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string, string> parameters = parser.getParameters(); ValidParameters validParameter; map<string, string>::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector<string> tempOutNames; outputTypes["tree"] = 
tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { treefile = ""; abort = true; } else if (treefile == "not found") { treefile = ""; treefile = m->getTreeFile(); if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a tree file."); m->mothurOutEndLine(); abort = true; } }else { m->setTreeFile(treefile); } taxonomyfile = validParameter.validFile(parameters, "taxonomy", true); if (taxonomyfile == "not open") { taxonomyfile = ""; abort = true; } else if (taxonomyfile == "not found") { taxonomyfile = ""; taxonomyfile = m->getTaxonomyFile(); if (taxonomyfile != "") { m->mothurOut("Using " + taxonomyfile + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a taxonomy file."); m->mothurOutEndLine(); abort = true; } }else { m->setTaxonomyFile(taxonomyfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } string temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "51"; } m->mothurConvert(temp, cutoff); if ((cutoff < 51) || (cutoff > 100)) { m->mothurOut("cutoff must be above 50, and no greater than 100."); m->mothurOutEndLine(); abort = true; } output = validParameter.validFile(parameters, "output", false); if (output == "not found") { output = "node"; } if ((output == "node") || (output == "taxon")) { }else { m->mothurOut("[ERROR]: " + output + " is not a valid output option. 
Valid output options are node or taxon.\n"); abort = true; } if (countfile == "") { if (namefile == "") { vector<string> files; files.push_back(treefile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "ClassifyTreeCommand"); exit(1); } } //********************************************************************************************************************** int ClassifyTreeCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); int start = time(NULL); /***************************************************/ // reading tree info // /***************************************************/ m->setTreeFile(treefile); TreeReader* reader = new TreeReader(treefile, groupfile, namefile); vector<Tree*> T = reader->getTrees(); CountTable* tmap = T[0]->getCountTable(); Tree* outputTree = T[0]; delete reader; if (namefile != "") { m->readNames(namefile, nameMap, nameCount); } if (m->control_pressed) { delete tmap; delete outputTree; return 0; } m->readTax(taxonomyfile, taxMap, true); /***************************************************/ // get consensus taxonomies // /***************************************************/ getClassifications(outputTree); delete outputTree; delete tmap; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set tree file as new current treefile if (treefile != "") { string current = ""; itTypes = outputTypes.find("tree"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTreeFile(current); } } } m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to find the consensus taxonomies."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { 
m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "execute"); exit(1); } } //********************************************************************************************************************** //traverse tree finding consensus taxonomy at each node //label node with a number to relate to output summary file //report all consensus taxonomies to file int ClassifyTreeCommand::getClassifications(Tree*& T){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(treefile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(treefile)); string outputFileName = getOutputFileName("summary", variables); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); //print headings out << "TreeNode\t"; if (groupfile != "") { out << "Group\t"; } out << "NumRep\tTaxonomy" << endl; string treeOutputDir = outputDir; if (outputDir == "") { treeOutputDir += m->hasPath(treefile); } variables["[filename]"] = treeOutputDir + m->getRootName(m->getSimpleName(treefile)); string outputTreeFileName = getOutputFileName("tree", variables); //create a map from tree node index to names of descendants, save time later map<int, map<string, set<string> > > nodeToDescendants; //node# -> (groupName -> groupMembers) for (int i = 0; i < T->getNumNodes(); i++) { if (m->control_pressed) { return 0; } nodeToDescendants[i] = getDescendantList(T, i, nodeToDescendants); } //for each node for (int i = T->getNumLeaves(); i < T->getNumNodes(); i++) { if (m->control_pressed) { out.close(); return 0; } string tax = "not classified"; int size; if (groupfile != "") { for (map<string, set<string> >::iterator itGroups = nodeToDescendants[i].begin(); itGroups != nodeToDescendants[i].end(); itGroups++) { if (itGroups->first != "AllGroups") { tax = 
getTaxonomy(itGroups->second, size); out << (i+1) << '\t' << itGroups->first << '\t' << size << '\t' << tax << endl; } } }else { string group = "AllGroups"; tax = getTaxonomy(nodeToDescendants[i][group], size); out << (i+1) << '\t' << size << '\t' << tax << endl; } if (output == "node") { T->tree[i].setLabel(toString(i+1)); } else { string cleanedTax = tax; m->removeConfidences(cleanedTax); for (int j = 0; j < cleanedTax.length(); j++) { //special chars to trees - , ) ( ; [ ] : if ((cleanedTax[j] == ',') || (cleanedTax[j] == '(') || (cleanedTax[j] == ')') || (cleanedTax[j] == ';') || (cleanedTax[j] == ':') || (cleanedTax[j] == ']') || (cleanedTax[j] == '[')) { cleanedTax[j] = '_'; //change any special chars to _ so the tree can be read by tree readers } } cout << tax << '\t' << cleanedTax << endl; T->tree[i].setLabel(cleanedTax); } } out.close(); ofstream outTree; m->openOutputFile(outputTreeFileName, outTree); outputNames.push_back(outputTreeFileName); outputTypes["tree"].push_back(outputTreeFileName); T->print(outTree, "both"); outTree.close(); return 0; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "getClassifications"); exit(1); } } //********************************************************************************************************************** string ClassifyTreeCommand::getTaxonomy(set<string> names, int& size) { try{ string conTax = ""; size = 0; //create a tree containing sequences from this bin PhyloTree* phylo = new PhyloTree(); for (set<string>::iterator it = names.begin(); it != names.end(); it++) { //if namesfile include the names if (namefile != "") { //is this sequence in the name file - namemap maps seqName -> repSeqName map<string, string>::iterator it2 = nameMap.find(*it); if (it2 == nameMap.end()) { //this name is not in name file, skip it m->mothurOut((*it) + " is not in your name file. 
I will not include it in the consensus."); m->mothurOutEndLine(); }else{ //is this sequence in the taxonomy file - look for repSeqName since we are assuming the taxonomy file is unique map::iterator itTax = taxMap.find((it2->second)); if (itTax == taxMap.end()) { //this name is not in taxonomy file, skip it if ((*it) != (it2->second)) { m->mothurOut((*it) + " is represented by " + it2->second + " and is not in your taxonomy file. I will not include it in the consensus."); m->mothurOutEndLine(); } else { m->mothurOut((*it) + " is not in your taxonomy file. I will not include it in the consensus."); m->mothurOutEndLine(); } }else{ //add seq to tree int num = nameCount[(*it)]; // we know its there since we found it in nameMap for (int i = 0; i < num; i++) { phylo->addSeqToTree((*it)+toString(i), itTax->second); } size += num; } } }else{ //is this sequence in the taxonomy file - look for repSeqName since we are assuming the taxonomy file is unique map::iterator itTax = taxMap.find((*it)); if (itTax == taxMap.end()) { //this name is not in taxonomy file, skip it m->mothurOut((*it) + " is not in your taxonomy file. 
I will not include it in the consensus."); m->mothurOutEndLine(); }else{ if (countfile != "") { int numDups = ct->getNumSeqs((*it)); for (int j = 0; j < numDups; j++) { phylo->addSeqToTree((*it), itTax->second); } size += numDups; }else{ //add seq to tree phylo->addSeqToTree((*it), itTax->second); size++; } } } if (m->control_pressed) { delete phylo; return conTax; } } //build tree phylo->assignHeirarchyIDs(0); TaxNode currentNode = phylo->get(0); int myLevel = 0; //at each level while (currentNode.children.size() != 0) { //you still have more to explore TaxNode bestChild; int bestChildSize = 0; //go through children for (map::iterator itChild = currentNode.children.begin(); itChild != currentNode.children.end(); itChild++) { TaxNode temp = phylo->get(itChild->second); //select child with largest accesions - most seqs assigned to it if (temp.accessions.size() > bestChildSize) { bestChild = phylo->get(itChild->second); bestChildSize = temp.accessions.size(); } } //is this taxonomy above cutoff int consensusConfidence = ceil((bestChildSize / (float) size) * 100); if (consensusConfidence >= cutoff) { //if yes, add it conTax += bestChild.name + "(" + toString(consensusConfidence) + ");"; myLevel++; }else{ //if no, quit break; } //move down a level currentNode = bestChild; } if (myLevel != phylo->getMaxLevel()) { while (myLevel != phylo->getMaxLevel()) { conTax += "unclassified;"; myLevel++; } } if (conTax == "") { conTax = "no_consensus;"; } delete phylo; return conTax; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "getTaxonomy"); exit(1); } } //********************************************************************************************************************** map > ClassifyTreeCommand::getDescendantList(Tree*& T, int i, map > > descendants){ try { map > names; map >::iterator it; map >::iterator it2; int lc = T->tree[i].getLChild(); int rc = T->tree[i].getRChild(); // TreeMap* tmap = T->getTreeMap(); if (lc == -1) { //you are a leaf your only 
descendant is yourself vector groups = T->tree[i].getGroup(); set mynames; mynames.insert(T->tree[i].getName()); for (int j = 0; j < groups.size(); j++) { names[groups[j]] = mynames; } //mygroup -> me names["AllGroups"] = mynames; }else{ //your descedants are the combination of your childrens descendants names = descendants[lc]; for (it = descendants[rc].begin(); it != descendants[rc].end(); it++) { it2 = names.find(it->first); //do we already have this group if (it2 == names.end()) { //nope, so add it names[it->first] = it->second; }else { for (set::iterator it3 = (it->second).begin(); it3 != (it->second).end(); it3++) { names[it->first].insert(*it3); } } } } return names; } catch(exception& e) { m->errorOut(e, "ClassifyTreeCommand", "getDescendantList"); exit(1); } } /*****************************************************************/ mothur-1.36.1/source/commands/classifytreecommand.h000066400000000000000000000025751255543666200224550ustar00rootroot00000000000000#ifndef Mothur_classifytreecommand_h #define Mothur_classifytreecommand_h // // classifytreecommand.h // Mothur // // Created by Sarah Westcott on 2/20/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "command.hpp" #include "readtree.h" #include "treemap.h" #include "counttable.h" class ClassifyTreeCommand : public Command { public: ClassifyTreeCommand(string); ClassifyTreeCommand(); ~ClassifyTreeCommand(){} vector<string> setParameters(); string getCommandName() { return "classify.tree"; } string getCommandCategory() { return "Phylotype Analysis"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Classify.tree"; } string getDescription() { return "Find the consensus taxonomy for the descendants of each tree node"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string treefile, taxonomyfile, groupfile, namefile, countfile, outputDir, output; bool abort; vector<string> outputNames; int numUniquesInName, cutoff; map<string, string> nameMap; map<string, int> nameCount; map<string, string> taxMap; CountTable* ct; int getClassifications(Tree*&); map<string, set<string> > getDescendantList(Tree*&, int, map<int, map<string, set<string> > >); string getTaxonomy(set<string>, int&); }; #endif mothur-1.36.1/source/commands/clearcutcommand.cpp000066400000000000000000000471011255543666200221070ustar00rootroot00000000000000/* * clearcutcommand.cpp * Mothur * * Created by westcott on 5/11/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "clearcutcommand.h" #ifdef __cplusplus extern "C" { #endif #include "clearcut.h" #ifdef __cplusplus } #endif //********************************************************************************************************************** vector ClearcutCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "FastaPhylip", "FastaPhylip", "none","tree",false,false,true); parameters.push_back(pphylip); CommandParameter pfasta("fasta", "InputTypes", "", "", "FastaPhylip", "FastaPhylip", "none","tree",false,false,true); parameters.push_back(pfasta); CommandParameter pverbose("verbose", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pverbose); CommandParameter pquiet("quiet", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pquiet); CommandParameter pversion("version", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pversion); CommandParameter prseed("rseed", "String", "", "", "*", "", "","",false,false); parameters.push_back(prseed); CommandParameter pnorandom("norandom", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pnorandom); CommandParameter pshuffle("shuffle", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pshuffle); CommandParameter pneighbor("neighbor", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pneighbor); CommandParameter pexpblen("expblen", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pexpblen); CommandParameter pexpdist("expdist", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pexpdist); CommandParameter pDNA("DNA", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pDNA); CommandParameter pprotein("protein", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pprotein); CommandParameter pjukes("jukes", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pjukes); CommandParameter 
pkimura("kimura", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pkimura); CommandParameter pstdout("stdout", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pstdout); CommandParameter pntrees("ntrees", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pntrees); CommandParameter pmatrixout("matrixout", "String", "", "", "", "", "","",false,false); parameters.push_back(pmatrixout); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClearcutCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClearcutCommand::getHelpString(){ try { string helpString = ""; helpString += "The clearcut command interfaces mothur with the clearcut program written by Initiative for Bioinformatics and Evolutionary Studies (IBEST) at the University of Idaho.\n"; helpString += "For more information about clearcut refer to http://bioinformatics.hungry.com/clearcut/ \n"; helpString += "The clearcut command parameters are phylip, fasta, version, verbose, quiet, seed, norandom, shuffle, neighbor, expblen, expdist, ntrees, matrixout, stdout, kimura, jukes, protein, DNA. \n"; helpString += "The phylip parameter allows you to enter your phylip formatted distance matrix. \n"; helpString += "The fasta parameter allows you to enter your aligned fasta file, if you enter a fastafile you specify if the sequences are DNA or protein using the DNA or protein parameters. 
\n"; helpString += "The version parameter prints out the version of clearcut you are using, default=F. \n"; helpString += "The verbose parameter prints out more output from clearcut, default=F. \n"; helpString += "The quiet parameter turns on silent operation mode, default=F. \n"; helpString += "The rseed parameter allows you to explicitly set the PRNG seed to a specific value. \n"; helpString += "The norandom parameter allows you to attempt joins deterministically, default=F. \n"; helpString += "The shuffle parameter allows you to randomly shuffle the distance matrix, default=F. \n"; helpString += "The neighbor parameter allows you to use traditional Neighbor-Joining algorithm, default=T. \n"; helpString += "The DNA parameter allows you to indicate your fasta file contains DNA sequences, default=F. \n"; helpString += "The protein parameter allows you to indicate your fasta file contains protein sequences, default=F. \n"; helpString += "The stdout parameter outputs your tree to STDOUT, default=F. \n"; helpString += "The matrixout parameter allows you to specify a filename to output a distance matrix to. \n"; helpString += "The ntrees parameter allows you to specify the number of output trees, default=1. \n"; helpString += "The expblen parameter allows you to use exponential notation for branch lengths, default=F. \n"; helpString += "The expdist parameter allows you to use exponential notation for distance outputs, default=F. 
\n"; helpString += "The clearcut command should be in the following format: \n"; helpString += "clearcut(phylip=yourDistanceFile) \n"; helpString += "Example: clearcut(phylip=abrecovery.phylip.dist) \n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClearcutCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClearcutCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "matrixout") { pattern = "[filename],"; } else if (type == "tree") { pattern = "[filename],tre"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClearcutCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClearcutCommand::ClearcutCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["tree"] = tempOutNames; outputTypes["matrixout"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClearcutCommand", "ClearcutCommand"); exit(1); } } /**************************************************************************************/ ClearcutCommand::ClearcutCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize 
outputTypes vector tempOutNames; outputTypes["tree"] = tempOutNames; outputTypes["matrixout"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { inputFile = fastafile; m->setFastaFile(fastafile); } phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { inputFile = phylipfile; m->setPhylipFile(phylipfile); } if ((phylipfile == "") && (fastafile == "")) { //is there are current file available for either of these? //give priority to phylip, then fasta phylipfile = m->getPhylipFile(); if (phylipfile != "") { inputFile = phylipfile; m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { fastafile = m->getFastaFile(); if (fastafile != "") { inputFile = fastafile; m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a phylip or fasta file before you can use the clearcut command."); m->mothurOutEndLine(); abort = true; } } } if ((phylipfile != "") && (fastafile != "")) { m->mothurOut("You must provide either a phylip formatted distance matrix or an aligned fasta file, not BOTH."); m->mothurOutEndLine(); abort=true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputFile); } string temp; temp = validParameter.validFile(parameters, "version", false); if (temp == "not found"){ temp = "F"; } version = m->isTrue(temp); temp = validParameter.validFile(parameters, "verbose", false); if (temp == "not found"){ temp = "F"; } verbose = m->isTrue(temp); temp = validParameter.validFile(parameters, "quiet", false); if (temp == "not found"){ temp = "F"; } quiet = m->isTrue(temp); seed = validParameter.validFile(parameters, "rseed", false); if (seed == "not found"){ seed = "*"; } temp = validParameter.validFile(parameters, "norandom", false); if (temp == "not found"){ temp = "F"; } norandom = m->isTrue(temp); temp = validParameter.validFile(parameters, "shuffle", false); if (temp == "not found"){ temp = "F"; } shuffle = m->isTrue(temp); temp = validParameter.validFile(parameters, "neighbor", false); if (temp == "not found"){ temp = "T"; } neighbor = m->isTrue(temp); temp = validParameter.validFile(parameters, "DNA", false); if (temp == "not found"){ temp = "F"; } DNA = m->isTrue(temp); temp = validParameter.validFile(parameters, "protein", false); if (temp == "not found"){ temp = "F"; } protein = m->isTrue(temp); temp = validParameter.validFile(parameters, "jukes", false); if (temp == "not found"){ temp = "F"; } jukes = m->isTrue(temp); temp = validParameter.validFile(parameters, "kimura", false); if (temp == "not found"){ temp = "F"; } kimura = m->isTrue(temp); temp = 
validParameter.validFile(parameters, "stdout", false);	if (temp == "not found"){ temp = "F"; }
			stdoutWanted = m->isTrue(temp);

			matrixout = validParameter.validFile(parameters, "matrixout", false);	if (matrixout == "not found"){ matrixout = ""; }

			ntrees = validParameter.validFile(parameters, "ntrees", false);	if (ntrees == "not found"){ ntrees = "1"; }

			temp = validParameter.validFile(parameters, "expblen", false);	if (temp == "not found"){ temp = "F"; }
			expblen = m->isTrue(temp);

			temp = validParameter.validFile(parameters, "expdist", false);	if (temp == "not found"){ temp = "F"; }
			expdist = m->isTrue(temp);

			if ((fastafile != "") && ((!DNA) && (!protein))) { m->mothurOut("You must specify the type of sequences you are using: DNA or protein"); m->mothurOutEndLine(); abort=true; }
		}
	}
	catch(exception& e) {
		m->errorOut(e, "ClearcutCommand", "ClearcutCommand");
		exit(1);
	}
}
/**************************************************************************************/
int ClearcutCommand::execute() {
	try {

		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		//prepare filename
		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFile));
		string outputName = getOutputFileName("tree", variables);
		outputNames.push_back(outputName); outputTypes["tree"].push_back(outputName);

		//always passed: clearcut, --distance or --alignment, --in, and exactly one of --out or --stdout
		//stdoutWanted replaces --out rather than adding an argument, so it must not increase numArgs
		int numArgs = 4;
		if (version)			{ numArgs++; }
		if (verbose)			{ numArgs++; }
		if (quiet)				{ numArgs++; }
		if (seed != "*")		{ numArgs++; }
		if (norandom)			{ numArgs++; }
		if (shuffle)			{ numArgs++; }
		if (neighbor)			{ numArgs++; }
		if (DNA)				{ numArgs++; }
		if (protein)			{ numArgs++; }
		if (jukes)				{ numArgs++; }
		if (kimura)				{ numArgs++; }
		if (matrixout != "")	{ numArgs++; }
		if (ntrees != "1")		{ numArgs++; }
		if (expblen)			{ numArgs++; }
		if (expdist)			{ numArgs++; }

		char** clearcutParameters;
		clearcutParameters = new char*[numArgs];
		clearcutParameters[0] = new char[9];
		*clearcutParameters[0] = '\0';
strncat(clearcutParameters[0], "clearcut", 8); //you gave us a distance matrix if (phylipfile != "") { clearcutParameters[1] = new char[11]; *clearcutParameters[1] = '\0'; strncat(clearcutParameters[1], "--distance", 10); } //you gave us a fastafile if (fastafile != "") { clearcutParameters[1] = new char[12]; *clearcutParameters[1] = '\0'; strncat(clearcutParameters[1], "--alignment", 11); } int parameterCount = 2; if (version) { clearcutParameters[parameterCount] = new char[10]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--version", 9); parameterCount++; } if (verbose) { clearcutParameters[parameterCount] = new char[10]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--verbose", 9); parameterCount++; } if (quiet) { clearcutParameters[parameterCount] = new char[8]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--quiet", 7); parameterCount++; } if (seed != "*") { string tempSeed = "--seed=" + seed; clearcutParameters[parameterCount] = new char[tempSeed.length()+1]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], tempSeed.c_str(), tempSeed.length()); parameterCount++; } if (norandom) { clearcutParameters[parameterCount] = new char[11]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--norandom", 10); parameterCount++; } if (shuffle) { clearcutParameters[parameterCount] = new char[10]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--shuffle", 9); parameterCount++; } if (neighbor) { clearcutParameters[parameterCount] = new char[11]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--neighbor", 10); parameterCount++; } string tempIn = "--in=" + inputFile; clearcutParameters[parameterCount] = new char[tempIn.length()+1]; *clearcutParameters[parameterCount] = '\0'; 
strncat(clearcutParameters[parameterCount], tempIn.c_str(), tempIn.length()); parameterCount++; if (stdoutWanted) { clearcutParameters[parameterCount] = new char[9]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--stdout", 8); parameterCount++; } else{ string tempOut = "--out=" + outputName; clearcutParameters[parameterCount] = new char[tempOut.length()+1]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], tempOut.c_str(), tempOut.length()); parameterCount++; } if (DNA) { clearcutParameters[parameterCount] = new char[6]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--DNA", 5); parameterCount++; } if (protein) { clearcutParameters[parameterCount] = new char[10]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--protein", 9); parameterCount++; } if (jukes) { clearcutParameters[parameterCount] = new char[8]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--jukes", 7); parameterCount++; } if (kimura) { clearcutParameters[parameterCount] = new char[9]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--kimura", 8); parameterCount++; } if (matrixout != "") { string tempMatrix = "--matrixout=" + outputDir + matrixout; clearcutParameters[parameterCount] = new char[tempMatrix.length()+1]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], tempMatrix.c_str(), tempMatrix.length()); parameterCount++; outputNames.push_back((outputDir + matrixout)); outputTypes["matrixout"].push_back((outputDir + matrixout)); } if (ntrees != "1") { string tempNtrees = "--ntrees=" + ntrees; clearcutParameters[parameterCount] = new char[tempNtrees.length()+1]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], tempNtrees.c_str(), tempNtrees.length()); parameterCount++; } if 
(expblen) { clearcutParameters[parameterCount] = new char[10]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--expblen", 9); parameterCount++; } if (expdist) { clearcutParameters[parameterCount] = new char[10]; *clearcutParameters[parameterCount] = '\0'; strncat(clearcutParameters[parameterCount], "--expdist", 9); parameterCount++; } errno = 0; clearcut_main(numArgs, clearcutParameters); //free memory for(int i = 0; i < numArgs; i++) { delete[] clearcutParameters[i]; } delete[] clearcutParameters; if (!stdoutWanted) { //set first tree file as new current treefile string currentTree = ""; itTypes = outputTypes.find("tree"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { currentTree = (itTypes->second)[0]; m->setTreeFile(currentTree); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); } return 0; } catch(exception& e) { m->errorOut(e, "ClearcutCommand", "execute"); exit(1); } } /**************************************************************************************/ mothur-1.36.1/source/commands/clearcutcommand.h000066400000000000000000000027401255543666200215540ustar00rootroot00000000000000#ifndef CLEARCUTCOMMAND_H #define CLEARCUTCOMMAND_H /* * clearcutcommand.h * Mothur * * Created by westcott on 5/11/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" /* Evans, J., L. Sheneman, and J.A. Foster (2006) Relaxed Neighbor-Joining: A Fast Distance-Based Phylogenetic Tree Construction Method, J. Mol. 
Evol., 62, 785-792 */ /****************************************************************************/ class ClearcutCommand : public Command { public: ClearcutCommand(string); ClearcutCommand(); ~ClearcutCommand() {} vector setParameters(); string getCommandName() { return "clearcut"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Sheneman L, Evans J, Foster JA (2006). Clearcut: a fast implementation of relaxed neighbor joining. Bioinformatics 22: 2823-4. \nhttp://www.mothur.org/wiki/Clearcut"; } string getDescription() { return "create a tree from a fasta or phylip file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string outputDir, phylipfile, fastafile, matrixout, inputFile, seed, ntrees; bool version, verbose, quiet, norandom, shuffle, neighbor, expblen, expdist, stdoutWanted, kimura, jukes, protein, DNA; bool abort; vector outputNames; }; /****************************************************************************/ #endif mothur-1.36.1/source/commands/clearmemorycommand.cpp000066400000000000000000000040301255543666200226160ustar00rootroot00000000000000/* * clearmemorycommand.cpp * Mothur * * Created by westcott on 7/6/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "clearmemorycommand.h" #include "referencedb.h" //********************************************************************************************************************** vector ClearMemoryCommand::setParameters(){ try { vector myArray; return myArray; } catch(exception& e) { m->errorOut(e, "ClearMemoryCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClearMemoryCommand::getHelpString(){ try { string helpString = ""; helpString += "The clear.memory command removes saved reference data from memory.\n"; helpString += "The clear.memory command should be in the following format: clear.memory().\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClearMemoryCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** ClearMemoryCommand::ClearMemoryCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} } catch(exception& e) { m->errorOut(e, "ClearMemoryCommand", "ClearMemoryCommand"); exit(1); } } //********************************************************************************************************************** int ClearMemoryCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } ReferenceDB* rdb = ReferenceDB::getInstance(); rdb->clearMemory(); return 0; } catch(exception& e) { m->errorOut(e, "ClearMemoryCommand", "execute"); exit(1); } } //**********************************************************************************************************************/ mothur-1.36.1/source/commands/clearmemorycommand.h000066400000000000000000000016011255543666200222640ustar00rootroot00000000000000#ifndef CLEARMEMORYCOMMAND_H 
#define CLEARMEMORYCOMMAND_H

/*
 *  clearmemorycommand.h
 *  Mothur
 *
 *  Created by westcott on 7/6/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"

class ClearMemoryCommand : public Command {
public:
	ClearMemoryCommand(string);
	ClearMemoryCommand(){ abort = true; calledHelp = true; }
	~ClearMemoryCommand(){}

	vector<string> setParameters();
	string getCommandName()			{ return "clear.memory";	}
	string getCommandCategory()		{ return "General";			}
	string getHelpString();
	string getOutputPattern(string)	{ return "";				}
	string getCitation()			{ return "http://www.mothur.org/wiki/Clear.memory"; }
	string getDescription()			{ return "remove saved references from memory"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:
	bool abort;
	vector<string> outputNames;
};

#endif
mothur-1.36.1/source/commands/clustercommand.cpp000066400000000000000000000557571255543666200220060ustar00rootroot00000000000000
/*
 *  clustercommand.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "clustercommand.h" #include "readphylip.h" #include "readcolumn.h" #include "readmatrix.hpp" #include "clusterdoturcommand.h" //********************************************************************************************************************** vector ClusterCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "none","list",false,false,true); parameters.push_back(pphylip); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "ColumnName","rabund-sabund",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "ColumnName","list",false,false,true); parameters.push_back(pcolumn); CommandParameter pcutoff("cutoff", "Number", "", "10", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter pmethod("method", "Multiple", "furthest-nearest-average-weighted", "average", "", "", "","",false,false,true); parameters.push_back(pmethod); CommandParameter pshowabund("showabund", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pshowabund); CommandParameter ptiming("timing", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ptiming); CommandParameter psim("sim", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psim); CommandParameter phard("hard", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(phard); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); 
		//CommandParameter padjust("adjust", "String", "", "F", "", "", "","",false,false); parameters.push_back(padjust);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "ClusterCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string ClusterCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The cluster command parameter options are phylip, column, name, count, method, cutoff, hard, precision, sim, showabund and timing. Phylip or column and name are required, unless you have a valid current file.\n";
		//helpString += "The adjust parameter is used to handle missing distances. If you set a cutoff, adjust=f by default. If not, adjust=t by default. Adjust=f, means ignore missing distances and adjust cutoff as needed with the average neighbor method. Adjust=t, will treat missing distances as 1.0. You can also set the value the missing distances should be set to, adjust=0.5 would give missing distances a value of 0.5.\n";
		helpString += "The cluster command should be in the following format: \n";
		helpString += "cluster(method=yourMethod, cutoff=yourCutoff, precision=yourPrecision) \n";
		helpString += "The acceptable cluster methods are furthest, nearest, average and weighted.
If no method is provided then average is assumed.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClusterCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "list") { pattern = "[filename],[clustertag],list-[filename],[clustertag],[tag2],list"; } else if (type == "rabund") { pattern = "[filename],[clustertag],rabund"; } else if (type == "sabund") { pattern = "[filename],[clustertag],sabund"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClusterCommand::ClusterCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "ClusterCommand"); exit(1); } } //********************************************************************************************************************** //This function checks to make sure the cluster command has no errors and then clusters based on the method chosen. 
ClusterCommand::ClusterCommand(string option) { try{ abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
					if (path == "") { parameters["name"] = inputDir + it->second; }
				}

				it = parameters.find("count");
				//user has given a count file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path, then add inputdir; otherwise leave the path alone.
					if (path == "") { parameters["count"] = inputDir + it->second; }
				}
			}

			//check for required parameters
			phylipfile = validParameter.validFile(parameters, "phylip", true);
			if (phylipfile == "not open") { phylipfile = ""; abort = true; }
			else if (phylipfile == "not found") { phylipfile = ""; }
			else { distfile = phylipfile; format = "phylip"; m->setPhylipFile(phylipfile); }

			columnfile = validParameter.validFile(parameters, "column", true);
			if (columnfile == "not open") { columnfile = ""; abort = true; }
			else if (columnfile == "not found") { columnfile = ""; }
			else { distfile = columnfile; format = "column"; m->setColumnFile(columnfile); }

			namefile = validParameter.validFile(parameters, "name", true);
			if (namefile == "not open") { abort = true; }
			else if (namefile == "not found") { namefile = ""; }
			else { m->setNameFile(namefile); }

			countfile = validParameter.validFile(parameters, "count", true);
			if (countfile == "not open") { abort = true; countfile = ""; }
			else if (countfile == "not found") { countfile = ""; }
			else { m->setCountTableFile(countfile); }

			if ((phylipfile == "") && (columnfile == "")) {
				//is there a current file available for either of these?
				//give priority to column, then phylip
				columnfile = m->getColumnFile();
				if (columnfile != "") { distfile = columnfile; format = "column"; m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); }
				else {
					phylipfile = m->getPhylipFile();
					if (phylipfile != "") { distfile = phylipfile; format = "phylip"; m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); }
					else {
						m->mothurOut("No valid current files.
You must provide a phylip or column file before you can use the cluster command."); m->mothurOutEndLine();
						abort = true;
					}
				}
			}
			else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When executing a cluster command you must enter ONLY ONE of the following: phylip or column."); m->mothurOutEndLine(); abort = true; }

			if (columnfile != "") {
				if ((namefile == "") && (countfile == "")){
					namefile = m->getNameFile();
					if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); }
					else {
						countfile = m->getCountTableFile();
						if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); }
						else {
							m->mothurOut("You need to provide a namefile or countfile if you are going to use the column format."); m->mothurOutEndLine();
							abort = true;
						}
					}
				}
			}

			if ((countfile != "") && (namefile != "")) { m->mothurOut("When executing a cluster command you must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; }

			//check for optional parameters and set defaults
			// ...at some point we should add some additional type checking...
			//get user cutoff and precision or use defaults
			string temp;
			temp = validParameter.validFile(parameters, "precision", false);
			if (temp == "not found") { temp = "100"; }
			//saves precision length for formatting below
			length = temp.length();
			m->mothurConvert(temp, precision);

			temp = validParameter.validFile(parameters, "hard", false);
			if (temp == "not found") { temp = "T"; }
			hard = m->isTrue(temp);

			temp = validParameter.validFile(parameters, "sim", false);
			if (temp == "not found") { temp = "F"; }
			sim = m->isTrue(temp);

			//bool cutoffSet = false;
			temp = validParameter.validFile(parameters, "cutoff", false);
			if (temp == "not found") { temp = "10"; }
			//else { cutoffSet = true; }
			m->mothurConvert(temp, cutoff);
			//pad the cutoff by half a unit of precision, e.g. precision=100 adds 0.005
			cutoff += (5 / (precision * 10.0));

			//temp = validParameter.validFile(parameters, "adjust", false); if (temp == "not found") { temp = "F"; }
			//if (m->isNumeric1(temp)) { m->mothurConvert(temp, adjust); }
			//else if (m->isTrue(temp)) { adjust = 1.0; }
			//else { adjust = -1.0; }
			adjust=-1.0;

			method = validParameter.validFile(parameters, "method", false);
			if (method == "not found") { method = "average"; }

			if ((method == "furthest") || (method == "nearest") || (method == "average") || (method == "weighted")) { }
			else { m->mothurOut("Not a valid clustering method.
Valid clustering algorithms are furthest, nearest, average, and weighted."); m->mothurOutEndLine(); abort = true; } showabund = validParameter.validFile(parameters, "showabund", false); if (showabund == "not found") { showabund = "T"; } timing = validParameter.validFile(parameters, "timing", false); if (timing == "not found") { timing = "F"; } } } catch(exception& e) { m->errorOut(e, "ClusterCommand", "ClusterCommand"); exit(1); } } //********************************************************************************************************************** ClusterCommand::~ClusterCommand(){} //********************************************************************************************************************** int ClusterCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //phylip file given and cutoff not given - use cluster.classic because it uses less memory and is faster if ((format == "phylip") && (cutoff > 10.0)) { m->mothurOutEndLine(); m->mothurOut("You are using a phylip file and no cutoff. 
I will run cluster.classic to save memory and time."); m->mothurOutEndLine(); //run unique.seqs for deconvolute results string inputString = "phylip=" + distfile; if (namefile != "") { inputString += ", name=" + namefile; } else if (countfile != "") { inputString += ", count=" + countfile; } inputString += ", precision=" + toString(precision); inputString += ", method=" + method; if (hard) { inputString += ", hard=T"; } else { inputString += ", hard=F"; } if (sim) { inputString += ", sim=T"; } else { inputString += ", sim=F"; } m->mothurOutEndLine(); m->mothurOut("/------------------------------------------------------------/"); m->mothurOutEndLine(); m->mothurOut("Running command: cluster.classic(" + inputString + ")"); m->mothurOutEndLine(); Command* clusterClassicCommand = new ClusterDoturCommand(inputString); clusterClassicCommand->execute(); delete clusterClassicCommand; m->mothurOut("/------------------------------------------------------------/"); m->mothurOutEndLine(); return 0; } ReadMatrix* read; if (format == "column") { read = new ReadColumnMatrix(columnfile, sim); } //sim indicates whether its a similarity matrix else if (format == "phylip") { read = new ReadPhylipMatrix(phylipfile, sim); } read->setCutoff(cutoff); NameAssignment* nameMap = NULL; CountTable* ct = NULL; map counts; if(namefile != ""){ nameMap = new NameAssignment(namefile); nameMap->readMap(); read->read(nameMap); }else if (countfile != "") { ct = new CountTable(); ct->readTable(countfile, false, false); read->read(ct); counts = ct->getNameMap(); }else { read->read(nameMap); } list = read->getListVector(); matrix = read->getDMatrix(); if(countfile != "") { rabund = new RAbundVector(); createRabund(ct, list, rabund); //creates an rabund that includes the counts for the unique list delete ct; }else { rabund = new RAbundVector(list->getRAbundVector()); } delete read; if (m->control_pressed) { //clean up delete list; delete matrix; delete rabund; if(countfile == ""){rabundFile.close(); 
sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } listFile.close(); m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } //create cluster if (method == "furthest") { cluster = new CompleteLinkage(rabund, list, matrix, cutoff, method, adjust); } else if(method == "nearest"){ cluster = new SingleLinkage(rabund, list, matrix, cutoff, method, adjust); } else if(method == "average"){ cluster = new AverageLinkage(rabund, list, matrix, cutoff, method, adjust); } else if(method == "weighted"){ cluster = new WeightedLinkage(rabund, list, matrix, cutoff, method, adjust); } tag = cluster->getTag(); if (outputDir == "") { outputDir += m->hasPath(distfile); } fileroot = outputDir + m->getRootName(m->getSimpleName(distfile)); map variables; variables["[filename]"] = fileroot; variables["[clustertag]"] = tag; string sabundFileName = getOutputFileName("sabund", variables); string rabundFileName = getOutputFileName("rabund", variables); if (countfile != "") { variables["[tag2]"] = "unique_list"; } string listFileName = getOutputFileName("list", variables); if (countfile == "") { m->openOutputFile(sabundFileName, sabundFile); m->openOutputFile(rabundFileName, rabundFile); outputNames.push_back(sabundFileName); outputTypes["sabund"].push_back(sabundFileName); outputNames.push_back(rabundFileName); outputTypes["rabund"].push_back(rabundFileName); } m->openOutputFile(listFileName, listFile); outputNames.push_back(listFileName); outputTypes["list"].push_back(listFileName); list->printHeaders(listFile); time_t estart = time(NULL); float previousDist = 0.00000; float rndPreviousDist = 0.00000; oldRAbund = *rabund; oldList = *list; print_start = true; start = time(NULL); loops = 0; double saveCutoff = cutoff; while (matrix->getSmallDist() < cutoff && matrix->getNNodes() > 0){ if (m->control_pressed) { //clean up delete list; delete matrix; delete rabund; delete cluster; if(countfile == "") 
{rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } listFile.close(); m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } if (print_start && m->isTrue(timing)) { m->mothurOut("Clustering (" + tag + ") dist " + toString(matrix->getSmallDist()) + "/" + toString(m->roundDist(matrix->getSmallDist(), precision)) + "\t(precision: " + toString(precision) + ", Nodes: " + toString(matrix->getNNodes()) + ")"); cout.flush(); print_start = false; } loops++; cluster->update(cutoff); float dist = matrix->getSmallDist(); float rndDist; if (hard) { rndDist = m->ceilDist(dist, precision); }else{ rndDist = m->roundDist(dist, precision); } if(previousDist <= 0.0000 && dist != previousDist){ printData("unique", counts); } else if(rndDist != rndPreviousDist){ printData(toString(rndPreviousDist, length-1), counts); } previousDist = dist; rndPreviousDist = rndDist; oldRAbund = *rabund; oldList = *list; } if (print_start && m->isTrue(timing)) { m->mothurOut("Clustering (" + tag + ") for distance " + toString(previousDist) + "/" + toString(rndPreviousDist) + "\t(precision: " + toString(precision) + ", Nodes: " + toString(matrix->getNNodes()) + ")"); cout.flush(); print_start = false; } if(previousDist <= 0.0000){ printData("unique", counts); } else if(rndPreviousDistceilDist(saveCutoff, precision); } else { saveCutoff = m->roundDist(saveCutoff, precision); } m->mothurOut("changed cutoff to " + toString(cutoff)); m->mothurOutEndLine(); } //set list file as new current listfile string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } //set rabund file as new current rabundfile itTypes = outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } //set 
sabund file as new current sabundfile itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //if (m->isTrue(timing)) { m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to cluster"); m->mothurOutEndLine(); //} return 0; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "execute"); exit(1); } } //********************************************************************************************************************** void ClusterCommand::printData(string label, map& counts){ try { if (m->isTrue(timing)) { m->mothurOut("\tTime: " + toString(time(NULL) - start) + "\tsecs for " + toString(oldRAbund.getNumBins()) + "\tclusters. Updates: " + toString(loops)); m->mothurOutEndLine(); } print_start = true; loops = 0; start = time(NULL); oldRAbund.setLabel(label); if (countfile == "") { oldRAbund.print(rabundFile); oldRAbund.getSAbundVector().print(sabundFile); } if (m->isTrue(showabund)) { oldRAbund.getSAbundVector().print(cout); } oldList.setLabel(label); if(countfile != "") { oldList.print(listFile, counts); }else { oldList.print(listFile); } } catch(exception& e) { m->errorOut(e, "ClusterCommand", "printData"); exit(1); } } //********************************************************************************************************************** int ClusterCommand::createRabund(CountTable*& ct, ListVector*& list, RAbundVector*& rabund){ try { rabund->setLabel(list->getLabel()); for(int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { break; } vector binNames; string bin = list->get(i); m->splitAtComma(bin, binNames); int total = 0; for (int j = 0; j < binNames.size(); j++) { total += ct->getNumSeqs(binNames[j]); } 
rabund->push_back(total); } return 0; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "createRabund"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/clustercommand.h000066400000000000000000000046421255543666200214360ustar00rootroot00000000000000#ifndef CLUSTERCOMMAND_H #define CLUSTERCOMMAND_H /* * clustercommand.h * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "rabundvector.hpp" #include "sabundvector.hpp" #include "listvector.hpp" #include "cluster.hpp" #include "sparsedistancematrix.h" #include "counttable.h" /* The cluster() command: The cluster command outputs a .list file and, unless a count file is used, .rabund and .sabund files. The cluster command parameter options include method, cutoff and precision. The cluster command should be in the following format: cluster(method=yourMethod, cutoff=yourCutoff, precision=yourPrecision). The acceptable methods are furthest, nearest, average and weighted. If you do not provide a method, the default algorithm is average neighbor. */ class ClusterCommand : public Command { public: ClusterCommand(string); ClusterCommand(); ~ClusterCommand(); vector<string> setParameters(); string getCommandName() { return "cluster"; } string getCommandCategory() { return "Clustering"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Westcott SL (2011). Assessing and improving methods used in OTU-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 77:3219.\nSchloss PD, Handelsman J (2005). Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. 
Appl Environ Microbiol 71: 1501-6.\nhttp://www.mothur.org/wiki/Cluster"; } string getDescription() { return "cluster your sequences into OTUs using a distance matrix"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: Cluster* cluster; SparseDistanceMatrix* matrix; ListVector* list; RAbundVector* rabund; RAbundVector oldRAbund; ListVector oldList; bool abort, hard, sim; string method, fileroot, tag, outputDir, phylipfile, columnfile, namefile, format, distfile, countfile; double cutoff; float adjust; string showabund, timing; int precision, length; ofstream sabundFile, rabundFile, listFile; bool print_start; time_t start; unsigned long loops; void printData(string label, map&); vector outputNames; int createRabund(CountTable*&, ListVector*&, RAbundVector*&); }; #endif mothur-1.36.1/source/commands/clusterdoturcommand.cpp000066400000000000000000000357001255543666200230460ustar00rootroot00000000000000/* * clusterdoturcommand.cpp * Mothur * * Created by westcott on 10/27/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "clusterdoturcommand.h" #include "clusterclassic.h" //********************************************************************************************************************** vector ClusterDoturCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","list",false,true,true); parameters.push_back(pphylip); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","rabund-sabund",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pcutoff("cutoff", "Number", "", "10", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter pmethod("method", "Multiple", "furthest-nearest-average-weighted", "average", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter phard("hard", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(phard); CommandParameter psim("sim", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psim); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClusterDoturCommand::getHelpString(){ try { 
string helpString = ""; helpString += "The cluster.classic command clusters using the algorithm from dotur. \n"; helpString += "The cluster.classic command parameter options are phylip, name, count, method, cutoff, hard, sim, precision. Phylip is required, unless you have a valid current file.\n"; helpString += "The cluster.classic command should be in the following format: \n"; helpString += "cluster.classic(phylip=yourDistanceMatrix, method=yourMethod, cutoff=yourCutoff, precision=yourPrecision) \n"; helpString += "The acceptable cluster methods are furthest, nearest, weighted and average. If no method is provided then average is assumed.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClusterDoturCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "list") { pattern = "[filename],[clustertag],list-[filename],[clustertag],[tag2],list"; } else if (type == "rabund") { pattern = "[filename],[clustertag],rabund"; } else if (type == "sabund") { pattern = "[filename],[clustertag],sabund"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClusterDoturCommand::ClusterDoturCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "ClusterDoturCommand"); exit(1); } } 
//********************************************************************************************************************** //This function checks to make sure the cluster command has no errors and then clusters based on the method chosen. ClusterDoturCommand::ClusterDoturCommand(string option) { try{ abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command map::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //initialize outputTypes vector<string> tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { abort = true; } else if (phylipfile == "not found") { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a phylip file with the cluster.classic command."); m->mothurOutEndLine(); abort = true; } }else { m->setPhylipFile(phylipfile); } //check for optional parameter and set defaults namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; namefile = ""; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("When executing a cluster.classic command you must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } string temp; temp = validParameter.validFile(parameters, "precision", false); if (temp == "not found") { temp = "100"; } //saves precision length for formatting below length = temp.length(); m->mothurConvert(temp, precision); temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "10"; } 
m->mothurConvert(temp, cutoff); cutoff += (5 / (precision * 10.0)); temp = validParameter.validFile(parameters, "hard", false); if (temp == "not found") { temp = "T"; } hard = m->isTrue(temp); temp = validParameter.validFile(parameters, "sim", false); if (temp == "not found") { temp = "F"; } sim = m->isTrue(temp); method = validParameter.validFile(parameters, "method", false); if (method == "not found") { method = "average"; } if ((method == "furthest") || (method == "nearest") || (method == "average") || (method == "weighted")) { if (method == "furthest") { tag = "fn"; } else if (method == "nearest") { tag = "nn"; } else if (method == "average") { tag = "an"; } else if (method == "weighted") { tag = "wn"; } }else { m->mothurOut("Not a valid clustering method. Valid clustering algorithms are furthest, nearest, average, weighted."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "ClusterCommand"); exit(1); } } //********************************************************************************************************************** int ClusterDoturCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } ClusterClassic* cluster = new ClusterClassic(cutoff, method, sim); NameAssignment* nameMap = NULL; CountTable* ct = NULL; map counts; if(namefile != "") { nameMap = new NameAssignment(namefile); nameMap->readMap(); cluster->readPhylipFile(phylipfile, nameMap); delete nameMap; }else if (countfile != "") { ct = new CountTable(); ct->readTable(countfile, false, false); cluster->readPhylipFile(phylipfile, ct); counts = ct->getNameMap(); delete ct; }else { cluster->readPhylipFile(phylipfile, nameMap); } tag = cluster->getTag(); if (m->control_pressed) { delete cluster; return 0; } list = cluster->getListVector(); rabund = cluster->getRAbundVector(); if (outputDir == "") { outputDir += m->hasPath(phylipfile); } fileroot = outputDir + m->getRootName(m->getSimpleName(phylipfile)); map 
variables; variables["[filename]"] = fileroot; variables["[clustertag]"] = tag; string sabundFileName = getOutputFileName("sabund", variables); string rabundFileName = getOutputFileName("rabund", variables); if (countfile != "") { variables["[tag2]"] = "unique_list"; } string listFileName = getOutputFileName("list", variables); if (countfile == "") { m->openOutputFile(sabundFileName, sabundFile); m->openOutputFile(rabundFileName, rabundFile); outputNames.push_back(sabundFileName); outputTypes["sabund"].push_back(sabundFileName); outputNames.push_back(rabundFileName); outputTypes["rabund"].push_back(rabundFileName); } m->openOutputFile(listFileName, listFile); outputNames.push_back(listFileName); outputTypes["list"].push_back(listFileName); list->printHeaders(listFile); float previousDist = 0.00000; float rndPreviousDist = 0.00000; oldRAbund = *rabund; oldList = *list; //double saveCutoff = cutoff; int estart = time(NULL); while ((cluster->getSmallDist() < cutoff) && (cluster->getNSeqs() > 1)){ if (m->control_pressed) { delete cluster; delete list; delete rabund; if(countfile == "") {rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } listFile.close(); m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } cluster->update(cutoff); float dist = cluster->getSmallDist(); float rndDist; if (hard) { rndDist = m->ceilDist(dist, precision); }else{ rndDist = m->roundDist(dist, precision); } if(previousDist <= 0.0000 && dist != previousDist){ printData("unique", counts); } else if(rndDist != rndPreviousDist){ printData(toString(rndPreviousDist, length-1), counts); } previousDist = dist; rndPreviousDist = rndDist; oldRAbund = *rabund; oldList = *list; } if(previousDist <= 0.0000){ printData("unique", counts); } else if(rndPreviousDistsecond).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } //set rabund file as new current rabundfile itTypes = 
outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } //set sabund file as new current sabundfile itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to cluster"); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "execute"); exit(1); } } //********************************************************************************************************************** void ClusterDoturCommand::printData(string label, map& counts){ try { oldRAbund.setLabel(label); if (countfile == "") { oldRAbund.print(rabundFile); oldRAbund.getSAbundVector().print(sabundFile); } oldRAbund.getSAbundVector().print(cout); oldList.setLabel(label); if(countfile != "") { oldList.print(listFile, counts); }else { oldList.print(listFile); } } catch(exception& e) { m->errorOut(e, "ClusterDoturCommand", "printData"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/clusterdoturcommand.h000066400000000000000000000032231255543666200225060ustar00rootroot00000000000000#ifndef CLUSTERDOTURCOMMAND_H #define CLUSTERDOTURCOMMAND_H /* * clusterdoturcommand.h * Mothur * * Created by westcott on 10/27/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "nameassignment.hpp" #include "rabundvector.hpp" #include "sabundvector.hpp" #include "listvector.hpp" class ClusterDoturCommand : public Command { public: ClusterDoturCommand(string); ClusterDoturCommand(); ~ClusterDoturCommand(){} vector setParameters(); string getCommandName() { return "cluster.classic"; } string getCommandCategory() { return "Clustering"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Westcott SL (2011). Assessing and improving methods used in OTU-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 77:3219.\nSchloss PD, Handelsman J (2005). Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol 71: 1501-6.\nhttp://www.mothur.org/wiki/Cluster.classic\n";} string getDescription() { return "cluster your sequences into OTUs using DOTUR’s method"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, hard, sim; string method, fileroot, tag, outputDir, phylipfile, namefile, countfile; double cutoff; int precision, length; ofstream sabundFile, rabundFile, listFile; NameAssignment* nameMap; ListVector* list; RAbundVector* rabund; RAbundVector oldRAbund; ListVector oldList; void printData(string label, map&); vector outputNames; }; #endif mothur-1.36.1/source/commands/clusterfragmentscommand.cpp000066400000000000000000000447411255543666200237040ustar00rootroot00000000000000/* * ryanscommand.cpp * Mothur * * Created by westcott on 9/23/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "clusterfragmentscommand.h" #include "needlemanoverlap.hpp" //********************************************************************************************************************** //sort by unaligned inline bool comparePriority(seqRNode first, seqRNode second) { bool better = false; if (first.length > second.length) { better = true; }else if (first.length == second.length) { if (first.numIdentical > second.numIdentical) { better = true; } } return better; } //********************************************************************************************************************** vector ClusterFragmentsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta-name",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pdiffs("diffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pdiffs); CommandParameter ppercent("percent", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ppercent); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string 
ClusterFragmentsCommand::getHelpString(){ try { string helpString = ""; helpString += "The cluster.fragments command groups sequences that are part of a larger sequence.\n"; helpString += "The cluster.fragments command outputs a new fasta and name or count file.\n"; helpString += "The cluster.fragments command parameters are fasta, name, count, diffs and percent. The fasta parameter is required, unless you have a valid current file. \n"; helpString += "The name parameter allows you to give a list of seqs that are identical. This file is 2 columns, first column is name or representative sequence, second column is a list of its identical sequences separated by commas.\n"; helpString += "The diffs parameter allows you to set the number of differences allowed, default=0. \n"; helpString += "The percent parameter allows you to set percentage of differences allowed, default=0. percent=2 means if the number of differences is less than or equal to two percent of the length of the fragment, then cluster.\n"; helpString += "You may use diffs and percent at the same time to say something like: If the number of differences is greater than 1 or more than 2% of the fragment length, don't merge. \n"; helpString += "The cluster.fragments command should be in the following format: \n"; helpString += "cluster.fragments(fasta=yourFastaFile, name=yourNamesFile) \n"; helpString += "Example cluster.fragments(fasta=amazon.fasta).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClusterFragmentsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],fragclust.fasta"; } else if (type == "name") { pattern = "[filename],fragclust.names"; } else if (type == "count") { pattern = "[filename],fragclust.count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClusterFragmentsCommand::ClusterFragmentsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "ClusterFragmentsCommand"); exit(1); } } //********************************************************************************************************************** ClusterFragmentsCommand::ClusterFragmentsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (map::iterator it2 = parameters.begin(); it2 != parameters.end(); it2++) { if 
(validParameter.isValidParameter(it2->first, myArray, it2->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (fastafile == "not open") { fastafile = ""; abort = true; } else { m->setFastaFile(fastafile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(fastafile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not found") { namefile = ""; } else if (namefile == "not open") { namefile = ""; abort = true; } else { readNameFile(); m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { ct.readTable(countfile, true, false); m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("When executing a cluster.fragments command you must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } string temp; temp = validParameter.validFile(parameters, "diffs", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, diffs); temp = validParameter.validFile(parameters, "percent", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, percent); if (countfile == "") { if (namefile == "") { vector files; 
files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "ClusterFragmentsCommand"); exit(1); } } //********************************************************************************************************************** int ClusterFragmentsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); //reads fasta file and return number of seqs int numSeqs = readFASTA(); //fills alignSeqs and makes all seqs active if (m->control_pressed) { return 0; } if (numSeqs == 0) { m->mothurOut("Error reading fasta file...please correct."); m->mothurOutEndLine(); return 0; } //sort seqs by length of unaligned sequence sort(alignSeqs.begin(), alignSeqs.end(), comparePriority); int count = 0; //think about running through twice... for (int i = 0; i < numSeqs; i++) { if (alignSeqs[i].active) { //this sequence has not been merged yet string iBases = alignSeqs[i].seq.getUnaligned(); //try to merge it with all smaller seqs for (int j = i+1; j < numSeqs; j++) { if (m->control_pressed) { return 0; } if (alignSeqs[j].active) { //this sequence has not been merged yet string jBases = alignSeqs[j].seq.getUnaligned(); if (isFragment(iBases, jBases)) { if (countfile != "") { ct.mergeCounts(alignSeqs[i].names, alignSeqs[j].names); }else { //merge alignSeqs[i].names += ',' + alignSeqs[j].names; alignSeqs[i].numIdentical += alignSeqs[j].numIdentical; } alignSeqs[j].active = 0; alignSeqs[j].numIdentical = 0; count++; } }//end if j active }//end if i != j //remove from active list alignSeqs[i].active = 0; }//end if active i if(i % 100 == 0) { m->mothurOutJustToScreen(toString(i) + "\t" + toString(numSeqs - count) + "\t" + toString(count)+"\n"); } } if(numSeqs % 100 != 0) { m->mothurOutJustToScreen(toString(numSeqs) + "\t" + toString(numSeqs - count) + "\t" + toString(count)+"\n"); } string fileroot = outputDir + m->getRootName(m->getSimpleName(fastafile)); map variables; 
variables["[filename]"] = fileroot; string newFastaFile = getOutputFileName("fasta", variables); string newNamesFile = getOutputFileName("name", variables); if (countfile != "") { newNamesFile = getOutputFileName("count", variables); } if (m->control_pressed) { return 0; } m->mothurOutEndLine(); m->mothurOut("Total number of sequences before cluster.fragments was " + toString(alignSeqs.size()) + "."); m->mothurOutEndLine(); m->mothurOut("cluster.fragments removed " + toString(count) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); printData(newFastaFile, newNamesFile); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to cluster " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); if (m->control_pressed) { m->mothurRemove(newFastaFile); m->mothurRemove(newNamesFile); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(newFastaFile); m->mothurOutEndLine(); m->mothurOut(newNamesFile); m->mothurOutEndLine(); outputNames.push_back(newFastaFile); outputNames.push_back(newNamesFile); outputTypes["fasta"].push_back(newFastaFile); outputTypes["name"].push_back(newNamesFile); m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "execute"); exit(1); } } //*************************************************************************************************************** 
bool ClusterFragmentsCommand::isFragment(string seq1, string seq2){ try { bool fragment = false; //exact match int pos = seq1.find(seq2); if (pos != string::npos) { return true; } //no match, no diffs wanted else if ((diffs == 0) && (percent == 0)) { return false; } else { //try aligning and see if you can find it //find number of acceptable differences for this sequence fragment int totalDiffs = 0; if (diffs == 0) { //you didnt set diffs you want a percentage totalDiffs = floor((seq2.length() * (percent / 100.0))); }else if (percent == 0) { //you didn't set percent you want diffs totalDiffs = diffs; }else if ((percent != 0) && (diffs != 0)) { //you want both, set total diffs to smaller of 2 totalDiffs = diffs; int percentDiff = floor((seq2.length() * (percent / 100.0))); if (percentDiff < totalDiffs) { totalDiffs = percentDiff; } } Alignment* alignment = new NeedlemanOverlap(-1.0, 1.0, -1.0, (seq1.length()+totalDiffs+1)); //use needleman to align alignment->align(seq2, seq1); string tempSeq2 = alignment->getSeqAAln(); string temp = alignment->getSeqBAln(); delete alignment; //chop gap ends int startPos = 0; int endPos = tempSeq2.length()-1; for (int i = 0; i < tempSeq2.length(); i++) { if (isalpha(tempSeq2[i])) { startPos = i; break; } } for (int i = tempSeq2.length()-1; i >= 0; i--) { if (isalpha(tempSeq2[i])) { endPos = i; break; } } //count number of diffs int numDiffs = 0; for (int i = startPos; i <= endPos; i++) { if (tempSeq2[i] != temp[i]) { numDiffs++; } } if (numDiffs <= totalDiffs) { fragment = true; } } return fragment; } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "isFragment"); exit(1); } } /**************************************************************************************************/ int ClusterFragmentsCommand::readFASTA(){ try { ifstream inFasta; m->openInputFile(fastafile, inFasta); while (!inFasta.eof()) { if (m->control_pressed) { inFasta.close(); return 0; } Sequence seq(inFasta); m->gobble(inFasta); if (seq.getName() 
!= "") { //can get "" if commented line is at end of fasta file if (namefile != "") { itSize = sizes.find(seq.getName()); if (itSize == sizes.end()) { m->mothurOut(seq.getName() + " is not in your names file, please correct."); m->mothurOutEndLine(); exit(1); } else{ seqRNode tempNode(itSize->second, seq, names[seq.getName()], seq.getUnaligned().length()); alignSeqs.push_back(tempNode); } }else if(countfile != "") { seqRNode tempNode(ct.getNumSeqs(seq.getName()), seq, seq.getName(), seq.getUnaligned().length()); alignSeqs.push_back(tempNode); }else { //no names file, you are identical to yourself seqRNode tempNode(1, seq, seq.getName(), seq.getUnaligned().length()); alignSeqs.push_back(tempNode); } } } inFasta.close(); return alignSeqs.size(); } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "readFASTA"); exit(1); } } /**************************************************************************************************/ void ClusterFragmentsCommand::printData(string newfasta, string newname){ try { ofstream outFasta; ofstream outNames; m->openOutputFile(newfasta, outFasta); if (countfile == "") { m->openOutputFile(newname, outNames); } for (int i = 0; i < alignSeqs.size(); i++) { if (alignSeqs[i].numIdentical != 0) { alignSeqs[i].seq.printSequence(outFasta); if (countfile == "") { outNames << alignSeqs[i].seq.getName() << '\t' << alignSeqs[i].names << endl; } } } outFasta.close(); if (countfile == "") { outNames.close(); } else { ct.printTable(newname); } } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "printData"); exit(1); } } /**************************************************************************************************/ void ClusterFragmentsCommand::readNameFile(){ try { ifstream in; m->openInputFile(namefile, in); string firstCol, secondCol; while (!in.eof()) { in >> firstCol >> secondCol; m->gobble(in); names[firstCol] = secondCol; int size = 1; for (int i = 0; i < secondCol.length(); i++) { if (secondCol[i] == ',') { size++; } } sizes[firstCol] = size; } in.close(); } catch(exception& e) { m->errorOut(e, "ClusterFragmentsCommand", "readNameFile"); exit(1); } 
} /**************************************************************************************************/ mothur-1.36.1/source/commands/clusterfragmentscommand.h000066400000000000000000000035721255543666200233460ustar00rootroot00000000000000#ifndef CLUSTERFRAGMENTSCOMMAND_H #define CLUSTERFRAGMENTSCOMMAND_H /* * clusterfragmentscommand.h * Mothur * * Created by westcott on 9/23/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "sequence.hpp" #include "counttable.h" /************************************************************/ struct seqRNode { int numIdentical; int length; Sequence seq; string names; bool active; seqRNode() {} seqRNode(int n, Sequence s, string nm, int l) : numIdentical(n), length(l), seq(s), names(nm), active(true) {} ~seqRNode() {} }; /************************************************************/ class ClusterFragmentsCommand : public Command { public: ClusterFragmentsCommand(string); ClusterFragmentsCommand(); ~ClusterFragmentsCommand() {} vector<string> setParameters(); string getCommandName() { return "cluster.fragments"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Cluster.fragments"; } string getDescription() { return "creates a namesfile with sequences that are a fragment of a larger sequence"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CountTable ct; bool abort; string fastafile, namefile, countfile, outputDir; int diffs, percent; vector<seqRNode> alignSeqs; map<string, string> names; //represents the names file, first column maps to second column map<string, int> sizes; //this maps a seq name to the number of identical seqs in the names file map<string, int>::iterator itSize; vector<string> outputNames; int readFASTA(); void readNameFile(); void printData(string, string); //fasta filename, names file name bool isFragment(string, string); }; /************************************************************/ #endif 
mothur-1.36.1/source/commands/clustersplitcommand.cpp000066400000000000000000002212331255543666200230420ustar00rootroot00000000000000/* * clustersplitcommand.cpp * Mothur * * Created by westcott on 5/19/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "clustersplitcommand.h" //********************************************************************************************************************** vector ClusterSplitCommand::setParameters(){ try { CommandParameter pfile("file", "InputTypes", "", "", "PhylipColumnFasta", "PhylipColumnFasta", "none","",false,false,true); parameters.push_back(pfile); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "FastaTaxName","",false,false,true); parameters.push_back(ptaxonomy); CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumnFasta", "PhylipColumnFasta", "none","list",false,false,true); parameters.push_back(pphylip); CommandParameter pfasta("fasta", "InputTypes", "", "", "PhylipColumnFasta", "PhylipColumnFasta", "FastaTaxName","list",false,false,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "ColumnName-FastaTaxName","rabund-sabund",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount", "none", "","",false,false,true); parameters.push_back(pcount); CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumnFasta", "PhylipColumnFasta", "ColumnName","list",false,false,true); parameters.push_back(pcolumn); CommandParameter ptaxlevel("taxlevel", "Number", "", "3", "", "", "","",false,false,true); parameters.push_back(ptaxlevel); CommandParameter psplitmethod("splitmethod", "Multiple", "classify-fasta-distance", "distance", "", "", "","",false,false,true); parameters.push_back(psplitmethod); CommandParameter plarge("large", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(plarge); CommandParameter 
pshowabund("showabund", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pshowabund); CommandParameter pcluster("cluster", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pcluster); CommandParameter ptiming("timing", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ptiming); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pcutoff("cutoff", "Number", "", "0.25", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter pmethod("method", "Multiple", "furthest-nearest-average-weighted", "average", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter phard("hard", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(phard); CommandParameter pislist("islist", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pislist); CommandParameter pclassic("classic", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pclassic); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ClusterSplitCommand::getHelpString(){ try { string helpString = ""; helpString += "The cluster.split command 
parameter options are file, fasta, phylip, column, name, count, cutoff, precision, method, splitmethod, taxonomy, taxlevel, showabund, timing, hard, large, cluster, processors. One of file, fasta, phylip or column is required; a name or count file is also required with column or fasta.\n"; helpString += "The cluster.split command can split your files in 3 ways: by distance file, by classification, or by classification also using a fasta file. \n"; helpString += "For the distance file method, you need only provide your distance file and mothur will split the file into distinct groups. \n"; helpString += "For the classification method, you need to provide your distance file and taxonomy file, and set the splitmethod to classify. \n"; helpString += "You will also need to set the taxlevel you want to split by. mothur will split the sequences into distinct taxonomy groups, and split the distance file based on those groups. \n"; helpString += "For the classification method using a fasta file, you need to provide your fasta file, names file and taxonomy file. \n"; helpString += "You will also need to set the taxlevel you want to split by. mothur will split the sequences into distinct taxonomy groups, and create distance files for each grouping. \n"; helpString += "The file option allows you to enter your file containing your list of column and names/count files as well as the singleton file. This file is generated by mothur when you run cluster.split() with the cluster=f parameter. This can be helpful when you have a large dataset for which you can use all your processors during the splitting step, but have to use fewer for the cluster step due to RAM constraints. For example: cluster.split(fasta=yourFasta, taxonomy=yourTax, count=yourCount, taxlevel=3, cluster=f, processors=8) then cluster.split(file=yourFile, processors=4). This allows you to maximize your processors during the splitting step. 
Also, if you are unsure whether the cluster step will have RAM issues with multiple processors, you can avoid running the first part of the command multiple times.\n"; helpString += "The phylip and column parameters allow you to enter your distance file. \n"; helpString += "The fasta parameter allows you to enter your aligned fasta file. \n"; helpString += "The name parameter allows you to enter your name file. \n"; helpString += "The count parameter allows you to enter your count file. A count or name file is required if your distance file is in column format. \n"; helpString += "The cluster parameter allows you to indicate whether you want to run the clustering or just split the distance matrix, default=t. \n"; helpString += "The cutoff parameter allows you to set the distance you want to cluster to, default is 0.25. \n"; helpString += "The precision parameter allows you to specify the precision of the distances output, default=100, meaning 2 decimal places. \n"; helpString += "The method parameter allows you to specify which clustering algorithm you want to use, default=average, options furthest, nearest, or average. \n"; helpString += "The splitmethod parameter allows you to specify how you want to split your distance file before you cluster, default=distance, options distance, classify or fasta. \n"; helpString += "The taxonomy parameter allows you to enter the taxonomy file for your sequences; this is only valid if you are using splitmethod=classify or splitmethod=fasta. Be sure your taxonomy file does not include the probability scores. \n"; helpString += "The taxlevel parameter allows you to specify the taxonomy level you want to use to split the distance file, default=3, meaning use the first taxon in each list. \n"; helpString += "The large parameter allows you to indicate that your distance matrix is too large to fit in RAM. The default value is false.\n"; helpString += "The classic parameter allows you to indicate that you want to run your files with cluster.classic. 
It is only valid with splitmethod=fasta. Default=f.\n"; #ifdef USE_MPI helpString += "When using MPI, the processors parameter is set to the number of MPI processes running. \n"; #endif helpString += "The cluster.split command should be in the following format: \n"; helpString += "cluster.split(column=yourDistanceFile, name=yourNameFile, method=yourMethod, cutoff=yourCutoff, precision=yourPrecision, splitmethod=yourSplitmethod, taxonomy=yourTaxonomyfile, taxlevel=yourtaxlevel) \n"; helpString += "Example: cluster.split(column=abrecovery.dist, name=abrecovery.names, method=furthest, cutoff=0.10, precision=1000, splitmethod=classify, taxonomy=abrecovery.silva.slv.taxonomy, taxlevel=5) \n"; return helpString; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ClusterSplitCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "list") { pattern = "[filename],[clustertag],list-[filename],[clustertag],[tag2],list"; } else if (type == "rabund") { pattern = "[filename],[clustertag],rabund"; } else if (type == "sabund") { pattern = "[filename],[clustertag],sabund"; } else if (type == "column") { pattern = "[filename],dist"; } else if (type == "file") { pattern = "[filename],file"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ClusterSplitCommand::ClusterSplitCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["column"] = 
tempOutNames; outputTypes["file"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "ClusterSplitCommand"); exit(1); } } //********************************************************************************************************************** //This function checks to make sure the cluster command has no errors and then clusters based on the method chosen. ClusterSplitCommand::ClusterSplitCommand(string option) { try{ abort = false; calledHelp = false; format = ""; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("cluster.split"); //check to make sure all parameters are valid for command map::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["column"] = tempOutNames; outputTypes["file"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["file"] = inputDir + it->second; } } } //check for required parameters file = validParameter.validFile(parameters, "file", true); if (file == "not open") { file = ""; abort = true; } else if (file == "not found") { file = ""; } else { distfile = file; } phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { distfile = phylipfile; format = "phylip"; m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { distfile = columnfile; format = "column"; m->setColumnFile(columnfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; namefile = "";} else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = "";} else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { distfile = fastafile; splitmethod = "fasta"; m->setFastaFile(fastafile); } taxFile = validParameter.validFile(parameters, "taxonomy", true); if (taxFile == "not open") { taxFile = ""; abort = true; } else if (taxFile == "not found") { taxFile = ""; } else { m->setTaxonomyFile(taxFile); if (splitmethod != "fasta") { splitmethod = "classify"; } } if ((phylipfile == "") && (columnfile == "") && (fastafile == "") && (file == "")) { //is there are current file available for either of these? 
//give priority to column, then phylip, then fasta columnfile = m->getColumnFile(); if (columnfile != "") { m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. When executing a cluster.split command you must enter a file, phylip or a column or fastafile."); m->mothurOutEndLine(); abort = true; } } } } else if ((phylipfile != "") && (columnfile != "") && (fastafile != "") && (file != "")) { m->mothurOut("When executing a cluster.split command you must enter ONLY ONE of the following: file, fasta, phylip or column."); m->mothurOutEndLine(); abort = true; } if ((countfile != "") && (namefile != "")) { m->mothurOut("When executing a cluster.split command you must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if (columnfile != "") { if ((namefile == "") && (countfile == "")) { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile or countfile if you are going to use the column format."); m->mothurOutEndLine(); abort = true; } } } } if (fastafile != "") { if (taxFile == "") { taxFile = m->getTaxonomyFile(); if (taxFile != "") { m->mothurOut("Using " + taxFile + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a taxonomy 
file if you are using a fasta file to generate the split."); m->mothurOutEndLine(); abort = true; } } if ((namefile == "") && (countfile == "")) { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile or countfile if you are going to use the fasta file to generate the split."); m->mothurOutEndLine(); abort = true; } } } } //check for optional parameter and set defaults // ...at some point should add some additional type checking... //get user cutoff and precision or use defaults string temp; temp = validParameter.validFile(parameters, "precision", false); if (temp == "not found") { temp = "100"; } //saves precision length for formatting below length = temp.length(); m->mothurConvert(temp, precision); temp = validParameter.validFile(parameters, "hard", false); if (temp == "not found") { temp = "T"; } hard = m->isTrue(temp); temp = validParameter.validFile(parameters, "large", false); if (temp == "not found") { temp = "F"; } large = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "splitmethod", false); if ((splitmethod != "fasta") && (splitmethod != "classify")) { if (temp == "not found") { splitmethod = "distance"; } else { splitmethod = temp; } } temp = validParameter.validFile(parameters, "classic", false); if (temp == "not found") { temp = "F"; } classic = m->isTrue(temp); //not using file option and don't have fasta method with classic if (((splitmethod != "fasta") && classic) && (file == "")) { m->mothurOut("splitmethod must be fasta to use 
cluster.classic, or you must use the file option.\n"); abort=true; } temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "0.25"; } m->mothurConvert(temp, cutoff); cutoff += (5 / (precision * 10.0)); temp = validParameter.validFile(parameters, "taxlevel", false); if (temp == "not found") { temp = "3"; } m->mothurConvert(temp, taxLevelCutoff); method = validParameter.validFile(parameters, "method", false); if (method == "not found") { method = "average"; } if ((method == "furthest") || (method == "nearest") || (method == "average")) { m->mothurOut("Using splitmethod " + splitmethod + ".\n"); } else { m->mothurOut("Not a valid clustering method. Valid clustering algorithms are furthest, nearest or average."); m->mothurOutEndLine(); abort = true; } if ((splitmethod == "distance") || (splitmethod == "classify") || (splitmethod == "fasta")) { } else { m->mothurOut(splitmethod + " is not a valid splitting method. Valid splitting algorithms are distance, classify or fasta."); m->mothurOutEndLine(); abort = true; } if ((splitmethod == "classify") && (taxFile == "")) { m->mothurOut("You need to provide a taxonomy file if you are going to use the classify splitmethod."); m->mothurOutEndLine(); abort = true; } showabund = validParameter.validFile(parameters, "showabund", false); if (showabund == "not found") { showabund = "T"; } temp = validParameter.validFile(parameters, "cluster", false); if (temp == "not found") { temp = "T"; } runCluster = m->isTrue(temp); temp = validParameter.validFile(parameters, "islist", false); if (temp == "not found") { temp = "F"; } isList = m->isTrue(temp); timing = validParameter.validFile(parameters, "timing", false); if (timing == "not found") { timing = "F"; } } } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "ClusterSplitCommand"); exit(1); } } //********************************************************************************************************************** int 
ClusterSplitCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } time_t estart; vector listFileNames; vector< map > distName; set labels; string singletonName = ""; double saveCutoff = cutoff; #ifdef USE_MPI int pid; int tag = 2001; MPI_Status status; MPI_Comm_size(MPI_COMM_WORLD, &processors); //set processors to the number of mpi processes running MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are #endif if (file != "") { deleteFiles = false; estart = time(NULL); #ifdef USE_MPI if (pid == 0) { //only process 0 converts and splits #endif singletonName = readFile(distName); #ifdef USE_MPI } //only process 0 reads //make everyone wait MPI_Barrier(MPI_COMM_WORLD); #endif if (isList) { //set list file as new current listfile string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } }else { //****************** file prep work ******************************// #ifdef USE_MPI if (pid == 0) { //only process 0 converts and splits #endif //if user gave a phylip file convert to column file if (format == "phylip") { estart = time(NULL); m->mothurOut("Converting to column format..."); m->mothurOutEndLine(); ReadCluster* convert = new ReadCluster(distfile, cutoff, outputDir, false); NameAssignment* nameMap = NULL; convert->setFormat("phylip"); convert->read(nameMap); if (m->control_pressed) { delete convert; return 0; } distfile = convert->getOutputFile(); //if no names file given with phylip file, create it ListVector* listToMakeNameFile = convert->getListVector(); if ((namefile == "") && (countfile == "")) { //you need to make a namefile for split matrix ofstream out; namefile = 
phylipfile + ".names"; m->openOutputFile(namefile, out); for (int i = 0; i < listToMakeNameFile->getNumBins(); i++) { string bin = listToMakeNameFile->get(i); out << bin << '\t' << bin << endl; } out.close(); } delete listToMakeNameFile; delete convert; m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to convert the distance file."); m->mothurOutEndLine(); } if (m->control_pressed) { return 0; } estart = time(NULL); m->mothurOut("Splitting the file..."); m->mothurOutEndLine(); #ifdef USE_MPI } MPI_Barrier(MPI_COMM_WORLD); #endif //split matrix into non-overlapping groups SplitMatrix* split; if (splitmethod == "distance") { split = new SplitMatrix(distfile, namefile, countfile, taxFile, cutoff, splitmethod, large); } else if (splitmethod == "classify") { split = new SplitMatrix(distfile, namefile, countfile, taxFile, taxLevelCutoff, splitmethod, large); } else if (splitmethod == "fasta") { split = new SplitMatrix(fastafile, namefile, countfile, taxFile, taxLevelCutoff, cutoff, splitmethod, processors, classic, outputDir); } else { m->mothurOut("Not a valid splitting method. 
Valid splitting algorithms are distance, classify or fasta."); m->mothurOutEndLine(); return 0; }

            #ifdef USE_MPI
                if ((pid == 0) || (splitmethod == "fasta")) { //only process 0 converts and splits
            #endif

            split->split();

            #ifdef USE_MPI
                }
                MPI_Barrier(MPI_COMM_WORLD);
            #endif

            if (m->control_pressed) { delete split; return 0; }

            singletonName = split->getSingletonNames();
            distName = split->getDistanceFiles(); //returns map of distance files -> namefile sorted by distance file size
            delete split;

            if (m->debug) { m->mothurOut("[DEBUG]: distName.size() = " + toString(distName.size()) + ".\n"); }

            //output a merged distance file
            //if (splitmethod == "fasta") { createMergedDistanceFile(distName); }

            if (m->control_pressed) { return 0; }

            m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to split the distance file."); m->mothurOutEndLine();
            estart = time(NULL);

            #ifdef USE_MPI
                if (pid == 0) { //only process 0 converts and splits
            #endif

            if (!runCluster) {
                string filename = printFile(singletonName, distName);

                m->mothurOutEndLine();
                m->mothurOut("Output File Names: "); m->mothurOutEndLine();
                m->mothurOutEndLine(); m->mothurOut(filename); m->mothurOutEndLine();
                for (int i = 0; i < distName.size(); i++) {
                    m->mothurOut(distName[i].begin()->first); m->mothurOutEndLine();
                    m->mothurOut(distName[i].begin()->second); m->mothurOutEndLine();
                }
                m->mothurOutEndLine();
                #ifdef USE_MPI
                #else
                    return 0;
                #endif
            }
            deleteFiles = true;

            #ifdef USE_MPI
                }
                MPI_Barrier(MPI_COMM_WORLD);
                if (!runCluster) { return 0; }
            #endif
        }
        //****************** break up files between processes and cluster each file set ******************************//
    #ifdef USE_MPI
        ////you are process 0 from above////
        if (pid == 0) {
            vector < vector < map<string, string> > > dividedNames; //dividedNames[1] = vector of filenames for process 1...
            dividedNames.resize(processors);

            //for each file group figure out which process will complete it
            //want to divide the load intelligently so the big files are spread between processes
            for (int i = 0; i < distName.size(); i++) {
                int processToAssign = (i+1) % processors;
                if (processToAssign == 0) { processToAssign = processors; }

                dividedNames[(processToAssign-1)].push_back(distName[i]);
            }

            //now let's reverse the order of every other process, so we balance big files running with little ones
            //note: as written, remainder is nonzero for every process except the last one
            for (int i = 0; i < processors; i++) {
                int remainder = ((i+1) % processors);
                if (remainder) { reverse(dividedNames[i].begin(), dividedNames[i].end()); }
            }

            //send each child the list of files it needs to process
            for(int i = 1; i < processors; i++) {
                //send number of file pairs
                int num = dividedNames[i].size();
                MPI_Send(&num, 1, MPI_INT, i, tag, MPI_COMM_WORLD);

                for (int j = 0; j < num; j++) { //send filenames to process i
                    char tempDistFileName[1024];
                    strcpy(tempDistFileName, (dividedNames[i][j].begin()->first).c_str());
                    int lengthDist = (dividedNames[i][j].begin()->first).length();

                    MPI_Send(&lengthDist, 1, MPI_INT, i, tag, MPI_COMM_WORLD);
                    MPI_Send(tempDistFileName, 1024, MPI_CHAR, i, tag, MPI_COMM_WORLD);

                    char tempNameFileName[1024];
                    strcpy(tempNameFileName, (dividedNames[i][j].begin()->second).c_str());
                    int lengthName = (dividedNames[i][j].begin()->second).length();

                    MPI_Send(&lengthName, 1, MPI_INT, i, tag, MPI_COMM_WORLD);
                    MPI_Send(tempNameFileName, 1024, MPI_CHAR, i, tag, MPI_COMM_WORLD);
                }
            }

            //process your share
            listFileNames = cluster(dividedNames[0], labels);

            //receive the other processes' info
            for(int i = 1; i < processors; i++) {
                int num = dividedNames[i].size();

                double tempCutoff;
                MPI_Recv(&tempCutoff, 1, MPI_DOUBLE, i, tag, MPI_COMM_WORLD, &status);
                if (tempCutoff < cutoff) { cutoff = tempCutoff; }

                //receive list filenames from process i
                for (int j = 0; j < num; j++) {
                    int lengthList = 0;
                    char tempListFileName[1024];

                    MPI_Recv(&lengthList, 1, MPI_INT, i, tag, MPI_COMM_WORLD,
&status);
                    MPI_Recv(tempListFileName, 1024, MPI_CHAR, i, tag, MPI_COMM_WORLD, &status);

                    string myListFileName = tempListFileName;
                    myListFileName = myListFileName.substr(0, lengthList);

                    listFileNames.push_back(myListFileName);
                }

                //receive labels from process i
                int numLabels = 0;
                MPI_Recv(&numLabels, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &status);

                for (int j = 0; j < numLabels; j++) {
                    int lengthLabel = 0;
                    char tempLabel[100];

                    MPI_Recv(&lengthLabel, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &status);
                    MPI_Recv(tempLabel, 100, MPI_CHAR, i, tag, MPI_COMM_WORLD, &status);

                    string myLabel = tempLabel;
                    myLabel = myLabel.substr(0, lengthLabel);

                    if (labels.count(myLabel) == 0) { labels.insert(myLabel); }
                }
            }

        }else { //you are a child process
            vector < map<string, string> > myNames;

            //receive the files you need to process
            //receive number of file pairs
            int num = 0;
            MPI_Recv(&num, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);

            myNames.resize(num);

            for (int j = 0; j < num; j++) { //receive filenames to process
                int lengthDist = 0;
                char tempDistFileName[1024];

                MPI_Recv(&lengthDist, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
                MPI_Recv(tempDistFileName, 1024, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);

                string myDistFileName = tempDistFileName;
                myDistFileName = myDistFileName.substr(0, lengthDist);

                int lengthName = 0;
                char tempNameFileName[1024];

                MPI_Recv(&lengthName, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status);
                MPI_Recv(tempNameFileName, 1024, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status);

                string myNameFileName = tempNameFileName;
                myNameFileName = myNameFileName.substr(0, lengthName);

                //save file name
                myNames[j][myDistFileName] = myNameFileName;
            }

            //process them
            listFileNames = cluster(myNames, labels);

            //send cutoff
            MPI_Send(&cutoff, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD);

            //send list filenames to root process
            for (int j = 0; j < num; j++) {
                char tempListFileName[1024];
                strcpy(tempListFileName, listFileNames[j].c_str());
                int lengthList = listFileNames[j].length();

                MPI_Send(&lengthList, 1, MPI_INT, 0, tag,
MPI_COMM_WORLD); MPI_Send(tempListFileName, 1024, MPI_CHAR, 0, tag, MPI_COMM_WORLD); } //send Labels to root process int numLabels = labels.size(); MPI_Send(&numLabels, 1, MPI_INT, 0, tag, MPI_COMM_WORLD); for(set::iterator it = labels.begin(); it != labels.end(); ++it) { char tempLabel[100]; strcpy(tempLabel, (*it).c_str()); int lengthLabel = (*it).length(); MPI_Send(&lengthLabel, 1, MPI_INT, 0, tag, MPI_COMM_WORLD); MPI_Send(tempLabel, 100, MPI_CHAR, 0, tag, MPI_COMM_WORLD); } } //make everyone wait MPI_Barrier(MPI_COMM_WORLD); #else ///////////////////// WINDOWS CAN ONLY USE 1 PROCESSORS ACCESS VIOLATION UNRESOLVED /////////////////////// //sanity check if (processors > distName.size()) { processors = distName.size(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if(processors == 1){ listFileNames = cluster(distName, labels); //clusters individual files and returns names of list files }else{ listFileNames = createProcesses(distName, labels); } #else listFileNames = cluster(distName, labels); //clusters individual files and returns names of list files #endif #endif if (m->control_pressed) { for (int i = 0; i < listFileNames.size(); i++) { m->mothurRemove(listFileNames[i]); } return 0; } if (saveCutoff != cutoff) { m->mothurOut("Cutoff was " + toString(saveCutoff) + " changed cutoff to " + toString(cutoff)); m->mothurOutEndLine(); } m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to cluster"); m->mothurOutEndLine(); //****************** merge list file and create rabund and sabund files ******************************// estart = time(NULL); m->mothurOut("Merging the clustered files..."); m->mothurOutEndLine(); #ifdef USE_MPI if (pid == 0) { //only process 0 merges #endif ListVector* listSingle; map labelBins = completeListFile(listFileNames, singletonName, labels, listSingle); //returns map of label to numBins if (m->control_pressed) { if (listSingle != NULL) { delete listSingle; } for 
(int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } mergeLists(listFileNames, labelBins, listSingle); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //delete after all are complete incase a crash happens if (!deleteFiles) { for (int i = 0; i < distName.size(); i++) { m->mothurRemove(distName[i].begin()->first); m->mothurRemove(distName[i].begin()->second); } } m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to merge."); m->mothurOutEndLine(); //set list file as new current listfile string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } //set rabund file as new current rabundfile itTypes = outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } //set sabund file as new current sabundfile itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); #ifdef USE_MPI } //only process 0 merges //make everyone wait MPI_Barrier(MPI_COMM_WORLD); #endif return 0; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "execute"); exit(1); } } //********************************************************************************************************************** map ClusterSplitCommand::completeListFile(vector listNames, string singleton, set& userLabels, ListVector*& listSingle){ try { map labelBin; vector orderFloat; int numSingleBins; //read in singletons if (singleton != "none") { 
ifstream in; m->openInputFile(singleton, in); string firstCol, secondCol; listSingle = new ListVector(); if (countfile != "") { m->getline(in); m->gobble(in); } while (!in.eof()) { in >> firstCol >> secondCol; m->getline(in); m->gobble(in); if (countfile == "") { listSingle->push_back(secondCol); } else { listSingle->push_back(firstCol); } } in.close(); m->mothurRemove(singleton); numSingleBins = listSingle->getNumBins(); }else{ listSingle = NULL; numSingleBins = 0; } //go through users set and make them floats so we can sort them for(set::iterator it = userLabels.begin(); it != userLabels.end(); ++it) { float temp = -10.0; if ((*it != "unique") && (convertTestFloat(*it, temp) == true)) { convert(*it, temp); } else if (*it == "unique") { temp = -1.0; } if (temp <= cutoff) { orderFloat.push_back(temp); labelBin[temp] = numSingleBins; //initialize numbins } } //sort order sort(orderFloat.begin(), orderFloat.end()); userLabels.clear(); //get the list info from each file for (int k = 0; k < listNames.size(); k++) { if (m->control_pressed) { if (listSingle != NULL) { delete listSingle; listSingle = NULL; m->mothurRemove(singleton); } for (int i = 0; i < listNames.size(); i++) { m->mothurRemove(listNames[i]); } return labelBin; } InputData* input = new InputData(listNames[k], "list"); ListVector* list = input->getListVector(); string lastLabel = list->getLabel(); string filledInList = listNames[k] + "filledInTemp"; ofstream outFilled; m->openOutputFile(filledInList, outFilled); //for each label needed for(int l = 0; l < orderFloat.size(); l++){ string thisLabel; if (orderFloat[l] == -1) { thisLabel = "unique"; } else { thisLabel = toString(orderFloat[l], length-1); } //this file has reached the end if (list == NULL) { list = input->getListVector(lastLabel, true); }else{ //do you have the distance, or do you need to fill in float labelFloat; if (list->getLabel() == "unique") { labelFloat = -1.0; } else { convert(list->getLabel(), labelFloat); } //check for missing labels 
if (labelFloat > orderFloat[l]) { //you are missing the label, get the next smallest one //if its bigger get last label, otherwise keep it delete list; list = input->getListVector(lastLabel, true); //get last list vector to use, you actually want to move back in the file } lastLabel = list->getLabel(); } //print to new file list->setLabel(thisLabel); list->print(outFilled); //update labelBin labelBin[orderFloat[l]] += list->getNumBins(); delete list; list = input->getListVector(); } if (list != NULL) { delete list; } delete input; outFilled.close(); m->mothurRemove(listNames[k]); rename(filledInList.c_str(), listNames[k].c_str()); } return labelBin; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "completeListFile"); exit(1); } } //********************************************************************************************************************** int ClusterSplitCommand::mergeLists(vector listNames, map userLabels, ListVector* listSingle){ try { if (outputDir == "") { outputDir += m->hasPath(distfile); } fileroot = outputDir + m->getRootName(m->getSimpleName(distfile)); map variables; variables["[filename]"] = fileroot; variables["[clustertag]"] = tag; string sabundFileName = getOutputFileName("sabund", variables); string rabundFileName = getOutputFileName("rabund", variables); if (countfile != "") { variables["[tag2]"] = "unique_list"; } string listFileName = getOutputFileName("list", variables); map counts; if (countfile == "") { m->openOutputFile(sabundFileName, outSabund); m->openOutputFile(rabundFileName, outRabund); outputNames.push_back(sabundFileName); outputTypes["sabund"].push_back(sabundFileName); outputNames.push_back(rabundFileName); outputTypes["rabund"].push_back(rabundFileName); }else { if (file == "") { CountTable ct; ct.readTable(countfile, false, false); counts = ct.getNameMap(); } } m->openOutputFile(listFileName, outList); outputNames.push_back(listFileName); outputTypes["list"].push_back(listFileName); map::iterator itLabel; 
//clears out junk for autocompleting of list files above. Perhaps there is a beter way to handle this from within the data structure? m->printedListHeaders = false; //for each label needed for(itLabel = userLabels.begin(); itLabel != userLabels.end(); itLabel++) { string thisLabel; if (itLabel->first == -1) { thisLabel = "unique"; } else { thisLabel = toString(itLabel->first, length-1); } //outList << thisLabel << '\t' << itLabel->second << '\t'; RAbundVector* rabund = NULL; ListVector completeList; completeList.setLabel(thisLabel); if (countfile == "") { rabund = new RAbundVector(); rabund->setLabel(thisLabel); } //add in singletons if (listSingle != NULL) { for (int j = 0; j < listSingle->getNumBins(); j++) { //outList << listSingle->get(j) << '\t'; completeList.push_back(listSingle->get(j)); if (countfile == "") { rabund->push_back(m->getNumNames(listSingle->get(j))); } } } //get the list info from each file for (int k = 0; k < listNames.size(); k++) { if (m->control_pressed) { if (listSingle != NULL) { delete listSingle; } for (int i = 0; i < listNames.size(); i++) { m->mothurRemove(listNames[i]); } if (rabund != NULL) { delete rabund; } return 0; } InputData* input = new InputData(listNames[k], "list"); ListVector* list = input->getListVector(thisLabel); //this file has reached the end if (list == NULL) { m->mothurOut("Error merging listvectors in file " + listNames[k]); m->mothurOutEndLine(); } else { for (int j = 0; j < list->getNumBins(); j++) { //outList << list->get(j) << '\t'; completeList.push_back(list->get(j)); if (countfile == "") { rabund->push_back(m->getNumNames(list->get(j))); } } delete list; } delete input; } if (countfile == "") { SAbundVector sabund = rabund->getSAbundVector(); sabund.print(outSabund); rabund->print(outRabund); } //outList << endl; if (!m->printedListHeaders) { m->listBinLabelsInFile.clear(); completeList.printHeaders(outList); } if (countfile == "") { completeList.print(outList); } else if ((file == "") && (countfile != "")) 
{ completeList.print(outList, counts); } else { completeList.print(outList); } if (rabund != NULL) { delete rabund; } } outList.close(); if (countfile == "") { outRabund.close(); outSabund.close(); } if (listSingle != NULL) { delete listSingle; } for (int i = 0; i < listNames.size(); i++) { m->mothurRemove(listNames[i]); } return 0; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "mergeLists"); exit(1); } } //********************************************************************************************************************** void ClusterSplitCommand::printData(ListVector* oldList){ try { string label = oldList->getLabel(); RAbundVector oldRAbund = oldList->getRAbundVector(); oldRAbund.setLabel(label); if (m->isTrue(showabund)) { oldRAbund.getSAbundVector().print(cout); } oldRAbund.print(outRabund); oldRAbund.getSAbundVector().print(outSabund); oldList->print(outList); } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "printData"); exit(1); } } //********************************************************************************************************************** vector ClusterSplitCommand::createProcesses(vector< map > distName, set& labels){ try { deleteFiles = false; //so if we need to recalc the processors the files are still there bool recalc = false; vector listFiles; vector < vector < map > > dividedNames; //distNames[1] = vector of filenames for process 1... 
        dividedNames.resize(processors);

        //for each file group figure out which process will complete it
        //want to divide the load intelligently so the big files are spread between processes
        for (int i = 0; i < distName.size(); i++) {
            //cout << i << endl;
            int processToAssign = (i+1) % processors;
            if (processToAssign == 0) { processToAssign = processors; }

            dividedNames[(processToAssign-1)].push_back(distName[i]);
            if ((processToAssign-1) == 1) { m->mothurOut(distName[i].begin()->first + "\n"); }
        }

        //now let's reverse the order of every other process, so we balance big files running with little ones
        //note: as written, remainder is nonzero for every process except the last one
        for (int i = 0; i < processors; i++) {
            //cout << i << endl;
            int remainder = ((i+1) % processors);
            if (remainder) { reverse(dividedNames[i].begin(), dividedNames[i].end()); }
        }

        if (m->control_pressed) { return listFiles; }

    #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
        int process = 1;
        processIDS.clear();

        //loop through and create all the processes you want
        while (process != processors) {
            pid_t pid = fork();

            if (pid > 0) {
                processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                set<string> labels;
                vector<string> listFileNames = cluster(dividedNames[process], labels);

                //write out names to file
                string filename = m->mothurGetpid(process) + ".temp";
                ofstream out;
                m->openOutputFile(filename, out);
                out << tag << endl;
                for (int j = 0; j < listFileNames.size(); j++) { out << listFileNames[j] << endl; }
                out.close();

                //print out labels
                ofstream outLabels;
                filename = m->mothurGetpid(process) + ".temp.labels";
                m->openOutputFile(filename, outLabels);

                outLabels << cutoff << endl;
                for (set<string>::iterator it = labels.begin(); it != labels.end(); it++) {
                    outLabels << (*it) << endl;
                }
                outLabels.close();

                exit(0);
            }else {
                m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                for (int i =
0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;imothurRemove((toString(processIDS[i]) + ".temp")); m->mothurRemove((toString(processIDS[i]) + ".temp.labels")); } m->control_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".temp"));m->mothurRemove((toString(processIDS[i]) + ".temp.labels"));} processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); listFiles.clear(); dividedNames.clear(); //distNames[1] = vector of filenames for process 1... dividedNames.resize(processors); //for each file group figure out which process will complete it //want to divide the load intelligently so the big files are spread between processes for (int i = 0; i < distName.size(); i++) { //cout << i << endl; int processToAssign = (i+1) % processors; if (processToAssign == 0) { processToAssign = processors; } dividedNames[(processToAssign-1)].push_back(distName[i]); if ((processToAssign-1) == 1) { m->mothurOut(distName[i].begin()->first + "\n"); } } //now lets reverse the order of ever other process, so we balance big files running with little ones for (int i = 0; i < processors; i++) { //cout << i << endl; int remainder = ((i+1) % processors); if (remainder) { reverse(dividedNames[i].begin(), dividedNames[i].end()); } } processIDS.resize(0); process = 1; while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ set labels; vector listFileNames = cluster(dividedNames[process], labels); //write out names to file string filename = m->mothurGetpid(process) + ".temp"; ofstream out; 
m->openOutputFile(filename, out); out << tag << endl; for (int j = 0; j < listFileNames.size(); j++) { out << listFileNames[j] << endl; } out.close(); //print out labels ofstream outLabels; filename = m->mothurGetpid(process) + ".temp.labels"; m->openOutputFile(filename, outLabels); outLabels << cutoff << endl; for (set::iterator it = labels.begin(); it != labels.end(); it++) { outLabels << (*it) << endl; } outLabels.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do your part listFiles = cluster(dividedNames[0], labels); //force parent to wait until all the processes are done for (int i=0;i< processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } //get list of list file names from each process for(int i=0;iopenInputFile(filename, in); in >> tag; m->gobble(in); while(!in.eof()) { string tempName; in >> tempName; m->gobble(in); listFiles.push_back(tempName); } in.close(); m->mothurRemove((toString(processIDS[i]) + ".temp")); //get labels filename = toString(processIDS[i]) + ".temp.labels"; ifstream in2; m->openInputFile(filename, in2); float tempCutoff; in2 >> tempCutoff; m->gobble(in2); if (tempCutoff < cutoff) { cutoff = tempCutoff; } while(!in2.eof()) { string tempName; in2 >> tempName; m->gobble(in2); if (labels.count(tempName) == 0) { labels.insert(tempName); } } in2.close(); m->mothurRemove((toString(processIDS[i]) + ".temp.labels")); } deleteFiles = true; //delete the temp files now that we are done for (int i = 0; i < distName.size(); i++) { string thisNamefile = distName[i].begin()->second; string thisDistFile = distName[i].begin()->first; m->mothurRemove(thisNamefile); m->mothurRemove(thisDistFile); } #else #endif return listFiles; } catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "createProcesses"); exit(1); } } 
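/*
 The file distribution performed at the top of createProcesses() can be hard to
 follow from the index arithmetic alone. The block below is a minimal standalone
 sketch of that scheme, not code used by mothur. The name divideRoundRobin, the
 use of plain file-name strings instead of the map entries in distName, and the
 guard against processors < 1 are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of mothur: deals files out to `processors`
// buckets in round-robin order, then reverses every bucket except the last,
// mirroring the `(i+1) % processors` test in createProcesses().
std::vector<std::vector<std::string> > divideRoundRobin(const std::vector<std::string>& files, int processors) {
    if (processors < 1) { processors = 1; }  // guard added for this sketch only
    std::vector<std::vector<std::string> > buckets(processors);

    // File i lands in bucket i % processors, which is what the
    // processToAssign arithmetic in the source reduces to.
    for (std::size_t i = 0; i < files.size(); i++) {
        buckets[i % processors].push_back(files[i]);
    }

    // Same condition as the source: nonzero for all buckets but the last.
    for (int i = 0; i < processors; i++) {
        if ((i + 1) % processors != 0) {
            std::reverse(buckets[i].begin(), buckets[i].end());
        }
    }
    return buckets;
}
```

 With five files listed in the order returned by getDistanceFiles() and two
 processes, bucket 0 receives files 0, 2 and 4 in reversed order and bucket 1
 receives files 1 and 3 in their original order.
*/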
//**********************************************************************************************************************

vector<string> ClusterSplitCommand::cluster(vector< map<string, string> > distNames, set<string>& labels){
    try {

        vector<string> listFileNames;
        double smallestCutoff = cutoff;

        //cluster each distance file
        for (int i = 0; i < distNames.size(); i++) {

            string thisNamefile = distNames[i].begin()->second;
            string thisDistFile = distNames[i].begin()->first;

            string listFileName = "";
            if (classic) { listFileName = clusterClassicFile(thisDistFile, thisNamefile, labels, smallestCutoff); }
            else { listFileName = clusterFile(thisDistFile, thisNamefile, labels, smallestCutoff); }

            if (m->control_pressed) { //clean up
                for (int j = 0; j < listFileNames.size(); j++) { m->mothurRemove(listFileNames[j]); }
                listFileNames.clear(); return listFileNames;
            }

            listFileNames.push_back(listFileName);
        }

        cutoff = smallestCutoff;

        return listFileNames;

    }
    catch(exception& e) {
        m->errorOut(e, "ClusterSplitCommand", "cluster");
        exit(1);
    }
}
//**********************************************************************************************************************
string ClusterSplitCommand::clusterClassicFile(string thisDistFile, string thisNamefile, set<string>& labels, double& smallestCutoff){
    try {
        string listFileName = "";

        ListVector* list = NULL;
        ListVector oldList;
        RAbundVector* rabund = NULL;

        #ifdef USE_MPI
            int pid;
            MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are

            //output your files too
            if (pid != 0) {
                cout << endl << "Reading " << thisDistFile << endl;
            }
        #endif

        m->mothurOutEndLine(); m->mothurOut("Reading " + thisDistFile); m->mothurOutEndLine();

        //reads phylip file storing data in 2D vector, also fills list and rabund
        bool sim = false;
        ClusterClassic* cluster = new ClusterClassic(cutoff, method, sim);

        NameAssignment* nameMap = NULL;
        CountTable* ct = NULL;
        if(namefile != ""){
            nameMap = new NameAssignment(thisNamefile);
            nameMap->readMap();
            cluster->readPhylipFile(thisDistFile, nameMap);
        }else if (countfile != "") {
            ct
= new CountTable(); ct->readTable(thisNamefile, false, false); cluster->readPhylipFile(thisDistFile, ct); } tag = cluster->getTag(); if (m->control_pressed) { if(namefile != ""){ delete nameMap; } else { delete ct; } delete cluster; return 0; } list = cluster->getListVector(); rabund = cluster->getRAbundVector(); if (outputDir == "") { outputDir += m->hasPath(thisDistFile); } fileroot = outputDir + m->getRootName(m->getSimpleName(thisDistFile)); listFileName = fileroot+ tag + ".list"; ofstream listFile; m->openOutputFile(fileroot+ tag + ".list", listFile); float previousDist = 0.00000; float rndPreviousDist = 0.00000; oldList = *list; #ifdef USE_MPI //output your files too if (pid != 0) { cout << endl << "Clustering " << thisDistFile << endl; } #endif m->mothurOutEndLine(); m->mothurOut("Clustering " + thisDistFile); m->mothurOutEndLine(); while ((cluster->getSmallDist() < cutoff) && (cluster->getNSeqs() > 1)){ if (m->control_pressed) { delete cluster; delete list; delete rabund; listFile.close(); if(namefile != ""){ delete nameMap; } else { delete ct; } return listFileName; } cluster->update(cutoff); float dist = cluster->getSmallDist(); float rndDist; if (hard) { rndDist = m->ceilDist(dist, precision); }else{ rndDist = m->roundDist(dist, precision); } if(previousDist <= 0.0000 && dist != previousDist){ oldList.setLabel("unique"); oldList.print(listFile); if (labels.count("unique") == 0) { labels.insert("unique"); } } else if(rndDist != rndPreviousDist){ oldList.setLabel(toString(rndPreviousDist, length-1)); oldList.print(listFile); if (labels.count(toString(rndPreviousDist, length-1)) == 0) { labels.insert(toString(rndPreviousDist, length-1)); } } previousDist = dist; rndPreviousDist = rndDist; oldList = *list; } if(previousDist <= 0.0000){ oldList.setLabel("unique"); oldList.print(listFile); if (labels.count("unique") == 0) { labels.insert("unique"); } } else if(rndPreviousDistmothurRemove(thisDistFile); m->mothurRemove(thisNamefile); } return listFileName; } 
catch(exception& e) { m->errorOut(e, "ClusterSplitCommand", "clusterClassicFile"); exit(1); } } //********************************************************************************************************************** string ClusterSplitCommand::clusterFile(string thisDistFile, string thisNamefile, set& labels, double& smallestCutoff){ try { string listFileName = ""; Cluster* cluster = NULL; SparseDistanceMatrix* matrix = NULL; ListVector* list = NULL; ListVector oldList; RAbundVector* rabund = NULL; if (m->control_pressed) { return listFileName; } #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are //output your files too if (pid != 0) { cout << endl << "Reading " << thisDistFile << endl; } #endif m->mothurOutEndLine(); m->mothurOut("Reading " + thisDistFile); m->mothurOutEndLine(); ReadMatrix* read = new ReadColumnMatrix(thisDistFile); read->setCutoff(cutoff); NameAssignment* nameMap = NULL; CountTable* ct = NULL; if(namefile != ""){ nameMap = new NameAssignment(thisNamefile); nameMap->readMap(); read->read(nameMap); }else if (countfile != "") { ct = new CountTable(); ct->readTable(thisNamefile, false, false); read->read(ct); }else { read->read(nameMap); } list = read->getListVector(); oldList = *list; matrix = read->getDMatrix(); if(countfile != "") { rabund = new RAbundVector(); createRabund(ct, list, rabund); //creates an rabund that includes the counts for the unique list delete ct; }else { rabund = new RAbundVector(list->getRAbundVector()); } delete read; read = NULL; if (namefile != "") { delete nameMap; nameMap = NULL; } #ifdef USE_MPI //output your files too if (pid != 0) { cout << endl << "Clustering " << thisDistFile << endl; } #endif m->mothurOutEndLine(); m->mothurOut("Clustering " + thisDistFile); m->mothurOutEndLine(); //create cluster float adjust = -1.0; if (method == "furthest") { cluster = new CompleteLinkage(rabund, list, matrix, cutoff, method, adjust); } else if(method == "nearest"){ cluster = new 
SingleLinkage(rabund, list, matrix, cutoff, method, adjust); }
        else if(method == "average"){ cluster = new AverageLinkage(rabund, list, matrix, cutoff, method, adjust); }
        tag = cluster->getTag();

        if (outputDir == "") { outputDir += m->hasPath(thisDistFile); }
        fileroot = outputDir + m->getRootName(m->getSimpleName(thisDistFile));

        ofstream listFile;
        m->openOutputFile(fileroot+ tag + ".list", listFile);

        listFileName = fileroot+ tag + ".list";

        float previousDist = 0.00000;
        float rndPreviousDist = 0.00000;

        oldList = *list;

        print_start = true;
        start = time(NULL);
        double saveCutoff = cutoff;

        while (matrix->getSmallDist() < cutoff && matrix->getNNodes() > 0){

            if (m->control_pressed) { //clean up
                delete matrix; delete list; delete cluster; delete rabund;
                listFile.close();
                m->mothurRemove(listFileName);
                return listFileName;
            }

            cluster->update(saveCutoff);

            float dist = matrix->getSmallDist();
            float rndDist;
            if (hard) {
                rndDist = m->ceilDist(dist, precision);
            }else{
                rndDist = m->roundDist(dist, precision);
            }

            if(previousDist <= 0.0000 && dist != previousDist){
                oldList.setLabel("unique");
                oldList.print(listFile);
                if (labels.count("unique") == 0) { labels.insert("unique"); }
            }
            else if(rndDist != rndPreviousDist){
                oldList.setLabel(toString(rndPreviousDist, length-1));
                oldList.print(listFile);
                if (labels.count(toString(rndPreviousDist, length-1)) == 0) { labels.insert(toString(rndPreviousDist, length-1)); }
            }

            previousDist = dist;
            rndPreviousDist = rndDist;
            oldList = *list;
        }

        if(previousDist <= 0.0000){
            oldList.setLabel("unique");
            oldList.print(listFile);
            if (labels.count("unique") == 0) { labels.insert("unique"); }
        }
        else if(rndPreviousDist<cutoff){
            oldList.setLabel(toString(rndPreviousDist, length-1));
            oldList.print(listFile);
            if (labels.count(toString(rndPreviousDist, length-1)) == 0) { labels.insert(toString(rndPreviousDist, length-1)); }
        }

        delete matrix; delete list; delete cluster; delete rabund;
        listFile.close();

        if (m->control_pressed) { //clean up
            m->mothurRemove(listFileName); return listFileName;
        }

        if (deleteFiles) {
            m->mothurRemove(thisDistFile);
            m->mothurRemove(thisNamefile);
        }

        if (saveCutoff != cutoff) {
            if (hard) { saveCutoff = m->ceilDist(saveCutoff, precision); }
            else { saveCutoff = m->roundDist(saveCutoff, precision); }

            m->mothurOut("Cutoff was " +
toString(cutoff) + " changed cutoff to " + toString(saveCutoff)); m->mothurOutEndLine();
        }

        if (saveCutoff < smallestCutoff) { smallestCutoff = saveCutoff; }

        return listFileName;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterSplitCommand", "clusterFile");
        exit(1);
    }
}
//**********************************************************************************************************************
int ClusterSplitCommand::createMergedDistanceFile(vector< map<string, string> > distNames) {
    try{

#ifdef USE_MPI
        int pid;
        MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are

        if (pid != 0) {
#endif

        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir = m->hasPath(fastafile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile));
        string outputFileName = getOutputFileName("column", variables);
        m->mothurRemove(outputFileName);

        for (int i = 0; i < distNames.size(); i++) {
            if (m->control_pressed) { return 0; }

            string thisDistFile = distNames[i].begin()->first;

            m->appendFiles(thisDistFile, outputFileName);
        }

        outputTypes["column"].push_back(outputFileName); outputNames.push_back(outputFileName);

#ifdef USE_MPI
        }
#endif

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterSplitCommand", "createMergedDistanceFile");
        exit(1);
    }
}
//**********************************************************************************************************************
int ClusterSplitCommand::createRabund(CountTable*& ct, ListVector*& list, RAbundVector*& rabund){
    try {
        rabund->setLabel(list->getLabel());
        for(int i = 0; i < list->getNumBins(); i++) {
            if (m->control_pressed) { break; }
            vector<string> binNames;
            string bin = list->get(i);
            m->splitAtComma(bin, binNames);
            int total = 0;
            for (int j = 0; j < binNames.size(); j++) { total += ct->getNumSeqs(binNames[j]); }
            rabund->push_back(total);
        }
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterSplitCommand", "createRabund");
        exit(1);
    }
}
//**********************************************************************************************************************
string ClusterSplitCommand::printFile(string singleton, vector< map<string, string> >& distName){
    try {
        ofstream out;
        map<string, string> variables;
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir = m->hasPath(distfile); }
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(distfile));
        string outputFileName = getOutputFileName("file", variables);
        m->openOutputFile(outputFileName, out);
        outputTypes["file"].push_back(outputFileName); outputNames.push_back(outputFileName);
        m->setFileFile(outputFileName);

        out << singleton << endl;
        if (namefile != "") { out << "name" << endl; }
        else if (countfile != "") { out << "count" << endl; }
        else { out << "unknown" << endl; }

        for (int i = 0; i < distName.size(); i++) { out << distName[i].begin()->first << '\t' << distName[i].begin()->second << endl; }
        out.close();

        return outputFileName;
    }
    catch(exception& e) {
        m->errorOut(e, "ClusterSplitCommand", "printFile");
        exit(1);
    }
}
//**********************************************************************************************************************
string ClusterSplitCommand::readFile(vector< map<string, string> >& distName){
    try {

        string singleton, thiscolumn, thisname, type;

        ifstream in;
        m->openInputFile(file, in);

        in >> singleton; m->gobble(in);

        string path = m->hasPath(singleton);
        if (path == "") { singleton = inputDir + singleton; }

        in >> type; m->gobble(in);

        if (type == "name") { namefile = "name"; }
        else if (type == "count") { countfile = "count"; }
        else { m->mothurOut("[ERROR]: unknown file type. Are the files in column 2 of the file name files or count files? 
Please change unknown to name or count.\n"); m->control_pressed = true; } if (isList) { vector listFileNames; string thisListFileName = ""; set listLabels; while(!in.eof()) { if (m->control_pressed) { break; } in >> thisListFileName; m->gobble(in); string path = m->hasPath(thisListFileName); if (path == "") { thisListFileName = inputDir + thisListFileName; } getLabels(thisListFileName, listLabels); listFileNames.push_back(thisListFileName); } ListVector* listSingle; map labelBins = completeListFile(listFileNames, singleton, listLabels, listSingle); mergeLists(listFileNames, labelBins, listSingle); }else { while(!in.eof()) { if (m->control_pressed) { break; } in >> thiscolumn; m->gobble(in); in >> thisname; m->gobble(in); map temp; temp[thiscolumn] = thisname; distName.push_back(temp); } } in.close(); return singleton; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "readFile"); exit(1); } } //********************************************************************************************************************** int ClusterSplitCommand::getLabels(string file, set& listLabels){ try { ifstream in; m->openInputFile(file, in); //read headers m->getline(in); m->gobble(in); string label; while(!in.eof()) { if (m->control_pressed) { break; } in >> label; m->getline(in); m->gobble(in); listLabels.insert(label); } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ClusterCommand", "getLabels"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/clustersplitcommand.h000066400000000000000000000253711255543666200225140ustar00rootroot00000000000000#ifndef CLUSTERSPLITCOMMAND_H #define CLUSTERSPLITCOMMAND_H /* * clustersplitcommand.h * Mothur * * Created by westcott on 5/19/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "rabundvector.hpp" #include "sabundvector.hpp" #include "listvector.hpp" #include "cluster.hpp" #include "sparsedistancematrix.h" #include "readcluster.h" #include "splitmatrix.h" #include "readphylip.h" #include "readcolumn.h" #include "readmatrix.hpp" #include "inputdata.h" #include "clustercommand.h" #include "clusterclassic.h" class ClusterSplitCommand : public Command { public: ClusterSplitCommand(string); ClusterSplitCommand(); ~ClusterSplitCommand() {} vector setParameters(); string getCommandName() { return "cluster.split"; } string getCommandCategory() { return "Clustering"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Westcott SL (2011). Assessing and improving methods used in OTU-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 77:3219. \nhttp://www.mothur.org/wiki/Cluster.split"; } string getDescription() { return "splits your sequences by distance or taxonomy then clusters into OTUs"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector processIDS; //processid vector outputNames; string file, method, fileroot, tag, outputDir, phylipfile, columnfile, namefile, countfile, distfile, format, showabund, timing, splitmethod, taxFile, fastafile, inputDir; double cutoff, splitcutoff; int precision, length, processors, taxLevelCutoff; bool print_start, abort, hard, large, classic, runCluster, deleteFiles, isList; time_t start; ofstream outList, outRabund, outSabund; void printData(ListVector*); vector createProcesses(vector< map >, set&); vector cluster(vector< map >, set&); string clusterFile(string, string, set&, double&); string clusterClassicFile(string, string, set&, double&); int mergeLists(vector, map, ListVector*); map completeListFile(vector, string, set&, ListVector*&); int createMergedDistanceFile(vector< map >); int createRabund(CountTable*& ct, ListVector*& list, RAbundVector*& rabund); string 
readFile(vector< map >&); string printFile(string, vector< map >&); int getLabels(string, set& listLabels); }; /////////////////not working for Windows//////////////////////////////////////////////////////////// // getting an access violation error. This is most likely caused by the // threads stepping on eachother's structures, as I can run the thread function and the cluster fuction // in separately without errors occuring. I suspect it may be in the use of the // static class mothurOut, but I can't pinpoint the problem. All other objects are made new // within the thread. MothurOut is used by almost all the classes in mothur, so if this was // really the cause I would expect to see all the windows threaded commands to have issues, but not // all do. So far, shhh.flows and trim.flows have similiar problems. Other thoughts, could it have // anything to do with mothur's use of copy constructors in many of our data structures. ie. listvector // is copied by nameassignment and passed to read which passes to the thread? -westcott 2-8-12 //////////////////////////////////////////////////////////////////////////////////////////////////// /************************************************************************************************** //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct clusterData { set labels; vector < map > distNames; string method; MothurOut* m; double cutoff, precision; string tag, outputDir; vector listFiles; bool hard; int length, threadID; clusterData(){} clusterData(vector < map > dv, MothurOut* mout, double cu, string me, string ou, bool hd, double pre, int len, int th) { distNames = dv; m = mout; cutoff = cu; method = me; outputDir = ou; hard = hd; precision = pre; length = len; threadID = th; } }; /************************************************************************************************** #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyClusterThreadFunction(LPVOID lpParam){ clusterData* pDataArray; pDataArray = (clusterData*)lpParam; try { cout << "starting " << endl; double smallestCutoff = pDataArray->cutoff; //cluster each distance file for (int i = 0; i < pDataArray->distNames.size(); i++) { Cluster* mycluster = NULL; SparseMatrix* mymatrix = NULL; ListVector* mylist = NULL; ListVector myoldList; RAbundVector* myrabund = NULL; if (pDataArray->m->control_pressed) { break; } string thisNamefile = pDataArray->distNames[i].begin()->second; string thisDistFile = pDataArray->distNames[i].begin()->first; cout << thisNamefile << '\t' << thisDistFile << endl; pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Reading " + thisDistFile); pDataArray->m->mothurOutEndLine(); ReadMatrix* myread = new ReadColumnMatrix(thisDistFile); myread->setCutoff(pDataArray->cutoff); NameAssignment* mynameMap = new NameAssignment(thisNamefile); mynameMap->readMap(); cout << "done reading " << thisNamefile << endl; myread->read(mynameMap); cout << "done reading " << thisDistFile << endl; if (pDataArray->m->control_pressed) { delete myread; delete mynameMap; break; } mylist = myread->getListVector(); myoldList = *mylist; mymatrix = myread->getMatrix(); cout << "here" << endl; delete myread; myread = NULL; delete mynameMap; mynameMap = NULL; 
pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Clustering " + thisDistFile); pDataArray->m->mothurOutEndLine(); myrabund = new RAbundVector(mylist->getRAbundVector()); cout << "here" << endl; //create cluster if (pDataArray->method == "furthest") { mycluster = new CompleteLinkage(myrabund, mylist, mymatrix, pDataArray->cutoff, pDataArray->method); } else if(pDataArray->method == "nearest"){ mycluster = new SingleLinkage(myrabund, mylist, mymatrix, pDataArray->cutoff, pDataArray->method); } else if(pDataArray->method == "average"){ mycluster = new AverageLinkage(myrabund, mylist, mymatrix, pDataArray->cutoff, pDataArray->method); } pDataArray->tag = mycluster->getTag(); cout << "here" << endl; if (pDataArray->outputDir == "") { pDataArray->outputDir += pDataArray->m->hasPath(thisDistFile); } string fileroot = pDataArray->outputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(thisDistFile)); cout << "here" << endl; ofstream listFile; pDataArray->m->openOutputFile(fileroot+ pDataArray->tag + ".list", listFile); cout << "here" << endl; pDataArray->listFiles.push_back(fileroot+ pDataArray->tag + ".list"); float previousDist = 0.00000; float rndPreviousDist = 0.00000; myoldList = *mylist; bool print_start = true; int start = time(NULL); double saveCutoff = pDataArray->cutoff; while (mymatrix->getSmallDist() < pDataArray->cutoff && mymatrix->getNNodes() > 0){ if (pDataArray->m->control_pressed) { //clean up delete mymatrix; delete mylist; delete mycluster; delete myrabund; listFile.close(); for (int i = 0; i < pDataArray->listFiles.size(); i++) { pDataArray->m->mothurRemove(pDataArray->listFiles[i]); } pDataArray->listFiles.clear(); break; } mycluster->update(saveCutoff); float dist = mymatrix->getSmallDist(); float rndDist; if (pDataArray->hard) { rndDist = pDataArray->m->ceilDist(dist, pDataArray->precision); }else{ rndDist = pDataArray->m->roundDist(dist, pDataArray->precision); } if(previousDist <= 0.0000 && dist != previousDist){ 
                    myoldList.setLabel("unique");
                    myoldList.print(listFile);
                    if (pDataArray->labels.count("unique") == 0) { pDataArray->labels.insert("unique"); }
                }
                else if(rndDist != rndPreviousDist){
                    myoldList.setLabel(toString(rndPreviousDist, pDataArray->length-1));
                    myoldList.print(listFile);
                    if (pDataArray->labels.count(toString(rndPreviousDist, pDataArray->length-1)) == 0) { pDataArray->labels.insert(toString(rndPreviousDist, pDataArray->length-1)); }
                }

                previousDist = dist;
                rndPreviousDist = rndDist;
                myoldList = *mylist;
            }

            cout << "here2" << endl;
            if(previousDist <= 0.0000){
                myoldList.setLabel("unique");
                myoldList.print(listFile);
                if (pDataArray->labels.count("unique") == 0) { pDataArray->labels.insert("unique"); }
            }
            else if(rndPreviousDist<pDataArray->cutoff){
                myoldList.setLabel(toString(rndPreviousDist, pDataArray->length-1));
                myoldList.print(listFile);
                if (pDataArray->labels.count(toString(rndPreviousDist, pDataArray->length-1)) == 0) { pDataArray->labels.insert(toString(rndPreviousDist, pDataArray->length-1)); }
            }

            delete mymatrix; delete mylist; delete mycluster; delete myrabund;
            mymatrix = NULL; mylist = NULL; mycluster = NULL; myrabund = NULL;
            listFile.close();

            if (pDataArray->m->control_pressed) { //clean up
                for (int i = 0; i < pDataArray->listFiles.size(); i++) { pDataArray->m->mothurRemove(pDataArray->listFiles[i]); }
                pDataArray->listFiles.clear(); break;
            }
            cout << "here3" << endl;
            pDataArray->m->mothurRemove(thisDistFile);
            pDataArray->m->mothurRemove(thisNamefile);
            cout << "here4" << endl;
            if (saveCutoff != pDataArray->cutoff) {
                if (pDataArray->hard) { saveCutoff = pDataArray->m->ceilDist(saveCutoff, pDataArray->precision); }
                else { saveCutoff = pDataArray->m->roundDist(saveCutoff, pDataArray->precision); }

                pDataArray->m->mothurOut("Cutoff was " + toString(pDataArray->cutoff) + " changed cutoff to " + toString(saveCutoff)); pDataArray->m->mothurOutEndLine();
            }
            cout << "here5" << endl;
            if (saveCutoff < smallestCutoff) { smallestCutoff = saveCutoff; }
        }

        pDataArray->cutoff =
smallestCutoff; return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ClusterSplitCommand", "MyClusterThreadFunction"); exit(1); } } #endif */ #endif mothur-1.36.1/source/commands/collectcommand.cpp000066400000000000000000000776201255543666200217430ustar00rootroot00000000000000/* * collectcommand.cpp * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "collectcommand.h" #include "ace.h" #include "sobs.h" #include "nseqs.h" #include "chao1.h" #include "bootstrap.h" #include "simpson.h" #include "simpsoneven.h" #include "invsimpson.h" #include "npshannon.h" #include "shannon.h" #include "smithwilson.h" #include "heip.h" #include "shannoneven.h" #include "jackknife.h" #include "geom.h" #include "qstat.h" #include "logsd.h" #include "bergerparker.h" #include "bstick.h" #include "goodscoverage.h" #include "efron.h" #include "boneh.h" #include "solow.h" #include "shen.h" #include "coverage.h" #include "shannonrange.h" //********************************************************************************************************************** vector CollectCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false,true); parameters.push_back(plist); CommandParameter prabund("rabund", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false,true); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false,true); parameters.push_back(psabund); CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pfreq("freq", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pfreq); CommandParameter pcalc("calc", "Multiple", 
"sobs-chao-nseqs-coverage-ace-jack-shannon-shannoneven-npshannon-heip-smithwilson-simpson-simpsoneven-invsimpson-bootstrap-geometric-qstat-logseries-bergerparker-bstick-goodscoverage-efron-boneh-solow-shen", "sobs-chao-ace-jack-shannon-npshannon-simpson-shannonrange", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter pabund("abund", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pabund); CommandParameter palpha("alpha", "Multiple", "0-1-2", "1", "", "", "","",false,false,true); parameters.push_back(palpha); CommandParameter psize("size", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psize); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CollectCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CollectCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The collect.single command parameters are list, sabund, rabund, shared, label, freq, calc, alpha and abund. list, sabund, rabund or shared is required unless you have a valid current file. \n"; helpString += "The collect.single command should be in the following format: \n"; helpString += "The freq parameter is used indicate when to output your data, by default it is set to 100. But you can set it to a percentage of the number of sequence. For example freq=0.10, means 10%. 
\n";
        helpString += "collect.single(label=yourLabel, freq=yourFreq, calc=yourEstimators).\n";
        helpString += "Example collect.single(label=unique-.01-.03, freq=10, calc=sobs-chao-ace-jack).\n";
        helpString += "The default value for freq is 100, and the default for calc is sobs-chao-ace-jack-shannon-npshannon-simpson.\n";
        helpString += "The alpha parameter is used to set the alpha value for the shannonrange calculator.\n";
        helpString += validCalculator.printCalc("single");
        helpString += "The label parameter is used to analyze specific labels in your input.\n";
        helpString += "Note: No spaces between parameter labels (e.g. freq), '=' and parameters (e.g. yourFreq).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "CollectCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string CollectCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "sobs") { pattern = "[filename],sobs"; }
        else if (type == "chao") { pattern = "[filename],chao"; }
        else if (type == "nseqs") { pattern = "[filename],nseqs"; }
        else if (type == "coverage") { pattern = "[filename],coverage"; }
        else if (type == "ace") { pattern = "[filename],ace"; }
        else if (type == "jack") { pattern = "[filename],jack"; }
        else if (type == "shannon") { pattern = "[filename],shannon"; }
        else if (type == "shannoneven") { pattern = "[filename],shannoneven"; }
        else if (type == "shannonrange"){ pattern = "[filename],shannonrange"; }
        else if (type == "npshannon") { pattern = "[filename],npshannon"; }
        else if (type == "heip") { pattern = "[filename],heip"; }
        else if (type == "smithwilson") { pattern = "[filename],smithwilson"; }
        else if (type == "simpson") { pattern = "[filename],simpson"; }
        else if (type == "simpsoneven") { pattern = "[filename],simpsoneven"; }
        else if (type == "invsimpson") { pattern = "[filename],invsimpson"; }
        else if (type == "bootstrap") { pattern = "[filename],bootstrap"; }
        else if
(type == "geometric") { pattern = "[filename],geometric"; } else if (type == "qstat") { pattern = "[filename],qstat"; } else if (type == "logseries") { pattern = "[filename],logseries"; } else if (type == "bergerparker") { pattern = "[filename],bergerparker"; } else if (type == "bstick") { pattern = "[filename],bstick"; } else if (type == "goodscoverage") { pattern = "[filename],goodscoverage"; } else if (type == "efron") { pattern = "[filename],efron"; } else if (type == "boneh") { pattern = "[filename],boneh"; } else if (type == "solow") { pattern = "[filename],solow"; } else if (type == "shen") { pattern = "[filename],shen"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "CollectCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** CollectCommand::CollectCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["sobs"] = tempOutNames; outputTypes["chao"] = tempOutNames; outputTypes["nseqs"] = tempOutNames; outputTypes["coverage"] = tempOutNames; outputTypes["ace"] = tempOutNames; outputTypes["jack"] = tempOutNames; outputTypes["shannon"] = tempOutNames; outputTypes["shannoneven"] = tempOutNames; outputTypes["shannonrange"] = tempOutNames; outputTypes["npshannon"] = tempOutNames; outputTypes["heip"] = tempOutNames; outputTypes["smithwilson"] = tempOutNames; outputTypes["simpson"] = tempOutNames; outputTypes["simpsoneven"] = tempOutNames; outputTypes["invsimpson"] = tempOutNames; outputTypes["bootstrap"] = tempOutNames; outputTypes["geometric"] = tempOutNames; outputTypes["qstat"] = tempOutNames; outputTypes["logseries"] = tempOutNames; outputTypes["bergerparker"] = tempOutNames; outputTypes["bstick"] = tempOutNames; outputTypes["goodscoverage"] = tempOutNames; outputTypes["efron"] = 
tempOutNames; outputTypes["boneh"] = tempOutNames; outputTypes["solow"] = tempOutNames; outputTypes["shen"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "CollectCommand", "CollectCommand"); exit(1); } } //********************************************************************************************************************** CollectCommand::CollectCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); calledHelp = true; abort = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["sobs"] = tempOutNames; outputTypes["chao"] = tempOutNames; outputTypes["nseqs"] = tempOutNames; outputTypes["coverage"] = tempOutNames; outputTypes["ace"] = tempOutNames; outputTypes["jack"] = tempOutNames; outputTypes["shannon"] = tempOutNames; outputTypes["shannoneven"] = tempOutNames; outputTypes["npshannon"] = tempOutNames; outputTypes["heip"] = tempOutNames; outputTypes["smithwilson"] = tempOutNames; outputTypes["simpson"] = tempOutNames; outputTypes["simpsoneven"] = tempOutNames; outputTypes["shannonrange"] = tempOutNames; outputTypes["invsimpson"] = tempOutNames; outputTypes["bootstrap"] = tempOutNames; outputTypes["geometric"] = tempOutNames; outputTypes["qstat"] = tempOutNames; outputTypes["logseries"] = tempOutNames; outputTypes["bergerparker"] = tempOutNames; outputTypes["bstick"] = tempOutNames; outputTypes["goodscoverage"] = tempOutNames; outputTypes["efron"] = tempOutNames; outputTypes["boneh"] = tempOutNames; 
outputTypes["solow"] = tempOutNames; outputTypes["shen"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { format = "sabund"; inputfile = sabundfile; m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { rabundfile = ""; abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { format = "rabund"; inputfile = rabundfile; m->setRabundFile(rabundfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "sharedfile"; inputfile = sharedfile; m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } if ((sharedfile == "") && (listfile == "") && (rabundfile == "") && (sabundfile == "")) { //is there are current file available for any of these? 
//give priority to shared, then list, then rabund, then sabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { listfile = m->getListFile(); if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { rabundfile = m->getRabundFile(); if (rabundfile != "") { inputfile = rabundfile; format = "rabund"; m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); } else { sabundfile = m->getSabundFile(); if (sabundfile != "") { inputfile = sabundfile; format = "sabund"; m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a list, sabund, rabund or shared file before you can use the collect.single command."); m->mothurOutEndLine(); abort = true; } } } } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
			label = validParameter.validFile(parameters, "label", false);
			if (label == "not found") { label = ""; }
			else {
				if(label != "all") { m->splitAtDash(label, labels); allLines = 0; }
				else { allLines = 1; }
			}
			
			//NOTE: if you add new calc options, don't forget to add them to the parameter initialize in setParameters or the gui won't be able to use them
			calc = validParameter.validFile(parameters, "calc", false);
			if (calc == "not found") { calc = "sobs-chao-ace-jack-shannon-npshannon-simpson"; }
			else {
				if (calc == "default") { calc = "sobs-chao-ace-jack-shannon-npshannon-simpson"; }
			}
			m->splitAtDash(calc, Estimators);
			if (m->inUsersGroups("citation", Estimators)) {
				ValidCalculators validCalc; validCalc.printCitations(Estimators);
				//remove citation from list of calcs
				for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } }
			}
			
			string temp;
			temp = validParameter.validFile(parameters, "freq", false);
			if (temp == "not found") { temp = "100"; }
			m->mothurConvert(temp, freq);
			
			temp = validParameter.validFile(parameters, "alpha", false);
			if (temp == "not found") { temp = "1"; }
			m->mothurConvert(temp, alpha);
			
			if ((alpha != 0) && (alpha != 1) && (alpha != 2)) { m->mothurOut("[ERROR]: Not a valid alpha value. 
Valid values are 0, 1 and 2."); m->mothurOutEndLine(); abort=true; } temp = validParameter.validFile(parameters, "abund", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, abund); temp = validParameter.validFile(parameters, "size", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, size); } } catch(exception& e) { m->errorOut(e, "CollectCommand", "CollectCommand"); exit(1); } } //********************************************************************************************************************** int CollectCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if ((format != "sharedfile")) { inputFileNames.push_back(inputfile); } else { inputFileNames = parseSharedFile(sharedfile); format = "rabund"; } for (int p = 0; p < inputFileNames.size(); p++) { if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->clearGroups(); return 0; } if (outputDir == "") { outputDir += m->hasPath(inputFileNames[p]); } string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(inputFileNames[p])); map variables; variables["[filename]"] = fileNameRoot; //globaldata->inputFileName = inputFileNames[p]; if (inputFileNames.size() > 1) { m->mothurOutEndLine(); m->mothurOut("Processing group " + groups[p]); m->mothurOutEndLine(); m->mothurOutEndLine(); } ValidCalculators validCalculator; for (int i=0; igetOrderVector(); string lastLabel = order->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } outputTypes.clear(); delete input; delete order; m->clearGroups(); return 0; } while((order != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } outputTypes.clear(); delete input; delete order; m->clearGroups(); return 0; } if(allLines == 1 || labels.count(order->getLabel()) == 1){ m->mothurOut(order->getLabel()); m->mothurOutEndLine(); cCurve = new Collect(order, cDisplays); cCurve->getCurve(freq); delete cCurve; processedLabels.insert(order->getLabel()); userLabels.erase(order->getLabel()); } //you have a label the user want that is smaller than this label and the last label has not already been processed if ((m->anyLabelsToProcess(order->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = order->getLabel(); delete order; order = (input->getOrderVector(lastLabel)); m->mothurOut(order->getLabel()); m->mothurOutEndLine(); cCurve = new Collect(order, cDisplays); cCurve->getCurve(freq); delete cCurve; processedLabels.insert(order->getLabel()); userLabels.erase(order->getLabel()); //restore real lastlabel to save below order->setLabel(saveLabel); } lastLabel = order->getLabel(); delete order; order = (input->getOrderVector()); } if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } outputTypes.clear(); delete input; m->clearGroups(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (order != NULL) { delete order; } order = (input->getOrderVector(lastLabel)); m->mothurOut(order->getLabel()); m->mothurOutEndLine(); cCurve = new Collect(order, cDisplays); cCurve->getCurve(freq); delete cCurve; if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } outputTypes.clear(); delete input; delete order; m->clearGroups(); return 0; } delete order; } for(int i=0;icontrol_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "CollectCommand", "execute"); exit(1); } } //********************************************************************************************************************** vector CollectCommand::parseSharedFile(string filename) { try { vector filenames; map filehandles; map::iterator it3; input = new InputData(filename, "sharedfile"); vector lookup = input->getSharedRAbundVectors(); string sharedFileRoot = m->getRootName(filename); //clears file before we start to write to it below for (int i=0; imothurRemove((sharedFileRoot + lookup[i]->getGroup() + ".rabund")); filenames.push_back((sharedFileRoot + lookup[i]->getGroup() + ".rabund")); } ofstream* temp; for (int i=0; igetGroup()] = temp; groups.push_back(lookup[i]->getGroup()); } while(lookup[0] != NULL) { for (int i = 0; i < lookup.size(); i++) { RAbundVector rav = lookup[i]->getRAbundVector(); m->openOutputFileAppend(sharedFileRoot + lookup[i]->getGroup() + ".rabund", *(filehandles[lookup[i]->getGroup()])); rav.print(*(filehandles[lookup[i]->getGroup()])); (*(filehandles[lookup[i]->getGroup()])).close(); } for (int i = 0; i < lookup.size(); i++) { 
					delete lookup[i];
				}
				lookup = input->getSharedRAbundVectors();
			}
			
			//free memory
			for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) {
				delete it3->second;
			}
			
			delete input;
			
			return filenames;
	}
	catch(exception& e) {
		m->errorOut(e, "CollectCommand", "parseSharedFile");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/collectcommand.h000066400000000000000000000045621255543666200214030ustar00rootroot00000000000000
#ifndef COLLECTCOMMAND_H
#define COLLECTCOMMAND_H
/*
 *  collectcommand.h
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"
#include "ordervector.hpp"
#include "inputdata.h"
#include "collect.h"
#include "display.h"
#include "validcalculator.h"

/*The collect() command:
	The collect command generates a collector's curve from the given file.
	The collect command outputs a file for each estimator you choose to use.
	The collect command parameters are label, freq, single, abund. No parameters are required.
	The collect command should be in the following format: collect(label=yourLabel, freq=yourFreq, single=yourEstimators, abund=yourAbund).
	example collect(label=unique-.01-.03, freq=10, single=collect-chao-ace-jack).
	The default values for freq is 100, for abund is 10, and single are collect-chao-ace-jack-bootstrap-shannon-npshannon-simpson.
	The valid single estimators are: collect-chao-ace-jack-bootstrap-shannon-npshannon-simpson.
	The label parameter is used to analyze specific labels in your input. */

class CollectCommand : public Command {
	
public:
	CollectCommand(string);
	CollectCommand();
	~CollectCommand(){}
	
	vector<string> setParameters();
	string getCommandName()			{ return "collect.single"; }
	string getCommandCategory()		{ return "OTU-Based Approaches"; }
	string getCitation() { return "Schloss PD, Handelsman J (2006). 
Introducing SONS, A tool that compares the membership of microbial communities. Appl Environ Microbiol 72: 6773-9. \nhttp://www.mothur.org/wiki/Collect.single"; }
	string getHelpString();
	string getOutputPattern(string);
	string getDescription()		{ return "generates collector's curves using calculators, that describe the richness, diversity, and other features of individual samples"; }
	
	int execute();
	void help() { m->mothurOut(getHelpString()); }
	
private:
	OrderVector* order;
	InputData* input;
	Collect* cCurve;
	vector<Display*> cDisplays;
	int abund, size, alpha;
	float freq;
	vector<string> outputNames;
	
	bool abort, allLines;
	set<string> labels; //holds labels to be used
	string label, calc, outputDir, sharedfile, listfile, rabundfile, sabundfile, format, inputfile;
	vector<string> Estimators;
	vector<string> inputFileNames;
	vector<string> groups;
	
	vector<string> parseSharedFile(string);
};

#endif
mothur-1.36.1/source/commands/collectsharedcommand.cpp000066400000000000000000001070241255543666200231220ustar00rootroot00000000000000
/*
 *  collectsharedcommand.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "collectsharedcommand.h" #include "sharedsobscollectsummary.h" #include "sharedchao1.h" #include "sharedace.h" #include "sharedjabund.h" #include "sharedsorabund.h" #include "sharedjclass.h" #include "sharedsorclass.h" #include "sharedjest.h" #include "sharedsorest.h" #include "sharedthetayc.h" #include "sharedthetan.h" #include "sharedkstest.h" #include "whittaker.h" #include "sharednseqs.h" #include "sharedochiai.h" #include "sharedanderbergs.h" #include "sharedkulczynski.h" #include "sharedkulczynskicody.h" #include "sharedlennon.h" #include "sharedmorisitahorn.h" #include "sharedbraycurtis.h" #include "sharedjackknife.h" #include "whittaker.h" #include "odum.h" #include "canberra.h" #include "structeuclidean.h" #include "structchord.h" #include "hellinger.h" #include "manhattan.h" #include "structpearson.h" #include "soergel.h" #include "spearman.h" #include "structkulczynski.h" #include "structchi2.h" #include "speciesprofile.h" #include "hamming.h" #include "gower.h" #include "memchi2.h" #include "memchord.h" #include "memeuclidean.h" #include "mempearson.h" #include "sharedjsd.h" #include "sharedrjsd.h" //********************************************************************************************************************** vector CollectSharedCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pfreq("freq", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pfreq); CommandParameter pcalc("calc", "Multiple", 
"sharedchao-sharedsobs-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-kstest-whittaker-sharednseqs-ochiai-anderberg-kulczynski-kulczynskicody-lennon-morisitahorn-braycurtis-odum-canberra-structeuclidean-structchord-hellinger-manhattan-structpearson-soergel-spearman-structkulczynski-speciesprofile-structchi2-hamming-gower-memchi2-memchord-memeuclidean-mempearson-jsd-rjsd", "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter pall("all", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pall); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CollectSharedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CollectSharedCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The collect.shared command parameters are shared, label, freq, calc and groups. shared is required if there is no current sharedfile. 
\n";
		helpString += "The collect.shared command should be in the following format: \n";
		helpString += "collect.shared(label=yourLabel, freq=yourFreq, calc=yourEstimators, groups=yourGroups).\n";
		helpString += "Example collect.shared(label=unique-.01-.03, freq=10, groups=B-C, calc=sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan).\n";
		helpString += "The default value for freq is 100 and for calc is sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan.\n";
		helpString += "The default value for groups is all the groups in your groupfile.\n";
		helpString += "The freq parameter is used to indicate when to output your data; by default it is set to 100. You can also set it to a fraction of the number of sequences. For example, freq=0.10 means 10%. \n";
		helpString += validCalculator.printCalc("shared");
		helpString += "The label parameter is used to analyze specific labels in your input.\n";
		helpString += "The all parameter is used to specify if you want the estimate of all your groups together. This estimate can only be made for sharedsobs and sharedchao calculators. The default is false.\n";
		helpString += "If you use sharedchao and run into memory issues, set all to false. \n";
		helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 2 valid groups.\n";
		helpString += "Note: No spaces between parameter labels (i.e. 
shared), '=' and parameters (i.e.yourSharedfile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "CollectSharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string CollectSharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "sharedchao") { pattern = "[filename],shared.chao"; } else if (type == "sharedsobs") { pattern = "[filename],shared.sobs"; } else if (type == "sharedace") { pattern = "[filename],shared.ace"; } else if (type == "jabund") { pattern = "[filename],jabund"; } else if (type == "sorabund") { pattern = "[filename],sorabund"; } else if (type == "jclass") { pattern = "[filename],jclass"; } else if (type == "sorclass") { pattern = "[filename],sorclass"; } else if (type == "jest") { pattern = "[filename],jest"; } else if (type == "sorest") { pattern = "[filename],sorest"; } else if (type == "thetayc") { pattern = "[filename],thetayc"; } else if (type == "thetan") { pattern = "[filename],thetan"; } else if (type == "kstest") { pattern = "[filename],kstest"; } else if (type == "whittaker") { pattern = "[filename],whittaker"; } else if (type == "sharednseqs") { pattern = "[filename],shared.nseqs"; } else if (type == "ochiai") { pattern = "[filename],ochiai"; } else if (type == "anderberg") { pattern = "[filename],anderberg"; } else if (type == "kulczynski") { pattern = "[filename],kulczynski"; } else if (type == "kulczynskicody") { pattern = "[filename],kulczynskicody"; } else if (type == "lennon") { pattern = "[filename],lennon"; } else if (type == "morisitahorn") { pattern = "[filename],morisitahorn"; } else if (type == "braycurtis") { pattern = "[filename],braycurtis"; } else if (type == "odum") { pattern = "[filename],odum"; } else if (type == "canberra") { pattern = "[filename],canberra"; } else if (type == "structeuclidean") { pattern = "[filename],structeuclidean"; } else if (type == 
"structchord") { pattern = "[filename],structchord"; } else if (type == "hellinger") { pattern = "[filename],hellinger"; } else if (type == "manhattan") { pattern = "[filename],manhattan"; } else if (type == "structpearson") { pattern = "[filename],structpearson"; } else if (type == "soergel") { pattern = "[filename],soergel"; } else if (type == "spearman") { pattern = "[filename],spearman"; } else if (type == "structkulczynski") { pattern = "[filename],structkulczynski";} else if (type == "structchi2") { pattern = "[filename],structchi2"; } else if (type == "speciesprofile") { pattern = "[filename],speciesprofile"; } else if (type == "hamming") { pattern = "[filename],hamming"; } else if (type == "gower") { pattern = "[filename],gower"; } else if (type == "memchi2") { pattern = "[filename],memchi2"; } else if (type == "memchord") { pattern = "[filename],memchord"; } else if (type == "memeuclidean") { pattern = "[filename],memeuclidean"; } else if (type == "mempearson") { pattern = "[filename],mempearson"; } else if (type == "jsd") { pattern = "[filename],jsd"; } else if (type == "rjsd") { pattern = "[filename],rjsd"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "CollectSharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** CollectSharedCommand::CollectSharedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["sharedchao"] = tempOutNames; outputTypes["sharedsobs"] = tempOutNames; outputTypes["sharedace"] = tempOutNames; outputTypes["jabund"] = tempOutNames; outputTypes["sorabund"] = tempOutNames; outputTypes["jclass"] = tempOutNames; outputTypes["sorclass"] = tempOutNames; outputTypes["jest"] = tempOutNames; outputTypes["sorest"] = tempOutNames; outputTypes["thetayc"] = 
tempOutNames; outputTypes["thetan"] = tempOutNames; outputTypes["kstest"] = tempOutNames; outputTypes["whittaker"] = tempOutNames; outputTypes["sharednseqs"] = tempOutNames; outputTypes["ochiai"] = tempOutNames; outputTypes["anderberg"] = tempOutNames; outputTypes["kulczynski"] = tempOutNames; outputTypes["kulczynskicody"] = tempOutNames; outputTypes["lennon"] = tempOutNames; outputTypes["morisitahorn"] = tempOutNames; outputTypes["braycurtis"] = tempOutNames; outputTypes["odum"] = tempOutNames; outputTypes["canberra"] = tempOutNames; outputTypes["structeuclidean"] = tempOutNames; outputTypes["structchord"] = tempOutNames; outputTypes["hellinger"] = tempOutNames; outputTypes["manhattan"] = tempOutNames; outputTypes["structpearson"] = tempOutNames; outputTypes["soergel"] = tempOutNames; outputTypes["spearman"] = tempOutNames; outputTypes["structkulczynski"] = tempOutNames; outputTypes["structchi2"] = tempOutNames; outputTypes["speciesprofile"] = tempOutNames; outputTypes["hamming"] = tempOutNames; outputTypes["gower"] = tempOutNames; outputTypes["memchi2"] = tempOutNames; outputTypes["memchord"] = tempOutNames; outputTypes["memeuclidean"] = tempOutNames; outputTypes["mempearson"] = tempOutNames; outputTypes["jsd"] = tempOutNames; outputTypes["rjsd"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "CollectSharedCommand", "CollectSharedCommand"); exit(1); } } //********************************************************************************************************************** CollectSharedCommand::CollectSharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters=parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for 
command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["sharedchao"] = tempOutNames; outputTypes["sharedsobs"] = tempOutNames; outputTypes["sharedace"] = tempOutNames; outputTypes["jabund"] = tempOutNames; outputTypes["sorabund"] = tempOutNames; outputTypes["jclass"] = tempOutNames; outputTypes["sorclass"] = tempOutNames; outputTypes["jest"] = tempOutNames; outputTypes["sorest"] = tempOutNames; outputTypes["thetayc"] = tempOutNames; outputTypes["thetan"] = tempOutNames; outputTypes["kstest"] = tempOutNames; outputTypes["whittaker"] = tempOutNames; outputTypes["sharednseqs"] = tempOutNames; outputTypes["ochiai"] = tempOutNames; outputTypes["anderberg"] = tempOutNames; outputTypes["kulczynski"] = tempOutNames; outputTypes["kulczynskicody"] = tempOutNames; outputTypes["lennon"] = tempOutNames; outputTypes["morisitahorn"] = tempOutNames; outputTypes["braycurtis"] = tempOutNames; outputTypes["odum"] = tempOutNames; outputTypes["canberra"] = tempOutNames; outputTypes["structeuclidean"] = tempOutNames; outputTypes["structchord"] = tempOutNames; outputTypes["hellinger"] = tempOutNames; outputTypes["manhattan"] = tempOutNames; outputTypes["structpearson"] = tempOutNames; outputTypes["soergel"] = tempOutNames; outputTypes["spearman"] = tempOutNames; outputTypes["structkulczynski"] = tempOutNames; outputTypes["speciesprofile"] = tempOutNames; outputTypes["structchi2"] = tempOutNames; outputTypes["hamming"] = tempOutNames; outputTypes["gower"] = tempOutNames; outputTypes["memchi2"] = tempOutNames; outputTypes["memchord"] = tempOutNames; outputTypes["memeuclidean"] = tempOutNames; outputTypes["mempearson"] = tempOutNames; outputTypes["jsd"] = tempOutNames; outputTypes["rjsd"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string 
inputDir = validParameter.validFile(parameters, "inputdir", false);
			if (inputDir == "not found"){	inputDir = "";		}
			else {
				string path;
				it = parameters.find("shared");
				//user has given a shared file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path, then add inputdir. else leave path alone.
					if (path == "") {	parameters["shared"] = inputDir + it->second;		}
				}
			}
			
			//get shared file
			sharedfile = validParameter.validFile(parameters, "shared", true);
			if (sharedfile == "not open") { sharedfile = ""; abort = true; }
			else if (sharedfile == "not found") {
				//if there is a current shared file, use it
				sharedfile = m->getSharedFile();
				if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); }
				else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; }
			}else { m->setSharedFile(sharedfile); }
			
			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false);
			if (outputDir == "not found"){	outputDir = m->hasPath(sharedfile);		}
			
			//check for optional parameter and set defaults
			// ...at some point should add some additional type checking..
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan"; } else { if (calc == "default") { calc = "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string temp; temp = validParameter.validFile(parameters, "freq", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, freq); temp = validParameter.validFile(parameters, "all", false); if (temp == "not found") { temp = "false"; } all = m->isTrue(temp); if (abort == false) { string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(sharedfile)); map variables; variables["[filename]"] = fileNameRoot; ValidCalculators validCalculator; for (int i=0; ierrorOut(e, "CollectSharedCommand", "CollectSharedCommand"); exit(1); } } //********************************************************************************************************************** CollectSharedCommand::~CollectSharedCommand(){} //********************************************************************************************************************** int CollectSharedCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //if the users entered no 
valid calculators don't execute command if (cDisplays.size() == 0) { return 0; } for(int i=0;isetAll(all); } input = new InputData(sharedfile, "sharedfile"); order = input->getSharedOrderVector(); string lastLabel = order->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //set users groups SharedUtil* util = new SharedUtil(); Groups = m->getGroups(); vector allGroups = m->getAllGroups(); util->setGroups(Groups, allGroups, "collect"); m->setGroups(Groups); m->setAllGroups(allGroups); delete util; while((order != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); for(int i=0;iclearGroups(); return 0; } if(allLines == 1 || labels.count(order->getLabel()) == 1){ m->mothurOut(order->getLabel()); m->mothurOutEndLine(); //create collectors curve cCurve = new Collect(order, cDisplays); cCurve->getSharedCurve(freq); delete cCurve; processedLabels.insert(order->getLabel()); userLabels.erase(order->getLabel()); } //you have a label the user want that is smaller than this label and the last label has not already been processed if ((m->anyLabelsToProcess(order->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = order->getLabel(); delete order; order = input->getSharedOrderVector(lastLabel); m->mothurOut(order->getLabel()); m->mothurOutEndLine(); //create collectors curve cCurve = new Collect(order, cDisplays); cCurve->getSharedCurve(freq); delete cCurve; processedLabels.insert(order->getLabel()); userLabels.erase(order->getLabel()); //restore real lastlabel to save below order->setLabel(saveLabel); } lastLabel = order->getLabel(); //get next line to process delete order; order = input->getSharedOrderVector(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { 
m->mothurRemove(outputNames[i]); } outputTypes.clear(); for(int i=0;iclearGroups(); delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (order != NULL) { delete order; } order = input->getSharedOrderVector(lastLabel); m->mothurOut(order->getLabel()); m->mothurOutEndLine(); cCurve = new Collect(order, cDisplays); cCurve->getSharedCurve(freq); delete cCurve; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); for(int i=0;iclearGroups(); return 0; } delete order; } for(int i=0;iclearGroups(); delete input; m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "CollectSharedCommand", "execute"); exit(1); } } /***********************************************************/ mothur-1.36.1/source/commands/collectsharedcommand.h000066400000000000000000000027711255543666200225720ustar00rootroot00000000000000#ifndef COLLECTSHAREDCOMMAND_H #define COLLECTSHAREDCOMMAND_H /* * collectsharedcommand.h * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 *
 */

#include "command.hpp"
#include "sharedordervector.h"
#include "inputdata.h"
#include "collect.h"
#include "display.h"
#include "validcalculator.h"
#include "sharedutilities.h"

class CollectSharedCommand : public Command {
	
public:
	CollectSharedCommand(string);
	CollectSharedCommand();
	~CollectSharedCommand();
	
	vector<string> setParameters();
	string getCommandName()			{ return "collect.shared"; }
	string getCommandCategory()		{ return "OTU-Based Approaches"; }
	
	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Schloss PD, Handelsman J (2006). Introducing SONS, A tool that compares the membership of microbial communities. Appl Environ Microbiol 72: 6773-9. \nhttp://www.mothur.org/wiki/Collect.shared"; }
	string getDescription()		{ return "generates collector's curves for calculators, which describe the similarity between communities or their shared richness"; }
	
	int execute();
	void help() { m->mothurOut(getHelpString()); }
	
private:
	SharedOrderVector* order;
	InputData* input;
	Collect* cCurve;
	vector<Display*> cDisplays;
	float freq;
	
	bool abort, allLines, all;
	set<string> labels; //holds labels to be used
	string label, calc, groups, outputDir, sharedfile;
	vector<string> Estimators, Groups, outputNames;
};

#endif
mothur-1.36.1/source/commands/command.hpp000066400000000000000000000136471255543666200204010ustar00rootroot00000000000000
#ifndef COMMAND_HPP
#define COMMAND_HPP
//test2
/*
 *  command.h
 *  nast
 *
 *  Created by Pat Schloss on 10/23/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 */

/*This class is a parent to all the command classes. 
*/

#include "mothur.h"
#include "optionparser.h"
#include "validparameter.h"
#include "mothurout.h"
#include "commandparameter.h"

class Command {
	
	public:
		Command() { m = MothurOut::getInstance(); }
		
		//needed by gui
		virtual string getCommandName() = 0;
		virtual string getCommandCategory() = 0;
		virtual string getHelpString() = 0;
		virtual string getCitation() = 0;
		virtual string getDescription() = 0;
		
		virtual map<string, vector<string> > getOutputFiles() { return outputTypes; }
		string getOutputFileName(string type, map<string, string> variableParts) { //uses the pattern to create an output filename for a given type and input file name.
			try {
				string filename = "";
				map<string, vector<string> >::iterator it;
				
				//is this a type this command creates
				it = outputTypes.find(type);
				if (it == outputTypes.end()) { m->mothurOut("[ERROR]: this command doesn't create a " + type + " output file.\n"); }
				else {
					
					string patternTemp = getOutputPattern(type);
					vector<string> patterns;
					m->splitAtDash(patternTemp, patterns);
					
					//find pattern to use based on number of variables passed in
					string pattern = "";
					bool foundPattern = false;
					vector<int> numVariablesPerPattern;
					for (int i = 0; i < patterns.size(); i++) {
						int numVariables = 0;
						for (int j = 0; j < patterns[i].length(); j++) { if (patterns[i][j] == '[') { numVariables++; } }
						numVariablesPerPattern.push_back(numVariables);
						
						if (numVariables == variableParts.size()) { pattern = patterns[i]; foundPattern = true; break; }
					}
					
					//if you didn't find an exact match do we have something that might work
					if (!foundPattern) {
						for (int i = 0; i < numVariablesPerPattern.size(); i++) {
							if (numVariablesPerPattern[i] < variableParts.size()) { pattern = patterns[i]; foundPattern = true; break; }
						}
						if (!foundPattern) { m->mothurOut("[ERROR]: Not enough variable pieces for " + type + ".\n"); m->control_pressed = true; }
					}
					
					if (pattern != "") {
						int numVariables = 0;
						for (int i = 0; i < pattern.length(); i++) { if (pattern[i] == '[') { numVariables++; } }
						
						vector<string> pieces;
						m->splitAtComma(pattern, pieces);
						
						for (int i = 0; i < pieces.size(); i++) 
{ if (pieces[i][0] == '[') { map::iterator it = variableParts.find(pieces[i]); if (it == variableParts.end()) { m->mothurOut("[ERROR]: Did not provide variable for " + pieces[i] + ".\n"); m->control_pressed = true; }else { if (it->second != "") { if (it->first == "[filename]") { filename += it->second; } else if (it->first == "[extension]") { if (filename.length() > 0) { //rip off last "." filename = filename.substr(0, filename.length()-1); } filename += it->second + "."; }else { filename += it->second + "."; } } } }else { filename += pieces[i] + "."; } } if (filename.length() > 0) { //rip off last "." filename = filename.substr(0, filename.length()-1); } } } return filename; } catch(exception& e) { m->errorOut(e, "command", "getOutputFileName"); exit(1); } } virtual string getOutputPattern(string) = 0; //pass in type, returns something like: [filename],align or [filename],[distance],subsample.shared strings in [] means its a variable. This is used by the gui to predict output file names. use variable keywords: [filename], [distance], [group], [extension], [tag] virtual vector setParameters() = 0; //to fill parameters virtual vector getParameters() { return parameters; } virtual int execute() = 0; virtual void help() = 0; void citation() { m->mothurOutEndLine(); m->mothurOut(getCitation()); m->mothurOutEndLine(); } virtual ~Command() { } protected: MothurOut* m; bool calledHelp; map > outputTypes; vector parameters; map >::iterator itTypes; }; #endif mothur-1.36.1/source/commands/consensusseqscommand.cpp000066400000000000000000001006661255543666200232270ustar00rootroot00000000000000/* * consensusseqscommand.cpp * Mothur * * Created by westcott on 11/23/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "consensusseqscommand.h" #include "sequence.hpp" #include "inputdata.h" //********************************************************************************************************************** vector ConsensusSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta-name",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","fasta-name",false,false,true); parameters.push_back(plist); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pcutoff("cutoff", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pcutoff); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ConsensusSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The consensus.seqs command can be used in 2 ways: create a consensus sequence from a fastafile, or with a listfile create a consensus sequence for each otu. 
Sequences must be aligned.\n"; helpString += "The consensus.seqs command parameters are fasta, list, name, count, cutoff and label.\n"; helpString += "The fasta parameter allows you to enter the fasta file containing your sequences, and is required, unless you have a valid current fasta file. \n"; helpString += "The list parameter allows you to enter your list file. \n"; helpString += "The name parameter allows you to enter a names file associated with the fasta file. \n"; helpString += "The label parameter allows you to select what distance levels you would like output files for; labels are separated by dashes.\n"; helpString += "The cutoff parameter allows you to set a percentage of sequences that support the base. For example: cutoff=97 would only return a sequence that only showed ambiguities for bases that were not supported by at least 97% of sequences.\n"; helpString += "The consensus.seqs command should be in the following format: \n"; helpString += "consensus.seqs(fasta=yourFastaFile, list=yourListFile) \n"; helpString += "Example: consensus.seqs(fasta=abrecovery.align, list=abrecovery.fn.list) \n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ConsensusSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],cons.fasta-[filename],[tag],cons.fasta"; } else if (type == "name") { pattern = "[filename],cons.names-[filename],[tag],cons.names"; } else if (type == "count") { pattern = "[filename],cons.count_table-[filename],[tag],cons.count_table"; } else if (type == "summary") { pattern = "[filename],cons.summary-[filename],[tag],cons.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ConsensusSeqsCommand::ConsensusSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "ConsensusSeqsCommand"); exit(1); } } //*************************************************************************************************************** ConsensusSeqsCommand::ConsensusSeqsCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); 
ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } string temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, cutoff); //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(fastafile); } if (countfile == "") { if (namefile == ""){ 
vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "ConsensusSeqsCommand"); exit(1); } } //*************************************************************************************************************** int ConsensusSeqsCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); readFasta(); if (m->control_pressed) { return 0; } if (namefile != "") { readNames(); } if (countfile != "") { ct.readTable(countfile, true, false); } if (m->control_pressed) { return 0; } if (listfile == "") { ofstream outSummary; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); string outputSummaryFile = getOutputFileName("summary", variables); m->openOutputFile(outputSummaryFile, outSummary); outSummary.setf(ios::fixed, ios::floatfield); outSummary.setf(ios::showpoint); outputNames.push_back(outputSummaryFile); outputTypes["summary"].push_back(outputSummaryFile); outSummary << "PositioninAlignment\tA\tT\tG\tC\tGap\tNumberofSeqs\tConsensusBase" << endl; ofstream outFasta; string outputFastaFile = getOutputFileName("fasta", variables); m->openOutputFile(outputFastaFile, outFasta); outputNames.push_back(outputFastaFile); outputTypes["fasta"].push_back(outputFastaFile); vector< vector > percentages; percentages.resize(5); for (int j = 0; j < percentages.size(); j++) { percentages[j].resize(seqLength, 0.0); } string consSeq = ""; int thisCount; //get counts for (int j = 0; j < seqLength; j++) { if (m->control_pressed) { outSummary.close(); outFasta.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } vector counts; counts.resize(5, 0); //A,T,G,C,Gap int numDots = 0; thisCount = 0; for (map::iterator it = fastaMap.begin(); it != fastaMap.end(); it++) { string thisSeq = it->second; int size = 0; if (countfile != "") { size = ct.getNumSeqs(it->first); } else { 
map<string, int>::iterator itCount = nameFileMap.find(it->first); if (itCount != nameFileMap.end()) { size = itCount->second; }else { m->mothurOut("[ERROR]: file mismatch, aborting.\n"); m->control_pressed = true; break; } } for (int k = 0; k < size; k++) { if (thisSeq[j] == '.') { numDots++; } char base = toupper(thisSeq[j]); if (base == 'A') { counts[0]++; } else if (base == 'T') { counts[1]++; } else if (base == 'G') { counts[2]++; } else if (base == 'C') { counts[3]++; } else { counts[4]++; } thisCount++; } } char conBase = '.'; if (numDots != thisCount) { conBase = getBase(counts, thisCount); } consSeq += conBase; percentages[0][j] = counts[0] / (float) thisCount; percentages[1][j] = counts[1] / (float) thisCount; percentages[2][j] = counts[2] / (float) thisCount; percentages[3][j] = counts[3] / (float) thisCount; percentages[4][j] = counts[4] / (float) thisCount; } for (int j = 0; j < seqLength; j++) { outSummary << (j+1) << '\t' << percentages[0][j] << '\t'<< percentages[1][j] << '\t'<< percentages[2][j] << '\t' << percentages[3][j] << '\t' << percentages[4][j] << '\t' << thisCount << '\t' << consSeq[j] << endl; } outFasta << ">conseq" << endl << consSeq << endl; outSummary.close(); outFasta.close(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } }else { InputData* input = new InputData(listfile, "list"); ListVector* list = input->getListVector(); string lastLabel = list->getLabel(); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete list; delete input; return 0; } if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); processList(list); processedLabels.insert(list->getLabel()); 
userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); processList(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = NULL; //get next line to process list = input->getListVector(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } if (list != NULL) { delete list; } delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input->getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); processList(list); delete list; list = NULL; } if (list != NULL) { delete list; } delete input; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to find the consensus sequences."); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "execute"); exit(1); } } //*************************************************************************************************************** int ConsensusSeqsCommand::processList(ListVector*& list){ try{ ofstream outSummary; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[tag]"] = list->getLabel(); string outputSummaryFile = getOutputFileName("summary", variables); m->openOutputFile(outputSummaryFile, outSummary); outSummary.setf(ios::fixed, ios::floatfield); outSummary.setf(ios::showpoint); outputNames.push_back(outputSummaryFile); outputTypes["summary"].push_back(outputSummaryFile); ofstream outName; string outputNameFile = getOutputFileName("name",variables); m->openOutputFile(outputNameFile, outName); outputNames.push_back(outputNameFile); outputTypes["name"].push_back(outputNameFile); ofstream outFasta; string outputFastaFile = getOutputFileName("fasta",variables); m->openOutputFile(outputFastaFile, outFasta); outputNames.push_back(outputFastaFile); outputTypes["fasta"].push_back(outputFastaFile); outSummary << "OTU#\tPositioninAlignment\tA\tT\tG\tC\tGap\tNumberofSeqs\tConsensusBase" << endl; string snumBins = toString(list->getNumBins()); vector binLabels = list->getLabels(); for 
(int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { outSummary.close(); outName.close(); outFasta.close(); return 0; } string bin = list->get(i); string consSeq = getConsSeq(bin, outSummary, i); outFasta << ">" << binLabels[i] << endl << consSeq << endl; outName << binLabels[i] << '\t' << binLabels[i] << "," << bin << endl; } outSummary.close(); outName.close(); outFasta.close(); return 0; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "processList"); exit(1); } } //*************************************************************************************************************** string ConsensusSeqsCommand::getConsSeq(string bin, ofstream& outSummary, int binNumber){ try{ string consSeq = ""; bool error = false; int totalSize=0; vector binNames; m->splitAtComma(bin, binNames); vector< vector > percentages; percentages.resize(5); for (int j = 0; j < percentages.size(); j++) { percentages[j].resize(seqLength, 0.0); } if (countfile != "") { //get counts for (int j = 0; j < seqLength; j++) { if (m->control_pressed) { return consSeq; } vector counts; counts.resize(5, 0); //A,T,G,C,Gap int numDots = 0; totalSize = 0; for (int i = 0; i < binNames.size(); i++) { if (m->control_pressed) { return consSeq; } string thisSeq = ""; map::iterator itFasta = fastaMap.find(binNames[i]); if (itFasta != fastaMap.end()) { thisSeq = itFasta->second; }else { m->mothurOut("[ERROR]: " + binNames[i] + " is not in your fasta file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } int size = ct.getNumSeqs(binNames[i]); if (size != 0) { for (int k = 0; k < size; k++) { if (thisSeq[j] == '.') { numDots++; } char base = toupper(thisSeq[j]); if (base == 'A') { counts[0]++; } else if (base == 'T') { counts[1]++; } else if (base == 'G') { counts[2]++; } else if (base == 'C') { counts[3]++; } else { counts[4]++; } totalSize++; } }else { m->mothurOut("[ERROR]: " + binNames[i] + " is not in your count file, please correct."); m->mothurOutEndLine(); 
m->control_pressed = true; } } char conBase = '.'; if (numDots != totalSize) { conBase = getBase(counts, totalSize); } consSeq += conBase; percentages[0][j] = counts[0] / (float) totalSize; percentages[1][j] = counts[1] / (float) totalSize; percentages[2][j] = counts[2] / (float) totalSize; percentages[3][j] = counts[3] / (float) totalSize; percentages[4][j] = counts[4] / (float) totalSize; } }else { //get sequence strings for each name in the bin vector seqs; for (int i = 0; i < binNames.size(); i++) { map::iterator it; it = nameMap.find(binNames[i]); if (it == nameMap.end()) { if (namefile == "") { m->mothurOut("[ERROR]: " + binNames[i] + " is not in your fasta file, please correct."); m->mothurOutEndLine(); error = true; } else { m->mothurOut("[ERROR]: " + binNames[i] + " is not in your fasta or name file, please correct."); m->mothurOutEndLine(); error = true; } break; }else { //add sequence string to seqs vector to process below map::iterator itFasta = fastaMap.find(it->second); if (itFasta != fastaMap.end()) { string seq = itFasta->second; seqs.push_back(seq); }else { m->mothurOut("[ERROR]: file mismatch, aborting. 
\n"); } } } if (error) { m->control_pressed = true; return consSeq; } totalSize = seqs.size(); //get counts for (int j = 0; j < seqLength; j++) { if (m->control_pressed) { return consSeq; } vector counts; counts.resize(5, 0); //A,T,G,C,Gap int numDots = 0; for (int i = 0; i < seqs.size(); i++) { if (seqs[i][j] == '.') { numDots++; } char base = toupper(seqs[i][j]); if (base == 'A') { counts[0]++; } else if (base == 'T') { counts[1]++; } else if (base == 'G') { counts[2]++; } else if (base == 'C') { counts[3]++; } else { counts[4]++; } } char conBase = '.'; if (numDots != seqs.size()) { conBase = getBase(counts, seqs.size()); } consSeq += conBase; percentages[0][j] = counts[0] / (float) seqs.size(); percentages[1][j] = counts[1] / (float) seqs.size(); percentages[2][j] = counts[2] / (float) seqs.size(); percentages[3][j] = counts[3] / (float) seqs.size(); percentages[4][j] = counts[4] / (float) seqs.size(); } } for (int j = 0; j < seqLength; j++) { outSummary << (binNumber + 1) << '\t' << (j+1) << '\t' << percentages[0][j] << '\t'<< percentages[1][j] << '\t'<< percentages[2][j] << '\t' << percentages[3][j] << '\t' << percentages[4][j] << '\t' << totalSize << '\t' << consSeq[j] << endl; } return consSeq; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "getConsSeq"); exit(1); } } //*************************************************************************************************************** char ConsensusSeqsCommand::getBase(vector counts, int size){ //A,T,G,C,Gap try{ /* A = adenine * C = cytosine * G = guanine * T = thymine * R = G A (purine) * Y = T C (pyrimidine) * K = G T (keto) * M = A C (amino) * S = G C (strong bonds) * W = A T (weak bonds) * B = G T C (all but A) * D = G A T (all but C) * H = A C T (all but G) * V = G C A (all but T) * N = A G C T (any) */ char conBase = 'N'; //zero out counts that don't make the cutoff float percentage = (100.0 - cutoff) / 100.0; for (int i = 0; i < counts.size(); i++) { float countPercentage = counts[i] / 
(float) size; if (countPercentage < percentage) { counts[i] = 0; } } //any if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'n'; } //any no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'N'; } //all but T else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'v'; } //all but T no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'V'; } //all but G else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'h'; } //all but G no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'H'; } //all but C else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'd'; } //all but C no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'D'; } //all but A else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'b'; } //all but A no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'B'; } //W = A T (weak bonds) else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'w'; } //W = A T (weak bonds) no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'W'; } //S = G C (strong bonds) else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 's'; } //S = G C (strong bonds) no gap else if ((counts[0] == 0) && (counts[1] == 
0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'S'; } //M = A C (amino) else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'm'; } //M = A C (amino) no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'M'; } //K = G T (keto) else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'k'; } //K = G T (keto) no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'K'; } //Y = T C (pyrimidine) else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'y'; } //Y = T C (pyrimidine) no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'Y'; } //R = G A (purine) else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'r'; } //R = G A (purine) no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'R'; } //only A else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'a'; } //only A no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'A'; } //only T else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 't'; } //only T no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'T'; } //only G else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'g'; } //only G no gap else if ((counts[0] 
== 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'G'; } //only C else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'c'; } //only C no gap else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'C'; } //only gap else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = '-'; } //cutoff removed all counts else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'N'; } else{ m->mothurOut("[ERROR]: cannot find consensus base."); m->mothurOutEndLine(); } return conBase; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "getBase"); exit(1); } } //*************************************************************************************************************** int ConsensusSeqsCommand::readFasta(){ try{ ifstream in; m->openInputFile(fastafile, in); seqLength = 0; while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); string name = seq.getName(); if (name != "") { fastaMap[name] = seq.getAligned(); nameMap[name] = name; //set nameMap in case no names file nameFileMap[name] = 1; if (seqLength == 0) { seqLength = seq.getAligned().length(); } else if (seqLength != seq.getAligned().length()) { m->mothurOut("[ERROR]: sequences are not the same length, please correct."); m->mothurOutEndLine(); m->control_pressed = true; break; } } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "readFasta"); exit(1); } } //*************************************************************************************************************** int ConsensusSeqsCommand::readNames(){ try{ map<string, string> temp; map<string, string>::iterator it; bool error = false; m->readNames(namefile, temp); //use central buffered read for (map<string, string>::iterator itTemp = 
temp.begin(); itTemp != temp.end(); itTemp++) { string thisname, repnames; thisname = itTemp->first; repnames = itTemp->second; it = nameMap.find(thisname); if (it != nameMap.end()) { //then this sequence was in the fastafile nameFileMap[thisname] = m->getNumNames(repnames); //for later when outputting the new namesFile if the list file is unique vector splitRepNames; m->splitAtComma(repnames, splitRepNames); for (int i = 0; i < splitRepNames.size(); i++) { nameMap[splitRepNames[i]] = thisname; } }else{ m->mothurOut("[ERROR]: " + thisname + " is not in the fasta file, please correct."); m->mothurOutEndLine(); error = true; } } if (error) { m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "ConsensusSeqsCommand", "readNames"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/commands/consensusseqscommand.h000066400000000000000000000025121255543666200226630ustar00rootroot00000000000000#ifndef CONSENSUSSEQSCOMMAND_H #define CONSENSUSSEQSCOMMAND_H //test /* * consensusseqscommand.h * Mothur * * Created by westcott on 11/23/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "listvector.hpp" #include "counttable.h" class ConsensusSeqsCommand : public Command { public: ConsensusSeqsCommand(string); ConsensusSeqsCommand(); ~ConsensusSeqsCommand(){} vector<string> setParameters(); string getCommandName() { return "consensus.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Consensus.seqs"; } string getDescription() { return "create a consensus sequence for each OTU or for a fasta file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CountTable ct; bool abort, allLines; string fastafile, listfile, namefile, countfile, label, outputDir; set<string> labels; vector<string> outputNames; map<string, string> fastaMap; map<string, string> nameMap; map<string, int> nameFileMap; int seqLength; float cutoff; int readFasta(); int readNames(); int processList(ListVector*&); string getConsSeq(string, ofstream&, int); char getBase(vector<int>, int); }; #endif mothur-1.36.1/source/commands/cooccurrencecommand.cpp000066400000000000000000000606721255543666200227670ustar00rootroot00000000000000/* * cooccurrencecommand.cpp * Mothur * * Created by kiverson on 1/2/12. * Copyright 2012 Schloss Lab. All rights reserved. 
* */ #include "cooccurrencecommand.h" //********************************************************************************************************************** vector<string> CooccurrenceCommand::setParameters() { try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pshared); CommandParameter pmetric("metric", "Multiple", "cscore-checker-combo-vratio", "cscore", "", "", "","",false,false); parameters.push_back(pmetric); CommandParameter pmatrix("matrixmodel", "Multiple", "sim1-sim2-sim3-sim4-sim5-sim6-sim7-sim8-sim9", "sim2", "", "", "","",false,false); parameters.push_back(pmatrix); CommandParameter pruns("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(pruns); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CooccurrenceCommand::getHelpString(){ try { string helpString = "The cooccurrence command calculates four metrics and tests their significance to assess whether presence-absence patterns are different than what one would expect by chance.\n"; helpString += "The cooccurrence command parameters are shared, metric, 
matrixmodel, iters, label and groups.\n"; helpString += "The matrixmodel parameter options are sim1, sim2, sim3, sim4, sim5, sim6, sim7, sim8 and sim9. Default=sim2.\n"; helpString += "The metric parameter options are cscore, checker, combo and vratio. Default=cscore.\n"; helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like analyzed.\n"; helpString += "The cooccurrence command should be in the following format: \n"; helpString += "cooccurrence(shared=yourSharedFile) \n"; helpString += "Example cooccurrence(shared=final.an.shared).\n"; helpString += "Note: No spaces between parameter labels (i.e. shared), '=' and parameters (i.e. yourShared).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string CooccurrenceCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],cooccurence.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** CooccurrenceCommand::CooccurrenceCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "CooccurrenceCommand"); exit(1); } } //********************************************************************************************************************** CooccurrenceCommand::CooccurrenceCommand(string option) { try { abort = false; calledHelp = false; 
allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["summary"] = tempOutNames; //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } //get shared file sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); } metric = validParameter.validFile(parameters, "metric", false); if (metric == "not found") { metric = "cscore"; } if ((metric != "cscore") && (metric != "checker") && (metric != "combo") && (metric != "vratio")) { m->mothurOut("[ERROR]: " + metric + " is not a valid metric option for the cooccurrence command. Choices are cscore, checker, combo, vratio."); m->mothurOutEndLine(); abort = true; } matrix = validParameter.validFile(parameters, "matrixmodel", false); if (matrix == "not found") { matrix = "sim2"; } if ((matrix != "sim1") && (matrix != "sim2") && (matrix != "sim3") && (matrix != "sim4") && (matrix != "sim5" ) && (matrix != "sim6" ) && (matrix != "sim7" ) && (matrix != "sim8" ) && (matrix != "sim9" )) { m->mothurOut("[ERROR]: " + matrix + " is not a valid matrix option for the cooccurrence command. 
Choices are sim1, sim2, sim3, sim4, sim5, sim6, sim7, sim8, sim9."); m->mothurOutEndLine(); abort = true; } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, runs); } } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "CooccurrenceCommand"); exit(1); } } //********************************************************************************************************************** int CooccurrenceCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData* input = new InputData(sharedfile, "sharedfile"); vector lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; ofstream out; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); string outputFileName = getOutputFileName("summary", variables); m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out << "metric\tlabel\tScore\tzScore\tstandardDeviation\tnp_Pvalue\n"; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } delete input; out.close(); m->mothurRemove(outputFileName); return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); getCooccurrence(lookup, out); 
processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); getCooccurrence(lookup, out); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { outputTypes.clear(); delete input; out.close(); m->mothurRemove(outputFileName); return 0; } //get next line to process lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { delete input; out.close(); m->mothurRemove(outputFileName); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); getCooccurrence(lookup, out); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } out.close(); //reset groups parameter delete input; m->clearGroups(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "execute"); exit(1); } } //********************************************************************************************************************** int CooccurrenceCommand::getCooccurrence(vector<SharedRAbundVector*>& thisLookUp, ofstream& out){ try { int numOTUS = thisLookUp[0]->getNumBins(); if(numOTUS < 2) { m->mothurOut("Not enough OTUs for co-occurrence analysis, skipping"); m->mothurOutEndLine(); return 0; } vector< vector<int> > co_matrix; co_matrix.resize(thisLookUp[0]->getNumBins()); for (int i = 0; i < thisLookUp[0]->getNumBins(); i++) { co_matrix[i].resize((thisLookUp.size()), 0); } vector<int> columntotal; columntotal.resize(thisLookUp.size(), 0); vector<int> rowtotal; rowtotal.resize(numOTUS, 0); for (int i = 0; i < thisLookUp.size(); i++) { //nrows in the shared file for (int j = 0; j < thisLookUp[i]->getNumBins(); j++) { //cols of original shared file if (m->control_pressed) { return 0; } int abund = thisLookUp[i]->getAbundance(j); if(abund > 0) { co_matrix[j][i] = 1; rowtotal[j]++; columntotal[i]++; } } } //nrows is ncols of initial matrix. All the functions need this value. They assume the transposition has already taken place and nrows and ncols refer to that matrix. //comatrix and initmatrix are still vectors of vectors of ints as in the original script. 
The abundancevector is only what was read in ie not a co-occurrence matrix! int nrows = numOTUS;//rows of inital matrix int ncols = thisLookUp.size();//groups double initscore = 0.0; vector stats; vector probabilityMatrix; probabilityMatrix.resize(ncols * nrows, 0); vector > nullmatrix(nrows, vector(ncols, 0)); TrialSwap2 trial; int n = accumulate( columntotal.begin(), columntotal.end(), 0 ); //============================================================ //generate a probability matrix. Only do this once. float start = 0.0; if (matrix == "sim1") { for(int i=0;imothurOut("[ERROR]: No model selected! \n"); m->control_pressed = true; } //co_matrix is the transposed shared file, initmatrix is the original shared file if (metric == "cscore") { initscore = trial.calc_c_score(co_matrix, rowtotal, ncols, nrows); } else if (metric == "checker") { initscore = trial.calc_checker(co_matrix, rowtotal, ncols, nrows); } else if (metric == "vratio") { initscore = trial.calc_vratio(nrows, ncols, rowtotal, columntotal); } else if (metric == "combo") { initscore = trial.calc_combo(nrows, ncols, co_matrix); } else { m->mothurOut("[ERROR]: No metric selected!\n"); m->control_pressed = true; return 1; } m->mothurOut("Initial c score: " + toString(initscore)); m->mothurOutEndLine(); double previous; double current; double randnum; int count; //burn-in for sim9 if(matrix == "sim9") { for(int i=0;i<10000;i++) trial.swap_checkerboards (co_matrix, ncols, nrows); } //populate null matrix from probability matrix, do this a lot. 
for(int k=0;k(ncols, 0)); if(matrix == "sim1" || matrix == "sim6" || matrix == "sim8" || matrix == "sim7") { count = 0; while(count < n) { if (m->control_pressed) { return 0; } nextnum2: previous = 0.0; randnum = rand() / double(RAND_MAX); for(int i=0;i previous) { nullmatrix[i][j] = 1; count++; if (count > n) break; else goto nextnum2; } previous = current; } } } } else if (matrix == "sim2") { for(int i=0;icontrol_pressed) { return 0; } randnum = rand() / double(RAND_MAX); for(int j=0;j previous && nullmatrix[i][j] != 1) { nullmatrix[i][j] = 1; count++; previous = 0.0; break; } previous = current; } } } } else if(matrix == "sim3" || matrix == "sim5") { //columns for(int j=0;jcontrol_pressed) { return 0; } randnum = rand() / double(RAND_MAX); for(int i=0;i previous && nullmatrix[i][j] != 1) { nullmatrix[i][j] = 1; count++; previous = 0.0; break; } previous = current; } } } } //swap_checkerboards takes the original matrix and swaps checkerboards else if(matrix == "sim9") { trial.swap_checkerboards (co_matrix, ncols, nrows); nullmatrix = co_matrix; } else { m->mothurOut("[ERROR]: No null model selected!\n\n"); m->control_pressed = true; return 1; } //run metric on null matrix and add score to the stats vector if (metric == "cscore"){ stats.push_back(trial.calc_c_score(nullmatrix, rowtotal, ncols, nrows)); } else if (metric == "checker") { stats.push_back(trial.calc_checker(nullmatrix, rowtotal, ncols, nrows)); } else if (metric == "vratio") { stats.push_back(trial.calc_vratio(nrows, ncols, rowtotal, columntotal)); } else if (metric == "combo") { stats.push_back(trial.calc_combo(nrows, ncols, nullmatrix)); } else { m->mothurOut("[ERROR]: No metric selected!\n\n"); m->control_pressed = true; return 1; } } double total = 0.0; for (int i=0; imothurOutEndLine(); m->mothurOut("average metric score: " + toString(nullMean)); m->mothurOutEndLine(); //calc_p_value is not a statistical p-value, it's just the average that are either > or < the initscore. 
//All it does is show what is expected in a competitively structured community //zscore is output so p-value can be looked up in a ztable double pvalue = 0.0; if (metric == "cscore" || metric == "checker") { pvalue = trial.calc_pvalue_greaterthan (stats, initscore); } else{ pvalue = trial.calc_pvalue_lessthan (stats, initscore); } double sd = trial.getSD(runs, stats, nullMean); double zscore = trial.get_zscore(sd, nullMean, initscore); m->mothurOut("zscore: " + toString(zscore)); m->mothurOutEndLine(); m->mothurOut("standard deviation: " + toString(sd)); m->mothurOutEndLine(); m->mothurOut("non-parametric p-value: " + toString(pvalue)); m->mothurOutEndLine(); out << metric << '\t' << thisLookUp[0]->getLabel() << '\t' << nullMean << '\t' << zscore << '\t' << sd << '\t' << pvalue << endl; return 0; } catch(exception& e) { m->errorOut(e, "CooccurrenceCommand", "getCooccurrence"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/cooccurrencecommand.h000066400000000000000000000025531255543666200224260ustar00rootroot00000000000000#ifndef COOCCURRENCECOMMAND_H #define COOCCURRENCECOMMAND_H /* * COOCCURRENCE.h * Mothur * * Created by westcott on 11/10/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "trialswap2.h" #include "inputdata.h" #include "sharedrabundvector.h" class CooccurrenceCommand : public Command { public: CooccurrenceCommand(string); CooccurrenceCommand(); ~CooccurrenceCommand(){} vector<string> setParameters(); string getCommandName() { return "cooccurrence"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Ulrich W & Gotelli NJ (2010). Null model analysis of species associations using abundance data. 
Ecology 91:3384.\nhttp://www.mothur.org/wiki/Cooccurrence"; } string getDescription() { return "calculates four metrics and tests their significance to assess whether presence-absence patterns are different than what one would expect by chance."; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string metric, matrix, outputDir; string label, sharedfile, groups; bool abort, allLines; set labels; vector outputNames, Groups; int runs; int getCooccurrence(vector&, ofstream&); }; #endif mothur-1.36.1/source/commands/corraxescommand.cpp000066400000000000000000001056051255543666200221370ustar00rootroot00000000000000/* * corraxescommand.cpp * Mothur * * Created by westcott on 12/22/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "corraxescommand.h" #include "sharedutilities.h" #include "linearalgebra.h" //********************************************************************************************************************** vector CorrAxesCommand::setParameters(){ try { CommandParameter paxes("axes", "InputTypes", "", "", "none", "none", "none","corraxes",false,true,true); parameters.push_back(paxes); CommandParameter pshared("shared", "InputTypes", "", "", "SharedRelMeta", "SharedRelMeta", "none","",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRelMeta", "SharedRelMeta", "none","",false,false,true); parameters.push_back(prelabund); CommandParameter pmetadata("metadata", "InputTypes", "", "", "SharedRelMeta", "SharedRelMeta", "none","",false,false); parameters.push_back(pmetadata); CommandParameter pnumaxes("numaxes", "Number", "", "3", "", "", "","",false,false); parameters.push_back(pnumaxes); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pmethod("method", "Multiple", 
"pearson-spearman-kendall", "pearson", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CorrAxesCommand::getHelpString(){ try { string helpString = ""; helpString += "The corr.axes command reads a shared, relabund or metadata file as well as an axes file and calculates the correlation coefficient.\n"; helpString += "The corr.axes command parameters are shared, relabund, axes, metadata, groups, method, numaxes and label. The shared, relabund or metadata and axes parameters are required. If shared is given the relative abundance is calculated.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like included. The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance level you would like used, if none is given the first distance is used.\n"; helpString += "The method parameter allows you to select what method you would like to use. Options are pearson, spearman and kendall. Default=pearson.\n"; helpString += "The numaxes parameter allows you to select the number of axes you would like to use. 
Default=3.\n"; helpString += "The corr.axes command should be in the following format: corr.axes(axes=yourPcoaFile, shared=yourSharedFile, method=yourMethod).\n"; helpString += "Example corr.axes(axes=genus.pool.thetayc.genus.lt.pcoa, shared=genus.pool.shared, method=kendall).\n"; helpString += "The corr.axes command outputs a .corr.axes file.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string CorrAxesCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "corraxes") { pattern = "[filename],[tag],corr.axes"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** CorrAxesCommand::CorrAxesCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["corraxes"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "CorrAxesCommand"); exit(1); } } //********************************************************************************************************************** CorrAxesCommand::CorrAxesCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all 
parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["corraxes"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("axes"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["axes"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("metadata"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["metadata"] = inputDir + it->second; } } } //check for required parameters axesfile = validParameter.validFile(parameters, "axes", true); if (axesfile == "not open") { abort = true; } else if (axesfile == "not found") { axesfile = ""; m->mothurOut("axes is a required parameter for the corr.axes command."); m->mothurOutEndLine(); abort = true; } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputFileName = sharedfile; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { inputFileName = relabundfile; m->setRelAbundFile(relabundfile); } metadatafile = validParameter.validFile(parameters, "metadata", true); if (metadatafile == "not open") { abort = true; } else if (metadatafile == "not found") { metadatafile = ""; } else { inputFileName = metadatafile; } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); } m->setGroups(Groups); outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputFileName); } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } if ((relabundfile == "") && (sharedfile == "") && (metadatafile == "")) { //is there are current file available for any of these? 
//give priority to shared, then relabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFileName = sharedfile; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFileName = relabundfile; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You must provide either a shared, relabund, or metadata file."); m->mothurOutEndLine(); abort = true; } } } if (metadatafile != "") { if ((relabundfile != "") || (sharedfile != "")) { m->mothurOut("You may only use one of the following : shared, relabund or metadata file."); m->mothurOutEndLine(); abort = true; } }else { if ((relabundfile != "") && (sharedfile != "")) { m->mothurOut("You may only use one of the following : shared, relabund or metadata file."); m->mothurOutEndLine(); abort = true; } } string temp; temp = validParameter.validFile(parameters, "numaxes", false); if (temp == "not found"){ temp = "3"; } m->mothurConvert(temp, numaxes); method = validParameter.validFile(parameters, "method", false); if (method == "not found"){ method = "pearson"; } if ((method != "pearson") && (method != "spearman") && (method != "kendall")) { m->mothurOut(method + " is not a valid method. 
Valid methods are pearson, spearman, and kendall."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "CorrAxesCommand"); exit(1); } } //********************************************************************************************************************** int CorrAxesCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } /*************************************************************************************/ // use smart distancing to get right sharedRabund and convert to relabund if needed // /************************************************************************************/ if (sharedfile != "") { InputData* input = new InputData(sharedfile, "sharedfile"); getSharedFloat(input); delete input; if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } if (lookupFloat[0] == NULL) { m->mothurOut("[ERROR] reading shared file."); m->mothurOutEndLine(); return 0; } }else if (relabundfile != "") { InputData* input = new InputData(relabundfile, "relabund"); getSharedFloat(input); delete input; if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } if (lookupFloat[0] == NULL) { m->mothurOut("[ERROR] reading relabund file."); m->mothurOutEndLine(); return 0; } }else if (metadatafile != "") { getMetadata(); //reads metadata file and store in lookupFloat, saves column headings in metadataLabels for later if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } if (lookupFloat[0] == NULL) { m->mothurOut("[ERROR] reading metadata file."); m->mothurOutEndLine(); return 0; } if (pickedGroups) { eliminateZeroOTUS(lookupFloat); } }else { m->mothurOut("[ERROR]: no file given."); m->mothurOutEndLine(); return 0; } if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } //this is for a 
sanity check to make sure the axes file and shared file match for (int i = 0; i < lookupFloat.size(); i++) { names.insert(lookupFloat[i]->getGroup()); } /*************************************************************************************/ // read axes file and check for file mismatches with shared or relabund file // /************************************************************************************/ //read axes file map<string, vector<float> > axes = readAxes(); if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } //sanity check, the read only adds groups that are in the shared or relabund file, but we want to make sure the axes file isn't missing anyone if (axes.size() != lookupFloat.size()) { map<string, vector<float> >::iterator it; for (int i = 0; i < lookupFloat.size(); i++) { it = axes.find(lookupFloat[i]->getGroup()); if (it == axes.end()) { m->mothurOut(lookupFloat[i]->getGroup() + " is in your shared or relabund file but not in your axes file, please correct."); m->mothurOutEndLine(); } } m->control_pressed = true; } if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } /*************************************************************************************/ // calc the r values // /************************************************************************************/ map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[tag]"] = method; string outputFileName = getOutputFileName("corraxes", variables); outputNames.push_back(outputFileName); outputTypes["corraxes"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); //output headings if (metadatafile == "") { out << "OTU"; } else { out << "Feature"; } for (int i = 0; i < numaxes; i++) { out << '\t' << "axis" << (i+1) << "\tp-value"; } out << "\tlength" << endl; if (method == "pearson") 
{ calcPearson(axes, out); }
		else if (method == "spearman") { calcSpearman(axes, out); }
		else if (method == "kendall") { calcKendall(axes, out); }
		else { m->mothurOut("[ERROR]: Invalid method."); m->mothurOutEndLine(); }
		
		out.close();
		for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; }
		
		if (m->control_pressed) { return 0; }
		
		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "CorrAxesCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
int CorrAxesCommand::calcPearson(map<string, vector<float> >& axes, ofstream& out) {
	try {
		LinearAlgebra linear;
		
		//find average of each axis - X
		vector<float> averageAxes; averageAxes.resize(numaxes, 0.0);
		for (map<string, vector<float> >::iterator it = axes.begin(); it != axes.end(); it++) {
			vector<float> temp = it->second;
			for (int i = 0; i < temp.size(); i++) { averageAxes[i] += temp[i]; }
		}
		
		for (int i = 0; i < averageAxes.size(); i++) { averageAxes[i] = averageAxes[i] / (float) axes.size(); }
		
		//for each otu
		for (int i = 0; i < lookupFloat[0]->getNumBins(); i++) {
			
			if (metadatafile == "") { out << m->currentSharedBinLabels[i]; }
			else { out << metadataLabels[i]; }
			
			//find the average of this otu - Y
			float sumOtu = 0.0;
			for (int j = 0; j < lookupFloat.size(); j++) { sumOtu += lookupFloat[j]->getAbundance(i); }
			float Ybar = sumOtu / (float) lookupFloat.size();
			
			vector<double> rValues(averageAxes.size());
			
			//find r value for each axis
			for (int k = 0; k < averageAxes.size(); k++) {
				
				double r = 0.0;
				double numerator = 0.0;
				double denomTerm1 = 0.0;
				double denomTerm2 = 0.0;
				for (int j = 0; j < lookupFloat.size(); j++) {
					float Yi = lookupFloat[j]->getAbundance(i);
					float Xi = axes[lookupFloat[j]->getGroup()][k];
					
					numerator += ((Xi - averageAxes[k]) * (Yi - Ybar));
					denomTerm1 += ((Xi - averageAxes[k]) * (Xi - averageAxes[k]));
					denomTerm2 += ((Yi - Ybar) * (Yi - Ybar));
				}
				
				double denom = (sqrt(denomTerm1) * sqrt(denomTerm2));
				
				r = numerator / denom;
				
				if (isnan(r) || isinf(r)) { r = 0.0; }
				
				rValues[k] = r;
				out << '\t' << r;
				
				double sig = linear.calcPearsonSig(lookupFloat.size(), r);
				
				out << '\t' << sig;
			}
			
			//length of the vector of r values across all axes
			double sum = 0;
			for(int k=0;k<rValues.size();k++) { sum += rValues[k] * rValues[k]; }
			out << '\t' << sqrt(sum) << endl;
		}
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "CorrAxesCommand", "calcPearson");
		exit(1);
	}
}
//**********************************************************************************************************************
int CorrAxesCommand::calcSpearman(map<string, vector<float> >& axes, ofstream& out) {
	try {
		
		LinearAlgebra linear;
		vector<double> sf;
		
		//format data
		vector< map<float, int> > tableX; tableX.resize(numaxes);
		map<float, int>::iterator itTable;
		vector< vector<spearmanRank> > scores; scores.resize(numaxes);
		for (map<string, vector<float> >::iterator it = axes.begin(); it != axes.end(); it++) {
			vector<float> temp = it->second;
			for (int i = 0; i < temp.size(); i++) {
				spearmanRank member(it->first, temp[i]);
				scores[i].push_back(member);
				
				//count number of repeats
				itTable = tableX[i].find(temp[i]);
				if (itTable == tableX[i].end()) {
					tableX[i][temp[i]] = 1;
				}else {
					tableX[i][temp[i]]++;
				}
			}
		}
		
		//calc LX
		//for each axis
		vector<double> Lx; Lx.resize(numaxes, 0.0);
		for (int i = 0; i < numaxes; i++) {
			for (itTable = tableX[i].begin(); itTable != tableX[i].end(); itTable++) {
				double tx = (double) itTable->second;
				Lx[i] += ((pow(tx, 3.0) - tx) / 12.0);
			}
		}
		
		//sort each axis
		for (int i = 0; i < numaxes; i++) { sort(scores[i].begin(), scores[i].end(), compareSpearman); }
		
		//find ranks of xi in each axis
		map<string, vector<float> > rankAxes;
		for (int i = 0; i < numaxes; i++) {
			
			vector<spearmanRank> ties;
			int rankTotal = 0;
			double sfTemp = 0.0;
			for (int j = 0; j < scores[i].size(); j++) {
				rankTotal += (j+1);
				ties.push_back(scores[i][j]);
				
				if (j != (scores[i].size()-1)) { // you are not the last so you can look ahead
					if (scores[i][j].score != scores[i][j+1].score) { // you are done with ties, rank them and continue
						
						for (int k = 0; k < ties.size(); k++) {
							float thisrank = rankTotal / (float) ties.size();
							rankAxes[ties[k].name].push_back(thisrank);
						}
						int t = ties.size();
						sfTemp += (t*t*t-t);
						ties.clear();
						rankTotal = 0;
					}
				}else { // you are the last one
					
					for (int k = 0; k < ties.size(); k++) {
						float thisrank = rankTotal / (float) ties.size();
						rankAxes[ties[k].name].push_back(thisrank);
					}
				}
			}
			sf.push_back(sfTemp);
		}
		
		//for each otu
		for (int i = 0; i < lookupFloat[0]->getNumBins(); i++) {
			
			if (metadatafile == "") { out << m->currentSharedBinLabels[i]; }
			else { out << metadataLabels[i]; }
			
			//find the ranks of this otu - Y
			vector<spearmanRank> otuScores;
			map<float, int> tableY;
			for (int j = 0; j < lookupFloat.size(); j++) {
				spearmanRank member(lookupFloat[j]->getGroup(), lookupFloat[j]->getAbundance(i));
				otuScores.push_back(member);
				
				itTable = tableY.find(member.score);
				if (itTable == tableY.end()) {
					tableY[member.score] = 1;
				}else {
					tableY[member.score]++;
				}
			}
			
			//calc Ly
			double Ly = 0.0;
			for (itTable = tableY.begin(); itTable != tableY.end(); itTable++) {
				double ty = (double) itTable->second;
				Ly += ((pow(ty, 3.0) - ty) / 12.0);
			}
			
			sort(otuScores.begin(), otuScores.end(), compareSpearman);
			
			double sg = 0.0;
			map<string, float> rankOtus;
			vector<spearmanRank> ties;
			int rankTotal = 0;
			for (int j = 0; j < otuScores.size(); j++) {
				rankTotal += (j+1);
				ties.push_back(otuScores[j]);
				
				if (j != (otuScores.size()-1)) { // you are not the last so you can look ahead
					if (otuScores[j].score != otuScores[j+1].score) { // you are done with ties, rank them and continue
						
						for (int k = 0; k < ties.size(); k++) {
							float thisrank = rankTotal / (float) ties.size();
							rankOtus[ties[k].name] = thisrank;
						}
						int t = ties.size();
						sg += (t*t*t-t);
						ties.clear();
						rankTotal = 0;
					}
				}else { // you are the last one
					
					for (int k = 0; k < ties.size(); k++) {
						float thisrank = rankTotal / (float) ties.size();
						rankOtus[ties[k].name] = thisrank;
					}
				}
			}
			
			vector<double> pValues(numaxes);
			
			//calc spearman ranks for each axis for this otu
			for (int j = 0; j < numaxes; j++) {
				
				double di = 0.0;
				for (int k = 0; k < lookupFloat.size(); k++) {
					
					float xi = rankAxes[lookupFloat[k]->getGroup()][j];
					float yi = rankOtus[lookupFloat[k]->getGroup()];
					
					di += ((xi - yi) * (xi - yi));
				}
				
				double p = 0.0;
				double n = (double) lookupFloat.size();
				
				double SX2 = ((pow(n, 3.0) - n) / 12.0) - Lx[j];
				double SY2 = ((pow(n, 3.0) - n) / 12.0) - Ly;
				
				p = (SX2 + SY2 - di) / (2.0 * sqrt((SX2*SY2)));
				
				if (isnan(p) || isinf(p)) { p = 0.0; }
				
				out << '\t' << p;
				
				pValues[j] = p;
				
				double sig = linear.calcSpearmanSig(n, sf[j], sg, di);
				out << '\t' << sig;
			}
			
			//length of the vector of coefficients across all axes
			double sum = 0;
			for(int k=0;k<pValues.size();k++) { sum += pValues[k] * pValues[k]; }
			out << '\t' << sqrt(sum) << endl;
		}
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "CorrAxesCommand", "calcSpearman");
		exit(1);
	}
}
//**********************************************************************************************************************
int CorrAxesCommand::calcKendall(map<string, vector<float> >& axes, ofstream& out) {
	try {
		
		LinearAlgebra linear;
		
		//format data
		vector< vector<spearmanRank> > scores; scores.resize(numaxes);
		for (map<string, vector<float> >::iterator it = axes.begin(); it != axes.end(); it++) {
			vector<float> temp = it->second;
			for (int i = 0; i < temp.size(); i++) {
				spearmanRank member(it->first, temp[i]);
				scores[i].push_back(member);
			}
		}
		
		//sort each axis
		for (int i = 0; i < numaxes; i++) { sort(scores[i].begin(), scores[i].end(), compareSpearman); }
		
		//convert scores to ranks of xi in each axis
		for (int i = 0; i < numaxes; i++) {
			
			vector<spearmanRank*> ties;
			int rankTotal = 0;
			for (int j = 0; j < scores[i].size(); j++) {
				rankTotal += (j+1);
				ties.push_back(&(scores[i][j]));
				
				if (j != scores[i].size()-1) { // you are not the last so you can look ahead
					if (scores[i][j].score != scores[i][j+1].score) { // you are done with ties, rank them and continue
						for (int k = 0; k < ties.size(); k++) {
							float thisrank = rankTotal / (float) ties.size();
							(*ties[k]).score = thisrank;
						}
						ties.clear();
						rankTotal = 0;
					}
				}else { // you are the last one
					for (int k = 0; k < ties.size(); k++) {
						float thisrank = rankTotal / (float) ties.size();
						(*ties[k]).score = thisrank;
					}
				}
			}
		}
		
		//for each otu
		for (int i = 0; i < lookupFloat[0]->getNumBins(); i++) {
			
			if (metadatafile == "") { out << m->currentSharedBinLabels[i]; }
			else { out << metadataLabels[i]; }
			
			//find the ranks of this otu - Y
			vector<spearmanRank> otuScores;
			for (int j = 0; j < lookupFloat.size(); j++) {
				spearmanRank member(lookupFloat[j]->getGroup(), lookupFloat[j]->getAbundance(i));
				otuScores.push_back(member);
			}
			
			sort(otuScores.begin(), otuScores.end(), compareSpearman);
			
			map<string, float> rankOtus;
			vector<spearmanRank> ties;
			int rankTotal = 0;
			for (int j = 0; j < otuScores.size(); j++) {
				rankTotal += (j+1);
				ties.push_back(otuScores[j]);
				
				if (j != otuScores.size()-1) { // you are not the last so you can look ahead
					if (otuScores[j].score != otuScores[j+1].score) { // you are done with ties, rank them and continue
						for (int k = 0; k < ties.size(); k++) {
							float thisrank = rankTotal / (float) ties.size();
							rankOtus[ties[k].name] = thisrank;
						}
						ties.clear();
						rankTotal = 0;
					}
				}else { // you are the last one
					for (int k = 0; k < ties.size(); k++) {
						float thisrank = rankTotal / (float) ties.size();
						rankOtus[ties[k].name] = thisrank;
					}
				}
			}
			
			vector<double> pValues(numaxes);
			
			//calc kendall coefficient for each axis for this otu
			for (int j = 0; j < numaxes; j++) {
				
				int numCoor = 0;
				int numDisCoor = 0;
				
				vector<spearmanRank> otus;
				vector<spearmanRank> otusTemp;
				for (int l = 0; l < scores[j].size(); l++) {
					spearmanRank member(scores[j][l].name, rankOtus[scores[j][l].name]);
					otus.push_back(member);
				}
				
				int count = 0;
				for (int l = 0; l < scores[j].size(); l++) {
					
					int numWithHigherRank = 0;
					int numWithLowerRank = 0;
					float thisrank = otus[l].score;
					
					for (int u = l+1; u < scores[j].size(); u++) {
						if (otus[u].score > thisrank) { numWithHigherRank++; }
						else if (otus[u].score < thisrank) { numWithLowerRank++; }
						count++;
					}
					
					numCoor += numWithHigherRank;
					numDisCoor += numWithLowerRank;
				}
				
				double p = (numCoor - numDisCoor) / (float) count;
				if (isnan(p) || isinf(p)) { p = 0.0; }
				
				out << '\t' << p;
				pValues[j] = p;
				
				double sig = linear.calcKendallSig(scores[j].size(), p);
				
				out << '\t' << sig;
			}
			
			//length of the vector of coefficients across all axes
			double sum = 0;
			for(int k=0;k<pValues.size();k++) { sum += pValues[k] * pValues[k]; }
			out << '\t' << sqrt(sum) << endl;
		}
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "CorrAxesCommand", "calcKendall");
		exit(1);
	}
}
//**********************************************************************************************************************
int CorrAxesCommand::getSharedFloat(InputData* input){ try { lookupFloat = input->getSharedRAbundFloatVectors(); string lastLabel = lookupFloat[0]->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookupFloat[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(lookupFloat[0]->getLabel()) == 1){ processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookupFloat[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupFloat[0]->getLabel(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); //restore real lastlabel to save below lookupFloat[0]->setLabel(saveLabel); break; } lastLabel = lookupFloat[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input->getSharedRAbundFloatVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "getSharedFloat"); exit(1); } } //********************************************************************************************************************** int CorrAxesCommand::eliminateZeroOTUS(vector& thislookup) { try { vector newLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector newBinLabels; string snumBins = toString(thislookup[0]->getNumBins()); for (int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not all zero add this bin if (!allZero) { for (int j = 0; j < thislookup.size(); j++) { newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); } //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } newBinLabels.push_back(binLabel); } } for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; } thislookup = newLookup; m->currentSharedBinLabels = 
newBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "eliminateZeroOTUS"); exit(1); } } /*****************************************************************/ map > CorrAxesCommand::readAxes(){ try { map > axes; ifstream in; m->openInputFile(axesfile, in); string headerLine = m->getline(in); m->gobble(in); //count the number of axis you are reading bool done = false; int count = 0; while (!done) { int pos = headerLine.find("axis"); if (pos != string::npos) { count++; headerLine = headerLine.substr(pos+4); }else { done = true; } } if (numaxes > count) { m->mothurOut("You requested " + toString(numaxes) + " axes, but your file only includes " + toString(count) + ". Using " + toString(count) + "."); m->mothurOutEndLine(); numaxes = count; } while (!in.eof()) { if (m->control_pressed) { in.close(); return axes; } string group = ""; in >> group; m->gobble(in); vector thisGroupsAxes; for (int i = 0; i < count; i++) { float temp = 0.0; in >> temp; //only save the axis we want if (i < numaxes) { thisGroupsAxes.push_back(temp); } } //save group if its one the user selected if (names.count(group) != 0) { map >::iterator it = axes.find(group); if (it == axes.end()) { axes[group] = thisGroupsAxes; }else { m->mothurOut(group + " is already in your axes file, using first definition."); m->mothurOutEndLine(); } } m->gobble(in); } in.close(); return axes; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "readAxes"); exit(1); } } /*****************************************************************/ int CorrAxesCommand::getMetadata(){ try { vector groupNames; ifstream in; m->openInputFile(metadatafile, in); string headerLine = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(headerLine); //save names of columns you are reading for (int i = 1; i < pieces.size(); i++) { metadataLabels.push_back(pieces[i]); } int count = metadataLabels.size(); //read rest of file while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } 
string group = ""; in >> group; m->gobble(in); groupNames.push_back(group); SharedRAbundFloatVector* tempLookup = new SharedRAbundFloatVector(); tempLookup->setGroup(group); tempLookup->setLabel("1"); for (int i = 0; i < count; i++) { float temp = 0.0; in >> temp; tempLookup->push_back(temp, group); } lookupFloat.push_back(tempLookup); m->gobble(in); } in.close(); //remove any groups the user does not want, and set globaldata->groups with only valid groups SharedUtil* util; util = new SharedUtil(); Groups = m->getGroups(); util->setGroups(Groups, groupNames); m->setGroups(Groups); for (int i = 0; i < lookupFloat.size(); i++) { //if this sharedrabund is not from a group the user wants then delete it. if (util->isValidGroup(lookupFloat[i]->getGroup(), m->getGroups()) == false) { delete lookupFloat[i]; lookupFloat[i] = NULL; lookupFloat.erase(lookupFloat.begin()+i); i--; } } delete util; return 0; } catch(exception& e) { m->errorOut(e, "CorrAxesCommand", "getMetadata"); exit(1); } } /*****************************************************************/ mothur-1.36.1/source/commands/corraxescommand.h000066400000000000000000000033151255543666200215770ustar00rootroot00000000000000#ifndef CORRAXESCOMMAND_H #define CORRAXESCOMMAND_H /* * corraxescommand.h * Mothur * * Created by westcott on 12/22/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "sharedrabundfloatvector.h" #include "inputdata.h" class CorrAxesCommand : public Command { public: CorrAxesCommand(string); CorrAxesCommand(); ~CorrAxesCommand(){} vector setParameters(); string getCommandName() { return "corr.axes"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "McCune B, Grace JB, Urban DL (2002). Analysis of ecological communities. MjM Software Design: Gleneden Beach, OR. \nLegendre P, Legendre L (1998). Numerical Ecology. Elsevier: New York. 
\nhttp://www.mothur.org/wiki/Corr.axes"; } string getDescription() { return "calculate the correlation coefficient for each column in a shared/relabund file to the axes displayed in a pcoa file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string axesfile, sharedfile, relabundfile, metadatafile, groups, label, inputFileName, outputDir, method; bool abort, pickedGroups; int numaxes; set names; vector outputNames, Groups; vector lookupFloat; vector metadataLabels; int getSharedFloat(InputData*); int getMetadata(); int eliminateZeroOTUS(vector&); map > readAxes(); int calcPearson(map >&, ofstream&); int calcSpearman(map >&, ofstream&); int calcKendall(map >&, ofstream&); }; #endif mothur-1.36.1/source/commands/countgroupscommand.cpp000066400000000000000000000336631255543666200227050ustar00rootroot00000000000000/* * countgroupscommand.cpp * Mothur * * Created by westcott on 8/9/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "countgroupscommand.h" #include "sharedutilities.h" #include "inputdata.h" //********************************************************************************************************************** vector CountGroupsCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "sharedGroup", "sharedGroup", "none","summary",false,false,true); parameters.push_back(pshared); CommandParameter pgroup("group", "InputTypes", "", "", "sharedGroup", "sharedGroup", "none","summary",false,false,true); parameters.push_back(pgroup); CommandParameter pcount("count", "InputTypes", "", "", "sharedGroup", "sharedGroup", "none","summary",false,false,true); parameters.push_back(pcount); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", 
"","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);
		
		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "CountGroupsCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string CountGroupsCommand::getOutputPattern(string type) {
	try {
		string pattern = "";
		
		if (type == "summary") { pattern = "[filename],count.summary"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }
		
		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "CountGroupsCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
string CountGroupsCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The count.groups command counts sequences from a specific group or set of groups from the following file types: group, count or shared file.\n";
		helpString += "The count.groups command parameters are accnos, group, count, shared and groups. You must provide a group, count or shared file.\n";
		helpString += "The accnos parameter allows you to provide a file containing the list of groups.\n";
		helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like. 
You can separate group names with dashes.\n"; helpString += "The count.groups command should be in the following format: count.groups(accnos=yourAccnos, group=yourGroupFile).\n"; helpString += "Example count.groups(accnos=amazon.accnos, group=amazon.groups).\n"; helpString += "or count.groups(groups=pasture, group=amazon.groups).\n"; helpString += "Note: No spaces between parameter labels (i.e. group), '=' and parameters (i.e.yourGroupFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "CountGroupsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** CountGroupsCommand::CountGroupsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "CountGroupsCommand", "CountGroupsCommand"); exit(1); } } //********************************************************************************************************************** CountGroupsCommand::CountGroupsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send 
this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["summary"] = tempOutNames; //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = ""; } else { m->setAccnosFile(accnosfile); } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); CountTable ct; if (!ct.testGroups(countfile)) { m->mothurOut("[ERROR]: Your count file does not have any group information, aborting."); m->mothurOutEndLine(); abort=true; } } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } if ((sharedfile == "") && (groupfile == "") && (countfile == "")) { //give priority to shared, then group sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { 
countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile, countfile or sharedfile and one is required."); m->mothurOutEndLine(); abort = true; } } } } if ((accnosfile == "") && (Groups.size() == 0)) { Groups.push_back("all"); m->setGroups(Groups); } } } catch(exception& e) { m->errorOut(e, "CountGroupsCommand", "CountGroupsCommand"); exit(1); } } //********************************************************************************************************************** int CountGroupsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get groups you want to remove if (accnosfile != "") { m->readAccnos(accnosfile, Groups); m->setGroups(Groups); } if (groupfile != "") { map variables; string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); string outputFileName = getOutputFileName("summary", variables); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); GroupMap groupMap(groupfile); groupMap.readMap(); //make sure groups are valid //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil util; vector nameGroups = groupMap.getNamesOfGroups(); util.setGroups(Groups, nameGroups); int total = 0; for (int i = 0; i < Groups.size(); i++) { int num = groupMap.getNumSeqs(Groups[i]); total += num; m->mothurOut(Groups[i] + " contains " + toString(num) + "."); m->mothurOutEndLine(); out << Groups[i] << '\t' << num << endl; } out.close(); m->mothurOut("\nTotal seqs: " + toString(total) + "."); m->mothurOutEndLine(); } if (m->control_pressed) { return 0; } if (countfile != "") { map variables; string thisOutputDir = outputDir; if (outputDir 
== "") { thisOutputDir += m->hasPath(countfile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); string outputFileName = getOutputFileName("summary", variables); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); CountTable ct; ct.readTable(countfile, true, false); //make sure groups are valid //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil util; vector nameGroups = ct.getNamesOfGroups(); util.setGroups(Groups, nameGroups); int total = 0; for (int i = 0; i < Groups.size(); i++) { int num = ct.getGroupCount(Groups[i]); total += num; m->mothurOut(Groups[i] + " contains " + toString(num) + "."); m->mothurOutEndLine(); out << Groups[i] << '\t' << num << endl; } out.close(); m->mothurOut("\nTotal seqs: " + toString(total) + "."); m->mothurOutEndLine(); } if (m->control_pressed) { return 0; } if (sharedfile != "") { InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); map variables; string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); string outputFileName = getOutputFileName("summary", variables); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); int total = 0; for (int i = 0; i < lookup.size(); i++) { int num = lookup[i]->getNumSeqs(); total += num; m->mothurOut(lookup[i]->getGroup() + " contains " + toString(num) + "."); m->mothurOutEndLine(); out << lookup[i]->getGroup() << '\t' << num << endl; delete lookup[i]; } out.close(); m->mothurOut("\nTotal seqs: " + toString(total) + "."); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { 
m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "CountGroupsCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/countgroupscommand.h000066400000000000000000000017121255543666200223400ustar00rootroot00000000000000#ifndef COUNTGROUPSCOMMAND_H #define COUNTGROUPSCOMMAND_H /* * countgroupscommand.h * Mothur * * Created by westcott on 8/9/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" class CountGroupsCommand : public Command { public: CountGroupsCommand(string); CountGroupsCommand(); ~CountGroupsCommand(){} vector setParameters(); string getCommandName() { return "count.groups"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Count.groups"; } string getDescription() { return "counts the number of sequences in each group"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string sharedfile, groupfile, countfile, outputDir, groups, accnosfile; bool abort; vector Groups; vector outputNames; }; #endif mothur-1.36.1/source/commands/countseqscommand.cpp000066400000000000000000001140421255543666200223300ustar00rootroot00000000000000/* * countseqscommand.cpp * Mothur * * Created by westcott on 6/1/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "countseqscommand.h" #include "sharedutilities.h" #include "counttable.h" #include "inputdata.h" //********************************************************************************************************************** vector CountSeqsCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "NameSHared-sharedGroup", "NameSHared", "none","count",false,false,true); parameters.push_back(pshared); CommandParameter pname("name", "InputTypes", "", "", "NameSHared", "NameSHared", "none","count",false,false,true); parameters.push_back(pname); CommandParameter pgroup("group", "InputTypes", "", "", "sharedGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter plarge("large", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(plarge); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CountSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CountSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The count.seqs aka. make.table command reads a name or shared file and outputs a .count_table file. 
You may also provide a group file with the names file to get the counts broken down by group.\n";
        helpString += "The groups parameter allows you to indicate which groups you want to include in the counts. By default, all groups in your groupfile are used.\n";
        helpString += "The large parameter indicates the name and group files are too large to fit in RAM.\n";
        helpString += "When you use the groups parameter and a sequence does not represent any sequences from the groups you specify, it is not included in the .count_table file.\n";
        helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n";
        helpString += "The count.seqs command should be in the following format: count.seqs(name=yourNameFile).\n";
        helpString += "Example: count.seqs(name=amazon.names) or make.table(name=amazon.names).\n";
        helpString += "Note: No spaces between parameter labels (i.e. name), '=' and parameters (i.e. yourNameFile).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string CountSeqsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "count") { pattern = "[filename],count_table-[filename],[distance],count_table"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
CountSeqsCommand::CountSeqsCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["count"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "CountSeqsCommand");
        exit(1);
    }
}
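// The help text above says count.seqs turns each line of a names file into a
// row of the count table. A minimal standalone sketch of the no-group case
// follows. sketchCountRow is a hypothetical helper for illustration only; it
// is not part of mothur and assumes a well-formed two-column line.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not mothur API): given one names-file line of the form
// "representative<whitespace>name1,name2,...", return "representative\tN",
// where N is the number of comma-separated names in the second column.
// Returns an empty string when either column is missing.
std::string sketchCountRow(const std::string& line) {
    std::istringstream in(line);
    std::string representative, members;
    in >> representative >> members;   // column 1, column 2
    if (representative.empty() || members.empty()) { return ""; }

    // N names are separated by N-1 commas.
    int count = 1;
    for (std::string::size_type i = 0; i < members.size(); i++) {
        if (members[i] == ',') { count++; }
    }

    std::ostringstream row;
    row << representative << '\t' << count;
    return row.str();
}
```

// With a group file, the real command additionally emits one column per
// group after the total; that part is omitted from this sketch.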
//********************************************************************************************************************** CountSeqsCommand::CountSeqsCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } //check for required parameters namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found"){ namefile = ""; } else { m->setNameFile(namefile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found"){ sharedfile = ""; } else { m->setSharedFile(sharedfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } if ((namefile == "") && (sharedfile == "")) { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current namefile or sharedfile and the name or shared parameter is required."); m->mothurOutEndLine(); abort = true; } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = "all"; } m->splitAtDash(groups, Groups); m->setGroups(Groups); string temp = validParameter.validFile(parameters, "large", false); if (temp == "not found") { temp = "F"; } large = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } } } catch(exception& 
e) { m->errorOut(e, "CountSeqsCommand", "CountSeqsCommand"); exit(1); } } //********************************************************************************************************************** int CountSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else processors=1; #endif map variables; if (namefile != "") { unsigned long long total = 0; int start = time(NULL); if (outputDir == "") { outputDir = m->hasPath(namefile); } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(namefile)); string outputFileName = getOutputFileName("count", variables); if (!large) { total = processSmall(outputFileName); } else { total = processLarge(outputFileName); } if (m->control_pressed) { m->mothurRemove(outputFileName); return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to create a table for " + toString(total) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Total number of sequences: " + toString(total)); m->mothurOutEndLine(); }else { if (outputDir == "") { outputDir = m->hasPath(sharedfile); } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
            set<string> processedLabels;
            set<string> userLabels = labels;

            //as long as you are not at the end of the file or done with the lines you want
            while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {

                if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

                if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){
                    m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                    processShared(lookup, variables);
                    processedLabels.insert(lookup[0]->getLabel());
                    userLabels.erase(lookup[0]->getLabel());
                }

                if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                    string saveLabel = lookup[0]->getLabel();

                    for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
                    lookup = input.getSharedRAbundVectors(lastLabel);
                    m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                    processShared(lookup, variables);
                    processedLabels.insert(lookup[0]->getLabel());
                    userLabels.erase(lookup[0]->getLabel());

                    //restore real lastlabel to save below
                    lookup[0]->setLabel(saveLabel);
                }

                lastLabel = lookup[0]->getLabel();
                //prevent memory leak
                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; }

                if (m->control_pressed) { return 0; }

                //get next line to process
                lookup = input.getSharedRAbundVectors();
            }

            if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

            //output error messages about any remaining user labels
            set<string>::iterator it;
            bool needToRun = false;
            for (it = userLabels.begin(); it != userLabels.end(); it++) {
                m->mothurOut("Your file does not include the label " + *it);
                if (processedLabels.count(lastLabel) != 1) {
                    m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                    needToRun = true;
                }else {
                    m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine();
                }
            }

            //run last label if you need to
            if (needToRun == true) {
                for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } }
                lookup = input.getSharedRAbundVectors(lastLabel);

                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                processShared(lookup, variables);

                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
            }
        }

        //set count table file as new current count table file
        itTypes = outputTypes.find("count");
        if (itTypes != outputTypes.end()) {
            if ((itTypes->second).size() != 0) { string current = (itTypes->second)[0]; m->setCountTableFile(current); }
        }

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for(int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************
unsigned long long CountSeqsCommand::processShared(vector<SharedRAbundVector*>& lookup, map<string, string> variables){
    try {
        variables["[distance]"] = lookup[0]->getLabel();
        string outputFileName = getOutputFileName("count", variables);
        outputNames.push_back(outputFileName); outputTypes["count"].push_back(outputFileName);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        //header: one column per group after the total
        out << "OTU_Label\ttotal";
        for (int i = 0; i < lookup.size(); i++) { out << '\t' << lookup[i]->getGroup(); }
        out << endl;

        //one row per OTU: label, total abundance, then abundance in each group
        for (int j = 0; j < lookup[0]->getNumBins(); j++) {
            if (m->control_pressed) { break; }

            int total = 0;
            string output = "";
            for (int i = 0; i < lookup.size(); i++) {
                total += lookup[i]->getAbundance(j);
                output += '\t' + toString(lookup[i]->getAbundance(j));
            }
            out << m->currentSharedBinLabels[j] << '\t' << total << output << endl;
        }
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "processShared");
        exit(1);
    }
}
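// processShared above writes one row per OTU: the label, the total across
// groups, then the abundance in each group. A minimal standalone sketch of
// that row layout follows. sketchOtuRow is a hypothetical helper, not a
// mothur function; a plain vector<int> stands in for the per-group
// SharedRAbundVector lookups.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not mothur API): build "label\ttotal\tg0\tg1..."
// where abundances[i] is this OTU's abundance in group i.
std::string sketchOtuRow(const std::string& otuLabel, const std::vector<int>& abundances) {
    int total = 0;
    std::ostringstream perGroup;
    for (std::vector<int>::size_type i = 0; i < abundances.size(); i++) {
        total += abundances[i];               // running total across groups
        perGroup << '\t' << abundances[i];    // per-group column
    }

    std::ostringstream row;
    row << otuLabel << '\t' << total << perGroup.str();
    return row.str();
}
```

// As in the original loop, the total is accumulated in the same pass that
// builds the per-group columns, so each abundance is read only once.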
//**********************************************************************************************************************
unsigned long long CountSeqsCommand::processSmall(string outputFileName){
    try {
        ofstream out;
        m->openOutputFile(outputFileName, out);
        //register the output file once
        outputNames.push_back(outputFileName); outputTypes["count"].push_back(outputFileName);
        out << "Representative_Sequence\ttotal";

        //stays NULL when no group file is given
        GroupMap* groupMap = NULL;
        if (groupfile != "") {
            groupMap = new GroupMap(groupfile); groupMap->readMap();

            //make sure groups are valid. takes care of user setting groupNames that are invalid or setting groups=all
            SharedUtil* util = new SharedUtil();
            vector<string> nameGroups = groupMap->getNamesOfGroups();
            util->setGroups(Groups, nameGroups);
            delete util;

            //sort groupNames so that the group titles match the counts below, this is needed because the map object automatically sorts
            sort(Groups.begin(), Groups.end());

            //print groupNames
            for (int i = 0; i < Groups.size(); i++) { out << '\t' << Groups[i]; }
        }
        out << endl;
        out.close();

        unsigned long long total = createProcesses(groupMap, outputFileName);

        if (groupfile != "") { delete groupMap; }

        return total;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "processSmall");
        exit(1);
    }
}
/**************************************************************************************************/
unsigned long long CountSeqsCommand::createProcesses(GroupMap*& groupMap, string outputFileName) {
    try {
        vector<int> processIDS;
        int process = 0;
        vector<unsigned long long> positions;
        vector<linePair> lines;
        unsigned long long numSeqs = 0;
        bool recalc = false;

        #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
            positions = m->divideFilePerLine(namefile, processors);
            for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); }
        #else
            if(processors == 1){ lines.push_back(linePair(0, 1000)); }
            else {
                unsigned long long numSeqs = 0;
                positions = 
m->setFilePosEachLine(namefile, numSeqs);
                if (positions.size() < processors) { processors = positions.size(); }

                //figure out how many sequences you have to process
                int numSeqsPerProcessor = numSeqs / processors;
                for (int i = 0; i < processors; i++) {
                    int startIndex = i * numSeqsPerProcessor;
                    if(i == (processors - 1)){ numSeqsPerProcessor = numSeqs - i * numSeqsPerProcessor; }
                    lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor));
                }
            }
        #endif

        #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)

        //loop through and create all the processes you want
        while (process != processors-1) {
            pid_t pid = fork();

            if (pid > 0) {
                processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                string filename = m->mothurGetpid(process) + ".temp";
                numSeqs = driver(lines[process].start, lines[process].end, filename, groupMap);

                string tempFile = m->mothurGetpid(process) + ".num.temp";
                ofstream outTemp;
                m->openOutputFile(tempFile, outTemp);

                outTemp << numSeqs << endl;
                outTemp.close();

                exit(0);
            }else {
                m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                //wait to die
                for (int i = 0; i < processIDS.size(); i++) {
                    int temp = processIDS[i];
                    wait(&temp);
                }
                for (int i = 0; i < processIDS.size(); i++) {
                    m->mothurRemove((toString(processIDS[i]) + ".temp"));
                    m->mothurRemove((toString(processIDS[i]) + ".num.temp"));
                }
                m->control_pressed = false;
                recalc = true;
                break;
            }
        }

        if (recalc) {
            //test line, also set recalc to true.
            //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { m->mothurRemove((toString(processIDS[i]) + ".temp")); m->mothurRemove((toString(processIDS[i]) + ".num.temp")); } m->control_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n");

            positions.clear();
            lines.clear();
            positions = m->divideFilePerLine(namefile, processors);
            for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); }

            numSeqs = 0;
            processIDS.resize(0);
            process = 0;

            while (process != processors-1) {
                pid_t pid = fork();

                if (pid > 0) {
                    processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                    process++;
                }else if (pid == 0){
                    string filename = m->mothurGetpid(process) + ".temp";
                    numSeqs = driver(lines[process].start, lines[process].end, filename, groupMap);

                    string tempFile = m->mothurGetpid(process) + ".num.temp";
                    ofstream outTemp;
                    m->openOutputFile(tempFile, outTemp);

                    outTemp << numSeqs << endl;
                    outTemp.close();

                    exit(0);
                }else {
                    m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine();
                    for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                    exit(0);
                }
            }
        }

        string filename = m->mothurGetpid(process) + ".temp";
        numSeqs = driver(lines[processors-1].start, lines[processors-1].end, filename, groupMap);

        //force parent to wait until all the processes are done
        for (int i = 0; i < processIDS.size(); i++) {
            int temp = processIDS[i];
            wait(&temp);
        }

        //add up the counts each child wrote to its .num.temp file
        for (int i = 0; i < processIDS.size(); i++) {
            string tempFile = toString(processIDS[i]) + ".num.temp";
            ifstream intemp;
            m->openInputFile(tempFile, intemp);

            int num;
            intemp >> num; intemp.close();
            numSeqs += num;
            m->mothurRemove(tempFile);
        }
        #else
        vector<countData*> pDataArray;
        DWORD dwThreadIdArray[processors-1];
        HANDLE hThreadArray[processors-1];
        vector<GroupMap*> copies;

        //Create processor worker threads.
        for( int i=0; i < processors-1; i++ ){
            string filename = toString(i) + ".temp";

            //each thread works on its own copy of the group map
            GroupMap* copyGroup = new GroupMap();
            copyGroup->getCopy(groupMap);
            copies.push_back(copyGroup);
            vector<string> cGroups = Groups;

            countData* temp = new countData(filename, copyGroup, m, lines[i].start, lines[i].end, groupfile, namefile, cGroups);
            pDataArray.push_back(temp);
            processIDS.push_back(i);

            hThreadArray[i] = CreateThread(NULL, 0, MyCountThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]);
        }

        string filename = toString(processors-1) + ".temp";
        numSeqs = driver(lines[processors-1].start, lines[processors-1].end, filename, groupMap);

        //Wait until all threads have terminated.
        WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE);

        //Close all thread handles and free memory allocations.
        for(int i=0; i < pDataArray.size(); i++){
            numSeqs += pDataArray[i]->total;
            delete copies[i];
            CloseHandle(hThreadArray[i]);
            delete pDataArray[i];
        }
        #endif

        //append output files
        for(int i = 0; i < processIDS.size(); i++){
            m->appendFiles((toString(processIDS[i]) + ".temp"), outputFileName);
            m->mothurRemove((toString(processIDS[i]) + ".temp"));
        }
        m->appendFiles(filename, outputFileName);
        m->mothurRemove(filename);

        //sanity check
        if (groupfile != "") {
            if (numSeqs != groupMap->getNumSeqs()) {
                m->mothurOut("[ERROR]: processes reported processing " + toString(numSeqs) + " sequences, but group file indicates you have " + toString(groupMap->getNumSeqs()) + " sequences.");
                if (processors == 1) { m->mothurOut(" Could you have a file mismatch?\n"); }
                else { m->mothurOut(" Either you have a file mismatch or a process failed to complete the task assigned to it.\n"); m->control_pressed = true; }
            }
        }
        return numSeqs;
    }
    catch(exception& e) {
        m->errorOut(e, "CountSeqsCommand", "createProcesses");
        exit(1);
    }
}
/**************************************************************************************************/
unsigned long long CountSeqsCommand::driver(unsigned long long start, unsigned long long end, string outputFileName, GroupMap*& groupMap) {
    try {
        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(namefile, in);
in.seekg(start); //adjust start if null strings if (start == 0) { m->zapGremlins(in); m->gobble(in); } bool done = false; unsigned long long total = 0; while (!done) { if (m->control_pressed) { break; } string firstCol, secondCol; in >> firstCol; m->gobble(in); in >> secondCol; m->gobble(in); //cout << firstCol << '\t' << secondCol << endl; m->checkName(firstCol); m->checkName(secondCol); //cout << firstCol << '\t' << secondCol << endl; vector names; m->splitAtChar(secondCol, names, ','); if (groupfile != "") { //set to 0 map groupCounts; int total = 0; for (int i = 0; i < Groups.size(); i++) { groupCounts[Groups[i]] = 0; } //get counts for each of the users groups for (int i = 0; i < names.size(); i++) { string group = groupMap->getGroup(names[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + names[i] + " is not in your groupfile, please correct."); m->mothurOutEndLine(); } else { map::iterator it = groupCounts.find(group); //if not found, then this sequence is not from a group we care about if (it != groupCounts.end()) { it->second++; total++; } } } if (total != 0) { out << firstCol << '\t' << total; for (map::iterator it = groupCounts.begin(); it != groupCounts.end(); it++) { out << '\t' << it->second; } out << endl; } }else { out << firstCol << '\t' << names.size() << endl; } total += names.size(); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); out.close(); return total; } catch(exception& e) { m->errorOut(e, "CountSeqsCommand", "driver"); exit(1); } } //********************************************************************************************************************** unsigned long long CountSeqsCommand::processLarge(string outputFileName){ try { set namesOfGroups; map initial; for (set::iterator it = namesOfGroups.begin(); it != namesOfGroups.end(); it++) { 
initial[(*it)] = 0; } ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["count"].push_back(outputFileName); out << "Representative_Sequence\ttotal"; if (groupfile == "") { out << endl; } map namesToIndex; string outfile = m->getRootName(groupfile) + "sorted.groups.temp"; string outName = m->getRootName(namefile) + "sorted.name.temp"; map indexToName; map indexToGroup; if (groupfile != "") { time_t estart = time(NULL); //convert name file to redundant -> unique. set unique name equal to index so we can use vectors, save name for later. string newNameFile = m->getRootName(namefile) + ".name.temp"; string newGroupFile = m->getRootName(groupfile) + ".group.temp"; indexToName = processNameFile(newNameFile); indexToGroup = getGroupNames(newGroupFile, namesOfGroups); //sort file by first column so the names of sequences will be easier to find //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + newGroupFile + " -o " + outfile; system(command.c_str()); command = "sort -n " + newNameFile + " -o " + outName; system(command.c_str()); #else //sort using windows sort string command = "sort " + newGroupFile + " /O " + outfile; system(command.c_str()); command = "sort " + newNameFile + " /O " + outName; system(command.c_str()); #endif m->mothurRemove(newNameFile); m->mothurRemove(newGroupFile); m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to sort and index the group and name files. 
"); m->mothurOutEndLine(); }else { outName = namefile; } time_t estart = time(NULL); //open input file ifstream in; m->openInputFile(outName, in); //open input file ifstream in2; unsigned long long total = 0; vector< vector > nameMapCount; if (groupfile != "") { m->openInputFile(outfile, in2); nameMapCount.resize(indexToName.size()); for (int i = 0; i < nameMapCount.size(); i++) { nameMapCount[i].resize(indexToGroup.size(), 0); } } while (!in.eof()) { if (m->control_pressed) { break; } string firstCol; in >> firstCol; m->gobble(in); if (groupfile != "") { int uniqueIndex; in >> uniqueIndex; m->gobble(in); string name; int groupIndex; in2 >> name >> groupIndex; m->gobble(in2); if (name != firstCol) { m->mothurOut("[ERROR]: found " + name + " in your groupfile, but " + firstCol + " was in your namefile, please correct.\n"); m->control_pressed = true; } nameMapCount[uniqueIndex][groupIndex]++; total++; }else { string secondCol; in >> secondCol; m->gobble(in); int num = m->getNumNames(secondCol); out << firstCol << '\t' << num << endl; total += num; } } in.close(); if (groupfile != "") { m->mothurRemove(outfile); m->mothurRemove(outName); in2.close(); for (map::iterator it = indexToGroup.begin(); it != indexToGroup.end(); it++) { out << it->second << '\t'; } out << endl; for (int i = 0; i < nameMapCount.size(); i++) { string totalsLine = ""; int seqTotal = 0; for (int j = 0; j < nameMapCount[i].size(); j++) { seqTotal += nameMapCount[i][j]; totalsLine += '\t' + toString(nameMapCount[i][j]); } out << indexToName[i] << '\t' << seqTotal << totalsLine << endl; } } out.close(); m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to create the count table file. 
"); m->mothurOutEndLine(); return total; } catch(exception& e) { m->errorOut(e, "CountSeqsCommand", "processLarge"); exit(1); } } /**************************************************************************************************/ map CountSeqsCommand::processNameFile(string name) { try { map indexToNames; ofstream out; m->openOutputFile(name, out); //open input file ifstream in; m->openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; int count = 0; while (!in.eof()) { if (m->control_pressed) { break; } in.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { m->checkName(firstCol); m->checkName(secondCol); //parse names into vector vector theseNames; m->splitAtComma(secondCol, theseNames); for (int i = 0; i < theseNames.size(); i++) { out << theseNames[i] << '\t' << count << endl; } indexToNames[count] = firstCol; pairDone = false; count++; } } } in.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { m->checkName(firstCol); m->checkName(secondCol); //parse names into vector vector theseNames; m->splitAtComma(secondCol, theseNames); for (int i = 0; i < theseNames.size(); i++) { out << theseNames[i] << '\t' << count << endl; } indexToNames[count] = firstCol; pairDone = false; count++; } } } out.close(); return indexToNames; } catch(exception& e) { m->errorOut(e, "CountSeqsCommand", "processNameFile"); exit(1); } } /**************************************************************************************************/ map CountSeqsCommand::getGroupNames(string filename, set& 
namesOfGroups) { try { map indexToGroups; map groupIndex; map::iterator it; ofstream out; m->openOutputFile(filename, out); //open input file ifstream in; m->openInputFile(groupfile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; int count = 0; while (!in.eof()) { if (m->control_pressed) { break; } in.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { m->checkName(firstCol); it = groupIndex.find(secondCol); if (it == groupIndex.end()) { //add group, assigning the group and number so we can use vectors above groupIndex[secondCol] = count; count++; } out << firstCol << '\t' << groupIndex[secondCol] << endl; namesOfGroups.insert(secondCol); pairDone = false; } } } in.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { m->checkName(firstCol); it = groupIndex.find(secondCol); if (it == groupIndex.end()) { //add group, assigning the group and number so we can use vectors above groupIndex[secondCol] = count; count++; } out << firstCol << '\t' << groupIndex[secondCol] << endl; namesOfGroups.insert(secondCol); pairDone = false; } } } out.close(); for (it = groupIndex.begin(); it != groupIndex.end(); it++) { indexToGroups[it->second] = it->first; } return indexToGroups; } catch(exception& e) { m->errorOut(e, "CountSeqsCommand", "getGroupNames"); exit(1); } } //********************************************************************************************************************** 
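// getGroupNames above gives each distinct group name the next free integer
// index in order of first appearance, then inverts that map so counts can be
// stored in vectors. A minimal standalone sketch of just that indexing step
// follows. sketchIndexGroups is a hypothetical helper, not part of mothur,
// and takes the group column as an in-memory vector instead of reading the
// group file in 4096-byte chunks.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not mothur API): map index -> group name, where the
// index is the order in which each group name was first seen.
std::map<int, std::string> sketchIndexGroups(const std::vector<std::string>& groupColumn) {
    std::map<std::string, int> groupIndex;
    int count = 0;
    for (std::vector<std::string>::size_type i = 0; i < groupColumn.size(); i++) {
        if (groupIndex.find(groupColumn[i]) == groupIndex.end()) {
            groupIndex[groupColumn[i]] = count;   // first time this group is seen
            count++;
        }
    }

    // Invert name -> index into index -> name.
    std::map<int, std::string> indexToGroup;
    for (std::map<std::string, int>::const_iterator it = groupIndex.begin(); it != groupIndex.end(); ++it) {
        indexToGroup[it->second] = it->first;
    }
    return indexToGroup;
}
```

// Because indices follow first appearance rather than alphabetical order,
// iterating the returned map yields groups in the order the file introduced
// them.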
mothur-1.36.1/source/commands/countseqscommand.h000066400000000000000000000114371255543666200220010ustar00rootroot00000000000000#ifndef COuNTSEQSCOMMAND_H #define COuNTSEQSCOMMAND_H /* * countseqscommand.h * Mothur * * Created by westcott on 6/1/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "groupmap.h" #include "sharedrabundvector.h" class CountSeqsCommand : public Command { public: CountSeqsCommand(string); CountSeqsCommand(); ~CountSeqsCommand(){} vector setParameters(); string getCommandName() { return "count.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Count.seqs"; } string getDescription() { return "makes a count file from a names or shared file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string namefile, groupfile, outputDir, groups, sharedfile; bool abort, large, allLines; vector Groups, outputNames; int processors; set labels; unsigned long long processSmall(string); unsigned long long processLarge(string); map processNameFile(string); map getGroupNames(string, set&); unsigned long long createProcesses(GroupMap*&, string); unsigned long long driver(unsigned long long, unsigned long long, string, GroupMap*&); unsigned long long processShared(vector& lookup, map variables); }; /***********************************************************************/ struct countData { unsigned long long start; unsigned long long end; MothurOut* m; string outputFileName, namefile, groupfile; GroupMap* groupMap; int total; vector Groups; countData(){} countData(string fn, GroupMap* g, MothurOut* mout, unsigned long long st, unsigned long long en, string gfn, string nfn, vector gr) { m = mout; start = st; end = en; groupMap = g; groupfile = gfn; namefile = nfn; outputFileName = fn; Groups = gr; total = 0; } }; 
/**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyCountThreadFunction(LPVOID lpParam){ countData* pDataArray; pDataArray = (countData*)lpParam; try { ofstream out; pDataArray->m->openOutputFile(pDataArray->outputFileName, out); ifstream in; pDataArray->m->openInputFile(pDataArray->namefile, in); in.seekg(pDataArray->start); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->zapGremlins(in); }else { //this accounts for the difference in line endings. in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } pDataArray->total = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { break; } string firstCol, secondCol; in >> firstCol; pDataArray->m->gobble(in); in >> secondCol; pDataArray->m->gobble(in); //cout << firstCol << '\t' << secondCol << endl; pDataArray->m->checkName(firstCol); pDataArray->m->checkName(secondCol); vector names; pDataArray->m->splitAtChar(secondCol, names, ','); if (pDataArray->groupfile != "") { //set to 0 map groupCounts; int total = 0; for (int i = 0; i < pDataArray->Groups.size(); i++) { groupCounts[pDataArray->Groups[i]] = 0; } //get counts for each of the users groups for (int i = 0; i < names.size(); i++) { string group = pDataArray->groupMap->getGroup(names[i]); if (group == "not found") { pDataArray->m->mothurOut("[ERROR]: " + names[i] + " is not in your groupfile, please correct."); pDataArray->m->mothurOutEndLine(); } else { map::iterator it = groupCounts.find(group); //if not found, then this sequence is not from a group we care about if (it != groupCounts.end()) { it->second++; total++; } } } if (total != 0) { out << firstCol << '\t' << total; for (map::iterator it = groupCounts.begin(); it != 
groupCounts.end(); it++) { out << '\t' << it->second; } out << endl; } }else { out << firstCol << '\t' << names.size() << endl; } pDataArray->total += names.size(); } in.close(); out.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "CountSeqsCommand", "MyCountThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/createdatabasecommand.cpp000066400000000000000000001053351255543666200232410ustar00rootroot00000000000000// // createdatabasecommand.cpp // Mothur // // Created by Sarah Westcott on 3/28/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "createdatabasecommand.h" #include "inputdata.h" //********************************************************************************************************************** vector CreateDatabaseCommand::setParameters(){ try { CommandParameter pfasta("repfasta", "InputTypes", "", "", "none", "none", "none","database",false,true,true); parameters.push_back(pfasta); CommandParameter pname("repname", "InputTypes", "", "", "NameCount", "NameCount", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "NameCount", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pconstaxonomy("constaxonomy", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pconstaxonomy); CommandParameter plist("list", "InputTypes", "", "", "ListShared", "ListShared", "none","",false,false,true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "ListShared", "ListShared", "none","",false,false,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", 
"", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string CreateDatabaseCommand::getHelpString(){ try { string helpString = ""; helpString += "The create.database command reads a list file or a shared file, *.cons.taxonomy, *.rep.fasta, *.rep.names and optional groupfile, or count file and creates a database file.\n"; helpString += "The create.database command parameters are repfasta, list, shared, repname, constaxonomy, group, count and label. List, repfasta, repnames or count, and constaxonomy are required.\n"; helpString += "The repfasta file is fasta file outputted by get.oturep(fasta=yourFastaFile, list=yourListfile, column=yourDistFile, name=yourNameFile).\n"; helpString += "The repname file is the name file outputted by get.oturep(fasta=yourFastaFile, list=yourListfile, column=yourDistFile, name=yourNameFile).\n"; helpString += "The count file is the count file outputted by get.oturep(fasta=yourFastaFile, list=yourListfile, column=yourDistFile, count=yourCountFile). If it includes group info, mothur will give you the abundance breakdown by group. 
\n";
        helpString += "The constaxonomy file is the taxonomy file outputted by classify.otu(list=yourListfile, taxonomy=yourTaxonomyFile, name=yourNameFile).\n";
        helpString += "The group file is optional and will just give you the abundance breakdown by group.\n";
        helpString += "The label parameter allows you to specify a label to be used from your listfile.\n";
        helpString += "NOTE: Make SURE the repfasta, repname and constaxonomy files are for the same label as the listfile.\n";
        helpString += "The create.database command should be in the following format: \n";
        helpString += "create.database(repfasta=yourFastaFileFromGetOTURep, repname=yourNameFileFromGetOTURep, constaxonomy=yourConTaxFileFromClassifyOTU, list=yourListFile) \n";
        helpString += "Example: create.database(repfasta=final.an.0.03.rep.fasta, repname=final.an.0.03.rep.names, list=final.an.list, label=0.03, constaxonomy=final.an.0.03.cons.taxonomy) \n";
        helpString += "Note: No spaces between parameter labels (e.g. repfasta), '=' and parameters (e.g. yourFastaFileFromGetOTURep).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "CreateDatabaseCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string CreateDatabaseCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "database") { pattern = "[filename],database"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "CreateDatabaseCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
CreateDatabaseCommand::CreateDatabaseCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["database"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e,
"CreateDatabaseCommand", "CreateDatabaseCommand"); exit(1); } } //********************************************************************************************************************** CreateDatabaseCommand::CreateDatabaseCommand(string option) { try{ abort = false; calledHelp = false; //allow user to run help if (option == "help") { help(); abort = true; calledHelp = true; }else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["database"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("repname"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["repname"] = inputDir + it->second; } } it = parameters.find("constaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["constaxonomy"] = inputDir + it->second; } } it = parameters.find("repfasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["repfasta"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not found") { listfile = ""; } else if (listfile == "not open") { listfile = ""; abort = true; } else { m->setListFile(listfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not found") { sharedfile = ""; } else if (sharedfile == "not open") { sharedfile = ""; abort = true; } else { m->setSharedFile(sharedfile); } if ((sharedfile == "") && (listfile == "")) { //is there are current file available for either of these? 
//give priority to list, then shared listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared or list file before you can use the create.database command."); m->mothurOutEndLine(); abort = true; } } } else if ((sharedfile != "") && (listfile != "")) { m->mothurOut("When executing a create.database command you must enter ONLY ONE of the following: shared or list."); m->mothurOutEndLine(); abort = true; } if (sharedfile != "") { if (outputDir == "") { outputDir = m->hasPath(sharedfile); } } else { if (outputDir == "") { outputDir = m->hasPath(listfile); } } contaxonomyfile = validParameter.validFile(parameters, "constaxonomy", true); if (contaxonomyfile == "not found") { //if there is a current list file, use it contaxonomyfile = ""; m->mothurOut("The constaxonomy parameter is required, aborting."); m->mothurOutEndLine(); abort = true; } else if (contaxonomyfile == "not open") { contaxonomyfile = ""; abort = true; } repfastafile = validParameter.validFile(parameters, "repfasta", true); if (repfastafile == "not found") { //if there is a current list file, use it repfastafile = ""; m->mothurOut("The repfasta parameter is required, aborting."); m->mothurOutEndLine(); abort = true; } else if (repfastafile == "not open") { repfastafile = ""; abort = true; } repnamesfile = validParameter.validFile(parameters, "repname", true); if (repnamesfile == "not found") { repnamesfile = ""; } else if (repnamesfile == "not open") { repnamesfile = ""; abort = true; } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not found") { countfile = ""; } else if (countfile == "not open") { countfile = ""; abort = true; } if 
((countfile == "") && (repnamesfile == "")) {
                //if there is a current name file, use it, else look for current count file
                //assign to the member variable; declaring a local here would shadow it and leave the member empty
                repnamesfile = m->getNameFile();
                if (repnamesfile != "") { m->mothurOut("Using " + repnamesfile + " as input file for the repname parameter."); m->mothurOutEndLine(); }
                else {
                    countfile = m->getCountTableFile();
                    if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); }
                    else { m->mothurOut("[ERROR]: You must provide a count or repname file."); m->mothurOutEndLine(); abort = true; }
                }
            }

            groupfile = validParameter.validFile(parameters, "group", true);
            if (groupfile == "not open") { groupfile = ""; abort = true; }
            else if (groupfile == "not found") { groupfile = ""; }
            else { m->setGroupFile(groupfile); }

            //check for optional parameter and set defaults
            // ...at some point should add some additional type checking...
            label = validParameter.validFile(parameters, "label", false);
            if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your listfile.\n"); }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "CreateDatabaseCommand", "CreateDatabaseCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
int CreateDatabaseCommand::execute(){
    try {

        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        //taxonomies holds the taxonomy info for each Otu
        //classifyOtuSizes holds the size info of each Otu to help with error checking
        vector<string> taxonomies;
        vector<string> otuLabels;
        vector<int> classifyOtuSizes = readTax(taxonomies, otuLabels);

        if (m->control_pressed) { return 0; }

        vector<Sequence> seqs;
        vector<int> repOtusSizes = readFasta(seqs);

        if (m->control_pressed) { return 0; }

        //names redundants to uniques. backwards to how we normally do it, but each bin in the list file will be a key entry in the map.
map repNames; map nameMap; int numUniqueNamesFile = 0; CountTable ct; if (countfile == "") { numUniqueNamesFile = m->readNames(repnamesfile, repNames, 1); //the repnames file does not have the same order as the list file bins so we need to sort and reassemble for the search below map tempRepNames; for (map::iterator it = repNames.begin(); it != repNames.end();) { string bin = it->first; vector temp; m->splitAtChar(bin, temp, ','); sort(temp.begin(), temp.end()); bin = ""; for (int i = 0; i < temp.size()-1; i++) { bin += temp[i] + ','; } bin += temp[temp.size()-1]; tempRepNames[bin] = it->second; repNames.erase(it++); } repNames = tempRepNames; }else { ct.readTable(countfile, true, false); numUniqueNamesFile = ct.getNumUniqueSeqs(); nameMap = ct.getNameMap(); } //are there the same number of otus in the fasta and name files if (repOtusSizes.size() != numUniqueNamesFile) { m->mothurOut("[ERROR]: you have " + toString(numUniqueNamesFile) + " unique seqs in your repname file, but " + toString(repOtusSizes.size()) + " seqs in your repfasta file. These should match.\n"); m->control_pressed = true; } if (m->control_pressed) { return 0; } //are there the same number of OTUs in the tax and fasta file if (classifyOtuSizes.size() != repOtusSizes.size()) { m->mothurOut("[ERROR]: you have " + toString(classifyOtuSizes.size()) + " taxonomies in your contaxonomy file, but " + toString(repOtusSizes.size()) + " seqs in your repfasta file. These should match.\n"); m->control_pressed = true; } if (m->control_pressed) { return 0; } //at this point we have the same number of OTUs. Are the sizes we have found so far accurate? for (int i = 0; i < classifyOtuSizes.size(); i++) { if (classifyOtuSizes[i] != repOtusSizes[i]) { m->mothurOut("[ERROR]: OTU size info does not match for bin " + toString(i+1) + ". The contaxonomy file indicated the OTU represented " + toString(classifyOtuSizes[i]) + " sequences, but the repfasta file had " + toString(repOtusSizes[i]) + ". These should match. 
Make sure you are using files for the same distance.\n"); m->control_pressed = true; } } if (m->control_pressed) { return 0; } map variables; if (listfile != "") { variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); } else { variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); } string outputFileName = getOutputFileName("database", variables); outputNames.push_back(outputFileName); outputTypes["database"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); string header = "OTUNumber\tAbundance"; if (listfile != "") { //at this point we are fairly sure the repfasta, repnames and contaxonomy files match so lets proceed with the listfile ListVector* list = getList(); if (otuLabels.size() != list->getNumBins()) { m->mothurOut("[ERROR]: you have " + toString(otuLabels.size()) + " otus in your contaxonomy file, but your list file has " + toString(list->getNumBins()) + " otus. These should match. Make sure you are using files for the same distance.\n"); m->control_pressed = true; } if (m->control_pressed) { delete list; return 0; } GroupMap* groupmap = NULL; if (groupfile != "") { groupmap = new GroupMap(groupfile); groupmap->readMap(); } if (m->control_pressed) { delete list; if (groupfile != "") { delete groupmap; } return 0; } if (groupfile != "") { header = "OTUNumber"; for (int i = 0; i < groupmap->getNamesOfGroups().size(); i++) { header += '\t' + (groupmap->getNamesOfGroups())[i]; } }else if (countfile != "") { if (ct.hasGroupInfo()) { header = "OTUNumber"; for (int i = 0; i < ct.getNamesOfGroups().size(); i++) { header += '\t' + (ct.getNamesOfGroups())[i]; } } } header += "\trepSeqName\trepSeq\tOTUConTaxonomy"; out << header << endl; vector binLabels = list->getLabels(); for (int i = 0; i < list->getNumBins(); i++) { int index = findIndex(otuLabels, binLabels[i]); if (index == -1) { m->mothurOut("[ERROR]: " + binLabels[i] + " is not in your constaxonomy file, aborting.\n"); 
m->control_pressed = true; }
                if (m->control_pressed) { break; }

                out << otuLabels[index];

                vector<string> binNames;
                string bin = list->get(i);
                m->splitAtComma(bin, binNames);

                string seqRepName = "";
                int numSeqsRep = 0;

                if (countfile == "") {
                    sort(binNames.begin(), binNames.end());
                    bin = "";
                    for (int j = 0; j < binNames.size()-1; j++) { bin += binNames[j] + ','; }
                    bin += binNames[binNames.size()-1];
                    map<string, string>::iterator it = repNames.find(bin);

                    if (it == repNames.end()) {
                        m->mothurOut("[ERROR]: OTU " + otuLabels[index] + " is not in the repnames file. Make sure you are using files for the same distance.\n"); m->control_pressed = true; break;
                    }else { seqRepName = it->second; numSeqsRep = binNames.size(); }

                    //sanity check
                    if (binNames.size() != classifyOtuSizes[index]) {
                        m->mothurOut("[ERROR]: OTU " + otuLabels[index] + " contains " + toString(binNames.size()) + " sequences, but the rep and taxonomy files indicated this OTU should have " + toString(classifyOtuSizes[index]) + ". Make sure you are using files for the same distance.\n"); m->control_pressed = true; break;
                    }
                }else {
                    //find rep sequence in bin
                    for (int j = 0; j < binNames.size(); j++) {
                        //if you are in the counttable you must be the rep. because get.oturep with a countfile only includes the rep sequences in the rep.count file.
                        map<string, int>::iterator itNameMap = nameMap.find(binNames[j]);
                        if (itNameMap != nameMap.end()) {
                            seqRepName = itNameMap->first;
                            numSeqsRep = itNameMap->second;
                            j += binNames.size(); //found the rep, stop searching this bin
                        }
                    }

                    if (seqRepName == "") {
                        m->mothurOut("[ERROR]: OTU " + otuLabels[index] + " is not in the count file. Make sure you are using files for the same distance.\n"); m->control_pressed = true; break;
                    }

                    //compare against the taxonomy entry for this OTU (index), not the list bin position (i)
                    if (numSeqsRep != classifyOtuSizes[index]) {
                        m->mothurOut("[ERROR]: OTU " + otuLabels[index] + " contains " + toString(numSeqsRep) + " sequences, but the rep and taxonomy files indicated this OTU should have " + toString(classifyOtuSizes[index]) + ". 
Make sure you are using files for the same distance.\n"); m->control_pressed = true; break; } } //output abundances if (groupfile != "") { string groupAbunds = ""; map counts; //initialize counts to 0 for (int j = 0; j < groupmap->getNamesOfGroups().size(); j++) { counts[(groupmap->getNamesOfGroups())[j]] = 0; } //find abundances by group bool error = false; for (int j = 0; j < binNames.size(); j++) { string group = groupmap->getGroup(binNames[j]); if (group == "not found") { m->mothurOut("[ERROR]: " + binNames[j] + " is not in your groupfile, please correct.\n"); error = true; }else { counts[group]++; } } //output counts for (int j = 0; j < groupmap->getNamesOfGroups().size(); j++) { out << '\t' << counts[(groupmap->getNamesOfGroups())[j]]; } if (error) { m->control_pressed = true; } }else if (countfile != "") { if (ct.hasGroupInfo()) { vector groupCounts = ct.getGroupCounts(seqRepName); for (int j = 0; j < groupCounts.size(); j++) { out << '\t' << groupCounts[j]; } }else { out << '\t' << numSeqsRep; } }else { out << '\t' << numSeqsRep; } //output repSeq out << '\t' << seqRepName << '\t' << seqs[index].getAligned() << '\t' << taxonomies[index] << endl; } delete list; if (groupfile != "") { delete groupmap; } }else { vector lookup = getShared(); header = "OTUNumber"; for (int i = 0; i < lookup.size(); i++) { header += '\t' + lookup[i]->getGroup(); } header += "\trepSeqName\trepSeq\tOTUConTaxonomy"; out << header << endl; for (int h = 0; h < lookup[0]->getNumBins(); h++) { if (m->control_pressed) { break; } int index = findIndex(otuLabels, m->currentSharedBinLabels[h]); if (index == -1) { m->mothurOut("[ERROR]: " + m->currentSharedBinLabels[h] + " is not in your constaxonomy file, aborting.\n"); m->control_pressed = true; } if (m->control_pressed) { break; } out << otuLabels[index]; int totalAbund = 0; for (int i = 0; i < lookup.size(); i++) { int abund = lookup[i]->getAbundance(h); totalAbund += abund; out << '\t' << abund; } //sanity check if (totalAbund != 
classifyOtuSizes[index]) { m->mothurOut("[WARNING]: OTU " + m->currentSharedBinLabels[h] + " contains " + toString(totalAbund) + " sequence, but the rep and taxonomy files indicated this OTU should have " + toString(classifyOtuSizes[index]) + ". Make sure you are using files for the same distance.\n"); //m->control_pressed = true; break; } //output repSeq out << '\t' << seqs[index].getName() << '\t' << seqs[index].getAligned() << '\t' << taxonomies[index] << endl; } } out.close(); if (m->control_pressed) { m->mothurRemove(outputFileName); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "execute"); exit(1); } } //********************************************************************************************************************** int CreateDatabaseCommand::findIndex(vector& otuLabels, string label){ try { int index = -1; for (int i = 0; i < otuLabels.size(); i++) { if (m->isLabelEquivalent(otuLabels[i],label)) { index = i; break; } } return index; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "findIndex"); exit(1); } } //********************************************************************************************************************** vector CreateDatabaseCommand::readTax(vector& taxonomies, vector& otuLabels){ try { vector sizes; ifstream in; m->openInputFile(contaxonomyfile, in); //read headers m->getline(in); while (!in.eof()) { if (m->control_pressed) { break; } string otu = ""; string tax = "unknown"; int size = 0; in >> otu >> size >> tax; m->gobble(in); sizes.push_back(size); taxonomies.push_back(tax); otuLabels.push_back(otu); } in.close(); return sizes; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "readTax"); exit(1); } } 
//********************************************************************************************************************** vector CreateDatabaseCommand::readFasta(vector& seqs){ try { vector sizes; ifstream in; m->openInputFile(repfastafile, in); set sanity; while (!in.eof()) { if (m->control_pressed) { break; } string binInfo; Sequence seq(in, binInfo, true); m->gobble(in); //the binInfo should look like - binNumber|size ie. 1|200 if it is binNumber|size|group then the user gave us the wrong repfasta file vector info; m->splitAtChar(binInfo, info, '|'); //if (info.size() != 2) { m->mothurOut("[ERROR]: your repfasta file is not the right format. The create database command is designed to be used with the output from get.oturep. When running get.oturep you can not use a group file, because mothur is only expecting one representative sequence per OTU and when you use a group file with get.oturep a representative is found for each group.\n"); m->control_pressed = true; break;} int size = 0; m->mothurConvert(info[1], size); int binNumber = 0; string temp = ""; for (int i = 0; i < info[0].size(); i++) { if (isspace(info[0][i])) {;}else{temp +=info[0][i]; } } m->mothurConvert(m->getSimpleLabel(temp), binNumber); set::iterator it = sanity.find(binNumber); if (it != sanity.end()) { m->mothurOut("[ERROR]: your repfasta file is not the right format. The create database command is designed to be used with the output from get.oturep. 
When running get.oturep you can not use a group file, because mothur is only expecting one representative sequence per OTU and when you use a group file with get.oturep a representative is found for each group.\n"); m->control_pressed = true; break; }else { sanity.insert(binNumber); } sizes.push_back(size); seqs.push_back(seq); } in.close(); return sizes; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** ListVector* CreateDatabaseCommand::getList(){ try { InputData* input = new InputData(listfile, "list"); ListVector* list = input->getListVector(); string lastLabel = list->getLabel(); if (label == "") { label = lastLabel; delete input; return list; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { delete input; return list; } if(labels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); break; } lastLabel = list->getLabel(); //get next line to process //prevent memory leak delete list; list = input->getListVector(); } if (m->control_pressed) { delete input; return list; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != 
userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input->getListVector(lastLabel); } delete input; return list; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "getList"); exit(1); } } //********************************************************************************************************************** vector CreateDatabaseCommand::getShared(){ try { InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (label == "") { label = lastLabel; return lookup; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return lookup; } if(labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); break; } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = 
input.getSharedRAbundVectors(); } if (m->control_pressed) { return lookup; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); } return lookup; } catch(exception& e) { m->errorOut(e, "CreateDatabaseCommand", "getShared"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/createdatabasecommand.h000066400000000000000000000025431255543666200227030ustar00rootroot00000000000000#ifndef Mothur_createdatabasecommand_h #define Mothur_createdatabasecommand_h // // createdatabasecommand.h // Mothur // // Created by Sarah Westcott on 3/28/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
//

#include "command.hpp"
#include "listvector.hpp"
#include "sequence.hpp"

class CreateDatabaseCommand : public Command {
public:
    CreateDatabaseCommand(string);
    CreateDatabaseCommand();
    ~CreateDatabaseCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "create.database"; }
    string getCommandCategory() { return "OTU-Based Approaches"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Create.database"; }
    string getDescription() { return "creates a database file that includes abundances across groups, representative sequences, and taxonomy for each OTU"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort;
    string sharedfile, listfile, groupfile, repfastafile, repnamesfile, contaxonomyfile, label, outputDir, countfile;

    vector<string> outputNames;

    vector<int> readFasta(vector<Sequence>&);
    vector<int> readTax(vector<string>&, vector<string>&);
    ListVector* getList();
    vector<SharedRAbundVector*> getShared();
    int findIndex(vector<string>&, string);
};

#endif
mothur-1.36.1/source/commands/deconvolutecommand.cpp000066400000000000000000000363221255543666200226370ustar00rootroot00000000000000
/*
 * deconvolute.cpp
 * Mothur
 *
 * Created by Sarah Westcott on 1/21/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "deconvolutecommand.h"
#include "sequence.hpp"

//**********************************************************************************************************************
vector<string> DeconvoluteCommand::setParameters(){
    try {
        CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta-name",false,true,true); parameters.push_back(pfasta);
        CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","name",false,false,true); parameters.push_back(pname);
        CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","count",false,false,true); parameters.push_back(pcount);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "DeconvoluteCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string DeconvoluteCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The unique.seqs command reads a fastafile and creates a name or count file.\n";
        helpString += "It creates a file where the first column is the groupname and the second column is a list of sequence names that have the same sequence. \n";
        helpString += "If the sequence is unique the second column will just contain its name. \n";
        helpString += "The unique.seqs command parameters are fasta, name and count. 
fasta is required, unless there is a valid current fasta file.\n"; helpString += "The unique.seqs command should be in the following format: \n"; helpString += "unique.seqs(fasta=yourFastaFile) \n"; return helpString; } catch(exception& e) { m->errorOut(e, "DeconvoluteCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string DeconvoluteCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],unique,[extension]"; } else if (type == "name") { pattern = "[filename],names-[filename],[tag],names"; } else if (type == "count") { pattern = "[filename],count_table-[filename],[tag],count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "DeconvoluteCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** DeconvoluteCommand::DeconvoluteCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "DeconvoluteCommand", "DeconvoluteCommand"); exit(1); } } /**************************************************************************************/ DeconvoluteCommand::DeconvoluteCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid 
for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters inFastaName = validParameter.validFile(parameters, "fasta", true); if (inFastaName == "not open") { abort = true; } else if (inFastaName == "not found") { inFastaName = m->getFastaFile(); if (inFastaName != "") { m->mothurOut("Using " + inFastaName + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(inFastaName); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(inFastaName); //if user entered a file with a path then preserve it } oldNameMapFName = validParameter.validFile(parameters, "name", true); if (oldNameMapFName == "not open") { oldNameMapFName = ""; abort = true; } else if (oldNameMapFName == "not found"){ oldNameMapFName = ""; } else { m->setNameFile(oldNameMapFName); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (oldNameMapFName != "")) { m->mothurOut("When executing a unique.seqs command you must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if (oldNameMapFName == "") { vector files; files.push_back(inFastaName); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "DeconvoluteCommand", "DeconvoluteCommand"); exit(1); } } /**************************************************************************************/ int DeconvoluteCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } 
return 2; } //prepare filenames and open files map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inFastaName)); string outNameFile = getOutputFileName("name", variables); string outCountFile = getOutputFileName("count", variables); variables["[extension]"] = m->getExtension(inFastaName); string outFastaFile = getOutputFileName("fasta", variables); map nameMap; map::iterator itNames; if (oldNameMapFName != "") { m->readNames(oldNameMapFName, nameMap); if (oldNameMapFName == outNameFile){ //prepare filenames and open files map mvariables; mvariables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inFastaName)); mvariables["[tag]"] = "unique"; outNameFile = getOutputFileName("name", mvariables); } } CountTable ct; if (countfile != "") { ct.readTable(countfile, true, false); if (countfile == outCountFile){ //prepare filenames and open files map mvariables; mvariables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inFastaName)); mvariables["[tag]"] = "unique"; outCountFile = getOutputFileName("count", mvariables); } } if (m->control_pressed) { return 0; } ifstream in; m->openInputFile(inFastaName, in); ofstream outFasta; m->openOutputFile(outFastaFile, outFasta); map sequenceStrings; //sequenceString -> list of names. "atgc...." -> seq1,seq2,seq3. 
map::iterator itStrings; set nameInFastaFile; //for sanity checking set::iterator itname; vector nameFileOrder; int count = 0; while (!in.eof()) { if (m->control_pressed) { in.close(); outFasta.close(); m->mothurRemove(outFastaFile); return 0; } Sequence seq(in); if (seq.getName() != "") { //sanity checks itname = nameInFastaFile.find(seq.getName()); if (itname == nameInFastaFile.end()) { nameInFastaFile.insert(seq.getName()); } else { m->mothurOut("[ERROR]: You already have a sequence named " + seq.getName() + " in your fasta file, sequence names must be unique, please correct."); m->mothurOutEndLine(); } itStrings = sequenceStrings.find(seq.getAligned()); if (itStrings == sequenceStrings.end()) { //this is a new unique sequence //output to unique fasta file seq.printSequence(outFasta); if (oldNameMapFName != "") { itNames = nameMap.find(seq.getName()); if (itNames == nameMap.end()) { //namefile and fastafile do not match m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file, and not in your namefile, please correct."); m->mothurOutEndLine(); }else { sequenceStrings[seq.getAligned()] = itNames->second; nameFileOrder.push_back(seq.getAligned()); } }else if (countfile != "") { ct.getNumSeqs(seq.getName()); //checks to make sure seq is in table sequenceStrings[seq.getAligned()] = seq.getName(); nameFileOrder.push_back(seq.getAligned()); }else { sequenceStrings[seq.getAligned()] = seq.getName(); nameFileOrder.push_back(seq.getAligned()); } }else { //this is a dup if (oldNameMapFName != "") { itNames = nameMap.find(seq.getName()); if (itNames == nameMap.end()) { //namefile and fastafile do not match m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file, and not in your namefile, please correct."); m->mothurOutEndLine(); }else { sequenceStrings[seq.getAligned()] += "," + itNames->second; } }else if (countfile != "") { int num = ct.getNumSeqs(seq.getName()); //checks to make sure seq is in table if (num != 0) { //its in the table 
ct.mergeCounts(itStrings->second, seq.getName()); //merges counts and saves in uniques name } }else { sequenceStrings[seq.getAligned()] += "," + seq.getName(); } } count++; } m->gobble(in); if(count % 1000 == 0) { m->mothurOutJustToScreen(toString(count) + "\t" + toString(sequenceStrings.size()) + "\n"); } } if(count % 1000 != 0) { m->mothurOut(toString(count) + "\t" + toString(sequenceStrings.size())); m->mothurOutEndLine(); } in.close(); outFasta.close(); if (m->control_pressed) { m->mothurRemove(outFastaFile); return 0; } //print new names file ofstream outNames; if (countfile == "") { m->openOutputFile(outNameFile, outNames); outputNames.push_back(outNameFile); outputTypes["name"].push_back(outNameFile); } else { m->openOutputFile(outCountFile, outNames); ct.printHeaders(outNames); outputTypes["count"].push_back(outCountFile); outputNames.push_back(outCountFile); } for (int i = 0; i < nameFileOrder.size(); i++) { if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outFastaFile); outNames.close(); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } itStrings = sequenceStrings.find(nameFileOrder[i]); if (itStrings != sequenceStrings.end()) { if (countfile == "") { //get rep name int pos = (itStrings->second).find_first_of(','); if (pos == string::npos) { // only reps itself outNames << itStrings->second << '\t' << itStrings->second << endl; }else { outNames << (itStrings->second).substr(0, pos) << '\t' << itStrings->second << endl; } }else { ct.printSeq(outNames, itStrings->second); } }else{ m->mothurOut("[ERROR]: mismatch in namefile print."); m->mothurOutEndLine(); m->control_pressed = true; } } outNames.close(); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outFastaFile); for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); outputNames.push_back(outFastaFile); 
outputTypes["fasta"].push_back(outFastaFile); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "DeconvoluteCommand", "execute"); exit(1); } } /**************************************************************************************/ mothur-1.36.1/source/commands/deconvolutecommand.h000066400000000000000000000023671255543666200223060ustar00rootroot00000000000000#ifndef DECONVOLUTECOMMAND_H #define DECONVOLUTECOMMAND_H /* * deconvolute.h * Mothur * * Created by Sarah Westcott on 1/21/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "fastamap.h" #include "counttable.h" /* The unique.seqs command reads a fasta file, finds the duplicate sequences and outputs a names file containing 2 columns. The first being the groupname and the second the list of identical sequence names. 
*/

class DeconvoluteCommand : public Command {

public:
    DeconvoluteCommand(string);
    DeconvoluteCommand();
    ~DeconvoluteCommand() {}

    vector<string> setParameters();
    string getCommandName() { return "unique.seqs"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Unique.seqs"; }
    string getDescription() { return "creates a fasta containing the unique sequences as well as a namesfile with the names each sequence represents"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    string inFastaName, oldNameMapFName, outputDir, countfile;
    vector<string> outputNames;

    bool abort;
};

#endif
mothur-1.36.1/source/commands/degapseqscommand.cpp000066400000000000000000000477231255543666200222730ustar00rootroot00000000000000
/*
 *  degapseqscommand.cpp
 *  Mothur
 *
 *  Created by westcott on 6/21/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "degapseqscommand.h"

//**********************************************************************************************************************
vector<string> DegapSeqsCommand::setParameters(){
    try {
        CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "DegapSeqsCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string DegapSeqsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The degap.seqs command reads a fastafile and removes all gap characters.\n";
        helpString += "The degap.seqs command parameters are fasta and processors.\n";
        helpString += "The fasta parameter allows you to enter the fasta file containing your sequences, and is required unless you have a valid current fasta file. \n";
        helpString += "You may enter multiple fasta files by separating their names with dashes, e.g. fasta=abrecovery.fasta-amazon.fasta \n";
        helpString += "The processors parameter allows you to enter the number of processors you would like to use. \n";
        helpString += "The degap.seqs command should be in the following format: \n";
        helpString += "degap.seqs(fasta=yourFastaFile) \n";
        helpString += "Example: degap.seqs(fasta=abrecovery.align) \n";
        helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string DegapSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],ng.fasta"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** DegapSeqsCommand::DegapSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "DegapSeqsCommand"); exit(1); } } //*************************************************************************************************************** DegapSeqsCommand::DegapSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = 
validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", false); if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { fastaFileNames.push_back(fastafile); m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else { m->splitAtDash(fastafile, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } ifstream in; int ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(fastafile); //if user entered a file with a path then preserve it } } } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "DegapSeqsCommand"); exit(1); } } //*************************************************************************************************************** int DegapSeqsCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int s = 0; s < fastaFileNames.size(); s++) { m->mothurOut("Degapping sequences from " + fastaFileNames[s] + " ..." 
); m->mothurOutEndLine(); string tempOutputDir = outputDir; if (outputDir == "") { tempOutputDir = m->hasPath(fastaFileNames[s]); } map variables; variables["[filename]"] = tempOutputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); string degapFile = getOutputFileName("fasta", variables); outputNames.push_back(degapFile); outputTypes["fasta"].push_back(degapFile); int start = time(NULL); int numSeqs = createProcesses(fastaFileNames[s], degapFile); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to degap " + toString(numSeqs) + " sequences.\n\n"); if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "execute"); exit(1); } } //*************************************************************************************************************** int DegapSeqsCommand::createProcesses(string filename, string outputFileName){ try{ int numSeqs = 0; vector processIDS; processIDS.resize(0); bool recalc = false; vector lines; vector positions; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } int process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line 
number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                numSeqs = driver(lines[process], filename, outputFileName + toString(m->mothurGetpid(process)) + ".temp");

                //pass numSeqs to parent
                ofstream out;
                string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp";
                m->openOutputFile(tempFile, out);
                out << numSeqs << endl;
                out.close();

                exit(0);
            }else {
                m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                //wait to die
                for (int i=0;i<processIDS.size();i++) {
                    int temp = processIDS[i];
                    wait(&temp);
                }
                m->control_pressed = false;
                recalc = true;
                break;
            }
        }

        if (recalc) {
            //test line, also set recalc to true.
            //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n");

            lines.clear();
            positions.clear();
            positions = m->divideFile(filename, processors);
            for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); }

            numSeqs = 0;
            processIDS.resize(0);
            process = 1;

            while (process != processors) {
                pid_t pid = fork();

                if (pid > 0) {
                    processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                    process++;
                }else if (pid == 0){
                    numSeqs = driver(lines[process], filename, outputFileName + toString(m->mothurGetpid(process)) + ".temp");

                    //pass numSeqs to parent
                    ofstream out;
                    string tempFile = outputFileName + toString(m->mothurGetpid(process)) + ".num.temp";
                    m->openOutputFile(tempFile, out);
                    out << numSeqs << endl;
                    out.close();

                    exit(0);
                }else {
                    m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine();
                    for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                    exit(0);
                }
            }
        }

        //do my 
part numSeqs = driver(lines[0], filename, outputFileName); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; numSeqs += tempNum; } in.close(); m->mothurRemove(tempFile); m->appendFiles((outputFileName + toString(processIDS[i]) + ".temp"), outputFileName); m->mothurRemove((outputFileName + toString(processIDS[i]) + ".temp")); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the degapData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// if (processors == 1) { lines.push_back(linePair(0, 1000)); }else { positions = m->setFilePosFasta(filename, numSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; icount != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } numSeqs += pDataArray[i]->count; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } for (int i = 1; i < processors; i++) { m->appendFiles((outputFileName + toString(i) + ".temp"), outputFileName); m->mothurRemove((outputFileName + toString(i) + ".temp")); } #endif return numSeqs; } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "createProcesses"); exit(1); } } //*************************************************************************************************************** int DegapSeqsCommand::driver(linePair filePos, string filename, string outputFileName){ try{ int numSeqs = 0; ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos.start); if (filePos.start == 0) { m->zapGremlins(inFASTA); m->gobble(inFASTA); } ofstream outFASTA; m->openOutputFile(outputFileName, outFASTA); while(!inFASTA.eof()){ if (m->control_pressed) { break; } Sequence currSeq(inFASTA); m->gobble(inFASTA); if (currSeq.getName() != "") { outFASTA << ">" << currSeq.getName() << endl; outFASTA << currSeq.getUnaligned() << endl; numSeqs++; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((numSeqs) % 1000 == 0){ m->mothurOutJustToScreen(toString(numSeqs) + "\n"); } } //report progress if((numSeqs) % 1000 != 0){ m->mothurOutJustToScreen(toString(numSeqs) + "\n"); } inFASTA.close(); outFASTA.close(); return numSeqs; } catch(exception& e) { m->errorOut(e, "DegapSeqsCommand", "driver"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/commands/degapseqscommand.h000066400000000000000000000071511255543666200217270ustar00rootroot00000000000000#ifndef DEGAPSEQSCOMMAND_H #define DEGAPSEQSCOMMAND_H /* * degapseqscommand.h * Mothur 
 *
 *  Created by westcott on 6/21/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "sequence.hpp"

class DegapSeqsCommand : public Command {
public:
    DegapSeqsCommand(string);
    DegapSeqsCommand();
    ~DegapSeqsCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "degap.seqs"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Degap.seqs"; }
    string getDescription() { return "removes gap characters from sequences"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    int processors;
    bool abort;
    string fastafile, outputDir;
    vector<string> outputNames;
    vector<string> fastaFileNames;

    int driver(linePair, string, string);
    int createProcesses(string, string);
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct degapData { string filename; string outputFileName; unsigned long long start; unsigned long long end; int count; MothurOut* m; degapData(){} degapData(string f, string of, MothurOut* mout, unsigned long long st, unsigned long long en) { filename = f; outputFileName = of; m = mout; start = st; end = en; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyDegapThreadFunction(LPVOID lpParam){ degapData* pDataArray; pDataArray = (degapData*)lpParam; try { pDataArray->count = 0; ifstream inFASTA; pDataArray->m->openInputFile(pDataArray->filename, inFASTA); if ((pDataArray->start == 0) || (pDataArray->start == 1)) { inFASTA.seekg(0); pDataArray->m->zapGremlins(inFASTA); }else { //this accounts for the difference in line endings. inFASTA.seekg(pDataArray->start-1); pDataArray->m->gobble(inFASTA); } ofstream outFASTA; pDataArray->m->openOutputFile(pDataArray->outputFileName, outFASTA); for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { break; } Sequence currSeq(inFASTA); pDataArray->m->gobble(inFASTA); if (currSeq.getName() != "") { outFASTA << ">" << currSeq.getName() << endl; outFASTA << currSeq.getUnaligned() << endl; pDataArray->count++; } //report progress if((pDataArray->count) % 1000 == 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count) + "\n"); } } //report progress if((pDataArray->count) % 1000 != 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count) + "\n"); } inFASTA.close(); outFASTA.close(); return pDataArray->count; } catch(exception& e) { pDataArray->m->errorOut(e, "DegapSeqsCommand", "MyDegapThreadFunction"); exit(1); } } #endif #endif 
mothur-1.36.1/source/commands/deuniqueseqscommand.cpp000066400000000000000000000322161255543666200230210ustar00rootroot00000000000000
/*
 *  deuniqueseqscommand.cpp
 *  Mothur
 *
 *  Created by westcott on 10/19/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "deuniqueseqscommand.h"
#include "sequence.hpp"
#include "counttable.h"

//**********************************************************************************************************************
vector<string> DeUniqueSeqsCommand::setParameters(){
    try {
        CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta);
        CommandParameter pname("name", "InputTypes", "", "", "namecount", "namecount", "none","name",false,false,true); parameters.push_back(pname);
        CommandParameter pcount("count", "InputTypes", "", "", "namecount", "namecount", "none","group",false,false,true); parameters.push_back(pcount);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "DeUniqueSeqsCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string DeUniqueSeqsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The deunique.seqs command reads a fastafile and namefile or countfile, and creates a fastafile containing all the sequences. If you provide a count file with group information, a group file is also created.\n";
        helpString += "The deunique.seqs command parameters are fasta, name and count. 
Fasta is required and you must provide either a name or count file.\n"; helpString += "The deunique.seqs command should be in the following format: \n"; helpString += "deunique.seqs(fasta=yourFastaFile, name=yourNameFile) \n"; helpString += "Example deunique.seqs(fasta=abrecovery.unique.fasta, name=abrecovery.names).\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "DeUniqueSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string DeUniqueSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],redundant.fasta"; } else if (type == "group") { pattern = "[filename],redundant.groups"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "DeUniqueSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** DeUniqueSeqsCommand::DeUniqueSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["group"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "DeUniqueSeqsCommand", "DeconvoluteCommand"); exit(1); } } /**************************************************************************************/ DeUniqueSeqsCommand::DeUniqueSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); 
ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["group"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters fastaFile = validParameter.validFile(parameters, "fasta", true); if (fastaFile == "not open") { abort = true; } else if (fastaFile == "not found") { fastaFile = m->getFastaFile(); if (fastaFile != "") { m->mothurOut("Using " + fastaFile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastaFile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } nameFile = validParameter.validFile(parameters, "name", true); if (nameFile == "not open") { abort = true; } else if (nameFile == "not found"){ nameFile = ""; } else { m->setNameFile(nameFile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (nameFile != "")) { m->mothurOut("When executing a deunique.seqs command you must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if ((countfile == "") && (nameFile == "")) { //look for currents nameFile = m->getNameFile(); if (nameFile != "") { m->mothurOut("Using " + nameFile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("[ERROR]: You have no current name or count files one is required."); m->mothurOutEndLine(); abort = true; } } } } } catch(exception& e) { m->errorOut(e, "DeUniqueSeqsCommand", 
"DeUniqueSeqsCommand"); exit(1); } } /**************************************************************************************/ int DeUniqueSeqsCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //prepare filenames and open files ofstream out; string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastaFile); } string outFastaFile = thisOutputDir + m->getRootName(m->getSimpleName(fastaFile)); map variables; variables["[filename]"] = outFastaFile; outFastaFile = getOutputFileName("fasta", variables); m->openOutputFile(outFastaFile, out); map nameMap; CountTable ct; ofstream outGroup; string outGroupFile; vector groups; if (nameFile != "") { m->readNames(nameFile, nameMap); } else { ct.readTable(countfile, true, false); if (ct.hasGroupInfo()) { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } outGroupFile = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[filename]"] = outGroupFile; outGroupFile = getOutputFileName("group", variables); m->openOutputFile(outGroupFile, outGroup); groups = ct.getNamesOfGroups(); } } if (m->control_pressed) { out.close(); outputTypes.clear(); m->mothurRemove(outFastaFile); if (countfile != "") { if (ct.hasGroupInfo()) { outGroup.close(); m->mothurRemove(outGroupFile); } } return 0; } ifstream in; m->openInputFile(fastaFile, in); while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); outputTypes.clear(); m->mothurRemove(outFastaFile); if (countfile != "") { if (ct.hasGroupInfo()) { outGroup.close(); m->mothurRemove(outGroupFile); } } return 0; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { if (nameFile != "") { //look for sequence name in nameMap map::iterator it = nameMap.find(seq.getName()); if (it == nameMap.end()) { m->mothurOut("[ERROR]: Your namefile does not contain " + seq.getName() + ", aborting."); m->mothurOutEndLine(); m->control_pressed = true; } else { vector 
names; m->splitAtComma(it->second, names); //output sequences for (int i = 0; i < names.size(); i++) { out << ">" << names[i] << endl; out << seq.getAligned() << endl; } //remove seq from name map so we can check for seqs in namefile not in fastafile later nameMap.erase(it); } }else { if (ct.hasGroupInfo()) { vector groupCounts = ct.getGroupCounts(seq.getName()); int count = 1; for (int i = 0; i < groups.size(); i++) { for (int j = 0; j < groupCounts[i]; j++) { outGroup << seq.getName()+"_"+toString(count) << '\t' << groups[i] << endl; count++; } } } int numReps = ct.getNumSeqs(seq.getName()); //will report error and set m->control_pressed if not found for (int i = 0; i < numReps; i++) { out << ">" << seq.getName()+"_"+toString(i+1) << endl; out << seq.getAligned() << endl; } } } } in.close(); out.close(); if (countfile != "") { if (ct.hasGroupInfo()) { outGroup.close(); } } if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outFastaFile); if (countfile != "") { if (ct.hasGroupInfo()) { m->mothurRemove(outGroupFile); } }return 0; } outputNames.push_back(outFastaFile); outputTypes["fasta"].push_back(outFastaFile); if (countfile != "") { if (ct.hasGroupInfo()) { outputNames.push_back(outGroupFile); outputTypes["group"].push_back(outGroupFile); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for(int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "DeUniqueSeqsCommand", "execute"); exit(1); } } 
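// Illustration only, not part of mothur: a minimal, self-contained sketch of the
// expansion that execute() performs above. With a name file, each unique sequence
// is written once per name listed for it; with a count table, the copies are
// named rep_1 ... rep_N. The helper names ending in "Sketch" are hypothetical, and
// splitNamesSketch is only an assumed stand-in for m->splitAtComma.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for m->splitAtComma: "a,b,c" -> {"a","b","c"}.
static std::vector<std::string> splitNamesSketch(const std::string& line) {
    std::vector<std::string> names;
    std::stringstream ss(line);
    std::string item;
    while (std::getline(ss, item, ',')) {
        if (!item.empty()) { names.push_back(item); }
    }
    return names;
}

// Name-file path: write one FASTA record per redundant name, then erase the
// entry so leftover names (in the name file but not the fasta) can be detected.
// Returns false when the representative is missing from the name map.
static bool deuniqueWithNamesSketch(const std::string& repName, const std::string& aligned,
                                    std::map<std::string, std::string>& nameMap, std::ostream& out) {
    std::map<std::string, std::string>::iterator it = nameMap.find(repName);
    if (it == nameMap.end()) { return false; }

    std::vector<std::string> names = splitNamesSketch(it->second);
    for (std::vector<std::string>::size_type i = 0; i < names.size(); i++) {
        out << '>' << names[i] << '\n' << aligned << '\n';
    }
    nameMap.erase(it);
    return true;
}

// Count-table path: the table stores only totals, so copies get numbered names.
static void deuniqueWithCountSketch(const std::string& repName, const std::string& aligned,
                                    int numReps, std::ostream& out) {
    for (int i = 0; i < numReps; i++) {
        out << '>' << repName << '_' << (i + 1) << '\n' << aligned << '\n';
    }
}
```

// Usage: call deuniqueWithNamesSketch once per sequence read from the unique fasta
// file; a false return corresponds to the "Your namefile does not contain ..." error.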
/**************************************************************************************/
mothur-1.36.1/source/commands/deuniqueseqscommand.h000066400000000000000000000020051255543666200224570ustar00rootroot00000000000000
#ifndef DEUNIQUESEQSCOMMAND_H
#define DEUNIQUESEQSCOMMAND_H

/*
 * deuniqueseqscommand.h
 * Mothur
 *
 * Created by westcott on 10/19/10.
 * Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"

/* This command is the reverse of unique.seqs */

class DeUniqueSeqsCommand : public Command {

public:
    DeUniqueSeqsCommand(string);
    DeUniqueSeqsCommand();
    ~DeUniqueSeqsCommand() {}

    vector<string> setParameters();
    string getCommandName() { return "deunique.seqs"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Deunique.seqs"; }
    string getDescription() { return "reverse of the unique.seqs command, and creates a fasta file from a fasta and name file"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    string fastaFile, nameFile, outputDir, countfile;
    vector<string> outputNames;
    bool abort;
};

#endif
mothur-1.36.1/source/commands/deuniquetreecommand.cpp000066400000000000000000000172061255543666200230070ustar00rootroot00000000000000
/*
 * deuniquetreecommand.cpp
 * Mothur
 *
 * Created by westcott on 5/27/11.
 * Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "deuniquetreecommand.h" #include "treereader.h" //********************************************************************************************************************** vector DeuniqueTreeCommand::setParameters(){ try { CommandParameter ptree("tree", "InputTypes", "", "", "none", "none", "none","tree",false,true,true); parameters.push_back(ptree); CommandParameter pname("name", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pname); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "DeuniqueTreeCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string DeuniqueTreeCommand::getHelpString(){ try { string helpString = ""; helpString += "The deunique.tree command parameters are tree and name. Both parameters are required unless you have valid current files.\n"; helpString += "The deunique.tree command should be in the following format: deunique.tree(tree=yourTreeFile, name=yourNameFile).\n"; helpString += "Example deunique.tree(tree=abrecovery.tree, name=abrecovery.names).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
tree), '=' and parameters (i.e.yourTreeFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "DeuniqueTreeCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string DeuniqueTreeCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "tree") { pattern = "[filename],deunique.tre"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "DeuniqueTreeCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** DeuniqueTreeCommand::DeuniqueTreeCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["tree"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "DeuniqueTreeCommand", "DeuniqueTreeCommand"); exit(1); } } /***********************************************************/ DeuniqueTreeCommand::DeuniqueTreeCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["tree"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = 
validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {
                string path;
                it = parameters.find("tree");
                //user has given a tree file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["tree"] = inputDir + it->second; }
                }

                it = parameters.find("name");
                //user has given a name file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["name"] = inputDir + it->second; }
                }
            }

            //check for required parameters
            treefile = validParameter.validFile(parameters, "tree", true);
            if (treefile == "not open") { abort = true; }
            else if (treefile == "not found") { //if there is a current tree file, use it
                treefile = m->getTreeFile();
                if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current tree file and the tree parameter is required."); m->mothurOutEndLine(); abort = true; }
            }else { m->setTreeFile(treefile); }

            namefile = validParameter.validFile(parameters, "name", true);
            if (namefile == "not open") { abort = true; }
            else if (namefile == "not found") { //if there is a current name file, use it
                namefile = m->getNameFile();
                if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current name file and the name parameter is required."); m->mothurOutEndLine(); abort = true; }
            }else { m->setNameFile(namefile); }

            //if the user has not set an output directory, write next to the tree file
            outputDir = validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){ outputDir = m->hasPath(treefile); }
        }

    }
    catch(exception& e) {
        m->errorOut(e, "DeuniqueTreeCommand", "DeuniqueTreeCommand");
        exit(1);
    }
}
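// Illustration only, not part of mothur: a small sketch of the inputdir rule the
// constructor applies to each file parameter. A file name given without a path
// gets inputdir prepended; a name that already carries a path is left alone.
// hasPathSketch is a hypothetical stand-in written for this example; it assumes
// m->hasPath returns the directory portion including the trailing separator, or
// "" when there is none.

```cpp
#include <string>

// Assumed behaviour of m->hasPath: directory part of the name, or "" if none.
static std::string hasPathSketch(const std::string& file) {
    std::string::size_type pos = file.find_last_of("/\\");
    if (pos == std::string::npos) { return ""; }
    return file.substr(0, pos + 1);
}

// Apply the rule: only bare file names are redirected to inputDir.
static std::string applyInputDirSketch(const std::string& inputDir, const std::string& file) {
    if (inputDir.empty()) { return file; }                      // inputdir not set
    if (hasPathSketch(file) == "") { return inputDir + file; }  // bare name
    return file;                                                // path already given
}
```

// Usage: applyInputDirSketch("/data/", "abrecovery.tree") yields
// "/data/abrecovery.tree", while "run1/abrecovery.tree" is returned unchanged.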
/***********************************************************/
int DeuniqueTreeCommand::execute() {
    try {

        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        m->setTreeFile(treefile);

        TreeReader* reader = new TreeReader(treefile, "", namefile);
        vector<Tree*> T = reader->getTrees();
        map<string, string> nameMap;
        m->readNames(namefile, nameMap);

        delete reader;

        //print new Tree
        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile));
        string outputFile = getOutputFileName("tree", variables);
        outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile);
        ofstream out;
        m->openOutputFile(outputFile, out);
        T[0]->print(out, nameMap);
        out.close();

        delete (T[0]->getCountTable());
        for (int i = 0; i < T.size(); i++) { delete T[i]; }

        //set tree file as new current treefile
        string current = "";
        itTypes = outputTypes.find("tree");
        if (itTypes != outputTypes.end()) {
            if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTreeFile(current); }
        }

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        return 0;

    }
    catch(exception& e) {
        m->errorOut(e, "DeuniqueTreeCommand", "execute");
        exit(1);
    }
}
/***********************************************************/
mothur-1.36.1/source/commands/deuniquetreecommand.h000066400000000000000000000020561255543666200224510ustar00rootroot00000000000000
#ifndef DEUNIQUETREECOMMAND_H
#define DEUNIQUETREECOMMAND_H

/*
 * deuniquetreecommand.h
 * Mothur
 *
 * Created by westcott on 5/27/11.
 * Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "sharedutilities.h"
#include "readtree.h"

class DeuniqueTreeCommand : public Command {

public:
    DeuniqueTreeCommand(string);
    DeuniqueTreeCommand();
    ~DeuniqueTreeCommand() {}

    vector<string> setParameters();
    string getCommandName() { return "deunique.tree"; }
    string getCommandCategory() { return "Hypothesis Testing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Deunique.tree"; }
    string getDescription() { return "add the redundant sequence names back into a tree of unique sequences"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    int numUniquesInName;
    bool abort;
    string outputDir, treefile, namefile;
    vector<string> outputNames;
    map<string, string> nameMap;

    int readNamesFile();
};

#endif
mothur-1.36.1/source/commands/distancecommand.cpp000066400000000000000000001405511255543666200221020ustar00rootroot00000000000000
/*
 * distancecommand.cpp
 * Mothur
 *
 * Created by Sarah Westcott on 5/7/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "distancecommand.h" //********************************************************************************************************************** vector DistanceCommand::setParameters(){ try { CommandParameter pcolumn("column", "InputTypes", "", "", "none", "none", "OldFastaColumn","column",false,false); parameters.push_back(pcolumn); CommandParameter poldfasta("oldfasta", "InputTypes", "", "", "none", "none", "OldFastaColumn","",false,false); parameters.push_back(poldfasta); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","phylip-column",false,true, true); parameters.push_back(pfasta); CommandParameter poutput("output", "Multiple", "column-lt-square-phylip", "column", "", "", "","phylip-column",false,false, true); parameters.push_back(poutput); CommandParameter pcalc("calc", "Multiple", "nogaps-eachgap-onegap", "onegap", "", "", "","",false,false); parameters.push_back(pcalc); CommandParameter pcountends("countends", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pcountends); CommandParameter pcompress("compress", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pcompress); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false, true); parameters.push_back(pprocessors); CommandParameter pcutoff("cutoff", "Number", "", "1.0", "", "", "","",false,false, true); parameters.push_back(pcutoff); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "setParameters"); exit(1); } } 
//********************************************************************************************************************** string DistanceCommand::getHelpString(){ try { string helpString = ""; helpString += "The dist.seqs command reads a file containing sequences and creates a distance file.\n"; helpString += "The dist.seqs command parameters are fasta, oldfasta, column, calc, countends, output, compress, cutoff and processors. \n"; helpString += "The fasta parameter is required, unless you have a valid current fasta file.\n"; helpString += "The oldfasta and column parameters allow you to append the distances calculated to the column file.\n"; helpString += "The calc parameter allows you to specify the method of calculating the distances. Your options are: nogaps, onegap or eachgap. The default is onegap.\n"; helpString += "The countends parameter allows you to specify whether to include terminal gaps in distance. Your options are: T or F. The default is T.\n"; helpString += "The cutoff parameter allows you to specify maximum distance to keep. The default is 1.0.\n"; helpString += "The output parameter allows you to specify format of your distance matrix. Options are column, lt, and square. The default is column.\n"; helpString += "The processors parameter allows you to specify number of processors to use. The default is 1.\n"; helpString += "The compress parameter allows you to indicate that you want the resulting distance file compressed. The default is false.\n"; helpString += "The dist.seqs command should be in the following format: \n"; helpString += "dist.seqs(fasta=yourFastaFile, calc=yourCalc, countends=yourEnds, cutoff= yourCutOff, processors=yourProcessors) \n"; helpString += "Example dist.seqs(fasta=amazon.fasta, calc=eachgap, countends=F, cutoff= 2.0, processors=3).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
calc), '=' and parameters (i.e.yourCalc).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string DistanceCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "phylip") { pattern = "[filename],[outputtag],dist"; } else if (type == "column") { pattern = "[filename],dist"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** DistanceCommand::DistanceCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "DistanceCommand"); exit(1); } } //********************************************************************************************************************** DistanceCommand::DistanceCommand(string option) { try { abort = false; calledHelp = false; Estimators.clear(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("dist.seqs"); map::iterator it2; //check to make sure all parameters are valid for command for (it2 = parameters.begin(); it2 != parameters.end(); it2++) { if (validParameter.isValidParameter(it2->first, myArray, it2->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["phylip"] = tempOutNames; 
outputTypes["column"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it2 = parameters.find("fasta"); //user has given a template file if(it2 != parameters.end()){ path = m->hasPath(it2->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it2->second; } } it2 = parameters.find("oldfasta"); //user has given a template file if(it2 != parameters.end()){ path = m->hasPath(it2->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oldfasta"] = inputDir + it2->second; } } it2 = parameters.find("column"); //user has given a template file if(it2 != parameters.end()){ path = m->hasPath(it2->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["column"] = inputDir + it2->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); ifstream inFASTA; m->openInputFile(fastafile, inFASTA); alignDB = SequenceDB(inFASTA); inFASTA.close(); }else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else if (fastafile == "not open") { abort = true; } else{ ifstream inFASTA; m->openInputFile(fastafile, inFASTA); alignDB = SequenceDB(inFASTA); inFASTA.close(); m->setFastaFile(fastafile); } oldfastafile = validParameter.validFile(parameters, "oldfasta", true); if (oldfastafile == "not found") { oldfastafile = ""; } else if (oldfastafile == "not open") { abort = true; } column = validParameter.validFile(parameters, "column", true); if (column == "not found") { column = ""; } else if (column == "not open") { abort = true; } else { m->setColumnFile(column); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(fastafile); //if user entered a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "onegap"; } else { if (calc == "default") { calc = "onegap"; } } m->splitAtDash(calc, Estimators); string temp; temp = validParameter.validFile(parameters, "countends", false); if(temp == "not found"){ temp = "T"; } convert(temp, countends); temp = validParameter.validFile(parameters, "cutoff", false); if(temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "compress", false); if(temp == "not found"){ temp = "F"; } convert(temp, compress); output = validParameter.validFile(parameters, "output", false); if(output == "not found"){ output = "column"; } if (output == "phylip") { output = "lt"; } if (((column != "") && (oldfastafile == "")) || ((column == "") && (oldfastafile != ""))) { m->mothurOut("If you provide column or oldfasta, you must provide both."); m->mothurOutEndLine(); abort=true; } if ((column != "") && (oldfastafile != "") && (output != "column")) { m->mothurOut("You have provided column and oldfasta, indicating you want to append distances to your column file. Your output must be in column format to do so."); m->mothurOutEndLine(); abort=true; } if ((output != "column") && (output != "lt") && (output != "square")) { m->mothurOut(output + " is not a valid output form. Options are column, lt and square. 
I will use column."); m->mothurOutEndLine(); output = "column"; } } } catch(exception& e) { m->errorOut(e, "DistanceCommand", "DistanceCommand"); exit(1); } } //********************************************************************************************************************** int DistanceCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int startTime = time(NULL); //save number of new sequence numNewFasta = alignDB.getNumSeqs(); //sanity check the oldfasta and column file as well as add oldfasta sequences to alignDB if ((oldfastafile != "") && (column != "")) { if (!(sanityCheck())) { return 0; } } if (m->control_pressed) { return 0; } int numSeqs = alignDB.getNumSeqs(); cutoff += 0.005; if (!alignDB.sameLength()) { m->mothurOut("[ERROR]: your sequences are not the same length, aborting."); m->mothurOutEndLine(); return 0; } string outputFile; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); if (output == "lt") { //does the user want lower triangle phylip formatted file variables["[outputtag]"] = "phylip"; outputFile = getOutputFileName("phylip", variables); m->mothurRemove(outputFile); outputTypes["phylip"].push_back(outputFile); //output numSeqs to phylip formatted dist file }else if (output == "column") { //user wants column format outputFile = getOutputFileName("column", variables); outputTypes["column"].push_back(outputFile); //so we don't accidentally overwrite if (outputFile == column) { string tempcolumn = column + ".old"; rename(column.c_str(), tempcolumn.c_str()); } m->mothurRemove(outputFile); }else { //assume square variables["[outputtag]"] = "square"; outputFile = getOutputFileName("phylip", variables); m->mothurRemove(outputFile); outputTypes["phylip"].push_back(outputFile); } #ifdef USE_MPI int pid, start, end; int tag = 2001; MPI_Status status; MPI_Comm_size(MPI_COMM_WORLD, &processors); //set processors to the number of mpi processes running 
MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are //each process gets where it should start and stop in the file if (output != "square") { start = int (sqrt(float(pid)/float(processors)) * numSeqs); end = int (sqrt(float(pid+1)/float(processors)) * numSeqs); }else{ start = int ((float(pid)/float(processors)) * numSeqs); end = int ((float(pid+1)/float(processors)) * numSeqs); } if (output == "column") { MPI_File outMPI; int amode=MPI_MODE_CREATE|MPI_MODE_WRONLY; //char* filename = new char[outputFile.length()]; //memcpy(filename, outputFile.c_str(), outputFile.length()); char filename[1024]; strcpy(filename, outputFile.c_str()); MPI_File_open(MPI_COMM_WORLD, filename, amode, MPI_INFO_NULL, &outMPI); //delete filename; if (pid == 0) { //you are the root process //do your part string outputMyPart; driverMPI(start, end, outMPI, cutoff); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); return 0; } //wait on chidren for(int i = 1; i < processors; i++) { if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); return 0; } char buf[5]; MPI_Recv(buf, 5, MPI_CHAR, i, tag, MPI_COMM_WORLD, &status); } }else { //you are a child process //do your part driverMPI(start, end, outMPI, cutoff); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); return 0; } char buf[5]; strcpy(buf, "done"); //tell parent you are done. 
                MPI_Send(buf, 5, MPI_CHAR, 0, tag, MPI_COMM_WORLD);
            }

            MPI_File_close(&outMPI);

        }else { //lower triangle format
            if (pid == 0) { //you are the root process
                //do your part
                string outputMyPart;
                unsigned long long mySize;

                if (output != "square"){ driverMPI(start, end, outputFile, mySize); }
                else { driverMPI(start, end, outputFile, mySize, output); }

                if (m->control_pressed) { outputTypes.clear(); return 0; }

                int amode=MPI_MODE_APPEND|MPI_MODE_WRONLY|MPI_MODE_CREATE;
                MPI_File outMPI;
                MPI_File inMPI;

                //char* filename = new char[outputFile.length()];
                //memcpy(filename, outputFile.c_str(), outputFile.length());

                char filename[1024];
                strcpy(filename, outputFile.c_str());

                MPI_File_open(MPI_COMM_SELF, filename, amode, MPI_INFO_NULL, &outMPI);
                //delete filename;

                //wait on children
                for(int b = 1; b < processors; b++) {
                    unsigned long long fileSize;

                    if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); return 0; }

                    MPI_Recv(&fileSize, 1, MPI_LONG, b, tag, MPI_COMM_WORLD, &status);

                    string outTemp = outputFile + toString(b) + ".temp";

                    //copy the name together with its terminating null so MPI_File_open receives a valid C string
                    char* buf = new char[outTemp.length() + 1];
                    strcpy(buf, outTemp.c_str());

                    MPI_File_open(MPI_COMM_SELF, buf, MPI_MODE_DELETE_ON_CLOSE|MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI);
                    delete[] buf;

                    int count = 0;
                    while (count < fileSize) {
                        char buf2[1];
                        MPI_File_read(inMPI, buf2, 1, MPI_CHAR, &status);
                        MPI_File_write(outMPI, buf2, 1, MPI_CHAR, &status);
                        count += 1;
                    }

                    MPI_File_close(&inMPI); //deleted on close
                }

                MPI_File_close(&outMPI);
            }else { //you are a child process
                //do your part
                unsigned long long size;
                if (output != "square"){ driverMPI(start, end, (outputFile + toString(pid) + ".temp"), size); }
                else { driverMPI(start, end, (outputFile + toString(pid) + ".temp"), size, output); }

                if (m->control_pressed) { return 0; }

                //tell parent you are done.
MPI_Send(&size, 1, MPI_LONG, 0, tag, MPI_COMM_WORLD); } } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else //#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //if you don't need to fork anything if(processors == 1){ if (output != "square") { driver(0, numSeqs, outputFile, cutoff); } else { driver(0, numSeqs, outputFile, "square"); } }else{ //you have multiple processors createProcesses(outputFile, numSeqs); } //#else //ifstream inFASTA; //if (output != "square") { driver(0, numSeqs, outputFile, cutoff); } //else { driver(0, numSeqs, outputFile, "square"); } //#endif #endif if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFile); return 0; } #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif //if (output == "square") { convertMatrix(outputFile); } ifstream fileHandle; fileHandle.open(outputFile.c_str()); if(fileHandle) { m->gobble(fileHandle); if (fileHandle.eof()) { m->mothurOut(outputFile + " is blank. 
This can result if there are no distances below your cutoff."); m->mothurOutEndLine(); } } //append the old column file to the new one if ((oldfastafile != "") && (column != "")) { //we had to rename the column file so we didnt overwrite above, but we want to keep old name if (outputFile == column) { string tempcolumn = column + ".old"; m->appendFiles(tempcolumn, outputFile); m->mothurRemove(tempcolumn); }else{ m->appendFiles(outputFile, column); m->mothurRemove(outputFile); outputFile = column; } if (outputDir != "") { string newOutputName = outputDir + m->getSimpleName(outputFile); rename(outputFile.c_str(), newOutputName.c_str()); m->mothurRemove(outputFile); outputFile = newOutputName; } } #ifdef USE_MPI } #endif if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFile); return 0; } //set phylip file as new current phylipfile string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setPhylipFile(current); } } //set column file as new current columnfile itTypes = outputTypes.find("column"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setColumnFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFile); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - startTime) + " seconds to calculate the distances for " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); if (m->isTrue(compress)) { m->mothurOut("Compressing..."); m->mothurOutEndLine(); m->mothurOut("(Replacing " + outputFile + " with " + outputFile + ".gz)"); m->mothurOutEndLine(); system(("gzip -v " + outputFile).c_str()); outputNames.push_back(outputFile + ".gz"); }else { outputNames.push_back(outputFile); } return 0; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "execute"); exit(1); } } 
/**************************************************************************************************/ void DistanceCommand::createProcesses(string filename, int numSeqs) { try { unsigned long long numDists = 0; if (output == "square") { numDists = numSeqs * numSeqs; }else { for(int i=0;i processors) { break; } } } } if (numDists < processors) { processors = numDists; } for (int i = 0; i < processors; i++) { distlinePair tempLine; lines.push_back(tempLine); if (output != "square") { lines[i].start = int (sqrt(float(i)/float(processors)) * numSeqs); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numSeqs); }else{ lines[i].start = int ((float(i)/float(processors)) * numSeqs); lines[i].end = int ((float(i+1)/float(processors)) * numSeqs); } } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; processIDS.clear(); bool recalc = false; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; if (m->debug) { m->mothurOut("[DEBUG]: parent process is saving child pid " + toString(pid) + ".\n"); } }else if (pid == 0){ if (output != "square") { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", cutoff); } else { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", "square"); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + ".temp")); } m->control_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + ".temp"));}m->control_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); processIDS.resize(0); process = 1; lines.clear(); for (int i = 0; i < processors; i++) { distlinePair tempLine; lines.push_back(tempLine); if (output != "square") { lines[i].start = int (sqrt(float(i)/float(processors)) * numSeqs); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numSeqs); }else{ lines[i].start = int ((float(i)/float(processors)) * numSeqs); lines[i].end = int ((float(i+1)/float(processors)) * numSeqs); } } //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; if (m->debug) { m->mothurOut("[DEBUG]: parent process is saving child pid " + toString(pid) + ".\n"); } }else if (pid == 0){ if (output != "square") { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", cutoff); } else { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", "square"); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes. Error code: " + toString(pid)); m->mothurOutEndLine(); perror(" : "); for (int i=0;i pDataArray; //[processors-1]; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor-1 worker threads. for( int i=0; icount != (pDataArray[i]->endLine-pDataArray[i]->startLine)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->endLine-pDataArray[i]->startLine) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append and remove temp files for (int i=0;idebug) { m->mothurOut("[DEBUG]: parent process is appending child pid " + toString(processIDS[i]) + ".\n"); } m->appendFiles((filename + toString(processIDS[i]) + ".temp"), filename); m->mothurRemove((filename + toString(processIDS[i]) + ".temp")); } } catch(exception& e) { m->errorOut(e, "DistanceCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int DistanceCommand::driver(int startLine, int endLine, string dFileName, float cutoff){ try { ValidCalculators validCalculator; Dist* distCalculator; if (m->isTrue(countends) == true) { for (int i=0; icontrol_pressed) { delete distCalculator; outFile.close(); return 0; } //if there was a column file given and we are appending, we don't want to calculate the distances that are already in the column file //the alignDB contains the new sequences and then the old, so if i an oldsequence and j is an old sequence then break out of this loop if ((i >= numNewFasta) && (j >= numNewFasta)) { break; } distCalculator->calcDist(alignDB.get(i), alignDB.get(j)); double dist = distCalculator->getDist(); if(dist <= cutoff){ if (output == "column") { outFile << alignDB.get(i).getName() << ' ' << alignDB.get(j).getName() << ' ' << dist << endl; } } if (output == "lt") { outFile << '\t' << dist; } } if (output == "lt") { outFile << endl; } if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); outFile.close(); delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "driver"); exit(1); } } 
/**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int DistanceCommand::driver(int startLine, int endLine, string dFileName, string square){ try { ValidCalculators validCalculator; Dist* distCalculator; if (m->isTrue(countends) == true) { for (int i=0; icontrol_pressed) { delete distCalculator; outFile.close(); return 0; } distCalculator->calcDist(alignDB.get(i), alignDB.get(j)); double dist = distCalculator->getDist(); outFile << dist << '\t'; } outFile << endl; if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); outFile.close(); delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "driver"); exit(1); } } #ifdef USE_MPI /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int DistanceCommand::driverMPI(int startLine, int endLine, MPI_File& outMPI, float cutoff){ try { ValidCalculators validCalculator; Dist* distCalculator; if (m->isTrue(countends) == true) { for (int i=0; icontrol_pressed) { delete distCalculator; return 0; } //if there was a column file given and we are appending, we don't want to calculate the distances that are already in the column file //the alignDB contains the new sequences and then the old, so if i an oldsequence and j is an old sequence then break out of this loop if ((i >= numNewFasta) && (j >= numNewFasta)) { break; } distCalculator->calcDist(alignDB.get(i), alignDB.get(j)); double dist = distCalculator->getDist(); if(dist <= cutoff){ outputString += (alignDB.get(i).getName() + ' ' + alignDB.get(j).getName() + ' ' + toString(dist) + '\n'); } } if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } 
//send results to parent int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write_shared(outMPI, buf, length, MPI_CHAR, &status); outputString = ""; delete buf; } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "driverMPI"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int DistanceCommand::driverMPI(int startLine, int endLine, string file, unsigned long long& size){ try { ValidCalculators validCalculator; Dist* distCalculator; if (m->isTrue(countends) == true) { for (int i=0; icontrol_pressed) { delete distCalculator; return 0; } distCalculator->calcDist(alignDB.get(i), alignDB.get(j)); double dist = distCalculator->getDist(); outputString += "\t" + toString(dist); } outputString += "\n"; if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } //send results to parent int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write(outMPI, buf, length, MPI_CHAR, &status); size += outputString.length(); outputString = ""; delete buf; } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); MPI_File_close(&outMPI); delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "driverMPI"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int DistanceCommand::driverMPI(int startLine, int endLine, string file, unsigned long long& size, string square){ try { ValidCalculators validCalculator; Dist* distCalculator; if (m->isTrue(countends) == true) { for (int 
i=0; icontrol_pressed) { delete distCalculator; return 0; } distCalculator->calcDist(alignDB.get(i), alignDB.get(j)); double dist = distCalculator->getDist(); outputString += "\t" + toString(dist); } outputString += "\n"; if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } //send results to parent int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write(outMPI, buf, length, MPI_CHAR, &status); size += outputString.length(); outputString = ""; delete buf; } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); MPI_File_close(&outMPI); delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "driverMPI"); exit(1); } } #endif /************************************************************************************************** int DistanceCommand::convertMatrix(string outputFile) { try{ //sort file by first column so the distances for each row are together string outfile = m->getRootName(outputFile) + "sorted.dist.temp"; //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + outputFile + " -o " + outfile; system(command.c_str()); #else //sort using windows sort string command = "sort " + outputFile + " /O " + outfile; system(command.c_str()); #endif //output to new file distance for each row and save positions in file where new row begins ifstream in; m->openInputFile(outfile, in); ofstream out; m->openOutputFile(outputFile, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out << alignDB.getNumSeqs() << endl; //get first currentRow string first, currentRow, second; float dist; map rowDists; //take advantage of the fact that maps are already sorted by key map::iterator it; in >> first; currentRow = first; rowDists[first] = 0.00; //distance to yourself is 0.0 
in.seekg(0); //m->openInputFile(outfile, in); while(!in.eof()) { if (m->control_pressed) { in.close(); m->mothurRemove(outfile); out.close(); return 0; } in >> first >> second >> dist; m->gobble(in); if (first != currentRow) { //print out last row out << currentRow << '\t'; //print name //print dists for (it = rowDists.begin(); it != rowDists.end(); it++) { out << it->second << '\t'; } out << endl; //start new row currentRow = first; rowDists.clear(); rowDists[first] = 0.00; rowDists[second] = dist; }else{ rowDists[second] = dist; } } //print out last row out << currentRow << '\t'; //print name //print dists for (it = rowDists.begin(); it != rowDists.end(); it++) { out << it->second << '\t'; } out << endl; in.close(); out.close(); m->mothurRemove(outfile); return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "convertMatrix"); exit(1); } } ************************************************************************************************** int DistanceCommand::convertToLowerTriangle(string outputFile) { try{ //sort file by first column so the distances for each row are together string outfile = m->getRootName(outputFile) + "sorted.dist.temp"; //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + outputFile + " -o " + outfile; system(command.c_str()); #else //sort using windows sort string command = "sort " + outputFile + " /O " + outfile; system(command.c_str()); #endif //output to new file distance for each row and save positions in file where new row begins ifstream in; m->openInputFile(outfile, in); ofstream out; m->openOutputFile(outputFile, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out << alignDB.getNumSeqs() << endl; //get first currentRow string first, currentRow, second; float dist; int i, j; i = 0; j = 0; map rowDists; //take advantage of the fact that maps are already sorted by key map::iterator it; in >> first; 
currentRow = first; rowDists[first] = 0.00; //distance to yourself is 0.0 in.seekg(0); //m->openInputFile(outfile, in); while(!in.eof()) { if (m->control_pressed) { in.close(); m->mothurRemove(outfile); out.close(); return 0; } in >> first >> second >> dist; m->gobble(in); if (first != currentRow) { //print out last row out << currentRow << '\t'; //print name //print dists for (it = rowDists.begin(); it != rowDists.end(); it++) { if (j >= i) { break; } out << it->second << '\t'; j++; } out << endl; //start new row currentRow = first; rowDists.clear(); rowDists[first] = 0.00; rowDists[second] = dist; j = 0; i++; }else{ rowDists[second] = dist; } } //print out last row out << currentRow << '\t'; //print name //print dists for (it = rowDists.begin(); it != rowDists.end(); it++) { out << it->second << '\t'; } out << endl; in.close(); out.close(); m->mothurRemove(outfile); return 1; } catch(exception& e) { m->errorOut(e, "DistanceCommand", "convertToLowerTriangle"); exit(1); } } **************************************************************************************************/ //its okay if the column file does not contain all the names in the fasta file, since some distance may have been above a cutoff, //but no sequences can be in the column file that are not in oldfasta. also, if a distance is above the cutoff given then remove it. //also check to make sure the 2 files have the same alignment length. 
bool DistanceCommand::sanityCheck() {
    try{
        bool good = true;

        //make sure the 2 fasta files have the same alignment length
        ifstream in;
        m->openInputFile(fastafile, in);
        int fastaAlignLength = 0;
        if (in) {
            Sequence tempIn(in);
            fastaAlignLength = tempIn.getAligned().length();
        }
        in.close();

        ifstream in2;
        m->openInputFile(oldfastafile, in2);
        int oldfastaAlignLength = 0;
        if (in2) {
            Sequence tempIn2(in2);
            oldfastaAlignLength = tempIn2.getAligned().length();
        }
        in2.close();

        if (fastaAlignLength != oldfastaAlignLength) { m->mothurOut("fasta files do not have the same alignment length."); m->mothurOutEndLine(); return false; }

        //read fasta file and save names as well as adding them to the alignDB
        set<string> namesOldFasta;

        ifstream inFasta;
        m->openInputFile(oldfastafile, inFasta);

        while (!inFasta.eof()) {
            if (m->control_pressed) { inFasta.close(); return good; }

            Sequence temp(inFasta);

            if (temp.getName() != "") {
                namesOldFasta.insert(temp.getName()); //save name
                alignDB.push_back(temp); //add to DB
            }

            m->gobble(inFasta);
        }

        inFasta.close();

        //read through the column file checking names and removing distances above the cutoff
        ifstream inDist;
        m->openInputFile(column, inDist);

        ofstream outDist;
        string outputFile = column + ".temp";
        m->openOutputFile(outputFile, outDist);

        string name1, name2;
        float dist;
        while (!inDist.eof()) {
            if (m->control_pressed) { inDist.close(); outDist.close(); m->mothurRemove(outputFile); return good; }

            inDist >> name1 >> name2 >> dist; m->gobble(inDist);

            //both names are in fasta file and distance is below cutoff
            if ((namesOldFasta.count(name1) == 0) || (namesOldFasta.count(name2) == 0)) { good = false; break; }
            else{
                if (dist <= cutoff) {
                    outDist << name1 << '\t' << name2 << '\t' << dist << endl;
                }
            }
        }

        inDist.close();
        outDist.close();

        if (good) {
            m->mothurRemove(column);
            rename(outputFile.c_str(), column.c_str());
        }else{
            m->mothurRemove(outputFile); //temp file is bad because file mismatch above
        }

        return good;

    }
    catch(exception& e) {
        m->errorOut(e, "DistanceCommand", "sanityCheck");
        exit(1);
    }
}
/**************************************************************************************************/
mothur-1.36.1/source/commands/distancecommand.h000066400000000000000000000166571255543666200215600ustar00rootroot00000000000000
#ifndef DISTANCECOMMAND_H
#define DISTANCECOMMAND_H

/*
 *  distancecommand.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 5/7/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "validcalculator.h"
#include "dist.h"
#include "sequencedb.h"
#include "ignoregaps.h"
#include "eachgapdist.h"
#include "eachgapignore.h"
#include "onegapdist.h"
#include "onegapignore.h"

//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct distanceData {
    int startLine;
    int endLine;
    string dFileName;
    float cutoff;
    SequenceDB alignDB;
    vector<string> Estimators;
    MothurOut* m;
    string output;
    int numNewFasta, count;
    string countends;

    distanceData(){}
    distanceData(int s, int e, string dbname, float c, SequenceDB db, vector<string> Est, MothurOut* mout, string o, int num, string count) {
        startLine = s;
        endLine = e;
        dFileName = dbname;
        cutoff = c;
        alignDB = db;
        Estimators = Est;
        m = mout;
        output = o;
        numNewFasta = num;
        countends = count;
    }
};

/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MyDistThreadFunction(LPVOID lpParam){
    distanceData* pDataArray;
    pDataArray = (distanceData*)lpParam;

    try {
        ValidCalculators validCalculator;
        Dist* distCalculator;
        if (pDataArray->m->isTrue(pDataArray->countends) == true) {
            for (int i=0; i<pDataArray->Estimators.size(); i++) {
                if (validCalculator.isValidCalculator("distance", pDataArray->Estimators[i]) == true) {
                    if (pDataArray->Estimators[i] == "nogaps") {
                        distCalculator = new ignoreGaps();
                    }
                    else if (pDataArray->Estimators[i] == "eachgap") { distCalculator = new eachGapDist(); }
                    else if (pDataArray->Estimators[i] == "onegap") { distCalculator = new oneGapDist(); }
                }
            }
        }else {
            for (int i=0; i<pDataArray->Estimators.size(); i++) {
                if (validCalculator.isValidCalculator("distance", pDataArray->Estimators[i]) == true) {
                    if (pDataArray->Estimators[i] == "nogaps") { distCalculator = new ignoreGaps(); }
                    else if (pDataArray->Estimators[i] == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); }
                    else if (pDataArray->Estimators[i] == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); }
                }
            }
        }

        int startTime = time(NULL);

        //column file
        ofstream outFile(pDataArray->dFileName.c_str(), ios::trunc);
        outFile.setf(ios::fixed, ios::showpoint);
        outFile << setprecision(4);
        pDataArray->count = 0;

        if (pDataArray->output != "square") {
            if((pDataArray->output == "lt") && (pDataArray->startLine == 0)){ outFile << pDataArray->alignDB.getNumSeqs() << endl; }

            for(int i=pDataArray->startLine;i<pDataArray->endLine;i++){
                if(pDataArray->output == "lt") {
                    string name = pDataArray->alignDB.get(i).getName();
                    if (name.length() < 10) { //pad with spaces to make compatible
                        while (name.length() < 10) { name += " "; }
                    }
                    outFile << name;
                }
                for(int j=0;j<i;j++){

                    if (pDataArray->m->control_pressed) { delete distCalculator; outFile.close(); return 0; }

                    //if a column file was given and we are appending, we don't want to recalculate the distances that are already in the column file
                    //the alignDB contains the new sequences and then the old, so if i is an old sequence and j is an old sequence then break out of this loop
                    if ((i >= pDataArray->numNewFasta) && (j >= pDataArray->numNewFasta)) { break; }

                    distCalculator->calcDist(pDataArray->alignDB.get(i), pDataArray->alignDB.get(j));
                    double dist = distCalculator->getDist();

                    if(dist <= pDataArray->cutoff){
                        if (pDataArray->output == "column") { outFile << pDataArray->alignDB.get(i).getName() << ' ' << pDataArray->alignDB.get(j).getName() << ' ' << dist << endl; }
                    }
                    if (pDataArray->output == "lt") { outFile << '\t' << dist; }
                }

                if (pDataArray->output == "lt") { outFile << endl; }

                if(i % 100 == 0){
                    pDataArray->m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n");
                }
                pDataArray->count++;
            }
            pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count) + "\t" + toString(time(NULL) - startTime)+"\n");
        }else{
            if(pDataArray->startLine == 0){ outFile << pDataArray->alignDB.getNumSeqs() << endl; }

            for(int i=pDataArray->startLine;i<pDataArray->endLine;i++){

                string name = pDataArray->alignDB.get(i).getName();
                //pad with spaces to make compatible
                if (name.length() < 10) { while (name.length() < 10) { name += " "; } }

                outFile << name;

                for(int j=0;j<pDataArray->alignDB.getNumSeqs();j++){

                    if (pDataArray->m->control_pressed) { delete distCalculator; outFile.close(); return 0; }

                    distCalculator->calcDist(pDataArray->alignDB.get(i), pDataArray->alignDB.get(j));
                    double dist = distCalculator->getDist();

                    outFile << '\t' << dist;
                }

                outFile << endl;

                if(i % 100 == 0){
                    pDataArray->m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n");
                }
                pDataArray->count++;
            }
            pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count) + "\t" + toString(time(NULL) - startTime)+"\n");
        }

        outFile.close();
        delete distCalculator;

        return 0;
    }
    catch(exception& e) {
        pDataArray->m->errorOut(e, "DistanceCommand", "MyDistThreadFunction");
        exit(1);
    }
}
#endif

/**************************************************************************************************/
class DistanceCommand : public Command {

public:
    DistanceCommand(string);
    DistanceCommand();
    ~DistanceCommand() {}

    vector<string> setParameters();
    string getCommandName() { return "dist.seqs"; }
    string getCommandCategory() { return "Sequence Processing"; }
    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "Schloss PD (2010). 
The effects of alignment quality, distance calculation method, sequence filtering, and region on the analysis of 16S rRNA gene-based studies. PLoS Comput Biol 6: e1000844. \nhttp://www.mothur.org/wiki/Dist.seqs"; }
    string getDescription() { return "calculate the pairwise distances between aligned sequences"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    struct distlinePair {
        int start;
        int end;
    };

    //Dist* distCalculator;
    SequenceDB alignDB;

    string countends, output, fastafile, calc, outputDir, oldfastafile, column, compress;
    int processors, numNewFasta;
    float cutoff;
    vector<int> processIDS; //end line, processid
    vector<distlinePair> lines;

    bool abort;
    vector<string> Estimators, outputNames; //holds estimators to be used

    //void m->appendFiles(string, string);
    void createProcesses(string, int);
    int driver(/*Dist*, SequenceDB, */int, int, string, float);
    int driver(int, int, string, string);

    #ifdef USE_MPI
    int driverMPI(int, int, MPI_File&, float);
    int driverMPI(int, int, string, unsigned long long&);
    int driverMPI(int, int, string, unsigned long long&, string);
    #endif

    //int convertMatrix(string);
    bool sanityCheck();
    //int convertToLowerTriangle(string);

};

#endif

/**************************************************************************************************/
mothur-1.36.1/source/commands/filterseqscommand.cpp000066400000000000000000001456711255543666200225010ustar00rootroot00000000000000
/*
 *  filterseqscommand.cpp
 *  Mothur
 *
 *  Created by Thomas Ryabin on 5/4/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "filterseqscommand.h"
#include "sequence.hpp"

//**********************************************************************************************************************
vector<string> FilterSeqsCommand::setParameters(){
    try {
        CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta-filter",false,true, true); parameters.push_back(pfasta);
        CommandParameter phard("hard", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(phard);
        CommandParameter ptrump("trump", "String", "", "*", "", "", "","",false,false, true); parameters.push_back(ptrump);
        CommandParameter psoft("soft", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psoft);
        CommandParameter pvertical("vertical", "Boolean", "", "T", "", "", "","",false,false, true); parameters.push_back(pvertical);
        CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false, true); parameters.push_back(pprocessors);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "FilterSeqsCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string FilterSeqsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The filter.seqs command reads a file containing sequences and creates a .filter and .filter.fasta file.\n";
        helpString += "The filter.seqs command parameters are fasta, trump, soft, hard, processors and vertical. 
\n"; helpString += "The fasta parameter is required, unless you have a valid current fasta file. You may enter several fasta files to build the filter from and filter, by separating their names with -'s.\n"; helpString += "For example: fasta=abrecovery.fasta-amazon.fasta \n"; helpString += "The trump option will remove a column if the trump character is found at that position in any sequence of the alignment. Default=*, meaning no trump. \n"; helpString += "A soft mask removes any column where the dominant base (i.e. A, T, G, C, or U) does not occur in at least a designated percentage of sequences. Default=0.\n"; helpString += "The hard parameter allows you to enter a file containing the filter you want to use.\n"; helpString += "The vertical parameter removes columns where all sequences contain a gap character. The default is T.\n"; helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n"; helpString += "The filter.seqs command should be in the following format: \n"; helpString += "filter.seqs(fasta=yourFastaFile, trump=yourTrump) \n"; helpString += "Example filter.seqs(fasta=abrecovery.fasta, trump=.).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e. yourFasta).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "FilterSeqsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string FilterSeqsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fasta") { pattern = "[filename],filter.fasta"; }
        else if (type == "filter") { pattern = "[filename],filter"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "FilterSeqsCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
FilterSeqsCommand::FilterSeqsCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["fasta"] = tempOutNames;
        outputTypes["filter"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "FilterSeqsCommand", "FilterSeqsCommand");
        exit(1);
    }
}
/**************************************************************************************/
FilterSeqsCommand::FilterSeqsCommand(string option) {
    try {
        abort = false; calledHelp = false; recalced = false;
        filterFileName = "";

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string,string> parameters = parser.getParameters();

            ValidParameters validParameter("filter.seqs");
            map<string,string>::iterator it;

            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //initialize outputTypes
            vector<string> tempOutNames;
            outputTypes["fasta"] = tempOutNames;
outputTypes["filter"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("hard"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["hard"] = inputDir + it->second; } } } //check for required parameters fasta = validParameter.validFile(parameters, "fasta", false); if (fasta == "not found") { fasta = m->getFastaFile(); if (fasta != "") { fastafileNames.push_back(fasta); m->mothurOut("Using " + fasta + " as input file for the fasta parameter."); m->mothurOutEndLine(); string simpleName = m->getSimpleName(fasta); filterFileName += simpleName.substr(0, simpleName.find_first_of('.')); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else { m->splitAtDash(fasta, fastafileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastafileNames.size(); i++) { bool ignore = false; if (fastafileNames[i] == "current") { fastafileNames[i] = m->getFastaFile(); if (fastafileNames[i] != "") { m->mothurOut("Using " + fastafileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastafileNames.erase(fastafileNames.begin()+i); 
i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastafileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastafileNames[i] = inputDir + fastafileNames[i]; } } ifstream in; int ableToOpen = m->openInputFile(fastafileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastafileNames[i]); m->mothurOut("Unable to open " + fastafileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastafileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastafileNames[i]); m->mothurOut("Unable to open " + fastafileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastafileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastafileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastafileNames.erase(fastafileNames.begin()+i); i--; }else{ string simpleName = m->getSimpleName(fastafileNames[i]); filterFileName += simpleName.substr(0, simpleName.find_first_of('.')); m->setFastaFile(fastafileNames[i]); } in.close(); } } //make sure there is at least one valid file left if (fastafileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } if (!abort) { //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(fastafileNames[0]); //if user entered a file with a path then preserve it } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; hard = validParameter.validFile(parameters, "hard", true); if (hard == "not found") { hard = ""; } else if (hard == "not open") { hard = ""; abort = true; } temp = validParameter.validFile(parameters, "trump", false); if (temp == "not found") { temp = "*"; } trump = temp[0]; temp = validParameter.validFile(parameters, "soft", false); if (temp == "not found") { soft = 0; } else { soft = (float)atoi(temp.c_str()) / 100.0; } temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); vertical = validParameter.validFile(parameters, "vertical", false); if (vertical == "not found") { if ((hard == "") && (trump == '*') && (soft == 0)) { vertical = "T"; } //you have not given a hard file or set the trump char. 
				else { vertical = "F"; }
			}

			numSeqs = 0;
		}
	}
	catch(exception& e) {
		m->errorOut(e, "FilterSeqsCommand", "FilterSeqsCommand");
		exit(1);
	}
}
/**************************************************************************************/
int FilterSeqsCommand::execute() {
	try {

		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		ifstream inFASTA;
		m->openInputFile(fastafileNames[0], inFASTA);

		Sequence testSeq(inFASTA);
		alignmentLength = testSeq.getAlignLength();
		inFASTA.close();

		////////////create filter/////////////////
		m->mothurOut("Creating Filter... "); m->mothurOutEndLine();

		filter = createFilter();

		m->mothurOutEndLine(); m->mothurOutEndLine();

		if (m->control_pressed) { outputTypes.clear(); return 0; }

		#ifdef USE_MPI
			int pid;
			MPI_Comm_rank(MPI_COMM_WORLD, &pid);

			if (pid == 0) { //only one process should output the filter
		#endif

		ofstream outFilter;

		//prevent gigantic file name
		map<string, string> variables;
		variables["[filename]"] = outputDir + filterFileName + ".";
		if (fastafileNames.size() > 3) { variables["[filename]"] = outputDir + "merge."; }
		string filterFile = getOutputFileName("filter", variables);

		m->openOutputFile(filterFile, outFilter);
		outFilter << filter << endl;
		outFilter.close();
		outputNames.push_back(filterFile); outputTypes["filter"].push_back(filterFile);

		#ifdef USE_MPI
			}
		#endif

		////////////run filter/////////////////

		m->mothurOut("Running Filter... "); m->mothurOutEndLine();

		filterSequences();

		m->mothurOutEndLine(); m->mothurOutEndLine();

		//count the columns the filter keeps (marked '1' in the filter string)
		int filteredLength = 0;
		for(int i=0;i<alignmentLength;i++){
			if(filter[i] == '1'){ filteredLength++; }
		}

		if (m->control_pressed) { outputTypes.clear(); for(int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

		m->mothurOutEndLine();
		m->mothurOut("Length of filtered alignment: " + toString(filteredLength)); m->mothurOutEndLine();
		m->mothurOut("Number of columns removed: " + toString((alignmentLength-filteredLength))); m->mothurOutEndLine();
		m->mothurOut("Length of the original alignment: " + toString(alignmentLength)); m->mothurOutEndLine();
		m->mothurOut("Number of sequences used to construct filter: " + toString(numSeqs)); m->mothurOutEndLine();

		//set fasta file as new current fastafile
		string current = "";
		itTypes = outputTypes.find("fasta");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); }
		}

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for(int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "FilterSeqsCommand", "execute");
		exit(1);
	}
}
/**************************************************************************************/
int FilterSeqsCommand::filterSequences() {
	try {

		numSeqs = 0;

		for (int s = 0; s < fastafileNames.size(); s++) {

			for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear();

			map<string, string> variables;
			variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafileNames[s]));
			string filteredFasta = getOutputFileName("fasta", variables);
#ifdef USE_MPI
			int pid, numSeqsPerProcessor, num;
			int tag = 2001;
			vector<unsigned long long> MPIPos;

			MPI_Status status;
			MPI_Comm_size(MPI_COMM_WORLD, &processors); //set processors to the number of mpi processes running
			MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are

			MPI_File outMPI;
			MPI_File inMPI;
			int
outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outFilename[1024]; strcpy(outFilename, filteredFasta.c_str()); char inFileName[1024]; strcpy(inFileName, fastafileNames[s].c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outFilename, outMode, MPI_INFO_NULL, &outMPI); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(fastafileNames[s], num); //fills MPIPos, returns numSeqs numSeqs += num; //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&num, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (num+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } //figure out how many sequences you have to do numSeqsPerProcessor = num / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = num - pid * numSeqsPerProcessor; } //do your part driverMPIRun(startIndex, numSeqsPerProcessor, inMPI, outMPI, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); return 0; } //wait on chidren for(int i = 1; i < processors; i++) { char buf[5]; MPI_Recv(buf, 5, MPI_CHAR, i, tag, MPI_COMM_WORLD, &status); } }else { //you are a child process MPI_Recv(&num, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(num+1); numSeqs += num; MPI_Recv(&MPIPos[0], (num+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = num / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = num - pid * numSeqsPerProcessor; } //align your part driverMPIRun(startIndex, numSeqsPerProcessor, inMPI, outMPI, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPI); return 0; } char buf[5]; strcpy(buf, "done"); //tell parent 
you are done. MPI_Send(buf, 5, MPI_CHAR, 0, tag, MPI_COMM_WORLD); } MPI_File_close(&outMPI); MPI_File_close(&inMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else vector positions; if (savedPositions.size() != 0) { positions = savedPositions[s]; } else { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafileNames[s], processors); #else if(processors != 1){ int numFastaSeqs = 0; positions = m->setFilePosFasta(fastafileNames[s], numFastaSeqs); if (positions.size() < processors) { processors = positions.size(); } } #endif } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //vector positions = m->divideFile(fastafileNames[s], processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } if(processors == 1){ int numFastaSeqs = driverRunFilter(filter, filteredFasta, fastafileNames[s], lines[0]); numSeqs += numFastaSeqs; }else{ int numFastaSeqs = createProcessesRunFilter(filter, fastafileNames[s], filteredFasta); numSeqs += numFastaSeqs; } if (m->control_pressed) { return 1; } #else if(processors == 1){ lines.push_back(new linePair(0, 1000)); int numFastaSeqs = driverRunFilter(filter, filteredFasta, fastafileNames[s], lines[0]); numSeqs += numFastaSeqs; }else { int numFastaSeqs = positions.size()-1; //positions = m->setFilePosFasta(fastafileNames[s], numFastaSeqs); //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(new linePair(positions[startIndex], numSeqsPerProcessor)); } numFastaSeqs = createProcessesRunFilter(filter, fastafileNames[s], filteredFasta); numSeqs += numFastaSeqs; } if (m->control_pressed) { 
return 1; } #endif #endif outputNames.push_back(filteredFasta); outputTypes["fasta"].push_back(filteredFasta); } return 0; } catch(exception& e) { m->errorOut(e, "FilterSeqsCommand", "filterSequences"); exit(1); } } #ifdef USE_MPI /**************************************************************************************/ int FilterSeqsCommand::driverMPIRun(int start, int num, MPI_File& inMPI, MPI_File& outMPI, vector& MPIPos) { try { string outputString = ""; int count = 0; MPI_Status status; for(int i=0;icontrol_pressed) { return 0; } //read next sequence int length = MPIPos[start+i+1] - MPIPos[start+i]; char* buf4 = new char[length]; MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status); string tempBuf = buf4; if (tempBuf.length() > length) { tempBuf = tempBuf.substr(0, length); } istringstream iss (tempBuf,istringstream::in); delete buf4; Sequence seq(iss); m->gobble(iss); if (seq.getName() != "") { string align = seq.getAligned(); string filterSeq = ""; for(int j=0;jerrorOut(e, "FilterSeqsCommand", "driverRunFilter"); exit(1); } } #endif /**************************************************************************************/ int FilterSeqsCommand::driverRunFilter(string F, string outputFilename, string inputFilename, linePair* filePos) { try { ofstream out; m->openOutputFile(outputFilename, out); ifstream in; m->openInputFile(inputFilename, in); in.seekg(filePos->start); //adjust start if null strings if (filePos->start == 0) { m->zapGremlins(in); m->gobble(in); } bool done = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); out.close(); return 0; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { string align = seq.getAligned(); string filterSeq = ""; for(int j=0;j' << seq.getName() << endl << filterSeq << endl; count++; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos->end)) { 
break; } #else if (in.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen(toString(count)+"\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen(toString(count)+"\n"); } out.close(); in.close(); return count; } catch(exception& e) { m->errorOut(e, "FilterSeqsCommand", "driverRunFilter"); exit(1); } } /**************************************************************************************************/ int FilterSeqsCommand::createProcessesRunFilter(string F, string filename, string filteredFastaName) { try { int process = 1; int num = 0; processIDS.clear(); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ string filteredFasta = filename + m->mothurGetpid(process) + ".temp"; num = driverRunFilter(F, filteredFasta, filename, lines[process]); //pass numSeqs to parent ofstream out; string tempFile = filename + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + ".temp")); m->mothurRemove(filename + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + ".temp"));m->mothurRemove(filename + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ string filteredFasta = filename + m->mothurGetpid(process) + ".temp"; num = driverRunFilter(F, filteredFasta, filename, lines[process]); //pass numSeqs to parent ofstream out; string tempFile = filename + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driverRunFilter(F, filteredFastaName, filename, lines[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); m->appendFiles((filename + toString(processIDS[i]) + ".temp"), filteredFastaName); m->mothurRemove((filename + toString(processIDS[i]) + ".temp")); } #else 
////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the filterData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to F. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; istart, lines[i]->end, alignmentLength, i); pDataArray.push_back(tempFilter); processIDS.push_back(i); hThreadArray[i] = CreateThread(NULL, 0, MyRunFilterThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]); } num = driverRunFilter(F, (filteredFastaName + toString(processors-1) + ".temp"), filename, lines[processors-1]); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } for (int i = 1; i < processors; i++) { m->appendFiles((filteredFastaName + toString(i) + ".temp"), filteredFastaName); m->mothurRemove((filteredFastaName + toString(i) + ".temp")); } #endif return num; } catch(exception& e) { m->errorOut(e, "FilterSeqsCommand", "createProcessesRunFilter"); exit(1); } } /**************************************************************************************/ string FilterSeqsCommand::createFilter() { try { string filterString = ""; Filters F; if (soft != 0) { F.setSoft(soft); } if (trump != '*') { F.setTrump(trump); } F.setLength(alignmentLength); if(trump != '*' || m->isTrue(vertical) || soft != 0){ F.initialize(); } if(hard.compare("") != 0) { F.doHard(hard); } else { F.setFilter(string(alignmentLength, '1')); } numSeqs = 0; if(trump != '*' || m->isTrue(vertical) || soft != 0){ for (int s = 0; s < fastafileNames.size(); s++) { for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); #ifdef USE_MPI int pid, numSeqsPerProcessor, num; int tag = 2001; vector MPIPos; MPI_Status status; MPI_File inMPI; MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_Comm_rank(MPI_COMM_WORLD, &pid); //char* tempFileName = new char(fastafileNames[s].length()); //tempFileName = &(fastafileNames[s][0]); char tempFileName[1024]; strcpy(tempFileName, fastafileNames[s].c_str()); MPI_File_open(MPI_COMM_WORLD, tempFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer if (m->control_pressed) { MPI_File_close(&inMPI); return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(fastafileNames[s], num); //fills MPIPos, returns numSeqs numSeqs += num; //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&num, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (num+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } //figure out how many sequences you have to do 
numSeqsPerProcessor = num / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = num - pid * numSeqsPerProcessor; } //do your part MPICreateFilter(startIndex, numSeqsPerProcessor, F, inMPI, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); return 0; } }else { //i am the child process MPI_Recv(&num, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(num+1); numSeqs += num; MPI_Recv(&MPIPos[0], (num+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = num / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = num - pid * numSeqsPerProcessor; } //do your part MPICreateFilter(startIndex, numSeqsPerProcessor, F, inMPI, MPIPos); if (m->control_pressed) { MPI_File_close(&inMPI); return 0; } } MPI_File_close(&inMPI); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else vector positions; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafileNames[s], processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } if(processors == 1){ int numFastaSeqs = driverCreateFilter(F, fastafileNames[s], lines[0]); numSeqs += numFastaSeqs; }else{ int numFastaSeqs = createProcessesCreateFilter(F, fastafileNames[s]); numSeqs += numFastaSeqs; } #else if(processors == 1){ lines.push_back(new linePair(0, 1000)); int numFastaSeqs = driverCreateFilter(F, fastafileNames[s], lines[0]); numSeqs += numFastaSeqs; }else { int numFastaSeqs = 0; positions = m->setFilePosFasta(fastafileNames[s], numFastaSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * 
numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(new linePair(positions[startIndex], numSeqsPerProcessor)); } numFastaSeqs = createProcessesCreateFilter(F, fastafileNames[s]); numSeqs += numFastaSeqs; } #endif //save the file positions so we can reuse them in the runFilter function if (!recalced) { savedPositions[s] = positions; } if (m->control_pressed) { return filterString; } #endif } } #ifdef USE_MPI int pid; int Atag = 1; int Ttag = 2; int Ctag = 3; int Gtag = 4; int Gaptag = 5; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if(trump != '*' || m->isTrue(vertical) || soft != 0){ if (pid == 0) { //only one process should output the filter vector temp; temp.resize(alignmentLength+1); //get the frequencies from the child processes for(int i = 1; i < processors; i++) { for (int j = 0; j < 5; j++) { MPI_Recv(&temp[0], (alignmentLength+1), MPI_INT, i, 2001, MPI_COMM_WORLD, &status); int receiveTag = temp[temp.size()-1]; //child process added a int to the end to indicate what letter count this is for if (receiveTag == Atag) { //you are recieveing the A frequencies for (int k = 0; k < alignmentLength; k++) { F.a[k] += temp[k]; } }else if (receiveTag == Ttag) { //you are recieveing the T frequencies for (int k = 0; k < alignmentLength; k++) { F.t[k] += temp[k]; } }else if (receiveTag == Ctag) { //you are recieveing the C frequencies for (int k = 0; k < alignmentLength; k++) { F.c[k] += temp[k]; } }else if (receiveTag == Gtag) { //you are recieveing the G frequencies for (int k = 0; k < alignmentLength; k++) { F.g[k] += temp[k]; } }else if (receiveTag == Gaptag) { //you are recieveing the gap frequencies for (int k = 0; k < alignmentLength; k++) { F.gap[k] += temp[k]; } } } } }else{ //send my fequency counts F.a.push_back(Atag); int ierr = MPI_Send(&(F.a[0]), (alignmentLength+1), MPI_INT, 0, 2001, MPI_COMM_WORLD); F.t.push_back(Ttag); ierr = MPI_Send (&(F.t[0]), (alignmentLength+1), 
MPI_INT, 0, 2001, MPI_COMM_WORLD);
				F.c.push_back(Ctag);
				ierr = MPI_Send(&(F.c[0]), (alignmentLength+1), MPI_INT, 0, 2001, MPI_COMM_WORLD);
				F.g.push_back(Gtag);
				ierr = MPI_Send(&(F.g[0]), (alignmentLength+1), MPI_INT, 0, 2001, MPI_COMM_WORLD);
				F.gap.push_back(Gaptag);
				ierr = MPI_Send(&(F.gap[0]), (alignmentLength+1), MPI_INT, 0, 2001, MPI_COMM_WORLD);
			}
		}

		MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case

		if (pid == 0) { //only one process should output the filter
#endif

		F.setNumSeqs(numSeqs);
		if(m->isTrue(vertical) == 1) { F.doVertical(); }
		if(soft != 0) { F.doSoft(); }
		filterString = F.getFilter();

#ifdef USE_MPI
			//send filter string to kids
			//for(int i = 1; i < processors; i++) {
			//	MPI_Send(&filterString[0], alignmentLength, MPI_CHAR, i, 2001, MPI_COMM_WORLD);
			//}
			MPI_Bcast(&filterString[0], alignmentLength, MPI_CHAR, 0, MPI_COMM_WORLD);
		}else{
			//receive filterString
			char* tempBuf = new char[alignmentLength];
			//MPI_Recv(&tempBuf[0], alignmentLength, MPI_CHAR, 0, 2001, MPI_COMM_WORLD, &status);
			MPI_Bcast(tempBuf, alignmentLength, MPI_CHAR, 0, MPI_COMM_WORLD);

			//tempBuf is not null terminated, so copy exactly alignmentLength characters
			filterString.assign(tempBuf, alignmentLength);
			delete[] tempBuf; //allocated with new[], so release with delete[]
		}

		MPI_Barrier(MPI_COMM_WORLD);
#endif

		return filterString;
	}
	catch(exception& e) {
		m->errorOut(e, "FilterSeqsCommand", "createFilter");
		exit(1);
	}
}
/**************************************************************************************/
int FilterSeqsCommand::driverCreateFilter(Filters& F, string filename, linePair* filePos) {
	try {

		ifstream in;
		m->openInputFile(filename, in);

		in.seekg(filePos->start);

		//adjust start if null strings
		if (filePos->start == 0) { m->zapGremlins(in); m->gobble(in); }

		bool done = false;
		int count = 0;
		bool error = false;

		while (!done) {

			if (m->control_pressed) { in.close(); return 1; }

			Sequence seq(in); m->gobble(in);
			if (seq.getName() != "") {
				if (m->debug) { m->mothurOut("[DEBUG]: " + seq.getName() + " length = " +
toString(seq.getAligned().length())); m->mothurOutEndLine();}
				if (seq.getAligned().length() != alignmentLength) { m->mothurOut("[ERROR]: Sequences are not all the same length, please correct."); m->mothurOutEndLine(); error = true; if (!m->debug) { m->control_pressed = true; } }

				if(trump != '*') { F.doTrump(seq); }
				if(m->isTrue(vertical) || soft != 0) { F.getFreqs(seq); }
				cout.flush();
				count++;
			}

			#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
				unsigned long long pos = in.tellg();
				if ((pos == -1) || (pos >= filePos->end)) { break; }
			#else
				if (in.eof()) { break; }
			#endif

			//report progress
			if((count) % 100 == 0){ m->mothurOutJustToScreen(toString(count)+"\n"); }
		}
		//report progress
		if((count) % 100 != 0){ m->mothurOutJustToScreen(toString(count)+"\n"); }
		in.close();

		if (error) { m->control_pressed = true; }

		return count;
	}
	catch(exception& e) {
		m->errorOut(e, "FilterSeqsCommand", "driverCreateFilter");
		exit(1);
	}
}
#ifdef USE_MPI
/**************************************************************************************/
int FilterSeqsCommand::MPICreateFilter(int start, int num, Filters& F, MPI_File& inMPI, vector<unsigned long long>& MPIPos) {
	try {

		MPI_Status status;
		int pid;
		MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are

		for(int i=0;i<num;i++){

			if (m->control_pressed) { return 0; }

			//read next sequence
			int length = MPIPos[start+i+1] - MPIPos[start+i];

			char* buf4 = new char[length];
			MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status);

			//buf4 is not null terminated, so copy exactly length characters
			string tempBuf(buf4, length);
			istringstream iss (tempBuf,istringstream::in);
			delete[] buf4; //allocated with new[], so release with delete[]

			Sequence seq(iss);

			if (seq.getAligned().length() != alignmentLength) { cout << "Alignment length is " << alignmentLength << " and sequence " << seq.getName() << " has length " << seq.getAligned().length() << ", please correct."
<< endl; exit(1); } if(trump != '*'){ F.doTrump(seq); } if(m->isTrue(vertical) || soft != 0){ F.getFreqs(seq); } cout.flush(); //report progress if((i+1) % 100 == 0){ cout << (i+1) << endl; } } //report progress if((num) % 100 != 0){ cout << num << endl; } return 0; } catch(exception& e) { m->errorOut(e, "FilterSeqsCommand", "MPICreateFilter"); exit(1); } } #endif /**************************************************************************************************/ int FilterSeqsCommand::createProcessesCreateFilter(Filters& F, string filename) { try { int process = 1; int num = 0; processIDS.clear(); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ //reset child's filter counts to 0; F.a.clear(); F.a.resize(alignmentLength, 0); F.t.clear(); F.t.resize(alignmentLength, 0); F.g.clear(); F.g.resize(alignmentLength, 0); F.c.clear(); F.c.resize(alignmentLength, 0); F.gap.clear(); F.gap.resize(alignmentLength, 0); num = driverCreateFilter(F, filename, lines[process]); //write out filter counts to file filename += m->mothurGetpid(process) + "filterValues.temp"; ofstream out; m->openOutputFile(filename, out); out << num << endl; out << F.getFilter() << endl; for (int k = 0; k < alignmentLength; k++) { out << F.a[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.t[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.g[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.c[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.gap[k] << '\t'; } out << endl; //cout << F.getFilter() << endl; out.close(); exit(0); }else { 
m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + "filterValues.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + "filterValues.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 1; recalced = true; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ //reset child's filter counts to 0; F.a.clear(); F.a.resize(alignmentLength, 0); F.t.clear(); F.t.resize(alignmentLength, 0); F.g.clear(); F.g.resize(alignmentLength, 0); F.c.clear(); F.c.resize(alignmentLength, 0); F.gap.clear(); F.gap.resize(alignmentLength, 0); num = driverCreateFilter(F, filename, lines[process]); //write out filter counts to file filename += m->mothurGetpid(process) + "filterValues.temp"; ofstream out; m->openOutputFile(filename, out); out << num << endl; out << F.getFilter() << endl; for (int k = 0; k < alignmentLength; k++) { out << F.a[k] << '\t'; } out << endl; for (int k = 0; 
k < alignmentLength; k++) { out << F.t[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.g[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.c[k] << '\t'; } out << endl; for (int k = 0; k < alignmentLength; k++) { out << F.gap[k] << '\t'; } out << endl; //cout << F.getFilter() << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent do your part num = driverCreateFilter(F, filename, lines[0]); //force parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } //parent reads in and combines Filter info for (int i = 0; i < processIDS.size(); i++) { string tempFilename = filename + toString(processIDS[i]) + "filterValues.temp"; ifstream in; m->openInputFile(tempFilename, in); int temp, tempNum; string tempFilterString; in >> tempNum; m->gobble(in); num += tempNum; in >> tempFilterString; F.mergeFilter(tempFilterString); for (int k = 0; k < alignmentLength; k++) { in >> temp; F.a[k] += temp; } m->gobble(in); for (int k = 0; k < alignmentLength; k++) { in >> temp; F.t[k] += temp; } m->gobble(in); for (int k = 0; k < alignmentLength; k++) { in >> temp; F.g[k] += temp; } m->gobble(in); for (int k = 0; k < alignmentLength; k++) { in >> temp; F.c[k] += temp; } m->gobble(in); for (int k = 0; k < alignmentLength; k++) { in >> temp; F.gap[k] += temp; } m->gobble(in); in.close(); m->mothurRemove(tempFilename); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the filterData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to F. 
////////////////////////////////////////////////////////////////////////////////////////////////////// vector<filterData*> pDataArray; DWORD dwThreadIdArray[processors]; HANDLE hThreadArray[processors]; //Create processor worker threads. for( int i=0; i<processors; i++ ){ filterData* tempFilter = new filterData(filename, m, lines[i]->start, lines[i]->end, alignmentLength, trump, vertical, soft, hard, i); pDataArray.push_back(tempFilter); processIDS.push_back(i); hThreadArray[i] = CreateThread(NULL, 0, MyCreateFilterThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]); } //Wait until all threads have terminated. WaitForMultipleObjects(processors, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. \n"); m->control_pressed = true; } F.mergeFilter(pDataArray[i]->F.getFilter()); for (int k = 0; k < alignmentLength; k++) { F.a[k] += pDataArray[i]->F.a[k]; } for (int k = 0; k < alignmentLength; k++) { F.t[k] += pDataArray[i]->F.t[k]; } for (int k = 0; k < alignmentLength; k++) { F.g[k] += pDataArray[i]->F.g[k]; } for (int k = 0; k < alignmentLength; k++) { F.c[k] += pDataArray[i]->F.c[k]; } for (int k = 0; k < alignmentLength; k++) { F.gap[k] += pDataArray[i]->F.gap[k]; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif return num; } catch(exception& e) { m->errorOut(e, "FilterSeqsCommand", "createProcessesCreateFilter"); exit(1); } } /**************************************************************************************/ mothur-1.36.1/source/commands/filterseqscommand.h000066400000000000000000000176001255543666200221340ustar00rootroot00000000000000#ifndef FILTERSEQSCOMMAND_H #define FILTERSEQSCOMMAND_H /* * filterseqscommand.h * Mothur * * Created by Thomas Ryabin on 5/4/09. * Copyright 2009 Schloss Lab UMASS Amherst. 
All rights reserved. * */ #include "command.hpp" #include "filters.h" class Sequence; class FilterSeqsCommand : public Command { public: FilterSeqsCommand(string); FilterSeqsCommand(); ~FilterSeqsCommand() {}; vector setParameters(); string getCommandName() { return "filter.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Filter.seqs"; } string getDescription() { return "removes columns from alignments based on a criteria defined by the user"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector lines; vector processIDS; map > savedPositions; string vertical, filter, fasta, hard, outputDir, filterFileName; vector fastafileNames; int alignmentLength, processors; vector bufferSizes; vector outputNames; char trump; bool abort, recalced; float soft; int numSeqs; string createFilter(); int filterSequences(); int createProcessesCreateFilter(Filters&, string); int createProcessesRunFilter(string, string, string); int driverRunFilter(string, string, string, linePair*); int driverCreateFilter(Filters& F, string filename, linePair* line); #ifdef USE_MPI int driverMPIRun(int, int, MPI_File&, MPI_File&, vector&); int MPICreateFilter(int, int, Filters&, MPI_File&, vector&); #endif }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct filterData { Filters F; int count, tid, alignmentLength; unsigned long long start, end; MothurOut* m; string filename, vertical, hard; char trump; float soft; filterData(){} filterData(string fn, MothurOut* mout, unsigned long long st, unsigned long long en, int aLength, char tr, string vert, float so, string ha, int t) { filename = fn; m = mout; start = st; end = en; tid = t; trump = tr; alignmentLength = aLength; vertical = vert; soft = so; hard = ha; count = 0; } }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). struct filterRunData { int count, tid, alignmentLength; unsigned long long start, end; MothurOut* m; string filename; string filter, outputFilename; filterRunData(){} filterRunData(string f, string fn, string ofn, MothurOut* mout, unsigned long long st, unsigned long long en, int aLength, int t) { filter = f; outputFilename = ofn; filename = fn; m = mout; start = st; end = en; tid = t; alignmentLength = aLength; count = 0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyCreateFilterThreadFunction(LPVOID lpParam){ filterData* pDataArray; pDataArray = (filterData*)lpParam; try { if (pDataArray->soft != 0) { pDataArray->F.setSoft(pDataArray->soft); } if (pDataArray->trump != '*') { pDataArray->F.setTrump(pDataArray->trump); } pDataArray->F.setLength(pDataArray->alignmentLength); if(pDataArray->trump != '*' || pDataArray->m->isTrue(pDataArray->vertical) || pDataArray->soft != 0){ pDataArray->F.initialize(); } if(pDataArray->hard.compare("") != 0) { pDataArray->F.doHard(pDataArray->hard); } else { 
pDataArray->F.setFilter(string(pDataArray->alignmentLength, '1')); } ifstream in; pDataArray->m->openInputFile(pDataArray->filename, in); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->zapGremlins(in); }else { //this accounts for the difference in line endings. in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } pDataArray->count = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { in.close(); pDataArray->count = 1; return 1; } Sequence current(in); pDataArray->m->gobble(in); if (current.getName() != "") { if (current.getAligned().length() != pDataArray->alignmentLength) { pDataArray->m->mothurOut("Sequences are not all the same length, please correct."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; } if(pDataArray->trump != '*') { pDataArray->F.doTrump(current); } if(pDataArray->m->isTrue(pDataArray->vertical) || pDataArray->soft != 0) { pDataArray->F.getFreqs(current); } } pDataArray->count++; //report progress if((i) % 100 == 0){ pDataArray->m->mothurOutJustToScreen(toString(i)+"\n"); } } if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); } in.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "FilterSeqsCommand", "MyCreateFilterThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MyRunFilterThreadFunction(LPVOID lpParam){ filterRunData* pDataArray; pDataArray = (filterRunData*)lpParam; try { ofstream out; pDataArray->m->openOutputFile(pDataArray->outputFilename, out); ifstream in; pDataArray->m->openInputFile(pDataArray->filename, in); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->zapGremlins(in); }else { //this accounts for 
the difference in line endings. in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } pDataArray->count = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { in.close(); out.close(); pDataArray->count = 1; return 1; } Sequence seq(in); pDataArray->m->gobble(in); if (seq.getName() != "") { string align = seq.getAligned(); string filterSeq = ""; for(int j=0;j<pDataArray->alignmentLength;j++){ if(pDataArray->filter[j] == '1'){ filterSeq += align[j]; } } out << '>' << seq.getName() << endl << filterSeq << endl; } pDataArray->count++; //report progress if((i) % 100 == 0){ pDataArray->m->mothurOutJustToScreen(toString(i)+"\n"); } } if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); } in.close(); out.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "FilterSeqsCommand", "MyRunFilterThreadFunction"); exit(1); } } /**************************************************************************************************/ #endif #endif mothur-1.36.1/source/commands/filtersharedcommand.cpp000066400000000000000000000571611255543666200227700ustar00rootroot00000000000000// // filtersharedcommand.cpp // Mothur // // Created by Sarah Westcott on 1/4/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "filtersharedcommand.h" //********************************************************************************************************************** vector FilterSharedCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","shared",false,true,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pminpercent("minpercent", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pminpercent); CommandParameter prarepercent("rarepercent", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(prarepercent); CommandParameter pminabund("minabund", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pminabund); CommandParameter pmintotal("mintotal", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pmintotal); CommandParameter pminnumsamples("minnumsamples", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pminnumsamples); CommandParameter pminpercentsamples("minpercentsamples", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pminpercentsamples); CommandParameter pkeepties("keepties", "Boolean", "", "T", "", "", "","",false,false,true); parameters.push_back(pkeepties); CommandParameter pmakerare("makerare", "Boolean", "", "T", "", "", "","",false,false,true); parameters.push_back(pmakerare); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 
0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string FilterSharedCommand::getHelpString(){ try { string helpString = ""; helpString += "The filter.shared command is used to remove OTUs based on various criteria.\n"; helpString += "The filter.shared command parameters are shared, minpercent, minabund, mintotal, minnumsamples, minpercentsamples, rarepercent, makerare, keepties, groups and label. You must provide a shared file.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like included. The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance levels you would like, and are also separated by dashes.\n"; helpString += "The minabund parameter allows you to indicate the minimum abundance required for each sample in a given OTU. If any sample's abundance falls below the minimum, the OTU is removed. Default=0\n"; helpString += "The minpercent parameter allows you to indicate the minimum relative abundance of an OTU. For example, if the OTU's total abundance across all samples is 8, the total abundance across all OTUs is 1000, and minpercent=1, then the OTU's relative abundance is 0.008 and the minimum is 0.01, so the OTU will be removed. Default=0.\n"; helpString += "The rarepercent parameter allows you to indicate the percentage of OTUs to remove. The OTUs chosen to be removed are the rarest. For example, if you have 1000 OTUs, rarepercent=20 would remove the 200 OTUs with the lowest abundance. Default=0.\n"; helpString += "The keepties parameter is used with the rarepercent parameter. It allows you to indicate you want to keep the OTUs with the same abundance as the first 'not rare' OTU. 
For example, if you have 10 OTUs with abundances of 20, 18, 15, 15, 10, 5, 3, 3, 3, 1 and rarepercent=20, keepties=t would remove the 10th OTU but keep the 9th because its abundance ties the 8th OTU. keepties=f would remove OTUs 9 and 10. Default=T\n"; helpString += "The minnumsamples parameter allows you to indicate the minimum number of samples present in an OTU. If the number of samples present falls below the minimum, the OTU is removed. Default=0.\n"; helpString += "The minpercentsamples parameter allows you to indicate the minimum percent of samples present in an OTU. For example, if the total number of samples is 10, the number present is 3, and minpercentsamples=50, then the OTU's percent of samples is 0.30 and the minimum is 0.50, so the OTU will be removed. Default=0.\n"; helpString += "The mintotal parameter allows you to indicate the minimum abundance required for a given OTU. If abundance across all samples falls below the minimum, the OTU is removed. Default=0.\n"; helpString += "The makerare parameter allows you to indicate you want the abundances of any removed OTUs to be saved and a new \"rare\" OTU created with its abundances equal to the sum of the OTUs removed. This will preserve the number of reads in your dataset. Default=T\n"; helpString += "The filter.shared command should be in the following format: filter.shared(shared=yourSharedFile, minabund=yourMinAbund, groups=yourGroups, label=yourLabels).\n"; helpString += "Example filter.shared(shared=final.an.shared, minabund=10).\n"; helpString += "The default value for groups is all the groups in your sharedfile, and all labels in your inputfile will be used.\n"; helpString += "The filter.shared command outputs a .filter.shared file.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string FilterSharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "shared") { pattern = "[filename],[distance],filter,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** FilterSharedCommand::FilterSharedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["shared"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "GetRelAbundCommand"); exit(1); } } //********************************************************************************************************************** FilterSharedCommand::FilterSharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command map::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["shared"] = tempOutNames; //if the user changes the input directory command factory will send this info 
to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); m->setGroups(Groups); } bool setSomething = false; string temp = validParameter.validFile(parameters, "minabund", false); if (temp == "not found"){ temp = "-1"; } else { setSomething = true; } m->mothurConvert(temp, minAbund); temp = validParameter.validFile(parameters, "mintotal", false); if (temp == "not found"){ temp = "-1"; } else { setSomething = true; } m->mothurConvert(temp, minTotal); temp = validParameter.validFile(parameters, "minnumsamples", false); if (temp == "not found"){ temp = "-1"; } else { setSomething = true; } m->mothurConvert(temp, minSamples); temp = validParameter.validFile(parameters, "minpercent", false); if (temp == "not found"){ temp = "-1"; } else { setSomething = true; } m->mothurConvert(temp, minPercent); if (minPercent < 1) {} //already in percent form else { minPercent = minPercent / 100.0; } //user gave us a whole number version so convert to % temp = validParameter.validFile(parameters, "rarepercent", false); if (temp == "not found"){ temp = "-1"; } else { setSomething = true; } m->mothurConvert(temp, rarePercent); if (rarePercent < 1) {} //already in percent form else { rarePercent = rarePercent / 100.0; } //user gave us a whole number version so convert to % temp = validParameter.validFile(parameters, "minpercentsamples", false); if (temp == "not found"){ temp = "-1"; } else { setSomething = true; } m->mothurConvert(temp, minPercentSamples); if (minPercentSamples < 1) {} //already in percent form else { minPercentSamples = minPercentSamples / 100.0; } //user gave us a whole number version so convert to % temp = 
validParameter.validFile(parameters, "makerare", false); if (temp == "not found"){ temp = "T"; } makeRare = m->isTrue(temp); temp = validParameter.validFile(parameters, "keepties", false); if (temp == "not found"){ temp = "T"; } keepties = m->isTrue(temp); if (!setSomething) { m->mothurOut("\nYou did not set any parameters. I will filter using minabund=1.\n\n"); minAbund = 1; } } } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "FilterSharedCommand"); exit(1); } } //********************************************************************************************************************** int FilterSharedCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processShared(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processShared(lookup); processedLabels.insert(lookup[0]->getLabel()); 
userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processShared(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //set shared file as new current sharedfile string current = ""; itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "execute"); exit(1); } } //********************************************************************************************************************** int FilterSharedCommand::processShared(vector& thislookup) { try { //save mothurOut's binLabels to restore for next label vector saveBinLabels = m->currentSharedBinLabels; map variables; 
variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); variables["[distance]"] = thislookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); if (m->control_pressed) { return 0; } vector filteredLabels; vector rareCounts; rareCounts.resize(m->getGroups().size(), 0); //create new "filtered" lookup vector filteredLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); filteredLookup.push_back(temp); } //you want to remove a percentage of OTUs set removeLabels; if (rarePercent != -0.01) { vector otus; //rank otus by abundance for (int i = 0; i < thislookup[0]->getNumBins(); i++) { float otuTotal = 0.0; for (int j = 0; j < thislookup.size(); j++) { otuTotal += thislookup[j]->getAbundance(i); } spearmanRank temp(saveBinLabels[i], otuTotal); otus.push_back(temp); } //sort by abundance sort(otus.begin(), otus.end(), compareSpearman); //find index of cutoff int indexFirstNotRare = ceil(rarePercent * (float)thislookup[0]->getNumBins()); //handle ties if (keepties) { //adjust indexFirstNotRare if needed if (indexFirstNotRare != 0) { //not out of bounds if (otus[indexFirstNotRare].score == otus[indexFirstNotRare-1].score) { //you have a tie bool tie = true; for (int i = indexFirstNotRare-1; i >=0; i--) { if (otus[indexFirstNotRare].score != otus[i].score) { //found value below tie indexFirstNotRare = i+1; tie = false; break; } } if (tie) { if (m->debug) { m->mothurOut("For distance " + thislookup[0]->getLabel() + " all rare OTUs abundance tie with first 'non rare' OTU, not removing any for rarepercent parameter.\n"); }indexFirstNotRare = 0; } } } } //saved labels for OTUs above rarepercent for (int i = 0; i < indexFirstNotRare; i++) { removeLabels.insert(otus[i].name); } } bool filteredSomething = false; int numRemoved = 0; for 
(int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < filteredLookup.size(); j++) { delete filteredLookup[j]; } return 0; } bool okay = true; //innocent until proven guilty if (minAbund != -1) { for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) < minAbund) { okay = false; break; } } } if (okay && (minTotal != -1)) { int otuTotal = 0; for (int j = 0; j < thislookup.size(); j++) { otuTotal += thislookup[j]->getAbundance(i); } if (otuTotal < minTotal) { okay = false; } } if (okay && (minPercent != -0.01)) { double otuTotal = 0; double total = 0; for (int j = 0; j < thislookup.size(); j++) { otuTotal += thislookup[j]->getAbundance(i); total += thislookup[j]->getNumSeqs(); } double percent = otuTotal / total; if (percent < minPercent) { okay = false; } } if (okay && (minSamples != -1)) { int samples = 0; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { samples++; } } if (samples < minSamples) { okay = false; } } if (okay && (minPercentSamples != -0.01)) { double samples = 0; double total = thislookup.size(); for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { samples++; } } double percent = samples / total; if (percent < minPercentSamples) { okay = false; } } if (okay && (rarePercent != -0.01)) { if (removeLabels.count(saveBinLabels[i]) != 0) { //are we on the 'bad' list okay = false; } } //did this OTU pass the filter criteria if (okay) { filteredLabels.push_back(saveBinLabels[i]); for (int j = 0; j < filteredLookup.size(); j++) { //add this OTU to the filtered lookup filteredLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); } }else { //if not, do we want to save the counts filteredSomething = true; if (makeRare) { for (int j = 0; j < rareCounts.size(); j++) { rareCounts[j] += thislookup[j]->getAbundance(i); } } numRemoved++; } } //if we are saving the counts add a "rare" OTU if anything 
was filtered if (makeRare) { if (filteredSomething) { for (int j = 0; j < rareCounts.size(); j++) { //add "rare" OTU to the filtered lookup filteredLookup[j]->push_back(rareCounts[j], thislookup[j]->getGroup()); } //create label for rare OTUs filteredLabels.push_back("rareOTUs"); } } ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); m->currentSharedBinLabels = filteredLabels; filteredLookup[0]->printHeaders(out); for (int i = 0; i < filteredLookup.size(); i++) { out << filteredLookup[i]->getLabel() << '\t' << filteredLookup[i]->getGroup() << '\t'; filteredLookup[i]->print(out); } out.close(); //save mothurOut's binLabels to restore for next label m->currentSharedBinLabels = saveBinLabels; for (int j = 0; j < filteredLookup.size(); j++) { delete filteredLookup[j]; } m->mothurOut("\nRemoved " + toString(numRemoved) + " OTUs.\n"); return 0; } catch(exception& e) { m->errorOut(e, "FilterSharedCommand", "processShared"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/filtersharedcommand.h000066400000000000000000000023421255543666200224240ustar00rootroot00000000000000// // filtersharedcommand.h // Mothur // // Created by Sarah Westcott on 1/4/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_filtersharedcommand_h #define Mothur_filtersharedcommand_h #include "command.hpp" #include "sharedrabundvector.h" #include "inputdata.h" class FilterSharedCommand : public Command { public: FilterSharedCommand(string); FilterSharedCommand(); ~FilterSharedCommand() {} vector setParameters(); string getCommandName() { return "filter.shared"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Filter.shared"; } string getDescription() { return "remove OTUs based on various criteria"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, pickedGroups, allLines, makeRare, keepties; set labels; //holds labels to be used string groups, label, outputDir, sharedfile; vector Groups, outputNames; int minAbund, minTotal, minSamples; float minPercent, minPercentSamples, rarePercent; int processShared(vector&); }; #endif mothur-1.36.1/source/commands/getcommandinfocommand.cpp000066400000000000000000000264741255543666200233110ustar00rootroot00000000000000/* * getcommandinfo.cpp * Mothur * * Created by westcott on 4/6/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "getcommandinfocommand.h" //********************************************************************************************************************** vector GetCommandInfoCommand::setParameters(){ try { CommandParameter poutput("output", "String", "", "", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetCommandInfoCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetCommandInfoCommand::getHelpString(){ try { string helpString = ""; helpString += "This command is used by the gui to get the information about current commands available in mothur.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetCommandInfoCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** GetCommandInfoCommand::GetCommandInfoCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if 
(validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } output = validParameter.validFile(parameters, "output", false); if (output == "not found") { output = ""; m->mothurOut("You must provide an output filename."); m->mothurOutEndLine(); abort=true; } } } catch(exception& e) { m->errorOut(e, "GetCommandInfoCommand", "GetCommandInfoCommand"); exit(1); } } //********************************************************************************************************************** int GetCommandInfoCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } commandFactory = CommandFactory::getInstance(); ofstream out; m->openOutputFile(output+".temp", out); int numNonHidden = 0; out << "mothurLocation=" << m->getFullPathName(m->argv) << endl; out << "mothurVersion=" << m->getVersion() << endl; map commands = commandFactory->getListCommands(); map::iterator it; //loop through each command outputting info for (it = commands.begin(); it != commands.end(); it++) { if (m->control_pressed) { m->mothurOut("[ERROR]: did not complete making the file."); m->mothurOutEndLine(); out.close(); m->mothurRemove((output+".temp")); } Command* thisCommand = commandFactory->getCommand(it->first); //don't add hidden commands if (thisCommand->getCommandCategory() != "Hidden") { numNonHidden++; //general info out << "commandName=" << thisCommand->getCommandName() << endl; //cout << thisCommand->getCommandName() << " current citation = " << thisCommand->getCitation() << endl; out << "commandCategory=" << thisCommand->getCommandCategory() << endl; //remove /n from help string since gui reads line by line string myhelpString = thisCommand->getHelpString(); string newHelpString = ""; for (int i = 0; i < myhelpString.length(); i++) { if (myhelpString[i] != '\n') { newHelpString += myhelpString[i]; } } out << "help=" << newHelpString << endl; //remove /n from citation string since gui reads line by line string mycitationString = 
thisCommand->getCitation(); string newCitationString = ""; for (int i = 0; i < mycitationString.length(); i++) { if (mycitationString[i] != '\n') { newCitationString += mycitationString[i]; } } out << "citation=" << newCitationString << endl; out << "description=" << thisCommand->getDescription() << endl; //outputTypes - makes something like outputTypes=fasta-name-qfile map > thisOutputTypes = thisCommand->getOutputFiles(); map >::iterator itTypes; if (thisOutputTypes.size() == 0) { out << "outputTypesNames=0" << endl; } else { //string types = ""; //for (itTypes = thisOutputTypes.begin(); itTypes != thisOutputTypes.end(); itTypes++) { types += itTypes->first + "-"; } //rip off last - //types = types.substr(0, types.length()-1); out << "outputTypesNames=" << thisOutputTypes.size() << endl; for (itTypes = thisOutputTypes.begin(); itTypes != thisOutputTypes.end(); itTypes++) { out << itTypes->first << "=" << thisCommand->getOutputPattern(itTypes->first) << endl; } } vector booleans; vector numbers; vector multiples; vector Strings; vector inputGroupNames; map inputTypes; getInfo(thisCommand->getParameters(), booleans, numbers, multiples, Strings, inputGroupNames, inputTypes); //output booleans out << "Boolean=" << booleans.size() << endl; for (int i = 0; i < booleans.size(); i++) { out << booleans[i] << endl; } //output mulitples out << "Multiple=" << multiples.size() << endl; for (int i = 0; i < multiples.size(); i++) { out << multiples[i] << endl; } //output numbers out << "Numbers=" << numbers.size() << endl; for (int i = 0; i < numbers.size(); i++) { out << numbers[i] << endl; } //output strings out << "String=" << Strings.size() << endl; for (int i = 0; i < Strings.size(); i++) { out << Strings[i] << endl; } //output groups out << "inputGroupNames=" << inputGroupNames.size() << endl; for (int i = 0; i < inputGroupNames.size(); i++) { out << inputGroupNames[i] << endl; } //output input types if (inputTypes.size() == 0) { out << "inputTypes=" << endl; } else { 
string types = ""; for (map::iterator it2 = inputTypes.begin(); it2 != inputTypes.end(); it2++) { types += it2->first + "-"; } //rip off last - types = types.substr(0, types.length()-1); out << "inputTypes=" << types << endl; for (map::iterator it2 = inputTypes.begin(); it2 != inputTypes.end(); it2++) { out << it2->first << "=" << it2->second << endl; } } } } out.close(); ofstream out2; m->openOutputFile(output, out2); out2 << numNonHidden << endl; out2.close(); m->appendFiles(output+".temp", output); m->mothurRemove((output+".temp")); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(output); m->mothurOutEndLine(); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetCommandInfoCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetCommandInfoCommand::getInfo(vector para, vector& booleans, vector& numbers, vector& multiples, vector& strings, vector& inputGroupNames, map& inputTypes){ try { map > groups; map >::iterator itGroups; for (int i = 0; i < para.size(); i++) { if ((para[i].name == "inputdir") || (para[i].name == "outputdir")) {} //ignore else { string important = "|F"; if (para[i].important || para[i].required) { important = "|T"; } string outputType = "|none"; if (para[i].outputTypes != "") { outputType = "|" + para[i].outputTypes; } if (para[i].type == "Boolean") { string temp = para[i].name + "=" + para[i].optionsDefault + important + outputType; booleans.push_back(temp); }else if (para[i].type == "Multiple") { string multAllowed = "F"; if (para[i].multipleSelectionAllowed) { multAllowed = "T"; } string temp = para[i].name + "=" + para[i].options + "|" + para[i].optionsDefault + "|" + multAllowed + important + outputType; multiples.push_back(temp); }else if (para[i].type == "Number") { string temp = para[i].name + "=" + para[i].optionsDefault + important + outputType; 
numbers.push_back(temp); }else if (para[i].type == "String") { string temp = para[i].name + "=" + para[i].optionsDefault + important + outputType; strings.push_back(temp); }else if (para[i].type == "InputTypes") { string required = "F"; if (para[i].required) { required = "T"; } string temp = required + important + "|" + para[i].chooseOnlyOneGroup + "|" + para[i].chooseAtLeastOneGroup + "|" + para[i].linkedGroup + outputType; inputTypes[para[i].name] = temp; //add choose only one groups vector tempGroups; m->splitAtDash(para[i].chooseOnlyOneGroup, tempGroups); for (int l = 0; l < tempGroups.size(); l++) { groups[tempGroups[l]].insert(para[i].name); } tempGroups.clear(); //add at least one group names m->splitAtDash(para[i].chooseAtLeastOneGroup, tempGroups); for (int l = 0; l < tempGroups.size(); l++) { groups[tempGroups[l]].insert(para[i].name); } tempGroups.clear(); //add at linked group names m->splitAtDash(para[i].linkedGroup, tempGroups); for (int l = 0; l < tempGroups.size(); l++) { groups[tempGroups[l]].insert(para[i].name); } tempGroups.clear(); }else { m->mothurOut("[ERROR]: " + para[i].type + " is an unknown parameter type, please correct."); m->mothurOutEndLine(); } } } for (itGroups = groups.begin(); itGroups != groups.end(); itGroups++) { if (itGroups->first != "none") { set tempNames = itGroups->second; string temp = itGroups->first + "="; for (set::iterator itNames = tempNames.begin(); itNames != tempNames.end(); itNames++) { temp += *itNames + "-"; } //rip off last - temp = temp.substr(0, temp.length()-1); inputGroupNames.push_back(temp); } } return 0; } catch(exception& e) { m->errorOut(e, "GetCommandInfoCommand", "getInfo"); exit(1); } } //**********************************************************************************************************************/ mothur-1.36.1/source/commands/getcommandinfocommand.h000066400000000000000000000023341255543666200227430ustar00rootroot00000000000000#ifndef GETCOMMANDINFOCOMMAND_H #define 
GETCOMMANDINFOCOMMAND_H /* * getcommandinfo.h * Mothur * * Created by westcott on 4/6/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "commandfactory.hpp" /**********************************************************/ class GetCommandInfoCommand : public Command { public: GetCommandInfoCommand(string); GetCommandInfoCommand() { abort = true; calledHelp = true; setParameters(); } ~GetCommandInfoCommand(){} vector setParameters(); string getCommandName() { return "get.commandinfo"; } string getCommandCategory() { return "Hidden"; } string getHelpString(); string getOutputPattern(string) { return ""; } string getCitation() { return "no citation"; } string getDescription() { return "get.commandinfo"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CommandFactory* commandFactory; string output; bool abort; vector outputNames; int getInfo(vector, vector&, vector&, vector&, vector&, vector&, map&); }; /**********************************************************/ #endif mothur-1.36.1/source/commands/getcoremicrobiomecommand.cpp000066400000000000000000000523731255543666200240120ustar00rootroot00000000000000// // GetCoreMicroBiomeCommand.cpp // Mothur // // Created by John Westcott on 5/8/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "getcoremicrobiomecommand.h" //********************************************************************************************************************** vector GetCoreMicroBiomeCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "SharedRel", "SharedRel", "none","coremicrobiom",false,false, true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRel", "SharedRel", "none","coremicrobiom",false,false, true); parameters.push_back(prelabund); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter poutput("output", "Multiple", "fraction-count", "fraction", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter pabund("abundance", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pabund); CommandParameter psamples("samples", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(psamples); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetCoreMicroBiomeCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.coremicrobiome determines the fraction of OTUs that are found in varying numbers of 
samples for different minimum relative abundances.\n"; helpString += "The get.coremicrobiome parameters are: shared, relabund, groups, label, output, abundance and samples. Shared or relabund is required.\n"; helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like analyzed.\n"; helpString += "The output parameter is used to specify whether you would like the fraction of OTUs or the OTU count outputted. Options are fraction or count. Default=fraction.\n"; helpString += "The abundance parameter allows you to specify an abundance you would like the OTU names outputted for. Values from 1 to 100 will be treated as percentages. For example relabund=0.01 can be set with abundance=1 or abundance=0.01. For abundance values < 1 percent, abundance=0.001 will specify OTUs with relative abundance of 0.001.\n"; helpString += "The samples parameter allows you to specify the minimum number of samples you would like the OTU names outputted for. 
Must be an integer between 1 and the number of samples in your file.\n"; helpString += "The get.coremicrobiome command should be in the following format: get.coremicrobiome(shared=yourSharedFile)\n"; helpString += "get.coremicrobiome(shared=final.an.shared, abundance=30)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetCoreMicroBiomeCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "coremicrobiome") { pattern = "[filename],[tag],core.microbiome"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetCoreMicroBiomeCommand::GetCoreMicroBiomeCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["coremicrobiome"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "GetCoreMicroBiomeCommand"); exit(1); } } //********************************************************************************************************************** GetCoreMicroBiomeCommand::GetCoreMicroBiomeCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid parameters for this command vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); ValidParameters validParameter; map<string,string>::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); 
it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["coremicrobiome"] = tempOutNames; //check for parameters sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputFileName = sharedfile; format = "sharedfile"; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { inputFileName = relabundfile; format = "relabund"; m->setRelAbundFile(relabundfile); } if ((relabundfile == "") && (sharedfile == "")) { //is there are current file available for either of these? 
//give priority to shared, then relabund sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFileName = sharedfile; format="sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFileName = relabundfile; format="relabund"; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared or relabund."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputFileName); //if user entered a file with a path then preserve it } string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } output = validParameter.validFile(parameters, "output", false); if(output == "not found"){ output = "fraction"; } if ((output != "fraction") && (output != "count")) { m->mothurOut(output + " is not a valid output form. Options are fraction and count. I will use fraction."); m->mothurOutEndLine(); output = "fraction"; } string temp = validParameter.validFile(parameters, "abundance", false); if (temp == "not found"){ temp = "-1"; } m->mothurConvert(temp, abund); if (abund != -1) { if ((abund < 0) || (abund > 100)) { m->mothurOut(toString(abund) + " is not a valid number for abund. 
Must be between 0 and 100.\n"); } if (abund < 1) { //convert string temp = toString(abund); string factorString = "1"; bool found = false; for (int i = 0; i < temp.length(); i++) { if (temp[i] == '.') { found = true; } else { if (found) { factorString += "0"; } } } cout << factorString << endl; m->mothurConvert(factorString, factor); }else { factor = 100; abund /= 100; } }else { factor = 100; } temp = validParameter.validFile(parameters, "samples", false); if (temp == "not found"){ temp = "-1"; } m->mothurConvert(temp, samples); } } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "GetCoreMicroBiomeCommand"); exit(1); } } //********************************************************************************************************************** int GetCoreMicroBiomeCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(inputFileName, format); vector lookup = input.getSharedRAbundFloatVectors(); string lastLabel = lookup[0]->getLabel(); if (samples != -1) { if ((samples < 1) || (samples > lookup.size())) { m->mothurOut(toString(samples) + " is not a valid number for samples. Must be an integer between 1 and the number of samples in your file. Your file contains " + toString(lookup.size()) + " samples, so I will use that.\n"); samples = lookup.size(); } } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createTable(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createTable(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //get next line to process lookup = input.getSharedRAbundFloatVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createTable(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetCoreMicroBiomeCommand::createTable(vector& lookup){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[tag]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("coremicrobiome", variables); outputNames.push_back(outputFileName); outputTypes["coremicrobiome"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); int numSamples = lookup.size(); int numOtus = lookup[0]->getNumBins(); //table is 100 by numsamples //question we are answering is: what fraction of OTUs in a study have a relative abundance at or above %X //in at least %Y samples. 
x goes from 0 to 100, y from 1 to numSamples vector< vector > table; table.resize(factor+1); for (int i = 0; i < table.size(); i++) { table[i].resize(numSamples, 0.0); } map > otuNames; if ((abund != -1) && (samples == -1)) { //fill with all samples for (int i = 0; i < numSamples; i++) { vector temp; otuNames[i+1] = temp; } }else if ((abund == -1) && (samples != -1)) { //fill with all relabund for (int i = 0; i < factor+1; i++) { vector temp; otuNames[i] = temp; } }else if ((abund != -1) && (samples != -1)) { //only one line is wanted vector temp; int thisAbund = abund*factor; otuNames[thisAbund] = temp; } for (int i = 0; i < numOtus; i++) { if (m->control_pressed) { break; } //count number of samples in this otu with a relabund >= spot in count vector counts; counts.resize(factor+1, 0); for (int j = 0; j < lookup.size(); j++) { double relabund = lookup[j]->getAbundance(i); int wholeRelabund = (int) (floor(relabund*factor)); for (int k = 0; k < wholeRelabund+1; k++) { counts[k]++; } } //add this otus info to table for (int j = 0; j < table.size(); j++) { for (int k = 0; k < counts[j]; k++) { table[j][k]++; } if ((abund == -1) && (samples != -1)) { //we want all OTUs with this number of samples if (counts[j] >= samples) { otuNames[j].push_back(m->currentSharedBinLabels[i]); } }else if ((abund != -1) && (samples == -1)) { //we want all OTUs with this relabund if (j == (abund*factor)) { for (int k = 0; k < counts[j]; k++) { otuNames[k+1].push_back(m->currentSharedBinLabels[i]); } } }else if ((abund != -1) && (samples != -1)) { //we want only OTUs with this relabund for this number of samples if ((j == (abund*factor)) && (counts[j] >= samples)) { otuNames[j].push_back(m->currentSharedBinLabels[i]); } } } } //format output if (output == "fraction") { out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); } out << "NumSamples\t"; //convert table counts to percents int precisionLength = (toString(factor)).length(); for (int i = 0; i < table.size(); i++) { out << 
"Relabund-" << setprecision(precisionLength-1)<< (float)(i/(float)factor) << "\t"; if (m->control_pressed) { break; } for (int j = 0; j < table[i].size(); j++) { if (output == "fraction") { table[i][j] /= (double) numOtus; } } } out << endl; for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { break; } out << i+1; for (int j = 0; j < table.size(); j++) { out << setprecision(6) << '\t' << table[j][i]; } out << endl; } out.close(); if (m->control_pressed) { return 0; } if ((samples != -1) || (abund != -1)) { string outputFileName2 = outputDir + m->getRootName(m->getSimpleName(inputFileName)) + lookup[0]->getLabel() + ".core.microbiomelist"; outputNames.push_back(outputFileName2); outputTypes["coremicrobiome"].push_back(outputFileName2); ofstream out2; m->openOutputFile(outputFileName2, out2); if ((abund == -1) && (samples != -1)) { //we want all OTUs with this number of samples out2 << "Relabund\tOTUList_for_samples=" << samples << "\n"; }else if ((abund != -1) && (samples == -1)) { //we want all OTUs with this relabund out2 << "Samples\tOTUList_for_abund=" << abund*factor << "\n"; }else if ((abund != -1) && (samples != -1)) { //we want only OTUs with this relabund for this number of samples out2 << "Relabund\tOTUList_for_samples=" << samples << "\n"; } for (map >::iterator it = otuNames.begin(); it != otuNames.end(); it++) { if (m->control_pressed) { break; } vector temp = it->second; string list = m->makeList(temp); if ((abund != -1) && (samples == -1)) { //fill with all samples out2 << it->first << '\t' << list << endl; }else { //fill with relabund out2 << fixed << showpoint << setprecision(precisionLength-1) << (it->first/(float)(factor)) << '\t' << list << endl; } } out2.close(); } return 0; } catch(exception& e) { m->errorOut(e, "GetCoreMicroBiomeCommand", "createTable"); exit(1); } } //********************************************************************************************************************** 
mothur-1.36.1/source/commands/getcoremicrobiomecommand.h000066400000000000000000000032471255543666200234530ustar00rootroot00000000000000#ifndef Mothur_getcoremicrobiomcommand_h #define Mothur_getcoremicrobiomcommand_h // // GetCoreMicroBiomeCommand.h // Mothur // // Created by John Westcott on 5/8/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "command.hpp" #include "inputdata.h" /**************************************************************************************************/ class GetCoreMicroBiomeCommand : public Command { public: GetCoreMicroBiomeCommand(string); GetCoreMicroBiomeCommand(); ~GetCoreMicroBiomeCommand(){} vector setParameters(); string getCommandName() { return "get.coremicrobiome"; } string getCommandCategory() { return "OTU-Based Approaches"; } //commmand category choices: Sequence Processing, OTU-Based Approaches, Hypothesis Testing, Phylotype Analysis, General, Clustering and Hidden string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.coremicrobiome"; } string getDescription() { return "determines the fraction of OTUs that are found in varying numbers of samples for different minimum relative abundances"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string relabundfile, sharedfile, inputFileName, format, output; bool allLines; vector Groups; set labels; bool abort; string outputDir; vector outputNames; float abund; int samples, factor; int createTable(vector&); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/getcurrentcommand.cpp000066400000000000000000000156311255543666200224720ustar00rootroot00000000000000/* * getcurrentcommand.cpp * Mothur * * Created by westcott on 3/16/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "getcurrentcommand.h" //********************************************************************************************************************** vector<string> GetCurrentCommand::setParameters(){ try { CommandParameter pclear("clear", "String", "", "", "", "", "","",false,false); parameters.push_back(pclear); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetCurrentCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetCurrentCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.current command outputs the current files saved by mothur.\n"; helpString += "The get.current command has one parameter: clear.\n"; helpString += "The clear parameter is used to indicate which file types you would like to clear values for; multiple types can be separated by dashes.\n"; helpString += "The get.current command should be in the following format: \n"; helpString += "get.current() or get.current(clear=fasta-name-accnos)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetCurrentCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** GetCurrentCommand::GetCurrentCommand(){ try { abort = true; calledHelp = true; setParameters(); } catch(exception& e) { m->errorOut(e, "GetCurrentCommand", "GetCurrentCommand"); exit(1); } } 
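// Illustrative sketch, not part of mothur: the help text above says that several
// file types can be passed to clear separated by dashes, e.g. clear=fasta-name-accnos.
// The function below shows one way such a value could be split into individual type
// names. splitAtDashSketch is a hypothetical name for this example; it is not
// m->splitAtDash, whose exact behavior (for instance around empty tokens) may differ.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (assumption, not a mothur API).
// Splits a dash-separated list such as "fasta-name-accnos" into its parts,
// skipping empty pieces produced by leading, trailing, or doubled dashes.
std::vector<std::string> splitAtDashSketch(const std::string& input) {
    std::vector<std::string> pieces;
    std::istringstream stream(input);
    std::string piece;
    while (std::getline(stream, piece, '-')) {
        if (!piece.empty()) { pieces.push_back(piece); }
    }
    return pieces;
}
```

// Each resulting name would then be matched against the known file types,
// as the chain of comparisons in execute() does for the types vector.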
//********************************************************************************************************************** GetCurrentCommand::GetCurrentCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } clearTypes = validParameter.validFile(parameters, "clear", false); if (clearTypes == "not found") { clearTypes = ""; } else { m->splitAtDash(clearTypes, types); } } } catch(exception& e) { m->errorOut(e, "GetCurrentCommand", "GetCurrentCommand"); exit(1); } } //********************************************************************************************************************** int GetCurrentCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cFactory = CommandFactory::getInstance(); //user wants to clear a type if (types.size() != 0) { for (int i = 0; i < types.size(); i++) { if (m->control_pressed) { break; } //look for file types if (types[i] == "fasta") { m->setFastaFile(""); }else if (types[i] == "qfile") { m->setQualFile(""); }else if (types[i] == "phylip") { m->setPhylipFile(""); }else if (types[i] == "column") { m->setColumnFile(""); }else if (types[i] == "list") { m->setListFile(""); }else if (types[i] == "rabund") { m->setRabundFile(""); }else if (types[i] == "sabund") { m->setSabundFile(""); }else if (types[i] == "name") { m->setNameFile(""); }else if (types[i] == "group") { m->setGroupFile(""); }else if (types[i] == "order") { m->setOrderFile(""); }else if (types[i] == 
"ordergroup") { m->setOrderGroupFile(""); }else if (types[i] == "tree") { m->setTreeFile(""); }else if (types[i] == "shared") { m->setSharedFile(""); }else if (types[i] == "relabund") { m->setRelAbundFile(""); }else if (types[i] == "design") { m->setDesignFile(""); }else if (types[i] == "sff") { m->setSFFFile(""); }else if (types[i] == "oligos") { m->setOligosFile(""); }else if (types[i] == "accnos") { m->setAccnosFile(""); }else if (types[i] == "taxonomy") { m->setTaxonomyFile(""); }else if (types[i] == "flow") { m->setFlowFile(""); }else if (types[i] == "biom") { m->setBiomFile(""); }else if (types[i] == "count") { m->setCountTableFile(""); }else if (types[i] == "summary") { m->setSummaryFile(""); }else if (types[i] == "file") { m->setFileFile(""); }else if (types[i] == "processors") { m->setProcessors("1"); }else if (types[i] == "all") { m->clearCurrentFiles(); }else { m->mothurOut("[ERROR]: mothur does not save a current file for " + types[i]); m->mothurOutEndLine(); } } } if (m->hasCurrentFiles()) { m->mothurOutEndLine(); m->mothurOut("Current files saved by mothur:"); m->mothurOutEndLine(); m->printCurrentFiles(); } string inputDir = cFactory->getInputDir(); if (inputDir != "") { m->mothurOutEndLine(); m->mothurOut("Current input directory saved by mothur: " + inputDir); m->mothurOutEndLine(); } string outputDir = cFactory->getOutputDir(); if (outputDir != "") { m->mothurOutEndLine(); m->mothurOut("Current output directory saved by mothur: " + outputDir); m->mothurOutEndLine(); } string defaultPath = m->getDefaultPath(); if (defaultPath != "") { m->mothurOutEndLine(); m->mothurOut("Current default directory saved by mothur: " + defaultPath); m->mothurOutEndLine(); } string temp = "./"; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else temp = ".\\"; #endif temp = m->getFullPathName(temp); m->mothurOutEndLine(); m->mothurOut("Current working directory: " + temp); m->mothurOutEndLine(); return 0; } 
catch(exception& e) { m->errorOut(e, "GetCurrentCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getcurrentcommand.h000066400000000000000000000017371255543666200221410ustar00rootroot00000000000000#ifndef GETCURRENTCOMMAND_H #define GETCURRENTCOMMAND_H /* * getcurrentcommand.h * Mothur * * Created by westcott on 3/16/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "commandfactory.hpp" class GetCurrentCommand : public Command { public: GetCurrentCommand(string); GetCurrentCommand(); ~GetCurrentCommand() {} vector setParameters(); string getCommandName() { return "get.current"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string) { return ""; } string getCitation() { return "http://www.mothur.org/wiki/Get.current"; } string getDescription() { return "get current files saved by mothur"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CommandFactory* cFactory; vector outputNames; bool abort; string clearTypes; vector types; }; #endif mothur-1.36.1/source/commands/getdistscommand.cpp000066400000000000000000000414441255543666200221370ustar00rootroot00000000000000// // getdistscommand.cpp // Mothur // // Created by Sarah Westcott on 1/28/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "getdistscommand.h" //********************************************************************************************************************** vector GetDistsCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "PhylipColumn", "none","phylip",false,false,true); parameters.push_back(pphylip); CommandParameter pcolumn("column", "InputTypes", "", "", "none", "PhylipColumn", "none","column",false,false,true); parameters.push_back(pcolumn); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(paccnos); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetDistsCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.dists command selects distances from a phylip or column file related to groups or sequences listed in an accnos file.\n"; helpString += "The get.dists command parameters are accnos, phylip and column.\n"; helpString += "The get.dists command should be in the following format: get.dists(accnos=yourAccnos, phylip=yourPhylip).\n"; helpString += "Example get.dists(accnos=final.accnos, phylip=final.an.thetayc.0.03.lt.ave.dist).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
accnos), '=' and parameters (i.e.final.accnos).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetDistsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "phylip") { pattern = "[filename],pick,[extension]"; } else if (type == "column") { pattern = "[filename],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetDistsCommand::GetDistsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "GetDistsCommand"); exit(1); } } //********************************************************************************************************************** GetDistsCommand::GetDistsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["column"] = tempOutNames; outputTypes["phylip"] = 
tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["accnos"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = m->getAccnosFile(); if (accnosfile != "") { m->mothurOut("Using " + accnosfile + " as input file for the accnos parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no valid accnos file and accnos is required."); m->mothurOutEndLine(); abort = true; } }else { m->setAccnosFile(accnosfile); } phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { columnfile = ""; abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { m->setColumnFile(columnfile); } if ((phylipfile == "") && (columnfile == "")) { //is there a current file available for either of these? //give priority to column, then phylip columnfile = m->getColumnFile(); if (columnfile != "") { m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a phylip or column file."); m->mothurOutEndLine(); abort = true; } } } } } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "GetDistsCommand"); exit(1); } } //********************************************************************************************************************** int GetDistsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get names you want to keep names = m->readAccnos(accnosfile); if (m->control_pressed) { return 0; } //read through the correct file and output lines you want to keep if (phylipfile != "") { readPhylip(); } if (columnfile != "") { readColumn(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set the phylip and column output files as the new current files string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setPhylipFile(current); } } itTypes = outputTypes.find("column"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setColumnFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetDistsCommand::readPhylip(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(phylipfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(phylipfile)); variables["[extension]"] = m->getExtension(phylipfile); string outputFileName = getOutputFileName("phylip", 
variables); ifstream in; m->openInputFile(phylipfile, in); float distance; int square, nseqs; string name; unsigned int row; set rows; //converts names in names to a index row = 0; string numTest; in >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } if (names.count(name) != 0) { rows.insert(row); } row++; //is the matrix square? char d; while((d=in.get()) != EOF){ if(isalnum(d)){ square = 1; in.putback(d); for(int i=0;i> distance; } break; } if(d == '\n'){ square = 0; break; } } //map name to row/column if(square == 0){ for(int i=1;i> name; if (names.count(name) != 0) { rows.insert(row); } row++; for(int j=0;jcontrol_pressed) { in.close(); return 0; } in >> distance; } } } else{ for(int i=1;i> name; if (names.count(name) != 0) { rows.insert(row); } row++; for(int j=0;jcontrol_pressed) { in.close(); return 0; } in >> distance; } } } in.close(); if (m->control_pressed) { return 0; } //read through file only printing rows and columns of seqs in names ifstream inPhylip; m->openInputFile(phylipfile, inPhylip); inPhylip >> numTest; ofstream out; m->openOutputFile(outputFileName, out); outputTypes["phylip"].push_back(outputFileName); outputNames.push_back(outputFileName); out << names.size() << endl; unsigned int count = 0; if(square == 0){ for(int i=0;i> name; bool ignoreRow = false; if (names.count(name) == 0) { ignoreRow = true; } else{ out << name << '\t'; count++; } for(int j=0;jcontrol_pressed) { inPhylip.close(); out.close(); return 0; } inPhylip >> distance; if (!ignoreRow) { //is this a column we want if(rows.count(j) != 0) { out << distance << '\t'; } } } if (!ignoreRow) { out << endl; } } } else{ for(int i=0;i> name; bool ignoreRow = false; if (names.count(name) == 0) { ignoreRow = true; } else{ out << name << '\t'; count++; } for(int j=0;jcontrol_pressed) { inPhylip.close(); out.close(); return 0; } 
inPhylip >> distance; if (!ignoreRow) { //is this a column we want if(rows.count(j) != 0) { out << distance << '\t'; } } } if (!ignoreRow) { out << endl; } } } inPhylip.close(); out.close(); if (count == 0) { m->mothurOut("Your file does NOT contain distances related to groups or sequences listed in the accnos file."); m->mothurOutEndLine(); } else if (count != names.size()) { m->mothurOut("[WARNING]: Your accnos file contains " + toString(names.size()) + " groups or sequences, but I only found " + toString(count) + " of them in the phylip file."); m->mothurOutEndLine(); //rewrite with new number m->renameFile(outputFileName, outputFileName+".temp"); ofstream out2; m->openOutputFile(outputFileName, out2); out2 << count << endl; ifstream in3; m->openInputFile(outputFileName+".temp", in3); in3 >> nseqs; m->gobble(in3); char buffer[4096]; while (!in3.eof()) { in3.read(buffer, 4096); out2.write(buffer, in3.gcount()); } in3.close(); out2.close(); m->mothurRemove(outputFileName+".temp"); } m->mothurOut("Selected " + toString(count) + " groups or sequences from your phylip file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "readPhylip"); exit(1); } } //********************************************************************************************************************** int GetDistsCommand::readColumn(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(columnfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(columnfile)); variables["[extension]"] = m->getExtension(columnfile); string outputFileName = getOutputFileName("column", variables); outputTypes["column"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(columnfile, in); set foundNames; string firstName, secondName; float distance; while (!in.eof()) { if (m->control_pressed) { 
out.close(); in.close(); return 0; } in >> firstName >> secondName >> distance; m->gobble(in); //are both names in the accnos file if ((names.count(firstName) != 0) && (names.count(secondName) != 0)) { out << firstName << '\t' << secondName << '\t' << distance << endl; foundNames.insert(firstName); foundNames.insert(secondName); } } in.close(); out.close(); if (foundNames.size() == 0) { m->mothurOut("Your file does NOT contain distances related to groups or sequences listed in the accnos file."); m->mothurOutEndLine(); } else if (foundNames.size() != names.size()) { m->mothurOut("[WARNING]: Your accnos file contains " + toString(names.size()) + " groups or sequences, but I only found " + toString(foundNames.size()) + " of them in the column file."); m->mothurOutEndLine(); } m->mothurOut("Selected " + toString(foundNames.size()) + " groups or sequences from your column file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetDistsCommand", "readColumn"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getdistscommand.h000066400000000000000000000020301255543666200215700ustar00rootroot00000000000000// // getdistscommand.h // Mothur // // Created by Sarah Westcott on 1/28/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_getdistscommand_h #define Mothur_getdistscommand_h #include "command.hpp" class GetDistsCommand : public Command { public: GetDistsCommand(string); GetDistsCommand(); ~GetDistsCommand(){} vector setParameters(); string getCommandName() { return "get.dists"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.dists"; } string getDescription() { return "gets distances from a phylip or column file related to groups or sequences listed in an accnos file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: set names; string accnosfile, phylipfile, columnfile, outputDir; bool abort; vector outputNames; int readPhylip(); int readColumn(); }; #endif mothur-1.36.1/source/commands/getgroupcommand.cpp000066400000000000000000000123431255543666200221410ustar00rootroot00000000000000/* * getgroupcommand.cpp * Mothur * * Created by Thomas Ryabin on 2/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "getgroupcommand.h" #include "inputdata.h" //********************************************************************************************************************** vector GetgroupCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "current", "none", "none", "none","",false,true, true); parameters.push_back(pshared); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetgroupCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetgroupCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.group command parameter is shared, and it is required if you have no valid current shared file.\n"; helpString += "You may not use any other input file parameters with the get.group command.\n"; helpString += "The get.group command should be in the following format: \n"; helpString += "get.group()\n"; helpString += "Example get.group().\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetgroupCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** GetgroupCommand::GetgroupCommand(){ try { abort = true; calledHelp = true; setParameters(); } catch(exception& e) { m->errorOut(e, "GetgroupCommand", "GetgroupCommand"); exit(1); } } //********************************************************************************************************************** 
GetgroupCommand::GetgroupCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } //get shared file sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); } } } catch(exception& e) { m->errorOut(e, "GetgroupCommand", "GetgroupCommand"); exit(1); } } //********************************************************************************************************************** int GetgroupCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); for (int i = 0; i < lookup.size(); i++) { m->mothurOut(lookup[i]->getGroup()); m->mothurOutEndLine(); delete lookup[i]; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetgroupCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getgroupcommand.h000066400000000000000000000016341255543666200216070ustar00rootroot00000000000000#ifndef GETGROUPCOMMAND_H #define GETGROUPCOMMAND_H /* * getgroupcommand.h * Mothur * * Created by Thomas Ryabin on 2/2/09. 
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" class GetgroupCommand : public Command { public: GetgroupCommand(string); GetgroupCommand(); ~GetgroupCommand() {} vector setParameters(); string getCommandName() { return "get.group"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string) { return ""; } string getCitation() { return "http://www.mothur.org/wiki/Get.group"; } string getDescription() { return "outputs group names"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string outputFile, sharedfile, outputDir; vector outputNames; ofstream out; ifstream in; bool abort; }; #endif mothur-1.36.1/source/commands/getgroupscommand.cpp000066400000000000000000001163611255543666200223310ustar00rootroot00000000000000/* * getgroupscommand.cpp * Mothur * * Created by westcott on 11/10/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "getgroupscommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "sharedutilities.h" #include "inputdata.h" #include "designmap.h" //********************************************************************************************************************** vector GetGroupsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "FNGLT","fasta",false,false, true); parameters.push_back(pfasta); CommandParameter pshared("shared", "InputTypes", "", "", "none", "sharedGroup", "none","shared",false,false, true); parameters.push_back(pshared); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false, true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false, true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "sharedGroup", "FNGLT","group",false,false, 
true); parameters.push_back(pgroup); CommandParameter pdesign("design", "InputTypes", "", "", "none", "sharedGroup", "FNGLT","design",false,false, true); parameters.push_back(pdesign); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "FNGLT","list",false,false, true); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "FNGLT","taxonomy",false,false, true); parameters.push_back(ptaxonomy); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetGroupsCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.groups command selects sequences from a specific group or set of groups from the following file types: fasta, name, count, group, list, taxonomy, design or shared file.\n"; helpString += "It outputs a file containing the sequences in those specified groups, or a sharedfile containing only those groups.\n"; helpString += "The get.groups command parameters are accnos, fasta, name, count, group, list, taxonomy, shared, design and groups. 
The group or count parameter is required, unless you have a current group or count file, or are using a shared file.\n"; helpString += "You must also provide an accnos containing the list of groups to get or set the groups parameter to the groups you wish to select.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like. You can separate group names with dashes.\n"; helpString += "The get.groups command should be in the following format: get.groups(accnos=yourAccnos, fasta=yourFasta, group=yourGroupFile).\n"; helpString += "Example get.groups(accnos=amazon.accnos, fasta=amazon.fasta, group=amazon.groups).\n"; helpString += "or get.groups(groups=pasture, fasta=amazon.fasta, group=amazon.groups).\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetGroupsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],pick,[extension]"; } else if (type == "taxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "name") { pattern = "[filename],pick,[extension]"; } else if (type == "group") { pattern = "[filename],pick,[extension]"; } else if (type == "count") { pattern = "[filename],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[tag],pick,[extension]"; } else if (type == "shared") { pattern = "[filename],[tag],pick,[extension]"; } else if (type == "design") { pattern = "[filename],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "getOutputPattern"); exit(1); } } 
//********************************************************************************************************************** GetGroupsCommand::GetGroupsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["design"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "GetGroupsCommand"); exit(1); } } //********************************************************************************************************************** GetGroupsCommand::GetGroupsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["design"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input 
directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = ""; } else { m->setAccnosFile(accnosfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { taxfile = ""; abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } groups = 
validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { designfile = ""; abort = true; } else if (designfile == "not found") { designfile = ""; } else { m->setDesignFile(designfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } if ((sharedfile == "") && (groupfile == "") && (designfile == "") && (countfile == "")) { //is there are current file available for any of these? 
if ((namefile != "") || (fastafile != "") || (listfile != "") || (taxfile != "")) { //give priority to group, then shared groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile, countfile or sharedfile and one is required."); m->mothurOutEndLine(); abort = true; } } } }else { //give priority to shared, then group sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile, designfile, countfile or sharedfile and one is required."); m->mothurOutEndLine(); abort = true; } } } } } } if ((accnosfile == "") && (Groups.size() == 0)) { m->mothurOut("You must provide an accnos file or specify groups using the groups parameter."); m->mothurOutEndLine(); abort = true; } if ((fastafile == "") && (namefile == "") && (countfile == "") && (groupfile == "") && (designfile == "") && (sharedfile == "") && (listfile == "") && (taxfile == "")) { 
m->mothurOut("You must provide at least one of the following: fasta, name, taxonomy, group, shared, design, count or list."); m->mothurOutEndLine(); abort = true; } if (((groupfile == "") && (countfile == "")) && ((namefile != "") || (fastafile != "") || (listfile != "") || (taxfile != ""))) { m->mothurOut("If using a fasta, name, taxonomy or list file, then you must provide a group or count file."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((namefile == "") && ((fastafile != "") || (taxfile != ""))){ vector<string> files; files.push_back(fastafile); files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "GetGroupsCommand"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get groups you want to keep if (accnosfile != "") { m->readAccnos(accnosfile, Groups); m->setGroups(Groups); } if (groupfile != "") { groupMap = new GroupMap(groupfile); groupMap->readMap(); //make sure groups are valid //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil* util = new SharedUtil(); vector<string> gNamesOfGroups = groupMap->getNamesOfGroups(); util->setGroups(Groups, gNamesOfGroups); m->setGroups(Groups); groupMap->setNamesOfGroups(gNamesOfGroups); delete util; //fill names with names of sequences that are from the groups we want to keep fillNames(); delete groupMap; }else if (countfile != ""){ if ((fastafile != "") || (listfile != "") || (taxfile != "")) { m->mothurOut("\n[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.\n\n"); } CountTable ct; ct.readTable(countfile, true, false); if (!ct.hasGroupInfo()) { m->mothurOut("[ERROR]: your count file does not contain group info, aborting.\n"); return
0; } vector gNamesOfGroups = ct.getNamesOfGroups(); SharedUtil util; util.setGroups(Groups, gNamesOfGroups); m->setGroups(Groups); for (int i = 0; i < Groups.size(); i++) { vector thisGroupsSeqs = ct.getNamesOfSeqs(Groups[i]); for (int j = 0; j < thisGroupsSeqs.size(); j++) { names.insert(thisGroupsSeqs[j]); } } } if (m->control_pressed) { return 0; } //read through the correct file and output lines you want to keep if (namefile != "") { readName(); } if (fastafile != "") { readFasta(); } if (groupfile != "") { readGroup(); } if (countfile != "") { readCount(); } if (listfile != "") { readList(); } if (taxfile != "") { readTax(); } if (sharedfile != "") { readShared(); } if (designfile != "") { readDesign(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); 
} } itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } itTypes = outputTypes.find("design"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setDesignFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readFasta(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; bool wroteSomething = false; int selectedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } Sequence currSeq(in); name = currSeq.getName(); if (name != "") { //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething = true; currSeq.printSequence(out); selectedCount++; }else{ //if you are not in the accnos file check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(name); if (it != uniqueToRedundant.end()) { wroteSomething = true; currSeq.setName(it->second); currSeq.printSequence(out); selectedCount++; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does NOT contain sequences from the 
groups you wish to get."); m->mothurOutEndLine(); } outputTypes["fasta"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your fasta file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readShared(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } InputData input(sharedfile, "sharedfile"); vector<SharedRAbundVector*> lookup = input.getSharedRAbundVectors(); map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); bool wroteSomething = false; while(lookup[0] != NULL) { variables["[tag]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); if (m->control_pressed) { out.close(); m->mothurRemove(outputFileName); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } lookup[0]->printHeaders(out); for (int i = 0; i < lookup.size(); i++) { out << lookup[i]->getLabel() << '\t' << lookup[i]->getGroup() << '\t'; lookup[i]->print(out); wroteSomething = true; } //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(); out.close(); } if (wroteSomething == false) { m->mothurOut("Your file does NOT contain any of the groups you wish to get."); m->mothurOutEndLine(); } string groupsString = ""; for (int i = 0; i < Groups.size(); i++) { if (i != 0) { groupsString += ", "; } groupsString += Groups[i]; } m->mothurOut("Selected groups: " + groupsString + " 
from your shared file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readShared"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readList(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); ifstream in; m->openInputFile(listfile, in); bool wroteSomething = false; int selectedCount = 0; while(!in.eof()){ selectedCount = 0; //read in list vector ListVector list(in); variables["[tag]"] = list.getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); vector binLabels = list.getLabels(); vector newBinLabels; //make a new list vector ListVector newList; newList.setLabel(list.getLabel()); //for each bin for (int i = 0; i < list.getNumBins(); i++) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } //parse out names that are in accnos file string binnames = list.get(i); vector thisBinNames; m->splitAtComma(binnames, thisBinNames); string newNames = ""; for (int j = 0; j < thisBinNames.size(); j++) { string name = thisBinNames[j]; //if that name is in the .accnos file, add it if (names.count(name) != 0) { newNames += name + ","; selectedCount++; } else{ //if you are not in the accnos file check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(name); if (it != uniqueToRedundant.end()) { newNames += it->second + ","; selectedCount++; } } } //if there are names in this bin add to new list if (newNames != "") { newNames = newNames.substr(0, newNames.length()-1); //rip off extra 
comma newList.push_back(newNames); newBinLabels.push_back(binLabels[i]); } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->gobble(in); out.close(); } in.close(); if (wroteSomething == false) { m->mothurOut("Your file does NOT contain sequences from the groups you wish to get."); m->mothurOutEndLine(); } m->mothurOut("Selected " + toString(selectedCount) + " sequences from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; int selectedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; validSecond.clear(); for (int i = 0; i < parsedNames.size(); i++) { if (names.count(parsedNames[i]) != 0) { validSecond.push_back(parsedNames[i]); } } selectedCount += validSecond.size(); //if the name in the first column is in the set then print it and any other names in second column also in set if (names.count(firstCol) != 0) { wroteSomething = true; out << firstCol << '\t'; //you know you have at least one valid second since first column is valid for (int i 
= 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; //make first name in set you come to first column and then add the remaining names to second column }else { //you want part of this row if (validSecond.size() != 0) { wroteSomething = true; out << validSecond[0] << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; uniqueToRedundant[firstCol] = validSecond[0]; } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does NOT contain sequences from the groups you wish to get."); m->mothurOutEndLine(); } outputTypes["name"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your name file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readName"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readGroup(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(groupfile, in); string name, group; bool wroteSomething = false; int selectedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> group; //read from second column //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething 
= true; out << name << '\t' << group << endl; selectedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does NOT contain sequences from the groups you wish to get."); m->mothurOutEndLine(); } outputTypes["group"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your group file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readCount(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string outputFileName = getOutputFileName("count", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(countfile, in); bool wroteSomething = false; int selectedCount = 0; string headers = m->getline(in); m->gobble(in); vector columnHeaders = m->splitWhiteSpace(headers); vector groups; map originalGroupIndexes; map GroupIndexes; set indexOfGroupsChosen; for (int i = 2; i < columnHeaders.size(); i++) { groups.push_back(columnHeaders[i]); originalGroupIndexes[i-2] = columnHeaders[i]; } //sort groups to keep consistent with how we store the groups in groupmap sort(groups.begin(), groups.end()); for (int i = 0; i < groups.size(); i++) { GroupIndexes[groups[i]] = i; } sort(Groups.begin(), Groups.end()); out << "Representative_Sequence\ttotal"; for (int i = 0; i < Groups.size(); i++) { out << '\t' << Groups[i]; indexOfGroupsChosen.insert(GroupIndexes[Groups[i]]); } out << endl; string name; int oldTotal; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); 
m->mothurRemove(outputFileName); return 0; } in >> name; m->gobble(in); in >> oldTotal; m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + toString(oldTotal) + "\n"); } if (names.count(name) != 0) { //if group info, then read it vector selectedCounts; int thisTotal = 0; int temp; for (int i = 0; i < groups.size(); i++) { int thisIndex = GroupIndexes[originalGroupIndexes[i]]; in >> temp; m->gobble(in); if (indexOfGroupsChosen.count(thisIndex) != 0) { //we want this group selectedCounts.push_back(temp); thisTotal += temp; } } out << name << '\t' << thisTotal; for (int i = 0; i < selectedCounts.size(); i++) { out << '\t' << selectedCounts[i]; } out << endl; wroteSomething = true; selectedCount+= thisTotal; }else { m->getline(in); } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does NOT contain sequences from the groups you wish to get."); m->mothurOutEndLine(); } outputTypes["count"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your count file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readCount"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readDesign(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(designfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(designfile)); variables["[extension]"] = m->getExtension(designfile); string outputFileName = getOutputFileName("design", variables); DesignMap designMap(designfile); bool wroteSomething = false; ofstream out; m->openOutputFile(outputFileName, out); int numGroupsFound = designMap.printGroups(out, Groups); if (numGroupsFound > 0) { wroteSomething = true; } out.close(); if (wroteSomething == false) { 
m->mothurOut("Your file does NOT contain groups from the groups you wish to get."); m->mothurOutEndLine(); } outputTypes["design"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Selected " + toString(numGroupsFound) + " groups from your design file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readDesign"); exit(1); } } //********************************************************************************************************************** int GetGroupsCommand::readTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile)); variables["[extension]"] = m->getExtension(taxfile); string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(taxfile, in); string name, tax; bool wroteSomething = false; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> tax; //read from second column //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething = true; out << name << '\t' << tax << endl; }else{ //if you are not in the accnos file check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(name); if (it != uniqueToRedundant.end()) { wroteSomething = true; out << it->second << '\t' << tax << endl; } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does NOT contain sequences from the groups you wish to get."); m->mothurOutEndLine(); } outputTypes["taxonomy"].push_back(outputFileName); outputNames.push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "readTax"); exit(1); } } 
//********************************************************************************************************************** int GetGroupsCommand::fillNames(){ try { vector<string> seqs = groupMap->getNamesSeqs(); for (int i = 0; i < seqs.size(); i++) { if (m->control_pressed) { return 0; } string group = groupMap->getGroup(seqs[i]); if (m->inUsersGroups(group, Groups)) { names.insert(seqs[i]); } } return 0; } catch(exception& e) { m->errorOut(e, "GetGroupsCommand", "fillNames"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getgroupscommand.h000066400000000000000000000030601255543666200217650ustar00rootroot00000000000000#ifndef GETGROUPSCOMMAND_H #define GETGROUPSCOMMAND_H /* * getgroupscommand.h * Mothur * * Created by westcott on 11/10/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "groupmap.h" class GetGroupsCommand : public Command { public: GetGroupsCommand(string); GetGroupsCommand(); ~GetGroupsCommand(){} vector<string> setParameters(); string getCommandName() { return "get.groups"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.groups"; } string getDescription() { return "gets sequences from a list, fasta, name, group, shared, design or taxonomy file from a given group or set of groups"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: set<string> names; map<string, string> uniqueToRedundant; //if a namefile is given and the first column name is not selected //then the other files need to change the unique name in their file to match.
//only add the names that need to be changed to keep the map search quick string accnosfile, countfile, fastafile, namefile, groupfile, listfile, designfile, taxfile, outputDir, groups, sharedfile; bool abort; vector outputNames, Groups; GroupMap* groupMap; int readFasta(); int readName(); int readGroup(); int readCount(); int readList(); int readTax(); int fillNames(); int readShared(); int readDesign(); }; #endif mothur-1.36.1/source/commands/getlabelcommand.cpp000066400000000000000000000161101255543666200220600ustar00rootroot00000000000000/* * GetlabelCommand.cpp * Mothur * * Created by Thomas Ryabin on 1/30/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "getlabelcommand.h" //********************************************************************************************************************** vector GetlabelCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false, true); parameters.push_back(plist); CommandParameter prabund("rabund", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false, true); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false, true); parameters.push_back(psabund); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetlabelCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** GetlabelCommand::GetlabelCommand(){ 
try { abort = true; calledHelp = true; setParameters(); } catch(exception& e) { m->errorOut(e, "GetlabelCommand", "GetlabelCommand"); exit(1); } } //********************************************************************************************************************** string GetlabelCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.label command parameters are list, sabund and rabund files.\n"; helpString += "The get.label command should be in the following format: \n"; helpString += "get.label()\n"; helpString += "Example get.label().\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetlabelCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** GetlabelCommand::GetlabelCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string, string> parameters = parser.getParameters(); map<string, string>::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the inputdir parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("rabund"); //user has given a rabund file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone.
if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a sabund file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a list file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { format = "sabund"; inputfile = sabundfile; m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { rabundfile = ""; abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { format = "rabund"; inputfile = rabundfile; m->setRabundFile(rabundfile); } if ((listfile == "") && (rabundfile == "") && (sabundfile == "")) { //is there a current file available for any of these? 
//give priority to list, then rabund, then sabund //if there is a current list file, use it listfile = m->getListFile(); if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { rabundfile = m->getRabundFile(); if (rabundfile != "") { inputfile = rabundfile; format = "rabund"; m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); } else { sabundfile = m->getSabundFile(); if (sabundfile != "") { inputfile = sabundfile; format = "sabund"; m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a list, sabund or rabund file."); m->mothurOutEndLine(); abort = true; } } } } } } catch(exception& e) { m->errorOut(e, "GetlabelCommand", "GetlabelCommand"); exit(1); } } //********************************************************************************************************************** int GetlabelCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData* input = new InputData(inputfile, format); OrderVector* order = input->getOrderVector(); string label = ""; while (order != NULL) { if (m->control_pressed) { delete input; delete order; return 0; } label = order->getLabel(); m->mothurOut(label); m->mothurOutEndLine(); delete order; order = input->getOrderVector(); } delete input; return 0; } catch(exception& e) { m->errorOut(e, "GetlabelCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getlabelcommand.h000066400000000000000000000016741255543666200215360ustar00rootroot00000000000000#ifndef GETLABELCOMMAND_H #define GETLABELCOMMAND_H /* * getlabelcommand.h * Mothur * * Created by Thomas Ryabin on 1/30/09. 
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "ordervector.hpp" #include "inputdata.h" class GetlabelCommand : public Command { public: GetlabelCommand(string); GetlabelCommand(); ~GetlabelCommand(){} vector setParameters(); string getCommandName() { return "get.label"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string) { return ""; } string getCitation() { return "http://www.mothur.org/wiki/Get.label"; } string getDescription() { return "outputs labels"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string inputfile, listfile, rabundfile, sabundfile, format; bool abort; vector outputNames; }; #endif mothur-1.36.1/source/commands/getlineagecommand.cpp000066400000000000000000001561111255543666200224130ustar00rootroot00000000000000/* * getlineagecommand.cpp * Mothur * * Created by westcott on 9/24/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "getlineagecommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "counttable.h" #include "inputdata.h" //********************************************************************************************************************** vector GetLineageCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "FNGLT", "none","fasta",false,false, true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "FNGLT", "none","name",false,false, true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "FNGLT", "none","count",false,false, true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "FNGLT", "none","group",false,false, true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false, true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "none", "FNGLT", "none","shared",false,false, true); parameters.push_back(pshared); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "tax", "FNGLT", "none","taxonomy",false,false, true); parameters.push_back(ptaxonomy); CommandParameter pconstaxonomy("constaxonomy", "InputTypes", "", "", "tax", "FNGLT", "none","constaxonomy",false,false, true); parameters.push_back(pconstaxonomy); CommandParameter palignreport("alignreport", "InputTypes", "", "", "none", "FNGLT", "none","alignreport",false,false); parameters.push_back(palignreport); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter ptaxon("taxon", "String", "", "", "", "", "","",false,true, true); parameters.push_back(ptaxon); CommandParameter pdups("dups", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pdups); CommandParameter pseed("seed", "Number", "", "0", "", 
"", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetLineageCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.lineage command reads a taxonomy or constaxonomy file and any of the following file types: fasta, name, group, count, list, shared or alignreport file. The constaxonomy can only be used with a shared or list file.\n"; helpString += "It outputs a file containing only the sequences from the taxonomy file that are from the taxon requested.\n"; helpString += "The get.lineage command parameters are taxon, fasta, name, group, count, list, shared, taxonomy, alignreport, label and dups. You must provide taxonomy or constaxonomy unless you have a valid current taxonomy file.\n"; helpString += "The dups parameter allows you to add the entire line from a name file if you add any name from the line. default=false. \n"; helpString += "The taxon parameter allows you to select the taxons you would like to get and is required.\n"; helpString += "You may enter your taxons with confidence scores, doing so will get only those sequences that belong to the taxonomy and whose cofidence scores is above the scores you give.\n"; helpString += "If they belong to the taxonomy and have confidences below those you provide the sequence will not be selected.\n"; helpString += "The label parameter is used to analyze specific labels in your input. 
\n"; helpString += "The get.lineage command should be in the following format: get.lineage(taxonomy=yourTaxonomyFile, taxon=yourTaxons).\n"; helpString += "Example get.lineage(taxonomy=amazon.silva.taxonomy, taxon=Bacteria;Firmicutes;Bacilli;Lactobacillales;).\n"; helpString += "Note: If you are running mothur in script mode you must wrap the taxon in ' characters so mothur will ignore the ; in the taxon.\n"; helpString += "Example get.lineage(taxonomy=amazon.silva.taxonomy, taxon='Bacteria;Firmicutes;Bacilli;Lactobacillales;').\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetLineageCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],pick,[extension]"; } else if (type == "taxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "constaxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "name") { pattern = "[filename],pick,[extension]"; } else if (type == "group") { pattern = "[filename],pick,[extension]"; } else if (type == "count") { pattern = "[filename],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[distance],pick,[extension]"; } else if (type == "shared") { pattern = "[filename],[distance],pick,[extension]"; } else if (type == "alignreport") { pattern = "[filename],pick.align.report"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** 
GetLineageCommand::GetLineageCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["shared"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "GetLineageCommand"); exit(1); } } //********************************************************************************************************************** GetLineageCommand::GetLineageCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["shared"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command 
factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("alignreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["alignreport"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. 
else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("constaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["constaxonomy"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } alignfile = validParameter.validFile(parameters, "alignreport", true); if (alignfile == "not open") { abort = true; } else if (alignfile == "not found") { alignfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { taxfile = ""; abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { 
sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } constaxonomy = validParameter.validFile(parameters, "constaxonomy", true); if (constaxonomy == "not open") { constaxonomy = ""; abort = true; } else if (constaxonomy == "not found") { constaxonomy = ""; } if ((constaxonomy == "") && (taxfile == "")) { taxfile = m->getTaxonomyFile(); if (taxfile != "") { m->mothurOut("Using " + taxfile + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current taxonomy file and did not provide a constaxonomy file. The taxonomy or constaxonomy parameter is required."); m->mothurOutEndLine(); abort = true; } } string usedDups = "true"; string temp = validParameter.validFile(parameters, "dups", false); if (temp == "not found") { if (namefile != "") { temp = "true"; } else { temp = "false"; usedDups = ""; } } dups = m->isTrue(temp); countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } taxons = validParameter.validFile(parameters, "taxon", false); if (taxons == "not found") { taxons = ""; m->mothurOut("No taxons given, please correct."); m->mothurOutEndLine(); abort = true; } else { //rip off quotes if (taxons[0] == '\'') { taxons = taxons.substr(1); } if (taxons[(taxons.length()-1)] == '\'') { taxons = taxons.substr(0, (taxons.length()-1)); } } m->splitAtChar(taxons, listOfTaxons, '-'); if ((fastafile == "") && (constaxonomy == "") && (namefile == "") && (groupfile 
== "") && (alignfile == "") && (listfile == "") && (taxfile == "") && (countfile == "")) { m->mothurOut("You must provide one of the following: fasta, name, group, count, alignreport, taxonomy, constaxonomy, shared or listfile."); m->mothurOutEndLine(); abort = true; } if ((constaxonomy != "") && ((fastafile != "") || (namefile != "") || (groupfile != "") || (alignfile != "") || (taxfile != "") || (countfile != ""))) { m->mothurOut("[ERROR]: can only use constaxonomy file with a list or shared file, aborting.\n"); abort = true; } if ((constaxonomy != "") && (taxfile != "")) { m->mothurOut("[ERROR]: Choose only one: taxonomy or constaxonomy, aborting.\n"); abort = true; } if ((sharedfile != "") && (taxfile != "")) { m->mothurOut("[ERROR]: sharedfile can only be used with constaxonomy file, aborting.\n"); abort = true; } if ((sharedfile != "") || (listfile != "")) { label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("[WARNING]: You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); } } if (countfile == "") { if ((namefile == "") && ((fastafile != "") || (taxfile != ""))){ vector files; files.push_back(fastafile); files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "GetLineageCommand"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (m->control_pressed) { return 0; } if (countfile != "") { if ((fastafile != "") || (listfile != "") || (taxfile != "")) { m->mothurOut("\n[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.\n\n"); } } //read through the correct file and output lines you want to keep if (taxfile != "") { 
readTax(); //fills the set of names to get if (namefile != "") { readName(); } if (fastafile != "") { readFasta(); } if (countfile != "") { readCount(); } if (groupfile != "") { readGroup(); } if (alignfile != "") { readAlign(); } if (listfile != "") { readList(); } }else { readConsTax(); if (listfile != "") { readConsList(); } if (sharedfile != "") { readShared(); } } if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; 
m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readFasta(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; bool wroteSomething = false; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } Sequence currSeq(in); name = currSeq.getName(); if (name != "") { //if this name was selected from the taxonomy file if (names.count(name) != 0) { wroteSomething = true; currSeq.printSequence(out); } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readCount(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string outputFileName = getOutputFileName("count", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; 
m->openInputFile(countfile, in); bool wroteSomething = false; string headers = m->getline(in); m->gobble(in); out << headers << endl; string test = headers; vector<string> pieces = m->splitWhiteSpace(test); string name, rest; int thisTotal; rest = ""; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; m->gobble(in); in >> thisTotal; m->gobble(in); if (pieces.size() > 2) { rest = m->getline(in); m->gobble(in); } if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + rest + "\n"); } if (names.count(name) != 0) { out << name << '\t' << thisTotal << '\t' << rest << endl; wroteSomething = true; } } in.close(); out.close(); //check for groups that have been eliminated CountTable ct; if (ct.testGroups(outputFileName)) { ct.readTable(outputFileName, true, false); ct.printTable(outputFileName); } if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); } outputTypes["count"].push_back(outputFileName); outputNames.push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readCount"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readList(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); ifstream in; m->openInputFile(listfile, in); bool wroteSomething = false; while(!in.eof()){ //read in list vector ListVector list(in); //make a new list vector ListVector newList; newList.setLabel(list.getLabel()); variables["[distance]"] = list.getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); 
outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); if (m->control_pressed) { in.close(); out.close(); return 0; } vector<string> binLabels = list.getLabels(); vector<string> newBinLabels; //for each bin for (int i = 0; i < list.getNumBins(); i++) { //parse out names that were selected from the taxonomy file string binnames = list.get(i); vector<string> bnames; m->splitAtComma(binnames, bnames); string newNames = ""; for (int j = 0; j < bnames.size(); j++) { string name = bnames[j]; //if that name was selected, add it if (names.count(name) != 0) { newNames += name + ","; } } //if there are names in this bin add to new list if (newNames != "") { newNames = newNames.substr(0, newNames.length()-1); //rip off extra comma newList.push_back(newNames); newBinLabels.push_back(binLabels[i]); } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->gobble(in); out.close(); } in.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); } return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readList"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readConsList(){ try { getListVector(); if (m->control_pressed) { delete list; return 0;} ListVector newList; newList.setLabel(list->getLabel()); int selectedCount = 0; bool wroteSomething = false; string snumBins = toString(list->getNumBins()); vector<string> binLabels = list->getLabels(); vector<string> newBinLabels; for (int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { delete list; return 0;} //create a label for this otu string otuLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < 
diff; h++) { otuLabel += "0"; } } otuLabel += sbinNumber; if (names.count(m->getSimpleLabel(otuLabel)) != 0) { selectedCount++; newList.push_back(list->get(i)); newBinLabels.push_back(binLabels[i]); } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); variables["[distance]"] = list->getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); delete list; //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain OTUs from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["list"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " OTUs from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readConsList"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::getListVector(){ try { InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> labels; labels.insert(label); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); break; } lastLabel = list->getLabel(); //get next line to process //prevent memory leak delete list; list = input.getListVector(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input.getListVector(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "getListVector"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readShared(){ try { getShared(); if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } vector<string> newLabels; //create new "filtered" lookup vector<SharedRAbundVector*> newLookup; for (int i = 0; i < lookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); newLookup.push_back(temp); } bool wroteSomething = false; int numSelected = 0; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } return 0; } //is this otu on the list if (names.count(m->getSimpleLabel(m->currentSharedBinLabels[i])) != 0) { numSelected++; wroteSomething = true; newLabels.push_back(m->currentSharedBinLabels[i]); for (int j = 0; j < newLookup.size(); j++) { //add this OTU to the new lookup newLookup[j]->push_back(lookup[j]->getAbundance(i), lookup[j]->getGroup()); } } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); for (int j = 0; j < lookup.size(); j++) { 
delete lookup[j]; } m->currentSharedBinLabels = newLabels; newLookup[0]->printHeaders(out); for (int i = 0; i < newLookup.size(); i++) { out << newLookup[i]->getLabel() << '\t' << newLookup[i]->getGroup() << '\t'; newLookup[i]->print(out); } out.close(); for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } if (wroteSomething == false) { m->mothurOut("Your file does not contain OTUs from " + taxons + "."); m->mothurOutEndLine(); } m->mothurOut("Selected " + toString(numSelected) + " OTUs from your shared file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readShared"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::getShared(){ try { InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); break; } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "getShared"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; in >> secondCol; string hold = ""; if (dups) { hold = secondCol; } vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; for (int i = 0; i < parsedNames.size(); i++) { if (names.count(parsedNames[i]) != 0) { validSecond.push_back(parsedNames[i]); } } if ((dups) && (validSecond.size() != 0)) { //dups = true and we want to add someone, then add everyone for (int i = 0; i < parsedNames.size(); i++) { names.insert(parsedNames[i]); } out << firstCol << '\t' << hold << endl; wroteSomething = true; }else { //if the name in the first column is in the set then print it and any other names in second column also in set if (names.count(firstCol) != 0) { wroteSomething = true; out << firstCol << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] 
<< ','; }
                    out << validSecond[validSecond.size()-1] << endl;

                //make first name in set you come to first column and then add the remaining names to second column
                }else {
                    //you want part of this row
                    if (validSecond.size() != 0) {
                        wroteSomething = true;
                        out << validSecond[0] << '\t';

                        //validSecond is not empty here, so its last element exists
                        for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; }
                        out << validSecond[validSecond.size()-1] << endl;
                    }
                }
            }
            m->gobble(in);
        }
        in.close();
        out.close();

        if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); }
        outputNames.push_back(outputFileName); outputTypes["name"].push_back(outputFileName);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GetLineageCommand", "readName");
        exit(1);
    }
}
//**********************************************************************************************************************
int GetLineageCommand::readGroup(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile));
        variables["[extension]"] = m->getExtension(groupfile);
        string outputFileName = getOutputFileName("group", variables);
        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(groupfile, in);
        string name, group;

        bool wroteSomething = false;

        while(!in.eof()){
            if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

            in >> name;     //read from first column
            in >> group;    //read from second column

            //if this name was selected from the taxonomy file
            if (names.count(name) != 0) {
                wroteSomething = true;
                out << name << '\t' << group << endl;
            }

            m->gobble(in);
        }
        in.close();
        out.close();

        if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); }
outputNames.push_back(outputFileName); outputTypes["group"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile)); variables["[extension]"] = m->getExtension(taxfile); string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(taxfile, in); string name, tax; //bool wroteSomething = false; vector taxonsHasConfidence; taxonsHasConfidence.resize(listOfTaxons.size(), false); vector< vector< map > > searchTaxons; searchTaxons.resize(listOfTaxons.size()); vector noConfidenceTaxons; noConfidenceTaxons.resize(listOfTaxons.size(), ""); for (int i = 0; i < listOfTaxons.size(); i++) { noConfidenceTaxons[i] = listOfTaxons[i]; int hasConPos = listOfTaxons[i].find_first_of('('); if (hasConPos != string::npos) { taxonsHasConfidence[i] = true; searchTaxons[i] = getTaxons(listOfTaxons[i]); noConfidenceTaxons[i] = listOfTaxons[i]; m->removeConfidences(noConfidenceTaxons[i]); } } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> tax; //read from second column string noQuotesTax = m->removeQuotes(tax); for (int j = 0; j < listOfTaxons.size(); j++) { string newtax = noQuotesTax; //if the users file contains confidence scores we want to ignore them when searching for the taxons, unless the taxon has them if (!taxonsHasConfidence[j]) { int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { newtax = noQuotesTax; 
m->removeConfidences(newtax); } int pos = newtax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //this sequence contains the taxon the user wants names.insert(name); out << name << '\t' << tax << endl; //since you belong to at least one of the taxons we want you are included so no need to search for other break; } }else{//if listOfTaxons[i] has them and you don't them remove taxons int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences == string::npos) { int pos = newtax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //this sequence contains the taxon the user wants names.insert(name); out << name << '\t' << tax << endl; //since you belong to at least one of the taxons we want you are included so no need to search for other break; } }else { //both have confidences so we want to make sure the users confidences are greater then or equal to the taxons //first remove confidences from both and see if the taxonomy exists string noNewTax = noQuotesTax; int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { noNewTax = noQuotesTax; m->removeConfidences(noNewTax); } int pos = noNewTax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //if yes, then are the confidences okay bool good = true; vector< map > usersTaxon = getTaxons(newtax); //the usersTaxon is most likely longer than the searchTaxons, and searchTaxon[0] may relate to userTaxon[4] //we want to "line them up", so we will find the the index where the searchstring starts int index = 0; for (int i = 0; i < usersTaxon.size(); i++) { if (usersTaxon[i].begin()->first == searchTaxons[j][0].begin()->first) { index = i; int spot = 0; bool goodspot = true; //is this really the start, or are we dealing with a taxon of the same name? 
while ((spot < searchTaxons[j].size()) && ((i+spot) < usersTaxon.size())) { if (usersTaxon[i+spot].begin()->first != searchTaxons[j][spot].begin()->first) { goodspot = false; break; } else { spot++; } } if (goodspot) { break; } } } for (int i = 0; i < searchTaxons[j].size(); i++) { if ((i+index) < usersTaxon.size()) { //just in case, should never be false if (usersTaxon[i+index].begin()->second < searchTaxons[j][i].begin()->second) { //is the users cutoff less than the search taxons good = false; break; } }else { good = false; break; } } //passed the test so add you if (good) { names.insert(name); out << name << '\t' << tax << endl; break; } } } } } m->gobble(in); } in.close(); out.close(); if (names.size() == 0) { m->mothurOut("Your taxonomy file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["taxonomy"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readTax"); exit(1); } } //********************************************************************************************************************** int GetLineageCommand::readConsTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(constaxonomy); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(constaxonomy)); variables["[extension]"] = m->getExtension(constaxonomy); string outputFileName = getOutputFileName("constaxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(constaxonomy, in); string otuLabel, tax; int numReps; //read headers string headers = m->getline(in); out << headers << endl; //bool wroteSomething = false; vector taxonsHasConfidence; taxonsHasConfidence.resize(listOfTaxons.size(), false); vector< vector< map > > searchTaxons; searchTaxons.resize(listOfTaxons.size()); vector noConfidenceTaxons; 
noConfidenceTaxons.resize(listOfTaxons.size(), ""); for (int i = 0; i < listOfTaxons.size(); i++) { noConfidenceTaxons[i] = listOfTaxons[i]; int hasConPos = listOfTaxons[i].find_first_of('('); if (hasConPos != string::npos) { taxonsHasConfidence[i] = true; searchTaxons[i] = getTaxons(listOfTaxons[i]); noConfidenceTaxons[i] = listOfTaxons[i]; m->removeConfidences(noConfidenceTaxons[i]); } } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> otuLabel; m->gobble(in); in >> numReps; m->gobble(in); in >> tax; m->gobble(in); string noQuotesTax = m->removeQuotes(tax); for (int j = 0; j < listOfTaxons.size(); j++) { string newtax = noQuotesTax; //if the users file contains confidence scores we want to ignore them when searching for the taxons, unless the taxon has them if (!taxonsHasConfidence[j]) { int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { newtax = noQuotesTax; m->removeConfidences(newtax); } int pos = newtax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //this sequence contains the taxon the user wants names.insert(m->getSimpleLabel(otuLabel)); out << otuLabel << '\t' << numReps << '\t' << tax << endl; //since you belong to at least one of the taxons we want you are included so no need to search for other break; } }else{//if listOfTaxons[i] has them and you don't them remove taxons int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences == string::npos) { int pos = newtax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //this sequence contains the taxon the user wants names.insert(m->getSimpleLabel(otuLabel)); out << otuLabel << '\t' << numReps << '\t' << tax << endl; //since you belong to at least one of the taxons we want you are included so no need to search for other break; } }else { //both have confidences so we want to make sure the users confidences are greater then or equal to the taxons //first remove confidences 
from both and see if the taxonomy exists string noNewTax = noQuotesTax; int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { noNewTax = noQuotesTax; m->removeConfidences(noNewTax); } int pos = noNewTax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //if yes, then are the confidences okay bool good = true; vector< map > usersTaxon = getTaxons(newtax); //the usersTaxon is most likely longer than the searchTaxons, and searchTaxon[0] may relate to userTaxon[4] //we want to "line them up", so we will find the the index where the searchstring starts int index = 0; for (int i = 0; i < usersTaxon.size(); i++) { if (usersTaxon[i].begin()->first == searchTaxons[j][0].begin()->first) { index = i; int spot = 0; bool goodspot = true; //is this really the start, or are we dealing with a taxon of the same name? while ((spot < searchTaxons[j].size()) && ((i+spot) < usersTaxon.size())) { if (usersTaxon[i+spot].begin()->first != searchTaxons[j][spot].begin()->first) { goodspot = false; break; } else { spot++; } } if (goodspot) { break; } } } for (int i = 0; i < searchTaxons[j].size(); i++) { if ((i+index) < usersTaxon.size()) { //just in case, should never be false if (usersTaxon[i+index].begin()->second < searchTaxons[j][i].begin()->second) { //is the users cutoff less than the search taxons good = false; break; } }else { good = false; break; } } //passed the test so add you if (good) { names.insert(m->getSimpleLabel(otuLabel)); out << otuLabel << '\t' << numReps << '\t' << tax << endl; break; } } } } } } in.close(); out.close(); if (names.size() == 0) { m->mothurOut("Your taxonomy file does not contain any OTUs from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["constaxonomy"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetLineageCommand", "readConsTax"); exit(1); } } 
/**************************************************************************************************/
vector< map<string, float> > GetLineageCommand::getTaxons(string tax) {
    try {
        vector< map<string, float> > t;
        string taxon = "";
        int taxLength = tax.length();
        for(int i=0;i<taxLength;i++){
            if(tax[i] == ';'){

                int openParen = taxon.find_last_of('(');
                int closeParen = taxon.find_last_of(')');

                string newtaxon, confidence;
                if ((openParen != string::npos) && (closeParen != string::npos)) {
                    string confidenceScore = taxon.substr(openParen+1, (closeParen-(openParen+1)));
                    if (m->isNumeric1(confidenceScore)) { //its a confidence
                        newtaxon = taxon.substr(0, openParen); //rip off confidence
                        confidence = taxon.substr((openParen+1), (closeParen-openParen-1));
                    }else { //its part of the taxon
                        newtaxon = taxon;
                        confidence = "0";
                    }
                }else{
                    newtaxon = taxon;
                    confidence = "0";
                }
                float con = 0;
                convert(confidence, con);

                map<string, float> temp;
                temp[newtaxon] = con;
                t.push_back(temp);

                taxon = "";
            }
            else{
                taxon += tax[i];
            }
        }

        return t;
    }
    catch(exception& e) {
        m->errorOut(e, "GetLineageCommand", "getTaxons");
        exit(1);
    }
}
//**********************************************************************************************************************
//alignreport file has a column header line then all other lines contain 16 columns. we just want the first column since that contains the name
int GetLineageCommand::readAlign(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(alignfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(alignfile));
        variables["[extension]"] = m->getExtension(alignfile);
        string outputFileName = getOutputFileName("alignreport", variables);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(alignfile, in);
        string name, junk;

        bool wroteSomething = false;

        //read column headers
        for (int i = 0; i < 16; i++) {
            if (!in.eof()) { in >> junk; out << junk << '\t'; }
            else { break; }
        }
        out << endl;

        while(!in.eof()){
            if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

            in >> name;     //read from first column

            //if this name was selected from the taxonomy file
            if (names.count(name) != 0) {
                wroteSomething = true;

                out << name << '\t';

                //read rest
                for (int i = 0; i < 15; i++) {
                    if (!in.eof()) { in >> junk; out << junk << '\t'; }
                    else { break; }
                }
                out << endl;

            }else {//still read just don't do anything with it
                //read rest
                for (int i = 0; i < 15; i++) {
                    if (!in.eof()) { in >> junk; }
                    else { break; }
                }
            }

            m->gobble(in);
        }
        in.close();
        out.close();

        if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequences from " + taxons + "."); m->mothurOutEndLine(); }
        outputNames.push_back(outputFileName); outputTypes["alignreport"].push_back(outputFileName);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GetLineageCommand", "readAlign");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/getlineagecommand.h000066400000000000000000000030561255543666200220570ustar00rootroot00000000000000
#ifndef GETLINEAGECOMMAND_H
#define GETLINEAGECOMMAND_H

/*
 *  getlineagecommand.h
 *  Mothur
 *
 *  Created by westcott on 9/24/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "command.hpp" #include "sharedrabundvector.h" #include "listvector.hpp" class GetLineageCommand : public Command { public: GetLineageCommand(string); GetLineageCommand(); ~GetLineageCommand(){} vector setParameters(); string getCommandName() { return "get.lineage"; } string getCommandCategory() { return "Phylotype Analysis"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.lineage"; } string getDescription() { return "gets sequences from a list, fasta, name, group, alignreport or taxonomy file from a given taxonomy or set of taxonomies"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: set names; vector outputNames, listOfTaxons; string fastafile, namefile, groupfile, alignfile, countfile, listfile, taxfile, outputDir, taxons, sharedfile, constaxonomy, label; bool abort, dups; vector lookup; ListVector* list; int readFasta(); int readName(); int readCount(); int readGroup(); int readAlign(); int readList(); int readTax(); int readShared(); int readConsTax(); int readConsList(); int getShared(); int getListVector(); vector< map > getTaxons(string); }; #endif mothur-1.36.1/source/commands/getlistcountcommand.cpp000066400000000000000000000261561255543666200230400ustar00rootroot00000000000000/* * getlistcountcommand.cpp * Mothur * * Created by westcott on 10/12/09. * Copyright 2009 Schloss Lab. All rights reserved. 
 *
 */

#include "getlistcountcommand.h"

//**********************************************************************************************************************
vector<string> GetListCountCommand::setParameters(){
    try {
        CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","otu",false,true, true); parameters.push_back(plist);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
        CommandParameter parasort("sort", "Multiple", "name-otu", "otu", "", "", "","",false,false); parameters.push_back(parasort);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "GetListCountCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetListCountCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The get.otulist command parameters are list, sort and label. list is required, unless you have a valid current list file.\n";
        helpString += "The label parameter allows you to select what distance levels you would like output files created for. Labels are separated by dashes.\n";
        helpString += "The sort parameter allows you to select how you want the output displayed.
Options are otu and name.\n"; helpString += "If otu is selected the output will be otu number followed by the list of names in that otu.\n"; helpString += "If name is selected the output will be a sequence name followed by its otu number.\n"; helpString += "The get.otulist command should be in the following format: get.otulist(list=yourlistFile, label=yourLabels).\n"; helpString += "Example get.otulist(list=amazon.fn.list, label=0.10).\n"; helpString += "The default value for label is all lines in your inputfile.\n"; helpString += "The get.otulist command outputs a .otu file for each distance you specify listing the bin number and the names of the sequences in that bin.\n"; helpString += "Note: No spaces between parameter labels (i.e. list), '=' and parameters (i.e.yourListFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetListCountCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetListCountCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "otu") { pattern = "[filename],[tag],otu"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetListCountCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetListCountCommand::GetListCountCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["otu"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetListCountCommand", "GetListCountCommand"); exit(1); } } //********************************************************************************************************************** GetListCountCommand::GetListCountCommand(string option) { try { abort = false; calledHelp = false; 
allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["otu"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not found") { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current list file and the list parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (listfile == "not open") { abort = true; } else { m->setListFile(listfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
sort = validParameter.validFile(parameters, "sort", false); if (sort == "not found") { sort = "otu"; } if ((sort != "otu") && (sort != "name")) { m->mothurOut( sort + " is not a valid sort option. Options are otu and name. I will use otu."); m->mothurOutEndLine(); sort = "otu"; } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } } } catch(exception& e) { m->errorOut(e, "GetListCountCommand", "GetListCountCommand"); exit(1); } } //********************************************************************************************************************** int GetListCountCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } input = new InputData(listfile, "list"); list = input->getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (m->control_pressed) { delete input; delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(allLines == 1 || labels.count(list->getLabel()) == 1){ process(list); if (m->control_pressed) { delete input; delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); process(list); if (m->control_pressed) { delete input; delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input->getListVector(); } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (list != NULL) { delete list; }
            list = input->getListVector(lastLabel);

            process(list);

            if (m->control_pressed) {
                delete input; delete list;
                for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
                return 0;
            }

            delete list;
        }

        delete input;

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GetListCountCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************
//writes one .otu file for the label of the given list vector; errors are reported through errorOut
void GetListCountCommand::process(ListVector* list) {
    try {
        string binnames;
        if (outputDir == "") { outputDir += m->hasPath(listfile); }
        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile));
        variables["[tag]"] = list->getLabel();
        string outputFileName = getOutputFileName("otu", variables);
        m->openOutputFile(outputFileName, out);
        outputNames.push_back(outputFileName); outputTypes["otu"].push_back(outputFileName);

        m->mothurOut(list->getLabel()); m->mothurOutEndLine();

        //for each bin in the list vector
        vector<string> binLabels = list->getLabels();
        for (int i = 0; i < list->getNumBins(); i++) {
            if (m->control_pressed) { break; }

            binnames = list->get(i);

            if (sort == "otu") {
                out << binLabels[i] << '\t' << binnames << endl;
            }else{ //sort = name
                vector<string> names;
                m->splitAtComma(binnames, names);

                for (int j = 0; j < names.size(); j++) {
                    out << names[j] << '\t' << binLabels[i] << endl;
                }
            }
        }

        out.close();
    }
    catch(exception& e) {
        m->errorOut(e, "GetListCountCommand", "process");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/getlistcountcommand.h000066400000000000000000000023431255543666200224750ustar00rootroot00000000000000#ifndef GETLISTCOUNTCOMMAND_H #define GETLISTCOUNTCOMMAND_H /* * getlistcountcommand.h * Mothur * * Created by westcott on 10/12/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "inputdata.h" #include "listvector.hpp" /**********************************************************/ class GetListCountCommand : public Command { public: GetListCountCommand(string); GetListCountCommand(); ~GetListCountCommand(){} vector setParameters(); string getCommandName() { return "get.otulist"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getCitation() { return "http://www.mothur.org/wiki/Get.otulist"; } string getDescription() { return "lists each OTU number and the sequence contained in that OTU"; } string getHelpString(); string getOutputPattern(string); int execute(); void help() { m->mothurOut(getHelpString()); } private: ListVector* list; InputData* input; bool abort, allLines; set labels; //holds labels to be used string label, listfile, outputDir, sort; ofstream out; vector outputNames; void process(ListVector*); }; /**********************************************************/ #endif mothur-1.36.1/source/commands/getmetacommunitycommand.cpp000066400000000000000000001446571255543666200237160ustar00rootroot00000000000000// // getmetacommunitycommand.cpp // Mothur // // Created by SarahsWork on 4/9/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
//

#include "getmetacommunitycommand.h"
#include "communitytype.h"
#include "kmeans.h"
#include "validcalculator.h"
#include "subsample.h"

//**********************************************************************************************************************
vector<string> GetMetaCommunityCommand::setParameters(){
    try {
        CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","outputType",false,true); parameters.push_back(pshared);
        CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
        CommandParameter pcalc("calc", "Multiple", "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-kstest-sharednseqs-ochiai-anderberg-kulczynski-kulczynskicody-lennon-morisitahorn-braycurtis-whittaker-odum-canberra-structeuclidean-structchord-hellinger-manhattan-structpearson-soergel-spearman-structkulczynski-speciesprofile-hamming-structchi2-gower-memchi2-memchord-memeuclidean-mempearson-jsd-rjsd", "rjsd", "", "", "","",false,false,true); parameters.push_back(pcalc);
        CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample);
        CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters);
        CommandParameter pminpartitions("minpartitions", "Number", "", "5", "", "", "","",false,false,true); parameters.push_back(pminpartitions);
        CommandParameter pmaxpartitions("maxpartitions", "Number", "", "100", "", "", "","",false,false,true); parameters.push_back(pmaxpartitions);
        CommandParameter poptimizegap("optimizegap", "Number", "", "3", "", "", "","",false,false,true); parameters.push_back(poptimizegap);
        CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
        CommandParameter pseed("seed",
"Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter pmethod("method", "Multiple", "dmm-kmeans-pam", "dmm", "", "", "","",false,false,true); parameters.push_back(pmethod); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "NewCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetMetaCommunityCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.communitytype command parameters are shared, method, label, groups, minpartitions, maxpartitions, optimizegap and processors. The shared file is required. \n"; helpString += "The label parameter is used to analyze specific labels in your input. labels are separated by dashes.\n"; helpString += "The groups parameter allows you to specify which of the groups in your shared file you would like analyzed. Group names are separated by dashes.\n"; helpString += "The method parameter allows to select the method you would like to use. Options are dmm, kmeans and pam. Default=dmm.\n"; helpString += "The calc parameter allows to select the calculator you would like to use to calculate the distance matrix used by the pam and kmeans method. 
By default the rjsd calculator is used.\n";
        helpString += "The iters parameter allows you to choose the number of times you would like to run the subsample while calculating the distance matrix for the pam and kmeans methods.\n";
        helpString += "The subsample parameter allows you to enter the size per group of the sample or you can set subsample=T and mothur will use the size of your smallest group while calculating the distance matrix for the pam and kmeans methods.\n";
        helpString += "The minpartitions parameter is used to .... Default=5.\n";
        helpString += "The maxpartitions parameter is used to .... Default=10.\n";
        helpString += "The optimizegap parameter is used to .... Default=3.\n";
        helpString += "The processors parameter allows you to specify number of processors to use. The default is 1.\n";
        helpString += "The get.communitytype command should be in the following format: get.communitytype(shared=yourSharedFile).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "GetMetaCommunityCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetMetaCommunityCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fit")              { pattern = "[filename],[distance],[method],mix.fit"; }
        else if (type == "relabund")    { pattern = "[filename],[distance],[method],[tag],mix.relabund"; }
        else if (type == "design")      { pattern = "[filename],[distance],[method],mix.design"; }
        else if (type == "matrix")      { pattern = "[filename],[distance],[method],[tag],mix.posterior"; }
        else if (type == "parameters")  { pattern = "[filename],[distance],[method],mix.parameters"; }
        else if (type == "summary")     { pattern = "[filename],[distance],[method],mix.summary"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "GetMetaCommunityCommand",
"getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetMetaCommunityCommand::GetMetaCommunityCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fit"] = tempOutNames; outputTypes["relabund"] = tempOutNames; outputTypes["matrix"] = tempOutNames; outputTypes["design"] = tempOutNames; outputTypes["parameters"] = tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", "GetMetaCommunityCommand"); exit(1); } } //********************************************************************************************************************** GetMetaCommunityCommand::GetMetaCommunityCommand(string option) { try { abort = false; calledHelp = false; allLines=true; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["fit"] = tempOutNames; outputTypes["relabund"] = tempOutNames; outputTypes["matrix"] = tempOutNames; outputTypes["design"] = tempOutNames; outputTypes["parameters"] = tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); if(it != 
parameters.end()){ path = m->hasPath(it->second); if (path == "") { parameters["shared"] = inputDir + it->second; } } } //get shared file, it is required sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "minpartitions", false); if (temp == "not found"){ temp = "5"; } m->mothurConvert(temp, minpartitions); temp = validParameter.validFile(parameters, "maxpartitions", false); if (temp == "not found"){ temp = "10"; } m->mothurConvert(temp, maxpartitions); temp = validParameter.validFile(parameters, "optimizegap", false); if (temp == "not found"){ temp = "3"; } m->mothurConvert(temp, optimizegap); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } //set processors to 1 until we figure out whats going on with this command. 
temp = "1"; //m->setProcessors(temp); m->mothurOut("Using 1 processor\n"); m->mothurConvert(temp, processors); string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } method = validParameter.validFile(parameters, "method", false); if (method == "not found") { method = "dmm"; } if ((method == "dmm") || (method == "kmeans") || (method == "pam")) { } else { m->mothurOut("[ERROR]: " + method + " is not a valid method. Valid algorithms are dmm, kmeans and pam."); m->mothurOutEndLine(); abort = true; } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "rjsd"; } else { if (calc == "default") { calc = "rjsd"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } if (Estimators.size() != 1) { abort = true; m->mothurOut("[ERROR]: only one calculator is allowed.\n"); } temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (subsample == false) { iters = 0; } } } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", 
"GetMetaCommunityCommand"); exit(1); } } //********************************************************************************************************************** int GetMetaCommunityCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; if (subsample) { if (subsampleSize == -1) { //user has not set size, set size = smallest samples size subsampleSize = lookup[0]->getNumSeqs(); for (int i = 1; i < lookup.size(); i++) { int thisSize = lookup[i]->getNumSeqs(); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } }else { m->clearGroups(); Groups.clear(); vector temp; for (int i = 0; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < subsampleSize) { m->mothurOut(lookup[i]->getGroup() + " contains " + toString(lookup[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine(); delete lookup[i]; }else { Groups.push_back(lookup[i]->getGroup()); temp.push_back(lookup[i]); } } lookup = temp; m->setGroups(Groups); } if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. 
I cannot run the command."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } } //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createProcesses(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createProcesses(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createProcesses(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetMetaCommunityCommand::createProcesses(vector& thislookup){ try { //#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) // #else //until bug is resolved processors=1; //qFinderDMM not thread safe //#endif vector processIDS; int process = 1; int num = 0; int minPartition = 0; //sanity check if (maxpartitions < processors) { processors = maxpartitions; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = thislookup[0]->getLabel(); variables["[method]"] = method; string outputFileName = getOutputFileName("fit", variables); outputNames.push_back(outputFileName); outputTypes["fit"].push_back(outputFileName); //divide the partitions between the processors vector< vector > dividedPartitions; vector< vector > rels, matrix; vector doneFlags; dividedPartitions.resize(processors); rels.resize(processors); matrix.resize(processors); //for each file group figure out which process will complete it //want to divide the load intelligently so the big files are spread between processes for (int 
i=1; i<=maxpartitions; i++) { //cout << i << endl; int processToAssign = (i+1) % processors; if (processToAssign == 0) { processToAssign = processors; } if (m->debug) { m->mothurOut("[DEBUG]: assigning " + toString(i) + " to process " + toString(processToAssign-1) + "\n"); } dividedPartitions[(processToAssign-1)].push_back(i); variables["[tag]"] = toString(i); string relName = getOutputFileName("relabund", variables); string mName = getOutputFileName("matrix", variables); rels[(processToAssign-1)].push_back(relName); matrix[(processToAssign-1)].push_back(mName); } for (int i = 0; i < processors; i++) { //read from everyone elses, just write to yours string tempDoneFile = m->getRootName(m->getSimpleName(sharedfile)) + toString(i) + ".done.temp"; doneFlags.push_back(tempDoneFile); ofstream out; m->openOutputFile(tempDoneFile, out); //clear out out.close(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ outputNames.clear(); num = processDriver(thislookup, dividedPartitions[process], (outputFileName + m->mothurGetpid(process)), rels[process], matrix[process], doneFlags, process); //pass numSeqs to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".outputNames.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << outputNames.size() << endl; for (int i = 0; i < outputNames.size(); i++) { out << outputNames[i] << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } //do my part if (method == "dmm") { m->mothurOut("K\tNLE\t\tlogDet\tBIC\t\tAIC\t\tLaplace\n"); } 
else {
            m->mothurOut("K\tCH");
            for (int i = 0; i < thislookup.size(); i++) { m->mothurOut('\t' + thislookup[i]->getGroup()); }
            m->mothurOut("\n");
        }
        minPartition = processDriver(thislookup, dividedPartitions[0], outputFileName, rels[0], matrix[0], doneFlags, 0);

        //force parent to wait until all the processes are done
        for (int i=0;i<processIDS.size();i++) {
            int temp = processIDS[i];
            wait(&temp);
        }

        vector<string> tempOutputNames = outputNames;
        for (int i=0;i<processIDS.size();i++) {
            ifstream in;
            string tempFile = toString(processIDS[i]) + ".outputNames.temp";
            m->openInputFile(tempFile, in);
            if (!in.eof()) {
                int tempNum = 0;
                in >> tempNum; m->gobble(in);
                if (tempNum < minPartition) { minPartition = tempNum; }
                in >> tempNum; m->gobble(in);
                for (int i = 0; i < tempNum; i++) {
                    string tempName = "";
                    in >> tempName; m->gobble(in);
                    tempOutputNames.push_back(tempName);
                }
            }
            in.close(); m->mothurRemove(tempFile);

            m->appendFilesWithoutHeaders(outputFileName + toString(processIDS[i]), outputFileName);
            m->mothurRemove(outputFileName + toString(processIDS[i]));
        }

        if (processors > 1) {
            outputNames.clear();
            for (int i = 0; i < tempOutputNames.size(); i++) { //remove files if needed
                string name = tempOutputNames[i];
                vector<string> parts;
                m->splitAtChar(name, parts, '.');
                bool keep = true;
                if (((parts[parts.size()-1] == "relabund") || (parts[parts.size()-1] == "posterior")) && (parts[parts.size()-2] == "mix")) {
                    string tempNum = parts[parts.size()-3];
                    int num; m->mothurConvert(tempNum, num);
                    //if (num > minPartition) {
                    //    m->mothurRemove(tempOutputNames[i]);
                    //    keep = false; if (m->debug) { m->mothurOut("[DEBUG]: removing " + tempOutputNames[i] + ".\n"); }
                    //}
                }
                if (keep) { outputNames.push_back(tempOutputNames[i]); }
            }

            //reorder fit file
            ifstream in;
            m->openInputFile(outputFileName, in);
            string headers = m->getline(in); m->gobble(in);

            map<int, string> file;
            while (!in.eof()) {
                string numString, line;
                int num;
                in >> numString; line = m->getline(in); m->gobble(in);
                m->mothurConvert(numString, num);
                file[num] = line;
            }
            in.close();
            ofstream out;
            m->openOutputFile(outputFileName, out);
            out << headers << endl;
            for (map<int, string>::iterator it = file.begin(); it != file.end(); it++) {
                out << it->first <<
'\t' << it->second << endl; if (m->debug) { m->mothurOut("[DEBUG]: printing: " + toString(it->first) + '\t' + it->second + ".\n"); } } out.close(); } #else m->mothurOut("K\tNLE\t\tlogDet\tBIC\t\tAIC\t\tLaplace\n"); minPartition = processDriver(thislookup, dividedPartitions[0], outputFileName, rels[0], matrix[0], doneFlags, 0); #endif for (int i = 0; i < processors; i++) { //read from everyone elses, just write to yours string tempDoneFile = m->getRootName(m->getSimpleName(sharedfile)) + toString(i) + ".done.temp"; m->mothurRemove(tempDoneFile); } if (m->control_pressed) { return 0; } if (m->debug) { m->mothurOut("[DEBUG]: minPartition = " + toString(minPartition) + "\n"); } //run generate Summary function for smallest minPartition variables["[tag]"] = toString(minPartition); vector piValues = generateDesignFile(minPartition, variables); if (method == "dmm") { generateSummaryFile(minPartition, variables, piValues); } //pam doesn't make a relabund file return 0; } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** int GetMetaCommunityCommand::processDriver(vector& thislookup, vector& parts, string outputFileName, vector relabunds, vector matrix, vector doneFlags, int processID){ try { double minLaplace = 1e10; int minPartition = 1; vector minSilhouettes; minSilhouettes.resize(thislookup.size(), 0); ofstream fitData, silData; if (method == "dmm") { m->openOutputFile(outputFileName, fitData); fitData.setf(ios::fixed, ios::floatfield); fitData.setf(ios::showpoint); fitData << "K\tNLE\tlogDet\tBIC\tAIC\tLaplace" << endl; }else if((method == "pam") || (method == "kmeans")) { //because ch is looking of maximal value minLaplace = 0; m->openOutputFile(outputFileName, silData); silData.setf(ios::fixed, ios::floatfield); silData.setf(ios::showpoint); silData << "K\tCH"; for (int i = 0; i < thislookup.size(); i++) 
{ silData << '\t' << thislookup[i]->getGroup(); }
            silData << endl;
        }

        cout.setf(ios::fixed, ios::floatfield);
        cout.setf(ios::showpoint);

        vector< vector<int> > sharedMatrix;
        vector<string> thisGroups;
        for (int i = 0; i < thislookup.size(); i++) { sharedMatrix.push_back(thislookup[i]->getAbundances()); thisGroups.push_back(thislookup[i]->getGroup()); }

        vector< vector<double> > dists; //do we want to output this matrix??
        if ((method == "pam") || (method == "kmeans")) { dists = generateDistanceMatrix(thislookup); }

        if (m->debug) {
            m->mothurOut("[DEBUG]: dists = \n");
            for (int i = 0; i < dists.size(); i++) {
                if (m->control_pressed) { break; }
                m->mothurOut("[DEBUG]: i = " + toString(i) + '\t');
                for (int j = 0; j < i; j++) { m->mothurOut(toString(dists[i][j]) +"\t"); }
                m->mothurOut("\n");
            }
        }

        for(int i=0;i<parts.size();i++){

            int numPartitions = parts[i];

            if (m->debug) { m->mothurOut("[DEBUG]: running partition " + toString(numPartitions) + "\n"); }

            if (m->control_pressed) { break; }

            //check to see if anyone else is done
            for (int j = 0; j < doneFlags.size(); j++) {
                if (!m->isBlank(doneFlags[j])) { //another process has finished
                    //are they done at a lower partition?
                    ifstream in;
                    m->openInputFile(doneFlags[j], in);
                    int tempNum;
                    in >> tempNum; in.close();
                    if (tempNum < numPartitions) { break; } //quit, because someone else has finished
                }
            }

            CommunityTypeFinder* finder = NULL;
            if (method == "dmm")            { finder = new qFinderDMM(sharedMatrix, numPartitions); }
            else if (method == "kmeans")    { finder = new KMeans(sharedMatrix, numPartitions); }
            else if (method == "pam")       { finder = new Pam(sharedMatrix, dists, numPartitions); }
            else {
                if (i == 0) { m->mothurOut(method + " is not a valid method option.
I will run the command using dmm.\n"); } finder = new qFinderDMM(sharedMatrix, numPartitions); } string relabund = relabunds[i]; string matrixName = matrix[i]; outputNames.push_back(matrixName); outputTypes["matrix"].push_back(matrixName); finder->printZMatrix(matrixName, thisGroups); double chi; vector silhouettes; if (method == "dmm") { double laplace = finder->getLaplace(); if(laplace < minLaplace){ minPartition = numPartitions; minLaplace = laplace; } }else { chi = finder->calcCHIndex(dists); silhouettes = finder->calcSilhouettes(dists); if (chi > minLaplace) { //save partition with maximum ch index score minPartition = numPartitions; minLaplace = chi; minSilhouettes = silhouettes; } } if (method == "dmm") { finder->printFitData(cout, minLaplace); finder->printFitData(fitData); finder->printRelAbund(relabund, m->currentSharedBinLabels); outputNames.push_back(relabund); outputTypes["relabund"].push_back(relabund); }else if ((method == "pam") || (method == "kmeans")) { //print silouettes and ch values finder->printSilData(cout, chi, silhouettes); finder->printSilData(silData, chi, silhouettes); if (method == "kmeans") { finder->printRelAbund(relabund, m->currentSharedBinLabels); outputNames.push_back(relabund); outputTypes["relabund"].push_back(relabund); } } delete finder; if(optimizegap != -1 && (numPartitions - minPartition) >= optimizegap && numPartitions >= minpartitions){ string tempDoneFile = m->getRootName(m->getSimpleName(sharedfile)) + toString(processID) + ".done.temp"; ofstream outDone; m->openOutputFile(tempDoneFile, outDone); outDone << minPartition << endl; outDone.close(); break; } } if (method == "dmm") { fitData.close(); } if (m->control_pressed) { return 0; } return minPartition; } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", "processDriver"); exit(1); } } /**************************************************************************************************/ vector GetMetaCommunityCommand::generateDesignFile(int numPartitions, 
map variables){ try { vector piValues(numPartitions, 0); ifstream postFile; variables["[tag]"] = toString(numPartitions); string input = getOutputFileName("matrix", variables); m->openInputFile(input, postFile);//((fileRoot + toString(numPartitions) + "mix.posterior").c_str()); //matrix file variables.erase("[tag]"); string outputFileName = getOutputFileName("design", variables); ofstream designFile; m->openOutputFile(outputFileName, designFile); outputNames.push_back(outputFileName); outputTypes["design"].push_back(outputFileName); vector titles(numPartitions); for(int i=0;i> titles[i]; } double posterior; string sampleName; int numSamples = 0; while(postFile){ if (m->control_pressed) { break; } double maxPosterior = 0.0000; int maxPartition = -1; postFile >> sampleName; for(int i=0;i> posterior; if(posterior > maxPosterior){ maxPosterior = posterior; maxPartition = i; } piValues[i] += posterior; } designFile << sampleName << '\t' << titles[maxPartition] << endl; numSamples++; m->gobble(postFile); } for(int i=0;ierrorOut(e, "GetMetaCommunityCommand", "generateDesignFile"); exit(1); } } /**************************************************************************************************/ inline bool summaryFunction(summaryData i, summaryData j){ return i.difference > j.difference; } /**************************************************************************************************/ int GetMetaCommunityCommand::generateSummaryFile(int numPartitions, map v, vector piValues){ try { vector summary; vector pMean(numPartitions, 0); vector pLCI(numPartitions, 0); vector pUCI(numPartitions, 0); string name, header; double mean, lci, uci; ifstream referenceFile; map variables; variables["[filename]"] = v["[filename]"]; variables["[distance]"] = v["[distance]"]; variables["[method]"] = method; variables["[tag]"] = "1"; string reference = getOutputFileName("relabund", variables); m->openInputFile(reference, referenceFile); //((fileRoot + label + ".1mix.relabund").c_str()); 
variables["[tag]"] = toString(numPartitions); string partFile = getOutputFileName("relabund", variables); ifstream partitionFile; m->openInputFile(partFile, partitionFile); //((fileRoot + toString(numPartitions) + "mix.relabund").c_str()); header = m->getline(referenceFile); header = m->getline(partitionFile); stringstream head(header); string dummy, label; head >> dummy; vector thetaValues(numPartitions, ""); for(int i=0;i> label >> dummy >> dummy; thetaValues[i] = label.substr(label.find_last_of('_')+1); } vector partitionDiff(numPartitions, 0.0000); while(referenceFile){ if (m->control_pressed) { break; } referenceFile >> name >> mean >> lci >> uci; summaryData tempData; tempData.name = name; tempData.refMean = mean; double difference = 0.0000; partitionFile >> name; for(int j=0;j> pMean[j] >> pLCI[j] >> pUCI[j]; difference += abs(mean - pMean[j]); partitionDiff[j] += abs(mean - pMean[j]);; } tempData.partMean = pMean; tempData.partLCI = pLCI; tempData.partUCI = pUCI; tempData.difference = difference; summary.push_back(tempData); m->gobble(referenceFile); m->gobble(partitionFile); } referenceFile.close(); partitionFile.close(); if (m->control_pressed) { return 0; } int numOTUs = (int)summary.size(); sort(summary.begin(), summary.end(), summaryFunction); variables.erase("[tag]"); string outputFileName = getOutputFileName("parameters", variables); outputNames.push_back(outputFileName); outputTypes["parameters"].push_back(outputFileName); ofstream parameterFile; m->openOutputFile(outputFileName, parameterFile); //((fileRoot + "mix.parameters").c_str()); parameterFile.setf(ios::fixed, ios::floatfield); parameterFile.setf(ios::showpoint); double totalDifference = 0.0000; parameterFile << "Part\tDif2Ref_i\ttheta_i\tpi_i\n"; for(int i=0;icontrol_pressed) { break; } parameterFile << i+1 << '\t' << setprecision(2) << partitionDiff[i] << '\t' << thetaValues[i] << '\t' << piValues[i] << endl; totalDifference += partitionDiff[i]; } parameterFile.close(); if 
(m->control_pressed) { return 0; } string summaryFileName = getOutputFileName("summary", variables); outputNames.push_back(summaryFileName); outputTypes["summary"].push_back(summaryFileName); ofstream summaryFile; m->openOutputFile(summaryFileName, summaryFile); //((fileRoot + "mix.summary").c_str()); summaryFile.setf(ios::fixed, ios::floatfield); summaryFile.setf(ios::showpoint); summaryFile << "OTU\tP0.mean"; for(int i=0;icontrol_pressed) { break; } summaryFile << summary[i].name << setprecision(2) << '\t' << summary[i].refMean; for(int j=0;jerrorOut(e, "GetMetaCommunityCommand", "generateSummaryFile"); exit(1); } } //********************************************************************************************************************** vector > GetMetaCommunityCommand::generateDistanceMatrix(vector& thisLookup){ try { vector > results; Calculator* matrixCalculator; ValidCalculators validCalculator; int i = 0; if (validCalculator.isValidCalculator("matrix", Estimators[i]) == true) { if (Estimators[i] == "sharedsobs") { matrixCalculator = new SharedSobsCS(); }else if (Estimators[i] == "sharedchao") { matrixCalculator = new SharedChao1(); }else if (Estimators[i] == "sharedace") { matrixCalculator = new SharedAce(); }else if (Estimators[i] == "jabund") { matrixCalculator = new JAbund(); }else if (Estimators[i] == "sorabund") { matrixCalculator = new SorAbund(); }else if (Estimators[i] == "jclass") { matrixCalculator = new Jclass(); }else if (Estimators[i] == "sorclass") { matrixCalculator = new SorClass(); }else if (Estimators[i] == "jest") { matrixCalculator = new Jest(); }else if (Estimators[i] == "sorest") { matrixCalculator = new SorEst(); }else if (Estimators[i] == "thetayc") { matrixCalculator = new ThetaYC(); }else if (Estimators[i] == "thetan") { matrixCalculator = new ThetaN(); }else if (Estimators[i] == "kstest") { matrixCalculator = new KSTest(); }else if (Estimators[i] == "sharednseqs") { matrixCalculator = new SharedNSeqs(); }else if (Estimators[i] == 
"ochiai") { matrixCalculator = new Ochiai(); }else if (Estimators[i] == "anderberg") { matrixCalculator = new Anderberg(); }else if (Estimators[i] == "kulczynski") { matrixCalculator = new Kulczynski(); }else if (Estimators[i] == "kulczynskicody") { matrixCalculator = new KulczynskiCody(); }else if (Estimators[i] == "lennon") { matrixCalculator = new Lennon(); }else if (Estimators[i] == "morisitahorn") { matrixCalculator = new MorHorn(); }else if (Estimators[i] == "braycurtis") { matrixCalculator = new BrayCurtis(); }else if (Estimators[i] == "whittaker") { matrixCalculator = new Whittaker(); }else if (Estimators[i] == "odum") { matrixCalculator = new Odum(); }else if (Estimators[i] == "canberra") { matrixCalculator = new Canberra(); }else if (Estimators[i] == "structeuclidean") { matrixCalculator = new StructEuclidean(); }else if (Estimators[i] == "structchord") { matrixCalculator = new StructChord(); }else if (Estimators[i] == "hellinger") { matrixCalculator = new Hellinger(); }else if (Estimators[i] == "manhattan") { matrixCalculator = new Manhattan(); }else if (Estimators[i] == "structpearson") { matrixCalculator = new StructPearson(); }else if (Estimators[i] == "soergel") { matrixCalculator = new Soergel(); }else if (Estimators[i] == "spearman") { matrixCalculator = new Spearman(); }else if (Estimators[i] == "structkulczynski") { matrixCalculator = new StructKulczynski(); }else if (Estimators[i] == "speciesprofile") { matrixCalculator = new SpeciesProfile(); }else if (Estimators[i] == "hamming") { matrixCalculator = new Hamming(); }else if (Estimators[i] == "structchi2") { matrixCalculator = new StructChi2(); }else if (Estimators[i] == "gower") { matrixCalculator = new Gower(); }else if (Estimators[i] == "memchi2") { matrixCalculator = new MemChi2(); }else if (Estimators[i] == "memchord") { matrixCalculator = new MemChord(); }else if (Estimators[i] == "memeuclidean") { matrixCalculator = new MemEuclidean(); }else if (Estimators[i] == "mempearson") { 
matrixCalculator = new MemPearson(); }else if (Estimators[i] == "jsd") { matrixCalculator = new JSD(); }else if (Estimators[i] == "rjsd") { matrixCalculator = new RJSD(); }else { m->mothurOut("[ERROR]: " + Estimators[i] + " is not a valid calculator, please correct.\n"); m->control_pressed = true; return results; } } //calc distances vector< vector< vector > > calcDistsTotals; //each iter, then each groupCombos dists. this will be used to make .dist files vector< vector > calcDists; calcDists.resize(1); for (int thisIter = 0; thisIter < iters+1; thisIter++) { vector thisItersLookup = thisLookup; if (subsample && (thisIter != 0)) { SubSample sample; vector tempLabels; //dont need since we arent printing the sampled sharedRabunds //make copy of lookup so we don't get access violations vector newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return results; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } tempLabels = sample.getSample(newLookup, subsampleSize); thisItersLookup = newLookup; } driver(thisItersLookup, calcDists, matrixCalculator); if (subsample && (thisIter != 0)) { if((thisIter) % 100 == 0){ m->mothurOutJustToScreen(toString(thisIter)+"\n"); } calcDistsTotals.push_back(calcDists); for (int i = 0; i < calcDists.size(); i++) { for (int j = 0; j < calcDists[i].size(); j++) { if (m->debug) { m->mothurOut("[DEBUG]: Results: iter = " + toString(thisIter) + ", " + thisLookup[calcDists[i][j].seq1]->getGroup() + " - " + thisLookup[calcDists[i][j].seq2]->getGroup() + " distance = " + 
toString(calcDists[i][j].dist) + ".\n"); } } } //clean up memory for (int i = 0; i < thisItersLookup.size(); i++) { delete thisItersLookup[i]; } thisItersLookup.clear(); }else { //print results for whole dataset for (int i = 0; i < calcDists.size(); i++) { if (m->control_pressed) { break; } //initialize matrix results.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { results[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcDists[i].size(); j++) { int row = calcDists[i][j].seq1; int column = calcDists[i][j].seq2; double dist = calcDists[i][j].dist; results[row][column] = dist; results[column][row] = dist; } } } for (int i = 0; i < calcDists.size(); i++) { calcDists[i].clear(); } } if (iters != 0) { //we need to find the average distance and standard deviation for each groups distance vector< vector > calcAverages = m->getAverages(calcDistsTotals, "average"); //print results for (int i = 0; i < calcDists.size(); i++) { results.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { results[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcAverages[i].size(); j++) { int row = calcAverages[i][j].seq1; int column = calcAverages[i][j].seq2; float dist = calcAverages[i][j].dist; results[row][column] = dist; results[column][row] = dist; } } } return results; } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", "generateDistanceMatrix"); exit(1); } } /**************************************************************************************************/ int GetMetaCommunityCommand::driver(vector thisLookup, vector< vector >& calcDists, Calculator* matrixCalculator) { try { vector subset; for (int k = 0; k < thisLookup.size(); k++) { // pass cdd each set of groups to compare for (int l = 0; l < k; l++) { if (k != l) { //we dont need to similiarity of a groups to itself subset.clear(); //clear out old pair of sharedrabunds //add new pair of sharedrabunds subset.push_back(thisLookup[k]); 
subset.push_back(thisLookup[l]); //if this calc needs all groups to calculate the pair load all groups if (matrixCalculator->getNeedsAll()) { //load subset with rest of lookup for those calcs that need everyone to calc for a pair for (int w = 0; w < thisLookup.size(); w++) { if ((w != k) && (w != l)) { subset.push_back(thisLookup[w]); } } } vector<double> tempdata = matrixCalculator->getValues(subset); //saves the calculator outputs if (m->control_pressed) { return 1; } seqDist temp(l, k, tempdata[0]); //cout << l << '\t' << k << '\t' << tempdata[0] << endl; calcDists[0].push_back(temp); } } } return 0; } catch(exception& e) { m->errorOut(e, "GetMetaCommunityCommand", "driver"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getmetacommunitycommand.h000066400000000000000000000065371255543666200233550ustar00rootroot00000000000000// // getmetacommunitycommand.h // Mothur // // Created by SarahsWork on 4/9/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_getmetacommunitycommand_h #define Mothur_getmetacommunitycommand_h #include "command.hpp" #include "inputdata.h" #include "qFinderDMM.h" #include "pam.h" #include "sharedsobscollectsummary.h" #include "sharedchao1.h" #include "sharedace.h" #include "sharednseqs.h" #include "sharedjabund.h" #include "sharedsorabund.h" #include "sharedjclass.h" #include "sharedsorclass.h" #include "sharedjest.h" #include "sharedsorest.h" #include "sharedthetayc.h" #include "sharedthetan.h" #include "sharedkstest.h" #include "whittaker.h" #include "sharedochiai.h" #include "sharedanderbergs.h" #include "sharedkulczynski.h" #include "sharedkulczynskicody.h" #include "sharedlennon.h" #include "sharedmorisitahorn.h" #include "sharedbraycurtis.h" #include "sharedjackknife.h" #include "whittaker.h" #include "odum.h" #include "canberra.h" #include "structeuclidean.h" #include "structchord.h" #include "hellinger.h" #include "manhattan.h" #include "structpearson.h" #include "soergel.h" #include "spearman.h" #include "structkulczynski.h" #include "structchi2.h" #include "speciesprofile.h" #include "hamming.h" #include "gower.h" #include "memchi2.h" #include "memchord.h" #include "memeuclidean.h" #include "mempearson.h" #include "sharedjsd.h" #include "sharedrjsd.h" /**************************************************************************************************/ class GetMetaCommunityCommand : public Command { public: GetMetaCommunityCommand(string); GetMetaCommunityCommand(); ~GetMetaCommunityCommand(){} vector setParameters(); string getCommandName() { return "get.communitytype"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "Holmes I, Harris K, Quince C (2012) Dirichlet Multinomial Mixtures: Generative Models for Microbial Metagenomics. PLoS ONE 7(2): e30126. 
doi:10.1371/journal.pone.0030126 http://www.mothur.org/wiki/get.communitytype"; } string getDescription() { return "Assigns samples to bins using a Dirichlet multinomial mixture model"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, allLines, subsample; string outputDir; vector outputNames; string sharedfile, method, calc; int minpartitions, maxpartitions, optimizegap, processors, iters, subsampleSize; vector Groups, Estimators; set labels; vector > generateDistanceMatrix(vector& lookup); int driver(vector thisLookup, vector< vector >& calcDists, Calculator*); int processDriver(vector&, vector&, string, vector, vector, vector, int); int createProcesses(vector&); vector generateDesignFile(int, map); int generateSummaryFile(int, map, vector); }; /**************************************************************************************************/ struct summaryData { string name; double refMean, difference; vector partMean, partLCI, partUCI; }; #endif mothur-1.36.1/source/commands/getmimarkspackagecommand.cpp000066400000000000000000005460431255543666200237750ustar00rootroot00000000000000// // getmimarkspackagecommand.cpp // Mothur // // Created by Sarah Westcott on 3/25/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. 
// #include "getmimarkspackagecommand.h" #include "groupmap.h" //********************************************************************************************************************** vector GetMIMarksPackageCommand::setParameters(){ try { //files that have dependancies CommandParameter pgroup("group", "InputTypes", "", "", "groupOligos", "none", "none","",false,false); parameters.push_back(pgroup); CommandParameter pfile("file", "InputTypes", "", "", "groupOligos", "none", "none","",false,false); parameters.push_back(pfile); CommandParameter poligos("oligos", "InputTypes", "", "", "groupOligos", "none", "none","",false,false); parameters.push_back(poligos); CommandParameter ppackage("package", "Multiple", "air-host_associated-human_associated-human_gut-human_oral-human_skin-human_vaginal-microbial-miscellaneous-plant_associated-sediment-soil-wastewater-water", "miscellaneous", "", "", "","",false,false,true); parameters.push_back(ppackage); CommandParameter prequiredonly("requiredonly", "Boolean", "", "F", "", "", "","",false,false, true); parameters.push_back(prequiredonly); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetMIMarksPackageCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.mimarkspackage command creates a mimarks package form with your groups. 
The required fields are flagged with * characters. \n"; helpString += "Further documentation on the different packages and required formats can be found here, http://www.mothur.org/wiki/MIMarks_Data_Packages.\n"; helpString += "The get.mimarkspackage command parameters are: oligos, group, package and requiredonly. oligos or group is required.\n"; helpString += "The oligos parameter is used to provide your oligos file so mothur can extract your group names.\n"; helpString += "The group parameter is used to provide your group file so mothur can extract your group names.\n"; helpString += "The package parameter is used to select the mimarks package you would like to use. The choices are: air, host_associated, human_associated, human_gut, human_oral, human_skin, human_vaginal, microbial, miscellaneous, plant_associated, sediment, soil, wastewater or water. Default=miscellaneous.\n"; helpString += "The requiredonly parameter is used to indicate you only want the required mimarks fields printed. 
Default=F.\n"; helpString += "The get.mimarkspackage command should be in the following format: get.mimarkspackage(oligos=yourOligosFile, package=yourPackage)\n"; helpString += "get.mimarkspackage(oligos=GQY1XT001.oligos, package=human_gut)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetMIMarksPackageCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "tsv") { pattern = "[filename],tsv"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetMIMarksPackageCommand::GetMIMarksPackageCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["tsv"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "GetMIMarksPackageCommand"); exit(1); } } //********************************************************************************************************************** GetMIMarksPackageCommand::GetMIMarksPackageCommand(string option) { try { abort = false; calledHelp = false; fileOption = 0; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid parameters for this command vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); ValidParameters validParameter; map<string,string>::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if 
(validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["tsv"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["file"] = inputDir + it->second; } } } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); inputfile = groupfile; } oligosfile = validParameter.validFile(parameters, "oligos", true); if (oligosfile == "not found") { oligosfile = ""; setOligosParameter = false; } else if(oligosfile == "not open") { abort = true; } else { m->setOligosFile(oligosfile); inputfile = oligosfile; setOligosParameter = true; } file = validParameter.validFile(parameters, "file", true); if (file == "not open") { file = ""; abort = true; } else if (file == "not found") { file = ""; } else { inputfile = file; fileOption = findFileOption(); } if ((groupfile == "") && (oligosfile == "") && (file == "")) { oligosfile = m->getOligosFile(); if (oligosfile != "") { inputfile = oligosfile; m->mothurOut("Using " + oligosfile + " as input file for the oligos parameter."); m->mothurOutEndLine(); } else { groupfile = m->getGroupFile(); if (groupfile != "") { inputfile = groupfile; m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("[ERROR]: You must provide file, groupfile or oligos file for the get.mimarkspackage command."); m->mothurOutEndLine(); abort = true; } } } package = validParameter.validFile(parameters, "package", false); if (package == "not found") { package = "miscellaneous"; } for (int i = 0; i < package.length(); i++) { package[i] = tolower(package[i]); } if ((package == "air") || (package == "host_associated") || (package == "human_associated") || (package == "human_gut") || (package == "human_oral") || (package == "human_skin") || (package == "human_vaginal") || (package == "microbial") || (package == "miscellaneous") || (package == "plant_associated") || (package == "sediment") || (package == "soil") || (package == 
"wastewater") || (package == "water") ) {} else { m->mothurOut("[ERROR]: " + package + " is not a valid package selection. Choices are: air, host_associated, human_associated, human_gut, human_oral, human_skin, human_vaginal, microbial, miscellaneous, plant_associated, sediment, soil, wastewater or water. Aborting.\n."); abort = true; } string temp; temp = validParameter.validFile(parameters, "requiredonly", false); if(temp == "not found"){ temp = "F"; } requiredonly = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "GetMIMarksPackageCommand"); exit(1); } } //********************************************************************************************************************** int GetMIMarksPackageCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if ((oligosfile != "") && (file != "")) { Oligos oligos(oligosfile); createGroupNames(oligos); } else if (file != "") { readFile(); } else if (oligosfile != "") { Oligos oligos(oligosfile); createGroupNames(oligos); } //createGroupNames fills in group names else { GroupMap groupmap(groupfile); groupmap.readMap(); vector tempGroups = groupmap.getNamesOfGroups(); for (int i = 0; i < tempGroups.size(); i++) { Groups.insert(tempGroups[i]); } } if (outputDir == "") { outputDir += m->hasPath(inputfile); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); string outputFileName = getOutputFileName("tsv", variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["tsv"].push_back(outputFileName); out << "#This is a tab-delimited file. Additional Documentation can be found at http://www.mothur.org/wiki/MIMarks_Data_Packages." << endl; out << "#Please fill all the required fields indicated with '*'" << endl; out << "#Unknown or inapplicable fields can be assigned 'missing' value." << endl; out << "#You may add extra custom fields to this template. 
Make sure all the fields are separated by tabs." << endl; out << "#You may remove any fields not required (marked with '*'). Make sure all the fields are separated by tabs." << endl; out << "#You can edit this template using Microsoft Excel or any other editor. But while saving the file please make sure to save them as 'TAB-DELIMITED' TEXT FILE." << endl; if (package == "air") { out << "#MIMARKS.survey.air.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. 
Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{float} m} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *altitude *collection_date *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {force per unit area exerted against a surface by the weight of air above that surface} {carbon dioxide (gas) amount or concentration at the time of sampling} {carbon monoxide (gas) amount or concentration at the time of sampling} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {amount of water vapour in the air, at the time of sampling} {methane (gas) amount or concentration at the time of sampling} {any other measurement performed or parameter collected, that is not listed here} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {oxygen (gas) amount or concentration at the time of sampling} {type of perturbation, e.g. 
chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {pollutant types and, amount or concentrations measured at the time of sampling; can report multiple pollutants by entering numeric values preceded by name of pollutant} {Aerobic or anaerobic} {concentration of substances that remain suspended in the air, and comprise mixtures of organic and inorganic substances (PM10 and PM2.5); can report multiple PM's by entering numeric values preceded by name of PM} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {the amount of solar energy that arrives at a specific area of a surface during a specific time interval} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {temperature of the sample at time of sampling} {ventilation rate of the system in the sampled premises} {ventilation system used in the sampled premises} {concentration of carbon-based chemicals that easily evaporate at room temperature; can report multiple volatile organic compounds by entering numeric values preceded by name of compound} {wind direction is the direction from which a wind originates} {speed of wind measured at the time of sampling}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{float} m} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{term}; {timestamp}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{float} {unit}} {{text};{interval}} {{text};{float} {unit}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text};{float} {unit}} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{float} {unit}} {{text}} {{float} {unit}} {{float} {unit}} {{text}} {{text};{float} {unit}} {{text}} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *altitude *collection_date *env_biome *env_feature *env_material *geo_loc_name *lat_lon barometric_press carb_dioxide carb_monoxide chem_administration elev humidity methane misc_param organism_count oxy_stat_samp oxygen perturbation pollutants rel_to_oxygen resp_part_matter samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext solar_irradiance source_material_id temp ventilation_rate ventilation_type volatile_org_comp wind_direction wind_speed" << endl; } }else if (package == "built") { out << 
"#MIMARKS.survey.built.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {actual mass of water vapor - mh20 - present in the air water vapor mixture} {temperature of the air at the time of sampling} {primary function for which a building or discrete part of a building is intended to be used} {location (geography) where a building is set} {carbon dioxide (gas) amount or concentration at the time of sampling} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {device which removes solid particulates or airborne molecular contaminants} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. 
Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {methods of conditioning or heating a room or building} {a distinguishable space within a structure, the purpose for which discrete areas of a building is used} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {application of light to achieve some practical or aesthetic effect. Lighting includes the use of both artificial light sources such as lamps and light fixtures, as well as natural illumination by capturing daylight. Can also include absence of light} {number of occupants present at time of sample within the given space} {average number of occupants at time of sampling per square footage} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {partial vapor and air pressure, density of the vapor and air, or by the actual mass of the vapor and air} {customary or normal state of the space} {customary or normal density of occupants} {ventilation system used in the sampled premises}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{float} {unit} [kg|lb]} {{float} {unit} [deg C]} {['', 'office', 'market', 'restaurant', 'residence', 'school', 'residential', 'commercial', 'low rise', 'high rise', 'wood framed', 'health care', 'airport', 'sports complex', 'missing', 'not applicable', 'not collected']} {['', 'urban', 'suburban', 'exurban', 'rural', 'missing', 'not applicable', 'not collected']} {{float} {unit}} {{timestamp}} {{term}} {{term}} {{term}} {['', 'particulate air filter', 'chemical air filter', 'low-MERV pleated media', 'HEPA', 'electrostatic', 'gas-phase or ultraviolet air treatments', 'missing', 'not applicable', 'not collected']} 
{{term}:{term}:{text}} {['', 'radiant system', 'heat pump', 'forced air system', 'steam forced heat', 'wood stove', 'missing', 'not applicable', 'not collected']} {['', 'bedroom', 'office', 'bathroom', 'foyer', 'kitchen', 'locker room', 'hallway', 'elevator', 'missing', 'not applicable', 'not collected']} {{float} {float}} {['', 'natural light', 'electric light', 'no light', 'missing', 'not applicable', 'not collected']} {{integer}} {{float}} {{text};{float} {unit}} {{float} {unit} [%]} {['', 'typical occupied', 'typically unoccupied', 'missing', 'not applicable', 'not collected']} {{float}} {{text}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *abs_air_humidity *air_temp *build_occup_type *building_setting *carb_dioxide *collection_date *env_biome *env_feature *env_material *filter_type *geo_loc_name *heat_cool_type *indoor_space *lat_lon *light_type *occup_samp *occupant_dens_samp *organism_count *rel_air_humidity *space_typ_state *typ_occupant_dens *ventilation_type" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {actual mass of water vapor - mh20 - present in the air water vapor mixture} {temperature of the air at the time of sampling} {primary function for which a building or discrete part of a building is intended to be used} {location (geography) where a building is set} {carbon dioxide (gas) amount or concentration at the time of sampling} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {device which removes solid particulates or airborne molecular contaminants} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {methods of conditioning or heating a room or building} {a distinguishable space within a structure, the purpose for which discrete areas of a building is used} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {application of light to achieve some practical or aesthetic effect. Lighting includes the use of both artificial light sources such as lamps and light fixtures, as well as natural illumination by capturing daylight. 
Can also include absence of light} {number of occupants present at time of sample within the given space} {average number of occupants at time of sampling per square footage} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {partial vapor and air pressure, density of the vapor and air, or by the actual mass of the vapor and air} {customary or normal state of the space} {customary or normal density of occupants} {ventilation system used in the sampled premises} {temperature to which a given parcel of humid air must be cooled, at constant barometric pressure, for water vapor to condense into water.} {type of indoor surface} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {Amount or size of sample (volume, mass or area) that was collected} {method by which samples are sorted} {volume (mL) or weight (g) of sample processed for DNA extraction} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {substructure or under building is that largely hidden section of the building which is built off the foundations to the ground floor level} {contaminant identified on surface} {surfaces: water activity as a function of air and material moisture} {surface materials at the point of sampling} {water held on a surface} {pH measurement of surface} {temperature of the surface at the time of sampling}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{float} {unit} [kg|lb]} {{float} {unit} [deg C]} {['', 'office', 'market', 'restaurant', 'residence', 'school', 'residential', 'commercial', 'low rise', 'high rise', 'wood framed', 'health care', 'airport', 'sports complex', 'missing', 'not applicable', 'not collected']} {['', 'urban', 'suburban', 'exurban', 'rural', 'missing', 'not applicable', 'not collected']} {{float} {unit}} {{timestamp}} {{term}} {{term}} {{term}} {['', 'particulate air filter', 'chemical air filter', 'low-MERV pleated media', 'HEPA', 'electrostatic', 'gas-phase or ultraviolet air treatments', 'missing', 'not applicable', 'not collected']} {{term}:{term}:{text}} {['', 'radiant system', 'heat pump', 'forced air system', 'steam forced heat', 'wood stove', 'missing', 'not applicable', 'not collected']} {['', 'bedroom', 'office', 'bathroom', 'foyer', 'kitchen', 'locker room', 'hallway', 'elevator', 'missing', 'not applicable', 'not collected']} {{float} {float}} {['', 'natural light', 'electric light', 'no light', 'missing', 'not applicable', 'not collected']} {{integer}} {{float}} {{text};{float} {unit}} {{float} {unit} [%]} {['', 'typical occupied', 'typically unoccupied', 'missing', 'not applicable', 'not collected']} {{float}} {{text}} {{float} {unit} [deg C]} {['', 'counter top', 'window', 'wall', 'cabinet', 'ceiling', 'door', 'shelving', 'vent cover']} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 
'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{float} {unit}} {{text}} {{float} {unit}} {{text}} {['', 'crawlspace', 'slab on grade', 'basement']} {['', 'dust', 'organic matter', 'particulate matter', 'volatile organic compounds', 'biological contaminants', 'radon', 'nutrients', 'biocides']} {{float} {unit} [%]} {['', 'concrete', 'wood', 'stone', 'tile', 'plastic', 'glass', 'vinyl', 'metal', 'carpet', 'stainless steel', 'paint', 'cinder blocks', 'hay bales', 'stucco', 'adobe']} {{float} {unit}} {{integer [0-14]}} {{float} {unit} [deg C]}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *abs_air_humidity *air_temp *build_occup_type *building_setting *carb_dioxide *collection_date *env_biome *env_feature *env_material *filter_type *geo_loc_name *heat_cool_type *indoor_space *lat_lon *light_type *occup_samp *occupant_dens_samp *organism_count *rel_air_humidity *space_typ_state *typ_occupant_dens *ventilation_type dew_point indoor_surf rel_to_oxygen samp_collect_device samp_mat_process samp_size samp_sort_meth samp_vol_we_dna_ext source_material_id substructure_type surf_air_cont surf_humidity surf_material surf_moisture surf_moisture_ph surf_temp" << endl; } }else if (package == "host_associated") { out << "#MIMARKS.survey.host-associated.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. 
Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {whether or not subject is gravid, and if yes date due or date post-conception, specifying which is used} {Age of host at the time of sampling} {resting diastolic blood pressure of the host, measured as mm mercury} {resting systolic blood pressure of the host, measured as mm mercury} {original body habitat where the sample was obtained from} {substance produced by the host, e.g. stool, mucus, where the sample was obtained from} {core body temperature of the host when sample was collected} {the color of host} {type of diet depending on the sample for animals omnivore, herbivore etc., for humans high-fat, Mediterranean etc.; can include multiple diet types} {Name of relevant disease, e.g. Salmonella gastroenteritis. 
Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {measurement of dry mass} {none} {none} {literature reference giving growth conditions of the host} {the height of subject} {taxonomic information subspecies level} {taxonomic rank information below subspecies level, such as variety, form, rank etc.} {content of last meal and time since feeding; can include multiple values} {the length of subject} {description of host life stage} {none} {Gender or physical sex of the host} {morphological shape of host} {a unique identifier by which each subject can be referred to, de-identified, e.g. #131} {the growth substrate of the host} {NCBI taxonomy ID of the host, e.g. 9606} {Type of tissue the initial sample was taken from. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1005)} {total mass of the host at collection, the unit depends on host} {any other measurement performed or parameter collected, that is not listed here} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {temperature of the sample at time of sampling}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{float} m} {{term}; {timestamp}} {{float} m} {{float} {unit}} {{boolean};{timestamp}} {{none}} {{float} {unit}} {{float} {unit}} {{term}} {{text}} {{float} {unit}} {{text}} {{text}} {{none}} {{float} {unit}} {{none}} {{none}} {{PMID|DOI|URL}} {{float} {unit}} {{text}} {{text}} {{text};{period}} {{float} {unit}} {{text}} {{none}} {['', 'male', 'female', 'pooled male and female', 'neuter', 'hermaphrodite', 'not determined', 'missing', 'not applicable', 'not collected']} {{text}} {{text}} {{text}} {{integer}} {{none}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text}} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon altitude chem_administration depth elev gravidity host_age host_blood_press_diast host_blood_press_syst host_body_habitat host_body_product host_body_temp host_color host_diet host_disease host_dry_mass host_family_relationship host_genotype host_growth_cond host_height host_infra_specific_name host_infra_specific_rank host_last_meal host_length host_life_stage host_phenotype host_sex host_shape host_subject_id host_substrate host_taxid host_tissue_sampled host_tot_mass misc_param organism_count oxy_stat_samp perturbation rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp 
samp_vol_we_dna_ext source_material_id temp" << endl; } }else if (package == "human_associated") { out << "#MIMARKS.survey.human-associated.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. 
Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {specification of the color of the amniotic fluid sample} {history of blood disorders; can include multiple disorders} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {specification of major diet changes in the last six months, if yes the change should be specified} {any drug used by subject and the frequency of usage; can include multiple drugs used} {ethnicity of the subject} {specification of foetal health status, should also include abortion} {specification of the gestation state} {Age of host at the time of sampling} {body mass index of the host, calculated as weight/(height)squared} {substance produced by the host, e.g. stool, mucus, where the sample was obtained from} {core body temperature of the host when sample was collected} {type of diet depending on the sample for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include multiple diet types} {Name of relevant disease, e.g. Salmonella gastroenteritis. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {none} {none} {the height of subject} {HIV status of subject, if yes HAART initiation status should also be indicated as [YES or NO]} {content of last meal and time since feeding; can include multiple values} {most frequent job performed by subject} {none} {resting pulse of the host, measured as beats per minute} {Gender or physical sex of the host} {a unique identifier by which each subject can be referred to, de-identified, e.g. 
#131} {Type of tissue the initial sample was taken from. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1005)} {total mass of the host at collection, the unit depends on host} {can include multiple medication codes} {history of kidney disorders; can include multiple disorders} {specification of the maternal health status} {whether full medical history was collected} {any other measurement performed or parameter collected, that is not listed here} {history of nose-throat disorders; can include multiple disorders} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {specification of presence of pets or farm animals in the environment of subject, if yes the animals should be specified; can include multiple animals present} {history of pulmonary disorders; can include multiple disorders} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {specification of smoking status} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {specification of study completion status, if no the reason should be specified} {temperature of the sample at time of sampling} {specification of the countries travelled in the last six months; can include multiple travels} {specification of twin sibling presence} {specification of urine collection method} {history of urogenital tract disorders; can include multiple disorders} {specification of weight loss in the last three months, if yes should be further specified to include amount of weight loss}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{text}} {{text}} {{term}; {timestamp}} {{boolean};{text}} {{text};{integer}/[year|month|week|day|hour]} {{integer|text}} {{text}} {{text}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{text}} {{none}} {{none}} {{none}} {{float} {unit}} {{boolean};{boolean}} {{text};{period}} {{none}} {{none}} {{float} {unit}} {['', 'male', 'female', 'pooled male and female', 'neuter', 'hermaphrodite', 'not determined', 'missing', 'not applicable', 'not collected']} {{text}} {{none}} {{float} {unit}} {{integer}} {{text}} {{text}} {{boolean}} {{text};{float} {unit}} {{text}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {{boolean};{text}} {{text}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{boolean}} {{text}} {{boolean};[adverse event|non-compliance|lost to follow up|other-specify]} {{float} {unit}} {{text}} {{boolean}} {['', 'clean catch', 'catheter']} {{text}} {{boolean};{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material 
*geo_loc_name *host *lat_lon amniotic_fluid_color blood_blood_disord chem_administration diet_last_six_month drug_usage ethnicity foetal_health_stat gestation_state host_age host_body_mass_index host_body_product host_body_temp host_diet host_disease host_family_relationship host_genotype host_height host_hiv_stat host_last_meal host_occupation host_phenotype host_pulse host_sex host_subject_id host_tissue_sampled host_tot_mass ihmc_medication_code kidney_disord maternal_health_stat medic_hist_perform misc_param nose_throat_disord organism_count oxy_stat_samp perturbation pet_farm_animal pulmonary_disord rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext smoker source_material_id study_complt_stat temp travel_out_six_month twin_sibling urine_collect_meth urogenit_tract_disor weight_loss_3_month" << endl; } }else if (package == "human_gut") { out << "#MIMARKS.survey.human-gut.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. 
Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {ethnicity of the subject} {history of gastrointestinal tract disorders; can include multiple disorders} {Age of host at the time of sampling} {body mass index of the host, calculated as weight/(height)squared} {substance produced by the host, e.g. stool, mucus, where the sample was obtained from} {core body temperature of the host when sample was collected} {type of diet depending on the sample for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include multiple diet types} {Name of relevant disease, e.g. Salmonella gastroenteritis. 
Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {none} {none} {the height of subject} {content of last meal and time since feeding; can include multiple values} {most frequent job performed by subject} {none} {resting pulse of the host, measured as beats per minute} {Gender or physical sex of the host} {a unique identifier by which each subject can be referred to, de-identified, e.g. #131} {Type of tissue the initial sample was taken from. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1005)} {total mass of the host at collection, the unit depends on host} {can include multiple medication codes} {history of liver disorders; can include multiple disorders} {whether full medical history was collected} {any other measurement performed or parameter collected, that is not listed here} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {specification of special diet; can include multiple special diets} {temperature of the sample at time of sampling}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{term}; {timestamp}} {{integer|text}} {{text}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{text}} {{none}} {{none}} {{none}} {{float} {unit}} {{text};{period}} {{none}} {{none}} {{float} {unit}} {['', 'male', 'female', 'pooled male and female', 'neuter', 'hermaphrodite', 'not determined', 'missing', 'not applicable', 'not collected']} {{text}} {{none}} {{float} {unit}} {{integer}} {{text}} {{boolean}} {{text};{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text}} {['', 'low carb', 'reduced calorie', 'vegetarian', 'other(to be specified)']} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon chem_administration ethnicity gastrointest_disord host_age host_body_mass_index host_body_product host_body_temp host_diet host_disease host_family_relationship host_genotype host_height host_last_meal host_occupation host_phenotype host_pulse host_sex host_subject_id host_tissue_sampled host_tot_mass ihmc_medication_code liver_disord medic_hist_perform misc_param organism_count oxy_stat_samp perturbation rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext source_material_id special_diet temp" << endl; } }else if (package == 
"human_oral") { out << "#MIMARKS.survey.human-oral.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. 
Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {ethnicity of the subject} {Age of host at the time of sampling} {body mass index of the host, calculated as weight/(height)squared} {substance produced by the host, e.g. stool, mucus, where the sample was obtained from} {core body temperature of the host when sample was collected} {type of diet depending on the sample for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include multiple diet types} {Name of relevant disease, e.g. Salmonella gastroenteritis. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {none} {none} {the height of subject} {content of last meal and time since feeding; can include multiple values} {most frequent job performed by subject} {none} {resting pulse of the host, measured as beats per minute} {Gender or physical sex of the host} {a unique identifier by which each subject can be referred to, de-identified, e.g. #131} {Type of tissue the initial sample was taken from. 
Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1005} {total mass of the host at collection, the unit depends on host} {can include multiple medication codes} {whether full medical history was collected} {any other measurement performed or parameter collected, that is not listed here} {history of nose/mouth/teeth/throat disorders; can include multiple disorders} {total count of any organism per gram or volume of sample, should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {temperature of the sample at time of sampling} {specification of the time since last toothbrushing}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{term}; {timestamp}} {{integer|text}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{text}} {{none}} {{none}} {{none}} {{float} {unit}} {{text};{period}} {{none}} {{none}} {{float} {unit}} {['', 'male', 'female', 'pooled male and female', 'neuter', 'hermaphrodite', 'not determined', 'missing', 'not applicable', 'not collected']} {{text}} {{none}} {{float} {unit}} {{integer}} {{boolean}} {{text};{float} {unit}} {{text}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{timestamp}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon chem_administration ethnicity host_age host_body_mass_index host_body_product host_body_temp host_diet host_disease host_family_relationship host_genotype host_height host_last_meal host_occupation host_phenotype host_pulse host_sex host_subject_id host_tissue_sampled host_tot_mass ihmc_medication_code medic_hist_perform misc_param nose_mouth_teeth_throat_disord organism_count oxy_stat_samp perturbation rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext source_material_id temp time_last_toothbrush" << endl; } }else if (package == "human_skin") { out << "#MIMARKS.survey.human-skin.4.0" << endl; if (requiredonly) { 
out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. 
Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {history of dermatology disorders; can include multiple disorders} {dominant hand of the subject} {ethnicity of the subject} {Age of host at the time of sampling} {body mass index of the host, calculated as weight/(height)squared} {substance produced by the host, e.g. stool, mucus, where the sample was obtained from} {core body temperature of the host when sample was collected} {type of diet depending on the sample for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include multiple diet types} {Name of relevant disease, e.g. Salmonella gastroenteritis. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {none} {none} {the height of subject} {content of last meal and time since feeding; can include multiple values} {most frequent job performed by subject} {none} {resting pulse of the host, measured as beats per minute} {Gender or physical sex of the host} {a unique identifier by which each subject can be referred to, de-identified, e.g. #131} {Type of tissue the initial sample was taken from. 
Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1005} {total mass of the host at collection, the unit depends on host} {can include multiple medication codes} {whether full medical history was collected} {any other measurement performed or parameter collected, that is not listed here} {total count of any organism per gram or volume of sample, should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {temperature of the sample at time of sampling} {specification of the time since last wash}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{term}; {timestamp}} {{text}} {['', 'left', 'right', 'ambidextrous']} {{integer|text}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{text}} {{none}} {{none}} {{none}} {{float} {unit}} {{text};{period}} {{none}} {{none}} {{float} {unit}} {['', 'male', 'female', 'pooled male and female', 'neuter', 'hermaphrodite', 'not determined', 'missing', 'not applicable', 'not collected']} {{text}} {{none}} {{float} {unit}} {{integer}} {{boolean}} {{text};{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{timestamp}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon chem_administration dermatology_disord dominant_hand ethnicity host_age host_body_mass_index host_body_product host_body_temp host_diet host_disease host_family_relationship host_genotype host_height host_last_meal host_occupation host_phenotype host_pulse host_sex host_subject_id host_tissue_sampled host_tot_mass ihmc_medication_code medic_hist_perform misc_param organism_count oxy_stat_samp perturbation rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext source_material_id temp time_since_last_wash" << endl; } }else if (package == "human_vaginal") { out << 
"#MIMARKS.survey.human-vaginal.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. 
Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {specification of birth control medication used} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {date of most recent douche} {ethnicity of the subject} {history of gynecological disorders; can include multiple disorders} {Age of host at the time of sampling} {body mass index of the host, calculated as weight/(height)squared} {substance produced by the host, e.g. stool, mucus, where the sample was obtained from} {core body temperature of the host when sample was collected} {type of diet depending on the sample for animals omnivore, herbivore etc., for humans high-fat, meditteranean etc.; can include multiple diet types} {Name of relevant disease, e.g. Salmonella gastroenteritis. Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {none} {none} {the height of subject} {content of last meal and time since feeding; can include multiple values} {most frequent job performed by subject} {none} {resting pulse of the host, measured as beats per minute} {Gender or physical sex of the host} {a unique identifier by which each subject can be referred to, de-identified, e.g. #131} {Type of tissue the initial sample was taken from. 
Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1005} {total mass of the host at collection, the unit depends on host} {whether subject had hormone replacement therapy, and if yes start date} {specification of whether hysterectomy was performed} {can include multiple medication codes} {whether full medical history was collected} {date of most recent menstruation} {date of onset of menopause} {any other measurement performed or parameter collected, that is not listed here} {total count of any organism per gram or volume of sample, should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {due date of pregnancy} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {current sexual partner and frequency of sex} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {temperature of the sample at time of sampling} {history of urogenital disorders, can include multiple disorders}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{text}} {{term}; {timestamp}} {{timestamp}} {{integer|text}} {{text}} {{none}} {{float} {unit}} {{text}} {{float} {unit}} {{text}} {{none}} {{none}} {{none}} {{float} {unit}} {{text};{period}} {{none}} {{none}} {{float} {unit}} {['', 'male', 'female', 'pooled male and female', 'neuter', 'hermaphrodite', 'not determined', 'missing', 'not applicable', 'not collected']} {{text}} {{none}} {{float} {unit}} {{timestamp}} {{boolean}} {{integer}} {{boolean}} {{timestamp}} {{timestamp}} {{text};{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {{timestamp}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text}} {{text}} {{float} {unit}} {{text}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon birth_control chem_administration douche ethnicity gynecologic_disord host_age host_body_mass_index host_body_product host_body_temp host_diet host_disease host_family_relationship host_genotype host_height host_last_meal host_occupation host_phenotype host_pulse host_sex host_subject_id host_tissue_sampled host_tot_mass hrt hysterectomy ihmc_medication_code medic_hist_perform menarche menopause misc_param organism_count oxy_stat_samp perturbation pregnancy rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext 
sexual_act source_material_id temp urogenit_disord" << endl; } }else if (package == "microbial") { out << "#MIMARKS.survey.microbial.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{float} {unit}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *elev *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate} {concentration of alkyl diethers} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {measurement of aminopeptidase activity} {concentration of ammonium} {measurement of bacterial carbon production} {amount of biomass; should include the name for the part of biomass measured, e.g. microbial, total. can include multiple measurements} {concentration of bishomohopanol} {concentration of bromide} {concentration of calcium} {ratio of amount or concentrations of carbon to nitrogen} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. 
For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {concentration of chloride} {concentration of chlorophyll} {concentration of diether lipids; can include multiple types of diether lipids} {concentration of dissolved carbon dioxide} {concentration of dissolved hydrogen} {dissolved inorganic carbon concentration} {concentration of dissolved organic carbon} {dissolved organic nitrogen concentration measured as; total dissolved nitrogen - NH4 - NO3 - NO2} {concentration of dissolved oxygen} {measurement of glucosidase activity} {concentration of magnesium} {measurement of mean friction velocity} {measurement of mean peak friction velocity} {methane (gas) amount or concentration at the time of sampling} {any other measurement performed or parameter collected, that is not listed here} {concentration of n-alkanes; can include multiple n-alkanes} {concentration of nitrate} {concentration of nitrite} {concentration of nitrogen (total)} {concentration of organic carbon} {concentration of organic matter} {concentration of organic nitrogen} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {concentration of particulate organic carbon} {type of perturbation, e.g. 
chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {concentration of petroleum hydrocarbon} {pH measurement} {concentration of phaeopigments; can include multiple phaeopigments} {concentration of phosphate} {concentration of phospholipid fatty acids; can include multiple values} {concentration of potassium} {pressure to which the sample is subject, in atmospheres} {redox potential, measured relative to a hydrogen cell, indicating oxidation or reduction potential} {Aerobic or anaerobic} {salinity measurement} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {concentration of silicate} {sodium concentration} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples.} {concentration of sulfate} {concentration of sulfide} {temperature of the sample at time of sampling} {total carbon content} {total nitrogen content of the sample} {Definition for soil: total organic C content of the soil units of g C/kg soil. 
Definition otherwise: total organic carbon content} {turbidity measurement} {water content measurement}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{float} {unit}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{float} {unit}} {{float} {unit}} {{float} m} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{term}; {timestamp}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{float} {unit}} {{text};{interval}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{float} {unit}} {{text}} {{text|term}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *elev *env_biome *env_feature *env_material *geo_loc_name *lat_lon alkalinity alkyl_diethers altitude aminopept_act ammonium bacteria_carb_prod biomass bishomohopanol bromide calcium carb_nitro_ratio chem_administration chloride chlorophyll diether_lipids diss_carb_dioxide diss_hydrogen diss_inorg_carb diss_org_carb diss_org_nitro diss_oxygen glucosidase_act 
magnesium mean_frict_vel mean_peak_frict_vel methane misc_param n_alkanes nitrate nitrite nitro org_carb org_matter org_nitro organism_count oxy_stat_samp part_org_carb perturbation petroleum_hydrocarb ph phaeopigments phosphate phosplipid_fatt_acid potassium pressure redox_potential rel_to_oxygen salinity samp_collect_device samp_mat_process samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext silicate sodium source_material_id sulfate sulfide temp tot_carb tot_nitro tot_org_carb turbidity water_content" << endl; } }else if (package == "miscellaneous") { out << "#MIMARKS.survey.miscellaneous.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. 
Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. 
Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {concentration of ammonium} {amount of biomass; should include the name for the part of biomass measured, e.g. microbial, total. can include multiple measurements} {concentration of bromide} {concentration of calcium} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {concentration of chloride} {concentration of chlorophyll} {density of sample} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. 
Depth can be reported as an interval for subsurface samples.} {concentration of diether lipids; can include multiple types of diether lipids} {concentration of dissolved carbon dioxide} {concentration of dissolved hydrogen} {dissolved inorganic carbon concentration} {dissolved organic nitrogen concentration measured as; total dissolved nitrogen - NH4 - NO3 - NO2} {concentration of dissolved oxygen} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {any other measurement performed or parameter collected, that is not listed here} {concentration of nitrate} {concentration of nitrite} {concentration of nitrogen (total)} {concentration of organic carbon} {concentration of organic matter} {concentration of organic nitrogen} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {pH measurement} {concentration of phosphate} {concentration of phospholipid fatty acids; can include multiple values} {concentration of potassium} {pressure to which the sample is subject, in atmospheres} {Aerobic or anaerobic} {salinity measurement} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {concentration of silicate} {sodium concentration} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {concentration of sulfate} {concentration of sulfide} {temperature of the sample at time of sampling} {measurement of magnitude and direction of flow within a fluid}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{float} {unit}} {{float} m} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{term}; {timestamp}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} m} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{float} {unit}} {{text}} {{text|term}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *lat_lon alkalinity altitude ammonium biomass bromide calcium chem_administration chloride chlorophyll density depth diether_lipids diss_carb_dioxide diss_hydrogen diss_inorg_carb diss_org_nitro diss_oxygen elev misc_param nitrate nitrite nitro org_carb org_matter org_nitro organism_count oxy_stat_samp perturbation ph phosphate phosplipid_fatt_acid potassium pressure rel_to_oxygen salinity samp_collect_device samp_mat_process samp_size samp_store_dur samp_store_loc samp_store_temp 
samp_vol_we_dna_ext silicate sodium source_material_id sulfate sulfide temp water_current" << endl; } }else if (package == "plant_associated") { out << "#MIMARKS.survey.plant-associated.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The natural (as opposed to laboratory) host to the organism from which the sample was obtained. 
Use the full taxonomic name, eg, \"Homo sapiens\".} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {information about treatment involving an exposure to varying temperatures; should include the temperature, treatment duration, interval and total experimental duration; can include different temperature regimens} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {information about treatment involving antibiotic administration; should include the name of antibiotic, amount administered, treatment duration, interval and total experimental duration; can include multiple antibiotic regimens} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {treatment involving use of mutagens; should include the name of mutagen, amount administered, treatment duration, interval and total experimental duration; can include multiple mutagen regimens} {treatment involving an exposure to a particular climate; can include multiple climates} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. 
Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {information about treatment involving the use of fertilizers; should include the name fertilizer, amount administered, treatment duration, interval and total experimental duration; can include multiple fertilizer regimens} {information about treatment involving use of fungicides; should include the name of fungicide, amount administered, treatment duration, interval and total experimental duration; can include multiple fungicide regimens} {use of conditions with differing gaseous environments; should include the name of gaseous compound, amount administered, treatment duration, interval and total experimental duration; can include multiple gaseous environment regimens} {information about treatment involving use of gravity factor to study various types of responses in presence, absence or modified levels of gravity; can include multiple treatments} {information about treatment involving use of growth hormones; should include the name of growth hormone, amount administered, treatment duration, interval and total experimental duration; can include multiple growth hormone regimens} {information about growth media for growing the plants or tissue cultured samples} {information about treatment involving use of herbicides; information about treatment involving use of growth hormones; should include the name of herbicide, amount administered, treatment duration, interval and total experimental duration; can include multiple regimens} {Age of host at the time of sampling} {Name of relevant disease, e.g. Salmonella gastroenteritis. 
Controlled vocabulary, http://bioportal.bioontology.org/ontologies/1009 or http://www.ncbi.nlm.nih.gov/mesh} {measurement of dry mass} {none} {the height of subject} {taxonomic information subspecies level} {taxonomic rank information below subspecies level, such as variety, form, rank etc.} {the length of subject} {description of host life stage} {none} {NCBI taxonomy ID of the host, e.g. 9606} {total mass of the host at collection, the unit depends on host} {measurement of wet mass} {information about treatment involving an exposure to varying degree of humidity; information about treatment involving use of growth hormones; should include amount of humidity administered, treatment duration, interval and total experimental duration; can include multiple regimens} {information about any mechanical damage exerted on the plant; can include multiple damages and sites} {information about treatment involving the use of mineral supplements; should include the name of mineral nutrient, amount administered, treatment duration, interval and total experimental duration; can include multiple mineral nutrient regimens} {any other measurement performed or parameter collected, that is not listed here} {information about treatment involving the exposure of plant to non-mineral nutrient such as oxygen, hydrogen or carbon; should include the name of non-mineral nutrient, amount administered, treatment duration, interval and total experimental duration; can include multiple non-mineral nutrient regimens} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. 
chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {information about treatment involving use of insecticides; should include the name of pesticide, amount administered, treatment duration, interval and total experimental duration; can include multiple pesticide regimens} {information about treatment involving exposure of plants to varying levels of pH of the growth media; can include multiple regimen} {name of body site that the sample was obtained from. For Plant Ontology (PO) (v 20) terms, see http://purl.bioontology.org/ontology/PO} {substance produced by the plant, where the sample was obtained from} {information about treatment involving exposure of plant or a plant part to a particular radiation regimen; should include the radiation type, amount or intensity administered, treatment duration, interval and total experimental duration; can include multiple radiation regimens} {information about treatment involving an exposure to a given amount of rainfall; can include multiple regimens} {Aerobic or anaerobic} {information about treatment involving use of salts as supplement to liquid and soil growth media; should include the name of salt, amount administered, treatment duration, interval and total experimental duration; can include multiple salt regimens} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {treatment involving an exposure to a particular season (e.g. winter, summer, rabi, rainy etc.)} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {treatment involving an exposure to standing water during a plant's life span, types can be flood water or standing water; can include multiple regimens} {temperature of the sample at time of sampling} {description of plant tissue culture growth media used} {information about treatment involving an exposure to water with varying degree of temperature; can include multiple regimens} {information about treatment involving an exposure to watering frequencies; can include multiple regimens}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{none}} {{float} {float}} {{float} {unit};{period};{interval};{period}} {{float} m} {{text};{float} {unit};{period};{interval};{period}} {{term}; {timestamp}} {{text};{float} {unit};{period};{interval};{period}} {{text};{period};{interval};{period}} {{float} m} {{float} {unit}} {{text};{float} {unit};{period};{interval};{period}} {{text};{float} {unit};{period};{interval};{period}} {{text};{float} {unit};{period};{interval};{period}} {{float} {unit};{period};{interval};{period}} {{text};{float} {unit};{period};{interval};{period}} {['', 'soil', 'liquid']} {{text};{float} {unit};{period};{interval};{period}} {{none}} {{none}} {{float} {unit}} {{none}} {{float} {unit}} {{text}} {{text}} {{float} {unit}} {{text}} {{none}} {{integer}} {{float} {unit}} {{float} {unit}} {{float} {unit};{period};{interval};{period}} {{text};{text}} {{text};{float} {unit};{period};{interval};{period}} {{text};{float} {unit}} {{text};{float} {unit};{period};{interval};{period}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {{text};{float} {unit};{period};{interval};{period}} {{float} {unit};{period};{interval};{period}} {{term}} {{text}} {{text};{float} {unit};{period};{interval};{period}} {{float} {unit};{period};{interval};{period}} {['', 
'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text};{float} {unit};{period};{interval};{period}} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text};{period};{interval};{period}} {{text}} {{text};{period};{interval};{period}} {{float} {unit}} {{PMID|DOI|URL}} {{float} {unit};{period};{interval};{period}} {{float} {unit};{period};{interval};{period}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *host *lat_lon air_temp_regm altitude antibiotic_regm chem_administration chem_mutagen climate_environment depth elev fertilizer_regm fungicide_regm gaseous_environment gravity growth_hormone_regm growth_med herbicide_regm host_age host_disease host_dry_mass host_genotype host_height host_infra_specific_name host_infra_specific_rank host_length host_life_stage host_phenotype host_taxid host_tot_mass host_wet_mass humidity_regm mechanical_damage mineral_nutr_regm misc_param non_mineral_nutr_regm organism_count oxy_stat_samp perturbation pesticide_regm ph_regm plant_body_site plant_product radiation_regm rainfall_regm rel_to_oxygen salt_regm samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext season_environment source_material_id standing_water_regm temp tiss_cult_growth_med water_temp_regm watering_regm" << endl; } }else if (package == "sediment") { out << "#MIMARKS.survey.sediment.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" 
or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vocabulary} {{timestamp}} {{float} m} {{float} {unit}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *elev *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate} {concentration of alkyl diethers} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {measurement of aminopeptidase activity} {concentration of ammonium} {measurement of bacterial carbon production} {amount of biomass; should include the name for the part of biomass measured, e.g. microbial, total. can include multiple measurements} {concentration of bishomohopanol} {concentration of bromide} {concentration of calcium} {ratio of amount or concentrations of carbon to nitrogen} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. 
For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {concentration of chloride} {concentration of chlorophyll} {density of sample} {concentration of diether lipids; can include multiple types of diether lipids} {concentration of dissolved carbon dioxide} {concentration of dissolved hydrogen} {dissolved inorganic carbon concentration} {concentration of dissolved organic carbon} {dissolved organic nitrogen concentration measured as; total dissolved nitrogen - NH4 - NO3 - NO2} {concentration of dissolved oxygen} {measurement of glucosidase activity} {concentration of magnesium} {measurement of mean friction velocity} {measurement of mean peak friction velocity} {methane (gas) amount or concentration at the time of sampling} {any other measurement performed or parameter collected, that is not listed here} {concentration of n-alkanes; can include multiple n-alkanes} {concentration of nitrate} {concentration of nitrite} {concentration of nitrogen (total)} {concentration of organic carbon} {concentration of organic matter} {concentration of organic nitrogen} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {concentration of particulate organic carbon} {particles are classified, based on their size, into six general categories:clay, silt, sand, gravel, cobbles, and boulders; should include amount of particle preceded by the name of the particle type; can include multiple values} {type of perturbation, e.g. 
chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {concentration of petroleum hydrocarbon} {pH measurement} {concentration of phaeopigments; can include multiple phaeopigments} {concentration of phosphate} {concentration of phospholipid fatty acids; can include multiple values} {porosity of deposited sediment is volume of voids divided by the total volume of sample} {concentration of potassium} {pressure to which the sample is subject, in atmospheres} {redox potential, measured relative to a hydrogen cell, indicating oxidation or reduction potential} {Aerobic or anaerobic} {salinity measurement} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {information about the sediment type based on major constituents} {concentration of silicate} {sodium concentration} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples.} {concentration of sulfate} {concentration of sulfide} {temperature of the sample at time of sampling} {stage of tide} {total carbon content} {total nitrogen content of the sample} {Definition for soil: total organic C content of the soil units of g C/kg soil. 
Definition otherwise: total organic carbon content} {turbidity measurement} {water content measurement}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{float} {unit}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{float} {unit}} {{float} {unit}} {{float} m} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{term}; {timestamp}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{float} {unit}} {{text};{float} {unit}} {{text};{interval}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{float} {unit}} {{text}} {{text|term}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {['', 'biogenous', 'cosmogenous', 'hydrogenous', 'lithogenous']} {{float} {unit}} {{float} {unit}} {{text}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {['', 'low', 'high']} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *elev *env_biome *env_feature *env_material *geo_loc_name *lat_lon alkalinity alkyl_diethers altitude aminopept_act ammonium bacteria_carb_prod biomass bishomohopanol bromide calcium carb_nitro_ratio 
chem_administration chloride chlorophyll density diether_lipids diss_carb_dioxide diss_hydrogen diss_inorg_carb diss_org_carb diss_org_nitro diss_oxygen glucosidase_act magnesium mean_frict_vel mean_peak_frict_vel methane misc_param n_alkanes nitrate nitrite nitro org_carb org_matter org_nitro organism_count oxy_stat_samp part_org_carb particle_class perturbation petroleum_hydrocarb ph phaeopigments phosphate phosplipid_fatt_acid porosity potassium pressure redox_potential rel_to_oxygen salinity samp_collect_device samp_mat_process samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext sediment_type silicate sodium source_material_id sulfate sulfide temp tidal_stage tot_carb tot_nitro tot_org_carb turbidity water_content" << endl; } }else if (package == "soil") { out << "#MIMARKS.survey.soil.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{float} {unit}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *elev *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. 
Depth can be reported as an interval for subsurface samples.} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {addition of fertilizers, pesticides, etc. - amount and time of applications} {aluminum saturation (esp. 
for tropical soils)} {reference or method used in determining Al saturation} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {mean annual and seasonal precipitation (mm)} {mean annual and seasonal temperature (oC)} {whether or not crop is rotated, and if yes, rotation schedule} {present state of sample site} {vegetation classification from one or more standard classification systems, or agricultural crop} {reference or method used in vegetation classification} {drainage classification from a standard system such as the USDA system} {unusual physical events that may have affected microbial populations} {measured salinity} {soil classification from the FAO World Reference Database for Soil Resources} {historical and/or physical evidence of fire} {historical and/or physical evidence of flooding} {heavy metals present and concentrationsany drug used by subject and the frequency of usage; can include multiple heavy metals and concentrations} {reference or method used in determining heavy metals} {specific layer in the land area which measures parallel to the soil surface and possesses physical characteristics which differ from the layers above and beneath} {reference or method used in determining the horizon} {none} {link to digitized soil maps or other soil classification information} {link to climate resource} {soil classification based on local soil classification system} {reference or method used in determining the local soil classification} {the part of the organic matter in the soil that constitutes living microorganisms smaller than 5-10 µm. 
IF you keep this, you would need to have correction factors used for conversion to the final units, which should be mg C (or N)/kg soil).} {reference or method used in determining microbial biomass} {any other measurement performed or parameter collected, that is not listed here} {pH measurement} {reference or method used in determining pH} {were multiple DNA extractions mixed? how many?} {previous land use and dates} {reference or method used in determining previous land use and dates} {cross-sectional position in the hillslope where sample was collected.sample area position in relation to surrounding areas} {Aerobic or anaerobic} {reference or method used in determining salinity} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {Amount or size of sample (volume, mass or area) that was collected} {volume (mL) or weight (g) of sample processed for DNA extraction} {collection design of pooled samples and/or sieve size and amount of sample sieved} {the direction a slope faces. While looking down a slope use a compass to record the direction you are facing (direction or degrees); e.g., NW or 315°. This measure provides an indication of sun and wind exposure that will influence soil temperature and evapotranspiration.} {commonly called “slope.” The angle between ground surface and a horizontal line (in percent). This is the direction that overland water would flow. This measure is usually taken with a hand level meter or clinometer.} {soil series name or other lower-level classification} {reference or method used in determining soil series name or other lower-level classification} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {explain how and for how long the soil sample was stored before DNA extraction.} {the relative proportion of different grain sizes of mineral particles in a soil, as described using a standard system; express as % sand (50 um to 2 mm), silt (2 um to 50 um), and clay (} {reference or method used in determining soil texture} {note method(s) used for tilling} {reference or method used in determining the total N} {total nitrogen content of the sample} {reference or method used in determining total organic C} {Definition for soil: total organic C content of the soil units of g C/kg soil. Definition otherwise: total organic carbon content} {water content (g/g or cm3/cm3)} {reference or method used in determining the water content of soil}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{float} {unit}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{text};{float} {unit};{timestamp}} {{float} {unit}} {{PMID|DOI|URL}} {{float} m} {{float} {unit}} {{float} {unit}} {{boolean};Rn/{timestamp}/{period}} {['', 'cities', 'farmstead', 'industrial areas', 'roads/railroads', 'rock', 'sand', 'gravel', 'mudflats', 'salt flats', 'badlands', 'permanent snow or ice', 'saline seeps', 'mines/quarries', 'oil waste areas', 'small grains', 'row crops', 'vegetable crops', 'horticultural plants (e.g. tulips)', 'marshlands (grass,sedges,rushes)', 'tundra (mosses,lichens)', 'rangeland', 'pastureland (grasslands used for livestock grazing)', 'hayland', 'meadows (grasses,alfalfa,fescue,bromegrass,timothy)', 'shrub land (e.g. mesquite,sage-brush,creosote bush,shrub oak,eucalyptus)', 'successional shrub land (tree saplings,hazels,sumacs,chokecherry,shrub dogwoods,blackberries)', 'shrub crops (blueberries,nursery ornamentals,filberts)', 'vine crops (grapes)', 'conifers (e.g. pine,spruce,fir,cypress)', 'hardwoods (e.g. 
oak,hickory,elm,aspen)', 'intermixed hardwood and conifers', 'tropical (e.g. mangrove,palms)', 'rainforest (evergreen forest receiving <} {{text}} {{PMID|DOI|URL}} {['', 'very poorly', 'poorly', 'somewhat poorly', 'moderately well', 'well', 'excessively drained']} {{timestamp}} {{float} {unit}} {{term}} {{timestamp}} {{timestamp}} {{text};{float} {unit}} {{PMID|DOI|URL}} {['', 'O horizon', 'A horizon', 'E horizon', 'B horizon', 'C horizon', 'R layer', 'Permafrost']} {{PMID|DOI|URL}} {{PMID|DOI|URL}} {{PMID|DOI|URL}} {{PMID|DOI|URL}} {{text}} {{PMID|DOI|URL}} {{float} {unit}} {{PMID|DOI|URL}} {{text};{float} {unit}} {{float} {unit}} {{PMID|DOI|URL}} {{boolean};{float} {unit}} {{text};{timestamp}} {{PMID|DOI|URL}} {['', 'summit', 'shoulder', 'backslope', 'footslope', 'toeslope']} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{PMID|DOI|URL}} {{text}} {{text|term}} {{float} {unit}} {{float} {unit}} {{{text}|{float} {unit}};{float} {unit}} {{float} {unit}} {{float} {unit}} {{text}} {{PMID|DOI|URL}} {{text}} {{text};{period}} {{float} {unit}} {{PMID|DOI|URL}} {['', 'drill', 'cutting disc', 'ridge till', 'strip tillage', 'zonal tillage', 'chisel', 'tined', 'mouldboard', 'disc plough']} {{PMID|DOI|URL}} {{float} {unit}} {{PMID|DOI|URL}} {{float} {unit}} {{float} [g/g|cm3/cm3]} {{PMID|DOI|URL}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *elev *env_biome *env_feature *env_material *geo_loc_name *lat_lon agrochem_addition al_sat al_sat_meth altitude annual_season_precpt annual_season_temp crop_rotation cur_land_use cur_vegetation cur_vegetation_meth drainage_class extreme_event extreme_salinity fao_class fire flooding heavy_metals heavy_metals_meth horizon horizon_meth link_addit_analys link_class_info link_climate_info local_class local_class_meth microbial_biomass microbial_biomass_meth misc_param ph ph_meth pool_dna_extracts 
previous_land_use previous_land_use_meth profile_position rel_to_oxygen salinity_meth samp_collect_device samp_mat_process samp_size samp_vol_we_dna_ext sieving slope_aspect slope_gradient soil_type soil_type_meth source_material_id store_cond texture texture_meth tillage tot_n_meth tot_nitro tot_org_c_meth tot_org_carb water_content_soil water_content_soil_meth" << endl; } }else if (package == "wastewater") { out << "#MIMARKS.survey.wastewater.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. 
Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate} {a measure of the relative oxygen-depletion effect of a waste contaminant} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {a measure of the relative oxygen-depletion effect of a waste contaminant} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {percentage of volatile solids removed from the anaerobic digestor} {amount or concentration of substances such as paints, adhesives, mayonnaise, hair colorants, emulsified oils, etc.; can include multiple emulsion types} {amount or concentration of substances such as hydrogen sulfide, carbon dioxide, methane, etc.; can include multiple substances} {percentage of industrial effluents received by wastewater treatment plant} {concentration of particles such as sand, grit, metal particles, ceramics, etc.; can include multiple particles} {any other measurement performed or parameter collected, that is not listed here} {concentration of nitrate} {concentration of particles such as faeces, hairs, food, vomit, paper fibers, plant material, humus, etc.} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {type of perturbation, e.g. 
chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {pH measurement} {concentration of phosphate} {the process of pre-treatment removes materials that can be easily collected from the raw wastewater} {the process to produce both a generally homogeneous liquid capable of being treated biologically and a sludge that can be separately treated or processed} {anaerobic digesters can be designed and engineered to operate using a number of different process configurations, as batch or continuous, mesophilic, high solid or low solid, and single stage or multistage} {Aerobic or anaerobic} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {none} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {the process for substantially degrading the biological content of the sewage} {type of wastewater treatment plant as municipial or industrial} {the time activated sludge remains in reactor} {sodium concentration} {concentration of substances such as ammonia, road-salt, sea-salt, cyanide, hydrogen sulfide, thiocyanates, thiosulfates, etc.} {concentration of substances such as urea, fruit sugars, soluble proteins, drugs, pharmaceuticals, etc.} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. 
The identifier can refer either to the original material collected or to any derived sub-samples.} {concentration of substances including a wide variety of material, such as silt, decaying plant and animal matter, etc,; can include multiple substances} {temperature of the sample at time of sampling} {the process providing a final treatment stage to raise the effluent quality before it is discharged to the receiving environment} {total nitrogen content of the sample} {total amount or concentration of phosphate} {the origin of wastewater such as human waste, rainfall, storm drains, etc.}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{float} {unit}} {{float} {unit}} {{term}; {timestamp}} {{float} {unit}} {{float} m} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{text};{interval}} {{float} {unit}} {{float} {unit}} {{text}} {{text}} {{text}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{text}} {{text|term}} {{none}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{text}} {{text}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {{text}} {{text};{float} {unit}} {{float} {unit}} {{text}} {{float} {unit}} {{float} {unit}} {{text}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *env_biome *env_feature *env_material *geo_loc_name *lat_lon alkalinity biochem_oxygen_dem chem_administration chem_oxygen_dem depth efficiency_percent emulsions gaseous_substances indust_eff_percent inorg_particles misc_param nitrate org_particles organism_count oxy_stat_samp perturbation ph phosphate pre_treatment primary_treatment reactor_type 
rel_to_oxygen samp_collect_device samp_mat_process samp_salinity samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext secondary_treatment sewage_type sludge_retent_time sodium soluble_inorg_mat soluble_org_mat source_material_id suspend_solids temp tertiary_treatment tot_nitro tot_phosphate wastewater_type" << endl; } }else if (package == "water") { out << "#MIMARKS.survey.water.4.0" << endl; if (requiredonly) { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. 
Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *env_biome *env_feature *env_material *geo_loc_name *lat_lon" << endl; }else { out << "#{sample name} {description of sample} {sample title} {description of library_construction_protocol} {http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock} {Date of sampling, in \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" format (eg., 30-Oct-1990, Oct-1990 or 1990) or ISO 8601 standard \"YYYY-mm-dd\", \"YYYY-mm\" or \"YYYY-mm-ddThh:mm:ss\" (eg., 1990-10-30, 1990-10 or 1990-10-30T14:41:36)} {Depth is defined as the vertical distance below surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectivly. Depth can be reported as an interval for subsurface samples.} {descriptor of the broad ecological context of a sample. Examples include: desert, taiga, deciduous woodland, or coral reef. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {descriptor of the local environment. Examples include: harbor, cliff, or lake. EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {material that was displaced by the sample, or material in which a sample was embedded, prior to the sampling event. Examples include: air, soil, or water. 
EnvO (v 2013-06-14) terms can be found via the link: www.environmentontology.org/Browse-EnvO} {Geographical origin of the sample; use the appropriate name from this list http://www.insdc.org/documents/country-qualifier-vocabulary. Use a colon to separate the country or ocean from more detailed information about the location, eg \"Canada: Vancouver\" or \"Germany: halfway down Zugspitze, Alps\"} {The geographical coordinates of the location where the sample was collected. Specify as degrees latitude and longitude in format \"d[d.dddd] N|S d[dd.dddd] W|E\", eg, 38.98 N 77.11 W} {alkalinity, the ability of a solution to neutralize acids to the equivalence point of carbonate or bicarbonate} {concentration of alkyl diethers} {The altitude of the sample is the vertical distance between Earth's surface above Sea Level and the sampled position in the air.} {measurement of aminopeptidase activity} {concentration of ammonium} {measurement of atmospheric data; can include multiple data} {bacterial production in the water column measured by isotope uptake} {measurement of bacterial respiration in the water column} {measurement of bacterial carbon production} {amount of biomass; should include the name for the part of biomass measured, e.g. microbial, total. can include multiple measurements} {concentration of bishomohopanol} {concentration of bromide} {concentration of calcium} {ratio of amount or concentrations of carbon to nitrogen} {list of chemical compounds administered to the host or site where sampling occurred, and when (e.g. antibiotics, N fertilizer, air filter); can include multiple compounds. 
For Chemical Entities of Biological Interest ontology (CHEBI) (v1.72), please see http://bioportal.bioontology.org/visualize/44603} {concentration of chloride} {concentration of chlorophyll} {electrical conductivity of water} {density of sample} {concentration of diether lipids; can include multiple types of diether lipids} {concentration of dissolved carbon dioxide} {concentration of dissolved hydrogen} {dissolved inorganic carbon concentration} {concentration of dissolved inorganic nitrogen} {concentration of dissolved inorganic phosphorus} {concentration of dissolved organic carbon} {dissolved organic nitrogen concentration measured as; total dissolved nitrogen - NH4 - NO3 - NO2} {concentration of dissolved oxygen} {visible waveband radiance and irradiance measurements in the water column} {The elevation of the sampling site as measured by the vertical distance from mean sea level.} {raw or converted fluorescence of water} {measurement of glucosidase activity} {measurement of light intensity} {concentration of magnesium} {measurement of mean friction velocity} {measurement of mean peak friction velocity} {any other measurement performed or parameter collected, that is not listed here} {concentration of n-alkanes; can include multiple n-alkanes} {concentration of nitrate} {concentration of nitrite} {concentration of nitrogen (total)} {concentration of organic carbon} {concentration of organic matter} {concentration of organic nitrogen} {total count of any organism per gram or volume of sample,should include name of organism followed by count; can include multiple organism counts} {oxygenation status of sample} {concentration of particulate organic carbon} {concentration of particulate organic nitrogen} {type of perturbation, e.g. 
chemical administration, physical disturbance, etc., coupled with time that perturbation occurred; can include multiple perturbation types} {concentration of petroleum hydrocarbon} {pH measurement} {concentration of phaeopigments; can include multiple phaeopigments} {concentration of phosphate} {concentration of phospholipid fatty acids; can include multiple values} {measurement of photon flux} {concentration of potassium} {pressure to which the sample is subject, in atmospheres} {measurement of primary production} {redox potential, measured relative to a hydrogen cell, indicating oxidation or reduction potential} {Aerobic or anaerobic} {salinity measurement} {Method or device employed for collecting sample} {Processing applied to the sample during or after isolation} {Amount or size of sample (volume, mass or area) that was collected} {none} {none} {none} {volume (mL) or weight (g) of sample processed for DNA extraction} {concentration of silicate} {sodium concentration} {concentration of soluble reactive phosphorus} {unique identifier assigned to a material sample used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples.} {concentration of sulfate} {concentration of sulfide} {concentration of suspended particulate matter} {temperature of the sample at time of sampling} {stage of tide} {measurement of total depth of water column} {total dissolved nitrogen concentration, reported as nitrogen, measured by: total dissolved nitrogen = NH4 + NO3NO2 + dissolved organic nitrogen} {total inorganic nitrogen content} {total nitrogen content of the sample} {total particulate carbon content} {total phosphorus concentration, calculated by: total phosphorus = total dissolved phosphorus + particulate phosphorus. 
Can also be measured without filtering, reported as phosphorus} {measurement of magnitude and direction of flow within a fluid}" << endl; out << "#{text} {text} {text} {text} {controlled vacabulary} {{timestamp}} {{float} m} {{term}} {{term}} {{term}} {{term}:{term}:{text}} {{float} {float}} {{float} {unit}} {{float} {unit}} {{float} m} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{term}; {timestamp}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {['', 'aerobic', 'anaerobic']} {{float} {unit}} {{float} {unit}} {{text};{interval}} {{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{text};{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {['', 'aerobe', 'anaerobe', 'facultative', 'microaerophilic', 'microanaerobe', 'obligate aerobe', 'obligate anaerobe']} {{float} {unit}} {{text}} {{text|term}} {{float} {unit}} {{none}} {{none}} {{none}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{text}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {['', 'low', 'high']} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}} {{float} {unit}}" << endl; out << "*sample_name *description *sample_title *seq_methods *organism *collection_date *depth *env_biome *env_feature 
*env_material *geo_loc_name *lat_lon alkalinity alkyl_diethers altitude aminopept_act ammonium atmospheric_data bac_prod bac_resp bacteria_carb_prod biomass bishomohopanol bromide calcium carb_nitro_ratio chem_administration chloride chlorophyll conduc density diether_lipids diss_carb_dioxide diss_hydrogen diss_inorg_carb diss_inorg_nitro diss_inorg_phosp diss_org_carb diss_org_nitro diss_oxygen down_par elev fluor glucosidase_act light_intensity magnesium mean_frict_vel mean_peak_frict_vel misc_param n_alkanes nitrate nitrite nitro org_carb org_matter org_nitro organism_count oxy_stat_samp part_org_carb part_org_nitro perturbation petroleum_hydrocarb ph phaeopigments phosphate phosplipid_fatt_acid photon_flux potassium pressure primary_prod redox_potential rel_to_oxygen salinity samp_collect_device samp_mat_process samp_size samp_store_dur samp_store_loc samp_store_temp samp_vol_we_dna_ext silicate sodium soluble_react_phosp source_material_id sulfate sulfide suspend_part_matter temp tidal_stage tot_depth_water_col tot_diss_nitro tot_inorg_nitro tot_nitro tot_part_carb tot_phosp water_current" << endl; } } for (set::iterator it = Groups.begin(); it != Groups.end(); it++) { out << *it << endl; } out.close(); //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "execute"); exit(1); } } //*************************************************************************************************************** // going to have to rework this to allow for other options -- /* file option 1 sfffile1 oligosfile1 sfffile2 oligosfile2 ... file option 2 fastqfile1 oligosfile1 fastqfile2 oligosfile2 ... file option 3 ffastqfile1 rfastqfile1 ffastqfile2 rfastqfile2 ... 
file option 4 group fastqfile fastqfile group fastqfile fastqfile group fastqfile fastqfile ... file option 5 My.forward.fastq My.reverse.fastq none My.rindex.fastq //none is an option is no forward or reverse index file ... */ int GetMIMarksPackageCommand::readFile(){ try { inputfile = file; int format = 2; ifstream in; m->openInputFile(file, in); while(!in.eof()) { Oligos oligos; if (m->control_pressed) { return 0; } string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); string group = ""; string thisFileName1, thisFileName2; thisFileName1 = ""; thisFileName2 = ""; if (pieces.size() == 2) { thisFileName1 = pieces[0]; thisFileName2 = pieces[1]; }else if (pieces.size() == 3) { thisFileName1 = pieces[1]; thisFileName2 = pieces[2]; group = pieces[0]; }else if (pieces.size() == 4) { if (!setOligosParameter) { m->mothurOut("[ERROR]: You must have an oligosfile with the index file option. Aborting. \n"); m->control_pressed = true; } thisFileName1 = pieces[0]; thisFileName2 = pieces[1]; }else { m->mothurOut("[ERROR]: file lines can be 2, 3 or 4 columns. The 2 column files are sff file then oligos or fastqfile then oligos or ffastq and rfastq. You may have multiple lines in the file. The 3 column files are for paired read libraries. The format is groupName, forwardFastqFile reverseFastqFile. Four column files are for inputting file pairs with index files. Example: My.forward.fastq My.reverse.fastq NONE My.rindex.fastq. 
The keyword NONE can be used when there is not a index file for either the forward or reverse file.\n"); m->control_pressed = true; } if (m->debug) { m->mothurOut("[DEBUG]: group = " + group + ", thisFileName1 = " + thisFileName1 + ", thisFileName2 = " + thisFileName2 + ".\n"); } if (inputDir != "") { string path = m->hasPath(thisFileName2); if (path == "") { thisFileName2 = inputDir + thisFileName2; } path = m->hasPath(thisFileName1); if (path == "") { thisFileName1 = inputDir + thisFileName1; } } //check to make sure both are able to be opened ifstream in2; int openForward = m->openInputFile(thisFileName1, in2, "noerror"); //if you can't open it, try default location if (openForward == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(thisFileName1); m->mothurOut("Unable to open " + thisFileName1 + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openForward = m->openInputFile(tryPath, in3, "noerror"); in3.close(); thisFileName1 = tryPath; } } //if you can't open it, try output location if (openForward == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(thisFileName1); m->mothurOut("Unable to open " + thisFileName1 + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openForward = m->openInputFile(tryPath, in4, "noerror"); thisFileName1 = tryPath; in4.close(); } } if (openForward == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + thisFileName1 + ", ignoring.\n"); }else{ in2.close(); } int openReverse = 1; ifstream in3; openReverse = m->openInputFile(thisFileName2, in3, "noerror"); //if you can't open it, try default location if (openReverse == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(thisFileName2); m->mothurOut("Unable to open " + thisFileName2 + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openReverse = m->openInputFile(tryPath, in3, "noerror"); in3.close(); thisFileName2 = tryPath; } } //if you can't open it, try output location if (openReverse == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(thisFileName2); m->mothurOut("Unable to open " + thisFileName2 + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openReverse = m->openInputFile(tryPath, in4, "noerror"); thisFileName2 = tryPath; in4.close(); } } if (openReverse == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + thisFileName2 + ", ignoring pair.\n"); }else{ in3.close(); } if ((pieces.size() == 2) && (openForward != 1) && (openReverse != 1)) { //good pair and sff or fastq and oligos oligosfile = thisFileName2; if (m->debug) { m->mothurOut("[DEBUG]: about to read oligos\n"); } oligos.read(oligosfile); createGroupNames(oligos); // adding in groupNames from this file format = 2; }else if((pieces.size() == 3) && (openForward != 1) && (openReverse != 1)) { //good pair and paired read Groups.insert(group); format = 3; } } in.close(); inputfile = file; return 0; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "readFile"); exit(1); } } //********************************************************************************************************************** set GetMIMarksPackageCommand::createGroupNames(Oligos& oligos) { try { bool pairedOligos = false; if (oligos.hasPairedPrimers() || oligos.hasPairedBarcodes()) { pairedOligos = true; } vector groupNames = oligos.getGroupNames(); if (groupNames.size() == 0) { return Groups; } if (pairedOligos) { //overwrite global oligos - assume fastq data like make.contigs Oligos oligos; if ((fileOption == 3) || (fileOption == 5)) { oligos.read(oligosfile, false); } //like make.contigs else { oligos.read(oligosfile); } map barcodes = oligos.getPairedBarcodes(); map primers = 
oligos.getPairedPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->first); string barcodeName = oligos.getBarcodeName(itBar->first); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } if(((itPrimer->second).forward+(itPrimer->second).reverse) == ""){ if ((itBar->second).forward != "NONE") { comboName += (itBar->second).forward; } if ((itBar->second).reverse != "NONE") { if (comboName == "") { comboName += (itBar->second).reverse; } else { comboName += ("."+(itBar->second).reverse); } } }else{ if(((itBar->second).forward+(itBar->second).reverse) == ""){ if ((itPrimer->second).forward != "NONE") { comboName += (itPrimer->second).forward; } if ((itPrimer->second).reverse != "NONE") { if (comboName == "") { comboName += (itPrimer->second).reverse; } else { comboName += ("."+(itPrimer->second).reverse); } } } else{ if ((itBar->second).forward != "NONE") { comboName += (itBar->second).forward; } if ((itBar->second).reverse != "NONE") { if (comboName == "") { comboName += (itBar->second).reverse; } else { comboName += ("."+(itBar->second).reverse); } } if ((itPrimer->second).forward != "NONE") { if (comboName == "") { comboName += (itPrimer->second).forward; } else { comboName += ("."+(itPrimer->second).forward); } } if ((itPrimer->second).reverse != "NONE") { if (comboName == "") { comboName += (itPrimer->second).reverse; } else { comboName += ("."+(itPrimer->second).reverse); } } } } if (comboName != "") { comboGroupName += "_" + comboName; } Groups.insert(comboGroupName); 
map::iterator itGroup2Barcode = Group2Barcode.find(comboGroupName); if (itGroup2Barcode == Group2Barcode.end()) { string temp = (itBar->second).forward+"."+(itBar->second).reverse; Group2Barcode[comboGroupName] = temp; }else { string temp = (itBar->second).forward+"."+(itBar->second).reverse; if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: groupName = " + comboGroupName + "\t" + temp + "\t" + itGroup2Barcode->second + " group and barcodes/primers not unique. Should never get here.\n"); } } itGroup2Barcode = Group2Primer.find(comboGroupName); if (itGroup2Barcode == Group2Primer.end()) { string temp = ((itPrimer->second).forward+"."+(itPrimer->second).reverse); Group2Primer[comboGroupName] = temp; }else { string temp = ((itPrimer->second).forward+"."+(itPrimer->second).reverse); if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: groupName = " + comboGroupName + "\t" + temp + "\t" + itGroup2Barcode->second + " group and barcodes/primers not unique. Should never get here.\n"); } } } } } }else { map barcodes = oligos.getBarcodes() ; map primers = oligos.getPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->second); string barcodeName = oligos.getBarcodeName(itBar->second); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } if(itPrimer->first == ""){ comboName = itBar->first; }else{ if(itBar->first == ""){ comboName = itPrimer->first; } else{ comboName = itBar->first + "." 
+ itPrimer->first; } } if (comboName != "") { comboGroupName += "_" + comboName; } Groups.insert(comboGroupName); map::iterator itGroup2Barcode = Group2Barcode.find(comboGroupName); if (itGroup2Barcode == Group2Barcode.end()) { string temp = (itBar->first); Group2Barcode[comboGroupName] = temp; }else { string temp = (itBar->first); if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: groupName = " + comboGroupName + "\t" + temp + "\t" + itGroup2Barcode->second + " group and barcodes/primers not unique. Should never get here.\n"); } } itGroup2Barcode = Group2Primer.find(comboGroupName); if (itGroup2Barcode == Group2Primer.end()) { string temp = (itPrimer->first); Group2Primer[comboGroupName] = temp; }else { string temp = (itPrimer->first); if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: groupName = " + comboGroupName + "\t" + temp + "\t" + itGroup2Barcode->second + " group and barcodes/primers not unique. Should never get here.\n"); } } } } } } if (Groups.size() == 0) { m->mothurOut("[ERROR]: your oligos file does not contain any group names."); m->mothurOutEndLine(); m->control_pressed = true; } return Groups; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "createGroupNames"); exit(1); } } //********************************************************************************************************************** /* file option 1 sfffile1 oligosfile1 sfffile2 oligosfile2 ... file option 2 fastqfile1 oligosfile1 fastqfile2 oligosfile2 ... file option 3 ffastqfile1 rfastqfile1 ffastqfile2 rfastqfile2 ... file option 4 group fastqfile fastqfile group fastqfile fastqfile group fastqfile fastqfile ... file option 5 My.forward.fastq My.reverse.fastq none My.rindex.fastq //none is an option is no forward or reverse index file ... 
*/ int GetMIMarksPackageCommand::findFileOption(){ try { ifstream in; m->openInputFile(file, in); fileOption = 0; while(!in.eof()) { if (m->control_pressed) { return 0; } string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); if (pieces.size() == 2) { //good pair and sff or fastq and oligos if (!setOligosParameter) { fileOption = 12; //1 or 2 }else { fileOption = 3; } }else if(pieces.size() == 3) { //good pair and paired read fileOption = 4; }else if (pieces.size() == 4) { fileOption = 5; } break; } in.close(); return fileOption; } catch(exception& e) { m->errorOut(e, "GetMIMarksPackageCommand", "findFileOption"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getmimarkspackagecommand.h000066400000000000000000000030431255543666200234260ustar00rootroot00000000000000// // getmimarkspackagecommand.h // Mothur // // Created by Sarah Westcott on 3/25/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. 
//

#ifndef Mothur_getmimarkspackagecommand_h
#define Mothur_getmimarkspackagecommand_h

#include "command.hpp"
#include "oligos.h"

/**************************************************************************************************/

class GetMIMarksPackageCommand : public Command {
public:
    GetMIMarksPackageCommand(string);
    GetMIMarksPackageCommand();
    ~GetMIMarksPackageCommand(){}

    vector<string> setParameters();
    string getCommandName()			{ return "get.mimarkspackage";			}
    string getCommandCategory()		{ return "Sequence Processing";		}

    string getOutputPattern(string);
    string getHelpString();
    string getCitation() { return "http://www.mothur.org/wiki/get.mimarkspackage"; }
    string getDescription()		{ return "create blank mimarks package form for sra command"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort, requiredonly, setOligosParameter;
    string oligosfile, groupfile, package, inputfile, file, inputDir;
    int fileOption;
    string outputDir;
    vector<string> outputNames;
    set<string> createGroupNames(Oligos& oligos);
    set<string> Groups;
    map<string, string> Group2Barcode;
    map<string, string> Group2Primer;

    int findFileOption();
    int readFile();
};

/**************************************************************************************************/

#endif
mothur-1.36.1/source/commands/getotulabelscommand.cpp000066400000000000000000000677431255543666200230150ustar00rootroot00000000000000//
//  getotulabelscommand.cpp
//  Mothur
//
//  Created by Sarah Westcott on 5/21/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
// #include "getotulabelscommand.h" //********************************************************************************************************************** vector GetOtuLabelsCommand::setParameters(){ try { CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,true, true); parameters.push_back(paccnos); CommandParameter pconstaxonomy("constaxonomy", "InputTypes", "", "", "none", "FNGLT", "none","constaxonomy",false,false, true); parameters.push_back(pconstaxonomy); CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false, true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "none", "FNGLT", "none","shared",false,false, true); parameters.push_back(pshared); CommandParameter potucorr("otucorr", "InputTypes", "", "", "none", "FNGLT", "none","otucorr",false,false, true); parameters.push_back(potucorr); CommandParameter pcorraxes("corraxes", "InputTypes", "", "", "none", "FNGLT", "none","corraxes",false,false, true); parameters.push_back(pcorraxes); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetOtuLabelsCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.otulabels command can be used to select 
specific otus with the output from classify.otu, otu.association, or corr.axes commands. It can also be used to select a set of otus from a shared or list file.\n";
        helpString += "The get.otulabels parameters are: constaxonomy, otucorr, corraxes, shared, list, label and accnos.\n";
        helpString += "The constaxonomy parameter is used to input the results of the classify.otu command.\n";
        helpString += "The otucorr parameter is used to input the results of the otu.association command.\n";
        helpString += "The corraxes parameter is used to input the results of the corr.axes command.\n";
        helpString += "The label parameter is used to analyze specific labels in your input. \n";
        helpString += "The get.otulabels command should be in the following format: \n";
        helpString += "get.otulabels(accnos=yourListOfOTULabels, corraxes=yourCorrAxesFile)\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOtuLabelsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetOtuLabelsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "constaxonomy")     { pattern = "[filename],pick,[extension]"; }
        else if (type == "otucorr")     { pattern = "[filename],pick,[extension]"; }
        else if (type == "corraxes")    { pattern = "[filename],pick,[extension]"; }
        else if (type == "list")        { pattern = "[filename],[distance],pick,[extension]"; }
        else if (type == "shared")      { pattern = "[filename],[distance],pick,[extension]"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOtuLabelsCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
GetOtuLabelsCommand::GetOtuLabelsCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["constaxonomy"] = tempOutNames;
        outputTypes["otucorr"] = tempOutNames;
        outputTypes["corraxes"] = tempOutNames;
        outputTypes["shared"] = tempOutNames;
        outputTypes["list"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOtuLabelsCommand", "GetOtuLabelsCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
GetOtuLabelsCommand::GetOtuLabelsCommand(string option) {
    try {
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            //valid parameters for this command
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;
            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //if the user changes the input directory command factory will send this info to us in the output parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {

                //edit file types below to include only the types you added as parameters

                string path;
                it = parameters.find("constaxonomy");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["constaxonomy"] = inputDir + it->second; }
                }

                it = parameters.find("accnos");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("corraxes"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["corraxes"] = inputDir + it->second; } } it = parameters.find("otucorr"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["otucorr"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["otucorr"] = tempOutNames; outputTypes["corraxes"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["list"] = tempOutNames; //check for parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = m->getAccnosFile(); if (accnosfile != "") { m->mothurOut("Using " + accnosfile + " as input file for the accnos parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no valid accnos file and accnos is required."); m->mothurOutEndLine(); abort = true; } }else { m->setAccnosFile(accnosfile); } constaxonomyfile = validParameter.validFile(parameters, "constaxonomy", true); if (constaxonomyfile == "not open") { constaxonomyfile = ""; abort = true; } else if (constaxonomyfile == "not found") { constaxonomyfile = ""; } corraxesfile = validParameter.validFile(parameters, "corraxes", true); if (corraxesfile == "not open") { corraxesfile = ""; abort = true; } else if (corraxesfile == "not found") { corraxesfile = ""; } otucorrfile = validParameter.validFile(parameters, "otucorr", true); if (otucorrfile == "not open") { otucorrfile = ""; abort = true; } else if (otucorrfile == "not found") { otucorrfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = 
validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } if ((constaxonomyfile == "") && (corraxesfile == "") && (otucorrfile == "") && (sharedfile == "") && (listfile == "")) { m->mothurOut("You must provide one of the following: constaxonomy, corraxes, otucorr, shared or list."); m->mothurOutEndLine(); abort = true; } if ((sharedfile != "") || (listfile != "")) { label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } } } } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "GetOtuLabelsCommand"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get labels you want to keep labels = m->readAccnos(accnosfile); //simplfy labels set newLabels; for (set::iterator it = labels.begin(); it != labels.end(); it++) { newLabels.insert(m->getSimpleLabel(*it)); } labels = newLabels; if (m->control_pressed) { return 0; } //read through the correct file and output lines you want to keep if (constaxonomyfile != "") { readClassifyOtu(); } if (corraxesfile != "") { readCorrAxes(); } if (otucorrfile != "") { readOtuAssociation(); } if (listfile != "") { readList(); } if (sharedfile != "") { readShared(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if 
((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::readClassifyOtu(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(constaxonomyfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(constaxonomyfile)); variables["[extension]"] = m->getExtension(constaxonomyfile); string outputFileName = getOutputFileName("constaxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(constaxonomyfile, in); bool wroteSomething = false; int selectedCount = 0; //read headers string headers = m->getline(in); out << headers << endl; while (!in.eof()) { if (m->control_pressed) { break; } string otu = ""; string tax = "unknown"; int size = 0; in >> otu >> size >> tax; m->gobble(in); if (labels.count(m->getSimpleLabel(otu)) != 0) { wroteSomething = true; selectedCount++; out << otu << '\t' << size << '\t' << tax << endl; } } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any labels from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["constaxonomy"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " otus from your constaxonomy file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "readClassifyOtu"); exit(1); } } 
//********************************************************************************************************************** int GetOtuLabelsCommand::readOtuAssociation(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(otucorrfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(otucorrfile)); variables["[extension]"] = m->getExtension(otucorrfile); string outputFileName = getOutputFileName("otucorr", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(otucorrfile, in); bool wroteSomething = false; int selectedCount = 0; //read headers string headers = m->getline(in); out << headers << endl; while (!in.eof()) { if (m->control_pressed) { break; } string otu1 = ""; string otu2 = ""; in >> otu1 >> otu2; string line = m->getline(in); m->gobble(in); if ((labels.count(m->getSimpleLabel(otu1)) != 0) && (labels.count(m->getSimpleLabel(otu2)) != 0)){ wroteSomething = true; selectedCount++; out << otu1 << '\t' << otu2 << '\t' << line << endl; } } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any labels from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["otucorr"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " lines from your otu.corr file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "readOtuAssociation"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::readCorrAxes(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(corraxesfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(corraxesfile)); variables["[extension]"] = m->getExtension(corraxesfile); string 
outputFileName = getOutputFileName("corraxes", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(corraxesfile, in); bool wroteSomething = false; int selectedCount = 0; //read headers string headers = m->getline(in); out << headers << endl; while (!in.eof()) { if (m->control_pressed) { break; } string otu = ""; in >> otu; string line = m->getline(in); m->gobble(in); if (labels.count(m->getSimpleLabel(otu)) != 0) { wroteSomething = true; selectedCount++; out << otu << '\t' << line << endl; } } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any labels from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["corraxes"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " lines from your corr.axes file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "readCorrAxes"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::readShared(){ try { getShared(); if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } vector newLabels; //create new "filtered" lookup vector newLookup; for (int i = 0; i < lookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); newLookup.push_back(temp); } bool wroteSomething = false; int numSelected = 0; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } return 0; } //is this otu on the list if (labels.count(m->getSimpleLabel(m->currentSharedBinLabels[i])) != 0) { numSelected++; wroteSomething = true; 
newLabels.push_back(m->currentSharedBinLabels[i]); for (int j = 0; j < newLookup.size(); j++) { //add this OTU to the new lookup newLookup[j]->push_back(lookup[j]->getAbundance(i), lookup[j]->getGroup()); } } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } m->currentSharedBinLabels = newLabels; newLookup[0]->printHeaders(out); for (int i = 0; i < newLookup.size(); i++) { out << newLookup[i]->getLabel() << '\t' << newLookup[i]->getGroup() << '\t'; newLookup[i]->print(out); } out.close(); for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } if (wroteSomething == false) { m->mothurOut("Your file does not contain any OTUs from the .accnos file."); m->mothurOutEndLine(); } m->mothurOut("Selected " + toString(numSelected) + " OTUs from your shared file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "readShared"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::readList(){ try { getListVector(); if (m->control_pressed) { delete list; return 0;} ListVector newList; newList.setLabel(list->getLabel()); int selectedCount = 0; bool wroteSomething = false; vector binLabels = list->getLabels(); vector newLabels; for (int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { delete list; return 0;} if (labels.count(m->getSimpleLabel(binLabels[i])) != 0) { selectedCount++; 
newList.push_back(list->get(i)); newLabels.push_back(binLabels[i]); } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); variables["[distance]"] = list->getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); delete list; //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newLabels); newList.printHeaders(out); newList.print(out); } out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any OTUs from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["list"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " OTUs from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::getListVector(){ try { InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> labels; labels.insert(label); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); break; } lastLabel = list->getLabel(); //get next line to process //prevent memory leak delete list; list = input.getListVector(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input.getListVector(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "getListVector"); exit(1); } } //********************************************************************************************************************** int GetOtuLabelsCommand::getShared(){ try { InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> labels; labels.insert(label); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((lookup[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); break; } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "GetOtuLabelsCommand", "getShared"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getotulabelscommand.h000066400000000000000000000031771255543666200224510ustar00rootroot00000000000000#ifndef Mothur_getotulabelscommand_h #define Mothur_getotulabelscommand_h // // getotulabelscommand.h // Mothur // // Created by Sarah Westcott on 5/21/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "command.hpp" #include "inputdata.h" #include "listvector.hpp" #include "sharedrabundvector.h" /**************************************************************************************************/ class GetOtuLabelsCommand : public Command { public: GetOtuLabelsCommand(string); GetOtuLabelsCommand(); ~GetOtuLabelsCommand(){} vector setParameters(); string getCommandName() { return "get.otulabels"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.otulabels"; } string getDescription() { return "Can be used with output from classify.otu, otu.association, or corr.axes to select specific otus."; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string outputDir, accnosfile, constaxonomyfile, otucorrfile, corraxesfile, listfile, sharedfile, label; vector outputNames; set labels; ListVector* list; vector lookup; int readClassifyOtu(); int readOtuAssociation(); int readCorrAxes(); int readList(); int readShared(); int getListVector(); int getShared(); }; 
/**************************************************************************************************/ #endif mothur-1.36.1/source/commands/getoturepcommand.cpp000066400000000000000000001634071255543666200223330ustar00rootroot00000000000000/* * getoturepcommand.cpp * Mothur * * Created by Sarah Westcott on 4/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "getoturepcommand.h" #include "readphylip.h" #include "readcolumn.h" #include "formatphylip.h" #include "formatcolumn.h" #include "sharedutilities.h" //******************************************************************************************************************** //sorts lowest to highest inline bool compareName(repStruct left, repStruct right){ return (left.name < right.name); } //******************************************************************************************************************** //sorts lowest to highest inline bool compareBin(repStruct left, repStruct right){ return (left.simpleBin < right.simpleBin); } //******************************************************************************************************************** //sorts lowest to highest inline bool compareSize(repStruct left, repStruct right){ return (left.size < right.size); } //******************************************************************************************************************** //sorts lowest to highest inline bool compareGroup(repStruct left, repStruct right){ return (left.group < right.group); } //********************************************************************************************************************** vector GetOTURepCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","name",false,true, true); parameters.push_back(plist); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,false, true); parameters.push_back(pfasta); CommandParameter pphylip("phylip", "InputTypes", "", "", 
"PhylipColumn", "PhylipColumn", "none","",false,false, true); parameters.push_back(pphylip); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "ColumnName","",false,false, true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "ColumnName","count",false,false, true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false, true); parameters.push_back(pgroup); CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "ColumnName","",false,false, true); parameters.push_back(pcolumn); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pcutoff("cutoff", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter pweighted("weighted", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pweighted); CommandParameter psorted("sorted", "Multiple", "none-name-bin-size-group", "none", "", "", "","",false,false); parameters.push_back(psorted); CommandParameter pmethod("method", "Multiple", "distance-abundance", "distance", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter plarge("large", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(plarge); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); 
parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetOTURepCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.oturep command parameters are phylip, column, list, fasta, name, group, count, large, weighted, cutoff, precision, groups, sorted, method and label. The list parameter is required, as well as phylip or column and name if you are using method=distance. If method=abundance a name or count file is required.\n"; helpString += "The label parameter allows you to select the distance levels you would like output files created for; separate multiple labels with dashes.\n"; helpString += "The phylip or column parameter is required for method=distance, but only one may be used. If you use a column file the name or count filename is required. \n"; helpString += "The method parameter allows you to select the method of selecting the representative sequence. Choices are distance and abundance. The distance method finds the sequence with the smallest maximum distance to the other sequences. If a tie occurs the sequence with the smallest average distance is selected. The abundance method chooses the most abundant sequence in the OTU as the representative.\n"; helpString += "If you do not provide a cutoff value 10.00 is assumed. 
If you do not provide a precision value then 100 is assumed.\n"; helpString += "The get.oturep command should be in the following format: get.oturep(phylip=yourDistanceMatrix, fasta=yourFastaFile, list=yourListFile, name=yourNamesFile, group=yourGroupFile, label=yourLabels).\n"; helpString += "Example get.oturep(phylip=amazon.dist, fasta=amazon.fasta, list=amazon.fn.list, group=amazon.groups).\n"; helpString += "The default value for label is all labels in your inputfile.\n"; helpString += "The sorted parameter allows you to indicate you want the output sorted. You can sort by sequence name, bin number, bin size or group. The default is no sorting, but your options are name, bin, size, or group.\n"; helpString += "The large parameter allows you to indicate that your distance matrix is too large to fit in RAM. The default value is false.\n"; helpString += "The weighted parameter allows you to indicate that you want to find the weighted representative. You must provide a namesfile to set weighted to true. The default value is false.\n"; helpString += "The representative is found by selecting the sequence that has the smallest total distance to all other sequences in the OTU. If a tie occurs the smallest average distance is used.\n"; helpString += "For weighted = false, mothur assumes the distance file contains only unique sequences, the list file may contain all sequences, but only the uniques are considered to become the representative. If your distance file contains all the sequences it would become weighted=true.\n"; helpString += "For weighted = true, mothur assumes the distance file contains only unique sequences, the list file must contain all sequences, all sequences are considered to become the representative, but unique name will be used in the output for consistency.\n"; helpString += "If your distance file contains all the sequences and you do not provide a name file, the weighted representative will be given, unless your listfile is unique. 
If you provide a namefile, then you can select weighted or unweighted.\n"; helpString += "The group parameter allows you to provide a group file.\n"; helpString += "The groups parameter allows you to indicate that you want representative sequences for each group specified for each OTU, group names should be separated by dashes. ex. groups=A-B-C.\n"; helpString += "The get.oturep command outputs a .rep.fasta and .rep.names file for each distance you specify, selecting one OTU representative for each bin.\n"; helpString += "If you provide a groupfile, then it also appends the names of the groups present in that bin.\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetOTURepCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],[tag],rep.fasta-[filename],[tag],[group],rep.fasta"; } else if (type == "name") { pattern = "[filename],[tag],rep.names-[filename],[tag],[group],rep.names"; } else if (type == "count") { pattern = "[filename],[tag],rep.count_table-[filename],[tag],[group],rep.count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetOTURepCommand::GetOTURepCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", 
"GetOTURepCommand"); exit(1); } } //********************************************************************************************************************** GetOTURepCommand::GetOTURepCommand(string option) { try{ abort = false; calledHelp = false; allLines = 1; //allow user to run help if (option == "help") { help(); abort = true; calledHelp = true; }else if(option == "citation") { citation(); abort = true; calledHelp = true; } else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not found") { fastafile = ""; } else if (fastafile == "not open") { abort = true; } else { m->setFastaFile(fastafile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not found") { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current list file and the list parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (listfile == "not open") { abort = true; } else { m->setListFile(listfile); } phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not found") { phylipfile = ""; } else if (phylipfile == "not open") { abort = true; } else { distFile = phylipfile; format = "phylip"; m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not found") { columnfile = ""; } else if (columnfile == "not open") { abort = true; } else { distFile = columnfile; format = "column"; m->setColumnFile(columnfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } hasGroups = false; countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not found") { countfile = ""; } else if (countfile == "not open") { abort = true; countfile = ""; } else { m->setCountTableFile(countfile); ct.readTable(countfile, true, false); if 
(ct.hasGroupInfo()) { hasGroups = true; } } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } method = validParameter.validFile(parameters, "method", false); if (method == "not found"){ method = "distance"; } if ((method != "distance") && (method != "abundance")) { m->mothurOut(method + " is not a valid option for the method parameter. The only options are: distance and abundance, aborting."); m->mothurOutEndLine(); abort = true; } if (method == "distance") { if ((phylipfile == "") && (columnfile == "")) { //is there are current file available for either of these? //give priority to column, then phylip columnfile = m->getColumnFile(); if (columnfile != "") { distFile = columnfile; format = "column"; m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { distFile = phylipfile; format = "phylip"; m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a phylip or column file before you can use the get.oturep command."); m->mothurOutEndLine(); abort = true; } } }else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When executing a get.oturep command you must enter ONLY ONE of the following: phylip or column."); m->mothurOutEndLine(); abort = true; } if (columnfile != "") { if ((namefile == "") && (countfile == "")) { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile or countfile if you are going to use the column format."); m->mothurOutEndLine(); abort = true; } } } } }else if (method == "abundance") { if ((namefile == "") && (countfile == "")) { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile or countfile if you are going to use the abundance method."); m->mothurOutEndLine(); abort = true; } } } if ((phylipfile != "") || (columnfile != "")) { m->mothurOut("[WARNING]: A phylip or column file is not needed to use the abundance method, ignoring."); m->mothurOutEndLine(); phylipfile = ""; columnfile = ""; } } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } //check for 
optional parameter and set defaults // ...at some point should add some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; allLines = 1; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } sorted = validParameter.validFile(parameters, "sorted", false); if (sorted == "not found"){ sorted = ""; } if (sorted == "none") { sorted=""; } if ((sorted != "") && (sorted != "name") && (sorted != "bin") && (sorted != "size") && (sorted != "group")) { m->mothurOut(sorted + " is not a valid option for the sorted parameter. The only options are: name, bin, size and group. I will not sort."); m->mothurOutEndLine(); sorted = ""; } if ((sorted == "group") && ((groupfile == "")&& !hasGroups)) { m->mothurOut("You must provide a groupfile or have a count file with group info to sort by group. I will not sort."); m->mothurOutEndLine(); sorted = ""; } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { if ((groupfile == "") && (!hasGroups)) { m->mothurOut("You must provide a groupfile or have a count file with group info to use groups."); m->mothurOutEndLine(); abort = true; }else { m->splitAtDash(groups, Groups); } } m->setGroups(Groups); string temp = validParameter.validFile(parameters, "large", false); if (temp == "not found") { temp = "F"; } large = m->isTrue(temp); temp = validParameter.validFile(parameters, "weighted", false); if (temp == "not found") { temp = "f"; } weighted = m->isTrue(temp); if ((weighted) && (namefile == "")) { m->mothurOut("You cannot set weighted to true unless you provide a namesfile."); m->mothurOutEndLine(); abort = true; } temp = validParameter.validFile(parameters, "precision", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, precision); temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "10.0"; } m->mothurConvert(temp, cutoff); 
cutoff += (5 / (precision * 10.0)); } } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "GetOTURepCommand"); exit(1); } } //********************************************************************************************************************** int GetOTURepCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int error; list = NULL; if (method=="distance") { readDist(); if ((!weighted) && (namefile != "")) { readNamesFile(weighted); } }else { //map name -> abundance for use if findRepAbund if (namefile != "") { nameToIndex = m->readNames(namefile); } } if (m->control_pressed) { if (method=="distance") { if (large) { inRow.close(); m->mothurRemove(distFile); } }return 0; } if (groupfile != "") { //read in group map info. groupMap = new GroupMap(groupfile); int error = groupMap->readMap(); if (error == 1) { delete groupMap; m->mothurOut("Error reading your groupfile. Proceeding without groupfile."); m->mothurOutEndLine(); groupfile = ""; } if (Groups.size() != 0) { SharedUtil util; vector gNamesOfGroups = groupMap->getNamesOfGroups(); util.setGroups(Groups, gNamesOfGroups, "getoturep"); groupMap->setNamesOfGroups(gNamesOfGroups); } }else if (hasGroups) { if (Groups.size() != 0) { SharedUtil util; vector gNamesOfGroups = ct.getNamesOfGroups(); util.setGroups(Groups, gNamesOfGroups, "getoturep"); } } //done with listvector from matrix if (list != NULL) { delete list; } InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
        set<string> processedLabels;
        set<string> userLabels = labels;

        if (m->control_pressed) {
            if (method=="distance") { if (large) { inRow.close(); m->mothurRemove(distFile); } }
            delete list; return 0;
        }

        while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {

            if (allLines == 1 || labels.count(list->getLabel()) == 1){
                m->mothurOut(list->getLabel() + "\t" + toString(list->size())); m->mothurOutEndLine();
                error = process(list);
                if (error == 1) { return 0; } //there is an error in the input files, abort command

                if (m->control_pressed) {
                    if (method=="distance") { if (large) { inRow.close(); m->mothurRemove(distFile); } }
                    for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear();
                    delete list; return 0;
                }

                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());
            }

            if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = list->getLabel();

                delete list;
                list = input.getListVector(lastLabel);
                m->mothurOut(list->getLabel() + "\t" + toString(list->size())); m->mothurOutEndLine();
                error = process(list);
                if (error == 1) { return 0; } //there is an error in the input files, abort command

                if (m->control_pressed) {
                    if (method=="distance") { if (large) { inRow.close(); m->mothurRemove(distFile); } }
                    for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear();
                    delete list; return 0;
                }

                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());

                //restore real lastlabel to save below
                list->setLabel(saveLabel);
            }

            lastLabel = list->getLabel();

            delete list;
            list = input.getListVector();
        }

        //output error messages about any remaining user labels
        bool needToRun = false;
        for (set<string>::iterator it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + (*it));
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(".
I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (list != NULL) { delete list; }
            list = input.getListVector(lastLabel);
            m->mothurOut(list->getLabel() + "\t" + toString(list->size())); m->mothurOutEndLine();
            error = process(list);
            delete list;
            if (error == 1) { return 0; } //there is an error in the input files, abort command

            if (m->control_pressed) {
                if (method=="distance") { if (large) { inRow.close(); m->mothurRemove(distFile); } }
                for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear();
                return 0; //list was already deleted above, do not delete it a second time
            }
        }

        //close and remove formatted matrix file
        if (method=="distance") {
            if (large) { inRow.close(); m->mothurRemove(distFile); }
            if (!weighted) { nameFileMap.clear(); }
        }

        if (fastafile != "") {
            //read fastafile
            FastaMap* fasta = new FastaMap();
            fasta->readFastaFile(fastafile);

            //if user gave a namesfile then use it
            if (namefile != "") { readNamesFile(fasta); }

            //create and output the .rep.fasta files
            map<string, string>::iterator itNameFile;
            for (itNameFile = outputNameFiles.begin(); itNameFile != outputNameFiles.end(); itNameFile++) {
                processFastaNames(itNameFile->first, itNameFile->second, fasta);
            }
            delete fasta;
        }else {
            //create and output the .rep.fasta files
            map<string, string>::iterator itNameFile;
            for (itNameFile = outputNameFiles.begin(); itNameFile != outputNameFiles.end(); itNameFile++) {
                processNames(itNameFile->first, itNameFile->second);
            }
        }

        if (groupfile != "") { delete groupMap; }

        if (m->control_pressed) { return 0; }

        //set fasta file as new current fastafile - use first one??
string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetOTURepCommand::readDist() { try { if (!large) { //read distance files if (format == "column") { readMatrix = new ReadColumnMatrix(distFile); } else if (format == "phylip") { readMatrix = new ReadPhylipMatrix(distFile); } else { m->mothurOut("File format error."); m->mothurOutEndLine(); return 0; } readMatrix->setCutoff(cutoff); NameAssignment* nameMap = NULL; if(namefile != ""){ nameMap = new NameAssignment(namefile); nameMap->readMap(); readMatrix->read(nameMap); }else if (countfile != "") { readMatrix->read(&ct); }else { readMatrix->read(nameMap); } if (m->control_pressed) { delete readMatrix; return 0; } list = readMatrix->getListVector(); SparseDistanceMatrix* matrix = readMatrix->getDMatrix(); // Create a data structure to quickly access the distance information. // It consists of a vector of distance maps, where each map contains // all distances of a certain sequence. 
Vector and maps are accessed // via the index of a sequence in the distance matrix seqVec = vector(list->size()); for (int i = 0; i < matrix->seqVec.size(); i++) { for (int j = 0; j < matrix->seqVec[i].size(); j++) { if (m->control_pressed) { delete readMatrix; return 0; } //already added everyone else in row if (i < matrix->seqVec[i][j].index) { seqVec[i][matrix->seqVec[i][j].index] = matrix->seqVec[i][j].dist; } } } //add dummy map for unweighted calc SeqMap dummy; seqVec.push_back(dummy); delete matrix; delete readMatrix; delete nameMap; if (m->control_pressed) { return 0; } }else { //process file and set up indexes if (format == "column") { formatMatrix = new FormatColumnMatrix(distFile); } else if (format == "phylip") { formatMatrix = new FormatPhylipMatrix(distFile); } else { m->mothurOut("File format error."); m->mothurOutEndLine(); return 0; } formatMatrix->setCutoff(cutoff); NameAssignment* nameMap = NULL; if(namefile != ""){ nameMap = new NameAssignment(namefile); nameMap->readMap(); formatMatrix->read(nameMap); }else if (countfile != "") { formatMatrix->read(&ct); }else { formatMatrix->read(nameMap); } if (m->control_pressed) { delete formatMatrix; return 0; } list = formatMatrix->getListVector(); distFile = formatMatrix->getFormattedFileName(); //positions in file where the distances for each sequence begin //rowPositions[1] = position in file where distance related to sequence 1 start. 
rowPositions = formatMatrix->getRowPositions(); rowPositions.push_back(-1); //dummy row for unweighted calc delete formatMatrix; delete nameMap; //openfile for getMap to use m->openInputFile(distFile, inRow); if (m->control_pressed) { inRow.close(); m->mothurRemove(distFile); return 0; } } //list bin 0 = first name read in distance matrix, list bin 1 = second name read in distance matrix if (list != NULL) { vector names; string binnames; //map names to rows in sparsematrix for (int i = 0; i < list->size(); i++) { names.clear(); binnames = list->get(i); m->splitAtComma(binnames, names); for (int j = 0; j < names.size(); j++) { nameToIndex[names[j]] = i; } } } else { m->mothurOut("error, no listvector."); m->mothurOutEndLine(); } if (m->control_pressed) { if (large) { inRow.close(); m->mothurRemove(distFile); }return 0; } return 0; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "readDist"); exit(1); } } //********************************************************************************************************************** void GetOTURepCommand::readNamesFile(FastaMap*& fasta) { try { ifstream in; vector dupNames; m->openInputFile(namefile, in); string name, names, sequence; while(!in.eof()){ in >> name; //read from first column A in >> names; //read from second column A,B,C,D dupNames.clear(); //parse names into vector m->splitAtComma(names, dupNames); //store names in fasta map sequence = fasta->getSequence(name); for (int i = 0; i < dupNames.size(); i++) { fasta->push_back(dupNames[i], sequence); } m->gobble(in); } in.close(); } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "readNamesFile"); exit(1); } } //********************************************************************************************************************** //read names file to find the weighted rep for each bin void GetOTURepCommand::readNamesFile(bool w) { try { ifstream in; vector dupNames; m->openInputFile(namefile, in); string name, names, sequence; while(!in.eof()){ in 
>> name; m->gobble(in); //read from first column A in >> names; //read from second column A,B,C,D dupNames.clear(); //parse names into vector m->splitAtComma(names, dupNames); for (int i = 0; i < dupNames.size(); i++) { nameFileMap[dupNames[i]] = name; } m->gobble(in); } in.close(); } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "readNamesFile"); exit(1); } } //********************************************************************************************************************** string GetOTURepCommand::findRepAbund(vector names, string group) { try{ vector reps; string rep = "notFound"; if (m->debug) { m->mothurOut("[DEBUG]: group=" + group + " names.size() = " + toString(names.size()) + " " + names[0] + "\n"); } if ((names.size() == 1)) { return names[0]; }else{ //fill seqIndex and initialize sums int maxAbund = 0; for (int i = 0; i < names.size(); i++) { if (m->control_pressed) { return "control"; } if (countfile != "") { //if countfile is not blank then we can assume the list file contains only uniques, otherwise we assume list file contains everyone. int numRep = 0; if (group != "") { numRep = ct.getGroupCount(names[i], group); } else { numRep = ct.getNumSeqs(names[i]); } if (numRep > maxAbund) { reps.clear(); reps.push_back(names[i]); maxAbund = numRep; }else if(numRep == maxAbund) { //tie reps.push_back(names[i]); } }else { //name file used, we assume list file contains all sequences map::iterator itNameMap = nameToIndex.find(names[i]); if (itNameMap == nameToIndex.end()) {} //assume that this sequence is not a unique else { if (itNameMap->second > maxAbund) { reps.clear(); reps.push_back(names[i]); maxAbund = itNameMap->second; }else if(itNameMap->second == maxAbund) { //tie reps.push_back(names[i]); } } } } if (reps.size() == 0) { m->mothurOut("[ERROR]: no rep found, file mismatch?? 
Quitting.\n"); m->control_pressed = true;
            }
            else if (reps.size() == 1) { rep = reps[0]; }
            else { //tie
                int index = m->getRandomIndex(reps.size()-1);
                rep = reps[index];
            }
        }

        return rep;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOTURepCommand", "findRepAbund");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetOTURepCommand::findRep(vector<string> names, string group) {
    try{
        //if using abundance
        if (method == "abundance") { return (findRepAbund(names, group)); }
        else { //find rep based on distance

            // if only 1 sequence in bin or processing the "unique" label, then
            // the first sequence of the OTU is the representative one
            if ((names.size() == 1)) { return names[0]; }
            else{
                vector<int> seqIndex; //(names.size());
                map<string, string>::iterator itNameFile;
                map<string, int>::iterator itNameIndex;

                //fill seqIndex and initialize sums
                for (size_t i = 0; i < names.size(); i++) {
                    if (weighted) {
                        seqIndex.push_back(nameToIndex[names[i]]);
                        if (countfile != "") { //if countfile is not blank then we can assume the list file contains only uniques, otherwise we assume list file contains everyone.
int numRep = 0; if (group != "") { numRep = ct.getGroupCount(names[i], group); } else { numRep = ct.getNumSeqs(names[i]); } for (int j = 1; j < numRep; j++) { //don't add yourself again seqIndex.push_back(nameToIndex[names[i]]); } } }else { if (namefile == "") { itNameIndex = nameToIndex.find(names[i]); if (itNameIndex == nameToIndex.end()) { // you are not in the distance file and no namesfile, then assume you are not unique if (large) { seqIndex.push_back((rowPositions.size()-1)); } else { seqIndex.push_back((seqVec.size()-1)); } }else { seqIndex.push_back(itNameIndex->second); } }else { itNameFile = nameFileMap.find(names[i]); if (itNameFile == nameFileMap.end()) { m->mothurOut("[ERROR]: " + names[i] + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else{ string name1 = itNameFile->first; string name2 = itNameFile->second; if (name1 == name2) { //then you are unique so add your real dists seqIndex.push_back(nameToIndex[names[i]]); }else { //add dummy if (large) { seqIndex.push_back((rowPositions.size()-1)); } else { seqIndex.push_back((seqVec.size()-1)); } } } } } } vector max_dist(seqIndex.size(), 0.0); vector total_dist(seqIndex.size(), 0.0); // loop through all entries in seqIndex SeqMap::iterator it; SeqMap currMap; for (size_t i=0; i < seqIndex.size(); i++) { if (m->control_pressed) { return "control"; } if (!large) { currMap = seqVec[seqIndex[i]]; } else { currMap = getMap(seqIndex[i]); } for (size_t j=0; j < seqIndex.size(); j++) { it = currMap.find(seqIndex[j]); if (it != currMap.end()) { max_dist[i] = max(max_dist[i], it->second); max_dist[j] = max(max_dist[j], it->second); total_dist[i] += it->second; total_dist[j] += it->second; }else{ //if you can't find the distance make it the cutoff max_dist[i] = max(max_dist[i], cutoff); max_dist[j] = max(max_dist[j], cutoff); total_dist[i] += cutoff; total_dist[j] += cutoff; } } } // sequence with the smallest maximum distance is the representative //if tie occurs 
pick sequence with smallest average distance float min = 10000; int minIndex; for (size_t i=0; i < max_dist.size(); i++) { if (m->control_pressed) { return "control"; } if (max_dist[i] < min) { min = max_dist[i]; minIndex = i; }else if (max_dist[i] == min) { float currentAverage = total_dist[minIndex] / (float) total_dist.size(); float newAverage = total_dist[i] / (float) total_dist.size(); if (newAverage < currentAverage) { min = max_dist[i]; minIndex = i; } } } return(names[minIndex]); } } } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "FindRep"); exit(1); } } //********************************************************************************************************************** int GetOTURepCommand::process(ListVector* processList) { try{ string name, sequence; string nameRep; //create output file if (outputDir == "") { outputDir += m->hasPath(listfile); } ofstream newNamesOutput; string outputNamesFile; map filehandles; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); if (Groups.size() == 0) { //you don't want to use groups variables["[tag]"] = processList->getLabel(); if (countfile == "") { outputNamesFile = getOutputFileName("name", variables); outputNames.push_back(outputNamesFile); outputTypes["name"].push_back(outputNamesFile); }else { outputNamesFile = getOutputFileName("count", variables); outputNames.push_back(outputNamesFile); outputTypes["count"].push_back(outputNamesFile); } outputNameFiles[outputNamesFile] = processList->getLabel(); m->openOutputFile(outputNamesFile, newNamesOutput); newNamesOutput << "noGroup" << endl; }else{ //you want to use groups ofstream* temp; for (int i=0; igetLabel(); variables["[group]"] = Groups[i]; filehandles[Groups[i]] = temp; outputNamesFile = outputDir + m->getRootName(m->getSimpleName(listfile)) + processList->getLabel() + "." 
+ Groups[i] + "."; if (countfile == "") { outputNamesFile = getOutputFileName("name", variables); outputNames.push_back(outputNamesFile); outputTypes["name"].push_back(outputNamesFile); }else { outputNamesFile = getOutputFileName("count", variables); outputNames.push_back(outputNamesFile); outputTypes["count"].push_back(outputNamesFile); } m->openOutputFile(outputNamesFile, *(temp)); *(temp) << Groups[i] << endl; outputNameFiles[outputNamesFile] = processList->getLabel() + "." + Groups[i]; } } //for each bin in the list vector vector binLabels = processList->getLabels(); for (int i = 0; i < processList->size(); i++) { if (m->control_pressed) { out.close(); if (Groups.size() == 0) { //you don't want to use groups newNamesOutput.close(); }else{ for (int j=0; jget(i); vector namesInBin; m->splitAtComma(temp, namesInBin); if (Groups.size() == 0) { nameRep = findRep(namesInBin, ""); newNamesOutput << binLabels[i] << '\t' << nameRep << '\t'; //put rep at first position in names line string outputString = nameRep + ","; for (int k=0; k > NamesInGroup; for (int j=0; jgetGroup(namesInBin[j]); if (thisgroup == "not found") { m->mothurOut(namesInBin[j] + " is not in your groupfile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } //add this name to correct group if (m->inUsersGroups(thisgroup, Groups)) { NamesInGroup[thisgroup].push_back(namesInBin[j]); } }else { vector thisSeqsGroups = ct.getGroups(namesInBin[j]); for (int k = 0; k < thisSeqsGroups.size(); k++) { if (m->inUsersGroups(thisSeqsGroups[k], Groups)) { NamesInGroup[thisSeqsGroups[k]].push_back(namesInBin[j]); } } } } //get rep for each group in otu for (int j=0; jerrorOut(e, "GetOTURepCommand", "process"); exit(1); } } //********************************************************************************************************************** int GetOTURepCommand::processFastaNames(string filename, string label, FastaMap*& fasta) { try{ //create output file if (outputDir == "") { outputDir += 
m->hasPath(listfile); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[tag]"] = label; string outputFileName = getOutputFileName("fasta",variables); m->openOutputFile(outputFileName, out); vector reps; outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName); ofstream out2; string tempNameFile = filename + ".temp"; m->openOutputFile(tempNameFile, out2); ifstream in; m->openInputFile(filename, in); string tempGroup = ""; in >> tempGroup; m->gobble(in); CountTable thisCt; if (countfile != "") { thisCt.readTable(countfile, true, false); if (tempGroup != "noGroup") { out2 << "Representative_Sequence\ttotal\t" << tempGroup << endl; } } int thistotal = 0; while (!in.eof()) { string rep, binnames, binLabel; in >> binLabel >> rep >> binnames; m->gobble(in); vector names; m->splitAtComma(binnames, names); int binsize = names.size(); if (countfile == "") { out2 << rep << '\t' << binnames << endl; } else { if (tempGroup == "noGroup") { for (int j = 0; j < names.size(); j++) { if (names[j] != rep) { thisCt.mergeCounts(rep, names[j]); } } binsize = thisCt.getNumSeqs(rep); }else { int total = 0; for (int j = 0; j < names.size(); j++) { total += thisCt.getGroupCount(names[j], tempGroup); } out2 << rep << '\t' << total << '\t' << total << endl; binsize = total; } } thistotal += binsize; //if you have a groupfile string group = ""; map groups; map::iterator groupIt; if (groupfile != "") { //find the groups that are in this bin for (int i = 0; i < names.size(); i++) { string groupName = groupMap->getGroup(names[i]); if (groupName == "not found") { m->mothurOut(names[i] + " is missing from your group file. Please correct. 
"); m->mothurOutEndLine(); groupError = true; } else { groups[groupName] = groupName; } } //turn the groups into a string for (groupIt = groups.begin(); groupIt != groups.end(); groupIt++) { group += groupIt->first + "-"; } //rip off last dash group = group.substr(0, group.length()-1); }else if (hasGroups) { map groups; for (int i = 0; i < names.size(); i++) { vector thisSeqsGroups = ct.getGroups(names[i]); for (int j = 0; j < thisSeqsGroups.size(); j++) { groups[thisSeqsGroups[j]] = thisSeqsGroups[j]; } } //turn the groups into a string for (groupIt = groups.begin(); groupIt != groups.end(); groupIt++) { group += groupIt->first + "-"; } //rip off last dash group = group.substr(0, group.length()-1); //cout << group << endl; } else{ group = ""; } //print out name and sequence for that bin string sequence = fasta->getSequence(rep); if (sequence != "not found") { if (sorted == "") { //print them out rep = rep + "\t" + binLabel; rep = rep + "|" + toString(binsize); if (group != "") { rep = rep + "|" + group; } out << ">" << rep << endl; out << sequence << endl; }else { //save them int simpleLabel; m->mothurConvert(m->getSimpleLabel(binLabel), simpleLabel); repStruct newRep(rep, binLabel, simpleLabel, binsize, group); reps.push_back(newRep); } }else { m->mothurOut(rep + " is missing from your fasta or name file, ignoring. 
Please correct."); m->mothurOutEndLine(); } } if (sorted != "") { //then sort them and print them if (sorted == "name") { sort(reps.begin(), reps.end(), compareName); } else if (sorted == "bin") { sort(reps.begin(), reps.end(), compareBin); } else if (sorted == "size") { sort(reps.begin(), reps.end(), compareSize); } else if (sorted == "group") { sort(reps.begin(), reps.end(), compareGroup); } //print them for (int i = 0; i < reps.size(); i++) { string sequence = fasta->getSequence(reps[i].name); string outputName = reps[i].name + "\t" + reps[i].bin; outputName = outputName + "|" + toString(reps[i].size); if (reps[i].group != "") { outputName = outputName + "|" + reps[i].group; } out << ">" << outputName << endl; out << sequence << endl; } } in.close(); out.close(); out2.close(); m->mothurRemove(filename); rename(tempNameFile.c_str(), filename.c_str()); if ((countfile != "") && (tempGroup == "noGroup")) { thisCt.printTable(filename); } return 0; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "processFastaNames"); exit(1); } } //********************************************************************************************************************** int GetOTURepCommand::processNames(string filename, string label) { try{ //create output file if (outputDir == "") { outputDir += m->hasPath(listfile); } ofstream out2; string tempNameFile = filename + ".temp"; m->openOutputFile(tempNameFile, out2); ifstream in; m->openInputFile(filename, in); string rep, binnames; string tempGroup = ""; in >> tempGroup; m->gobble(in); CountTable thisCt; if (countfile != "") { thisCt.readTable(countfile, true, false); if (tempGroup != "noGroup") { out2 << "Representative_Sequence\ttotal\t" << tempGroup << endl; } } while (!in.eof()) { if (m->control_pressed) { break; } string binLabel; in >> binLabel >> rep >> binnames; m->gobble(in); if (countfile == "") { out2 << rep << '\t' << binnames << endl; } else { vector names; m->splitAtComma(binnames, names); if (tempGroup == 
"noGroup") { for (int j = 0; j < names.size(); j++) { if (names[j] != rep) { thisCt.mergeCounts(rep, names[j]); } } }else { int total = 0; for (int j = 0; j < names.size(); j++) { total += thisCt.getGroupCount(names[j], tempGroup); } out2 << rep << '\t' << total << '\t' << total << endl; } } } in.close(); out2.close(); m->mothurRemove(filename); rename(tempNameFile.c_str(), filename.c_str()); if ((countfile != "") && (tempGroup == "noGroup")) { thisCt.printTable(filename); } return 0; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "processNames"); exit(1); } } //********************************************************************************************************************** SeqMap GetOTURepCommand::getMap(int row) { try { SeqMap rowMap; //make sure this row exists in the file, it may not if the seq did not have any distances below the cutoff if (rowPositions[row] != -1){ //go to row in file inRow.seekg(rowPositions[row]); int rowNum, numDists, colNum; float dist; inRow >> rowNum >> numDists; for(int i = 0; i < numDists; i++) { inRow >> colNum >> dist; rowMap[colNum] = dist; } } return rowMap; } catch(exception& e) { m->errorOut(e, "GetOTURepCommand", "getMap"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getoturepcommand.h000066400000000000000000000051351255543666200217710ustar00rootroot00000000000000#ifndef GETOTUREPCOMMAND_H #define GETOTUREPCOMMAND_H /* * getoturepcommand.h * Mothur * * Created by Sarah Westcott on 4/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* The get.oturep command outputs a .fastarep file for each distance you specify, selecting one OTU representative for each bin. 
*/ #include "command.hpp" #include "listvector.hpp" #include "inputdata.h" #include "fastamap.h" #include "groupmap.h" #include "readmatrix.hpp" #include "formatmatrix.h" #include "counttable.h" typedef map SeqMap; struct repStruct { string name; string bin; int simpleBin; int size; string group; repStruct(){} repStruct(string n, string b, int sb, int s, string g) : name(n), bin(b), size(s), group(g), simpleBin(sb) { } ~repStruct() {} }; class GetOTURepCommand : public Command { public: GetOTURepCommand(string); GetOTURepCommand(); ~GetOTURepCommand(){} vector setParameters(); string getCommandName() { return "get.oturep"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.oturep"; } string getDescription() { return "gets a representative sequence for each OTU"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: ListVector* list; GroupMap* groupMap; ReadMatrix* readMatrix; FormatMatrix* formatMatrix; NameAssignment* nameMap; CountTable ct; string filename, fastafile, listfile, namefile, groupfile, label, sorted, phylipfile, countfile, columnfile, distFile, format, outputDir, groups, method; ofstream out; ifstream in, inNames, inRow; bool abort, allLines, groupError, large, weighted, hasGroups; set labels; //holds labels to be used map nameToIndex; //maps sequence name to index in sparsematrix map nameFileMap; vector outputNames, Groups; map outputNameFiles; float cutoff; int precision; vector seqVec; // contains maps with sequence index and distance // for all distances related to a certain sequence vector rowPositions; void readNamesFile(FastaMap*&); void readNamesFile(bool); int process(ListVector*); SeqMap getMap(int); string findRep(vector, string); // returns the name of the "representative" sequence of given bin or subset of a bin, for groups string findRepAbund(vector, string); int processNames(string, 
string); int processFastaNames(string, string, FastaMap*&); int readDist(); }; #endif mothur-1.36.1/source/commands/getotuscommand.cpp000066400000000000000000000415751255543666200220100ustar00rootroot00000000000000/* * getotuscommand.cpp * Mothur * * Created by westcott on 11/10/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "getotuscommand.h" #include "inputdata.h" #include "sharedutilities.h" //********************************************************************************************************************** vector GetOtusCommand::setParameters(){ try { CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none","group",false,true, true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","list",false,true, true); parameters.push_back(plist); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetOtusCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetOtusCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.otus command selects otus 
containing sequences from a specific group or set of groups.\n"; helpString += "It outputs a new list file containing the otus that contain sequences from the specified groups.\n"; helpString += "The get.otus command parameters are accnos, group, list, label and groups. The group and list parameters are required, unless you have valid current files.\n"; helpString += "You must also provide an accnos file containing the list of groups to get, or set the groups parameter to the groups you wish to select.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like. You can separate group names with dashes.\n"; helpString += "The label parameter allows you to specify which distance you want to process.\n"; helpString += "The get.otus command should be in the following format: get.otus(accnos=yourAccnos, list=yourListFile, group=yourGroupFile, label=yourLabel).\n"; helpString += "Example get.otus(accnos=amazon.accnos, list=amazon.fn.list, group=amazon.groups, label=0.03).\n"; helpString += "or get.otus(groups=pasture, list=amazon.fn.list, group=amazon.groups, label=0.03).\n"; helpString += "Note: No spaces between parameter labels (i.e.
list), '=' and parameters (i.e. yourListFile).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOtusCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetOtusCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "group") { pattern = "[filename],[tag],pick,[extension]"; }
        else if (type == "list") { pattern = "[filename],[tag],pick,[extension]"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOtusCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
GetOtusCommand::GetOtusCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["group"] = tempOutNames;
        outputTypes["list"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "GetOtusCommand", "GetOtusCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
GetOtusCommand::GetOtusCommand(string option) {
    try {
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;

            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //initialize outputTypes
            vector<string> tempOutNames;
            outputTypes["group"] = tempOutNames;
            outputTypes["list"] =
tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["group"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = ""; } else { m->setAccnosFile(accnosfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current group file and the group parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setGroupFile(groupfile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current list file and the list parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setListFile(listfile); } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } if ((accnosfile == "") && (Groups.size() == 0)) { m->mothurOut("You must provide an accnos file or specify groups using the groups parameter."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "GetOtusCommand", "GetOtusCommand"); exit(1); } } 
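// The constructor above splits the dash-separated groups parameter with m->splitAtDash
// and then requires either an accnos file or at least one group. The block below is a
// minimal, hedged sketch of that idea. splitAtDashSketch and hasSelection are
// hypothetical names, not mothur functions, and the real splitAtDash may treat
// escaped dashes or whitespace differently.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for m->splitAtDash(groups, Groups):
// turns "A-B-C" into {"A", "B", "C"}.
std::vector<std::string> splitAtDashSketch(const std::string& text) {
    std::vector<std::string> pieces;
    std::string piece;
    std::istringstream in(text);
    while (std::getline(in, piece, '-')) {
        // Skip empty pieces such as the one produced by "A--B".
        if (!piece.empty()) { pieces.push_back(piece); }
    }
    return pieces;
}

// Mirrors the final check in the constructor: the command needs an accnos
// file or at least one group name to know which OTUs to keep.
bool hasSelection(const std::string& accnosfile,
                  const std::vector<std::string>& groups) {
    return !(accnosfile.empty() && groups.empty());
}
```

// Usage, under the same assumptions: hasSelection("", splitAtDashSketch("A-B")) is true,
// while hasSelection("", splitAtDashSketch("")) is false and would correspond to abort = true.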
//********************************************************************************************************************** int GetOtusCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } groupMap = new GroupMap(groupfile); groupMap->readMap(); //get groups you want to get if (accnosfile != "") { m->readAccnos(accnosfile, Groups); m->setGroups(Groups); } //make sure groups are valid //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil* util = new SharedUtil(); vector<string> gNamesOfGroups = groupMap->getNamesOfGroups(); util->setGroups(Groups, gNamesOfGroups); groupMap->setNamesOfGroups(gNamesOfGroups); delete util; if (m->control_pressed) { delete groupMap; return 0; } //read through the list file keeping any otus that contain any sequence from the groups selected readListGroup(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set list file as new current listfile string current = ""; itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "GetOtusCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetOtusCommand::readListGroup(){ try { InputData* input = new InputData(listfile, "list"); ListVector* list = input->getListVector(); string lastLabel = 
list->getLabel(); //using first label seen if none is provided if (label == "") { label = lastLabel; } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[tag]"] = label; variables["[extension]"] = m->getExtension(listfile); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); string GroupOutputDir = outputDir; if (outputDir == "") { GroupOutputDir += m->hasPath(groupfile); } variables["[filename]"] = GroupOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputGroupFileName = getOutputFileName("group", variables); ofstream outGroup; m->openOutputFile(outputGroupFileName, outGroup); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set<string> labels; labels.insert(label); set<string> processedLabels; set<string> userLabels = labels; bool wroteSomething = false; //as long as you are not at the end of the file or done with the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { delete list; delete input; out.close(); outGroup.close(); m->mothurRemove(outputFileName); m->mothurRemove(outputGroupFileName);return 0; } if(labels.count(list->getLabel()) == 1){ processList(list, groupMap, out, outGroup, wroteSomething); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); processList(list, groupMap, out, outGroup, wroteSomething); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = 
list->getLabel(); delete list; list = NULL; //get next line to process list = input->getListVector(); } if (m->control_pressed) { if (list != NULL) { delete list; } delete input; out.close(); outGroup.close(); m->mothurRemove(outputFileName); m->mothurRemove(outputGroupFileName); return 0; } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input->getListVector(lastLabel); processList(list, groupMap, out, outGroup, wroteSomething); delete list; list = NULL; } out.close(); outGroup.close(); if (wroteSomething == false) { m->mothurOut("At distance " + label + " your file does NOT contain any otus containing sequences from the groups you wish to get."); m->mothurOutEndLine(); } outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); outputTypes["group"].push_back(outputGroupFileName); outputNames.push_back(outputGroupFileName); return 0; } catch(exception& e) { m->errorOut(e, "GetOtusCommand", "readListGroup"); exit(1); } } //********************************************************************************************************************** int GetOtusCommand::processList(ListVector*& list, GroupMap*& groupMap, ofstream& out, ofstream& outGroup, bool& wroteSomething){ try { //make a new list vector ListVector newList; newList.setLabel(list->getLabel()); int numOtus = 0; //for each bin vector<string> binLabels = list->getLabels(); vector<string> newBinLabels; for (int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { return 0; } //parse out names that are in 
accnos file string binnames = list->get(i); bool keepBin = false; string groupFileOutput = ""; //parse names string individual = ""; int length = binnames.length(); for(int j=0;j<length;j++){ if(binnames[j] == ','){ string group = groupMap->getGroup(individual); if (group == "not found") { m->mothurOut("[ERROR]: " + individual + " is not in your groupfile. please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } if (m->inUsersGroups(group, Groups)) { keepBin = true; } groupFileOutput += individual + "\t" + group + "\n"; individual = ""; } else{ individual += binnames[j]; } } string group = groupMap->getGroup(individual); if (group == "not found") { m->mothurOut("[ERROR]: " + individual + " is not in your groupfile. please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } if (m->inUsersGroups(group, Groups)) { keepBin = true; } groupFileOutput += individual + "\t" + group + "\n"; //if there are sequences from the groups we want in this bin add to new list, output to groupfile if (keepBin) { newList.push_back(binnames); newBinLabels.push_back(binLabels[i]); outGroup << groupFileOutput; numOtus++; } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->mothurOut(newList.getLabel() + " - selected " + toString(numOtus) + " of the " + toString(list->getNumBins()) + " OTUs."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetOtusCommand", "processList"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getotuscommand.h000066400000000000000000000021561255543666200214450ustar00rootroot00000000000000#ifndef GETOTUSCOMMAND_H #define GETOTUSCOMMAND_H /* * getotuscommand.h * Mothur * * Created by westcott on 11/10/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "groupmap.h" #include "listvector.hpp" class GetOtusCommand : public Command { public: GetOtusCommand(string); GetOtusCommand(); ~GetOtusCommand(){} vector<string> setParameters(); string getCommandName() { return "get.otus"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.otus"; } string getDescription() { return "outputs a new list file containing the otus containing sequences from the groups specified"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string accnosfile, groupfile, listfile, outputDir, groups, label; bool abort; vector<string> outputNames, Groups; GroupMap* groupMap; int readListGroup(); int processList(ListVector*&, GroupMap*&, ofstream&, ofstream&, bool&); }; #endif mothur-1.36.1/source/commands/getrabundcommand.cpp000066400000000000000000000443111255543666200222600ustar00rootroot00000000000000/* * getrabundcommand.cpp * Mothur * * Created by Sarah Westcott on 6/2/09. * Copyright 2009 Schloss Lab Umass Amherst. All rights reserved. 
* */ #include "getrabundcommand.h" //********************************************************************************************************************** vector<string> GetRAbundCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","rabund",false,false, true); parameters.push_back(plist); CommandParameter pcount("count", "InputTypes", "", "", "none", "none", "none","",false,false, false); parameters.push_back(pcount); CommandParameter psabund("sabund", "InputTypes", "", "", "LRSS", "LRSS", "none","rabund",false,false, true); parameters.push_back(psabund); CommandParameter psorted("sorted", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(psorted); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetRAbundCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.rabund command parameters are list, sabund, count, label and sorted. list or sabund parameters are required, unless you have valid current files.\n"; helpString += "The count parameter allows you to provide a count file associated with your list file. 
If you clustered with a countfile the list file only contains the unique sequences and you will want to add the redundant counts into the rabund file, providing the count file allows you to do so.\n"; helpString += "The label parameter allows you to select what distance levels you would like included in your .rabund file, and are separated by dashes.\n"; helpString += "The sorted parameter allows you to print the rabund results sorted by abundance or not. The default is sorted.\n"; helpString += "The get.rabund command should be in the following format: get.rabund(label=yourLabels, sorted=yourSorted).\n"; helpString += "Example get.rabund(sorted=F).\n"; helpString += "The default value for label is all labels in your inputfile.\n"; helpString += "The get.rabund command outputs a .rabund file containing the lines you selected.\n"; helpString += "Note: No spaces between parameter labels (i.e. label), '=' and parameters (i.e.yourLabels).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetRAbundCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "rabund") { pattern = "[filename],rabund"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetRAbundCommand::GetRAbundCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["rabund"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "GetRAbundCommand"); exit(1); } } 
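// The help string above says that a count file lets get.rabund add redundant counts
// back into each OTU. The block below is a hedged sketch of that summation, loosely
// following what createRabund does later in this file. binTotalsSketch is a
// hypothetical name, and the std::map stands in for mothur's CountTable. Treating a
// missing name as a count of 0 is an assumption of this sketch, not documented behavior.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: for each bin (a comma-separated list of unique
// sequence names), add up the per-sequence counts to get the OTU abundance.
std::vector<int> binTotalsSketch(const std::vector<std::string>& bins,
                                 const std::map<std::string, int>& counts) {
    std::vector<int> totals;
    for (const std::string& bin : bins) {
        int total = 0;
        std::string name;
        std::istringstream in(bin);
        while (std::getline(in, name, ',')) {
            std::map<std::string, int>::const_iterator found = counts.find(name);
            // Assumption: names absent from the count map contribute nothing.
            if (found != counts.end()) { total += found->second; }
        }
        totals.push_back(total);
    }
    return totals;
}
```

// Usage, under the same assumptions: with counts {s1: 5, s2: 1, s3: 2} and bins
// "s1,s2" and "s3", the sketch yields one total per bin, in bin order.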
//********************************************************************************************************************** GetRAbundCommand::GetRAbundCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); map<string,string>::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["rabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { format = "sabund"; inputfile = sabundfile; m->setSabundFile(sabundfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; temp = validParameter.validFile(parameters, "sorted", false); if (temp == "not found") { temp = "T"; } sorted = m->isTrue(temp); label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } if ((listfile == "") && (sabundfile == "")) { //is there are current file available for any of these? //give priority to shared, then list, then rabund, then sabund //if there is a current shared file, use it listfile = m->getListFile(); if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { sabundfile = m->getSabundFile(); if (sabundfile != "") { inputfile = sabundfile; format = "sabund"; m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a list or sabund file."); m->mothurOutEndLine(); abort = true; } } } if ((countfile != "") && (listfile == "")) { m->mothurOut("[ERROR]: You can only use the count file with a list file, aborting.\n"); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } } } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "GetRAbundCommand"); exit(1); } } //********************************************************************************************************************** int GetRAbundCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); filename = getOutputFileName("rabund", variables); m->openOutputFile(filename, out); if (countfile != "") { processList(out); }else { InputData input(inputfile, format); RAbundVector* rabund = input.getRAbundVector(); string lastLabel = rabund->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> processedLabels; set<string> userLabels = labels; if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete rabund; return 0; } while((rabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(allLines == 1 || labels.count(rabund->getLabel()) == 1){ m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete rabund; return 0; } if(sorted) { rabund->print(out); } else { rabund->nonSortedPrint(out); } processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); } if ((m->anyLabelsToProcess(rabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = rabund->getLabel(); delete rabund; rabund = input.getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete rabund; return 0; } if(sorted) { rabund->print(out); } else { rabund->nonSortedPrint(out); } processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); //restore real lastlabel to save below rabund->setLabel(saveLabel); } lastLabel = rabund->getLabel(); delete rabund; rabund = input.getRAbundVector(); } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (rabund != NULL) { delete rabund; } rabund = input.getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete rabund; return 0; } if(sorted) { rabund->print(out); } else { rabund->nonSortedPrint(out); } delete rabund; } } if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(filename); m->mothurOutEndLine(); outputNames.push_back(filename); outputTypes["rabund"].push_back(filename); m->mothurOutEndLine(); out.close(); //set rabund file as new current rabundfile string current = ""; itTypes = outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetRAbundCommand::processList(ofstream& out){ try { CountTable ct; ct.readTable(countfile, false, false); InputData input(inputfile, format); ListVector* list = input.getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> processedLabels; set<string> userLabels = labels; if (m->control_pressed) { delete list; return 0; } while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete list; return 0; } RAbundVector* rabund = new RAbundVector(); createRabund(ct, list, rabund); if(sorted) { rabund->print(out); } else { rabund->nonSortedPrint(out); } delete rabund; processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete list; return 0; } RAbundVector* rabund = new RAbundVector(); createRabund(ct, list, rabund); if(sorted) { rabund->print(out); } else { rabund->nonSortedPrint(out); } delete rabund; processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input.getListVector(); } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input.getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete list; return 0; } RAbundVector* rabund = new RAbundVector(); createRabund(ct, list, rabund); if(sorted) { rabund->print(out); } else { rabund->nonSortedPrint(out); } delete rabund; delete list; } return 0; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "processList"); exit(1); } } //********************************************************************************************************************** int GetRAbundCommand::createRabund(CountTable& ct, ListVector*& list, RAbundVector*& rabund){ try { rabund->setLabel(list->getLabel()); for(int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { return 0; } vector<string> binNames; string bin = list->get(i); m->splitAtComma(bin, binNames); int total = 0; for (int j = 0; j < binNames.size(); j++) { total += ct.getNumSeqs(binNames[j]); } rabund->push_back(total); } return 0; } catch(exception& e) { m->errorOut(e, "GetRAbundCommand", "createRabund"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getrabundcommand.h000066400000000000000000000022511255543666200217220ustar00rootroot00000000000000#ifndef GETRABUNDCOMMAND_H #define GETRABUNDCOMMAND_H /* * getrabundcommand.h * Mothur * * Created by Sarah Westcott on 6/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "command.hpp" #include "inputdata.h" #include "listvector.hpp" class GetRAbundCommand : public Command { public: GetRAbundCommand(string); GetRAbundCommand(); ~GetRAbundCommand(){} vector<string> setParameters(); string getCommandName() { return "get.rabund"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.rabund"; } string getDescription() { return "creates a rabund file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string filename, listfile, sabundfile, inputfile, format, outputDir, countfile; ofstream out; vector<string> outputNames; bool abort, allLines, sorted; set<string> labels; //holds labels to be used string label; int processList(ofstream& out); int createRabund(CountTable& ct, ListVector*& list, RAbundVector*& rabund); }; #endif mothur-1.36.1/source/commands/getrelabundcommand.cpp000066400000000000000000000330451255543666200226030ustar00rootroot00000000000000/* * getrelabundcommand.cpp * Mothur * * Created by westcott on 6/21/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "getrelabundcommand.h" //********************************************************************************************************************** vector<string> GetRelAbundCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","relabund",false,true, true); parameters.push_back(pshared); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pscale("scale", "Multiple", "totalgroup-totalotu-averagegroup-averageotu", "totalgroup", "", "", "","",false,false); parameters.push_back(pscale); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetRelAbundCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string GetRelAbundCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.relabund command parameters are shared, groups, scale and label. shared is required, unless you have a valid current file.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included. 
The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance levels you would like, and are also separated by dashes.\n"; helpString += "The scale parameter allows you to select what scale you would like to use. Choices are totalgroup, totalotu, averagegroup, averageotu, default is totalgroup.\n"; helpString += "The get.relabund command should be in the following format: get.relabund(groups=yourGroups, label=yourLabels).\n"; helpString += "Example get.relabund(groups=A-B-C, scale=averagegroup).\n"; helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n"; helpString += "The get.relabund command outputs a .relabund file.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetRelAbundCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetRelAbundCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "relabund") { pattern = "[filename],relabund"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetRelAbundCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetRelAbundCommand::GetRelAbundCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["relabund"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetRelAbundCommand", "GetRelAbundCommand"); exit(1); } } //********************************************************************************************************************** 
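// The help string above lists the scale choices (totalgroup, totalotu, averagegroup,
// averageotu) without defining them, and getRelAbundance is not shown in this part of
// the file. The block below is a hedged sketch of one plausible reading of the default,
// totalgroup: each OTU count divided by the total count of its group.
// totalGroupScaleSketch is a hypothetical name, and the formula is an assumption,
// not a statement of what mothur computes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: relative abundance of each OTU within one group,
// assuming "totalgroup" means count / (sum of all counts in that group).
std::vector<double> totalGroupScaleSketch(const std::vector<int>& abundances) {
    long total = 0;
    for (int abundance : abundances) { total += abundance; }

    std::vector<double> relative;
    for (int abundance : abundances) {
        // An empty group has no meaningful proportions; report 0 for each OTU.
        relative.push_back(total == 0 ? 0.0 : static_cast<double>(abundance) / total);
    }
    return relative;
}
```

// Usage, under the same assumption: a group with OTU counts 2, 2 and 4 has a total of 8,
// so the sketch returns one quarter, one quarter and one half.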
GetRelAbundCommand::GetRelAbundCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); map<string,string>::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["relabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } //get shared file sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); m->setGroups(Groups); } scale = validParameter.validFile(parameters, "scale", false); if (scale == "not found") { scale = "totalgroup"; } if ((scale != "totalgroup") && (scale != "totalotu") && (scale != "averagegroup") && (scale != "averageotu")) { m->mothurOut(scale + " is not a valid scaling option for the get.relabund command. 
Choices are totalgroup, totalotu, averagegroup, averageotu."); m->mothurOutEndLine(); abort = true; }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "GetRelAbundCommand", "GetRelAbundCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
int GetRelAbundCommand::execute(){
    try {
        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile));
        string outputFileName = getOutputFileName("relabund", variables);
        ofstream out;
        m->openOutputFile(outputFileName, out);
        out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint);

        input = new InputData(sharedfile, "sharedfile");
        lookup = input->getSharedRAbundVectors();
        string lastLabel = lookup[0]->getLabel();

        //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label.
        set<string> processedLabels;
        set<string> userLabels = labels;

        //as long as you are not at the end of the file or done with the lines you want
        while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {

            if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); delete input; out.close(); m->mothurRemove(outputFileName); return 0; }

            if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){
                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); }
                getRelAbundance(lookup, out);

                processedLabels.insert(lookup[0]->getLabel());
                userLabels.erase(lookup[0]->getLabel());
            }

            if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = lookup[0]->getLabel();

                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
                lookup = input->getSharedRAbundVectors(lastLabel);
                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                if
(!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } getRelAbundance(lookup, out); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { outputTypes.clear(); m->clearGroups(); delete input; out.close(); m->mothurRemove(outputFileName); return 0; } //get next line to process lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { outputTypes.clear(); m->clearGroups(); delete input; out.close(); m->mothurRemove(outputFileName); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } getRelAbundance(lookup, out); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //reset groups parameter m->clearGroups(); delete input; out.close(); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFileName); return 0;} m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); outputNames.push_back(outputFileName); outputTypes["relabund"].push_back(outputFileName); m->mothurOutEndLine(); //set relabund file as new current relabundfile string current = ""; itTypes = outputTypes.find("relabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRelAbundFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "GetRelAbundCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetRelAbundCommand::getRelAbundance(vector& thisLookUp, ofstream& out){ try { for (int i = 0; i < thisLookUp.size(); i++) { out << thisLookUp[i]->getLabel() << '\t' << thisLookUp[i]->getGroup() << '\t' << thisLookUp[i]->getNumBins(); for (int j = 0; j < thisLookUp[i]->getNumBins(); j++) { if (m->control_pressed) { return 0; } int abund = thisLookUp[i]->getAbundance(j); float relabund = 0.0; if (scale == "totalgroup") { relabund = abund / (float) thisLookUp[i]->getNumSeqs(); }else if (scale == "totalotu") { //calc the total in this otu int totalOtu = 0; for (int l = 0; l < thisLookUp.size(); l++) { totalOtu += 
thisLookUp[l]->getAbundance(j); } relabund = abund / (float) totalOtu; }else if (scale == "averagegroup") { relabund = abund / (float) (thisLookUp[i]->getNumSeqs() / (float) thisLookUp[i]->getNumBins()); }else if (scale == "averageotu") { //calc the total in this otu int totalOtu = 0; for (int l = 0; l < thisLookUp.size(); l++) { totalOtu += thisLookUp[l]->getAbundance(j); } float averageOtu = totalOtu / (float) thisLookUp.size(); relabund = abund / (float) averageOtu; }else{ m->mothurOut(scale + " is not a valid scaling option."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } out << '\t' << relabund; } out << endl; } return 0; } catch(exception& e) { m->errorOut(e, "GetRelAbundCommand", "getRelAbundance"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getrelabundcommand.h000066400000000000000000000022531255543666200222450ustar00rootroot00000000000000#ifndef GETRELABUNDCOMMAND_H #define GETRELABUNDCOMMAND_H /* * getrelabundcommand.h * Mothur * * Created by westcott on 6/21/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "command.hpp"
#include "inputdata.h"
#include "sharedrabundvector.h"

class GetRelAbundCommand : public Command {

public:
    GetRelAbundCommand(string);
    GetRelAbundCommand();
    ~GetRelAbundCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "get.relabund"; }
    string getCommandCategory() { return "OTU-Based Approaches"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Get.relabund"; }
    string getDescription() { return "calculates the relative abundance of each OTU in a sample"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    InputData* input;
    vector<SharedRAbundVector*> lookup;

    bool abort, allLines, pickedGroups;
    set<string> labels; //holds labels to be used
    string groups, label, outputDir, scale, sharedfile;
    vector<string> Groups, outputNames;

    int getRelAbundance(vector<SharedRAbundVector*>&, ofstream&);
};

#endif
mothur-1.36.1/source/commands/getsabundcommand.cpp000066400000000000000000000427421255543666200222670ustar00rootroot00000000000000/*
 * getsabundcommand.cpp
 * Mothur
 *
 * Created by Sarah Westcott on 6/2/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "getsabundcommand.h"

//**********************************************************************************************************************
vector<string> GetSAbundCommand::setParameters(){
    try {
        CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","sabund",false,false, true); parameters.push_back(plist);
        CommandParameter pcount("count", "InputTypes", "", "", "none", "none", "none","",false,false, false); parameters.push_back(pcount);
        CommandParameter prabund("rabund", "InputTypes", "", "", "LRSS", "LRSS", "none","sabund",false,false, true); parameters.push_back(prabund);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSAbundCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetSAbundCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The get.sabund command parameters are list, rabund, count and label. list or rabund is required unless a valid current file exists.\n";
        helpString += "The count parameter allows you to provide a count file associated with your list file.
If you clustered with a count file, the list file contains only the unique sequences; providing the count file lets you add the redundant counts to the sabund file.\n";
        helpString += "The label parameter allows you to select what distance levels you would like included in your .sabund file, and the labels are separated by dashes.\n";
        helpString += "The get.sabund command should be in the following format: get.sabund(label=yourLabels).\n";
        helpString += "Example get.sabund().\n";
        helpString += "The default value for label is all labels in your inputfile.\n";
        helpString += "The get.sabund command outputs a .sabund file containing the labels you selected.\n";
        helpString += "Note: No spaces between parameter labels (i.e. label), '=' and parameters (i.e. yourLabel).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSAbundCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetSAbundCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "sabund") { pattern = "[filename],sabund"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        //report the class this function actually belongs to
        m->errorOut(e, "GetSAbundCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
GetSAbundCommand::GetSAbundCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["sabund"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSAbundCommand", "GetSAbundCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
GetSAbundCommand::GetSAbundCommand(string option) {
    try {
        abort = false; calledHelp = false;
        allLines = 1;

        //allow user
to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["sabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { rabundfile = ""; abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { format = "rabund"; inputfile = rabundfile; m->setRabundFile(rabundfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } if ((listfile == "") && (rabundfile == "")) { //is there are current file available for any of these? //give priority to shared, then list, then rabund, then sabund //if there is a current shared file, use it listfile = m->getListFile(); if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { rabundfile = m->getRabundFile(); if (rabundfile != "") { inputfile = rabundfile; format = "rabund"; m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a list or rabund file."); m->mothurOutEndLine(); abort = true; } } } if ((countfile != "") && (listfile == "")) { m->mothurOut("[ERROR]: You can only use the count file with a list file, aborting.\n"); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } } } catch(exception& e) { m->errorOut(e, "GetSAbundCommand", "GetSAbundCommand"); exit(1); } } //********************************************************************************************************************** int GetSAbundCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); filename = getOutputFileName("sabund", variables); m->openOutputFile(filename, out); if (countfile != "") { processList(out); }else { InputData input(inputfile, format); SAbundVector* sabund = input.getSAbundVector(); string lastLabel = sabund->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete sabund; return 0; } while((sabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(allLines == 1 || labels.count(sabund->getLabel()) == 1){ m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); sabund->print(out); if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete sabund; return 0; } processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); } if ((m->anyLabelsToProcess(sabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = sabund->getLabel(); delete sabund; sabund = (input.getSAbundVector(lastLabel)); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); sabund->print(out); if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); delete sabund; return 0; } processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); //restore real lastlabel to save below sabund->setLabel(saveLabel); } lastLabel = sabund->getLabel(); delete sabund; sabund = (input.getSAbundVector()); } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (sabund != NULL) { delete sabund; } sabund = (input.getSAbundVector(lastLabel)); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); sabund->print(out); delete sabund; if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); return 0; } } } out.close(); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(filename); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(filename); m->mothurOutEndLine(); outputNames.push_back(filename); outputTypes["sabund"].push_back(filename); m->mothurOutEndLine(); //set sabund file as new current sabundfile string current = ""; itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "GetSAbundCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetSAbundCommand::processList(ofstream& out){ try { CountTable ct; ct.readTable(countfile, false, false); InputData input(inputfile, format); ListVector* list = input.getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (m->control_pressed) { delete list; return 0; } while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete list; return 0; } RAbundVector* rabund = new RAbundVector(); createRabund(ct, list, rabund); SAbundVector sabund = rabund->getSAbundVector(); sabund.print(out); delete rabund; processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete list; return 0; } RAbundVector* rabund = new RAbundVector(); createRabund(ct, list, rabund); SAbundVector sabund = rabund->getSAbundVector(); sabund.print(out); delete rabund; processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input.getListVector(); } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input.getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete list; return 0; } RAbundVector* rabund = new RAbundVector(); createRabund(ct, list, rabund); SAbundVector sabund = rabund->getSAbundVector(); sabund.print(out); delete rabund; delete list; } return 0; } catch(exception& e) { m->errorOut(e, "GetSAbundCommand", "processList"); exit(1); } } //********************************************************************************************************************** int GetSAbundCommand::createRabund(CountTable& ct, ListVector*& list, RAbundVector*& rabund){ try { rabund->setLabel(list->getLabel()); for(int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { return 0; } vector binNames; string bin = list->get(i); m->splitAtComma(bin, binNames); int total = 0; for (int j = 0; j < binNames.size(); j++) { total += ct.getNumSeqs(binNames[j]); } rabund->push_back(total); } return 0; } catch(exception& e) { m->errorOut(e, "GetSAbundCommand", "createRabund"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getsabundcommand.h000066400000000000000000000022441255543666200217250ustar00rootroot00000000000000#ifndef GETSABUNDCOMMAND_H #define GETSABUNDCOMMAND_H /* * getsabundcommand.h * Mothur * * Created by Sarah Westcott on 6/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "command.hpp" #include "inputdata.h" #include "sabundvector.hpp" class GetSAbundCommand : public Command { public: GetSAbundCommand(string); GetSAbundCommand(); ~GetSAbundCommand() {} vector setParameters(); string getCommandName() { return "get.sabund"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.sabund"; } string getDescription() { return "creates a sabund file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string filename, format, inputfile, listfile, rabundfile, outputDir, countfile; ofstream out; vector outputNames; bool abort, allLines; set labels; //holds labels to be used string label; int processList(ofstream& out); int createRabund(CountTable& ct, ListVector*& list, RAbundVector*& rabund); }; #endif mothur-1.36.1/source/commands/getseqscommand.cpp000066400000000000000000001346511255543666200217670ustar00rootroot00000000000000/* * getseqscommand.cpp * Mothur * * Created by Sarah Westcott on 7/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "getseqscommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "counttable.h" //********************************************************************************************************************** vector GetSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "FNGLT", "none","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter pfastq("fastq", "InputTypes", "", "", "none", "FNGLT", "none","fastq",false,false,true); parameters.push_back(pfastq); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "FNGLT", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "FNGLT", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "FNGLT", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false,true); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "FNGLT", "none","taxonomy",false,false,true); parameters.push_back(ptaxonomy); CommandParameter palignreport("alignreport", "InputTypes", "", "", "none", "FNGLT", "none","alignreport",false,false); parameters.push_back(palignreport); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "FNGLT", "none","qfile",false,false); parameters.push_back(pqfile); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(paccnos); CommandParameter pdups("dups", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pdups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); 
parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);
        CommandParameter paccnos2("accnos2", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos2);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSeqsCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string GetSeqsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The get.seqs command reads an .accnos file and any of the following file types: fasta, name, group, count, list, taxonomy, quality, fastq or alignreport file.\n";
        helpString += "It outputs a file containing only the sequences in the .accnos file.\n";
        helpString += "The get.seqs command parameters are accnos, fasta, name, group, count, list, taxonomy, qfile, alignreport, fastq and dups. You must provide accnos unless you have a valid current accnos file, and at least one of the other parameters.\n";
        helpString += "The dups parameter allows you to add the entire line from a name file if you add any name from the line. default=true. \n";
        helpString += "The get.seqs command should be in the following format: get.seqs(accnos=yourAccnos, fasta=yourFasta).\n";
        helpString += "Example get.seqs(accnos=amazon.accnos, fasta=amazon.fasta).\n";
        helpString += "Note: No spaces between parameter labels (i.e.
fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** GetSeqsCommand::GetSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["fastq"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["accnosreport"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "GetSeqsCommand"); exit(1); } } //********************************************************************************************************************** string GetSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],pick,[extension]"; } else if (type == "fastq") { pattern = "[filename],pick,[extension]"; } else if (type == "taxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "name") { pattern = "[filename],pick,[extension]"; } else if (type == "group") { pattern = "[filename],pick,[extension]"; } else if (type == "count") { pattern = "[filename],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[distance],pick,[extension]"; } else if (type == "qfile") { pattern = "[filename],pick,[extension]"; } else if (type == "accnosreport") { pattern = "[filename],pick.accnos.report"; } else if (type == "alignreport") { pattern = "[filename],pick.align.report"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", 
"getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetSeqsCommand::GetSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["fastq"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["accnosreport"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("alignreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["alignreport"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("accnos2"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos2"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("fastq"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fastq"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = m->getAccnosFile(); if (accnosfile != "") { m->mothurOut("Using " + accnosfile + " as input file for the accnos parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no valid accnos file and accnos is required."); m->mothurOutEndLine(); abort = true; } }else { m->setAccnosFile(accnosfile); } if (accnosfile2 == "not found") { accnosfile2 = ""; } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile 
== "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } alignfile = validParameter.validFile(parameters, "alignreport", true); if (alignfile == "not open") { abort = true; } else if (alignfile == "not found") { alignfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { taxfile = ""; abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { abort = true; } else if (qualfile == "not found") { qualfile = ""; } else { m->setQualFile(qualfile); } fastqfile = validParameter.validFile(parameters, "fastq", true); if (fastqfile == "not open") { abort = true; } else if (fastqfile == "not found") { fastqfile = ""; } accnosfile2 = validParameter.validFile(parameters, "accnos2", true); if (accnosfile2 == "not open") { abort = true; } else if (accnosfile2 == "not found") { accnosfile2 = ""; } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } string usedDups = "true"; string temp = validParameter.validFile(parameters, "dups", false); if (temp == "not found") { temp = "true"; usedDups = ""; } dups = m->isTrue(temp); if ((fastqfile == "") && (fastafile == "") && (namefile == "") && 
(groupfile == "") && (alignfile == "") && (listfile == "") && (taxfile == "") && (qualfile == "") && (accnosfile2 == "") && (countfile == "")) { m->mothurOut("You must provide one of the following: fasta, name, group, count, alignreport, taxonomy, quality, fastq or listfile."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((namefile == "") && ((fastafile != "") || (taxfile != ""))){ vector files; files.push_back(fastafile); files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "GetSeqsCommand"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get names you want to keep names = m->readAccnos(accnosfile); if (m->control_pressed) { return 0; } if (countfile != "") { if ((fastafile != "") || (listfile != "") || (taxfile != "")) { m->mothurOut("\n[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.\n\n"); } } //read through the correct file and output lines you want to keep if (namefile != "") { readName(); } if (fastafile != "") { readFasta(); } if (fastqfile != "") { readFastq(); } if (groupfile != "") { readGroup(); } if (countfile != "") { readCount(); } if (alignfile != "") { readAlign(); } if (listfile != "") { readList(); } if (taxfile != "") { readTax(); } if (qualfile != "") { readQual(); } if (accnosfile2 != "") { compareAccnos(); } if (m->debug) { runSanityCheck(); } if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); 
m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readFastq(){ try { bool wroteSomething = false; int selectedCount = 0; ifstream in; m->openInputFile(fastqfile, in); string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastqfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastqfile)); variables["[extension]"] = m->getExtension(fastqfile); string outputFileName = getOutputFileName("fastq", variables); 
ofstream out; m->openOutputFile(outputFileName, out); while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } //read sequence name string input = m->getline(in); m->gobble(in); string outputString = input + "\n"; if (input[0] == '@') { //get rest of lines outputString += m->getline(in) + "\n"; m->gobble(in); outputString += m->getline(in) + "\n"; m->gobble(in); outputString += m->getline(in) + "\n"; m->gobble(in); vector splits = m->splitWhiteSpace(input); string name = splits[0]; name = name.substr(1); m->checkName(name); if (names.count(name) != 0) { wroteSomething = true; selectedCount++; out << outputString; } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["fastq"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your fastq file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readFastq"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readFasta(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; bool wroteSomething = false; int selectedCount = 0; if (m->debug) { set temp; sanity["fasta"] = temp; } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } Sequence currSeq(in); name = 
currSeq.getName(); if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(name); if (it != uniqueMap.end()) { currSeq.setName(it->second); } } name = currSeq.getName(); if (name != "") { //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething = true; currSeq.printSequence(out); selectedCount++; if (m->debug) { sanity["fasta"].insert(name); } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your fasta file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readQual(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(qualfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(qualfile)); variables["[extension]"] = m->getExtension(qualfile); string outputFileName = getOutputFileName("qfile", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(qualfile, in); string name; bool wroteSomething = false; int selectedCount = 0; if (m->debug) { set temp; sanity["qual"] = temp; } while(!in.eof()){ string saveName = ""; string name = ""; string scores = ""; in >> name; if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(name); if (it != uniqueMap.end()) { name = it->second; } } if (name.length() != 0) { saveName = name.substr(1); while (!in.eof()) { char c = in.get(); if (c == 10 || c == 13 || c == -1){ break; } else { name += c; } } m->gobble(in); } while(in){ char letter= in.get(); if(letter == '>'){ 
in.putback(letter); break; } else{ scores += letter; } } m->gobble(in); if (names.count(saveName) != 0) { wroteSomething = true; out << name << endl << scores; selectedCount++; if (m->debug) { sanity["qual"].insert(name); } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["qfile"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your quality file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readQual"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readCount(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string outputFileName = getOutputFileName("count", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(countfile, in); bool wroteSomething = false; int selectedCount = 0; string headers = m->getline(in); m->gobble(in); out << headers << endl; string test = headers; vector pieces = m->splitWhiteSpace(test); string name, rest; int thisTotal; rest = ""; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; m->gobble(in); in >> thisTotal; m->gobble(in); if (pieces.size() > 2) { rest = m->getline(in); m->gobble(in); } if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + rest + "\n"); } if (names.count(name) != 0) { out << name << '\t' << thisTotal << '\t' << rest << endl; wroteSomething = true; selectedCount+= thisTotal; } } in.close(); out.close(); //check 
for groups that have been eliminated CountTable ct; if (ct.testGroups(outputFileName)) { ct.readTable(outputFileName, true, false); ct.printTable(outputFileName); } if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputTypes["count"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your count file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readCount"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readList(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); ifstream in; m->openInputFile(listfile, in); bool wroteSomething = false; int selectedCount = 0; if (m->debug) { set temp; sanity["list"] = temp; } while(!in.eof()){ selectedCount = 0; //read in list vector ListVector list(in); //make a new list vector ListVector newList; newList.setLabel(list.getLabel()); variables["[distance]"] = list.getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); vector binLabels = list.getLabels(); vector newBinLabels; if (m->control_pressed) { in.close(); out.close(); return 0; } //for each bin for (int i = 0; i < list.getNumBins(); i++) { //parse out names that are in accnos file string binnames = list.get(i); vector bnames; m->splitAtComma(binnames, bnames); string newNames = ""; for (int j = 0; j < bnames.size(); j++) { string name = bnames[j]; //if that name is in the .accnos file, add it 
if (names.count(name) != 0) { newNames += name + ","; selectedCount++; if (m->debug) { sanity["list"].insert(name); } } } //if there are names in this bin add to new list if (newNames != "") { newNames = newNames.substr(0, newNames.length()-1); //rip off extra comma newList.push_back(newNames); newBinLabels.push_back(binLabels[i]); } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->gobble(in); out.close(); } in.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } m->mothurOut("Selected " + toString(selectedCount) + " sequences from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; int selectedCount = 0; if (m->debug) { set temp; sanity["name"] = temp; } if (m->debug) { set temp; sanity["dupname"] = temp; } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; string hold = ""; if (dups) { hold = secondCol; } vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; for (int i = 0; i < parsedNames.size(); i++) { if 
(names.count(parsedNames[i]) != 0) { validSecond.push_back(parsedNames[i]); if (m->debug) { sanity["dupname"].insert(parsedNames[i]); } } } if ((dups) && (validSecond.size() != 0)) { //dups = true and we want to add someone, then add everyone for (int i = 0; i < parsedNames.size(); i++) { names.insert(parsedNames[i]); if (m->debug) { sanity["dupname"].insert(parsedNames[i]); } } out << firstCol << '\t' << hold << endl; wroteSomething = true; selectedCount += parsedNames.size(); if (m->debug) { sanity["name"].insert(firstCol); } }else { selectedCount += validSecond.size(); //if the name in the first column is in the set then print it and any other names in second column also in set if (names.count(firstCol) != 0) { wroteSomething = true; out << firstCol << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; if (m->debug) { sanity["name"].insert(firstCol); } //make first name in set you come to first column and then add the remaining names to second column }else { //you want part of this row if (validSecond.size() != 0) { wroteSomething = true; out << validSecond[0] << '\t'; //we are changing the unique name in the fasta file uniqueMap[firstCol] = validSecond[0]; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; if (m->debug) { sanity["name"].insert(validSecond[0]); } } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["name"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your name file."); m->mothurOutEndLine(); return 0; } 
catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readName"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readGroup(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(groupfile, in); string name, group; bool wroteSomething = false; int selectedCount = 0; if (m->debug) { set temp; sanity["group"] = temp; } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> group; //read from second column //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething = true; out << name << '\t' << group << endl; selectedCount++; if (m->debug) { sanity["group"].insert(name); } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["group"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your group file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int GetSeqsCommand::readTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile)); 
variables["[extension]"] = m->getExtension(taxfile); string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(taxfile, in); string name, tax; bool wroteSomething = false; int selectedCount = 0; if (m->debug) { set temp; sanity["tax"] = temp; } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> tax; //read from second column if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(name); if (it != uniqueMap.end()) { name = it->second; } } //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething = true; out << name << '\t' << tax << endl; selectedCount++; if (m->debug) { sanity["tax"].insert(name); } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["taxonomy"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your taxonomy file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readTax"); exit(1); } } //********************************************************************************************************************** //alignreport file has a column header line then all other lines contain 16 columns. 
we just want the first column since that contains the name int GetSeqsCommand::readAlign(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(alignfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(alignfile)); string outputFileName = getOutputFileName("alignreport", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(alignfile, in); string name, junk; bool wroteSomething = false; int selectedCount = 0; //read column headers for (int i = 0; i < 16; i++) { if (!in.eof()) { in >> junk; out << junk << '\t'; } else { break; } } out << endl; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(name); if (it != uniqueMap.end()) { name = it->second; } } //if this name is in the accnos file if (names.count(name) != 0) { wroteSomething = true; selectedCount++; out << name << '\t'; //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; out << junk << '\t'; } else { break; } } out << endl; }else {//still read just don't do anything with it //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; } else { break; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file does not contain any sequence from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["alignreport"].push_back(outputFileName); m->mothurOut("Selected " + toString(selectedCount) + " sequences from your alignreport file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSeqsCommand", "readAlign"); exit(1); } } //********************************************************************************************************************** //just looking at common mistakes. 
int GetSeqsCommand::runSanityCheck(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); }
        string filename = outputDir + "get.seqs.debug.report";

        ofstream out;
        m->openOutputFile(filename, out);

        //compare fasta, name, qual and taxonomy if given to make sure they contain the same seqs
        if (fastafile != "") {
            if (namefile != "") { //compare with fasta
                if (sanity["fasta"] != sanity["name"]) { //create mismatch file
                    createMisMatchFile(out, fastafile, namefile, sanity["fasta"], sanity["name"]);
                }
            }
            if (qualfile != "") {
                if (sanity["fasta"] != sanity["qual"]) { //create mismatch file
                    createMisMatchFile(out, fastafile, qualfile, sanity["fasta"], sanity["qual"]);
                }
            }
            if (taxfile != "") {
                if (sanity["fasta"] != sanity["tax"]) { //create mismatch file
                    createMisMatchFile(out, fastafile, taxfile, sanity["fasta"], sanity["tax"]);
                }
            }
        }

        //compare dupnames, groups and list if given to make sure they match
        if (namefile != "") {
            if (groupfile != "") {
                if (sanity["dupname"] != sanity["group"]) { //create mismatch file
                    createMisMatchFile(out, namefile, groupfile, sanity["dupname"], sanity["group"]);
                }
            }
            if (listfile != "") {
                if (sanity["dupname"] != sanity["list"]) { //create mismatch file
                    createMisMatchFile(out, namefile, listfile, sanity["dupname"], sanity["list"]);
                }
            }
        }else{
            if ((groupfile != "") && (fastafile != "")) {
                if (sanity["fasta"] != sanity["group"]) { //create mismatch file
                    createMisMatchFile(out, fastafile, groupfile, sanity["fasta"], sanity["group"]);
                }
            }
        }

        out.close();

        if (m->isBlank(filename)) { m->mothurRemove(filename); }
        else {
            m->mothurOut("\n[DEBUG]: " + filename + " contains the file mismatches.\n");
            outputNames.push_back(filename);
            outputTypes["debug"].push_back(filename);
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSeqsCommand", "runSanityCheck");
        exit(1);
    }
}
//**********************************************************************************************************************
//just looking at
//common mistakes.
int GetSeqsCommand::createMisMatchFile(ofstream& out, string filename1, string filename2, set<string> set1, set<string> set2){
    try {
        out << "****************************************" << endl << endl;
        out << "Names unique to " << filename1 << ":\n";

        //remove names in set1 that are also in set2
        for (set<string>::iterator it = set1.begin(); it != set1.end();) {
            string name = *it;

            if (set2.count(name) == 0) { out << name << endl; } //name unique to set1
            else { set2.erase(name); } //you are in both so erase
            set1.erase(it++);
        }

        out << "\nNames unique to " << filename2 << ":\n";
        //output results
        for (set<string>::iterator it = set2.begin(); it != set2.end(); it++) { out << *it << endl; }

        out << "****************************************" << endl << endl;

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSeqsCommand", "createMisMatchFile");
        exit(1);
    }
}
//**********************************************************************************************************************
int GetSeqsCommand::compareAccnos(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(accnosfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(accnosfile));
        string outputFileName = getOutputFileName("accnosreport", variables);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(accnosfile2, in);
        string name;

        set<string> namesAccnos2;
        set<string> namesDups;
        set<string> namesAccnos = names;

        map<string, int> nameCount;

        if (namefile != "") {
            ifstream inName;
            m->openInputFile(namefile, inName);

            while(!inName.eof()){
                if (m->control_pressed) { inName.close(); return 0; }

                string thisname, repnames;

                inName >> thisname; m->gobble(inName); //read from first column
                inName >> repnames; //read from second column

                int num = m->getNumNames(repnames);
                nameCount[thisname] = num;

                m->gobble(inName);
            }
            inName.close();
        }

        while(!in.eof()){
            in >> name;

            if (namesAccnos.count(name) == 0){ //name unique to accnos2
                int pos = name.find_last_of('_');
                string tempName =
name;
                if (pos != string::npos) {
                    tempName = tempName.substr(pos+1);
                    if (m->debug) { m->mothurOut("[DEBUG]: " + tempName + "\n"); }
                }
                if (namesAccnos.count(tempName) == 0){
                    namesAccnos2.insert(name);
                }else { //you are in both so erase
                    namesAccnos.erase(name);
                    namesDups.insert(name);
                }
            }else { //you are in both so erase
                namesAccnos.erase(name);
                namesDups.insert(name);
            }

            m->gobble(in);
        }
        in.close();

        out << "Names in both files : " + toString(namesDups.size()) << endl;
        m->mothurOut("Names in both files : " + toString(namesDups.size())); m->mothurOutEndLine();

        for (set<string>::iterator it = namesDups.begin(); it != namesDups.end(); it++) {
            out << (*it);
            if (namefile != "") { out << '\t' << nameCount[(*it)]; }
            out << endl;
        }

        out << "Names unique to " + accnosfile + " : " + toString(namesAccnos.size()) << endl;
        m->mothurOut("Names unique to " + accnosfile + " : " + toString(namesAccnos.size())); m->mothurOutEndLine();

        for (set<string>::iterator it = namesAccnos.begin(); it != namesAccnos.end(); it++) {
            out << (*it);
            if (namefile != "") { out << '\t' << nameCount[(*it)]; }
            out << endl;
        }

        out << "Names unique to " + accnosfile2 + " : " + toString(namesAccnos2.size()) << endl;
        m->mothurOut("Names unique to " + accnosfile2 + " : " + toString(namesAccnos2.size())); m->mothurOutEndLine();

        for (set<string>::iterator it = namesAccnos2.begin(); it != namesAccnos2.end(); it++) {
            out << (*it);
            if (namefile != "") { out << '\t' << nameCount[(*it)]; }
            out << endl;
        }

        out.close();

        outputNames.push_back(outputFileName);
        outputTypes["accnosreport"].push_back(outputFileName);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GetSeqsCommand", "compareAccnos");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/getseqscommand.h000066400000000000000000000033171255543666200214260ustar00rootroot00000000000000
#ifndef GETSEQSCOMMAND_H
#define GETSEQSCOMMAND_H

/*
 *  getseqscommand.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 7/8/09.
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" class GetSeqsCommand : public Command { public: GetSeqsCommand(string); GetSeqsCommand(); ~GetSeqsCommand(){} vector setParameters(); string getCommandName() { return "get.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.seqs"; } string getDescription() { return "gets sequences from a list, fasta, name, group, alignreport, quality or taxonomy file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: set names; vector outputNames; string accnosfile, accnosfile2, fastafile, fastqfile, namefile, countfile, groupfile, alignfile, listfile, taxfile, qualfile, outputDir; bool abort, dups; map uniqueMap; //for debug map > sanity; //maps file type to names chosen for file. something like "fasta" -> vector. If running in debug mode this is filled and we check to make sure all the files have the same names. If they don't we output the differences for the user. int readFasta(); int readFastq(); int readName(); int readGroup(); int readCount(); int readAlign(); int readList(); int readTax(); int readQual(); int compareAccnos(); int runSanityCheck(); int createMisMatchFile(ofstream&, string, string, set, set); }; #endif mothur-1.36.1/source/commands/getsharedotucommand.cpp000066400000000000000000000763371255543666200230200ustar00rootroot00000000000000/* * getsharedotucommand.cpp * Mothur * * Created by westcott on 9/22/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "getsharedotucommand.h" #include "sharedutilities.h" //********************************************************************************************************************** vector GetSharedOTUCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "sharedFasta", "none", "none","fasta",false,false); parameters.push_back(pfasta); CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "groupList","",false,false,true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "sharedList", "sharedList", "groupList","sharedseq",false,false,true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "sharedList-sharedFasta", "sharedList", "none","sharedseq",false,false,true); parameters.push_back(pshared); CommandParameter poutput("output", "Multiple", "accnos-default", "default", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter puniquegroups("uniquegroups", "String", "", "", "", "", "","",false,false,true); parameters.push_back(puniquegroups); CommandParameter psharedgroups("sharedgroups", "String", "", "", "", "", "","",false,false,true); parameters.push_back(psharedgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "setParameters"); exit(1); } } 
//********************************************************************************************************************** string GetSharedOTUCommand::getHelpString(){ try { string helpString = ""; helpString += "The get.sharedseqs command parameters are list, group, shared, label, uniquegroups, sharedgroups, output and fasta. The list and group or shared parameters are required, unless you have valid current files.\n"; helpString += "The label parameter allows you to select what distance levels you would like output files for, and are separated by dashes.\n"; helpString += "The uniquegroups and sharedgroups parameters allow you to select groups you would like to know the shared info for, and are separated by dashes.\n"; helpString += "If you enter your groups under the uniquegroups parameter mothur will return the otus that contain ONLY sequences from those groups.\n"; helpString += "If you enter your groups under the sharedgroups parameter mothur will return the otus that contain sequences from those groups and may also contain sequences from other groups.\n"; helpString += "If you do not enter any groups then the get.sharedseqs command will return sequences that are unique to all groups in your group or shared file.\n"; helpString += "The fasta parameter allows you to input a fasta file and outputs a fasta file for each distance level containing only the sequences that are in OTUs shared by the groups specified. It can only be used with a list and group file not the shared file input.\n"; helpString += "The output parameter allows you to output the list of names without the group and bin number added. \n"; helpString += "With this option you can use the names file as an input in get.seqs and remove.seqs commands. To do this enter output=accnos. 
\n"; helpString += "The get.sharedseqs command outputs a .names file for each distance level containing a list of sequences in the OTUs shared by the groups specified.\n"; helpString += "The get.sharedseqs command should be in the following format: get.sharedseqs(list=yourListFile, group=yourGroupFile, label=yourLabels, uniquegroups=yourGroups, fasta=yourFastafile, output=yourOutput).\n"; helpString += "Example get.sharedseqs(list=amazon.fn.list, label=unique-0.01, group=amazon.groups, uniquegroups=forest-pasture, fasta=amazon.fasta, output=accnos).\n"; helpString += "The output to the screen is the distance and the number of otus at that distance for the groups you specified.\n"; helpString += "The default value for label is all labels in your inputfile. The default for groups is all groups in your file.\n"; helpString += "Note: No spaces between parameter labels (i.e. label), '=' and parameters (i.e.yourLabel).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string GetSharedOTUCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],[distance],[group],shared.fasta"; } else if (type == "accnos") { pattern = "[filename],[distance],[group],accnos"; } else if (type == "sharedseqs") { pattern = "[filename],[distance],[group],shared.seqs"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** GetSharedOTUCommand::GetSharedOTUCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = 
tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["sharedseqs"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "GetSharedOTUCommand"); exit(1); } } //********************************************************************************************************************** GetSharedOTUCommand::GetSharedOTUCommand(string option) { try { abort = false; calledHelp = false; unique = true; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["sharedseqs"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; m->setListFile(listfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } if ((sharedfile == "") && (listfile == "")) { //look for currents //is there are current file available for either of these? 
//give priority to shared, then list sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared or list file."); m->mothurOutEndLine(); abort = true; } } }else if ((sharedfile != "") && (listfile != "")) { m->mothurOut("You may enter ONLY ONE of the following: shared or list."); m->mothurOutEndLine(); abort = true; } if (listfile != "") { if (groupfile == ""){ groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a group file if you are going to use the list format."); m->mothurOutEndLine(); abort = true; } } } if ((sharedfile != "") && (fastafile != "")) { m->mothurOut("You cannot use the fasta file with the shared file."); m->mothurOutEndLine(); abort = true; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } output = validParameter.validFile(parameters, "output", false); if (output == "not found") { output = ""; } else if (output == "default") { output = ""; } groups = validParameter.validFile(parameters, "uniquegroups", false); if (groups == "not found") { groups = ""; } else { userGroups = "unique." + groups; m->splitAtDash(groups, Groups); if (Groups.size() > 4) { userGroups = "unique.selected_groups"; } //if too many groups then the filename becomes too big. 
} groups = validParameter.validFile(parameters, "sharedgroups", false); if (groups == "not found") { groups = ""; } else { userGroups = groups; m->splitAtDash(groups, Groups); if (Groups.size() > 4) { userGroups = "selected_groups"; } //if too many groups then the filename becomes too big. unique = false; } } } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "GetSharedOTUCommand"); exit(1); } } //********************************************************************************************************************** int GetSharedOTUCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if ( sharedfile != "") { runShared(); } else { m->setGroups(Groups); groupMap = new GroupMap(groupfile); int error = groupMap->readMap(); if (error == 1) { delete groupMap; return 0; } if (m->control_pressed) { delete groupMap; return 0; } if (Groups.size() == 0) { Groups = groupMap->getNamesOfGroups(); //make string for outputfile name userGroups = "unique."; for(int i = 0; i < Groups.size(); i++) { userGroups += Groups[i] + "-"; } userGroups = userGroups.substr(0, userGroups.length()-1); if (Groups.size() > 4) { userGroups = "unique.selected_groups"; } //if too many groups then the filename becomes too big. 
}else{ //sanity check for group names SharedUtil util; vector namesOfGroups = groupMap->getNamesOfGroups(); util.setGroups(Groups, namesOfGroups); groupMap->setNamesOfGroups(namesOfGroups); } //put groups in a map to make them easier to find for(int i = 0; i < Groups.size(); i++) { groupFinder[Groups[i]] = Groups[i]; } if (fastafile != "") { ifstream inFasta; m->openInputFile(fastafile, inFasta); while(!inFasta.eof()) { if (m->control_pressed) { outputTypes.clear(); inFasta.close(); delete groupMap; return 0; } Sequence seq(inFasta); m->gobble(inFasta); if (seq.getName() != "") { seqs.push_back(seq); } } inFasta.close(); } ListVector* lastlist = NULL; string lastLabel = ""; //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; ifstream in; m->openInputFile(listfile, in); //as long as you are not at the end of the file or done with the lines you want while((!in.eof()) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { if (lastlist != NULL) { delete lastlist; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); delete groupMap; return 0; } list = new ListVector(in); if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); process(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); m->mothurOut(lastlist->getLabel()); process(lastlist); processedLabels.insert(lastlist->getLabel()); userLabels.erase(lastlist->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); if (lastlist != NULL) { delete lastlist; } lastlist = list; } in.close(); //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = 
userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { m->mothurOut(lastlist->getLabel()); process(lastlist); processedLabels.insert(lastlist->getLabel()); userLabels.erase(lastlist->getLabel()); } //reset groups parameter m->clearGroups(); if (lastlist != NULL) { delete lastlist; } if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete groupMap; return 0; } } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } if (output == "accnos") { itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "execute"); exit(1); } } /***********************************************************/ int GetSharedOTUCommand::process(ListVector* shared) { try { map fastaMap; ofstream outNames; string outputFileNames; if (outputDir == "") { outputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[distance]"] = shared->getLabel(); variables["[group]"] = userGroups; if (output != "accnos") { outputFileNames = 
getOutputFileName("sharedseqs", variables); } else { outputFileNames = getOutputFileName("accnos", variables); } m->openOutputFile(outputFileNames, outNames); bool wroteSomething = false; int num = 0; //go through each bin, find out if shared vector binLabels = shared->getLabels(); for (int i = 0; i < shared->getNumBins(); i++) { if (m->control_pressed) { outNames.close(); m->mothurRemove(outputFileNames); return 0; } bool uniqueOTU = true; map atLeastOne; for (int f = 0; f < Groups.size(); f++) { atLeastOne[Groups[f]] = 0; } vector namesOfSeqsInThisBin; string names = shared->get(i); vector binNames; m->splitAtComma(names, binNames); for(int j = 0; j < binNames.size(); j++) { string name = binNames[j]; //find group string seqGroup = groupMap->getGroup(name); if (output != "accnos") { namesOfSeqsInThisBin.push_back((name + "|" + seqGroup + "|" + binLabels[i])); }else { namesOfSeqsInThisBin.push_back(name); } if (seqGroup == "not found") { m->mothurOut(name + " is not in your groupfile. Please correct."); m->mothurOutEndLine(); exit(1); } //is this seq in one of the groups we care about it = groupFinder.find(seqGroup); if (it == groupFinder.end()) { uniqueOTU = false; } //you have a sequence from a group you don't want else { atLeastOne[seqGroup]++; } } //make sure you have at least one seq from each group you want bool sharedByAll = true; map::iterator it2; for (it2 = atLeastOne.begin(); it2 != atLeastOne.end(); it2++) { if (it2->second == 0) { sharedByAll = false; } } //if the user wants unique bins and this bin is unique then print //or if the user wants shared bins and this bin is shared then print if ((unique && uniqueOTU && sharedByAll) || (!unique && sharedByAll)) { wroteSomething = true; num++; //output list of names for (int j = 0; j < namesOfSeqsInThisBin.size(); j++) { outNames << namesOfSeqsInThisBin[j] << endl; if (fastafile != "") { if (output != "accnos") { string seqName = namesOfSeqsInThisBin[j].substr(0,namesOfSeqsInThisBin[j].find_last_of('|')); 
seqName = seqName.substr(0,seqName.find_last_of('|')); fastaMap[seqName] = namesOfSeqsInThisBin[j]; //fastaMap needs to contain just the seq name for output later }else { fastaMap[namesOfSeqsInThisBin[j]] = namesOfSeqsInThisBin[j]; } } } } } outNames.close(); if (!wroteSomething) { m->mothurRemove(outputFileNames); string outputString = "\t" + toString(num) + " - No otus shared by groups"; string groupString = ""; for (int h = 0; h < Groups.size(); h++) { groupString += " " + Groups[h]; } outputString += groupString + "."; m->mothurOut(outputString); m->mothurOutEndLine(); }else { m->mothurOut("\t" + toString(num)); m->mothurOutEndLine(); outputNames.push_back(outputFileNames); if (output != "accnos") { outputTypes["sharedseqs"].push_back(outputFileNames); } else { outputTypes["accnos"].push_back(outputFileNames); } } //if fasta file provided output new fasta file if ((fastafile != "") && wroteSomething) { if (outputDir == "") { outputDir += m->hasPath(fastafile); } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); string outputFileFasta = getOutputFileName("fasta", variables); ofstream outFasta; m->openOutputFile(outputFileFasta, outFasta); outputNames.push_back(outputFileFasta); outputTypes["fasta"].push_back(outputFileFasta); for (int k = 0; k < seqs.size(); k++) { if (m->control_pressed) { outFasta.close(); return 0; } //if this is a sequence we want, output it it = fastaMap.find(seqs[k].getName()); if (it != fastaMap.end()) { if (output != "accnos") { outFasta << ">" << it->second << endl; }else { outFasta << ">" << it->first << endl; } outFasta << seqs[k].getAligned() << endl; } } outFasta.close(); } return 0; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "process"); exit(1); } } /***********************************************************/ int GetSharedOTUCommand::runShared() { try { InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = 
lookup[0]->getLabel(); if (Groups.size() == 0) { Groups = m->getGroups(); //make string for outputfile name userGroups = "unique."; for(int i = 0; i < Groups.size(); i++) { userGroups += Groups[i] + "-"; } userGroups = userGroups.substr(0, userGroups.length()-1); if (Groups.size() > 4) { userGroups = "unique.selected_groups"; } //if too many groups then the filename becomes too big. }else { //sanity check for group names SharedUtil util; vector allGroups = m->getAllGroups(); util.setGroups(Groups, allGroups); } //put groups in a map to make them easier to find for(int i = 0; i < Groups.size(); i++) { groupFinder[Groups[i]] = Groups[i]; } //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } 
lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); process(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //reset groups parameter m->clearGroups(); return 0; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", "runShared"); exit(1); } } /***********************************************************/ int GetSharedOTUCommand::process(vector& lookup) { try { string outputFileNames; if (outputDir == "") { outputDir += m->hasPath(sharedfile); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = lookup[0]->getLabel(); variables["[group]"] = userGroups; if (output != "accnos") { outputFileNames = getOutputFileName("sharedseqs", variables); } else { outputFileNames = getOutputFileName("accnos", variables); } ofstream outNames; m->openOutputFile(outputFileNames, outNames); bool wroteSomething = false; int num = 0; //go through each bin, find out if shared for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { outNames.close(); m->mothurRemove(outputFileNames); return 0; 
} bool uniqueOTU = true; map atLeastOne; for (int f = 0; f < Groups.size(); f++) { atLeastOne[Groups[f]] = 0; } set namesOfGroupsInThisBin; for(int j = 0; j < lookup.size(); j++) { string seqGroup = lookup[j]->getGroup(); string name = m->currentSharedBinLabels[i]; if (lookup[j]->getAbundance(i) != 0) { if (output != "accnos") { namesOfGroupsInThisBin.insert(name + "|" + seqGroup + "|" + toString(lookup[j]->getAbundance(i))); }else { namesOfGroupsInThisBin.insert(name); } //is this seq in one of the groups we care about it = groupFinder.find(seqGroup); if (it == groupFinder.end()) { uniqueOTU = false; } //you have sequences from a group you don't want else { atLeastOne[seqGroup]++; } } } //make sure you have at least one seq from each group you want bool sharedByAll = true; map::iterator it2; for (it2 = atLeastOne.begin(); it2 != atLeastOne.end(); it2++) { if (it2->second == 0) { sharedByAll = false; } } //if the user wants unique bins and this bin is unique then print //or if the user wants shared bins and this bin is shared then print if ((unique && uniqueOTU && sharedByAll) || (!unique && sharedByAll)) { wroteSomething = true; num++; //output list of names for (set::iterator itNames = namesOfGroupsInThisBin.begin(); itNames != namesOfGroupsInThisBin.end(); itNames++) { outNames << (*itNames) << endl; } } } outNames.close(); if (!wroteSomething) { m->mothurRemove(outputFileNames); string outputString = "\t" + toString(num) + " - No otus shared by groups"; string groupString = ""; for (int h = 0; h < Groups.size(); h++) { groupString += " " + Groups[h]; } outputString += groupString + "."; m->mothurOut(outputString); m->mothurOutEndLine(); }else { m->mothurOut("\t" + toString(num)); m->mothurOutEndLine(); outputNames.push_back(outputFileNames); if (output != "accnos") { outputTypes["sharedseqs"].push_back(outputFileNames); } else { outputTypes["accnos"].push_back(outputFileNames); } } return 0; } catch(exception& e) { m->errorOut(e, "GetSharedOTUCommand", 
"process"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/getsharedotucommand.h000066400000000000000000000033631255543666200224520ustar00rootroot00000000000000#ifndef GETSHAREDOTUCOMMAND_H #define GETSHAREDOTUCOMMAND_H /* * getsharedotucommand.h * Mothur * * Created by westcott on 9/22/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "listvector.hpp" #include "sequence.hpp" #include "groupmap.h" #include "sharedrabundvector.h" #include "inputdata.h" //********************************************************************************************************************** class GetSharedOTUCommand : public Command { public: GetSharedOTUCommand(string); GetSharedOTUCommand(); ~GetSharedOTUCommand() {} vector setParameters(); string getCommandName() { return "get.sharedseqs"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getRequiredCommand() { return "none"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.sharedseqs"; } string getDescription() { return "identifies sequences that are either unique or shared by specific groups"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: ListVector* list; GroupMap* groupMap; set labels; string fastafile, label, groups, listfile, groupfile, sharedfile, output, userGroups, outputDir, format; bool abort, allLines, unique; vector Groups; map groupFinder; map::iterator it; vector seqs; vector outputNames; int process(ListVector*); int process(vector&); int runShared(); }; //********************************************************************************************************************** #endif mothur-1.36.1/source/commands/hclustercommand.cpp000066400000000000000000000446351255543666200221470ustar00rootroot00000000000000/* * hclustercommand.cpp * Mothur * 
* Created by westcott on 10/13/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "hclustercommand.h" //********************************************************************************************************************** vector HClusterCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "none","list-rabund-sabund",false,false,true); parameters.push_back(pphylip); CommandParameter pname("name", "InputTypes", "", "", "none", "none", "ColumnName","",false,false,true); parameters.push_back(pname); CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "ColumnName","list-rabund-sabund",false,false,true); parameters.push_back(pcolumn); CommandParameter pcutoff("cutoff", "Number", "", "10", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter pmethod("method", "Multiple", "furthest-nearest-average-weighted", "average", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter phard("hard", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(phard); CommandParameter psorted("sorted", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psorted); CommandParameter pshowabund("showabund", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pshowabund); CommandParameter ptiming("timing", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ptiming); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < 
parameters.size(); i++) {	myArray.push_back(parameters[i].name);	}
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string HClusterCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The hcluster command parameter options are cutoff, precision, method, phylip, column, name, showabund, timing and sorted. Phylip or column and name are required, unless you have valid current files.\n";
		helpString += "The phylip and column parameters allow you to enter your distance file, and sorted indicates whether your column distance file is already sorted. \n";
		helpString += "The name parameter allows you to enter your name file and is required if your distance file is in column format. \n";
		helpString += "The hcluster command should be in the following format: \n";
		helpString += "hcluster(column=yourDistanceFile, name=yourNameFile, method=yourMethod, cutoff=yourCutoff, precision=yourPrecision) \n";
		helpString += "The acceptable hcluster methods are furthest, nearest, weighted and average.\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string HClusterCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "list") { pattern = "[filename],[clustertag],list"; }
		else if (type == "rabund") { pattern = "[filename],[clustertag],rabund"; }
		else if (type == "sabund") { pattern = "[filename],[clustertag],sabund"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "getOutputPattern");
		exit(1);
	}
}
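// Illustrative note, not part of the original source: the patterns returned by
// getOutputPattern() above are comma-separated pieces in which bracketed names such as
// [filename] and [clustertag] stand for values supplied by the caller. The sketch below
// shows one plausible way such a pattern could be expanded into a file name. The helper
// names splitOn and expandPatternSketch are hypothetical, and the joining rule (pieces
// joined with '.', no extra '.' after a value that already ends in one) is an assumption;
// it is not a copy of mothur's getOutputFileName().

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Split a string on a single-character delimiter.
static std::vector<std::string> splitOn(const std::string& text, char delim) {
    std::vector<std::string> pieces;
    std::string piece;
    std::istringstream in(text);
    while (std::getline(in, piece, delim)) { pieces.push_back(piece); }
    return pieces;
}

// Hypothetical expansion: replace each piece found in `variables` with its value,
// keep every other piece literally, and join the results with '.'.
// Assumption: a value such as the [filename] root may already end with '.'.
static std::string expandPatternSketch(const std::string& pattern,
                                       const std::map<std::string, std::string>& variables) {
    std::string result;
    const std::vector<std::string> pieces = splitOn(pattern, ',');
    for (std::size_t i = 0; i < pieces.size(); i++) {
        std::map<std::string, std::string>::const_iterator it = variables.find(pieces[i]);
        const std::string value = (it != variables.end()) ? it->second : pieces[i];
        // add a separator only when the text so far does not already end with one
        if (!result.empty() && result[result.size() - 1] != '.') { result += "."; }
        result += value;
    }
    return result;
}
```

// Under these assumptions, a root name of "final." and the tag "an" (the tag this command
// uses for method=average) turn "[filename],[clustertag],list" into "final.an.list".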
//**********************************************************************************************************************
HClusterCommand::HClusterCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["list"] = tempOutNames;
		outputTypes["rabund"] = tempOutNames;
		outputTypes["sabund"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "HClusterCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
//This function checks to make sure the cluster command has no errors and then clusters based on the method chosen.
HClusterCommand::HClusterCommand(string option) {
	try{
		abort = false; calledHelp = false;

		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option == "citation") { citation(); abort = true; calledHelp = true;}

		else {
			vector<string> myArray = setParameters();

			OptionParser parser(option);
			map<string,string> parameters = parser.getParameters();

			ValidParameters validParameter;
			map<string,string>::iterator it;

			//check to make sure all parameters are valid for command
			for (it = parameters.begin(); it != parameters.end(); it++) {
				if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
			}

			//initialize outputTypes
			vector<string> tempOutNames;
			outputTypes["list"] = tempOutNames;
			outputTypes["rabund"] = tempOutNames;
			outputTypes["sabund"] = tempOutNames;

			//if the user changes the input directory command factory will send this info to us in the output parameter
			string inputDir = validParameter.validFile(parameters, "inputdir", false);
			if (inputDir == "not found"){ inputDir = ""; }
			else {
				string path;
				it = parameters.find("phylip");
				//user has given a template file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path then, add inputdir. else leave path alone.
if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { distfile = phylipfile; format = "phylip"; m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { distfile = columnfile; format = "column"; m->setColumnFile(columnfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } if ((phylipfile == "") && (columnfile == "")) { //is there are current file available for either of these? 
//give priority to column, then phylip columnfile = m->getColumnFile(); if (columnfile != "") { m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a phylip or column file before you can use the hcluster command."); m->mothurOutEndLine(); abort = true; } } } else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When executing a hcluster command you must enter ONLY ONE of the following: phylip or column."); m->mothurOutEndLine(); abort = true; } if (columnfile != "") { if (namefile == "") { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile if you are going to use the column format."); m->mothurOutEndLine(); abort = true; } } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
			//get user cutoff and precision or use defaults
			string temp;
			temp = validParameter.validFile(parameters, "precision", false);
			if (temp == "not found") { temp = "100"; }
			//saves precision length for formatting below
			length = temp.length();
			m->mothurConvert(temp, precision);

			temp = validParameter.validFile(parameters, "hard", false);
			if (temp == "not found") { temp = "T"; }
			hard = m->isTrue(temp);

			temp = validParameter.validFile(parameters, "cutoff", false);
			if (temp == "not found") { temp = "10"; }
			m->mothurConvert(temp, cutoff);
			//pad the cutoff by half of one precision unit (0.005 when precision=100)
			cutoff += (5 / (precision * 10.0));

			method = validParameter.validFile(parameters, "method", false);
			if (method == "not found") { method = "average"; }

			if ((method == "furthest") || (method == "nearest") || (method == "average") || (method == "weighted")) { }
			else { m->mothurOut("Not a valid clustering method. Valid clustering algorithms are furthest, nearest, average or weighted."); m->mothurOutEndLine(); abort = true; }

			showabund = validParameter.validFile(parameters, "showabund", false);
			if (showabund == "not found") { showabund = "T"; }

			sort = validParameter.validFile(parameters, "sorted", false);
			if (sort == "not found") { sort = "F"; }
			sorted = m->isTrue(sort);

			timing = validParameter.validFile(parameters, "timing", false);
			if (timing == "not found") { timing = "F"; }

			if (abort == false) {

				if (outputDir == "") { outputDir += m->hasPath(distfile); }
				fileroot = outputDir + m->getRootName(m->getSimpleName(distfile));

				if (method == "furthest") { tag = "fn"; }
				else if (method == "nearest") { tag = "nn"; }
				else if (method == "weighted") { tag = "wn"; }
				else { tag = "an"; }

				map<string, string> variables;
				variables["[filename]"] = fileroot;
				variables["[clustertag]"] = tag;
				string sabundFileName = getOutputFileName("sabund",variables);
				string rabundFileName = getOutputFileName("rabund",variables);
				string listFileName = getOutputFileName("list", variables);

				m->openOutputFile(sabundFileName, sabundFile);
				m->openOutputFile(rabundFileName, rabundFile);
				m->openOutputFile(listFileName, listFile);

				outputNames.push_back(sabundFileName); outputTypes["sabund"].push_back(sabundFileName);
				outputNames.push_back(rabundFileName); outputTypes["rabund"].push_back(rabundFileName);
				outputNames.push_back(listFileName); outputTypes["list"].push_back(listFileName);
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "HClusterCommand");
		exit(1);
	}
}

//**********************************************************************************************************************

int HClusterCommand::execute(){
	try {

		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		NameAssignment* nameMap = NULL;
		if(namefile != ""){
			nameMap = new NameAssignment(namefile);
			nameMap->readMap();
		}

		time_t estart = time(NULL);

		if (!sorted) {
			read = new ReadCluster(distfile, cutoff, outputDir, true);
			read->setFormat(format);
			read->read(nameMap);

			if (m->control_pressed) {
				delete read;
				sabundFile.close();
				rabundFile.close();
				listFile.close();
				for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
				outputTypes.clear();
				return 0;
			}

			distfile = read->getOutputFile();

			list = read->getListVector();
			delete read;
		}else {
			list = new ListVector(nameMap->getListVector());
		}

		if (m->control_pressed) {
			sabundFile.close();
			rabundFile.close();
			listFile.close();
			for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
			outputTypes.clear();
			return 0;
		}

		m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to sort. "); m->mothurOutEndLine();
		estart = time(NULL);

		//list vector made by read contains all sequence names
		if(list != NULL){
			rabund = new RAbundVector(list->getRAbundVector());
		}else{
			m->mothurOut("Error: no list vector!"); m->mothurOutEndLine(); return 0;
		}

		list->printHeaders(listFile);

		float previousDist = 0.00000;
		float rndPreviousDist = 0.00000;
		oldRAbund = *rabund;
		oldList = *list;

		print_start = true;
		start = time(NULL);

		cluster = new HCluster(rabund, list, method, distfile, nameMap, cutoff);
		vector<seqDist> seqs; seqs.resize(1); // to start loop

		if (m->control_pressed) {
			delete cluster;
			sabundFile.close();
			rabundFile.close();
			listFile.close();
			for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
			outputTypes.clear();
			return 0;
		}

		float saveCutoff = cutoff;

		while (seqs.size() != 0){

			seqs = cluster->getSeqs();

			//to account for cutoff change in average neighbor
			if (seqs.size() != 0) {
				if (seqs[0].dist > cutoff) { break; }
			}

			if (m->control_pressed) {
				delete cluster;
				sabundFile.close();
				rabundFile.close();
				listFile.close();
				for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
				outputTypes.clear();
				return 0;
			}

			for (int i = 0; i < seqs.size(); i++) { //-1 means skip me

				if (seqs[i].seq1 != seqs[i].seq2) {
					cutoff = cluster->update(seqs[i].seq1, seqs[i].seq2, seqs[i].dist);

					if (m->control_pressed) {
						delete cluster;
						sabundFile.close();
						rabundFile.close();
						listFile.close();
						for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
						outputTypes.clear();
						return 0;
					}

					float rndDist;
					if (hard) {
						rndDist = m->ceilDist(seqs[i].dist, precision);
					}else{
						rndDist = m->roundDist(seqs[i].dist, precision);
					}

					if((previousDist <= 0.0000) && (seqs[i].dist != previousDist)){
						printData("unique");
					}
					else if((rndDist != rndPreviousDist)){
						printData(toString(rndPreviousDist, length-1));
					}

					previousDist = seqs[i].dist;
					rndPreviousDist = rndDist;
					oldRAbund = *rabund;
					oldList = *list;
				}
			}
		}

		if (m->control_pressed) {
			delete cluster;
			sabundFile.close();
			rabundFile.close();
			listFile.close();
			for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
			outputTypes.clear();
			return 0;
		}

		if(previousDist <= 0.0000){
			printData("unique");
		}
		else if(rndPreviousDist<cutoff){
			printData(toString(rndPreviousDist, length-1));
		}

		sabundFile.close();
		rabundFile.close();
		listFile.close();
		delete cluster;

		if (m->control_pressed) {
			for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
			outputTypes.clear();
			return 0;
		}

		if (saveCutoff != cutoff) {
			if (hard) { saveCutoff = m->ceilDist(saveCutoff, precision); }
			else { saveCutoff = m->roundDist(saveCutoff, precision); }

			m->mothurOut("changed cutoff to " + toString(cutoff)); m->mothurOutEndLine();
		}

		//set list file as new current listfile
		string current = "";
		itTypes = outputTypes.find("list");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); }
		}

		//set rabund file as new current rabundfile
		itTypes = outputTypes.find("rabund");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); }
		}

		//set sabund file as new current sabundfile
		itTypes = outputTypes.find("sabund");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); }
		}

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		m->mothurOut("It took " + toString(time(NULL) - estart) + " seconds to cluster. "); m->mothurOutEndLine();

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "execute");
		exit(1);
	}
}

//**********************************************************************************************************************

void HClusterCommand::printData(string label){
	try {
		if (m->isTrue(timing)) {
			m->mothurOut("\tTime: " + toString(time(NULL) - start) + "\tsecs for " + toString(oldRAbund.getNumBins()) + "\tclusters. Updates: " + toString(loops)); m->mothurOutEndLine();
		}
		print_start = true;
		loops = 0;
		start = time(NULL);

		oldRAbund.setLabel(label);
		if (m->isTrue(showabund)) {
			oldRAbund.getSAbundVector().print(cout);
		}
		oldRAbund.print(rabundFile);
		oldRAbund.getSAbundVector().print(sabundFile);

		oldList.setLabel(label);
		oldList.print(listFile);
	}
	catch(exception& e) {
		m->errorOut(e, "HClusterCommand", "printData");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/hclustercommand.h000066400000000000000000000046211255543666200216030ustar00rootroot00000000000000
#ifndef HCLUSTERCOMMAND_H
#define HCLUSTERCOMMAND_H

/*
 *  hclustercommand.h
 *  Mothur
 *
 *  Created by westcott on 10/13/09.
 *  Copyright 2009 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "hcluster.h"
#include "rabundvector.hpp"
#include "sabundvector.hpp"
#include "listvector.hpp"
#include "readcluster.h"

/******************************************************************/
//This command is an implementation of the HCluster algorithm described in
//ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences by
//Yijun Sun1,2,*, Yunpeng Cai2, Li Liu1, Fahong Yu1, Michael L. Farrell3, William McKendree3
//and William Farmerie1 1
//Interdisciplinary Center for Biotechnology Research, 2Department of Electrical and Computer Engineering,
//University of Florida, Gainesville, FL 32610-3622 and 3Materials Technology Directorate, Air Force Technical
//Applications Center, 1030 S. Highway A1A, Patrick AFB, FL 32925-3002, USA
//Received January 28, 2009; Revised April 14, 2009; Accepted April 15, 2009
/************************************************************/
class HClusterCommand : public Command {

public:
	HClusterCommand(string);
	HClusterCommand();
	~HClusterCommand(){}

	vector<string> setParameters();
	string getCommandName() { return "hcluster"; }
	string getCommandCategory() { return "Clustering"; }

	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Sun Y, Cai Y, Liu L, Yu F, Farrell ML, Mckendree W, Farmerie W (2009). ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences. Nucleic Acids Res 37: e76. \nhttp://www.mothur.org/wiki/Hcluster"; }
	string getDescription() { return "cluster your sequences into OTUs using a distance matrix"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:
	HCluster* cluster;
	ListVector* list;
	RAbundVector* rabund;
	RAbundVector oldRAbund;
	ListVector oldList;
	ReadCluster* read;

	bool abort, sorted, print_start, hard;
	string method, fileroot, tag, distfile, format, phylipfile, columnfile, namefile, sort, showabund, timing, outputDir;
	double cutoff;
	int precision, length;
	ofstream sabundFile, rabundFile, listFile;
	time_t start;
	unsigned long loops;
	vector<string> outputNames;

	void printData(string label);
};

/************************************************************/

#endif
mothur-1.36.1/source/commands/heatmapcommand.cpp000066400000000000000000000575451255543666200217410ustar00rootroot00000000000000
/*
 *  heatmapcommand.cpp
 *  Mothur
 *
 *  Created by Sarah Westcott on 3/25/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "heatmapcommand.h"

//**********************************************************************************************************************
vector<string> HeatMapCommand::setParameters(){
	try {
		CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false,true); parameters.push_back(plist);
		CommandParameter prabund("rabund", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false); parameters.push_back(prabund);
		CommandParameter psabund("sabund", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false); parameters.push_back(psabund);
		CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false,true); parameters.push_back(pshared);
		CommandParameter prelabund("relabund", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false); parameters.push_back(prelabund);
		CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups);
		CommandParameter pscale("scale", "Multiple", "log10-log2-linear", "log10", "", "", "","",false,false); parameters.push_back(pscale);
		CommandParameter psorted("sorted", "Multiple", "none-shared-topotu-topgroup", "shared", "", "", "","",false,false); parameters.push_back(psorted);
		CommandParameter pnumotu("numotu", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pnumotu);
		CommandParameter pfontsize("fontsize", "Number", "", "24", "", "", "","",false,false); parameters.push_back(pfontsize);
		CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size();
i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "HeatMapCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string HeatMapCommand::getHelpString(){ try { string helpString = ""; helpString += "The heatmap.bin command parameters are shared, relabund, list, rabund, sabund, groups, sorted, scale, numotu, fontsize and label. shared, relabund, list, rabund or sabund is required unless you have a valid current file.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included in your heatmap.\n"; helpString += "The sorted parameter allows you to order the otus displayed, default=shared, meaning display the shared otus first. Other options for sorted are none, meaning the exact representation of your otus, \n"; helpString += "topotu, meaning the otus with the greatest abundance when totaled across groups, topgroup, meaning the top otus for each group. \n"; helpString += "The scale parameter allows you to choose the range of color your bin information will be displayed with.\n"; helpString += "The numotu parameter allows you to display only the top N otus, by default all the otus are displayed. You could choose to look at the top 10, by setting numotu=10. The default for sorted is topotu when numotu is used.\n"; helpString += "The group names are separated by dashes. 
The label parameter allows you to select what distance levels you would like a heatmap created for, and are also separated by dashes.\n";
		helpString += "The fontsize parameter allows you to adjust the font size of the picture created, default=24.\n";
		helpString += "The heatmap.bin command should be in the following format: heatmap.bin(groups=yourGroups, sorted=yourSorted, label=yourLabels).\n";
		helpString += "Example heatmap.bin(groups=A-B-C, sorted=none, scale=log10).\n";
		helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n";
		helpString += "The default value for scale is log10; your other options are log2 and linear.\n";
		helpString += "The heatmap.bin command outputs a .svg file for each label you specify.\n";
		helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e. yourGroups).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "HeatMapCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string HeatMapCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "svg") { pattern = "[filename],svg"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "HeatMapCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
HeatMapCommand::HeatMapCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["svg"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "HeatMapCommand", "HeatMapCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
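// Illustrative note, not part of the original source: the help text above says that group
// names and labels are passed as dash-separated lists, for example groups=A-B-C. The sketch
// below shows one simple way such a value could be split into a set of names. The function
// name splitAtDashSketch is hypothetical, and dropping empty pieces is an assumption; this
// is not mothur's own splitAtDash() and may not match its handling of special cases.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical helper: split a dash-separated value into unique, non-empty names.
static std::set<std::string> splitAtDashSketch(const std::string& value) {
    std::set<std::string> names;
    std::string piece;
    std::istringstream in(value);
    while (std::getline(in, piece, '-')) {
        if (!piece.empty()) { names.insert(piece); }  // assumption: ignore empty pieces such as in "A--B"
    }
    return names;
}
```

// With this sketch, "A-B-C" yields the three names A, B and C, and a single label such as
// "0.03" yields a set with one entry.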
HeatMapCommand::HeatMapCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["svg"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { format = "sabund"; inputfile = sabundfile; m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { format = "rabund"; inputfile = rabundfile; m->setRabundFile(rabundfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "sharedfile"; inputfile = sharedfile; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { format = "relabund"; inputfile = relabundfile; m->setRelAbundFile(relabundfile); } if ((sharedfile == "") && (listfile == "") && (rabundfile == "") && (sabundfile == "") && (relabundfile == "")) { //is there are current file available for any of these? 
//give priority to shared, then list, then rabund, then sabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { listfile = m->getListFile(); if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { rabundfile = m->getRabundFile(); if (rabundfile != "") { inputfile = rabundfile; format = "rabund"; m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); } else { sabundfile = m->getSabundFile(); if (sabundfile != "") { inputfile = sabundfile; format = "sabund"; m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputfile = relabundfile; format = "relabund"; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a list, sabund, rabund, relabund or shared file."); m->mothurOutEndLine(); abort = true; } } } } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } string temp = validParameter.validFile(parameters, "numotu", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, numOTU); temp = validParameter.validFile(parameters, "fontsize", false); if (temp == "not found") { temp = "24"; } m->mothurConvert(temp, fontSize); sorted = validParameter.validFile(parameters, "sorted", false); if (sorted == "not found") { //if numOTU is used change default if (numOTU != 0) { sorted = "topotu"; } else { sorted = "shared"; } } scale = validParameter.validFile(parameters, "scale", false); if (scale == "not found") { scale = "log10"; } if ((sorted != "none") && (sorted != "shared") && (sorted != "topotu") && (sorted != "topgroup")) { m->mothurOut(sorted + " is not a valid sorting option. 
Sorted options are: none, shared, topotu, topgroup"); m->mothurOutEndLine(); abort=true; } } } catch(exception& e) { m->errorOut(e, "HeatMapCommand", "HeatMapCommand"); exit(1); } } //********************************************************************************************************************** int HeatMapCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } heatmap = new HeatMap(sorted, scale, numOTU, fontSize, outputDir, inputfile); string lastLabel; input = new InputData(inputfile, format); if (format == "sharedfile") { //you have groups lookup = input->getSharedRAbundVectors(); lastLabel = lookup[0]->getLabel(); }else if ((format == "list") || (format == "rabund") || (format == "sabund")) { //you are using just a list file and have only one group rabund = input->getRAbundVector(); lastLabel = rabund->getLabel(); }else if (format == "relabund") { //you have groups lookupFloat = input->getSharedRAbundFloatVectors(); lastLabel = lookupFloat[0]->getLabel(); } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (format == "sharedfile") { //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); m->clearGroups(); delete input; delete heatmap; return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(lookup); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(lookup); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } //get next line to process lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); m->clearGroups(); delete input; delete heatmap; return 0; } //output error messages about any remaining user labels 
set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(lookup); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //reset groups parameter m->clearGroups(); }else if ((format == "list") || (format == "rabund") || (format == "sabund")) { while((rabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); delete rabund; delete input; delete heatmap; return 0; } if(allLines == 1 || labels.count(rabund->getLabel()) == 1){ m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(rabund); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); } if ((m->anyLabelsToProcess(rabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = rabund->getLabel(); delete rabund; rabund = input->getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(rabund); outputNames.push_back(outputFileName); 
outputTypes["svg"].push_back(outputFileName); processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); //restore real lastlabel to save below rabund->setLabel(saveLabel); } lastLabel = rabund->getLabel(); delete rabund; rabund = input->getRAbundVector(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); delete input; delete heatmap; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (rabund != NULL) { delete rabund; } rabund = input->getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(rabund); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); delete rabund; } }else { //as long as you are not at the end of the file or done wih the lines you want while((lookupFloat[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); m->clearGroups(); delete input; delete heatmap; return 0; } if(allLines == 1 || labels.count(lookupFloat[0]->getLabel()) == 1){ m->mothurOut(lookupFloat[0]->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(lookupFloat); outputNames.push_back(outputFileName); 
outputTypes["svg"].push_back(outputFileName); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); } if ((m->anyLabelsToProcess(lookupFloat[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupFloat[0]->getLabel(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookupFloat[0]->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(lookupFloat); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); //restore real lastlabel to save below lookupFloat[0]->setLabel(saveLabel); } lastLabel = lookupFloat[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; lookupFloat[i] = NULL; } //get next line to process lookupFloat = input->getSharedRAbundFloatVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); m->clearGroups(); delete input; delete heatmap; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookupFloat[0]->getLabel()); m->mothurOutEndLine(); string outputFileName = heatmap->getPic(lookupFloat); outputNames.push_back(outputFileName); outputTypes["svg"].push_back(outputFileName); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } } //reset groups parameter m->clearGroups(); } delete input; delete heatmap; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { if (outputNames[i] != "control") { m->mothurRemove(outputNames[i]); } } outputTypes.clear(); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "HeatMapCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/heatmapcommand.h000066400000000000000000000025231255543666200213700ustar00rootroot00000000000000#ifndef HEATMAPCOMMAND_H #define HEATMAPCOMMAND_H /* * heatmapcommand.h * Mothur * * Created by Sarah Westcott on 3/25/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "command.hpp" #include "inputdata.h" #include "sharedlistvector.h" #include "heatmap.h" #include "rabundvector.hpp" class HeatMapCommand : public Command { public: HeatMapCommand(string); HeatMapCommand(); ~HeatMapCommand(){} vector<string> setParameters(); string getCommandName() { return "heatmap.bin"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Heatmap.bin"; } string getDescription() { return "generate a heatmap where the color represents the relative abundance of an OTU"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: InputData* input; RAbundVector* rabund; vector<SharedRAbundVector*> lookup; vector<SharedRAbundFloatVector*> lookupFloat; HeatMap* heatmap; bool abort, allLines; set<string> labels; //holds labels to be used string format, groups, sorted, scale, label, outputDir, sharedfile, relabundfile, listfile, rabundfile, sabundfile, inputfile; vector<string> Groups, outputNames; int numOTU, fontSize; }; #endif mothur-1.36.1/source/commands/heatmapsimcommand.cpp000066400000000000000000000600001255543666200224260ustar00rootroot00000000000000/* * heatmapsimcommand.cpp * Mothur * * Created by Sarah Westcott on 6/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "heatmapsimcommand.h" #include "sharedjabund.h" #include "sharedsorabund.h" #include "sharedjclass.h" #include "sharedsorclass.h" #include "sharedjest.h" #include "sharedsorest.h" #include "sharedthetayc.h" #include "sharedthetan.h" #include "sharedmorisitahorn.h" #include "sharedbraycurtis.h" //********************************************************************************************************************** vector HeatMapSimCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "PhylipColumnShared", "PhylipColumnShared", "none","svg",false,false,true); parameters.push_back(pshared); CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumnShared", "PhylipColumnShared", "none","svg",false,false); parameters.push_back(pphylip); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","",false,false); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","",false,false); parameters.push_back(pcount); CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumnShared", "PhylipColumnShared", "ColumnName","svg",false,false); parameters.push_back(pcolumn); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pcalc("calc", "Multiple", "jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-morisitahorn-braycurtis", "jest-thetayc", "", "", "","",true,false); parameters.push_back(pcalc); CommandParameter pfontsize("fontsize", "Number", "", "24", "", "", "","",false,false); parameters.push_back(pfontsize); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); 
CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string HeatMapSimCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The heatmap.sim command parameters are shared, phylip, column, name, count, groups, calc, fontsize and label. A shared, phylip or column file is required (a column file also needs a name or count file), unless valid current files exist.\n"; helpString += "There are two ways to use the heatmap.sim command. The first is with a shared file, and you may use the groups, label and calc parameters. \n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included in your heatmap.\n"; helpString += "The group names are separated by dashes. 
The label parameter allows you to select what distance levels you would like a heatmap created for, and the labels are also separated by dashes.\n"; helpString += "The fontsize parameter allows you to adjust the font size of the picture created, default=24.\n"; helpString += "The heatmap.sim command should be in the following format: heatmap.sim(groups=yourGroups, calc=yourCalc, label=yourLabels).\n"; helpString += "Example heatmap.sim(groups=A-B-C, calc=jabund).\n"; helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n"; helpString += validCalculator.printCalc("heat"); helpString += "The default value for calc is jest-thetayc.\n"; helpString += "The heatmap.sim command outputs a .svg file for each calculator you choose at each label you specify.\n"; helpString += "The second way to use the heatmap.sim command is with a distance file representing the distance between your groups. \n"; helpString += "Using the command this way, either the phylip or the column parameter is required, and only one may be used. If you use a column file, a name or count file is also required. \n"; helpString += "The heatmap.sim command should be in the following format: heatmap.sim(phylip=yourDistanceFile).\n"; helpString += "Example heatmap.sim(phylip=amazonGroups.dist).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e. yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string HeatMapSimCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "svg") { pattern = "[filename],svg"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** HeatMapSimCommand::HeatMapSimCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["svg"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "HeatMapSimCommand"); exit(1); } } //********************************************************************************************************************** HeatMapSimCommand::HeatMapSimCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); ValidParameters validParameter; map<string,string>::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["svg"] = tempOutNames; format = ""; //if the user changes the input directory command factory will send this info to us in the output parameter string 
inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { format = "phylip"; inputfile = phylipfile; m-> setPhylipFile(phylipfile); if (outputDir == "") { outputDir += m->hasPath(phylipfile); } } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { format = "column"; inputfile = columnfile; m->setColumnFile(columnfile); if (outputDir == "") { outputDir += m->hasPath(columnfile); } } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "shared"; inputfile = sharedfile; m->setSharedFile(sharedfile); if (outputDir == "") { outputDir += m->hasPath(sharedfile); } } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } //error checking on files if ((sharedfile == "") && ((phylipfile == "") && (columnfile == ""))) { sharedfile = m->getSharedFile(); if (sharedfile != "") { format = "shared"; inputfile = sharedfile; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { //is there are current file available for either of these? 
//give priority to column, then phylip columnfile = m->getColumnFile(); if (columnfile != "") { format = "column"; inputfile = columnfile; m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { format = "phylip"; inputfile = phylipfile; m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared or phylip or column file."); m->mothurOutEndLine(); abort = true; } } } } else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When running the heatmap.sim command with a distance file you may not use both the column and the phylip parameters."); m->mothurOutEndLine(); abort = true; } if (columnfile != "") { if (namefile == "") { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a name or count file if you are going to use the column format."); m->mothurOutEndLine(); abort = true; } } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "jest-thetayc"; } else { if (calc == "default") { calc = "jest-thetayc"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } string temp = validParameter.validFile(parameters, "fontsize", false); if (temp == "not found") { temp = "24"; } m->mothurConvert(temp, fontsize); if (abort == false) { ValidCalculators validCalculator; int i; for (i=0; ierrorOut(e, "HeatMapSimCommand", "HeatMapSimCommand"); exit(1); } } //********************************************************************************************************************** int HeatMapSimCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } heatmap = new HeatMapSim(outputDir, inputfile, fontsize); if (format == "shared") { runCommandShared(); }else{ runCommandDist(); } delete heatmap; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "execute"); exit(1); } } 
//********************************************************************************************************************** int HeatMapSimCommand::runCommandShared() { try { //if the users entered no valid calculators don't execute command if (heatCalculators.size() == 0) { m->mothurOut("No valid calculators."); m->mothurOutEndLine(); return 0; } input = new InputData(sharedfile, "sharedfile"); lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. I cannot run the command."); m->mothurOutEndLine(); return 0;} //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; if (m->control_pressed) { delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); return 0; } //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); vector outfilenames = heatmap->getPic(lookup, heatCalculators); for(int i = 0; i < outfilenames.size(); i++) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); vector outfilenames = heatmap->getPic(lookup, 
heatCalculators); for(int i = 0; i < outfilenames.size(); i++) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } //prevent memory leak lastLabel = lookup[0]->getLabel(); //get next line to process for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { delete input; m->clearGroups(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } if (m->control_pressed) { delete input; m->clearGroups(); return 0; } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); vector outfilenames = heatmap->getPic(lookup, heatCalculators); for(int i = 0; i < outfilenames.size(); i++) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } if (m->control_pressed) { delete input; m->clearGroups(); return 0; } //reset groups parameter m->clearGroups(); delete input; return 0; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "runCommandShared"); exit(1); } } //********************************************************************************************************************** int 
HeatMapSimCommand::runCommandDist() { try { vector< vector<double> > matrix; vector<string> names; ifstream in; //read distance file and create distance vector and names vector if (format == "phylip") { //read phylip file m->openInputFile(phylipfile, in); string name; int numSeqs; in >> numSeqs >> name; //save name names.push_back(name); //resize the matrix and fill with zeros matrix.resize(numSeqs); for(int i = 0; i < numSeqs; i++) { matrix[i].resize(numSeqs, 0.0); } //determine if matrix is square or lower triangle //if it is square read the distances for the first sequence char d; bool square; while((d=in.get()) != EOF){ //is d a number meaning its square if(isalnum(d)){ square = true; in.putback(d); for(int i=0;i<numSeqs;i++){ in >> matrix[0][i]; } break; } //is d a line return meaning its lower triangle if(d == '\n'){ square = false; break; } } //read rest of matrix if (square == true) { for(int i=1;i<numSeqs;i++){ in >> name; names.push_back(name); if (m->control_pressed) { return 0; } for(int j=0;j<numSeqs;j++) { in >> matrix[i][j]; } m->gobble(in); } }else { double dist; for(int i=1;i<numSeqs;i++){ in >> name; names.push_back(name); if (m->control_pressed) { return 0; } for(int j=0;j<i;j++){ in >> dist; matrix[i][j] = dist; matrix[j][i] = dist; } m->gobble(in); } } in.close(); }else { //read names file NameAssignment* nameMap; CountTable ct; if (namefile != "") { nameMap = new NameAssignment(namefile); nameMap->readMap(); //put names in order in vector for (int i = 0; i < nameMap->size(); i++) { names.push_back(nameMap->get(i)); } }else if (countfile != "") { nameMap = NULL; ct.readTable(countfile, true, false); names = ct.getNamesOfSeqs(); } //resize matrix matrix.resize(names.size()); for (int i = 0; i < names.size(); i++) { matrix[i].resize(names.size(), 0.0); } //read column file string first, second; double dist; m->openInputFile(columnfile, in); while (!in.eof()) { in >> first >> second >> dist; m->gobble(in); if (m->control_pressed) { return 0; } if (namefile != "") { map<string,int>::iterator itA = nameMap->find(first); map<string,int>::iterator itB = nameMap->find(second); 
if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + first + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + second + "' was not found in the names file, please correct\n"); exit(1); } //save distance matrix[itA->second][itB->second] = dist; matrix[itB->second][itA->second] = dist; }else if (countfile != "") { int itA = ct.get(first); int itB = ct.get(second); matrix[itA][itB] = dist; matrix[itB][itA] = dist; } } in.close(); if (namefile != "") { delete nameMap; } } string outputFileName = heatmap->getPic(matrix, names); outputNames.push_back(outputFileName); //vector<vector<double>>, vector<string> outputTypes["svg"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "HeatMapSimCommand", "runCommandDist"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/heatmapsimcommand.h000066400000000000000000000026501255543666200221020ustar00rootroot00000000000000#ifndef HEATMAPSIMCOMMAND_H #define HEATMAPSIMCOMMAND_H /* * heatmapsimcommand.h * Mothur * * Created by Sarah Westcott on 6/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "command.hpp" #include "inputdata.h" #include "validcalculator.h" #include "heatmapsim.h" #include "nameassignment.hpp" class HeatMapSimCommand : public Command { public: HeatMapSimCommand(string); HeatMapSimCommand(); ~HeatMapSimCommand(){} vector<string> setParameters(); string getCommandName() { return "heatmap.sim"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Heatmap.sim"; } string getDescription() { return "generate a heatmap indicating the pairwise distance between multiple samples using a variety of calculators"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: InputData* input; vector<SharedRAbundVector*> lookup; vector<Calculator*> heatCalculators; HeatMapSim* heatmap; OptionParser* parser; bool abort, allLines; set<string> labels; //holds labels to be used string format, groups, label, calc, sharedfile, phylipfile, columnfile, countfile, namefile, outputDir, inputfile; vector<string> Estimators, Groups, outputNames; int fontsize; int runCommandShared(); int runCommandDist(); }; #endif mothur-1.36.1/source/commands/helpcommand.cpp000066400000000000000000000021741255543666200212360ustar00rootroot00000000000000/* * helpcommand.cpp * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "helpcommand.h" //********************************************************************************************************************** HelpCommand::HelpCommand(string option) { validCommands = CommandFactory::getInstance(); } //********************************************************************************************************************** int HelpCommand::execute(){ try { validCommands->printCommands(cout); m->mothurOut("For more information about a specific command type 'commandName(help)' i.e. 
'read.dist(help)'"); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("For further assistance please refer to the Mothur manual on our wiki at http://www.mothur.org/wiki, or contact Pat Schloss at mothur.bugs@gmail.com.\n"); return 0; } catch(exception& e) { m->errorOut(e, "HelpCommand", "execute"); exit(1); } } //**********************************************************************************************************************/ mothur-1.36.1/source/commands/helpcommand.h000066400000000000000000000020441255543666200206770ustar00rootroot00000000000000#ifndef HELPCOMMAND_H #define HELPCOMMAND_H /* * helpcommand.h * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class is designed to aid the user in running mothur. */ #include "command.hpp" #include "commandfactory.hpp" class HelpCommand : public Command { public: HelpCommand(string); HelpCommand() {} ~HelpCommand(){} vector setParameters() { return outputNames; } //dummy, doesn't really do anything string getCommandName() { return "help"; } string getCommandCategory() { return "Hidden"; } string getHelpString() { return "For more information about a specific command type 'commandName(help)' i.e. 'cluster(help)'"; } string getOutputPattern(string) { return ""; } string getCitation() { return "no citation"; } string getDescription() { return "help"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CommandFactory* validCommands; vector outputNames; }; #endif mothur-1.36.1/source/commands/homovacommand.cpp000066400000000000000000000424341255543666200216020ustar00rootroot00000000000000/* * homovacommand.cpp * mothur * * Created by westcott on 2/8/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 *
 */

#include "homovacommand.h"
#include "groupmap.h"
#include "readphylipvector.h"
#include "sharedutilities.h"
#include "designmap.h"

//**********************************************************************************************************************
vector<string> HomovaCommand::setParameters(){
	try {
		CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","homova",false,true,true); parameters.push_back(pdesign);
		CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","homova",false,true,true); parameters.push_back(pphylip);
		CommandParameter psets("sets", "String", "", "", "", "", "","",false,false); parameters.push_back(psets);
		CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters);
		CommandParameter palpha("alpha", "Number", "", "0.05", "", "", "","",false,false); parameters.push_back(palpha);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) {	myArray.push_back(parameters[i].name);	}
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "HomovaCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string HomovaCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "Referenced: Stewart CN, Excoffier L (1996). Assessing population genetic structure and variability with RAPD data: Application to Vaccinium macrocarpon (American Cranberry). J Evol Biol 9: 153-71.\n";
		helpString += "The homova command outputs a .homova file. 
\n"; helpString += "The homova command parameters are phylip, design, iters, sets and alpha. The phylip and design parameters are required, unless valid current files exist.\n"; helpString += "The design parameter allows you to assign your samples to groups when you are running homova. It is required. \n"; helpString += "The design file looks like the group file. It is a 2 column tab delimited file, where the first column is the sample name and the second column is the group the sample belongs to.\n"; helpString += "The sets parameter allows you to specify which of the sets in your designfile you would like to analyze. The set names are separated by dashes. The default is all sets in the designfile.\n"; helpString += "The iters parameter allows you to set the number of randomizations used to calculate the P value. The default is 1000. \n"; helpString += "The alpha parameter allows you to set the experiment-wise significance threshold. The default is 0.05. \n"; helpString += "The homova command should be in the following format: homova(phylip=file.dist, design=file.design).\n"; helpString += "Note: No spaces between parameter labels (e.g. iters), '=' and parameters (e.g. 
1000).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "HomovaCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string HomovaCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "homova") { pattern = "[filename],homova"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "HomovaCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
HomovaCommand::HomovaCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["homova"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "HomovaCommand", "HomovaCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
HomovaCommand::HomovaCommand(string option) {
	try {
		abort = false; calledHelp = false;

		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option == "citation") { citation(); abort = true; calledHelp = true;}

		else {
			vector<string> myArray = setParameters();

			OptionParser parser(option);
			map<string, string> parameters = parser.getParameters();

			ValidParameters validParameter;

			//check to make sure all parameters are valid for command
			map<string, string>::iterator it;
			for (it = parameters.begin(); it != parameters.end(); it++) {
				if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
			}

			//initialize outputTypes
			vector<string> tempOutNames;
			outputTypes["homova"] = tempOutNames;

			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not
found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } } phylipFileName = validParameter.validFile(parameters, "phylip", true); if (phylipFileName == "not open") { phylipFileName = ""; abort = true; } else if (phylipFileName == "not found") { //if there is a current phylip file, use it phylipFileName = m->getPhylipFile(); if (phylipFileName != "") { m->mothurOut("Using " + phylipFileName + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current phylip file and the phylip parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setPhylipFile(phylipFileName); } //check for required parameters designFileName = validParameter.validFile(parameters, "design", true); if (designFileName == "not open") { abort = true; } else if (designFileName == "not found") { //if there is a current design file, use it designFileName = m->getDesignFile(); if (designFileName != "") { m->mothurOut("Using " + designFileName + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current design file and the design parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setDesignFile(designFileName); } 
string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "alpha", false); if (temp == "not found") { temp = "0.05"; } m->mothurConvert(temp, experimentwiseAlpha); string sets = validParameter.validFile(parameters, "sets", false); if (sets == "not found") { sets = ""; } else { m->splitAtDash(sets, Sets); } } } catch(exception& e) { m->errorOut(e, "HomovaCommand", "HomovaCommand"); exit(1); } } //********************************************************************************************************************** int HomovaCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //read design file designMap = new DesignMap(designFileName); if (outputDir == "") { outputDir = m->hasPath(phylipFileName); } //read in distance matrix and square it ReadPhylipVector readMatrix(phylipFileName); vector sampleNames = readMatrix.read(distanceMatrix); if (Sets.size() != 0) { //user selected sets, so we want to remove the samples not in those sets SharedUtil util; vector dGroups = designMap->getCategory(); util.setGroups(Sets, dGroups); for(int i=0;icontrol_pressed) { delete designMap; return 0; } string group = designMap->get(sampleNames[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + sampleNames[i] + " is not in your design file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else if (!m->inUsersGroups(group, Sets)){ //not in set we want remove it //remove from all other rows for(int j=0;j > origGroupSampleMap; for(int i=0;iget(sampleNames[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + sampleNames[i] + " is not in your design file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else { origGroupSampleMap[group].push_back(i); } } int numGroups = origGroupSampleMap.size(); if (m->control_pressed) { delete designMap; return 0; } //create a new filename 
ofstream HOMOVAFile;
		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipFileName));
		string HOMOVAFileName = getOutputFileName("homova", variables);
		m->openOutputFile(HOMOVAFileName, HOMOVAFile);
		outputNames.push_back(HOMOVAFileName); outputTypes["homova"].push_back(HOMOVAFileName);

		HOMOVAFile << "HOMOVA\tBValue\tP-value\tSSwithin/(Ni-1)_values" << endl;
		m->mothurOut("HOMOVA\tBValue\tP-value\tSSwithin/(Ni-1)_values\n");

		double fullHOMOVAPValue = runHOMOVA(HOMOVAFile, origGroupSampleMap, experimentwiseAlpha);

		if(fullHOMOVAPValue <= experimentwiseAlpha && numGroups > 2){

			int numCombos = numGroups * (numGroups-1) / 2;
			double pairwiseAlpha = experimentwiseAlpha / (double) numCombos;

			map<string, vector<int> >::iterator itA;
			map<string, vector<int> >::iterator itB;

			for(itA=origGroupSampleMap.begin();itA!=origGroupSampleMap.end();itA++){
				itB = itA;itB++;
				for(;itB!=origGroupSampleMap.end();itB++){
					map<string, vector<int> > pairwiseGroupSampleMap;
					pairwiseGroupSampleMap[itA->first] = itA->second;
					pairwiseGroupSampleMap[itB->first] = itB->second;

					runHOMOVA(HOMOVAFile, pairwiseGroupSampleMap, pairwiseAlpha);
				}
			}
			HOMOVAFile << endl;
			m->mothurOutEndLine();

			m->mothurOut("Experiment-wise error rate: " + toString(experimentwiseAlpha) + '\n');
			m->mothurOut("Pair-wise error rate (Bonferroni): " + toString(pairwiseAlpha) + '\n');
		}
		else{
			m->mothurOut("Experiment-wise error rate: " + toString(experimentwiseAlpha) + '\n');
		}

		m->mothurOut("If you have borderline P-values, you should try increasing the number of iterations\n");

		delete designMap;

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "HomovaCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
double HomovaCommand::runHOMOVA(ofstream& 
HOMOVAFile, map > groupSampleMap, double alpha){ try { map >::iterator it; int numGroups = groupSampleMap.size(); vector ssWithinOrigVector; double bValueOrig = calcBValue(groupSampleMap, ssWithinOrigVector); double counter = 0; for(int i=0;i ssWithinRandVector; map > randomizedGroup = getRandomizedGroups(groupSampleMap); double bValueRand = calcBValue(randomizedGroup, ssWithinRandVector); if(bValueRand >= bValueOrig){ counter++; } } double pValue = (double) counter / (double) iters; string pString = ""; if(pValue < 1/(double)iters){ pString = '<' + toString(1/(double)iters); } else { pString = toString(pValue); } //print homova table it = groupSampleMap.begin(); HOMOVAFile << it->first; m->mothurOut(it->first); it++; for(;it!=groupSampleMap.end();it++){ HOMOVAFile << '-' << it->first; m->mothurOut('-' + it->first); } HOMOVAFile << '\t' << bValueOrig << '\t' << pString; m->mothurOut('\t' + toString(bValueOrig) + '\t' + pString); if(pValue < alpha){ HOMOVAFile << "*"; m->mothurOut("*"); } for(int i=0;imothurOut('\t' + toString(ssWithinOrigVector[i])); } HOMOVAFile << endl; m->mothurOutEndLine(); return pValue; } catch(exception& e) { m->errorOut(e, "HomovaCommand", "runHOMOVA"); exit(1); } } //********************************************************************************************************************** double HomovaCommand::calcSigleSSWithin(vector sampleIndices) { try { double ssWithin = 0.0; int numSamplesInGroup = sampleIndices.size(); for(int i=0;ierrorOut(e, "HomovaCommand", "calcSigleSSWithin"); exit(1); } } //********************************************************************************************************************** double HomovaCommand::calcBValue(map > groupSampleMap, vector& ssWithinVector) { try { map >::iterator it; double numGroups = (double)groupSampleMap.size(); ssWithinVector.resize(numGroups, 0); double totalNumSamples = 0; double ssWithinFull; double secondTermSum = 0; double inverseOneMinusSum = 0; int index = 0; 
ssWithinFull = 0.0;	//initialize the accumulator before summing; it is declared above without an initial value

		for(it = groupSampleMap.begin();it!=groupSampleMap.end();it++){
			int numSamplesInGroup = it->second.size();
			totalNumSamples += numSamplesInGroup;

			ssWithinVector[index] = calcSigleSSWithin(it->second);
			ssWithinFull += ssWithinVector[index];

			secondTermSum += (numSamplesInGroup - 1) * log(ssWithinVector[index] / (double)(numSamplesInGroup - 1));
			inverseOneMinusSum += 1.0 / (double)(numSamplesInGroup - 1);

			ssWithinVector[index] /= (double)(numSamplesInGroup - 1); //this line is only for output purposes to scale SSw by the number of samples in the group
			index++;
		}

		double B = (totalNumSamples - numGroups) * log(ssWithinFull/(totalNumSamples-numGroups)) - secondTermSum;
		double denominator = 1 + 1.0/(3.0 * (numGroups - 1.0)) * (inverseOneMinusSum - 1.0 / (double) (totalNumSamples - numGroups));
		B /= denominator;

		return B;
	}
	catch(exception& e) {
		m->errorOut(e, "HomovaCommand", "calcBValue");
		exit(1);
	}
}
//**********************************************************************************************************************
map<string, vector<int> > HomovaCommand::getRandomizedGroups(map<string, vector<int> > origMapping){
	try{
		vector<int> sampleIndices;
		vector<int> samplesPerGroup;

		map<string, vector<int> >::iterator it;
		for(it=origMapping.begin();it!=origMapping.end();it++){
			vector<int> indices = it->second;
			samplesPerGroup.push_back(indices.size());
			sampleIndices.insert(sampleIndices.end(), indices.begin(), indices.end());
		}

		random_shuffle(sampleIndices.begin(), sampleIndices.end());

		int index = 0;
		map<string, vector<int> > randomizedGroups = origMapping;
		for(it=randomizedGroups.begin();it!=randomizedGroups.end();it++){
			for(int i=0;i<it->second.size();i++){
				it->second[i] = sampleIndices[index++];
			}
		}

		return randomizedGroups;
	}
	catch (exception& e) {
		m->errorOut(e, "HomovaCommand", "getRandomizedGroups");
		exit(1);
	}
}
//**********************************************************************************************************************
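/*
 * Illustrative sketch only; it is not part of mothur. The B statistic that
 * calcBValue computes above can be reproduced in isolation roughly as follows.
 * The name bartlettB and its two-vector interface are assumptions made for
 * this example: the caller is assumed to supply the within-group sums of
 * squares directly instead of deriving them from the distance matrix.
 * Every group is assumed to have at least two samples and a positive sum of
 * squares; otherwise the logarithm or the division below is undefined.
 */

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a mothur function).
// ssWithin[i]   : within-group sum of squares for group i (assumed > 0)
// groupSizes[i] : number of samples in group i (assumed >= 2)
double bartlettB(const std::vector<double>& ssWithin, const std::vector<int>& groupSizes) {
    assert(ssWithin.size() == groupSizes.size());

    const double numGroups = static_cast<double>(ssWithin.size());
    double totalNumSamples = 0.0;
    double ssWithinFull = 0.0;        // accumulators are zero-initialized explicitly
    double secondTermSum = 0.0;
    double inverseOneMinusSum = 0.0;

    for (std::size_t i = 0; i < ssWithin.size(); i++) {
        const double dof = static_cast<double>(groupSizes[i] - 1);
        totalNumSamples += groupSizes[i];
        ssWithinFull += ssWithin[i];
        secondTermSum += dof * std::log(ssWithin[i] / dof);
        inverseOneMinusSum += 1.0 / dof;
    }

    // Pooled degrees of freedom: total samples minus number of groups.
    const double pooledDof = totalNumSamples - numGroups;
    const double B = pooledDof * std::log(ssWithinFull / pooledDof) - secondTermSum;
    const double denominator =
        1.0 + 1.0 / (3.0 * (numGroups - 1.0)) * (inverseOneMinusSum - 1.0 / pooledDof);
    return B / denominator;
}
```

/*
 * When every group has the same scaled sum of squares (SSwithin/(Ni-1)), the
 * two logarithmic terms cancel and B is zero; B grows as the groups' scaled
 * sums of squares diverge.
 */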
mothur-1.36.1/source/commands/homovacommand.h000066400000000000000000000025771255543666200212530ustar00rootroot00000000000000
#ifndef HOMOVACOMMAND_H
#define HOMOVACOMMAND_H

/*
 * homovacommand.h
 * mothur
 *
 * Created by westcott on 2/8/11.
 * Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"

class DesignMap;

class HomovaCommand : public Command {

public:
	HomovaCommand(string);
	HomovaCommand();
	~HomovaCommand(){}

	vector<string> setParameters();
	string getCommandName()			{ return "homova";				}
	string getCommandCategory()		{ return "Hypothesis Testing";	}
	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Stewart CN, Excoffier L (1996). Assessing population genetic structure and variability with RAPD data: Application to Vaccinium macrocarpon (American Cranberry). J Evol Biol 9: 153-71. \nhttp://www.mothur.org/wiki/Homova"; }
	string getDescription()			{ return "homova"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:
	double runHOMOVA(ofstream& , map<string, vector<int> >, double);
	double calcSigleSSWithin(vector<int>);
	double calcBValue(map<string, vector<int> >, vector<double>&);
	map<string, vector<int> > getRandomizedGroups(map<string, vector<int> >);

	bool abort;
	vector<string> outputNames, Sets;
	string outputDir, inputDir, designFileName, phylipFileName;
	DesignMap* designMap;
	vector< vector<double> > distanceMatrix;
	int iters;
	double experimentwiseAlpha;
};

#endif
mothur-1.36.1/source/commands/indicatorcommand.cpp000066400000000000000000002051611255543666200222630ustar00rootroot00000000000000
/*
 * indicatorcommand.cpp
 * Mothur
 *
 * Created by westcott on 11/12/10.
 * Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "indicatorcommand.h" #include "sharedutilities.h" //********************************************************************************************************************** vector IndicatorCommand::setParameters(){ try { CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pdesign("design", "InputTypes", "", "", "TreeDesign", "TreeDesign", "none","summary",false,false,true); parameters.push_back(pdesign); CommandParameter pshared("shared", "InputTypes", "", "", "SharedRel", "SharedRel", "none","summary",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRel", "SharedRel", "none","summary",false,false); parameters.push_back(prelabund); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter ptree("tree", "InputTypes", "", "", "TreeDesign", "TreeDesign", "none","tree-summary",false,false,true); parameters.push_back(ptree); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pprocessors); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string 
IndicatorCommand::getHelpString(){ try { string helpString = ""; helpString += "The indicator command can be run in 3 ways: with a shared or relabund file and a design file, or with a shared or relabund file and a tree file, or with a shared or relabund file, tree file and design file. \n"; helpString += "The indicator command outputs a .indicator.summary file and a .indicator.tre if a tree is given. \n"; helpString += "The new tree contains labels at each internal node. The label is the node number so you can relate the tree to the summary file.\n"; helpString += "The summary file lists the indicator value for each OTU for each node.\n"; helpString += "The indicator command parameters are tree, groups, shared, relabund, design and label. \n"; helpString += "The design parameter allows you to relate the tree to the shared or relabund file, if your tree contains the grouping names, or if no tree is provided to group your groups into groupings.\n"; helpString += "The groups parameter allows you to specify which of the groups in your shared or relabund you would like analyzed, or if you provide a design file the groups in your design file. The groups may be entered separated by dashes.\n"; helpString += "The label parameter indicates at what distance your tree relates to the shared or relabund.\n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n"; helpString += "The iters parameter allows you to set number of randomization for the P value. The default is 1000."; helpString += "The indicator command should be used in the following format: indicator(tree=test.tre, shared=test.shared, label=0.03)\n"; helpString += "Note: No spaces between parameter labels (i.e. 
tree), '=' and parameters (i.e.yourTreefile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string IndicatorCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "tree") { pattern = "[filename],indicator.tre"; } else if (type == "summary") { pattern = "[filename],indicator.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** IndicatorCommand::IndicatorCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["tree"] = tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "IndicatorCommand"); exit(1); } } //********************************************************************************************************************** IndicatorCommand::IndicatorCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } m->runParse = true; m->clearGroups(); m->clearAllGroups(); m->Treenames.clear(); vector tempOutNames; 
outputTypes["tree"] = tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["design"] = inputDir + it->second; } } } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { treefile = ""; abort = true; } else if (treefile == "not found") { treefile = ""; } else { m->setTreeFile(treefile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputFileName = sharedfile; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { inputFileName = relabundfile; m->setRelAbundFile(relabundfile); } designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { designfile = ""; abort = true; } else if (designfile == "not found") { designfile = ""; } else { m->setDesignFile(designfile); } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; Groups.push_back("all"); } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if ((relabundfile == "") && (sharedfile == "")) { //is there are current file available for 
either of these? //give priority to shared, then relabund sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFileName = sharedfile; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFileName = relabundfile; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared or relabund."); m->mothurOutEndLine(); abort = true; } } } if ((designfile == "") && (treefile == "")) { treefile = m->getTreeFile(); if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); } else { designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("[ERROR]: You must provide either a tree or design file."); m->mothurOutEndLine(); abort = true; } } } if ((relabundfile != "") && (sharedfile != "")) { m->mothurOut("[ERROR]: You may not use both a shared and relabund file."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "IndicatorCommand"); exit(1); } } //********************************************************************************************************************** int IndicatorCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); int start = time(NULL); //read designfile if given and set up groups for read of sharedfiles if (designfile != "") { designMap = new DesignMap(designfile); //fill Groups - checks for "all" and for any typo groups SharedUtil util; vector nameGroups = designMap->getCategory(); util.setGroups(Groups, nameGroups); vector namesSeqs = designMap->getNamesGroups(Groups); 
m->setGroups(namesSeqs); } /***************************************************/ // use smart distancing to get right sharedRabund // /***************************************************/ if (sharedfile != "") { getShared(); if (m->control_pressed) { if (designfile != "") { delete designMap; } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if (lookup[0] == NULL) { m->mothurOut("[ERROR] reading shared file."); m->mothurOutEndLine(); return 0; } }else { getSharedFloat(); if (m->control_pressed) { if (designfile != "") { delete designMap; } for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } return 0; } if (lookupFloat[0] == NULL) { m->mothurOut("[ERROR] reading relabund file."); m->mothurOutEndLine(); return 0; } } //reset groups if needed if (designfile != "") { m->setGroups(Groups); } /***************************************************/ // reading tree info // /***************************************************/ if (treefile != "") { string groupfile = ""; m->setTreeFile(treefile); Tree* tree = new Tree(treefile); delete tree; //extracts names from tree to make faked out groupmap ct = new CountTable(); bool mismatch = false; set nameMap; map groupMap; set gps; for (int i = 0; i < m->Treenames.size(); i++) { nameMap.insert(m->Treenames[i]); //sanity check - is this a group that is not in the sharedfile? 
if (i == 0) { gps.insert("Group1"); } if (designfile == "") { if (!(m->inUsersGroups(m->Treenames[i], m->getAllGroups()))) { m->mothurOut("[ERROR]: " + m->Treenames[i] + " is not a group in your shared or relabund file."); m->mothurOutEndLine(); mismatch = true; } groupMap[m->Treenames[i]] = "Group1"; }else{ vector myGroups; myGroups.push_back(m->Treenames[i]); vector myNames = designMap->getNamesGroups(myGroups); for(int k = 0; k < myNames.size(); k++) { if (!(m->inUsersGroups(myNames[k], m->getAllGroups()))) { m->mothurOut("[ERROR]: " + myNames[k] + " is not a group in your shared or relabund file."); m->mothurOutEndLine(); mismatch = true; } } groupMap[m->Treenames[i]] = "Group1"; } } ct->createTable(nameMap, groupMap, gps); if ((designfile != "") && (m->Treenames.size() != Groups.size())) { cout << Groups.size() << '\t' << m->Treenames.size() << endl; m->mothurOut("[ERROR]: You design file does not match your tree, aborting."); m->mothurOutEndLine(); mismatch = true; } if (mismatch) { //cleanup and exit if (designfile != "") { delete designMap; } if (sharedfile != "") { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } else { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } } delete ct; return 0; } read = new ReadNewickTree(treefile); int readOk = read->read(ct); if (readOk != 0) { m->mothurOut("Read Terminated."); m->mothurOutEndLine(); delete ct; delete read; return 0; } vector T = read->getTrees(); delete read; if (m->control_pressed) { if (designfile != "") { delete designMap; } if (sharedfile != "") { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } else { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } } for (int i = 0; i < T.size(); i++) { delete T[i]; } delete ct; return 0; } T[0]->assembleTree(); /***************************************************/ // create ouptut tree - respecting pickedGroups // /***************************************************/ Tree* outputTree = new 
Tree(m->getNumGroups(), ct); outputTree->getSubTree(T[0], m->getGroups()); outputTree->assembleTree(); //no longer need original tree, we have output tree to use and label for (int i = 0; i < T.size(); i++) { delete T[i]; } if (m->control_pressed) { if (designfile != "") { delete designMap; } if (sharedfile != "") { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } else { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } } delete outputTree; delete ct; return 0; } /***************************************************/ // get indicator species values // /***************************************************/ GetIndicatorSpecies(outputTree); delete outputTree; delete ct; }else { //run with design file only //get indicator species GetIndicatorSpecies(); } if (designfile != "") { delete designMap; } if (sharedfile != "") { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } else { for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set tree file as new current treefile if (treefile != "") { string current = ""; itTypes = outputTypes.find("tree"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTreeFile(current); } } } m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to find the indicator species."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "execute"); exit(1); } } //********************************************************************************************************************** //divide shared or 
relabund file by groupings in the design file //report all otu values to file int IndicatorCommand::GetIndicatorSpecies(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(inputFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(inputFileName)); string outputFileName = getOutputFileName("summary", variables); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); m->mothurOutEndLine(); m->mothurOut("Species\tIndicator_Groups\tIndicatorValue\tpValue\n"); int numBins = 0; if (sharedfile != "") { numBins = lookup[0]->getNumBins(); } else { numBins = lookupFloat[0]->getNumBins(); } if (m->control_pressed) { out.close(); return 0; } /*****************************************************/ //create vectors containing rabund info // /*****************************************************/ vector indicatorValues; //size of numBins vector pValues; vector indicatorGroups; map< vector, vector > randomGroupingsMap; //maps location in groupings to location in groupings, ie, [0][0] -> [1][2]. This is so we don't have to actually move the sharedRabundVectors. if (sharedfile != "") { vector< vector > groupings; set groupsAlreadyAdded; vector subset; //for each grouping for (int i = 0; i < (designMap->getCategory()).size(); i++) { for (int k = 0; k < lookup.size(); k++) { //are you from this grouping? 
if (designMap->get(lookup[k]->getGroup()) == (designMap->getCategory())[i]) { subset.push_back(lookup[k]); groupsAlreadyAdded.insert(lookup[k]->getGroup()); } } if (subset.size() != 0) { groupings.push_back(subset); } subset.clear(); } if (groupsAlreadyAdded.size() != lookup.size()) { m->mothurOut("[ERROR]: could not make proper groupings."); m->mothurOutEndLine(); } indicatorValues = getValues(groupings, indicatorGroups, randomGroupingsMap); pValues = getPValues(groupings, lookup.size(), indicatorValues); }else { vector< vector > groupings; set groupsAlreadyAdded; vector subset; //for each grouping for (int i = 0; i < (designMap->getCategory()).size(); i++) { for (int k = 0; k < lookupFloat.size(); k++) { //are you from this grouping? if (designMap->get(lookupFloat[k]->getGroup()) == (designMap->getCategory())[i]) { subset.push_back(lookupFloat[k]); groupsAlreadyAdded.insert(lookupFloat[k]->getGroup()); } } if (subset.size() != 0) { groupings.push_back(subset); } subset.clear(); } if (groupsAlreadyAdded.size() != lookupFloat.size()) { m->mothurOut("[ERROR]: could not make proper groupings."); m->mothurOutEndLine(); } indicatorValues = getValues(groupings, indicatorGroups, randomGroupingsMap); pValues = getPValues(groupings, lookupFloat.size(), indicatorValues); } if (m->control_pressed) { out.close(); return 0; } /******************************************************/ //output indicator values to table form // /*****************************************************/ out << "OTU\tIndicator_Groups\tIndicator_Value\tpValue" << endl; for (int j = 0; j < indicatorValues.size(); j++) { if (m->control_pressed) { out.close(); return 0; } out << m->currentSharedBinLabels[j] << '\t' << indicatorGroups[j] << '\t' << indicatorValues[j] << '\t'; if (pValues[j] > (1/(float)iters)) { out << pValues[j] << endl; } else { out << "<" << (1/(float)iters) << endl; } if (pValues[j] <= 0.05) { cout << m->currentSharedBinLabels[j] << '\t' << indicatorGroups[j] << '\t' << 
indicatorValues[j] << '\t'; string pValueString = "<" + toString((1/(float)iters)); if (pValues[j] > (1/(float)iters)) { pValueString = toString(pValues[j]); cout << pValues[j];} else { cout << "<" << (1/(float)iters); } m->mothurOutJustToLog(m->currentSharedBinLabels[j] + "\t" + indicatorGroups[j] + "\t" + toString(indicatorValues[j]) + "\t" + pValueString); m->mothurOutEndLine(); } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "GetIndicatorSpecies"); exit(1); } } //********************************************************************************************************************** //traverse tree finding indicator species values for each otu at each node //label node with otu number that has highest indicator value //report all otu values to file int IndicatorCommand::GetIndicatorSpecies(Tree*& T){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(inputFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(inputFileName)); string outputFileName = getOutputFileName("summary",variables); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); int numBins = 0; if (sharedfile != "") { numBins = lookup[0]->getNumBins(); } else { numBins = lookupFloat[0]->getNumBins(); } //print headings out << "TreeNode\t"; for (int i = 0; i < numBins; i++) { out << m->currentSharedBinLabels[i] << "_IndGroups" << '\t' << m->currentSharedBinLabels[i] << "_IndValue" << '\t' << "pValue" << '\t'; } out << endl; m->mothurOutEndLine(); m->mothurOut("Node\tSpecies\tIndicator_Groups\tIndicatorValue\tpValue\n"); string treeOutputDir = outputDir; if (outputDir == "") { treeOutputDir += m->hasPath(treefile); } variables["[filename]"] = treeOutputDir + m->getRootName(m->getSimpleName(treefile)); string outputTreeFileName = 
getOutputFileName("tree", variables); //create a map from tree node index to names of descendants, save time later to know which sharedRabund you need map > nodeToDescendants; map > descendantNodes; for (int i = 0; i < T->getNumNodes(); i++) { if (m->control_pressed) { return 0; } nodeToDescendants[i] = getDescendantList(T, i, nodeToDescendants, descendantNodes); } //you need the distances to leaf to decide grouping below //this will also set branch lengths if the tree does not include them map distToRoot = getDistToRoot(T); //for each node for (int i = T->getNumLeaves(); i < T->getNumNodes(); i++) { //cout << endl << i+1 << endl; if (m->control_pressed) { out.close(); return 0; } /*****************************************************/ //create vectors containing rabund info // /*****************************************************/ vector indicatorValues; //size of numBins vector pValues; vector indicatorGroups; map< vector, vector > randomGroupingsMap; //maps location in groupings to location in groupings, ie, [0][0] -> [1][2]. This is so we don't have to actually move the sharedRabundVectors. if (sharedfile != "") { vector< vector > groupings; //get nodes that will be a valid grouping //you are valid if you are not one of my descendants //AND your distToRoot is >= mine //AND you were not added as part of a larger grouping. Largest nodes are added first. 
set groupsAlreadyAdded; //create a grouping with my grouping vector subset; int count = 0; int doneCount = nodeToDescendants[i].size(); for (int k = 0; k < lookup.size(); k++) { //is this descendant of i if ((nodeToDescendants[i].count(lookup[k]->getGroup()) != 0)) { subset.push_back(lookup[k]); groupsAlreadyAdded.insert(lookup[k]->getGroup()); count++; } if (count == doneCount) { break; } //quit once you get the rabunds for this grouping } if (subset.size() != 0) { groupings.push_back(subset); } for (int j = (T->getNumNodes()-1); j >= 0; j--) { if ((descendantNodes[i].count(j) == 0) && (distToRoot[j] >= distToRoot[i])) { vector subset; int count = 0; int doneCount = nodeToDescendants[j].size(); for (int k = 0; k < lookup.size(); k++) { //is this descendant of j, and we didn't already add this as part of a larger grouping if ((nodeToDescendants[j].count(lookup[k]->getGroup()) != 0) && (groupsAlreadyAdded.count(lookup[k]->getGroup()) == 0)) { subset.push_back(lookup[k]); groupsAlreadyAdded.insert(lookup[k]->getGroup()); count++; } if (count == doneCount) { break; } //quit once you get the rabunds for this grouping } //if subset.size == 0 then the node was added as part of a larger grouping if (subset.size() != 0) { groupings.push_back(subset); } } } if (groupsAlreadyAdded.size() != lookup.size()) { m->mothurOut("[ERROR]: could not make proper groupings."); m->mothurOutEndLine(); } indicatorValues = getValues(groupings, indicatorGroups, randomGroupingsMap); pValues = getPValues(groupings, lookup.size(), indicatorValues); }else { vector< vector > groupings; //get nodes that will be a valid grouping //you are valid if you are not one of my descendants //AND your distToRoot is >= mine //AND you were not added as part of a larger grouping. Largest nodes are added first. 
set groupsAlreadyAdded; //create a grouping with my grouping vector subset; int count = 0; int doneCount = nodeToDescendants[i].size(); for (int k = 0; k < lookupFloat.size(); k++) { //is this descendant of i if ((nodeToDescendants[i].count(lookupFloat[k]->getGroup()) != 0)) { subset.push_back(lookupFloat[k]); groupsAlreadyAdded.insert(lookupFloat[k]->getGroup()); count++; } if (count == doneCount) { break; } //quit once you get the rabunds for this grouping } if (subset.size() != 0) { groupings.push_back(subset); } for (int j = (T->getNumNodes()-1); j >= 0; j--) { if ((descendantNodes[i].count(j) == 0) && (distToRoot[j] >= distToRoot[i])) { vector subset; int count = 0; int doneCount = nodeToDescendants[j].size(); for (int k = 0; k < lookupFloat.size(); k++) { //is this descendant of j, and we didn't already add this as part of a larger grouping if ((nodeToDescendants[j].count(lookupFloat[k]->getGroup()) != 0) && (groupsAlreadyAdded.count(lookupFloat[k]->getGroup()) == 0)) { subset.push_back(lookupFloat[k]); groupsAlreadyAdded.insert(lookupFloat[k]->getGroup()); count++; } if (count == doneCount) { break; } //quit once you get the rabunds for this grouping } //if subset.size == 0 then the node was added as part of a larger grouping if (subset.size() != 0) { groupings.push_back(subset); } } } if (groupsAlreadyAdded.size() != lookupFloat.size()) { m->mothurOut("[ERROR]: could not make proper groupings."); m->mothurOutEndLine(); } indicatorValues = getValues(groupings, indicatorGroups, randomGroupingsMap); pValues = getPValues(groupings, lookupFloat.size(), indicatorValues); } if (m->control_pressed) { out.close(); return 0; } /******************************************************/ //output indicator values to table form + label tree // /*****************************************************/ out << (i+1); for (int j = 0; j < indicatorValues.size(); j++) { if (m->control_pressed) { out.close(); return 0; } if (pValues[j] < (1/(float)iters)) { out << '\t' << 
indicatorGroups[j] << '\t' << indicatorValues[j] << '\t' << '<' << (1/(float)iters);
				}else {
					out << '\t' << indicatorGroups[j] << '\t' << indicatorValues[j] << '\t' << pValues[j];
				}
				
				if (pValues[j] <= 0.05) {
					cout << i+1 << '\t' << m->currentSharedBinLabels[j] << '\t' << indicatorGroups[j] << '\t' << indicatorValues[j];
					//pValueString holds only the value; the tab separator is added when the log line is built
					string pValueString = "<" + toString((1/(float)iters));
					if (pValues[j] > (1/(float)iters)) { pValueString = toString(pValues[j]); cout << '\t' << pValues[j]; }
					else { cout << "\t<" << (1/(float)iters); }
					//log the same 1-based node number that is printed to the screen and used as the tree label
					m->mothurOutJustToLog(toString(i+1) + "\t" + m->currentSharedBinLabels[j] + "\t" + indicatorGroups[j] + "\t" + toString(indicatorValues[j]) + "\t" + pValueString);
					m->mothurOutEndLine();
				}
			}
			out << endl;
			
			T->tree[i].setLabel(toString(i+1));
		}
		out.close();
		
		ofstream outTree;
		m->openOutputFile(outputTreeFileName, outTree);
		outputNames.push_back(outputTreeFileName); outputTypes["tree"].push_back(outputTreeFileName);
		
		T->print(outTree, "both");
		outTree.close();
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "IndicatorCommand", "GetIndicatorSpecies");
		exit(1);
	}
}
//**********************************************************************************************************************
vector<float> IndicatorCommand::getValues(vector< vector<SharedRAbundFloatVector*> >& groupings, vector<string>& indicatorGroupings, map< vector<int>, vector<int> > groupingsMap){
	try {
		vector<float> values;
		map< vector<int>, vector<int> >::iterator it;
		
		indicatorGroupings.clear();
		
		//create grouping strings
		vector<string> groupingsGroups;
		for (int j = 0; j < groupings.size(); j++) {
			string tempGrouping = "";
			for (int k = 0; k < groupings[j].size()-1; k++) { tempGrouping += groupings[j][k]->getGroup() + "-"; }
			tempGrouping += groupings[j][groupings[j].size()-1]->getGroup();
			groupingsGroups.push_back(tempGrouping);
		}
		
		//for each otu
		for (int i = 0; i < groupings[0][0]->getNumBins(); i++) {
			
			if (m->control_pressed) { return values; }
			
			vector<float> terms;
			float AijDenominator = 0.0;
			vector<float> Bij;
			
			//get overall abundance of each grouping
			for (int j = 0; j <
groupings.size(); j++) { float totalAbund = 0; int numNotZero = 0; for (int k = 0; k < groupings[j].size(); k++) { vector temp; temp.push_back(j); temp.push_back(k); it = groupingsMap.find(temp); if (it == groupingsMap.end()) { //this one didnt get moved totalAbund += groupings[j][k]->getAbundance(i); if (groupings[j][k]->getAbundance(i) != 0.0) { numNotZero++; } }else { totalAbund += groupings[(it->second)[0]][(it->second)[1]]->getAbundance(i); if (groupings[(it->second)[0]][(it->second)[1]]->getAbundance(i) != 0.0) { numNotZero++; } } } //mean abundance float Aij = (totalAbund / (float) groupings[j].size()); terms.push_back(Aij); //percentage of sites represented Bij.push_back(numNotZero / (float) groupings[j].size()); AijDenominator += Aij; } float maxIndVal = 0.0; string maxGrouping = ""; for (int j = 0; j < terms.size(); j++) { float thisAij = (terms[j] / AijDenominator); //relative abundance float thisValue = thisAij * Bij[j] * 100.0; //save largest if (thisValue > maxIndVal) { maxIndVal = thisValue; maxGrouping = groupingsGroups[j]; } } values.push_back(maxIndVal); indicatorGroupings.push_back(maxGrouping); } return values; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getValues"); exit(1); } } //********************************************************************************************************************** //same as above, just data type difference vector IndicatorCommand::getValues(vector< vector >& groupings, vector& indicatorGroupings, map< vector, vector > groupingsMap){ try { vector values; map< vector, vector >::iterator it; indicatorGroupings.clear(); //create grouping strings vector groupingsGroups; for (int j = 0; j < groupings.size(); j++) { string tempGrouping = ""; for (int k = 0; k < groupings[j].size()-1; k++) { tempGrouping += groupings[j][k]->getGroup() + "-"; } tempGrouping += groupings[j][groupings[j].size()-1]->getGroup(); groupingsGroups.push_back(tempGrouping); } //for each otu for (int i = 0; i < 
groupings[0][0]->getNumBins(); i++) { vector terms; float AijDenominator = 0.0; vector Bij; //get overall abundance of each grouping for (int j = 0; j < groupings.size(); j++) { int totalAbund = 0.0; int numNotZero = 0; for (int k = 0; k < groupings[j].size(); k++) { vector temp; temp.push_back(j); temp.push_back(k); it = groupingsMap.find(temp); if (it == groupingsMap.end()) { //this one didnt get moved totalAbund += groupings[j][k]->getAbundance(i); if (groupings[j][k]->getAbundance(i) != 0.0) { numNotZero++; } }else { totalAbund += groupings[(it->second)[0]][(it->second)[1]]->getAbundance(i); if (groupings[(it->second)[0]][(it->second)[1]]->getAbundance(i) != 0.0) { numNotZero++; } } } //mean abundance float Aij = (totalAbund / (float) groupings[j].size()); terms.push_back(Aij); //percentage of sites represented Bij.push_back(numNotZero / (float) groupings[j].size()); AijDenominator += Aij; } float maxIndVal = 0.0; string maxGrouping = ""; for (int j = 0; j < terms.size(); j++) { float thisAij = (terms[j] / AijDenominator); //relative abundance float thisValue = thisAij * Bij[j] * 100.0; //save largest if (thisValue > maxIndVal) { maxIndVal = thisValue; maxGrouping = groupingsGroups[j]; } } values.push_back(maxIndVal); indicatorGroupings.push_back(maxGrouping); } return values; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getValues"); exit(1); } } //********************************************************************************************************************** //you need the distances to root to decide groupings //this will also set branch lengths if the tree does not include them map IndicatorCommand::getDistToRoot(Tree*& T){ try { map dists; bool hasBranchLengths = false; for (int i = 0; i < T->getNumNodes(); i++) { if (T->tree[i].getBranchLength() > 0.0) { hasBranchLengths = true; break; } } //set branchlengths if needed if (!hasBranchLengths) { for (int i = 0; i < T->getNumNodes(); i++) { int lc = T->tree[i].getLChild(); int rc = 
T->tree[i].getRChild();
				
				if (lc == -1) { // you are a leaf
					//if you are a leaf set your preliminary length to 1.0, this may adjust later
					T->tree[i].setBranchLength(1.0);
					dists[i] = 1.0;
				}else{ // you are an internal node
					//look at your children's length to leaf
					float ldist = dists[lc];
					float rdist = dists[rc];
					
					float greater = ldist;
					if (rdist > greater) { greater = rdist; dists[i] = ldist + 1.0;}
					else { dists[i] = rdist + 1.0; }
					
					//branch length = difference + 1
					T->tree[lc].setBranchLength((abs(ldist-greater) + 1.0));
					T->tree[rc].setBranchLength((abs(rdist-greater) + 1.0));
				}
			}
		}
		
		dists.clear();
		
		//distance to root = sum of the branch lengths on the path from the node up to the root
		for (int i = 0; i < T->getNumNodes(); i++) {
			double sum = 0.0;
			int index = i;
			
			while(T->tree[index].getParent() != -1){
				if (T->tree[index].getBranchLength() != -1) { sum += abs(T->tree[index].getBranchLength()); }
				index = T->tree[index].getParent();
			}
			dists[i] = sum;
		}
		
		return dists;
	}
	catch(exception& e) {
		m->errorOut(e, "IndicatorCommand", "getDistToRoot");
		exit(1);
	}
}
//**********************************************************************************************************************
set<string> IndicatorCommand::getDescendantList(Tree*& T, int i, map<int, set<string> > descendants, map<int, set<int> >& nodes){
	try {
		set<string> names;
		set<string>::iterator it;
		
		int lc = T->tree[i].getLChild();
		int rc = T->tree[i].getRChild();
		
		if (lc == -1) { //you are a leaf, your only descendant is yourself
			set<int> temp; temp.insert(i);
			nodes[i] = temp;
			
			if (designfile == "") {
				names.insert(T->tree[i].getName());
			}else {
				vector<string> myGroup; myGroup.push_back(T->tree[i].getName());
				vector<string> myReps = designMap->getNamesGroups(myGroup);
				for (int k = 0; k < myReps.size(); k++) { names.insert(myReps[k]); }
			}
		}else{ //your descendants are the combination of your children's descendants
			names = descendants[lc];
			nodes[i] = nodes[lc];
			for (it = descendants[rc].begin(); it != descendants[rc].end(); it++) { names.insert(*it); }
			for (set<int>::iterator itNum = nodes[rc].begin(); itNum != nodes[rc].end(); itNum++) { nodes[i].insert(*itNum); }
			//you are your own descendant
nodes[i].insert(i); } return names; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getDescendantList"); exit(1); } } //********************************************************************************************************************** int IndicatorCommand::getShared(){ try { InputData* input = new InputData(sharedfile, "sharedfile"); lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (label == "") { label = lastLabel; delete input; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { delete input; return 0; } if(labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); break; } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". 
I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); } delete input; return 0; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getShared"); exit(1); } } //********************************************************************************************************************** int IndicatorCommand::getSharedFloat(){ try { InputData* input = new InputData(relabundfile, "relabund"); lookupFloat = input->getSharedRAbundFloatVectors(); string lastLabel = lookupFloat[0]->getLabel(); if (label == "") { label = lastLabel; delete input; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookupFloat[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { delete input; return 0; } if(labels.count(lookupFloat[0]->getLabel()) == 1){ processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookupFloat[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupFloat[0]->getLabel(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); //restore real lastlabel to save below lookupFloat[0]->setLabel(saveLabel); break; } lastLabel = lookupFloat[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < 
lookupFloat.size(); i++) { delete lookupFloat[i]; }
			lookupFloat = input->getSharedRAbundFloatVectors();
		}
		
		if (m->control_pressed) { delete input; return 0; }
		
		//output error messages about any remaining user labels
		set<string>::iterator it;
		bool needToRun = false;
		for (it = userLabels.begin(); it != userLabels.end(); it++) {
			m->mothurOut("Your file does not include the label " + *it);
			if (processedLabels.count(lastLabel) != 1) {
				m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
				needToRun = true;
			}else {
				m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
			}
		}
		
		//run last label if you need to
		if (needToRun == true) {
			for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } }
			lookupFloat = input->getSharedRAbundFloatVectors(lastLabel);
		}
		
		delete input;
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "IndicatorCommand", "getSharedFloat");
		exit(1);
	}
}
//**********************************************************************************************************************
vector<float> IndicatorCommand::driver(vector< vector<SharedRAbundFloatVector*> >& groupings, int num, vector<float> indicatorValues, int numIters){
	try {
		vector<float> pvalues;
		pvalues.resize(indicatorValues.size(), 0);
		vector<string> notUsedGroupings; //we don't care about the grouping for the pvalues since they are randomized, but we need to pass the function something to make it work.
		for(int i=0;i<numIters;i++){
			if (m->control_pressed) { break; }
			map< vector<int>, vector<int> > groupingsMap = randomizeGroupings(groupings, num);
			vector<float> randomIndicatorValues = getValues(groupings, notUsedGroupings, groupingsMap);
			
			//count the randomizations that score at least as high as the observed value
			for (int j = 0; j < indicatorValues.size(); j++) {
				if (randomIndicatorValues[j] >= indicatorValues[j]) { pvalues[j]++; }
			}
		}
		
		return pvalues;
		
	}catch(exception& e) {
		m->errorOut(e, "IndicatorCommand", "driver");
		exit(1);
	}
}
//**********************************************************************************************************************
vector<float> IndicatorCommand::getPValues(vector< vector<SharedRAbundFloatVector*> >& groupings, int num, vector<float> indicatorValues){
	try {
		vector<float> pvalues;
		bool recalc = false;
		
		if(processors == 1){
			pvalues = driver(groupings, num, indicatorValues, iters);
			for (int i = 0; i < pvalues.size(); i++) { pvalues[i] /= (double)iters; }
		}else{
			//divide iters between processors
			vector<int> procIters;
			int numItersPerProcessor = iters / processors;
			
			//divide iters between processes; the last process takes the remainder
			for (int h = 0; h < processors; h++) {
				if(h == processors - 1){ numItersPerProcessor = iters - h * numItersPerProcessor; }
				procIters.push_back(numItersPerProcessor);
			}
			
			vector<int> processIDS;
			int process = 1;
			
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
			
			//loop through and create all the processes you want
			while (process != processors) {
				pid_t pid = fork();
				
				if (pid > 0) {
					processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
					process++;
				}else if (pid == 0){
					pvalues = driver(groupings, num, indicatorValues, procIters[process]);
					
					//pass pvalues to parent
					ofstream out;
					string tempFile = m->mothurGetpid(process) + ".pvalues.temp";
					m->openOutputFile(tempFile, out);
					
					//pass values
					for (int i = 0; i < pvalues.size(); i++) { out << pvalues[i] << '\t'; }
					out << endl;
					
					out.close();
					
					exit(0);
				}else {
					m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " +
toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".pvalues.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".pvalues.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //divide iters between processors processIDS.resize(0); process = 1; procIters.clear(); int numItersPerProcessor = iters / processors; //divide iters between processes for (int h = 0; h < processors; h++) { if(h == processors - 1){ numItersPerProcessor = iters - h * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ pvalues = driver(groupings, num, indicatorValues, procIters[process]); //pass pvalues to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".pvalues.temp"; m->openOutputFile(tempFile, out); //pass values for (int i = 0; i < pvalues.size(); i++) { out << pvalues[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part pvalues = driver(groupings, num, indicatorValues, procIters[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); ////// to do /////////// int 
numTemp; numTemp = 0; for (int j = 0; j < pvalues.size(); j++) { in >> numTemp; m->gobble(in); pvalues[j] += numTemp; } in.close(); m->mothurRemove(tempFile); } for (int i = 0; i < pvalues.size(); i++) { pvalues[i] /= (double)iters; } #else //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; i > newGroupings; for (int k = 0; k < groupings.size(); k++) { vector newLookup; for (int l = 0; l < groupings[k].size(); l++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(groupings[k][l]->getLabel()); temp->setGroup(groupings[k][l]->getGroup()); newLookup.push_back(temp); } newGroupings.push_back(newLookup); } //for each bin for (int l = 0; l < groupings.size(); l++) { for (int k = 0; k < groupings[l][0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newGroupings.size(); j++) { for (int u = 0; u < newGroupings[j].size(); u++) { delete newGroupings[j][u]; } } return pvalues; } for (int j = 0; j < groupings[l].size(); j++) { newGroupings[l][j]->push_back(groupings[l][j]->getAbundance(k), groupings[l][j]->getGroup()); } } } vector copyIValues = indicatorValues; indicatorData* temp = new indicatorData(m, procIters[i], newGroupings, num, copyIValues); pDataArray.push_back(temp); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyIndicatorThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //do my part pvalues = driver(groupings, num, indicatorValues, procIters[0]); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. 
for(int i=0; i < pDataArray.size(); i++){ for (int j = 0; j < pDataArray[i]->pvalues.size(); j++) { pvalues[j] += pDataArray[i]->pvalues[j]; } for (int l = 0; l < pDataArray[i]->groupings.size(); l++) { for (int j = 0; j < pDataArray[i]->groupings[l].size(); j++) { delete pDataArray[i]->groupings[l][j]; } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } for (int i = 0; i < pvalues.size(); i++) { pvalues[i] /= (double)iters; } #endif } return pvalues; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getPValues"); exit(1); } } //********************************************************************************************************************** //same as above, just data type difference vector IndicatorCommand::driver(vector< vector >& groupings, int num, vector indicatorValues, int numIters){ try { vector pvalues; pvalues.resize(indicatorValues.size(), 0); vector notUsedGroupings; //we dont care about the grouping for the pvalues since they are randomized, but we need to pass the function something to make it work. 
		for(int i=0;i<numIters;i++){
			if (m->control_pressed) { break; }
			map< vector<int>, vector<int> > groupingsMap = randomizeGroupings(groupings, num);
			vector<float> randomIndicatorValues = getValues(groupings, notUsedGroupings, groupingsMap);
			
			//count the randomizations that score at least as high as the observed value
			for (int j = 0; j < indicatorValues.size(); j++) {
				if (randomIndicatorValues[j] >= indicatorValues[j]) { pvalues[j]++; }
			}
		}
		
		return pvalues;
		
	}catch(exception& e) {
		m->errorOut(e, "IndicatorCommand", "driver");
		exit(1);
	}
}
//**********************************************************************************************************************
//same as above, just data type difference
vector<float> IndicatorCommand::getPValues(vector< vector<SharedRAbundVector*> >& groupings, int num, vector<float> indicatorValues){
	try {
		vector<float> pvalues;
		bool recalc = false;
		
		if(processors == 1){
			pvalues = driver(groupings, num, indicatorValues, iters);
			for (int i = 0; i < pvalues.size(); i++) { pvalues[i] /= (double)iters; }
		}else{
			//divide iters between processors
			vector<int> procIters;
			int numItersPerProcessor = iters / processors;
			
			//divide iters between processes; the last process takes the remainder
			for (int h = 0; h < processors; h++) {
				if(h == processors - 1){ numItersPerProcessor = iters - h * numItersPerProcessor; }
				procIters.push_back(numItersPerProcessor);
			}
			
			vector<int> processIDS;
			int process = 1;
			
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
			
			//loop through and create all the processes you want
			while (process != processors) {
				pid_t pid = fork();
				
				if (pid > 0) {
					processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
					process++;
				}else if (pid == 0){
					pvalues = driver(groupings, num, indicatorValues, procIters[process]);
					
					//pass pvalues to parent
					ofstream out;
					string tempFile = m->mothurGetpid(process) + ".pvalues.temp";
					m->openOutputFile(tempFile, out);
					
					//pass values
					for (int i = 0; i < pvalues.size(); i++) { out << pvalues[i] << '\t'; }
					out << endl;
					
					out.close();
					
					exit(0);
				}else {
					m->mothurOut("[ERROR]: unable to spawn the number of processes you
requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".pvalues.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".pvalues.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //divide iters between processors processIDS.resize(0); process = 1; procIters.clear(); int numItersPerProcessor = iters / processors; //divide iters between processes for (int h = 0; h < processors; h++) { if(h == processors - 1){ numItersPerProcessor = iters - h * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ pvalues = driver(groupings, num, indicatorValues, procIters[process]); //pass pvalues to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".pvalues.temp"; m->openOutputFile(tempFile, out); //pass values for (int i = 0; i < pvalues.size(); i++) { out << pvalues[i] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part pvalues = driver(groupings, num, indicatorValues, procIters[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); 
////// to do /////////// int numTemp; numTemp = 0; for (int j = 0; j < pvalues.size(); j++) { in >> numTemp; m->gobble(in); pvalues[j] += numTemp; } in.close(); m->mothurRemove(tempFile); } for (int i = 0; i < pvalues.size(); i++) { pvalues[i] /= (double)iters; } #else //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; i > newGroupings; for (int k = 0; k < groupings.size(); k++) { vector newLookup; for (int l = 0; l < groupings[k].size(); l++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(groupings[k][l]->getLabel()); temp->setGroup(groupings[k][l]->getGroup()); newLookup.push_back(temp); } newGroupings.push_back(newLookup); } //for each bin for (int l = 0; l < groupings.size(); l++) { for (int k = 0; k < groupings[l][0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newGroupings.size(); j++) { for (int u = 0; u < newGroupings[j].size(); u++) { delete newGroupings[j][u]; } } return pvalues; } for (int j = 0; j < groupings[l].size(); j++) { newGroupings[l][j]->push_back((float)(groupings[l][j]->getAbundance(k)), groupings[l][j]->getGroup()); } } } vector copyIValues = indicatorValues; indicatorData* temp = new indicatorData(m, procIters[i], newGroupings, num, copyIValues); pDataArray.push_back(temp); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyIndicatorThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //do my part pvalues = driver(groupings, num, indicatorValues, procIters[0]); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. 
for(int i=0; i < pDataArray.size(); i++){ for (int j = 0; j < pDataArray[i]->pvalues.size(); j++) { pvalues[j] += pDataArray[i]->pvalues[j]; } for (int l = 0; l < pDataArray[i]->groupings.size(); l++) { for (int j = 0; j < pDataArray[i]->groupings[l].size(); j++) { delete pDataArray[i]->groupings[l][j]; } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } for (int i = 0; i < pvalues.size(); i++) { pvalues[i] /= (double)iters; } #endif } return pvalues; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "getPValues"); exit(1); } } //********************************************************************************************************************** //swap groups between groupings, in essence randomizing the second column of the design file map< vector, vector > IndicatorCommand::randomizeGroupings(vector< vector >& groupings, int numLookupGroups){ try { map< vector, vector > randomGroupings; for (int i = 0; i < numLookupGroups; i++) { if (m->control_pressed) {break;} //get random groups to swap to switch with //generate random int between 0 and groupings.size()-1 int z = m->getRandomIndex(groupings.size()-1); int x = m->getRandomIndex(groupings.size()-1); int a = m->getRandomIndex(groupings[z].size()-1); int b = m->getRandomIndex(groupings[x].size()-1); //cout << i << '\t' << z << '\t' << x << '\t' << a << '\t' << b << endl; //if ((z < 0) || (z > 1) || x<0 || x>1 || a <0 || a>groupings[z].size()-1 || b<0 || b>groupings[x].size()-1) { cout << "probelm" << i << '\t' << z << '\t' << x << '\t' << a << '\t' << b << endl; } vector from; vector to; from.push_back(z); from.push_back(a); to.push_back(x); to.push_back(b); randomGroupings[from] = to; } //cout << "done" << endl; return randomGroupings; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "randomizeGroupings"); exit(1); } } //********************************************************************************************************************** //swap groups between groupings, in essence 
randomizing the second column of the design file map< vector, vector > IndicatorCommand::randomizeGroupings(vector< vector >& groupings, int numLookupGroups){ try { map< vector, vector > randomGroupings; for (int i = 0; i < numLookupGroups; i++) { //get random groups to swap to switch with //generate random int between 0 and groupings.size()-1 int z = m->getRandomIndex(groupings.size()-1); int x = m->getRandomIndex(groupings.size()-1); int a = m->getRandomIndex(groupings[z].size()-1); int b = m->getRandomIndex(groupings[x].size()-1); //cout << i << '\t' << z << '\t' << x << '\t' << a << '\t' << b << endl; vector from; vector to; from.push_back(z); from.push_back(a); to.push_back(x); to.push_back(b); randomGroupings[from] = to; } return randomGroupings; } catch(exception& e) { m->errorOut(e, "IndicatorCommand", "randomizeGroupings"); exit(1); } } /*****************************************************************/ mothur-1.36.1/source/commands/indicatorcommand.h000066400000000000000000000175761255543666200217430ustar00rootroot00000000000000#ifndef INDICATORCOMMAND_H #define INDICATORCOMMAND_H /* * indicatorcommand.h * Mothur * * Created by westcott on 11/12/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "readtree.h" #include "counttable.h" #include "sharedrabundvector.h" #include "sharedrabundfloatvector.h" #include "inputdata.h" #include "designmap.h" class IndicatorCommand : public Command { public: IndicatorCommand(string); IndicatorCommand(); ~IndicatorCommand(){} vector setParameters(); string getCommandName() { return "indicator"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Dufrene M, Legendre P (1997). Species assemblages and indicator species: The need for a flexible asymmetrical approach. Ecol Monogr 67: 345-66.\n McCune B, Grace JB, Urban DL (2002). Analysis of ecological communities. 
MjM Software Design: Gleneden Beach, OR. \nLegendre P, Legendre L (1998). Numerical Ecology. Elsevier: New York. \nhttp://www.mothur.org/wiki/Indicator"; } string getDescription() { return "calculate the indicator value for each OTU"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: ReadTree* read; CountTable* ct; DesignMap* designMap; string treefile, sharedfile, relabundfile, groups, label, inputFileName, outputDir, designfile; bool abort; int iters, processors; vector outputNames, Groups; vector lookup; vector lookupFloat; int getShared(); int getSharedFloat(); int GetIndicatorSpecies(Tree*&); int GetIndicatorSpecies(); set getDescendantList(Tree*&, int, map >, map >&); vector getValues(vector< vector >&, vector&, map< vector, vector >); vector getValues(vector< vector >&, vector&, map< vector, vector >); map getDistToRoot(Tree*&); map< vector, vector > randomizeGroupings(vector< vector >&, int); map< vector, vector > randomizeGroupings(vector< vector >&, int); vector driver(vector< vector >&, int, vector, int); vector driver(vector< vector >&, int, vector, int); vector getPValues(vector< vector >&, int, vector); vector getPValues(vector< vector >&, int, vector); }; /**************************************************************************************************/ struct indicatorData { vector< vector > groupings; MothurOut* m; int iters, num; vector indicatorValues; vector pvalues; indicatorData(){} indicatorData(MothurOut* mout, int it, vector< vector > ng, int n, vector iv) { m = mout; iters = it; groupings = ng; indicatorValues = iv; num = n; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyIndicatorThreadFunction(LPVOID lpParam){ indicatorData* pDataArray; pDataArray = (indicatorData*)lpParam; try { 
pDataArray->pvalues.resize(pDataArray->indicatorValues.size(), 0); for(int i=0;iiters;i++){ if (pDataArray->m->control_pressed) { break; } //groupingsMap = randomizeGroupings(groupings, num); /////////////////////////////////////////////////////////////////////// map< vector, vector > randomGroupings; for (int j = 0; j < pDataArray->num; j++) { //get random groups to swap to switch with //generate random int between 0 and groupings.size()-1 int z = pDataArray->m->getRandomIndex(pDataArray->groupings.size()-1); int x = pDataArray->m->getRandomIndex(pDataArray->groupings.size()-1); int a = pDataArray->m->getRandomIndex(pDataArray->groupings[z].size()-1); int b = pDataArray->m->getRandomIndex(pDataArray->groupings[x].size()-1); //cout << i << '\t' << z << '\t' << x << '\t' << a << '\t' << b << endl; vector from; vector to; from.push_back(z); from.push_back(a); to.push_back(x); to.push_back(b); randomGroupings[from] = to; } /////////////////////////////////////////////////////////////////////// //vector randomIndicatorValues = getValues(groupings, notUsedGroupings, randomGroupings); /////////////////////////////////////////////////////////////////////// vector randomIndicatorValues; map< vector, vector >::iterator it; //for each otu for (int i = 0; i < pDataArray->groupings[0][0]->getNumBins(); i++) { if (pDataArray->m->control_pressed) { return 0; } vector terms; float AijDenominator = 0.0; vector Bij; //get overall abundance of each grouping for (int j = 0; j < pDataArray->groupings.size(); j++) { float totalAbund = 0; int numNotZero = 0; for (int k = 0; k < pDataArray->groupings[j].size(); k++) { vector temp; temp.push_back(j); temp.push_back(k); it = randomGroupings.find(temp); if (it == randomGroupings.end()) { //this one didnt get moved totalAbund += pDataArray->groupings[j][k]->getAbundance(i); if (pDataArray->groupings[j][k]->getAbundance(i) != 0.0) { numNotZero++; } }else { totalAbund += pDataArray->groupings[(it->second)[0]][(it->second)[1]]->getAbundance(i); 
if (pDataArray->groupings[(it->second)[0]][(it->second)[1]]->getAbundance(i) != 0.0) { numNotZero++; } } } //mean abundance float Aij = (totalAbund / (float) pDataArray->groupings[j].size()); terms.push_back(Aij); //percentage of sites represented Bij.push_back(numNotZero / (float) pDataArray->groupings[j].size()); AijDenominator += Aij; } float maxIndVal = 0.0; for (int j = 0; j < terms.size(); j++) { float thisAij = (terms[j] / AijDenominator); //relative abundance float thisValue = thisAij * Bij[j] * 100.0; //save largest if (thisValue > maxIndVal) { maxIndVal = thisValue; } } randomIndicatorValues.push_back(maxIndVal); } /////////////////////////////////////////////////////////////////////// for (int j = 0; j < pDataArray->indicatorValues.size(); j++) { if (randomIndicatorValues[j] >= pDataArray->indicatorValues[j]) { pDataArray->pvalues[j]++; } } } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "IndicatorCommand", "MyIndicatorThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/kruskalwalliscommand.cpp000066400000000000000000000337161255543666200232040ustar00rootroot00000000000000/* * File: kruskalwalliscommand.cpp * Author: kiverson * * Created on June 26, 2012, 11:06 AM */ #include "kruskalwalliscommand.h" #include "linearalgebra.h" //********************************************************************************************************************** vector KruskalWallisCommand::setParameters(){ try { CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pdesign); CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pshared); CommandParameter pclass("class", "String", "", "", "", "", "","",false,false); parameters.push_back(pclass); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); //every command must have inputdir and 
outputdir. This allows mothur users to redirect input and output files.
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "KruskalWallisCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string KruskalWallisCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The kruskal.wallis command allows you to ....\n";
        helpString += "The kruskal.wallis command parameters are: shared, design, class and label.\n";
        helpString += "The class parameter is used to indicate which category you would like used for the Kruskal Wallis analysis. If none is provided, the first category is used.\n";
        helpString += "The label parameter is used to indicate which distances in the shared file you would like to use. 
Labels are separated by dashes.\n";
        helpString += "The kruskal.wallis command should be in the following format: kruskal.wallis(shared=final.an.shared, design=final.design, class=treatment).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "KruskalWallisCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string KruskalWallisCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "kruskall-wallis") { pattern = "[filename],[distance],kruskall_wallis"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "KruskalWallisCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
KruskalWallisCommand::KruskalWallisCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["kruskall-wallis"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "KruskalWallisCommand", "KruskalWallisCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
KruskalWallisCommand::KruskalWallisCommand(string option) {
    try {
        abort = false; calledHelp = false;
        allLines = 1;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            //valid parameters for this command
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string,string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string,string>::iterator it;
            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, 
myArray, it->second) != true) { abort = true; }
            }

            vector<string> tempOutNames;
            outputTypes["kruskall-wallis"] = tempOutNames;

            //if the user changes the input directory command factory will send this info to us in the output parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {
                string path;
                it = parameters.find("design");
                //user has given a design file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["design"] = inputDir + it->second; }
                }

                it = parameters.find("shared");
                //user has given a shared file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["shared"] = inputDir + it->second; }
                }
            }

            //get shared file, it is required
            sharedfile = validParameter.validFile(parameters, "shared", true);
            if (sharedfile == "not open") { sharedfile = ""; abort = true; }
            else if (sharedfile == "not found") {
                //if there is a current shared file, use it
                sharedfile = m->getSharedFile();
                if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; }
            }else { m->setSharedFile(sharedfile); }

            //get design file, it is required
            designfile = validParameter.validFile(parameters, "design", true);
            if (designfile == "not open") { designfile = ""; abort = true; }
            else if (designfile == "not found") {
                //if there is a current design file, use it
                designfile = m->getDesignFile();
                if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current design file and the design parameter is required."); 
m->mothurOutEndLine(); abort = true; } }else { m->setDesignFile(designfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } mclass = validParameter.validFile(parameters, "class", false); if (mclass == "not found") { mclass = ""; } } } catch(exception& e) { m->errorOut(e, "KruskalWallisCommand", "KruskalWallisCommand"); exit(1); } } //********************************************************************************************************************** int KruskalWallisCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } DesignMap designMap(designfile); //if user did not select class use first column if (mclass == "") { mclass = designMap.getDefaultClass(); m->mothurOut("\nYou did not provide a class, using " + mclass +".\n\n"); } InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, designMap); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, designMap); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } }
            lookup = input.getSharedRAbundVectors(lastLabel);

            m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
            process(lookup, designMap);

            for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
        }

        //output files created by command
        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();
        return 0;

    }
    catch(exception& e) {
        m->errorOut(e, "KruskalWallisCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************
int KruskalWallisCommand::process(vector<SharedRAbundVector*>& lookup, DesignMap& designMap) {
    try {
        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile));
        variables["[distance]"] = lookup[0]->getLabel();
        string outputFileName = getOutputFileName("kruskall-wallis",variables);

        ofstream out;
        m->openOutputFile(outputFileName, out);
        outputNames.push_back(outputFileName); outputTypes["kruskall-wallis"].push_back(outputFileName);
        out << "OTULabel\tKW\tPvalue\n";

        int numBins = lookup[0]->getNumBins();
        //sanity check to make sure the groups in the shared file cover at least 2 classes
        set<string> treatments;
        for (int j = 0; j < lookup.size(); j++) {
            string group = lookup[j]->getGroup();
            string treatment = designMap.get(group, mclass); //get value for this group in this category
            treatments.insert(treatment);
        }
        if (treatments.size() < 2) { m->mothurOut("[ERROR]: need at least 2 classes to compare, quitting.\n"); m->control_pressed = true; }

        LinearAlgebra linear;
        for (int i = 0; i < numBins; i++) {
            if (m->control_pressed) { break; }

            vector<spearmanRank> values;
            for (int j = 0; j < lookup.size(); j++) {
                string group = 
lookup[j]->getGroup(); string treatment = designMap.get(group, mclass); //get value for this group in this category spearmanRank temp(treatment, lookup[j]->getAbundance(i)); values.push_back(temp); } double pValue = 0.0; double H = linear.calcKruskalWallis(values, pValue); //output H and signifigance out << m->currentSharedBinLabels[i] << '\t' << H << '\t' << pValue << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "KruskalWallisCommand", "process"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/kruskalwalliscommand.h000066400000000000000000000024731255543666200226450ustar00rootroot00000000000000/* * File: kruskalwalliscommand.h * Author: kiverson * * Created on June 26, 2012, 11:07 AM */ #ifndef KRUSKALWALLISCOMMAND_H #define KRUSKALWALLISCOMMAND_H #include "command.hpp" #include "inputdata.h" #include "designmap.h" class KruskalWallisCommand : public Command { public: KruskalWallisCommand(string); KruskalWallisCommand(); ~KruskalWallisCommand(){} vector setParameters(); string getCommandName() { return "kruskal.wallis"; } string getCommandCategory() { return "Hypothesis Testing"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "http://www.mothur.org/wiki/Kruskal.wallis"; } string getDescription() { return "Non-parametric method for testing whether samples originate from the same distribution."; } struct groupRank { string group; double value; double rank; }; int execute(); void help() { m->mothurOut(getHelpString()); } void assignRank(vector&); void assignValue(vector&); private: bool abort, allLines; string outputDir, sharedfile, designfile, mclass; vector outputNames; set labels; int process(vector&, DesignMap&); }; #endif /* KRUSKALWALLISCOMMAND_H */ mothur-1.36.1/source/commands/lefsecommand.cpp000066400000000000000000001722361255543666200214130ustar00rootroot00000000000000// // 
lefsecommand.cpp // Mothur // // Created by SarahsWork on 6/12/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "lefsecommand.h" #include "linearalgebra.h" //********************************************************************************************************************** vector LefseCommand::setParameters(){ try { CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pdesign); CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pshared); CommandParameter pclass("class", "String", "", "", "", "", "","",false,false); parameters.push_back(pclass); CommandParameter psubclass("subclass", "String", "", "", "", "", "","",false,false); parameters.push_back(psubclass); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); //CommandParameter pclasses("classes", "String", "", "", "", "", "","",false,false); parameters.push_back(pclasses); CommandParameter palpha("aalpha", "Number", "", "0.05", "", "", "","",false,false); parameters.push_back(palpha); CommandParameter pwalpha("walpha", "Number", "", "0.05", "", "", "","",false,false); parameters.push_back(pwalpha); CommandParameter plda("lda", "Number", "", "2.0", "", "", "","",false,false); parameters.push_back(plda); CommandParameter pwilc("wilc", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pwilc); CommandParameter pnormmillion("norm", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pnormmillion); CommandParameter piters("iters", "Number", "", "30", "", "", "","",false,false); parameters.push_back(piters); //CommandParameter pwilcsamename("wilcsamename", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pwilcsamename); CommandParameter pcurv("curv", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pcurv); 
CommandParameter pfiters("fboots", "Number", "", "0.67", "", "", "","",false,false); parameters.push_back(pfiters); CommandParameter pstrict("strict", "Multiple", "0-1-2", "0", "", "", "","",false,false); parameters.push_back(pstrict); CommandParameter pminc("minc", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pminc); CommandParameter pmulticlass_strat("multiclass", "Multiple", "onevone-onevall", "onevall", "", "", "","",false,false); parameters.push_back(pmulticlass_strat); //CommandParameter psubject("subject", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psubject); //not used in their current code, but in parameters //CommandParameter pnlogs("nlogs", "Number", "", "3", "", "", "","",false,false); parameters.push_back(pnlogs); //CommandParameter pranktec("ranktec", "Multiple", "lda-svm", "lda", "", "", "","",false,false); parameters.push_back(pranktec); // svm not implemented in their source yet. //CommandParameter psvmnorm("svmnorm", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(psvmnorm); //not used because svm not implemented yet. //every command must have inputdir and outputdir. This allows mothur users to redirect input and output files. 
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "LefseCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string LefseCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The lefse command allows you to ....\n";
        helpString += "The lefse command parameters are: shared, design, class, subclass, label, walpha, aalpha, lda, wilc, iters, curv, fboots, strict, minc, multiclass and norm.\n";
        helpString += "The class parameter is used to indicate which category you would like used for the Kruskal Wallis analysis. If none is provided, the first category is used.\n";
        helpString += "The subclass parameter is used to indicate the ..... If none is provided, the second category is used; if there is only one category, subclass is ignored. \n";
        helpString += "The aalpha parameter is used to set the alpha value for the Kruskal Wallis Anova test. Default=0.05. \n";
        helpString += "The walpha parameter is used to set the alpha value for the Wilcoxon test. Default=0.05. \n";
        helpString += "The lda parameter is used to set the threshold on the absolute value of the logarithmic LDA score. Default=2.0. \n";
        helpString += "The wilc parameter is used to indicate whether to perform the Wilcoxon test. Default=T. \n";
        helpString += "The iters parameter is used to set the number of bootstrap iterations for LDA. Default=30. 
\n";
        //helpString += "The wilcsamename parameter is used to indicate whether to perform the Wilcoxon test only among the subclasses with the same name. Default=F. \n";
        helpString += "The curv parameter is used to set whether to perform the Wilcoxon test using Curtis's approach [BETA VERSION]. Default=F. \n";
        helpString += "The norm parameter is used to multiply relative abundances by 1000000. Recommended when very low values are present. Default=T. \n";
        helpString += "The fboots parameter is used to set the subsampling fraction value for each bootstrap iteration. Default=0.67. \n";
        helpString += "The strict parameter is used to set the multiple testing correction options. 0 no correction (more strict, default), 1 correction for independent comparisons, 2 correction for dependent comparisons. Options = 0,1,2. Default=0. \n";
        helpString += "The minc parameter is used to set the minimum number of samples per subclass for performing the Wilcoxon test. Default=10. \n";
        helpString += "The multiclass parameter is used to (for multiclass tasks) set whether the test is performed in a one-against-one ( onevone - more strict!) or in a one-against-all setting ( onevall - less strict). Default=onevall. \n";
        //helpString += "The classes parameter is used to indicate the classes you would like to use. Classes should be inputted in the following format: classes=label-label. For example to include groups from treatment with value early or late and age= young or old. class=treatment-age.\n";
        helpString += "The label parameter is used to indicate which distances in the shared file you would like to use. 
labels are separated by dashes.\n"; helpString += "The lefse command should be in the following format: lefse(shared=final.an.shared, design=final.design, class=treatment, subclass=age).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "LefseCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string LefseCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],[distance],lefse_summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "LefseCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** LefseCommand::LefseCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "LefseCommand", "LefseCommand"); exit(1); } } //********************************************************************************************************************** LefseCommand::LefseCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["summary"] = 
tempOutNames;

			//if the user changes the input directory command factory will send this info to us in the output parameter
			string inputDir = validParameter.validFile(parameters, "inputdir", false);
			if (inputDir == "not found"){ inputDir = ""; }
			else {
				string path;
				it = parameters.find("design");
				//user has given a design file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path then, add inputdir. else leave path alone.
					if (path == "") { parameters["design"] = inputDir + it->second; }
				}

				it = parameters.find("shared");
				//user has given a shared file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path then, add inputdir. else leave path alone.
					if (path == "") { parameters["shared"] = inputDir + it->second; }
				}
			}

			//get shared file, it is required
			sharedfile = validParameter.validFile(parameters, "shared", true);
			if (sharedfile == "not open") { sharedfile = ""; abort = true; }
			else if (sharedfile == "not found") {
				//if there is a current shared file, use it
				sharedfile = m->getSharedFile();
				if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); }
				else { m->mothurOut("You have no current shared file and the shared parameter is required."); m->mothurOutEndLine(); abort = true; }
			}else { m->setSharedFile(sharedfile); }

			//get design file, it is required
			designfile = validParameter.validFile(parameters, "design", true);
			if (designfile == "not open") { designfile = ""; abort = true; }
			else if (designfile == "not found") {
				//if there is a current design file, use it
				designfile = m->getDesignFile();
				if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); }
				else { m->mothurOut("You have no current design file and the design parameter is required."); m->mothurOutEndLine(); abort = true; }
			}else { m->setDesignFile(designfile); }

			//if the user changes 
the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } mclass = validParameter.validFile(parameters, "class", false); if (mclass == "not found") { mclass = ""; } subclass = validParameter.validFile(parameters, "subclass", false); if (subclass == "not found") { subclass = mclass; } string temp = validParameter.validFile(parameters, "aalpha", false); if (temp == "not found") { temp = "0.05"; } m->mothurConvert(temp, anovaAlpha); temp = validParameter.validFile(parameters, "walpha", false); if (temp == "not found") { temp = "0.05"; } m->mothurConvert(temp, wilcoxonAlpha); temp = validParameter.validFile(parameters, "wilc", false); if (temp == "not found") { temp = "T"; } wilc = m->isTrue(temp); temp = validParameter.validFile(parameters, "norm", false); if (temp == "not found") { temp = "T"; } normMillion = m->isTrue(temp); //temp = validParameter.validFile(parameters, "subject", false); //if (temp == "not found") { temp = "F"; } //subject = m->isTrue(temp); temp = validParameter.validFile(parameters, "lda", false); if (temp == "not found") { temp = "2.0"; } m->mothurConvert(temp, ldaThreshold); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "30"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "fboots", false); if (temp == "not found") { temp = "0.67"; } m->mothurConvert(temp, fBoots); //temp = validParameter.validFile(parameters, "wilcsamename", false); //if (temp == "not found") { temp = "F"; } //wilcsamename = m->isTrue(temp); temp = 
validParameter.validFile(parameters, "curv", false); if (temp == "not found") { temp = "F"; } curv = m->isTrue(temp); temp = validParameter.validFile(parameters, "strict", false); if (temp == "not found"){ temp = "0"; } if ((temp != "0") && (temp != "1") && (temp != "2")) { m->mothurOut("Invalid strict option: choices are 0, 1 or 2."); m->mothurOutEndLine(); abort=true; } else { m->mothurConvert(temp, strict); } temp = validParameter.validFile(parameters, "minc", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, minC); multiClassStrat = validParameter.validFile(parameters, "multiclass", false); if (multiClassStrat == "not found"){ multiClassStrat = "onevall"; } if ((multiClassStrat != "onevall") && (multiClassStrat != "onevone")) { m->mothurOut("Invalid multiclass option: choices are onevone or onevall."); m->mothurOutEndLine(); abort=true; } } } catch(exception& e) { m->errorOut(e, "LefseCommand", "LefseCommand"); exit(1); } } //********************************************************************************************************************** int LefseCommand::execute(){ try { srand(1982); //for reading lefse formatted file and running in mothur for testing - pass number of rows used for design file if (false) { makeShared(1); exit(1); } if (abort == true) { if (calledHelp) { return 0; } return 2; } DesignMap designMap(designfile); //if user did not select class use first column if (mclass == "") { mclass = designMap.getDefaultClass(); m->mothurOut("\nYou did not provide a class, using " + mclass +".\n\n"); if (subclass == "") { subclass = mclass; } } InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundFloatVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, designMap); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, designMap); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundFloatVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, designMap); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); srand(time(NULL)); return 0; } catch(exception& e) { m->errorOut(e, "LefseCommand", "execute"); exit(1); } } //********************************************************************************************************************** int LefseCommand::process(vector& lookup, DesignMap& designMap) { try { vector classes; vector subclasses; map subclass2Class; map > class2SubClasses; //maps class name to vector of its subclasses map > subClass2GroupIndex; //maps subclass name to vector of indexes in lookup from that subclass. old -> 1,2,3 means groups in location 1,2,3 of lookup are from old. Saves time below. map > class2GroupIndex; //maps subclass name to vector of indexes in lookup from that class. old -> 1,2,3 means groups in location 1,2,3 of lookup are from old. Saves time below. 
if (normMillion) { normalize(lookup); } for (int j = 0; j < lookup.size(); j++) { string group = lookup[j]->getGroup(); string treatment = designMap.get(group, mclass); //get value for this group in this category string thisSub = designMap.get(group, subclass); map::iterator it = subclass2Class.find(thisSub); if (it == subclass2Class.end()) { subclass2Class[thisSub] = treatment; vector temp; temp.push_back(j); subClass2GroupIndex[thisSub] = temp; } else { if (it->second != treatment) { //m->mothurOut("[WARNING]: subclass " + thisSub + " has members in " + it->second + " and " + treatment + ". Subclass members must be from the same class for Wilcoxon. Changing " + thisSub + " to " + treatment + "_" + thisSub + ".\n"); thisSub = treatment + "_" + thisSub; subclass2Class[thisSub] = treatment; vector temp; temp.push_back(j); subClass2GroupIndex[thisSub] = temp; }else { subClass2GroupIndex[thisSub].push_back(j); } } map >::iterator itClass = class2SubClasses.find(treatment); if (itClass == class2SubClasses.end()) { set temp; temp.insert(thisSub); class2SubClasses[treatment] = temp; vector temp2; temp2.push_back(j); class2GroupIndex[treatment] = temp2; classes.push_back(treatment); }else{ class2SubClasses[treatment].insert(thisSub); class2GroupIndex[treatment].push_back(j); } } //sort classes so order is right sort(classes.begin(), classes.end()); vector< vector > means = getMeans(lookup, class2GroupIndex); //[numOTUs][classes] - classes in same order as class2GroupIndex //run kruskal wallis on each otu map significantOtuLabels = runKruskalWallis(lookup, designMap); int numSigBeforeWilcox = significantOtuLabels.size(); if (m->debug) { m->mothurOut("[DEBUG]: completed Kruskal Wallis\n"); } //check for subclass string wilcoxString = ""; if ((subclass != "") && wilc) { significantOtuLabels = runWilcoxon(lookup, designMap, significantOtuLabels, class2SubClasses, subClass2GroupIndex, subclass2Class); wilcoxString += " ( " + toString(numSigBeforeWilcox) + " ) before internal 
wilcoxon"; }

		int numSigAfterWilcox = significantOtuLabels.size();

		if (m->debug) { m->mothurOut("[DEBUG]: completed Wilcoxon\n"); }

		m->mothurOut("\nNumber of significantly discriminative features: " + toString(numSigAfterWilcox) + wilcoxString + ".\n");

		map<int, double> sigOTUSLDA;
		if (numSigAfterWilcox > 0) {
			sigOTUSLDA = testLDA(lookup, significantOtuLabels, class2GroupIndex, subClass2GroupIndex);
			//report the number of features that passed the LDA threshold, not the number that entered the LDA step
			m->mothurOut("Number of discriminative features with abs LDA score > " + toString(ldaThreshold) + " : " + toString(sigOTUSLDA.size()) + ".\n");
		}
		else { m->mothurOut("No features with significant differences between the classes.\n"); }

		if (m->debug) { m->mothurOut("[DEBUG]: completed lda\n"); }

		printResults(means, significantOtuLabels, sigOTUSLDA, lookup[0]->getLabel(), classes);

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "LefseCommand", "process");
		exit(1);
	}
}
//**********************************************************************************************************************
int LefseCommand::normalize(vector<SharedRAbundFloatVector*>& lookup) {
	try {
		//per-group multiplier that rescales each group's abundances to sum to 1000000
		vector<double> mul;
		for (int i = 0; i < lookup.size(); i++) {
			double sum = 0.0;
			for (int j = 0; j < lookup[i]->getNumBins(); j++) { sum += lookup[i]->getAbundance(j); }
			mul.push_back(1000000.0/sum);
		}

		for (int i = 0; i < lookup.size(); i++) {
			for (int j = 0; j < lookup[i]->getNumBins(); j++) {
				lookup[i]->set(j, lookup[i]->getAbundance(j)*mul[i], lookup[i]->getGroup());
			}
		}

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "LefseCommand", "normalize");
		exit(1);
	}
}
//**********************************************************************************************************************
map<int, double> LefseCommand::runKruskalWallis(vector<SharedRAbundFloatVector*>& lookup, DesignMap& designMap) {
	try {
		map<int, double> significantOtuLabels;
		int numBins = lookup[0]->getNumBins();

		//sanity check to make sure each treatment has a group in the shared file
		set<string> treatments;
		for (int j = 0; j < lookup.size(); j++) {
			string group = lookup[j]->getGroup();
			string treatment = designMap.get(group, mclass); //get 
value for this group in this category
			treatments.insert(treatment);
		}

		if (treatments.size() < 2) {
			m->mothurOut("[ERROR]: need at least 2 classes to compare, quitting.\n");
			m->control_pressed = true;
		}

		LinearAlgebra linear;
		for (int i = 0; i < numBins; i++) {
			if (m->control_pressed) { break; }

			vector<spearmanRank> values;
			for (int j = 0; j < lookup.size(); j++) {
				string group = lookup[j]->getGroup();
				string treatment = designMap.get(group, mclass); //get value for this group in this category
				spearmanRank temp(treatment, lookup[j]->getAbundance(i));
				values.push_back(temp);
			}

			double pValue = 0.0;
			linear.calcKruskalWallis(values, pValue);

			if (pValue < anovaAlpha) { significantOtuLabels[i] = pValue; }
		}

		return significantOtuLabels;
	}
	catch(exception& e) {
		m->errorOut(e, "LefseCommand", "runKruskalWallis");
		exit(1);
	}
}
//**********************************************************************************************************************
//assumes not necessarily paired
map<int, double> LefseCommand::runWilcoxon(vector<SharedRAbundFloatVector*>& lookup, DesignMap& designMap, map<int, double> bins, map<string, set<string> >& class2SubClasses, map<string, vector<int> >& subClass2GroupIndex, map<string, string> subclass2Class) {
	try {
		map<int, double> significantOtuLabels;
		map<int, double>::iterator it;
		//if it exists and meets the following requirements run Wilcoxon
		/*
		 1. 
Subclass members all belong to same main class anything else */ int numBins = lookup[0]->getNumBins(); for (int i = 0; i < numBins; i++) { if (m->control_pressed) { break; } it = bins.find(i); if (it != bins.end()) { //flagged in Kruskal Wallis vector abunds; for (int j = 0; j < lookup.size(); j++) { abunds.push_back(lookup[j]->getAbundance(i)); } bool sig = testOTUWilcoxon(class2SubClasses, abunds, subClass2GroupIndex, subclass2Class); if (sig) { significantOtuLabels[i] = it->second; } }//bins flagged from kw }//for bins return significantOtuLabels; } catch(exception& e) { m->errorOut(e, "LefseCommand", "runWilcoxon"); exit(1); } } //********************************************************************************************************************** //lefse.py - test_rep_wilcoxon_r function bool LefseCommand::testOTUWilcoxon(map >& class2SubClasses, vector abunds, map >& subClass2GroupIndex, map subclass2Class) { try { int totalOk = 0; double alphaMtc = wilcoxonAlpha; vector< set > allDiffs; LinearAlgebra linear; //for each subclass comparision map >::iterator itB; for(map >::iterator it=class2SubClasses.begin();it!=class2SubClasses.end();it++){ itB = it;itB++; for(itB;itB!=class2SubClasses.end();itB++){ if (m->control_pressed) { return false; } bool first = true; int dirCmp = 0; // not set?? dir_cmp = "not_set" # 0=notset or none, 1=true, 2=false. 
				int curv_sign = 0; int ok = 0; int count = 0;
				for (set<string>::iterator itClass1 = (it->second).begin(); itClass1 != (it->second).end(); itClass1++) {
					bool br = false;
					for (set<string>::iterator itClass2 = (itB->second).begin(); itClass2 != (itB->second).end(); itClass2++) {
						string subclass1 = *itClass1;
						string subclass2 = *itClass2;
						count++;

						if (m->debug) { m->mothurOut( "[DEBUG]: comparing " + it->first + "-" + *itClass1 + " to " + itB->first + "-" + *itClass2 + "\n"); }

						string treatment1 = subclass2Class[subclass1];
						string treatment2 = subclass2Class[subclass2];
						int numSubs1 = class2SubClasses[treatment1].size();
						int numSubs2 = class2SubClasses[treatment2].size();

						//if mul_cor != 0: alpha_mtc = th*l_subcl1*l_subcl2 if mul_cor == 2 else 1.0-math.pow(1.0-th,l_subcl1*l_subcl2)
						//strict == 0 means no correction, so alphaMtc stays at wilcoxonAlpha
						if (strict != 0) {
							if (strict == 2) { alphaMtc = wilcoxonAlpha * numSubs1 * numSubs2; }
							else { alphaMtc = 1.0-pow((1.0-wilcoxonAlpha),(double)(numSubs1 * numSubs2)); }
						}

						//fill x and y with this comparisons data
						vector<double> x; vector<double> y;

						//fill x and y
						vector<int> xIndexes = subClass2GroupIndex[subclass1]; //indexes in lookup for this subclass
						vector<int> yIndexes = subClass2GroupIndex[subclass2]; //indexes in lookup for this subclass
						for (int k = 0; k < yIndexes.size(); k++) { y.push_back(abunds[yIndexes[k]]); }
						for (int k = 0; k < xIndexes.size(); k++) { x.push_back(abunds[xIndexes[k]]); }

						// med_comp = False
						//if len(cl1) < min_c or len(cl2) < min_c:
						//med_comp = True
						bool medComp = false; // are there enough samples per subclass
						if ((xIndexes.size() < minC) || (yIndexes.size() < minC)) { medComp = true; }

						double sx = m->median(x);
						double sy = m->median(y);

						//if cl1[0] == cl2[0] and len(set(cl1)) == 1 and len(set(cl2)) == 1:
						//tres, first = False, False
						double pValue = 0.0;
						double H = 0.0;
						bool tres = true; //don't think this is set in the python source. Not sure how that is handled, but setting it here.
if ((x[0] == y[0]) && (x.size() == 1) && (y.size() == 1)) { tres = false; first = false; } else if (!medComp) { H = linear.calcWilcoxon(x, y, pValue); if (pValue < (alphaMtc*2.0)) { tres = true; } else { tres = false; } } /*if first: first = False if not curv and ( med_comp or tres ): dir_cmp = sx < sy if sx == sy: br = True elif curv: dir_cmp = None if med_comp or tres: curv_sign += 1 dir_cmp = sx < sy else: br = True elif not curv and med_comp: if ((sx < sy) != dir_cmp or sx == sy): br = True elif curv: if tres and dir_cmp == None: curv_sign += 1 dir_cmp = sx < sy if tres and dir_cmp != (sx < sy): br = True curv_sign = -1 elif not tres or (sx < sy) != dir_cmp or sx == sy: br = True */ int sxSy = 2; //false if (sx 0) { diff = true; } } //if curv: diff = curv_sign > 0 else { //else: diff = (ok == len(cl_hie[pair[1]])*len(cl_hie[pair[0]])) diff = false; if (ok == count) { diff = true; } } if (diff) { totalOk++; } if (!diff && (multiClassStrat == "onevone")) { return false; } if (diff && (multiClassStrat == "onevall")) { //all_diff.append(pair) set pair; pair.insert(it->first); pair.insert(itB->first); allDiffs.push_back(pair); } }//classes }//classes if (multiClassStrat == "onevall") { int tot_k = class2SubClasses.size(); for(map >::iterator it=class2SubClasses.begin();it!=class2SubClasses.end();it++){ if (m->control_pressed) { return false; } int nk = 0; //is this class okay in all comparisons for (int h = 0; h < allDiffs.size(); h++) { if (allDiffs[h].count(it->first) != 0) { nk++; } } if (nk == (tot_k-1)) { return true; }//if nk == tot_k-1: return True } return false; } return true; } catch(exception& e) { m->errorOut(e, "LefseCommand", "testOTUWilcoxon"); exit(1); } } //********************************************************************************************************************** //modelled after lefse.py test_lda_r function map LefseCommand::testLDA(vector& lookup, map bins, map >& class2GroupIndex, map >& subClass2GroupIndex) { try { map sigOTUS; 
map::iterator it; LinearAlgebra linear; int numBins = lookup[0]->getNumBins(); vector< vector > adjustedLookup; for (int i = 0; i < numBins; i++) { if (m->control_pressed) { break; } if (m->debug) { m->mothurOut("[DEBUG]: bin = " + toString(i) + "\n."); } it = bins.find(i); if (it != bins.end()) { //flagged in Kruskal Wallis and Wilcoxon(if we ran it) if (m->debug) { m->mothurOut("[DEBUG]:flagged bin = " + toString(i) + "\n."); } //fill x with this OTUs abundances vector x; for (int j = 0; j < lookup.size(); j++) { x.push_back(lookup[j]->getAbundance(i)); } //go through classes for (map >::iterator it = class2GroupIndex.begin(); it != class2GroupIndex.end(); it++) { if (m->debug) { m->mothurOut("[DEBUG]: class = " + it->first + "\n."); } //max(float(feats['class'].count(c))*0.5,4) //max(numGroups in this class*0.5, 4.0) double necessaryNum = ((double)((it->second).size())*0.5); if (4.0 > necessaryNum) { necessaryNum = 4.0; } set uniques; for (int j = 0; j < (it->second).size(); j++) { uniques.insert(x[(it->second)[j]]); } //if len(set([float(v[1]) for v in ff if v[0] == c])) > max(float(feats['class'].count(c))*0.5,4): continue if ((double)(uniques.size()) > necessaryNum) { } else { //feats[k][i] = math.fabs(feats[k][i] + lrand.normalvariate(0.0,max(feats[k][i]*0.05,0.01))) for (int j = 0; j < (it->second).size(); j++) { //(it->second) contains indexes of abundance for this class double sigma = max((x[(it->second)[j]]*0.05), 0.01); x[(it->second)[j]] = abs(x[(it->second)[j]] + linear.normalvariate(0.0, sigma)); } } } adjustedLookup.push_back(x); } } //go through classes int minCl = 1e6; map indexToClass; vector classes; for (map >::iterator it = class2GroupIndex.begin(); it != class2GroupIndex.end(); it++) { //class with minimum number of groups if ((it->second).size() < minCl) { minCl = (it->second).size(); } for (int i = 0; i < (it->second).size(); i++) { indexToClass[(it->second)[i]] = it->first; } classes.push_back(it->first); } int numGroups = lookup.size(); 
//lfk int fractionNumGroups = numGroups * fBoots; //rfk minCl = (int)((float)(minCl*fBoots*fBoots*0.05)); minCl = max(minCl, 1); if (m->debug) { m->mothurOut("[DEBUG]: about to start iters. \n."); } vector< vector< vector > > results;//[iters][numComparison][numOTUs] for (int j = 0; j < iters; j++) { if (m->control_pressed) { return sigOTUS; } if (m->debug) { m->mothurOut("[DEBUG]: iter = " + toString(j) + "\n."); } //find "good" random vector vector rand_s; int save = 0; for (int h = 0; h < 1000; h++) { //generate a vector of length fractionNumGroups with range 0 to numGroups-1 save = h; rand_s.clear(); for (int k = 0; k < fractionNumGroups; k++) { rand_s.push_back(m->getRandomIndex(numGroups-1)); } if (!contastWithinClassesOrFewPerClass(adjustedLookup, rand_s, minCl, class2GroupIndex, indexToClass)) { h+=1000; save += 1000; } //break out of loop } if (m->control_pressed) { return sigOTUS; } if (m->debug) { m->mothurOut("[DEBUG]: after 1000. \n."); } //print data in R input format for testing if (false) { vector groups; for (int h = 0; h < rand_s.size(); h++) { groups.push_back(lookup[rand_s[h]]->getGroup()); } for (int h = 0; h < groups.size(); h++) { cout << groups[h]<< endl; } //printToCoutForRTesting(adjustedLookup, rand_s, class2GroupIndex, bins, subClass2GroupIndex, groups); } if (save < 1000) { m->mothurOut("[WARNING]: Skipping iter " + toString(j+1) + " in LDA test. This can be caused by too few groups per class or not enough contrast within the classes. \n"); } else { //for each pair of classes vector< vector > temp = lda(adjustedLookup, rand_s, indexToClass, classes); //[numComparison][numOTUs] if (temp.size() != 0) { results.push_back(temp); } if (m->debug) { m->mothurOut("[DEBUG]: after lda. 
\n."); } } } if (results.size() == 0) { return sigOTUS; } if (m->control_pressed) { return sigOTUS; } //m = max([numpy.mean([means[k][kk][p] for kk in range(boots)]) for p in range(len(pairs))]) int k = 0; for (it = bins.begin(); it != bins.end(); it++) { //[numOTUs] - need to go through bins so we can tie adjustedLookup back to the binNumber. adjustedLookup[0] ->bins entry[0]. vector averageForEachComparison; averageForEachComparison.resize(results[0].size(), 0.0); double maxM = 0.0; //max of averages for each comparison for (int j = 0; j < results[0].size(); j++) { //numComparisons for (int i = 0; i < results.size(); i++) { //iters averageForEachComparison[j]+= results[i][j][k]; } averageForEachComparison[j] /= (double) results.size(); if (averageForEachComparison[j] > maxM) { maxM = averageForEachComparison[j]; } } //res[k] = math.copysign(1.0,m)*math.log(1.0+math.fabs(m),10) double multiple = 1.0; if (maxM < 0.0) { multiple = -1.0; } double resK = multiple * log10(1.0+abs(maxM)); if (resK > ldaThreshold) { sigOTUS[it->first] = resK; } k++; } return sigOTUS; } catch(exception& e) { m->errorOut(e, "LefseCommand", "testLDA"); exit(1); } } //********************************************************************************************************************** vector< vector > LefseCommand::getMeans(vector& lookup, map >& class2GroupIndex) { try { int numBins = lookup[0]->getNumBins(); int numClasses = class2GroupIndex.size(); vector< vector > means; //[numOTUS][classes] means.resize(numBins); for (int i = 0; i < means.size(); i++) { means[i].resize(numClasses, 0.0); } map indexToClass; int count = 0; //shortcut for vectors below map quickIndex; vector classCounts; for (map >::iterator it = class2GroupIndex.begin(); it != class2GroupIndex.end(); it++) { for (int i = 0; i < (it->second).size(); i++) { indexToClass[(it->second)[i]] = it->first; } quickIndex[it->first] = count; count++; classCounts.push_back((it->second).size()); } for (int i = 0; i < numBins; i++) { 
for (int j = 0; j < lookup.size(); j++) { if (m->control_pressed) { return means; } means[i][quickIndex[indexToClass[j]]] += lookup[j]->getAbundance(i); } } for (int i = 0; i < numBins; i++) { for (int j = 0; j < numClasses; j++) { means[i][j] /= (double) classCounts[j]; } } return means; } catch(exception& e) { m->errorOut(e, "LefseCommand", "getMeans"); exit(1); } } //********************************************************************************************************************** vector< vector > LefseCommand::lda(vector< vector >& adjustedLookup, vector rand_s, map& indexToClass, vector classes) { try { //shortcut for vectors below map quickIndex; for (int i = 0; i < classes.size(); i++) { quickIndex[classes[i]] = i; } vector randClass; //classes for rand sample vector counts; counts.resize(classes.size(), 0); for (int i = 0; i < rand_s.size(); i++) { string thisClass = indexToClass[rand_s[i]]; randClass.push_back(thisClass); counts[quickIndex[thisClass]]++; } vector< vector > a; //[numOTUs][numSampled] for (int i = 0; i < adjustedLookup.size(); i++) { vector temp; for (int j = 0; j < rand_s.size(); j++) { temp.push_back(adjustedLookup[i][rand_s[j]]); } a.push_back(temp); } LinearAlgebra linear; vector< vector > means; bool ignore; vector< vector > scaling = linear.lda(a, randClass, means, ignore); //means are returned sorted, quickIndex sorts as well since it uses a map. 
means[class][otu] = if (ignore) { scaling.clear(); return scaling; } if (m->control_pressed) { return scaling; } vector< vector > w; w.resize(a.size()); //w.unit <- w/sqrt(sum(w^2)) double denom = 0.0; for (int i = 0; i < scaling.size(); i++) { w[i].push_back(scaling[i][0]); denom += (w[i][0]*w[i][0]); } denom = sqrt(denom); for (int i = 0; i < w.size(); i++) { w[i][0] /= denom; } //[numOTUs][1] - w.unit //robjects.r('LD <- xy.matrix%*%w.unit') [numSampled][numOtus] * [numOTUs][1] vector< vector > LD = linear.matrix_mult(linear.transpose(a), w); //find means for each groups LDs vector LDMeans; LDMeans.resize(classes.size(), 0.0); //means[0] -> average for [group0]. for (int i = 0; i < LD.size(); i++) { LDMeans[quickIndex[randClass[i]]] += LD[i][0]; } for (int i = 0; i < LDMeans.size(); i++) { LDMeans[i] /= (double) counts[i]; } //calculate for each comparisons i.e. with groups A,B,C = AB, AC, BC = 3; vector< vector > results;// [numComparison][numOTUs] for (int i = 0; i < LDMeans.size(); i++) { for (int l = 0; l < i; l++) { if (m->control_pressed) { return scaling; } //robjects.r('effect.size <- abs(mean(LD[sub_d[,"class"]=="'+p[0]+'"]) - mean(LD[sub_d[,"class"]=="'+p[1]+'"]))') double effectSize = abs(LDMeans[i] - LDMeans[l]); //scal = robjects.r('wfinal <- w.unit * effect.size') vector compResults; for (int j = 0; j < w.size(); j++) { //[numOTUs][1] //coeff = [abs(float(v)) if not math.isnan(float(v)) else 0.0 for v in scal] double coeff = abs(w[j][0]*effectSize); if (isnan(coeff) || isinf(coeff)) { coeff = 0.0; } //gm = abs(res[p[0]][j] - res[p[1]][j]) - res is the means for each group for each otu double gm = abs(means[i][j] - means[l][j]); //means[k][i].append((gm+coeff[j])*0.5) compResults.push_back((gm+coeff)*0.5); } results.push_back(compResults); } } return results; } catch(exception& e) { m->errorOut(e, "LefseCommand", "lda"); exit(1); } } 
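/*
 Illustrative sketch only, not part of mothur. It restates, in a standalone
 form, the arithmetic that lda() above applies for one pair of classes and
 that testLDA() applies to the averaged result: the first discriminant is
 scaled to unit length (w.unit), the effect size is the absolute difference
 of the projected class means, and each feature gets (gm + coeff) * 0.5.
 The function names unitNormalize, pairwiseEffect and signedLogScore are
 hypothetical, and the zero-norm guard is an assumption that the code above
 does not make.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale a weight vector to unit Euclidean length (the "w.unit" step).
// Assumption: a zero vector is returned unchanged to avoid dividing by zero.
std::vector<double> unitNormalize(const std::vector<double>& w) {
    double sumSquares = 0.0;
    for (double v : w) { sumSquares += v * v; }
    const double norm = std::sqrt(sumSquares);
    std::vector<double> unit(w);
    if (norm > 0.0) {
        for (double& v : unit) { v /= norm; }
    }
    return unit;
}

// Per-feature effect for one pair of classes: the average of the absolute
// difference in per-feature class means (gm) and the absolute unit weight
// scaled by the effect size (coeff). NaN or infinite coeff values become 0.
std::vector<double> pairwiseEffect(const std::vector<double>& unitW,
                                   double ldMeanA, double ldMeanB,
                                   const std::vector<double>& featMeansA,
                                   const std::vector<double>& featMeansB) {
    const double effectSize = std::fabs(ldMeanA - ldMeanB);
    std::vector<double> out;
    for (std::size_t j = 0; j < unitW.size(); j++) {
        double coeff = std::fabs(unitW[j] * effectSize);
        if (std::isnan(coeff) || std::isinf(coeff)) { coeff = 0.0; }
        const double gm = std::fabs(featMeansA[j] - featMeansB[j]);
        out.push_back((gm + coeff) * 0.5);
    }
    return out;
}

// Signed log10 transform of the averaged effect: sign(m) * log10(1 + |m|).
double signedLogScore(double m) {
    const double sign = (m < 0.0) ? -1.0 : 1.0;
    return sign * std::log10(1.0 + std::fabs(m));
}
```

 In the command itself the per-pair vectors are collected over all bootstrap
 iterations, averaged per comparison, and the largest average is what is
 passed through the signed log transform and compared against the lda
 threshold.
*/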
//********************************************************************************************************************** //modelled after lefse.py contast_within_classes_or_few_per_class function bool LefseCommand::contastWithinClassesOrFewPerClass(vector< vector >& lookup, vector rands, int minCl, map > class2GroupIndex, map indexToClass) { try { set cls; int countFound = 0; for (int i = 0; i < rands.size(); i++) { //fill cls with the classes represented in the random selection for (map >::iterator it = class2GroupIndex.begin(); it != class2GroupIndex.end(); it++) { if (m->inUsersGroups(rands[i], (it->second))) { cls.insert(it->first); countFound++; } } } //sanity check if (rands.size() != countFound) { m->mothurOut("oops, should never get here, missing something.\n"); } if (cls.size() < class2GroupIndex.size()) { return true; } //some classes are not present in sampling for (set::iterator it = cls.begin(); it != cls.end(); it++) { if (cls.count(*it) < minCl) { return true; } //this sampling has class count below minimum } //for this otu int numBins = lookup.size(); for (int i = 0; i < numBins; i++) { if (m->control_pressed) { break; } //break up random sampling by class map > class2Values; //maps class name -> set of abunds present in random sampling. F003Early -> 0.001, 0.003... for (int j = 0; j < rands.size(); j++) { class2Values[indexToClass[rands[j]]].insert(lookup[i][rands[j]]); //rands[j] = index of randomly selected group in lookup, randIndex2Class[rands[j]] = class this group belongs to. lookup[rands[j]]->getAbundance(i) = abundance of this group for this OTU. 
} //are the unique values less than we want //if (len(set(col)) <= min_cl and min_cl > 1) or (min_cl == 1 and len(set(col)) <= 1): for (map >::iterator it = class2Values.begin(); it != class2Values.end(); it++) { if (((it->second).size() <= minCl && minCl > 1) || (minCl == 1 && (it->second).size() <= 1)) { return true; } } } return false; } catch(exception& e) { m->errorOut(e, "LefseCommand", "contastWithinClassesOrFewPerClass"); exit(1); } } //********************************************************************************************************************** int LefseCommand::printResults(vector< vector > means, map sigKW, map sigLDA, string label, vector classes) { try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = label; string outputFileName = getOutputFileName("summary",variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["summary"].push_back(outputFileName); //output headers out << "OTU\tLogMaxMean\tClass\tLDA\tpValue\n"; string temp = ""; for (int i = 0; i < means.size(); i++) { //[numOTUs][classes] //find max mean of classes double maxMean = -1.0; string maxClass = "none"; for (int j = 0; j < means[i].size(); j++) { if (means[i][j] > maxMean) { maxMean = means[i][j]; maxClass = classes[j]; } } //str(math.log(max(max(v),1.0),10.0)) double logMaxMean = 1.0; if (maxMean > logMaxMean) { logMaxMean = maxMean; } logMaxMean = log10(logMaxMean); out << m->currentSharedBinLabels[i] << '\t' << logMaxMean << '\t'; if (m->debug) { temp = m->currentSharedBinLabels[i] + '\t' + toString(logMaxMean) + '\t'; } map::iterator it = sigLDA.find(i); if (it != sigLDA.end()) { out << maxClass << '\t' << it->second << '\t' << sigKW[i] << endl; //sigLDA is a subset of sigKW so no need to look if (m->debug) { temp += maxClass + '\t' + toString(it->second) + '\t' + toString(sigKW[i]) + '\n'; m->mothurOut(temp); temp = ""; } }else { out << 
'-' << endl; } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "LefseCommand", "printResults"); exit(1); } } //********************************************************************************************************************** //printToCoutForRTesting(adjustedLookup, rand_s, class2GroupIndex, numBins); bool LefseCommand::printToCoutForRTesting(vector< vector >& adjustedLookup, vector rand_s, map >& class2GroupIndex, map bins, map >& subClass2GroupIndex, vector groups) { try { cout << "rand_s = "; for (int h = 0; h < rand_s.size(); h++) { cout << rand_s[h] << '\t'; } cout << endl; //print otu data int count = 0; for (map::iterator it = bins.begin(); it != bins.end(); it++) { if (m->control_pressed) { break; } cout << m->currentSharedBinLabels[it->first] << " <- c("; for (int h = 0; h < rand_s.size()-1; h++) { cout << (adjustedLookup[count][rand_s[h]]) << ", "; } cout << (adjustedLookup[count][rand_s[rand_s.size()-1]]) << ")\n"; count++; } /* string tempOutput = ""; for (int h = 0; h < rand_s.size(); h++) { //find class this index is in for (map >::iterator it = class2GroupIndex.begin(); it!= class2GroupIndex.end(); it++) { if (m->inUsersGroups(rand_s[h], (it->second)) ) { cout << (h+1) << " <- c(\"" +it->first + "\")\n" ; } } }*/ string tempOutput = "treatments <- c("; for (int h = 0; h < rand_s.size(); h++) { //find class this index is in for (map >::iterator it = class2GroupIndex.begin(); it!= class2GroupIndex.end(); it++) { if (m->inUsersGroups(rand_s[h], (it->second)) ) { tempOutput += "\"" +it->first + "\"" + ","; } //"\"" +it->first + "\"" } } tempOutput = tempOutput.substr(0, tempOutput.length()-1); tempOutput += ")\n"; cout << tempOutput; /* if (subclass != "") { string tempOutput = "sub <- c("; for (int h = 0; h < rand_s.size(); h++) { //find class this index is in for (map >::iterator it = subClass2GroupIndex.begin(); it!= subClass2GroupIndex.end(); it++) { if (m->inUsersGroups(rand_s[h], (it->second)) ) { tempOutput += "\"" +it->first + 
"\"" + ','; } } } tempOutput = tempOutput.substr(0, tempOutput.length()-1); tempOutput += ")\n"; cout << tempOutput; } if (subject) { string tempOutput = "group <- c("; for (int h = 0; h < groups.size(); h++) { tempOutput += "\"" +groups[h] + "\"" + ','; } tempOutput = tempOutput.substr(0, tempOutput.length()-1); tempOutput += ")\n"; cout << tempOutput; }*/ //print data frame tempOutput = "dat <- data.frame("; for (map::iterator it = bins.begin(); it != bins.end(); it++) { if (m->control_pressed) { break; } tempOutput += "\"" + m->currentSharedBinLabels[it->first] + "\"=" + m->currentSharedBinLabels[it->first] + ","; } //tempOutput = tempOutput.substr(0, tempOutput.length()-1); tempOutput += " class=treatments"; //if (subclass != "") { tempOutput += ", subclass=sub"; } //if (subject) { tempOutput += ", subject=group"; } tempOutput += ")\n"; cout << tempOutput; tempOutput = "z <- suppressWarnings(mylda(as.formula(class ~ "; for (map::iterator it = bins.begin(); it != bins.end(); it++) { if (m->control_pressed) { break; } tempOutput += m->currentSharedBinLabels[it->first] + "+"; } tempOutput = tempOutput.substr(0, tempOutput.length()-1); //rip off extra plus sign tempOutput += "), data = dat, tol = 1e-10))"; cout << tempOutput + "\nz\n"; cout << "w <- z$scaling[,1]\n"; //robjects.r('w <- z$scaling[,1]') cout << "w.unit <- w/sqrt(sum(w^2))\n"; //robjects.r('w.unit <- w/sqrt(sum(w^2))') cout << "ss <- dat[,-match(\"class\",colnames(dat))]\n"; //robjects.r('ss <- sub_d[,-match("class",colnames(sub_d))]') //if (subclass != "") { cout << "ss <- ss[,-match(\"subclass\",colnames(ss))]\n"; }//robjects.r('ss <- ss[,-match("subclass",colnames(ss))]') //if (subject) { cout << "ss <- ss[,-match(\"subject\",colnames(ss))]\n"; }//robjects.r('ss <- ss[,-match("subject",colnames(ss))]') cout << "xy.matrix <- as.matrix(ss)\n"; //robjects.r('xy.matrix <- as.matrix(ss)') cout << "LD <- xy.matrix%*%w.unit\n"; //robjects.r('LD <- xy.matrix%*%w.unit') cout << "effect.size <- 
abs(mean(LD[dat[,\"class\"]==\"'+p[0]+'\"]) - mean(LD[dat[,\"class\"]==\"'+p[1]+'\"]))\n"; //robjects.r('effect.size <- abs(mean(LD[sub_d[,"class"]=="'+p[0]+'"]) - mean(LD[sub_d[,"class"]=="'+p[1]+'"]))') cout << "wfinal <- w.unit * effect.size\n"; //scal = robjects.r('wfinal <- w.unit * effect.size') cout << "mm <- z$means\n"; //rres = robjects.r('mm <- z$means') return true; } catch(exception& e) { m->errorOut(e, "LefseCommand", "printToCoutForRTesting"); exit(1); } } //********************************************************************************************************************** int LefseCommand::makeShared(int numDesignLines) { try { ifstream in; m->openInputFile(sharedfile, in); vector< vector > lines; for(int i = 0; i < numDesignLines; i++) { if (m->control_pressed) { return 0; } string line = m->getline(in); cout << line << endl; vector pieces = m->splitWhiteSpace(line); lines.push_back(pieces); } ofstream out; m->openOutputFile(sharedfile+".design", out); out << "group"; for (int j = 0; j < lines.size(); j++) { out << '\t' << lines[j][0]; } out << endl; for (int j = 1; j < lines[0].size(); j++) { out <<(j-1); for (int i = 0; i < lines.size(); i++) { out << '\t' << lines[i][j]; } out << endl; } out.close(); DesignMap design(sharedfile+".design"); vector lookup; for (int k = 0; k < lines[0].size()-1; k++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel("0.03"); temp->setGroup(toString(k)); lookup.push_back(temp); } m->currentSharedBinLabels.clear(); int count = 0; while (!in.eof()) { if (m->control_pressed) { return 0; } string line = m->getline(in); vector pieces = m->splitWhiteSpace(line); float sum = 0.0; for (int i = 1; i < pieces.size(); i++) { float value; m->mothurConvert(pieces[i], value); sum += value; } if (sum != 0.0) { //cout << count << '\t'; for (int i = 1; i < pieces.size(); i++) { float value; m->mothurConvert(pieces[i], value); lookup[i-1]->push_back(value, toString(i-1)); //cout << pieces[i] << '\t'; } 
m->currentSharedBinLabels.push_back(toString(count)); //m->currentBinLabels.push_back(pieces[0]); //cout << line<< endl; //cout << endl; } count++; } in.close(); for (int k = 0; k < lookup.size(); k++) { //cout << "0.03" << '\t' << toString(k) << endl; lookup[k]->print(cout); } process(lookup, design); return 0; } catch(exception& e) { m->errorOut(e, "LefseCommand", "makeShared"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/lefsecommand.h000066400000000000000000000106071255543666200210510ustar00rootroot00000000000000// // lefsecommand.h // Mothur // // Created by SarahsWork on 6/12/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #ifndef __Mothur__lefsecommand__ #define __Mothur__lefsecommand__ #include "command.hpp" /* Columns = groups, rows are OTUs, class = design From http://huttenhower.sph.harvard.edu/galaxy/root?tool_id=lefse_upload Input data consist of a collection of m samples (columns) each made up of n numerical features (rows, typically normalized per-sample, red representing high values and green low). These samples are labeled with a class (taking two or more possible values) that represents the main biological hypothesis under investigation; they may also have one or more subclass labels reflecting within-class groupings. Step 1: the Kruskal-Wallis test analyzes all features, testing whether the values in different classes are differentially distributed. Features violating the null hypothesis are further analyzed in Step 2. Step 2: the pairwise Wilcoxon test checks whether all pairwise comparisons between subclasses within different classes significantly agree with the class level trend. Step 3: the resulting subset of vectors is used to build a Linear Discriminant Analysis model from which the relative difference among classes is used to rank the features. 
The final output thus consists of a list of features that are discriminative with respect to the classes, consistent with the subclass grouping within classes, and ranked according to the effect size with which they differentiate classes. */ #include "command.hpp" #include "inputdata.h" #include "designmap.h" /**************************************************************************************************/ class LefseCommand : public Command { public: LefseCommand(string); LefseCommand(); ~LefseCommand(){} vector setParameters(); string getCommandName() { return "lefse"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "Segata, N., J. Izard, L. Waldron, D. Gevers, L. Miropolsky, W. S. Garrett, and C. Huttenhower. 2011. Metagenomic biomarker discovery and explanation. Genome Biol 12:R60, http://www.mothur.org/wiki/Lefse"; } string getDescription() { return "brief description"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, allLines, wilc, wilcsamename, curv, subject, normMillion; string outputDir, sharedfile, designfile, mclass, subclass, rankTec, multiClassStrat; vector outputNames; set labels; double anovaAlpha, wilcoxonAlpha, fBoots, ldaThreshold; int nlogs, iters, strict, minC; int process(vector&, DesignMap&); int normalize(vector&); map runKruskalWallis(vector&, DesignMap&); map runWilcoxon(vector&, DesignMap&, map, map >& class2SubClasses, map >& subClass2GroupIndex, map); bool testOTUWilcoxon(map >& class2SubClasses, vector abunds, map >& subClass2GroupIndex, map); map testLDA(vector&, map, map >& class2GroupIndex, map >&); bool contastWithinClassesOrFewPerClass(vector< vector >&, vector rands, int minCl, map > class2GroupIndex, map indexToClass); vector< vector > lda(vector< vector >& adjustedLookup, vector rand_s, map& indexToClass, vector); vector< vector > getMeans(vector& lookup, map >& class2GroupIndex); int 
printResults(vector< vector >, map, map, string, vector); //for testing bool printToCoutForRTesting(vector< vector >& adjustedLookup, vector rand_s, map >& class2GroupIndex, map bins, map >&, vector); int makeShared(int); }; /**************************************************************************************************/ #endif /* defined(__Mothur__lefsecommand__) */ mothur-1.36.1/source/commands/libshuffcommand.cpp000066400000000000000000000574071255543666200221210ustar00rootroot00000000000000/* * libshuffcommand.cpp * Mothur * * Created by Sarah Westcott on 3/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class is designed to implement an integral form of the Cramer-von Mises statistic. you may refer to the "Integration of Microbial Ecology and Statistics: A Test To Compare Gene Libraries" paper in Applied and Environmental Microbiology, Sept. 2004, p. 5485-5492 0099-2240/04/$8.00+0 DOI: 10.1128/AEM.70.9.5485-5492.2004 Copyright 2004 American Society for Microbiology for more information. 
*/ #include "libshuffcommand.h" #include "libshuff.h" #include "slibshuff.h" #include "dlibshuff.h" //********************************************************************************************************************** vector LibShuffCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","coverage-libshuffsummary",false,true,true); parameters.push_back(pphylip); CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter piters("iters", "Number", "", "10000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pstep("step", "Number", "", "0.01", "", "", "","",false,false); parameters.push_back(pstep); CommandParameter pcutoff("cutoff", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pcutoff); CommandParameter pform("form", "Multiple", "discrete-integral", "integral", "", "", "","",false,false); parameters.push_back(pform); CommandParameter psim("sim", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(psim); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string LibShuffCommand::getHelpString(){ try { string helpString = ""; helpString 
+= "The libshuff command parameters are phylip, group, sim, groups, iters, step, form and cutoff. The phylip and group parameters are required, unless you have valid current files.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 2 valid groups.\n"; helpString += "The group names are separated by dashes. The iters parameter allows you to specify how many random matrices you would like compared to your matrix.\n"; helpString += "The step parameter allows you to specify the change in distance you would like between each output if you are using the discrete form.\n"; helpString += "The form parameter allows you to specify if you would like to analyze your matrix using the discrete or integral form. Your options are integral or discrete.\n"; helpString += "The libshuff command should be in the following format: libshuff(groups=yourGroups, iters=yourIters, cutoff=yourCutOff, form=yourForm, step=yourStep).\n"; helpString += "Example libshuff(groups=A-B-C, iters=500, form=discrete, step=0.01, cutoff=2.0).\n"; helpString += "The default value for groups is all the groups in your groupfile, iters is 10000, cutoff is 1.0, form is integral and step is 0.01.\n"; helpString += "The libshuff command outputs two files: .libshuff.coverage and .libshuff.summary; their descriptions are in the manual.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
iters), '=' and parameters (i.e.yourIters).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string LibShuffCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "coverage") { pattern = "[filename],libshuff.coverage"; } else if (type == "libshuffsummary") { pattern = "[filename],libshuff.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** LibShuffCommand::LibShuffCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["coverage"] = tempOutNames; outputTypes["libshuffsummary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "LibShuffCommand"); exit(1); } } //********************************************************************************************************************** LibShuffCommand::LibShuffCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["coverage"] = tempOutNames; 
outputTypes["libshuffsummary"] = tempOutNames; string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } } //check for required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You must provide a phylip file."); m->mothurOutEndLine(); abort = true; } }else { m->setPhylipFile(phylipfile); } //check for required parameters groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You must provide a group file."); m->mothurOutEndLine(); abort = true; } }else { m->setGroupFile(groupfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(phylipfile); //if user entered 
a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point should added some additional type checking... groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; savegroups = groups; } else { savegroups = groups; m->splitAtDash(groups, Groups); m->setGroups(Groups); } string temp; temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "10000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "1.0"; } m->mothurConvert(temp, cutOff); temp = validParameter.validFile(parameters, "step", false); if (temp == "not found") { temp = "0.01"; } m->mothurConvert(temp, step); temp = validParameter.validFile(parameters, "sim", false); if (temp == "not found") { temp = "F"; } sim = m->isTrue(temp); userform = validParameter.validFile(parameters, "form", false); if (userform == "not found") { userform = "integral"; } } } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "LibShuffCommand"); exit(1); } } //********************************************************************************************************************** int LibShuffCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //read files groupMap = new GroupMap(groupfile); int error = groupMap->readMap(); if (error == 1) { delete groupMap; return 0; } ifstream in; m->openInputFile(phylipfile, in); matrix = new FullMatrix(in, groupMap, sim); //reads the matrix file in.close(); if (m->control_pressed) { delete groupMap; delete matrix; return 0; } //if files don't match... 
if (matrix->getNumSeqs() < groupMap->getNumSeqs()) { m->mothurOut("Your distance file contains " + toString(matrix->getNumSeqs()) + " sequences, and your group file contains " + toString(groupMap->getNumSeqs()) + " sequences."); m->mothurOutEndLine(); //create new group file if(outputDir == "") { outputDir += m->hasPath(groupfile); } string newGroupFile = outputDir + m->getRootName(m->getSimpleName(groupfile)) + "editted.groups"; outputNames.push_back(newGroupFile); ofstream outGroups; m->openOutputFile(newGroupFile, outGroups); for (int i = 0; i < matrix->getNumSeqs(); i++) { if (m->control_pressed) { delete groupMap; delete matrix; outGroups.close(); m->mothurRemove(newGroupFile); return 0; } Names temp = matrix->getRowInfo(i); outGroups << temp.seqName << '\t' << temp.groupName << endl; } outGroups.close(); m->mothurOut(newGroupFile + " is a new group file containing only the sequences that are in your distance file. I will read this file instead."); m->mothurOutEndLine(); //read new groupfile delete groupMap; groupfile = newGroupFile; groupMap = new GroupMap(groupfile); groupMap->readMap(); if (m->control_pressed) { delete groupMap; delete matrix; m->mothurRemove(newGroupFile); return 0; } } setGroups(); //set the groups to be analyzed and sorts them if (numGroups < 2) { m->mothurOut("[ERROR]: libshuff requires at least 2 groups, you only have " + toString(numGroups) + ", aborting."); m->mothurOutEndLine(); m->control_pressed = true; } if (m->control_pressed) { delete groupMap; delete matrix; return 0; } /********************************************************************************************/ //this is needed because when we read the matrix we sort it into groups in alphabetical order //the rest of the command and the classes used in this command assume specific order /********************************************************************************************/ matrix->setGroups(groupMap->getNamesOfGroups()); vector sizes; for (int i = 0; i < 
(groupMap->getNamesOfGroups()).size(); i++) { sizes.push_back(groupMap->getNumSeqs((groupMap->getNamesOfGroups())[i])); } matrix->setSizes(sizes); if(userform == "discrete"){ form = new DLibshuff(matrix, iters, step, cutOff); } else{ form = new SLibshuff(matrix, iters, cutOff); } savedDXYValues = form->evaluateAll(); savedMinValues = form->getSavedMins(); if (m->control_pressed) { delete form; m->clearGroups(); delete matrix; delete groupMap; return 0; } pValueCounts.resize(numGroups); for(int i=0;icontrol_pressed) { outputTypes.clear(); delete form; m->clearGroups(); delete matrix; delete groupMap; return 0; } Progress* reading = new Progress(); for(int i=0;icontrol_pressed) { outputTypes.clear(); delete form; m->clearGroups(); delete matrix; delete groupMap; delete reading; return 0; } reading->newLine(groupNames[i]+'-'+groupNames[j], iters); int spoti = groupMap->groupIndex[groupNames[i]]; //neccessary in case user selects groups so you know where they are in the matrix int spotj = groupMap->groupIndex[groupNames[j]]; for(int p=0;pcontrol_pressed) { outputTypes.clear(); delete form; m->clearGroups(); delete matrix; delete groupMap; delete reading; return 0; } form->randomizeGroups(spoti,spotj); if(form->evaluatePair(spoti,spotj) >= savedDXYValues[spoti][spotj]) { pValueCounts[i][j]++; } if(form->evaluatePair(spotj,spoti) >= savedDXYValues[spotj][spoti]) { pValueCounts[j][i]++; } if (m->control_pressed) { outputTypes.clear(); delete form; m->clearGroups(); delete matrix; delete groupMap; delete reading; return 0; } reading->update(p); } form->resetGroup(spoti); form->resetGroup(spotj); } } if (m->control_pressed) { outputTypes.clear(); delete form; m->clearGroups(); delete matrix; delete groupMap; delete reading; return 0; } reading->finish(); delete reading; m->mothurOutEndLine(); printSummaryFile(); printCoverageFile(); //clear out users groups m->clearGroups(); delete form; delete matrix; delete groupMap; if (m->control_pressed) { outputTypes.clear(); for (int 
i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "execute"); exit(1); } } //********************************************************************************************************************** int LibShuffCommand::printCoverageFile() { try { ofstream outCov; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipfile)); summaryFile = getOutputFileName("coverage", variables); m->openOutputFile(summaryFile, outCov); outputNames.push_back(summaryFile); outputTypes["coverage"].push_back(summaryFile); outCov.setf(ios::fixed, ios::floatfield); outCov.setf(ios::showpoint); //cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); map > allDistances; map >::iterator it; vector > indices(numGroups); int numIndices = numGroups * numGroups; int index = 0; for(int i=0;igroupIndex[groupNames[i]]; //neccessary in case user selects groups so you know where they are in the matrix int spotj = groupMap->groupIndex[groupNames[j]]; for(int k=0;kcontrol_pressed) { outCov.close(); return 0; } if(allDistances[savedMinValues[spoti][spotj][k]].size() != 0){ allDistances[savedMinValues[spoti][spotj][k]][indices[i][j]]++; } else{ allDistances[savedMinValues[spoti][spotj][k]].assign(numIndices, 0); allDistances[savedMinValues[spoti][spotj][k]][indices[i][j]] = 1; } } } } it=allDistances.begin(); //cout << setprecision(8); vector prevRow = it->second; it++; for(;it!=allDistances.end();it++){ for(int i=0;isecond.size();i++){ it->second[i] += prevRow[i]; } prevRow = it->second; } vector lastRow = allDistances.rbegin()->second; outCov << setprecision(8); outCov << "dist"; for (int i = 0; i < numGroups; i++){ outCov << '\t' << groupNames[i]; } 
for (int i=0;icontrol_pressed) { outCov.close(); return 0; } outCov << '\t' << groupNames[i] << '-' << groupNames[j] << '\t'; outCov << groupNames[j] << '-' << groupNames[i]; } } outCov << endl; for(it=allDistances.begin();it!=allDistances.end();it++){ outCov << it->first << '\t'; for(int i=0;isecond[indices[i][i]]/(float)lastRow[indices[i][i]] << '\t'; } for(int i=0;icontrol_pressed) { outCov.close(); return 0; } outCov << it->second[indices[i][j]]/(float)lastRow[indices[i][j]] << '\t'; outCov << it->second[indices[j][i]]/(float)lastRow[indices[j][i]] << '\t'; } } outCov << endl; } outCov.close(); return 0; } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "printCoverageFile"); exit(1); } } //********************************************************************************************************************** int LibShuffCommand::printSummaryFile() { try { ofstream outSum; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipfile)); summaryFile = getOutputFileName("libshuffsummary",variables); m->openOutputFile(summaryFile, outSum); outputNames.push_back(summaryFile); outputTypes["libshuffsummary"].push_back(summaryFile); outSum.setf(ios::fixed, ios::floatfield); outSum.setf(ios::showpoint); cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); cout << setw(20) << left << "Comparison" << '\t' << setprecision(8) << "dCXYScore" << '\t' << "Significance" << endl; m->mothurOutJustToLog("Comparison\tdCXYScore\tSignificance"); m->mothurOutEndLine(); outSum << setw(20) << left << "Comparison" << '\t' << setprecision(8) << "dCXYScore" << '\t' << "Significance" << endl; int precision = (int)log10(iters); for(int i=0;icontrol_pressed) { outSum.close(); return 0; } int spoti = groupMap->groupIndex[groupNames[i]]; //neccessary in case user selects groups so you know where they are in the matrix int spotj = groupMap->groupIndex[groupNames[j]]; if(pValueCounts[i][j]){ cout << setw(20) << left << 
groupNames[i]+'-'+groupNames[j] << '\t' << setprecision(8) << savedDXYValues[spoti][spotj] << '\t' << setprecision(precision) << pValueCounts[i][j]/(float)iters << endl; m->mothurOutJustToLog(groupNames[i]+"-"+groupNames[j] + "\t" + toString(savedDXYValues[spoti][spotj]) + "\t" + toString((pValueCounts[i][j]/(float)iters))); m->mothurOutEndLine(); outSum << setw(20) << left << groupNames[i]+'-'+groupNames[j] << '\t' << setprecision(8) << savedDXYValues[spoti][spotj] << '\t' << setprecision(precision) << pValueCounts[i][j]/(float)iters << endl; } else{ cout << setw(20) << left << groupNames[i]+'-'+groupNames[j] << '\t' << setprecision(8) << savedDXYValues[spoti][spotj] << '\t' << '<' <mothurOutJustToLog(groupNames[i]+"-"+groupNames[j] + "\t" + toString(savedDXYValues[spoti][spotj]) + "\t" + toString((1/(float)iters))); m->mothurOutEndLine(); outSum << setw(20) << left << groupNames[i]+'-'+groupNames[j] << '\t' << setprecision(8) << savedDXYValues[spoti][spotj] << '\t' << '<' <mothurOutJustToLog(groupNames[j]+"-"+groupNames[i] + "\t" + toString(savedDXYValues[spotj][spoti]) + "\t" + toString((pValueCounts[j][i]/(float)iters))); m->mothurOutEndLine(); outSum << setw(20) << left << groupNames[j]+'-'+groupNames[i] << '\t' << setprecision(8) << savedDXYValues[spotj][spoti] << '\t' << setprecision (precision) << pValueCounts[j][i]/(float)iters << endl; } else{ cout << setw(20) << left << groupNames[j]+'-'+groupNames[i] << '\t' << setprecision(8) << savedDXYValues[spotj][spoti] << '\t' << '<' <mothurOutJustToLog(groupNames[j]+"-"+groupNames[i] + "\t" + toString(savedDXYValues[spotj][spoti]) + "\t" + toString((1/(float)iters))); m->mothurOutEndLine(); outSum << setw(20) << left << groupNames[j]+'-'+groupNames[i] << '\t' << setprecision(8) << savedDXYValues[spotj][spoti] << '\t' << '<' <errorOut(e, "LibShuffCommand", "printSummaryFile"); exit(1); } } //********************************************************************************************************************** void 
LibShuffCommand::setGroups() { try { vector myGroups = m->getGroups(); //if the user has not entered specific groups to analyze then do them all if (m->getNumGroups() == 0) { numGroups = groupMap->getNumGroups(); for (int i=0; i < numGroups; i++) { myGroups.push_back((groupMap->getNamesOfGroups())[i]); } } else { if (savegroups != "all") { //check that groups are valid for (int i = 0; i < myGroups.size(); i++) { if (groupMap->isValidGroup(myGroups[i]) != true) { m->mothurOut(myGroups[i] + " is not a valid group, and will be disregarded."); m->mothurOutEndLine(); // erase the invalid group from myGroups, then step back so the element that shifted into slot i is also checked myGroups.erase(myGroups.begin()+i); i--; } } //if the user entered fewer than 2 valid groups, fall back to all groups if ((myGroups.size() == 0) || (myGroups.size() == 1)) { numGroups = groupMap->getNumGroups(); myGroups.clear(); for (int i=0; i < numGroups; i++) { myGroups.push_back((groupMap->getNamesOfGroups())[i]); } m->mothurOut("When using the groups parameter you must have at least 2 valid groups. I will run the command using all the groups in your groupfile."); m->mothurOutEndLine(); } else { numGroups = myGroups.size(); } } else { //user wants all groups numGroups = groupMap->getNumGroups(); myGroups.clear(); for (int i=0; i < numGroups; i++) { myGroups.push_back((groupMap->getNamesOfGroups())[i]); } } } //sort so labels match sort(myGroups.begin(), myGroups.end()); //sort //sort(groupMap->namesOfGroups.begin(), groupMap->namesOfGroups.end()); for (int i = 0; i < (groupMap->getNamesOfGroups()).size(); i++) { groupMap->groupIndex[(groupMap->getNamesOfGroups())[i]] = i; } groupNames = myGroups; m->setGroups(myGroups); } catch(exception& e) { m->errorOut(e, "LibShuffCommand", "setGroups"); exit(1); } } /***********************************************************/ mothur-1.36.1/source/commands/libshuffcommand.h000066400000000000000000000035021255543666200215510ustar00rootroot00000000000000#ifndef LIBSHUFFCOMMAND_H #define LIBSHUFFCOMMAND_H /* * libshuffcommand.h * Mothur * * Created by Sarah Westcott on 
3/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "fullmatrix.h" #include "libshuff.h" #include "groupmap.h" class LibShuffCommand : public Command { public: LibShuffCommand(string); LibShuffCommand(); ~LibShuffCommand(){}; vector setParameters(); string getCommandName() { return "libshuff"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Singleton DR, Furlong MA, Rathbun SL, Whitman WB (2001). Quantitative comparisons of 16S rRNA gene sequence libraries from environmental samples. Appl Environ Microbiol 67: 4374-6. \nSchloss PD, Larget BR, Handelsman J (2004). Integration of microbial ecology and statistics: a test to compare gene libraries. Appl Environ Microbiol 70: 5485-92. \nhttp://www.mothur.org/wiki/Libshuff"; } string getDescription() { return "a generic test that describes whether two or more communities have the same structure using the Cramer-von Mises test statistic"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector groupNames; void setGroups(); int printCoverageFile(); int printSummaryFile(); GroupMap* groupMap; FullMatrix* matrix; Libshuff* form; float cutOff, step; int numGroups, numComp, iters; string coverageFile, summaryFile, phylipfile, groupfile; vector > pValueCounts; vector > savedDXYValues; vector > > savedMinValues; bool abort, sim; string outputFile, groups, userform, savegroups, outputDir; vector Groups, outputNames; //holds groups to be used }; #endif mothur-1.36.1/source/commands/listotulabelscommand.cpp000066400000000000000000000571471255543666200232060ustar00rootroot00000000000000// // listotucommand.cpp // Mothur // // Created by Sarah Westcott on 5/15/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "listotulabelscommand.h" #include "inputdata.h" //********************************************************************************************************************** vector ListOtuLabelsCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "SharedRel", "SharedRel", "none","otulabels",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRel", "SharedRel", "none","otulabels",false,false); parameters.push_back(prelabund); CommandParameter plist("list", "InputTypes", "", "", "SharedRel", "SharedRel", "none","otulabels",false,false); parameters.push_back(plist); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); //every command must have inputdir and outputdir. This allows mothur users to redirect input and output files. CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ListOtuLabelsCommand::getHelpString(){ try { string helpString = ""; helpString += "The list.otulabels lists otu labels from shared, relabund or list file. 
The results can be used by the get.otulabels to select specific otus with the output from classify.otu, otu.association, or corr.axes.\n"; helpString += "The list.otulabels parameters are: shared, relabund, list, label and groups.\n"; helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like analyzed.\n"; helpString += "The list.otulabels command should be in the following format: \n"; helpString += "list.otulabels(shared=yourSharedFile, groups=yourGroup1-yourGroup2)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ListOtuLabelsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "otulabels") { pattern = "[filename],[distance],otulabels"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ListOtuLabelsCommand::ListOtuLabelsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["otulabels"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "ListOtuLabelsCommand"); exit(1); } } //********************************************************************************************************************** ListOtuLabelsCommand::ListOtuLabelsCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; 
calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { //edit file types below to include only the types you added as parameters string path; it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["otulabels"] = tempOutNames; //check for parameters sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputFileName = sharedfile; format = "sharedfile"; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { inputFileName = relabundfile; format = "relabund"; m->setRelAbundFile(relabundfile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { inputFileName = listfile; format = "list"; m->setListFile(listfile); } if ((relabundfile == "") && (sharedfile == "") && (listfile== "")) { //is there are current file available for either of these? //give priority to shared, then relabund sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFileName = sharedfile; format="sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFileName = relabundfile; format="relabund"; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { listfile = m->getListFile(); if (listfile != "") { inputFileName = listfile; format="list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a shared, list or relabund."); m->mothurOutEndLine(); abort = true; } } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputFileName); //if user entered a file with a path then preserve it } string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } } } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "ListOtuLabelsCommand"); exit(1); } } //********************************************************************************************************************** int ListOtuLabelsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } InputData input(inputFileName, format); if (format == "relabund") { vector lookup = input.getSharedRAbundFloatVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createList(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createList(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //get next line to process lookup = input.getSharedRAbundFloatVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createList(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } }else if (format == "sharedfile") { vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createList(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createList(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { 
m->mothurRemove(outputNames[i]); } return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); createList(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } }else { ListVector* list = input.getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); createList(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); createList(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); //prevent memory leak delete list; list = NULL; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //get next line to process list = input.getListVector(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input.getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); createList(list); delete list; } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int ListOtuLabelsCommand::createList(vector& lookup){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("otulabels",variables); outputNames.push_back(outputFileName); outputTypes["otulabels"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); for (int i = 0; i < m->currentSharedBinLabels.size(); i++) { out << m->currentSharedBinLabels[i] << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "createList"); exit(1); } } //********************************************************************************************************************** int ListOtuLabelsCommand::createList(vector& lookup){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("otulabels",variables); outputNames.push_back(outputFileName); outputTypes["otulabels"].push_back(outputFileName); ofstream out; 
m->openOutputFile(outputFileName, out); for (int i = 0; i < m->currentSharedBinLabels.size(); i++) { out << m->currentSharedBinLabels[i] << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "createList"); exit(1); } } //********************************************************************************************************************** int ListOtuLabelsCommand::createList(ListVector*& list){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[distance]"] = list->getLabel(); string outputFileName = getOutputFileName("otulabels",variables); outputNames.push_back(outputFileName); outputTypes["otulabels"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); vector binLabels = list->getLabels(); for (int i = 0; i < binLabels.size(); i++) { out << binLabels[i] << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListOtuLabelsCommand", "createList"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/listotulabelscommand.h000066400000000000000000000033501255543666200226360ustar00rootroot00000000000000#ifndef Mothur_listotucommand_h #define Mothur_listotucommand_h // // listotucommand.h // Mothur // // Created by Sarah Westcott on 5/15/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "command.hpp" #include "sharedrabundvector.h" #include "listvector.hpp" /**************************************************************************************************/ class ListOtuLabelsCommand : public Command { public: ListOtuLabelsCommand(string); ListOtuLabelsCommand(); ~ListOtuLabelsCommand(){} vector setParameters(); string getCommandName() { return "list.otulabels"; } string getCommandCategory() { return "OTU-Based Approaches"; } //command category choices: Sequence Processing, OTU-Based Approaches, Hypothesis Testing, Phylotype Analysis, General, Clustering and Hidden string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/List.otulabels"; } string getDescription() { return "lists otu labels from shared, relabund or list file. Can be used by get.otulabels with output from classify.otu, otu.association, or corr.axes to select specific otus."; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, allLines; string outputDir, sharedfile, relabundfile, label, inputFileName, format, listfile; vector outputNames; vector Groups; set labels; int createList(vector&); int createList(vector&); int createList(ListVector*&); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/listseqscommand.cpp000066400000000000000000000447361255543666200221670ustar00rootroot00000000000000/* * listseqscommand.cpp * Mothur * * Created by Sarah Westcott on 7/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "listseqscommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "counttable.h" //********************************************************************************************************************** vector ListSeqsCommand::setParameters(){ try { CommandParameter pfastq("fastq", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(pfastq); CommandParameter pfasta("fasta", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false,true); parameters.push_back(ptaxonomy); CommandParameter palignreport("alignreport", "InputTypes", "", "", "FNGLT", "FNGLT", "none","accnos",false,false); parameters.push_back(palignreport); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "setParameters"); exit(1); } } 
//********************************************************************************************************************** string ListSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The list.seqs command reads a fasta, name, group, count, list, taxonomy, fastq or alignreport file and outputs a .accnos file containing sequence names.\n"; helpString += "The list.seqs command parameters are fasta, name, group, count, list, taxonomy, fastq and alignreport. You must provide one of these parameters.\n"; helpString += "The list.seqs command should be in the following format: list.seqs(fasta=yourFasta).\n"; helpString += "Example list.seqs(fasta=amazon.fasta).\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ListSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "accnos") { pattern = "[filename],accnos"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ListSeqsCommand::ListSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["accnos"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "ListSeqsCommand"); exit(1); } } //********************************************************************************************************************** ListSeqsCommand::ListSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help 
if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["accnos"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("alignreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["alignreport"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("fastq"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["fastq"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } alignfile = validParameter.validFile(parameters, "alignreport", true); if (alignfile == "not open") { abort = true; } else if (alignfile == "not found") { alignfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } fastqfile = validParameter.validFile(parameters, "fastq", true); if (fastqfile == "not open") { abort = true; } else if (fastqfile == "not found") { fastqfile = ""; } if ((fastqfile == "") && (countfile == "") && (fastafile == "") && (namefile == "") && (listfile == "") && (groupfile == "") && (alignfile == "") && (taxfile == "")) { m->mothurOut("You must provide a file."); m->mothurOutEndLine(); abort = true; } int okay = 1; if (outputDir != "") { okay++; } if (inputDir != "") { okay++; } if 
(parameters.size() > okay) { m->mothurOut("You may only enter one file."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "ListSeqsCommand"); exit(1); } } //********************************************************************************************************************** int ListSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //read functions fill names vector if (fastafile != "") { inputFileName = fastafile; readFasta(); } else if (fastqfile != "") { inputFileName = fastqfile; readFastq(); } else if (namefile != "") { inputFileName = namefile; readName(); } else if (groupfile != "") { inputFileName = groupfile; readGroup(); } else if (alignfile != "") { inputFileName = alignfile; readAlign(); } else if (listfile != "") { inputFileName = listfile; readList(); } else if (taxfile != "") { inputFileName = taxfile; readTax(); } else if (countfile != "") { inputFileName = countfile; readCount(); } if (m->control_pressed) { outputTypes.clear(); return 0; } //sort in alphabetical order sort(names.begin(), names.end()); if (outputDir == "") { outputDir += m->hasPath(inputFileName); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); string outputFileName = getOutputFileName("accnos", variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["accnos"].push_back(outputFileName); //output to .accnos file for (int i = 0; i < names.size(); i++) { if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(outputFileName); return 0; } out << names[i] << endl; } out.close(); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFileName); return 0; } m->setAccnosFile(outputFileName); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); m->mothurOutEndLine(); 
//set accnos file as new current accnosfile
		string current = "";
		itTypes = outputTypes.find("accnos");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); }
		}
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ListSeqsCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
int ListSeqsCommand::readFastq(){
	try {
		
		ifstream in;
		m->openInputFile(fastqfile, in);
		string name;
		
		//ofstream out;
		//string newFastaName = outputDir + m->getRootName(m->getSimpleName(fastafile)) + "numsAdded.fasta";
		//m->openOutputFile(newFastaName, out);
		int count = 1;
		//string lastName = "";
		
		while(!in.eof()){
			
			if (m->control_pressed) { in.close(); return 0; }
			
			//read the header line; the sequence name is its first whitespace-delimited token, minus the leading '@'
			string line = m->getline(in); m->gobble(in);
			
			if ((line != "") && (line[0] == '@')) {
				vector<string> splits = m->splitWhiteSpace(line);
				if (splits.size() != 0) { //guard against a header with no tokens
					name = splits[0].substr(1);
					m->checkName(name);
					names.push_back(name);
				}
				//skip the sequence, '+' and quality lines of this record
				m->getline(in); m->gobble(in);
				m->getline(in); m->gobble(in);
				m->getline(in); m->gobble(in);
			}
			
			m->gobble(in);
			if (m->debug) { count++; cout << "[DEBUG]: count = " + toString(count) + ", name = " + name + "\n"; }
		}
		in.close();
		//out.close();
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ListSeqsCommand", "readFastq");
		exit(1);
	}
}
//**********************************************************************************************************************
int ListSeqsCommand::readFasta(){
	try {
		
		ifstream in;
		m->openInputFile(fastafile, in);
		string name;
		
		//ofstream out;
		//string newFastaName = outputDir + m->getRootName(m->getSimpleName(fastafile)) + "numsAdded.fasta";
		//m->openOutputFile(newFastaName, out);
		int count = 1;
		//string lastName = "";
		
		while(!in.eof()){
			
			if (m->control_pressed) { in.close(); return 0; }
			
			Sequence currSeq(in);
			name = currSeq.getName();
			
			if (name != "") { names.push_back(name); }
			
			m->gobble(in);
			if 
(m->debug) { count++; cout << "[DEBUG]: count = " + toString(count) + ", name = " + currSeq.getName() + "\n"; } } in.close(); //out.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** int ListSeqsCommand::readList(){ try { ifstream in; m->openInputFile(listfile, in); if(!in.eof()){ //read in list vector ListVector list(in); //for each bin for (int i = 0; i < list.getNumBins(); i++) { string binnames = list.get(i); if (m->control_pressed) { in.close(); return 0; } m->splitAtComma(binnames, names); } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int ListSeqsCommand::readName(){ try { ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; while(!in.eof()){ if (m->control_pressed) { in.close(); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; //parse second column saving each name m->splitAtComma(secondCol, names); m->gobble(in); } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readName"); exit(1); } } //********************************************************************************************************************** int ListSeqsCommand::readGroup(){ try { ifstream in; m->openInputFile(groupfile, in); string name, group; while(!in.eof()){ if (m->control_pressed) { in.close(); return 0; } in >> name; m->gobble(in); //read from first column in >> group; //read from second column names.push_back(name); m->gobble(in); } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int 
ListSeqsCommand::readCount(){ try { CountTable ct; ct.readTable(countfile, false, false); if (m->control_pressed) { return 0; } names = ct.getNamesOfSeqs(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readCount"); exit(1); } } //********************************************************************************************************************** //alignreport file has a column header line then all other lines contain 16 columns. we just want the first column since that contains the name int ListSeqsCommand::readAlign(){ try { ifstream in; m->openInputFile(alignfile, in); string name, junk; //read column headers for (int i = 0; i < 16; i++) { if (!in.eof()) { in >> junk; } else { break; } } //m->getline(in); while(!in.eof()){ if (m->control_pressed) { in.close(); return 0; } in >> name; //read from first column //m->getline(in); //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; } else { break; } } names.push_back(name); m->gobble(in); } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readAlign"); exit(1); } } //********************************************************************************************************************** int ListSeqsCommand::readTax(){ try { ifstream in; m->openInputFile(taxfile, in); string name, firstCol, secondCol; while(!in.eof()){ if (m->control_pressed) { in.close(); return 0; } in >> firstCol; in >> secondCol; names.push_back(firstCol); m->gobble(in); } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "ListSeqsCommand", "readTax"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/listseqscommand.h000066400000000000000000000022631255543666200216210ustar00rootroot00000000000000#ifndef LISTSEQSCOMMAND_H #define LISTSEQSCOMMAND_H /* * listseqscommand.h * Mothur * * Created by Sarah Westcott on 7/8/09. 
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" class ListSeqsCommand : public Command { public: ListSeqsCommand(string); ListSeqsCommand(); ~ListSeqsCommand(){} vector setParameters(); string getCommandName() { return "list.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/List.seqs"; } string getDescription() { return "lists sequences from a list, fasta, name, group, alignreport or taxonomy file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector names, outputNames; string fastafile, namefile, groupfile, countfile, alignfile, inputFileName, outputDir, listfile, taxfile, fastqfile; bool abort; int readFasta(); int readName(); int readGroup(); int readAlign(); int readList(); int readTax(); int readCount(); int readFastq(); }; #endif mothur-1.36.1/source/commands/loadlogfilecommand.cpp000066400000000000000000000306341255543666200225710ustar00rootroot00000000000000// // loadlogfilecommand.cpp // Mothur // // Created by Sarah Westcott on 6/13/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "loadlogfilecommand.h" #include "commandoptionparser.hpp" #include "commandfactory.hpp" #include "setcurrentcommand.h" //********************************************************************************************************************** vector LoadLogfileCommand::setParameters(){ try { CommandParameter plogfile("logfile", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(plogfile); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "LoadLogfileCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string LoadLogfileCommand::getHelpString(){ try { string helpString = ""; helpString += "The load.logfile command extracts the current file names from a logfile.\n"; helpString += "The load.logfile parameter is logfile, and it is required.\n"; helpString += "The load.logfile command should be in the following format: \n"; helpString += "load.logfile(logfile=yourLogFile)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "LoadLogfileCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** LoadLogfileCommand::LoadLogfileCommand(){ try { abort = true; calledHelp = true; setParameters(); } catch(exception& e) { m->errorOut(e, "LoadLogfileCommand", "LoadLogfileCommand"); exit(1); } } 
//********************************************************************************************************************** LoadLogfileCommand::LoadLogfileCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("logfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["logfile"] = inputDir + it->second; }
				}
			}
			
			//get logfile, it is required
			logfile = validParameter.validFile(parameters, "logfile", true);
			if (logfile == "not open") { logfile = ""; abort = true; }
			else if (logfile == "not found") { m->mothurOut("The logfile parameter is required."); m->mothurOutEndLine(); abort = true; }
			
			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false);
			if (outputDir == "not found"){
				outputDir = m->hasPath(logfile); //if user entered a file with a path then preserve it
			}
		}
		
	}
	catch(exception& e) {
		m->errorOut(e, "LoadLogfileCommand", "LoadLogfileCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
int LoadLogfileCommand::execute(){
	try {
		
		if (abort == true) { if (calledHelp) { return 0; } return 2; }
		
		m->mothurOutEndLine();
		m->mothurOut("Extracting current file names...");
		m->mothurOutEndLine();
		m->mothurOutEndLine();
		
		CommandFactory* cFactory;
		cFactory = CommandFactory::getInstance();
		
		ifstream in;
		m->openInputFile(logfile, in);
		
		set<string> currentTypes = m->getCurrentTypes();
		map<string, string> currentFiles;
		string commandName = "";
		bool skip = false;
		string line = "";
		
		while (!in.eof()) {
			if (m->control_pressed) { break; }
			
			if (!skip) { line = m->getline(in); m->gobble(in); }
			m->gobble(in);
			
			//look for "mothur >"
			int pos = line.find("mothur > "); //command line
			int pos2 = line.find("Output File "); //indicates command completed and we can update the current file
			int pos3 = line.find("/*****************");
			
			//skipping over parts where a command runs another command
			if (pos3 != string::npos) {
				while (!in.eof()) {
					if (m->control_pressed) { break; }
					line = m->getline(in); m->gobble(in);
					int posTemp = line.find("/*****************");
					if (posTemp != string::npos) { break; }
				}
			}
			
			if (pos != string::npos) {
				skip=false;
				//extract command name and option 
string string input = line.substr(pos+9); CommandOptionParser parser(input); commandName = parser.getCommandString(); string options = parser.getOptionString(); //parse out parameters in option string map parameters; OptionParser optionParser(options, parameters); for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (currentTypes.count((it->first)) != 0) { //if this is a type we save if (it->second != "current") { currentFiles[it->first] = it->second; }//save the input file name as current } } }else if (pos2 != string::npos) { //read file output file names vector theseOutputNames; while (!in.eof()) { if (m->control_pressed) { break; } line = m->getline(in); m->gobble(in); int pos = line.find("mothur > "); if (pos != string::npos) { skip = true; break; } else { theseOutputNames.push_back(line); } } //ask command for the output names for each type based on inputs Command* command = cFactory->getCommand(commandName); map > thisOutputTypes = command->getOutputFiles(); for (map >::iterator it = thisOutputTypes.begin(); it != thisOutputTypes.end(); it++) { if (currentTypes.count((it->first)) != 0) { //do we save this type //if yes whats its tag map::iterator itCurrentFiles = currentFiles.find(it->first); string thisTypesCurrentFile = ""; if (itCurrentFiles != currentFiles.end()) { thisTypesCurrentFile = itCurrentFiles->second; } //outputfilename pattern for this input type string pattern = command->getOutputPattern(it->first); updateCurrent(pattern, it->first, thisTypesCurrentFile, theseOutputNames, currentFiles); //cout << "current=\n\n"; //for (map::iterator itcc = currentFiles.begin(); itcc != currentFiles.end(); itcc++) { // cout << itcc->first << '\t' << itcc->second << endl; // } } } } } in.close(); if (m->control_pressed) { return 0; } //output results string inputString = ""; for (map::iterator it = currentFiles.begin(); it != currentFiles.end(); it++) { inputString += it->first + "=" + it->second + ","; } if (inputString != "") { 
inputString = inputString.substr(0, inputString.length()-1);
			m->mothurOutEndLine();
			m->mothurOut("/******************************************/"); m->mothurOutEndLine();
			m->mothurOut("Running command: set.current(" + inputString + ")"); m->mothurOutEndLine();
			m->mothurCalling = true;
			
			Command* currentCommand = new SetCurrentCommand(inputString);
			currentCommand->execute();
			
			delete currentCommand;
			m->mothurCalling = false;
			m->mothurOut("/******************************************/"); m->mothurOutEndLine();
		}
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "LoadLogfileCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
int LoadLogfileCommand::updateCurrent(string pattern, string type, string thisTypesCurrentFile, vector<string> filenames, map<string, string>& currentFiles){
	try {
		
		vector<string> patterns;
		m->splitAtChar(pattern, patterns, '-');
		
		for (int i = 0; i < patterns.size(); i++) {
			
			vector<string> pieces;
			m->splitAtChar(patterns[i], pieces, ',');
			//cout << "patterns i = " << patterns[i] << endl;
			
			if (pieces.size() != 0) {
				string tag = pieces[pieces.size()-1];
				//cout << "tag = " << tag << endl;
				if (pieces[pieces.size()-1] == "[extension]") { tag = m->getExtension(thisTypesCurrentFile); }
				
				//search for the tag in the list of output files
				for (int h = 0; h < filenames.size(); h++) {
					
					if (m->control_pressed) { return 0; }
					//cout << "filename h = " << filenames[h]<< endl;
					
					//a filename shorter than the tag cannot end with it; skipping it also keeps substr from being called with an out-of-range position
					if (filenames[h].length() < tag.length()) { continue; }
					
					string ending = filenames[h].substr(filenames[h].length()-tag.length(), tag.length());
					if (ending == tag) { //if it's there and this is a type we save a current version of, save it
						if ((type == "column") || (type == "phylip")) { //check for format
							string RippedName = "";
							bool foundDot = false;
							//walk backwards to pull out the text between the last two dots of the filename
							for (int k = filenames[h].length()-1; k >= 0; k--) {
								if (foundDot && (filenames[h][k] != '.')) { RippedName = filenames[h][k] + RippedName; }
								else if (foundDot && (filenames[h][k] == '.')) { break; }
								else if (!foundDot && (filenames[h][k] == '.')) { 
foundDot = true; } } if ((RippedName == "phylip") || (RippedName == "lt") || (RippedName == "square")) { currentFiles["phylip"] = filenames[h]; } else { currentFiles["column"] = filenames[h]; } }else { currentFiles[type] = filenames[h]; } break; } } } } return 0; } catch(exception& e) { m->errorOut(e, "LoadLogfileCommand", "updateCurrent"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/loadlogfilecommand.h000066400000000000000000000024261255543666200222340ustar00rootroot00000000000000#ifndef Mothur_loadlogfilecommand_h #define Mothur_loadlogfilecommand_h // // loadlogfilecommand.h // Mothur // // Created by Sarah Westcott on 6/13/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "command.hpp" /**************************************************************************************************/ class LoadLogfileCommand : public Command { public: LoadLogfileCommand(string); LoadLogfileCommand(); ~LoadLogfileCommand(){} vector setParameters(); string getCommandName() { return "load.logfile"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string) { return ""; } string getCitation() { return "http://www.mothur.org/wiki/Load.logfile"; } string getDescription() { return "extracts current files from a logfile"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string outputDir, logfile; vector outputNames; int updateCurrent(string pattern, string type, string, vector outputNames, map& currentFiles); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/makebiomcommand.cpp000066400000000000000000002073531255543666200221000ustar00rootroot00000000000000// // makebiomcommand.cpp // Mothur // // Created by Sarah Westcott on 4/16/12. // Copyright (c) 2012 Schloss Lab. 
All rights reserved. // #include "makebiomcommand.h" #include "sharedrabundvector.h" #include "inputdata.h" #include "sharedutilities.h" #include "phylotree.h" //taken from http://biom-format.org/documentation/biom_format.html /* Minimal Sparse { "id":null, "format": "Biological Observation Matrix 0.9.1", "format_url": "http://biom-format.org", "type": "OTU table", "generated_by": "QIIME revision 1.4.0-dev", "date": "2011-12-19T19:00:00", "rows":[ {"id":"GG_OTU_1", "metadata":null}, {"id":"GG_OTU_2", "metadata":null}, {"id":"GG_OTU_3", "metadata":null}, {"id":"GG_OTU_4", "metadata":null}, {"id":"GG_OTU_5", "metadata":null} ], "columns": [ {"id":"Sample1", "metadata":null}, {"id":"Sample2", "metadata":null}, {"id":"Sample3", "metadata":null}, {"id":"Sample4", "metadata":null}, {"id":"Sample5", "metadata":null}, {"id":"Sample6", "metadata":null} ], "matrix_type": "sparse", "matrix_element_type": "int", "shape": [5, 6], "data":[[0,2,1], [1,0,5], [1,1,1], [1,3,2], [1,4,3], [1,5,1], [2,2,1], [2,3,4], [2,4,2], [3,0,2], [3,1,1], [3,2,1], [3,5,1], [4,1,1], [4,2,1] ] } */ /* Minimal dense { "id":null, "format": "Biological Observation Matrix 0.9.1", "format_url": "http://biom-format.org", "type": "OTU table", "generated_by": "QIIME revision 1.4.0-dev", "date": "2011-12-19T19:00:00", "rows":[ {"id":"GG_OTU_1", "metadata":null}, {"id":"GG_OTU_2", "metadata":null}, {"id":"GG_OTU_3", "metadata":null}, {"id":"GG_OTU_4", "metadata":null}, {"id":"GG_OTU_5", "metadata":null} ], "columns": [ {"id":"Sample1", "metadata":null}, {"id":"Sample2", "metadata":null}, {"id":"Sample3", "metadata":null}, {"id":"Sample4", "metadata":null}, {"id":"Sample5", "metadata":null}, {"id":"Sample6", "metadata":null} ], "matrix_type": "dense", "matrix_element_type": "int", "shape": [5,6], "data": [[0,0,1,0,0,0], [5,1,0,2,3,1], [0,0,1,4,2,0], [2,1,1,0,0,1], [0,1,1,0,0,0]] } */ //********************************************************************************************************************** vector 
MakeBiomCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "SharedRel", "SharedRel", "none","biom",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRel", "SharedRel", "none","biom",false,false,true); parameters.push_back(prelabund); CommandParameter pcontaxonomy("constaxonomy", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pcontaxonomy); CommandParameter preference("reftaxonomy", "InputTypes", "", "", "none", "none", "refPi","",false,false); parameters.push_back(preference); CommandParameter pmetadata("metadata", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pmetadata); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter ppicrust("picrust", "InputTypes", "", "", "none", "none", "refPi","shared",false,false); parameters.push_back(ppicrust); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); CommandParameter pmatrixtype("matrixtype", "Multiple", "sparse-dense", "sparse", "", "", "","",false,false); parameters.push_back(pmatrixtype); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MakeBiomCommand::getHelpString(){ try { string 
helpString = "";
		helpString += "The make.biom command parameters are shared, relabund, constaxonomy, metadata, groups, matrixtype, picrust, reftaxonomy and label. shared or relabund are required, unless you have a valid current file.\n";
		helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included. The group names are separated by dashes.\n";
		helpString += "The label parameter allows you to select what distance levels you would like, and are also separated by dashes.\n";
		helpString += "The matrixtype parameter allows you to select what type you would like to make. Choices are sparse and dense, default is sparse.\n";
		helpString += "The constaxonomy file is the taxonomy file output by classify.otu(list=yourListfile, taxonomy=yourTaxonomyFile). Be SURE that the constaxonomy file distance matches the shared file distance, i.e., for *.0.03.cons.taxonomy set label=0.03. Mothur is smart enough to handle shared files that have been subsampled. It is used to assign taxonomy information to the metadata of rows.\n";
		helpString += "The metadata parameter is used to provide experimental parameters to the columns. Things like 'sample1 gut human_gut'. \n";
		helpString += "The picrust parameter is used to provide the greengenes OTU IDs map table. NOTE: Picrust requires a greengenes taxonomy. \n";
		helpString += "The reftaxonomy parameter is used with the picrust parameter. Picrust requires the greengenes OTU IDs to be in the biom file. \n";
		helpString += "The make.biom command should be in the following format: make.biom(shared=yourShared, groups=yourGroups, label=yourLabels).\n";
		helpString += "Example make.biom(shared=abrecovery.an.shared, groups=A-B-C).\n";
		helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n";
		helpString += "The make.biom command outputs a .biom file.\n";
		helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e. yourGroups).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeBiomCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string MakeBiomCommand::getOutputPattern(string type) {
	try {
		string pattern = "";
		
		if (type == "biom") { pattern = "[filename],[distance],biom"; }
		else if (type == "shared") { pattern = "[filename],[distance],biom_shared"; }
		else if (type == "relabund") { pattern = "[filename],[distance],biom_relabund"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }
		
		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeBiomCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
MakeBiomCommand::MakeBiomCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["biom"] = tempOutNames;
		outputTypes["shared"] = tempOutNames;
		outputTypes["relabund"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeBiomCommand", "MakeBiomCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
MakeBiomCommand::MakeBiomCommand(string option) {
	try {
		abort = false; calledHelp = false;
		allLines = 1;
		
		//allow user to run help
		if(option == 
"help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["biom"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["relabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("constaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["constaxonomy"] = inputDir + it->second; } } it = parameters.find("reftaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["reftaxonomy"] = inputDir + it->second; } } it = parameters.find("picrust"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["picrust"] = inputDir + it->second; } } it = parameters.find("metadata"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["metadata"] = inputDir + it->second; } } } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { inputFileName = relabundfile; fileFormat = "relabund"; m->setRelAbundFile(relabundfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputFileName = sharedfile; fileFormat = "sharedfile"; m->setSharedFile(sharedfile); } if ((relabundfile == "") && (sharedfile == "")) { //is there are current file available for either of these? //give priority to shared, then relabund sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFileName = sharedfile; fileFormat="sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFileName = relabundfile; fileFormat="relabund"; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a shared or relabund."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputFileName); } contaxonomyfile = validParameter.validFile(parameters, "constaxonomy", true); if (contaxonomyfile == "not found") { contaxonomyfile = ""; } else if (contaxonomyfile == "not open") { contaxonomyfile = ""; abort = true; } referenceTax = validParameter.validFile(parameters, "reftaxonomy", true); if (referenceTax == "not found") { referenceTax = ""; } else if (referenceTax == "not open") { referenceTax = ""; abort = true; } picrustOtuFile = validParameter.validFile(parameters, "picrust", true); if (picrustOtuFile == "not found") { picrustOtuFile = ""; } else if (picrustOtuFile == "not open") { picrustOtuFile = ""; abort = true; } metadatafile = validParameter.validFile(parameters, "metadata", true); if (metadatafile == "not found") { metadatafile = ""; } else if (metadatafile == "not open") { metadatafile = ""; abort = true; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
            label = validParameter.validFile(parameters, "label", false);
            if (label == "not found") { label = ""; }
            else {
                if(label != "all") { m->splitAtDash(label, labels); allLines = 0; }
                else { allLines = 1; }
            }

            groups = validParameter.validFile(parameters, "groups", false);
            if (groups == "not found") { groups = ""; }
            else {
                m->splitAtDash(groups, Groups);
                m->setGroups(Groups);
            }

            if (picrustOtuFile != "") {
                picrust=true;
                if (contaxonomyfile == "") { m->mothurOut("[ERROR]: the constaxonomy parameter is required with the picrust parameter, aborting."); m->mothurOutEndLine(); abort = true; }
                if (referenceTax == "") { m->mothurOut("[ERROR]: the reftaxonomy parameter is required with the picrust parameter, aborting."); m->mothurOutEndLine(); abort = true; }
            }else { picrust=false; }

            if ((contaxonomyfile != "") && (labels.size() > 1)) { m->mothurOut("[ERROR]: the constaxonomy parameter cannot be used with multiple labels."); m->mothurOutEndLine(); abort = true; }

            format = validParameter.validFile(parameters, "matrixtype", false);
            if (format == "not found") { format = "sparse"; }

            if ((format != "sparse") && (format != "dense")) {
                m->mothurOut(format + " is not a valid option for the matrixtype parameter. 
Options are sparse and dense."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "MakeBiomCommand"); exit(1); } } //********************************************************************************************************************** int MakeBiomCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } vector lookup; vector lookupRel; string lastLabel; InputData input(inputFileName, fileFormat); if (fileFormat == "sharedfile") { lookup = input.getSharedRAbundVectors(); lastLabel = lookup[0]->getLabel(); getSampleMetaData(lookup); }else { lookupRel = input.getSharedRAbundFloatVectors(); lastLabel = lookupRel[0]->getLabel(); getSampleMetaData(lookupRel); } //if user did not specify a label, then use first one if ((contaxonomyfile != "") && (labels.size() == 0)) { allLines = 0; labels.insert(lastLabel); } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; if (fileFormat == "sharedfile") { //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); getBiom(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); getBiom(lookup); 
processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak and get next set for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } lookup = input.getSharedRAbundVectors(); } }else { //as long as you are not at the end of the file or done wih the lines you want while((lookupRel[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for (int i = 0; i < lookupRel.size(); i++) { delete lookupRel[i]; } return 0; } if(allLines == 1 || labels.count(lookupRel[0]->getLabel()) == 1){ m->mothurOut(lookupRel[0]->getLabel()); m->mothurOutEndLine(); getBiom(lookupRel); processedLabels.insert(lookupRel[0]->getLabel()); userLabels.erase(lookupRel[0]->getLabel()); } if ((m->anyLabelsToProcess(lookupRel[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupRel[0]->getLabel(); for (int i = 0; i < lookupRel.size(); i++) { delete lookupRel[i]; } lookupRel = input.getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookupRel[0]->getLabel()); m->mothurOutEndLine(); getBiom(lookupRel); processedLabels.insert(lookupRel[0]->getLabel()); userLabels.erase(lookupRel[0]->getLabel()); //restore real lastlabel to save below lookupRel[0]->setLabel(saveLabel); } lastLabel = lookupRel[0]->getLabel(); //prevent memory leak and get next set for (int i = 0; i < lookupRel.size(); i++) { delete lookupRel[i]; lookupRel[i] = NULL; } lookupRel = input.getSharedRAbundFloatVectors(); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); 
it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (fileFormat == "sharedfile") {
                for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } }
                lookup = input.getSharedRAbundVectors(lastLabel);

                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                getBiom(lookup);

                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
            }else {
                for (int i = 0; i < lookupRel.size(); i++) { if (lookupRel[i] != NULL) { delete lookupRel[i]; } }
                lookupRel = input.getSharedRAbundFloatVectors(lastLabel);

                m->mothurOut(lookupRel[0]->getLabel()); m->mothurOutEndLine();
                getBiom(lookupRel);

                for (int i = 0; i < lookupRel.size(); i++) { delete lookupRel[i]; }
            }
        }

        if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

        //set biom file as new current biom file
        string current = "";
        itTypes = outputTypes.find("biom");
        if (itTypes != outputTypes.end()) {
            if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setBiomFile(current); }
        }

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeBiomCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************
int MakeBiomCommand::getBiom(vector<SharedRAbundVector*>& lookup){
    try {
        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile));
        variables["[distance]"] = lookup[0]->getLabel();
        string outputFileName = 
getOutputFileName("biom",variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["biom"].push_back(outputFileName); string mothurString = "mothur" + toString(m->getVersion()); time_t rawtime; struct tm * timeinfo; time ( &rawtime ); timeinfo = localtime ( &rawtime ); string dateString = asctime (timeinfo); int pos = dateString.find('\n'); if (pos != string::npos) { dateString = dateString.substr(0, pos);} string spaces = " "; //standard out << "{\n" + spaces + "\"id\":\"" + m->getSimpleName(sharedfile) + "-" + lookup[0]->getLabel() + "\",\n" + spaces + "\"format\": \"Biological Observation Matrix 0.9.1\",\n" + spaces + "\"format_url\": \"http://biom-format.org\",\n"; out << spaces + "\"type\": \"OTU table\",\n" + spaces + "\"generated_by\": \"" << mothurString << "\",\n" + spaces + "\"date\": \"" << dateString << "\",\n"; vector metadata = getMetaData(lookup); int numBins = lookup[0]->getNumBins(); if (m->control_pressed) { out.close(); return 0; } //get row info /*"rows":[ {"id":"GG_OTU_1", "metadata":null}, {"id":"GG_OTU_2", "metadata":null}, {"id":"GG_OTU_3", "metadata":null}, {"id":"GG_OTU_4", "metadata":null}, {"id":"GG_OTU_5", "metadata":null} ],*/ out << spaces + "\"rows\":[\n"; string rowFront = spaces + spaces + "{\"id\":\""; string rowBack = "\", \"metadata\":"; for (int i = 0; i < numBins-1; i++) { if (m->control_pressed) { out.close(); return 0; } out << rowFront << m->currentSharedBinLabels[i] << rowBack << metadata[i] << "},\n"; } out << rowFront << m->currentSharedBinLabels[(numBins-1)] << rowBack << metadata[(numBins-1)] << "}\n" + spaces + "],\n"; //get column info /*"columns": [ {"id":"Sample1", "metadata":null}, {"id":"Sample2", "metadata":null}, {"id":"Sample3", "metadata":null}, {"id":"Sample4", "metadata":null}, {"id":"Sample5", "metadata":null}, {"id":"Sample6", "metadata":null} ],*/ string colBack = "\", \"metadata\":"; out << spaces + "\"columns\":[\n"; for (int i = 0; i < 
lookup.size()-1; i++) { if (m->control_pressed) { out.close(); return 0; } out << rowFront << lookup[i]->getGroup() << colBack << sampleMetadata[i] << "},\n"; } out << rowFront << lookup[(lookup.size()-1)]->getGroup() << colBack << sampleMetadata[lookup.size()-1] << "}\n" + spaces + "],\n"; out << spaces + "\"matrix_type\": \"" << format << "\",\n" + spaces + "\"matrix_element_type\": \"int\",\n"; out << spaces + "\"shape\": [" << numBins << "," << lookup.size() << "],\n"; out << spaces + "\"data\": ["; vector dataRows; if (format == "sparse") { /*"data":[[0,2,1], [1,0,5], [1,1,1], [1,3,2], [1,4,3], [1,5,1], [2,2,1], [2,3,4], [2,4,2], [3,0,2], [3,1,1], [3,2,1], [3,5,1], [4,1,1], [4,2,1] ]*/ string output = ""; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { out.close(); return 0; } for (int j = 0; j < lookup.size(); j++) { string binInfo = "[" + toString(i) + "," + toString(j) + "," + toString(lookup[j]->getAbundance(i)) + "]"; //only print non zero values if (lookup[j]->getAbundance(i) != 0) { dataRows.push_back(binInfo); } } } }else { /* "matrix_type": "dense", "matrix_element_type": "int", "shape": [5,6], "data": [[0,0,1,0,0,0], [5,1,0,2,3,1], [0,0,1,4,2,0], [2,1,1,0,0,1], [0,1,1,0,0,0]]*/ for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { out.close(); return 0; } string binInfo = "["; for (int j = 0; j < lookup.size()-1; j++) { binInfo += toString(lookup[j]->getAbundance(i)) + ","; } binInfo += toString(lookup[lookup.size()-1]->getAbundance(i)) + "]"; dataRows.push_back(binInfo); } } for (int i = 0; i < dataRows.size()-1; i++) { out << dataRows[i] << ",\n" + spaces + spaces; } out << dataRows[dataRows.size()-1] << "]\n"; out << "}\n"; out.close(); return 0; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getBiom"); exit(1); } } //********************************************************************************************************************** int MakeBiomCommand::getBiom(vector& lookup){ 
    try {
        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName));
        variables["[distance]"] = lookup[0]->getLabel();
        string outputFileName = getOutputFileName("biom",variables);
        ofstream out;
        m->openOutputFile(outputFileName, out);
        outputNames.push_back(outputFileName); outputTypes["biom"].push_back(outputFileName);

        string mothurString = "mothur" + toString(m->getVersion());
        time_t rawtime;
        struct tm * timeinfo;
        time ( &rawtime );
        timeinfo = localtime ( &rawtime );
        string dateString = asctime (timeinfo);
        //find returns an unsigned position, so compare it to npos using the same type
        string::size_type pos = dateString.find('\n');
        if (pos != string::npos) { dateString = dateString.substr(0, pos);}
        string spaces = " ";

        //standard
        //use the input file name for the id; sharedfile is empty when a relabund file is the input
        out << "{\n" + spaces + "\"id\":\"" + m->getSimpleName(inputFileName) + "-" + lookup[0]->getLabel() + "\",\n" + spaces + "\"format\": \"Biological Observation Matrix 0.9.1\",\n" + spaces + "\"format_url\": \"http://biom-format.org\",\n";
        out << spaces + "\"type\": \"OTU table\",\n" + spaces + "\"generated_by\": \"" << mothurString << "\",\n" + spaces + "\"date\": \"" << dateString << "\",\n";

        vector<string> metadata = getMetaData(lookup);
        int numBins = lookup[0]->getNumBins();

        if (m->control_pressed) { out.close(); return 0; }

        //get row info
        /*"rows":[
            {"id":"GG_OTU_1", "metadata":null},
            {"id":"GG_OTU_2", "metadata":null},
            {"id":"GG_OTU_3", "metadata":null},
            {"id":"GG_OTU_4", "metadata":null},
            {"id":"GG_OTU_5", "metadata":null}
         ],*/
        out << spaces + "\"rows\":[\n";
        string rowFront = spaces + spaces + "{\"id\":\"";
        string rowBack = "\", \"metadata\":";
        for (int i = 0; i < numBins-1; i++) {
            if (m->control_pressed) { out.close(); return 0; }
            out << rowFront << m->currentSharedBinLabels[i] << rowBack << metadata[i] << "},\n";
        }
        out << rowFront << m->currentSharedBinLabels[(numBins-1)] << rowBack << metadata[(numBins-1)] << "}\n" + spaces + "],\n";

        //get column info
        /*"columns": [
            {"id":"Sample1", "metadata":null},
            {"id":"Sample2", "metadata":null},
            {"id":"Sample3", "metadata":null},
            {"id":"Sample4", "metadata":null},
{"id":"Sample5", "metadata":null}, {"id":"Sample6", "metadata":null} ],*/ string colBack = "\", \"metadata\":"; out << spaces + "\"columns\":[\n"; for (int i = 0; i < lookup.size()-1; i++) { if (m->control_pressed) { out.close(); return 0; } out << rowFront << lookup[i]->getGroup() << colBack << sampleMetadata[i] << "},\n"; } out << rowFront << lookup[(lookup.size()-1)]->getGroup() << colBack << sampleMetadata[lookup.size()-1] << "}\n" + spaces + "],\n"; out << spaces + "\"matrix_type\": \"" << format << "\",\n" + spaces + "\"matrix_element_type\": \"float\",\n"; out << spaces + "\"shape\": [" << numBins << "," << lookup.size() << "],\n"; out << spaces + "\"data\": ["; vector dataRows; if (format == "sparse") { /*"data":[[0,2,1], [1,0,5], [1,1,1], [1,3,2], [1,4,3], [1,5,1], [2,2,1], [2,3,4], [2,4,2], [3,0,2], [3,1,1], [3,2,1], [3,5,1], [4,1,1], [4,2,1] ]*/ string output = ""; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { out.close(); return 0; } for (int j = 0; j < lookup.size(); j++) { string binInfo = "[" + toString(i) + "," + toString(j) + "," + toString(lookup[j]->getAbundance(i)) + "]"; //only print non zero values if (lookup[j]->getAbundance(i) != 0) { dataRows.push_back(binInfo); } } } }else { /* "matrix_type": "dense", "matrix_element_type": "int", "shape": [5,6], "data": [[0,0,1,0,0,0], [5,1,0,2,3,1], [0,0,1,4,2,0], [2,1,1,0,0,1], [0,1,1,0,0,0]]*/ for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { out.close(); return 0; } string binInfo = "["; for (int j = 0; j < lookup.size()-1; j++) { binInfo += toString(lookup[j]->getAbundance(i)) + ","; } binInfo += toString(lookup[lookup.size()-1]->getAbundance(i)) + "]"; dataRows.push_back(binInfo); } } for (int i = 0; i < dataRows.size()-1; i++) { out << dataRows[i] << ",\n" + spaces + spaces; } out << dataRows[dataRows.size()-1] << "]\n"; out << "}\n"; out.close(); return 0; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getBiom"); exit(1); } } 
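//A minimal, self-contained sketch of how the "data" array written by getBiom differs between
//matrixtype=sparse and matrixtype=dense. CountTable and biomData are hypothetical names used only
//for illustration and are not part of mothur; the sketch assumes integer counts and, unlike the
//loops above, it returns "[]" for an empty row list instead of indexing dataRows[dataRows.size()-1].

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container: counts[i][j] is the abundance of OTU i in sample j.
typedef std::vector<std::vector<int> > CountTable;

// Render the BIOM "data" array as sparse [row,column,value] triplets or as dense rows.
std::string biomData(const CountTable& counts, const std::string& matrixType) {
    // mirrors the matrixtype validation done in the command's constructor
    assert(matrixType == "sparse" || matrixType == "dense");
    const bool sparse = (matrixType == "sparse");

    std::vector<std::string> rows;
    for (std::size_t i = 0; i < counts.size(); i++) {
        std::ostringstream dense;
        dense << "[";
        for (std::size_t j = 0; j < counts[i].size(); j++) {
            if (sparse) {
                // sparse: one triplet per non-zero count, zeros are skipped
                if (counts[i][j] != 0) {
                    std::ostringstream cell;
                    cell << "[" << i << "," << j << "," << counts[i][j] << "]";
                    rows.push_back(cell.str());
                }
            } else {
                // dense: every count of the row, zeros included
                if (j != 0) { dense << ","; }
                dense << counts[i][j];
            }
        }
        if (!sparse) { dense << "]"; rows.push_back(dense.str()); }
    }

    // join the rows with commas; an empty list simply yields "[]"
    std::string out = "[";
    for (std::size_t r = 0; r < rows.size(); r++) {
        if (r != 0) { out += ","; }
        out += rows[r];
    }
    return out + "]";
}
```

//For a table with rows {0,0,1} and {5,1,0}, the sparse form lists only the three non-zero cells,
//while the dense form repeats every cell of both rows.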
//********************************************************************************************************************** vector MakeBiomCommand::getMetaData(vector& lookup){ try { vector metadata; if (contaxonomyfile == "") { for (int i = 0; i < lookup[0]->getNumBins(); i++) { metadata.push_back("null"); } } else { //read constaxonomy file storing in a map, otulabel -> taxonomy //constaxonomy file will most likely contain more labels than the shared file, because sharedfile could have been subsampled. ifstream in; m->openInputFile(contaxonomyfile, in); //grab headers m->getline(in); m->gobble(in); string otuLabel, tax; int size; vector otuLabels; vector taxs; while (!in.eof()) { if (m->control_pressed) { in.close(); return metadata; } in >> otuLabel >> size >> tax; m->gobble(in); otuLabels.push_back(otuLabel); taxs.push_back(tax); } in.close(); //should the labels be Otu001 or PhyloType001 string firstBin = m->currentSharedBinLabels[0]; string binTag = "Otu"; if ((firstBin.find("Otu")) == string::npos) { binTag = "PhyloType"; } //convert list file bin labels to shared file bin labels //parse tax strings //save in map map labelTaxMap; string snumBins = toString(otuLabels.size()); for (int i = 0; i < otuLabels.size(); i++) { if (m->control_pressed) { return metadata; } //if there is a bin label use it otherwise make one if (m->isContainingOnlyDigits(otuLabels[i])) { string binLabel = binTag; string sbinNumber = otuLabels[i]; if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; binLabel = m->getSimpleLabel(binLabel); labelTaxMap[binLabel] = taxs[i]; }else { map::iterator it = labelTaxMap.find(m->getSimpleLabel(otuLabels[i])); if (it == labelTaxMap.end()) { labelTaxMap[m->getSimpleLabel(otuLabels[i])] = taxs[i]; }else { m->mothurOut("[ERROR]: Cannot add OTULabel " + otuLabels[i] + " because it's simple label " + m->getSimpleLabel(otuLabels[i]) + " 
has already been added and will result in downstream errors. Have you mixed mothur labels and non-mothur labels? To make the files work well together and backwards compatible, mothur treats 1, OTU01, OTU001, OTU0001 all the same. We do this by removing any non-numeric characters and leading zeros. For example: Otu000018 and OtuMY18 both map to 18.\n"); m->control_pressed = true; }
                }
            }

            //merges OTUs classified to the same gg otuid, sets otulabels to gg otuids, averages confidence scores of merged otus.
            //overwriting of otulabels is fine because constaxonomy only allows for one label to be processed. If this assumption changes, it could cause a bug.
            if (picrust) { getGreenGenesOTUIDs(lookup, labelTaxMap); }

            //{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]}

            //traverse the binLabels forming the metadata strings and saving them
            //make sure to sanity check
            map<string, string>::iterator it;
            for (int i = 0; i < lookup[0]->getNumBins(); i++) {
                if (m->control_pressed) { return metadata; }

                it = labelTaxMap.find(m->getSimpleLabel(m->currentSharedBinLabels[i]));

                if (it == labelTaxMap.end()) { m->mothurOut("[ERROR]: can't find taxonomy information for " + m->currentSharedBinLabels[i] + ".\n"); m->control_pressed = true; }
                else {
                    vector<string> bootstrapValues;
                    string data = "{\"taxonomy\":[";

                    vector<string> scores;
                    vector<string> taxonomies = parseTax(it->second, scores);

                    for (int j = 0; j < taxonomies.size()-1; j ++) { data += "\"" + taxonomies[j] + "\", "; }
                    data += "\"" + taxonomies[taxonomies.size()-1] + "\"]";

                    //add bootstrap values if available
                    if (scores[0] != "null") {
                        data += ", \"bootstrap\":[";

                        for (int j = 0; j < scores.size()-1; j ++) { data += scores[j] + ", "; }
                        data += scores[scores.size()-1] + "]";
                    }
                    data += "}";

                    metadata.push_back(data);
                }
            }
        }

        return metadata;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeBiomCommand", "getMetaData");
        exit(1);
    }
}
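//A small sketch of the label rule quoted in the error message of getMetaData: keep only the digits
//of an OTU label and drop leading zeros, so that 1, OTU01 and Otu0001 all compare equal.
//simpleLabel is a hypothetical stand-in written for illustration; mothur's own getSimpleLabel is
//not shown here and may differ in edge cases. Returning "0" for an all-zero label and an empty
//string for a label without digits are assumptions of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Reduce an OTU label to its numeric part without leading zeros.
std::string simpleLabel(const std::string& label) {
    // keep only the digits: "OtuMY18" -> "18", "Otu000018" -> "000018"
    std::string digits;
    for (std::size_t i = 0; i < label.size(); i++) {
        if (std::isdigit(static_cast<unsigned char>(label[i]))) { digits += label[i]; }
    }

    // drop leading zeros; an all-zero label collapses to "0", a label without digits to ""
    std::size_t firstNonZero = digits.find_first_not_of('0');
    if (firstNonZero == std::string::npos) {
        return digits.empty() ? std::string() : std::string("0");
    }
    return digits.substr(firstNonZero);
}
```

//Two labels collide when their simple forms match, which is the case the [ERROR] message reports:
//simpleLabel("Otu000018") and simpleLabel("OtuMY18") are both "18".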
//********************************************************************************************************************** vector MakeBiomCommand::getMetaData(vector& lookup){ try { vector metadata; if (contaxonomyfile == "") { for (int i = 0; i < lookup[0]->getNumBins(); i++) { metadata.push_back("null"); } } else { //read constaxonomy file storing in a map, otulabel -> taxonomy //constaxonomy file will most likely contain more labels than the shared file, because sharedfile could have been subsampled. ifstream in; m->openInputFile(contaxonomyfile, in); //grab headers m->getline(in); m->gobble(in); string otuLabel, tax; int size; vector otuLabels; vector taxs; while (!in.eof()) { if (m->control_pressed) { in.close(); return metadata; } in >> otuLabel >> size >> tax; m->gobble(in); otuLabels.push_back(otuLabel); taxs.push_back(tax); } in.close(); //should the labels be Otu001 or PhyloType001 string firstBin = m->currentSharedBinLabels[0]; string binTag = "Otu"; if ((firstBin.find("Otu")) == string::npos) { binTag = "PhyloType"; } //convert list file bin labels to shared file bin labels //parse tax strings //save in map map labelTaxMap; string snumBins = toString(otuLabels.size()); for (int i = 0; i < otuLabels.size(); i++) { if (m->control_pressed) { return metadata; } //if there is a bin label use it otherwise make one if (m->isContainingOnlyDigits(otuLabels[i])) { string binLabel = binTag; string sbinNumber = otuLabels[i]; if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; binLabel = m->getSimpleLabel(binLabel); labelTaxMap[binLabel] = taxs[i]; }else { labelTaxMap[m->getSimpleLabel(otuLabels[i])] = taxs[i]; } } //merges OTUs classified to same gg otuid, sets otulabels to gg otuids, averages confidence scores of merged otus. overwritting of otulabels is fine because constaxonomy only allows for one label to be processed. 
If this assumption changes, could cause bug. if (picrust) { getGreenGenesOTUIDs(lookup, labelTaxMap); } //{"taxonomy":["k__Bacteria", "p__Proteobacteria", "c__Gammaproteobacteria", "o__Enterobacteriales", "f__Enterobacteriaceae", "g__Escherichia", "s__"]} //traverse the binLabels forming the metadata strings and saving them //make sure to sanity check map::iterator it; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { return metadata; } it = labelTaxMap.find(m->getSimpleLabel(m->currentSharedBinLabels[i])); if (it == labelTaxMap.end()) { m->mothurOut("[ERROR]: can't find taxonomy information for " + m->currentSharedBinLabels[i] + ".\n"); m->control_pressed = true; } else { vector bootstrapValues; string data = "{\"taxonomy\":["; vector scores; vector taxonomies = parseTax(it->second, scores); for (int j = 0; j < taxonomies.size()-1; j ++) { data += "\"" + taxonomies[j] + "\", "; } data += "\"" + taxonomies[taxonomies.size()-1] + "\"]"; //add bootstrap values if available if (scores[0] != "null") { data += ", \"bootstrap\":["; for (int j = 0; j < scores.size()-1; j ++) { data += scores[j] + ", "; } data += scores[scores.size()-1] + "]"; } data += "}"; metadata.push_back(data); } } } return metadata; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getMetadata"); exit(1); } } //********************************************************************************************************************** int MakeBiomCommand::getGreenGenesOTUIDs(vector& lookup, map& labelTaxMap){ try { //read reftaxonomy PhyloTree phylo(referenceTax); //read otu map file map otuMap = readGGOtuMap(); //maps reference ID -> OTU ID if (m->control_pressed) { return 0; } map > ggOTUIDs; //loop through otu taxonomies for (map::iterator it = labelTaxMap.begin(); it != labelTaxMap.end(); it++) { //maps label -> consensus taxonomy if (m->control_pressed) { break; } string OTUTaxonomy = it->second; //remove confidences m->removeConfidences(OTUTaxonomy); //remove 
unclassifieds to match template int thisPos = OTUTaxonomy.find("unclassified;"); if (thisPos != string::npos) { OTUTaxonomy = OTUTaxonomy.substr(0, thisPos); } //get list of reference ids that map to this taxonomy vector referenceIds = phylo.getSeqs(OTUTaxonomy); if (m->control_pressed) { break; } //look for each one in otu map to find match string otuID = "not found"; string referenceString = ""; for (int i = 0; i < referenceIds.size(); i++) { referenceString += referenceIds[i] + " "; map::iterator itMap = otuMap.find(referenceIds[i]); if (itMap != otuMap.end()) { //found it otuID = itMap->second; i += referenceIds.size(); //stop looking } } //if found, add otu to ggOTUID list if (otuID != "not found") { map >::iterator itGG = ggOTUIDs.find(otuID); if (itGG == ggOTUIDs.end()) { vector temp; temp.push_back(it->first); //save mothur OTU label ggOTUIDs[otuID] = temp; }else { ggOTUIDs[otuID].push_back(it->first); } //add mothur OTU label to list }else { m->mothurOut("[ERROR]: could not find OTUId for " + it->second + ". Its reference sequences are " + referenceString + ".\n"); m->control_pressed = true; } } vector newLookup; for (int i = 0; i < lookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); newLookup.push_back(temp); } map labelIndex; for (int i = 0; i < m->currentSharedBinLabels.size(); i++) { labelIndex[m->getSimpleLabel(m->currentSharedBinLabels[i])] = i; } vector newBinLabels; map newLabelTaxMap; //loop through ggOTUID list combining mothur otus and adjusting labels //ggOTUIDs = 16097 -> for (map >::iterator itMap = ggOTUIDs.begin(); itMap != ggOTUIDs.end(); itMap++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //set new gg otu id to taxonomy. 
OTU01 -> k__Bacteria becomes 16097 -> k__Bacteria //find taxonomy of this otu map::iterator it = labelTaxMap.find(m->getSimpleLabel(itMap->second[0])); vector scores; vector taxonomies = parseTax(it->second, scores); //merge/set OTU abundances vector abunds; abunds.resize(lookup.size(), 0); string mergeString = ""; vector boots; boots.resize(scores.size(), 0); bool scoresNULL = false; for (int j = 0; j < itMap->second.size(); j++) { // if (scores[0] != "null") { //merge bootstrap scores vector scores; vector taxonomies = parseTax(it->second, scores); for (int i = 0; i < boots.size(); i++) { if (scores[i] == "null") { scoresNULL = true; break; } else { float tempScore; m->mothurConvert(scores[i], tempScore); boots[i] += tempScore; } } }else { scoresNULL = true; } //merge abunds mergeString += (itMap->second)[j] + " "; for (int i = 0; i < lookup.size(); i++) { abunds[i] += lookup[i]->getAbundance(labelIndex[m->getSimpleLabel((itMap->second)[j])]); } } if (m->debug) { m->mothurOut("[DEBUG]: merging " + mergeString + " for ggOTUid = " + itMap->first + ".\n"); } //average scores //add merged otu to new lookup string newTaxString = ""; if (!scoresNULL) { for (int j = 0; j < boots.size(); j++) { boots[j] /= (float) itMap->second.size(); } //assemble new taxomoy for (int j = 0; j < boots.size(); j++) { newTaxString += taxonomies[j] + "(" + toString(boots[j]) + ");"; } }else { //assemble new taxomoy for (int j = 0; j < taxonomies.size(); j++) { newTaxString += taxonomies[j] + ";"; } } //set new gg otu id to taxonomy. 
OTU01 -> k__Bacteria becomes 16097 -> k__Bacteria //find taxonomy of this otu newLabelTaxMap[itMap->first] = newTaxString; //add merged otu to new lookup for (int j = 0; j < abunds.size(); j++) { newLookup[j]->push_back(abunds[j], newLookup[j]->getGroup()); } //saved otu label newBinLabels.push_back(itMap->first); } for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } lookup = newLookup; m->currentSharedBinLabels = newBinLabels; labelTaxMap = newLabelTaxMap; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared",variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["shared"].push_back(outputFileName); lookup[0]->printHeaders(out); for (int i = 0; i < lookup.size(); i++) { out << lookup[i]->getLabel() << '\t' << lookup[i]->getGroup() << '\t'; lookup[i]->print(out); } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getGreenGenesOTUIDs"); exit(1); } } //********************************************************************************************************************** int MakeBiomCommand::getGreenGenesOTUIDs(vector& lookup, map& labelTaxMap){ try { //read reftaxonomy PhyloTree phylo(referenceTax); //read otu map file map otuMap = readGGOtuMap(); //maps reference ID -> OTU ID if (m->control_pressed) { return 0; } map > ggOTUIDs; //loop through otu taxonomies for (map::iterator it = labelTaxMap.begin(); it != labelTaxMap.end(); it++) { //maps label -> consensus taxonomy if (m->control_pressed) { break; } string OTUTaxonomy = it->second; //remove confidences m->removeConfidences(OTUTaxonomy); //remove unclassifieds to match template int thisPos = OTUTaxonomy.find("unclassified;"); if (thisPos != string::npos) { OTUTaxonomy = OTUTaxonomy.substr(0, thisPos); } //get list of reference ids that map to this taxonomy vector 
referenceIds = phylo.getSeqs(OTUTaxonomy); if (m->control_pressed) { break; } //look for each one in otu map to find match string otuID = "not found"; string referenceString = ""; for (int i = 0; i < referenceIds.size(); i++) { referenceString += referenceIds[i] + " "; map::iterator itMap = otuMap.find(referenceIds[i]); if (itMap != otuMap.end()) { //found it otuID = itMap->second; i += referenceIds.size(); //stop looking } } //if found, add otu to ggOTUID list if (otuID != "not found") { map >::iterator itGG = ggOTUIDs.find(otuID); if (itGG == ggOTUIDs.end()) { vector temp; temp.push_back(it->first); //save mothur OTU label ggOTUIDs[otuID] = temp; }else { ggOTUIDs[otuID].push_back(it->first); } //add mothur OTU label to list }else { m->mothurOut("[ERROR]: could not find OTUId for " + it->second + ". Its reference sequences are " + referenceString + ".\n"); m->control_pressed = true; } } vector newLookup; for (int i = 0; i < lookup.size(); i++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); newLookup.push_back(temp); } map labelIndex; for (int i = 0; i < m->currentSharedBinLabels.size(); i++) { labelIndex[m->getSimpleLabel(m->currentSharedBinLabels[i])] = i; } vector newBinLabels; map newLabelTaxMap; //loop through ggOTUID list combining mothur otus and adjusting labels //ggOTUIDs = 16097 -> for (map >::iterator itMap = ggOTUIDs.begin(); itMap != ggOTUIDs.end(); itMap++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //set new gg otu id to taxonomy. 
OTU01 -> k__Bacteria becomes 16097 -> k__Bacteria //find taxonomy of this otu map::iterator it = labelTaxMap.find(m->getSimpleLabel(itMap->second[0])); vector scores; vector taxonomies = parseTax(it->second, scores); //merge/set OTU abundances vector abunds; abunds.resize(lookup.size(), 0.0); string mergeString = ""; vector boots; boots.resize(scores.size(), 0.0); bool scoresNULL = false; for (int j = 0; j < itMap->second.size(); j++) { // if (scores[0] != "null") { //merge bootstrap scores vector scores; vector taxonomies = parseTax(it->second, scores); for (int i = 0; i < boots.size(); i++) { if (scores[i] == "null") { scoresNULL = true; break; } else { float tempScore; m->mothurConvert(scores[i], tempScore); boots[i] += tempScore; } } }else { scoresNULL = true; } //merge abunds mergeString += (itMap->second)[j] + " "; for (int i = 0; i < lookup.size(); i++) { abunds[i] += lookup[i]->getAbundance(labelIndex[m->getSimpleLabel((itMap->second)[j])]); } } if (m->debug) { m->mothurOut("[DEBUG]: merging " + mergeString + " for ggOTUid = " + itMap->first + ".\n"); } //average scores //add merged otu to new lookup string newTaxString = ""; if (!scoresNULL) { for (int j = 0; j < boots.size(); j++) { boots[j] /= (float) itMap->second.size(); } //assemble new taxomoy for (int j = 0; j < boots.size(); j++) { newTaxString += taxonomies[j] + "(" + toString(boots[j]) + ");"; } }else { //assemble new taxomoy for (int j = 0; j < taxonomies.size(); j++) { newTaxString += taxonomies[j] + ";"; } } //set new gg otu id to taxonomy. 
OTU01 -> k__Bacteria becomes 16097 -> k__Bacteria //find taxonomy of this otu newLabelTaxMap[itMap->first] = newTaxString; //add merged otu to new lookup for (int j = 0; j < abunds.size(); j++) { newLookup[j]->push_back(abunds[j], newLookup[j]->getGroup()); } //saved otu label newBinLabels.push_back(itMap->first); } for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } lookup = newLookup; m->currentSharedBinLabels = newBinLabels; labelTaxMap = newLabelTaxMap; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("relabund",variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["relabund"].push_back(outputFileName); lookup[0]->printHeaders(out); for (int i = 0; i < lookup.size(); i++) { out << lookup[i]->getLabel() << '\t' << lookup[i]->getGroup() << '\t'; lookup[i]->print(out); } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getGreenGenesOTUIDs"); exit(1); } } //********************************************************************************************************************** map MakeBiomCommand::readGGOtuMap(){ try { map otuMap; ifstream in; m->openInputFile(picrustOtuFile, in); //map referenceIDs -> otuIDs //lines look like: //16097 671376 616121 533566 683683 4332909 4434717 772666 611808 695209 while(!in.eof()) { if (m->control_pressed) { break; } string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); if (pieces.size() != 0) { string otuID = pieces[1]; for (int i = 1; i < pieces.size(); i++) { otuMap[pieces[i]] = otuID; } } } in.close(); return otuMap; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "readGGOtuMap"); exit(1); } } //********************************************************************************************************************** int 
MakeBiomCommand::getSampleMetaData(vector& lookup){ try { sampleMetadata.clear(); if (metadatafile == "") { for (int i = 0; i < lookup.size(); i++) { sampleMetadata.push_back("null"); } } else { ifstream in; m->openInputFile(metadatafile, in); vector groupNames, metadataLabels; map > lines; string headerLine = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(headerLine); //save names of columns you are reading for (int i = 1; i < pieces.size(); i++) { metadataLabels.push_back(pieces[i]); } int count = metadataLabels.size(); vector groups = m->getGroups(); //read rest of file while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } string group = ""; in >> group; m->gobble(in); groupNames.push_back(group); string line = m->getline(in); m->gobble(in); vector thisPieces = m->splitWhiteSpaceWithQuotes(line); if (thisPieces.size() != count) { m->mothurOut("[ERROR]: expected " + toString(count) + " items of data for sample " + group + " read " + toString(thisPieces.size()) + ", quitting.\n"); } else { if (m->inUsersGroups(group, groups)) { lines[group] = thisPieces; } } m->gobble(in); } in.close(); map >::iterator it; for (int i = 0; i < lookup.size(); i++) { if (m->control_pressed) { return 0; } it = lines.find(lookup[i]->getGroup()); if (it == lines.end()) { m->mothurOut("[ERROR]: can't find metadata information for " + lookup[i]->getGroup() + ", quitting.\n"); m->control_pressed = true; } else { vector values = it->second; string data = "{"; for (int j = 0; j < metadataLabels.size()-1; j++) { values[j] = m->removeQuotes(values[j]); data += "\"" + metadataLabels[j] + "\":\"" + values[j] + "\", "; } values[metadataLabels.size()-1] = m->removeQuotes(values[metadataLabels.size()-1]); data += "\"" + metadataLabels[metadataLabels.size()-1] + "\":\"" + values[metadataLabels.size()-1] + "\"}"; sampleMetadata.push_back(data); } } } return 0; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getSampleMetaData"); exit(1); } } 
//********************************************************************************************************************** int MakeBiomCommand::getSampleMetaData(vector& lookup){ try { sampleMetadata.clear(); if (metadatafile == "") { for (int i = 0; i < lookup.size(); i++) { sampleMetadata.push_back("null"); } } else { ifstream in; m->openInputFile(metadatafile, in); vector groupNames, metadataLabels; map > lines; string headerLine = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(headerLine); //save names of columns you are reading for (int i = 1; i < pieces.size(); i++) { metadataLabels.push_back(pieces[i]); } int count = metadataLabels.size(); vector groups = m->getGroups(); //read rest of file while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } string group = ""; in >> group; m->gobble(in); groupNames.push_back(group); string line = m->getline(in); m->gobble(in); vector thisPieces = m->splitWhiteSpaceWithQuotes(line); if (thisPieces.size() != count) { m->mothurOut("[ERROR]: expected " + toString(count) + " items of data for sample " + group + " read " + toString(thisPieces.size()) + ", quitting.\n"); } else { if (m->inUsersGroups(group, groups)) { lines[group] = thisPieces; } } m->gobble(in); } in.close(); map >::iterator it; for (int i = 0; i < lookup.size(); i++) { if (m->control_pressed) { return 0; } it = lines.find(lookup[i]->getGroup()); if (it == lines.end()) { m->mothurOut("[ERROR]: can't find metadata information for " + lookup[i]->getGroup() + ", quitting.\n"); m->control_pressed = true; } else { vector values = it->second; string data = "{"; for (int j = 0; j < metadataLabels.size()-1; j++) { values[j] = m->removeQuotes(values[j]); data += "\"" + metadataLabels[j] + "\":\"" + values[j] + "\", "; } values[metadataLabels.size()-1] = m->removeQuotes(values[metadataLabels.size()-1]); data += "\"" + metadataLabels[metadataLabels.size()-1] + "\":\"" + values[metadataLabels.size()-1] + "\"}"; 
sampleMetadata.push_back(data); } } } return 0; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "getSampleMetaData"); exit(1); } } /**************************************************************************************************/ //returns {Bacteria, Bacteroidetes, ..} and scores is filled with {100, 98, ...} or {null, null, null} vector MakeBiomCommand::parseTax(string tax, vector& scores) { try { string taxon; vector taxs; while (tax.find_first_of(';') != -1) { if (m->control_pressed) { return taxs; } //get taxon taxon = tax.substr(0,tax.find_first_of(';')); int pos = taxon.find_last_of('('); if (pos != -1) { //is it a number? int pos2 = taxon.find_last_of(')'); if (pos2 != -1) { string confidenceScore = taxon.substr(pos+1, (pos2-(pos+1))); if (m->isNumeric1(confidenceScore)) { taxon = taxon.substr(0, pos); //rip off confidence scores.push_back(confidenceScore); }else{ scores.push_back("null"); } } }else{ scores.push_back("null"); } //strip "" if they are there pos = taxon.find("\""); if (pos != string::npos) { string newTax = ""; for (int k = 0; k < taxon.length(); k++) { if (taxon[k] != '\"') { newTax += taxon[k]; } } taxon = newTax; } //look for bootstrap value taxs.push_back(taxon); tax = tax.substr(tax.find_first_of(';')+1, tax.length()); } return taxs; } catch(exception& e) { m->errorOut(e, "MakeBiomCommand", "parseTax"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/makebiomcommand.h000066400000000000000000000034321255543666200215350ustar00rootroot00000000000000#ifndef Mothur_makebiomcommand_h #define Mothur_makebiomcommand_h // // makebiomcommand.h // Mothur // // Created by Sarah Westcott on 4/16/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "command.hpp" #include "sharedrabundvector.h" #include "inputdata.h" class MakeBiomCommand : public Command { public: MakeBiomCommand(string); MakeBiomCommand(); ~MakeBiomCommand(){} vector setParameters(); string getCommandName() { return "make.biom"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://biom-format.org/documentation/biom_format.html, http://www.mothur.org/wiki/Make.biom"; } string getDescription() { return "creates a biom file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string sharedfile, relabundfile, contaxonomyfile, metadatafile, groups, outputDir, format, label, referenceTax, picrustOtuFile, inputFileName, fileFormat; vector outputNames, Groups, sampleMetadata; set labels; bool abort, allLines, picrust; int getBiom(vector&); int getBiom(vector& lookup); vector getMetaData(vector&); vector getMetaData(vector&); vector parseTax(string tax, vector& scores); int getSampleMetaData(vector&); int getSampleMetaData(vector&); //for picrust int getGreenGenesOTUIDs(vector&, map&); int getGreenGenesOTUIDs(vector&, map&); map readGGOtuMap(); }; #endif mothur-1.36.1/source/commands/makecontigscommand.cpp000066400000000000000000013757271255543666200226340ustar00rootroot00000000000000// // makecontigscommand.cpp // Mothur // // Created by Sarah Westcott on 5/15/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "makecontigscommand.h" //********************************************************************************************************************** vector MakeContigsCommand::setParameters(){ try { CommandParameter pfastq("ffastq", "InputTypes", "", "", "FastaFastqFile", "FastaFastqFile", "fastqGroup","fasta-qfile",false,false,true); parameters.push_back(pfastq); CommandParameter prfastq("rfastq", "InputTypes", "", "", "none", "none", "fastqGroup","fasta-qfile",false,false,true); parameters.push_back(prfastq); CommandParameter pfasta("ffasta", "InputTypes", "", "", "FastaFastqFile", "FastaFastqFile", "fastaGroup","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter prfasta("rfasta", "InputTypes", "", "", "none", "none", "none","fastaGroup",false,false,true); parameters.push_back(prfasta); CommandParameter pfqual("fqfile", "InputTypes", "", "", "none", "none", "qfileGroup","",false,false,true); parameters.push_back(pfqual); CommandParameter prqual("rqfile", "InputTypes", "", "", "none", "none", "qfileGroup","",false,false,true); parameters.push_back(prqual); CommandParameter pfile("file", "InputTypes", "", "", "FastaFastqFile", "FastaFastqFile", "none","fasta-qfile",false,false,true); parameters.push_back(pfile); CommandParameter poligos("oligos", "InputTypes", "", "", "none", "none", "none","group",false,false,true); parameters.push_back(poligos); CommandParameter pfindex("findex", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(pfindex); CommandParameter prindex("rindex", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(prindex); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(pbdiffs); CommandParameter ptdiffs("tdiffs", "Number", "", "0", "", "", "","",false,false); 
parameters.push_back(ptdiffs); CommandParameter preorient("checkorient", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(preorient); CommandParameter palign("align", "Multiple", "needleman-gotoh-kmer", "needleman", "", "", "","",false,false); parameters.push_back(palign); CommandParameter pallfiles("allfiles", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pallfiles); CommandParameter ptrimoverlap("trimoverlap", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ptrimoverlap); CommandParameter pmatch("match", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pmatch); CommandParameter pmismatch("mismatch", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pmismatch); CommandParameter pgapopen("gapopen", "Number", "", "-2.0", "", "", "","",false,false); parameters.push_back(pgapopen); CommandParameter pgapextend("gapextend", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pgapextend); CommandParameter pthreshold("insert", "Number", "", "20", "", "", "","",false,false); parameters.push_back(pthreshold); CommandParameter pdeltaq("deltaq", "Number", "", "6", "", "", "","",false,false); parameters.push_back(pdeltaq); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pformat("format", "Multiple", "sanger-illumina-solexa-illumina1.8+", "illumina1.8+", "", "", "","",false,false,true); parameters.push_back(pformat); CommandParameter pksize("ksize", "Number", "", "8", "", "", "","",false,false); parameters.push_back(pksize); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); 
parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MakeContigsCommand::getHelpString(){ try { string helpString = ""; helpString += "The make.contigs command reads a file, a forward fastq file and a reverse fastq file, or forward fasta and reverse fasta files, and outputs a new fasta file.\n"; helpString += "If an oligos file is provided, barcodes and primers will be trimmed, and a group file will be created.\n"; helpString += "If a forward index or reverse index file is provided, barcodes will be trimmed, and a group file will be created. The oligos parameter is required if an index file is given.\n"; helpString += "The make.contigs command parameters are file, ffastq, rfastq, ffasta, rfasta, fqfile, rqfile, oligos, findex, rindex, format, tdiffs, bdiffs, pdiffs, align, match, mismatch, gapopen, gapextend, insert, deltaq, ksize, checkorient, trimoverlap, allfiles and processors.\n"; helpString += "The ffastq and rfastq, file, or ffasta and rfasta parameters are required.\n"; helpString += "The file parameter is a 2, 3 or 4 column file containing the forward fastq files in the first column and their matching reverse fastq files in the second column, or a groupName then forward fastq file and reverse fastq file, or forward fastq file then reverse fastq then forward index and reverse index file. If you only have one index file add 'none' for the other one. Mothur will process each pair and create a combined fasta and report file with all the sequences.\n"; helpString += "The ffastq and rfastq parameters are used to provide a forward fastq and reverse fastq file to process. 
If you provide one, you must provide the other.\n"; helpString += "The ffasta and rfasta parameters are used to provide a forward fasta and reverse fasta file to process. If you provide one, you must provide the other.\n"; helpString += "The fqfile and rqfile parameters are used to provide forward quality and reverse quality files to process with the ffasta and rfasta parameters. If you provide one, you must provide the other.\n"; helpString += "The format parameter is used to indicate whether your sequences are sanger, solexa, illumina1.8+ or illumina, default=illumina1.8+.\n"; helpString += "The findex and rindex parameters are used to provide forward index and reverse index files to process.\n"; helpString += "The align parameter allows you to specify the alignment method to use. Your options are: kmer, gotoh and needleman. The default is needleman.\n"; helpString += "The ksize parameter allows you to set the kmer size if you are doing align=kmer. Default=8.\n"; helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the sequence. The default is pdiffs + bdiffs.\n"; helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. The default is 0.\n"; helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n"; //helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. The default is 0.\n"; //helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n"; helpString += "The match parameter allows you to specify the bonus for having the same base. The default is 1.0.\n"; helpString += "The mismatch parameter allows you to specify the penalty for having different bases. 
The default is -1.0.\n"; helpString += "The checkorient parameter will look for the reverse complement of the barcode or primer in the sequence. If found, the sequence is flipped. The default is false.\n"; helpString += "The deltaq parameter allows you to specify the delta allowed between quality scores of a mismatched base. For example in the overlap, if deltaq=5 and in the alignment seqA, pos 200 has a quality score of 30 and the same position in seqB has a quality score of 20, you take the base from seqA (30-20 >= 5). If the quality score in seqB is 28 then the base in the consensus will be an N (30-28 < 5). The default is 6.\n"; helpString += "The gapopen parameter allows you to specify the penalty for opening a gap in an alignment. The default is -2.0.\n"; helpString += "The gapextend parameter allows you to specify the penalty for extending a gap in an alignment. The default is -1.0.\n"; helpString += "The insert parameter allows you to set a quality score threshold. In the case where we are trying to decide whether to keep a base or remove it because the base is compared to a gap in the other fragment, if the base has a quality score equal to or below the threshold we eliminate it. Default=20.\n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n"; helpString += "The allfiles parameter will create separate group and fasta files for each grouping. The default is F.\n"; helpString += "The trimoverlap parameter allows you to trim the sequences to only the overlapping section. The default is F.\n"; helpString += "The make.contigs command should be in the following format: \n"; helpString += "make.contigs(ffastq=yourForwardFastqFile, rfastq=yourReverseFastqFile, align=yourAlignmentMethod) \n"; helpString += "Note: No spaces between parameter labels (i.e. 
ffastq), '=' and parameters (i.e.yourForwardFastqFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MakeContigsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],[tag],contigs.fasta"; } else if (type == "qfile") { pattern = "[filename],[tag],contigs.qual"; } else if (type == "group") { pattern = "[filename],[tag],contigs.groups"; } else if (type == "report") { pattern = "[filename],[tag],contigs.report"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MakeContigsCommand::MakeContigsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["report"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "MakeContigsCommand"); exit(1); } } //********************************************************************************************************************** MakeContigsCommand::MakeContigsCommand(string option) { try { abort = false; calledHelp = false; createFileGroup = false; createOligosGroup = false; gz = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("pairwise.seqs"); 
map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["report"] = tempOutNames; outputTypes["group"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("ffastq"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["ffastq"] = inputDir + it->second; } } it = parameters.find("rfastq"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rfastq"] = inputDir + it->second; } } it = parameters.find("ffasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["ffasta"] = inputDir + it->second; } } it = parameters.find("rfasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rfasta"] = inputDir + it->second; } } it = parameters.find("fqfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["fqfile"] = inputDir + it->second; } } it = parameters.find("rqfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rqfile"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["file"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("findex"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["findex"] = inputDir + it->second; } } it = parameters.find("rindex"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["rindex"] = inputDir + it->second; } } } ffastqfile = validParameter.validFile(parameters, "ffastq", true); if (ffastqfile == "not open") { abort = true; } else if (ffastqfile == "not found") { ffastqfile = ""; } rfastqfile = validParameter.validFile(parameters, "rfastq", true); if (rfastqfile == "not open") { abort = true; } else if (rfastqfile == "not found") { rfastqfile = ""; } ffastafile = validParameter.validFile(parameters, "ffasta", true); if (ffastafile == "not open") { abort = true; } else if (ffastafile == "not found") { ffastafile = ""; } rfastafile = validParameter.validFile(parameters, "rfasta", true); if (rfastafile == "not open") { abort = true; } else if (rfastafile == "not found") { rfastafile = ""; } fqualfile = validParameter.validFile(parameters, "fqfile", true); if (fqualfile == "not open") { abort = true; } else if (fqualfile == "not found") { fqualfile = ""; } rqualfile = validParameter.validFile(parameters, "rqfile", true); if (rqualfile == "not open") { abort = true; } else if (rqualfile == "not found") { rqualfile = ""; } file = validParameter.validFile(parameters, "file", true); if (file == "not open") { abort = true; } else if (file == "not found") { file = ""; } //provide at least if ((file == "") && (ffastafile == "") && (ffastqfile == "")) { abort = true; m->mothurOut("[ERROR]: The file, ffastq and rfastq or ffasta and rfasta parameters are required.\n"); } if ((file != "") && ((ffastafile != "") || (ffastqfile != ""))) { abort = true; m->mothurOut("[ERROR]: The file parameter cannot be combined with the ffastq or ffasta parameters. Provide either file, ffastq and rfastq, or ffasta and rfasta.\n"); } if ((ffastqfile != "") && (rfastqfile == "")) { abort = true; m->mothurOut("[ERROR]: If you use the ffastq parameter, you must also provide the rfastq parameter.\n"); } if ((ffastqfile == "") && (rfastqfile != "")) { abort = true; m->mothurOut("[ERROR]: If you use the rfastq parameter, you must also provide the ffastq parameter.\n"); } if ((ffastafile != "") && (rfastafile == "")) { abort = true; 
m->mothurOut("[ERROR]: If you use the ffasta parameter, you must also provide the rfasta parameter.\n"); } if ((ffastafile == "") && (rfastafile != "")) { abort = true; m->mothurOut("[ERROR]: If you use the rfasta parameter, you must also provide the ffasta parameter.\n"); } if ((fqualfile != "") && (rqualfile == "")) { abort = true; m->mothurOut("[ERROR]: If you use the fqfile parameter, you must also provide the rqfile parameter.\n"); } if ((fqualfile == "") && (rqualfile != "")) { abort = true; m->mothurOut("[ERROR]: If you use the rqfile parameter, you must also provide the fqfile parameter.\n"); } if (((fqualfile != "") || (rqualfile != "")) && ((ffastafile == "") || (rfastafile == ""))) { abort = true; m->mothurOut("[ERROR]: If you use the fqfile or rqfile parameter, you must also provide the ffasta and rfasta parameters.\n"); } oligosfile = validParameter.validFile(parameters, "oligos", true); if (oligosfile == "not found") { oligosfile = ""; } else if(oligosfile == "not open") { abort = true; } else { m->setOligosFile(oligosfile); } findexfile = validParameter.validFile(parameters, "findex", true); if (findexfile == "not found") { findexfile = ""; } else if(findexfile == "not open") { abort = true; } rindexfile = validParameter.validFile(parameters, "rindex", true); if (rindexfile == "not found") { rindexfile = ""; } else if(rindexfile == "not open") { abort = true; } if ((rindexfile != "") || (findexfile != "")) { if (oligosfile == ""){ oligosfile = m->getOligosFile(); if (oligosfile != "") { m->mothurOut("Using " + oligosfile + " as input file for the oligos parameter.\n"); } else { m->mothurOut("You need to provide an oligos file if you are going to use an index file.\n"); abort = true; } } //can only use an index file with the fastq parameters not fasta and qual if ((ffastafile != "") || (rfastafile != "")) { m->mothurOut("[ERROR]: You can only use an index file with the fastq parameters or the file option.\n"); abort = true; } } //if the user changes the output directory command factory will send this info to 
us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; temp = validParameter.validFile(parameters, "match", false); if (temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, match); temp = validParameter.validFile(parameters, "mismatch", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, misMatch); if (misMatch > 0) { m->mothurOut("[ERROR]: mismatch must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "gapopen", false); if (temp == "not found"){ temp = "-2.0"; } m->mothurConvert(temp, gapOpen); if (gapOpen > 0) { m->mothurOut("[ERROR]: gapopen must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "gapextend", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, gapExtend); if (gapExtend > 0) { m->mothurOut("[ERROR]: gapextend must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "insert", false); if (temp == "not found"){ temp = "20"; } m->mothurConvert(temp, insert); if ((insert < 0) || (insert > 40)) { m->mothurOut("[ERROR]: insert must be between 0 and 40.\n"); abort=true; } temp = validParameter.validFile(parameters, "deltaq", false); if (temp == "not found"){ temp = "6"; } m->mothurConvert(temp, deltaq); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "bdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, bdiffs); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, pdiffs); // temp = validParameter.validFile(parameters, "ldiffs", false); if 
(temp == "not found") { temp = "0"; } // m->mothurConvert(temp, ldiffs); ldiffs = 0; // temp = validParameter.validFile(parameters, "sdiffs", false); if (temp == "not found") { temp = "0"; } // m->mothurConvert(temp, sdiffs); sdiffs = 0; temp = validParameter.validFile(parameters, "tdiffs", false); if (temp == "not found") { int tempTotal = pdiffs + bdiffs; temp = toString(tempTotal); } m->mothurConvert(temp, tdiffs); if(tdiffs == 0){ tdiffs = bdiffs + pdiffs; } //+ ldiffs + sdiffs; temp = validParameter.validFile(parameters, "allfiles", false); if (temp == "not found") { temp = "F"; } allFiles = m->isTrue(temp); temp = validParameter.validFile(parameters, "ksize", false); if (temp == "not found"){ temp = "8"; } m->mothurConvert(temp, kmerSize); temp = validParameter.validFile(parameters, "trimoverlap", false); if (temp == "not found") { temp = "F"; } trimOverlap = m->isTrue(temp); align = validParameter.validFile(parameters, "align", false); if (align == "not found"){ align = "needleman"; } if ((align != "needleman") && (align != "gotoh") && (align != "kmer")) { m->mothurOut(align + " is not a valid alignment method. Options are kmer, needleman or gotoh. I will use needleman."); m->mothurOutEndLine(); align = "needleman"; } format = validParameter.validFile(parameters, "format", false); if (format == "not found"){ format = "illumina1.8+"; } if ((format != "sanger") && (format != "illumina") && (format != "illumina1.8+") && (format != "solexa")) { m->mothurOut(format + " is not a valid format. Your format choices are sanger, solexa, illumina1.8+ and illumina, aborting." 
); m->mothurOutEndLine(); abort=true; } temp = validParameter.validFile(parameters, "checkorient", false); if (temp == "not found") { temp = "F"; } reorient = m->isTrue(temp); qual_score.resize(47); qual_score[0] = -2; qual_score[1] = -1.58147; qual_score[2] = -0.996843; qual_score[3] = -0.695524; qual_score[4] = -0.507676; qual_score[5] = -0.38013; qual_score[6] = -0.289268; qual_score[7] = -0.222552; qual_score[8] = -0.172557; qual_score[9] = -0.134552; qual_score[10] = -0.105361; qual_score[11] = -0.0827653; qual_score[12] = -0.0651742; qual_score[13] = -0.0514183; qual_score[14] = -0.0406248; qual_score[15] = -0.0321336; qual_score[16] = -0.0254397; qual_score[17] = -0.0201544; qual_score[18] = -0.0159759; qual_score[19] = -0.0126692; qual_score[20] = -0.0100503; qual_score[21] = -0.007975; qual_score[22] = -0.00632956; qual_score[23] = -0.00502447; qual_score[24] = -0.00398902; qual_score[25] = -0.00316729; qual_score[26] = -0.00251505; qual_score[27] = -0.00199726; qual_score[28] = -0.00158615; qual_score[29] = -0.00125972; qual_score[30] = -0.0010005; qual_score[31] = -0.000794644; qual_score[32] = -0.000631156; qual_score[33] = -0.000501313; qual_score[34] = -0.000398186; qual_score[35] = -0.000316278; qual_score[36] = -0.00025122; qual_score[37] = -0.000199546; qual_score[38] = -0.000158502; qual_score[39] = -0.0001259; qual_score[40] = -0.000100005; qual_score[41] = -7.9436e-05; qual_score[42] = -6.30977e-05; qual_score[43] = -5.012e-05; qual_score[44] = -3.98115e-05; qual_score[45] = -3.16233e-05; qual_score[46] = -2.51192e-05; } } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "MakeContigsCommand"); exit(1); } } //********************************************************************************************************************** int MakeContigsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } unsigned long long numReads = 0; map totalGroupCounts; int start = time(NULL); longestBase = 1000; group = ""; 
if (file != "") { numReads = processMultipleFileOption(totalGroupCounts); }else if ((ffastqfile != "") || (ffastafile != "")) { numReads = processSingleFileOption(totalGroupCounts); }else { return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to process " + toString(numReads) + " sequences.\n"); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output group counts m->mothurOutEndLine(); int total = 0; if (totalGroupCounts.size() != 0) { m->mothurOut("Group count: \n"); } for (map::iterator it = totalGroupCounts.begin(); it != totalGroupCounts.end(); it++) { total += it->second; m->mothurOut(it->first + "\t" + toString(it->second)); m->mothurOutEndLine(); } if (total != 0) { m->mothurOut("\nTotal of all groups is " + toString(total)); m->mothurOutEndLine(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } string currentFasta = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { currentFasta = (itTypes->second)[0]; m->setFastaFile(currentFasta); } } string currentGroup = ""; itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { currentGroup = (itTypes->second)[0]; m->setGroupFile(currentGroup); } } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "execute"); exit(1); } } //********************************************************************************************************************** unsigned long long MakeContigsCommand::processSingleFileOption(map& totalGroupCounts) { try { bool hasQual = false; unsigned long long 
numReads = 0; string inputFile = ""; vector fileInputs; vector qualOrIndexInputs; vector lines; vector qLines; delim = '>'; map variables; string thisOutputDir = outputDir; if (ffastafile != "") { inputFile = ffastafile; if (outputDir == "") { thisOutputDir = m->hasPath(inputFile); } fileInputs.push_back(ffastafile); fileInputs.push_back(rfastafile); if (fqualfile != "") { hasQual = true; qualOrIndexInputs.push_back(fqualfile); qualOrIndexInputs.push_back(rqualfile); variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fqualfile)); variables["[tag]"] = "trim"; outQualFile = getOutputFileName("qfile",variables); variables["[tag]"] = "scrap"; outScrapQualFile = getOutputFileName("qfile",variables); }else { outQualFile = ""; outScrapQualFile = ""; } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(inputFile)); delim = '>'; }else { //ffastqfile hasQual = true; inputFile = ffastqfile; if (outputDir == "") { thisOutputDir = m->hasPath(inputFile); } variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(inputFile)); variables["[tag]"] = "trim"; outQualFile = getOutputFileName("qfile",variables); variables["[tag]"] = "scrap"; outScrapQualFile = getOutputFileName("qfile",variables); fileInputs.push_back(ffastqfile); fileInputs.push_back(rfastqfile); if ((findexfile != "") || (rindexfile != "")){ qualOrIndexInputs.push_back("NONE"); qualOrIndexInputs.push_back("NONE"); if (findexfile != "") { qualOrIndexInputs[0] = findexfile; } if (rindexfile != "") { qualOrIndexInputs[1] = rindexfile; } } delim = '@'; } bool allGZ = true; #ifdef USE_BOOST bool allPlainTxt = true; if (m->isGZ(fileInputs[0])[1]) { allPlainTxt = false; } else { allGZ = false; } if (m->isGZ(fileInputs[1])[1]) { allPlainTxt = false; } else { allGZ = false; } if (qualOrIndexInputs.size() != 0) { if (qualOrIndexInputs[0] != "NONE") { if (m->isGZ(qualOrIndexInputs[0])[1]) { allPlainTxt = false; } else { allGZ = false; } } if (qualOrIndexInputs[1] != 
"NONE") { if (m->isGZ(qualOrIndexInputs[1])[1]) { allPlainTxt = false; } else { allGZ = false; } } if (!allGZ && !allPlainTxt) { //mixed bag of files, uh oh... m->mothurOut("[ERROR]: Your files must all be in compressed .gz form or all in plain text form. Please correct. \n"); m->control_pressed = true; } } #else #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else string extension = m->getExtension(fileInputs[0]); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } extension = m->getExtension(fileInputs[1]); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } if (qualOrIndexInputs.size() != 0) { if (qualOrIndexInputs[0] != "NONE") { extension = m->getExtension(qualOrIndexInputs[0]); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } } if (qualOrIndexInputs[1] != "NONE") { extension = m->getExtension(qualOrIndexInputs[1]); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } } } #endif allGZ = false; #endif if (allGZ) { gz = true; }else { gz = false; } variables["[tag]"] = "trim"; outFastaFile = getOutputFileName("fasta",variables); variables["[tag]"] = "scrap"; outScrapFastaFile = getOutputFileName("fasta",variables); variables["[tag]"] = ""; outMisMatchFile = getOutputFileName("report",variables); //divides the files so that the processors can share the workload. 
setLines(fileInputs, qualOrIndexInputs, lines, qLines, delim); vector > fastaFileNames, qualFileNames; map uniqueFastaNames;// so we don't add the same groupfile multiple times createOligosGroup = false; oligos = new Oligos(); numBarcodes = 0; numFPrimers= 0; numLinkers= 0; numSpacers = 0; numRPrimers = 0; if(oligosfile != "") { createOligosGroup = getOligos(fastaFileNames, qualFileNames, variables["[filename]"], uniqueFastaNames); } if (createOligosGroup || createFileGroup) { outputGroupFileName = getOutputFileName("group",variables); } //give group in file file precedence if (createFileGroup) { createOligosGroup = false; } m->mothurOut("Making contigs...\n"); numReads = createProcesses(fileInputs, qualOrIndexInputs, outFastaFile, outScrapFastaFile, outQualFile, outScrapQualFile, outMisMatchFile, fastaFileNames, qualFileNames, lines, qLines, group); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete oligos; return 0; } if(allFiles){ // so we don't add the same groupfile multiple times map::iterator it; set namesToRemove; for(int i=0;iisBlank(fastaFileNames[i][j])){ m->mothurRemove(fastaFileNames[i][j]); namesToRemove.insert(fastaFileNames[i][j]); uniqueFastaNames.erase(fastaFileNames[i][j]); //remove from list for group file print m->mothurRemove(qualFileNames[i][j]); namesToRemove.insert(qualFileNames[i][j]); } } } } } //remove names for outputFileNames, just cleans up the output vector outputNames2; for(int i = 0; i < outputNames.size(); i++) { if (namesToRemove.count(outputNames[i]) == 0) { outputNames2.push_back(outputNames[i]); } } outputNames = outputNames2; for (it = uniqueFastaNames.begin(); it != uniqueFastaNames.end(); it++) { ifstream in; m->openInputFile(it->first, in); ofstream out; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(it->first)); string thisGroupName = getOutputFileName("group",variables); outputNames.push_back(thisGroupName); 
                outputTypes["group"].push_back(thisGroupName);
                m->openOutputFile(thisGroupName, out);

                while (!in.eof()){
                    if (m->control_pressed) { break; }

                    Sequence currSeq(in); m->gobble(in);
                    out << currSeq.getName() << '\t' << it->second << endl;
                }
                out.close();
                in.close();
            }
        }

        if (createFileGroup || createOligosGroup) {
            ofstream outGroup;
            m->openOutputFile(outputGroupFileName, outGroup);
            for (map<string, string>::iterator itGroup = groupMap.begin(); itGroup != groupMap.end(); itGroup++) {
                outGroup << itGroup->first << '\t' << itGroup->second << endl;
            }
            outGroup.close();
        }

        if (file == "") {
            totalGroupCounts = groupCounts;
            outputNames.push_back(outFastaFile); outputTypes["fasta"].push_back(outFastaFile);
            outputNames.push_back(outScrapFastaFile); outputTypes["fasta"].push_back(outScrapFastaFile);
            if (hasQual) {
                outputNames.push_back(outQualFile); outputTypes["qfile"].push_back(outQualFile);
                outputNames.push_back(outScrapQualFile); outputTypes["qfile"].push_back(outScrapQualFile);
            }
            outputNames.push_back(outMisMatchFile); outputTypes["report"].push_back(outMisMatchFile);
            if (createFileGroup || createOligosGroup) {
                outputNames.push_back(outputGroupFileName); outputTypes["group"].push_back(outputGroupFileName);
            }
        }
        m->mothurOut("Done.\n");

        delete oligos;

        return numReads;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "processSingleFileOption");
        exit(1);
    }
}
//**********************************************************************************************************************
unsigned long long MakeContigsCommand::processMultipleFileOption(map<string, int>& totalGroupCounts) {
    try {
        unsigned long long numReads = 0;

        map<string, string> cvars;
        string compOutputDir = outputDir;
        if (outputDir == "") { compOutputDir = m->hasPath(file); }
        cvars["[filename]"] = compOutputDir + m->getRootName(m->getSimpleName(file));
        cvars["[tag]"] = "";
        string compositeGroupFile = getOutputFileName("group",cvars);
        cvars["[tag]"] = "trim";
        string compositeFastaFile = getOutputFileName("fasta",cvars);
        cvars["[tag]"] = "scrap";
        string
        compositeScrapFastaFile = getOutputFileName("fasta",cvars);
        cvars["[tag]"] = "trim";
        string compositeQualFile = getOutputFileName("qfile",cvars);
        cvars["[tag]"] = "scrap";
        string compositeScrapQualFile = getOutputFileName("qfile",cvars);
        cvars["[tag]"] = "";
        string compositeMisMatchFile = getOutputFileName("report",cvars);

        ofstream outCTFasta, outCTQual, outCSFasta, outCSQual, outCMisMatch;
        m->openOutputFile(compositeFastaFile, outCTFasta); outCTFasta.close();
        m->openOutputFile(compositeQualFile, outCTQual); outCTQual.close();
        m->openOutputFile(compositeScrapFastaFile, outCSFasta); outCSFasta.close();
        m->openOutputFile(compositeScrapQualFile, outCSQual); outCSQual.close();
        m->openOutputFile(compositeMisMatchFile, outCMisMatch); outCMisMatch.close();
        outputNames.push_back(compositeFastaFile); outputTypes["fasta"].push_back(compositeFastaFile);
        outputNames.push_back(compositeQualFile); outputTypes["qfile"].push_back(compositeQualFile);
        outputNames.push_back(compositeMisMatchFile); outputTypes["report"].push_back(compositeMisMatchFile);
        outputNames.push_back(compositeScrapFastaFile); outputTypes["fasta"].push_back(compositeScrapFastaFile);
        outputNames.push_back(compositeScrapQualFile); outputTypes["qfile"].push_back(compositeScrapQualFile);

        //read file
        vector< vector<string> > fileInputs = readFileNames(file);

        if (gz) {
            numReads = createProcessesGroups(fileInputs, compositeGroupFile, compositeFastaFile, compositeScrapFastaFile, compositeQualFile, compositeScrapQualFile, compositeMisMatchFile, totalGroupCounts);
        }else {
            for (int l = 0; l < fileInputs.size(); l++) {
                int startTime = time(NULL);

                if (m->control_pressed) { break; }

                m->mothurOut("\n>>>>>\tProcessing file pair " + fileInputs[l][0] + " - " + fileInputs[l][1] + " (files " + toString(l+1) + " of " + toString(fileInputs.size()) + ")\t<<<<<\n");

                ffastqfile = fileInputs[l][0];
                rfastqfile = fileInputs[l][1];
                findexfile = fileInputs[l][2];
                rindexfile = fileInputs[l][3];
                group = file2Group[l];

                groupCounts.clear();
                groupMap.clear();

                //run this file pair as if it were a single-file run
                unsigned long long thisNumReads = processSingleFileOption(groupCounts);
                numReads += thisNumReads;

                //append to combo files
                if (createFileGroup || createOligosGroup) {
                    if (l == 0) {
                        ofstream outCGroup;
                        m->openOutputFile(compositeGroupFile, outCGroup); outCGroup.close();
                        outputNames.push_back(compositeGroupFile); outputTypes["group"].push_back(compositeGroupFile);
                    }
                    m->appendFiles(outputGroupFileName, compositeGroupFile);
                    if (!allFiles) { m->mothurRemove(outputGroupFileName); }
                    else { outputNames.push_back(outputGroupFileName); outputTypes["group"].push_back(outputGroupFileName); }

                    for (map<string, int>::iterator itGroups = groupCounts.begin(); itGroups != groupCounts.end(); itGroups++) {
                        map<string, int>::iterator itTemp = totalGroupCounts.find(itGroups->first);
                        if (itTemp == totalGroupCounts.end()) { totalGroupCounts[itGroups->first] = itGroups->second; } //new group, create it in totalGroupCounts
                        else { itTemp->second += itGroups->second; } //existing group, update total
                    }
                }

                if (l == 0) { m->appendFiles(outMisMatchFile, compositeMisMatchFile); }
                else { m->appendFilesWithoutHeaders(outMisMatchFile, compositeMisMatchFile); }
                m->appendFiles(outFastaFile, compositeFastaFile);
                m->appendFiles(outScrapFastaFile, compositeScrapFastaFile);
                m->appendFiles(outQualFile, compositeQualFile);
                m->appendFiles(outScrapQualFile, compositeScrapQualFile);
                if (!allFiles) {
                    m->mothurRemove(outMisMatchFile);
                    m->mothurRemove(outFastaFile);
                    m->mothurRemove(outScrapFastaFile);
                    m->mothurRemove(outQualFile);
                    m->mothurRemove(outScrapQualFile);
                }else {
                    outputNames.push_back(outFastaFile); outputTypes["fasta"].push_back(outFastaFile);
                    outputNames.push_back(outScrapFastaFile); outputTypes["fasta"].push_back(outScrapFastaFile);
                    outputNames.push_back(outQualFile); outputTypes["qfile"].push_back(outQualFile);
                    outputNames.push_back(outScrapQualFile); outputTypes["qfile"].push_back(outScrapQualFile);
                    outputNames.push_back(outMisMatchFile); outputTypes["report"].push_back(outMisMatchFile);
                }
                m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - startTime) + " secs to assemble " + toString(thisNumReads) + " reads.\n"); m->mothurOutEndLine();
            }
        }

        return numReads;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "processMultipleFileOption");
        exit(1);
    }
}
//**********************************************************************************************************************
//only getting here if gz=true
unsigned long long MakeContigsCommand::createProcessesGroups(vector< vector<string> > fileInputs, string compositeGroupFile, string compositeFastaFile, string compositeScrapFastaFile, string compositeQualFile, string compositeScrapQualFile, string compositeMisMatchFile, map<string, int>& totalGroupCounts) {
    try {
        unsigned long long num = 0;
        vector<int> processIDS;
        bool recalc = false;
        vector<linePair> startEndIndexes;

        //divide files between processors
        int remainingPairs = fileInputs.size();
        int startIndex = 0;
        for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) {
            int numPairs = remainingPairs; //case for last processor
            if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); }
            startEndIndexes.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex
            startIndex = startIndex + numPairs;
            remainingPairs = remainingPairs - numPairs;
        }

#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
        int process = 1;

        //loop through and create all the processes you want
        while (process != processors) {
            pid_t pid = fork();

            if (pid > 0) {
                processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                num = driverGroups(fileInputs, startEndIndexes[process].start, startEndIndexes[process].end,
                                   compositeGroupFile + m->mothurGetpid(process) + ".temp",
                                   compositeFastaFile + m->mothurGetpid(process) + ".temp",
                                   compositeScrapFastaFile + m->mothurGetpid(process) + ".temp",
compositeQualFile + m->mothurGetpid(process) + ".temp", compositeScrapQualFile + m->mothurGetpid(process) + ".temp", compositeMisMatchFile + m->mothurGetpid(process) + ".temp", totalGroupCounts); //pass groupCounts to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; if (createFileGroup || createOligosGroup) { out << totalGroupCounts.size() << endl; for (map::iterator it = totalGroupCounts.begin(); it != totalGroupCounts.end(); it++) { out << it->first << '\t' << it->second << endl; } out << groupMap.size() << endl; for (map::iterator it = groupMap.begin(); it != groupMap.end(); it++) { out << it->first << '\t' << it->second << endl; } } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); vector startEndIndexes; //divide files between processors int remainingPairs = fileInputs.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } startEndIndexes.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverGroups(fileInputs, startEndIndexes[process].start, startEndIndexes[process].end, compositeGroupFile + m->mothurGetpid(process) + ".temp", compositeFastaFile + m->mothurGetpid(process) + ".temp", compositeScrapFastaFile + m->mothurGetpid(process) + ".temp", compositeQualFile + m->mothurGetpid(process) + ".temp", compositeScrapQualFile + m->mothurGetpid(process) + ".temp", compositeMisMatchFile + m->mothurGetpid(process) + ".temp", totalGroupCounts); //pass groupCounts to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; if (createFileGroup || createOligosGroup) { out << totalGroupCounts.size() << endl; for (map::iterator it = totalGroupCounts.begin(); it != totalGroupCounts.end(); it++) { out << it->first << '\t' << 
it->second << endl; } out << groupMap.size() << endl; for (map::iterator it = groupMap.begin(); it != groupMap.end(); it++) { out << it->first << '\t' << it->second << endl; } } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driverGroups(fileInputs, startEndIndexes[0].start, startEndIndexes[0].end, compositeGroupFile, compositeFastaFile, compositeScrapFastaFile, compositeQualFile, compositeScrapQualFile, compositeMisMatchFile, totalGroupCounts); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); int tempNum; in >> tempNum; num += tempNum; m->gobble(in); if (createFileGroup || createOligosGroup) { string group; in >> tempNum; m->gobble(in); if (tempNum != 0) { for (int j = 0; j < tempNum; j++) { int groupNum; in >> group >> groupNum; m->gobble(in); map::iterator it = totalGroupCounts.find(group); if (it == totalGroupCounts.end()) { totalGroupCounts[group] = groupNum; } else { totalGroupCounts[it->first] += groupNum; } } } in >> tempNum; m->gobble(in); if (tempNum != 0) { for (int j = 0; j < tempNum; j++) { string group, seqName; in >> seqName >> group; m->gobble(in); map::iterator it = groupMap.find(seqName); if (it == groupMap.end()) { groupMap[seqName] = group; } else { m->mothurOut("[ERROR]: " + seqName + " is in your fasta file more than once. Sequence names must be unique. please correct.\n"); } } } } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the contigsData struct. 
//Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for(int h=1; hcount; if (pDataArray[i]->createFileGroup || pDataArray[i]->createOligosGroup) { for (map::iterator it = pDataArray[i]->totalGroupCounts.begin(); it != pDataArray[i]->totalGroupCounts.end(); it++) { map::iterator it2 = totalGroupCounts.find(it->first); if (it2 == totalGroupCounts.end()) { totalGroupCounts[it->first] = it->second; } else { totalGroupCounts[it->first] += it->second; } } for (map::iterator it = pDataArray[i]->groupMap.begin(); it != pDataArray[i]->groupMap.end(); it++) { map::iterator it2 = groupMap.find(it->first); if (it2 == groupMap.end()) { groupMap[it->first] = it->second; } else { m->mothurOut("[ERROR]: " + it->first + " is in your fasta file more than once. Sequence names must be unique. 
please correct.\n");
                    }
                }
            }
            CloseHandle(hThreadArray[i]);
            delete pDataArray[i];
        }
#endif

        for (int i = 0; i < processIDS.size(); i++) {
            m->appendFiles((compositeGroupFile + toString(processIDS[i]) + ".temp"), compositeGroupFile);
            m->mothurRemove((compositeGroupFile + toString(processIDS[i]) + ".temp"));

            m->appendFiles((compositeFastaFile + toString(processIDS[i]) + ".temp"), compositeFastaFile);
            m->mothurRemove((compositeFastaFile + toString(processIDS[i]) + ".temp"));

            m->appendFiles((compositeScrapFastaFile + toString(processIDS[i]) + ".temp"), compositeScrapFastaFile);
            m->mothurRemove((compositeScrapFastaFile + toString(processIDS[i]) + ".temp"));

            m->appendFiles((compositeQualFile + toString(processIDS[i]) + ".temp"), compositeQualFile);
            m->mothurRemove((compositeQualFile + toString(processIDS[i]) + ".temp"));

            m->appendFiles((compositeScrapQualFile + toString(processIDS[i]) + ".temp"), compositeScrapQualFile);
            m->mothurRemove((compositeScrapQualFile + toString(processIDS[i]) + ".temp"));

            m->appendFiles((compositeMisMatchFile + toString(processIDS[i]) + ".temp"), compositeMisMatchFile);
            m->mothurRemove((compositeMisMatchFile + toString(processIDS[i]) + ".temp"));
        }

        return num;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "createProcessesGroups");
        exit(1);
    }
}
//**********************************************************************************************************************
unsigned long long MakeContigsCommand::driverGroups(vector< vector<string> > fileInputs, int start, int end, string compositeGroupFile, string compositeFastaFile, string compositeScrapFastaFile, string compositeQualFile, string compositeScrapQualFile, string compositeMisMatchFile, map<string, int>& totalGroupCounts) {
    try {
        unsigned long long numReads = 0;
        delim = '@';

        for (int l = start; l < end; l++) {
            int startTime = time(NULL);

            if (m->control_pressed) { break; }

            m->mothurOut("\n>>>>>\tProcessing file pair " + fileInputs[l][0] + " - " + fileInputs[l][1] + " (files " + toString(l+1) + " of " +
                         toString(fileInputs.size()) + ")\t<<<<<\n");

            ffastqfile = fileInputs[l][0];
            rfastqfile = fileInputs[l][1];
            findexfile = fileInputs[l][2];
            rindexfile = fileInputs[l][3];
            group = file2Group[l];

            groupCounts.clear();
            groupMap.clear();

            vector<string> thisFileInputs;
            vector<string> thisQualOrIndexInputs;
            vector<linePair> thisLines;
            vector<linePair> thisQLines;

            map<string, string> variables;
            string thisOutputDir = outputDir;

            string inputFile = ffastqfile;
            if (outputDir == "") { thisOutputDir = m->hasPath(inputFile); }
            variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(inputFile));
            variables["[tag]"] = "trim";
            outQualFile = getOutputFileName("qfile",variables);
            variables["[tag]"] = "scrap";
            outScrapQualFile = getOutputFileName("qfile",variables);

            thisFileInputs.push_back(ffastqfile); thisFileInputs.push_back(rfastqfile);
            if ((findexfile != "") || (rindexfile != "")){
                thisQualOrIndexInputs.push_back("NONE"); thisQualOrIndexInputs.push_back("NONE");
                if (findexfile != "") { thisQualOrIndexInputs[0] = findexfile; }
                if (rindexfile != "") { thisQualOrIndexInputs[1] = rindexfile; }
            }

            variables["[tag]"] = "trim";
            outFastaFile = getOutputFileName("fasta",variables);
            variables["[tag]"] = "scrap";
            outScrapFastaFile = getOutputFileName("fasta",variables);
            variables["[tag]"] = "";
            outMisMatchFile = getOutputFileName("report",variables);

            //fake out lines - we are just going to check for end of file. Work is divided by number of files per processor.
thisLines.push_back(linePair(0, 1000)); thisLines.push_back(linePair(0, 1000)); //fasta[0], fasta[1] - forward and reverse thisQLines.push_back(linePair(0, 1000)); thisQLines.push_back(linePair(0, 1000)); //qual[0], qual[1] - forward and reverse vector > fastaFileNames, qualFileNames; map uniqueFastaNames;// so we don't add the same groupfile multiple times createOligosGroup = false; oligos = new Oligos(); numBarcodes = 0; numFPrimers= 0; numLinkers= 0; numSpacers = 0; numRPrimers = 0; if(oligosfile != "") { createOligosGroup = getOligos(fastaFileNames, qualFileNames, variables["[filename]"], uniqueFastaNames); } if (createOligosGroup || createFileGroup) { outputGroupFileName = getOutputFileName("group",variables); } //give group in file file precedence if (createFileGroup) { createOligosGroup = false; } ofstream temp; m->openOutputFile(outFastaFile, temp); temp.close(); m->openOutputFile(outScrapFastaFile, temp); temp.close(); m->openOutputFile(outQualFile, temp); temp.close(); m->openOutputFile(outScrapQualFile, temp); temp.close(); m->mothurOut("Making contigs...\n"); unsigned long long thisNumReads = driver(thisFileInputs, thisQualOrIndexInputs, outFastaFile, outScrapFastaFile, outQualFile, outScrapQualFile, outMisMatchFile, fastaFileNames, qualFileNames, thisLines[0], thisLines[1], thisQLines[0], thisQLines[1], group); numReads += thisNumReads; m->mothurOut("Done.\n"); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete oligos; return 0; } if(allFiles){ // so we don't add the same groupfile multiple times map::iterator it; set namesToRemove; for(int i=0;iisBlank(fastaFileNames[i][j])){ m->mothurRemove(fastaFileNames[i][j]); namesToRemove.insert(fastaFileNames[i][j]); uniqueFastaNames.erase(fastaFileNames[i][j]); //remove from list for group file print m->mothurRemove(qualFileNames[i][j]); namesToRemove.insert(qualFileNames[i][j]); } } } } } //remove names for outputFileNames, just cleans up the 
output vector outputNames2; for(int i = 0; i < outputNames.size(); i++) { if (namesToRemove.count(outputNames[i]) == 0) { outputNames2.push_back(outputNames[i]); } } outputNames = outputNames2; for (it = uniqueFastaNames.begin(); it != uniqueFastaNames.end(); it++) { ifstream in; m->openInputFile(it->first, in); ofstream out; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(it->first)); string thisGroupName = getOutputFileName("group",variables); outputNames.push_back(thisGroupName); outputTypes["group"].push_back(thisGroupName); m->openOutputFile(thisGroupName, out); while (!in.eof()){ if (m->control_pressed) { break; } Sequence currSeq(in); m->gobble(in); out << currSeq.getName() << '\t' << it->second << endl; } out.close(); in.close(); } } //append to combo files if (createFileGroup || createOligosGroup) { ofstream outCGroup; if (l == 0) { m->openOutputFile(compositeGroupFile, outCGroup); outputNames.push_back(compositeGroupFile); outputTypes["group"].push_back(compositeGroupFile); } else { m->openOutputFileAppend(compositeGroupFile, outCGroup); } if (!allFiles) { m->mothurRemove(outputGroupFileName); }else { ofstream outGroup; m->openOutputFile(outputGroupFileName, outGroup); for (map::iterator itGroup = groupMap.begin(); itGroup != groupMap.end(); itGroup++) { outCGroup << itGroup->first << '\t' << itGroup->second << endl; outGroup << itGroup->first << '\t' << itGroup->second << endl; } outGroup.close(); } outCGroup.close(); for (map::iterator itGroups = groupCounts.begin(); itGroups != groupCounts.end(); itGroups++) { map::iterator itTemp = totalGroupCounts.find(itGroups->first); if (itTemp == totalGroupCounts.end()) { totalGroupCounts[itGroups->first] = itGroups->second; } //new group create it in totalGroups else { itTemp->second += itGroups->second; } //existing group, update total } } if (l == 0) { m->appendFiles(outMisMatchFile, compositeMisMatchFile); } else { m->appendFilesWithoutHeaders(outMisMatchFile, compositeMisMatchFile); 
            }
            m->appendFiles(outFastaFile, compositeFastaFile);
            m->appendFiles(outScrapFastaFile, compositeScrapFastaFile);
            m->appendFiles(outQualFile, compositeQualFile);
            m->appendFiles(outScrapQualFile, compositeScrapQualFile);
            if (!allFiles) {
                m->mothurRemove(outMisMatchFile);
                m->mothurRemove(outFastaFile);
                m->mothurRemove(outScrapFastaFile);
                m->mothurRemove(outQualFile);
                m->mothurRemove(outScrapQualFile);
            }else {
                outputNames.push_back(outFastaFile); outputTypes["fasta"].push_back(outFastaFile);
                outputNames.push_back(outScrapFastaFile); outputTypes["fasta"].push_back(outScrapFastaFile);
                outputNames.push_back(outQualFile); outputTypes["qfile"].push_back(outQualFile);
                outputNames.push_back(outScrapQualFile); outputTypes["qfile"].push_back(outScrapQualFile);
                outputNames.push_back(outMisMatchFile); outputTypes["report"].push_back(outMisMatchFile);
            }

            m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - startTime) + " secs to assemble " + toString(thisNumReads) + " reads.\n"); m->mothurOutEndLine();
        }

        return numReads;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "driverGroups");
        exit(1);
    }
}
//**********************************************************************************************************************
//fileInputs[0] = forward Fasta or Forward Fastq, fileInputs[1] = reverse Fasta or reverse Fastq. if qualOrIndexFiles.size() != 0, then qualOrIndexFiles[0] = forward qual or Forward index, qualOrIndexFiles[1] = reverse qual or reverse index.
//lines[0] - ffasta, lines[1] - rfasta) - processor1
//lines[2] - ffasta, lines[3] - rfasta) - processor2
//lines[4] - ffasta, lines[5] - rfasta) - processor3
//...
//qlines[0] - fqual or findex, qlines[1] - rqual or rindex) - processor1
//qlines[2] - fqual or findex, qlines[3] - rqual or rindex) - processor2
//qlines[4] - fqual or findex, qlines[5] - rqual or rindex) - processor3
//...
//if using index files and only have 1 then the file name = NONE, and entries are duds. Copies of other index file.
//if no index files are given, then qualOrIndexFiles.size() == 0. unsigned long long MakeContigsCommand::createProcesses(vector fileInputs, vector qualOrIndexFiles, string outputFasta, string outputScrapFasta, string outputQual, string outputScrapQual, string outputMisMatches, vector > fastaFileNames, vector > qualFileNames, vector lines, vector qLines, string group) { try { int num = 0; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ vector > tempFASTAFileNames = fastaFileNames; vector > tempQUALFileNames = qualFileNames; if(allFiles){ ofstream temp; for(int i=0;imothurGetpid(process) + ".temp"; m->openOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); } if (tempQUALFileNames[i][j] != "") { tempQUALFileNames[i][j] += m->mothurGetpid(process) + ".temp"; m->openOutputFile(tempQUALFileNames[i][j], temp); temp.close(); } } } } int spot = process*2; num = driver(fileInputs, qualOrIndexFiles, outputFasta + m->mothurGetpid(process) + ".temp", outputScrapFasta + m->mothurGetpid(process) + ".temp", outputQual + m->mothurGetpid(process) + ".temp", outputScrapQual + m->mothurGetpid(process) + ".temp", outputMisMatches + m->mothurGetpid(process) + ".temp", tempFASTAFileNames, tempQUALFileNames, lines[spot], lines[spot+1], qLines[spot], qLines[spot+1], group); //pass groupCounts to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; if (createFileGroup || createOligosGroup) { out << groupCounts.size() << endl; for (map::iterator it = groupCounts.begin(); it != groupCounts.end(); it++) { out << it->first << '\t' << 
it->second << endl; } out << groupMap.size() << endl; for (map::iterator it = groupMap.begin(); it != groupMap.end(); it++) { out << it->first << '\t' << it->second << endl; } } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide lines.clear(); setLines(fileInputs, qualOrIndexFiles, lines, qLines, delim); num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ vector > tempFASTAFileNames = fastaFileNames; vector > tempQUALFileNames = qualFileNames; if(allFiles){ ofstream temp; for(int i=0;imothurGetpid(process) + ".temp"; m->openOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); } if (tempQUALFileNames[i][j] != "") { tempQUALFileNames[i][j] += m->mothurGetpid(process) + ".temp"; m->openOutputFile(tempQUALFileNames[i][j], temp); temp.close(); } } } } int spot = process*2; num = driver(fileInputs, qualOrIndexFiles, outputFasta + m->mothurGetpid(process) + ".temp", outputScrapFasta + m->mothurGetpid(process) + ".temp", outputQual + m->mothurGetpid(process) + ".temp", 
outputScrapQual + m->mothurGetpid(process) + ".temp", outputMisMatches + m->mothurGetpid(process) + ".temp", tempFASTAFileNames, tempQUALFileNames, lines[spot], lines[spot+1], qLines[spot], qLines[spot+1], group); //pass groupCounts to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; if (createFileGroup || createOligosGroup) { out << groupCounts.size() << endl; for (map::iterator it = groupCounts.begin(); it != groupCounts.end(); it++) { out << it->first << '\t' << it->second << endl; } out << groupMap.size() << endl; for (map::iterator it = groupMap.begin(); it != groupMap.end(); it++) { out << it->first << '\t' << it->second << endl; } } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } ofstream temp; m->openOutputFile(outputFasta, temp); temp.close(); m->openOutputFile(outputScrapFasta, temp); temp.close(); if (outputQual != "") { m->openOutputFile(outputQual, temp); temp.close(); m->openOutputFile(outputScrapQual, temp); temp.close(); } //do my part int spot = 0; num = driver(fileInputs, qualOrIndexFiles, outputFasta, outputScrapFasta, outputQual, outputScrapQual, outputMisMatches, fastaFileNames, qualFileNames, lines[spot], lines[spot+1], qLines[spot], qLines[spot+1], group); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); int tempNum; in >> tempNum; num += tempNum; m->gobble(in); if (createFileGroup || createOligosGroup) { string group; in >> tempNum; m->gobble(in); if (tempNum != 0) { for (int j = 0; j < tempNum; j++) { int groupNum; in >> group >> groupNum; m->gobble(in); map::iterator it = groupCounts.find(group); if (it == groupCounts.end()) { groupCounts[group] = groupNum; } else { groupCounts[it->first] += groupNum; } } } in >> tempNum; m->gobble(in); if 
(tempNum != 0) { for (int j = 0; j < tempNum; j++) { string group, seqName; in >> seqName >> group; m->gobble(in); map::iterator it = groupMap.find(seqName); if (it == groupMap.end()) { groupMap[seqName] = group; } else { m->mothurOut("[ERROR]: " + seqName + " is in your fasta file more than once. Sequence names must be unique. please correct.\n"); } } } } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the contigsData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int h=0; h > tempFASTAFileNames = fastaFileNames; vector > tempQUALFileNames = qualFileNames; if(allFiles){ ofstream temp; for(int i=0;iopenOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); } if (tempQUALFileNames[i][j] != "") { tempQUALFileNames[i][j] += extension; m->openOutputFile(tempQUALFileNames[i][j], temp); temp.close(); } } } } int spot = (h)*2; contigsData* tempcontig = new contigsData(format, delim, group, fileInputs, qualOrIndexFiles, (outputFasta + extension), (outputScrapFasta + extension), (outputQual + extension), (outputScrapQual + extension), (outputMisMatches + extension), align, m, match, misMatch, gapOpen, gapExtend, insert, deltaq, tempFASTAFileNames, tempQUALFileNames, oligosfile, reorient, pdiffs, bdiffs, tdiffs, kmerSize, createOligosGroup, createFileGroup, allFiles, trimOverlap, lines[spot].start, lines[spot].end, lines[spot+1].start, lines[spot+1].end, qLines[spot].start, qLines[spot].end, qLines[spot+1].start, qLines[spot+1].end, h); pDataArray.push_back(tempcontig); hThreadArray[h] = CreateThread(NULL, 0, 
MyContigsThreadFunction, pDataArray[h], 0, &dwThreadIdArray[h]); } vector > tempFASTAFileNames = fastaFileNames; vector > tempQUALFileNames = qualFileNames; if(allFiles){ ofstream temp; string extension = toString(processors-1) + ".temp"; for(int i=0;iopenOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); } if (tempQUALFileNames[i][j] != "") { tempQUALFileNames[i][j] += extension; m->openOutputFile(tempQUALFileNames[i][j], temp); temp.close(); } } } } //parent do my part ofstream temp, temp2, temp3, temp4; m->openOutputFile(outputFasta, temp); temp.close(); m->openOutputFile(outputScrapFasta, temp2); temp2.close(); if (outputQual != "") { m->openOutputFile(outputQual, temp3); temp3.close(); m->openOutputFile(outputScrapQual, temp4); temp4.close(); } //do my part int spot = (processors-1)*2; processIDS.push_back(processors-1); num = driver(fileInputs, qualOrIndexFiles, (outputFasta+ toString(processors-1) + ".temp"), (outputScrapFasta+ toString(processors-1) + ".temp"), (outputQual+ toString(processors-1) + ".temp"), (outputScrapQual+ toString(processors-1) + ".temp"), (outputMisMatches+ toString(processors-1) + ".temp"), tempFASTAFileNames, tempQUALFileNames, lines[spot], lines[spot+1], qLines[spot], qLines[spot+1], group); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; if (!pDataArray[i]->done) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } for (map::iterator it = pDataArray[i]->groupCounts.begin(); it != pDataArray[i]->groupCounts.end(); it++) { map::iterator it2 = groupCounts.find(it->first); if (it2 == groupCounts.end()) { groupCounts[it->first] = it->second; } else { groupCounts[it->first] += it->second; } } for (map::iterator it = pDataArray[i]->groupMap.begin(); it != pDataArray[i]->groupMap.end(); it++) { map::iterator it2 = groupMap.find(it->first); if (it2 == groupMap.end()) { groupMap[it->first] = it->second; } else { m->mothurOut("[ERROR]: " + it->first + " is in your fasta file more than once. Sequence names must be unique. please correct.\n"); } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif for (int i = 0; i < processIDS.size(); i++) { m->appendFiles((outputFasta + toString(processIDS[i]) + ".temp"), outputFasta); m->mothurRemove((outputFasta + toString(processIDS[i]) + ".temp")); m->appendFiles((outputScrapFasta + toString(processIDS[i]) + ".temp"), outputScrapFasta); m->mothurRemove((outputScrapFasta + toString(processIDS[i]) + ".temp")); if (outputQual != "") { m->appendFiles((outputQual + toString(processIDS[i]) + ".temp"), outputQual); m->mothurRemove((outputQual + toString(processIDS[i]) + ".temp")); m->appendFiles((outputScrapQual + toString(processIDS[i]) + ".temp"), outputScrapQual); m->mothurRemove((outputScrapQual + toString(processIDS[i]) + ".temp")); } m->appendFilesWithoutHeaders((outputMisMatches + toString(processIDS[i]) + ".temp"), outputMisMatches); m->mothurRemove((outputMisMatches + toString(processIDS[i]) + ".temp")); if(allFiles){ for(int j=0;jappendFiles((fastaFileNames[j][k] + toString(processIDS[i]) + ".temp"), fastaFileNames[j][k]); m->mothurRemove((fastaFileNames[j][k] + toString(processIDS[i]) + ".temp")); } if (qualFileNames[j][k] != "") { m->appendFiles((qualFileNames[j][k] + toString(processIDS[i]) + ".temp"), qualFileNames[j][k]); m->mothurRemove((qualFileNames[j][k] + toString(processIDS[i]) + ".temp")); 
                        }
                    }
                }
            }
        }

        return num;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "createProcesses");
        exit(1);
    }
}
//**********************************************************************************************************************
unsigned long long MakeContigsCommand::driver(vector<string> inputFiles, vector<string> qualOrIndexFiles, string outputFasta, string outputScrapFasta, string outputQual, string outputScrapQual, string outputMisMatches, vector<vector<string> > fastaFileNames, vector<vector<string> > qualFileNames, linePair linesInput, linePair linesInputReverse, linePair qlinesInput, linePair qlinesInputReverse, string group){
    try {
        //47 x 47 lookup tables of match / mismatch values, indexed by the two clamped quality scores
        vector< vector<double> > qual_match_simple_bayesian;
        qual_match_simple_bayesian.resize(47);
        for (int i = 0; i < qual_match_simple_bayesian.size(); i++) { qual_match_simple_bayesian[i].resize(47); }
        vector< vector<double> > qual_mismatch_simple_bayesian;
        qual_mismatch_simple_bayesian.resize(47);
        for (int i = 0; i < qual_mismatch_simple_bayesian.size(); i++) { qual_mismatch_simple_bayesian[i].resize(47); }

        loadQmatchValues(qual_match_simple_bayesian, qual_mismatch_simple_bayesian);

        unsigned long long num = 0;
        string thisfqualindexfile, thisrqualindexfile, thisffastafile, thisrfastafile;
        thisfqualindexfile = ""; thisrqualindexfile = "";
        thisffastafile = inputFiles[0]; thisrfastafile = inputFiles[1];
        if (qualOrIndexFiles.size() != 0) {
            thisfqualindexfile = qualOrIndexFiles[0];
            thisrqualindexfile = qualOrIndexFiles[1];
        }

        if (m->debug) { m->mothurOut("[DEBUG]: ffasta = " + thisffastafile + ".\n[DEBUG]: rfasta = " + thisrfastafile + ".\n[DEBUG]: fqualindex = " + thisfqualindexfile + ".\n[DEBUG]: rqualindex = " + thisrqualindexfile + ".\n"); }

        ifstream inFFasta, inRFasta, inFQualIndex, inRQualIndex;
#ifdef USE_BOOST
        boost::iostreams::filtering_istream inFF, inRF, inFQ, inRQ;
#endif
        if (!gz) { //plain text files
            m->openInputFile(thisffastafile, inFFasta);
            m->openInputFile(thisrfastafile, inRFasta);

            inFFasta.seekg(linesInput.start);
            inRFasta.seekg(linesInputReverse.start);
        }else {
//compressed files - no need to seekg because compressed files divide workload differently #ifdef USE_BOOST m->openInputFileBinary(thisffastafile, inFFasta, inFF); m->openInputFileBinary(thisrfastafile, inRFasta, inRF); #endif } ofstream outFasta, outMisMatch, outScrapFasta, outQual, outScrapQual; if (thisfqualindexfile != "") { if (thisfqualindexfile != "NONE") { if (!gz) { //plain text files m->openInputFile(thisfqualindexfile, inFQualIndex); inFQualIndex.seekg(qlinesInput.start); }else { #ifdef USE_BOOST m->openInputFileBinary(thisfqualindexfile, inFQualIndex, inFQ); #endif } //compressed files - no need to seekg because compressed files divide workload differently } else { thisfqualindexfile = ""; } if (thisrqualindexfile != "NONE") { if (!gz) { //plain text files m->openInputFile(thisrqualindexfile, inRQualIndex); inRQualIndex.seekg(qlinesInputReverse.start); }else { #ifdef USE_BOOST m->openInputFileBinary(thisrqualindexfile, inRQualIndex, inRQ); #endif } //compressed files - no need to seekg because compressed files divide workload differently } else { thisrqualindexfile = ""; } } m->openOutputFile(outputFasta, outFasta); m->openOutputFile(outputScrapFasta, outScrapFasta); m->openOutputFile(outputMisMatches, outMisMatch); bool hasQuality = false; bool hasIndex = false; outMisMatch << "Name\tLength\tOverlap_Length\tOverlap_Start\tOverlap_End\tMisMatches\tNum_Ns\n"; if (delim == '@') { //fastq files so make an output quality m->openOutputFile(outputQual, outQual); m->openOutputFile(outputScrapQual, outScrapQual); hasQuality = true; if (thisfqualindexfile != "") { if (thisfqualindexfile != "NONE") { hasIndex = true; } } if (thisrqualindexfile != "") { if (thisrqualindexfile != "NONE") { hasIndex = true; } } }else if ((delim == '>') && (qualOrIndexFiles.size() != 0)) { //fasta and qual files m->openOutputFile(outputQual, outQual); m->openOutputFile(outputScrapQual, outScrapQual); hasQuality = true; } if (m->debug) { if (hasQuality) { m->mothurOut("[DEBUG]: 
hasQuality = true\n"); }
            else { m->mothurOut("[DEBUG]: hasQuality = false\n"); }
        }

        TrimOligos trimOligos(pdiffs, bdiffs, 0, 0, oligos->getPairedPrimers(), oligos->getPairedBarcodes(), hasIndex);

        TrimOligos* rtrimOligos = NULL;
        if (reorient) {
            rtrimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, oligos->getReorientedPairedPrimers(), oligos->getReorientedPairedBarcodes(), hasIndex);
            numBarcodes = oligos->getReorientedPairedBarcodes().size();
        }

        Alignment* alignment;
        if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); }
        else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); }
        else if(align == "kmer") { alignment = new KmerAlign(kmerSize); }

        bool good = true;
        while (good) {

            if (m->control_pressed) { break; }

            int success = 1;
            string trashCode = "";
            string commentString = "";
            int currentSeqsDiffs = 0;

            bool ignore = false;
            Sequence fSeq, rSeq;
            QualityScores* fQual = NULL; QualityScores* rQual = NULL;
            QualityScores* savedFQual = NULL; QualityScores* savedRQual = NULL;
            Sequence findexBarcode("findex", "NONE"); Sequence rindexBarcode("rindex", "NONE");

            //read from input files
            if (gz) {
#ifdef USE_BOOST
                ignore = read(fSeq, rSeq, fQual, rQual, savedFQual, savedRQual, findexBarcode, rindexBarcode, delim, inFF, inRF, inFQ, inRQ, thisfqualindexfile, thisrqualindexfile);
#endif
            }else { ignore = read(fSeq, rSeq, fQual, rQual, savedFQual, savedRQual, findexBarcode, rindexBarcode, delim, inFFasta, inRFasta, inFQualIndex, inRQualIndex, thisfqualindexfile, thisrqualindexfile); }

            //remove primers and barcodes if necessary
            if (!ignore) {
                int barcodeIndex = 0;
                int primerIndex = 0;
                Sequence savedFSeq(fSeq.getName(), fSeq.getAligned());
                Sequence savedRSeq(rSeq.getName(), rSeq.getAligned());
                Sequence savedFindex(findexBarcode.getName(), findexBarcode.getAligned());
                Sequence savedRIndex(rindexBarcode.getName(), rindexBarcode.getAligned());

                if(numBarcodes != 0){
                    vector<int> results;
                    if (hasQuality) {
if (hasIndex) { results = trimOligos.stripBarcode(findexBarcode, rindexBarcode, *fQual, *rQual, barcodeIndex); }else { results = trimOligos.stripBarcode(fSeq, rSeq, *fQual, *rQual, barcodeIndex); } }else { results = trimOligos.stripBarcode(fSeq, rSeq, barcodeIndex); } success = results[0] + results[2]; commentString += "fbdiffs=" + toString(results[0]) + "(" + trimOligos.getCodeValue(results[1], bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + trimOligos.getCodeValue(results[3], bdiffs) + ") "; if(success > bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } if(numFPrimers != 0){ vector results; if (hasQuality) { results = trimOligos.stripForward(fSeq, rSeq, *fQual, *rQual, primerIndex); }else { results = trimOligos.stripForward(fSeq, rSeq, primerIndex); } success = results[0] + results[2]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos.getCodeValue(results[1], pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + trimOligos.getCodeValue(results[3], pdiffs) + ") "; if(success > pdiffs) { trashCode += 'f'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > tdiffs) { trashCode += 't'; } if (reorient && (trashCode != "")) { //if you failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; string thiscommentString = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; if(numBarcodes != 0){ vector results; if (hasQuality) { if (hasIndex) { results = rtrimOligos->stripBarcode(savedFindex, savedRIndex, *savedFQual, *savedRQual, thisBarcodeIndex); }else { results = rtrimOligos->stripBarcode(savedFSeq, savedRSeq, *savedFQual, *savedRQual, thisBarcodeIndex); } }else { results = rtrimOligos->stripBarcode(savedFSeq, savedRSeq, thisBarcodeIndex); } thisSuccess = results[0] + results[2]; thiscommentString += "fbdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + 
rtrimOligos->getCodeValue(results[3], bdiffs) + ") "; if(thisSuccess > bdiffs) { thisTrashCode += 'b'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if(numFPrimers != 0){ vector results; if (hasQuality) { results = rtrimOligos->stripForward(savedFSeq, savedRSeq, *savedFQual, *savedRQual, thisPrimerIndex); }else { results = rtrimOligos->stripForward(savedFSeq, savedRSeq, thisPrimerIndex); } thisSuccess = results[0] + results[2]; thiscommentString += "fpdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pdiffs) + ") "; if(thisSuccess > pdiffs) { thisTrashCode += 'f'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if (thisCurrentSeqsDiffs > tdiffs) { thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; commentString = thiscommentString; barcodeIndex = thisBarcodeIndex; primerIndex = thisPrimerIndex; savedFSeq.reverseComplement(); savedRSeq.reverseComplement(); fSeq.setAligned(savedFSeq.getAligned()); rSeq.setAligned(savedRSeq.getAligned()); if(hasQuality){ savedFQual->flipQScores(); savedRQual->flipQScores(); fQual->setScores(savedFQual->getScores()); rQual->setScores(savedRQual->getScores()); } }else { trashCode += "(" + thisTrashCode + ")"; } } //assemble reads string contig = ""; int oend, oStart; int numMismatches = 0; vector contigScores = assembleFragments(qual_match_simple_bayesian, qual_mismatch_simple_bayesian, fSeq, rSeq, fQual, rQual, savedFQual, savedRQual, hasQuality, alignment, contig, trashCode, oend, oStart, numMismatches); //prints results to outputs files if(trashCode.length() == 0){ bool ignore = false; if (m->debug) { m->mothurOut("[DEBUG]: " + fSeq.getName()); } if (createOligosGroup) { string thisGroup = oligos->getGroupName(barcodeIndex, primerIndex); if (m->debug) { m->mothurOut(", group= " + thisGroup + "\n"); } int pos = 
thisGroup.find("ignore"); if (pos == string::npos) { groupMap[fSeq.getName()] = thisGroup; map::iterator it = groupCounts.find(thisGroup); if (it == groupCounts.end()) { groupCounts[thisGroup] = 1; } else { groupCounts[it->first] ++; } }else { ignore = true; } }else if (createFileGroup) { //for 3 column file option int pos = group.find("ignore"); if (pos == string::npos) { groupMap[fSeq.getName()] = group; map::iterator it = groupCounts.find(group); if (it == groupCounts.end()) { groupCounts[group] = 1; } else { groupCounts[it->first] ++; } }else { ignore = true; } } if (m->debug) { m->mothurOut("\n"); } if(!ignore){ //output outFasta << ">" << fSeq.getName() << '\t' << commentString << endl << contig << endl; if (hasQuality) { outQual << ">" << fSeq.getName() << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { outQual << contigScores[i] << " "; } outQual << endl; } int numNs = 0; for (int i = 0; i < contig.length(); i++) { if (contig[i] == 'N') { numNs++; } } outMisMatch << fSeq.getName() << '\t' << contig.length() << '\t' << (oend-oStart) << '\t' << oStart << '\t' << oend << '\t' << numMismatches << '\t' << numNs << endl; if (allFiles) { ofstream output; m->openOutputFileAppend(fastaFileNames[barcodeIndex][primerIndex], output); output << ">" << fSeq.getName() << '\t' << commentString << endl << contig << endl; output.close(); if (hasQuality) { ofstream output2; m->openOutputFileAppend(qualFileNames[barcodeIndex][primerIndex], output2); output2 << ">" << fSeq.getName() << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { output2 << contigScores[i] << " "; } output2 << endl; output2.close(); } } } }else { //output outScrapFasta << ">" << fSeq.getName() << " | " << trashCode << '\t' << commentString << endl << contig << endl; if (hasQuality) { outScrapQual << ">" << fSeq.getName() << " | " << trashCode << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { outScrapQual << 
contigScores[i] << " "; } outScrapQual << endl; } } } num++; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (!gz) { unsigned long long pos = inFFasta.tellg(); if ((pos == -1) || (pos >= linesInput.end)) { good = false; break; } }else { #ifdef USE_BOOST if (inFF.eof() || inRF.eof()) { good = false; break; } #endif } #else if (!gz) { if ((inFFasta.eof()) || (inRFasta.eof())) { good = false; break; } }else { #ifdef USE_BOOST if (inFF.eof() || inRF.eof()) { good = false; break; } #endif } #endif //report progress if((num) % 1000 == 0){ m->mothurOutJustToScreen(toString(num)+"\n"); } } //report progress if((num) % 1000 != 0){ m->mothurOutJustToScreen(toString(num)+"\n"); } //close files inFFasta.close(); inRFasta.close(); if (gz) { #ifdef USE_BOOST inFF.pop(); inRF.pop(); #endif } outFasta.close(); outScrapFasta.close(); outMisMatch.close(); if (delim == '@') { if (thisfqualindexfile != "") { inFQualIndex.close(); if (gz) { #ifdef USE_BOOST inFQ.pop(); #endif } } if (thisrqualindexfile != "") { inRQualIndex.close(); if (gz) { #ifdef USE_BOOST inRQ.pop(); #endif } } outQual.close(); outScrapQual.close(); }else{ if (hasQuality) { inFQualIndex.close(); inRQualIndex.close(); if (gz) { #ifdef USE_BOOST inFQ.pop(); inRQ.pop(); #endif } outQual.close(); outScrapQual.close(); } } //cleanup memory delete alignment; if (reorient) { delete rtrimOligos; } if (m->control_pressed) { m->mothurRemove(outputFasta); m->mothurRemove(outputScrapFasta); m->mothurRemove(outputMisMatches); if (hasQuality) { m->mothurRemove(outputQual); m->mothurRemove(outputScrapQual); } } return num; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "driver"); exit(1); } } /**************************************************************************************************/ //vector contigScores = assembleFragments(qual_match_simple_bayesian, qual_mismatch_simple_bayesian, fSeq, rSeq, alignment, contig); vector 
MakeContigsCommand::assembleFragments(vector< vector >&qual_match_simple_bayesian, vector< vector >& qual_mismatch_simple_bayesian, Sequence& fSeq, Sequence& rSeq, QualityScores*& fQual, QualityScores*& rQual, QualityScores*& savedFQual, QualityScores*& savedRQual, bool hasQuality, Alignment*& alignment, string& contig, string& trashCode, int& oend, int& oStart, int& numMismatches) { try { vector contigScores; //flip the reverse reads rSeq.reverseComplement(); if (hasQuality) { rQual->flipQScores(); } //pairwise align alignment->align(fSeq.getUnaligned(), rSeq.getUnaligned()); map ABaseMap = alignment->getSeqAAlnBaseMap(); map BBaseMap = alignment->getSeqBAlnBaseMap(); fSeq.setAligned(alignment->getSeqAAln()); rSeq.setAligned(alignment->getSeqBAln()); int length = fSeq.getAligned().length(); //traverse alignments merging into one contiguous seq string seq1 = fSeq.getAligned(); string seq2 = rSeq.getAligned(); vector scores1, scores2; if (hasQuality) { scores1 = fQual->getQualityScores(); scores2 = rQual->getQualityScores(); delete fQual; delete rQual; delete savedFQual; delete savedRQual; } //if (num == 5) { cout << fSeq.getStartPos() << '\t' << fSeq.getEndPos() << '\t' << rSeq.getStartPos() << '\t' << rSeq.getEndPos() << endl; exit(1); } int overlapStart = fSeq.getStartPos()-1; int seq2Start = rSeq.getStartPos()-1; //bigger of the 2 starting positions is the location of the overlapping start if (overlapStart < seq2Start) { //seq2 starts later so take from 0 to seq2Start from seq1 overlapStart = seq2Start; for (int i = 0; i < overlapStart; i++) { contig += seq1[i]; if (hasQuality) { if (((seq1[i] != '-') && (seq1[i] != '.'))) { contigScores.push_back(scores1[ABaseMap[i]]); } } } }else { //seq1 starts later so take from 0 to overlapStart from seq2 for (int i = 0; i < overlapStart; i++) { contig += seq2[i]; if (hasQuality) { if (((seq2[i] != '-') && (seq2[i] != '.'))) { contigScores.push_back(scores2[BBaseMap[i]]); } } } } int seq1End = fSeq.getEndPos(); int seq2End 
= rSeq.getEndPos(); int overlapEnd = seq1End; if (seq2End < overlapEnd) { overlapEnd = seq2End; } //smallest end position is where overlapping ends oStart = contig.length(); //cout << fSeq.getAligned() << endl; cout << rSeq.getAligned() << endl; int firstForward = 0; int seq2FirstForward = 0; int lastReverse = seq1.length(); int seq2lastReverse = seq2.length(); bool firstChooseSeq1 = false; bool lastChooseSeq1 = false; if (hasQuality) { for (int i = 0; i < seq1.length(); i++) { if ((seq1[i] != '.') && (seq1[i] != '-')) { if (scores1[ABaseMap[i]] == 2) { firstForward++; }else { break; } } } for (int i = 0; i < seq2.length(); i++) { if ((seq2[i] != '.') && (seq2[i] != '-')) { if (scores2[BBaseMap[i]] == 2) { seq2FirstForward++; }else { break; } } } if (seq2FirstForward > firstForward) { firstForward = seq2FirstForward; firstChooseSeq1 = true; } for (int i = seq1.length()-1; i >= 0; i--) { if ((seq1[i] != '.') && (seq1[i] != '-')) { if (scores1[ABaseMap[i]] == 2) { lastReverse--; }else { break; } } } for (int i = seq2.length()-1; i >= 0; i--) { if ((seq2[i] != '.') && (seq2[i] != '-')) { if (scores2[BBaseMap[i]] == 2) { seq2lastReverse--; }else { break; } } } if (lastReverse > seq2lastReverse) { lastReverse = seq2lastReverse; lastChooseSeq1 = true; } } //cout << firstForward << '\t' << lastReverse << endl; for (int i = overlapStart; i < overlapEnd; i++) { //cout << seq1[i] << ' ' << seq2[i] << ' ' << scores1[ABaseMap[i]] << ' ' << scores2[BBaseMap[i]] << endl; if (seq1[i] == seq2[i]) { contig += seq1[i]; if (hasQuality) { contigScores.push_back(convertProb(qual_match_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])])); } }else if (((seq1[i] == '.') || (seq1[i] == '-')) && ((seq2[i] != '-') && (seq2[i] != '.'))) { //seq1 is a gap and seq2 is a base, choose seq2, unless quality score for base is below insert. 
                //In that case eliminate base
                    if (hasQuality) {
                        if (scores2[BBaseMap[i]] <= insert) { } //eliminate base
                        else { contig += seq2[i]; contigScores.push_back(scores2[BBaseMap[i]]); }
                    }else { contig += seq2[i]; } //with no quality info, then we keep it?
                }else if (((seq2[i] == '.') || (seq2[i] == '-')) && ((seq1[i] != '-') && (seq1[i] != '.'))) { //seq2 is a gap and seq1 is a base, choose seq1, unless quality score for base is below insert. In that case eliminate base
                    if (hasQuality) {
                        if (scores1[ABaseMap[i]] <= insert) { } //eliminate base
                        else { contig += seq1[i]; contigScores.push_back(scores1[ABaseMap[i]]); }
                    }else { contig += seq1[i]; } //with no quality info, then we keep it?
                }else if (((seq1[i] != '-') && (seq1[i] != '.')) && ((seq2[i] != '-') && (seq2[i] != '.'))) { //both bases choose one with better quality
                    if (hasQuality) {
                        if (abs(scores1[ABaseMap[i]] - scores2[BBaseMap[i]]) >= deltaq) { //is the difference in qual scores >= deltaq, if yes choose base with higher score
                            char c = seq1[i];
                            if (scores1[ABaseMap[i]] < scores2[BBaseMap[i]]) { c = seq2[i]; }
                            contig += c;
                            if ((i >= firstForward) && (i <= lastReverse)) { //in unmasked section
                                contigScores.push_back(convertProb(qual_mismatch_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])]));
                            }else if (i < firstForward) {
                                if (firstChooseSeq1) { contigScores.push_back(scores1[ABaseMap[i]]); }
                                else { contigScores.push_back(scores2[BBaseMap[i]]); }
                            }else if ((i > lastReverse)) {
                                if (lastChooseSeq1) { contigScores.push_back(scores1[ABaseMap[i]]); }
                                else { contigScores.push_back(scores2[BBaseMap[i]]); }
                            }else { contigScores.push_back(2); } //N
                        }else { //if no, base becomes n
                            contig += 'N'; contigScores.push_back(2);
                        }
                        numMismatches++;
                    }else { numMismatches++; } //can't decide, so eliminate and mark as mismatch
                }else { //should never get here
                    m->mothurOut("[ERROR]: case I didn't think of seq1 = " + toString(seq1[i]) + " and seq2 = " + toString(seq2[i]) + "\n");
                }
                //printf("Overlap seq: %i, %i, %i, %c, 
%i\n", i, scores1[ABaseMap[i]], scores2[BBaseMap[i]], contig[contig.length()-1], contigScores[contigScores.size()-1]); } oend = contig.length(); if (seq1End < seq2End) { //seq1 ends before seq2 so take from overlap to length from seq2 for (int i = overlapEnd; i < length; i++) { contig += seq2[i]; if (hasQuality) { if (((seq2[i] != '-') && (seq2[i] != '.'))) { contigScores.push_back(scores2[BBaseMap[i]]); } } } }else { //seq2 ends before seq1 so take from overlap to length from seq1 for (int i = overlapEnd; i < length; i++) { contig += seq1[i]; if (hasQuality) { if (((seq1[i] != '-') && (seq1[i] != '.'))) { contigScores.push_back(scores1[ABaseMap[i]]); } } } } //cout << contig << endl; if (trimOverlap) { contig = contig.substr(overlapStart, oend-oStart); if (contig.length() == 0) { trashCode += "l"; } if (hasQuality) { vector newContigScores; for (int i = overlapStart; i < oend; i++) { newContigScores.push_back(contigScores[i]); } contigScores = newContigScores; } } return contigScores; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "assembleFragments"); exit(1); } } /**************************************************************************************************/ #ifdef USE_BOOST //ignore = read(fSeq, rSeq, fQual, rQual, savedFQual, savedRQual, findexBarcode, rindexBarcode, delim, inFF, inRF, inFQ, inRQ); bool MakeContigsCommand::read(Sequence& fSeq, Sequence& rSeq, QualityScores*& fQual, QualityScores*& rQual, QualityScores*& savedFQual, QualityScores*& savedRQual, Sequence& findexBarcode, Sequence& rindexBarcode, char delim, boost::iostreams::filtering_istream& inFF, boost::iostreams::filtering_istream& inRF, boost::iostreams::filtering_istream& inFQ, boost::iostreams::filtering_istream& inRQ, string thisfqualindexfile, string thisrqualindexfile) { try { bool ignore = false; if (delim == '@') { //fastq files bool tignore; FastqRead fread(inFF, tignore, format); FastqRead rread(inRF, ignore, format); if (!checkName(fread, rread)) { FastqRead 
f2read(inFF, tignore, format); if (!checkName(f2read, rread)) { FastqRead r2read(inRF, ignore, format); if (!checkName(fread, r2read)) { m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { rread = r2read; } }else { fread = f2read; } } if (tignore) { ignore=true; } fSeq.setName(fread.getName()); fSeq.setAligned(fread.getSeq()); rSeq.setName(rread.getName()); rSeq.setAligned(rread.getSeq()); fQual = new QualityScores(fread.getName(), fread.getScores()); rQual = new QualityScores(rread.getName(), rread.getScores()); savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (thisfqualindexfile != "") { //forward index file FastqRead firead(inFQ, tignore, format); if (tignore) { ignore=true; } findexBarcode.setAligned(firead.getSeq()); if (!checkName(fread, firead)) { FastqRead f2iread(inFQ, tignore, format); if (tignore) { ignore=true; } if (!checkName(fread, f2iread)) { m->mothurOut("[WARNING]: name mismatch in forward index file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { firead = f2iread; findexBarcode.setAligned(firead.getSeq()); } } } if (thisrqualindexfile != "") { //reverse index file FastqRead riread(inRQ, tignore, format); if (tignore) { ignore=true; } rindexBarcode.setAligned(riread.getSeq()); if (!checkName(fread, riread)) { FastqRead r2iread(inRQ, tignore, format); m->gobble(inRQ); if (tignore) { ignore=true; } if (!checkName(fread, r2iread)) { m->mothurOut("[WARNING]: name mismatch in reverse index file. 
Ignoring, " + fread.getName() + ".\n"); ignore = true;
                    }else { riread = r2iread; rindexBarcode.setAligned(riread.getSeq()); }
                }
            }
        }else { //reading fasta and maybe qual
            Sequence tfSeq(inFF);
            Sequence trSeq(inRF);
            if (!checkName(tfSeq, trSeq)) {
                Sequence t2fSeq(inFF);
                if (!checkName(t2fSeq, trSeq)) {
                    Sequence t2rSeq(inRF);
                    if (!checkName(tfSeq, t2rSeq)) {
                        m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true;
                    }else { trSeq = t2rSeq; } //the second reverse read matches the forward read, so keep it
                }else { tfSeq = t2fSeq; }
            }
            fSeq.setName(tfSeq.getName()); fSeq.setAligned(tfSeq.getAligned());
            rSeq.setName(trSeq.getName()); rSeq.setAligned(trSeq.getAligned());
            if (thisfqualindexfile != "") {
                fQual = new QualityScores(inFQ); m->gobble(inFQ);
                rQual = new QualityScores(inRQ); m->gobble(inRQ);
                if (!checkName(*fQual, *rQual)) {
                    m->mothurOut("[WARNING]: name mismatch in forward and reverse qual file. Ignoring, " + fQual->getName() + ".\n"); ignore = true;
                }
                savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores());
                savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores());
                if (fQual->getName() != tfSeq.getName()) { m->mothurOut("[WARNING]: name mismatch in forward quality file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; }
                if (rQual->getName() != trSeq.getName()) { m->mothurOut("[WARNING]: name mismatch in reverse quality file. Ignoring, " + trSeq.getName() + ".\n"); ignore = true; }
            }
            if (tfSeq.getName() != trSeq.getName()) { m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. 
Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } } return ignore; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "read"); exit(1); } } #endif /**************************************************************************************************/ bool MakeContigsCommand::read(Sequence& fSeq, Sequence& rSeq, QualityScores*& fQual, QualityScores*& rQual, QualityScores*& savedFQual, QualityScores*& savedRQual, Sequence& findexBarcode, Sequence& rindexBarcode, char delim, ifstream& inFFasta, ifstream& inRFasta, ifstream& inFQualIndex, ifstream& inRQualIndex, string thisfqualindexfile, string thisrqualindexfile) { try { bool ignore = false; if (delim == '@') { //fastq files bool tignore; FastqRead fread(inFFasta, tignore, format); m->gobble(inFFasta); FastqRead rread(inRFasta, ignore, format); m->gobble(inRFasta); if (!checkName(fread, rread)) { FastqRead f2read(inFFasta, tignore, format); m->gobble(inFFasta); if (!checkName(f2read, rread)) { FastqRead r2read(inRFasta, ignore, format); m->gobble(inRFasta); if (!checkName(fread, r2read)) { m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. 
Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { rread = r2read; } }else { fread = f2read; } } if (tignore) { ignore=true; } fSeq.setName(fread.getName()); fSeq.setAligned(fread.getSeq()); rSeq.setName(rread.getName()); rSeq.setAligned(rread.getSeq()); fQual = new QualityScores(fread.getName(), fread.getScores()); rQual = new QualityScores(rread.getName(), rread.getScores()); savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (thisfqualindexfile != "") { //forward index file FastqRead firead(inFQualIndex, tignore, format); m->gobble(inFQualIndex); if (tignore) { ignore=true; } findexBarcode.setAligned(firead.getSeq()); if (!checkName(fread, firead)) { FastqRead f2iread(inFQualIndex, tignore, format); m->gobble(inFQualIndex); if (tignore) { ignore=true; } if (!checkName(fread, f2iread)) { m->mothurOut("[WARNING]: name mismatch in forward index file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { firead = f2iread; findexBarcode.setAligned(firead.getSeq()); } } } if (thisrqualindexfile != "") { //reverse index file FastqRead riread(inRQualIndex, tignore, format); m->gobble(inRQualIndex); if (tignore) { ignore=true; } rindexBarcode.setAligned(riread.getSeq()); if (!checkName(fread, riread)) { FastqRead r2iread(inRQualIndex, tignore, format); m->gobble(inRQualIndex); if (tignore) { ignore=true; } if (!checkName(fread, r2iread)) { m->mothurOut("[WARNING]: name mismatch in reverse index file. 
Ignoring, " + fread.getName() + ".\n"); ignore = true;
                    }else { riread = r2iread; rindexBarcode.setAligned(riread.getSeq()); }
                }
            }
        }else { //reading fasta and maybe qual
            Sequence tfSeq(inFFasta); m->gobble(inFFasta);
            Sequence trSeq(inRFasta); m->gobble(inRFasta);
            if (!checkName(tfSeq, trSeq)) {
                Sequence t2fSeq(inFFasta); m->gobble(inFFasta);
                if (!checkName(t2fSeq, trSeq)) {
                    Sequence t2rSeq(inRFasta); m->gobble(inRFasta);
                    if (!checkName(tfSeq, t2rSeq)) {
                        m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true;
                    }else { trSeq = t2rSeq; } //the second reverse read matches the first forward read
                }else { tfSeq = t2fSeq; }
            }
            fSeq.setName(tfSeq.getName()); fSeq.setAligned(tfSeq.getAligned());
            rSeq.setName(trSeq.getName()); rSeq.setAligned(trSeq.getAligned());
            if (thisfqualindexfile != "") {
                fQual = new QualityScores(inFQualIndex); m->gobble(inFQualIndex);
                rQual = new QualityScores(inRQualIndex); m->gobble(inRQualIndex);
                if (!checkName(*fQual, *rQual)) {
                    m->mothurOut("[WARNING]: name mismatch in forward and reverse qual file. Ignoring, " + fQual->getName() + ".\n"); ignore = true;
                }
                savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores());
                savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores());
                if (fQual->getName() != tfSeq.getName()) {
                    m->mothurOut("[WARNING]: name mismatch in forward quality file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true;
                }
                if (rQual->getName() != trSeq.getName()) {
                    m->mothurOut("[WARNING]: name mismatch in reverse quality file. Ignoring, " + trSeq.getName() + ".\n"); ignore = true;
                }
            }
            if (tfSeq.getName() != trSeq.getName()) { m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. 
Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } } return ignore; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "read"); exit(1); } } /**************************************************************************************************/ int MakeContigsCommand::setLines(vector fasta, vector qual, vector& lines, vector& qLines, char delim) { try { lines.clear(); qLines.clear(); vector fastaFilePos; vector qfileFilePos; vector temp; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //set file positions for fasta file fastaFilePos = m->divideFile(fasta[0], processors, delim); //get name of first sequence in each chunk map firstSeqNames; map trimmedNames; for (int i = 0; i < (fastaFilePos.size()-1); i++) { ifstream in; m->openInputFile(fasta[0], in); in.seekg(fastaFilePos[i]); string name = ""; if (delim == '>') { Sequence temp(in); name = temp.getName(); }else { string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); name = pieces[0]; name = name.substr(1); m->checkName(name); } firstSeqNames[name] = i; trimmedNames[name.substr(0, name.length()-1)]; in.close(); } map copy; map tcopy; if (qual.size() != 0) { copy = firstSeqNames; tcopy = trimmedNames; } //look for match in reverse file ifstream in2; m->openInputFile(fasta[1], in2); string input; while(!in2.eof()){ input = m->getline(in2); m->gobble(in2); if (input.length() != 0) { if(input[0] == delim){ //this is a name line vector pieces = m->splitWhiteSpace(input); string name = pieces[0]; name = name.substr(1); m->checkName(name); map::iterator it = firstSeqNames.find(name); map::iterator itTrimmed = trimmedNames.find(name.substr(0, name.length()-1)); if (it != firstSeqNames.end()) { //this is the start of a new chunk unsigned long long pos = in2.tellg(); qfileFilePos.push_back(pos - input.length() - 1); firstSeqNames.erase(it); }else if (itTrimmed != trimmedNames.end()) { unsigned long long pos = in2.tellg(); 
qfileFilePos.push_back(pos - input.length() - 1); trimmedNames.erase(itTrimmed); } } } if ((firstSeqNames.size() == 0) || (trimmedNames.size() == 0)) { break; } } in2.close(); //get last file position of reverse fasta[1] FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (fasta[1].c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } qfileFilePos.push_back(size); if ((firstSeqNames.size() != 0) && (trimmedNames.size() != 0)){ for (map::iterator it = firstSeqNames.begin(); it != firstSeqNames.end(); it++) { if (delim == '>') { m->mothurOut(it->first + " is in your forward fasta file and not in your reverse file, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine(); }else { m->mothurOut(it->first + " is in your forward fastq file and not in your reverse file, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine(); } } m->control_pressed = true; return processors; } //fill lines with paired forward and reverse fasta lines for (int i = 0; i < (fastaFilePos.size()-1); i++) { if (m->debug) { m->mothurOut("[DEBUG]: forward " + toString(i) +'\t' + toString(fastaFilePos[i]) + '\t' + toString(fastaFilePos[i+1]) + '\n'); } lines.push_back(linePair(fastaFilePos[i], fastaFilePos[(i+1)])); if (m->debug) { m->mothurOut("[DEBUG]: reverse " + toString(i) +'\t' + toString(qfileFilePos[i]) + '\t' + toString(qfileFilePos[i+1]) + '\n'); } lines.push_back(linePair(qfileFilePos[i], qfileFilePos[(i+1)])); } qfileFilePos.clear(); if (qual.size() != 0) { firstSeqNames = copy; trimmedNames = tcopy; if (qual[0] != "NONE") { //seach for filePos of each first name in the qfile and save in qfileFilePos ifstream inQual; m->openInputFile(qual[0], inQual); string input; while(!inQual.eof()){ input = m->getline(inQual); m->gobble(inQual); if (input.length() != 0) { if(input[0] == delim){ //this is a sequence name line 
vector pieces = m->splitWhiteSpace(input); string name = pieces[0]; name = name.substr(1); m->checkName(name); map::iterator it = firstSeqNames.find(name); map::iterator itTrimmed = trimmedNames.find(name.substr(0, name.length()-1)); if(it != firstSeqNames.end()) { //this is the start of a new chunk unsigned long long pos = inQual.tellg(); qfileFilePos.push_back(pos - input.length() - 1); firstSeqNames.erase(it); }else if (itTrimmed != trimmedNames.end()) { unsigned long long pos = inQual.tellg(); qfileFilePos.push_back(pos - input.length() - 1); trimmedNames.erase(itTrimmed); } } } if ((firstSeqNames.size() == 0) || (trimmedNames.size() == 0)) { break; } } inQual.close(); //get last file position of reverse qual[0] FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (qual[0].c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } qfileFilePos.push_back(size); if ((firstSeqNames.size() != 0) && (trimmedNames.size() != 0)){ for (map::iterator it = firstSeqNames.begin(); it != firstSeqNames.end(); it++) { if (delim == '>') { m->mothurOut(it->first + " is in your forward fasta file and reverse fasta file, but not your forward qfile, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine(); }else { m->mothurOut(it->first + " is in your forward fastq file and reverse fastq file, but not your forward index, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine(); } } m->control_pressed = true; return processors; } } firstSeqNames = copy; trimmedNames = tcopy; if (qual[1] != "NONE") { ifstream inQual2; m->openInputFile(qual[1], inQual2); while(!inQual2.eof()){ input = m->getline(inQual2); m->gobble(inQual2); if (input.length() != 0) { if(input[0] == delim){ //this is a sequence name line vector pieces = m->splitWhiteSpace(input); string name = pieces[0]; name = name.substr(1); m->checkName(name); 
                            map<string, int>::iterator it = firstSeqNames.find(name);
                            map<string, int>::iterator itTrimmed = trimmedNames.find(name.substr(0, name.length()-1));
                            if(it != firstSeqNames.end()) { //this is the start of a new chunk
                                unsigned long long pos = inQual2.tellg();
                                temp.push_back(pos - input.length() - 1);
                                firstSeqNames.erase(it);
                            }else if (itTrimmed != trimmedNames.end()) {
                                unsigned long long pos = inQual2.tellg();
                                temp.push_back(pos - input.length() - 1); //reverse positions belong in temp, not qfileFilePos
                                trimmedNames.erase(itTrimmed);
                            }
                        }
                    }
                    if ((firstSeqNames.size() == 0) || (trimmedNames.size() == 0)) { break; }
                }
                inQual2.close();
                //get last file position of reverse qual[1]
                FILE * pFile2;
                //get num bytes in file
                pFile2 = fopen (qual[1].c_str(),"rb");
                if (pFile2==NULL) perror ("Error opening file");
                else{
                    fseek (pFile2, 0, SEEK_END);
                    size=ftell (pFile2);
                    fclose (pFile2);
                }
                temp.push_back(size);
                if ((firstSeqNames.size() != 0) && (trimmedNames.size() != 0)){
                    for (map<string, int>::iterator it = firstSeqNames.begin(); it != firstSeqNames.end(); it++) {
                        if (delim == '>') {
                            m->mothurOut(it->first + " is in your forward fasta file, reverse fasta file, and forward qfile but not your reverse qfile, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine();
                        }else {
                            if (qual[0] != "NONE") {
                                m->mothurOut(it->first + " is in your forward fastq file, reverse fastq file, and forward index but not your reverse index, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine();
                            }else {
                                m->mothurOut(it->first + " is in your forward fastq file, reverse fastq file, but not your reverse index, please remove it using the remove.seqs command before proceeding."); m->mothurOutEndLine();
                            }
                        }
                    }
                    m->control_pressed = true;
                    return processors;
                }
            }
            if (qual[0] == "NONE") { qfileFilePos = temp; } //fill with duds, if both were NONE then qual.size() == 0
            if (qual[1] == "NONE") { temp = qfileFilePos; } //fill with duds, if both were NONE then qual.size() == 0
            //fill lines with paired forward and reverse fasta lines
            for
(int i = 0; i < (fastaFilePos.size()-1); i++) { if (m->debug) { m->mothurOut("[DEBUG]: forward " + toString(i) +'\t' + toString(qfileFilePos[i]) + '\t' + toString(qfileFilePos[i+1]) + '\n'); } qLines.push_back(linePair(qfileFilePos[i], qfileFilePos[(i+1)])); if (m->debug) { m->mothurOut("[DEBUG]: reverse " + toString(i) +'\t' + toString(temp[i]) + '\t' + toString(temp[i+1]) + '\n'); } qLines.push_back(linePair(temp[i], temp[(i+1)])); } }else { qLines = lines; } //files with duds return processors; #else if (processors == 1) { //save time //fastaFilePos.push_back(0); qfileFilePos.push_back(0); //fastaFilePos.push_back(1000); qfileFilePos.push_back(1000); lines.push_back(linePair(0, 1000)); lines.push_back(linePair(0, 1000)); //fasta[0], fasta[1] - forward and reverse qLines.push_back(linePair(0, 1000)); qLines.push_back(linePair(0, 1000)); //qual[0], qual[1] - forward and reverse }else{ long long numFastaSeqs = 0; fastaFilePos = m->setFilePosFasta(fasta[0], numFastaSeqs, delim); //forward if (fastaFilePos.size() < processors) { processors = fastaFilePos.size(); } long long numRFastaSeqs = 0; qfileFilePos = m->setFilePosFasta(fasta[1], numRFastaSeqs, delim); //reverse if (numFastaSeqs != numRFastaSeqs) { if (delim == '>') { m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your forward fasta file, but " + toString(numRFastaSeqs) + " sequences in your reverse fasta file. Please use the list.seqs and get.seqs commands to make the files match before proceeding."); m->mothurOutEndLine(); m->control_pressed = true; return processors; }else { m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your forward fastq file, but " + toString(numRFastaSeqs) + " sequences in your reverse fastq file. 
Please use the list.seqs and get.seqs commands to make the files match before proceeding."); m->mothurOutEndLine(); m->control_pressed = true; return processors; } } //figure out how many sequences you have to process unsigned long long numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { unsigned long long startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(fastaFilePos[startIndex], numSeqsPerProcessor)); //forward lines.push_back(linePair(qfileFilePos[startIndex], numSeqsPerProcessor)); //reverse } if (qual.size() != 0) { long long numFQualSeqs = 0; long long numRQualSeqs = 0; fastaFilePos.clear(); qfileFilePos.clear(); if (qual[0] != "NONE") { fastaFilePos = m->setFilePosFasta(qual[0], numFQualSeqs, delim); } //forward index or qual file if (qual[1] != "NONE") { qfileFilePos = m->setFilePosFasta(qual[1], numRQualSeqs, delim); }//reverse index or qual file if (qual[0] == "NONE") { fastaFilePos = qfileFilePos; numFQualSeqs = numRQualSeqs; } //fill with duds, if both were NONE then qual.size() == 0 if (qual[1] == "NONE") { qfileFilePos = fastaFilePos; numRQualSeqs = numFQualSeqs; } //fill with duds, if both were NONE then qual.size() == 0 if ((numFQualSeqs != numRQualSeqs) || (numFQualSeqs != numFastaSeqs)){ if (delim == '>') { m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your forward fasta file, " + toString(numRFastaSeqs) + " sequences in your reverse fasta file, " + toString(numFQualSeqs) + " sequences in your forward qual file, " + toString(numRQualSeqs) + " sequences in your reverse qual file. 
Please use the list.seqs and get.seqs commands to make the files match before proceeding."); m->mothurOutEndLine(); m->control_pressed = true; return processors; }else { if (qual[0] != "NONE") { m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your forward fastq file, " + toString(numRFastaSeqs) + " sequences in your reverse fastq file and " + toString(numRQualSeqs) + " sequences in your reverse index file. Please use the list.seqs and get.seqs commands to make the files match before proceeding."); m->mothurOutEndLine(); m->control_pressed = true; return processors; }else if (qual[1] != "NONE") { m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your forward fastq file, " + toString(numRFastaSeqs) + " sequences in your reverse fastq file and " + toString(numFQualSeqs) + " sequences in your forward index file. Please use the list.seqs and get.seqs commands to make the files match before proceeding."); m->mothurOutEndLine(); m->control_pressed = true; return processors; }else { m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your forward fastq file, " + toString(numRFastaSeqs) + " sequences in your reverse fastq file, " + toString(numFQualSeqs) + " sequences in your forward index file, " + toString(numRQualSeqs) + " sequences in your reverse index file. 
Please use the list.seqs and get.seqs commands to make the files match before proceeding."); m->mothurOutEndLine(); m->control_pressed = true; return processors; } } } //figure out how many sequences you have to process unsigned long long numSeqsPerProcessor = numFQualSeqs / processors; for (int i = 0; i < processors; i++) { unsigned long long startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFQualSeqs - i * numSeqsPerProcessor; } qLines.push_back(linePair(fastaFilePos[startIndex], numSeqsPerProcessor)); //forward qLines.push_back(linePair(qfileFilePos[startIndex], numSeqsPerProcessor)); //reverse } }else { qLines = lines; } //files with duds } if(qual.size() == 0) { qLines = lines; } //files with duds return 1; #endif } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "setLines"); exit(1); } } //*************************************************************************************************************** //lines can be 2, 3, or 4 columns // forward.fastq reverse.fastq -> 2 column // groupName forward.fastq reverse.fastq -> 3 column // forward.fastq reverse.fastq forward.index.fastq reverse.index.fastq -> 4 column // forward.fastq reverse.fastq none reverse.index.fastq -> 4 column // forward.fastq reverse.fastq forward.index.fastq none -> 4 column vector< vector > MakeContigsCommand::readFileNames(string filename){ try { vector< vector > files; string forward, reverse, findex, rindex; bool allGZ = true; bool allPlainTxt = true; ifstream in; m->openInputFile(filename, in); while(!in.eof()) { if (m->control_pressed) { return files; } string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); string group = ""; if (pieces.size() == 2) { forward = pieces[0]; reverse = pieces[1]; group = ""; findex = ""; rindex = ""; }else if (pieces.size() == 3) { group = pieces[0]; forward = pieces[1]; reverse = pieces[2]; findex = ""; rindex = ""; createFileGroup = true; }else if (pieces.size() == 4) { 
forward = pieces[0]; reverse = pieces[1]; findex = pieces[2]; rindex = pieces[3]; if ((findex == "none") || (findex == "NONE")){ findex = "NONE"; } if ((rindex == "none") || (rindex == "NONE")){ rindex = "NONE"; } }else { m->mothurOut("[ERROR]: file lines can be 2, 3, or 4 columns. The forward fastq files in the first column and their matching reverse fastq files in the second column, or a groupName then forward fastq file and reverse fastq file, or forward fastq file then reverse fastq then forward index and reverse index file. If you only have one index file add 'none' for the other one. \n"); m->control_pressed = true; } if (m->debug) { m->mothurOut("[DEBUG]: group = " + group + ", forward = " + forward + ", reverse = " + reverse + ", forwardIndex = " + findex + ", reverseIndex = " + rindex + ".\n"); } if (inputDir != "") { string path = m->hasPath(forward); if (path == "") { forward = inputDir + forward; } path = m->hasPath(reverse); if (path == "") { reverse = inputDir + reverse; } if (findex != "") { path = m->hasPath(findex); if (path == "") { findex = inputDir + findex; } } if (rindex != "") { path = m->hasPath(rindex); if (path == "") { rindex = inputDir + rindex; } } } //check to make sure both are able to be opened ifstream in2; int openForward = m->openInputFile(forward, in2, "noerror"); //if you can't open it, try default location if (openForward == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(forward); m->mothurOut("Unable to open " + forward + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openForward = m->openInputFile(tryPath, in3, "noerror"); in3.close(); forward = tryPath; } } //if you can't open it, try output location if (openForward == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(forward); m->mothurOut("Unable to open " + forward + ". 
Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openForward = m->openInputFile(tryPath, in4, "noerror"); forward = tryPath; in4.close(); } } if (openForward == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + forward + ", ignoring pair.\n"); }else{ in2.close(); } ifstream in3; int openReverse = m->openInputFile(reverse, in3, "noerror"); //if you can't open it, try default location if (openReverse == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(reverse); m->mothurOut("Unable to open " + reverse + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openReverse = m->openInputFile(tryPath, in3, "noerror"); in3.close(); reverse = tryPath; } } //if you can't open it, try output location if (openReverse == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(reverse); m->mothurOut("Unable to open " + reverse + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openReverse = m->openInputFile(tryPath, in4, "noerror"); reverse = tryPath; in4.close(); } } if (openReverse == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + reverse + ", ignoring pair.\n"); }else{ in3.close(); } int openFindex = 0; if ((findex != "") && (findex != "NONE")){ ifstream in4; openFindex = m->openInputFile(findex, in4, "noerror"); in4.close(); //if you can't open it, try default location if (openFindex == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(findex); m->mothurOut("Unable to open " + findex + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in5; openFindex = m->openInputFile(tryPath, in5, "noerror"); in5.close(); findex = tryPath; } } //if you can't open it, try output location if (openFindex == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(findex); m->mothurOut("Unable to open " + findex + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in6; openFindex = m->openInputFile(tryPath, in6, "noerror"); findex = tryPath; in6.close(); } } if (openFindex == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + findex + ", ignoring pair.\n"); } } int openRindex = 0; if ((rindex != "") && (rindex != "NONE")) { ifstream in7; openRindex = m->openInputFile(rindex, in7, "noerror"); in7.close(); //if you can't open it, try default location if (openRindex == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(rindex); m->mothurOut("Unable to open " + rindex + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in8; openRindex = m->openInputFile(tryPath, in8, "noerror"); in8.close(); rindex = tryPath; } } //if you can't open it, try output location if (openRindex == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(rindex); m->mothurOut("Unable to open " + rindex + ". 
Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in9; openRindex = m->openInputFile(tryPath, in9, "noerror"); rindex = tryPath; in9.close(); } } if (openRindex == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + rindex + ", ignoring pair.\n"); } } if ((openForward != 1) && (openReverse != 1) && (openFindex != 1) && (openRindex != 1)) { //good pair file2Group[files.size()] = group; vector pair; #ifdef USE_BOOST if (m->isGZ(forward)[1]) { allPlainTxt = false; } else { allGZ = false; } if (m->isGZ(reverse)[1]) { allPlainTxt = false; } else { allGZ = false; } if ((findex != "") && (findex != "NONE")) { if (m->isGZ(findex)[1]) { allPlainTxt = false; } else { allGZ = false; } } if ((rindex != "") && (rindex != "NONE")) { if (m->isGZ(rindex)[1]) { allPlainTxt = false; } else { allGZ = false; } } if (!allGZ && !allPlainTxt) { //mixed bag of files, uh oh... m->mothurOut("[ERROR]: Your files must all be in compressed .gz form or all in plain text form. Please correct. \n"); m->control_pressed = true; } #else allGZ=false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else string extension = m->getExtension(forward); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } extension = m->getExtension(reverse); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } if ((findex != "") && (findex != "NONE")) { extension = m->getExtension(findex); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. 
\n"); m->control_pressed = true; } } if ((rindex != "") && (rindex != "NONE")) { extension = m->getExtension(rindex); if (extension == "gz") { m->mothurOut("[ERROR]: You cannot use compressed .gz files as input with our windows version of mothur. \n"); m->control_pressed = true; } } #endif #endif pair.push_back(forward); pair.push_back(reverse); pair.push_back(findex); pair.push_back(rindex); if (((findex != "") || (rindex != "")) && (oligosfile == "")) { m->mothurOut("[ERROR]: You need to provide an oligos file if you are going to use an index file.\n"); m->control_pressed = true; } files.push_back(pair); } } in.close(); if (allGZ) { gz = true; }else { gz = false; } return files; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "readFileNames"); exit(1); } } //*************************************************************************************************************** //illumina data requires paired forward and reverse data //BARCODE atgcatgc atgcatgc groupName //PRIMER atgcatgc atgcatgc groupName //PRIMER atgcatgc atgcatgc bool MakeContigsCommand::getOligos(vector >& fastaFileNames, vector >& qualFileNames, string rootname, map& fastaFile2Group){ try { if (m->debug) { m->mothurOut("[DEBUG]: oligosfile = " + oligosfile + "\n"); } bool allBlank = false; oligos->read(oligosfile, false); if (m->control_pressed) { return false; } //error in reading oligos if (oligos->hasPairedBarcodes() || oligos->hasPairedPrimers()) { numFPrimers = oligos->getPairedPrimers().size(); numBarcodes = oligos->getPairedBarcodes().size(); }else { m->mothurOut("[ERROR]: make.contigs requires paired barcodes and primers. 
You can set one end to NONE if you are using an index file.\n"); m->control_pressed = true;
        }
        if (m->control_pressed) { return false; }
        numLinkers = oligos->getLinkers().size();
        numSpacers = oligos->getSpacers().size();
        numRPrimers = oligos->getReversePrimers().size();
        if (numLinkers != 0) { m->mothurOut("[WARNING]: make.contigs is not set up to remove linkers, ignoring.\n"); }
        if (numSpacers != 0) { m->mothurOut("[WARNING]: make.contigs is not set up to remove spacers, ignoring.\n"); }
        vector<string> groupNames = oligos->getGroupNames();
        if (groupNames.size() == 0) { allFiles = 0; allBlank = true; }
        fastaFileNames.resize(oligos->getBarcodeNames().size());
        for(int i=0;i<fastaFileNames.size();i++){
            for(int j=0;j<oligos->getPrimerNames().size();j++){ fastaFileNames[i].push_back(""); }
        }
        qualFileNames = fastaFileNames;
        if (allFiles) {
            set<string> uniqueNames; //used to cleanup outputFileNames
            map<int, oligosPair> barcodes = oligos->getPairedBarcodes();
            map<int, oligosPair> primers = oligos->getPairedPrimers();
            for(map<int, oligosPair>::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){
                for(map<int, oligosPair>::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){
                    string primerName = oligos->getPrimerName(itPrimer->first);
                    string barcodeName = oligos->getBarcodeName(itBar->first);
                    if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing
                    else if ((primerName == "") && (barcodeName == "")) { } //do nothing
                    else {
                        string comboGroupName = "";
                        string fastaFileName = "";
                        string qualFileName = "";
                        if(primerName == ""){
                            comboGroupName = barcodeName;
                        }else{
                            if(barcodeName == ""){
                                comboGroupName = primerName;
                            }
                            else{
                                comboGroupName = barcodeName + "."
+ primerName; } } ofstream temp, temp2; map variables; variables["[filename]"] = rootname; variables["[tag]"] = comboGroupName; fastaFileName = getOutputFileName("fasta", variables); qualFileName = getOutputFileName("qfile", variables); if (uniqueNames.count(fastaFileName) == 0) { outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); uniqueNames.insert(fastaFileName); fastaFile2Group[fastaFileName] = comboGroupName; outputNames.push_back(qualFileName); outputTypes["qfile"].push_back(qualFileName); uniqueNames.insert(qualFileName); } fastaFileNames[itBar->first][itPrimer->first] = fastaFileName; m->openOutputFile(fastaFileName, temp); temp.close(); //cout << fastaFileName << endl; qualFileNames[itBar->first][itPrimer->first] = qualFileName; m->openOutputFile(qualFileName, temp2); temp2.close(); } } } } if (allBlank) { m->mothurOut("[WARNING]: your oligos file does not contain any group names. mothur will not create a groupfile."); m->mothurOutEndLine(); allFiles = false; return false; } return true; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "getOligos"); exit(1); } } //*************************************************************************************************************** /** * checks for minor diffs @MS7_15058:1:1101:11899:1633#8/1 @MS7_15058:1:1101:11899:1633#8/2 should match */ bool MakeContigsCommand::checkName(FastqRead& forward, FastqRead& reverse){ try { if (forward.getName() == reverse.getName()) { return true; }else { //if no match are the names only different by 1 and 2? 
            string tempFRead = forward.getName().substr(0, forward.getName().length()-1);
            string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
            if (tempFRead == tempRRead) {
                if ((forward.getName()[forward.getName().length()-1] == '1') && (reverse.getName()[reverse.getName().length()-1] == '2')) {
                    forward.setName(tempFRead);
                    reverse.setName(tempRRead);
                    return true;
                }
            }else {
                //otherwise, does the reverse name match the forward name once its last character is dropped?
                string tempFRead = forward.getName();
                string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
                if (tempFRead == tempRRead) {
                    reverse.setName(tempRRead);
                    return true;
                }
            }
        }
        return false;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "checkName");
        exit(1);
    }
}
//***************************************************************************************************************
/**
 * checks for minor diffs @MS7_15058:1:1101:11899:1633#8/1 @MS7_15058:1:1101:11899:1633#8/2 should match
 */
bool MakeContigsCommand::checkName(Sequence& forward, Sequence& reverse){
    try {
        if (forward.getName() == reverse.getName()) {
            return true;
        }else {
            //if no match are the names only different by 1 and 2?
            string tempFRead = forward.getName().substr(0, forward.getName().length()-1);
            string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
            if (tempFRead == tempRRead) {
                if ((forward.getName()[forward.getName().length()-1] == '1') && (reverse.getName()[reverse.getName().length()-1] == '2')) {
                    forward.setName(tempFRead);
                    reverse.setName(tempRRead);
                    return true;
                }
            }else {
                //otherwise, does the reverse name match the forward name once its last character is dropped?
                string tempFRead = forward.getName();
                string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
                if (tempFRead == tempRRead) {
                    reverse.setName(tempRRead);
                    return true;
                }
            }
        }
        return false;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "checkName");
        exit(1);
    }
}
//***************************************************************************************************************
/**
 * Checks for minor differences in read names:
 * @MS7_15058:1:1101:11899:1633#8/1 and @MS7_15058:1:1101:11899:1633#8/2 should match.
 */
bool MakeContigsCommand::checkName(QualityScores& forward, QualityScores& reverse){
    try {
        if (forward.getName() == reverse.getName()) {
            return true;
        }else {
            //if no match, do the names differ only by a trailing 1 and 2?
            string tempFRead = forward.getName().substr(0, forward.getName().length()-1);
            string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
            if (tempFRead == tempRRead) {
                if ((forward.getName()[forward.getName().length()-1] == '1') && (reverse.getName()[reverse.getName().length()-1] == '2')) {
                    forward.setName(tempFRead);
                    reverse.setName(tempRRead);
                    return true;
                }
            }else {
                //otherwise, does the reverse name match the forward name once its last character is dropped?
                string tempFRead = forward.getName();
                string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
                if (tempFRead == tempRRead) {
                    reverse.setName(tempRRead);
                    return true;
                }
            }
        }
        return false;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "checkName");
        exit(1);
    }
}
//***************************************************************************************************************
/**
 * Checks for minor differences in read names:
 * @MS7_15058:1:1101:11899:1633#8/1 and @MS7_15058:1:1101:11899:1633#8/2 should match.
 */
bool MakeContigsCommand::checkName(Sequence& forward, QualityScores& reverse){
    try {
        if (forward.getName() == reverse.getName()) {
            return true;
        }else {
            //if no match, do the names differ only by a trailing 1 and 2?
            string tempFRead = forward.getName().substr(0, forward.getName().length()-1);
            string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
            if (tempFRead == tempRRead) {
                if ((forward.getName()[forward.getName().length()-1] == '1') && (reverse.getName()[reverse.getName().length()-1] == '2')) {
                    forward.setName(tempFRead);
                    reverse.setName(tempRRead);
                    return true;
                }
            }else {
                //otherwise, does the reverse name match the forward name once its last character is dropped?
                string tempFRead = forward.getName();
                string tempRRead = reverse.getName().substr(0, reverse.getName().length()-1);
                if (tempFRead == tempRRead) {
                    reverse.setName(tempRRead);
                    return true;
                }
            }
        }
        return false;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "checkName");
        exit(1);
    }
}
//***************************************************************************************************************
/**
 * Convert the probability to a quality score.
 */
int MakeContigsCommand::convertProb(double qProb){
    try {
        //binary search of the qual_score table (indices 0..46) for qProb
        int lower = 0; int upper = 46;

        if (qProb < qual_score[0]) { return 1; }

        while (lower < upper) {
            int mid = lower + (upper - lower) / 2;
            if (qual_score[mid] == qProb) {
                return mid;
            }
            if (mid == lower) {
                return lower;
            }
            else if (qual_score[mid] > qProb) {
                upper = mid;
            }
            else if (qual_score[mid] < qProb) {
                lower = mid + 1;
            }
        }
        return lower;
    }
    catch(exception& e) {
        m->errorOut(e, "MakeContigsCommand", "convertProb");
        exit(1);
    }
}
//***************************************************************************************************************
int MakeContigsCommand::loadQmatchValues(vector< vector<double> >& qual_match_simple_bayesian, vector< vector<double> >& qual_mismatch_simple_bayesian){
    try {
        qual_match_simple_bayesian[0][0] = -1.09861; qual_match_simple_bayesian[0][1] = -1.32887; qual_match_simple_bayesian[0][2] = -1.55913; qual_match_simple_bayesian[0][3] = -1.78939; qual_match_simple_bayesian[0][4] = -2.01965; qual_match_simple_bayesian[0][5] = -2.2499; qual_match_simple_bayesian[0][6] = -2.48016; qual_match_simple_bayesian[0][7] =
-2.71042; qual_match_simple_bayesian[0][8] = -2.94068; qual_match_simple_bayesian[0][9] = -3.17094; qual_match_simple_bayesian[0][10] = -3.4012; qual_match_simple_bayesian[0][11] = -3.63146; qual_match_simple_bayesian[0][12] = -3.86171; qual_match_simple_bayesian[0][13] = -4.09197; qual_match_simple_bayesian[0][14] = -4.32223; qual_match_simple_bayesian[0][15] = -4.55249; qual_match_simple_bayesian[0][16] = -4.78275; qual_match_simple_bayesian[0][17] = -5.01301; qual_match_simple_bayesian[0][18] = -5.24327; qual_match_simple_bayesian[0][19] = -5.47352; qual_match_simple_bayesian[0][20] = -5.70378; qual_match_simple_bayesian[0][21] = -5.93404; qual_match_simple_bayesian[0][22] = -6.1643; qual_match_simple_bayesian[0][23] = -6.39456; qual_match_simple_bayesian[0][24] = -6.62482; qual_match_simple_bayesian[0][25] = -6.85508; qual_match_simple_bayesian[0][26] = -7.08533; qual_match_simple_bayesian[0][27] = -7.31559; qual_match_simple_bayesian[0][28] = -7.54585; qual_match_simple_bayesian[0][29] = -7.77611; qual_match_simple_bayesian[0][30] = -8.00637; qual_match_simple_bayesian[0][31] = -8.23663; qual_match_simple_bayesian[0][32] = -8.46688; qual_match_simple_bayesian[0][33] = -8.69714; qual_match_simple_bayesian[0][34] = -8.9274; qual_match_simple_bayesian[0][35] = -9.15766; qual_match_simple_bayesian[0][36] = -9.38792; qual_match_simple_bayesian[0][37] = -9.61818; qual_match_simple_bayesian[0][38] = -9.84844; qual_match_simple_bayesian[0][39] = -10.0787; qual_match_simple_bayesian[0][40] = -10.309; qual_match_simple_bayesian[0][41] = -10.5392; qual_match_simple_bayesian[0][42] = -10.7695; qual_match_simple_bayesian[0][43] = -10.9997; qual_match_simple_bayesian[0][44] = -11.23; qual_match_simple_bayesian[0][45] = -11.4602; qual_match_simple_bayesian[0][46] = -11.6905; qual_match_simple_bayesian[1][0] = -1.32887; qual_match_simple_bayesian[1][1] = -1.37587; qual_match_simple_bayesian[1][2] = -1.41484; qual_match_simple_bayesian[1][3] = -1.44692; 
qual_match_simple_bayesian[1][4] = -1.47315; qual_match_simple_bayesian[1][5] = -1.49449; qual_match_simple_bayesian[1][6] = -1.51178; qual_match_simple_bayesian[1][7] = -1.52572; qual_match_simple_bayesian[1][8] = -1.53694; qual_match_simple_bayesian[1][9] = -1.54593; qual_match_simple_bayesian[1][10] = -1.55314; qual_match_simple_bayesian[1][11] = -1.5589; qual_match_simple_bayesian[1][12] = -1.5635; qual_match_simple_bayesian[1][13] = -1.56717; qual_match_simple_bayesian[1][14] = -1.5701; qual_match_simple_bayesian[1][15] = -1.57243; qual_match_simple_bayesian[1][16] = -1.57428; qual_match_simple_bayesian[1][17] = -1.57576; qual_match_simple_bayesian[1][18] = -1.57693; qual_match_simple_bayesian[1][19] = -1.57786; qual_match_simple_bayesian[1][20] = -1.5786; qual_match_simple_bayesian[1][21] = -1.57919; qual_match_simple_bayesian[1][22] = -1.57966; qual_match_simple_bayesian[1][23] = -1.58003; qual_match_simple_bayesian[1][24] = -1.58033; qual_match_simple_bayesian[1][25] = -1.58057; qual_match_simple_bayesian[1][26] = -1.58075; qual_match_simple_bayesian[1][27] = -1.5809; qual_match_simple_bayesian[1][28] = -1.58102; qual_match_simple_bayesian[1][29] = -1.58111; qual_match_simple_bayesian[1][30] = -1.58119; qual_match_simple_bayesian[1][31] = -1.58125; qual_match_simple_bayesian[1][32] = -1.58129; qual_match_simple_bayesian[1][33] = -1.58133; qual_match_simple_bayesian[1][34] = -1.58136; qual_match_simple_bayesian[1][35] = -1.58138; qual_match_simple_bayesian[1][36] = -1.5814; qual_match_simple_bayesian[1][37] = -1.58142; qual_match_simple_bayesian[1][38] = -1.58143; qual_match_simple_bayesian[1][39] = -1.58144; qual_match_simple_bayesian[1][40] = -1.58145; qual_match_simple_bayesian[1][41] = -1.58145; qual_match_simple_bayesian[1][42] = -1.58146; qual_match_simple_bayesian[1][43] = -1.58146; qual_match_simple_bayesian[1][44] = -1.58146; qual_match_simple_bayesian[1][45] = -1.58146; qual_match_simple_bayesian[1][46] = -1.58147; qual_match_simple_bayesian[2][0] 
= -1.55913; qual_match_simple_bayesian[2][1] = -1.41484; qual_match_simple_bayesian[2][2] = -1.31343; qual_match_simple_bayesian[2][3] = -1.23963; qual_match_simple_bayesian[2][4] = -1.18465; qual_match_simple_bayesian[2][5] = -1.14303; qual_match_simple_bayesian[2][6] = -1.11117; qual_match_simple_bayesian[2][7] = -1.08657; qual_match_simple_bayesian[2][8] = -1.06744; qual_match_simple_bayesian[2][9] = -1.05251; qual_match_simple_bayesian[2][10] = -1.0408; qual_match_simple_bayesian[2][11] = -1.0316; qual_match_simple_bayesian[2][12] = -1.02436; qual_match_simple_bayesian[2][13] = -1.01863; qual_match_simple_bayesian[2][14] = -1.01411; qual_match_simple_bayesian[2][15] = -1.01054; qual_match_simple_bayesian[2][16] = -1.00771; qual_match_simple_bayesian[2][17] = -1.00546; qual_match_simple_bayesian[2][18] = -1.00368; qual_match_simple_bayesian[2][19] = -1.00227; qual_match_simple_bayesian[2][20] = -1.00115; qual_match_simple_bayesian[2][21] = -1.00027; qual_match_simple_bayesian[2][22] = -0.99956; qual_match_simple_bayesian[2][23] = -0.999001; qual_match_simple_bayesian[2][24] = -0.998557; qual_match_simple_bayesian[2][25] = -0.998204; qual_match_simple_bayesian[2][26] = -0.997924; qual_match_simple_bayesian[2][27] = -0.997702; qual_match_simple_bayesian[2][28] = -0.997525; qual_match_simple_bayesian[2][29] = -0.997385; qual_match_simple_bayesian[2][30] = -0.997273; qual_match_simple_bayesian[2][31] = -0.997185; qual_match_simple_bayesian[2][32] = -0.997114; qual_match_simple_bayesian[2][33] = -0.997059; qual_match_simple_bayesian[2][34] = -0.997014; qual_match_simple_bayesian[2][35] = -0.996979; qual_match_simple_bayesian[2][36] = -0.996951; qual_match_simple_bayesian[2][37] = -0.996929; qual_match_simple_bayesian[2][38] = -0.996911; qual_match_simple_bayesian[2][39] = -0.996897; qual_match_simple_bayesian[2][40] = -0.996886; qual_match_simple_bayesian[2][41] = -0.996877; qual_match_simple_bayesian[2][42] = -0.99687; qual_match_simple_bayesian[2][43] = -0.996865; 
qual_match_simple_bayesian[2][44] = -0.99686; qual_match_simple_bayesian[2][45] = -0.996857; qual_match_simple_bayesian[2][46] = -0.996854; qual_match_simple_bayesian[3][0] = -1.78939; qual_match_simple_bayesian[3][1] = -1.44692; qual_match_simple_bayesian[3][2] = -1.23963; qual_match_simple_bayesian[3][3] = -1.10098; qual_match_simple_bayesian[3][4] = -1.0031; qual_match_simple_bayesian[3][5] = -0.931648; qual_match_simple_bayesian[3][6] = -0.878319; qual_match_simple_bayesian[3][7] = -0.837896; qual_match_simple_bayesian[3][8] = -0.806912; qual_match_simple_bayesian[3][9] = -0.782967; qual_match_simple_bayesian[3][10] = -0.764347; qual_match_simple_bayesian[3][11] = -0.7498; qual_match_simple_bayesian[3][12] = -0.738394; qual_match_simple_bayesian[3][13] = -0.729426; qual_match_simple_bayesian[3][14] = -0.722359; qual_match_simple_bayesian[3][15] = -0.71678; qual_match_simple_bayesian[3][16] = -0.712372; qual_match_simple_bayesian[3][17] = -0.708883; qual_match_simple_bayesian[3][18] = -0.706121; qual_match_simple_bayesian[3][19] = -0.703933; qual_match_simple_bayesian[3][20] = -0.702197; qual_match_simple_bayesian[3][21] = -0.700821; qual_match_simple_bayesian[3][22] = -0.69973; qual_match_simple_bayesian[3][23] = -0.698863; qual_match_simple_bayesian[3][24] = -0.698176; qual_match_simple_bayesian[3][25] = -0.69763; qual_match_simple_bayesian[3][26] = -0.697196; qual_match_simple_bayesian[3][27] = -0.696852; qual_match_simple_bayesian[3][28] = -0.696579; qual_match_simple_bayesian[3][29] = -0.696362; qual_match_simple_bayesian[3][30] = -0.69619; qual_match_simple_bayesian[3][31] = -0.696053; qual_match_simple_bayesian[3][32] = -0.695944; qual_match_simple_bayesian[3][33] = -0.695858; qual_match_simple_bayesian[3][34] = -0.695789; qual_match_simple_bayesian[3][35] = -0.695735; qual_match_simple_bayesian[3][36] = -0.695692; qual_match_simple_bayesian[3][37] = -0.695657; qual_match_simple_bayesian[3][38] = -0.69563; qual_match_simple_bayesian[3][39] = -0.695608; 
qual_match_simple_bayesian[3][40] = -0.695591; qual_match_simple_bayesian[3][41] = -0.695577; qual_match_simple_bayesian[3][42] = -0.695566; qual_match_simple_bayesian[3][43] = -0.695558; qual_match_simple_bayesian[3][44] = -0.695551; qual_match_simple_bayesian[3][45] = -0.695546; qual_match_simple_bayesian[3][46] = -0.695541; qual_match_simple_bayesian[4][0] = -2.01965; qual_match_simple_bayesian[4][1] = -1.47315; qual_match_simple_bayesian[4][2] = -1.18465; qual_match_simple_bayesian[4][3] = -1.0031; qual_match_simple_bayesian[4][4] = -0.879224; qual_match_simple_bayesian[4][5] = -0.790712; qual_match_simple_bayesian[4][6] = -0.725593; qual_match_simple_bayesian[4][7] = -0.676729; qual_match_simple_bayesian[4][8] = -0.639547; qual_match_simple_bayesian[4][9] = -0.610968; qual_match_simple_bayesian[4][10] = -0.588834; qual_match_simple_bayesian[4][11] = -0.571596; qual_match_simple_bayesian[4][12] = -0.558111; qual_match_simple_bayesian[4][13] = -0.547528; qual_match_simple_bayesian[4][14] = -0.539201; qual_match_simple_bayesian[4][15] = -0.532636; qual_match_simple_bayesian[4][16] = -0.527451; qual_match_simple_bayesian[4][17] = -0.523352; qual_match_simple_bayesian[4][18] = -0.520107; qual_match_simple_bayesian[4][19] = -0.517538; qual_match_simple_bayesian[4][20] = -0.515502; qual_match_simple_bayesian[4][21] = -0.513887; qual_match_simple_bayesian[4][22] = -0.512606; qual_match_simple_bayesian[4][23] = -0.51159; qual_match_simple_bayesian[4][24] = -0.510784; qual_match_simple_bayesian[4][25] = -0.510144; qual_match_simple_bayesian[4][26] = -0.509636; qual_match_simple_bayesian[4][27] = -0.509232; qual_match_simple_bayesian[4][28] = -0.508912; qual_match_simple_bayesian[4][29] = -0.508658; qual_match_simple_bayesian[4][30] = -0.508456; qual_match_simple_bayesian[4][31] = -0.508295; qual_match_simple_bayesian[4][32] = -0.508168; qual_match_simple_bayesian[4][33] = -0.508067; qual_match_simple_bayesian[4][34] = -0.507986; qual_match_simple_bayesian[4][35] = 
-0.507922; qual_match_simple_bayesian[4][36] = -0.507872; qual_match_simple_bayesian[4][37] = -0.507831; qual_match_simple_bayesian[4][38] = -0.507799; qual_match_simple_bayesian[4][39] = -0.507774; qual_match_simple_bayesian[4][40] = -0.507754; qual_match_simple_bayesian[4][41] = -0.507738; qual_match_simple_bayesian[4][42] = -0.507725; qual_match_simple_bayesian[4][43] = -0.507715; qual_match_simple_bayesian[4][44] = -0.507707; qual_match_simple_bayesian[4][45] = -0.507701; qual_match_simple_bayesian[4][46] = -0.507695; qual_match_simple_bayesian[5][0] = -2.2499; qual_match_simple_bayesian[5][1] = -1.49449; qual_match_simple_bayesian[5][2] = -1.14303; qual_match_simple_bayesian[5][3] = -0.931648; qual_match_simple_bayesian[5][4] = -0.790712; qual_match_simple_bayesian[5][5] = -0.691393; qual_match_simple_bayesian[5][6] = -0.618979; qual_match_simple_bayesian[5][7] = -0.564976; qual_match_simple_bayesian[5][8] = -0.524066; qual_match_simple_bayesian[5][9] = -0.492723; qual_match_simple_bayesian[5][10] = -0.468507; qual_match_simple_bayesian[5][11] = -0.449682; qual_match_simple_bayesian[5][12] = -0.434976; qual_match_simple_bayesian[5][13] = -0.423448; qual_match_simple_bayesian[5][14] = -0.414384; qual_match_simple_bayesian[5][15] = -0.407243; qual_match_simple_bayesian[5][16] = -0.401606; qual_match_simple_bayesian[5][17] = -0.397151; qual_match_simple_bayesian[5][18] = -0.393627; qual_match_simple_bayesian[5][19] = -0.390836; qual_match_simple_bayesian[5][20] = -0.388625; qual_match_simple_bayesian[5][21] = -0.386872; qual_match_simple_bayesian[5][22] = -0.385482; qual_match_simple_bayesian[5][23] = -0.384379; qual_match_simple_bayesian[5][24] = -0.383503; qual_match_simple_bayesian[5][25] = -0.382809; qual_match_simple_bayesian[5][26] = -0.382257; qual_match_simple_bayesian[5][27] = -0.38182; qual_match_simple_bayesian[5][28] = -0.381472; qual_match_simple_bayesian[5][29] = -0.381196; qual_match_simple_bayesian[5][30] = -0.380977; 
qual_match_simple_bayesian[5][31] = -0.380803; qual_match_simple_bayesian[5][32] = -0.380664; qual_match_simple_bayesian[5][33] = -0.380554; qual_match_simple_bayesian[5][34] = -0.380467; qual_match_simple_bayesian[5][35] = -0.380398; qual_match_simple_bayesian[5][36] = -0.380343; qual_match_simple_bayesian[5][37] = -0.380299; qual_match_simple_bayesian[5][38] = -0.380264; qual_match_simple_bayesian[5][39] = -0.380237; qual_match_simple_bayesian[5][40] = -0.380215; qual_match_simple_bayesian[5][41] = -0.380198; qual_match_simple_bayesian[5][42] = -0.380184; qual_match_simple_bayesian[5][43] = -0.380173; qual_match_simple_bayesian[5][44] = -0.380164; qual_match_simple_bayesian[5][45] = -0.380157; qual_match_simple_bayesian[5][46] = -0.380152; qual_match_simple_bayesian[6][0] = -2.48016; qual_match_simple_bayesian[6][1] = -1.51178; qual_match_simple_bayesian[6][2] = -1.11117; qual_match_simple_bayesian[6][3] = -0.878319; qual_match_simple_bayesian[6][4] = -0.725593; qual_match_simple_bayesian[6][5] = -0.618979; qual_match_simple_bayesian[6][6] = -0.541714; qual_match_simple_bayesian[6][7] = -0.48433; qual_match_simple_bayesian[6][8] = -0.440984; qual_match_simple_bayesian[6][9] = -0.407844; qual_match_simple_bayesian[6][10] = -0.382281; qual_match_simple_bayesian[6][11] = -0.362431; qual_match_simple_bayesian[6][12] = -0.34694; qual_match_simple_bayesian[6][13] = -0.334804; qual_match_simple_bayesian[6][14] = -0.325268; qual_match_simple_bayesian[6][15] = -0.317757; qual_match_simple_bayesian[6][16] = -0.311831; qual_match_simple_bayesian[6][17] = -0.307149; qual_match_simple_bayesian[6][18] = -0.303445; qual_match_simple_bayesian[6][19] = -0.300513; qual_match_simple_bayesian[6][20] = -0.29819; qual_match_simple_bayesian[6][21] = -0.296348; qual_match_simple_bayesian[6][22] = -0.294888; qual_match_simple_bayesian[6][23] = -0.29373; qual_match_simple_bayesian[6][24] = -0.29281; qual_match_simple_bayesian[6][25] = -0.292081; qual_match_simple_bayesian[6][26] = 
-0.291502; qual_match_simple_bayesian[6][27] = -0.291042; qual_match_simple_bayesian[6][28] = -0.290677; qual_match_simple_bayesian[6][29] = -0.290387; qual_match_simple_bayesian[6][30] = -0.290157; qual_match_simple_bayesian[6][31] = -0.289974; qual_match_simple_bayesian[6][32] = -0.289829; qual_match_simple_bayesian[6][33] = -0.289713; qual_match_simple_bayesian[6][34] = -0.289622; qual_match_simple_bayesian[6][35] = -0.289549; qual_match_simple_bayesian[6][36] = -0.289491; qual_match_simple_bayesian[6][37] = -0.289445; qual_match_simple_bayesian[6][38] = -0.289409; qual_match_simple_bayesian[6][39] = -0.28938; qual_match_simple_bayesian[6][40] = -0.289357; qual_match_simple_bayesian[6][41] = -0.289339; qual_match_simple_bayesian[6][42] = -0.289324; qual_match_simple_bayesian[6][43] = -0.289313; qual_match_simple_bayesian[6][44] = -0.289304; qual_match_simple_bayesian[6][45] = -0.289296; qual_match_simple_bayesian[6][46] = -0.28929; qual_match_simple_bayesian[7][0] = -2.71042; qual_match_simple_bayesian[7][1] = -1.52572; qual_match_simple_bayesian[7][2] = -1.08657; qual_match_simple_bayesian[7][3] = -0.837896; qual_match_simple_bayesian[7][4] = -0.676729; qual_match_simple_bayesian[7][5] = -0.564976; qual_match_simple_bayesian[7][6] = -0.48433; qual_match_simple_bayesian[7][7] = -0.424604; qual_match_simple_bayesian[7][8] = -0.379581; qual_match_simple_bayesian[7][9] = -0.345208; qual_match_simple_bayesian[7][10] = -0.318723; qual_match_simple_bayesian[7][11] = -0.298173; qual_match_simple_bayesian[7][12] = -0.282146; qual_match_simple_bayesian[7][13] = -0.269595; qual_match_simple_bayesian[7][14] = -0.259737; qual_match_simple_bayesian[7][15] = -0.251976; qual_match_simple_bayesian[7][16] = -0.245853; qual_match_simple_bayesian[7][17] = -0.241016; qual_match_simple_bayesian[7][18] = -0.23719; qual_match_simple_bayesian[7][19] = -0.234162; qual_match_simple_bayesian[7][20] = -0.231763; qual_match_simple_bayesian[7][21] = -0.229861; 
qual_match_simple_bayesian[7][22] = -0.228354; qual_match_simple_bayesian[7][23] = -0.227158; qual_match_simple_bayesian[7][24] = -0.226208; qual_match_simple_bayesian[7][25] = -0.225455; qual_match_simple_bayesian[7][26] = -0.224857; qual_match_simple_bayesian[7][27] = -0.224383; qual_match_simple_bayesian[7][28] = -0.224006; qual_match_simple_bayesian[7][29] = -0.223707; qual_match_simple_bayesian[7][30] = -0.223469; qual_match_simple_bayesian[7][31] = -0.22328; qual_match_simple_bayesian[7][32] = -0.22313; qual_match_simple_bayesian[7][33] = -0.223011; qual_match_simple_bayesian[7][34] = -0.222917; qual_match_simple_bayesian[7][35] = -0.222842; qual_match_simple_bayesian[7][36] = -0.222782; qual_match_simple_bayesian[7][37] = -0.222734; qual_match_simple_bayesian[7][38] = -0.222697; qual_match_simple_bayesian[7][39] = -0.222667; qual_match_simple_bayesian[7][40] = -0.222643; qual_match_simple_bayesian[7][41] = -0.222624; qual_match_simple_bayesian[7][42] = -0.222609; qual_match_simple_bayesian[7][43] = -0.222597; qual_match_simple_bayesian[7][44] = -0.222588; qual_match_simple_bayesian[7][45] = -0.222581; qual_match_simple_bayesian[7][46] = -0.222575; qual_match_simple_bayesian[8][0] = -2.94068; qual_match_simple_bayesian[8][1] = -1.53694; qual_match_simple_bayesian[8][2] = -1.06744; qual_match_simple_bayesian[8][3] = -0.806912; qual_match_simple_bayesian[8][4] = -0.639547; qual_match_simple_bayesian[8][5] = -0.524066; qual_match_simple_bayesian[8][6] = -0.440984; qual_match_simple_bayesian[8][7] = -0.379581; qual_match_simple_bayesian[8][8] = -0.333359; qual_match_simple_bayesian[8][9] = -0.298107; qual_match_simple_bayesian[8][10] = -0.270966; qual_match_simple_bayesian[8][11] = -0.249919; qual_match_simple_bayesian[8][12] = -0.233512; qual_match_simple_bayesian[8][13] = -0.220668; qual_match_simple_bayesian[8][14] = -0.210582; qual_match_simple_bayesian[8][15] = -0.202642; qual_match_simple_bayesian[8][16] = -0.19638; qual_match_simple_bayesian[8][17] = 
-0.191434; qual_match_simple_bayesian[8][18] = -0.187522; qual_match_simple_bayesian[8][19] = -0.184426; qual_match_simple_bayesian[8][20] = -0.181973; qual_match_simple_bayesian[8][21] = -0.180029; qual_match_simple_bayesian[8][22] = -0.178488; qual_match_simple_bayesian[8][23] = -0.177265; qual_match_simple_bayesian[8][24] = -0.176295; qual_match_simple_bayesian[8][25] = -0.175525; qual_match_simple_bayesian[8][26] = -0.174914; qual_match_simple_bayesian[8][27] = -0.174428; qual_match_simple_bayesian[8][28] = -0.174043; qual_match_simple_bayesian[8][29] = -0.173737; qual_match_simple_bayesian[8][30] = -0.173494; qual_match_simple_bayesian[8][31] = -0.173301; qual_match_simple_bayesian[8][32] = -0.173148; qual_match_simple_bayesian[8][33] = -0.173026; qual_match_simple_bayesian[8][34] = -0.17293; qual_match_simple_bayesian[8][35] = -0.172853; qual_match_simple_bayesian[8][36] = -0.172792; qual_match_simple_bayesian[8][37] = -0.172744; qual_match_simple_bayesian[8][38] = -0.172705; qual_match_simple_bayesian[8][39] = -0.172675; qual_match_simple_bayesian[8][40] = -0.17265; qual_match_simple_bayesian[8][41] = -0.172631; qual_match_simple_bayesian[8][42] = -0.172616; qual_match_simple_bayesian[8][43] = -0.172604; qual_match_simple_bayesian[8][44] = -0.172594; qual_match_simple_bayesian[8][45] = -0.172586; qual_match_simple_bayesian[8][46] = -0.17258; qual_match_simple_bayesian[9][0] = -3.17094; qual_match_simple_bayesian[9][1] = -1.54593; qual_match_simple_bayesian[9][2] = -1.05251; qual_match_simple_bayesian[9][3] = -0.782967; qual_match_simple_bayesian[9][4] = -0.610968; qual_match_simple_bayesian[9][5] = -0.492723; qual_match_simple_bayesian[9][6] = -0.407844; qual_match_simple_bayesian[9][7] = -0.345208; qual_match_simple_bayesian[9][8] = -0.298107; qual_match_simple_bayesian[9][9] = -0.262213; qual_match_simple_bayesian[9][10] = -0.234592; qual_match_simple_bayesian[9][11] = -0.213183; qual_match_simple_bayesian[9][12] = -0.196498; 
qual_match_simple_bayesian[9][13] = -0.18344; qual_match_simple_bayesian[9][14] = -0.173188; qual_match_simple_bayesian[9][15] = -0.165119; qual_match_simple_bayesian[9][16] = -0.158755; qual_match_simple_bayesian[9][17] = -0.153729; qual_match_simple_bayesian[9][18] = -0.149755; qual_match_simple_bayesian[9][19] = -0.146609; qual_match_simple_bayesian[9][20] = -0.144117; qual_match_simple_bayesian[9][21] = -0.142143; qual_match_simple_bayesian[9][22] = -0.140577; qual_match_simple_bayesian[9][23] = -0.139335; qual_match_simple_bayesian[9][24] = -0.138349; qual_match_simple_bayesian[9][25] = -0.137567; qual_match_simple_bayesian[9][26] = -0.136946; qual_match_simple_bayesian[9][27] = -0.136453; qual_match_simple_bayesian[9][28] = -0.136062; qual_match_simple_bayesian[9][29] = -0.135751; qual_match_simple_bayesian[9][30] = -0.135504; qual_match_simple_bayesian[9][31] = -0.135308; qual_match_simple_bayesian[9][32] = -0.135153; qual_match_simple_bayesian[9][33] = -0.135029; qual_match_simple_bayesian[9][34] = -0.134931; qual_match_simple_bayesian[9][35] = -0.134853; qual_match_simple_bayesian[9][36] = -0.134791; qual_match_simple_bayesian[9][37] = -0.134742; qual_match_simple_bayesian[9][38] = -0.134703; qual_match_simple_bayesian[9][39] = -0.134672; qual_match_simple_bayesian[9][40] = -0.134647; qual_match_simple_bayesian[9][41] = -0.134628; qual_match_simple_bayesian[9][42] = -0.134612; qual_match_simple_bayesian[9][43] = -0.1346; qual_match_simple_bayesian[9][44] = -0.13459; qual_match_simple_bayesian[9][45] = -0.134582; qual_match_simple_bayesian[9][46] = -0.134576; qual_match_simple_bayesian[10][0] = -3.4012; qual_match_simple_bayesian[10][1] = -1.55314; qual_match_simple_bayesian[10][2] = -1.0408; qual_match_simple_bayesian[10][3] = -0.764347; qual_match_simple_bayesian[10][4] = -0.588834; qual_match_simple_bayesian[10][5] = -0.468507; qual_match_simple_bayesian[10][6] = -0.382281; qual_match_simple_bayesian[10][7] = -0.318723; qual_match_simple_bayesian[10][8] 
= -0.270966; qual_match_simple_bayesian[10][9] = -0.234592; qual_match_simple_bayesian[10][10] = -0.206614; qual_match_simple_bayesian[10][11] = -0.184935; qual_match_simple_bayesian[10][12] = -0.168044; qual_match_simple_bayesian[10][13] = -0.154827; qual_match_simple_bayesian[10][14] = -0.144451; qual_match_simple_bayesian[10][15] = -0.136285; qual_match_simple_bayesian[10][16] = -0.129846; qual_match_simple_bayesian[10][17] = -0.124761; qual_match_simple_bayesian[10][18] = -0.12074; qual_match_simple_bayesian[10][19] = -0.117558; qual_match_simple_bayesian[10][20] = -0.115037; qual_match_simple_bayesian[10][21] = -0.113039; qual_match_simple_bayesian[10][22] = -0.111455; qual_match_simple_bayesian[10][23] = -0.110198; qual_match_simple_bayesian[10][24] = -0.109202; qual_match_simple_bayesian[10][25] = -0.10841; qual_match_simple_bayesian[10][26] = -0.107782; qual_match_simple_bayesian[10][27] = -0.107284; qual_match_simple_bayesian[10][28] = -0.106888; qual_match_simple_bayesian[10][29] = -0.106574; qual_match_simple_bayesian[10][30] = -0.106324; qual_match_simple_bayesian[10][31] = -0.106126; qual_match_simple_bayesian[10][32] = -0.105968; qual_match_simple_bayesian[10][33] = -0.105843; qual_match_simple_bayesian[10][34] = -0.105744; qual_match_simple_bayesian[10][35] = -0.105665; qual_match_simple_bayesian[10][36] = -0.105602; qual_match_simple_bayesian[10][37] = -0.105553; qual_match_simple_bayesian[10][38] = -0.105513; qual_match_simple_bayesian[10][39] = -0.105482; qual_match_simple_bayesian[10][40] = -0.105457; qual_match_simple_bayesian[10][41] = -0.105437; qual_match_simple_bayesian[10][42] = -0.105421; qual_match_simple_bayesian[10][43] = -0.105409; qual_match_simple_bayesian[10][44] = -0.105399; qual_match_simple_bayesian[10][45] = -0.105391; qual_match_simple_bayesian[10][46] = -0.105385; qual_match_simple_bayesian[11][0] = -3.63146; qual_match_simple_bayesian[11][1] = -1.5589; qual_match_simple_bayesian[11][2] = -1.0316; 
qual_match_simple_bayesian[11][3] = -0.7498; qual_match_simple_bayesian[11][4] = -0.571596; qual_match_simple_bayesian[11][5] = -0.449682; qual_match_simple_bayesian[11][6] = -0.362431; qual_match_simple_bayesian[11][7] = -0.298173; qual_match_simple_bayesian[11][8] = -0.249919; qual_match_simple_bayesian[11][9] = -0.213183; qual_match_simple_bayesian[11][10] = -0.184935; qual_match_simple_bayesian[11][11] = -0.163052; qual_match_simple_bayesian[11][12] = -0.146004; qual_match_simple_bayesian[11][13] = -0.132667; qual_match_simple_bayesian[11][14] = -0.122198; qual_match_simple_bayesian[11][15] = -0.11396; qual_match_simple_bayesian[11][16] = -0.107464; qual_match_simple_bayesian[11][17] = -0.102334; qual_match_simple_bayesian[11][18] = -0.0982781; qual_match_simple_bayesian[11][19] = -0.0950678; qual_match_simple_bayesian[11][20] = -0.0925252; qual_match_simple_bayesian[11][21] = -0.09051; qual_match_simple_bayesian[11][22] = -0.0889123; qual_match_simple_bayesian[11][23] = -0.0876449; qual_match_simple_bayesian[11][24] = -0.0866394; qual_match_simple_bayesian[11][25] = -0.0858414; qual_match_simple_bayesian[11][26] = -0.0852079; qual_match_simple_bayesian[11][27] = -0.0847051; qual_match_simple_bayesian[11][28] = -0.0843058; qual_match_simple_bayesian[11][29] = -0.0839888; qual_match_simple_bayesian[11][30] = -0.083737; qual_match_simple_bayesian[11][31] = -0.0835371; qual_match_simple_bayesian[11][32] = -0.0833783; qual_match_simple_bayesian[11][33] = -0.0832522; qual_match_simple_bayesian[11][34] = -0.083152; qual_match_simple_bayesian[11][35] = -0.0830725; qual_match_simple_bayesian[11][36] = -0.0830093; qual_match_simple_bayesian[11][37] = -0.0829591; qual_match_simple_bayesian[11][38] = -0.0829192; qual_match_simple_bayesian[11][39] = -0.0828876; qual_match_simple_bayesian[11][40] = -0.0828624; qual_match_simple_bayesian[11][41] = -0.0828425; qual_match_simple_bayesian[11][42] = -0.0828266; qual_match_simple_bayesian[11][43] = -0.082814; 
qual_match_simple_bayesian[11][44] = -0.082804; qual_match_simple_bayesian[11][45] = -0.082796; qual_match_simple_bayesian[11][46] = -0.0827897; qual_match_simple_bayesian[12][0] = -3.86171; qual_match_simple_bayesian[12][1] = -1.5635; qual_match_simple_bayesian[12][2] = -1.02436; qual_match_simple_bayesian[12][3] = -0.738394; qual_match_simple_bayesian[12][4] = -0.558111; qual_match_simple_bayesian[12][5] = -0.434976; qual_match_simple_bayesian[12][6] = -0.34694; qual_match_simple_bayesian[12][7] = -0.282146; qual_match_simple_bayesian[12][8] = -0.233512; qual_match_simple_bayesian[12][9] = -0.196498; qual_match_simple_bayesian[12][10] = -0.168044; qual_match_simple_bayesian[12][11] = -0.146004; qual_match_simple_bayesian[12][12] = -0.128838; qual_match_simple_bayesian[12][13] = -0.115409; qual_match_simple_bayesian[12][14] = -0.104869; qual_match_simple_bayesian[12][15] = -0.096575; qual_match_simple_bayesian[12][16] = -0.0900357; qual_match_simple_bayesian[12][17] = -0.0848716; qual_match_simple_bayesian[12][18] = -0.0807886; qual_match_simple_bayesian[12][19] = -0.0775572; qual_match_simple_bayesian[12][20] = -0.0749978; qual_match_simple_bayesian[12][21] = -0.0729694; qual_match_simple_bayesian[12][22] = -0.0713612; qual_match_simple_bayesian[12][23] = -0.0700856; qual_match_simple_bayesian[12][24] = -0.0690735; qual_match_simple_bayesian[12][25] = -0.0682703; qual_match_simple_bayesian[12][26] = -0.0676327; qual_match_simple_bayesian[12][27] = -0.0671265; qual_match_simple_bayesian[12][28] = -0.0667247; qual_match_simple_bayesian[12][29] = -0.0664056; qual_match_simple_bayesian[12][30] = -0.0661522; qual_match_simple_bayesian[12][31] = -0.065951; qual_match_simple_bayesian[12][32] = -0.0657912; qual_match_simple_bayesian[12][33] = -0.0656642; qual_match_simple_bayesian[12][34] = -0.0655634; qual_match_simple_bayesian[12][35] = -0.0654833; qual_match_simple_bayesian[12][36] = -0.0654198; qual_match_simple_bayesian[12][37] = -0.0653692; 
qual_match_simple_bayesian[12][38] = -0.0653291; qual_match_simple_bayesian[12][39] = -0.0652972; qual_match_simple_bayesian[12][40] = -0.0652719; qual_match_simple_bayesian[12][41] = -0.0652518; qual_match_simple_bayesian[12][42] = -0.0652359; qual_match_simple_bayesian[12][43] = -0.0652232; qual_match_simple_bayesian[12][44] = -0.0652131; qual_match_simple_bayesian[12][45] = -0.0652051; qual_match_simple_bayesian[12][46] = -0.0651987; qual_match_simple_bayesian[13][0] = -4.09197; qual_match_simple_bayesian[13][1] = -1.56717; qual_match_simple_bayesian[13][2] = -1.01863; qual_match_simple_bayesian[13][3] = -0.729426; qual_match_simple_bayesian[13][4] = -0.547528; qual_match_simple_bayesian[13][5] = -0.423448; qual_match_simple_bayesian[13][6] = -0.334804; qual_match_simple_bayesian[13][7] = -0.269595; qual_match_simple_bayesian[13][8] = -0.220668; qual_match_simple_bayesian[13][9] = -0.18344; qual_match_simple_bayesian[13][10] = -0.154827; qual_match_simple_bayesian[13][11] = -0.132667; qual_match_simple_bayesian[13][12] = -0.115409; qual_match_simple_bayesian[13][13] = -0.101909; qual_match_simple_bayesian[13][14] = -0.0913142; qual_match_simple_bayesian[13][15] = -0.0829777; qual_match_simple_bayesian[13][16] = -0.0764049; qual_match_simple_bayesian[13][17] = -0.0712146; qual_match_simple_bayesian[13][18] = -0.0671109; qual_match_simple_bayesian[13][19] = -0.0638632; qual_match_simple_bayesian[13][20] = -0.061291; qual_match_simple_bayesian[13][21] = -0.0592525; qual_match_simple_bayesian[13][22] = -0.0576362; qual_match_simple_bayesian[13][23] = -0.0563542; qual_match_simple_bayesian[13][24] = -0.055337; qual_match_simple_bayesian[13][25] = -0.0545298; qual_match_simple_bayesian[13][26] = -0.053889; qual_match_simple_bayesian[13][27] = -0.0533804; qual_match_simple_bayesian[13][28] = -0.0529765; qual_match_simple_bayesian[13][29] = -0.0526558; qual_match_simple_bayesian[13][30] = -0.0524012; qual_match_simple_bayesian[13][31] = -0.0521989; 
qual_match_simple_bayesian[13][32] = -0.0520383; qual_match_simple_bayesian[13][33] = -0.0519108; qual_match_simple_bayesian[13][34] = -0.0518095; qual_match_simple_bayesian[13][35] = -0.051729; qual_match_simple_bayesian[13][36] = -0.0516651; qual_match_simple_bayesian[13][37] = -0.0516143; qual_match_simple_bayesian[13][38] = -0.051574; qual_match_simple_bayesian[13][39] = -0.051542; qual_match_simple_bayesian[13][40] = -0.0515165; qual_match_simple_bayesian[13][41] = -0.0514963; qual_match_simple_bayesian[13][42] = -0.0514803; qual_match_simple_bayesian[13][43] = -0.0514675; qual_match_simple_bayesian[13][44] = -0.0514574; qual_match_simple_bayesian[13][45] = -0.0514493; qual_match_simple_bayesian[13][46] = -0.051443; qual_match_simple_bayesian[14][0] = -4.32223; qual_match_simple_bayesian[14][1] = -1.5701; qual_match_simple_bayesian[14][2] = -1.01411; qual_match_simple_bayesian[14][3] = -0.722359; qual_match_simple_bayesian[14][4] = -0.539201; qual_match_simple_bayesian[14][5] = -0.414384; qual_match_simple_bayesian[14][6] = -0.325268; qual_match_simple_bayesian[14][7] = -0.259737; qual_match_simple_bayesian[14][8] = -0.210582; qual_match_simple_bayesian[14][9] = -0.173188; qual_match_simple_bayesian[14][10] = -0.144451; qual_match_simple_bayesian[14][11] = -0.122198; qual_match_simple_bayesian[14][12] = -0.104869; qual_match_simple_bayesian[14][13] = -0.0913142; qual_match_simple_bayesian[14][14] = -0.0806768; qual_match_simple_bayesian[14][15] = -0.0723072; qual_match_simple_bayesian[14][16] = -0.0657085; qual_match_simple_bayesian[14][17] = -0.0604979; qual_match_simple_bayesian[14][18] = -0.0563782; qual_match_simple_bayesian[14][19] = -0.0531178; qual_match_simple_bayesian[14][20] = -0.0505356; qual_match_simple_bayesian[14][21] = -0.0484892; qual_match_simple_bayesian[14][22] = -0.0468667; qual_match_simple_bayesian[14][23] = -0.0455797; qual_match_simple_bayesian[14][24] = -0.0445586; qual_match_simple_bayesian[14][25] = -0.0437483; 
qual_match_simple_bayesian[14][26] = -0.0431051; qual_match_simple_bayesian[14][27] = -0.0425945; qual_match_simple_bayesian[14][28] = -0.0421891; qual_match_simple_bayesian[14][29] = -0.0418671; qual_match_simple_bayesian[14][30] = -0.0416115; qual_match_simple_bayesian[14][31] = -0.0414085; qual_match_simple_bayesian[14][32] = -0.0412473; qual_match_simple_bayesian[14][33] = -0.0411192; qual_match_simple_bayesian[14][34] = -0.0410175; qual_match_simple_bayesian[14][35] = -0.0409368; qual_match_simple_bayesian[14][36] = -0.0408726; qual_match_simple_bayesian[14][37] = -0.0408216; qual_match_simple_bayesian[14][38] = -0.0407812; qual_match_simple_bayesian[14][39] = -0.040749; qual_match_simple_bayesian[14][40] = -0.0407235; qual_match_simple_bayesian[14][41] = -0.0407032; qual_match_simple_bayesian[14][42] = -0.0406871; qual_match_simple_bayesian[14][43] = -0.0406743; qual_match_simple_bayesian[14][44] = -0.0406641; qual_match_simple_bayesian[14][45] = -0.040656; qual_match_simple_bayesian[14][46] = -0.0406496; qual_match_simple_bayesian[15][0] = -4.55249; qual_match_simple_bayesian[15][1] = -1.57243; qual_match_simple_bayesian[15][2] = -1.01054; qual_match_simple_bayesian[15][3] = -0.71678; qual_match_simple_bayesian[15][4] = -0.532636; qual_match_simple_bayesian[15][5] = -0.407243; qual_match_simple_bayesian[15][6] = -0.317757; qual_match_simple_bayesian[15][7] = -0.251976; qual_match_simple_bayesian[15][8] = -0.202642; qual_match_simple_bayesian[15][9] = -0.165119; qual_match_simple_bayesian[15][10] = -0.136285; qual_match_simple_bayesian[15][11] = -0.11396; qual_match_simple_bayesian[15][12] = -0.096575; qual_match_simple_bayesian[15][13] = -0.0829777; qual_match_simple_bayesian[15][14] = -0.0723072; qual_match_simple_bayesian[15][15] = -0.0639118; qual_match_simple_bayesian[15][16] = -0.0572929; qual_match_simple_bayesian[15][17] = -0.0520664; qual_match_simple_bayesian[15][18] = -0.0479342; qual_match_simple_bayesian[15][19] = -0.044664; 
qual_match_simple_bayesian[15][20] = -0.042074; qual_match_simple_bayesian[15][21] = -0.0400214; qual_match_simple_bayesian[15][22] = -0.038394; qual_match_simple_bayesian[15][23] = -0.0371032; qual_match_simple_bayesian[15][24] = -0.0360791; qual_match_simple_bayesian[15][25] = -0.0352663; qual_match_simple_bayesian[15][26] = -0.0346212; qual_match_simple_bayesian[15][27] = -0.0341091; qual_match_simple_bayesian[15][28] = -0.0337024; qual_match_simple_bayesian[15][29] = -0.0333796; qual_match_simple_bayesian[15][30] = -0.0331232; qual_match_simple_bayesian[15][31] = -0.0329196; qual_match_simple_bayesian[15][32] = -0.0327579; qual_match_simple_bayesian[15][33] = -0.0326294; qual_match_simple_bayesian[15][34] = -0.0325274; qual_match_simple_bayesian[15][35] = -0.0324464; qual_match_simple_bayesian[15][36] = -0.0323821; qual_match_simple_bayesian[15][37] = -0.0323309; qual_match_simple_bayesian[15][38] = -0.0322904; qual_match_simple_bayesian[15][39] = -0.0322581; qual_match_simple_bayesian[15][40] = -0.0322325; qual_match_simple_bayesian[15][41] = -0.0322121; qual_match_simple_bayesian[15][42] = -0.032196; qual_match_simple_bayesian[15][43] = -0.0321831; qual_match_simple_bayesian[15][44] = -0.032173; qual_match_simple_bayesian[15][45] = -0.0321649; qual_match_simple_bayesian[15][46] = -0.0321584; qual_match_simple_bayesian[16][0] = -4.78275; qual_match_simple_bayesian[16][1] = -1.57428; qual_match_simple_bayesian[16][2] = -1.00771; qual_match_simple_bayesian[16][3] = -0.712372; qual_match_simple_bayesian[16][4] = -0.527451; qual_match_simple_bayesian[16][5] = -0.401606; qual_match_simple_bayesian[16][6] = -0.311831; qual_match_simple_bayesian[16][7] = -0.245853; qual_match_simple_bayesian[16][8] = -0.19638; qual_match_simple_bayesian[16][9] = -0.158755; qual_match_simple_bayesian[16][10] = -0.129846; qual_match_simple_bayesian[16][11] = -0.107464; qual_match_simple_bayesian[16][12] = -0.0900357; qual_match_simple_bayesian[16][13] = -0.0764049; 
qual_match_simple_bayesian[16][14] = -0.0657085; qual_match_simple_bayesian[16][15] = -0.0572929; qual_match_simple_bayesian[16][16] = -0.0506582; qual_match_simple_bayesian[16][17] = -0.0454193; qual_match_simple_bayesian[16][18] = -0.0412773; qual_match_simple_bayesian[16][19] = -0.0379994; qual_match_simple_bayesian[16][20] = -0.0354033; qual_match_simple_bayesian[16][21] = -0.033346; qual_match_simple_bayesian[16][22] = -0.0317148; qual_match_simple_bayesian[16][23] = -0.0304209; qual_match_simple_bayesian[16][24] = -0.0293944; qual_match_simple_bayesian[16][25] = -0.0285798; qual_match_simple_bayesian[16][26] = -0.0279331; qual_match_simple_bayesian[16][27] = -0.0274198; qual_match_simple_bayesian[16][28] = -0.0270122; qual_match_simple_bayesian[16][29] = -0.0266886; qual_match_simple_bayesian[16][30] = -0.0264316; qual_match_simple_bayesian[16][31] = -0.0262275; qual_match_simple_bayesian[16][32] = -0.0260655; qual_match_simple_bayesian[16][33] = -0.0259367; qual_match_simple_bayesian[16][34] = -0.0258345; qual_match_simple_bayesian[16][35] = -0.0257533; qual_match_simple_bayesian[16][36] = -0.0256888; qual_match_simple_bayesian[16][37] = -0.0256376; qual_match_simple_bayesian[16][38] = -0.0255969; qual_match_simple_bayesian[16][39] = -0.0255645; qual_match_simple_bayesian[16][40] = -0.0255389; qual_match_simple_bayesian[16][41] = -0.0255185; qual_match_simple_bayesian[16][42] = -0.0255023; qual_match_simple_bayesian[16][43] = -0.0254894; qual_match_simple_bayesian[16][44] = -0.0254792; qual_match_simple_bayesian[16][45] = -0.0254711; qual_match_simple_bayesian[16][46] = -0.0254646; qual_match_simple_bayesian[17][0] = -5.01301; qual_match_simple_bayesian[17][1] = -1.57576; qual_match_simple_bayesian[17][2] = -1.00546; qual_match_simple_bayesian[17][3] = -0.708883; qual_match_simple_bayesian[17][4] = -0.523352; qual_match_simple_bayesian[17][5] = -0.397151; qual_match_simple_bayesian[17][6] = -0.307149; qual_match_simple_bayesian[17][7] = -0.241016; 
qual_match_simple_bayesian[17][8] = -0.191434; qual_match_simple_bayesian[17][9] = -0.153729; qual_match_simple_bayesian[17][10] = -0.124761; qual_match_simple_bayesian[17][11] = -0.102334; qual_match_simple_bayesian[17][12] = -0.0848716; qual_match_simple_bayesian[17][13] = -0.0712146; qual_match_simple_bayesian[17][14] = -0.0604979; qual_match_simple_bayesian[17][15] = -0.0520664; qual_match_simple_bayesian[17][16] = -0.0454193; qual_match_simple_bayesian[17][17] = -0.0401706; qual_match_simple_bayesian[17][18] = -0.036021; qual_match_simple_bayesian[17][19] = -0.032737; qual_match_simple_bayesian[17][20] = -0.0301362; qual_match_simple_bayesian[17][21] = -0.028075; qual_match_simple_bayesian[17][22] = -0.0264408; qual_match_simple_bayesian[17][23] = -0.0251447; qual_match_simple_bayesian[17][24] = -0.0241163; qual_match_simple_bayesian[17][25] = -0.0233001; qual_match_simple_bayesian[17][26] = -0.0226523; qual_match_simple_bayesian[17][27] = -0.0221381; qual_match_simple_bayesian[17][28] = -0.0217297; qual_match_simple_bayesian[17][29] = -0.0214055; qual_match_simple_bayesian[17][30] = -0.0211481; qual_match_simple_bayesian[17][31] = -0.0209436; qual_match_simple_bayesian[17][32] = -0.0207812; qual_match_simple_bayesian[17][33] = -0.0206523; qual_match_simple_bayesian[17][34] = -0.0205498; qual_match_simple_bayesian[17][35] = -0.0204685; qual_match_simple_bayesian[17][36] = -0.0204039; qual_match_simple_bayesian[17][37] = -0.0203526; qual_match_simple_bayesian[17][38] = -0.0203118; qual_match_simple_bayesian[17][39] = -0.0202794; qual_match_simple_bayesian[17][40] = -0.0202537; qual_match_simple_bayesian[17][41] = -0.0202333; qual_match_simple_bayesian[17][42] = -0.020217; qual_match_simple_bayesian[17][43] = -0.0202041; qual_match_simple_bayesian[17][44] = -0.0201939; qual_match_simple_bayesian[17][45] = -0.0201858; qual_match_simple_bayesian[17][46] = -0.0201793; qual_match_simple_bayesian[18][0] = -5.24327; qual_match_simple_bayesian[18][1] = -1.57693; 
qual_match_simple_bayesian[18][2] = -1.00368; qual_match_simple_bayesian[18][3] = -0.706121; qual_match_simple_bayesian[18][4] = -0.520107; qual_match_simple_bayesian[18][5] = -0.393627; qual_match_simple_bayesian[18][6] = -0.303445; qual_match_simple_bayesian[18][7] = -0.23719; qual_match_simple_bayesian[18][8] = -0.187522; qual_match_simple_bayesian[18][9] = -0.149755; qual_match_simple_bayesian[18][10] = -0.12074; qual_match_simple_bayesian[18][11] = -0.0982781; qual_match_simple_bayesian[18][12] = -0.0807886; qual_match_simple_bayesian[18][13] = -0.0671109; qual_match_simple_bayesian[18][14] = -0.0563782; qual_match_simple_bayesian[18][15] = -0.0479342; qual_match_simple_bayesian[18][16] = -0.0412773; qual_match_simple_bayesian[18][17] = -0.036021; qual_match_simple_bayesian[18][18] = -0.0318653; qual_match_simple_bayesian[18][19] = -0.0285766; qual_match_simple_bayesian[18][20] = -0.025972; qual_match_simple_bayesian[18][21] = -0.0239079; qual_match_simple_bayesian[18][22] = -0.0222713; qual_match_simple_bayesian[18][23] = -0.0209733; qual_match_simple_bayesian[18][24] = -0.0199434; qual_match_simple_bayesian[18][25] = -0.0191261; qual_match_simple_bayesian[18][26] = -0.0184774; qual_match_simple_bayesian[18][27] = -0.0179624; qual_match_simple_bayesian[18][28] = -0.0175535; qual_match_simple_bayesian[18][29] = -0.0172288; qual_match_simple_bayesian[18][30] = -0.016971; qual_match_simple_bayesian[18][31] = -0.0167662; qual_match_simple_bayesian[18][32] = -0.0166036; qual_match_simple_bayesian[18][33] = -0.0164745; qual_match_simple_bayesian[18][34] = -0.0163719; qual_match_simple_bayesian[18][35] = -0.0162904; qual_match_simple_bayesian[18][36] = -0.0162257; qual_match_simple_bayesian[18][37] = -0.0161743; qual_match_simple_bayesian[18][38] = -0.0161335; qual_match_simple_bayesian[18][39] = -0.0161011; qual_match_simple_bayesian[18][40] = -0.0160753; qual_match_simple_bayesian[18][41] = -0.0160549; qual_match_simple_bayesian[18][42] = -0.0160386; 
qual_match_simple_bayesian[18][43] = -0.0160257; qual_match_simple_bayesian[18][44] = -0.0160155; qual_match_simple_bayesian[18][45] = -0.0160073; qual_match_simple_bayesian[18][46] = -0.0160009; qual_match_simple_bayesian[19][0] = -5.47352; qual_match_simple_bayesian[19][1] = -1.57786; qual_match_simple_bayesian[19][2] = -1.00227; qual_match_simple_bayesian[19][3] = -0.703933; qual_match_simple_bayesian[19][4] = -0.517538; qual_match_simple_bayesian[19][5] = -0.390836; qual_match_simple_bayesian[19][6] = -0.300513; qual_match_simple_bayesian[19][7] = -0.234162; qual_match_simple_bayesian[19][8] = -0.184426; qual_match_simple_bayesian[19][9] = -0.146609; qual_match_simple_bayesian[19][10] = -0.117558; qual_match_simple_bayesian[19][11] = -0.0950678; qual_match_simple_bayesian[19][12] = -0.0775572; qual_match_simple_bayesian[19][13] = -0.0638632; qual_match_simple_bayesian[19][14] = -0.0531178; qual_match_simple_bayesian[19][15] = -0.044664; qual_match_simple_bayesian[19][16] = -0.0379994; qual_match_simple_bayesian[19][17] = -0.032737; qual_match_simple_bayesian[19][18] = -0.0285766; qual_match_simple_bayesian[19][19] = -0.0252842; qual_match_simple_bayesian[19][20] = -0.0226766; qual_match_simple_bayesian[19][21] = -0.0206101; qual_match_simple_bayesian[19][22] = -0.0189717; qual_match_simple_bayesian[19][23] = -0.0176722; qual_match_simple_bayesian[19][24] = -0.0166412; qual_match_simple_bayesian[19][25] = -0.015823; qual_match_simple_bayesian[19][26] = -0.0151735; qual_match_simple_bayesian[19][27] = -0.0146579; qual_match_simple_bayesian[19][28] = -0.0142486; qual_match_simple_bayesian[19][29] = -0.0139235; qual_match_simple_bayesian[19][30] = -0.0136654; qual_match_simple_bayesian[19][31] = -0.0134604; qual_match_simple_bayesian[19][32] = -0.0132976; qual_match_simple_bayesian[19][33] = -0.0131684; qual_match_simple_bayesian[19][34] = -0.0130657; qual_match_simple_bayesian[19][35] = -0.0129841; qual_match_simple_bayesian[19][36] = -0.0129193; 
qual_match_simple_bayesian[19][37] = -0.0128679; qual_match_simple_bayesian[19][38] = -0.012827; qual_match_simple_bayesian[19][39] = -0.0127945; qual_match_simple_bayesian[19][40] = -0.0127688; qual_match_simple_bayesian[19][41] = -0.0127483; qual_match_simple_bayesian[19][42] = -0.012732; qual_match_simple_bayesian[19][43] = -0.0127191; qual_match_simple_bayesian[19][44] = -0.0127088; qual_match_simple_bayesian[19][45] = -0.0127007; qual_match_simple_bayesian[19][46] = -0.0126942; qual_match_simple_bayesian[20][0] = -5.70378; qual_match_simple_bayesian[20][1] = -1.5786; qual_match_simple_bayesian[20][2] = -1.00115; qual_match_simple_bayesian[20][3] = -0.702197; qual_match_simple_bayesian[20][4] = -0.515502; qual_match_simple_bayesian[20][5] = -0.388625; qual_match_simple_bayesian[20][6] = -0.29819; qual_match_simple_bayesian[20][7] = -0.231763; qual_match_simple_bayesian[20][8] = -0.181973; qual_match_simple_bayesian[20][9] = -0.144117; qual_match_simple_bayesian[20][10] = -0.115037; qual_match_simple_bayesian[20][11] = -0.0925252; qual_match_simple_bayesian[20][12] = -0.0749978; qual_match_simple_bayesian[20][13] = -0.061291; qual_match_simple_bayesian[20][14] = -0.0505356; qual_match_simple_bayesian[20][15] = -0.042074; qual_match_simple_bayesian[20][16] = -0.0354033; qual_match_simple_bayesian[20][17] = -0.0301362; qual_match_simple_bayesian[20][18] = -0.025972; qual_match_simple_bayesian[20][19] = -0.0226766; qual_match_simple_bayesian[20][20] = -0.0200667; qual_match_simple_bayesian[20][21] = -0.0179984; qual_match_simple_bayesian[20][22] = -0.0163585; qual_match_simple_bayesian[20][23] = -0.0150578; qual_match_simple_bayesian[20][24] = -0.0140259; qual_match_simple_bayesian[20][25] = -0.0132069; qual_match_simple_bayesian[20][26] = -0.0125569; qual_match_simple_bayesian[20][27] = -0.0120409; qual_match_simple_bayesian[20][28] = -0.0116311; qual_match_simple_bayesian[20][29] = -0.0113058; qual_match_simple_bayesian[20][30] = -0.0110475; 
qual_match_simple_bayesian[20][31] = -0.0108423; qual_match_simple_bayesian[20][32] = -0.0106794; qual_match_simple_bayesian[20][33] = -0.01055; qual_match_simple_bayesian[20][34] = -0.0104472; qual_match_simple_bayesian[20][35] = -0.0103655; qual_match_simple_bayesian[20][36] = -0.0103007; qual_match_simple_bayesian[20][37] = -0.0102492; qual_match_simple_bayesian[20][38] = -0.0102083; qual_match_simple_bayesian[20][39] = -0.0101758; qual_match_simple_bayesian[20][40] = -0.01015; qual_match_simple_bayesian[20][41] = -0.0101295; qual_match_simple_bayesian[20][42] = -0.0101132; qual_match_simple_bayesian[20][43] = -0.0101003; qual_match_simple_bayesian[20][44] = -0.01009; qual_match_simple_bayesian[20][45] = -0.0100819; qual_match_simple_bayesian[20][46] = -0.0100754; qual_match_simple_bayesian[21][0] = -5.93404; qual_match_simple_bayesian[21][1] = -1.57919; qual_match_simple_bayesian[21][2] = -1.00027; qual_match_simple_bayesian[21][3] = -0.700821; qual_match_simple_bayesian[21][4] = -0.513887; qual_match_simple_bayesian[21][5] = -0.386872; qual_match_simple_bayesian[21][6] = -0.296348; qual_match_simple_bayesian[21][7] = -0.229861; qual_match_simple_bayesian[21][8] = -0.180029; qual_match_simple_bayesian[21][9] = -0.142143; qual_match_simple_bayesian[21][10] = -0.113039; qual_match_simple_bayesian[21][11] = -0.09051; qual_match_simple_bayesian[21][12] = -0.0729694; qual_match_simple_bayesian[21][13] = -0.0592525; qual_match_simple_bayesian[21][14] = -0.0484892; qual_match_simple_bayesian[21][15] = -0.0400214; qual_match_simple_bayesian[21][16] = -0.033346; qual_match_simple_bayesian[21][17] = -0.028075; qual_match_simple_bayesian[21][18] = -0.0239079; qual_match_simple_bayesian[21][19] = -0.0206101; qual_match_simple_bayesian[21][20] = -0.0179984; qual_match_simple_bayesian[21][21] = -0.0159286; qual_match_simple_bayesian[21][22] = -0.0142876; qual_match_simple_bayesian[21][23] = -0.012986; qual_match_simple_bayesian[21][24] = -0.0119533; 
qual_match_simple_bayesian[21][25] = -0.0111338; qual_match_simple_bayesian[21][26] = -0.0104833; qual_match_simple_bayesian[21][27] = -0.00996692; qual_match_simple_bayesian[21][28] = -0.00955691; qual_match_simple_bayesian[21][29] = -0.00923135; qual_match_simple_bayesian[21][30] = -0.00897283; qual_match_simple_bayesian[21][31] = -0.00876752; qual_match_simple_bayesian[21][32] = -0.00860447; qual_match_simple_bayesian[21][33] = -0.00847497; qual_match_simple_bayesian[21][34] = -0.00837212; qual_match_simple_bayesian[21][35] = -0.00829043; qual_match_simple_bayesian[21][36] = -0.00822555; qual_match_simple_bayesian[21][37] = -0.00817401; qual_match_simple_bayesian[21][38] = -0.00813308; qual_match_simple_bayesian[21][39] = -0.00810056; qual_match_simple_bayesian[21][40] = -0.00807474; qual_match_simple_bayesian[21][41] = -0.00805422; qual_match_simple_bayesian[21][42] = -0.00803793; qual_match_simple_bayesian[21][43] = -0.00802498; qual_match_simple_bayesian[21][44] = -0.0080147; qual_match_simple_bayesian[21][45] = -0.00800654; qual_match_simple_bayesian[21][46] = -0.00800005; qual_match_simple_bayesian[22][0] = -6.1643; qual_match_simple_bayesian[22][1] = -1.57966; qual_match_simple_bayesian[22][2] = -0.99956; qual_match_simple_bayesian[22][3] = -0.69973; qual_match_simple_bayesian[22][4] = -0.512606; qual_match_simple_bayesian[22][5] = -0.385482; qual_match_simple_bayesian[22][6] = -0.294888; qual_match_simple_bayesian[22][7] = -0.228354; qual_match_simple_bayesian[22][8] = -0.178488; qual_match_simple_bayesian[22][9] = -0.140577; qual_match_simple_bayesian[22][10] = -0.111455; qual_match_simple_bayesian[22][11] = -0.0889123; qual_match_simple_bayesian[22][12] = -0.0713612; qual_match_simple_bayesian[22][13] = -0.0576362; qual_match_simple_bayesian[22][14] = -0.0468667; qual_match_simple_bayesian[22][15] = -0.038394; qual_match_simple_bayesian[22][16] = -0.0317148; qual_match_simple_bayesian[22][17] = -0.0264408; qual_match_simple_bayesian[22][18] = 
-0.0222713; qual_match_simple_bayesian[22][19] = -0.0189717; qual_match_simple_bayesian[22][20] = -0.0163585; qual_match_simple_bayesian[22][21] = -0.0142876; qual_match_simple_bayesian[22][22] = -0.0126457; qual_match_simple_bayesian[22][23] = -0.0113434; qual_match_simple_bayesian[22][24] = -0.0103101; qual_match_simple_bayesian[22][25] = -0.00949014; qual_match_simple_bayesian[22][26] = -0.00883928; qual_match_simple_bayesian[22][27] = -0.00832259; qual_match_simple_bayesian[22][28] = -0.00791235; qual_match_simple_bayesian[22][29] = -0.00758661; qual_match_simple_bayesian[22][30] = -0.00732794; qual_match_simple_bayesian[22][31] = -0.00712252; qual_match_simple_bayesian[22][32] = -0.00695938; qual_match_simple_bayesian[22][33] = -0.00682981; qual_match_simple_bayesian[22][34] = -0.00672691; qual_match_simple_bayesian[22][35] = -0.00664517; qual_match_simple_bayesian[22][36] = -0.00658025; qual_match_simple_bayesian[22][37] = -0.00652869; qual_match_simple_bayesian[22][38] = -0.00648773; qual_match_simple_bayesian[22][39] = -0.0064552; qual_match_simple_bayesian[22][40] = -0.00642936; qual_match_simple_bayesian[22][41] = -0.00640883; qual_match_simple_bayesian[22][42] = -0.00639253; qual_match_simple_bayesian[22][43] = -0.00637958; qual_match_simple_bayesian[22][44] = -0.00636929; qual_match_simple_bayesian[22][45] = -0.00636112; qual_match_simple_bayesian[22][46] = -0.00635463; qual_match_simple_bayesian[23][0] = -6.39456; qual_match_simple_bayesian[23][1] = -1.58003; qual_match_simple_bayesian[23][2] = -0.999001; qual_match_simple_bayesian[23][3] = -0.698863; qual_match_simple_bayesian[23][4] = -0.51159; qual_match_simple_bayesian[23][5] = -0.384379; qual_match_simple_bayesian[23][6] = -0.29373; qual_match_simple_bayesian[23][7] = -0.227158; qual_match_simple_bayesian[23][8] = -0.177265; qual_match_simple_bayesian[23][9] = -0.139335; qual_match_simple_bayesian[23][10] = -0.110198; qual_match_simple_bayesian[23][11] = -0.0876449; 
qual_match_simple_bayesian[23][12] = -0.0700856; qual_match_simple_bayesian[23][13] = -0.0563542; qual_match_simple_bayesian[23][14] = -0.0455797; qual_match_simple_bayesian[23][15] = -0.0371032; qual_match_simple_bayesian[23][16] = -0.0304209; qual_match_simple_bayesian[23][17] = -0.0251447; qual_match_simple_bayesian[23][18] = -0.0209733; qual_match_simple_bayesian[23][19] = -0.0176722; qual_match_simple_bayesian[23][20] = -0.0150578; qual_match_simple_bayesian[23][21] = -0.012986; qual_match_simple_bayesian[23][22] = -0.0113434; qual_match_simple_bayesian[23][23] = -0.0100405; qual_match_simple_bayesian[23][24] = -0.00900678; qual_match_simple_bayesian[23][25] = -0.00818644; qual_match_simple_bayesian[23][26] = -0.00753529; qual_match_simple_bayesian[23][27] = -0.00701837; qual_match_simple_bayesian[23][28] = -0.00660796; qual_match_simple_bayesian[23][29] = -0.00628208; qual_match_simple_bayesian[23][30] = -0.00602329; qual_match_simple_bayesian[23][31] = -0.00581778; qual_match_simple_bayesian[23][32] = -0.00565457; qual_match_simple_bayesian[23][33] = -0.00552494; qual_match_simple_bayesian[23][34] = -0.00542199; qual_match_simple_bayesian[23][35] = -0.00534022; qual_match_simple_bayesian[23][36] = -0.00527527; qual_match_simple_bayesian[23][37] = -0.00522368; qual_match_simple_bayesian[23][38] = -0.00518271; qual_match_simple_bayesian[23][39] = -0.00515016; qual_match_simple_bayesian[23][40] = -0.00512431; qual_match_simple_bayesian[23][41] = -0.00510378; qual_match_simple_bayesian[23][42] = -0.00508747; qual_match_simple_bayesian[23][43] = -0.00507451; qual_match_simple_bayesian[23][44] = -0.00506422; qual_match_simple_bayesian[23][45] = -0.00505604; qual_match_simple_bayesian[23][46] = -0.00504955; qual_match_simple_bayesian[24][0] = -6.62482; qual_match_simple_bayesian[24][1] = -1.58033; qual_match_simple_bayesian[24][2] = -0.998557; qual_match_simple_bayesian[24][3] = -0.698176; qual_match_simple_bayesian[24][4] = -0.510784; 
qual_match_simple_bayesian[24][5] = -0.383503; qual_match_simple_bayesian[24][6] = -0.29281; qual_match_simple_bayesian[24][7] = -0.226208; qual_match_simple_bayesian[24][8] = -0.176295; qual_match_simple_bayesian[24][9] = -0.138349; qual_match_simple_bayesian[24][10] = -0.109202; qual_match_simple_bayesian[24][11] = -0.0866394; qual_match_simple_bayesian[24][12] = -0.0690735; qual_match_simple_bayesian[24][13] = -0.055337; qual_match_simple_bayesian[24][14] = -0.0445586; qual_match_simple_bayesian[24][15] = -0.0360791; qual_match_simple_bayesian[24][16] = -0.0293944; qual_match_simple_bayesian[24][17] = -0.0241163; qual_match_simple_bayesian[24][18] = -0.0199434; qual_match_simple_bayesian[24][19] = -0.0166412; qual_match_simple_bayesian[24][20] = -0.0140259; qual_match_simple_bayesian[24][21] = -0.0119533; qual_match_simple_bayesian[24][22] = -0.0103101; qual_match_simple_bayesian[24][23] = -0.00900678; qual_match_simple_bayesian[24][24] = -0.00797271; qual_match_simple_bayesian[24][25] = -0.00715208; qual_match_simple_bayesian[24][26] = -0.00650071; qual_match_simple_bayesian[24][27] = -0.00598361; qual_match_simple_bayesian[24][28] = -0.00557305; qual_match_simple_bayesian[24][29] = -0.00524706; qual_match_simple_bayesian[24][30] = -0.00498818; qual_match_simple_bayesian[24][31] = -0.0047826; qual_match_simple_bayesian[24][32] = -0.00461933; qual_match_simple_bayesian[24][33] = -0.00448966; qual_match_simple_bayesian[24][34] = -0.00438667; qual_match_simple_bayesian[24][35] = -0.00430487; qual_match_simple_bayesian[24][36] = -0.0042399; qual_match_simple_bayesian[24][37] = -0.0041883; qual_match_simple_bayesian[24][38] = -0.00414731; qual_match_simple_bayesian[24][39] = -0.00411475; qual_match_simple_bayesian[24][40] = -0.00408889; qual_match_simple_bayesian[24][41] = -0.00406835; qual_match_simple_bayesian[24][42] = -0.00405203; qual_match_simple_bayesian[24][43] = -0.00403907; qual_match_simple_bayesian[24][44] = -0.00402878; 
qual_match_simple_bayesian[24][45] = -0.0040206; qual_match_simple_bayesian[24][46] = -0.0040141; qual_match_simple_bayesian[25][0] = -6.85508; qual_match_simple_bayesian[25][1] = -1.58057; qual_match_simple_bayesian[25][2] = -0.998204; qual_match_simple_bayesian[25][3] = -0.69763; qual_match_simple_bayesian[25][4] = -0.510144; qual_match_simple_bayesian[25][5] = -0.382809; qual_match_simple_bayesian[25][6] = -0.292081; qual_match_simple_bayesian[25][7] = -0.225455; qual_match_simple_bayesian[25][8] = -0.175525; qual_match_simple_bayesian[25][9] = -0.137567; qual_match_simple_bayesian[25][10] = -0.10841; qual_match_simple_bayesian[25][11] = -0.0858414; qual_match_simple_bayesian[25][12] = -0.0682703; qual_match_simple_bayesian[25][13] = -0.0545298; qual_match_simple_bayesian[25][14] = -0.0437483; qual_match_simple_bayesian[25][15] = -0.0352663; qual_match_simple_bayesian[25][16] = -0.0285798; qual_match_simple_bayesian[25][17] = -0.0233001; qual_match_simple_bayesian[25][18] = -0.0191261; qual_match_simple_bayesian[25][19] = -0.015823; qual_match_simple_bayesian[25][20] = -0.0132069; qual_match_simple_bayesian[25][21] = -0.0111338; qual_match_simple_bayesian[25][22] = -0.00949014; qual_match_simple_bayesian[25][23] = -0.00818644; qual_match_simple_bayesian[25][24] = -0.00715208; qual_match_simple_bayesian[25][25] = -0.00633122; qual_match_simple_bayesian[25][26] = -0.00567967; qual_match_simple_bayesian[25][27] = -0.00516243; qual_match_simple_bayesian[25][28] = -0.00475176; qual_match_simple_bayesian[25][29] = -0.00442567; qual_match_simple_bayesian[25][30] = -0.00416673; qual_match_simple_bayesian[25][31] = -0.00396109; qual_match_simple_bayesian[25][32] = -0.00379778; qual_match_simple_bayesian[25][33] = -0.00366807; qual_match_simple_bayesian[25][34] = -0.00356505; qual_match_simple_bayesian[25][35] = -0.00348323; qual_match_simple_bayesian[25][36] = -0.00341824; qual_match_simple_bayesian[25][37] = -0.00336662; qual_match_simple_bayesian[25][38] = -0.00332562; 
qual_match_simple_bayesian[25][39] = -0.00329306; qual_match_simple_bayesian[25][40] = -0.00326719; qual_match_simple_bayesian[25][41] = -0.00324664; qual_match_simple_bayesian[25][42] = -0.00323032; qual_match_simple_bayesian[25][43] = -0.00321736; qual_match_simple_bayesian[25][44] = -0.00320706; qual_match_simple_bayesian[25][45] = -0.00319888; qual_match_simple_bayesian[25][46] = -0.00319238; qual_match_simple_bayesian[26][0] = -7.08533; qual_match_simple_bayesian[26][1] = -1.58075; qual_match_simple_bayesian[26][2] = -0.997924; qual_match_simple_bayesian[26][3] = -0.697196; qual_match_simple_bayesian[26][4] = -0.509636; qual_match_simple_bayesian[26][5] = -0.382257; qual_match_simple_bayesian[26][6] = -0.291502; qual_match_simple_bayesian[26][7] = -0.224857; qual_match_simple_bayesian[26][8] = -0.174914; qual_match_simple_bayesian[26][9] = -0.136946; qual_match_simple_bayesian[26][10] = -0.107782; qual_match_simple_bayesian[26][11] = -0.0852079; qual_match_simple_bayesian[26][12] = -0.0676327; qual_match_simple_bayesian[26][13] = -0.053889; qual_match_simple_bayesian[26][14] = -0.0431051; qual_match_simple_bayesian[26][15] = -0.0346212; qual_match_simple_bayesian[26][16] = -0.0279331; qual_match_simple_bayesian[26][17] = -0.0226523; qual_match_simple_bayesian[26][18] = -0.0184774; qual_match_simple_bayesian[26][19] = -0.0151735; qual_match_simple_bayesian[26][20] = -0.0125569; qual_match_simple_bayesian[26][21] = -0.0104833; qual_match_simple_bayesian[26][22] = -0.00883928; qual_match_simple_bayesian[26][23] = -0.00753529; qual_match_simple_bayesian[26][24] = -0.00650071; qual_match_simple_bayesian[26][25] = -0.00567967; qual_match_simple_bayesian[26][26] = -0.00502798; qual_match_simple_bayesian[26][27] = -0.00451062; qual_match_simple_bayesian[26][28] = -0.00409986; qual_match_simple_bayesian[26][29] = -0.00377371; qual_match_simple_bayesian[26][30] = -0.00351471; qual_match_simple_bayesian[26][31] = -0.00330902; qual_match_simple_bayesian[26][32] = 
-0.00314567; qual_match_simple_bayesian[26][33] = -0.00301594; qual_match_simple_bayesian[26][34] = -0.0029129; qual_match_simple_bayesian[26][35] = -0.00283106; qual_match_simple_bayesian[26][36] = -0.00276606; qual_match_simple_bayesian[26][37] = -0.00271443; qual_match_simple_bayesian[26][38] = -0.00267342; qual_match_simple_bayesian[26][39] = -0.00264084; qual_match_simple_bayesian[26][40] = -0.00261497; qual_match_simple_bayesian[26][41] = -0.00259442; qual_match_simple_bayesian[26][42] = -0.00257809; qual_match_simple_bayesian[26][43] = -0.00256512; qual_match_simple_bayesian[26][44] = -0.00255482; qual_match_simple_bayesian[26][45] = -0.00254664; qual_match_simple_bayesian[26][46] = -0.00254014; qual_match_simple_bayesian[27][0] = -7.31559; qual_match_simple_bayesian[27][1] = -1.5809; qual_match_simple_bayesian[27][2] = -0.997702; qual_match_simple_bayesian[27][3] = -0.696852; qual_match_simple_bayesian[27][4] = -0.509232; qual_match_simple_bayesian[27][5] = -0.38182; qual_match_simple_bayesian[27][6] = -0.291042; qual_match_simple_bayesian[27][7] = -0.224383; qual_match_simple_bayesian[27][8] = -0.174428; qual_match_simple_bayesian[27][9] = -0.136453; qual_match_simple_bayesian[27][10] = -0.107284; qual_match_simple_bayesian[27][11] = -0.0847051; qual_match_simple_bayesian[27][12] = -0.0671265; qual_match_simple_bayesian[27][13] = -0.0533804; qual_match_simple_bayesian[27][14] = -0.0425945; qual_match_simple_bayesian[27][15] = -0.0341091; qual_match_simple_bayesian[27][16] = -0.0274198; qual_match_simple_bayesian[27][17] = -0.0221381; qual_match_simple_bayesian[27][18] = -0.0179624; qual_match_simple_bayesian[27][19] = -0.0146579; qual_match_simple_bayesian[27][20] = -0.0120409; qual_match_simple_bayesian[27][21] = -0.00996692; qual_match_simple_bayesian[27][22] = -0.00832259; qual_match_simple_bayesian[27][23] = -0.00701837; qual_match_simple_bayesian[27][24] = -0.00598361; qual_match_simple_bayesian[27][25] = -0.00516243; 
qual_match_simple_bayesian[27][26] = -0.00451062; qual_match_simple_bayesian[27][27] = -0.00399318; qual_match_simple_bayesian[27][28] = -0.00358235; qual_match_simple_bayesian[27][29] = -0.00325613; qual_match_simple_bayesian[27][30] = -0.00299709; qual_match_simple_bayesian[27][31] = -0.00279137; qual_match_simple_bayesian[27][32] = -0.00262799; qual_match_simple_bayesian[27][33] = -0.00249823; qual_match_simple_bayesian[27][34] = -0.00239518; qual_match_simple_bayesian[27][35] = -0.00231332; qual_match_simple_bayesian[27][36] = -0.00224831; qual_match_simple_bayesian[27][37] = -0.00219667; qual_match_simple_bayesian[27][38] = -0.00215565; qual_match_simple_bayesian[27][39] = -0.00212307; qual_match_simple_bayesian[27][40] = -0.00209719; qual_match_simple_bayesian[27][41] = -0.00207664; qual_match_simple_bayesian[27][42] = -0.00206031; qual_match_simple_bayesian[27][43] = -0.00204734; qual_match_simple_bayesian[27][44] = -0.00203704; qual_match_simple_bayesian[27][45] = -0.00202886; qual_match_simple_bayesian[27][46] = -0.00202236; qual_match_simple_bayesian[28][0] = -7.54585; qual_match_simple_bayesian[28][1] = -1.58102; qual_match_simple_bayesian[28][2] = -0.997525; qual_match_simple_bayesian[28][3] = -0.696579; qual_match_simple_bayesian[28][4] = -0.508912; qual_match_simple_bayesian[28][5] = -0.381472; qual_match_simple_bayesian[28][6] = -0.290677; qual_match_simple_bayesian[28][7] = -0.224006; qual_match_simple_bayesian[28][8] = -0.174043; qual_match_simple_bayesian[28][9] = -0.136062; qual_match_simple_bayesian[28][10] = -0.106888; qual_match_simple_bayesian[28][11] = -0.0843058; qual_match_simple_bayesian[28][12] = -0.0667247; qual_match_simple_bayesian[28][13] = -0.0529765; qual_match_simple_bayesian[28][14] = -0.0421891; qual_match_simple_bayesian[28][15] = -0.0337024; qual_match_simple_bayesian[28][16] = -0.0270122; qual_match_simple_bayesian[28][17] = -0.0217297; qual_match_simple_bayesian[28][18] = -0.0175535; qual_match_simple_bayesian[28][19] = 
-0.0142486; qual_match_simple_bayesian[28][20] = -0.0116311; qual_match_simple_bayesian[28][21] = -0.00955691; qual_match_simple_bayesian[28][22] = -0.00791235; qual_match_simple_bayesian[28][23] = -0.00660796; qual_match_simple_bayesian[28][24] = -0.00557305; qual_match_simple_bayesian[28][25] = -0.00475176; qual_match_simple_bayesian[28][26] = -0.00409986; qual_match_simple_bayesian[28][27] = -0.00358235; qual_match_simple_bayesian[28][28] = -0.00317146; qual_match_simple_bayesian[28][29] = -0.0028452; qual_match_simple_bayesian[28][30] = -0.00258612; qual_match_simple_bayesian[28][31] = -0.00238037; qual_match_simple_bayesian[28][32] = -0.00221697; qual_match_simple_bayesian[28][33] = -0.0020872; qual_match_simple_bayesian[28][34] = -0.00198413; qual_match_simple_bayesian[28][35] = -0.00190226; qual_match_simple_bayesian[28][36] = -0.00183724; qual_match_simple_bayesian[28][37] = -0.00178559; qual_match_simple_bayesian[28][38] = -0.00174457; qual_match_simple_bayesian[28][39] = -0.00171198; qual_match_simple_bayesian[28][40] = -0.0016861; qual_match_simple_bayesian[28][41] = -0.00166554; qual_match_simple_bayesian[28][42] = -0.00164921; qual_match_simple_bayesian[28][43] = -0.00163624; qual_match_simple_bayesian[28][44] = -0.00162594; qual_match_simple_bayesian[28][45] = -0.00161776; qual_match_simple_bayesian[28][46] = -0.00161126; qual_match_simple_bayesian[29][0] = -7.77611; qual_match_simple_bayesian[29][1] = -1.58111; qual_match_simple_bayesian[29][2] = -0.997385; qual_match_simple_bayesian[29][3] = -0.696362; qual_match_simple_bayesian[29][4] = -0.508658; qual_match_simple_bayesian[29][5] = -0.381196; qual_match_simple_bayesian[29][6] = -0.290387; qual_match_simple_bayesian[29][7] = -0.223707; qual_match_simple_bayesian[29][8] = -0.173737; qual_match_simple_bayesian[29][9] = -0.135751; qual_match_simple_bayesian[29][10] = -0.106574; qual_match_simple_bayesian[29][11] = -0.0839888; qual_match_simple_bayesian[29][12] = -0.0664056; 
qual_match_simple_bayesian[29][13] = -0.0526558; qual_match_simple_bayesian[29][14] = -0.0418671; qual_match_simple_bayesian[29][15] = -0.0333796; qual_match_simple_bayesian[29][16] = -0.0266886; qual_match_simple_bayesian[29][17] = -0.0214055; qual_match_simple_bayesian[29][18] = -0.0172288; qual_match_simple_bayesian[29][19] = -0.0139235; qual_match_simple_bayesian[29][20] = -0.0113058; qual_match_simple_bayesian[29][21] = -0.00923135; qual_match_simple_bayesian[29][22] = -0.00758661; qual_match_simple_bayesian[29][23] = -0.00628208; qual_match_simple_bayesian[29][24] = -0.00524706; qual_match_simple_bayesian[29][25] = -0.00442567; qual_match_simple_bayesian[29][26] = -0.00377371; qual_match_simple_bayesian[29][27] = -0.00325613; qual_match_simple_bayesian[29][28] = -0.0028452; qual_match_simple_bayesian[29][29] = -0.00251891; qual_match_simple_bayesian[29][30] = -0.0022598; qual_match_simple_bayesian[29][31] = -0.00205403; qual_match_simple_bayesian[29][32] = -0.00189061; qual_match_simple_bayesian[29][33] = -0.00176082; qual_match_simple_bayesian[29][34] = -0.00165774; qual_match_simple_bayesian[29][35] = -0.00157586; qual_match_simple_bayesian[29][36] = -0.00151083; qual_match_simple_bayesian[29][37] = -0.00145918; qual_match_simple_bayesian[29][38] = -0.00141815; qual_match_simple_bayesian[29][39] = -0.00138557; qual_match_simple_bayesian[29][40] = -0.00135968; qual_match_simple_bayesian[29][41] = -0.00133912; qual_match_simple_bayesian[29][42] = -0.00132279; qual_match_simple_bayesian[29][43] = -0.00130982; qual_match_simple_bayesian[29][44] = -0.00129951; qual_match_simple_bayesian[29][45] = -0.00129133; qual_match_simple_bayesian[29][46] = -0.00128483; qual_match_simple_bayesian[30][0] = -8.00637; qual_match_simple_bayesian[30][1] = -1.58119; qual_match_simple_bayesian[30][2] = -0.997273; qual_match_simple_bayesian[30][3] = -0.69619; qual_match_simple_bayesian[30][4] = -0.508456; qual_match_simple_bayesian[30][5] = -0.380977; 
qual_match_simple_bayesian[30][6] = -0.290157; qual_match_simple_bayesian[30][7] = -0.223469; qual_match_simple_bayesian[30][8] = -0.173494; qual_match_simple_bayesian[30][9] = -0.135504; qual_match_simple_bayesian[30][10] = -0.106324; qual_match_simple_bayesian[30][11] = -0.083737; qual_match_simple_bayesian[30][12] = -0.0661522; qual_match_simple_bayesian[30][13] = -0.0524012; qual_match_simple_bayesian[30][14] = -0.0416115; qual_match_simple_bayesian[30][15] = -0.0331232; qual_match_simple_bayesian[30][16] = -0.0264316; qual_match_simple_bayesian[30][17] = -0.0211481; qual_match_simple_bayesian[30][18] = -0.016971; qual_match_simple_bayesian[30][19] = -0.0136654; qual_match_simple_bayesian[30][20] = -0.0110475; qual_match_simple_bayesian[30][21] = -0.00897283; qual_match_simple_bayesian[30][22] = -0.00732794; qual_match_simple_bayesian[30][23] = -0.00602329; qual_match_simple_bayesian[30][24] = -0.00498818; qual_match_simple_bayesian[30][25] = -0.00416673; qual_match_simple_bayesian[30][26] = -0.00351471; qual_match_simple_bayesian[30][27] = -0.00299709; qual_match_simple_bayesian[30][28] = -0.00258612; qual_match_simple_bayesian[30][29] = -0.0022598; qual_match_simple_bayesian[30][30] = -0.00200067; qual_match_simple_bayesian[30][31] = -0.00179488; qual_match_simple_bayesian[30][32] = -0.00163145; qual_match_simple_bayesian[30][33] = -0.00150165; qual_match_simple_bayesian[30][34] = -0.00139855; qual_match_simple_bayesian[30][35] = -0.00131667; qual_match_simple_bayesian[30][36] = -0.00125164; qual_match_simple_bayesian[30][37] = -0.00119998; qual_match_simple_bayesian[30][38] = -0.00115895; qual_match_simple_bayesian[30][39] = -0.00112636; qual_match_simple_bayesian[30][40] = -0.00110047; qual_match_simple_bayesian[30][41] = -0.00107991; qual_match_simple_bayesian[30][42] = -0.00106358; qual_match_simple_bayesian[30][43] = -0.0010506; qual_match_simple_bayesian[30][44] = -0.0010403; qual_match_simple_bayesian[30][45] = -0.00103211; 
qual_match_simple_bayesian[30][46] = -0.00102561; qual_match_simple_bayesian[31][0] = -8.23663; qual_match_simple_bayesian[31][1] = -1.58125; qual_match_simple_bayesian[31][2] = -0.997185; qual_match_simple_bayesian[31][3] = -0.696053; qual_match_simple_bayesian[31][4] = -0.508295; qual_match_simple_bayesian[31][5] = -0.380803; qual_match_simple_bayesian[31][6] = -0.289974; qual_match_simple_bayesian[31][7] = -0.22328; qual_match_simple_bayesian[31][8] = -0.173301; qual_match_simple_bayesian[31][9] = -0.135308; qual_match_simple_bayesian[31][10] = -0.106126; qual_match_simple_bayesian[31][11] = -0.0835371; qual_match_simple_bayesian[31][12] = -0.065951; qual_match_simple_bayesian[31][13] = -0.0521989; qual_match_simple_bayesian[31][14] = -0.0414085; qual_match_simple_bayesian[31][15] = -0.0329196; qual_match_simple_bayesian[31][16] = -0.0262275; qual_match_simple_bayesian[31][17] = -0.0209436; qual_match_simple_bayesian[31][18] = -0.0167662; qual_match_simple_bayesian[31][19] = -0.0134604; qual_match_simple_bayesian[31][20] = -0.0108423; qual_match_simple_bayesian[31][21] = -0.00876752; qual_match_simple_bayesian[31][22] = -0.00712252; qual_match_simple_bayesian[31][23] = -0.00581778; qual_match_simple_bayesian[31][24] = -0.0047826; qual_match_simple_bayesian[31][25] = -0.00396109; qual_match_simple_bayesian[31][26] = -0.00330902; qual_match_simple_bayesian[31][27] = -0.00279137; qual_match_simple_bayesian[31][28] = -0.00238037; qual_match_simple_bayesian[31][29] = -0.00205403; qual_match_simple_bayesian[31][30] = -0.00179488; qual_match_simple_bayesian[31][31] = -0.00158908; qual_match_simple_bayesian[31][32] = -0.00142563; qual_match_simple_bayesian[31][33] = -0.00129582; qual_match_simple_bayesian[31][34] = -0.00119272; qual_match_simple_bayesian[31][35] = -0.00111084; qual_match_simple_bayesian[31][36] = -0.0010458; qual_match_simple_bayesian[31][37] = -0.000994137; qual_match_simple_bayesian[31][38] = -0.000953104; qual_match_simple_bayesian[31][39] = 
-0.000920511; qual_match_simple_bayesian[31][40] = -0.000894622; qual_match_simple_bayesian[31][41] = -0.000874059; qual_match_simple_bayesian[31][42] = -0.000857725; qual_match_simple_bayesian[31][43] = -0.000844751; qual_match_simple_bayesian[31][44] = -0.000834445; qual_match_simple_bayesian[31][45] = -0.000826259; qual_match_simple_bayesian[31][46] = -0.000819756; qual_match_simple_bayesian[32][0] = -8.46688; qual_match_simple_bayesian[32][1] = -1.58129; qual_match_simple_bayesian[32][2] = -0.997114; qual_match_simple_bayesian[32][3] = -0.695944; qual_match_simple_bayesian[32][4] = -0.508168; qual_match_simple_bayesian[32][5] = -0.380664; qual_match_simple_bayesian[32][6] = -0.289829; qual_match_simple_bayesian[32][7] = -0.22313; qual_match_simple_bayesian[32][8] = -0.173148; qual_match_simple_bayesian[32][9] = -0.135153; qual_match_simple_bayesian[32][10] = -0.105968; qual_match_simple_bayesian[32][11] = -0.0833783; qual_match_simple_bayesian[32][12] = -0.0657912; qual_match_simple_bayesian[32][13] = -0.0520383; qual_match_simple_bayesian[32][14] = -0.0412473; qual_match_simple_bayesian[32][15] = -0.0327579; qual_match_simple_bayesian[32][16] = -0.0260655; qual_match_simple_bayesian[32][17] = -0.0207812; qual_match_simple_bayesian[32][18] = -0.0166036; qual_match_simple_bayesian[32][19] = -0.0132976; qual_match_simple_bayesian[32][20] = -0.0106794; qual_match_simple_bayesian[32][21] = -0.00860447; qual_match_simple_bayesian[32][22] = -0.00695938; qual_match_simple_bayesian[32][23] = -0.00565457; qual_match_simple_bayesian[32][24] = -0.00461933; qual_match_simple_bayesian[32][25] = -0.00379778; qual_match_simple_bayesian[32][26] = -0.00314567; qual_match_simple_bayesian[32][27] = -0.00262799; qual_match_simple_bayesian[32][28] = -0.00221697; qual_match_simple_bayesian[32][29] = -0.00189061; qual_match_simple_bayesian[32][30] = -0.00163145; qual_match_simple_bayesian[32][31] = -0.00142563; qual_match_simple_bayesian[32][32] = -0.00126218; 
qual_match_simple_bayesian[32][33] = -0.00113236; qual_match_simple_bayesian[32][34] = -0.00102926; qual_match_simple_bayesian[32][35] = -0.000947368; qual_match_simple_bayesian[32][36] = -0.000882324; qual_match_simple_bayesian[32][37] = -0.000830661; qual_match_simple_bayesian[32][38] = -0.000789625; qual_match_simple_bayesian[32][39] = -0.00075703; qual_match_simple_bayesian[32][40] = -0.00073114; qual_match_simple_bayesian[32][41] = -0.000710576; qual_match_simple_bayesian[32][42] = -0.000694241; qual_match_simple_bayesian[32][43] = -0.000681266; qual_match_simple_bayesian[32][44] = -0.00067096; qual_match_simple_bayesian[32][45] = -0.000662773; qual_match_simple_bayesian[32][46] = -0.00065627; qual_match_simple_bayesian[33][0] = -8.69714; qual_match_simple_bayesian[33][1] = -1.58133; qual_match_simple_bayesian[33][2] = -0.997059; qual_match_simple_bayesian[33][3] = -0.695858; qual_match_simple_bayesian[33][4] = -0.508067; qual_match_simple_bayesian[33][5] = -0.380554; qual_match_simple_bayesian[33][6] = -0.289713; qual_match_simple_bayesian[33][7] = -0.223011; qual_match_simple_bayesian[33][8] = -0.173026; qual_match_simple_bayesian[33][9] = -0.135029; qual_match_simple_bayesian[33][10] = -0.105843; qual_match_simple_bayesian[33][11] = -0.0832522; qual_match_simple_bayesian[33][12] = -0.0656642; qual_match_simple_bayesian[33][13] = -0.0519108; qual_match_simple_bayesian[33][14] = -0.0411192; qual_match_simple_bayesian[33][15] = -0.0326294; qual_match_simple_bayesian[33][16] = -0.0259367; qual_match_simple_bayesian[33][17] = -0.0206523; qual_match_simple_bayesian[33][18] = -0.0164745; qual_match_simple_bayesian[33][19] = -0.0131684; qual_match_simple_bayesian[33][20] = -0.01055; qual_match_simple_bayesian[33][21] = -0.00847497; qual_match_simple_bayesian[33][22] = -0.00682981; qual_match_simple_bayesian[33][23] = -0.00552494; qual_match_simple_bayesian[33][24] = -0.00448966; qual_match_simple_bayesian[33][25] = -0.00366807; qual_match_simple_bayesian[33][26] = 
-0.00301594; qual_match_simple_bayesian[33][27] = -0.00249823; qual_match_simple_bayesian[33][28] = -0.0020872; qual_match_simple_bayesian[33][29] = -0.00176082; qual_match_simple_bayesian[33][30] = -0.00150165; qual_match_simple_bayesian[33][31] = -0.00129582; qual_match_simple_bayesian[33][32] = -0.00113236; qual_match_simple_bayesian[33][33] = -0.00100254; qual_match_simple_bayesian[33][34] = -0.000899433; qual_match_simple_bayesian[33][35] = -0.000817538; qual_match_simple_bayesian[33][36] = -0.000752491; qual_match_simple_bayesian[33][37] = -0.000700826; qual_match_simple_bayesian[33][38] = -0.000659788; qual_match_simple_bayesian[33][39] = -0.000627192; qual_match_simple_bayesian[33][40] = -0.000601301; qual_match_simple_bayesian[33][41] = -0.000580736; qual_match_simple_bayesian[33][42] = -0.0005644; qual_match_simple_bayesian[33][43] = -0.000551424; qual_match_simple_bayesian[33][44] = -0.000541118; qual_match_simple_bayesian[33][45] = -0.000532931; qual_match_simple_bayesian[33][46] = -0.000526428; qual_match_simple_bayesian[34][0] = -8.9274; qual_match_simple_bayesian[34][1] = -1.58136; qual_match_simple_bayesian[34][2] = -0.997014; qual_match_simple_bayesian[34][3] = -0.695789; qual_match_simple_bayesian[34][4] = -0.507986; qual_match_simple_bayesian[34][5] = -0.380467; qual_match_simple_bayesian[34][6] = -0.289622; qual_match_simple_bayesian[34][7] = -0.222917; qual_match_simple_bayesian[34][8] = -0.17293; qual_match_simple_bayesian[34][9] = -0.134931; qual_match_simple_bayesian[34][10] = -0.105744; qual_match_simple_bayesian[34][11] = -0.083152; qual_match_simple_bayesian[34][12] = -0.0655634; qual_match_simple_bayesian[34][13] = -0.0518095; qual_match_simple_bayesian[34][14] = -0.0410175; qual_match_simple_bayesian[34][15] = -0.0325274; qual_match_simple_bayesian[34][16] = -0.0258345; qual_match_simple_bayesian[34][17] = -0.0205498; qual_match_simple_bayesian[34][18] = -0.0163719; qual_match_simple_bayesian[34][19] = -0.0130657; 
qual_match_simple_bayesian[34][20] = -0.0104472; qual_match_simple_bayesian[34][21] = -0.00837212; qual_match_simple_bayesian[34][22] = -0.00672691; qual_match_simple_bayesian[34][23] = -0.00542199; qual_match_simple_bayesian[34][24] = -0.00438667; qual_match_simple_bayesian[34][25] = -0.00356505; qual_match_simple_bayesian[34][26] = -0.0029129; qual_match_simple_bayesian[34][27] = -0.00239518; qual_match_simple_bayesian[34][28] = -0.00198413; qual_match_simple_bayesian[34][29] = -0.00165774; qual_match_simple_bayesian[34][30] = -0.00139855; qual_match_simple_bayesian[34][31] = -0.00119272; qual_match_simple_bayesian[34][32] = -0.00102926; qual_match_simple_bayesian[34][33] = -0.000899433; qual_match_simple_bayesian[34][34] = -0.00079632; qual_match_simple_bayesian[34][35] = -0.000714422; qual_match_simple_bayesian[34][36] = -0.000649373; qual_match_simple_bayesian[34][37] = -0.000597706; qual_match_simple_bayesian[34][38] = -0.000556667; qual_match_simple_bayesian[34][39] = -0.00052407; qual_match_simple_bayesian[34][40] = -0.000498178; qual_match_simple_bayesian[34][41] = -0.000477612; qual_match_simple_bayesian[34][42] = -0.000461276; qual_match_simple_bayesian[34][43] = -0.0004483; qual_match_simple_bayesian[34][44] = -0.000437993; qual_match_simple_bayesian[34][45] = -0.000429806; qual_match_simple_bayesian[34][46] = -0.000423302; qual_match_simple_bayesian[35][0] = -9.15766; qual_match_simple_bayesian[35][1] = -1.58138; qual_match_simple_bayesian[35][2] = -0.996979; qual_match_simple_bayesian[35][3] = -0.695735; qual_match_simple_bayesian[35][4] = -0.507922; qual_match_simple_bayesian[35][5] = -0.380398; qual_match_simple_bayesian[35][6] = -0.289549; qual_match_simple_bayesian[35][7] = -0.222842; qual_match_simple_bayesian[35][8] = -0.172853; qual_match_simple_bayesian[35][9] = -0.134853; qual_match_simple_bayesian[35][10] = -0.105665; qual_match_simple_bayesian[35][11] = -0.0830725; qual_match_simple_bayesian[35][12] = -0.0654833; 
qual_match_simple_bayesian[35][13] = -0.051729; qual_match_simple_bayesian[35][14] = -0.0409368; qual_match_simple_bayesian[35][15] = -0.0324464; qual_match_simple_bayesian[35][16] = -0.0257533; qual_match_simple_bayesian[35][17] = -0.0204685; qual_match_simple_bayesian[35][18] = -0.0162904; qual_match_simple_bayesian[35][19] = -0.0129841; qual_match_simple_bayesian[35][20] = -0.0103655; qual_match_simple_bayesian[35][21] = -0.00829043; qual_match_simple_bayesian[35][22] = -0.00664517; qual_match_simple_bayesian[35][23] = -0.00534022; qual_match_simple_bayesian[35][24] = -0.00430487; qual_match_simple_bayesian[35][25] = -0.00348323; qual_match_simple_bayesian[35][26] = -0.00283106; qual_match_simple_bayesian[35][27] = -0.00231332; qual_match_simple_bayesian[35][28] = -0.00190226; qual_match_simple_bayesian[35][29] = -0.00157586; qual_match_simple_bayesian[35][30] = -0.00131667; qual_match_simple_bayesian[35][31] = -0.00111084; qual_match_simple_bayesian[35][32] = -0.000947368; qual_match_simple_bayesian[35][33] = -0.000817538; qual_match_simple_bayesian[35][34] = -0.000714422; qual_match_simple_bayesian[35][35] = -0.000632522; qual_match_simple_bayesian[35][36] = -0.000567471; qual_match_simple_bayesian[35][37] = -0.000515803; qual_match_simple_bayesian[35][38] = -0.000474763; qual_match_simple_bayesian[35][39] = -0.000442165; qual_match_simple_bayesian[35][40] = -0.000416272; qual_match_simple_bayesian[35][41] = -0.000395705; qual_match_simple_bayesian[35][42] = -0.000379369; qual_match_simple_bayesian[35][43] = -0.000366392; qual_match_simple_bayesian[35][44] = -0.000356085; qual_match_simple_bayesian[35][45] = -0.000347898; qual_match_simple_bayesian[35][46] = -0.000341394; qual_match_simple_bayesian[36][0] = -9.38792; qual_match_simple_bayesian[36][1] = -1.5814; qual_match_simple_bayesian[36][2] = -0.996951; qual_match_simple_bayesian[36][3] = -0.695692; qual_match_simple_bayesian[36][4] = -0.507872; qual_match_simple_bayesian[36][5] = -0.380343; 
qual_match_simple_bayesian[36][6] = -0.289491; qual_match_simple_bayesian[36][7] = -0.222782; qual_match_simple_bayesian[36][8] = -0.172792; qual_match_simple_bayesian[36][9] = -0.134791; qual_match_simple_bayesian[36][10] = -0.105602; qual_match_simple_bayesian[36][11] = -0.0830093; qual_match_simple_bayesian[36][12] = -0.0654198; qual_match_simple_bayesian[36][13] = -0.0516651; qual_match_simple_bayesian[36][14] = -0.0408726; qual_match_simple_bayesian[36][15] = -0.0323821; qual_match_simple_bayesian[36][16] = -0.0256888; qual_match_simple_bayesian[36][17] = -0.0204039; qual_match_simple_bayesian[36][18] = -0.0162257; qual_match_simple_bayesian[36][19] = -0.0129193; qual_match_simple_bayesian[36][20] = -0.0103007; qual_match_simple_bayesian[36][21] = -0.00822555; qual_match_simple_bayesian[36][22] = -0.00658025; qual_match_simple_bayesian[36][23] = -0.00527527; qual_match_simple_bayesian[36][24] = -0.0042399; qual_match_simple_bayesian[36][25] = -0.00341824; qual_match_simple_bayesian[36][26] = -0.00276606; qual_match_simple_bayesian[36][27] = -0.00224831; qual_match_simple_bayesian[36][28] = -0.00183724; qual_match_simple_bayesian[36][29] = -0.00151083; qual_match_simple_bayesian[36][30] = -0.00125164; qual_match_simple_bayesian[36][31] = -0.0010458; qual_match_simple_bayesian[36][32] = -0.000882324; qual_match_simple_bayesian[36][33] = -0.000752491; qual_match_simple_bayesian[36][34] = -0.000649373; qual_match_simple_bayesian[36][35] = -0.000567471; qual_match_simple_bayesian[36][36] = -0.000502419; qual_match_simple_bayesian[36][37] = -0.00045075; qual_match_simple_bayesian[36][38] = -0.000409709; qual_match_simple_bayesian[36][39] = -0.00037711; qual_match_simple_bayesian[36][40] = -0.000351217; qual_match_simple_bayesian[36][41] = -0.00033065; qual_match_simple_bayesian[36][42] = -0.000314313; qual_match_simple_bayesian[36][43] = -0.000301336; qual_match_simple_bayesian[36][44] = -0.000291028; qual_match_simple_bayesian[36][45] = -0.000282841; 
qual_match_simple_bayesian[36][46] = -0.000276337; qual_match_simple_bayesian[37][0] = -9.61818; qual_match_simple_bayesian[37][1] = -1.58142; qual_match_simple_bayesian[37][2] = -0.996929; qual_match_simple_bayesian[37][3] = -0.695657; qual_match_simple_bayesian[37][4] = -0.507831; qual_match_simple_bayesian[37][5] = -0.380299; qual_match_simple_bayesian[37][6] = -0.289445; qual_match_simple_bayesian[37][7] = -0.222734; qual_match_simple_bayesian[37][8] = -0.172744; qual_match_simple_bayesian[37][9] = -0.134742; qual_match_simple_bayesian[37][10] = -0.105553; qual_match_simple_bayesian[37][11] = -0.0829591; qual_match_simple_bayesian[37][12] = -0.0653692; qual_match_simple_bayesian[37][13] = -0.0516143; qual_match_simple_bayesian[37][14] = -0.0408216; qual_match_simple_bayesian[37][15] = -0.0323309; qual_match_simple_bayesian[37][16] = -0.0256376; qual_match_simple_bayesian[37][17] = -0.0203526; qual_match_simple_bayesian[37][18] = -0.0161743; qual_match_simple_bayesian[37][19] = -0.0128679; qual_match_simple_bayesian[37][20] = -0.0102492; qual_match_simple_bayesian[37][21] = -0.00817401; qual_match_simple_bayesian[37][22] = -0.00652869; qual_match_simple_bayesian[37][23] = -0.00522368; qual_match_simple_bayesian[37][24] = -0.0041883; qual_match_simple_bayesian[37][25] = -0.00336662; qual_match_simple_bayesian[37][26] = -0.00271443; qual_match_simple_bayesian[37][27] = -0.00219667; qual_match_simple_bayesian[37][28] = -0.00178559; qual_match_simple_bayesian[37][29] = -0.00145918; qual_match_simple_bayesian[37][30] = -0.00119998; qual_match_simple_bayesian[37][31] = -0.000994137; qual_match_simple_bayesian[37][32] = -0.000830661; qual_match_simple_bayesian[37][33] = -0.000700826; qual_match_simple_bayesian[37][34] = -0.000597706; qual_match_simple_bayesian[37][35] = -0.000515803; qual_match_simple_bayesian[37][36] = -0.00045075; qual_match_simple_bayesian[37][37] = -0.000399079; qual_match_simple_bayesian[37][38] = -0.000358037; qual_match_simple_bayesian[37][39] = 
-0.000325438; qual_match_simple_bayesian[37][40] = -0.000299544; qual_match_simple_bayesian[37][41] = -0.000278977; qual_match_simple_bayesian[37][42] = -0.00026264; qual_match_simple_bayesian[37][43] = -0.000249663; qual_match_simple_bayesian[37][44] = -0.000239355; qual_match_simple_bayesian[37][45] = -0.000231167; qual_match_simple_bayesian[37][46] = -0.000224664; qual_match_simple_bayesian[38][0] = -9.84844; qual_match_simple_bayesian[38][1] = -1.58143; qual_match_simple_bayesian[38][2] = -0.996911; qual_match_simple_bayesian[38][3] = -0.69563; qual_match_simple_bayesian[38][4] = -0.507799; qual_match_simple_bayesian[38][5] = -0.380264; qual_match_simple_bayesian[38][6] = -0.289409; qual_match_simple_bayesian[38][7] = -0.222697; qual_match_simple_bayesian[38][8] = -0.172705; qual_match_simple_bayesian[38][9] = -0.134703; qual_match_simple_bayesian[38][10] = -0.105513; qual_match_simple_bayesian[38][11] = -0.0829192; qual_match_simple_bayesian[38][12] = -0.0653291; qual_match_simple_bayesian[38][13] = -0.051574; qual_match_simple_bayesian[38][14] = -0.0407812; qual_match_simple_bayesian[38][15] = -0.0322904; qual_match_simple_bayesian[38][16] = -0.0255969; qual_match_simple_bayesian[38][17] = -0.0203118; qual_match_simple_bayesian[38][18] = -0.0161335; qual_match_simple_bayesian[38][19] = -0.012827; qual_match_simple_bayesian[38][20] = -0.0102083; qual_match_simple_bayesian[38][21] = -0.00813308; qual_match_simple_bayesian[38][22] = -0.00648773; qual_match_simple_bayesian[38][23] = -0.00518271; qual_match_simple_bayesian[38][24] = -0.00414731; qual_match_simple_bayesian[38][25] = -0.00332562; qual_match_simple_bayesian[38][26] = -0.00267342; qual_match_simple_bayesian[38][27] = -0.00215565; qual_match_simple_bayesian[38][28] = -0.00174457; qual_match_simple_bayesian[38][29] = -0.00141815; qual_match_simple_bayesian[38][30] = -0.00115895; qual_match_simple_bayesian[38][31] = -0.000953104; qual_match_simple_bayesian[38][32] = -0.000789625; 
qual_match_simple_bayesian[38][33] = -0.000659788; qual_match_simple_bayesian[38][34] = -0.000556667; qual_match_simple_bayesian[38][35] = -0.000474763; qual_match_simple_bayesian[38][36] = -0.000409709; qual_match_simple_bayesian[38][37] = -0.000358037; qual_match_simple_bayesian[38][38] = -0.000316995; qual_match_simple_bayesian[38][39] = -0.000284396; qual_match_simple_bayesian[38][40] = -0.000258502; qual_match_simple_bayesian[38][41] = -0.000237934; qual_match_simple_bayesian[38][42] = -0.000221596; qual_match_simple_bayesian[38][43] = -0.000208619; qual_match_simple_bayesian[38][44] = -0.000198311; qual_match_simple_bayesian[38][45] = -0.000190123; qual_match_simple_bayesian[38][46] = -0.00018362; qual_match_simple_bayesian[39][0] = -10.0787; qual_match_simple_bayesian[39][1] = -1.58144; qual_match_simple_bayesian[39][2] = -0.996897; qual_match_simple_bayesian[39][3] = -0.695608; qual_match_simple_bayesian[39][4] = -0.507774; qual_match_simple_bayesian[39][5] = -0.380237; qual_match_simple_bayesian[39][6] = -0.28938; qual_match_simple_bayesian[39][7] = -0.222667; qual_match_simple_bayesian[39][8] = -0.172675; qual_match_simple_bayesian[39][9] = -0.134672; qual_match_simple_bayesian[39][10] = -0.105482; qual_match_simple_bayesian[39][11] = -0.0828876; qual_match_simple_bayesian[39][12] = -0.0652972; qual_match_simple_bayesian[39][13] = -0.051542; qual_match_simple_bayesian[39][14] = -0.040749; qual_match_simple_bayesian[39][15] = -0.0322581; qual_match_simple_bayesian[39][16] = -0.0255645; qual_match_simple_bayesian[39][17] = -0.0202794; qual_match_simple_bayesian[39][18] = -0.0161011; qual_match_simple_bayesian[39][19] = -0.0127945; qual_match_simple_bayesian[39][20] = -0.0101758; qual_match_simple_bayesian[39][21] = -0.00810056; qual_match_simple_bayesian[39][22] = -0.0064552; qual_match_simple_bayesian[39][23] = -0.00515016; qual_match_simple_bayesian[39][24] = -0.00411475; qual_match_simple_bayesian[39][25] = -0.00329306; qual_match_simple_bayesian[39][26] 
= -0.00264084; qual_match_simple_bayesian[39][27] = -0.00212307; qual_match_simple_bayesian[39][28] = -0.00171198; qual_match_simple_bayesian[39][29] = -0.00138557; qual_match_simple_bayesian[39][30] = -0.00112636; qual_match_simple_bayesian[39][31] = -0.000920511; qual_match_simple_bayesian[39][32] = -0.00075703; qual_match_simple_bayesian[39][33] = -0.000627192; qual_match_simple_bayesian[39][34] = -0.00052407; qual_match_simple_bayesian[39][35] = -0.000442165; qual_match_simple_bayesian[39][36] = -0.00037711; qual_match_simple_bayesian[39][37] = -0.000325438; qual_match_simple_bayesian[39][38] = -0.000284396; qual_match_simple_bayesian[39][39] = -0.000251796; qual_match_simple_bayesian[39][40] = -0.000225901; qual_match_simple_bayesian[39][41] = -0.000205333; qual_match_simple_bayesian[39][42] = -0.000188996; qual_match_simple_bayesian[39][43] = -0.000176018; qual_match_simple_bayesian[39][44] = -0.00016571; qual_match_simple_bayesian[39][45] = -0.000157522; qual_match_simple_bayesian[39][46] = -0.000151019; qual_match_simple_bayesian[40][0] = -10.309; qual_match_simple_bayesian[40][1] = -1.58145; qual_match_simple_bayesian[40][2] = -0.996886; qual_match_simple_bayesian[40][3] = -0.695591; qual_match_simple_bayesian[40][4] = -0.507754; qual_match_simple_bayesian[40][5] = -0.380215; qual_match_simple_bayesian[40][6] = -0.289357; qual_match_simple_bayesian[40][7] = -0.222643; qual_match_simple_bayesian[40][8] = -0.17265; qual_match_simple_bayesian[40][9] = -0.134647; qual_match_simple_bayesian[40][10] = -0.105457; qual_match_simple_bayesian[40][11] = -0.0828624; qual_match_simple_bayesian[40][12] = -0.0652719; qual_match_simple_bayesian[40][13] = -0.0515165; qual_match_simple_bayesian[40][14] = -0.0407235; qual_match_simple_bayesian[40][15] = -0.0322325; qual_match_simple_bayesian[40][16] = -0.0255389; qual_match_simple_bayesian[40][17] = -0.0202537; qual_match_simple_bayesian[40][18] = -0.0160753; qual_match_simple_bayesian[40][19] = -0.0127688; 
qual_match_simple_bayesian[40][20] = -0.01015; qual_match_simple_bayesian[40][21] = -0.00807474; qual_match_simple_bayesian[40][22] = -0.00642936; qual_match_simple_bayesian[40][23] = -0.00512431; qual_match_simple_bayesian[40][24] = -0.00408889; qual_match_simple_bayesian[40][25] = -0.00326719; qual_match_simple_bayesian[40][26] = -0.00261497; qual_match_simple_bayesian[40][27] = -0.00209719; qual_match_simple_bayesian[40][28] = -0.0016861; qual_match_simple_bayesian[40][29] = -0.00135968; qual_match_simple_bayesian[40][30] = -0.00110047; qual_match_simple_bayesian[40][31] = -0.000894622; qual_match_simple_bayesian[40][32] = -0.00073114; qual_match_simple_bayesian[40][33] = -0.000601301; qual_match_simple_bayesian[40][34] = -0.000498178; qual_match_simple_bayesian[40][35] = -0.000416272; qual_match_simple_bayesian[40][36] = -0.000351217; qual_match_simple_bayesian[40][37] = -0.000299544; qual_match_simple_bayesian[40][38] = -0.000258502; qual_match_simple_bayesian[40][39] = -0.000225901; qual_match_simple_bayesian[40][40] = -0.000200007; qual_match_simple_bayesian[40][41] = -0.000179438; qual_match_simple_bayesian[40][42] = -0.000163101; qual_match_simple_bayesian[40][43] = -0.000150123; qual_match_simple_bayesian[40][44] = -0.000139815; qual_match_simple_bayesian[40][45] = -0.000131627; qual_match_simple_bayesian[40][46] = -0.000125123; qual_match_simple_bayesian[41][0] = -10.5392; qual_match_simple_bayesian[41][1] = -1.58145; qual_match_simple_bayesian[41][2] = -0.996877; qual_match_simple_bayesian[41][3] = -0.695577; qual_match_simple_bayesian[41][4] = -0.507738; qual_match_simple_bayesian[41][5] = -0.380198; qual_match_simple_bayesian[41][6] = -0.289339; qual_match_simple_bayesian[41][7] = -0.222624; qual_match_simple_bayesian[41][8] = -0.172631; qual_match_simple_bayesian[41][9] = -0.134628; qual_match_simple_bayesian[41][10] = -0.105437; qual_match_simple_bayesian[41][11] = -0.0828425; qual_match_simple_bayesian[41][12] = -0.0652518; 
qual_match_simple_bayesian[41][13] = -0.0514963; qual_match_simple_bayesian[41][14] = -0.0407032; qual_match_simple_bayesian[41][15] = -0.0322121; qual_match_simple_bayesian[41][16] = -0.0255185; qual_match_simple_bayesian[41][17] = -0.0202333; qual_match_simple_bayesian[41][18] = -0.0160549; qual_match_simple_bayesian[41][19] = -0.0127483; qual_match_simple_bayesian[41][20] = -0.0101295; qual_match_simple_bayesian[41][21] = -0.00805422; qual_match_simple_bayesian[41][22] = -0.00640883; qual_match_simple_bayesian[41][23] = -0.00510378; qual_match_simple_bayesian[41][24] = -0.00406835; qual_match_simple_bayesian[41][25] = -0.00324664; qual_match_simple_bayesian[41][26] = -0.00259442; qual_match_simple_bayesian[41][27] = -0.00207664; qual_match_simple_bayesian[41][28] = -0.00166554; qual_match_simple_bayesian[41][29] = -0.00133912; qual_match_simple_bayesian[41][30] = -0.00107991; qual_match_simple_bayesian[41][31] = -0.000874059; qual_match_simple_bayesian[41][32] = -0.000710576; qual_match_simple_bayesian[41][33] = -0.000580736; qual_match_simple_bayesian[41][34] = -0.000477612; qual_match_simple_bayesian[41][35] = -0.000395705; qual_match_simple_bayesian[41][36] = -0.00033065; qual_match_simple_bayesian[41][37] = -0.000278977; qual_match_simple_bayesian[41][38] = -0.000237934; qual_match_simple_bayesian[41][39] = -0.000205333; qual_match_simple_bayesian[41][40] = -0.000179438; qual_match_simple_bayesian[41][41] = -0.00015887; qual_match_simple_bayesian[41][42] = -0.000142532; qual_match_simple_bayesian[41][43] = -0.000129555; qual_match_simple_bayesian[41][44] = -0.000119246; qual_match_simple_bayesian[41][45] = -0.000111058; qual_match_simple_bayesian[41][46] = -0.000104554; qual_match_simple_bayesian[42][0] = -10.7695; qual_match_simple_bayesian[42][1] = -1.58146; qual_match_simple_bayesian[42][2] = -0.99687; qual_match_simple_bayesian[42][3] = -0.695566; qual_match_simple_bayesian[42][4] = -0.507725; qual_match_simple_bayesian[42][5] = -0.380184; 
qual_match_simple_bayesian[42][6] = -0.289324; qual_match_simple_bayesian[42][7] = -0.222609; qual_match_simple_bayesian[42][8] = -0.172616; qual_match_simple_bayesian[42][9] = -0.134612; qual_match_simple_bayesian[42][10] = -0.105421; qual_match_simple_bayesian[42][11] = -0.0828266; qual_match_simple_bayesian[42][12] = -0.0652359; qual_match_simple_bayesian[42][13] = -0.0514803; qual_match_simple_bayesian[42][14] = -0.0406871; qual_match_simple_bayesian[42][15] = -0.032196; qual_match_simple_bayesian[42][16] = -0.0255023; qual_match_simple_bayesian[42][17] = -0.020217; qual_match_simple_bayesian[42][18] = -0.0160386; qual_match_simple_bayesian[42][19] = -0.012732; qual_match_simple_bayesian[42][20] = -0.0101132; qual_match_simple_bayesian[42][21] = -0.00803793; qual_match_simple_bayesian[42][22] = -0.00639253; qual_match_simple_bayesian[42][23] = -0.00508747; qual_match_simple_bayesian[42][24] = -0.00405203; qual_match_simple_bayesian[42][25] = -0.00323032; qual_match_simple_bayesian[42][26] = -0.00257809; qual_match_simple_bayesian[42][27] = -0.00206031; qual_match_simple_bayesian[42][28] = -0.00164921; qual_match_simple_bayesian[42][29] = -0.00132279; qual_match_simple_bayesian[42][30] = -0.00106358; qual_match_simple_bayesian[42][31] = -0.000857725; qual_match_simple_bayesian[42][32] = -0.000694241; qual_match_simple_bayesian[42][33] = -0.0005644; qual_match_simple_bayesian[42][34] = -0.000461276; qual_match_simple_bayesian[42][35] = -0.000379369; qual_match_simple_bayesian[42][36] = -0.000314313; qual_match_simple_bayesian[42][37] = -0.00026264; qual_match_simple_bayesian[42][38] = -0.000221596; qual_match_simple_bayesian[42][39] = -0.000188996; qual_match_simple_bayesian[42][40] = -0.000163101; qual_match_simple_bayesian[42][41] = -0.000142532; qual_match_simple_bayesian[42][42] = -0.000126194; qual_match_simple_bayesian[42][43] = -0.000113217; qual_match_simple_bayesian[42][44] = -0.000102908; qual_match_simple_bayesian[42][45] = -9.47203e-05; 
qual_match_simple_bayesian[42][46] = -8.82164e-05; qual_match_simple_bayesian[43][0] = -10.9997; qual_match_simple_bayesian[43][1] = -1.58146; qual_match_simple_bayesian[43][2] = -0.996865; qual_match_simple_bayesian[43][3] = -0.695558; qual_match_simple_bayesian[43][4] = -0.507715; qual_match_simple_bayesian[43][5] = -0.380173; qual_match_simple_bayesian[43][6] = -0.289313; qual_match_simple_bayesian[43][7] = -0.222597; qual_match_simple_bayesian[43][8] = -0.172604; qual_match_simple_bayesian[43][9] = -0.1346; qual_match_simple_bayesian[43][10] = -0.105409; qual_match_simple_bayesian[43][11] = -0.082814; qual_match_simple_bayesian[43][12] = -0.0652232; qual_match_simple_bayesian[43][13] = -0.0514675; qual_match_simple_bayesian[43][14] = -0.0406743; qual_match_simple_bayesian[43][15] = -0.0321831; qual_match_simple_bayesian[43][16] = -0.0254894; qual_match_simple_bayesian[43][17] = -0.0202041; qual_match_simple_bayesian[43][18] = -0.0160257; qual_match_simple_bayesian[43][19] = -0.0127191; qual_match_simple_bayesian[43][20] = -0.0101003; qual_match_simple_bayesian[43][21] = -0.00802498; qual_match_simple_bayesian[43][22] = -0.00637958; qual_match_simple_bayesian[43][23] = -0.00507451; qual_match_simple_bayesian[43][24] = -0.00403907; qual_match_simple_bayesian[43][25] = -0.00321736; qual_match_simple_bayesian[43][26] = -0.00256512; qual_match_simple_bayesian[43][27] = -0.00204734; qual_match_simple_bayesian[43][28] = -0.00163624; qual_match_simple_bayesian[43][29] = -0.00130982; qual_match_simple_bayesian[43][30] = -0.0010506; qual_match_simple_bayesian[43][31] = -0.000844751; qual_match_simple_bayesian[43][32] = -0.000681266; qual_match_simple_bayesian[43][33] = -0.000551424; qual_match_simple_bayesian[43][34] = -0.0004483; qual_match_simple_bayesian[43][35] = -0.000366392; qual_match_simple_bayesian[43][36] = -0.000301336; qual_match_simple_bayesian[43][37] = -0.000249663; qual_match_simple_bayesian[43][38] = -0.000208619; qual_match_simple_bayesian[43][39] = 
-0.000176018; qual_match_simple_bayesian[43][40] = -0.000150123; qual_match_simple_bayesian[43][41] = -0.000129555; qual_match_simple_bayesian[43][42] = -0.000113217; qual_match_simple_bayesian[43][43] = -0.000100239; qual_match_simple_bayesian[43][44] = -8.99308e-05; qual_match_simple_bayesian[43][45] = -8.17427e-05; qual_match_simple_bayesian[43][46] = -7.52387e-05; qual_match_simple_bayesian[44][0] = -11.23; qual_match_simple_bayesian[44][1] = -1.58146; qual_match_simple_bayesian[44][2] = -0.99686; qual_match_simple_bayesian[44][3] = -0.695551; qual_match_simple_bayesian[44][4] = -0.507707; qual_match_simple_bayesian[44][5] = -0.380164; qual_match_simple_bayesian[44][6] = -0.289304; qual_match_simple_bayesian[44][7] = -0.222588; qual_match_simple_bayesian[44][8] = -0.172594; qual_match_simple_bayesian[44][9] = -0.13459; qual_match_simple_bayesian[44][10] = -0.105399; qual_match_simple_bayesian[44][11] = -0.082804; qual_match_simple_bayesian[44][12] = -0.0652131; qual_match_simple_bayesian[44][13] = -0.0514574; qual_match_simple_bayesian[44][14] = -0.0406641; qual_match_simple_bayesian[44][15] = -0.032173; qual_match_simple_bayesian[44][16] = -0.0254792; qual_match_simple_bayesian[44][17] = -0.0201939; qual_match_simple_bayesian[44][18] = -0.0160155; qual_match_simple_bayesian[44][19] = -0.0127088; qual_match_simple_bayesian[44][20] = -0.01009; qual_match_simple_bayesian[44][21] = -0.0080147; qual_match_simple_bayesian[44][22] = -0.00636929; qual_match_simple_bayesian[44][23] = -0.00506422; qual_match_simple_bayesian[44][24] = -0.00402878; qual_match_simple_bayesian[44][25] = -0.00320706; qual_match_simple_bayesian[44][26] = -0.00255482; qual_match_simple_bayesian[44][27] = -0.00203704; qual_match_simple_bayesian[44][28] = -0.00162594; qual_match_simple_bayesian[44][29] = -0.00129951; qual_match_simple_bayesian[44][30] = -0.0010403; qual_match_simple_bayesian[44][31] = -0.000834445; qual_match_simple_bayesian[44][32] = -0.00067096; 
qual_match_simple_bayesian[44][33] = -0.000541118; qual_match_simple_bayesian[44][34] = -0.000437993; qual_match_simple_bayesian[44][35] = -0.000356085; qual_match_simple_bayesian[44][36] = -0.000291028; qual_match_simple_bayesian[44][37] = -0.000239355; qual_match_simple_bayesian[44][38] = -0.000198311; qual_match_simple_bayesian[44][39] = -0.00016571; qual_match_simple_bayesian[44][40] = -0.000139815; qual_match_simple_bayesian[44][41] = -0.000119246; qual_match_simple_bayesian[44][42] = -0.000102908; qual_match_simple_bayesian[44][43] = -8.99308e-05; qual_match_simple_bayesian[44][44] = -7.96225e-05; qual_match_simple_bayesian[44][45] = -7.14344e-05; qual_match_simple_bayesian[44][46] = -6.49304e-05; qual_match_simple_bayesian[45][0] = -11.4602; qual_match_simple_bayesian[45][1] = -1.58146; qual_match_simple_bayesian[45][2] = -0.996857; qual_match_simple_bayesian[45][3] = -0.695546; qual_match_simple_bayesian[45][4] = -0.507701; qual_match_simple_bayesian[45][5] = -0.380157; qual_match_simple_bayesian[45][6] = -0.289296; qual_match_simple_bayesian[45][7] = -0.222581; qual_match_simple_bayesian[45][8] = -0.172586; qual_match_simple_bayesian[45][9] = -0.134582; qual_match_simple_bayesian[45][10] = -0.105391; qual_match_simple_bayesian[45][11] = -0.082796; qual_match_simple_bayesian[45][12] = -0.0652051; qual_match_simple_bayesian[45][13] = -0.0514493; qual_match_simple_bayesian[45][14] = -0.040656; qual_match_simple_bayesian[45][15] = -0.0321649; qual_match_simple_bayesian[45][16] = -0.0254711; qual_match_simple_bayesian[45][17] = -0.0201858; qual_match_simple_bayesian[45][18] = -0.0160073; qual_match_simple_bayesian[45][19] = -0.0127007; qual_match_simple_bayesian[45][20] = -0.0100819; qual_match_simple_bayesian[45][21] = -0.00800654; qual_match_simple_bayesian[45][22] = -0.00636112; qual_match_simple_bayesian[45][23] = -0.00505604; qual_match_simple_bayesian[45][24] = -0.0040206; qual_match_simple_bayesian[45][25] = -0.00319888; 
qual_match_simple_bayesian[45][26] = -0.00254664; qual_match_simple_bayesian[45][27] = -0.00202886; qual_match_simple_bayesian[45][28] = -0.00161776; qual_match_simple_bayesian[45][29] = -0.00129133; qual_match_simple_bayesian[45][30] = -0.00103211; qual_match_simple_bayesian[45][31] = -0.000826259; qual_match_simple_bayesian[45][32] = -0.000662773; qual_match_simple_bayesian[45][33] = -0.000532931; qual_match_simple_bayesian[45][34] = -0.000429806; qual_match_simple_bayesian[45][35] = -0.000347898; qual_match_simple_bayesian[45][36] = -0.000282841; qual_match_simple_bayesian[45][37] = -0.000231167; qual_match_simple_bayesian[45][38] = -0.000190123; qual_match_simple_bayesian[45][39] = -0.000157522; qual_match_simple_bayesian[45][40] = -0.000131627; qual_match_simple_bayesian[45][41] = -0.000111058; qual_match_simple_bayesian[45][42] = -9.47203e-05; qual_match_simple_bayesian[45][43] = -8.17427e-05; qual_match_simple_bayesian[45][44] = -7.14344e-05; qual_match_simple_bayesian[45][45] = -6.32462e-05; qual_match_simple_bayesian[45][46] = -5.67422e-05; qual_match_simple_bayesian[46][0] = -11.6905; qual_match_simple_bayesian[46][1] = -1.58147; qual_match_simple_bayesian[46][2] = -0.996854; qual_match_simple_bayesian[46][3] = -0.695541; qual_match_simple_bayesian[46][4] = -0.507695; qual_match_simple_bayesian[46][5] = -0.380152; qual_match_simple_bayesian[46][6] = -0.28929; qual_match_simple_bayesian[46][7] = -0.222575; qual_match_simple_bayesian[46][8] = -0.17258; qual_match_simple_bayesian[46][9] = -0.134576; qual_match_simple_bayesian[46][10] = -0.105385; qual_match_simple_bayesian[46][11] = -0.0827897; qual_match_simple_bayesian[46][12] = -0.0651987; qual_match_simple_bayesian[46][13] = -0.051443; qual_match_simple_bayesian[46][14] = -0.0406496; qual_match_simple_bayesian[46][15] = -0.0321584; qual_match_simple_bayesian[46][16] = -0.0254646; qual_match_simple_bayesian[46][17] = -0.0201793; qual_match_simple_bayesian[46][18] = -0.0160009; 
qual_match_simple_bayesian[46][19] = -0.0126942; qual_match_simple_bayesian[46][20] = -0.0100754; qual_match_simple_bayesian[46][21] = -0.00800005; qual_match_simple_bayesian[46][22] = -0.00635463; qual_match_simple_bayesian[46][23] = -0.00504955; qual_match_simple_bayesian[46][24] = -0.0040141; qual_match_simple_bayesian[46][25] = -0.00319238; qual_match_simple_bayesian[46][26] = -0.00254014; qual_match_simple_bayesian[46][27] = -0.00202236; qual_match_simple_bayesian[46][28] = -0.00161126; qual_match_simple_bayesian[46][29] = -0.00128483; qual_match_simple_bayesian[46][30] = -0.00102561; qual_match_simple_bayesian[46][31] = -0.000819756; qual_match_simple_bayesian[46][32] = -0.00065627; qual_match_simple_bayesian[46][33] = -0.000526428; qual_match_simple_bayesian[46][34] = -0.000423302; qual_match_simple_bayesian[46][35] = -0.000341394; qual_match_simple_bayesian[46][36] = -0.000276337; qual_match_simple_bayesian[46][37] = -0.000224664; qual_match_simple_bayesian[46][38] = -0.00018362; qual_match_simple_bayesian[46][39] = -0.000151019; qual_match_simple_bayesian[46][40] = -0.000125123; qual_match_simple_bayesian[46][41] = -0.000104554; qual_match_simple_bayesian[46][42] = -8.82164e-05; qual_match_simple_bayesian[46][43] = -7.52387e-05; qual_match_simple_bayesian[46][44] = -6.49304e-05; qual_match_simple_bayesian[46][45] = -5.67422e-05; qual_match_simple_bayesian[46][46] = -5.02381e-05; qual_mismatch_simple_bayesian[0][0] = -1.50408; qual_mismatch_simple_bayesian[0][1] = -1.40619; qual_mismatch_simple_bayesian[0][2] = -1.33474; qual_mismatch_simple_bayesian[0][3] = -1.28141; qual_mismatch_simple_bayesian[0][4] = -1.24099; qual_mismatch_simple_bayesian[0][5] = -1.21; qual_mismatch_simple_bayesian[0][6] = -1.18606; qual_mismatch_simple_bayesian[0][7] = -1.16744; qual_mismatch_simple_bayesian[0][8] = -1.15289; qual_mismatch_simple_bayesian[0][9] = -1.14148; qual_mismatch_simple_bayesian[0][10] = -1.13251; qual_mismatch_simple_bayesian[0][11] = -1.12545; 
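// Editorial note, not from the original authors: a sketch of how these constants
// appear to be derived, under the assumption that each index q is a PHRED quality
// score with error probability p_q = 10^(-q/10). The hard-coded values are consistent with
//   qual_match_simple_bayesian[i][j]    = ln( (1 - p_i)*(1 - p_j) + p_i*p_j/3 )
//   qual_mismatch_simple_bayesian[i][j] = ln( (1 - p_i)*p_j/3 + (1 - p_j)*p_i/3 + 2*p_i*p_j/9 )
// Spot checks against the entries in this file:
//   qual_mismatch_simple_bayesian[0][0] = ln(2/9)            ~ -1.50408
//   qual_mismatch_simple_bayesian[1][1] (p = 10^-0.1)        ~ -1.38979
//   qual_match_simple_bayesian[42][0]   = ln(10^-4.2 / 3)    ~ -10.7695
//   qual_match_simple_bayesian[46][46]  (p = 10^-4.6)        ~ -5.02381e-05
// Both formulas are symmetric in i and j, which matches the entries here
// (for example [5][6] and [6][5] of the mismatch table are both -1.87187).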
qual_mismatch_simple_bayesian[0][12] = -1.11987; qual_mismatch_simple_bayesian[0][13] = -1.11546; qual_mismatch_simple_bayesian[0][14] = -1.11197; qual_mismatch_simple_bayesian[0][15] = -1.10921; qual_mismatch_simple_bayesian[0][16] = -1.10702; qual_mismatch_simple_bayesian[0][17] = -1.10529; qual_mismatch_simple_bayesian[0][18] = -1.10391; qual_mismatch_simple_bayesian[0][19] = -1.10282; qual_mismatch_simple_bayesian[0][20] = -1.10195; qual_mismatch_simple_bayesian[0][21] = -1.10126; qual_mismatch_simple_bayesian[0][22] = -1.10072; qual_mismatch_simple_bayesian[0][23] = -1.10028; qual_mismatch_simple_bayesian[0][24] = -1.09994; qual_mismatch_simple_bayesian[0][25] = -1.09967; qual_mismatch_simple_bayesian[0][26] = -1.09945; qual_mismatch_simple_bayesian[0][27] = -1.09928; qual_mismatch_simple_bayesian[0][28] = -1.09914; qual_mismatch_simple_bayesian[0][29] = -1.09903; qual_mismatch_simple_bayesian[0][30] = -1.09895; qual_mismatch_simple_bayesian[0][31] = -1.09888; qual_mismatch_simple_bayesian[0][32] = -1.09882; qual_mismatch_simple_bayesian[0][33] = -1.09878; qual_mismatch_simple_bayesian[0][34] = -1.09874; qual_mismatch_simple_bayesian[0][35] = -1.09872; qual_mismatch_simple_bayesian[0][36] = -1.0987; qual_mismatch_simple_bayesian[0][37] = -1.09868; qual_mismatch_simple_bayesian[0][38] = -1.09867; qual_mismatch_simple_bayesian[0][39] = -1.09865; qual_mismatch_simple_bayesian[0][40] = -1.09865; qual_mismatch_simple_bayesian[0][41] = -1.09864; qual_mismatch_simple_bayesian[0][42] = -1.09863; qual_mismatch_simple_bayesian[0][43] = -1.09863; qual_mismatch_simple_bayesian[0][44] = -1.09863; qual_mismatch_simple_bayesian[0][45] = -1.09862; qual_mismatch_simple_bayesian[0][46] = -1.09862; qual_mismatch_simple_bayesian[1][0] = -1.40619; qual_mismatch_simple_bayesian[1][1] = -1.38979; qual_mismatch_simple_bayesian[1][2] = -1.37696; qual_mismatch_simple_bayesian[1][3] = -1.36688; qual_mismatch_simple_bayesian[1][4] = -1.35894; qual_mismatch_simple_bayesian[1][5] = 
-1.35268; qual_mismatch_simple_bayesian[1][6] = -1.34774; qual_mismatch_simple_bayesian[1][7] = -1.34383; qual_mismatch_simple_bayesian[1][8] = -1.34073; qual_mismatch_simple_bayesian[1][9] = -1.33828; qual_mismatch_simple_bayesian[1][10] = -1.33634; qual_mismatch_simple_bayesian[1][11] = -1.3348; qual_mismatch_simple_bayesian[1][12] = -1.33358; qual_mismatch_simple_bayesian[1][13] = -1.33261; qual_mismatch_simple_bayesian[1][14] = -1.33184; qual_mismatch_simple_bayesian[1][15] = -1.33123; qual_mismatch_simple_bayesian[1][16] = -1.33074; qual_mismatch_simple_bayesian[1][17] = -1.33036; qual_mismatch_simple_bayesian[1][18] = -1.33005; qual_mismatch_simple_bayesian[1][19] = -1.32981; qual_mismatch_simple_bayesian[1][20] = -1.32962; qual_mismatch_simple_bayesian[1][21] = -1.32946; qual_mismatch_simple_bayesian[1][22] = -1.32934; qual_mismatch_simple_bayesian[1][23] = -1.32924; qual_mismatch_simple_bayesian[1][24] = -1.32917; qual_mismatch_simple_bayesian[1][25] = -1.32911; qual_mismatch_simple_bayesian[1][26] = -1.32906; qual_mismatch_simple_bayesian[1][27] = -1.32902; qual_mismatch_simple_bayesian[1][28] = -1.32899; qual_mismatch_simple_bayesian[1][29] = -1.32896; qual_mismatch_simple_bayesian[1][30] = -1.32895; qual_mismatch_simple_bayesian[1][31] = -1.32893; qual_mismatch_simple_bayesian[1][32] = -1.32892; qual_mismatch_simple_bayesian[1][33] = -1.32891; qual_mismatch_simple_bayesian[1][34] = -1.3289; qual_mismatch_simple_bayesian[1][35] = -1.32889; qual_mismatch_simple_bayesian[1][36] = -1.32889; qual_mismatch_simple_bayesian[1][37] = -1.32889; qual_mismatch_simple_bayesian[1][38] = -1.32888; qual_mismatch_simple_bayesian[1][39] = -1.32888; qual_mismatch_simple_bayesian[1][40] = -1.32888; qual_mismatch_simple_bayesian[1][41] = -1.32888; qual_mismatch_simple_bayesian[1][42] = -1.32888; qual_mismatch_simple_bayesian[1][43] = -1.32887; qual_mismatch_simple_bayesian[1][44] = -1.32887; qual_mismatch_simple_bayesian[1][45] = -1.32887; 
qual_mismatch_simple_bayesian[1][46] = -1.32887; qual_mismatch_simple_bayesian[2][0] = -1.33474; qual_mismatch_simple_bayesian[2][1] = -1.37696; qual_mismatch_simple_bayesian[2][2] = -1.41181; qual_mismatch_simple_bayesian[2][3] = -1.44039; qual_mismatch_simple_bayesian[2][4] = -1.46368; qual_mismatch_simple_bayesian[2][5] = -1.48258; qual_mismatch_simple_bayesian[2][6] = -1.49786; qual_mismatch_simple_bayesian[2][7] = -1.51016; qual_mismatch_simple_bayesian[2][8] = -1.52003; qual_mismatch_simple_bayesian[2][9] = -1.52795; qual_mismatch_simple_bayesian[2][10] = -1.53428; qual_mismatch_simple_bayesian[2][11] = -1.53934; qual_mismatch_simple_bayesian[2][12] = -1.54338; qual_mismatch_simple_bayesian[2][13] = -1.5466; qual_mismatch_simple_bayesian[2][14] = -1.54916; qual_mismatch_simple_bayesian[2][15] = -1.55121; qual_mismatch_simple_bayesian[2][16] = -1.55283; qual_mismatch_simple_bayesian[2][17] = -1.55412; qual_mismatch_simple_bayesian[2][18] = -1.55515; qual_mismatch_simple_bayesian[2][19] = -1.55597; qual_mismatch_simple_bayesian[2][20] = -1.55662; qual_mismatch_simple_bayesian[2][21] = -1.55713; qual_mismatch_simple_bayesian[2][22] = -1.55754; qual_mismatch_simple_bayesian[2][23] = -1.55787; qual_mismatch_simple_bayesian[2][24] = -1.55813; qual_mismatch_simple_bayesian[2][25] = -1.55833; qual_mismatch_simple_bayesian[2][26] = -1.5585; qual_mismatch_simple_bayesian[2][27] = -1.55863; qual_mismatch_simple_bayesian[2][28] = -1.55873; qual_mismatch_simple_bayesian[2][29] = -1.55881; qual_mismatch_simple_bayesian[2][30] = -1.55888; qual_mismatch_simple_bayesian[2][31] = -1.55893; qual_mismatch_simple_bayesian[2][32] = -1.55897; qual_mismatch_simple_bayesian[2][33] = -1.559; qual_mismatch_simple_bayesian[2][34] = -1.55903; qual_mismatch_simple_bayesian[2][35] = -1.55905; qual_mismatch_simple_bayesian[2][36] = -1.55907; qual_mismatch_simple_bayesian[2][37] = -1.55908; qual_mismatch_simple_bayesian[2][38] = -1.55909; qual_mismatch_simple_bayesian[2][39] = -1.5591; 
qual_mismatch_simple_bayesian[2][40] = -1.5591; qual_mismatch_simple_bayesian[2][41] = -1.55911; qual_mismatch_simple_bayesian[2][42] = -1.55911; qual_mismatch_simple_bayesian[2][43] = -1.55912; qual_mismatch_simple_bayesian[2][44] = -1.55912; qual_mismatch_simple_bayesian[2][45] = -1.55912; qual_mismatch_simple_bayesian[2][46] = -1.55912; qual_mismatch_simple_bayesian[3][0] = -1.28141; qual_mismatch_simple_bayesian[3][1] = -1.36688; qual_mismatch_simple_bayesian[3][2] = -1.44039; qual_mismatch_simple_bayesian[3][3] = -1.50289; qual_mismatch_simple_bayesian[3][4] = -1.55549; qual_mismatch_simple_bayesian[3][5] = -1.59933; qual_mismatch_simple_bayesian[3][6] = -1.63558; qual_mismatch_simple_bayesian[3][7] = -1.66534; qual_mismatch_simple_bayesian[3][8] = -1.68963; qual_mismatch_simple_bayesian[3][9] = -1.70935; qual_mismatch_simple_bayesian[3][10] = -1.72529; qual_mismatch_simple_bayesian[3][11] = -1.73814; qual_mismatch_simple_bayesian[3][12] = -1.74847; qual_mismatch_simple_bayesian[3][13] = -1.75675; qual_mismatch_simple_bayesian[3][14] = -1.76338; qual_mismatch_simple_bayesian[3][15] = -1.76867; qual_mismatch_simple_bayesian[3][16] = -1.7729; qual_mismatch_simple_bayesian[3][17] = -1.77627; qual_mismatch_simple_bayesian[3][18] = -1.77895; qual_mismatch_simple_bayesian[3][19] = -1.78109; qual_mismatch_simple_bayesian[3][20] = -1.78279; qual_mismatch_simple_bayesian[3][21] = -1.78414; qual_mismatch_simple_bayesian[3][22] = -1.78522; qual_mismatch_simple_bayesian[3][23] = -1.78608; qual_mismatch_simple_bayesian[3][24] = -1.78676; qual_mismatch_simple_bayesian[3][25] = -1.7873; qual_mismatch_simple_bayesian[3][26] = -1.78773; qual_mismatch_simple_bayesian[3][27] = -1.78807; qual_mismatch_simple_bayesian[3][28] = -1.78834; qual_mismatch_simple_bayesian[3][29] = -1.78855; qual_mismatch_simple_bayesian[3][30] = -1.78873; qual_mismatch_simple_bayesian[3][31] = -1.78886; qual_mismatch_simple_bayesian[3][32] = -1.78897; qual_mismatch_simple_bayesian[3][33] = -1.78906; 
qual_mismatch_simple_bayesian[3][34] = -1.78912; qual_mismatch_simple_bayesian[3][35] = -1.78918; qual_mismatch_simple_bayesian[3][36] = -1.78922; qual_mismatch_simple_bayesian[3][37] = -1.78926; qual_mismatch_simple_bayesian[3][38] = -1.78928; qual_mismatch_simple_bayesian[3][39] = -1.7893; qual_mismatch_simple_bayesian[3][40] = -1.78932; qual_mismatch_simple_bayesian[3][41] = -1.78934; qual_mismatch_simple_bayesian[3][42] = -1.78935; qual_mismatch_simple_bayesian[3][43] = -1.78935; qual_mismatch_simple_bayesian[3][44] = -1.78936; qual_mismatch_simple_bayesian[3][45] = -1.78937; qual_mismatch_simple_bayesian[3][46] = -1.78937; qual_mismatch_simple_bayesian[4][0] = -1.24099; qual_mismatch_simple_bayesian[4][1] = -1.35894; qual_mismatch_simple_bayesian[4][2] = -1.46368; qual_mismatch_simple_bayesian[4][3] = -1.55549; qual_mismatch_simple_bayesian[4][4] = -1.63493; qual_mismatch_simple_bayesian[4][5] = -1.70287; qual_mismatch_simple_bayesian[4][6] = -1.76033; qual_mismatch_simple_bayesian[4][7] = -1.80845; qual_mismatch_simple_bayesian[4][8] = -1.8484; qual_mismatch_simple_bayesian[4][9] = -1.8813; qual_mismatch_simple_bayesian[4][10] = -1.90823; qual_mismatch_simple_bayesian[4][11] = -1.93016; qual_mismatch_simple_bayesian[4][12] = -1.94792; qual_mismatch_simple_bayesian[4][13] = -1.96226; qual_mismatch_simple_bayesian[4][14] = -1.97379; qual_mismatch_simple_bayesian[4][15] = -1.98305; qual_mismatch_simple_bayesian[4][16] = -1.99047; qual_mismatch_simple_bayesian[4][17] = -1.9964; qual_mismatch_simple_bayesian[4][18] = -2.00114; qual_mismatch_simple_bayesian[4][19] = -2.00492; qual_mismatch_simple_bayesian[4][20] = -2.00793; qual_mismatch_simple_bayesian[4][21] = -2.01033; qual_mismatch_simple_bayesian[4][22] = -2.01224; qual_mismatch_simple_bayesian[4][23] = -2.01376; qual_mismatch_simple_bayesian[4][24] = -2.01497; qual_mismatch_simple_bayesian[4][25] = -2.01593; qual_mismatch_simple_bayesian[4][26] = -2.01669; qual_mismatch_simple_bayesian[4][27] = -2.0173; 
qual_mismatch_simple_bayesian[4][28] = -2.01778; qual_mismatch_simple_bayesian[4][29] = -2.01816; qual_mismatch_simple_bayesian[4][30] = -2.01847; qual_mismatch_simple_bayesian[4][31] = -2.01871; qual_mismatch_simple_bayesian[4][32] = -2.0189; qual_mismatch_simple_bayesian[4][33] = -2.01906; qual_mismatch_simple_bayesian[4][34] = -2.01918; qual_mismatch_simple_bayesian[4][35] = -2.01927; qual_mismatch_simple_bayesian[4][36] = -2.01935; qual_mismatch_simple_bayesian[4][37] = -2.01941; qual_mismatch_simple_bayesian[4][38] = -2.01946; qual_mismatch_simple_bayesian[4][39] = -2.0195; qual_mismatch_simple_bayesian[4][40] = -2.01953; qual_mismatch_simple_bayesian[4][41] = -2.01955; qual_mismatch_simple_bayesian[4][42] = -2.01957; qual_mismatch_simple_bayesian[4][43] = -2.01959; qual_mismatch_simple_bayesian[4][44] = -2.0196; qual_mismatch_simple_bayesian[4][45] = -2.01961; qual_mismatch_simple_bayesian[4][46] = -2.01962; qual_mismatch_simple_bayesian[5][0] = -1.21; qual_mismatch_simple_bayesian[5][1] = -1.35268; qual_mismatch_simple_bayesian[5][2] = -1.48258; qual_mismatch_simple_bayesian[5][3] = -1.59933; qual_mismatch_simple_bayesian[5][4] = -1.70287; qual_mismatch_simple_bayesian[5][5] = -1.79352; qual_mismatch_simple_bayesian[5][6] = -1.87187; qual_mismatch_simple_bayesian[5][7] = -1.93881; qual_mismatch_simple_bayesian[5][8] = -1.99536; qual_mismatch_simple_bayesian[5][9] = -2.04269; qual_mismatch_simple_bayesian[5][10] = -2.08194; qual_mismatch_simple_bayesian[5][11] = -2.11426; qual_mismatch_simple_bayesian[5][12] = -2.14069; qual_mismatch_simple_bayesian[5][13] = -2.1622; qual_mismatch_simple_bayesian[5][14] = -2.17962; qual_mismatch_simple_bayesian[5][15] = -2.19368; qual_mismatch_simple_bayesian[5][16] = -2.20499; qual_mismatch_simple_bayesian[5][17] = -2.21406; qual_mismatch_simple_bayesian[5][18] = -2.22133; qual_mismatch_simple_bayesian[5][19] = -2.22714; qual_mismatch_simple_bayesian[5][20] = -2.23178; qual_mismatch_simple_bayesian[5][21] = -2.23548; 
qual_mismatch_simple_bayesian[5][22] = -2.23843; qual_mismatch_simple_bayesian[5][23] = -2.24078; qual_mismatch_simple_bayesian[5][24] = -2.24265; qual_mismatch_simple_bayesian[5][25] = -2.24414; qual_mismatch_simple_bayesian[5][26] = -2.24532; qual_mismatch_simple_bayesian[5][27] = -2.24626; qual_mismatch_simple_bayesian[5][28] = -2.24701; qual_mismatch_simple_bayesian[5][29] = -2.2476; qual_mismatch_simple_bayesian[5][30] = -2.24808; qual_mismatch_simple_bayesian[5][31] = -2.24845; qual_mismatch_simple_bayesian[5][32] = -2.24875; qual_mismatch_simple_bayesian[5][33] = -2.24899; qual_mismatch_simple_bayesian[5][34] = -2.24918; qual_mismatch_simple_bayesian[5][35] = -2.24933; qual_mismatch_simple_bayesian[5][36] = -2.24945; qual_mismatch_simple_bayesian[5][37] = -2.24954; qual_mismatch_simple_bayesian[5][38] = -2.24962; qual_mismatch_simple_bayesian[5][39] = -2.24967; qual_mismatch_simple_bayesian[5][40] = -2.24972; qual_mismatch_simple_bayesian[5][41] = -2.24976; qual_mismatch_simple_bayesian[5][42] = -2.24979; qual_mismatch_simple_bayesian[5][43] = -2.24981; qual_mismatch_simple_bayesian[5][44] = -2.24983; qual_mismatch_simple_bayesian[5][45] = -2.24985; qual_mismatch_simple_bayesian[5][46] = -2.24986; qual_mismatch_simple_bayesian[6][0] = -1.18606; qual_mismatch_simple_bayesian[6][1] = -1.34774; qual_mismatch_simple_bayesian[6][2] = -1.49786; qual_mismatch_simple_bayesian[6][3] = -1.63558; qual_mismatch_simple_bayesian[6][4] = -1.76033; qual_mismatch_simple_bayesian[6][5] = -1.87187; qual_mismatch_simple_bayesian[6][6] = -1.97029; qual_mismatch_simple_bayesian[6][7] = -2.05601; qual_mismatch_simple_bayesian[6][8] = -2.12976; qual_mismatch_simple_bayesian[6][9] = -2.19248; qual_mismatch_simple_bayesian[6][10] = -2.24527; qual_mismatch_simple_bayesian[6][11] = -2.28928; qual_mismatch_simple_bayesian[6][12] = -2.32567; qual_mismatch_simple_bayesian[6][13] = -2.35556; qual_mismatch_simple_bayesian[6][14] = -2.37995; qual_mismatch_simple_bayesian[6][15] = -2.39976; 
qual_mismatch_simple_bayesian[6][16] = -2.41577; qual_mismatch_simple_bayesian[6][17] = -2.42868; qual_mismatch_simple_bayesian[6][18] = -2.43906; qual_mismatch_simple_bayesian[6][19] = -2.44737; qual_mismatch_simple_bayesian[6][20] = -2.45403; qual_mismatch_simple_bayesian[6][21] = -2.45935; qual_mismatch_simple_bayesian[6][22] = -2.4636; qual_mismatch_simple_bayesian[6][23] = -2.46698; qual_mismatch_simple_bayesian[6][24] = -2.46968; qual_mismatch_simple_bayesian[6][25] = -2.47183; qual_mismatch_simple_bayesian[6][26] = -2.47353; qual_mismatch_simple_bayesian[6][27] = -2.47489; qual_mismatch_simple_bayesian[6][28] = -2.47598; qual_mismatch_simple_bayesian[6][29] = -2.47684; qual_mismatch_simple_bayesian[6][30] = -2.47752; qual_mismatch_simple_bayesian[6][31] = -2.47806; qual_mismatch_simple_bayesian[6][32] = -2.47849; qual_mismatch_simple_bayesian[6][33] = -2.47884; qual_mismatch_simple_bayesian[6][34] = -2.47911; qual_mismatch_simple_bayesian[6][35] = -2.47933; qual_mismatch_simple_bayesian[6][36] = -2.4795; qual_mismatch_simple_bayesian[6][37] = -2.47964; qual_mismatch_simple_bayesian[6][38] = -2.47974; qual_mismatch_simple_bayesian[6][39] = -2.47983; qual_mismatch_simple_bayesian[6][40] = -2.4799; qual_mismatch_simple_bayesian[6][41] = -2.47995; qual_mismatch_simple_bayesian[6][42] = -2.48; qual_mismatch_simple_bayesian[6][43] = -2.48003; qual_mismatch_simple_bayesian[6][44] = -2.48006; qual_mismatch_simple_bayesian[6][45] = -2.48008; qual_mismatch_simple_bayesian[6][46] = -2.4801; qual_mismatch_simple_bayesian[7][0] = -1.16744; qual_mismatch_simple_bayesian[7][1] = -1.34383; qual_mismatch_simple_bayesian[7][2] = -1.51016; qual_mismatch_simple_bayesian[7][3] = -1.66534; qual_mismatch_simple_bayesian[7][4] = -1.80845; qual_mismatch_simple_bayesian[7][5] = -1.93881; qual_mismatch_simple_bayesian[7][6] = -2.05601; qual_mismatch_simple_bayesian[7][7] = -2.16001; qual_mismatch_simple_bayesian[7][8] = -2.25109; qual_mismatch_simple_bayesian[7][9] = -2.32986; 
qual_mismatch_simple_bayesian[7][10] = -2.39718; qual_mismatch_simple_bayesian[7][11] = -2.45408; qual_mismatch_simple_bayesian[7][12] = -2.5017; qual_mismatch_simple_bayesian[7][13] = -2.54122; qual_mismatch_simple_bayesian[7][14] = -2.57376; qual_mismatch_simple_bayesian[7][15] = -2.60038; qual_mismatch_simple_bayesian[7][16] = -2.62204; qual_mismatch_simple_bayesian[7][17] = -2.63959; qual_mismatch_simple_bayesian[7][18] = -2.65376; qual_mismatch_simple_bayesian[7][19] = -2.66515; qual_mismatch_simple_bayesian[7][20] = -2.6743; qual_mismatch_simple_bayesian[7][21] = -2.68162; qual_mismatch_simple_bayesian[7][22] = -2.68748; qual_mismatch_simple_bayesian[7][23] = -2.69215; qual_mismatch_simple_bayesian[7][24] = -2.69588; qual_mismatch_simple_bayesian[7][25] = -2.69886; qual_mismatch_simple_bayesian[7][26] = -2.70122; qual_mismatch_simple_bayesian[7][27] = -2.70311; qual_mismatch_simple_bayesian[7][28] = -2.70461; qual_mismatch_simple_bayesian[7][29] = -2.7058; qual_mismatch_simple_bayesian[7][30] = -2.70675; qual_mismatch_simple_bayesian[7][31] = -2.7075; qual_mismatch_simple_bayesian[7][32] = -2.7081; qual_mismatch_simple_bayesian[7][33] = -2.70858; qual_mismatch_simple_bayesian[7][34] = -2.70896; qual_mismatch_simple_bayesian[7][35] = -2.70926; qual_mismatch_simple_bayesian[7][36] = -2.7095; qual_mismatch_simple_bayesian[7][37] = -2.70969; qual_mismatch_simple_bayesian[7][38] = -2.70984; qual_mismatch_simple_bayesian[7][39] = -2.70996; qual_mismatch_simple_bayesian[7][40] = -2.71005; qual_mismatch_simple_bayesian[7][41] = -2.71013; qual_mismatch_simple_bayesian[7][42] = -2.71019; qual_mismatch_simple_bayesian[7][43] = -2.71024; qual_mismatch_simple_bayesian[7][44] = -2.71028; qual_mismatch_simple_bayesian[7][45] = -2.71031; qual_mismatch_simple_bayesian[7][46] = -2.71033; qual_mismatch_simple_bayesian[8][0] = -1.15289; qual_mismatch_simple_bayesian[8][1] = -1.34073; qual_mismatch_simple_bayesian[8][2] = -1.52003; qual_mismatch_simple_bayesian[8][3] = -1.68963; 
qual_mismatch_simple_bayesian[8][4] = -1.8484; qual_mismatch_simple_bayesian[8][5] = -1.99536; qual_mismatch_simple_bayesian[8][6] = -2.12976; qual_mismatch_simple_bayesian[8][7] = -2.25109; qual_mismatch_simple_bayesian[8][8] = -2.3592; qual_mismatch_simple_bayesian[8][9] = -2.45427; qual_mismatch_simple_bayesian[8][10] = -2.5368; qual_mismatch_simple_bayesian[8][11] = -2.60759; qual_mismatch_simple_bayesian[8][12] = -2.66762; qual_mismatch_simple_bayesian[8][13] = -2.71801; qual_mismatch_simple_bayesian[8][14] = -2.75994; qual_mismatch_simple_bayesian[8][15] = -2.79454; qual_mismatch_simple_bayesian[8][16] = -2.8229; qual_mismatch_simple_bayesian[8][17] = -2.84602; qual_mismatch_simple_bayesian[8][18] = -2.86477; qual_mismatch_simple_bayesian[8][19] = -2.87992; qual_mismatch_simple_bayesian[8][20] = -2.89212; qual_mismatch_simple_bayesian[8][21] = -2.90191; qual_mismatch_simple_bayesian[8][22] = -2.90977; qual_mismatch_simple_bayesian[8][23] = -2.91605; qual_mismatch_simple_bayesian[8][24] = -2.92106; qual_mismatch_simple_bayesian[8][25] = -2.92507; qual_mismatch_simple_bayesian[8][26] = -2.92826; qual_mismatch_simple_bayesian[8][27] = -2.9308; qual_mismatch_simple_bayesian[8][28] = -2.93282; qual_mismatch_simple_bayesian[8][29] = -2.93444; qual_mismatch_simple_bayesian[8][30] = -2.93572; qual_mismatch_simple_bayesian[8][31] = -2.93674; qual_mismatch_simple_bayesian[8][32] = -2.93755; qual_mismatch_simple_bayesian[8][33] = -2.93819; qual_mismatch_simple_bayesian[8][34] = -2.9387; qual_mismatch_simple_bayesian[8][35] = -2.93911; qual_mismatch_simple_bayesian[8][36] = -2.93943; qual_mismatch_simple_bayesian[8][37] = -2.93969; qual_mismatch_simple_bayesian[8][38] = -2.93989; qual_mismatch_simple_bayesian[8][39] = -2.94005; qual_mismatch_simple_bayesian[8][40] = -2.94018; qual_mismatch_simple_bayesian[8][41] = -2.94029; qual_mismatch_simple_bayesian[8][42] = -2.94037; qual_mismatch_simple_bayesian[8][43] = -2.94043; qual_mismatch_simple_bayesian[8][44] = -2.94048; 
qual_mismatch_simple_bayesian[8][45] = -2.94052; qual_mismatch_simple_bayesian[8][46] = -2.94056; qual_mismatch_simple_bayesian[9][0] = -1.14148; qual_mismatch_simple_bayesian[9][1] = -1.33828; qual_mismatch_simple_bayesian[9][2] = -1.52795; qual_mismatch_simple_bayesian[9][3] = -1.70935; qual_mismatch_simple_bayesian[9][4] = -1.8813; qual_mismatch_simple_bayesian[9][5] = -2.04269; qual_mismatch_simple_bayesian[9][6] = -2.19248; qual_mismatch_simple_bayesian[9][7] = -2.32986; qual_mismatch_simple_bayesian[9][8] = -2.45427; qual_mismatch_simple_bayesian[9][9] = -2.56545; qual_mismatch_simple_bayesian[9][10] = -2.66352; qual_mismatch_simple_bayesian[9][11] = -2.74891; qual_mismatch_simple_bayesian[9][12] = -2.82235; qual_mismatch_simple_bayesian[9][13] = -2.8848; qual_mismatch_simple_bayesian[9][14] = -2.93733; qual_mismatch_simple_bayesian[9][15] = -2.98112; qual_mismatch_simple_bayesian[9][16] = -3.01733; qual_mismatch_simple_bayesian[9][17] = -3.04705; qual_mismatch_simple_bayesian[9][18] = -3.07131; qual_mismatch_simple_bayesian[9][19] = -3.09101; qual_mismatch_simple_bayesian[9][20] = -3.10693; qual_mismatch_simple_bayesian[9][21] = -3.11977; qual_mismatch_simple_bayesian[9][22] = -3.13008; qual_mismatch_simple_bayesian[9][23] = -3.13835; qual_mismatch_simple_bayesian[9][24] = -3.14496; qual_mismatch_simple_bayesian[9][25] = -3.15025; qual_mismatch_simple_bayesian[9][26] = -3.15447; qual_mismatch_simple_bayesian[9][27] = -3.15784; qual_mismatch_simple_bayesian[9][28] = -3.16052; qual_mismatch_simple_bayesian[9][29] = -3.16265; qual_mismatch_simple_bayesian[9][30] = -3.16435; qual_mismatch_simple_bayesian[9][31] = -3.1657; qual_mismatch_simple_bayesian[9][32] = -3.16678; qual_mismatch_simple_bayesian[9][33] = -3.16763; qual_mismatch_simple_bayesian[9][34] = -3.16831; qual_mismatch_simple_bayesian[9][35] = -3.16885; qual_mismatch_simple_bayesian[9][36] = -3.16928; qual_mismatch_simple_bayesian[9][37] = -3.16962; qual_mismatch_simple_bayesian[9][38] = -3.16989; 
qual_mismatch_simple_bayesian[9][39] = -3.17011; qual_mismatch_simple_bayesian[9][40] = -3.17028; qual_mismatch_simple_bayesian[9][41] = -3.17041; qual_mismatch_simple_bayesian[9][42] = -3.17052; qual_mismatch_simple_bayesian[9][43] = -3.17061; qual_mismatch_simple_bayesian[9][44] = -3.17068; qual_mismatch_simple_bayesian[9][45] = -3.17073; qual_mismatch_simple_bayesian[9][46] = -3.17077; qual_mismatch_simple_bayesian[10][0] = -1.13251; qual_mismatch_simple_bayesian[10][1] = -1.33634; qual_mismatch_simple_bayesian[10][2] = -1.53428; qual_mismatch_simple_bayesian[10][3] = -1.72529; qual_mismatch_simple_bayesian[10][4] = -1.90823; qual_mismatch_simple_bayesian[10][5] = -2.08194; qual_mismatch_simple_bayesian[10][6] = -2.24527; qual_mismatch_simple_bayesian[10][7] = -2.39718; qual_mismatch_simple_bayesian[10][8] = -2.5368; qual_mismatch_simple_bayesian[10][9] = -2.66352; qual_mismatch_simple_bayesian[10][10] = -2.77704; qual_mismatch_simple_bayesian[10][11] = -2.87741; qual_mismatch_simple_bayesian[10][12] = -2.96499; qual_mismatch_simple_bayesian[10][13] = -3.04048; qual_mismatch_simple_bayesian[10][14] = -3.10478; qual_mismatch_simple_bayesian[10][15] = -3.15899; qual_mismatch_simple_bayesian[10][16] = -3.20424; qual_mismatch_simple_bayesian[10][17] = -3.2417; qual_mismatch_simple_bayesian[10][18] = -3.27249; qual_mismatch_simple_bayesian[10][19] = -3.29764; qual_mismatch_simple_bayesian[10][20] = -3.31808; qual_mismatch_simple_bayesian[10][21] = -3.33462; qual_mismatch_simple_bayesian[10][22] = -3.34796; qual_mismatch_simple_bayesian[10][23] = -3.35868; qual_mismatch_simple_bayesian[10][24] = -3.36728; qual_mismatch_simple_bayesian[10][25] = -3.37416; qual_mismatch_simple_bayesian[10][26] = -3.37966; qual_mismatch_simple_bayesian[10][27] = -3.38405; qual_mismatch_simple_bayesian[10][28] = -3.38756; qual_mismatch_simple_bayesian[10][29] = -3.39035; qual_mismatch_simple_bayesian[10][30] = -3.39257; qual_mismatch_simple_bayesian[10][31] = -3.39434; 
qual_mismatch_simple_bayesian[10][32] = -3.39574; qual_mismatch_simple_bayesian[10][33] = -3.39686; qual_mismatch_simple_bayesian[10][34] = -3.39775; qual_mismatch_simple_bayesian[10][35] = -3.39846; qual_mismatch_simple_bayesian[10][36] = -3.39902; qual_mismatch_simple_bayesian[10][37] = -3.39947; qual_mismatch_simple_bayesian[10][38] = -3.39982; qual_mismatch_simple_bayesian[10][39] = -3.40011; qual_mismatch_simple_bayesian[10][40] = -3.40033; qual_mismatch_simple_bayesian[10][41] = -3.40051; qual_mismatch_simple_bayesian[10][42] = -3.40065; qual_mismatch_simple_bayesian[10][43] = -3.40076; qual_mismatch_simple_bayesian[10][44] = -3.40085; qual_mismatch_simple_bayesian[10][45] = -3.40092; qual_mismatch_simple_bayesian[10][46] = -3.40098; qual_mismatch_simple_bayesian[11][0] = -1.12545; qual_mismatch_simple_bayesian[11][1] = -1.3348; qual_mismatch_simple_bayesian[11][2] = -1.53934; qual_mismatch_simple_bayesian[11][3] = -1.73814; qual_mismatch_simple_bayesian[11][4] = -1.93016; qual_mismatch_simple_bayesian[11][5] = -2.11426; qual_mismatch_simple_bayesian[11][6] = -2.28928; qual_mismatch_simple_bayesian[11][7] = -2.45408; qual_mismatch_simple_bayesian[11][8] = -2.60759; qual_mismatch_simple_bayesian[11][9] = -2.74891; qual_mismatch_simple_bayesian[11][10] = -2.87741; qual_mismatch_simple_bayesian[11][11] = -2.99272; qual_mismatch_simple_bayesian[11][12] = -3.09485; qual_mismatch_simple_bayesian[11][13] = -3.18412; qual_mismatch_simple_bayesian[11][14] = -3.2612; qual_mismatch_simple_bayesian[11][15] = -3.32696; qual_mismatch_simple_bayesian[11][16] = -3.38246; qual_mismatch_simple_bayesian[11][17] = -3.42885; qual_mismatch_simple_bayesian[11][18] = -3.4673; qual_mismatch_simple_bayesian[11][19] = -3.49893; qual_mismatch_simple_bayesian[11][20] = -3.52479; qual_mismatch_simple_bayesian[11][21] = -3.54582; qual_mismatch_simple_bayesian[11][22] = -3.56284; qual_mismatch_simple_bayesian[11][23] = -3.57658; qual_mismatch_simple_bayesian[11][24] = -3.58762; 
qual_mismatch_simple_bayesian[11][25] = -3.59648; qual_mismatch_simple_bayesian[11][26] = -3.60357; qual_mismatch_simple_bayesian[11][27] = -3.60925; qual_mismatch_simple_bayesian[11][28] = -3.61377; qual_mismatch_simple_bayesian[11][29] = -3.61738; qual_mismatch_simple_bayesian[11][30] = -3.62026; qual_mismatch_simple_bayesian[11][31] = -3.62255; qual_mismatch_simple_bayesian[11][32] = -3.62438; qual_mismatch_simple_bayesian[11][33] = -3.62583; qual_mismatch_simple_bayesian[11][34] = -3.62698; qual_mismatch_simple_bayesian[11][35] = -3.6279; qual_mismatch_simple_bayesian[11][36] = -3.62863; qual_mismatch_simple_bayesian[11][37] = -3.62921; qual_mismatch_simple_bayesian[11][38] = -3.62967; qual_mismatch_simple_bayesian[11][39] = -3.63004; qual_mismatch_simple_bayesian[11][40] = -3.63033; qual_mismatch_simple_bayesian[11][41] = -3.63056; qual_mismatch_simple_bayesian[11][42] = -3.63075; qual_mismatch_simple_bayesian[11][43] = -3.63089; qual_mismatch_simple_bayesian[11][44] = -3.63101; qual_mismatch_simple_bayesian[11][45] = -3.6311; qual_mismatch_simple_bayesian[11][46] = -3.63117; qual_mismatch_simple_bayesian[12][0] = -1.11987; qual_mismatch_simple_bayesian[12][1] = -1.33358; qual_mismatch_simple_bayesian[12][2] = -1.54338; qual_mismatch_simple_bayesian[12][3] = -1.74847; qual_mismatch_simple_bayesian[12][4] = -1.94792; qual_mismatch_simple_bayesian[12][5] = -2.14069; qual_mismatch_simple_bayesian[12][6] = -2.32567; qual_mismatch_simple_bayesian[12][7] = -2.5017; qual_mismatch_simple_bayesian[12][8] = -2.66762; qual_mismatch_simple_bayesian[12][9] = -2.82235; qual_mismatch_simple_bayesian[12][10] = -2.96499; qual_mismatch_simple_bayesian[12][11] = -3.09485; qual_mismatch_simple_bayesian[12][12] = -3.21154; qual_mismatch_simple_bayesian[12][13] = -3.31504; qual_mismatch_simple_bayesian[12][14] = -3.40563; qual_mismatch_simple_bayesian[12][15] = -3.48395; qual_mismatch_simple_bayesian[12][16] = -3.55084; qual_mismatch_simple_bayesian[12][17] = -3.60736; 
qual_mismatch_simple_bayesian[12][18] = -3.65465; qual_mismatch_simple_bayesian[12][19] = -3.69388; qual_mismatch_simple_bayesian[12][20] = -3.72617; qual_mismatch_simple_bayesian[12][21] = -3.75259; qual_mismatch_simple_bayesian[12][22] = -3.77408; qual_mismatch_simple_bayesian[12][23] = -3.79149; qual_mismatch_simple_bayesian[12][24] = -3.80553; qual_mismatch_simple_bayesian[12][25] = -3.81683; qual_mismatch_simple_bayesian[12][26] = -3.8259; qual_mismatch_simple_bayesian[12][27] = -3.83316; qual_mismatch_simple_bayesian[12][28] = -3.83897; qual_mismatch_simple_bayesian[12][29] = -3.84361; qual_mismatch_simple_bayesian[12][30] = -3.8473; qual_mismatch_simple_bayesian[12][31] = -3.85025; qual_mismatch_simple_bayesian[12][32] = -3.8526; qual_mismatch_simple_bayesian[12][33] = -3.85447; qual_mismatch_simple_bayesian[12][34] = -3.85595; qual_mismatch_simple_bayesian[12][35] = -3.85713; qual_mismatch_simple_bayesian[12][36] = -3.85807; qual_mismatch_simple_bayesian[12][37] = -3.85882; qual_mismatch_simple_bayesian[12][38] = -3.85942; qual_mismatch_simple_bayesian[12][39] = -3.85989; qual_mismatch_simple_bayesian[12][40] = -3.86026; qual_mismatch_simple_bayesian[12][41] = -3.86056; qual_mismatch_simple_bayesian[12][42] = -3.8608; qual_mismatch_simple_bayesian[12][43] = -3.86099; qual_mismatch_simple_bayesian[12][44] = -3.86114; qual_mismatch_simple_bayesian[12][45] = -3.86126; qual_mismatch_simple_bayesian[12][46] = -3.86135; qual_mismatch_simple_bayesian[13][0] = -1.11546; qual_mismatch_simple_bayesian[13][1] = -1.33261; qual_mismatch_simple_bayesian[13][2] = -1.5466; qual_mismatch_simple_bayesian[13][3] = -1.75675; qual_mismatch_simple_bayesian[13][4] = -1.96226; qual_mismatch_simple_bayesian[13][5] = -2.1622; qual_mismatch_simple_bayesian[13][6] = -2.35556; qual_mismatch_simple_bayesian[13][7] = -2.54122; qual_mismatch_simple_bayesian[13][8] = -2.71801; qual_mismatch_simple_bayesian[13][9] = -2.8848; qual_mismatch_simple_bayesian[13][10] = -3.04048; 
qual_mismatch_simple_bayesian[13][11] = -3.18412; qual_mismatch_simple_bayesian[13][12] = -3.31504; qual_mismatch_simple_bayesian[13][13] = -3.43281; qual_mismatch_simple_bayesian[13][14] = -3.53737; qual_mismatch_simple_bayesian[13][15] = -3.629; qual_mismatch_simple_bayesian[13][16] = -3.70828; qual_mismatch_simple_bayesian[13][17] = -3.77607; qual_mismatch_simple_bayesian[13][18] = -3.83339; qual_mismatch_simple_bayesian[13][19] = -3.88139; qual_mismatch_simple_bayesian[13][20] = -3.92122; qual_mismatch_simple_bayesian[13][21] = -3.95404; qual_mismatch_simple_bayesian[13][22] = -3.9809; qual_mismatch_simple_bayesian[13][23] = -4.00276; qual_mismatch_simple_bayesian[13][24] = -4.02047; qual_mismatch_simple_bayesian[13][25] = -4.03476; qual_mismatch_simple_bayesian[13][26] = -4.04626; qual_mismatch_simple_bayesian[13][27] = -4.0555; qual_mismatch_simple_bayesian[13][28] = -4.06289; qual_mismatch_simple_bayesian[13][29] = -4.0688; qual_mismatch_simple_bayesian[13][30] = -4.07352; qual_mismatch_simple_bayesian[13][31] = -4.07729; qual_mismatch_simple_bayesian[13][32] = -4.08029; qual_mismatch_simple_bayesian[13][33] = -4.08268; qual_mismatch_simple_bayesian[13][34] = -4.08459; qual_mismatch_simple_bayesian[13][35] = -4.0861; qual_mismatch_simple_bayesian[13][36] = -4.08731; qual_mismatch_simple_bayesian[13][37] = -4.08826; qual_mismatch_simple_bayesian[13][38] = -4.08903; qual_mismatch_simple_bayesian[13][39] = -4.08963; qual_mismatch_simple_bayesian[13][40] = -4.09011; qual_mismatch_simple_bayesian[13][41] = -4.0905; qual_mismatch_simple_bayesian[13][42] = -4.0908; qual_mismatch_simple_bayesian[13][43] = -4.09104; qual_mismatch_simple_bayesian[13][44] = -4.09123; qual_mismatch_simple_bayesian[13][45] = -4.09138; qual_mismatch_simple_bayesian[13][46] = -4.09151; qual_mismatch_simple_bayesian[14][0] = -1.11197; qual_mismatch_simple_bayesian[14][1] = -1.33184; qual_mismatch_simple_bayesian[14][2] = -1.54916; qual_mismatch_simple_bayesian[14][3] = -1.76338; 
qual_mismatch_simple_bayesian[14][4] = -1.97379; qual_mismatch_simple_bayesian[14][5] = -2.17962; qual_mismatch_simple_bayesian[14][6] = -2.37995; qual_mismatch_simple_bayesian[14][7] = -2.57376; qual_mismatch_simple_bayesian[14][8] = -2.75994; qual_mismatch_simple_bayesian[14][9] = -2.93733; qual_mismatch_simple_bayesian[14][10] = -3.10478; qual_mismatch_simple_bayesian[14][11] = -3.2612; qual_mismatch_simple_bayesian[14][12] = -3.40563; qual_mismatch_simple_bayesian[14][13] = -3.53737; qual_mismatch_simple_bayesian[14][14] = -3.65598; qual_mismatch_simple_bayesian[14][15] = -3.76138; qual_mismatch_simple_bayesian[14][16] = -3.85381; qual_mismatch_simple_bayesian[14][17] = -3.93386; qual_mismatch_simple_bayesian[14][18] = -4.00234; qual_mismatch_simple_bayesian[14][19] = -4.0603; qual_mismatch_simple_bayesian[14][20] = -4.10885; qual_mismatch_simple_bayesian[14][21] = -4.14917; qual_mismatch_simple_bayesian[14][22] = -4.1824; qual_mismatch_simple_bayesian[14][23] = -4.20961; qual_mismatch_simple_bayesian[14][24] = -4.23176; qual_mismatch_simple_bayesian[14][25] = -4.24971; qual_mismatch_simple_bayesian[14][26] = -4.2642; qual_mismatch_simple_bayesian[14][27] = -4.27586; qual_mismatch_simple_bayesian[14][28] = -4.28523; qual_mismatch_simple_bayesian[14][29] = -4.29273; qual_mismatch_simple_bayesian[14][30] = -4.29872; qual_mismatch_simple_bayesian[14][31] = -4.30351; qual_mismatch_simple_bayesian[14][32] = -4.30734; qual_mismatch_simple_bayesian[14][33] = -4.31038; qual_mismatch_simple_bayesian[14][34] = -4.31281; qual_mismatch_simple_bayesian[14][35] = -4.31474; qual_mismatch_simple_bayesian[14][36] = -4.31627; qual_mismatch_simple_bayesian[14][37] = -4.3175; qual_mismatch_simple_bayesian[14][38] = -4.31847; qual_mismatch_simple_bayesian[14][39] = -4.31924; qual_mismatch_simple_bayesian[14][40] = -4.31986; qual_mismatch_simple_bayesian[14][41] = -4.32034; qual_mismatch_simple_bayesian[14][42] = -4.32073; qual_mismatch_simple_bayesian[14][43] = -4.32104; 
qual_mismatch_simple_bayesian[14][44] = -4.32128; qual_mismatch_simple_bayesian[14][45] = -4.32148; qual_mismatch_simple_bayesian[14][46] = -4.32163; qual_mismatch_simple_bayesian[15][0] = -1.10921; qual_mismatch_simple_bayesian[15][1] = -1.33123; qual_mismatch_simple_bayesian[15][2] = -1.55121; qual_mismatch_simple_bayesian[15][3] = -1.76867; qual_mismatch_simple_bayesian[15][4] = -1.98305; qual_mismatch_simple_bayesian[15][5] = -2.19368; qual_mismatch_simple_bayesian[15][6] = -2.39976; qual_mismatch_simple_bayesian[15][7] = -2.60038; qual_mismatch_simple_bayesian[15][8] = -2.79454; qual_mismatch_simple_bayesian[15][9] = -2.98112; qual_mismatch_simple_bayesian[15][10] = -3.15899; qual_mismatch_simple_bayesian[15][11] = -3.32696; qual_mismatch_simple_bayesian[15][12] = -3.48395; qual_mismatch_simple_bayesian[15][13] = -3.629; qual_mismatch_simple_bayesian[15][14] = -3.76138; qual_mismatch_simple_bayesian[15][15] = -3.88065; qual_mismatch_simple_bayesian[15][16] = -3.9867; qual_mismatch_simple_bayesian[15][17] = -4.07977; qual_mismatch_simple_bayesian[15][18] = -4.16041; qual_mismatch_simple_bayesian[15][19] = -4.22945; qual_mismatch_simple_bayesian[15][20] = -4.2879; qual_mismatch_simple_bayesian[15][21] = -4.3369; qual_mismatch_simple_bayesian[15][22] = -4.3776; qual_mismatch_simple_bayesian[15][23] = -4.41116; qual_mismatch_simple_bayesian[15][24] = -4.43864; qual_mismatch_simple_bayesian[15][25] = -4.46102; qual_mismatch_simple_bayesian[15][26] = -4.47916; qual_mismatch_simple_bayesian[15][27] = -4.49381; qual_mismatch_simple_bayesian[15][28] = -4.5056; qual_mismatch_simple_bayesian[15][29] = -4.51507; qual_mismatch_simple_bayesian[15][30] = -4.52265; qual_mismatch_simple_bayesian[15][31] = -4.52872; qual_mismatch_simple_bayesian[15][32] = -4.53356; qual_mismatch_simple_bayesian[15][33] = -4.53742; qual_mismatch_simple_bayesian[15][34] = -4.5405; qual_mismatch_simple_bayesian[15][35] = -4.54296; qual_mismatch_simple_bayesian[15][36] = -4.54491; 
qual_mismatch_simple_bayesian[15][37] = -4.54646; qual_mismatch_simple_bayesian[15][38] = -4.5477; qual_mismatch_simple_bayesian[15][39] = -4.54868; qual_mismatch_simple_bayesian[15][40] = -4.54947; qual_mismatch_simple_bayesian[15][41] = -4.55009; qual_mismatch_simple_bayesian[15][42] = -4.55058; qual_mismatch_simple_bayesian[15][43] = -4.55097; qual_mismatch_simple_bayesian[15][44] = -4.55128; qual_mismatch_simple_bayesian[15][45] = -4.55153; qual_mismatch_simple_bayesian[15][46] = -4.55173; qual_mismatch_simple_bayesian[16][0] = -1.10702; qual_mismatch_simple_bayesian[16][1] = -1.33074; qual_mismatch_simple_bayesian[16][2] = -1.55283; qual_mismatch_simple_bayesian[16][3] = -1.7729; qual_mismatch_simple_bayesian[16][4] = -1.99047; qual_mismatch_simple_bayesian[16][5] = -2.20499; qual_mismatch_simple_bayesian[16][6] = -2.41577; qual_mismatch_simple_bayesian[16][7] = -2.62204; qual_mismatch_simple_bayesian[16][8] = -2.8229; qual_mismatch_simple_bayesian[16][9] = -3.01733; qual_mismatch_simple_bayesian[16][10] = -3.20424; qual_mismatch_simple_bayesian[16][11] = -3.38246; qual_mismatch_simple_bayesian[16][12] = -3.55084; qual_mismatch_simple_bayesian[16][13] = -3.70828; qual_mismatch_simple_bayesian[16][14] = -3.85381; qual_mismatch_simple_bayesian[16][15] = -3.9867; qual_mismatch_simple_bayesian[16][16] = -4.10649; qual_mismatch_simple_bayesian[16][17] = -4.21306; qual_mismatch_simple_bayesian[16][18] = -4.30662; qual_mismatch_simple_bayesian[16][19] = -4.38774; qual_mismatch_simple_bayesian[16][20] = -4.45721; qual_mismatch_simple_bayesian[16][21] = -4.51606; qual_mismatch_simple_bayesian[16][22] = -4.5654; qual_mismatch_simple_bayesian[16][23] = -4.60641; qual_mismatch_simple_bayesian[16][24] = -4.64022; qual_mismatch_simple_bayesian[16][25] = -4.66792; qual_mismatch_simple_bayesian[16][26] = -4.69049; qual_mismatch_simple_bayesian[16][27] = -4.70878; qual_mismatch_simple_bayesian[16][28] = -4.72355; qual_mismatch_simple_bayesian[16][29] = -4.73544; 
qual_mismatch_simple_bayesian[16][30] = -4.74499; qual_mismatch_simple_bayesian[16][31] = -4.75264; qual_mismatch_simple_bayesian[16][32] = -4.75876; qual_mismatch_simple_bayesian[16][33] = -4.76365; qual_mismatch_simple_bayesian[16][34] = -4.76755; qual_mismatch_simple_bayesian[16][35] = -4.77065; qual_mismatch_simple_bayesian[16][36] = -4.77313; qual_mismatch_simple_bayesian[16][37] = -4.7751; qual_mismatch_simple_bayesian[16][38] = -4.77667; qual_mismatch_simple_bayesian[16][39] = -4.77792; qual_mismatch_simple_bayesian[16][40] = -4.77891; qual_mismatch_simple_bayesian[16][41] = -4.7797; qual_mismatch_simple_bayesian[16][42] = -4.78032; qual_mismatch_simple_bayesian[16][43] = -4.78082; qual_mismatch_simple_bayesian[16][44] = -4.78122; qual_mismatch_simple_bayesian[16][45] = -4.78153; qual_mismatch_simple_bayesian[16][46] = -4.78178; qual_mismatch_simple_bayesian[17][0] = -1.10529; qual_mismatch_simple_bayesian[17][1] = -1.33036; qual_mismatch_simple_bayesian[17][2] = -1.55412; qual_mismatch_simple_bayesian[17][3] = -1.77627; qual_mismatch_simple_bayesian[17][4] = -1.9964; qual_mismatch_simple_bayesian[17][5] = -2.21406; qual_mismatch_simple_bayesian[17][6] = -2.42868; qual_mismatch_simple_bayesian[17][7] = -2.63959; qual_mismatch_simple_bayesian[17][8] = -2.84602; qual_mismatch_simple_bayesian[17][9] = -3.04705; qual_mismatch_simple_bayesian[17][10] = -3.2417; qual_mismatch_simple_bayesian[17][11] = -3.42885; qual_mismatch_simple_bayesian[17][12] = -3.60736; qual_mismatch_simple_bayesian[17][13] = -3.77607; qual_mismatch_simple_bayesian[17][14] = -3.93386; qual_mismatch_simple_bayesian[17][15] = -4.07977; qual_mismatch_simple_bayesian[17][16] = -4.21306; qual_mismatch_simple_bayesian[17][17] = -4.33325; qual_mismatch_simple_bayesian[17][18] = -4.44022; qual_mismatch_simple_bayesian[17][19] = -4.53419; qual_mismatch_simple_bayesian[17][20] = -4.61567; qual_mismatch_simple_bayesian[17][21] = -4.68549; qual_mismatch_simple_bayesian[17][22] = -4.74465; 
qual_mismatch_simple_bayesian[17][23] = -4.79427; qual_mismatch_simple_bayesian[17][24] = -4.83552; qual_mismatch_simple_bayesian[17][25] = -4.86954; qual_mismatch_simple_bayesian[17][26] = -4.89741; qual_mismatch_simple_bayesian[17][27] = -4.92012; qual_mismatch_simple_bayesian[17][28] = -4.93853; qual_mismatch_simple_bayesian[17][29] = -4.9534; qual_mismatch_simple_bayesian[17][30] = -4.96537; qual_mismatch_simple_bayesian[17][31] = -4.97499; qual_mismatch_simple_bayesian[17][32] = -4.98269; qual_mismatch_simple_bayesian[17][33] = -4.98885; qual_mismatch_simple_bayesian[17][34] = -4.99377; qual_mismatch_simple_bayesian[17][35] = -4.9977; qual_mismatch_simple_bayesian[17][36] = -5.00083; qual_mismatch_simple_bayesian[17][37] = -5.00332; qual_mismatch_simple_bayesian[17][38] = -5.0053; qual_mismatch_simple_bayesian[17][39] = -5.00688; qual_mismatch_simple_bayesian[17][40] = -5.00814; qual_mismatch_simple_bayesian[17][41] = -5.00914; qual_mismatch_simple_bayesian[17][42] = -5.00993; qual_mismatch_simple_bayesian[17][43] = -5.01056; qual_mismatch_simple_bayesian[17][44] = -5.01107; qual_mismatch_simple_bayesian[17][45] = -5.01147; qual_mismatch_simple_bayesian[17][46] = -5.01178; qual_mismatch_simple_bayesian[18][0] = -1.10391; qual_mismatch_simple_bayesian[18][1] = -1.33005; qual_mismatch_simple_bayesian[18][2] = -1.55515; qual_mismatch_simple_bayesian[18][3] = -1.77895; qual_mismatch_simple_bayesian[18][4] = -2.00114; qual_mismatch_simple_bayesian[18][5] = -2.22133; qual_mismatch_simple_bayesian[18][6] = -2.43906; qual_mismatch_simple_bayesian[18][7] = -2.65376; qual_mismatch_simple_bayesian[18][8] = -2.86477; qual_mismatch_simple_bayesian[18][9] = -3.07131; qual_mismatch_simple_bayesian[18][10] = -3.27249; qual_mismatch_simple_bayesian[18][11] = -3.4673; qual_mismatch_simple_bayesian[18][12] = -3.65465; qual_mismatch_simple_bayesian[18][13] = -3.83339; qual_mismatch_simple_bayesian[18][14] = -4.00234; qual_mismatch_simple_bayesian[18][15] = -4.16041; 
qual_mismatch_simple_bayesian[18][16] = -4.30662; qual_mismatch_simple_bayesian[18][17] = -4.44022; qual_mismatch_simple_bayesian[18][18] = -4.56074; qual_mismatch_simple_bayesian[18][19] = -4.66803; qual_mismatch_simple_bayesian[18][20] = -4.76231; qual_mismatch_simple_bayesian[18][21] = -4.84409; qual_mismatch_simple_bayesian[18][22] = -4.91418; qual_mismatch_simple_bayesian[18][23] = -4.97359; qual_mismatch_simple_bayesian[18][24] = -5.02342; qual_mismatch_simple_bayesian[18][25] = -5.06486; qual_mismatch_simple_bayesian[18][26] = -5.09904; qual_mismatch_simple_bayesian[18][27] = -5.12706; qual_mismatch_simple_bayesian[18][28] = -5.14988; qual_mismatch_simple_bayesian[18][29] = -5.16839; qual_mismatch_simple_bayesian[18][30] = -5.18334; qual_mismatch_simple_bayesian[18][31] = -5.19537; qual_mismatch_simple_bayesian[18][32] = -5.20504; qual_mismatch_simple_bayesian[18][33] = -5.21278; qual_mismatch_simple_bayesian[18][34] = -5.21897; qual_mismatch_simple_bayesian[18][35] = -5.22392; qual_mismatch_simple_bayesian[18][36] = -5.22787; qual_mismatch_simple_bayesian[18][37] = -5.23102; qual_mismatch_simple_bayesian[18][38] = -5.23352; qual_mismatch_simple_bayesian[18][39] = -5.23552; qual_mismatch_simple_bayesian[18][40] = -5.23711; qual_mismatch_simple_bayesian[18][41] = -5.23837; qual_mismatch_simple_bayesian[18][42] = -5.23938; qual_mismatch_simple_bayesian[18][43] = -5.24017; qual_mismatch_simple_bayesian[18][44] = -5.24081; qual_mismatch_simple_bayesian[18][45] = -5.24131; qual_mismatch_simple_bayesian[18][46] = -5.24172; qual_mismatch_simple_bayesian[19][0] = -1.10282; qual_mismatch_simple_bayesian[19][1] = -1.32981; qual_mismatch_simple_bayesian[19][2] = -1.55597; qual_mismatch_simple_bayesian[19][3] = -1.78109; qual_mismatch_simple_bayesian[19][4] = -2.00492; qual_mismatch_simple_bayesian[19][5] = -2.22714; qual_mismatch_simple_bayesian[19][6] = -2.44737; qual_mismatch_simple_bayesian[19][7] = -2.66515; qual_mismatch_simple_bayesian[19][8] = -2.87992; 
qual_mismatch_simple_bayesian[19][9] = -3.09101; qual_mismatch_simple_bayesian[19][10] = -3.29764; qual_mismatch_simple_bayesian[19][11] = -3.49893; qual_mismatch_simple_bayesian[19][12] = -3.69388; qual_mismatch_simple_bayesian[19][13] = -3.88139; qual_mismatch_simple_bayesian[19][14] = -4.0603; qual_mismatch_simple_bayesian[19][15] = -4.22945; qual_mismatch_simple_bayesian[19][16] = -4.38774; qual_mismatch_simple_bayesian[19][17] = -4.53419; qual_mismatch_simple_bayesian[19][18] = -4.66803; qual_mismatch_simple_bayesian[19][19] = -4.78881; qual_mismatch_simple_bayesian[19][20] = -4.89635; qual_mismatch_simple_bayesian[19][21] = -4.99087; qual_mismatch_simple_bayesian[19][22] = -5.07289; qual_mismatch_simple_bayesian[19][23] = -5.1432; qual_mismatch_simple_bayesian[19][24] = -5.2028; qual_mismatch_simple_bayesian[19][25] = -5.25281; qual_mismatch_simple_bayesian[19][26] = -5.29439; qual_mismatch_simple_bayesian[19][27] = -5.32871; qual_mismatch_simple_bayesian[19][28] = -5.35683; qual_mismatch_simple_bayesian[19][29] = -5.37974; qual_mismatch_simple_bayesian[19][30] = -5.39832; qual_mismatch_simple_bayesian[19][31] = -5.41334; qual_mismatch_simple_bayesian[19][32] = -5.42542; qual_mismatch_simple_bayesian[19][33] = -5.43513; qual_mismatch_simple_bayesian[19][34] = -5.44291; qual_mismatch_simple_bayesian[19][35] = -5.44913; qual_mismatch_simple_bayesian[19][36] = -5.4541; qual_mismatch_simple_bayesian[19][37] = -5.45806; qual_mismatch_simple_bayesian[19][38] = -5.46122; qual_mismatch_simple_bayesian[19][39] = -5.46374; qual_mismatch_simple_bayesian[19][40] = -5.46574; qual_mismatch_simple_bayesian[19][41] = -5.46734; qual_mismatch_simple_bayesian[19][42] = -5.46861; qual_mismatch_simple_bayesian[19][43] = -5.46962; qual_mismatch_simple_bayesian[19][44] = -5.47042; qual_mismatch_simple_bayesian[19][45] = -5.47106; qual_mismatch_simple_bayesian[19][46] = -5.47156; qual_mismatch_simple_bayesian[20][0] = -1.10195; qual_mismatch_simple_bayesian[20][1] = -1.32962; 
qual_mismatch_simple_bayesian[20][2] = -1.55662; qual_mismatch_simple_bayesian[20][3] = -1.78279; qual_mismatch_simple_bayesian[20][4] = -2.00793; qual_mismatch_simple_bayesian[20][5] = -2.23178; qual_mismatch_simple_bayesian[20][6] = -2.45403; qual_mismatch_simple_bayesian[20][7] = -2.6743; qual_mismatch_simple_bayesian[20][8] = -2.89212; qual_mismatch_simple_bayesian[20][9] = -3.10693; qual_mismatch_simple_bayesian[20][10] = -3.31808; qual_mismatch_simple_bayesian[20][11] = -3.52479; qual_mismatch_simple_bayesian[20][12] = -3.72617; qual_mismatch_simple_bayesian[20][13] = -3.92122; qual_mismatch_simple_bayesian[20][14] = -4.10885; qual_mismatch_simple_bayesian[20][15] = -4.2879; qual_mismatch_simple_bayesian[20][16] = -4.45721; qual_mismatch_simple_bayesian[20][17] = -4.61567; qual_mismatch_simple_bayesian[20][18] = -4.76231; qual_mismatch_simple_bayesian[20][19] = -4.89635; qual_mismatch_simple_bayesian[20][20] = -5.01732; qual_mismatch_simple_bayesian[20][21] = -5.12507; qual_mismatch_simple_bayesian[20][22] = -5.21979; qual_mismatch_simple_bayesian[20][23] = -5.30199; qual_mismatch_simple_bayesian[20][24] = -5.37247; qual_mismatch_simple_bayesian[20][25] = -5.43222; qual_mismatch_simple_bayesian[20][26] = -5.48237; qual_mismatch_simple_bayesian[20][27] = -5.52408; qual_mismatch_simple_bayesian[20][28] = -5.55849; qual_mismatch_simple_bayesian[20][29] = -5.5867; qual_mismatch_simple_bayesian[20][30] = -5.60969; qual_mismatch_simple_bayesian[20][31] = -5.62833; qual_mismatch_simple_bayesian[20][32] = -5.64339; qual_mismatch_simple_bayesian[20][33] = -5.65552; qual_mismatch_simple_bayesian[20][34] = -5.66525; qual_mismatch_simple_bayesian[20][35] = -5.67306; qual_mismatch_simple_bayesian[20][36] = -5.6793; qual_mismatch_simple_bayesian[20][37] = -5.68429; qual_mismatch_simple_bayesian[20][38] = -5.68827; qual_mismatch_simple_bayesian[20][39] = -5.69144; qual_mismatch_simple_bayesian[20][40] = -5.69396; qual_mismatch_simple_bayesian[20][41] = -5.69598; 
qual_mismatch_simple_bayesian[20][42] = -5.69758; qual_mismatch_simple_bayesian[20][43] = -5.69885; qual_mismatch_simple_bayesian[20][44] = -5.69986; qual_mismatch_simple_bayesian[20][45] = -5.70067; qual_mismatch_simple_bayesian[20][46] = -5.70131; qual_mismatch_simple_bayesian[21][0] = -1.10126; qual_mismatch_simple_bayesian[21][1] = -1.32946; qual_mismatch_simple_bayesian[21][2] = -1.55713; qual_mismatch_simple_bayesian[21][3] = -1.78414; qual_mismatch_simple_bayesian[21][4] = -2.01033; qual_mismatch_simple_bayesian[21][5] = -2.23548; qual_mismatch_simple_bayesian[21][6] = -2.45935; qual_mismatch_simple_bayesian[21][7] = -2.68162; qual_mismatch_simple_bayesian[21][8] = -2.90191; qual_mismatch_simple_bayesian[21][9] = -3.11977; qual_mismatch_simple_bayesian[21][10] = -3.33462; qual_mismatch_simple_bayesian[21][11] = -3.54582; qual_mismatch_simple_bayesian[21][12] = -3.75259; qual_mismatch_simple_bayesian[21][13] = -3.95404; qual_mismatch_simple_bayesian[21][14] = -4.14917; qual_mismatch_simple_bayesian[21][15] = -4.3369; qual_mismatch_simple_bayesian[21][16] = -4.51606; qual_mismatch_simple_bayesian[21][17] = -4.68549; qual_mismatch_simple_bayesian[21][18] = -4.84409; qual_mismatch_simple_bayesian[21][19] = -4.99087; qual_mismatch_simple_bayesian[21][20] = -5.12507; qual_mismatch_simple_bayesian[21][21] = -5.2462; qual_mismatch_simple_bayesian[21][22] = -5.35411; qual_mismatch_simple_bayesian[21][23] = -5.44898; qual_mismatch_simple_bayesian[21][24] = -5.53133; qual_mismatch_simple_bayesian[21][25] = -5.60194; qual_mismatch_simple_bayesian[21][26] = -5.66182; qual_mismatch_simple_bayesian[21][27] = -5.71208; qual_mismatch_simple_bayesian[21][28] = -5.75388; qual_mismatch_simple_bayesian[21][29] = -5.78837; qual_mismatch_simple_bayesian[21][30] = -5.81665; qual_mismatch_simple_bayesian[21][31] = -5.83969; qual_mismatch_simple_bayesian[21][32] = -5.85838; qual_mismatch_simple_bayesian[21][33] = -5.87348; qual_mismatch_simple_bayesian[21][34] = -5.88564; 
qual_mismatch_simple_bayesian[21][35] = -5.89541; qual_mismatch_simple_bayesian[21][36] = -5.90323; qual_mismatch_simple_bayesian[21][37] = -5.90949; qual_mismatch_simple_bayesian[21][38] = -5.91449; qual_mismatch_simple_bayesian[21][39] = -5.91848; qual_mismatch_simple_bayesian[21][40] = -5.92166; qual_mismatch_simple_bayesian[21][41] = -5.9242; qual_mismatch_simple_bayesian[21][42] = -5.92621; qual_mismatch_simple_bayesian[21][43] = -5.92782; qual_mismatch_simple_bayesian[21][44] = -5.92909; qual_mismatch_simple_bayesian[21][45] = -5.93011; qual_mismatch_simple_bayesian[21][46] = -5.93092; qual_mismatch_simple_bayesian[22][0] = -1.10072; qual_mismatch_simple_bayesian[22][1] = -1.32934; qual_mismatch_simple_bayesian[22][2] = -1.55754; qual_mismatch_simple_bayesian[22][3] = -1.78522; qual_mismatch_simple_bayesian[22][4] = -2.01224; qual_mismatch_simple_bayesian[22][5] = -2.23843; qual_mismatch_simple_bayesian[22][6] = -2.4636; qual_mismatch_simple_bayesian[22][7] = -2.68748; qual_mismatch_simple_bayesian[22][8] = -2.90977; qual_mismatch_simple_bayesian[22][9] = -3.13008; qual_mismatch_simple_bayesian[22][10] = -3.34796; qual_mismatch_simple_bayesian[22][11] = -3.56284; qual_mismatch_simple_bayesian[22][12] = -3.77408; qual_mismatch_simple_bayesian[22][13] = -3.9809; qual_mismatch_simple_bayesian[22][14] = -4.1824; qual_mismatch_simple_bayesian[22][15] = -4.3776; qual_mismatch_simple_bayesian[22][16] = -4.5654; qual_mismatch_simple_bayesian[22][17] = -4.74465; qual_mismatch_simple_bayesian[22][18] = -4.91418; qual_mismatch_simple_bayesian[22][19] = -5.07289; qual_mismatch_simple_bayesian[22][20] = -5.21979; qual_mismatch_simple_bayesian[22][21] = -5.35411; qual_mismatch_simple_bayesian[22][22] = -5.47537; qual_mismatch_simple_bayesian[22][23] = -5.5834; qual_mismatch_simple_bayesian[22][24] = -5.67839; qual_mismatch_simple_bayesian[22][25] = -5.76086; qual_mismatch_simple_bayesian[22][26] = -5.83158; qual_mismatch_simple_bayesian[22][27] = -5.89155; 
qual_mismatch_simple_bayesian[22][28] = -5.9419; qual_mismatch_simple_bayesian[22][29] = -5.98377; qual_mismatch_simple_bayesian[22][30] = -6.01833; qual_mismatch_simple_bayesian[22][31] = -6.04666; qual_mismatch_simple_bayesian[22][32] = -6.06975; qual_mismatch_simple_bayesian[22][33] = -6.08848; qual_mismatch_simple_bayesian[22][34] = -6.10361; qual_mismatch_simple_bayesian[22][35] = -6.1158; qual_mismatch_simple_bayesian[22][36] = -6.12558; qual_mismatch_simple_bayesian[22][37] = -6.13342; qual_mismatch_simple_bayesian[22][38] = -6.1397; qual_mismatch_simple_bayesian[22][39] = -6.14471; qual_mismatch_simple_bayesian[22][40] = -6.14871; qual_mismatch_simple_bayesian[22][41] = -6.15189; qual_mismatch_simple_bayesian[22][42] = -6.15443; qual_mismatch_simple_bayesian[22][43] = -6.15645; qual_mismatch_simple_bayesian[22][44] = -6.15806; qual_mismatch_simple_bayesian[22][45] = -6.15934; qual_mismatch_simple_bayesian[22][46] = -6.16036; qual_mismatch_simple_bayesian[23][0] = -1.10028; qual_mismatch_simple_bayesian[23][1] = -1.32924; qual_mismatch_simple_bayesian[23][2] = -1.55787; qual_mismatch_simple_bayesian[23][3] = -1.78608; qual_mismatch_simple_bayesian[23][4] = -2.01376; qual_mismatch_simple_bayesian[23][5] = -2.24078; qual_mismatch_simple_bayesian[23][6] = -2.46698; qual_mismatch_simple_bayesian[23][7] = -2.69215; qual_mismatch_simple_bayesian[23][8] = -2.91605; qual_mismatch_simple_bayesian[23][9] = -3.13835; qual_mismatch_simple_bayesian[23][10] = -3.35868; qual_mismatch_simple_bayesian[23][11] = -3.57658; qual_mismatch_simple_bayesian[23][12] = -3.79149; qual_mismatch_simple_bayesian[23][13] = -4.00276; qual_mismatch_simple_bayesian[23][14] = -4.20961; qual_mismatch_simple_bayesian[23][15] = -4.41116; qual_mismatch_simple_bayesian[23][16] = -4.60641; qual_mismatch_simple_bayesian[23][17] = -4.79427; qual_mismatch_simple_bayesian[23][18] = -4.97359; qual_mismatch_simple_bayesian[23][19] = -5.1432; qual_mismatch_simple_bayesian[23][20] = -5.30199; 
qual_mismatch_simple_bayesian[23][21] = -5.44898; qual_mismatch_simple_bayesian[23][22] = -5.5834; qual_mismatch_simple_bayesian[23][23] = -5.70476; qual_mismatch_simple_bayesian[23][24] = -5.81289; qual_mismatch_simple_bayesian[23][25] = -5.90798; qual_mismatch_simple_bayesian[23][26] = -5.99054; qual_mismatch_simple_bayesian[23][27] = -6.06134; qual_mismatch_simple_bayesian[23][28] = -6.12139; qual_mismatch_simple_bayesian[23][29] = -6.17181; qual_mismatch_simple_bayesian[23][30] = -6.21374; qual_mismatch_simple_bayesian[23][31] = -6.24836; qual_mismatch_simple_bayesian[23][32] = -6.27673; qual_mismatch_simple_bayesian[23][33] = -6.29986; qual_mismatch_simple_bayesian[23][34] = -6.31861; qual_mismatch_simple_bayesian[23][35] = -6.33377; qual_mismatch_simple_bayesian[23][36] = -6.34597; qual_mismatch_simple_bayesian[23][37] = -6.35578; qual_mismatch_simple_bayesian[23][38] = -6.36363; qual_mismatch_simple_bayesian[23][39] = -6.36991; qual_mismatch_simple_bayesian[23][40] = -6.37493; qual_mismatch_simple_bayesian[23][41] = -6.37894; qual_mismatch_simple_bayesian[23][42] = -6.38213; qual_mismatch_simple_bayesian[23][43] = -6.38467; qual_mismatch_simple_bayesian[23][44] = -6.3867; qual_mismatch_simple_bayesian[23][45] = -6.38831; qual_mismatch_simple_bayesian[23][46] = -6.38959; qual_mismatch_simple_bayesian[24][0] = -1.09994; qual_mismatch_simple_bayesian[24][1] = -1.32917; qual_mismatch_simple_bayesian[24][2] = -1.55813; qual_mismatch_simple_bayesian[24][3] = -1.78676; qual_mismatch_simple_bayesian[24][4] = -2.01497; qual_mismatch_simple_bayesian[24][5] = -2.24265; qual_mismatch_simple_bayesian[24][6] = -2.46968; qual_mismatch_simple_bayesian[24][7] = -2.69588; qual_mismatch_simple_bayesian[24][8] = -2.92106; qual_mismatch_simple_bayesian[24][9] = -3.14496; qual_mismatch_simple_bayesian[24][10] = -3.36728; qual_mismatch_simple_bayesian[24][11] = -3.58762; qual_mismatch_simple_bayesian[24][12] = -3.80553; qual_mismatch_simple_bayesian[24][13] = -4.02047; 
qual_mismatch_simple_bayesian[24][14] = -4.23176; qual_mismatch_simple_bayesian[24][15] = -4.43864; qual_mismatch_simple_bayesian[24][16] = -4.64022; qual_mismatch_simple_bayesian[24][17] = -4.83552; qual_mismatch_simple_bayesian[24][18] = -5.02342; qual_mismatch_simple_bayesian[24][19] = -5.2028; qual_mismatch_simple_bayesian[24][20] = -5.37247; qual_mismatch_simple_bayesian[24][21] = -5.53133; qual_mismatch_simple_bayesian[24][22] = -5.67839; qual_mismatch_simple_bayesian[24][23] = -5.81289; qual_mismatch_simple_bayesian[24][24] = -5.93433; qual_mismatch_simple_bayesian[24][25] = -6.04254; qual_mismatch_simple_bayesian[24][26] = -6.1377; qual_mismatch_simple_bayesian[24][27] = -6.22033; qual_mismatch_simple_bayesian[24][28] = -6.29121; qual_mismatch_simple_bayesian[24][29] = -6.35132; qual_mismatch_simple_bayesian[24][30] = -6.40179; qual_mismatch_simple_bayesian[24][31] = -6.44377; qual_mismatch_simple_bayesian[24][32] = -6.47843; qual_mismatch_simple_bayesian[24][33] = -6.50683; qual_mismatch_simple_bayesian[24][34] = -6.52999; qual_mismatch_simple_bayesian[24][35] = -6.54877; qual_mismatch_simple_bayesian[24][36] = -6.56395; qual_mismatch_simple_bayesian[24][37] = -6.57617; qual_mismatch_simple_bayesian[24][38] = -6.58598; qual_mismatch_simple_bayesian[24][39] = -6.59385; qual_mismatch_simple_bayesian[24][40] = -6.60014; qual_mismatch_simple_bayesian[24][41] = -6.60516; qual_mismatch_simple_bayesian[24][42] = -6.60917; qual_mismatch_simple_bayesian[24][43] = -6.61237; qual_mismatch_simple_bayesian[24][44] = -6.61492; qual_mismatch_simple_bayesian[24][45] = -6.61695; qual_mismatch_simple_bayesian[24][46] = -6.61856; qual_mismatch_simple_bayesian[25][0] = -1.09967; qual_mismatch_simple_bayesian[25][1] = -1.32911; qual_mismatch_simple_bayesian[25][2] = -1.55833; qual_mismatch_simple_bayesian[25][3] = -1.7873; qual_mismatch_simple_bayesian[25][4] = -2.01593; qual_mismatch_simple_bayesian[25][5] = -2.24414; qual_mismatch_simple_bayesian[25][6] = -2.47183; 
qual_mismatch_simple_bayesian[25][7] = -2.69886; qual_mismatch_simple_bayesian[25][8] = -2.92507; qual_mismatch_simple_bayesian[25][9] = -3.15025; qual_mismatch_simple_bayesian[25][10] = -3.37416; qual_mismatch_simple_bayesian[25][11] = -3.59648; qual_mismatch_simple_bayesian[25][12] = -3.81683; qual_mismatch_simple_bayesian[25][13] = -4.03476; qual_mismatch_simple_bayesian[25][14] = -4.24971; qual_mismatch_simple_bayesian[25][15] = -4.46102; qual_mismatch_simple_bayesian[25][16] = -4.66792; qual_mismatch_simple_bayesian[25][17] = -4.86954; qual_mismatch_simple_bayesian[25][18] = -5.06486; qual_mismatch_simple_bayesian[25][19] = -5.25281; qual_mismatch_simple_bayesian[25][20] = -5.43222; qual_mismatch_simple_bayesian[25][21] = -5.60194; qual_mismatch_simple_bayesian[25][22] = -5.76086; qual_mismatch_simple_bayesian[25][23] = -5.90798; qual_mismatch_simple_bayesian[25][24] = -6.04254; qual_mismatch_simple_bayesian[25][25] = -6.16404; qual_mismatch_simple_bayesian[25][26] = -6.27231; qual_mismatch_simple_bayesian[25][27] = -6.36754; qual_mismatch_simple_bayesian[25][28] = -6.45023; qual_mismatch_simple_bayesian[25][29] = -6.52116; qual_mismatch_simple_bayesian[25][30] = -6.58132; qual_mismatch_simple_bayesian[25][31] = -6.63183; qual_mismatch_simple_bayesian[25][32] = -6.67385; qual_mismatch_simple_bayesian[25][33] = -6.70854; qual_mismatch_simple_bayesian[25][34] = -6.73697; qual_mismatch_simple_bayesian[25][35] = -6.76015; qual_mismatch_simple_bayesian[25][36] = -6.77895; qual_mismatch_simple_bayesian[25][37] = -6.79414; qual_mismatch_simple_bayesian[25][38] = -6.80637; qual_mismatch_simple_bayesian[25][39] = -6.8162; qual_mismatch_simple_bayesian[25][40] = -6.82407; qual_mismatch_simple_bayesian[25][41] = -6.83037; qual_mismatch_simple_bayesian[25][42] = -6.8354; qual_mismatch_simple_bayesian[25][43] = -6.83942; qual_mismatch_simple_bayesian[25][44] = -6.84262; qual_mismatch_simple_bayesian[25][45] = -6.84517; qual_mismatch_simple_bayesian[25][46] = -6.8472; 
qual_mismatch_simple_bayesian[26][0] = -1.09945; qual_mismatch_simple_bayesian[26][1] = -1.32906; qual_mismatch_simple_bayesian[26][2] = -1.5585; qual_mismatch_simple_bayesian[26][3] = -1.78773; qual_mismatch_simple_bayesian[26][4] = -2.01669; qual_mismatch_simple_bayesian[26][5] = -2.24532; qual_mismatch_simple_bayesian[26][6] = -2.47353; qual_mismatch_simple_bayesian[26][7] = -2.70122; qual_mismatch_simple_bayesian[26][8] = -2.92826; qual_mismatch_simple_bayesian[26][9] = -3.15447; qual_mismatch_simple_bayesian[26][10] = -3.37966; qual_mismatch_simple_bayesian[26][11] = -3.60357; qual_mismatch_simple_bayesian[26][12] = -3.8259; qual_mismatch_simple_bayesian[26][13] = -4.04626; qual_mismatch_simple_bayesian[26][14] = -4.2642; qual_mismatch_simple_bayesian[26][15] = -4.47916; qual_mismatch_simple_bayesian[26][16] = -4.69049; qual_mismatch_simple_bayesian[26][17] = -4.89741; qual_mismatch_simple_bayesian[26][18] = -5.09904; qual_mismatch_simple_bayesian[26][19] = -5.29439; qual_mismatch_simple_bayesian[26][20] = -5.48237; qual_mismatch_simple_bayesian[26][21] = -5.66182; qual_mismatch_simple_bayesian[26][22] = -5.83158; qual_mismatch_simple_bayesian[26][23] = -5.99054; qual_mismatch_simple_bayesian[26][24] = -6.1377; qual_mismatch_simple_bayesian[26][25] = -6.27231; qual_mismatch_simple_bayesian[26][26] = -6.39386; qual_mismatch_simple_bayesian[26][27] = -6.50219; qual_mismatch_simple_bayesian[26][28] = -6.59746; qual_mismatch_simple_bayesian[26][29] = -6.6802; qual_mismatch_simple_bayesian[26][30] = -6.75117; qual_mismatch_simple_bayesian[26][31] = -6.81137; qual_mismatch_simple_bayesian[26][32] = -6.86191; qual_mismatch_simple_bayesian[26][33] = -6.90396; qual_mismatch_simple_bayesian[26][34] = -6.93867; qual_mismatch_simple_bayesian[26][35] = -6.96713; qual_mismatch_simple_bayesian[26][36] = -6.99033; qual_mismatch_simple_bayesian[26][37] = -7.00914; qual_mismatch_simple_bayesian[26][38] = -7.02435; qual_mismatch_simple_bayesian[26][39] = -7.03659; 
qual_mismatch_simple_bayesian[26][40] = -7.04642; qual_mismatch_simple_bayesian[26][41] = -7.0543; qual_mismatch_simple_bayesian[26][42] = -7.06061; qual_mismatch_simple_bayesian[26][43] = -7.06564; qual_mismatch_simple_bayesian[26][44] = -7.06966; qual_mismatch_simple_bayesian[26][45] = -7.07286; qual_mismatch_simple_bayesian[26][46] = -7.07542; qual_mismatch_simple_bayesian[27][0] = -1.09928; qual_mismatch_simple_bayesian[27][1] = -1.32902; qual_mismatch_simple_bayesian[27][2] = -1.55863; qual_mismatch_simple_bayesian[27][3] = -1.78807; qual_mismatch_simple_bayesian[27][4] = -2.0173; qual_mismatch_simple_bayesian[27][5] = -2.24626; qual_mismatch_simple_bayesian[27][6] = -2.47489; qual_mismatch_simple_bayesian[27][7] = -2.70311; qual_mismatch_simple_bayesian[27][8] = -2.9308; qual_mismatch_simple_bayesian[27][9] = -3.15784; qual_mismatch_simple_bayesian[27][10] = -3.38405; qual_mismatch_simple_bayesian[27][11] = -3.60925; qual_mismatch_simple_bayesian[27][12] = -3.83316; qual_mismatch_simple_bayesian[27][13] = -4.0555; qual_mismatch_simple_bayesian[27][14] = -4.27586; qual_mismatch_simple_bayesian[27][15] = -4.49381; qual_mismatch_simple_bayesian[27][16] = -4.70878; qual_mismatch_simple_bayesian[27][17] = -4.92012; qual_mismatch_simple_bayesian[27][18] = -5.12706; qual_mismatch_simple_bayesian[27][19] = -5.32871; qual_mismatch_simple_bayesian[27][20] = -5.52408; qual_mismatch_simple_bayesian[27][21] = -5.71208; qual_mismatch_simple_bayesian[27][22] = -5.89155; qual_mismatch_simple_bayesian[27][23] = -6.06134; qual_mismatch_simple_bayesian[27][24] = -6.22033; qual_mismatch_simple_bayesian[27][25] = -6.36754; qual_mismatch_simple_bayesian[27][26] = -6.50219; qual_mismatch_simple_bayesian[27][27] = -6.62378; qual_mismatch_simple_bayesian[27][28] = -6.73214; qual_mismatch_simple_bayesian[27][29] = -6.82745; qual_mismatch_simple_bayesian[27][30] = -6.91022; qual_mismatch_simple_bayesian[27][31] = -6.98123; qual_mismatch_simple_bayesian[27][32] = -7.04146; 
qual_mismatch_simple_bayesian[27][33] = -7.09203; qual_mismatch_simple_bayesian[27][34] = -7.13411; qual_mismatch_simple_bayesian[27][35] = -7.16884; qual_mismatch_simple_bayesian[27][36] = -7.19731; qual_mismatch_simple_bayesian[27][37] = -7.22052; qual_mismatch_simple_bayesian[27][38] = -7.23935; qual_mismatch_simple_bayesian[27][39] = -7.25456; qual_mismatch_simple_bayesian[27][40] = -7.26682; qual_mismatch_simple_bayesian[27][41] = -7.27666; qual_mismatch_simple_bayesian[27][42] = -7.28454; qual_mismatch_simple_bayesian[27][43] = -7.29085; qual_mismatch_simple_bayesian[27][44] = -7.29589; qual_mismatch_simple_bayesian[27][45] = -7.29991; qual_mismatch_simple_bayesian[27][46] = -7.30311; qual_mismatch_simple_bayesian[28][0] = -1.09914; qual_mismatch_simple_bayesian[28][1] = -1.32899; qual_mismatch_simple_bayesian[28][2] = -1.55873; qual_mismatch_simple_bayesian[28][3] = -1.78834; qual_mismatch_simple_bayesian[28][4] = -2.01778; qual_mismatch_simple_bayesian[28][5] = -2.24701; qual_mismatch_simple_bayesian[28][6] = -2.47598; qual_mismatch_simple_bayesian[28][7] = -2.70461; qual_mismatch_simple_bayesian[28][8] = -2.93282; qual_mismatch_simple_bayesian[28][9] = -3.16052; qual_mismatch_simple_bayesian[28][10] = -3.38756; qual_mismatch_simple_bayesian[28][11] = -3.61377; qual_mismatch_simple_bayesian[28][12] = -3.83897; qual_mismatch_simple_bayesian[28][13] = -4.06289; qual_mismatch_simple_bayesian[28][14] = -4.28523; qual_mismatch_simple_bayesian[28][15] = -4.5056; qual_mismatch_simple_bayesian[28][16] = -4.72355; qual_mismatch_simple_bayesian[28][17] = -4.93853; qual_mismatch_simple_bayesian[28][18] = -5.14988; qual_mismatch_simple_bayesian[28][19] = -5.35683; qual_mismatch_simple_bayesian[28][20] = -5.55849; qual_mismatch_simple_bayesian[28][21] = -5.75388; qual_mismatch_simple_bayesian[28][22] = -5.9419; qual_mismatch_simple_bayesian[28][23] = -6.12139; qual_mismatch_simple_bayesian[28][24] = -6.29121; qual_mismatch_simple_bayesian[28][25] = -6.45023; 
qual_mismatch_simple_bayesian[28][26] = -6.59746; qual_mismatch_simple_bayesian[28][27] = -6.73214; qual_mismatch_simple_bayesian[28][28] = -6.85376; qual_mismatch_simple_bayesian[28][29] = -6.96216; qual_mismatch_simple_bayesian[28][30] = -7.0575; qual_mismatch_simple_bayesian[28][31] = -7.1403; qual_mismatch_simple_bayesian[28][32] = -7.21133; qual_mismatch_simple_bayesian[28][33] = -7.27159; qual_mismatch_simple_bayesian[28][34] = -7.32218; qual_mismatch_simple_bayesian[28][35] = -7.36428; qual_mismatch_simple_bayesian[28][36] = -7.39902; qual_mismatch_simple_bayesian[28][37] = -7.42751; qual_mismatch_simple_bayesian[28][38] = -7.45073; qual_mismatch_simple_bayesian[28][39] = -7.46957; qual_mismatch_simple_bayesian[28][40] = -7.48479; qual_mismatch_simple_bayesian[28][41] = -7.49705; qual_mismatch_simple_bayesian[28][42] = -7.50689; qual_mismatch_simple_bayesian[28][43] = -7.51478; qual_mismatch_simple_bayesian[28][44] = -7.52109; qual_mismatch_simple_bayesian[28][45] = -7.52614; qual_mismatch_simple_bayesian[28][46] = -7.53016; qual_mismatch_simple_bayesian[29][0] = -1.09903; qual_mismatch_simple_bayesian[29][1] = -1.32896; qual_mismatch_simple_bayesian[29][2] = -1.55881; qual_mismatch_simple_bayesian[29][3] = -1.78855; qual_mismatch_simple_bayesian[29][4] = -2.01816; qual_mismatch_simple_bayesian[29][5] = -2.2476; qual_mismatch_simple_bayesian[29][6] = -2.47684; qual_mismatch_simple_bayesian[29][7] = -2.7058; qual_mismatch_simple_bayesian[29][8] = -2.93444; qual_mismatch_simple_bayesian[29][9] = -3.16265; qual_mismatch_simple_bayesian[29][10] = -3.39035; qual_mismatch_simple_bayesian[29][11] = -3.61738; qual_mismatch_simple_bayesian[29][12] = -3.84361; qual_mismatch_simple_bayesian[29][13] = -4.0688; qual_mismatch_simple_bayesian[29][14] = -4.29273; qual_mismatch_simple_bayesian[29][15] = -4.51507; qual_mismatch_simple_bayesian[29][16] = -4.73544; qual_mismatch_simple_bayesian[29][17] = -4.9534; qual_mismatch_simple_bayesian[29][18] = -5.16839; 
qual_mismatch_simple_bayesian[29][19] = -5.37974; qual_mismatch_simple_bayesian[29][20] = -5.5867; qual_mismatch_simple_bayesian[29][21] = -5.78837; qual_mismatch_simple_bayesian[29][22] = -5.98377; qual_mismatch_simple_bayesian[29][23] = -6.17181; qual_mismatch_simple_bayesian[29][24] = -6.35132; qual_mismatch_simple_bayesian[29][25] = -6.52116; qual_mismatch_simple_bayesian[29][26] = -6.6802; qual_mismatch_simple_bayesian[29][27] = -6.82745; qual_mismatch_simple_bayesian[29][28] = -6.96216; qual_mismatch_simple_bayesian[29][29] = -7.0838; qual_mismatch_simple_bayesian[29][30] = -7.19222; qual_mismatch_simple_bayesian[29][31] = -7.28759; qual_mismatch_simple_bayesian[29][32] = -7.37041; qual_mismatch_simple_bayesian[29][33] = -7.44147; qual_mismatch_simple_bayesian[29][34] = -7.50174; qual_mismatch_simple_bayesian[29][35] = -7.55235; qual_mismatch_simple_bayesian[29][36] = -7.59446; qual_mismatch_simple_bayesian[29][37] = -7.62922; qual_mismatch_simple_bayesian[29][38] = -7.65772; qual_mismatch_simple_bayesian[29][39] = -7.68095; qual_mismatch_simple_bayesian[29][40] = -7.6998; qual_mismatch_simple_bayesian[29][41] = -7.71502; qual_mismatch_simple_bayesian[29][42] = -7.72729; qual_mismatch_simple_bayesian[29][43] = -7.73713; qual_mismatch_simple_bayesian[29][44] = -7.74503; qual_mismatch_simple_bayesian[29][45] = -7.75134; qual_mismatch_simple_bayesian[29][46] = -7.75639; qual_mismatch_simple_bayesian[30][0] = -1.09895; qual_mismatch_simple_bayesian[30][1] = -1.32895; qual_mismatch_simple_bayesian[30][2] = -1.55888; qual_mismatch_simple_bayesian[30][3] = -1.78873; qual_mismatch_simple_bayesian[30][4] = -2.01847; qual_mismatch_simple_bayesian[30][5] = -2.24808; qual_mismatch_simple_bayesian[30][6] = -2.47752; qual_mismatch_simple_bayesian[30][7] = -2.70675; qual_mismatch_simple_bayesian[30][8] = -2.93572; qual_mismatch_simple_bayesian[30][9] = -3.16435; qual_mismatch_simple_bayesian[30][10] = -3.39257; qual_mismatch_simple_bayesian[30][11] = -3.62026; 
qual_mismatch_simple_bayesian[30][12] = -3.8473; qual_mismatch_simple_bayesian[30][13] = -4.07352; qual_mismatch_simple_bayesian[30][14] = -4.29872; qual_mismatch_simple_bayesian[30][15] = -4.52265; qual_mismatch_simple_bayesian[30][16] = -4.74499; qual_mismatch_simple_bayesian[30][17] = -4.96537; qual_mismatch_simple_bayesian[30][18] = -5.18334; qual_mismatch_simple_bayesian[30][19] = -5.39832; qual_mismatch_simple_bayesian[30][20] = -5.60969; qual_mismatch_simple_bayesian[30][21] = -5.81665; qual_mismatch_simple_bayesian[30][22] = -6.01833; qual_mismatch_simple_bayesian[30][23] = -6.21374; qual_mismatch_simple_bayesian[30][24] = -6.40179; qual_mismatch_simple_bayesian[30][25] = -6.58132; qual_mismatch_simple_bayesian[30][26] = -6.75117; qual_mismatch_simple_bayesian[30][27] = -6.91022; qual_mismatch_simple_bayesian[30][28] = -7.0575; qual_mismatch_simple_bayesian[30][29] = -7.19222; qual_mismatch_simple_bayesian[30][30] = -7.31389; qual_mismatch_simple_bayesian[30][31] = -7.42233; qual_mismatch_simple_bayesian[30][32] = -7.51772; qual_mismatch_simple_bayesian[30][33] = -7.60056; qual_mismatch_simple_bayesian[30][34] = -7.67163; qual_mismatch_simple_bayesian[30][35] = -7.73192; qual_mismatch_simple_bayesian[30][36] = -7.78254; qual_mismatch_simple_bayesian[30][37] = -7.82466; qual_mismatch_simple_bayesian[30][38] = -7.85943; qual_mismatch_simple_bayesian[30][39] = -7.88794; qual_mismatch_simple_bayesian[30][40] = -7.91118; qual_mismatch_simple_bayesian[30][41] = -7.93003; qual_mismatch_simple_bayesian[30][42] = -7.94526; qual_mismatch_simple_bayesian[30][43] = -7.95753; qual_mismatch_simple_bayesian[30][44] = -7.96738; qual_mismatch_simple_bayesian[30][45] = -7.97528; qual_mismatch_simple_bayesian[30][46] = -7.98159; qual_mismatch_simple_bayesian[31][0] = -1.09888; qual_mismatch_simple_bayesian[31][1] = -1.32893; qual_mismatch_simple_bayesian[31][2] = -1.55893; qual_mismatch_simple_bayesian[31][3] = -1.78886; qual_mismatch_simple_bayesian[31][4] = -2.01871; 
qual_mismatch_simple_bayesian[31][5] = -2.24845; qual_mismatch_simple_bayesian[31][6] = -2.47806; qual_mismatch_simple_bayesian[31][7] = -2.7075; qual_mismatch_simple_bayesian[31][8] = -2.93674; qual_mismatch_simple_bayesian[31][9] = -3.1657; qual_mismatch_simple_bayesian[31][10] = -3.39434; qual_mismatch_simple_bayesian[31][11] = -3.62255; qual_mismatch_simple_bayesian[31][12] = -3.85025; qual_mismatch_simple_bayesian[31][13] = -4.07729; qual_mismatch_simple_bayesian[31][14] = -4.30351; qual_mismatch_simple_bayesian[31][15] = -4.52872; qual_mismatch_simple_bayesian[31][16] = -4.75264; qual_mismatch_simple_bayesian[31][17] = -4.97499; qual_mismatch_simple_bayesian[31][18] = -5.19537; qual_mismatch_simple_bayesian[31][19] = -5.41334; qual_mismatch_simple_bayesian[31][20] = -5.62833; qual_mismatch_simple_bayesian[31][21] = -5.83969; qual_mismatch_simple_bayesian[31][22] = -6.04666; qual_mismatch_simple_bayesian[31][23] = -6.24836; qual_mismatch_simple_bayesian[31][24] = -6.44377; qual_mismatch_simple_bayesian[31][25] = -6.63183; qual_mismatch_simple_bayesian[31][26] = -6.81137; qual_mismatch_simple_bayesian[31][27] = -6.98123; qual_mismatch_simple_bayesian[31][28] = -7.1403; qual_mismatch_simple_bayesian[31][29] = -7.28759; qual_mismatch_simple_bayesian[31][30] = -7.42233; qual_mismatch_simple_bayesian[31][31] = -7.54401; qual_mismatch_simple_bayesian[31][32] = -7.65246; qual_mismatch_simple_bayesian[31][33] = -7.74787; qual_mismatch_simple_bayesian[31][34] = -7.83072; qual_mismatch_simple_bayesian[31][35] = -7.90181; qual_mismatch_simple_bayesian[31][36] = -7.96211; qual_mismatch_simple_bayesian[31][37] = -8.01274; qual_mismatch_simple_bayesian[31][38] = -8.05488; qual_mismatch_simple_bayesian[31][39] = -8.08965; qual_mismatch_simple_bayesian[31][40] = -8.11817; qual_mismatch_simple_bayesian[31][41] = -8.14141; qual_mismatch_simple_bayesian[31][42] = -8.16027; qual_mismatch_simple_bayesian[31][43] = -8.1755; qual_mismatch_simple_bayesian[31][44] = -8.18777; 
qual_mismatch_simple_bayesian[31][45] = -8.19763; qual_mismatch_simple_bayesian[31][46] = -8.20553; qual_mismatch_simple_bayesian[32][0] = -1.09882; qual_mismatch_simple_bayesian[32][1] = -1.32892; qual_mismatch_simple_bayesian[32][2] = -1.55897; qual_mismatch_simple_bayesian[32][3] = -1.78897; qual_mismatch_simple_bayesian[32][4] = -2.0189; qual_mismatch_simple_bayesian[32][5] = -2.24875; qual_mismatch_simple_bayesian[32][6] = -2.47849; qual_mismatch_simple_bayesian[32][7] = -2.7081; qual_mismatch_simple_bayesian[32][8] = -2.93755; qual_mismatch_simple_bayesian[32][9] = -3.16678; qual_mismatch_simple_bayesian[32][10] = -3.39574; qual_mismatch_simple_bayesian[32][11] = -3.62438; qual_mismatch_simple_bayesian[32][12] = -3.8526; qual_mismatch_simple_bayesian[32][13] = -4.08029; qual_mismatch_simple_bayesian[32][14] = -4.30734; qual_mismatch_simple_bayesian[32][15] = -4.53356; qual_mismatch_simple_bayesian[32][16] = -4.75876; qual_mismatch_simple_bayesian[32][17] = -4.98269; qual_mismatch_simple_bayesian[32][18] = -5.20504; qual_mismatch_simple_bayesian[32][19] = -5.42542; qual_mismatch_simple_bayesian[32][20] = -5.64339; qual_mismatch_simple_bayesian[32][21] = -5.85838; qual_mismatch_simple_bayesian[32][22] = -6.06975; qual_mismatch_simple_bayesian[32][23] = -6.27673; qual_mismatch_simple_bayesian[32][24] = -6.47843; qual_mismatch_simple_bayesian[32][25] = -6.67385; qual_mismatch_simple_bayesian[32][26] = -6.86191; qual_mismatch_simple_bayesian[32][27] = -7.04146; qual_mismatch_simple_bayesian[32][28] = -7.21133; qual_mismatch_simple_bayesian[32][29] = -7.37041; qual_mismatch_simple_bayesian[32][30] = -7.51772; qual_mismatch_simple_bayesian[32][31] = -7.65246; qual_mismatch_simple_bayesian[32][32] = -7.77416; qual_mismatch_simple_bayesian[32][33] = -7.88263; qual_mismatch_simple_bayesian[32][34] = -7.97804; qual_mismatch_simple_bayesian[32][35] = -8.06091; qual_mismatch_simple_bayesian[32][36] = -8.132; qual_mismatch_simple_bayesian[32][37] = -8.19232; 
qual_mismatch_simple_bayesian[32][38] = -8.24296; qual_mismatch_simple_bayesian[32][39] = -8.2851; qual_mismatch_simple_bayesian[32][40] = -8.31988; qual_mismatch_simple_bayesian[32][41] = -8.3484; qual_mismatch_simple_bayesian[32][42] = -8.37165; qual_mismatch_simple_bayesian[32][43] = -8.39051; qual_mismatch_simple_bayesian[32][44] = -8.40575; qual_mismatch_simple_bayesian[32][45] = -8.41802; qual_mismatch_simple_bayesian[32][46] = -8.42788; qual_mismatch_simple_bayesian[33][0] = -1.09878; qual_mismatch_simple_bayesian[33][1] = -1.32891; qual_mismatch_simple_bayesian[33][2] = -1.559; qual_mismatch_simple_bayesian[33][3] = -1.78906; qual_mismatch_simple_bayesian[33][4] = -2.01906; qual_mismatch_simple_bayesian[33][5] = -2.24899; qual_mismatch_simple_bayesian[33][6] = -2.47884; qual_mismatch_simple_bayesian[33][7] = -2.70858; qual_mismatch_simple_bayesian[33][8] = -2.93819; qual_mismatch_simple_bayesian[33][9] = -3.16763; qual_mismatch_simple_bayesian[33][10] = -3.39686; qual_mismatch_simple_bayesian[33][11] = -3.62583; qual_mismatch_simple_bayesian[33][12] = -3.85447; qual_mismatch_simple_bayesian[33][13] = -4.08268; qual_mismatch_simple_bayesian[33][14] = -4.31038; qual_mismatch_simple_bayesian[33][15] = -4.53742; qual_mismatch_simple_bayesian[33][16] = -4.76365; qual_mismatch_simple_bayesian[33][17] = -4.98885; qual_mismatch_simple_bayesian[33][18] = -5.21278; qual_mismatch_simple_bayesian[33][19] = -5.43513; qual_mismatch_simple_bayesian[33][20] = -5.65552; qual_mismatch_simple_bayesian[33][21] = -5.87348; qual_mismatch_simple_bayesian[33][22] = -6.08848; qual_mismatch_simple_bayesian[33][23] = -6.29986; qual_mismatch_simple_bayesian[33][24] = -6.50683; qual_mismatch_simple_bayesian[33][25] = -6.70854; qual_mismatch_simple_bayesian[33][26] = -6.90396; qual_mismatch_simple_bayesian[33][27] = -7.09203; qual_mismatch_simple_bayesian[33][28] = -7.27159; qual_mismatch_simple_bayesian[33][29] = -7.44147; qual_mismatch_simple_bayesian[33][30] = -7.60056; 
qual_mismatch_simple_bayesian[33][31] = -7.74787; qual_mismatch_simple_bayesian[33][32] = -7.88263; qual_mismatch_simple_bayesian[33][33] = -8.00433; qual_mismatch_simple_bayesian[33][34] = -8.11281; qual_mismatch_simple_bayesian[33][35] = -8.20823; qual_mismatch_simple_bayesian[33][36] = -8.29111; qual_mismatch_simple_bayesian[33][37] = -8.36221; qual_mismatch_simple_bayesian[33][38] = -8.42253; qual_mismatch_simple_bayesian[33][39] = -8.47318; qual_mismatch_simple_bayesian[33][40] = -8.51533; qual_mismatch_simple_bayesian[33][41] = -8.55012; qual_mismatch_simple_bayesian[33][42] = -8.57864; qual_mismatch_simple_bayesian[33][43] = -8.60189; qual_mismatch_simple_bayesian[33][44] = -8.62076; qual_mismatch_simple_bayesian[33][45] = -8.636; qual_mismatch_simple_bayesian[33][46] = -8.64827; qual_mismatch_simple_bayesian[34][0] = -1.09874; qual_mismatch_simple_bayesian[34][1] = -1.3289; qual_mismatch_simple_bayesian[34][2] = -1.55903; qual_mismatch_simple_bayesian[34][3] = -1.78912; qual_mismatch_simple_bayesian[34][4] = -2.01918; qual_mismatch_simple_bayesian[34][5] = -2.24918; qual_mismatch_simple_bayesian[34][6] = -2.47911; qual_mismatch_simple_bayesian[34][7] = -2.70896; qual_mismatch_simple_bayesian[34][8] = -2.9387; qual_mismatch_simple_bayesian[34][9] = -3.16831; qual_mismatch_simple_bayesian[34][10] = -3.39775; qual_mismatch_simple_bayesian[34][11] = -3.62698; qual_mismatch_simple_bayesian[34][12] = -3.85595; qual_mismatch_simple_bayesian[34][13] = -4.08459; qual_mismatch_simple_bayesian[34][14] = -4.31281; qual_mismatch_simple_bayesian[34][15] = -4.5405; qual_mismatch_simple_bayesian[34][16] = -4.76755; qual_mismatch_simple_bayesian[34][17] = -4.99377; qual_mismatch_simple_bayesian[34][18] = -5.21897; qual_mismatch_simple_bayesian[34][19] = -5.44291; qual_mismatch_simple_bayesian[34][20] = -5.66525; qual_mismatch_simple_bayesian[34][21] = -5.88564; qual_mismatch_simple_bayesian[34][22] = -6.10361; qual_mismatch_simple_bayesian[34][23] = -6.31861; 
qual_mismatch_simple_bayesian[34][24] = -6.52999; qual_mismatch_simple_bayesian[34][25] = -6.73697; qual_mismatch_simple_bayesian[34][26] = -6.93867; qual_mismatch_simple_bayesian[34][27] = -7.13411; qual_mismatch_simple_bayesian[34][28] = -7.32218; qual_mismatch_simple_bayesian[34][29] = -7.50174; qual_mismatch_simple_bayesian[34][30] = -7.67163; qual_mismatch_simple_bayesian[34][31] = -7.83072; qual_mismatch_simple_bayesian[34][32] = -7.97804; qual_mismatch_simple_bayesian[34][33] = -8.11281; qual_mismatch_simple_bayesian[34][34] = -8.23452; qual_mismatch_simple_bayesian[34][35] = -8.34301; qual_mismatch_simple_bayesian[34][36] = -8.43844; qual_mismatch_simple_bayesian[34][37] = -8.52132; qual_mismatch_simple_bayesian[34][38] = -8.59243; qual_mismatch_simple_bayesian[34][39] = -8.65276; qual_mismatch_simple_bayesian[34][40] = -8.70341; qual_mismatch_simple_bayesian[34][41] = -8.74556; qual_mismatch_simple_bayesian[34][42] = -8.78036; qual_mismatch_simple_bayesian[34][43] = -8.80888; qual_mismatch_simple_bayesian[34][44] = -8.83214; qual_mismatch_simple_bayesian[34][45] = -8.851; qual_mismatch_simple_bayesian[34][46] = -8.86625; qual_mismatch_simple_bayesian[35][0] = -1.09872; qual_mismatch_simple_bayesian[35][1] = -1.32889; qual_mismatch_simple_bayesian[35][2] = -1.55905; qual_mismatch_simple_bayesian[35][3] = -1.78918; qual_mismatch_simple_bayesian[35][4] = -2.01927; qual_mismatch_simple_bayesian[35][5] = -2.24933; qual_mismatch_simple_bayesian[35][6] = -2.47933; qual_mismatch_simple_bayesian[35][7] = -2.70926; qual_mismatch_simple_bayesian[35][8] = -2.93911; qual_mismatch_simple_bayesian[35][9] = -3.16885; qual_mismatch_simple_bayesian[35][10] = -3.39846; qual_mismatch_simple_bayesian[35][11] = -3.6279; qual_mismatch_simple_bayesian[35][12] = -3.85713; qual_mismatch_simple_bayesian[35][13] = -4.0861; qual_mismatch_simple_bayesian[35][14] = -4.31474; qual_mismatch_simple_bayesian[35][15] = -4.54296; qual_mismatch_simple_bayesian[35][16] = -4.77065; 
qual_mismatch_simple_bayesian[35][17] = -4.9977; qual_mismatch_simple_bayesian[35][18] = -5.22392; qual_mismatch_simple_bayesian[35][19] = -5.44913; qual_mismatch_simple_bayesian[35][20] = -5.67306; qual_mismatch_simple_bayesian[35][21] = -5.89541; qual_mismatch_simple_bayesian[35][22] = -6.1158; qual_mismatch_simple_bayesian[35][23] = -6.33377; qual_mismatch_simple_bayesian[35][24] = -6.54877; qual_mismatch_simple_bayesian[35][25] = -6.76015; qual_mismatch_simple_bayesian[35][26] = -6.96713; qual_mismatch_simple_bayesian[35][27] = -7.16884; qual_mismatch_simple_bayesian[35][28] = -7.36428; qual_mismatch_simple_bayesian[35][29] = -7.55235; qual_mismatch_simple_bayesian[35][30] = -7.73192; qual_mismatch_simple_bayesian[35][31] = -7.90181; qual_mismatch_simple_bayesian[35][32] = -8.06091; qual_mismatch_simple_bayesian[35][33] = -8.20823; qual_mismatch_simple_bayesian[35][34] = -8.34301; qual_mismatch_simple_bayesian[35][35] = -8.46472; qual_mismatch_simple_bayesian[35][36] = -8.57322; qual_mismatch_simple_bayesian[35][37] = -8.66866; qual_mismatch_simple_bayesian[35][38] = -8.75154; qual_mismatch_simple_bayesian[35][39] = -8.82266; qual_mismatch_simple_bayesian[35][40] = -8.88299; qual_mismatch_simple_bayesian[35][41] = -8.93365; qual_mismatch_simple_bayesian[35][42] = -8.9758; qual_mismatch_simple_bayesian[35][43] = -9.0106; qual_mismatch_simple_bayesian[35][44] = -9.03913; qual_mismatch_simple_bayesian[35][45] = -9.06239; qual_mismatch_simple_bayesian[35][46] = -9.08126; qual_mismatch_simple_bayesian[36][0] = -1.0987; qual_mismatch_simple_bayesian[36][1] = -1.32889; qual_mismatch_simple_bayesian[36][2] = -1.55907; qual_mismatch_simple_bayesian[36][3] = -1.78922; qual_mismatch_simple_bayesian[36][4] = -2.01935; qual_mismatch_simple_bayesian[36][5] = -2.24945; qual_mismatch_simple_bayesian[36][6] = -2.4795; qual_mismatch_simple_bayesian[36][7] = -2.7095; qual_mismatch_simple_bayesian[36][8] = -2.93943; qual_mismatch_simple_bayesian[36][9] = -3.16928; 
qual_mismatch_simple_bayesian[36][10] = -3.39902; qual_mismatch_simple_bayesian[36][11] = -3.62863; qual_mismatch_simple_bayesian[36][12] = -3.85807; qual_mismatch_simple_bayesian[36][13] = -4.08731; qual_mismatch_simple_bayesian[36][14] = -4.31627; qual_mismatch_simple_bayesian[36][15] = -4.54491; qual_mismatch_simple_bayesian[36][16] = -4.77313; qual_mismatch_simple_bayesian[36][17] = -5.00083; qual_mismatch_simple_bayesian[36][18] = -5.22787; qual_mismatch_simple_bayesian[36][19] = -5.4541; qual_mismatch_simple_bayesian[36][20] = -5.6793; qual_mismatch_simple_bayesian[36][21] = -5.90323; qual_mismatch_simple_bayesian[36][22] = -6.12558; qual_mismatch_simple_bayesian[36][23] = -6.34597; qual_mismatch_simple_bayesian[36][24] = -6.56395; qual_mismatch_simple_bayesian[36][25] = -6.77895; qual_mismatch_simple_bayesian[36][26] = -6.99033; qual_mismatch_simple_bayesian[36][27] = -7.19731; qual_mismatch_simple_bayesian[36][28] = -7.39902; qual_mismatch_simple_bayesian[36][29] = -7.59446; qual_mismatch_simple_bayesian[36][30] = -7.78254; qual_mismatch_simple_bayesian[36][31] = -7.96211; qual_mismatch_simple_bayesian[36][32] = -8.132; qual_mismatch_simple_bayesian[36][33] = -8.29111; qual_mismatch_simple_bayesian[36][34] = -8.43844; qual_mismatch_simple_bayesian[36][35] = -8.57322; qual_mismatch_simple_bayesian[36][36] = -8.69494; qual_mismatch_simple_bayesian[36][37] = -8.80344; qual_mismatch_simple_bayesian[36][38] = -8.89888; qual_mismatch_simple_bayesian[36][39] = -8.98177; qual_mismatch_simple_bayesian[36][40] = -9.05289; qual_mismatch_simple_bayesian[36][41] = -9.11323; qual_mismatch_simple_bayesian[36][42] = -9.16389; qual_mismatch_simple_bayesian[36][43] = -9.20605; qual_mismatch_simple_bayesian[36][44] = -9.24085; qual_mismatch_simple_bayesian[36][45] = -9.26938; qual_mismatch_simple_bayesian[36][46] = -9.29264; qual_mismatch_simple_bayesian[37][0] = -1.09868; qual_mismatch_simple_bayesian[37][1] = -1.32889; qual_mismatch_simple_bayesian[37][2] = -1.55908; 
qual_mismatch_simple_bayesian[37][3] = -1.78926; qual_mismatch_simple_bayesian[37][4] = -2.01941; qual_mismatch_simple_bayesian[37][5] = -2.24954; qual_mismatch_simple_bayesian[37][6] = -2.47964; qual_mismatch_simple_bayesian[37][7] = -2.70969; qual_mismatch_simple_bayesian[37][8] = -2.93969; qual_mismatch_simple_bayesian[37][9] = -3.16962; qual_mismatch_simple_bayesian[37][10] = -3.39947; qual_mismatch_simple_bayesian[37][11] = -3.62921; qual_mismatch_simple_bayesian[37][12] = -3.85882; qual_mismatch_simple_bayesian[37][13] = -4.08826; qual_mismatch_simple_bayesian[37][14] = -4.3175; qual_mismatch_simple_bayesian[37][15] = -4.54646; qual_mismatch_simple_bayesian[37][16] = -4.7751; qual_mismatch_simple_bayesian[37][17] = -5.00332; qual_mismatch_simple_bayesian[37][18] = -5.23102; qual_mismatch_simple_bayesian[37][19] = -5.45806; qual_mismatch_simple_bayesian[37][20] = -5.68429; qual_mismatch_simple_bayesian[37][21] = -5.90949; qual_mismatch_simple_bayesian[37][22] = -6.13342; qual_mismatch_simple_bayesian[37][23] = -6.35578; qual_mismatch_simple_bayesian[37][24] = -6.57617; qual_mismatch_simple_bayesian[37][25] = -6.79414; qual_mismatch_simple_bayesian[37][26] = -7.00914; qual_mismatch_simple_bayesian[37][27] = -7.22052; qual_mismatch_simple_bayesian[37][28] = -7.42751; qual_mismatch_simple_bayesian[37][29] = -7.62922; qual_mismatch_simple_bayesian[37][30] = -7.82466; qual_mismatch_simple_bayesian[37][31] = -8.01274; qual_mismatch_simple_bayesian[37][32] = -8.19232; qual_mismatch_simple_bayesian[37][33] = -8.36221; qual_mismatch_simple_bayesian[37][34] = -8.52132; qual_mismatch_simple_bayesian[37][35] = -8.66866; qual_mismatch_simple_bayesian[37][36] = -8.80344; qual_mismatch_simple_bayesian[37][37] = -8.92516; qual_mismatch_simple_bayesian[37][38] = -9.03366; qual_mismatch_simple_bayesian[37][39] = -9.12911; qual_mismatch_simple_bayesian[37][40] = -9.21201; qual_mismatch_simple_bayesian[37][41] = -9.28313; qual_mismatch_simple_bayesian[37][42] = -9.34347; 
qual_mismatch_simple_bayesian[37][43] = -9.39414; qual_mismatch_simple_bayesian[37][44] = -9.43629; qual_mismatch_simple_bayesian[37][45] = -9.4711; qual_mismatch_simple_bayesian[37][46] = -9.49963; qual_mismatch_simple_bayesian[38][0] = -1.09867; qual_mismatch_simple_bayesian[38][1] = -1.32888; qual_mismatch_simple_bayesian[38][2] = -1.55909; qual_mismatch_simple_bayesian[38][3] = -1.78928; qual_mismatch_simple_bayesian[38][4] = -2.01946; qual_mismatch_simple_bayesian[38][5] = -2.24962; qual_mismatch_simple_bayesian[38][6] = -2.47974; qual_mismatch_simple_bayesian[38][7] = -2.70984; qual_mismatch_simple_bayesian[38][8] = -2.93989; qual_mismatch_simple_bayesian[38][9] = -3.16989; qual_mismatch_simple_bayesian[38][10] = -3.39982; qual_mismatch_simple_bayesian[38][11] = -3.62967; qual_mismatch_simple_bayesian[38][12] = -3.85942; qual_mismatch_simple_bayesian[38][13] = -4.08903; qual_mismatch_simple_bayesian[38][14] = -4.31847; qual_mismatch_simple_bayesian[38][15] = -4.5477; qual_mismatch_simple_bayesian[38][16] = -4.77667; qual_mismatch_simple_bayesian[38][17] = -5.0053; qual_mismatch_simple_bayesian[38][18] = -5.23352; qual_mismatch_simple_bayesian[38][19] = -5.46122; qual_mismatch_simple_bayesian[38][20] = -5.68827; qual_mismatch_simple_bayesian[38][21] = -5.91449; qual_mismatch_simple_bayesian[38][22] = -6.1397; qual_mismatch_simple_bayesian[38][23] = -6.36363; qual_mismatch_simple_bayesian[38][24] = -6.58598; qual_mismatch_simple_bayesian[38][25] = -6.80637; qual_mismatch_simple_bayesian[38][26] = -7.02435; qual_mismatch_simple_bayesian[38][27] = -7.23935; qual_mismatch_simple_bayesian[38][28] = -7.45073; qual_mismatch_simple_bayesian[38][29] = -7.65772; qual_mismatch_simple_bayesian[38][30] = -7.85943; qual_mismatch_simple_bayesian[38][31] = -8.05488; qual_mismatch_simple_bayesian[38][32] = -8.24296; qual_mismatch_simple_bayesian[38][33] = -8.42253; qual_mismatch_simple_bayesian[38][34] = -8.59243; qual_mismatch_simple_bayesian[38][35] = -8.75154; 
qual_mismatch_simple_bayesian[38][36] = -8.89888; qual_mismatch_simple_bayesian[38][37] = -9.03366; qual_mismatch_simple_bayesian[38][38] = -9.15539; qual_mismatch_simple_bayesian[38][39] = -9.2639; qual_mismatch_simple_bayesian[38][40] = -9.35935; qual_mismatch_simple_bayesian[38][41] = -9.44225; qual_mismatch_simple_bayesian[38][42] = -9.51338; qual_mismatch_simple_bayesian[38][43] = -9.57372; qual_mismatch_simple_bayesian[38][44] = -9.62438; qual_mismatch_simple_bayesian[38][45] = -9.66654; qual_mismatch_simple_bayesian[38][46] = -9.70135; qual_mismatch_simple_bayesian[39][0] = -1.09865; qual_mismatch_simple_bayesian[39][1] = -1.32888; qual_mismatch_simple_bayesian[39][2] = -1.5591; qual_mismatch_simple_bayesian[39][3] = -1.7893; qual_mismatch_simple_bayesian[39][4] = -2.0195; qual_mismatch_simple_bayesian[39][5] = -2.24967; qual_mismatch_simple_bayesian[39][6] = -2.47983; qual_mismatch_simple_bayesian[39][7] = -2.70996; qual_mismatch_simple_bayesian[39][8] = -2.94005; qual_mismatch_simple_bayesian[39][9] = -3.17011; qual_mismatch_simple_bayesian[39][10] = -3.40011; qual_mismatch_simple_bayesian[39][11] = -3.63004; qual_mismatch_simple_bayesian[39][12] = -3.85989; qual_mismatch_simple_bayesian[39][13] = -4.08963; qual_mismatch_simple_bayesian[39][14] = -4.31924; qual_mismatch_simple_bayesian[39][15] = -4.54868; qual_mismatch_simple_bayesian[39][16] = -4.77792; qual_mismatch_simple_bayesian[39][17] = -5.00688; qual_mismatch_simple_bayesian[39][18] = -5.23552; qual_mismatch_simple_bayesian[39][19] = -5.46374; qual_mismatch_simple_bayesian[39][20] = -5.69144; qual_mismatch_simple_bayesian[39][21] = -5.91848; qual_mismatch_simple_bayesian[39][22] = -6.14471; qual_mismatch_simple_bayesian[39][23] = -6.36991; qual_mismatch_simple_bayesian[39][24] = -6.59385; qual_mismatch_simple_bayesian[39][25] = -6.8162; qual_mismatch_simple_bayesian[39][26] = -7.03659; qual_mismatch_simple_bayesian[39][27] = -7.25456; qual_mismatch_simple_bayesian[39][28] = -7.46957; 
qual_mismatch_simple_bayesian[39][29] = -7.68095; qual_mismatch_simple_bayesian[39][30] = -7.88794; qual_mismatch_simple_bayesian[39][31] = -8.08965; qual_mismatch_simple_bayesian[39][32] = -8.2851; qual_mismatch_simple_bayesian[39][33] = -8.47318; qual_mismatch_simple_bayesian[39][34] = -8.65276; qual_mismatch_simple_bayesian[39][35] = -8.82266; qual_mismatch_simple_bayesian[39][36] = -8.98177; qual_mismatch_simple_bayesian[39][37] = -9.12911; qual_mismatch_simple_bayesian[39][38] = -9.2639; qual_mismatch_simple_bayesian[39][39] = -9.38563; qual_mismatch_simple_bayesian[39][40] = -9.49414; qual_mismatch_simple_bayesian[39][41] = -9.58959; qual_mismatch_simple_bayesian[39][42] = -9.67249; qual_mismatch_simple_bayesian[39][43] = -9.74362; qual_mismatch_simple_bayesian[39][44] = -9.80396; qual_mismatch_simple_bayesian[39][45] = -9.85463; qual_mismatch_simple_bayesian[39][46] = -9.8968; qual_mismatch_simple_bayesian[40][0] = -1.09865; qual_mismatch_simple_bayesian[40][1] = -1.32888; qual_mismatch_simple_bayesian[40][2] = -1.5591; qual_mismatch_simple_bayesian[40][3] = -1.78932; qual_mismatch_simple_bayesian[40][4] = -2.01953; qual_mismatch_simple_bayesian[40][5] = -2.24972; qual_mismatch_simple_bayesian[40][6] = -2.4799; qual_mismatch_simple_bayesian[40][7] = -2.71005; qual_mismatch_simple_bayesian[40][8] = -2.94018; qual_mismatch_simple_bayesian[40][9] = -3.17028; qual_mismatch_simple_bayesian[40][10] = -3.40033; qual_mismatch_simple_bayesian[40][11] = -3.63033; qual_mismatch_simple_bayesian[40][12] = -3.86026; qual_mismatch_simple_bayesian[40][13] = -4.09011; qual_mismatch_simple_bayesian[40][14] = -4.31986; qual_mismatch_simple_bayesian[40][15] = -4.54947; qual_mismatch_simple_bayesian[40][16] = -4.77891; qual_mismatch_simple_bayesian[40][17] = -5.00814; qual_mismatch_simple_bayesian[40][18] = -5.23711; qual_mismatch_simple_bayesian[40][19] = -5.46574; qual_mismatch_simple_bayesian[40][20] = -5.69396; qual_mismatch_simple_bayesian[40][21] = -5.92166; 
qual_mismatch_simple_bayesian[40][22] = -6.14871; qual_mismatch_simple_bayesian[40][23] = -6.37493; qual_mismatch_simple_bayesian[40][24] = -6.60014; qual_mismatch_simple_bayesian[40][25] = -6.82407; qual_mismatch_simple_bayesian[40][26] = -7.04642; qual_mismatch_simple_bayesian[40][27] = -7.26682; qual_mismatch_simple_bayesian[40][28] = -7.48479; qual_mismatch_simple_bayesian[40][29] = -7.6998; qual_mismatch_simple_bayesian[40][30] = -7.91118; qual_mismatch_simple_bayesian[40][31] = -8.11817; qual_mismatch_simple_bayesian[40][32] = -8.31988; qual_mismatch_simple_bayesian[40][33] = -8.51533; qual_mismatch_simple_bayesian[40][34] = -8.70341; qual_mismatch_simple_bayesian[40][35] = -8.88299; qual_mismatch_simple_bayesian[40][36] = -9.05289; qual_mismatch_simple_bayesian[40][37] = -9.21201; qual_mismatch_simple_bayesian[40][38] = -9.35935; qual_mismatch_simple_bayesian[40][39] = -9.49414; qual_mismatch_simple_bayesian[40][40] = -9.61587; qual_mismatch_simple_bayesian[40][41] = -9.72438; qual_mismatch_simple_bayesian[40][42] = -9.81984; qual_mismatch_simple_bayesian[40][43] = -9.90274; qual_mismatch_simple_bayesian[40][44] = -9.97387; qual_mismatch_simple_bayesian[40][45] = -10.0342; qual_mismatch_simple_bayesian[40][46] = -10.0849; qual_mismatch_simple_bayesian[41][0] = -1.09864; qual_mismatch_simple_bayesian[41][1] = -1.32888; qual_mismatch_simple_bayesian[41][2] = -1.55911; qual_mismatch_simple_bayesian[41][3] = -1.78934; qual_mismatch_simple_bayesian[41][4] = -2.01955; qual_mismatch_simple_bayesian[41][5] = -2.24976; qual_mismatch_simple_bayesian[41][6] = -2.47995; qual_mismatch_simple_bayesian[41][7] = -2.71013; qual_mismatch_simple_bayesian[41][8] = -2.94029; qual_mismatch_simple_bayesian[41][9] = -3.17041; qual_mismatch_simple_bayesian[41][10] = -3.40051; qual_mismatch_simple_bayesian[41][11] = -3.63056; qual_mismatch_simple_bayesian[41][12] = -3.86056; qual_mismatch_simple_bayesian[41][13] = -4.0905; qual_mismatch_simple_bayesian[41][14] = -4.32034; 
qual_mismatch_simple_bayesian[41][15] = -4.55009; qual_mismatch_simple_bayesian[41][16] = -4.7797; qual_mismatch_simple_bayesian[41][17] = -5.00914; qual_mismatch_simple_bayesian[41][18] = -5.23837; qual_mismatch_simple_bayesian[41][19] = -5.46734; qual_mismatch_simple_bayesian[41][20] = -5.69598; qual_mismatch_simple_bayesian[41][21] = -5.9242; qual_mismatch_simple_bayesian[41][22] = -6.15189; qual_mismatch_simple_bayesian[41][23] = -6.37894; qual_mismatch_simple_bayesian[41][24] = -6.60516; qual_mismatch_simple_bayesian[41][25] = -6.83037; qual_mismatch_simple_bayesian[41][26] = -7.0543; qual_mismatch_simple_bayesian[41][27] = -7.27666; qual_mismatch_simple_bayesian[41][28] = -7.49705; qual_mismatch_simple_bayesian[41][29] = -7.71502; qual_mismatch_simple_bayesian[41][30] = -7.93003; qual_mismatch_simple_bayesian[41][31] = -8.14141; qual_mismatch_simple_bayesian[41][32] = -8.3484; qual_mismatch_simple_bayesian[41][33] = -8.55012; qual_mismatch_simple_bayesian[41][34] = -8.74556; qual_mismatch_simple_bayesian[41][35] = -8.93365; qual_mismatch_simple_bayesian[41][36] = -9.11323; qual_mismatch_simple_bayesian[41][37] = -9.28313; qual_mismatch_simple_bayesian[41][38] = -9.44225; qual_mismatch_simple_bayesian[41][39] = -9.58959; qual_mismatch_simple_bayesian[41][40] = -9.72438; qual_mismatch_simple_bayesian[41][41] = -9.84612; qual_mismatch_simple_bayesian[41][42] = -9.95463; qual_mismatch_simple_bayesian[41][43] = -10.0501; qual_mismatch_simple_bayesian[41][44] = -10.133; qual_mismatch_simple_bayesian[41][45] = -10.2041; qual_mismatch_simple_bayesian[41][46] = -10.2645; qual_mismatch_simple_bayesian[42][0] = -1.09863; qual_mismatch_simple_bayesian[42][1] = -1.32888; qual_mismatch_simple_bayesian[42][2] = -1.55911; qual_mismatch_simple_bayesian[42][3] = -1.78935; qual_mismatch_simple_bayesian[42][4] = -2.01957; qual_mismatch_simple_bayesian[42][5] = -2.24979; qual_mismatch_simple_bayesian[42][6] = -2.48; qual_mismatch_simple_bayesian[42][7] = -2.71019; 
qual_mismatch_simple_bayesian[42][8] = -2.94037; qual_mismatch_simple_bayesian[42][9] = -3.17052; qual_mismatch_simple_bayesian[42][10] = -3.40065; qual_mismatch_simple_bayesian[42][11] = -3.63075; qual_mismatch_simple_bayesian[42][12] = -3.8608; qual_mismatch_simple_bayesian[42][13] = -4.0908; qual_mismatch_simple_bayesian[42][14] = -4.32073; qual_mismatch_simple_bayesian[42][15] = -4.55058; qual_mismatch_simple_bayesian[42][16] = -4.78032; qual_mismatch_simple_bayesian[42][17] = -5.00993; qual_mismatch_simple_bayesian[42][18] = -5.23938; qual_mismatch_simple_bayesian[42][19] = -5.46861; qual_mismatch_simple_bayesian[42][20] = -5.69758; qual_mismatch_simple_bayesian[42][21] = -5.92621; qual_mismatch_simple_bayesian[42][22] = -6.15443; qual_mismatch_simple_bayesian[42][23] = -6.38213; qual_mismatch_simple_bayesian[42][24] = -6.60917; qual_mismatch_simple_bayesian[42][25] = -6.8354; qual_mismatch_simple_bayesian[42][26] = -7.06061; qual_mismatch_simple_bayesian[42][27] = -7.28454; qual_mismatch_simple_bayesian[42][28] = -7.50689; qual_mismatch_simple_bayesian[42][29] = -7.72729; qual_mismatch_simple_bayesian[42][30] = -7.94526; qual_mismatch_simple_bayesian[42][31] = -8.16027; qual_mismatch_simple_bayesian[42][32] = -8.37165; qual_mismatch_simple_bayesian[42][33] = -8.57864; qual_mismatch_simple_bayesian[42][34] = -8.78036; qual_mismatch_simple_bayesian[42][35] = -8.9758; qual_mismatch_simple_bayesian[42][36] = -9.16389; qual_mismatch_simple_bayesian[42][37] = -9.34347; qual_mismatch_simple_bayesian[42][38] = -9.51338; qual_mismatch_simple_bayesian[42][39] = -9.67249; qual_mismatch_simple_bayesian[42][40] = -9.81984; qual_mismatch_simple_bayesian[42][41] = -9.95463; qual_mismatch_simple_bayesian[42][42] = -10.0764; qual_mismatch_simple_bayesian[42][43] = -10.1849; qual_mismatch_simple_bayesian[42][44] = -10.2803; qual_mismatch_simple_bayesian[42][45] = -10.3632; qual_mismatch_simple_bayesian[42][46] = -10.4344; qual_mismatch_simple_bayesian[43][0] = -1.09863; 
qual_mismatch_simple_bayesian[43][1] = -1.32887; qual_mismatch_simple_bayesian[43][2] = -1.55912; qual_mismatch_simple_bayesian[43][3] = -1.78935; qual_mismatch_simple_bayesian[43][4] = -2.01959; qual_mismatch_simple_bayesian[43][5] = -2.24981; qual_mismatch_simple_bayesian[43][6] = -2.48003; qual_mismatch_simple_bayesian[43][7] = -2.71024; qual_mismatch_simple_bayesian[43][8] = -2.94043; qual_mismatch_simple_bayesian[43][9] = -3.17061; qual_mismatch_simple_bayesian[43][10] = -3.40076; qual_mismatch_simple_bayesian[43][11] = -3.63089; qual_mismatch_simple_bayesian[43][12] = -3.86099; qual_mismatch_simple_bayesian[43][13] = -4.09104; qual_mismatch_simple_bayesian[43][14] = -4.32104; qual_mismatch_simple_bayesian[43][15] = -4.55097; qual_mismatch_simple_bayesian[43][16] = -4.78082; qual_mismatch_simple_bayesian[43][17] = -5.01056; qual_mismatch_simple_bayesian[43][18] = -5.24017; qual_mismatch_simple_bayesian[43][19] = -5.46962; qual_mismatch_simple_bayesian[43][20] = -5.69885; qual_mismatch_simple_bayesian[43][21] = -5.92782; qual_mismatch_simple_bayesian[43][22] = -6.15645; qual_mismatch_simple_bayesian[43][23] = -6.38467; qual_mismatch_simple_bayesian[43][24] = -6.61237; qual_mismatch_simple_bayesian[43][25] = -6.83942; qual_mismatch_simple_bayesian[43][26] = -7.06564; qual_mismatch_simple_bayesian[43][27] = -7.29085; qual_mismatch_simple_bayesian[43][28] = -7.51478; qual_mismatch_simple_bayesian[43][29] = -7.73713; qual_mismatch_simple_bayesian[43][30] = -7.95753; qual_mismatch_simple_bayesian[43][31] = -8.1755; qual_mismatch_simple_bayesian[43][32] = -8.39051; qual_mismatch_simple_bayesian[43][33] = -8.60189; qual_mismatch_simple_bayesian[43][34] = -8.80888; qual_mismatch_simple_bayesian[43][35] = -9.0106; qual_mismatch_simple_bayesian[43][36] = -9.20605; qual_mismatch_simple_bayesian[43][37] = -9.39414; qual_mismatch_simple_bayesian[43][38] = -9.57372; qual_mismatch_simple_bayesian[43][39] = -9.74362; qual_mismatch_simple_bayesian[43][40] = -9.90274; 
qual_mismatch_simple_bayesian[43][41] = -10.0501; qual_mismatch_simple_bayesian[43][42] = -10.1849; qual_mismatch_simple_bayesian[43][43] = -10.3066; qual_mismatch_simple_bayesian[43][44] = -10.4151; qual_mismatch_simple_bayesian[43][45] = -10.5106; qual_mismatch_simple_bayesian[43][46] = -10.5935; qual_mismatch_simple_bayesian[44][0] = -1.09863; qual_mismatch_simple_bayesian[44][1] = -1.32887; qual_mismatch_simple_bayesian[44][2] = -1.55912; qual_mismatch_simple_bayesian[44][3] = -1.78936; qual_mismatch_simple_bayesian[44][4] = -2.0196; qual_mismatch_simple_bayesian[44][5] = -2.24983; qual_mismatch_simple_bayesian[44][6] = -2.48006; qual_mismatch_simple_bayesian[44][7] = -2.71028; qual_mismatch_simple_bayesian[44][8] = -2.94048; qual_mismatch_simple_bayesian[44][9] = -3.17068; qual_mismatch_simple_bayesian[44][10] = -3.40085; qual_mismatch_simple_bayesian[44][11] = -3.63101; qual_mismatch_simple_bayesian[44][12] = -3.86114; qual_mismatch_simple_bayesian[44][13] = -4.09123; qual_mismatch_simple_bayesian[44][14] = -4.32128; qual_mismatch_simple_bayesian[44][15] = -4.55128; qual_mismatch_simple_bayesian[44][16] = -4.78122; qual_mismatch_simple_bayesian[44][17] = -5.01107; qual_mismatch_simple_bayesian[44][18] = -5.24081; qual_mismatch_simple_bayesian[44][19] = -5.47042; qual_mismatch_simple_bayesian[44][20] = -5.69986; qual_mismatch_simple_bayesian[44][21] = -5.92909; qual_mismatch_simple_bayesian[44][22] = -6.15806; qual_mismatch_simple_bayesian[44][23] = -6.3867; qual_mismatch_simple_bayesian[44][24] = -6.61492; qual_mismatch_simple_bayesian[44][25] = -6.84262; qual_mismatch_simple_bayesian[44][26] = -7.06966; qual_mismatch_simple_bayesian[44][27] = -7.29589; qual_mismatch_simple_bayesian[44][28] = -7.52109; qual_mismatch_simple_bayesian[44][29] = -7.74503; qual_mismatch_simple_bayesian[44][30] = -7.96738; qual_mismatch_simple_bayesian[44][31] = -8.18777; qual_mismatch_simple_bayesian[44][32] = -8.40575; qual_mismatch_simple_bayesian[44][33] = -8.62076; 
qual_mismatch_simple_bayesian[44][34] = -8.83214; qual_mismatch_simple_bayesian[44][35] = -9.03913; qual_mismatch_simple_bayesian[44][36] = -9.24085; qual_mismatch_simple_bayesian[44][37] = -9.43629; qual_mismatch_simple_bayesian[44][38] = -9.62438; qual_mismatch_simple_bayesian[44][39] = -9.80396; qual_mismatch_simple_bayesian[44][40] = -9.97387; qual_mismatch_simple_bayesian[44][41] = -10.133; qual_mismatch_simple_bayesian[44][42] = -10.2803; qual_mismatch_simple_bayesian[44][43] = -10.4151; qual_mismatch_simple_bayesian[44][44] = -10.5369; qual_mismatch_simple_bayesian[44][45] = -10.6454; qual_mismatch_simple_bayesian[44][46] = -10.7408; qual_mismatch_simple_bayesian[45][0] = -1.09862; qual_mismatch_simple_bayesian[45][1] = -1.32887; qual_mismatch_simple_bayesian[45][2] = -1.55912; qual_mismatch_simple_bayesian[45][3] = -1.78937; qual_mismatch_simple_bayesian[45][4] = -2.01961; qual_mismatch_simple_bayesian[45][5] = -2.24985; qual_mismatch_simple_bayesian[45][6] = -2.48008; qual_mismatch_simple_bayesian[45][7] = -2.71031; qual_mismatch_simple_bayesian[45][8] = -2.94052; qual_mismatch_simple_bayesian[45][9] = -3.17073; qual_mismatch_simple_bayesian[45][10] = -3.40092; qual_mismatch_simple_bayesian[45][11] = -3.6311; qual_mismatch_simple_bayesian[45][12] = -3.86126; qual_mismatch_simple_bayesian[45][13] = -4.09138; qual_mismatch_simple_bayesian[45][14] = -4.32148; qual_mismatch_simple_bayesian[45][15] = -4.55153; qual_mismatch_simple_bayesian[45][16] = -4.78153; qual_mismatch_simple_bayesian[45][17] = -5.01147; qual_mismatch_simple_bayesian[45][18] = -5.24131; qual_mismatch_simple_bayesian[45][19] = -5.47106; qual_mismatch_simple_bayesian[45][20] = -5.70067; qual_mismatch_simple_bayesian[45][21] = -5.93011; qual_mismatch_simple_bayesian[45][22] = -6.15934; qual_mismatch_simple_bayesian[45][23] = -6.38831; qual_mismatch_simple_bayesian[45][24] = -6.61695; qual_mismatch_simple_bayesian[45][25] = -6.84517; qual_mismatch_simple_bayesian[45][26] = -7.07286; 
qual_mismatch_simple_bayesian[45][27] = -7.29991; qual_mismatch_simple_bayesian[45][28] = -7.52614; qual_mismatch_simple_bayesian[45][29] = -7.75134; qual_mismatch_simple_bayesian[45][30] = -7.97528; qual_mismatch_simple_bayesian[45][31] = -8.19763; qual_mismatch_simple_bayesian[45][32] = -8.41802; qual_mismatch_simple_bayesian[45][33] = -8.636; qual_mismatch_simple_bayesian[45][34] = -8.851; qual_mismatch_simple_bayesian[45][35] = -9.06239; qual_mismatch_simple_bayesian[45][36] = -9.26938; qual_mismatch_simple_bayesian[45][37] = -9.4711; qual_mismatch_simple_bayesian[45][38] = -9.66654; qual_mismatch_simple_bayesian[45][39] = -9.85463; qual_mismatch_simple_bayesian[45][40] = -10.0342; qual_mismatch_simple_bayesian[45][41] = -10.2041; qual_mismatch_simple_bayesian[45][42] = -10.3632; qual_mismatch_simple_bayesian[45][43] = -10.5106; qual_mismatch_simple_bayesian[45][44] = -10.6454; qual_mismatch_simple_bayesian[45][45] = -10.7671; qual_mismatch_simple_bayesian[45][46] = -10.8756; qual_mismatch_simple_bayesian[46][0] = -1.09862; qual_mismatch_simple_bayesian[46][1] = -1.32887; qual_mismatch_simple_bayesian[46][2] = -1.55912; qual_mismatch_simple_bayesian[46][3] = -1.78937; qual_mismatch_simple_bayesian[46][4] = -2.01962; qual_mismatch_simple_bayesian[46][5] = -2.24986; qual_mismatch_simple_bayesian[46][6] = -2.4801; qual_mismatch_simple_bayesian[46][7] = -2.71033; qual_mismatch_simple_bayesian[46][8] = -2.94056; qual_mismatch_simple_bayesian[46][9] = -3.17077; qual_mismatch_simple_bayesian[46][10] = -3.40098; qual_mismatch_simple_bayesian[46][11] = -3.63117; qual_mismatch_simple_bayesian[46][12] = -3.86135; qual_mismatch_simple_bayesian[46][13] = -4.09151; qual_mismatch_simple_bayesian[46][14] = -4.32163; qual_mismatch_simple_bayesian[46][15] = -4.55173; qual_mismatch_simple_bayesian[46][16] = -4.78178; qual_mismatch_simple_bayesian[46][17] = -5.01178; qual_mismatch_simple_bayesian[46][18] = -5.24172; qual_mismatch_simple_bayesian[46][19] = -5.47156; 
qual_mismatch_simple_bayesian[46][20] = -5.70131; qual_mismatch_simple_bayesian[46][21] = -5.93092; qual_mismatch_simple_bayesian[46][22] = -6.16036; qual_mismatch_simple_bayesian[46][23] = -6.38959; qual_mismatch_simple_bayesian[46][24] = -6.61856; qual_mismatch_simple_bayesian[46][25] = -6.8472; qual_mismatch_simple_bayesian[46][26] = -7.07542; qual_mismatch_simple_bayesian[46][27] = -7.30311; qual_mismatch_simple_bayesian[46][28] = -7.53016; qual_mismatch_simple_bayesian[46][29] = -7.75639; qual_mismatch_simple_bayesian[46][30] = -7.98159; qual_mismatch_simple_bayesian[46][31] = -8.20553; qual_mismatch_simple_bayesian[46][32] = -8.42788; qual_mismatch_simple_bayesian[46][33] = -8.64827; qual_mismatch_simple_bayesian[46][34] = -8.86625; qual_mismatch_simple_bayesian[46][35] = -9.08126; qual_mismatch_simple_bayesian[46][36] = -9.29264; qual_mismatch_simple_bayesian[46][37] = -9.49963; qual_mismatch_simple_bayesian[46][38] = -9.70135; qual_mismatch_simple_bayesian[46][39] = -9.8968; qual_mismatch_simple_bayesian[46][40] = -10.0849; qual_mismatch_simple_bayesian[46][41] = -10.2645; qual_mismatch_simple_bayesian[46][42] = -10.4344; qual_mismatch_simple_bayesian[46][43] = -10.5935; qual_mismatch_simple_bayesian[46][44] = -10.7408; qual_mismatch_simple_bayesian[46][45] = -10.8756; qual_mismatch_simple_bayesian[46][46] = -10.9974; return 0; } catch(exception& e) { m->errorOut(e, "MakeContigsCommand", "loadQmatchValues"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/makecontigscommand.h000066400000000000000000023356631255543666200222750ustar00rootroot00000000000000#ifndef Mothur_makecontigscommand_h #define Mothur_makecontigscommand_h // // makecontigscommand.h // Mothur // // Created by Sarah Westcott on 5/15/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "command.hpp" #include "sequence.hpp" #include "qualityscores.h" #include "alignment.hpp" #include "gotohoverlap.hpp" #include "needlemanoverlap.hpp" #include "blastalign.hpp" #include "noalign.hpp" #include "trimoligos.h" #include "oligos.h" #include "fastqread.h" #include "kmeralign.h" # define PROBABILITY(score) (pow(10.0, (-(double)(score)) / 10.0)) # define PHREDMAX 46 # define PHREDCLAMP(x) ((x) > PHREDMAX ? PHREDMAX : ((x) < 0 ? 0 : (x))) struct pairFastqRead { FastqRead forward; FastqRead reverse; FastqRead findex; FastqRead rindex; pairFastqRead() {}; pairFastqRead(FastqRead f, FastqRead r) : forward(f), reverse(r){}; pairFastqRead(FastqRead f, FastqRead r, FastqRead fi, FastqRead ri) : forward(f), reverse(r), findex(fi), rindex(ri) {}; ~pairFastqRead() {}; }; /**************************************************************************************************/ class MakeContigsCommand : public Command { public: MakeContigsCommand(string); MakeContigsCommand(); ~MakeContigsCommand(){} vector setParameters(); string getCommandName() { return "make.contigs"; } string getCommandCategory() { return "Sequence Processing"; } //command category choices: Sequence Processing, OTU-Based Approaches, Hypothesis Testing, Phylotype Analysis, General, Clustering and Hidden string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Make.contigs"; } string getDescription() { return "description"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: char delim; bool abort, allFiles, trimOverlap, createFileGroup, createOligosGroup, makeCount, noneOk, reorient, gz; string outputDir, ffastqfile, rfastqfile, align, oligosfile, rfastafile, ffastafile, rqualfile, fqualfile, findexfile, rindexfile, file, format, inputDir; string outFastaFile, outQualFile, outScrapFastaFile, outScrapQualFile, outMisMatchFile, outputGroupFileName, group; float match, misMatch, gapOpen, gapExtend; int processors, 
longestBase, insert, tdiffs, bdiffs, pdiffs, ldiffs, sdiffs, deltaq, kmerSize, numBarcodes, numFPrimers, numLinkers, numSpacers, numRPrimers; vector outputNames; Oligos* oligos; map groupCounts; map groupMap; map file2Group; vector qual_score; bool checkName(FastqRead& forward, FastqRead& reverse); bool checkName(Sequence& forward, Sequence& reverse); bool checkName(QualityScores& forward, QualityScores& reverse); bool checkName(Sequence& forward, QualityScores& reverse); unsigned long long processMultipleFileOption(map&); unsigned long long processSingleFileOption(map&); int loadQmatchValues(vector< vector >&, vector< vector >&); #ifdef USE_BOOST bool read(Sequence&, Sequence&, QualityScores*&, QualityScores*&, QualityScores*& savedFQual, QualityScores*& savedRQual, Sequence&, Sequence&, char, boost::iostreams::filtering_istream&, boost::iostreams::filtering_istream&, boost::iostreams::filtering_istream&, boost::iostreams::filtering_istream&, string, string); #endif bool read(Sequence&, Sequence&, QualityScores*&, QualityScores*&, QualityScores*& savedFQual, QualityScores*& savedRQual, Sequence&, Sequence&, char, ifstream&, ifstream&, ifstream&, ifstream&, string, string); vector assembleFragments(vector< vector >&qual_match_simple_bayesian, vector< vector >& qual_mismatch_simple_bayesian, Sequence& fSeq, Sequence& rSeq, QualityScores*&, QualityScores*&, QualityScores*& savedFQual, QualityScores*& savedRQual, bool, Alignment*& alignment, string& contig, string&, int&, int&, int&); //main processing functions unsigned long long createProcesses(vector, vector, string, string, string, string, string, vector >, vector >, vector, vector, string); unsigned long long createProcessesGroups(vector< vector >, string compositeGroupFile, string compositeFastaFile, string compositeScrapFastaFile, string compositeQualFile, string compositeScrapQualFile, string compositeMisMatchFile, map& totalGroupCounts); unsigned long long driverGroups(vector >, int, int, string, string, 
string, string, string, string, map&); unsigned long long driver(vector files, vector qualOrIndexFiles, string outputFasta, string outputScrapFasta, string outputQual, string outputScrapQual, string outputMisMatches, vector > fastaFileNames, vector > qualFileNames, linePair, linePair, linePair, linePair, string); int convertProb(double qProb); vector< vector > readFileNames(string); bool getOligos(vector >&, vector >&, string, map&); int setLines(vector, vector, vector& fastaFilePos, vector& qfileFilePos, char delim); //the delim lets you know whether this is fasta and qual, or fastq and index. linePair entries will always be in sets of two: one for the forward and one for the reverse. (fastaFilePos[0] - ffasta, fastaFilePos[1] - rfasta) - processor1 }; /**************************************************************************************************/ /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct contigsData { char delim; unsigned long long linesInput_start, linesInput_end, linesInputReverse_start, qlinesInput_start, qlinesInputReverse_start, linesInputReverse_end, qlinesInput_end, qlinesInputReverse_end; string outputFasta, outputQual, outputDir; string outputScrapFasta, outputScrapQual; string outputMisMatches; string align, group, oligosfile, format; vector inputFiles, qualOrIndexFiles, outputNames; vector > fastaFileNames, qualFileNames; MothurOut* m; float match, misMatch, gapOpen, gapExtend; int count, insert, threadID, pdiffs, bdiffs, tdiffs, deltaq, kmerSize; bool allFiles, createOligosGroup, createFileGroup, done, trimOverlap, reorient, gz; map groupCounts; map groupMap; vector< vector > fileInputs; int start,end; string compositeGroupFile, compositeFastaFile, compositeScrapFastaFile, compositeQualFile, compositeScrapQualFile, compositeMisMatchFile; map totalGroupCounts; map file2Group; contigsData(){} contigsData(string form, char d, string g, vector f, vector qif, string of, string osf, string oq, string osq, string om, string al, MothurOut* mout, float ma, float misMa, float gapO, float gapE, int thr, int delt, vector > ffn, vector > qfn,string olig, bool ro, int pdf, int bdf, int tdf, int km, bool cg, bool cfg, bool all, bool to, unsigned long long lff, unsigned long long lff2, unsigned long long lrf, unsigned long long lrf2, unsigned long long qff, unsigned long long qff2, unsigned long long qrf, unsigned long long qrf2, int tid) { inputFiles = f; qualOrIndexFiles = qif; outputFasta = of; outputMisMatches = om; outputQual = oq; outputScrapQual = osq; m = mout; match = ma; misMatch = misMa; gapOpen = gapO; gapExtend = gapE; insert = thr; kmerSize = km; align = al; group = g; count = 0; outputScrapFasta = osf; fastaFileNames = ffn; qualFileNames = qfn; oligosfile = olig; pdiffs = pdf; bdiffs = bdf; tdiffs = tdf; allFiles = all; trimOverlap = to; createOligosGroup = cg; createFileGroup = cfg; threadID = tid; deltaq = delt; reorient = ro; 
linesInput_start = lff; linesInput_end = lff2; linesInputReverse_start = lrf; linesInputReverse_end = lrf2; qlinesInput_start = qff; qlinesInput_end = qff2; qlinesInputReverse_start = qrf; qlinesInputReverse_end = qrf2; delim = d; format = form; done=false; } contigsData(string form, char d, string g, string al, string opd, MothurOut* mout, float ma, float misMa, float gapO, float gapE, int thr, int delt, string olig, bool ro, int pdf, int bdf, int tdf, int km, bool cg, bool cfg, bool all, bool to, int tid, vector< vector > fileI, int st, int ed, string compGroupFile, string compFastaFile, string compScrapFastaFile, string compQualFile, string compScrapQualFile, string compMisMatchFile, map tGroupCounts, map fGroup, bool gzb) { m = mout; match = ma; misMatch = misMa; gapOpen = gapO; gapExtend = gapE; insert = thr; kmerSize = km; align = al; group = g; count = 0; oligosfile = olig; pdiffs = pdf; bdiffs = bdf; tdiffs = tdf; allFiles = all; outputDir = opd; trimOverlap = to; createOligosGroup = cg; createFileGroup = cfg; threadID = tid; deltaq = delt; reorient = ro; delim = d; format = form; done=false; fileInputs = fileI; start = st; end= ed; compositeGroupFile = compGroupFile; compositeFastaFile = compFastaFile; compositeMisMatchFile = compMisMatchFile; compositeQualFile = compQualFile; compositeScrapFastaFile = compScrapFastaFile; compositeScrapQualFile = compScrapQualFile; totalGroupCounts = tGroupCounts; file2Group = fGroup; gz = gzb; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else //********************************************************************************************************************** //unsigned long long MakeContigsCommand::driverGroups(vector< vector > fileInputs, int start, int end, string compositeGroupFile, string compositeFastaFile, string compositeScrapFastaFile, string 
compositeQualFile, string compositeScrapQualFile, string compositeMisMatchFile, map& totalGroupCounts) { //ONLY GETS HERE WITH BOOST OPTION static DWORD WINAPI MyGroupContigsThreadFunction(LPVOID lpParam){ contigsData* pDataArray; pDataArray = (contigsData*)lpParam; try { unsigned long long numReads = 0; pDataArray->delim = '@'; for (int l = pDataArray->start; l < pDataArray->end; l++) { int startTime = time(NULL); if (pDataArray->m->control_pressed) { break; } pDataArray->m->mothurOut("\n>>>>>\tProcessing file pair " + pDataArray->fileInputs[l][0] + " - " + pDataArray->fileInputs[l][1] + " (files " + toString(l+1) + " of " + toString(pDataArray->fileInputs.size()) + ")\t<<<<<\n"); string ffastqfile = pDataArray->fileInputs[l][0]; string rfastqfile = pDataArray->fileInputs[l][1]; string findexfile = pDataArray->fileInputs[l][2]; string rindexfile = pDataArray->fileInputs[l][3]; pDataArray->group = pDataArray->file2Group[l]; pDataArray->groupCounts.clear(); pDataArray->groupMap.clear(); pDataArray->inputFiles.clear(); pDataArray->qualOrIndexFiles.clear(); vector thisLines; vector thisQLines; string thisOutputDir = pDataArray->outputDir; string inputFile = ffastqfile; if (pDataArray->outputDir == "") { thisOutputDir = pDataArray->m->hasPath(inputFile); } pDataArray->outputQual = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)) + ".trim.qfile"; pDataArray->outputScrapQual = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)) + ".scrap.qfile"; pDataArray->inputFiles.push_back(ffastqfile); pDataArray->inputFiles.push_back(rfastqfile); if ((findexfile != "") || (rindexfile != "")){ pDataArray->qualOrIndexFiles.push_back("NONE"); pDataArray->qualOrIndexFiles.push_back("NONE"); if (findexfile != "") { pDataArray->qualOrIndexFiles[0] = findexfile; } if (rindexfile != "") { pDataArray->qualOrIndexFiles[1] = rindexfile; } } pDataArray->outputFasta = thisOutputDir + 
pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)) + ".trim.fasta"; pDataArray->outputScrapFasta = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)) + ".scrap.fasta"; pDataArray->outputMisMatches= thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)) + ".report"; pDataArray->linesInput_start = 0; pDataArray->linesInput_end = 1000; pDataArray->linesInputReverse_start = 1; pDataArray->qlinesInput_start = 0; pDataArray->qlinesInputReverse_start =0; pDataArray->linesInputReverse_end =1000; pDataArray->qlinesInput_end =1000; pDataArray->qlinesInputReverse_end=1000; map uniqueFastaNames;// so we don't add the same groupfile multiple times pDataArray->createOligosGroup = false; if(pDataArray->oligosfile != "") { Oligos oligos; //createOligosGroup = getOligos(pDataArray->fastaFileNames, pDataArray->qualFileNames, pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)), uniqueFastaNames); /////////////////////////////////////////////////////////////////////////////////////// string rootname = pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)); bool allBlank = false; int numFPrimers, numBarcodes, numLinkers, numSpacers, numRPrimers; numRPrimers = 0; numSpacers = 0; numLinkers = 0; numBarcodes = 0; numFPrimers = 0; oligos.read(pDataArray->oligosfile, false); if (pDataArray->m->control_pressed) { break; } //error in reading oligos if (oligos.hasPairedBarcodes() || oligos.hasPairedPrimers()) { numFPrimers = oligos.getPairedPrimers().size(); numBarcodes = oligos.getPairedBarcodes().size(); }else { pDataArray->m->mothurOut("[ERROR]: make.contigs requires paired barcodes and primers. 
You can set one end to NONE if you are using an index file.\n"); pDataArray->m->control_pressed = true; } if (pDataArray->m->control_pressed) { break; } numLinkers = oligos.getLinkers().size(); numSpacers = oligos.getSpacers().size(); numRPrimers = oligos.getReversePrimers().size(); if (numLinkers != 0) { pDataArray->m->mothurOut("[WARNING]: make.contigs is not setup to remove linkers, ignoring.\n"); } if (numSpacers != 0) { pDataArray->m->mothurOut("[WARNING]: make.contigs is not setup to remove spacers, ignoring.\n"); } vector groupNames = oligos.getGroupNames(); if (groupNames.size() == 0) { pDataArray->allFiles = 0; allBlank = true; } pDataArray->fastaFileNames.clear(); pDataArray->fastaFileNames.resize(oligos.getBarcodeNames().size()); for(int i=0;ifastaFileNames.size();i++){ for(int j=0;jfastaFileNames[i].push_back(""); } } pDataArray->qualFileNames = pDataArray->fastaFileNames; if (pDataArray->allFiles) { set uniqueNames; //used to cleanup outputFileNames map barcodes = oligos.getPairedBarcodes(); map primers = oligos.getPairedPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->first); string barcodeName = oligos.getBarcodeName(itBar->first); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string fastaFileName = ""; string qualFileName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } ofstream temp, temp2; fastaFileName = rootname + "." + comboGroupName + ".contigs.fasta"; qualFileName = rootname + "." 
                        + comboGroupName + ".contigs.qual";
                    if (uniqueNames.count(fastaFileName) == 0) {
                        pDataArray->outputNames.push_back(fastaFileName);
                        uniqueNames.insert(fastaFileName);
                        uniqueFastaNames[fastaFileName] = comboGroupName;

                        pDataArray->outputNames.push_back(qualFileName);
                        uniqueNames.insert(qualFileName);
                    }

                    pDataArray->fastaFileNames[itBar->first][itPrimer->first] = fastaFileName;
                    pDataArray->m->openOutputFile(fastaFileName, temp); temp.close();
                    //cout << fastaFileName << endl;

                    pDataArray->qualFileNames[itBar->first][itPrimer->first] = qualFileName;
                    pDataArray->m->openOutputFile(qualFileName, temp2); temp2.close();
                }
            }
        }
    }

    //only request a group file when the oligos file names at least one group
    if (allBlank) {
        pDataArray->m->mothurOut("[WARNING]: your oligos file does not contain any group names. mothur will not create a groupfile."); pDataArray->m->mothurOutEndLine();
        pDataArray->allFiles = false;
        pDataArray->createOligosGroup = false;
    }else {
        pDataArray->createOligosGroup = true;
    }
    ///////////////////////////////////////////////////////////////////////////////////////
}

string outputGroupFileName = "";
if (pDataArray->createOligosGroup || pDataArray->createFileGroup) {
    outputGroupFileName += thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(inputFile)) + ".contigs.group";
}

//give group in file file precedence
if (pDataArray->createFileGroup) { pDataArray->createOligosGroup = false; }

//create (or truncate) the output files before contigs are written
ofstream temp, temp1, temp2, temp3;
pDataArray->m->openOutputFile(pDataArray->outputFasta, temp); temp.close();
pDataArray->m->openOutputFile(pDataArray->outputScrapFasta, temp1); temp1.close();
pDataArray->m->openOutputFile(pDataArray->outputQual, temp2); temp2.close();
pDataArray->m->openOutputFile(pDataArray->outputScrapQual, temp3); temp3.close();

pDataArray->m->mothurOut("Making contigs...\n");

unsigned long long thisNumReads = 0;
if (true) { //resolve local variable issues
    //unsigned long long thisNumReads = driver(thisFileInputs, thisQualOrIndexInputs, outFastaFile, outScrapFastaFile, outQualFile, outScrapQualFile, outMisMatchFile,
    //    fastaFileNames, qualFileNames, thisLines[0], thisLines[1], thisQLines[0], thisQLines[1], group);
    ///////////////////////////////////////////////////////////////////////////////////////
    //47 x 47 lookup table of natural-log probabilities, indexed by a pair of quality scores.
    //The entries appear consistent with ln((1-e_i)*(1-e_j) + e_i*e_j/3), where e_q = 10^(-q/10).
    vector< vector<double> > qual_match_simple_bayesian;
    qual_match_simple_bayesian.resize(47);
    for (int i = 0; i < qual_match_simple_bayesian.size(); i++) { qual_match_simple_bayesian[i].resize(47); }
    qual_match_simple_bayesian[0][0] = -1.09861; qual_match_simple_bayesian[0][1] = -1.32887; qual_match_simple_bayesian[0][2] = -1.55913; qual_match_simple_bayesian[0][3] = -1.78939; qual_match_simple_bayesian[0][4] = -2.01965; qual_match_simple_bayesian[0][5] = -2.2499; qual_match_simple_bayesian[0][6] = -2.48016; qual_match_simple_bayesian[0][7] = -2.71042; qual_match_simple_bayesian[0][8] = -2.94068; qual_match_simple_bayesian[0][9] = -3.17094; qual_match_simple_bayesian[0][10] = -3.4012; qual_match_simple_bayesian[0][11] = -3.63146; qual_match_simple_bayesian[0][12] = -3.86171; qual_match_simple_bayesian[0][13] = -4.09197; qual_match_simple_bayesian[0][14] = -4.32223; qual_match_simple_bayesian[0][15] = -4.55249; qual_match_simple_bayesian[0][16] = -4.78275; qual_match_simple_bayesian[0][17] = -5.01301; qual_match_simple_bayesian[0][18] = -5.24327; qual_match_simple_bayesian[0][19] = -5.47352; qual_match_simple_bayesian[0][20] = -5.70378; qual_match_simple_bayesian[0][21] = -5.93404; qual_match_simple_bayesian[0][22] = -6.1643; qual_match_simple_bayesian[0][23] = -6.39456; qual_match_simple_bayesian[0][24] = -6.62482; qual_match_simple_bayesian[0][25] = -6.85508; qual_match_simple_bayesian[0][26] = -7.08533; qual_match_simple_bayesian[0][27] = -7.31559; qual_match_simple_bayesian[0][28] = -7.54585; qual_match_simple_bayesian[0][29] = -7.77611; qual_match_simple_bayesian[0][30] = -8.00637; qual_match_simple_bayesian[0][31] = -8.23663; qual_match_simple_bayesian[0][32] = -8.46688; qual_match_simple_bayesian[0][33] = -8.69714; qual_match_simple_bayesian[0][34] = -8.9274;
qual_match_simple_bayesian[0][35] = -9.15766; qual_match_simple_bayesian[0][36] = -9.38792; qual_match_simple_bayesian[0][37] = -9.61818; qual_match_simple_bayesian[0][38] = -9.84844; qual_match_simple_bayesian[0][39] = -10.0787; qual_match_simple_bayesian[0][40] = -10.309; qual_match_simple_bayesian[0][41] = -10.5392; qual_match_simple_bayesian[0][42] = -10.7695; qual_match_simple_bayesian[0][43] = -10.9997; qual_match_simple_bayesian[0][44] = -11.23; qual_match_simple_bayesian[0][45] = -11.4602; qual_match_simple_bayesian[0][46] = -11.6905; qual_match_simple_bayesian[1][0] = -1.32887; qual_match_simple_bayesian[1][1] = -1.37587; qual_match_simple_bayesian[1][2] = -1.41484; qual_match_simple_bayesian[1][3] = -1.44692; qual_match_simple_bayesian[1][4] = -1.47315; qual_match_simple_bayesian[1][5] = -1.49449; qual_match_simple_bayesian[1][6] = -1.51178; qual_match_simple_bayesian[1][7] = -1.52572; qual_match_simple_bayesian[1][8] = -1.53694; qual_match_simple_bayesian[1][9] = -1.54593; qual_match_simple_bayesian[1][10] = -1.55314; qual_match_simple_bayesian[1][11] = -1.5589; qual_match_simple_bayesian[1][12] = -1.5635; qual_match_simple_bayesian[1][13] = -1.56717; qual_match_simple_bayesian[1][14] = -1.5701; qual_match_simple_bayesian[1][15] = -1.57243; qual_match_simple_bayesian[1][16] = -1.57428; qual_match_simple_bayesian[1][17] = -1.57576; qual_match_simple_bayesian[1][18] = -1.57693; qual_match_simple_bayesian[1][19] = -1.57786; qual_match_simple_bayesian[1][20] = -1.5786; qual_match_simple_bayesian[1][21] = -1.57919; qual_match_simple_bayesian[1][22] = -1.57966; qual_match_simple_bayesian[1][23] = -1.58003; qual_match_simple_bayesian[1][24] = -1.58033; qual_match_simple_bayesian[1][25] = -1.58057; qual_match_simple_bayesian[1][26] = -1.58075; qual_match_simple_bayesian[1][27] = -1.5809; qual_match_simple_bayesian[1][28] = -1.58102; qual_match_simple_bayesian[1][29] = -1.58111; qual_match_simple_bayesian[1][30] = -1.58119; qual_match_simple_bayesian[1][31] = 
-1.58125; qual_match_simple_bayesian[1][32] = -1.58129; qual_match_simple_bayesian[1][33] = -1.58133; qual_match_simple_bayesian[1][34] = -1.58136; qual_match_simple_bayesian[1][35] = -1.58138; qual_match_simple_bayesian[1][36] = -1.5814; qual_match_simple_bayesian[1][37] = -1.58142; qual_match_simple_bayesian[1][38] = -1.58143; qual_match_simple_bayesian[1][39] = -1.58144; qual_match_simple_bayesian[1][40] = -1.58145; qual_match_simple_bayesian[1][41] = -1.58145; qual_match_simple_bayesian[1][42] = -1.58146; qual_match_simple_bayesian[1][43] = -1.58146; qual_match_simple_bayesian[1][44] = -1.58146; qual_match_simple_bayesian[1][45] = -1.58146; qual_match_simple_bayesian[1][46] = -1.58147; qual_match_simple_bayesian[2][0] = -1.55913; qual_match_simple_bayesian[2][1] = -1.41484; qual_match_simple_bayesian[2][2] = -1.31343; qual_match_simple_bayesian[2][3] = -1.23963; qual_match_simple_bayesian[2][4] = -1.18465; qual_match_simple_bayesian[2][5] = -1.14303; qual_match_simple_bayesian[2][6] = -1.11117; qual_match_simple_bayesian[2][7] = -1.08657; qual_match_simple_bayesian[2][8] = -1.06744; qual_match_simple_bayesian[2][9] = -1.05251; qual_match_simple_bayesian[2][10] = -1.0408; qual_match_simple_bayesian[2][11] = -1.0316; qual_match_simple_bayesian[2][12] = -1.02436; qual_match_simple_bayesian[2][13] = -1.01863; qual_match_simple_bayesian[2][14] = -1.01411; qual_match_simple_bayesian[2][15] = -1.01054; qual_match_simple_bayesian[2][16] = -1.00771; qual_match_simple_bayesian[2][17] = -1.00546; qual_match_simple_bayesian[2][18] = -1.00368; qual_match_simple_bayesian[2][19] = -1.00227; qual_match_simple_bayesian[2][20] = -1.00115; qual_match_simple_bayesian[2][21] = -1.00027; qual_match_simple_bayesian[2][22] = -0.99956; qual_match_simple_bayesian[2][23] = -0.999001; qual_match_simple_bayesian[2][24] = -0.998557; qual_match_simple_bayesian[2][25] = -0.998204; qual_match_simple_bayesian[2][26] = -0.997924; qual_match_simple_bayesian[2][27] = -0.997702; 
qual_match_simple_bayesian[2][28] = -0.997525; qual_match_simple_bayesian[2][29] = -0.997385; qual_match_simple_bayesian[2][30] = -0.997273; qual_match_simple_bayesian[2][31] = -0.997185; qual_match_simple_bayesian[2][32] = -0.997114; qual_match_simple_bayesian[2][33] = -0.997059; qual_match_simple_bayesian[2][34] = -0.997014; qual_match_simple_bayesian[2][35] = -0.996979; qual_match_simple_bayesian[2][36] = -0.996951; qual_match_simple_bayesian[2][37] = -0.996929; qual_match_simple_bayesian[2][38] = -0.996911; qual_match_simple_bayesian[2][39] = -0.996897; qual_match_simple_bayesian[2][40] = -0.996886; qual_match_simple_bayesian[2][41] = -0.996877; qual_match_simple_bayesian[2][42] = -0.99687; qual_match_simple_bayesian[2][43] = -0.996865; qual_match_simple_bayesian[2][44] = -0.99686; qual_match_simple_bayesian[2][45] = -0.996857; qual_match_simple_bayesian[2][46] = -0.996854; qual_match_simple_bayesian[3][0] = -1.78939; qual_match_simple_bayesian[3][1] = -1.44692; qual_match_simple_bayesian[3][2] = -1.23963; qual_match_simple_bayesian[3][3] = -1.10098; qual_match_simple_bayesian[3][4] = -1.0031; qual_match_simple_bayesian[3][5] = -0.931648; qual_match_simple_bayesian[3][6] = -0.878319; qual_match_simple_bayesian[3][7] = -0.837896; qual_match_simple_bayesian[3][8] = -0.806912; qual_match_simple_bayesian[3][9] = -0.782967; qual_match_simple_bayesian[3][10] = -0.764347; qual_match_simple_bayesian[3][11] = -0.7498; qual_match_simple_bayesian[3][12] = -0.738394; qual_match_simple_bayesian[3][13] = -0.729426; qual_match_simple_bayesian[3][14] = -0.722359; qual_match_simple_bayesian[3][15] = -0.71678; qual_match_simple_bayesian[3][16] = -0.712372; qual_match_simple_bayesian[3][17] = -0.708883; qual_match_simple_bayesian[3][18] = -0.706121; qual_match_simple_bayesian[3][19] = -0.703933; qual_match_simple_bayesian[3][20] = -0.702197; qual_match_simple_bayesian[3][21] = -0.700821; qual_match_simple_bayesian[3][22] = -0.69973; qual_match_simple_bayesian[3][23] = -0.698863; 
qual_match_simple_bayesian[3][24] = -0.698176; qual_match_simple_bayesian[3][25] = -0.69763; qual_match_simple_bayesian[3][26] = -0.697196; qual_match_simple_bayesian[3][27] = -0.696852; qual_match_simple_bayesian[3][28] = -0.696579; qual_match_simple_bayesian[3][29] = -0.696362; qual_match_simple_bayesian[3][30] = -0.69619; qual_match_simple_bayesian[3][31] = -0.696053; qual_match_simple_bayesian[3][32] = -0.695944; qual_match_simple_bayesian[3][33] = -0.695858; qual_match_simple_bayesian[3][34] = -0.695789; qual_match_simple_bayesian[3][35] = -0.695735; qual_match_simple_bayesian[3][36] = -0.695692; qual_match_simple_bayesian[3][37] = -0.695657; qual_match_simple_bayesian[3][38] = -0.69563; qual_match_simple_bayesian[3][39] = -0.695608; qual_match_simple_bayesian[3][40] = -0.695591; qual_match_simple_bayesian[3][41] = -0.695577; qual_match_simple_bayesian[3][42] = -0.695566; qual_match_simple_bayesian[3][43] = -0.695558; qual_match_simple_bayesian[3][44] = -0.695551; qual_match_simple_bayesian[3][45] = -0.695546; qual_match_simple_bayesian[3][46] = -0.695541; qual_match_simple_bayesian[4][0] = -2.01965; qual_match_simple_bayesian[4][1] = -1.47315; qual_match_simple_bayesian[4][2] = -1.18465; qual_match_simple_bayesian[4][3] = -1.0031; qual_match_simple_bayesian[4][4] = -0.879224; qual_match_simple_bayesian[4][5] = -0.790712; qual_match_simple_bayesian[4][6] = -0.725593; qual_match_simple_bayesian[4][7] = -0.676729; qual_match_simple_bayesian[4][8] = -0.639547; qual_match_simple_bayesian[4][9] = -0.610968; qual_match_simple_bayesian[4][10] = -0.588834; qual_match_simple_bayesian[4][11] = -0.571596; qual_match_simple_bayesian[4][12] = -0.558111; qual_match_simple_bayesian[4][13] = -0.547528; qual_match_simple_bayesian[4][14] = -0.539201; qual_match_simple_bayesian[4][15] = -0.532636; qual_match_simple_bayesian[4][16] = -0.527451; qual_match_simple_bayesian[4][17] = -0.523352; qual_match_simple_bayesian[4][18] = -0.520107; qual_match_simple_bayesian[4][19] = 
-0.517538; qual_match_simple_bayesian[4][20] = -0.515502; qual_match_simple_bayesian[4][21] = -0.513887; qual_match_simple_bayesian[4][22] = -0.512606; qual_match_simple_bayesian[4][23] = -0.51159; qual_match_simple_bayesian[4][24] = -0.510784; qual_match_simple_bayesian[4][25] = -0.510144; qual_match_simple_bayesian[4][26] = -0.509636; qual_match_simple_bayesian[4][27] = -0.509232; qual_match_simple_bayesian[4][28] = -0.508912; qual_match_simple_bayesian[4][29] = -0.508658; qual_match_simple_bayesian[4][30] = -0.508456; qual_match_simple_bayesian[4][31] = -0.508295; qual_match_simple_bayesian[4][32] = -0.508168; qual_match_simple_bayesian[4][33] = -0.508067; qual_match_simple_bayesian[4][34] = -0.507986; qual_match_simple_bayesian[4][35] = -0.507922; qual_match_simple_bayesian[4][36] = -0.507872; qual_match_simple_bayesian[4][37] = -0.507831; qual_match_simple_bayesian[4][38] = -0.507799; qual_match_simple_bayesian[4][39] = -0.507774; qual_match_simple_bayesian[4][40] = -0.507754; qual_match_simple_bayesian[4][41] = -0.507738; qual_match_simple_bayesian[4][42] = -0.507725; qual_match_simple_bayesian[4][43] = -0.507715; qual_match_simple_bayesian[4][44] = -0.507707; qual_match_simple_bayesian[4][45] = -0.507701; qual_match_simple_bayesian[4][46] = -0.507695; qual_match_simple_bayesian[5][0] = -2.2499; qual_match_simple_bayesian[5][1] = -1.49449; qual_match_simple_bayesian[5][2] = -1.14303; qual_match_simple_bayesian[5][3] = -0.931648; qual_match_simple_bayesian[5][4] = -0.790712; qual_match_simple_bayesian[5][5] = -0.691393; qual_match_simple_bayesian[5][6] = -0.618979; qual_match_simple_bayesian[5][7] = -0.564976; qual_match_simple_bayesian[5][8] = -0.524066; qual_match_simple_bayesian[5][9] = -0.492723; qual_match_simple_bayesian[5][10] = -0.468507; qual_match_simple_bayesian[5][11] = -0.449682; qual_match_simple_bayesian[5][12] = -0.434976; qual_match_simple_bayesian[5][13] = -0.423448; qual_match_simple_bayesian[5][14] = -0.414384; 
qual_match_simple_bayesian[5][15] = -0.407243; qual_match_simple_bayesian[5][16] = -0.401606; qual_match_simple_bayesian[5][17] = -0.397151; qual_match_simple_bayesian[5][18] = -0.393627; qual_match_simple_bayesian[5][19] = -0.390836; qual_match_simple_bayesian[5][20] = -0.388625; qual_match_simple_bayesian[5][21] = -0.386872; qual_match_simple_bayesian[5][22] = -0.385482; qual_match_simple_bayesian[5][23] = -0.384379; qual_match_simple_bayesian[5][24] = -0.383503; qual_match_simple_bayesian[5][25] = -0.382809; qual_match_simple_bayesian[5][26] = -0.382257; qual_match_simple_bayesian[5][27] = -0.38182; qual_match_simple_bayesian[5][28] = -0.381472; qual_match_simple_bayesian[5][29] = -0.381196; qual_match_simple_bayesian[5][30] = -0.380977; qual_match_simple_bayesian[5][31] = -0.380803; qual_match_simple_bayesian[5][32] = -0.380664; qual_match_simple_bayesian[5][33] = -0.380554; qual_match_simple_bayesian[5][34] = -0.380467; qual_match_simple_bayesian[5][35] = -0.380398; qual_match_simple_bayesian[5][36] = -0.380343; qual_match_simple_bayesian[5][37] = -0.380299; qual_match_simple_bayesian[5][38] = -0.380264; qual_match_simple_bayesian[5][39] = -0.380237; qual_match_simple_bayesian[5][40] = -0.380215; qual_match_simple_bayesian[5][41] = -0.380198; qual_match_simple_bayesian[5][42] = -0.380184; qual_match_simple_bayesian[5][43] = -0.380173; qual_match_simple_bayesian[5][44] = -0.380164; qual_match_simple_bayesian[5][45] = -0.380157; qual_match_simple_bayesian[5][46] = -0.380152; qual_match_simple_bayesian[6][0] = -2.48016; qual_match_simple_bayesian[6][1] = -1.51178; qual_match_simple_bayesian[6][2] = -1.11117; qual_match_simple_bayesian[6][3] = -0.878319; qual_match_simple_bayesian[6][4] = -0.725593; qual_match_simple_bayesian[6][5] = -0.618979; qual_match_simple_bayesian[6][6] = -0.541714; qual_match_simple_bayesian[6][7] = -0.48433; qual_match_simple_bayesian[6][8] = -0.440984; qual_match_simple_bayesian[6][9] = -0.407844; qual_match_simple_bayesian[6][10] = 
-0.382281; qual_match_simple_bayesian[6][11] = -0.362431; qual_match_simple_bayesian[6][12] = -0.34694; qual_match_simple_bayesian[6][13] = -0.334804; qual_match_simple_bayesian[6][14] = -0.325268; qual_match_simple_bayesian[6][15] = -0.317757; qual_match_simple_bayesian[6][16] = -0.311831; qual_match_simple_bayesian[6][17] = -0.307149; qual_match_simple_bayesian[6][18] = -0.303445; qual_match_simple_bayesian[6][19] = -0.300513; qual_match_simple_bayesian[6][20] = -0.29819; qual_match_simple_bayesian[6][21] = -0.296348; qual_match_simple_bayesian[6][22] = -0.294888; qual_match_simple_bayesian[6][23] = -0.29373; qual_match_simple_bayesian[6][24] = -0.29281; qual_match_simple_bayesian[6][25] = -0.292081; qual_match_simple_bayesian[6][26] = -0.291502; qual_match_simple_bayesian[6][27] = -0.291042; qual_match_simple_bayesian[6][28] = -0.290677; qual_match_simple_bayesian[6][29] = -0.290387; qual_match_simple_bayesian[6][30] = -0.290157; qual_match_simple_bayesian[6][31] = -0.289974; qual_match_simple_bayesian[6][32] = -0.289829; qual_match_simple_bayesian[6][33] = -0.289713; qual_match_simple_bayesian[6][34] = -0.289622; qual_match_simple_bayesian[6][35] = -0.289549; qual_match_simple_bayesian[6][36] = -0.289491; qual_match_simple_bayesian[6][37] = -0.289445; qual_match_simple_bayesian[6][38] = -0.289409; qual_match_simple_bayesian[6][39] = -0.28938; qual_match_simple_bayesian[6][40] = -0.289357; qual_match_simple_bayesian[6][41] = -0.289339; qual_match_simple_bayesian[6][42] = -0.289324; qual_match_simple_bayesian[6][43] = -0.289313; qual_match_simple_bayesian[6][44] = -0.289304; qual_match_simple_bayesian[6][45] = -0.289296; qual_match_simple_bayesian[6][46] = -0.28929; qual_match_simple_bayesian[7][0] = -2.71042; qual_match_simple_bayesian[7][1] = -1.52572; qual_match_simple_bayesian[7][2] = -1.08657; qual_match_simple_bayesian[7][3] = -0.837896; qual_match_simple_bayesian[7][4] = -0.676729; qual_match_simple_bayesian[7][5] = -0.564976; 
qual_match_simple_bayesian[7][6] = -0.48433; qual_match_simple_bayesian[7][7] = -0.424604; qual_match_simple_bayesian[7][8] = -0.379581; qual_match_simple_bayesian[7][9] = -0.345208; qual_match_simple_bayesian[7][10] = -0.318723; qual_match_simple_bayesian[7][11] = -0.298173; qual_match_simple_bayesian[7][12] = -0.282146; qual_match_simple_bayesian[7][13] = -0.269595; qual_match_simple_bayesian[7][14] = -0.259737; qual_match_simple_bayesian[7][15] = -0.251976; qual_match_simple_bayesian[7][16] = -0.245853; qual_match_simple_bayesian[7][17] = -0.241016; qual_match_simple_bayesian[7][18] = -0.23719; qual_match_simple_bayesian[7][19] = -0.234162; qual_match_simple_bayesian[7][20] = -0.231763; qual_match_simple_bayesian[7][21] = -0.229861; qual_match_simple_bayesian[7][22] = -0.228354; qual_match_simple_bayesian[7][23] = -0.227158; qual_match_simple_bayesian[7][24] = -0.226208; qual_match_simple_bayesian[7][25] = -0.225455; qual_match_simple_bayesian[7][26] = -0.224857; qual_match_simple_bayesian[7][27] = -0.224383; qual_match_simple_bayesian[7][28] = -0.224006; qual_match_simple_bayesian[7][29] = -0.223707; qual_match_simple_bayesian[7][30] = -0.223469; qual_match_simple_bayesian[7][31] = -0.22328; qual_match_simple_bayesian[7][32] = -0.22313; qual_match_simple_bayesian[7][33] = -0.223011; qual_match_simple_bayesian[7][34] = -0.222917; qual_match_simple_bayesian[7][35] = -0.222842; qual_match_simple_bayesian[7][36] = -0.222782; qual_match_simple_bayesian[7][37] = -0.222734; qual_match_simple_bayesian[7][38] = -0.222697; qual_match_simple_bayesian[7][39] = -0.222667; qual_match_simple_bayesian[7][40] = -0.222643; qual_match_simple_bayesian[7][41] = -0.222624; qual_match_simple_bayesian[7][42] = -0.222609; qual_match_simple_bayesian[7][43] = -0.222597; qual_match_simple_bayesian[7][44] = -0.222588; qual_match_simple_bayesian[7][45] = -0.222581; qual_match_simple_bayesian[7][46] = -0.222575; qual_match_simple_bayesian[8][0] = -2.94068; qual_match_simple_bayesian[8][1] = 
-1.53694; qual_match_simple_bayesian[8][2] = -1.06744; qual_match_simple_bayesian[8][3] = -0.806912; qual_match_simple_bayesian[8][4] = -0.639547; qual_match_simple_bayesian[8][5] = -0.524066; qual_match_simple_bayesian[8][6] = -0.440984; qual_match_simple_bayesian[8][7] = -0.379581; qual_match_simple_bayesian[8][8] = -0.333359; qual_match_simple_bayesian[8][9] = -0.298107; qual_match_simple_bayesian[8][10] = -0.270966; qual_match_simple_bayesian[8][11] = -0.249919; qual_match_simple_bayesian[8][12] = -0.233512; qual_match_simple_bayesian[8][13] = -0.220668; qual_match_simple_bayesian[8][14] = -0.210582; qual_match_simple_bayesian[8][15] = -0.202642; qual_match_simple_bayesian[8][16] = -0.19638; qual_match_simple_bayesian[8][17] = -0.191434; qual_match_simple_bayesian[8][18] = -0.187522; qual_match_simple_bayesian[8][19] = -0.184426; qual_match_simple_bayesian[8][20] = -0.181973; qual_match_simple_bayesian[8][21] = -0.180029; qual_match_simple_bayesian[8][22] = -0.178488; qual_match_simple_bayesian[8][23] = -0.177265; qual_match_simple_bayesian[8][24] = -0.176295; qual_match_simple_bayesian[8][25] = -0.175525; qual_match_simple_bayesian[8][26] = -0.174914; qual_match_simple_bayesian[8][27] = -0.174428; qual_match_simple_bayesian[8][28] = -0.174043; qual_match_simple_bayesian[8][29] = -0.173737; qual_match_simple_bayesian[8][30] = -0.173494; qual_match_simple_bayesian[8][31] = -0.173301; qual_match_simple_bayesian[8][32] = -0.173148; qual_match_simple_bayesian[8][33] = -0.173026; qual_match_simple_bayesian[8][34] = -0.17293; qual_match_simple_bayesian[8][35] = -0.172853; qual_match_simple_bayesian[8][36] = -0.172792; qual_match_simple_bayesian[8][37] = -0.172744; qual_match_simple_bayesian[8][38] = -0.172705; qual_match_simple_bayesian[8][39] = -0.172675; qual_match_simple_bayesian[8][40] = -0.17265; qual_match_simple_bayesian[8][41] = -0.172631; qual_match_simple_bayesian[8][42] = -0.172616; qual_match_simple_bayesian[8][43] = -0.172604; 
qual_match_simple_bayesian[8][44] = -0.172594; qual_match_simple_bayesian[8][45] = -0.172586; qual_match_simple_bayesian[8][46] = -0.17258; qual_match_simple_bayesian[9][0] = -3.17094; qual_match_simple_bayesian[9][1] = -1.54593; qual_match_simple_bayesian[9][2] = -1.05251; qual_match_simple_bayesian[9][3] = -0.782967; qual_match_simple_bayesian[9][4] = -0.610968; qual_match_simple_bayesian[9][5] = -0.492723; qual_match_simple_bayesian[9][6] = -0.407844; qual_match_simple_bayesian[9][7] = -0.345208; qual_match_simple_bayesian[9][8] = -0.298107; qual_match_simple_bayesian[9][9] = -0.262213; qual_match_simple_bayesian[9][10] = -0.234592; qual_match_simple_bayesian[9][11] = -0.213183; qual_match_simple_bayesian[9][12] = -0.196498; qual_match_simple_bayesian[9][13] = -0.18344; qual_match_simple_bayesian[9][14] = -0.173188; qual_match_simple_bayesian[9][15] = -0.165119; qual_match_simple_bayesian[9][16] = -0.158755; qual_match_simple_bayesian[9][17] = -0.153729; qual_match_simple_bayesian[9][18] = -0.149755; qual_match_simple_bayesian[9][19] = -0.146609; qual_match_simple_bayesian[9][20] = -0.144117; qual_match_simple_bayesian[9][21] = -0.142143; qual_match_simple_bayesian[9][22] = -0.140577; qual_match_simple_bayesian[9][23] = -0.139335; qual_match_simple_bayesian[9][24] = -0.138349; qual_match_simple_bayesian[9][25] = -0.137567; qual_match_simple_bayesian[9][26] = -0.136946; qual_match_simple_bayesian[9][27] = -0.136453; qual_match_simple_bayesian[9][28] = -0.136062; qual_match_simple_bayesian[9][29] = -0.135751; qual_match_simple_bayesian[9][30] = -0.135504; qual_match_simple_bayesian[9][31] = -0.135308; qual_match_simple_bayesian[9][32] = -0.135153; qual_match_simple_bayesian[9][33] = -0.135029; qual_match_simple_bayesian[9][34] = -0.134931; qual_match_simple_bayesian[9][35] = -0.134853; qual_match_simple_bayesian[9][36] = -0.134791; qual_match_simple_bayesian[9][37] = -0.134742; qual_match_simple_bayesian[9][38] = -0.134703; qual_match_simple_bayesian[9][39] = 
-0.134672; qual_match_simple_bayesian[9][40] = -0.134647; qual_match_simple_bayesian[9][41] = -0.134628; qual_match_simple_bayesian[9][42] = -0.134612; qual_match_simple_bayesian[9][43] = -0.1346; qual_match_simple_bayesian[9][44] = -0.13459; qual_match_simple_bayesian[9][45] = -0.134582; qual_match_simple_bayesian[9][46] = -0.134576; qual_match_simple_bayesian[10][0] = -3.4012; qual_match_simple_bayesian[10][1] = -1.55314; qual_match_simple_bayesian[10][2] = -1.0408; qual_match_simple_bayesian[10][3] = -0.764347; qual_match_simple_bayesian[10][4] = -0.588834; qual_match_simple_bayesian[10][5] = -0.468507; qual_match_simple_bayesian[10][6] = -0.382281; qual_match_simple_bayesian[10][7] = -0.318723; qual_match_simple_bayesian[10][8] = -0.270966; qual_match_simple_bayesian[10][9] = -0.234592; qual_match_simple_bayesian[10][10] = -0.206614; qual_match_simple_bayesian[10][11] = -0.184935; qual_match_simple_bayesian[10][12] = -0.168044; qual_match_simple_bayesian[10][13] = -0.154827; qual_match_simple_bayesian[10][14] = -0.144451; qual_match_simple_bayesian[10][15] = -0.136285; qual_match_simple_bayesian[10][16] = -0.129846; qual_match_simple_bayesian[10][17] = -0.124761; qual_match_simple_bayesian[10][18] = -0.12074; qual_match_simple_bayesian[10][19] = -0.117558; qual_match_simple_bayesian[10][20] = -0.115037; qual_match_simple_bayesian[10][21] = -0.113039; qual_match_simple_bayesian[10][22] = -0.111455; qual_match_simple_bayesian[10][23] = -0.110198; qual_match_simple_bayesian[10][24] = -0.109202; qual_match_simple_bayesian[10][25] = -0.10841; qual_match_simple_bayesian[10][26] = -0.107782; qual_match_simple_bayesian[10][27] = -0.107284; qual_match_simple_bayesian[10][28] = -0.106888; qual_match_simple_bayesian[10][29] = -0.106574; qual_match_simple_bayesian[10][30] = -0.106324; qual_match_simple_bayesian[10][31] = -0.106126; qual_match_simple_bayesian[10][32] = -0.105968; qual_match_simple_bayesian[10][33] = -0.105843; qual_match_simple_bayesian[10][34] = -0.105744; 
qual_match_simple_bayesian[10][35] = -0.105665; qual_match_simple_bayesian[10][36] = -0.105602; qual_match_simple_bayesian[10][37] = -0.105553; qual_match_simple_bayesian[10][38] = -0.105513; qual_match_simple_bayesian[10][39] = -0.105482; qual_match_simple_bayesian[10][40] = -0.105457; qual_match_simple_bayesian[10][41] = -0.105437; qual_match_simple_bayesian[10][42] = -0.105421; qual_match_simple_bayesian[10][43] = -0.105409; qual_match_simple_bayesian[10][44] = -0.105399; qual_match_simple_bayesian[10][45] = -0.105391; qual_match_simple_bayesian[10][46] = -0.105385; qual_match_simple_bayesian[11][0] = -3.63146; qual_match_simple_bayesian[11][1] = -1.5589; qual_match_simple_bayesian[11][2] = -1.0316; qual_match_simple_bayesian[11][3] = -0.7498; qual_match_simple_bayesian[11][4] = -0.571596; qual_match_simple_bayesian[11][5] = -0.449682; qual_match_simple_bayesian[11][6] = -0.362431; qual_match_simple_bayesian[11][7] = -0.298173; qual_match_simple_bayesian[11][8] = -0.249919; qual_match_simple_bayesian[11][9] = -0.213183; qual_match_simple_bayesian[11][10] = -0.184935; qual_match_simple_bayesian[11][11] = -0.163052; qual_match_simple_bayesian[11][12] = -0.146004; qual_match_simple_bayesian[11][13] = -0.132667; qual_match_simple_bayesian[11][14] = -0.122198; qual_match_simple_bayesian[11][15] = -0.11396; qual_match_simple_bayesian[11][16] = -0.107464; qual_match_simple_bayesian[11][17] = -0.102334; qual_match_simple_bayesian[11][18] = -0.0982781; qual_match_simple_bayesian[11][19] = -0.0950678; qual_match_simple_bayesian[11][20] = -0.0925252; qual_match_simple_bayesian[11][21] = -0.09051; qual_match_simple_bayesian[11][22] = -0.0889123; qual_match_simple_bayesian[11][23] = -0.0876449; qual_match_simple_bayesian[11][24] = -0.0866394; qual_match_simple_bayesian[11][25] = -0.0858414; qual_match_simple_bayesian[11][26] = -0.0852079; qual_match_simple_bayesian[11][27] = -0.0847051; qual_match_simple_bayesian[11][28] = -0.0843058; qual_match_simple_bayesian[11][29] = 
-0.0839888; qual_match_simple_bayesian[11][30] = -0.083737; qual_match_simple_bayesian[11][31] = -0.0835371; qual_match_simple_bayesian[11][32] = -0.0833783; qual_match_simple_bayesian[11][33] = -0.0832522; qual_match_simple_bayesian[11][34] = -0.083152; qual_match_simple_bayesian[11][35] = -0.0830725; qual_match_simple_bayesian[11][36] = -0.0830093; qual_match_simple_bayesian[11][37] = -0.0829591; qual_match_simple_bayesian[11][38] = -0.0829192; qual_match_simple_bayesian[11][39] = -0.0828876; qual_match_simple_bayesian[11][40] = -0.0828624; qual_match_simple_bayesian[11][41] = -0.0828425; qual_match_simple_bayesian[11][42] = -0.0828266; qual_match_simple_bayesian[11][43] = -0.082814; qual_match_simple_bayesian[11][44] = -0.082804; qual_match_simple_bayesian[11][45] = -0.082796; qual_match_simple_bayesian[11][46] = -0.0827897; qual_match_simple_bayesian[12][0] = -3.86171; qual_match_simple_bayesian[12][1] = -1.5635; qual_match_simple_bayesian[12][2] = -1.02436; qual_match_simple_bayesian[12][3] = -0.738394; qual_match_simple_bayesian[12][4] = -0.558111; qual_match_simple_bayesian[12][5] = -0.434976; qual_match_simple_bayesian[12][6] = -0.34694; qual_match_simple_bayesian[12][7] = -0.282146; qual_match_simple_bayesian[12][8] = -0.233512; qual_match_simple_bayesian[12][9] = -0.196498; qual_match_simple_bayesian[12][10] = -0.168044; qual_match_simple_bayesian[12][11] = -0.146004; qual_match_simple_bayesian[12][12] = -0.128838; qual_match_simple_bayesian[12][13] = -0.115409; qual_match_simple_bayesian[12][14] = -0.104869; qual_match_simple_bayesian[12][15] = -0.096575; qual_match_simple_bayesian[12][16] = -0.0900357; qual_match_simple_bayesian[12][17] = -0.0848716; qual_match_simple_bayesian[12][18] = -0.0807886; qual_match_simple_bayesian[12][19] = -0.0775572; qual_match_simple_bayesian[12][20] = -0.0749978; qual_match_simple_bayesian[12][21] = -0.0729694; qual_match_simple_bayesian[12][22] = -0.0713612; qual_match_simple_bayesian[12][23] = -0.0700856; 
qual_match_simple_bayesian[12][24] = -0.0690735; qual_match_simple_bayesian[12][25] = -0.0682703; qual_match_simple_bayesian[12][26] = -0.0676327; qual_match_simple_bayesian[12][27] = -0.0671265; qual_match_simple_bayesian[12][28] = -0.0667247; qual_match_simple_bayesian[12][29] = -0.0664056; qual_match_simple_bayesian[12][30] = -0.0661522; qual_match_simple_bayesian[12][31] = -0.065951; qual_match_simple_bayesian[12][32] = -0.0657912; qual_match_simple_bayesian[12][33] = -0.0656642; qual_match_simple_bayesian[12][34] = -0.0655634; qual_match_simple_bayesian[12][35] = -0.0654833; qual_match_simple_bayesian[12][36] = -0.0654198; qual_match_simple_bayesian[12][37] = -0.0653692; qual_match_simple_bayesian[12][38] = -0.0653291; qual_match_simple_bayesian[12][39] = -0.0652972; qual_match_simple_bayesian[12][40] = -0.0652719; qual_match_simple_bayesian[12][41] = -0.0652518; qual_match_simple_bayesian[12][42] = -0.0652359; qual_match_simple_bayesian[12][43] = -0.0652232; qual_match_simple_bayesian[12][44] = -0.0652131; qual_match_simple_bayesian[12][45] = -0.0652051; qual_match_simple_bayesian[12][46] = -0.0651987; qual_match_simple_bayesian[13][0] = -4.09197; qual_match_simple_bayesian[13][1] = -1.56717; qual_match_simple_bayesian[13][2] = -1.01863; qual_match_simple_bayesian[13][3] = -0.729426; qual_match_simple_bayesian[13][4] = -0.547528; qual_match_simple_bayesian[13][5] = -0.423448; qual_match_simple_bayesian[13][6] = -0.334804; qual_match_simple_bayesian[13][7] = -0.269595; qual_match_simple_bayesian[13][8] = -0.220668; qual_match_simple_bayesian[13][9] = -0.18344; qual_match_simple_bayesian[13][10] = -0.154827; qual_match_simple_bayesian[13][11] = -0.132667; qual_match_simple_bayesian[13][12] = -0.115409; qual_match_simple_bayesian[13][13] = -0.101909; qual_match_simple_bayesian[13][14] = -0.0913142; qual_match_simple_bayesian[13][15] = -0.0829777; qual_match_simple_bayesian[13][16] = -0.0764049; qual_match_simple_bayesian[13][17] = -0.0712146; 
qual_match_simple_bayesian[13][18] = -0.0671109; qual_match_simple_bayesian[13][19] = -0.0638632; qual_match_simple_bayesian[13][20] = -0.061291; qual_match_simple_bayesian[13][21] = -0.0592525; qual_match_simple_bayesian[13][22] = -0.0576362; qual_match_simple_bayesian[13][23] = -0.0563542; qual_match_simple_bayesian[13][24] = -0.055337; qual_match_simple_bayesian[13][25] = -0.0545298; qual_match_simple_bayesian[13][26] = -0.053889; qual_match_simple_bayesian[13][27] = -0.0533804; qual_match_simple_bayesian[13][28] = -0.0529765; qual_match_simple_bayesian[13][29] = -0.0526558; qual_match_simple_bayesian[13][30] = -0.0524012; qual_match_simple_bayesian[13][31] = -0.0521989; qual_match_simple_bayesian[13][32] = -0.0520383; qual_match_simple_bayesian[13][33] = -0.0519108; qual_match_simple_bayesian[13][34] = -0.0518095; qual_match_simple_bayesian[13][35] = -0.051729; qual_match_simple_bayesian[13][36] = -0.0516651; qual_match_simple_bayesian[13][37] = -0.0516143; qual_match_simple_bayesian[13][38] = -0.051574; qual_match_simple_bayesian[13][39] = -0.051542; qual_match_simple_bayesian[13][40] = -0.0515165; qual_match_simple_bayesian[13][41] = -0.0514963; qual_match_simple_bayesian[13][42] = -0.0514803; qual_match_simple_bayesian[13][43] = -0.0514675; qual_match_simple_bayesian[13][44] = -0.0514574; qual_match_simple_bayesian[13][45] = -0.0514493; qual_match_simple_bayesian[13][46] = -0.051443; qual_match_simple_bayesian[14][0] = -4.32223; qual_match_simple_bayesian[14][1] = -1.5701; qual_match_simple_bayesian[14][2] = -1.01411; qual_match_simple_bayesian[14][3] = -0.722359; qual_match_simple_bayesian[14][4] = -0.539201; qual_match_simple_bayesian[14][5] = -0.414384; qual_match_simple_bayesian[14][6] = -0.325268; qual_match_simple_bayesian[14][7] = -0.259737; qual_match_simple_bayesian[14][8] = -0.210582; qual_match_simple_bayesian[14][9] = -0.173188; qual_match_simple_bayesian[14][10] = -0.144451; qual_match_simple_bayesian[14][11] = -0.122198; 
qual_match_simple_bayesian[14][12] = -0.104869; qual_match_simple_bayesian[14][13] = -0.0913142; qual_match_simple_bayesian[14][14] = -0.0806768; qual_match_simple_bayesian[14][15] = -0.0723072; qual_match_simple_bayesian[14][16] = -0.0657085; qual_match_simple_bayesian[14][17] = -0.0604979; qual_match_simple_bayesian[14][18] = -0.0563782; qual_match_simple_bayesian[14][19] = -0.0531178; qual_match_simple_bayesian[14][20] = -0.0505356; qual_match_simple_bayesian[14][21] = -0.0484892; qual_match_simple_bayesian[14][22] = -0.0468667; qual_match_simple_bayesian[14][23] = -0.0455797; qual_match_simple_bayesian[14][24] = -0.0445586; qual_match_simple_bayesian[14][25] = -0.0437483; qual_match_simple_bayesian[14][26] = -0.0431051; qual_match_simple_bayesian[14][27] = -0.0425945; qual_match_simple_bayesian[14][28] = -0.0421891; qual_match_simple_bayesian[14][29] = -0.0418671; qual_match_simple_bayesian[14][30] = -0.0416115; qual_match_simple_bayesian[14][31] = -0.0414085; qual_match_simple_bayesian[14][32] = -0.0412473; qual_match_simple_bayesian[14][33] = -0.0411192; qual_match_simple_bayesian[14][34] = -0.0410175; qual_match_simple_bayesian[14][35] = -0.0409368; qual_match_simple_bayesian[14][36] = -0.0408726; qual_match_simple_bayesian[14][37] = -0.0408216; qual_match_simple_bayesian[14][38] = -0.0407812; qual_match_simple_bayesian[14][39] = -0.040749; qual_match_simple_bayesian[14][40] = -0.0407235; qual_match_simple_bayesian[14][41] = -0.0407032; qual_match_simple_bayesian[14][42] = -0.0406871; qual_match_simple_bayesian[14][43] = -0.0406743; qual_match_simple_bayesian[14][44] = -0.0406641; qual_match_simple_bayesian[14][45] = -0.040656; qual_match_simple_bayesian[14][46] = -0.0406496; qual_match_simple_bayesian[15][0] = -4.55249; qual_match_simple_bayesian[15][1] = -1.57243; qual_match_simple_bayesian[15][2] = -1.01054; qual_match_simple_bayesian[15][3] = -0.71678; qual_match_simple_bayesian[15][4] = -0.532636; qual_match_simple_bayesian[15][5] = -0.407243; 
qual_match_simple_bayesian[15][6] = -0.317757; qual_match_simple_bayesian[15][7] = -0.251976; qual_match_simple_bayesian[15][8] = -0.202642; qual_match_simple_bayesian[15][9] = -0.165119; qual_match_simple_bayesian[15][10] = -0.136285; qual_match_simple_bayesian[15][11] = -0.11396; qual_match_simple_bayesian[15][12] = -0.096575; qual_match_simple_bayesian[15][13] = -0.0829777; qual_match_simple_bayesian[15][14] = -0.0723072; qual_match_simple_bayesian[15][15] = -0.0639118; qual_match_simple_bayesian[15][16] = -0.0572929; qual_match_simple_bayesian[15][17] = -0.0520664; qual_match_simple_bayesian[15][18] = -0.0479342; qual_match_simple_bayesian[15][19] = -0.044664; qual_match_simple_bayesian[15][20] = -0.042074; qual_match_simple_bayesian[15][21] = -0.0400214; qual_match_simple_bayesian[15][22] = -0.038394; qual_match_simple_bayesian[15][23] = -0.0371032; qual_match_simple_bayesian[15][24] = -0.0360791; qual_match_simple_bayesian[15][25] = -0.0352663; qual_match_simple_bayesian[15][26] = -0.0346212; qual_match_simple_bayesian[15][27] = -0.0341091; qual_match_simple_bayesian[15][28] = -0.0337024; qual_match_simple_bayesian[15][29] = -0.0333796; qual_match_simple_bayesian[15][30] = -0.0331232; qual_match_simple_bayesian[15][31] = -0.0329196; qual_match_simple_bayesian[15][32] = -0.0327579; qual_match_simple_bayesian[15][33] = -0.0326294; qual_match_simple_bayesian[15][34] = -0.0325274; qual_match_simple_bayesian[15][35] = -0.0324464; qual_match_simple_bayesian[15][36] = -0.0323821; qual_match_simple_bayesian[15][37] = -0.0323309; qual_match_simple_bayesian[15][38] = -0.0322904; qual_match_simple_bayesian[15][39] = -0.0322581; qual_match_simple_bayesian[15][40] = -0.0322325; qual_match_simple_bayesian[15][41] = -0.0322121; qual_match_simple_bayesian[15][42] = -0.032196; qual_match_simple_bayesian[15][43] = -0.0321831; qual_match_simple_bayesian[15][44] = -0.032173; qual_match_simple_bayesian[15][45] = -0.0321649; qual_match_simple_bayesian[15][46] = -0.0321584; 
qual_match_simple_bayesian[16][0] = -4.78275; qual_match_simple_bayesian[16][1] = -1.57428; qual_match_simple_bayesian[16][2] = -1.00771; qual_match_simple_bayesian[16][3] = -0.712372; qual_match_simple_bayesian[16][4] = -0.527451; qual_match_simple_bayesian[16][5] = -0.401606; qual_match_simple_bayesian[16][6] = -0.311831; qual_match_simple_bayesian[16][7] = -0.245853; qual_match_simple_bayesian[16][8] = -0.19638; qual_match_simple_bayesian[16][9] = -0.158755; qual_match_simple_bayesian[16][10] = -0.129846; qual_match_simple_bayesian[16][11] = -0.107464; qual_match_simple_bayesian[16][12] = -0.0900357; qual_match_simple_bayesian[16][13] = -0.0764049; qual_match_simple_bayesian[16][14] = -0.0657085; qual_match_simple_bayesian[16][15] = -0.0572929; qual_match_simple_bayesian[16][16] = -0.0506582; qual_match_simple_bayesian[16][17] = -0.0454193; qual_match_simple_bayesian[16][18] = -0.0412773; qual_match_simple_bayesian[16][19] = -0.0379994; qual_match_simple_bayesian[16][20] = -0.0354033; qual_match_simple_bayesian[16][21] = -0.033346; qual_match_simple_bayesian[16][22] = -0.0317148; qual_match_simple_bayesian[16][23] = -0.0304209; qual_match_simple_bayesian[16][24] = -0.0293944; qual_match_simple_bayesian[16][25] = -0.0285798; qual_match_simple_bayesian[16][26] = -0.0279331; qual_match_simple_bayesian[16][27] = -0.0274198; qual_match_simple_bayesian[16][28] = -0.0270122; qual_match_simple_bayesian[16][29] = -0.0266886; qual_match_simple_bayesian[16][30] = -0.0264316; qual_match_simple_bayesian[16][31] = -0.0262275; qual_match_simple_bayesian[16][32] = -0.0260655; qual_match_simple_bayesian[16][33] = -0.0259367; qual_match_simple_bayesian[16][34] = -0.0258345; qual_match_simple_bayesian[16][35] = -0.0257533; qual_match_simple_bayesian[16][36] = -0.0256888; qual_match_simple_bayesian[16][37] = -0.0256376; qual_match_simple_bayesian[16][38] = -0.0255969; qual_match_simple_bayesian[16][39] = -0.0255645; qual_match_simple_bayesian[16][40] = -0.0255389; 
qual_match_simple_bayesian[16][41] = -0.0255185; qual_match_simple_bayesian[16][42] = -0.0255023; qual_match_simple_bayesian[16][43] = -0.0254894; qual_match_simple_bayesian[16][44] = -0.0254792; qual_match_simple_bayesian[16][45] = -0.0254711; qual_match_simple_bayesian[16][46] = -0.0254646; qual_match_simple_bayesian[17][0] = -5.01301; qual_match_simple_bayesian[17][1] = -1.57576; qual_match_simple_bayesian[17][2] = -1.00546; qual_match_simple_bayesian[17][3] = -0.708883; qual_match_simple_bayesian[17][4] = -0.523352; qual_match_simple_bayesian[17][5] = -0.397151; qual_match_simple_bayesian[17][6] = -0.307149; qual_match_simple_bayesian[17][7] = -0.241016; qual_match_simple_bayesian[17][8] = -0.191434; qual_match_simple_bayesian[17][9] = -0.153729; qual_match_simple_bayesian[17][10] = -0.124761; qual_match_simple_bayesian[17][11] = -0.102334; qual_match_simple_bayesian[17][12] = -0.0848716; qual_match_simple_bayesian[17][13] = -0.0712146; qual_match_simple_bayesian[17][14] = -0.0604979; qual_match_simple_bayesian[17][15] = -0.0520664; qual_match_simple_bayesian[17][16] = -0.0454193; qual_match_simple_bayesian[17][17] = -0.0401706; qual_match_simple_bayesian[17][18] = -0.036021; qual_match_simple_bayesian[17][19] = -0.032737; qual_match_simple_bayesian[17][20] = -0.0301362; qual_match_simple_bayesian[17][21] = -0.028075; qual_match_simple_bayesian[17][22] = -0.0264408; qual_match_simple_bayesian[17][23] = -0.0251447; qual_match_simple_bayesian[17][24] = -0.0241163; qual_match_simple_bayesian[17][25] = -0.0233001; qual_match_simple_bayesian[17][26] = -0.0226523; qual_match_simple_bayesian[17][27] = -0.0221381; qual_match_simple_bayesian[17][28] = -0.0217297; qual_match_simple_bayesian[17][29] = -0.0214055; qual_match_simple_bayesian[17][30] = -0.0211481; qual_match_simple_bayesian[17][31] = -0.0209436; qual_match_simple_bayesian[17][32] = -0.0207812; qual_match_simple_bayesian[17][33] = -0.0206523; qual_match_simple_bayesian[17][34] = -0.0205498; 
qual_match_simple_bayesian[17][35] = -0.0204685; qual_match_simple_bayesian[17][36] = -0.0204039; qual_match_simple_bayesian[17][37] = -0.0203526; qual_match_simple_bayesian[17][38] = -0.0203118; qual_match_simple_bayesian[17][39] = -0.0202794; qual_match_simple_bayesian[17][40] = -0.0202537; qual_match_simple_bayesian[17][41] = -0.0202333; qual_match_simple_bayesian[17][42] = -0.020217; qual_match_simple_bayesian[17][43] = -0.0202041; qual_match_simple_bayesian[17][44] = -0.0201939; qual_match_simple_bayesian[17][45] = -0.0201858; qual_match_simple_bayesian[17][46] = -0.0201793; qual_match_simple_bayesian[18][0] = -5.24327; qual_match_simple_bayesian[18][1] = -1.57693; qual_match_simple_bayesian[18][2] = -1.00368; qual_match_simple_bayesian[18][3] = -0.706121; qual_match_simple_bayesian[18][4] = -0.520107; qual_match_simple_bayesian[18][5] = -0.393627; qual_match_simple_bayesian[18][6] = -0.303445; qual_match_simple_bayesian[18][7] = -0.23719; qual_match_simple_bayesian[18][8] = -0.187522; qual_match_simple_bayesian[18][9] = -0.149755; qual_match_simple_bayesian[18][10] = -0.12074; qual_match_simple_bayesian[18][11] = -0.0982781; qual_match_simple_bayesian[18][12] = -0.0807886; qual_match_simple_bayesian[18][13] = -0.0671109; qual_match_simple_bayesian[18][14] = -0.0563782; qual_match_simple_bayesian[18][15] = -0.0479342; qual_match_simple_bayesian[18][16] = -0.0412773; qual_match_simple_bayesian[18][17] = -0.036021; qual_match_simple_bayesian[18][18] = -0.0318653; qual_match_simple_bayesian[18][19] = -0.0285766; qual_match_simple_bayesian[18][20] = -0.025972; qual_match_simple_bayesian[18][21] = -0.0239079; qual_match_simple_bayesian[18][22] = -0.0222713; qual_match_simple_bayesian[18][23] = -0.0209733; qual_match_simple_bayesian[18][24] = -0.0199434; qual_match_simple_bayesian[18][25] = -0.0191261; qual_match_simple_bayesian[18][26] = -0.0184774; qual_match_simple_bayesian[18][27] = -0.0179624; qual_match_simple_bayesian[18][28] = -0.0175535; 
qual_match_simple_bayesian[18][29] = -0.0172288; qual_match_simple_bayesian[18][30] = -0.016971; qual_match_simple_bayesian[18][31] = -0.0167662; qual_match_simple_bayesian[18][32] = -0.0166036; qual_match_simple_bayesian[18][33] = -0.0164745; qual_match_simple_bayesian[18][34] = -0.0163719; qual_match_simple_bayesian[18][35] = -0.0162904; qual_match_simple_bayesian[18][36] = -0.0162257; qual_match_simple_bayesian[18][37] = -0.0161743; qual_match_simple_bayesian[18][38] = -0.0161335; qual_match_simple_bayesian[18][39] = -0.0161011; qual_match_simple_bayesian[18][40] = -0.0160753; qual_match_simple_bayesian[18][41] = -0.0160549; qual_match_simple_bayesian[18][42] = -0.0160386; qual_match_simple_bayesian[18][43] = -0.0160257; qual_match_simple_bayesian[18][44] = -0.0160155; qual_match_simple_bayesian[18][45] = -0.0160073; qual_match_simple_bayesian[18][46] = -0.0160009; qual_match_simple_bayesian[19][0] = -5.47352; qual_match_simple_bayesian[19][1] = -1.57786; qual_match_simple_bayesian[19][2] = -1.00227; qual_match_simple_bayesian[19][3] = -0.703933; qual_match_simple_bayesian[19][4] = -0.517538; qual_match_simple_bayesian[19][5] = -0.390836; qual_match_simple_bayesian[19][6] = -0.300513; qual_match_simple_bayesian[19][7] = -0.234162; qual_match_simple_bayesian[19][8] = -0.184426; qual_match_simple_bayesian[19][9] = -0.146609; qual_match_simple_bayesian[19][10] = -0.117558; qual_match_simple_bayesian[19][11] = -0.0950678; qual_match_simple_bayesian[19][12] = -0.0775572; qual_match_simple_bayesian[19][13] = -0.0638632; qual_match_simple_bayesian[19][14] = -0.0531178; qual_match_simple_bayesian[19][15] = -0.044664; qual_match_simple_bayesian[19][16] = -0.0379994; qual_match_simple_bayesian[19][17] = -0.032737; qual_match_simple_bayesian[19][18] = -0.0285766; qual_match_simple_bayesian[19][19] = -0.0252842; qual_match_simple_bayesian[19][20] = -0.0226766; qual_match_simple_bayesian[19][21] = -0.0206101; qual_match_simple_bayesian[19][22] = -0.0189717; 
qual_match_simple_bayesian[19][23] = -0.0176722; qual_match_simple_bayesian[19][24] = -0.0166412; qual_match_simple_bayesian[19][25] = -0.015823; qual_match_simple_bayesian[19][26] = -0.0151735; qual_match_simple_bayesian[19][27] = -0.0146579; qual_match_simple_bayesian[19][28] = -0.0142486; qual_match_simple_bayesian[19][29] = -0.0139235; qual_match_simple_bayesian[19][30] = -0.0136654; qual_match_simple_bayesian[19][31] = -0.0134604; qual_match_simple_bayesian[19][32] = -0.0132976; qual_match_simple_bayesian[19][33] = -0.0131684; qual_match_simple_bayesian[19][34] = -0.0130657; qual_match_simple_bayesian[19][35] = -0.0129841; qual_match_simple_bayesian[19][36] = -0.0129193; qual_match_simple_bayesian[19][37] = -0.0128679; qual_match_simple_bayesian[19][38] = -0.012827; qual_match_simple_bayesian[19][39] = -0.0127945; qual_match_simple_bayesian[19][40] = -0.0127688; qual_match_simple_bayesian[19][41] = -0.0127483; qual_match_simple_bayesian[19][42] = -0.012732; qual_match_simple_bayesian[19][43] = -0.0127191; qual_match_simple_bayesian[19][44] = -0.0127088; qual_match_simple_bayesian[19][45] = -0.0127007; qual_match_simple_bayesian[19][46] = -0.0126942; qual_match_simple_bayesian[20][0] = -5.70378; qual_match_simple_bayesian[20][1] = -1.5786; qual_match_simple_bayesian[20][2] = -1.00115; qual_match_simple_bayesian[20][3] = -0.702197; qual_match_simple_bayesian[20][4] = -0.515502; qual_match_simple_bayesian[20][5] = -0.388625; qual_match_simple_bayesian[20][6] = -0.29819; qual_match_simple_bayesian[20][7] = -0.231763; qual_match_simple_bayesian[20][8] = -0.181973; qual_match_simple_bayesian[20][9] = -0.144117; qual_match_simple_bayesian[20][10] = -0.115037; qual_match_simple_bayesian[20][11] = -0.0925252; qual_match_simple_bayesian[20][12] = -0.0749978; qual_match_simple_bayesian[20][13] = -0.061291; qual_match_simple_bayesian[20][14] = -0.0505356; qual_match_simple_bayesian[20][15] = -0.042074; qual_match_simple_bayesian[20][16] = -0.0354033; 
qual_match_simple_bayesian[20][17] = -0.0301362; qual_match_simple_bayesian[20][18] = -0.025972; qual_match_simple_bayesian[20][19] = -0.0226766; qual_match_simple_bayesian[20][20] = -0.0200667; qual_match_simple_bayesian[20][21] = -0.0179984; qual_match_simple_bayesian[20][22] = -0.0163585; qual_match_simple_bayesian[20][23] = -0.0150578; qual_match_simple_bayesian[20][24] = -0.0140259; qual_match_simple_bayesian[20][25] = -0.0132069; qual_match_simple_bayesian[20][26] = -0.0125569; qual_match_simple_bayesian[20][27] = -0.0120409; qual_match_simple_bayesian[20][28] = -0.0116311; qual_match_simple_bayesian[20][29] = -0.0113058; qual_match_simple_bayesian[20][30] = -0.0110475; qual_match_simple_bayesian[20][31] = -0.0108423; qual_match_simple_bayesian[20][32] = -0.0106794; qual_match_simple_bayesian[20][33] = -0.01055; qual_match_simple_bayesian[20][34] = -0.0104472; qual_match_simple_bayesian[20][35] = -0.0103655; qual_match_simple_bayesian[20][36] = -0.0103007; qual_match_simple_bayesian[20][37] = -0.0102492; qual_match_simple_bayesian[20][38] = -0.0102083; qual_match_simple_bayesian[20][39] = -0.0101758; qual_match_simple_bayesian[20][40] = -0.01015; qual_match_simple_bayesian[20][41] = -0.0101295; qual_match_simple_bayesian[20][42] = -0.0101132; qual_match_simple_bayesian[20][43] = -0.0101003; qual_match_simple_bayesian[20][44] = -0.01009; qual_match_simple_bayesian[20][45] = -0.0100819; qual_match_simple_bayesian[20][46] = -0.0100754; qual_match_simple_bayesian[21][0] = -5.93404; qual_match_simple_bayesian[21][1] = -1.57919; qual_match_simple_bayesian[21][2] = -1.00027; qual_match_simple_bayesian[21][3] = -0.700821; qual_match_simple_bayesian[21][4] = -0.513887; qual_match_simple_bayesian[21][5] = -0.386872; qual_match_simple_bayesian[21][6] = -0.296348; qual_match_simple_bayesian[21][7] = -0.229861; qual_match_simple_bayesian[21][8] = -0.180029; qual_match_simple_bayesian[21][9] = -0.142143; qual_match_simple_bayesian[21][10] = -0.113039; 
qual_match_simple_bayesian[21][11] = -0.09051; qual_match_simple_bayesian[21][12] = -0.0729694; qual_match_simple_bayesian[21][13] = -0.0592525; qual_match_simple_bayesian[21][14] = -0.0484892; qual_match_simple_bayesian[21][15] = -0.0400214; qual_match_simple_bayesian[21][16] = -0.033346; qual_match_simple_bayesian[21][17] = -0.028075; qual_match_simple_bayesian[21][18] = -0.0239079; qual_match_simple_bayesian[21][19] = -0.0206101; qual_match_simple_bayesian[21][20] = -0.0179984; qual_match_simple_bayesian[21][21] = -0.0159286; qual_match_simple_bayesian[21][22] = -0.0142876; qual_match_simple_bayesian[21][23] = -0.012986; qual_match_simple_bayesian[21][24] = -0.0119533; qual_match_simple_bayesian[21][25] = -0.0111338; qual_match_simple_bayesian[21][26] = -0.0104833; qual_match_simple_bayesian[21][27] = -0.00996692; qual_match_simple_bayesian[21][28] = -0.00955691; qual_match_simple_bayesian[21][29] = -0.00923135; qual_match_simple_bayesian[21][30] = -0.00897283; qual_match_simple_bayesian[21][31] = -0.00876752; qual_match_simple_bayesian[21][32] = -0.00860447; qual_match_simple_bayesian[21][33] = -0.00847497; qual_match_simple_bayesian[21][34] = -0.00837212; qual_match_simple_bayesian[21][35] = -0.00829043; qual_match_simple_bayesian[21][36] = -0.00822555; qual_match_simple_bayesian[21][37] = -0.00817401; qual_match_simple_bayesian[21][38] = -0.00813308; qual_match_simple_bayesian[21][39] = -0.00810056; qual_match_simple_bayesian[21][40] = -0.00807474; qual_match_simple_bayesian[21][41] = -0.00805422; qual_match_simple_bayesian[21][42] = -0.00803793; qual_match_simple_bayesian[21][43] = -0.00802498; qual_match_simple_bayesian[21][44] = -0.0080147; qual_match_simple_bayesian[21][45] = -0.00800654; qual_match_simple_bayesian[21][46] = -0.00800005; qual_match_simple_bayesian[22][0] = -6.1643; qual_match_simple_bayesian[22][1] = -1.57966; qual_match_simple_bayesian[22][2] = -0.99956; qual_match_simple_bayesian[22][3] = -0.69973; qual_match_simple_bayesian[22][4] = 
-0.512606; qual_match_simple_bayesian[22][5] = -0.385482; qual_match_simple_bayesian[22][6] = -0.294888; qual_match_simple_bayesian[22][7] = -0.228354; qual_match_simple_bayesian[22][8] = -0.178488; qual_match_simple_bayesian[22][9] = -0.140577; qual_match_simple_bayesian[22][10] = -0.111455; qual_match_simple_bayesian[22][11] = -0.0889123; qual_match_simple_bayesian[22][12] = -0.0713612; qual_match_simple_bayesian[22][13] = -0.0576362; qual_match_simple_bayesian[22][14] = -0.0468667; qual_match_simple_bayesian[22][15] = -0.038394; qual_match_simple_bayesian[22][16] = -0.0317148; qual_match_simple_bayesian[22][17] = -0.0264408; qual_match_simple_bayesian[22][18] = -0.0222713; qual_match_simple_bayesian[22][19] = -0.0189717; qual_match_simple_bayesian[22][20] = -0.0163585; qual_match_simple_bayesian[22][21] = -0.0142876; qual_match_simple_bayesian[22][22] = -0.0126457; qual_match_simple_bayesian[22][23] = -0.0113434; qual_match_simple_bayesian[22][24] = -0.0103101; qual_match_simple_bayesian[22][25] = -0.00949014; qual_match_simple_bayesian[22][26] = -0.00883928; qual_match_simple_bayesian[22][27] = -0.00832259; qual_match_simple_bayesian[22][28] = -0.00791235; qual_match_simple_bayesian[22][29] = -0.00758661; qual_match_simple_bayesian[22][30] = -0.00732794; qual_match_simple_bayesian[22][31] = -0.00712252; qual_match_simple_bayesian[22][32] = -0.00695938; qual_match_simple_bayesian[22][33] = -0.00682981; qual_match_simple_bayesian[22][34] = -0.00672691; qual_match_simple_bayesian[22][35] = -0.00664517; qual_match_simple_bayesian[22][36] = -0.00658025; qual_match_simple_bayesian[22][37] = -0.00652869; qual_match_simple_bayesian[22][38] = -0.00648773; qual_match_simple_bayesian[22][39] = -0.0064552; qual_match_simple_bayesian[22][40] = -0.00642936; qual_match_simple_bayesian[22][41] = -0.00640883; qual_match_simple_bayesian[22][42] = -0.00639253; qual_match_simple_bayesian[22][43] = -0.00637958; qual_match_simple_bayesian[22][44] = -0.00636929; 
qual_match_simple_bayesian[22][45] = -0.00636112; qual_match_simple_bayesian[22][46] = -0.00635463; qual_match_simple_bayesian[23][0] = -6.39456; qual_match_simple_bayesian[23][1] = -1.58003; qual_match_simple_bayesian[23][2] = -0.999001; qual_match_simple_bayesian[23][3] = -0.698863; qual_match_simple_bayesian[23][4] = -0.51159; qual_match_simple_bayesian[23][5] = -0.384379; qual_match_simple_bayesian[23][6] = -0.29373; qual_match_simple_bayesian[23][7] = -0.227158; qual_match_simple_bayesian[23][8] = -0.177265; qual_match_simple_bayesian[23][9] = -0.139335; qual_match_simple_bayesian[23][10] = -0.110198; qual_match_simple_bayesian[23][11] = -0.0876449; qual_match_simple_bayesian[23][12] = -0.0700856; qual_match_simple_bayesian[23][13] = -0.0563542; qual_match_simple_bayesian[23][14] = -0.0455797; qual_match_simple_bayesian[23][15] = -0.0371032; qual_match_simple_bayesian[23][16] = -0.0304209; qual_match_simple_bayesian[23][17] = -0.0251447; qual_match_simple_bayesian[23][18] = -0.0209733; qual_match_simple_bayesian[23][19] = -0.0176722; qual_match_simple_bayesian[23][20] = -0.0150578; qual_match_simple_bayesian[23][21] = -0.012986; qual_match_simple_bayesian[23][22] = -0.0113434; qual_match_simple_bayesian[23][23] = -0.0100405; qual_match_simple_bayesian[23][24] = -0.00900678; qual_match_simple_bayesian[23][25] = -0.00818644; qual_match_simple_bayesian[23][26] = -0.00753529; qual_match_simple_bayesian[23][27] = -0.00701837; qual_match_simple_bayesian[23][28] = -0.00660796; qual_match_simple_bayesian[23][29] = -0.00628208; qual_match_simple_bayesian[23][30] = -0.00602329; qual_match_simple_bayesian[23][31] = -0.00581778; qual_match_simple_bayesian[23][32] = -0.00565457; qual_match_simple_bayesian[23][33] = -0.00552494; qual_match_simple_bayesian[23][34] = -0.00542199; qual_match_simple_bayesian[23][35] = -0.00534022; qual_match_simple_bayesian[23][36] = -0.00527527; qual_match_simple_bayesian[23][37] = -0.00522368; qual_match_simple_bayesian[23][38] = -0.00518271; 
qual_match_simple_bayesian[23][39] = -0.00515016; qual_match_simple_bayesian[23][40] = -0.00512431; qual_match_simple_bayesian[23][41] = -0.00510378; qual_match_simple_bayesian[23][42] = -0.00508747; qual_match_simple_bayesian[23][43] = -0.00507451; qual_match_simple_bayesian[23][44] = -0.00506422; qual_match_simple_bayesian[23][45] = -0.00505604; qual_match_simple_bayesian[23][46] = -0.00504955; qual_match_simple_bayesian[24][0] = -6.62482; qual_match_simple_bayesian[24][1] = -1.58033; qual_match_simple_bayesian[24][2] = -0.998557; qual_match_simple_bayesian[24][3] = -0.698176; qual_match_simple_bayesian[24][4] = -0.510784; qual_match_simple_bayesian[24][5] = -0.383503; qual_match_simple_bayesian[24][6] = -0.29281; qual_match_simple_bayesian[24][7] = -0.226208; qual_match_simple_bayesian[24][8] = -0.176295; qual_match_simple_bayesian[24][9] = -0.138349; qual_match_simple_bayesian[24][10] = -0.109202; qual_match_simple_bayesian[24][11] = -0.0866394; qual_match_simple_bayesian[24][12] = -0.0690735; qual_match_simple_bayesian[24][13] = -0.055337; qual_match_simple_bayesian[24][14] = -0.0445586; qual_match_simple_bayesian[24][15] = -0.0360791; qual_match_simple_bayesian[24][16] = -0.0293944; qual_match_simple_bayesian[24][17] = -0.0241163; qual_match_simple_bayesian[24][18] = -0.0199434; qual_match_simple_bayesian[24][19] = -0.0166412; qual_match_simple_bayesian[24][20] = -0.0140259; qual_match_simple_bayesian[24][21] = -0.0119533; qual_match_simple_bayesian[24][22] = -0.0103101; qual_match_simple_bayesian[24][23] = -0.00900678; qual_match_simple_bayesian[24][24] = -0.00797271; qual_match_simple_bayesian[24][25] = -0.00715208; qual_match_simple_bayesian[24][26] = -0.00650071; qual_match_simple_bayesian[24][27] = -0.00598361; qual_match_simple_bayesian[24][28] = -0.00557305; qual_match_simple_bayesian[24][29] = -0.00524706; qual_match_simple_bayesian[24][30] = -0.00498818; qual_match_simple_bayesian[24][31] = -0.0047826; qual_match_simple_bayesian[24][32] = 
-0.00461933; qual_match_simple_bayesian[24][33] = -0.00448966; qual_match_simple_bayesian[24][34] = -0.00438667; qual_match_simple_bayesian[24][35] = -0.00430487; qual_match_simple_bayesian[24][36] = -0.0042399; qual_match_simple_bayesian[24][37] = -0.0041883; qual_match_simple_bayesian[24][38] = -0.00414731; qual_match_simple_bayesian[24][39] = -0.00411475; qual_match_simple_bayesian[24][40] = -0.00408889; qual_match_simple_bayesian[24][41] = -0.00406835; qual_match_simple_bayesian[24][42] = -0.00405203; qual_match_simple_bayesian[24][43] = -0.00403907; qual_match_simple_bayesian[24][44] = -0.00402878; qual_match_simple_bayesian[24][45] = -0.0040206; qual_match_simple_bayesian[24][46] = -0.0040141; qual_match_simple_bayesian[25][0] = -6.85508; qual_match_simple_bayesian[25][1] = -1.58057; qual_match_simple_bayesian[25][2] = -0.998204; qual_match_simple_bayesian[25][3] = -0.69763; qual_match_simple_bayesian[25][4] = -0.510144; qual_match_simple_bayesian[25][5] = -0.382809; qual_match_simple_bayesian[25][6] = -0.292081; qual_match_simple_bayesian[25][7] = -0.225455; qual_match_simple_bayesian[25][8] = -0.175525; qual_match_simple_bayesian[25][9] = -0.137567; qual_match_simple_bayesian[25][10] = -0.10841; qual_match_simple_bayesian[25][11] = -0.0858414; qual_match_simple_bayesian[25][12] = -0.0682703; qual_match_simple_bayesian[25][13] = -0.0545298; qual_match_simple_bayesian[25][14] = -0.0437483; qual_match_simple_bayesian[25][15] = -0.0352663; qual_match_simple_bayesian[25][16] = -0.0285798; qual_match_simple_bayesian[25][17] = -0.0233001; qual_match_simple_bayesian[25][18] = -0.0191261; qual_match_simple_bayesian[25][19] = -0.015823; qual_match_simple_bayesian[25][20] = -0.0132069; qual_match_simple_bayesian[25][21] = -0.0111338; qual_match_simple_bayesian[25][22] = -0.00949014; qual_match_simple_bayesian[25][23] = -0.00818644; qual_match_simple_bayesian[25][24] = -0.00715208; qual_match_simple_bayesian[25][25] = -0.00633122; qual_match_simple_bayesian[25][26] = 
-0.00567967; qual_match_simple_bayesian[25][27] = -0.00516243; qual_match_simple_bayesian[25][28] = -0.00475176; qual_match_simple_bayesian[25][29] = -0.00442567; qual_match_simple_bayesian[25][30] = -0.00416673; qual_match_simple_bayesian[25][31] = -0.00396109; qual_match_simple_bayesian[25][32] = -0.00379778; qual_match_simple_bayesian[25][33] = -0.00366807; qual_match_simple_bayesian[25][34] = -0.00356505; qual_match_simple_bayesian[25][35] = -0.00348323; qual_match_simple_bayesian[25][36] = -0.00341824; qual_match_simple_bayesian[25][37] = -0.00336662; qual_match_simple_bayesian[25][38] = -0.00332562; qual_match_simple_bayesian[25][39] = -0.00329306; qual_match_simple_bayesian[25][40] = -0.00326719; qual_match_simple_bayesian[25][41] = -0.00324664; qual_match_simple_bayesian[25][42] = -0.00323032; qual_match_simple_bayesian[25][43] = -0.00321736; qual_match_simple_bayesian[25][44] = -0.00320706; qual_match_simple_bayesian[25][45] = -0.00319888; qual_match_simple_bayesian[25][46] = -0.00319238; qual_match_simple_bayesian[26][0] = -7.08533; qual_match_simple_bayesian[26][1] = -1.58075; qual_match_simple_bayesian[26][2] = -0.997924; qual_match_simple_bayesian[26][3] = -0.697196; qual_match_simple_bayesian[26][4] = -0.509636; qual_match_simple_bayesian[26][5] = -0.382257; qual_match_simple_bayesian[26][6] = -0.291502; qual_match_simple_bayesian[26][7] = -0.224857; qual_match_simple_bayesian[26][8] = -0.174914; qual_match_simple_bayesian[26][9] = -0.136946; qual_match_simple_bayesian[26][10] = -0.107782; qual_match_simple_bayesian[26][11] = -0.0852079; qual_match_simple_bayesian[26][12] = -0.0676327; qual_match_simple_bayesian[26][13] = -0.053889; qual_match_simple_bayesian[26][14] = -0.0431051; qual_match_simple_bayesian[26][15] = -0.0346212; qual_match_simple_bayesian[26][16] = -0.0279331; qual_match_simple_bayesian[26][17] = -0.0226523; qual_match_simple_bayesian[26][18] = -0.0184774; qual_match_simple_bayesian[26][19] = -0.0151735; 
qual_match_simple_bayesian[26][20] = -0.0125569; qual_match_simple_bayesian[26][21] = -0.0104833; qual_match_simple_bayesian[26][22] = -0.00883928; qual_match_simple_bayesian[26][23] = -0.00753529; qual_match_simple_bayesian[26][24] = -0.00650071; qual_match_simple_bayesian[26][25] = -0.00567967; qual_match_simple_bayesian[26][26] = -0.00502798; qual_match_simple_bayesian[26][27] = -0.00451062; qual_match_simple_bayesian[26][28] = -0.00409986; qual_match_simple_bayesian[26][29] = -0.00377371; qual_match_simple_bayesian[26][30] = -0.00351471; qual_match_simple_bayesian[26][31] = -0.00330902; qual_match_simple_bayesian[26][32] = -0.00314567; qual_match_simple_bayesian[26][33] = -0.00301594; qual_match_simple_bayesian[26][34] = -0.0029129; qual_match_simple_bayesian[26][35] = -0.00283106; qual_match_simple_bayesian[26][36] = -0.00276606; qual_match_simple_bayesian[26][37] = -0.00271443; qual_match_simple_bayesian[26][38] = -0.00267342; qual_match_simple_bayesian[26][39] = -0.00264084; qual_match_simple_bayesian[26][40] = -0.00261497; qual_match_simple_bayesian[26][41] = -0.00259442; qual_match_simple_bayesian[26][42] = -0.00257809; qual_match_simple_bayesian[26][43] = -0.00256512; qual_match_simple_bayesian[26][44] = -0.00255482; qual_match_simple_bayesian[26][45] = -0.00254664; qual_match_simple_bayesian[26][46] = -0.00254014; qual_match_simple_bayesian[27][0] = -7.31559; qual_match_simple_bayesian[27][1] = -1.5809; qual_match_simple_bayesian[27][2] = -0.997702; qual_match_simple_bayesian[27][3] = -0.696852; qual_match_simple_bayesian[27][4] = -0.509232; qual_match_simple_bayesian[27][5] = -0.38182; qual_match_simple_bayesian[27][6] = -0.291042; qual_match_simple_bayesian[27][7] = -0.224383; qual_match_simple_bayesian[27][8] = -0.174428; qual_match_simple_bayesian[27][9] = -0.136453; qual_match_simple_bayesian[27][10] = -0.107284; qual_match_simple_bayesian[27][11] = -0.0847051; qual_match_simple_bayesian[27][12] = -0.0671265; qual_match_simple_bayesian[27][13] = 
-0.0533804; qual_match_simple_bayesian[27][14] = -0.0425945; qual_match_simple_bayesian[27][15] = -0.0341091; qual_match_simple_bayesian[27][16] = -0.0274198; qual_match_simple_bayesian[27][17] = -0.0221381; qual_match_simple_bayesian[27][18] = -0.0179624; qual_match_simple_bayesian[27][19] = -0.0146579; qual_match_simple_bayesian[27][20] = -0.0120409; qual_match_simple_bayesian[27][21] = -0.00996692; qual_match_simple_bayesian[27][22] = -0.00832259; qual_match_simple_bayesian[27][23] = -0.00701837; qual_match_simple_bayesian[27][24] = -0.00598361; qual_match_simple_bayesian[27][25] = -0.00516243; qual_match_simple_bayesian[27][26] = -0.00451062; qual_match_simple_bayesian[27][27] = -0.00399318; qual_match_simple_bayesian[27][28] = -0.00358235; qual_match_simple_bayesian[27][29] = -0.00325613; qual_match_simple_bayesian[27][30] = -0.00299709; qual_match_simple_bayesian[27][31] = -0.00279137; qual_match_simple_bayesian[27][32] = -0.00262799; qual_match_simple_bayesian[27][33] = -0.00249823; qual_match_simple_bayesian[27][34] = -0.00239518; qual_match_simple_bayesian[27][35] = -0.00231332; qual_match_simple_bayesian[27][36] = -0.00224831; qual_match_simple_bayesian[27][37] = -0.00219667; qual_match_simple_bayesian[27][38] = -0.00215565; qual_match_simple_bayesian[27][39] = -0.00212307; qual_match_simple_bayesian[27][40] = -0.00209719; qual_match_simple_bayesian[27][41] = -0.00207664; qual_match_simple_bayesian[27][42] = -0.00206031; qual_match_simple_bayesian[27][43] = -0.00204734; qual_match_simple_bayesian[27][44] = -0.00203704; qual_match_simple_bayesian[27][45] = -0.00202886; qual_match_simple_bayesian[27][46] = -0.00202236; qual_match_simple_bayesian[28][0] = -7.54585; qual_match_simple_bayesian[28][1] = -1.58102; qual_match_simple_bayesian[28][2] = -0.997525; qual_match_simple_bayesian[28][3] = -0.696579; qual_match_simple_bayesian[28][4] = -0.508912; qual_match_simple_bayesian[28][5] = -0.381472; qual_match_simple_bayesian[28][6] = -0.290677; 
qual_match_simple_bayesian[28][7] = -0.224006; qual_match_simple_bayesian[28][8] = -0.174043; qual_match_simple_bayesian[28][9] = -0.136062; qual_match_simple_bayesian[28][10] = -0.106888; qual_match_simple_bayesian[28][11] = -0.0843058; qual_match_simple_bayesian[28][12] = -0.0667247; qual_match_simple_bayesian[28][13] = -0.0529765; qual_match_simple_bayesian[28][14] = -0.0421891; qual_match_simple_bayesian[28][15] = -0.0337024; qual_match_simple_bayesian[28][16] = -0.0270122; qual_match_simple_bayesian[28][17] = -0.0217297; qual_match_simple_bayesian[28][18] = -0.0175535; qual_match_simple_bayesian[28][19] = -0.0142486; qual_match_simple_bayesian[28][20] = -0.0116311; qual_match_simple_bayesian[28][21] = -0.00955691; qual_match_simple_bayesian[28][22] = -0.00791235; qual_match_simple_bayesian[28][23] = -0.00660796; qual_match_simple_bayesian[28][24] = -0.00557305; qual_match_simple_bayesian[28][25] = -0.00475176; qual_match_simple_bayesian[28][26] = -0.00409986; qual_match_simple_bayesian[28][27] = -0.00358235; qual_match_simple_bayesian[28][28] = -0.00317146; qual_match_simple_bayesian[28][29] = -0.0028452; qual_match_simple_bayesian[28][30] = -0.00258612; qual_match_simple_bayesian[28][31] = -0.00238037; qual_match_simple_bayesian[28][32] = -0.00221697; qual_match_simple_bayesian[28][33] = -0.0020872; qual_match_simple_bayesian[28][34] = -0.00198413; qual_match_simple_bayesian[28][35] = -0.00190226; qual_match_simple_bayesian[28][36] = -0.00183724; qual_match_simple_bayesian[28][37] = -0.00178559; qual_match_simple_bayesian[28][38] = -0.00174457; qual_match_simple_bayesian[28][39] = -0.00171198; qual_match_simple_bayesian[28][40] = -0.0016861; qual_match_simple_bayesian[28][41] = -0.00166554; qual_match_simple_bayesian[28][42] = -0.00164921; qual_match_simple_bayesian[28][43] = -0.00163624; qual_match_simple_bayesian[28][44] = -0.00162594; qual_match_simple_bayesian[28][45] = -0.00161776; qual_match_simple_bayesian[28][46] = -0.00161126; 
qual_match_simple_bayesian[29][0] = -7.77611; qual_match_simple_bayesian[29][1] = -1.58111; qual_match_simple_bayesian[29][2] = -0.997385; qual_match_simple_bayesian[29][3] = -0.696362; qual_match_simple_bayesian[29][4] = -0.508658; qual_match_simple_bayesian[29][5] = -0.381196; qual_match_simple_bayesian[29][6] = -0.290387; qual_match_simple_bayesian[29][7] = -0.223707; qual_match_simple_bayesian[29][8] = -0.173737; qual_match_simple_bayesian[29][9] = -0.135751; qual_match_simple_bayesian[29][10] = -0.106574; qual_match_simple_bayesian[29][11] = -0.0839888; qual_match_simple_bayesian[29][12] = -0.0664056; qual_match_simple_bayesian[29][13] = -0.0526558; qual_match_simple_bayesian[29][14] = -0.0418671; qual_match_simple_bayesian[29][15] = -0.0333796; qual_match_simple_bayesian[29][16] = -0.0266886; qual_match_simple_bayesian[29][17] = -0.0214055; qual_match_simple_bayesian[29][18] = -0.0172288; qual_match_simple_bayesian[29][19] = -0.0139235; qual_match_simple_bayesian[29][20] = -0.0113058; qual_match_simple_bayesian[29][21] = -0.00923135; qual_match_simple_bayesian[29][22] = -0.00758661; qual_match_simple_bayesian[29][23] = -0.00628208; qual_match_simple_bayesian[29][24] = -0.00524706; qual_match_simple_bayesian[29][25] = -0.00442567; qual_match_simple_bayesian[29][26] = -0.00377371; qual_match_simple_bayesian[29][27] = -0.00325613; qual_match_simple_bayesian[29][28] = -0.0028452; qual_match_simple_bayesian[29][29] = -0.00251891; qual_match_simple_bayesian[29][30] = -0.0022598; qual_match_simple_bayesian[29][31] = -0.00205403; qual_match_simple_bayesian[29][32] = -0.00189061; qual_match_simple_bayesian[29][33] = -0.00176082; qual_match_simple_bayesian[29][34] = -0.00165774; qual_match_simple_bayesian[29][35] = -0.00157586; qual_match_simple_bayesian[29][36] = -0.00151083; qual_match_simple_bayesian[29][37] = -0.00145918; qual_match_simple_bayesian[29][38] = -0.00141815; qual_match_simple_bayesian[29][39] = -0.00138557; qual_match_simple_bayesian[29][40] = 
-0.00135968; qual_match_simple_bayesian[29][41] = -0.00133912; qual_match_simple_bayesian[29][42] = -0.00132279; qual_match_simple_bayesian[29][43] = -0.00130982; qual_match_simple_bayesian[29][44] = -0.00129951; qual_match_simple_bayesian[29][45] = -0.00129133; qual_match_simple_bayesian[29][46] = -0.00128483; qual_match_simple_bayesian[30][0] = -8.00637; qual_match_simple_bayesian[30][1] = -1.58119; qual_match_simple_bayesian[30][2] = -0.997273; qual_match_simple_bayesian[30][3] = -0.69619; qual_match_simple_bayesian[30][4] = -0.508456; qual_match_simple_bayesian[30][5] = -0.380977; qual_match_simple_bayesian[30][6] = -0.290157; qual_match_simple_bayesian[30][7] = -0.223469; qual_match_simple_bayesian[30][8] = -0.173494; qual_match_simple_bayesian[30][9] = -0.135504; qual_match_simple_bayesian[30][10] = -0.106324; qual_match_simple_bayesian[30][11] = -0.083737; qual_match_simple_bayesian[30][12] = -0.0661522; qual_match_simple_bayesian[30][13] = -0.0524012; qual_match_simple_bayesian[30][14] = -0.0416115; qual_match_simple_bayesian[30][15] = -0.0331232; qual_match_simple_bayesian[30][16] = -0.0264316; qual_match_simple_bayesian[30][17] = -0.0211481; qual_match_simple_bayesian[30][18] = -0.016971; qual_match_simple_bayesian[30][19] = -0.0136654; qual_match_simple_bayesian[30][20] = -0.0110475; qual_match_simple_bayesian[30][21] = -0.00897283; qual_match_simple_bayesian[30][22] = -0.00732794; qual_match_simple_bayesian[30][23] = -0.00602329; qual_match_simple_bayesian[30][24] = -0.00498818; qual_match_simple_bayesian[30][25] = -0.00416673; qual_match_simple_bayesian[30][26] = -0.00351471; qual_match_simple_bayesian[30][27] = -0.00299709; qual_match_simple_bayesian[30][28] = -0.00258612; qual_match_simple_bayesian[30][29] = -0.0022598; qual_match_simple_bayesian[30][30] = -0.00200067; qual_match_simple_bayesian[30][31] = -0.00179488; qual_match_simple_bayesian[30][32] = -0.00163145; qual_match_simple_bayesian[30][33] = -0.00150165; qual_match_simple_bayesian[30][34] 
= -0.00139855; qual_match_simple_bayesian[30][35] = -0.00131667; qual_match_simple_bayesian[30][36] = -0.00125164; qual_match_simple_bayesian[30][37] = -0.00119998; qual_match_simple_bayesian[30][38] = -0.00115895; qual_match_simple_bayesian[30][39] = -0.00112636; qual_match_simple_bayesian[30][40] = -0.00110047; qual_match_simple_bayesian[30][41] = -0.00107991; qual_match_simple_bayesian[30][42] = -0.00106358; qual_match_simple_bayesian[30][43] = -0.0010506; qual_match_simple_bayesian[30][44] = -0.0010403; qual_match_simple_bayesian[30][45] = -0.00103211; qual_match_simple_bayesian[30][46] = -0.00102561; qual_match_simple_bayesian[31][0] = -8.23663; qual_match_simple_bayesian[31][1] = -1.58125; qual_match_simple_bayesian[31][2] = -0.997185; qual_match_simple_bayesian[31][3] = -0.696053; qual_match_simple_bayesian[31][4] = -0.508295; qual_match_simple_bayesian[31][5] = -0.380803; qual_match_simple_bayesian[31][6] = -0.289974; qual_match_simple_bayesian[31][7] = -0.22328; qual_match_simple_bayesian[31][8] = -0.173301; qual_match_simple_bayesian[31][9] = -0.135308; qual_match_simple_bayesian[31][10] = -0.106126; qual_match_simple_bayesian[31][11] = -0.0835371; qual_match_simple_bayesian[31][12] = -0.065951; qual_match_simple_bayesian[31][13] = -0.0521989; qual_match_simple_bayesian[31][14] = -0.0414085; qual_match_simple_bayesian[31][15] = -0.0329196; qual_match_simple_bayesian[31][16] = -0.0262275; qual_match_simple_bayesian[31][17] = -0.0209436; qual_match_simple_bayesian[31][18] = -0.0167662; qual_match_simple_bayesian[31][19] = -0.0134604; qual_match_simple_bayesian[31][20] = -0.0108423; qual_match_simple_bayesian[31][21] = -0.00876752; qual_match_simple_bayesian[31][22] = -0.00712252; qual_match_simple_bayesian[31][23] = -0.00581778; qual_match_simple_bayesian[31][24] = -0.0047826; qual_match_simple_bayesian[31][25] = -0.00396109; qual_match_simple_bayesian[31][26] = -0.00330902; qual_match_simple_bayesian[31][27] = -0.00279137; 
qual_match_simple_bayesian[31][28] = -0.00238037; qual_match_simple_bayesian[31][29] = -0.00205403; qual_match_simple_bayesian[31][30] = -0.00179488; qual_match_simple_bayesian[31][31] = -0.00158908; qual_match_simple_bayesian[31][32] = -0.00142563; qual_match_simple_bayesian[31][33] = -0.00129582; qual_match_simple_bayesian[31][34] = -0.00119272; qual_match_simple_bayesian[31][35] = -0.00111084; qual_match_simple_bayesian[31][36] = -0.0010458; qual_match_simple_bayesian[31][37] = -0.000994137; qual_match_simple_bayesian[31][38] = -0.000953104; qual_match_simple_bayesian[31][39] = -0.000920511; qual_match_simple_bayesian[31][40] = -0.000894622; qual_match_simple_bayesian[31][41] = -0.000874059; qual_match_simple_bayesian[31][42] = -0.000857725; qual_match_simple_bayesian[31][43] = -0.000844751; qual_match_simple_bayesian[31][44] = -0.000834445; qual_match_simple_bayesian[31][45] = -0.000826259; qual_match_simple_bayesian[31][46] = -0.000819756; qual_match_simple_bayesian[32][0] = -8.46688; qual_match_simple_bayesian[32][1] = -1.58129; qual_match_simple_bayesian[32][2] = -0.997114; qual_match_simple_bayesian[32][3] = -0.695944; qual_match_simple_bayesian[32][4] = -0.508168; qual_match_simple_bayesian[32][5] = -0.380664; qual_match_simple_bayesian[32][6] = -0.289829; qual_match_simple_bayesian[32][7] = -0.22313; qual_match_simple_bayesian[32][8] = -0.173148; qual_match_simple_bayesian[32][9] = -0.135153; qual_match_simple_bayesian[32][10] = -0.105968; qual_match_simple_bayesian[32][11] = -0.0833783; qual_match_simple_bayesian[32][12] = -0.0657912; qual_match_simple_bayesian[32][13] = -0.0520383; qual_match_simple_bayesian[32][14] = -0.0412473; qual_match_simple_bayesian[32][15] = -0.0327579; qual_match_simple_bayesian[32][16] = -0.0260655; qual_match_simple_bayesian[32][17] = -0.0207812; qual_match_simple_bayesian[32][18] = -0.0166036; qual_match_simple_bayesian[32][19] = -0.0132976; qual_match_simple_bayesian[32][20] = -0.0106794; qual_match_simple_bayesian[32][21] 
= -0.00860447; qual_match_simple_bayesian[32][22] = -0.00695938; qual_match_simple_bayesian[32][23] = -0.00565457; qual_match_simple_bayesian[32][24] = -0.00461933; qual_match_simple_bayesian[32][25] = -0.00379778; qual_match_simple_bayesian[32][26] = -0.00314567; qual_match_simple_bayesian[32][27] = -0.00262799; qual_match_simple_bayesian[32][28] = -0.00221697; qual_match_simple_bayesian[32][29] = -0.00189061; qual_match_simple_bayesian[32][30] = -0.00163145; qual_match_simple_bayesian[32][31] = -0.00142563; qual_match_simple_bayesian[32][32] = -0.00126218; qual_match_simple_bayesian[32][33] = -0.00113236; qual_match_simple_bayesian[32][34] = -0.00102926; qual_match_simple_bayesian[32][35] = -0.000947368; qual_match_simple_bayesian[32][36] = -0.000882324; qual_match_simple_bayesian[32][37] = -0.000830661; qual_match_simple_bayesian[32][38] = -0.000789625; qual_match_simple_bayesian[32][39] = -0.00075703; qual_match_simple_bayesian[32][40] = -0.00073114; qual_match_simple_bayesian[32][41] = -0.000710576; qual_match_simple_bayesian[32][42] = -0.000694241; qual_match_simple_bayesian[32][43] = -0.000681266; qual_match_simple_bayesian[32][44] = -0.00067096; qual_match_simple_bayesian[32][45] = -0.000662773; qual_match_simple_bayesian[32][46] = -0.00065627; qual_match_simple_bayesian[33][0] = -8.69714; qual_match_simple_bayesian[33][1] = -1.58133; qual_match_simple_bayesian[33][2] = -0.997059; qual_match_simple_bayesian[33][3] = -0.695858; qual_match_simple_bayesian[33][4] = -0.508067; qual_match_simple_bayesian[33][5] = -0.380554; qual_match_simple_bayesian[33][6] = -0.289713; qual_match_simple_bayesian[33][7] = -0.223011; qual_match_simple_bayesian[33][8] = -0.173026; qual_match_simple_bayesian[33][9] = -0.135029; qual_match_simple_bayesian[33][10] = -0.105843; qual_match_simple_bayesian[33][11] = -0.0832522; qual_match_simple_bayesian[33][12] = -0.0656642; qual_match_simple_bayesian[33][13] = -0.0519108; qual_match_simple_bayesian[33][14] = -0.0411192; 
qual_match_simple_bayesian[33][15] = -0.0326294; qual_match_simple_bayesian[33][16] = -0.0259367; qual_match_simple_bayesian[33][17] = -0.0206523; qual_match_simple_bayesian[33][18] = -0.0164745; qual_match_simple_bayesian[33][19] = -0.0131684; qual_match_simple_bayesian[33][20] = -0.01055; qual_match_simple_bayesian[33][21] = -0.00847497; qual_match_simple_bayesian[33][22] = -0.00682981; qual_match_simple_bayesian[33][23] = -0.00552494; qual_match_simple_bayesian[33][24] = -0.00448966; qual_match_simple_bayesian[33][25] = -0.00366807; qual_match_simple_bayesian[33][26] = -0.00301594; qual_match_simple_bayesian[33][27] = -0.00249823; qual_match_simple_bayesian[33][28] = -0.0020872; qual_match_simple_bayesian[33][29] = -0.00176082; qual_match_simple_bayesian[33][30] = -0.00150165; qual_match_simple_bayesian[33][31] = -0.00129582; qual_match_simple_bayesian[33][32] = -0.00113236; qual_match_simple_bayesian[33][33] = -0.00100254; qual_match_simple_bayesian[33][34] = -0.000899433; qual_match_simple_bayesian[33][35] = -0.000817538; qual_match_simple_bayesian[33][36] = -0.000752491; qual_match_simple_bayesian[33][37] = -0.000700826; qual_match_simple_bayesian[33][38] = -0.000659788; qual_match_simple_bayesian[33][39] = -0.000627192; qual_match_simple_bayesian[33][40] = -0.000601301; qual_match_simple_bayesian[33][41] = -0.000580736; qual_match_simple_bayesian[33][42] = -0.0005644; qual_match_simple_bayesian[33][43] = -0.000551424; qual_match_simple_bayesian[33][44] = -0.000541118; qual_match_simple_bayesian[33][45] = -0.000532931; qual_match_simple_bayesian[33][46] = -0.000526428; qual_match_simple_bayesian[34][0] = -8.9274; qual_match_simple_bayesian[34][1] = -1.58136; qual_match_simple_bayesian[34][2] = -0.997014; qual_match_simple_bayesian[34][3] = -0.695789; qual_match_simple_bayesian[34][4] = -0.507986; qual_match_simple_bayesian[34][5] = -0.380467; qual_match_simple_bayesian[34][6] = -0.289622; qual_match_simple_bayesian[34][7] = -0.222917; 
qual_match_simple_bayesian[34][8] = -0.17293; qual_match_simple_bayesian[34][9] = -0.134931; qual_match_simple_bayesian[34][10] = -0.105744; qual_match_simple_bayesian[34][11] = -0.083152; qual_match_simple_bayesian[34][12] = -0.0655634; qual_match_simple_bayesian[34][13] = -0.0518095; qual_match_simple_bayesian[34][14] = -0.0410175; qual_match_simple_bayesian[34][15] = -0.0325274; qual_match_simple_bayesian[34][16] = -0.0258345; qual_match_simple_bayesian[34][17] = -0.0205498; qual_match_simple_bayesian[34][18] = -0.0163719; qual_match_simple_bayesian[34][19] = -0.0130657; qual_match_simple_bayesian[34][20] = -0.0104472; qual_match_simple_bayesian[34][21] = -0.00837212; qual_match_simple_bayesian[34][22] = -0.00672691; qual_match_simple_bayesian[34][23] = -0.00542199; qual_match_simple_bayesian[34][24] = -0.00438667; qual_match_simple_bayesian[34][25] = -0.00356505; qual_match_simple_bayesian[34][26] = -0.0029129; qual_match_simple_bayesian[34][27] = -0.00239518; qual_match_simple_bayesian[34][28] = -0.00198413; qual_match_simple_bayesian[34][29] = -0.00165774; qual_match_simple_bayesian[34][30] = -0.00139855; qual_match_simple_bayesian[34][31] = -0.00119272; qual_match_simple_bayesian[34][32] = -0.00102926; qual_match_simple_bayesian[34][33] = -0.000899433; qual_match_simple_bayesian[34][34] = -0.00079632; qual_match_simple_bayesian[34][35] = -0.000714422; qual_match_simple_bayesian[34][36] = -0.000649373; qual_match_simple_bayesian[34][37] = -0.000597706; qual_match_simple_bayesian[34][38] = -0.000556667; qual_match_simple_bayesian[34][39] = -0.00052407; qual_match_simple_bayesian[34][40] = -0.000498178; qual_match_simple_bayesian[34][41] = -0.000477612; qual_match_simple_bayesian[34][42] = -0.000461276; qual_match_simple_bayesian[34][43] = -0.0004483; qual_match_simple_bayesian[34][44] = -0.000437993; qual_match_simple_bayesian[34][45] = -0.000429806; qual_match_simple_bayesian[34][46] = -0.000423302; qual_match_simple_bayesian[35][0] = -9.15766; 
qual_match_simple_bayesian[35][1] = -1.58138; qual_match_simple_bayesian[35][2] = -0.996979; qual_match_simple_bayesian[35][3] = -0.695735; qual_match_simple_bayesian[35][4] = -0.507922; qual_match_simple_bayesian[35][5] = -0.380398; qual_match_simple_bayesian[35][6] = -0.289549; qual_match_simple_bayesian[35][7] = -0.222842; qual_match_simple_bayesian[35][8] = -0.172853; qual_match_simple_bayesian[35][9] = -0.134853; qual_match_simple_bayesian[35][10] = -0.105665; qual_match_simple_bayesian[35][11] = -0.0830725; qual_match_simple_bayesian[35][12] = -0.0654833; qual_match_simple_bayesian[35][13] = -0.051729; qual_match_simple_bayesian[35][14] = -0.0409368; qual_match_simple_bayesian[35][15] = -0.0324464; qual_match_simple_bayesian[35][16] = -0.0257533; qual_match_simple_bayesian[35][17] = -0.0204685; qual_match_simple_bayesian[35][18] = -0.0162904; qual_match_simple_bayesian[35][19] = -0.0129841; qual_match_simple_bayesian[35][20] = -0.0103655; qual_match_simple_bayesian[35][21] = -0.00829043; qual_match_simple_bayesian[35][22] = -0.00664517; qual_match_simple_bayesian[35][23] = -0.00534022; qual_match_simple_bayesian[35][24] = -0.00430487; qual_match_simple_bayesian[35][25] = -0.00348323; qual_match_simple_bayesian[35][26] = -0.00283106; qual_match_simple_bayesian[35][27] = -0.00231332; qual_match_simple_bayesian[35][28] = -0.00190226; qual_match_simple_bayesian[35][29] = -0.00157586; qual_match_simple_bayesian[35][30] = -0.00131667; qual_match_simple_bayesian[35][31] = -0.00111084; qual_match_simple_bayesian[35][32] = -0.000947368; qual_match_simple_bayesian[35][33] = -0.000817538; qual_match_simple_bayesian[35][34] = -0.000714422; qual_match_simple_bayesian[35][35] = -0.000632522; qual_match_simple_bayesian[35][36] = -0.000567471; qual_match_simple_bayesian[35][37] = -0.000515803; qual_match_simple_bayesian[35][38] = -0.000474763; qual_match_simple_bayesian[35][39] = -0.000442165; qual_match_simple_bayesian[35][40] = -0.000416272; 
qual_match_simple_bayesian[35][41] = -0.000395705; qual_match_simple_bayesian[35][42] = -0.000379369; qual_match_simple_bayesian[35][43] = -0.000366392; qual_match_simple_bayesian[35][44] = -0.000356085; qual_match_simple_bayesian[35][45] = -0.000347898; qual_match_simple_bayesian[35][46] = -0.000341394; qual_match_simple_bayesian[36][0] = -9.38792; qual_match_simple_bayesian[36][1] = -1.5814; qual_match_simple_bayesian[36][2] = -0.996951; qual_match_simple_bayesian[36][3] = -0.695692; qual_match_simple_bayesian[36][4] = -0.507872; qual_match_simple_bayesian[36][5] = -0.380343; qual_match_simple_bayesian[36][6] = -0.289491; qual_match_simple_bayesian[36][7] = -0.222782; qual_match_simple_bayesian[36][8] = -0.172792; qual_match_simple_bayesian[36][9] = -0.134791; qual_match_simple_bayesian[36][10] = -0.105602; qual_match_simple_bayesian[36][11] = -0.0830093; qual_match_simple_bayesian[36][12] = -0.0654198; qual_match_simple_bayesian[36][13] = -0.0516651; qual_match_simple_bayesian[36][14] = -0.0408726; qual_match_simple_bayesian[36][15] = -0.0323821; qual_match_simple_bayesian[36][16] = -0.0256888; qual_match_simple_bayesian[36][17] = -0.0204039; qual_match_simple_bayesian[36][18] = -0.0162257; qual_match_simple_bayesian[36][19] = -0.0129193; qual_match_simple_bayesian[36][20] = -0.0103007; qual_match_simple_bayesian[36][21] = -0.00822555; qual_match_simple_bayesian[36][22] = -0.00658025; qual_match_simple_bayesian[36][23] = -0.00527527; qual_match_simple_bayesian[36][24] = -0.0042399; qual_match_simple_bayesian[36][25] = -0.00341824; qual_match_simple_bayesian[36][26] = -0.00276606; qual_match_simple_bayesian[36][27] = -0.00224831; qual_match_simple_bayesian[36][28] = -0.00183724; qual_match_simple_bayesian[36][29] = -0.00151083; qual_match_simple_bayesian[36][30] = -0.00125164; qual_match_simple_bayesian[36][31] = -0.0010458; qual_match_simple_bayesian[36][32] = -0.000882324; qual_match_simple_bayesian[36][33] = -0.000752491; qual_match_simple_bayesian[36][34] = 
-0.000649373; qual_match_simple_bayesian[36][35] = -0.000567471; qual_match_simple_bayesian[36][36] = -0.000502419; qual_match_simple_bayesian[36][37] = -0.00045075; qual_match_simple_bayesian[36][38] = -0.000409709; qual_match_simple_bayesian[36][39] = -0.00037711; qual_match_simple_bayesian[36][40] = -0.000351217; qual_match_simple_bayesian[36][41] = -0.00033065; qual_match_simple_bayesian[36][42] = -0.000314313; qual_match_simple_bayesian[36][43] = -0.000301336; qual_match_simple_bayesian[36][44] = -0.000291028; qual_match_simple_bayesian[36][45] = -0.000282841; qual_match_simple_bayesian[36][46] = -0.000276337; qual_match_simple_bayesian[37][0] = -9.61818; qual_match_simple_bayesian[37][1] = -1.58142; qual_match_simple_bayesian[37][2] = -0.996929; qual_match_simple_bayesian[37][3] = -0.695657; qual_match_simple_bayesian[37][4] = -0.507831; qual_match_simple_bayesian[37][5] = -0.380299; qual_match_simple_bayesian[37][6] = -0.289445; qual_match_simple_bayesian[37][7] = -0.222734; qual_match_simple_bayesian[37][8] = -0.172744; qual_match_simple_bayesian[37][9] = -0.134742; qual_match_simple_bayesian[37][10] = -0.105553; qual_match_simple_bayesian[37][11] = -0.0829591; qual_match_simple_bayesian[37][12] = -0.0653692; qual_match_simple_bayesian[37][13] = -0.0516143; qual_match_simple_bayesian[37][14] = -0.0408216; qual_match_simple_bayesian[37][15] = -0.0323309; qual_match_simple_bayesian[37][16] = -0.0256376; qual_match_simple_bayesian[37][17] = -0.0203526; qual_match_simple_bayesian[37][18] = -0.0161743; qual_match_simple_bayesian[37][19] = -0.0128679; qual_match_simple_bayesian[37][20] = -0.0102492; qual_match_simple_bayesian[37][21] = -0.00817401; qual_match_simple_bayesian[37][22] = -0.00652869; qual_match_simple_bayesian[37][23] = -0.00522368; qual_match_simple_bayesian[37][24] = -0.0041883; qual_match_simple_bayesian[37][25] = -0.00336662; qual_match_simple_bayesian[37][26] = -0.00271443; qual_match_simple_bayesian[37][27] = -0.00219667; 
qual_match_simple_bayesian[37][28] = -0.00178559; qual_match_simple_bayesian[37][29] = -0.00145918; qual_match_simple_bayesian[37][30] = -0.00119998; qual_match_simple_bayesian[37][31] = -0.000994137; qual_match_simple_bayesian[37][32] = -0.000830661; qual_match_simple_bayesian[37][33] = -0.000700826; qual_match_simple_bayesian[37][34] = -0.000597706; qual_match_simple_bayesian[37][35] = -0.000515803; qual_match_simple_bayesian[37][36] = -0.00045075; qual_match_simple_bayesian[37][37] = -0.000399079; qual_match_simple_bayesian[37][38] = -0.000358037; qual_match_simple_bayesian[37][39] = -0.000325438; qual_match_simple_bayesian[37][40] = -0.000299544; qual_match_simple_bayesian[37][41] = -0.000278977; qual_match_simple_bayesian[37][42] = -0.00026264; qual_match_simple_bayesian[37][43] = -0.000249663; qual_match_simple_bayesian[37][44] = -0.000239355; qual_match_simple_bayesian[37][45] = -0.000231167; qual_match_simple_bayesian[37][46] = -0.000224664; qual_match_simple_bayesian[38][0] = -9.84844; qual_match_simple_bayesian[38][1] = -1.58143; qual_match_simple_bayesian[38][2] = -0.996911; qual_match_simple_bayesian[38][3] = -0.69563; qual_match_simple_bayesian[38][4] = -0.507799; qual_match_simple_bayesian[38][5] = -0.380264; qual_match_simple_bayesian[38][6] = -0.289409; qual_match_simple_bayesian[38][7] = -0.222697; qual_match_simple_bayesian[38][8] = -0.172705; qual_match_simple_bayesian[38][9] = -0.134703; qual_match_simple_bayesian[38][10] = -0.105513; qual_match_simple_bayesian[38][11] = -0.0829192; qual_match_simple_bayesian[38][12] = -0.0653291; qual_match_simple_bayesian[38][13] = -0.051574; qual_match_simple_bayesian[38][14] = -0.0407812; qual_match_simple_bayesian[38][15] = -0.0322904; qual_match_simple_bayesian[38][16] = -0.0255969; qual_match_simple_bayesian[38][17] = -0.0203118; qual_match_simple_bayesian[38][18] = -0.0161335; qual_match_simple_bayesian[38][19] = -0.012827; qual_match_simple_bayesian[38][20] = -0.0102083; 
qual_match_simple_bayesian[38][21] = -0.00813308; qual_match_simple_bayesian[38][22] = -0.00648773; qual_match_simple_bayesian[38][23] = -0.00518271; qual_match_simple_bayesian[38][24] = -0.00414731; qual_match_simple_bayesian[38][25] = -0.00332562; qual_match_simple_bayesian[38][26] = -0.00267342; qual_match_simple_bayesian[38][27] = -0.00215565; qual_match_simple_bayesian[38][28] = -0.00174457; qual_match_simple_bayesian[38][29] = -0.00141815; qual_match_simple_bayesian[38][30] = -0.00115895; qual_match_simple_bayesian[38][31] = -0.000953104; qual_match_simple_bayesian[38][32] = -0.000789625; qual_match_simple_bayesian[38][33] = -0.000659788; qual_match_simple_bayesian[38][34] = -0.000556667; qual_match_simple_bayesian[38][35] = -0.000474763; qual_match_simple_bayesian[38][36] = -0.000409709; qual_match_simple_bayesian[38][37] = -0.000358037; qual_match_simple_bayesian[38][38] = -0.000316995; qual_match_simple_bayesian[38][39] = -0.000284396; qual_match_simple_bayesian[38][40] = -0.000258502; qual_match_simple_bayesian[38][41] = -0.000237934; qual_match_simple_bayesian[38][42] = -0.000221596; qual_match_simple_bayesian[38][43] = -0.000208619; qual_match_simple_bayesian[38][44] = -0.000198311; qual_match_simple_bayesian[38][45] = -0.000190123; qual_match_simple_bayesian[38][46] = -0.00018362; qual_match_simple_bayesian[39][0] = -10.0787; qual_match_simple_bayesian[39][1] = -1.58144; qual_match_simple_bayesian[39][2] = -0.996897; qual_match_simple_bayesian[39][3] = -0.695608; qual_match_simple_bayesian[39][4] = -0.507774; qual_match_simple_bayesian[39][5] = -0.380237; qual_match_simple_bayesian[39][6] = -0.28938; qual_match_simple_bayesian[39][7] = -0.222667; qual_match_simple_bayesian[39][8] = -0.172675; qual_match_simple_bayesian[39][9] = -0.134672; qual_match_simple_bayesian[39][10] = -0.105482; qual_match_simple_bayesian[39][11] = -0.0828876; qual_match_simple_bayesian[39][12] = -0.0652972; qual_match_simple_bayesian[39][13] = -0.051542; 
qual_match_simple_bayesian[39][14] = -0.040749; qual_match_simple_bayesian[39][15] = -0.0322581; qual_match_simple_bayesian[39][16] = -0.0255645; qual_match_simple_bayesian[39][17] = -0.0202794; qual_match_simple_bayesian[39][18] = -0.0161011; qual_match_simple_bayesian[39][19] = -0.0127945; qual_match_simple_bayesian[39][20] = -0.0101758; qual_match_simple_bayesian[39][21] = -0.00810056; qual_match_simple_bayesian[39][22] = -0.0064552; qual_match_simple_bayesian[39][23] = -0.00515016; qual_match_simple_bayesian[39][24] = -0.00411475; qual_match_simple_bayesian[39][25] = -0.00329306; qual_match_simple_bayesian[39][26] = -0.00264084; qual_match_simple_bayesian[39][27] = -0.00212307; qual_match_simple_bayesian[39][28] = -0.00171198; qual_match_simple_bayesian[39][29] = -0.00138557; qual_match_simple_bayesian[39][30] = -0.00112636; qual_match_simple_bayesian[39][31] = -0.000920511; qual_match_simple_bayesian[39][32] = -0.00075703; qual_match_simple_bayesian[39][33] = -0.000627192; qual_match_simple_bayesian[39][34] = -0.00052407; qual_match_simple_bayesian[39][35] = -0.000442165; qual_match_simple_bayesian[39][36] = -0.00037711; qual_match_simple_bayesian[39][37] = -0.000325438; qual_match_simple_bayesian[39][38] = -0.000284396; qual_match_simple_bayesian[39][39] = -0.000251796; qual_match_simple_bayesian[39][40] = -0.000225901; qual_match_simple_bayesian[39][41] = -0.000205333; qual_match_simple_bayesian[39][42] = -0.000188996; qual_match_simple_bayesian[39][43] = -0.000176018; qual_match_simple_bayesian[39][44] = -0.00016571; qual_match_simple_bayesian[39][45] = -0.000157522; qual_match_simple_bayesian[39][46] = -0.000151019; qual_match_simple_bayesian[40][0] = -10.309; qual_match_simple_bayesian[40][1] = -1.58145; qual_match_simple_bayesian[40][2] = -0.996886; qual_match_simple_bayesian[40][3] = -0.695591; qual_match_simple_bayesian[40][4] = -0.507754; qual_match_simple_bayesian[40][5] = -0.380215; qual_match_simple_bayesian[40][6] = -0.289357; 
qual_match_simple_bayesian[40][7] = -0.222643; qual_match_simple_bayesian[40][8] = -0.17265; qual_match_simple_bayesian[40][9] = -0.134647; qual_match_simple_bayesian[40][10] = -0.105457; qual_match_simple_bayesian[40][11] = -0.0828624; qual_match_simple_bayesian[40][12] = -0.0652719; qual_match_simple_bayesian[40][13] = -0.0515165; qual_match_simple_bayesian[40][14] = -0.0407235; qual_match_simple_bayesian[40][15] = -0.0322325; qual_match_simple_bayesian[40][16] = -0.0255389; qual_match_simple_bayesian[40][17] = -0.0202537; qual_match_simple_bayesian[40][18] = -0.0160753; qual_match_simple_bayesian[40][19] = -0.0127688; qual_match_simple_bayesian[40][20] = -0.01015; qual_match_simple_bayesian[40][21] = -0.00807474; qual_match_simple_bayesian[40][22] = -0.00642936; qual_match_simple_bayesian[40][23] = -0.00512431; qual_match_simple_bayesian[40][24] = -0.00408889; qual_match_simple_bayesian[40][25] = -0.00326719; qual_match_simple_bayesian[40][26] = -0.00261497; qual_match_simple_bayesian[40][27] = -0.00209719; qual_match_simple_bayesian[40][28] = -0.0016861; qual_match_simple_bayesian[40][29] = -0.00135968; qual_match_simple_bayesian[40][30] = -0.00110047; qual_match_simple_bayesian[40][31] = -0.000894622; qual_match_simple_bayesian[40][32] = -0.00073114; qual_match_simple_bayesian[40][33] = -0.000601301; qual_match_simple_bayesian[40][34] = -0.000498178; qual_match_simple_bayesian[40][35] = -0.000416272; qual_match_simple_bayesian[40][36] = -0.000351217; qual_match_simple_bayesian[40][37] = -0.000299544; qual_match_simple_bayesian[40][38] = -0.000258502; qual_match_simple_bayesian[40][39] = -0.000225901; qual_match_simple_bayesian[40][40] = -0.000200007; qual_match_simple_bayesian[40][41] = -0.000179438; qual_match_simple_bayesian[40][42] = -0.000163101; qual_match_simple_bayesian[40][43] = -0.000150123; qual_match_simple_bayesian[40][44] = -0.000139815; qual_match_simple_bayesian[40][45] = -0.000131627; qual_match_simple_bayesian[40][46] = -0.000125123; 
qual_match_simple_bayesian[41][0] = -10.5392; qual_match_simple_bayesian[41][1] = -1.58145; qual_match_simple_bayesian[41][2] = -0.996877; qual_match_simple_bayesian[41][3] = -0.695577; qual_match_simple_bayesian[41][4] = -0.507738; qual_match_simple_bayesian[41][5] = -0.380198; qual_match_simple_bayesian[41][6] = -0.289339; qual_match_simple_bayesian[41][7] = -0.222624; qual_match_simple_bayesian[41][8] = -0.172631; qual_match_simple_bayesian[41][9] = -0.134628; qual_match_simple_bayesian[41][10] = -0.105437; qual_match_simple_bayesian[41][11] = -0.0828425; qual_match_simple_bayesian[41][12] = -0.0652518; qual_match_simple_bayesian[41][13] = -0.0514963; qual_match_simple_bayesian[41][14] = -0.0407032; qual_match_simple_bayesian[41][15] = -0.0322121; qual_match_simple_bayesian[41][16] = -0.0255185; qual_match_simple_bayesian[41][17] = -0.0202333; qual_match_simple_bayesian[41][18] = -0.0160549; qual_match_simple_bayesian[41][19] = -0.0127483; qual_match_simple_bayesian[41][20] = -0.0101295; qual_match_simple_bayesian[41][21] = -0.00805422; qual_match_simple_bayesian[41][22] = -0.00640883; qual_match_simple_bayesian[41][23] = -0.00510378; qual_match_simple_bayesian[41][24] = -0.00406835; qual_match_simple_bayesian[41][25] = -0.00324664; qual_match_simple_bayesian[41][26] = -0.00259442; qual_match_simple_bayesian[41][27] = -0.00207664; qual_match_simple_bayesian[41][28] = -0.00166554; qual_match_simple_bayesian[41][29] = -0.00133912; qual_match_simple_bayesian[41][30] = -0.00107991; qual_match_simple_bayesian[41][31] = -0.000874059; qual_match_simple_bayesian[41][32] = -0.000710576; qual_match_simple_bayesian[41][33] = -0.000580736; qual_match_simple_bayesian[41][34] = -0.000477612; qual_match_simple_bayesian[41][35] = -0.000395705; qual_match_simple_bayesian[41][36] = -0.00033065; qual_match_simple_bayesian[41][37] = -0.000278977; qual_match_simple_bayesian[41][38] = -0.000237934; qual_match_simple_bayesian[41][39] = -0.000205333; qual_match_simple_bayesian[41][40] 
= -0.000179438; qual_match_simple_bayesian[41][41] = -0.00015887; qual_match_simple_bayesian[41][42] = -0.000142532; qual_match_simple_bayesian[41][43] = -0.000129555; qual_match_simple_bayesian[41][44] = -0.000119246; qual_match_simple_bayesian[41][45] = -0.000111058; qual_match_simple_bayesian[41][46] = -0.000104554; qual_match_simple_bayesian[42][0] = -10.7695; qual_match_simple_bayesian[42][1] = -1.58146; qual_match_simple_bayesian[42][2] = -0.99687; qual_match_simple_bayesian[42][3] = -0.695566; qual_match_simple_bayesian[42][4] = -0.507725; qual_match_simple_bayesian[42][5] = -0.380184; qual_match_simple_bayesian[42][6] = -0.289324; qual_match_simple_bayesian[42][7] = -0.222609; qual_match_simple_bayesian[42][8] = -0.172616; qual_match_simple_bayesian[42][9] = -0.134612; qual_match_simple_bayesian[42][10] = -0.105421; qual_match_simple_bayesian[42][11] = -0.0828266; qual_match_simple_bayesian[42][12] = -0.0652359; qual_match_simple_bayesian[42][13] = -0.0514803; qual_match_simple_bayesian[42][14] = -0.0406871; qual_match_simple_bayesian[42][15] = -0.032196; qual_match_simple_bayesian[42][16] = -0.0255023; qual_match_simple_bayesian[42][17] = -0.020217; qual_match_simple_bayesian[42][18] = -0.0160386; qual_match_simple_bayesian[42][19] = -0.012732; qual_match_simple_bayesian[42][20] = -0.0101132; qual_match_simple_bayesian[42][21] = -0.00803793; qual_match_simple_bayesian[42][22] = -0.00639253; qual_match_simple_bayesian[42][23] = -0.00508747; qual_match_simple_bayesian[42][24] = -0.00405203; qual_match_simple_bayesian[42][25] = -0.00323032; qual_match_simple_bayesian[42][26] = -0.00257809; qual_match_simple_bayesian[42][27] = -0.00206031; qual_match_simple_bayesian[42][28] = -0.00164921; qual_match_simple_bayesian[42][29] = -0.00132279; qual_match_simple_bayesian[42][30] = -0.00106358; qual_match_simple_bayesian[42][31] = -0.000857725; qual_match_simple_bayesian[42][32] = -0.000694241; qual_match_simple_bayesian[42][33] = -0.0005644; 
qual_match_simple_bayesian[42][34] = -0.000461276; qual_match_simple_bayesian[42][35] = -0.000379369; qual_match_simple_bayesian[42][36] = -0.000314313; qual_match_simple_bayesian[42][37] = -0.00026264; qual_match_simple_bayesian[42][38] = -0.000221596; qual_match_simple_bayesian[42][39] = -0.000188996; qual_match_simple_bayesian[42][40] = -0.000163101; qual_match_simple_bayesian[42][41] = -0.000142532; qual_match_simple_bayesian[42][42] = -0.000126194; qual_match_simple_bayesian[42][43] = -0.000113217; qual_match_simple_bayesian[42][44] = -0.000102908; qual_match_simple_bayesian[42][45] = -9.47203e-05; qual_match_simple_bayesian[42][46] = -8.82164e-05; qual_match_simple_bayesian[43][0] = -10.9997; qual_match_simple_bayesian[43][1] = -1.58146; qual_match_simple_bayesian[43][2] = -0.996865; qual_match_simple_bayesian[43][3] = -0.695558; qual_match_simple_bayesian[43][4] = -0.507715; qual_match_simple_bayesian[43][5] = -0.380173; qual_match_simple_bayesian[43][6] = -0.289313; qual_match_simple_bayesian[43][7] = -0.222597; qual_match_simple_bayesian[43][8] = -0.172604; qual_match_simple_bayesian[43][9] = -0.1346; qual_match_simple_bayesian[43][10] = -0.105409; qual_match_simple_bayesian[43][11] = -0.082814; qual_match_simple_bayesian[43][12] = -0.0652232; qual_match_simple_bayesian[43][13] = -0.0514675; qual_match_simple_bayesian[43][14] = -0.0406743; qual_match_simple_bayesian[43][15] = -0.0321831; qual_match_simple_bayesian[43][16] = -0.0254894; qual_match_simple_bayesian[43][17] = -0.0202041; qual_match_simple_bayesian[43][18] = -0.0160257; qual_match_simple_bayesian[43][19] = -0.0127191; qual_match_simple_bayesian[43][20] = -0.0101003; qual_match_simple_bayesian[43][21] = -0.00802498; qual_match_simple_bayesian[43][22] = -0.00637958; qual_match_simple_bayesian[43][23] = -0.00507451; qual_match_simple_bayesian[43][24] = -0.00403907; qual_match_simple_bayesian[43][25] = -0.00321736; qual_match_simple_bayesian[43][26] = -0.00256512; qual_match_simple_bayesian[43][27] 
= -0.00204734; qual_match_simple_bayesian[43][28] = -0.00163624; qual_match_simple_bayesian[43][29] = -0.00130982; qual_match_simple_bayesian[43][30] = -0.0010506; qual_match_simple_bayesian[43][31] = -0.000844751; qual_match_simple_bayesian[43][32] = -0.000681266; qual_match_simple_bayesian[43][33] = -0.000551424; qual_match_simple_bayesian[43][34] = -0.0004483; qual_match_simple_bayesian[43][35] = -0.000366392; qual_match_simple_bayesian[43][36] = -0.000301336; qual_match_simple_bayesian[43][37] = -0.000249663; qual_match_simple_bayesian[43][38] = -0.000208619; qual_match_simple_bayesian[43][39] = -0.000176018; qual_match_simple_bayesian[43][40] = -0.000150123; qual_match_simple_bayesian[43][41] = -0.000129555; qual_match_simple_bayesian[43][42] = -0.000113217; qual_match_simple_bayesian[43][43] = -0.000100239; qual_match_simple_bayesian[43][44] = -8.99308e-05; qual_match_simple_bayesian[43][45] = -8.17427e-05; qual_match_simple_bayesian[43][46] = -7.52387e-05; qual_match_simple_bayesian[44][0] = -11.23; qual_match_simple_bayesian[44][1] = -1.58146; qual_match_simple_bayesian[44][2] = -0.99686; qual_match_simple_bayesian[44][3] = -0.695551; qual_match_simple_bayesian[44][4] = -0.507707; qual_match_simple_bayesian[44][5] = -0.380164; qual_match_simple_bayesian[44][6] = -0.289304; qual_match_simple_bayesian[44][7] = -0.222588; qual_match_simple_bayesian[44][8] = -0.172594; qual_match_simple_bayesian[44][9] = -0.13459; qual_match_simple_bayesian[44][10] = -0.105399; qual_match_simple_bayesian[44][11] = -0.082804; qual_match_simple_bayesian[44][12] = -0.0652131; qual_match_simple_bayesian[44][13] = -0.0514574; qual_match_simple_bayesian[44][14] = -0.0406641; qual_match_simple_bayesian[44][15] = -0.032173; qual_match_simple_bayesian[44][16] = -0.0254792; qual_match_simple_bayesian[44][17] = -0.0201939; qual_match_simple_bayesian[44][18] = -0.0160155; qual_match_simple_bayesian[44][19] = -0.0127088; qual_match_simple_bayesian[44][20] = -0.01009; 
qual_match_simple_bayesian[44][21] = -0.0080147; qual_match_simple_bayesian[44][22] = -0.00636929; qual_match_simple_bayesian[44][23] = -0.00506422; qual_match_simple_bayesian[44][24] = -0.00402878; qual_match_simple_bayesian[44][25] = -0.00320706; qual_match_simple_bayesian[44][26] = -0.00255482; qual_match_simple_bayesian[44][27] = -0.00203704; qual_match_simple_bayesian[44][28] = -0.00162594; qual_match_simple_bayesian[44][29] = -0.00129951; qual_match_simple_bayesian[44][30] = -0.0010403; qual_match_simple_bayesian[44][31] = -0.000834445; qual_match_simple_bayesian[44][32] = -0.00067096; qual_match_simple_bayesian[44][33] = -0.000541118; qual_match_simple_bayesian[44][34] = -0.000437993; qual_match_simple_bayesian[44][35] = -0.000356085; qual_match_simple_bayesian[44][36] = -0.000291028; qual_match_simple_bayesian[44][37] = -0.000239355; qual_match_simple_bayesian[44][38] = -0.000198311; qual_match_simple_bayesian[44][39] = -0.00016571; qual_match_simple_bayesian[44][40] = -0.000139815; qual_match_simple_bayesian[44][41] = -0.000119246; qual_match_simple_bayesian[44][42] = -0.000102908; qual_match_simple_bayesian[44][43] = -8.99308e-05; qual_match_simple_bayesian[44][44] = -7.96225e-05; qual_match_simple_bayesian[44][45] = -7.14344e-05; qual_match_simple_bayesian[44][46] = -6.49304e-05; qual_match_simple_bayesian[45][0] = -11.4602; qual_match_simple_bayesian[45][1] = -1.58146; qual_match_simple_bayesian[45][2] = -0.996857; qual_match_simple_bayesian[45][3] = -0.695546; qual_match_simple_bayesian[45][4] = -0.507701; qual_match_simple_bayesian[45][5] = -0.380157; qual_match_simple_bayesian[45][6] = -0.289296; qual_match_simple_bayesian[45][7] = -0.222581; qual_match_simple_bayesian[45][8] = -0.172586; qual_match_simple_bayesian[45][9] = -0.134582; qual_match_simple_bayesian[45][10] = -0.105391; qual_match_simple_bayesian[45][11] = -0.082796; qual_match_simple_bayesian[45][12] = -0.0652051; qual_match_simple_bayesian[45][13] = -0.0514493; 
qual_match_simple_bayesian[45][14] = -0.040656; qual_match_simple_bayesian[45][15] = -0.0321649; qual_match_simple_bayesian[45][16] = -0.0254711; qual_match_simple_bayesian[45][17] = -0.0201858; qual_match_simple_bayesian[45][18] = -0.0160073; qual_match_simple_bayesian[45][19] = -0.0127007; qual_match_simple_bayesian[45][20] = -0.0100819; qual_match_simple_bayesian[45][21] = -0.00800654; qual_match_simple_bayesian[45][22] = -0.00636112; qual_match_simple_bayesian[45][23] = -0.00505604; qual_match_simple_bayesian[45][24] = -0.0040206; qual_match_simple_bayesian[45][25] = -0.00319888; qual_match_simple_bayesian[45][26] = -0.00254664; qual_match_simple_bayesian[45][27] = -0.00202886; qual_match_simple_bayesian[45][28] = -0.00161776; qual_match_simple_bayesian[45][29] = -0.00129133; qual_match_simple_bayesian[45][30] = -0.00103211; qual_match_simple_bayesian[45][31] = -0.000826259; qual_match_simple_bayesian[45][32] = -0.000662773; qual_match_simple_bayesian[45][33] = -0.000532931; qual_match_simple_bayesian[45][34] = -0.000429806; qual_match_simple_bayesian[45][35] = -0.000347898; qual_match_simple_bayesian[45][36] = -0.000282841; qual_match_simple_bayesian[45][37] = -0.000231167; qual_match_simple_bayesian[45][38] = -0.000190123; qual_match_simple_bayesian[45][39] = -0.000157522; qual_match_simple_bayesian[45][40] = -0.000131627; qual_match_simple_bayesian[45][41] = -0.000111058; qual_match_simple_bayesian[45][42] = -9.47203e-05; qual_match_simple_bayesian[45][43] = -8.17427e-05; qual_match_simple_bayesian[45][44] = -7.14344e-05; qual_match_simple_bayesian[45][45] = -6.32462e-05; qual_match_simple_bayesian[45][46] = -5.67422e-05; qual_match_simple_bayesian[46][0] = -11.6905; qual_match_simple_bayesian[46][1] = -1.58147; qual_match_simple_bayesian[46][2] = -0.996854; qual_match_simple_bayesian[46][3] = -0.695541; qual_match_simple_bayesian[46][4] = -0.507695; qual_match_simple_bayesian[46][5] = -0.380152; qual_match_simple_bayesian[46][6] = -0.28929; 
qual_match_simple_bayesian[46][7] = -0.222575; qual_match_simple_bayesian[46][8] = -0.17258; qual_match_simple_bayesian[46][9] = -0.134576; qual_match_simple_bayesian[46][10] = -0.105385; qual_match_simple_bayesian[46][11] = -0.0827897; qual_match_simple_bayesian[46][12] = -0.0651987; qual_match_simple_bayesian[46][13] = -0.051443; qual_match_simple_bayesian[46][14] = -0.0406496; qual_match_simple_bayesian[46][15] = -0.0321584; qual_match_simple_bayesian[46][16] = -0.0254646; qual_match_simple_bayesian[46][17] = -0.0201793; qual_match_simple_bayesian[46][18] = -0.0160009; qual_match_simple_bayesian[46][19] = -0.0126942; qual_match_simple_bayesian[46][20] = -0.0100754; qual_match_simple_bayesian[46][21] = -0.00800005; qual_match_simple_bayesian[46][22] = -0.00635463; qual_match_simple_bayesian[46][23] = -0.00504955; qual_match_simple_bayesian[46][24] = -0.0040141; qual_match_simple_bayesian[46][25] = -0.00319238; qual_match_simple_bayesian[46][26] = -0.00254014; qual_match_simple_bayesian[46][27] = -0.00202236; qual_match_simple_bayesian[46][28] = -0.00161126; qual_match_simple_bayesian[46][29] = -0.00128483; qual_match_simple_bayesian[46][30] = -0.00102561; qual_match_simple_bayesian[46][31] = -0.000819756; qual_match_simple_bayesian[46][32] = -0.00065627; qual_match_simple_bayesian[46][33] = -0.000526428; qual_match_simple_bayesian[46][34] = -0.000423302; qual_match_simple_bayesian[46][35] = -0.000341394; qual_match_simple_bayesian[46][36] = -0.000276337; qual_match_simple_bayesian[46][37] = -0.000224664; qual_match_simple_bayesian[46][38] = -0.00018362; qual_match_simple_bayesian[46][39] = -0.000151019; qual_match_simple_bayesian[46][40] = -0.000125123; qual_match_simple_bayesian[46][41] = -0.000104554; qual_match_simple_bayesian[46][42] = -8.82164e-05; qual_match_simple_bayesian[46][43] = -7.52387e-05; qual_match_simple_bayesian[46][44] = -6.49304e-05; qual_match_simple_bayesian[46][45] = -5.67422e-05; qual_match_simple_bayesian[46][46] = -5.02381e-05; vector< 
vector<double> > qual_mismatch_simple_bayesian; qual_mismatch_simple_bayesian.resize(47); for (int i = 0; i < qual_mismatch_simple_bayesian.size(); i++) { qual_mismatch_simple_bayesian[i].resize(47); } qual_mismatch_simple_bayesian[0][0] = -1.50408; qual_mismatch_simple_bayesian[0][1] = -1.40619; qual_mismatch_simple_bayesian[0][2] = -1.33474; qual_mismatch_simple_bayesian[0][3] = -1.28141; qual_mismatch_simple_bayesian[0][4] = -1.24099; qual_mismatch_simple_bayesian[0][5] = -1.21; qual_mismatch_simple_bayesian[0][6] = -1.18606; qual_mismatch_simple_bayesian[0][7] = -1.16744; qual_mismatch_simple_bayesian[0][8] = -1.15289; qual_mismatch_simple_bayesian[0][9] = -1.14148; qual_mismatch_simple_bayesian[0][10] = -1.13251; qual_mismatch_simple_bayesian[0][11] = -1.12545; qual_mismatch_simple_bayesian[0][12] = -1.11987; qual_mismatch_simple_bayesian[0][13] = -1.11546; qual_mismatch_simple_bayesian[0][14] = -1.11197; qual_mismatch_simple_bayesian[0][15] = -1.10921; qual_mismatch_simple_bayesian[0][16] = -1.10702; qual_mismatch_simple_bayesian[0][17] = -1.10529; qual_mismatch_simple_bayesian[0][18] = -1.10391; qual_mismatch_simple_bayesian[0][19] = -1.10282; qual_mismatch_simple_bayesian[0][20] = -1.10195; qual_mismatch_simple_bayesian[0][21] = -1.10126; qual_mismatch_simple_bayesian[0][22] = -1.10072; qual_mismatch_simple_bayesian[0][23] = -1.10028; qual_mismatch_simple_bayesian[0][24] = -1.09994; qual_mismatch_simple_bayesian[0][25] = -1.09967; qual_mismatch_simple_bayesian[0][26] = -1.09945; qual_mismatch_simple_bayesian[0][27] = -1.09928; qual_mismatch_simple_bayesian[0][28] = -1.09914; qual_mismatch_simple_bayesian[0][29] = -1.09903; qual_mismatch_simple_bayesian[0][30] = -1.09895; qual_mismatch_simple_bayesian[0][31] = -1.09888; qual_mismatch_simple_bayesian[0][32] = -1.09882; qual_mismatch_simple_bayesian[0][33] = -1.09878; qual_mismatch_simple_bayesian[0][34] = -1.09874; qual_mismatch_simple_bayesian[0][35] = -1.09872; qual_mismatch_simple_bayesian[0][36] = -1.0987; 
qual_mismatch_simple_bayesian[0][37] = -1.09868; qual_mismatch_simple_bayesian[0][38] = -1.09867; qual_mismatch_simple_bayesian[0][39] = -1.09865; qual_mismatch_simple_bayesian[0][40] = -1.09865; qual_mismatch_simple_bayesian[0][41] = -1.09864; qual_mismatch_simple_bayesian[0][42] = -1.09863; qual_mismatch_simple_bayesian[0][43] = -1.09863; qual_mismatch_simple_bayesian[0][44] = -1.09863; qual_mismatch_simple_bayesian[0][45] = -1.09862; qual_mismatch_simple_bayesian[0][46] = -1.09862; qual_mismatch_simple_bayesian[1][0] = -1.40619; qual_mismatch_simple_bayesian[1][1] = -1.38979; qual_mismatch_simple_bayesian[1][2] = -1.37696; qual_mismatch_simple_bayesian[1][3] = -1.36688; qual_mismatch_simple_bayesian[1][4] = -1.35894; qual_mismatch_simple_bayesian[1][5] = -1.35268; qual_mismatch_simple_bayesian[1][6] = -1.34774; qual_mismatch_simple_bayesian[1][7] = -1.34383; qual_mismatch_simple_bayesian[1][8] = -1.34073; qual_mismatch_simple_bayesian[1][9] = -1.33828; qual_mismatch_simple_bayesian[1][10] = -1.33634; qual_mismatch_simple_bayesian[1][11] = -1.3348; qual_mismatch_simple_bayesian[1][12] = -1.33358; qual_mismatch_simple_bayesian[1][13] = -1.33261; qual_mismatch_simple_bayesian[1][14] = -1.33184; qual_mismatch_simple_bayesian[1][15] = -1.33123; qual_mismatch_simple_bayesian[1][16] = -1.33074; qual_mismatch_simple_bayesian[1][17] = -1.33036; qual_mismatch_simple_bayesian[1][18] = -1.33005; qual_mismatch_simple_bayesian[1][19] = -1.32981; qual_mismatch_simple_bayesian[1][20] = -1.32962; qual_mismatch_simple_bayesian[1][21] = -1.32946; qual_mismatch_simple_bayesian[1][22] = -1.32934; qual_mismatch_simple_bayesian[1][23] = -1.32924; qual_mismatch_simple_bayesian[1][24] = -1.32917; qual_mismatch_simple_bayesian[1][25] = -1.32911; qual_mismatch_simple_bayesian[1][26] = -1.32906; qual_mismatch_simple_bayesian[1][27] = -1.32902; qual_mismatch_simple_bayesian[1][28] = -1.32899; qual_mismatch_simple_bayesian[1][29] = -1.32896; qual_mismatch_simple_bayesian[1][30] = -1.32895; 
qual_mismatch_simple_bayesian[1][31] = -1.32893; qual_mismatch_simple_bayesian[1][32] = -1.32892; qual_mismatch_simple_bayesian[1][33] = -1.32891; qual_mismatch_simple_bayesian[1][34] = -1.3289; qual_mismatch_simple_bayesian[1][35] = -1.32889; qual_mismatch_simple_bayesian[1][36] = -1.32889; qual_mismatch_simple_bayesian[1][37] = -1.32889; qual_mismatch_simple_bayesian[1][38] = -1.32888; qual_mismatch_simple_bayesian[1][39] = -1.32888; qual_mismatch_simple_bayesian[1][40] = -1.32888; qual_mismatch_simple_bayesian[1][41] = -1.32888; qual_mismatch_simple_bayesian[1][42] = -1.32888; qual_mismatch_simple_bayesian[1][43] = -1.32887; qual_mismatch_simple_bayesian[1][44] = -1.32887; qual_mismatch_simple_bayesian[1][45] = -1.32887; qual_mismatch_simple_bayesian[1][46] = -1.32887; qual_mismatch_simple_bayesian[2][0] = -1.33474; qual_mismatch_simple_bayesian[2][1] = -1.37696; qual_mismatch_simple_bayesian[2][2] = -1.41181; qual_mismatch_simple_bayesian[2][3] = -1.44039; qual_mismatch_simple_bayesian[2][4] = -1.46368; qual_mismatch_simple_bayesian[2][5] = -1.48258; qual_mismatch_simple_bayesian[2][6] = -1.49786; qual_mismatch_simple_bayesian[2][7] = -1.51016; qual_mismatch_simple_bayesian[2][8] = -1.52003; qual_mismatch_simple_bayesian[2][9] = -1.52795; qual_mismatch_simple_bayesian[2][10] = -1.53428; qual_mismatch_simple_bayesian[2][11] = -1.53934; qual_mismatch_simple_bayesian[2][12] = -1.54338; qual_mismatch_simple_bayesian[2][13] = -1.5466; qual_mismatch_simple_bayesian[2][14] = -1.54916; qual_mismatch_simple_bayesian[2][15] = -1.55121; qual_mismatch_simple_bayesian[2][16] = -1.55283; qual_mismatch_simple_bayesian[2][17] = -1.55412; qual_mismatch_simple_bayesian[2][18] = -1.55515; qual_mismatch_simple_bayesian[2][19] = -1.55597; qual_mismatch_simple_bayesian[2][20] = -1.55662; qual_mismatch_simple_bayesian[2][21] = -1.55713; qual_mismatch_simple_bayesian[2][22] = -1.55754; qual_mismatch_simple_bayesian[2][23] = -1.55787; qual_mismatch_simple_bayesian[2][24] = -1.55813; 
qual_mismatch_simple_bayesian[2][25] = -1.55833; qual_mismatch_simple_bayesian[2][26] = -1.5585; qual_mismatch_simple_bayesian[2][27] = -1.55863; qual_mismatch_simple_bayesian[2][28] = -1.55873; qual_mismatch_simple_bayesian[2][29] = -1.55881; qual_mismatch_simple_bayesian[2][30] = -1.55888; qual_mismatch_simple_bayesian[2][31] = -1.55893; qual_mismatch_simple_bayesian[2][32] = -1.55897; qual_mismatch_simple_bayesian[2][33] = -1.559; qual_mismatch_simple_bayesian[2][34] = -1.55903; qual_mismatch_simple_bayesian[2][35] = -1.55905; qual_mismatch_simple_bayesian[2][36] = -1.55907; qual_mismatch_simple_bayesian[2][37] = -1.55908; qual_mismatch_simple_bayesian[2][38] = -1.55909; qual_mismatch_simple_bayesian[2][39] = -1.5591; qual_mismatch_simple_bayesian[2][40] = -1.5591; qual_mismatch_simple_bayesian[2][41] = -1.55911; qual_mismatch_simple_bayesian[2][42] = -1.55911; qual_mismatch_simple_bayesian[2][43] = -1.55912; qual_mismatch_simple_bayesian[2][44] = -1.55912; qual_mismatch_simple_bayesian[2][45] = -1.55912; qual_mismatch_simple_bayesian[2][46] = -1.55912; qual_mismatch_simple_bayesian[3][0] = -1.28141; qual_mismatch_simple_bayesian[3][1] = -1.36688; qual_mismatch_simple_bayesian[3][2] = -1.44039; qual_mismatch_simple_bayesian[3][3] = -1.50289; qual_mismatch_simple_bayesian[3][4] = -1.55549; qual_mismatch_simple_bayesian[3][5] = -1.59933; qual_mismatch_simple_bayesian[3][6] = -1.63558; qual_mismatch_simple_bayesian[3][7] = -1.66534; qual_mismatch_simple_bayesian[3][8] = -1.68963; qual_mismatch_simple_bayesian[3][9] = -1.70935; qual_mismatch_simple_bayesian[3][10] = -1.72529; qual_mismatch_simple_bayesian[3][11] = -1.73814; qual_mismatch_simple_bayesian[3][12] = -1.74847; qual_mismatch_simple_bayesian[3][13] = -1.75675; qual_mismatch_simple_bayesian[3][14] = -1.76338; qual_mismatch_simple_bayesian[3][15] = -1.76867; qual_mismatch_simple_bayesian[3][16] = -1.7729; qual_mismatch_simple_bayesian[3][17] = -1.77627; qual_mismatch_simple_bayesian[3][18] = -1.77895; 
qual_mismatch_simple_bayesian[3][19] = -1.78109; qual_mismatch_simple_bayesian[3][20] = -1.78279; qual_mismatch_simple_bayesian[3][21] = -1.78414; qual_mismatch_simple_bayesian[3][22] = -1.78522; qual_mismatch_simple_bayesian[3][23] = -1.78608; qual_mismatch_simple_bayesian[3][24] = -1.78676; qual_mismatch_simple_bayesian[3][25] = -1.7873; qual_mismatch_simple_bayesian[3][26] = -1.78773; qual_mismatch_simple_bayesian[3][27] = -1.78807; qual_mismatch_simple_bayesian[3][28] = -1.78834; qual_mismatch_simple_bayesian[3][29] = -1.78855; qual_mismatch_simple_bayesian[3][30] = -1.78873; qual_mismatch_simple_bayesian[3][31] = -1.78886; qual_mismatch_simple_bayesian[3][32] = -1.78897; qual_mismatch_simple_bayesian[3][33] = -1.78906; qual_mismatch_simple_bayesian[3][34] = -1.78912; qual_mismatch_simple_bayesian[3][35] = -1.78918; qual_mismatch_simple_bayesian[3][36] = -1.78922; qual_mismatch_simple_bayesian[3][37] = -1.78926; qual_mismatch_simple_bayesian[3][38] = -1.78928; qual_mismatch_simple_bayesian[3][39] = -1.7893; qual_mismatch_simple_bayesian[3][40] = -1.78932; qual_mismatch_simple_bayesian[3][41] = -1.78934; qual_mismatch_simple_bayesian[3][42] = -1.78935; qual_mismatch_simple_bayesian[3][43] = -1.78935; qual_mismatch_simple_bayesian[3][44] = -1.78936; qual_mismatch_simple_bayesian[3][45] = -1.78937; qual_mismatch_simple_bayesian[3][46] = -1.78937; qual_mismatch_simple_bayesian[4][0] = -1.24099; qual_mismatch_simple_bayesian[4][1] = -1.35894; qual_mismatch_simple_bayesian[4][2] = -1.46368; qual_mismatch_simple_bayesian[4][3] = -1.55549; qual_mismatch_simple_bayesian[4][4] = -1.63493; qual_mismatch_simple_bayesian[4][5] = -1.70287; qual_mismatch_simple_bayesian[4][6] = -1.76033; qual_mismatch_simple_bayesian[4][7] = -1.80845; qual_mismatch_simple_bayesian[4][8] = -1.8484; qual_mismatch_simple_bayesian[4][9] = -1.8813; qual_mismatch_simple_bayesian[4][10] = -1.90823; qual_mismatch_simple_bayesian[4][11] = -1.93016; qual_mismatch_simple_bayesian[4][12] = -1.94792; 
qual_mismatch_simple_bayesian[4][13] = -1.96226; qual_mismatch_simple_bayesian[4][14] = -1.97379; qual_mismatch_simple_bayesian[4][15] = -1.98305; qual_mismatch_simple_bayesian[4][16] = -1.99047; qual_mismatch_simple_bayesian[4][17] = -1.9964; qual_mismatch_simple_bayesian[4][18] = -2.00114; qual_mismatch_simple_bayesian[4][19] = -2.00492; qual_mismatch_simple_bayesian[4][20] = -2.00793; qual_mismatch_simple_bayesian[4][21] = -2.01033; qual_mismatch_simple_bayesian[4][22] = -2.01224; qual_mismatch_simple_bayesian[4][23] = -2.01376; qual_mismatch_simple_bayesian[4][24] = -2.01497; qual_mismatch_simple_bayesian[4][25] = -2.01593; qual_mismatch_simple_bayesian[4][26] = -2.01669; qual_mismatch_simple_bayesian[4][27] = -2.0173; qual_mismatch_simple_bayesian[4][28] = -2.01778; qual_mismatch_simple_bayesian[4][29] = -2.01816; qual_mismatch_simple_bayesian[4][30] = -2.01847; qual_mismatch_simple_bayesian[4][31] = -2.01871; qual_mismatch_simple_bayesian[4][32] = -2.0189; qual_mismatch_simple_bayesian[4][33] = -2.01906; qual_mismatch_simple_bayesian[4][34] = -2.01918; qual_mismatch_simple_bayesian[4][35] = -2.01927; qual_mismatch_simple_bayesian[4][36] = -2.01935; qual_mismatch_simple_bayesian[4][37] = -2.01941; qual_mismatch_simple_bayesian[4][38] = -2.01946; qual_mismatch_simple_bayesian[4][39] = -2.0195; qual_mismatch_simple_bayesian[4][40] = -2.01953; qual_mismatch_simple_bayesian[4][41] = -2.01955; qual_mismatch_simple_bayesian[4][42] = -2.01957; qual_mismatch_simple_bayesian[4][43] = -2.01959; qual_mismatch_simple_bayesian[4][44] = -2.0196; qual_mismatch_simple_bayesian[4][45] = -2.01961; qual_mismatch_simple_bayesian[4][46] = -2.01962; qual_mismatch_simple_bayesian[5][0] = -1.21; qual_mismatch_simple_bayesian[5][1] = -1.35268; qual_mismatch_simple_bayesian[5][2] = -1.48258; qual_mismatch_simple_bayesian[5][3] = -1.59933; qual_mismatch_simple_bayesian[5][4] = -1.70287; qual_mismatch_simple_bayesian[5][5] = -1.79352; qual_mismatch_simple_bayesian[5][6] = -1.87187; 
qual_mismatch_simple_bayesian[5][7] = -1.93881; qual_mismatch_simple_bayesian[5][8] = -1.99536; qual_mismatch_simple_bayesian[5][9] = -2.04269; qual_mismatch_simple_bayesian[5][10] = -2.08194; qual_mismatch_simple_bayesian[5][11] = -2.11426; qual_mismatch_simple_bayesian[5][12] = -2.14069; qual_mismatch_simple_bayesian[5][13] = -2.1622; qual_mismatch_simple_bayesian[5][14] = -2.17962; qual_mismatch_simple_bayesian[5][15] = -2.19368; qual_mismatch_simple_bayesian[5][16] = -2.20499; qual_mismatch_simple_bayesian[5][17] = -2.21406; qual_mismatch_simple_bayesian[5][18] = -2.22133; qual_mismatch_simple_bayesian[5][19] = -2.22714; qual_mismatch_simple_bayesian[5][20] = -2.23178; qual_mismatch_simple_bayesian[5][21] = -2.23548; qual_mismatch_simple_bayesian[5][22] = -2.23843; qual_mismatch_simple_bayesian[5][23] = -2.24078; qual_mismatch_simple_bayesian[5][24] = -2.24265; qual_mismatch_simple_bayesian[5][25] = -2.24414; qual_mismatch_simple_bayesian[5][26] = -2.24532; qual_mismatch_simple_bayesian[5][27] = -2.24626; qual_mismatch_simple_bayesian[5][28] = -2.24701; qual_mismatch_simple_bayesian[5][29] = -2.2476; qual_mismatch_simple_bayesian[5][30] = -2.24808; qual_mismatch_simple_bayesian[5][31] = -2.24845; qual_mismatch_simple_bayesian[5][32] = -2.24875; qual_mismatch_simple_bayesian[5][33] = -2.24899; qual_mismatch_simple_bayesian[5][34] = -2.24918; qual_mismatch_simple_bayesian[5][35] = -2.24933; qual_mismatch_simple_bayesian[5][36] = -2.24945; qual_mismatch_simple_bayesian[5][37] = -2.24954; qual_mismatch_simple_bayesian[5][38] = -2.24962; qual_mismatch_simple_bayesian[5][39] = -2.24967; qual_mismatch_simple_bayesian[5][40] = -2.24972; qual_mismatch_simple_bayesian[5][41] = -2.24976; qual_mismatch_simple_bayesian[5][42] = -2.24979; qual_mismatch_simple_bayesian[5][43] = -2.24981; qual_mismatch_simple_bayesian[5][44] = -2.24983; qual_mismatch_simple_bayesian[5][45] = -2.24985; qual_mismatch_simple_bayesian[5][46] = -2.24986; qual_mismatch_simple_bayesian[6][0] = 
-1.18606; qual_mismatch_simple_bayesian[6][1] = -1.34774; qual_mismatch_simple_bayesian[6][2] = -1.49786; qual_mismatch_simple_bayesian[6][3] = -1.63558; qual_mismatch_simple_bayesian[6][4] = -1.76033; qual_mismatch_simple_bayesian[6][5] = -1.87187; qual_mismatch_simple_bayesian[6][6] = -1.97029; qual_mismatch_simple_bayesian[6][7] = -2.05601; qual_mismatch_simple_bayesian[6][8] = -2.12976; qual_mismatch_simple_bayesian[6][9] = -2.19248; qual_mismatch_simple_bayesian[6][10] = -2.24527; qual_mismatch_simple_bayesian[6][11] = -2.28928; qual_mismatch_simple_bayesian[6][12] = -2.32567; qual_mismatch_simple_bayesian[6][13] = -2.35556; qual_mismatch_simple_bayesian[6][14] = -2.37995; qual_mismatch_simple_bayesian[6][15] = -2.39976; qual_mismatch_simple_bayesian[6][16] = -2.41577; qual_mismatch_simple_bayesian[6][17] = -2.42868; qual_mismatch_simple_bayesian[6][18] = -2.43906; qual_mismatch_simple_bayesian[6][19] = -2.44737; qual_mismatch_simple_bayesian[6][20] = -2.45403; qual_mismatch_simple_bayesian[6][21] = -2.45935; qual_mismatch_simple_bayesian[6][22] = -2.4636; qual_mismatch_simple_bayesian[6][23] = -2.46698; qual_mismatch_simple_bayesian[6][24] = -2.46968; qual_mismatch_simple_bayesian[6][25] = -2.47183; qual_mismatch_simple_bayesian[6][26] = -2.47353; qual_mismatch_simple_bayesian[6][27] = -2.47489; qual_mismatch_simple_bayesian[6][28] = -2.47598; qual_mismatch_simple_bayesian[6][29] = -2.47684; qual_mismatch_simple_bayesian[6][30] = -2.47752; qual_mismatch_simple_bayesian[6][31] = -2.47806; qual_mismatch_simple_bayesian[6][32] = -2.47849; qual_mismatch_simple_bayesian[6][33] = -2.47884; qual_mismatch_simple_bayesian[6][34] = -2.47911; qual_mismatch_simple_bayesian[6][35] = -2.47933; qual_mismatch_simple_bayesian[6][36] = -2.4795; qual_mismatch_simple_bayesian[6][37] = -2.47964; qual_mismatch_simple_bayesian[6][38] = -2.47974; qual_mismatch_simple_bayesian[6][39] = -2.47983; qual_mismatch_simple_bayesian[6][40] = -2.4799; qual_mismatch_simple_bayesian[6][41] = 
-2.47995; qual_mismatch_simple_bayesian[6][42] = -2.48; qual_mismatch_simple_bayesian[6][43] = -2.48003; qual_mismatch_simple_bayesian[6][44] = -2.48006; qual_mismatch_simple_bayesian[6][45] = -2.48008; qual_mismatch_simple_bayesian[6][46] = -2.4801; qual_mismatch_simple_bayesian[7][0] = -1.16744; qual_mismatch_simple_bayesian[7][1] = -1.34383; qual_mismatch_simple_bayesian[7][2] = -1.51016; qual_mismatch_simple_bayesian[7][3] = -1.66534; qual_mismatch_simple_bayesian[7][4] = -1.80845; qual_mismatch_simple_bayesian[7][5] = -1.93881; qual_mismatch_simple_bayesian[7][6] = -2.05601; qual_mismatch_simple_bayesian[7][7] = -2.16001; qual_mismatch_simple_bayesian[7][8] = -2.25109; qual_mismatch_simple_bayesian[7][9] = -2.32986; qual_mismatch_simple_bayesian[7][10] = -2.39718; qual_mismatch_simple_bayesian[7][11] = -2.45408; qual_mismatch_simple_bayesian[7][12] = -2.5017; qual_mismatch_simple_bayesian[7][13] = -2.54122; qual_mismatch_simple_bayesian[7][14] = -2.57376; qual_mismatch_simple_bayesian[7][15] = -2.60038; qual_mismatch_simple_bayesian[7][16] = -2.62204; qual_mismatch_simple_bayesian[7][17] = -2.63959; qual_mismatch_simple_bayesian[7][18] = -2.65376; qual_mismatch_simple_bayesian[7][19] = -2.66515; qual_mismatch_simple_bayesian[7][20] = -2.6743; qual_mismatch_simple_bayesian[7][21] = -2.68162; qual_mismatch_simple_bayesian[7][22] = -2.68748; qual_mismatch_simple_bayesian[7][23] = -2.69215; qual_mismatch_simple_bayesian[7][24] = -2.69588; qual_mismatch_simple_bayesian[7][25] = -2.69886; qual_mismatch_simple_bayesian[7][26] = -2.70122; qual_mismatch_simple_bayesian[7][27] = -2.70311; qual_mismatch_simple_bayesian[7][28] = -2.70461; qual_mismatch_simple_bayesian[7][29] = -2.7058; qual_mismatch_simple_bayesian[7][30] = -2.70675; qual_mismatch_simple_bayesian[7][31] = -2.7075; qual_mismatch_simple_bayesian[7][32] = -2.7081; qual_mismatch_simple_bayesian[7][33] = -2.70858; qual_mismatch_simple_bayesian[7][34] = -2.70896; qual_mismatch_simple_bayesian[7][35] = -2.70926; 
qual_mismatch_simple_bayesian[7][36] = -2.7095; qual_mismatch_simple_bayesian[7][37] = -2.70969; qual_mismatch_simple_bayesian[7][38] = -2.70984; qual_mismatch_simple_bayesian[7][39] = -2.70996; qual_mismatch_simple_bayesian[7][40] = -2.71005; qual_mismatch_simple_bayesian[7][41] = -2.71013; qual_mismatch_simple_bayesian[7][42] = -2.71019; qual_mismatch_simple_bayesian[7][43] = -2.71024; qual_mismatch_simple_bayesian[7][44] = -2.71028; qual_mismatch_simple_bayesian[7][45] = -2.71031; qual_mismatch_simple_bayesian[7][46] = -2.71033; qual_mismatch_simple_bayesian[8][0] = -1.15289; qual_mismatch_simple_bayesian[8][1] = -1.34073; qual_mismatch_simple_bayesian[8][2] = -1.52003; qual_mismatch_simple_bayesian[8][3] = -1.68963; qual_mismatch_simple_bayesian[8][4] = -1.8484; qual_mismatch_simple_bayesian[8][5] = -1.99536; qual_mismatch_simple_bayesian[8][6] = -2.12976; qual_mismatch_simple_bayesian[8][7] = -2.25109; qual_mismatch_simple_bayesian[8][8] = -2.3592; qual_mismatch_simple_bayesian[8][9] = -2.45427; qual_mismatch_simple_bayesian[8][10] = -2.5368; qual_mismatch_simple_bayesian[8][11] = -2.60759; qual_mismatch_simple_bayesian[8][12] = -2.66762; qual_mismatch_simple_bayesian[8][13] = -2.71801; qual_mismatch_simple_bayesian[8][14] = -2.75994; qual_mismatch_simple_bayesian[8][15] = -2.79454; qual_mismatch_simple_bayesian[8][16] = -2.8229; qual_mismatch_simple_bayesian[8][17] = -2.84602; qual_mismatch_simple_bayesian[8][18] = -2.86477; qual_mismatch_simple_bayesian[8][19] = -2.87992; qual_mismatch_simple_bayesian[8][20] = -2.89212; qual_mismatch_simple_bayesian[8][21] = -2.90191; qual_mismatch_simple_bayesian[8][22] = -2.90977; qual_mismatch_simple_bayesian[8][23] = -2.91605; qual_mismatch_simple_bayesian[8][24] = -2.92106; qual_mismatch_simple_bayesian[8][25] = -2.92507; qual_mismatch_simple_bayesian[8][26] = -2.92826; qual_mismatch_simple_bayesian[8][27] = -2.9308; qual_mismatch_simple_bayesian[8][28] = -2.93282; qual_mismatch_simple_bayesian[8][29] = -2.93444; 
qual_mismatch_simple_bayesian[8][30] = -2.93572; qual_mismatch_simple_bayesian[8][31] = -2.93674; qual_mismatch_simple_bayesian[8][32] = -2.93755; qual_mismatch_simple_bayesian[8][33] = -2.93819; qual_mismatch_simple_bayesian[8][34] = -2.9387; qual_mismatch_simple_bayesian[8][35] = -2.93911; qual_mismatch_simple_bayesian[8][36] = -2.93943; qual_mismatch_simple_bayesian[8][37] = -2.93969; qual_mismatch_simple_bayesian[8][38] = -2.93989; qual_mismatch_simple_bayesian[8][39] = -2.94005; qual_mismatch_simple_bayesian[8][40] = -2.94018; qual_mismatch_simple_bayesian[8][41] = -2.94029; qual_mismatch_simple_bayesian[8][42] = -2.94037; qual_mismatch_simple_bayesian[8][43] = -2.94043; qual_mismatch_simple_bayesian[8][44] = -2.94048; qual_mismatch_simple_bayesian[8][45] = -2.94052; qual_mismatch_simple_bayesian[8][46] = -2.94056; qual_mismatch_simple_bayesian[9][0] = -1.14148; qual_mismatch_simple_bayesian[9][1] = -1.33828; qual_mismatch_simple_bayesian[9][2] = -1.52795; qual_mismatch_simple_bayesian[9][3] = -1.70935; qual_mismatch_simple_bayesian[9][4] = -1.8813; qual_mismatch_simple_bayesian[9][5] = -2.04269; qual_mismatch_simple_bayesian[9][6] = -2.19248; qual_mismatch_simple_bayesian[9][7] = -2.32986; qual_mismatch_simple_bayesian[9][8] = -2.45427; qual_mismatch_simple_bayesian[9][9] = -2.56545; qual_mismatch_simple_bayesian[9][10] = -2.66352; qual_mismatch_simple_bayesian[9][11] = -2.74891; qual_mismatch_simple_bayesian[9][12] = -2.82235; qual_mismatch_simple_bayesian[9][13] = -2.8848; qual_mismatch_simple_bayesian[9][14] = -2.93733; qual_mismatch_simple_bayesian[9][15] = -2.98112; qual_mismatch_simple_bayesian[9][16] = -3.01733; qual_mismatch_simple_bayesian[9][17] = -3.04705; qual_mismatch_simple_bayesian[9][18] = -3.07131; qual_mismatch_simple_bayesian[9][19] = -3.09101; qual_mismatch_simple_bayesian[9][20] = -3.10693; qual_mismatch_simple_bayesian[9][21] = -3.11977; qual_mismatch_simple_bayesian[9][22] = -3.13008; qual_mismatch_simple_bayesian[9][23] = -3.13835; 
qual_mismatch_simple_bayesian[9][24] = -3.14496; qual_mismatch_simple_bayesian[9][25] = -3.15025; qual_mismatch_simple_bayesian[9][26] = -3.15447; qual_mismatch_simple_bayesian[9][27] = -3.15784; qual_mismatch_simple_bayesian[9][28] = -3.16052; qual_mismatch_simple_bayesian[9][29] = -3.16265; qual_mismatch_simple_bayesian[9][30] = -3.16435; qual_mismatch_simple_bayesian[9][31] = -3.1657; qual_mismatch_simple_bayesian[9][32] = -3.16678; qual_mismatch_simple_bayesian[9][33] = -3.16763; qual_mismatch_simple_bayesian[9][34] = -3.16831; qual_mismatch_simple_bayesian[9][35] = -3.16885; qual_mismatch_simple_bayesian[9][36] = -3.16928; qual_mismatch_simple_bayesian[9][37] = -3.16962; qual_mismatch_simple_bayesian[9][38] = -3.16989; qual_mismatch_simple_bayesian[9][39] = -3.17011; qual_mismatch_simple_bayesian[9][40] = -3.17028; qual_mismatch_simple_bayesian[9][41] = -3.17041; qual_mismatch_simple_bayesian[9][42] = -3.17052; qual_mismatch_simple_bayesian[9][43] = -3.17061; qual_mismatch_simple_bayesian[9][44] = -3.17068; qual_mismatch_simple_bayesian[9][45] = -3.17073; qual_mismatch_simple_bayesian[9][46] = -3.17077; qual_mismatch_simple_bayesian[10][0] = -1.13251; qual_mismatch_simple_bayesian[10][1] = -1.33634; qual_mismatch_simple_bayesian[10][2] = -1.53428; qual_mismatch_simple_bayesian[10][3] = -1.72529; qual_mismatch_simple_bayesian[10][4] = -1.90823; qual_mismatch_simple_bayesian[10][5] = -2.08194; qual_mismatch_simple_bayesian[10][6] = -2.24527; qual_mismatch_simple_bayesian[10][7] = -2.39718; qual_mismatch_simple_bayesian[10][8] = -2.5368; qual_mismatch_simple_bayesian[10][9] = -2.66352; qual_mismatch_simple_bayesian[10][10] = -2.77704; qual_mismatch_simple_bayesian[10][11] = -2.87741; qual_mismatch_simple_bayesian[10][12] = -2.96499; qual_mismatch_simple_bayesian[10][13] = -3.04048; qual_mismatch_simple_bayesian[10][14] = -3.10478; qual_mismatch_simple_bayesian[10][15] = -3.15899; qual_mismatch_simple_bayesian[10][16] = -3.20424; 
qual_mismatch_simple_bayesian[10][17] = -3.2417; qual_mismatch_simple_bayesian[10][18] = -3.27249; qual_mismatch_simple_bayesian[10][19] = -3.29764; qual_mismatch_simple_bayesian[10][20] = -3.31808; qual_mismatch_simple_bayesian[10][21] = -3.33462; qual_mismatch_simple_bayesian[10][22] = -3.34796; qual_mismatch_simple_bayesian[10][23] = -3.35868; qual_mismatch_simple_bayesian[10][24] = -3.36728; qual_mismatch_simple_bayesian[10][25] = -3.37416; qual_mismatch_simple_bayesian[10][26] = -3.37966; qual_mismatch_simple_bayesian[10][27] = -3.38405; qual_mismatch_simple_bayesian[10][28] = -3.38756; qual_mismatch_simple_bayesian[10][29] = -3.39035; qual_mismatch_simple_bayesian[10][30] = -3.39257; qual_mismatch_simple_bayesian[10][31] = -3.39434; qual_mismatch_simple_bayesian[10][32] = -3.39574; qual_mismatch_simple_bayesian[10][33] = -3.39686; qual_mismatch_simple_bayesian[10][34] = -3.39775; qual_mismatch_simple_bayesian[10][35] = -3.39846; qual_mismatch_simple_bayesian[10][36] = -3.39902; qual_mismatch_simple_bayesian[10][37] = -3.39947; qual_mismatch_simple_bayesian[10][38] = -3.39982; qual_mismatch_simple_bayesian[10][39] = -3.40011; qual_mismatch_simple_bayesian[10][40] = -3.40033; qual_mismatch_simple_bayesian[10][41] = -3.40051; qual_mismatch_simple_bayesian[10][42] = -3.40065; qual_mismatch_simple_bayesian[10][43] = -3.40076; qual_mismatch_simple_bayesian[10][44] = -3.40085; qual_mismatch_simple_bayesian[10][45] = -3.40092; qual_mismatch_simple_bayesian[10][46] = -3.40098; qual_mismatch_simple_bayesian[11][0] = -1.12545; qual_mismatch_simple_bayesian[11][1] = -1.3348; qual_mismatch_simple_bayesian[11][2] = -1.53934; qual_mismatch_simple_bayesian[11][3] = -1.73814; qual_mismatch_simple_bayesian[11][4] = -1.93016; qual_mismatch_simple_bayesian[11][5] = -2.11426; qual_mismatch_simple_bayesian[11][6] = -2.28928; qual_mismatch_simple_bayesian[11][7] = -2.45408; qual_mismatch_simple_bayesian[11][8] = -2.60759; qual_mismatch_simple_bayesian[11][9] = -2.74891; 
qual_mismatch_simple_bayesian[11][10] = -2.87741; qual_mismatch_simple_bayesian[11][11] = -2.99272; qual_mismatch_simple_bayesian[11][12] = -3.09485; qual_mismatch_simple_bayesian[11][13] = -3.18412; qual_mismatch_simple_bayesian[11][14] = -3.2612; qual_mismatch_simple_bayesian[11][15] = -3.32696; qual_mismatch_simple_bayesian[11][16] = -3.38246; qual_mismatch_simple_bayesian[11][17] = -3.42885; qual_mismatch_simple_bayesian[11][18] = -3.4673; qual_mismatch_simple_bayesian[11][19] = -3.49893; qual_mismatch_simple_bayesian[11][20] = -3.52479; qual_mismatch_simple_bayesian[11][21] = -3.54582; qual_mismatch_simple_bayesian[11][22] = -3.56284; qual_mismatch_simple_bayesian[11][23] = -3.57658; qual_mismatch_simple_bayesian[11][24] = -3.58762; qual_mismatch_simple_bayesian[11][25] = -3.59648; qual_mismatch_simple_bayesian[11][26] = -3.60357; qual_mismatch_simple_bayesian[11][27] = -3.60925; qual_mismatch_simple_bayesian[11][28] = -3.61377; qual_mismatch_simple_bayesian[11][29] = -3.61738; qual_mismatch_simple_bayesian[11][30] = -3.62026; qual_mismatch_simple_bayesian[11][31] = -3.62255; qual_mismatch_simple_bayesian[11][32] = -3.62438; qual_mismatch_simple_bayesian[11][33] = -3.62583; qual_mismatch_simple_bayesian[11][34] = -3.62698; qual_mismatch_simple_bayesian[11][35] = -3.6279; qual_mismatch_simple_bayesian[11][36] = -3.62863; qual_mismatch_simple_bayesian[11][37] = -3.62921; qual_mismatch_simple_bayesian[11][38] = -3.62967; qual_mismatch_simple_bayesian[11][39] = -3.63004; qual_mismatch_simple_bayesian[11][40] = -3.63033; qual_mismatch_simple_bayesian[11][41] = -3.63056; qual_mismatch_simple_bayesian[11][42] = -3.63075; qual_mismatch_simple_bayesian[11][43] = -3.63089; qual_mismatch_simple_bayesian[11][44] = -3.63101; qual_mismatch_simple_bayesian[11][45] = -3.6311; qual_mismatch_simple_bayesian[11][46] = -3.63117; qual_mismatch_simple_bayesian[12][0] = -1.11987; qual_mismatch_simple_bayesian[12][1] = -1.33358; qual_mismatch_simple_bayesian[12][2] = -1.54338; 
qual_mismatch_simple_bayesian[12][3] = -1.74847; qual_mismatch_simple_bayesian[12][4] = -1.94792; qual_mismatch_simple_bayesian[12][5] = -2.14069; qual_mismatch_simple_bayesian[12][6] = -2.32567; qual_mismatch_simple_bayesian[12][7] = -2.5017; qual_mismatch_simple_bayesian[12][8] = -2.66762; qual_mismatch_simple_bayesian[12][9] = -2.82235; qual_mismatch_simple_bayesian[12][10] = -2.96499; qual_mismatch_simple_bayesian[12][11] = -3.09485; qual_mismatch_simple_bayesian[12][12] = -3.21154; qual_mismatch_simple_bayesian[12][13] = -3.31504; qual_mismatch_simple_bayesian[12][14] = -3.40563; qual_mismatch_simple_bayesian[12][15] = -3.48395; qual_mismatch_simple_bayesian[12][16] = -3.55084; qual_mismatch_simple_bayesian[12][17] = -3.60736; qual_mismatch_simple_bayesian[12][18] = -3.65465; qual_mismatch_simple_bayesian[12][19] = -3.69388; qual_mismatch_simple_bayesian[12][20] = -3.72617; qual_mismatch_simple_bayesian[12][21] = -3.75259; qual_mismatch_simple_bayesian[12][22] = -3.77408; qual_mismatch_simple_bayesian[12][23] = -3.79149; qual_mismatch_simple_bayesian[12][24] = -3.80553; qual_mismatch_simple_bayesian[12][25] = -3.81683; qual_mismatch_simple_bayesian[12][26] = -3.8259; qual_mismatch_simple_bayesian[12][27] = -3.83316; qual_mismatch_simple_bayesian[12][28] = -3.83897; qual_mismatch_simple_bayesian[12][29] = -3.84361; qual_mismatch_simple_bayesian[12][30] = -3.8473; qual_mismatch_simple_bayesian[12][31] = -3.85025; qual_mismatch_simple_bayesian[12][32] = -3.8526; qual_mismatch_simple_bayesian[12][33] = -3.85447; qual_mismatch_simple_bayesian[12][34] = -3.85595; qual_mismatch_simple_bayesian[12][35] = -3.85713; qual_mismatch_simple_bayesian[12][36] = -3.85807; qual_mismatch_simple_bayesian[12][37] = -3.85882; qual_mismatch_simple_bayesian[12][38] = -3.85942; qual_mismatch_simple_bayesian[12][39] = -3.85989; qual_mismatch_simple_bayesian[12][40] = -3.86026; qual_mismatch_simple_bayesian[12][41] = -3.86056; qual_mismatch_simple_bayesian[12][42] = -3.8608; 
qual_mismatch_simple_bayesian[12][43] = -3.86099; qual_mismatch_simple_bayesian[12][44] = -3.86114; qual_mismatch_simple_bayesian[12][45] = -3.86126; qual_mismatch_simple_bayesian[12][46] = -3.86135; qual_mismatch_simple_bayesian[13][0] = -1.11546; qual_mismatch_simple_bayesian[13][1] = -1.33261; qual_mismatch_simple_bayesian[13][2] = -1.5466; qual_mismatch_simple_bayesian[13][3] = -1.75675; qual_mismatch_simple_bayesian[13][4] = -1.96226; qual_mismatch_simple_bayesian[13][5] = -2.1622; qual_mismatch_simple_bayesian[13][6] = -2.35556; qual_mismatch_simple_bayesian[13][7] = -2.54122; qual_mismatch_simple_bayesian[13][8] = -2.71801; qual_mismatch_simple_bayesian[13][9] = -2.8848; qual_mismatch_simple_bayesian[13][10] = -3.04048; qual_mismatch_simple_bayesian[13][11] = -3.18412; qual_mismatch_simple_bayesian[13][12] = -3.31504; qual_mismatch_simple_bayesian[13][13] = -3.43281; qual_mismatch_simple_bayesian[13][14] = -3.53737; qual_mismatch_simple_bayesian[13][15] = -3.629; qual_mismatch_simple_bayesian[13][16] = -3.70828; qual_mismatch_simple_bayesian[13][17] = -3.77607; qual_mismatch_simple_bayesian[13][18] = -3.83339; qual_mismatch_simple_bayesian[13][19] = -3.88139; qual_mismatch_simple_bayesian[13][20] = -3.92122; qual_mismatch_simple_bayesian[13][21] = -3.95404; qual_mismatch_simple_bayesian[13][22] = -3.9809; qual_mismatch_simple_bayesian[13][23] = -4.00276; qual_mismatch_simple_bayesian[13][24] = -4.02047; qual_mismatch_simple_bayesian[13][25] = -4.03476; qual_mismatch_simple_bayesian[13][26] = -4.04626; qual_mismatch_simple_bayesian[13][27] = -4.0555; qual_mismatch_simple_bayesian[13][28] = -4.06289; qual_mismatch_simple_bayesian[13][29] = -4.0688; qual_mismatch_simple_bayesian[13][30] = -4.07352; qual_mismatch_simple_bayesian[13][31] = -4.07729; qual_mismatch_simple_bayesian[13][32] = -4.08029; qual_mismatch_simple_bayesian[13][33] = -4.08268; qual_mismatch_simple_bayesian[13][34] = -4.08459; qual_mismatch_simple_bayesian[13][35] = -4.0861; 
qual_mismatch_simple_bayesian[13][36] = -4.08731; qual_mismatch_simple_bayesian[13][37] = -4.08826; qual_mismatch_simple_bayesian[13][38] = -4.08903; qual_mismatch_simple_bayesian[13][39] = -4.08963; qual_mismatch_simple_bayesian[13][40] = -4.09011; qual_mismatch_simple_bayesian[13][41] = -4.0905; qual_mismatch_simple_bayesian[13][42] = -4.0908; qual_mismatch_simple_bayesian[13][43] = -4.09104; qual_mismatch_simple_bayesian[13][44] = -4.09123; qual_mismatch_simple_bayesian[13][45] = -4.09138; qual_mismatch_simple_bayesian[13][46] = -4.09151; qual_mismatch_simple_bayesian[14][0] = -1.11197; qual_mismatch_simple_bayesian[14][1] = -1.33184; qual_mismatch_simple_bayesian[14][2] = -1.54916; qual_mismatch_simple_bayesian[14][3] = -1.76338; qual_mismatch_simple_bayesian[14][4] = -1.97379; qual_mismatch_simple_bayesian[14][5] = -2.17962; qual_mismatch_simple_bayesian[14][6] = -2.37995; qual_mismatch_simple_bayesian[14][7] = -2.57376; qual_mismatch_simple_bayesian[14][8] = -2.75994; qual_mismatch_simple_bayesian[14][9] = -2.93733; qual_mismatch_simple_bayesian[14][10] = -3.10478; qual_mismatch_simple_bayesian[14][11] = -3.2612; qual_mismatch_simple_bayesian[14][12] = -3.40563; qual_mismatch_simple_bayesian[14][13] = -3.53737; qual_mismatch_simple_bayesian[14][14] = -3.65598; qual_mismatch_simple_bayesian[14][15] = -3.76138; qual_mismatch_simple_bayesian[14][16] = -3.85381; qual_mismatch_simple_bayesian[14][17] = -3.93386; qual_mismatch_simple_bayesian[14][18] = -4.00234; qual_mismatch_simple_bayesian[14][19] = -4.0603; qual_mismatch_simple_bayesian[14][20] = -4.10885; qual_mismatch_simple_bayesian[14][21] = -4.14917; qual_mismatch_simple_bayesian[14][22] = -4.1824; qual_mismatch_simple_bayesian[14][23] = -4.20961; qual_mismatch_simple_bayesian[14][24] = -4.23176; qual_mismatch_simple_bayesian[14][25] = -4.24971; qual_mismatch_simple_bayesian[14][26] = -4.2642; qual_mismatch_simple_bayesian[14][27] = -4.27586; qual_mismatch_simple_bayesian[14][28] = -4.28523; 
qual_mismatch_simple_bayesian[14][29] = -4.29273; qual_mismatch_simple_bayesian[14][30] = -4.29872; qual_mismatch_simple_bayesian[14][31] = -4.30351; qual_mismatch_simple_bayesian[14][32] = -4.30734; qual_mismatch_simple_bayesian[14][33] = -4.31038; qual_mismatch_simple_bayesian[14][34] = -4.31281; qual_mismatch_simple_bayesian[14][35] = -4.31474; qual_mismatch_simple_bayesian[14][36] = -4.31627; qual_mismatch_simple_bayesian[14][37] = -4.3175; qual_mismatch_simple_bayesian[14][38] = -4.31847; qual_mismatch_simple_bayesian[14][39] = -4.31924; qual_mismatch_simple_bayesian[14][40] = -4.31986; qual_mismatch_simple_bayesian[14][41] = -4.32034; qual_mismatch_simple_bayesian[14][42] = -4.32073; qual_mismatch_simple_bayesian[14][43] = -4.32104; qual_mismatch_simple_bayesian[14][44] = -4.32128; qual_mismatch_simple_bayesian[14][45] = -4.32148; qual_mismatch_simple_bayesian[14][46] = -4.32163; qual_mismatch_simple_bayesian[15][0] = -1.10921; qual_mismatch_simple_bayesian[15][1] = -1.33123; qual_mismatch_simple_bayesian[15][2] = -1.55121; qual_mismatch_simple_bayesian[15][3] = -1.76867; qual_mismatch_simple_bayesian[15][4] = -1.98305; qual_mismatch_simple_bayesian[15][5] = -2.19368; qual_mismatch_simple_bayesian[15][6] = -2.39976; qual_mismatch_simple_bayesian[15][7] = -2.60038; qual_mismatch_simple_bayesian[15][8] = -2.79454; qual_mismatch_simple_bayesian[15][9] = -2.98112; qual_mismatch_simple_bayesian[15][10] = -3.15899; qual_mismatch_simple_bayesian[15][11] = -3.32696; qual_mismatch_simple_bayesian[15][12] = -3.48395; qual_mismatch_simple_bayesian[15][13] = -3.629; qual_mismatch_simple_bayesian[15][14] = -3.76138; qual_mismatch_simple_bayesian[15][15] = -3.88065; qual_mismatch_simple_bayesian[15][16] = -3.9867; qual_mismatch_simple_bayesian[15][17] = -4.07977; qual_mismatch_simple_bayesian[15][18] = -4.16041; qual_mismatch_simple_bayesian[15][19] = -4.22945; qual_mismatch_simple_bayesian[15][20] = -4.2879; qual_mismatch_simple_bayesian[15][21] = -4.3369; 
qual_mismatch_simple_bayesian[15][22] = -4.3776; qual_mismatch_simple_bayesian[15][23] = -4.41116; qual_mismatch_simple_bayesian[15][24] = -4.43864; qual_mismatch_simple_bayesian[15][25] = -4.46102; qual_mismatch_simple_bayesian[15][26] = -4.47916; qual_mismatch_simple_bayesian[15][27] = -4.49381; qual_mismatch_simple_bayesian[15][28] = -4.5056; qual_mismatch_simple_bayesian[15][29] = -4.51507; qual_mismatch_simple_bayesian[15][30] = -4.52265; qual_mismatch_simple_bayesian[15][31] = -4.52872; qual_mismatch_simple_bayesian[15][32] = -4.53356; qual_mismatch_simple_bayesian[15][33] = -4.53742; qual_mismatch_simple_bayesian[15][34] = -4.5405; qual_mismatch_simple_bayesian[15][35] = -4.54296; qual_mismatch_simple_bayesian[15][36] = -4.54491; qual_mismatch_simple_bayesian[15][37] = -4.54646; qual_mismatch_simple_bayesian[15][38] = -4.5477; qual_mismatch_simple_bayesian[15][39] = -4.54868; qual_mismatch_simple_bayesian[15][40] = -4.54947; qual_mismatch_simple_bayesian[15][41] = -4.55009; qual_mismatch_simple_bayesian[15][42] = -4.55058; qual_mismatch_simple_bayesian[15][43] = -4.55097; qual_mismatch_simple_bayesian[15][44] = -4.55128; qual_mismatch_simple_bayesian[15][45] = -4.55153; qual_mismatch_simple_bayesian[15][46] = -4.55173; qual_mismatch_simple_bayesian[16][0] = -1.10702; qual_mismatch_simple_bayesian[16][1] = -1.33074; qual_mismatch_simple_bayesian[16][2] = -1.55283; qual_mismatch_simple_bayesian[16][3] = -1.7729; qual_mismatch_simple_bayesian[16][4] = -1.99047; qual_mismatch_simple_bayesian[16][5] = -2.20499; qual_mismatch_simple_bayesian[16][6] = -2.41577; qual_mismatch_simple_bayesian[16][7] = -2.62204; qual_mismatch_simple_bayesian[16][8] = -2.8229; qual_mismatch_simple_bayesian[16][9] = -3.01733; qual_mismatch_simple_bayesian[16][10] = -3.20424; qual_mismatch_simple_bayesian[16][11] = -3.38246; qual_mismatch_simple_bayesian[16][12] = -3.55084; qual_mismatch_simple_bayesian[16][13] = -3.70828; qual_mismatch_simple_bayesian[16][14] = -3.85381; 
qual_mismatch_simple_bayesian[16][15] = -3.9867; qual_mismatch_simple_bayesian[16][16] = -4.10649; qual_mismatch_simple_bayesian[16][17] = -4.21306; qual_mismatch_simple_bayesian[16][18] = -4.30662; qual_mismatch_simple_bayesian[16][19] = -4.38774; qual_mismatch_simple_bayesian[16][20] = -4.45721; qual_mismatch_simple_bayesian[16][21] = -4.51606; qual_mismatch_simple_bayesian[16][22] = -4.5654; qual_mismatch_simple_bayesian[16][23] = -4.60641; qual_mismatch_simple_bayesian[16][24] = -4.64022; qual_mismatch_simple_bayesian[16][25] = -4.66792; qual_mismatch_simple_bayesian[16][26] = -4.69049; qual_mismatch_simple_bayesian[16][27] = -4.70878; qual_mismatch_simple_bayesian[16][28] = -4.72355; qual_mismatch_simple_bayesian[16][29] = -4.73544; qual_mismatch_simple_bayesian[16][30] = -4.74499; qual_mismatch_simple_bayesian[16][31] = -4.75264; qual_mismatch_simple_bayesian[16][32] = -4.75876; qual_mismatch_simple_bayesian[16][33] = -4.76365; qual_mismatch_simple_bayesian[16][34] = -4.76755; qual_mismatch_simple_bayesian[16][35] = -4.77065; qual_mismatch_simple_bayesian[16][36] = -4.77313; qual_mismatch_simple_bayesian[16][37] = -4.7751; qual_mismatch_simple_bayesian[16][38] = -4.77667; qual_mismatch_simple_bayesian[16][39] = -4.77792; qual_mismatch_simple_bayesian[16][40] = -4.77891; qual_mismatch_simple_bayesian[16][41] = -4.7797; qual_mismatch_simple_bayesian[16][42] = -4.78032; qual_mismatch_simple_bayesian[16][43] = -4.78082; qual_mismatch_simple_bayesian[16][44] = -4.78122; qual_mismatch_simple_bayesian[16][45] = -4.78153; qual_mismatch_simple_bayesian[16][46] = -4.78178; qual_mismatch_simple_bayesian[17][0] = -1.10529; qual_mismatch_simple_bayesian[17][1] = -1.33036; qual_mismatch_simple_bayesian[17][2] = -1.55412; qual_mismatch_simple_bayesian[17][3] = -1.77627; qual_mismatch_simple_bayesian[17][4] = -1.9964; qual_mismatch_simple_bayesian[17][5] = -2.21406; qual_mismatch_simple_bayesian[17][6] = -2.42868; qual_mismatch_simple_bayesian[17][7] = -2.63959; 
qual_mismatch_simple_bayesian[17][8] = -2.84602; qual_mismatch_simple_bayesian[17][9] = -3.04705; qual_mismatch_simple_bayesian[17][10] = -3.2417; qual_mismatch_simple_bayesian[17][11] = -3.42885; qual_mismatch_simple_bayesian[17][12] = -3.60736; qual_mismatch_simple_bayesian[17][13] = -3.77607; qual_mismatch_simple_bayesian[17][14] = -3.93386; qual_mismatch_simple_bayesian[17][15] = -4.07977; qual_mismatch_simple_bayesian[17][16] = -4.21306; qual_mismatch_simple_bayesian[17][17] = -4.33325; qual_mismatch_simple_bayesian[17][18] = -4.44022; qual_mismatch_simple_bayesian[17][19] = -4.53419; qual_mismatch_simple_bayesian[17][20] = -4.61567; qual_mismatch_simple_bayesian[17][21] = -4.68549; qual_mismatch_simple_bayesian[17][22] = -4.74465; qual_mismatch_simple_bayesian[17][23] = -4.79427; qual_mismatch_simple_bayesian[17][24] = -4.83552; qual_mismatch_simple_bayesian[17][25] = -4.86954; qual_mismatch_simple_bayesian[17][26] = -4.89741; qual_mismatch_simple_bayesian[17][27] = -4.92012; qual_mismatch_simple_bayesian[17][28] = -4.93853; qual_mismatch_simple_bayesian[17][29] = -4.9534; qual_mismatch_simple_bayesian[17][30] = -4.96537; qual_mismatch_simple_bayesian[17][31] = -4.97499; qual_mismatch_simple_bayesian[17][32] = -4.98269; qual_mismatch_simple_bayesian[17][33] = -4.98885; qual_mismatch_simple_bayesian[17][34] = -4.99377; qual_mismatch_simple_bayesian[17][35] = -4.9977; qual_mismatch_simple_bayesian[17][36] = -5.00083; qual_mismatch_simple_bayesian[17][37] = -5.00332; qual_mismatch_simple_bayesian[17][38] = -5.0053; qual_mismatch_simple_bayesian[17][39] = -5.00688; qual_mismatch_simple_bayesian[17][40] = -5.00814; qual_mismatch_simple_bayesian[17][41] = -5.00914; qual_mismatch_simple_bayesian[17][42] = -5.00993; qual_mismatch_simple_bayesian[17][43] = -5.01056; qual_mismatch_simple_bayesian[17][44] = -5.01107; qual_mismatch_simple_bayesian[17][45] = -5.01147; qual_mismatch_simple_bayesian[17][46] = -5.01178; qual_mismatch_simple_bayesian[18][0] = -1.10391; 
qual_mismatch_simple_bayesian[18][1] = -1.33005; qual_mismatch_simple_bayesian[18][2] = -1.55515; qual_mismatch_simple_bayesian[18][3] = -1.77895; qual_mismatch_simple_bayesian[18][4] = -2.00114; qual_mismatch_simple_bayesian[18][5] = -2.22133; qual_mismatch_simple_bayesian[18][6] = -2.43906; qual_mismatch_simple_bayesian[18][7] = -2.65376; qual_mismatch_simple_bayesian[18][8] = -2.86477; qual_mismatch_simple_bayesian[18][9] = -3.07131; qual_mismatch_simple_bayesian[18][10] = -3.27249; qual_mismatch_simple_bayesian[18][11] = -3.4673; qual_mismatch_simple_bayesian[18][12] = -3.65465; qual_mismatch_simple_bayesian[18][13] = -3.83339; qual_mismatch_simple_bayesian[18][14] = -4.00234; qual_mismatch_simple_bayesian[18][15] = -4.16041; qual_mismatch_simple_bayesian[18][16] = -4.30662; qual_mismatch_simple_bayesian[18][17] = -4.44022; qual_mismatch_simple_bayesian[18][18] = -4.56074; qual_mismatch_simple_bayesian[18][19] = -4.66803; qual_mismatch_simple_bayesian[18][20] = -4.76231; qual_mismatch_simple_bayesian[18][21] = -4.84409; qual_mismatch_simple_bayesian[18][22] = -4.91418; qual_mismatch_simple_bayesian[18][23] = -4.97359; qual_mismatch_simple_bayesian[18][24] = -5.02342; qual_mismatch_simple_bayesian[18][25] = -5.06486; qual_mismatch_simple_bayesian[18][26] = -5.09904; qual_mismatch_simple_bayesian[18][27] = -5.12706; qual_mismatch_simple_bayesian[18][28] = -5.14988; qual_mismatch_simple_bayesian[18][29] = -5.16839; qual_mismatch_simple_bayesian[18][30] = -5.18334; qual_mismatch_simple_bayesian[18][31] = -5.19537; qual_mismatch_simple_bayesian[18][32] = -5.20504; qual_mismatch_simple_bayesian[18][33] = -5.21278; qual_mismatch_simple_bayesian[18][34] = -5.21897; qual_mismatch_simple_bayesian[18][35] = -5.22392; qual_mismatch_simple_bayesian[18][36] = -5.22787; qual_mismatch_simple_bayesian[18][37] = -5.23102; qual_mismatch_simple_bayesian[18][38] = -5.23352; qual_mismatch_simple_bayesian[18][39] = -5.23552; qual_mismatch_simple_bayesian[18][40] = -5.23711; 
qual_mismatch_simple_bayesian[18][41] = -5.23837; qual_mismatch_simple_bayesian[18][42] = -5.23938; qual_mismatch_simple_bayesian[18][43] = -5.24017; qual_mismatch_simple_bayesian[18][44] = -5.24081; qual_mismatch_simple_bayesian[18][45] = -5.24131; qual_mismatch_simple_bayesian[18][46] = -5.24172; qual_mismatch_simple_bayesian[19][0] = -1.10282; qual_mismatch_simple_bayesian[19][1] = -1.32981; qual_mismatch_simple_bayesian[19][2] = -1.55597; qual_mismatch_simple_bayesian[19][3] = -1.78109; qual_mismatch_simple_bayesian[19][4] = -2.00492; qual_mismatch_simple_bayesian[19][5] = -2.22714; qual_mismatch_simple_bayesian[19][6] = -2.44737; qual_mismatch_simple_bayesian[19][7] = -2.66515; qual_mismatch_simple_bayesian[19][8] = -2.87992; qual_mismatch_simple_bayesian[19][9] = -3.09101; qual_mismatch_simple_bayesian[19][10] = -3.29764; qual_mismatch_simple_bayesian[19][11] = -3.49893; qual_mismatch_simple_bayesian[19][12] = -3.69388; qual_mismatch_simple_bayesian[19][13] = -3.88139; qual_mismatch_simple_bayesian[19][14] = -4.0603; qual_mismatch_simple_bayesian[19][15] = -4.22945; qual_mismatch_simple_bayesian[19][16] = -4.38774; qual_mismatch_simple_bayesian[19][17] = -4.53419; qual_mismatch_simple_bayesian[19][18] = -4.66803; qual_mismatch_simple_bayesian[19][19] = -4.78881; qual_mismatch_simple_bayesian[19][20] = -4.89635; qual_mismatch_simple_bayesian[19][21] = -4.99087; qual_mismatch_simple_bayesian[19][22] = -5.07289; qual_mismatch_simple_bayesian[19][23] = -5.1432; qual_mismatch_simple_bayesian[19][24] = -5.2028; qual_mismatch_simple_bayesian[19][25] = -5.25281; qual_mismatch_simple_bayesian[19][26] = -5.29439; qual_mismatch_simple_bayesian[19][27] = -5.32871; qual_mismatch_simple_bayesian[19][28] = -5.35683; qual_mismatch_simple_bayesian[19][29] = -5.37974; qual_mismatch_simple_bayesian[19][30] = -5.39832; qual_mismatch_simple_bayesian[19][31] = -5.41334; qual_mismatch_simple_bayesian[19][32] = -5.42542; qual_mismatch_simple_bayesian[19][33] = -5.43513; 
qual_mismatch_simple_bayesian[19][34] = -5.44291; qual_mismatch_simple_bayesian[19][35] = -5.44913; qual_mismatch_simple_bayesian[19][36] = -5.4541; qual_mismatch_simple_bayesian[19][37] = -5.45806; qual_mismatch_simple_bayesian[19][38] = -5.46122; qual_mismatch_simple_bayesian[19][39] = -5.46374; qual_mismatch_simple_bayesian[19][40] = -5.46574; qual_mismatch_simple_bayesian[19][41] = -5.46734; qual_mismatch_simple_bayesian[19][42] = -5.46861; qual_mismatch_simple_bayesian[19][43] = -5.46962; qual_mismatch_simple_bayesian[19][44] = -5.47042; qual_mismatch_simple_bayesian[19][45] = -5.47106; qual_mismatch_simple_bayesian[19][46] = -5.47156; qual_mismatch_simple_bayesian[20][0] = -1.10195; qual_mismatch_simple_bayesian[20][1] = -1.32962; qual_mismatch_simple_bayesian[20][2] = -1.55662; qual_mismatch_simple_bayesian[20][3] = -1.78279; qual_mismatch_simple_bayesian[20][4] = -2.00793; qual_mismatch_simple_bayesian[20][5] = -2.23178; qual_mismatch_simple_bayesian[20][6] = -2.45403; qual_mismatch_simple_bayesian[20][7] = -2.6743; qual_mismatch_simple_bayesian[20][8] = -2.89212; qual_mismatch_simple_bayesian[20][9] = -3.10693; qual_mismatch_simple_bayesian[20][10] = -3.31808; qual_mismatch_simple_bayesian[20][11] = -3.52479; qual_mismatch_simple_bayesian[20][12] = -3.72617; qual_mismatch_simple_bayesian[20][13] = -3.92122; qual_mismatch_simple_bayesian[20][14] = -4.10885; qual_mismatch_simple_bayesian[20][15] = -4.2879; qual_mismatch_simple_bayesian[20][16] = -4.45721; qual_mismatch_simple_bayesian[20][17] = -4.61567; qual_mismatch_simple_bayesian[20][18] = -4.76231; qual_mismatch_simple_bayesian[20][19] = -4.89635; qual_mismatch_simple_bayesian[20][20] = -5.01732; qual_mismatch_simple_bayesian[20][21] = -5.12507; qual_mismatch_simple_bayesian[20][22] = -5.21979; qual_mismatch_simple_bayesian[20][23] = -5.30199; qual_mismatch_simple_bayesian[20][24] = -5.37247; qual_mismatch_simple_bayesian[20][25] = -5.43222; qual_mismatch_simple_bayesian[20][26] = -5.48237; 
qual_mismatch_simple_bayesian[20][27] = -5.52408; qual_mismatch_simple_bayesian[20][28] = -5.55849; qual_mismatch_simple_bayesian[20][29] = -5.5867; qual_mismatch_simple_bayesian[20][30] = -5.60969; qual_mismatch_simple_bayesian[20][31] = -5.62833; qual_mismatch_simple_bayesian[20][32] = -5.64339; qual_mismatch_simple_bayesian[20][33] = -5.65552; qual_mismatch_simple_bayesian[20][34] = -5.66525; qual_mismatch_simple_bayesian[20][35] = -5.67306; qual_mismatch_simple_bayesian[20][36] = -5.6793; qual_mismatch_simple_bayesian[20][37] = -5.68429; qual_mismatch_simple_bayesian[20][38] = -5.68827; qual_mismatch_simple_bayesian[20][39] = -5.69144; qual_mismatch_simple_bayesian[20][40] = -5.69396; qual_mismatch_simple_bayesian[20][41] = -5.69598; qual_mismatch_simple_bayesian[20][42] = -5.69758; qual_mismatch_simple_bayesian[20][43] = -5.69885; qual_mismatch_simple_bayesian[20][44] = -5.69986; qual_mismatch_simple_bayesian[20][45] = -5.70067; qual_mismatch_simple_bayesian[20][46] = -5.70131; qual_mismatch_simple_bayesian[21][0] = -1.10126; qual_mismatch_simple_bayesian[21][1] = -1.32946; qual_mismatch_simple_bayesian[21][2] = -1.55713; qual_mismatch_simple_bayesian[21][3] = -1.78414; qual_mismatch_simple_bayesian[21][4] = -2.01033; qual_mismatch_simple_bayesian[21][5] = -2.23548; qual_mismatch_simple_bayesian[21][6] = -2.45935; qual_mismatch_simple_bayesian[21][7] = -2.68162; qual_mismatch_simple_bayesian[21][8] = -2.90191; qual_mismatch_simple_bayesian[21][9] = -3.11977; qual_mismatch_simple_bayesian[21][10] = -3.33462; qual_mismatch_simple_bayesian[21][11] = -3.54582; qual_mismatch_simple_bayesian[21][12] = -3.75259; qual_mismatch_simple_bayesian[21][13] = -3.95404; qual_mismatch_simple_bayesian[21][14] = -4.14917; qual_mismatch_simple_bayesian[21][15] = -4.3369; qual_mismatch_simple_bayesian[21][16] = -4.51606; qual_mismatch_simple_bayesian[21][17] = -4.68549; qual_mismatch_simple_bayesian[21][18] = -4.84409; qual_mismatch_simple_bayesian[21][19] = -4.99087; 
qual_mismatch_simple_bayesian[21][20] = -5.12507; qual_mismatch_simple_bayesian[21][21] = -5.2462; qual_mismatch_simple_bayesian[21][22] = -5.35411; qual_mismatch_simple_bayesian[21][23] = -5.44898; qual_mismatch_simple_bayesian[21][24] = -5.53133; qual_mismatch_simple_bayesian[21][25] = -5.60194; qual_mismatch_simple_bayesian[21][26] = -5.66182; qual_mismatch_simple_bayesian[21][27] = -5.71208; qual_mismatch_simple_bayesian[21][28] = -5.75388; qual_mismatch_simple_bayesian[21][29] = -5.78837; qual_mismatch_simple_bayesian[21][30] = -5.81665; qual_mismatch_simple_bayesian[21][31] = -5.83969; qual_mismatch_simple_bayesian[21][32] = -5.85838; qual_mismatch_simple_bayesian[21][33] = -5.87348; qual_mismatch_simple_bayesian[21][34] = -5.88564; qual_mismatch_simple_bayesian[21][35] = -5.89541; qual_mismatch_simple_bayesian[21][36] = -5.90323; qual_mismatch_simple_bayesian[21][37] = -5.90949; qual_mismatch_simple_bayesian[21][38] = -5.91449; qual_mismatch_simple_bayesian[21][39] = -5.91848; qual_mismatch_simple_bayesian[21][40] = -5.92166; qual_mismatch_simple_bayesian[21][41] = -5.9242; qual_mismatch_simple_bayesian[21][42] = -5.92621; qual_mismatch_simple_bayesian[21][43] = -5.92782; qual_mismatch_simple_bayesian[21][44] = -5.92909; qual_mismatch_simple_bayesian[21][45] = -5.93011; qual_mismatch_simple_bayesian[21][46] = -5.93092; qual_mismatch_simple_bayesian[22][0] = -1.10072; qual_mismatch_simple_bayesian[22][1] = -1.32934; qual_mismatch_simple_bayesian[22][2] = -1.55754; qual_mismatch_simple_bayesian[22][3] = -1.78522; qual_mismatch_simple_bayesian[22][4] = -2.01224; qual_mismatch_simple_bayesian[22][5] = -2.23843; qual_mismatch_simple_bayesian[22][6] = -2.4636; qual_mismatch_simple_bayesian[22][7] = -2.68748; qual_mismatch_simple_bayesian[22][8] = -2.90977; qual_mismatch_simple_bayesian[22][9] = -3.13008; qual_mismatch_simple_bayesian[22][10] = -3.34796; qual_mismatch_simple_bayesian[22][11] = -3.56284; qual_mismatch_simple_bayesian[22][12] = -3.77408; 
qual_mismatch_simple_bayesian[22][13] = -3.9809; qual_mismatch_simple_bayesian[22][14] = -4.1824; qual_mismatch_simple_bayesian[22][15] = -4.3776; qual_mismatch_simple_bayesian[22][16] = -4.5654; qual_mismatch_simple_bayesian[22][17] = -4.74465; qual_mismatch_simple_bayesian[22][18] = -4.91418; qual_mismatch_simple_bayesian[22][19] = -5.07289; qual_mismatch_simple_bayesian[22][20] = -5.21979; qual_mismatch_simple_bayesian[22][21] = -5.35411; qual_mismatch_simple_bayesian[22][22] = -5.47537; qual_mismatch_simple_bayesian[22][23] = -5.5834; qual_mismatch_simple_bayesian[22][24] = -5.67839; qual_mismatch_simple_bayesian[22][25] = -5.76086; qual_mismatch_simple_bayesian[22][26] = -5.83158; qual_mismatch_simple_bayesian[22][27] = -5.89155; qual_mismatch_simple_bayesian[22][28] = -5.9419; qual_mismatch_simple_bayesian[22][29] = -5.98377; qual_mismatch_simple_bayesian[22][30] = -6.01833; qual_mismatch_simple_bayesian[22][31] = -6.04666; qual_mismatch_simple_bayesian[22][32] = -6.06975; qual_mismatch_simple_bayesian[22][33] = -6.08848; qual_mismatch_simple_bayesian[22][34] = -6.10361; qual_mismatch_simple_bayesian[22][35] = -6.1158; qual_mismatch_simple_bayesian[22][36] = -6.12558; qual_mismatch_simple_bayesian[22][37] = -6.13342; qual_mismatch_simple_bayesian[22][38] = -6.1397; qual_mismatch_simple_bayesian[22][39] = -6.14471; qual_mismatch_simple_bayesian[22][40] = -6.14871; qual_mismatch_simple_bayesian[22][41] = -6.15189; qual_mismatch_simple_bayesian[22][42] = -6.15443; qual_mismatch_simple_bayesian[22][43] = -6.15645; qual_mismatch_simple_bayesian[22][44] = -6.15806; qual_mismatch_simple_bayesian[22][45] = -6.15934; qual_mismatch_simple_bayesian[22][46] = -6.16036; qual_mismatch_simple_bayesian[23][0] = -1.10028; qual_mismatch_simple_bayesian[23][1] = -1.32924; qual_mismatch_simple_bayesian[23][2] = -1.55787; qual_mismatch_simple_bayesian[23][3] = -1.78608; qual_mismatch_simple_bayesian[23][4] = -2.01376; qual_mismatch_simple_bayesian[23][5] = -2.24078; 
qual_mismatch_simple_bayesian[23][6] = -2.46698; qual_mismatch_simple_bayesian[23][7] = -2.69215; qual_mismatch_simple_bayesian[23][8] = -2.91605; qual_mismatch_simple_bayesian[23][9] = -3.13835; qual_mismatch_simple_bayesian[23][10] = -3.35868; qual_mismatch_simple_bayesian[23][11] = -3.57658; qual_mismatch_simple_bayesian[23][12] = -3.79149; qual_mismatch_simple_bayesian[23][13] = -4.00276; qual_mismatch_simple_bayesian[23][14] = -4.20961; qual_mismatch_simple_bayesian[23][15] = -4.41116; qual_mismatch_simple_bayesian[23][16] = -4.60641; qual_mismatch_simple_bayesian[23][17] = -4.79427; qual_mismatch_simple_bayesian[23][18] = -4.97359; qual_mismatch_simple_bayesian[23][19] = -5.1432; qual_mismatch_simple_bayesian[23][20] = -5.30199; qual_mismatch_simple_bayesian[23][21] = -5.44898; qual_mismatch_simple_bayesian[23][22] = -5.5834; qual_mismatch_simple_bayesian[23][23] = -5.70476; qual_mismatch_simple_bayesian[23][24] = -5.81289; qual_mismatch_simple_bayesian[23][25] = -5.90798; qual_mismatch_simple_bayesian[23][26] = -5.99054; qual_mismatch_simple_bayesian[23][27] = -6.06134; qual_mismatch_simple_bayesian[23][28] = -6.12139; qual_mismatch_simple_bayesian[23][29] = -6.17181; qual_mismatch_simple_bayesian[23][30] = -6.21374; qual_mismatch_simple_bayesian[23][31] = -6.24836; qual_mismatch_simple_bayesian[23][32] = -6.27673; qual_mismatch_simple_bayesian[23][33] = -6.29986; qual_mismatch_simple_bayesian[23][34] = -6.31861; qual_mismatch_simple_bayesian[23][35] = -6.33377; qual_mismatch_simple_bayesian[23][36] = -6.34597; qual_mismatch_simple_bayesian[23][37] = -6.35578; qual_mismatch_simple_bayesian[23][38] = -6.36363; qual_mismatch_simple_bayesian[23][39] = -6.36991; qual_mismatch_simple_bayesian[23][40] = -6.37493; qual_mismatch_simple_bayesian[23][41] = -6.37894; qual_mismatch_simple_bayesian[23][42] = -6.38213; qual_mismatch_simple_bayesian[23][43] = -6.38467; qual_mismatch_simple_bayesian[23][44] = -6.3867; qual_mismatch_simple_bayesian[23][45] = -6.38831; 
qual_mismatch_simple_bayesian[23][46] = -6.38959; qual_mismatch_simple_bayesian[24][0] = -1.09994; qual_mismatch_simple_bayesian[24][1] = -1.32917; qual_mismatch_simple_bayesian[24][2] = -1.55813; qual_mismatch_simple_bayesian[24][3] = -1.78676; qual_mismatch_simple_bayesian[24][4] = -2.01497; qual_mismatch_simple_bayesian[24][5] = -2.24265; qual_mismatch_simple_bayesian[24][6] = -2.46968; qual_mismatch_simple_bayesian[24][7] = -2.69588; qual_mismatch_simple_bayesian[24][8] = -2.92106; qual_mismatch_simple_bayesian[24][9] = -3.14496; qual_mismatch_simple_bayesian[24][10] = -3.36728; qual_mismatch_simple_bayesian[24][11] = -3.58762; qual_mismatch_simple_bayesian[24][12] = -3.80553; qual_mismatch_simple_bayesian[24][13] = -4.02047; qual_mismatch_simple_bayesian[24][14] = -4.23176; qual_mismatch_simple_bayesian[24][15] = -4.43864; qual_mismatch_simple_bayesian[24][16] = -4.64022; qual_mismatch_simple_bayesian[24][17] = -4.83552; qual_mismatch_simple_bayesian[24][18] = -5.02342; qual_mismatch_simple_bayesian[24][19] = -5.2028; qual_mismatch_simple_bayesian[24][20] = -5.37247; qual_mismatch_simple_bayesian[24][21] = -5.53133; qual_mismatch_simple_bayesian[24][22] = -5.67839; qual_mismatch_simple_bayesian[24][23] = -5.81289; qual_mismatch_simple_bayesian[24][24] = -5.93433; qual_mismatch_simple_bayesian[24][25] = -6.04254; qual_mismatch_simple_bayesian[24][26] = -6.1377; qual_mismatch_simple_bayesian[24][27] = -6.22033; qual_mismatch_simple_bayesian[24][28] = -6.29121; qual_mismatch_simple_bayesian[24][29] = -6.35132; qual_mismatch_simple_bayesian[24][30] = -6.40179; qual_mismatch_simple_bayesian[24][31] = -6.44377; qual_mismatch_simple_bayesian[24][32] = -6.47843; qual_mismatch_simple_bayesian[24][33] = -6.50683; qual_mismatch_simple_bayesian[24][34] = -6.52999; qual_mismatch_simple_bayesian[24][35] = -6.54877; qual_mismatch_simple_bayesian[24][36] = -6.56395; qual_mismatch_simple_bayesian[24][37] = -6.57617; qual_mismatch_simple_bayesian[24][38] = -6.58598; 
qual_mismatch_simple_bayesian[24][39] = -6.59385; qual_mismatch_simple_bayesian[24][40] = -6.60014; qual_mismatch_simple_bayesian[24][41] = -6.60516; qual_mismatch_simple_bayesian[24][42] = -6.60917; qual_mismatch_simple_bayesian[24][43] = -6.61237; qual_mismatch_simple_bayesian[24][44] = -6.61492; qual_mismatch_simple_bayesian[24][45] = -6.61695; qual_mismatch_simple_bayesian[24][46] = -6.61856; qual_mismatch_simple_bayesian[25][0] = -1.09967; qual_mismatch_simple_bayesian[25][1] = -1.32911; qual_mismatch_simple_bayesian[25][2] = -1.55833; qual_mismatch_simple_bayesian[25][3] = -1.7873; qual_mismatch_simple_bayesian[25][4] = -2.01593; qual_mismatch_simple_bayesian[25][5] = -2.24414; qual_mismatch_simple_bayesian[25][6] = -2.47183; qual_mismatch_simple_bayesian[25][7] = -2.69886; qual_mismatch_simple_bayesian[25][8] = -2.92507; qual_mismatch_simple_bayesian[25][9] = -3.15025; qual_mismatch_simple_bayesian[25][10] = -3.37416; qual_mismatch_simple_bayesian[25][11] = -3.59648; qual_mismatch_simple_bayesian[25][12] = -3.81683; qual_mismatch_simple_bayesian[25][13] = -4.03476; qual_mismatch_simple_bayesian[25][14] = -4.24971; qual_mismatch_simple_bayesian[25][15] = -4.46102; qual_mismatch_simple_bayesian[25][16] = -4.66792; qual_mismatch_simple_bayesian[25][17] = -4.86954; qual_mismatch_simple_bayesian[25][18] = -5.06486; qual_mismatch_simple_bayesian[25][19] = -5.25281; qual_mismatch_simple_bayesian[25][20] = -5.43222; qual_mismatch_simple_bayesian[25][21] = -5.60194; qual_mismatch_simple_bayesian[25][22] = -5.76086; qual_mismatch_simple_bayesian[25][23] = -5.90798; qual_mismatch_simple_bayesian[25][24] = -6.04254; qual_mismatch_simple_bayesian[25][25] = -6.16404; qual_mismatch_simple_bayesian[25][26] = -6.27231; qual_mismatch_simple_bayesian[25][27] = -6.36754; qual_mismatch_simple_bayesian[25][28] = -6.45023; qual_mismatch_simple_bayesian[25][29] = -6.52116; qual_mismatch_simple_bayesian[25][30] = -6.58132; qual_mismatch_simple_bayesian[25][31] = -6.63183; 
qual_mismatch_simple_bayesian[25][32] = -6.67385; qual_mismatch_simple_bayesian[25][33] = -6.70854; qual_mismatch_simple_bayesian[25][34] = -6.73697; qual_mismatch_simple_bayesian[25][35] = -6.76015; qual_mismatch_simple_bayesian[25][36] = -6.77895; qual_mismatch_simple_bayesian[25][37] = -6.79414; qual_mismatch_simple_bayesian[25][38] = -6.80637; qual_mismatch_simple_bayesian[25][39] = -6.8162; qual_mismatch_simple_bayesian[25][40] = -6.82407; qual_mismatch_simple_bayesian[25][41] = -6.83037; qual_mismatch_simple_bayesian[25][42] = -6.8354; qual_mismatch_simple_bayesian[25][43] = -6.83942; qual_mismatch_simple_bayesian[25][44] = -6.84262; qual_mismatch_simple_bayesian[25][45] = -6.84517; qual_mismatch_simple_bayesian[25][46] = -6.8472; qual_mismatch_simple_bayesian[26][0] = -1.09945; qual_mismatch_simple_bayesian[26][1] = -1.32906; qual_mismatch_simple_bayesian[26][2] = -1.5585; qual_mismatch_simple_bayesian[26][3] = -1.78773; qual_mismatch_simple_bayesian[26][4] = -2.01669; qual_mismatch_simple_bayesian[26][5] = -2.24532; qual_mismatch_simple_bayesian[26][6] = -2.47353; qual_mismatch_simple_bayesian[26][7] = -2.70122; qual_mismatch_simple_bayesian[26][8] = -2.92826; qual_mismatch_simple_bayesian[26][9] = -3.15447; qual_mismatch_simple_bayesian[26][10] = -3.37966; qual_mismatch_simple_bayesian[26][11] = -3.60357; qual_mismatch_simple_bayesian[26][12] = -3.8259; qual_mismatch_simple_bayesian[26][13] = -4.04626; qual_mismatch_simple_bayesian[26][14] = -4.2642; qual_mismatch_simple_bayesian[26][15] = -4.47916; qual_mismatch_simple_bayesian[26][16] = -4.69049; qual_mismatch_simple_bayesian[26][17] = -4.89741; qual_mismatch_simple_bayesian[26][18] = -5.09904; qual_mismatch_simple_bayesian[26][19] = -5.29439; qual_mismatch_simple_bayesian[26][20] = -5.48237; qual_mismatch_simple_bayesian[26][21] = -5.66182; qual_mismatch_simple_bayesian[26][22] = -5.83158; qual_mismatch_simple_bayesian[26][23] = -5.99054; qual_mismatch_simple_bayesian[26][24] = -6.1377; 
qual_mismatch_simple_bayesian[26][25] = -6.27231; qual_mismatch_simple_bayesian[26][26] = -6.39386; qual_mismatch_simple_bayesian[26][27] = -6.50219; qual_mismatch_simple_bayesian[26][28] = -6.59746; qual_mismatch_simple_bayesian[26][29] = -6.6802; qual_mismatch_simple_bayesian[26][30] = -6.75117; qual_mismatch_simple_bayesian[26][31] = -6.81137; qual_mismatch_simple_bayesian[26][32] = -6.86191; qual_mismatch_simple_bayesian[26][33] = -6.90396; qual_mismatch_simple_bayesian[26][34] = -6.93867; qual_mismatch_simple_bayesian[26][35] = -6.96713; qual_mismatch_simple_bayesian[26][36] = -6.99033; qual_mismatch_simple_bayesian[26][37] = -7.00914; qual_mismatch_simple_bayesian[26][38] = -7.02435; qual_mismatch_simple_bayesian[26][39] = -7.03659; qual_mismatch_simple_bayesian[26][40] = -7.04642; qual_mismatch_simple_bayesian[26][41] = -7.0543; qual_mismatch_simple_bayesian[26][42] = -7.06061; qual_mismatch_simple_bayesian[26][43] = -7.06564; qual_mismatch_simple_bayesian[26][44] = -7.06966; qual_mismatch_simple_bayesian[26][45] = -7.07286; qual_mismatch_simple_bayesian[26][46] = -7.07542; qual_mismatch_simple_bayesian[27][0] = -1.09928; qual_mismatch_simple_bayesian[27][1] = -1.32902; qual_mismatch_simple_bayesian[27][2] = -1.55863; qual_mismatch_simple_bayesian[27][3] = -1.78807; qual_mismatch_simple_bayesian[27][4] = -2.0173; qual_mismatch_simple_bayesian[27][5] = -2.24626; qual_mismatch_simple_bayesian[27][6] = -2.47489; qual_mismatch_simple_bayesian[27][7] = -2.70311; qual_mismatch_simple_bayesian[27][8] = -2.9308; qual_mismatch_simple_bayesian[27][9] = -3.15784; qual_mismatch_simple_bayesian[27][10] = -3.38405; qual_mismatch_simple_bayesian[27][11] = -3.60925; qual_mismatch_simple_bayesian[27][12] = -3.83316; qual_mismatch_simple_bayesian[27][13] = -4.0555; qual_mismatch_simple_bayesian[27][14] = -4.27586; qual_mismatch_simple_bayesian[27][15] = -4.49381; qual_mismatch_simple_bayesian[27][16] = -4.70878; qual_mismatch_simple_bayesian[27][17] = -4.92012; 
qual_mismatch_simple_bayesian[27][18] = -5.12706; qual_mismatch_simple_bayesian[27][19] = -5.32871; qual_mismatch_simple_bayesian[27][20] = -5.52408; qual_mismatch_simple_bayesian[27][21] = -5.71208; qual_mismatch_simple_bayesian[27][22] = -5.89155; qual_mismatch_simple_bayesian[27][23] = -6.06134; qual_mismatch_simple_bayesian[27][24] = -6.22033; qual_mismatch_simple_bayesian[27][25] = -6.36754; qual_mismatch_simple_bayesian[27][26] = -6.50219; qual_mismatch_simple_bayesian[27][27] = -6.62378; qual_mismatch_simple_bayesian[27][28] = -6.73214; qual_mismatch_simple_bayesian[27][29] = -6.82745; qual_mismatch_simple_bayesian[27][30] = -6.91022; qual_mismatch_simple_bayesian[27][31] = -6.98123; qual_mismatch_simple_bayesian[27][32] = -7.04146; qual_mismatch_simple_bayesian[27][33] = -7.09203; qual_mismatch_simple_bayesian[27][34] = -7.13411; qual_mismatch_simple_bayesian[27][35] = -7.16884; qual_mismatch_simple_bayesian[27][36] = -7.19731; qual_mismatch_simple_bayesian[27][37] = -7.22052; qual_mismatch_simple_bayesian[27][38] = -7.23935; qual_mismatch_simple_bayesian[27][39] = -7.25456; qual_mismatch_simple_bayesian[27][40] = -7.26682; qual_mismatch_simple_bayesian[27][41] = -7.27666; qual_mismatch_simple_bayesian[27][42] = -7.28454; qual_mismatch_simple_bayesian[27][43] = -7.29085; qual_mismatch_simple_bayesian[27][44] = -7.29589; qual_mismatch_simple_bayesian[27][45] = -7.29991; qual_mismatch_simple_bayesian[27][46] = -7.30311; qual_mismatch_simple_bayesian[28][0] = -1.09914; qual_mismatch_simple_bayesian[28][1] = -1.32899; qual_mismatch_simple_bayesian[28][2] = -1.55873; qual_mismatch_simple_bayesian[28][3] = -1.78834; qual_mismatch_simple_bayesian[28][4] = -2.01778; qual_mismatch_simple_bayesian[28][5] = -2.24701; qual_mismatch_simple_bayesian[28][6] = -2.47598; qual_mismatch_simple_bayesian[28][7] = -2.70461; qual_mismatch_simple_bayesian[28][8] = -2.93282; qual_mismatch_simple_bayesian[28][9] = -3.16052; qual_mismatch_simple_bayesian[28][10] = -3.38756; 
qual_mismatch_simple_bayesian[28][11] = -3.61377; qual_mismatch_simple_bayesian[28][12] = -3.83897; qual_mismatch_simple_bayesian[28][13] = -4.06289; qual_mismatch_simple_bayesian[28][14] = -4.28523; qual_mismatch_simple_bayesian[28][15] = -4.5056; qual_mismatch_simple_bayesian[28][16] = -4.72355; qual_mismatch_simple_bayesian[28][17] = -4.93853; qual_mismatch_simple_bayesian[28][18] = -5.14988; qual_mismatch_simple_bayesian[28][19] = -5.35683; qual_mismatch_simple_bayesian[28][20] = -5.55849; qual_mismatch_simple_bayesian[28][21] = -5.75388; qual_mismatch_simple_bayesian[28][22] = -5.9419; qual_mismatch_simple_bayesian[28][23] = -6.12139; qual_mismatch_simple_bayesian[28][24] = -6.29121; qual_mismatch_simple_bayesian[28][25] = -6.45023; qual_mismatch_simple_bayesian[28][26] = -6.59746; qual_mismatch_simple_bayesian[28][27] = -6.73214; qual_mismatch_simple_bayesian[28][28] = -6.85376; qual_mismatch_simple_bayesian[28][29] = -6.96216; qual_mismatch_simple_bayesian[28][30] = -7.0575; qual_mismatch_simple_bayesian[28][31] = -7.1403; qual_mismatch_simple_bayesian[28][32] = -7.21133; qual_mismatch_simple_bayesian[28][33] = -7.27159; qual_mismatch_simple_bayesian[28][34] = -7.32218; qual_mismatch_simple_bayesian[28][35] = -7.36428; qual_mismatch_simple_bayesian[28][36] = -7.39902; qual_mismatch_simple_bayesian[28][37] = -7.42751; qual_mismatch_simple_bayesian[28][38] = -7.45073; qual_mismatch_simple_bayesian[28][39] = -7.46957; qual_mismatch_simple_bayesian[28][40] = -7.48479; qual_mismatch_simple_bayesian[28][41] = -7.49705; qual_mismatch_simple_bayesian[28][42] = -7.50689; qual_mismatch_simple_bayesian[28][43] = -7.51478; qual_mismatch_simple_bayesian[28][44] = -7.52109; qual_mismatch_simple_bayesian[28][45] = -7.52614; qual_mismatch_simple_bayesian[28][46] = -7.53016; qual_mismatch_simple_bayesian[29][0] = -1.09903; qual_mismatch_simple_bayesian[29][1] = -1.32896; qual_mismatch_simple_bayesian[29][2] = -1.55881; qual_mismatch_simple_bayesian[29][3] = -1.78855; 
qual_mismatch_simple_bayesian[29][4] = -2.01816; qual_mismatch_simple_bayesian[29][5] = -2.2476; qual_mismatch_simple_bayesian[29][6] = -2.47684; qual_mismatch_simple_bayesian[29][7] = -2.7058; qual_mismatch_simple_bayesian[29][8] = -2.93444; qual_mismatch_simple_bayesian[29][9] = -3.16265; qual_mismatch_simple_bayesian[29][10] = -3.39035; qual_mismatch_simple_bayesian[29][11] = -3.61738; qual_mismatch_simple_bayesian[29][12] = -3.84361; qual_mismatch_simple_bayesian[29][13] = -4.0688; qual_mismatch_simple_bayesian[29][14] = -4.29273; qual_mismatch_simple_bayesian[29][15] = -4.51507; qual_mismatch_simple_bayesian[29][16] = -4.73544; qual_mismatch_simple_bayesian[29][17] = -4.9534; qual_mismatch_simple_bayesian[29][18] = -5.16839; qual_mismatch_simple_bayesian[29][19] = -5.37974; qual_mismatch_simple_bayesian[29][20] = -5.5867; qual_mismatch_simple_bayesian[29][21] = -5.78837; qual_mismatch_simple_bayesian[29][22] = -5.98377; qual_mismatch_simple_bayesian[29][23] = -6.17181; qual_mismatch_simple_bayesian[29][24] = -6.35132; qual_mismatch_simple_bayesian[29][25] = -6.52116; qual_mismatch_simple_bayesian[29][26] = -6.6802; qual_mismatch_simple_bayesian[29][27] = -6.82745; qual_mismatch_simple_bayesian[29][28] = -6.96216; qual_mismatch_simple_bayesian[29][29] = -7.0838; qual_mismatch_simple_bayesian[29][30] = -7.19222; qual_mismatch_simple_bayesian[29][31] = -7.28759; qual_mismatch_simple_bayesian[29][32] = -7.37041; qual_mismatch_simple_bayesian[29][33] = -7.44147; qual_mismatch_simple_bayesian[29][34] = -7.50174; qual_mismatch_simple_bayesian[29][35] = -7.55235; qual_mismatch_simple_bayesian[29][36] = -7.59446; qual_mismatch_simple_bayesian[29][37] = -7.62922; qual_mismatch_simple_bayesian[29][38] = -7.65772; qual_mismatch_simple_bayesian[29][39] = -7.68095; qual_mismatch_simple_bayesian[29][40] = -7.6998; qual_mismatch_simple_bayesian[29][41] = -7.71502; qual_mismatch_simple_bayesian[29][42] = -7.72729; qual_mismatch_simple_bayesian[29][43] = -7.73713; 
qual_mismatch_simple_bayesian[29][44] = -7.74503; qual_mismatch_simple_bayesian[29][45] = -7.75134; qual_mismatch_simple_bayesian[29][46] = -7.75639; qual_mismatch_simple_bayesian[30][0] = -1.09895; qual_mismatch_simple_bayesian[30][1] = -1.32895; qual_mismatch_simple_bayesian[30][2] = -1.55888; qual_mismatch_simple_bayesian[30][3] = -1.78873; qual_mismatch_simple_bayesian[30][4] = -2.01847; qual_mismatch_simple_bayesian[30][5] = -2.24808; qual_mismatch_simple_bayesian[30][6] = -2.47752; qual_mismatch_simple_bayesian[30][7] = -2.70675; qual_mismatch_simple_bayesian[30][8] = -2.93572; qual_mismatch_simple_bayesian[30][9] = -3.16435; qual_mismatch_simple_bayesian[30][10] = -3.39257; qual_mismatch_simple_bayesian[30][11] = -3.62026; qual_mismatch_simple_bayesian[30][12] = -3.8473; qual_mismatch_simple_bayesian[30][13] = -4.07352; qual_mismatch_simple_bayesian[30][14] = -4.29872; qual_mismatch_simple_bayesian[30][15] = -4.52265; qual_mismatch_simple_bayesian[30][16] = -4.74499; qual_mismatch_simple_bayesian[30][17] = -4.96537; qual_mismatch_simple_bayesian[30][18] = -5.18334; qual_mismatch_simple_bayesian[30][19] = -5.39832; qual_mismatch_simple_bayesian[30][20] = -5.60969; qual_mismatch_simple_bayesian[30][21] = -5.81665; qual_mismatch_simple_bayesian[30][22] = -6.01833; qual_mismatch_simple_bayesian[30][23] = -6.21374; qual_mismatch_simple_bayesian[30][24] = -6.40179; qual_mismatch_simple_bayesian[30][25] = -6.58132; qual_mismatch_simple_bayesian[30][26] = -6.75117; qual_mismatch_simple_bayesian[30][27] = -6.91022; qual_mismatch_simple_bayesian[30][28] = -7.0575; qual_mismatch_simple_bayesian[30][29] = -7.19222; qual_mismatch_simple_bayesian[30][30] = -7.31389; qual_mismatch_simple_bayesian[30][31] = -7.42233; qual_mismatch_simple_bayesian[30][32] = -7.51772; qual_mismatch_simple_bayesian[30][33] = -7.60056; qual_mismatch_simple_bayesian[30][34] = -7.67163; qual_mismatch_simple_bayesian[30][35] = -7.73192; qual_mismatch_simple_bayesian[30][36] = -7.78254; 
qual_mismatch_simple_bayesian[30][37] = -7.82466; qual_mismatch_simple_bayesian[30][38] = -7.85943; qual_mismatch_simple_bayesian[30][39] = -7.88794; qual_mismatch_simple_bayesian[30][40] = -7.91118; qual_mismatch_simple_bayesian[30][41] = -7.93003; qual_mismatch_simple_bayesian[30][42] = -7.94526; qual_mismatch_simple_bayesian[30][43] = -7.95753; qual_mismatch_simple_bayesian[30][44] = -7.96738; qual_mismatch_simple_bayesian[30][45] = -7.97528; qual_mismatch_simple_bayesian[30][46] = -7.98159; qual_mismatch_simple_bayesian[31][0] = -1.09888; qual_mismatch_simple_bayesian[31][1] = -1.32893; qual_mismatch_simple_bayesian[31][2] = -1.55893; qual_mismatch_simple_bayesian[31][3] = -1.78886; qual_mismatch_simple_bayesian[31][4] = -2.01871; qual_mismatch_simple_bayesian[31][5] = -2.24845; qual_mismatch_simple_bayesian[31][6] = -2.47806; qual_mismatch_simple_bayesian[31][7] = -2.7075; qual_mismatch_simple_bayesian[31][8] = -2.93674; qual_mismatch_simple_bayesian[31][9] = -3.1657; qual_mismatch_simple_bayesian[31][10] = -3.39434; qual_mismatch_simple_bayesian[31][11] = -3.62255; qual_mismatch_simple_bayesian[31][12] = -3.85025; qual_mismatch_simple_bayesian[31][13] = -4.07729; qual_mismatch_simple_bayesian[31][14] = -4.30351; qual_mismatch_simple_bayesian[31][15] = -4.52872; qual_mismatch_simple_bayesian[31][16] = -4.75264; qual_mismatch_simple_bayesian[31][17] = -4.97499; qual_mismatch_simple_bayesian[31][18] = -5.19537; qual_mismatch_simple_bayesian[31][19] = -5.41334; qual_mismatch_simple_bayesian[31][20] = -5.62833; qual_mismatch_simple_bayesian[31][21] = -5.83969; qual_mismatch_simple_bayesian[31][22] = -6.04666; qual_mismatch_simple_bayesian[31][23] = -6.24836; qual_mismatch_simple_bayesian[31][24] = -6.44377; qual_mismatch_simple_bayesian[31][25] = -6.63183; qual_mismatch_simple_bayesian[31][26] = -6.81137; qual_mismatch_simple_bayesian[31][27] = -6.98123; qual_mismatch_simple_bayesian[31][28] = -7.1403; qual_mismatch_simple_bayesian[31][29] = -7.28759; 
qual_mismatch_simple_bayesian[31][30] = -7.42233; qual_mismatch_simple_bayesian[31][31] = -7.54401; qual_mismatch_simple_bayesian[31][32] = -7.65246; qual_mismatch_simple_bayesian[31][33] = -7.74787; qual_mismatch_simple_bayesian[31][34] = -7.83072; qual_mismatch_simple_bayesian[31][35] = -7.90181; qual_mismatch_simple_bayesian[31][36] = -7.96211; qual_mismatch_simple_bayesian[31][37] = -8.01274; qual_mismatch_simple_bayesian[31][38] = -8.05488; qual_mismatch_simple_bayesian[31][39] = -8.08965; qual_mismatch_simple_bayesian[31][40] = -8.11817; qual_mismatch_simple_bayesian[31][41] = -8.14141; qual_mismatch_simple_bayesian[31][42] = -8.16027; qual_mismatch_simple_bayesian[31][43] = -8.1755; qual_mismatch_simple_bayesian[31][44] = -8.18777; qual_mismatch_simple_bayesian[31][45] = -8.19763; qual_mismatch_simple_bayesian[31][46] = -8.20553; qual_mismatch_simple_bayesian[32][0] = -1.09882; qual_mismatch_simple_bayesian[32][1] = -1.32892; qual_mismatch_simple_bayesian[32][2] = -1.55897; qual_mismatch_simple_bayesian[32][3] = -1.78897; qual_mismatch_simple_bayesian[32][4] = -2.0189; qual_mismatch_simple_bayesian[32][5] = -2.24875; qual_mismatch_simple_bayesian[32][6] = -2.47849; qual_mismatch_simple_bayesian[32][7] = -2.7081; qual_mismatch_simple_bayesian[32][8] = -2.93755; qual_mismatch_simple_bayesian[32][9] = -3.16678; qual_mismatch_simple_bayesian[32][10] = -3.39574; qual_mismatch_simple_bayesian[32][11] = -3.62438; qual_mismatch_simple_bayesian[32][12] = -3.8526; qual_mismatch_simple_bayesian[32][13] = -4.08029; qual_mismatch_simple_bayesian[32][14] = -4.30734; qual_mismatch_simple_bayesian[32][15] = -4.53356; qual_mismatch_simple_bayesian[32][16] = -4.75876; qual_mismatch_simple_bayesian[32][17] = -4.98269; qual_mismatch_simple_bayesian[32][18] = -5.20504; qual_mismatch_simple_bayesian[32][19] = -5.42542; qual_mismatch_simple_bayesian[32][20] = -5.64339; qual_mismatch_simple_bayesian[32][21] = -5.85838; qual_mismatch_simple_bayesian[32][22] = -6.06975; 
qual_mismatch_simple_bayesian[32][23] = -6.27673; qual_mismatch_simple_bayesian[32][24] = -6.47843; qual_mismatch_simple_bayesian[32][25] = -6.67385; qual_mismatch_simple_bayesian[32][26] = -6.86191; qual_mismatch_simple_bayesian[32][27] = -7.04146; qual_mismatch_simple_bayesian[32][28] = -7.21133; qual_mismatch_simple_bayesian[32][29] = -7.37041; qual_mismatch_simple_bayesian[32][30] = -7.51772; qual_mismatch_simple_bayesian[32][31] = -7.65246; qual_mismatch_simple_bayesian[32][32] = -7.77416; qual_mismatch_simple_bayesian[32][33] = -7.88263; qual_mismatch_simple_bayesian[32][34] = -7.97804; qual_mismatch_simple_bayesian[32][35] = -8.06091; qual_mismatch_simple_bayesian[32][36] = -8.132; qual_mismatch_simple_bayesian[32][37] = -8.19232; qual_mismatch_simple_bayesian[32][38] = -8.24296; qual_mismatch_simple_bayesian[32][39] = -8.2851; qual_mismatch_simple_bayesian[32][40] = -8.31988; qual_mismatch_simple_bayesian[32][41] = -8.3484; qual_mismatch_simple_bayesian[32][42] = -8.37165; qual_mismatch_simple_bayesian[32][43] = -8.39051; qual_mismatch_simple_bayesian[32][44] = -8.40575; qual_mismatch_simple_bayesian[32][45] = -8.41802; qual_mismatch_simple_bayesian[32][46] = -8.42788; qual_mismatch_simple_bayesian[33][0] = -1.09878; qual_mismatch_simple_bayesian[33][1] = -1.32891; qual_mismatch_simple_bayesian[33][2] = -1.559; qual_mismatch_simple_bayesian[33][3] = -1.78906; qual_mismatch_simple_bayesian[33][4] = -2.01906; qual_mismatch_simple_bayesian[33][5] = -2.24899; qual_mismatch_simple_bayesian[33][6] = -2.47884; qual_mismatch_simple_bayesian[33][7] = -2.70858; qual_mismatch_simple_bayesian[33][8] = -2.93819; qual_mismatch_simple_bayesian[33][9] = -3.16763; qual_mismatch_simple_bayesian[33][10] = -3.39686; qual_mismatch_simple_bayesian[33][11] = -3.62583; qual_mismatch_simple_bayesian[33][12] = -3.85447; qual_mismatch_simple_bayesian[33][13] = -4.08268; qual_mismatch_simple_bayesian[33][14] = -4.31038; qual_mismatch_simple_bayesian[33][15] = -4.53742; 
qual_mismatch_simple_bayesian[33][16] = -4.76365; qual_mismatch_simple_bayesian[33][17] = -4.98885; qual_mismatch_simple_bayesian[33][18] = -5.21278; qual_mismatch_simple_bayesian[33][19] = -5.43513; qual_mismatch_simple_bayesian[33][20] = -5.65552; qual_mismatch_simple_bayesian[33][21] = -5.87348; qual_mismatch_simple_bayesian[33][22] = -6.08848; qual_mismatch_simple_bayesian[33][23] = -6.29986; qual_mismatch_simple_bayesian[33][24] = -6.50683; qual_mismatch_simple_bayesian[33][25] = -6.70854; qual_mismatch_simple_bayesian[33][26] = -6.90396; qual_mismatch_simple_bayesian[33][27] = -7.09203; qual_mismatch_simple_bayesian[33][28] = -7.27159; qual_mismatch_simple_bayesian[33][29] = -7.44147; qual_mismatch_simple_bayesian[33][30] = -7.60056; qual_mismatch_simple_bayesian[33][31] = -7.74787; qual_mismatch_simple_bayesian[33][32] = -7.88263; qual_mismatch_simple_bayesian[33][33] = -8.00433; qual_mismatch_simple_bayesian[33][34] = -8.11281; qual_mismatch_simple_bayesian[33][35] = -8.20823; qual_mismatch_simple_bayesian[33][36] = -8.29111; qual_mismatch_simple_bayesian[33][37] = -8.36221; qual_mismatch_simple_bayesian[33][38] = -8.42253; qual_mismatch_simple_bayesian[33][39] = -8.47318; qual_mismatch_simple_bayesian[33][40] = -8.51533; qual_mismatch_simple_bayesian[33][41] = -8.55012; qual_mismatch_simple_bayesian[33][42] = -8.57864; qual_mismatch_simple_bayesian[33][43] = -8.60189; qual_mismatch_simple_bayesian[33][44] = -8.62076; qual_mismatch_simple_bayesian[33][45] = -8.636; qual_mismatch_simple_bayesian[33][46] = -8.64827; qual_mismatch_simple_bayesian[34][0] = -1.09874; qual_mismatch_simple_bayesian[34][1] = -1.3289; qual_mismatch_simple_bayesian[34][2] = -1.55903; qual_mismatch_simple_bayesian[34][3] = -1.78912; qual_mismatch_simple_bayesian[34][4] = -2.01918; qual_mismatch_simple_bayesian[34][5] = -2.24918; qual_mismatch_simple_bayesian[34][6] = -2.47911; qual_mismatch_simple_bayesian[34][7] = -2.70896; qual_mismatch_simple_bayesian[34][8] = -2.9387; 
qual_mismatch_simple_bayesian[34][9] = -3.16831; qual_mismatch_simple_bayesian[34][10] = -3.39775; qual_mismatch_simple_bayesian[34][11] = -3.62698; qual_mismatch_simple_bayesian[34][12] = -3.85595; qual_mismatch_simple_bayesian[34][13] = -4.08459; qual_mismatch_simple_bayesian[34][14] = -4.31281; qual_mismatch_simple_bayesian[34][15] = -4.5405; qual_mismatch_simple_bayesian[34][16] = -4.76755; qual_mismatch_simple_bayesian[34][17] = -4.99377; qual_mismatch_simple_bayesian[34][18] = -5.21897; qual_mismatch_simple_bayesian[34][19] = -5.44291; qual_mismatch_simple_bayesian[34][20] = -5.66525; qual_mismatch_simple_bayesian[34][21] = -5.88564; qual_mismatch_simple_bayesian[34][22] = -6.10361; qual_mismatch_simple_bayesian[34][23] = -6.31861; qual_mismatch_simple_bayesian[34][24] = -6.52999; qual_mismatch_simple_bayesian[34][25] = -6.73697; qual_mismatch_simple_bayesian[34][26] = -6.93867; qual_mismatch_simple_bayesian[34][27] = -7.13411; qual_mismatch_simple_bayesian[34][28] = -7.32218; qual_mismatch_simple_bayesian[34][29] = -7.50174; qual_mismatch_simple_bayesian[34][30] = -7.67163; qual_mismatch_simple_bayesian[34][31] = -7.83072; qual_mismatch_simple_bayesian[34][32] = -7.97804; qual_mismatch_simple_bayesian[34][33] = -8.11281; qual_mismatch_simple_bayesian[34][34] = -8.23452; qual_mismatch_simple_bayesian[34][35] = -8.34301; qual_mismatch_simple_bayesian[34][36] = -8.43844; qual_mismatch_simple_bayesian[34][37] = -8.52132; qual_mismatch_simple_bayesian[34][38] = -8.59243; qual_mismatch_simple_bayesian[34][39] = -8.65276; qual_mismatch_simple_bayesian[34][40] = -8.70341; qual_mismatch_simple_bayesian[34][41] = -8.74556; qual_mismatch_simple_bayesian[34][42] = -8.78036; qual_mismatch_simple_bayesian[34][43] = -8.80888; qual_mismatch_simple_bayesian[34][44] = -8.83214; qual_mismatch_simple_bayesian[34][45] = -8.851; qual_mismatch_simple_bayesian[34][46] = -8.86625; qual_mismatch_simple_bayesian[35][0] = -1.09872; qual_mismatch_simple_bayesian[35][1] = -1.32889; 
qual_mismatch_simple_bayesian[35][2] = -1.55905; qual_mismatch_simple_bayesian[35][3] = -1.78918; qual_mismatch_simple_bayesian[35][4] = -2.01927; qual_mismatch_simple_bayesian[35][5] = -2.24933; qual_mismatch_simple_bayesian[35][6] = -2.47933; qual_mismatch_simple_bayesian[35][7] = -2.70926; qual_mismatch_simple_bayesian[35][8] = -2.93911; qual_mismatch_simple_bayesian[35][9] = -3.16885; qual_mismatch_simple_bayesian[35][10] = -3.39846; qual_mismatch_simple_bayesian[35][11] = -3.6279; qual_mismatch_simple_bayesian[35][12] = -3.85713; qual_mismatch_simple_bayesian[35][13] = -4.0861; qual_mismatch_simple_bayesian[35][14] = -4.31474; qual_mismatch_simple_bayesian[35][15] = -4.54296; qual_mismatch_simple_bayesian[35][16] = -4.77065; qual_mismatch_simple_bayesian[35][17] = -4.9977; qual_mismatch_simple_bayesian[35][18] = -5.22392; qual_mismatch_simple_bayesian[35][19] = -5.44913; qual_mismatch_simple_bayesian[35][20] = -5.67306; qual_mismatch_simple_bayesian[35][21] = -5.89541; qual_mismatch_simple_bayesian[35][22] = -6.1158; qual_mismatch_simple_bayesian[35][23] = -6.33377; qual_mismatch_simple_bayesian[35][24] = -6.54877; qual_mismatch_simple_bayesian[35][25] = -6.76015; qual_mismatch_simple_bayesian[35][26] = -6.96713; qual_mismatch_simple_bayesian[35][27] = -7.16884; qual_mismatch_simple_bayesian[35][28] = -7.36428; qual_mismatch_simple_bayesian[35][29] = -7.55235; qual_mismatch_simple_bayesian[35][30] = -7.73192; qual_mismatch_simple_bayesian[35][31] = -7.90181; qual_mismatch_simple_bayesian[35][32] = -8.06091; qual_mismatch_simple_bayesian[35][33] = -8.20823; qual_mismatch_simple_bayesian[35][34] = -8.34301; qual_mismatch_simple_bayesian[35][35] = -8.46472; qual_mismatch_simple_bayesian[35][36] = -8.57322; qual_mismatch_simple_bayesian[35][37] = -8.66866; qual_mismatch_simple_bayesian[35][38] = -8.75154; qual_mismatch_simple_bayesian[35][39] = -8.82266; qual_mismatch_simple_bayesian[35][40] = -8.88299; qual_mismatch_simple_bayesian[35][41] = -8.93365; 
qual_mismatch_simple_bayesian[35][42] = -8.9758; qual_mismatch_simple_bayesian[35][43] = -9.0106; qual_mismatch_simple_bayesian[35][44] = -9.03913; qual_mismatch_simple_bayesian[35][45] = -9.06239; qual_mismatch_simple_bayesian[35][46] = -9.08126; qual_mismatch_simple_bayesian[36][0] = -1.0987; qual_mismatch_simple_bayesian[36][1] = -1.32889; qual_mismatch_simple_bayesian[36][2] = -1.55907; qual_mismatch_simple_bayesian[36][3] = -1.78922; qual_mismatch_simple_bayesian[36][4] = -2.01935; qual_mismatch_simple_bayesian[36][5] = -2.24945; qual_mismatch_simple_bayesian[36][6] = -2.4795; qual_mismatch_simple_bayesian[36][7] = -2.7095; qual_mismatch_simple_bayesian[36][8] = -2.93943; qual_mismatch_simple_bayesian[36][9] = -3.16928; qual_mismatch_simple_bayesian[36][10] = -3.39902; qual_mismatch_simple_bayesian[36][11] = -3.62863; qual_mismatch_simple_bayesian[36][12] = -3.85807; qual_mismatch_simple_bayesian[36][13] = -4.08731; qual_mismatch_simple_bayesian[36][14] = -4.31627; qual_mismatch_simple_bayesian[36][15] = -4.54491; qual_mismatch_simple_bayesian[36][16] = -4.77313; qual_mismatch_simple_bayesian[36][17] = -5.00083; qual_mismatch_simple_bayesian[36][18] = -5.22787; qual_mismatch_simple_bayesian[36][19] = -5.4541; qual_mismatch_simple_bayesian[36][20] = -5.6793; qual_mismatch_simple_bayesian[36][21] = -5.90323; qual_mismatch_simple_bayesian[36][22] = -6.12558; qual_mismatch_simple_bayesian[36][23] = -6.34597; qual_mismatch_simple_bayesian[36][24] = -6.56395; qual_mismatch_simple_bayesian[36][25] = -6.77895; qual_mismatch_simple_bayesian[36][26] = -6.99033; qual_mismatch_simple_bayesian[36][27] = -7.19731; qual_mismatch_simple_bayesian[36][28] = -7.39902; qual_mismatch_simple_bayesian[36][29] = -7.59446; qual_mismatch_simple_bayesian[36][30] = -7.78254; qual_mismatch_simple_bayesian[36][31] = -7.96211; qual_mismatch_simple_bayesian[36][32] = -8.132; qual_mismatch_simple_bayesian[36][33] = -8.29111; qual_mismatch_simple_bayesian[36][34] = -8.43844; 
qual_mismatch_simple_bayesian[36][35] = -8.57322; qual_mismatch_simple_bayesian[36][36] = -8.69494; qual_mismatch_simple_bayesian[36][37] = -8.80344; qual_mismatch_simple_bayesian[36][38] = -8.89888; qual_mismatch_simple_bayesian[36][39] = -8.98177; qual_mismatch_simple_bayesian[36][40] = -9.05289; qual_mismatch_simple_bayesian[36][41] = -9.11323; qual_mismatch_simple_bayesian[36][42] = -9.16389; qual_mismatch_simple_bayesian[36][43] = -9.20605; qual_mismatch_simple_bayesian[36][44] = -9.24085; qual_mismatch_simple_bayesian[36][45] = -9.26938; qual_mismatch_simple_bayesian[36][46] = -9.29264; qual_mismatch_simple_bayesian[37][0] = -1.09868; qual_mismatch_simple_bayesian[37][1] = -1.32889; qual_mismatch_simple_bayesian[37][2] = -1.55908; qual_mismatch_simple_bayesian[37][3] = -1.78926; qual_mismatch_simple_bayesian[37][4] = -2.01941; qual_mismatch_simple_bayesian[37][5] = -2.24954; qual_mismatch_simple_bayesian[37][6] = -2.47964; qual_mismatch_simple_bayesian[37][7] = -2.70969; qual_mismatch_simple_bayesian[37][8] = -2.93969; qual_mismatch_simple_bayesian[37][9] = -3.16962; qual_mismatch_simple_bayesian[37][10] = -3.39947; qual_mismatch_simple_bayesian[37][11] = -3.62921; qual_mismatch_simple_bayesian[37][12] = -3.85882; qual_mismatch_simple_bayesian[37][13] = -4.08826; qual_mismatch_simple_bayesian[37][14] = -4.3175; qual_mismatch_simple_bayesian[37][15] = -4.54646; qual_mismatch_simple_bayesian[37][16] = -4.7751; qual_mismatch_simple_bayesian[37][17] = -5.00332; qual_mismatch_simple_bayesian[37][18] = -5.23102; qual_mismatch_simple_bayesian[37][19] = -5.45806; qual_mismatch_simple_bayesian[37][20] = -5.68429; qual_mismatch_simple_bayesian[37][21] = -5.90949; qual_mismatch_simple_bayesian[37][22] = -6.13342; qual_mismatch_simple_bayesian[37][23] = -6.35578; qual_mismatch_simple_bayesian[37][24] = -6.57617; qual_mismatch_simple_bayesian[37][25] = -6.79414; qual_mismatch_simple_bayesian[37][26] = -7.00914; qual_mismatch_simple_bayesian[37][27] = -7.22052; 
qual_mismatch_simple_bayesian[37][28] = -7.42751; qual_mismatch_simple_bayesian[37][29] = -7.62922; qual_mismatch_simple_bayesian[37][30] = -7.82466; qual_mismatch_simple_bayesian[37][31] = -8.01274; qual_mismatch_simple_bayesian[37][32] = -8.19232; qual_mismatch_simple_bayesian[37][33] = -8.36221; qual_mismatch_simple_bayesian[37][34] = -8.52132; qual_mismatch_simple_bayesian[37][35] = -8.66866; qual_mismatch_simple_bayesian[37][36] = -8.80344; qual_mismatch_simple_bayesian[37][37] = -8.92516; qual_mismatch_simple_bayesian[37][38] = -9.03366; qual_mismatch_simple_bayesian[37][39] = -9.12911; qual_mismatch_simple_bayesian[37][40] = -9.21201; qual_mismatch_simple_bayesian[37][41] = -9.28313; qual_mismatch_simple_bayesian[37][42] = -9.34347; qual_mismatch_simple_bayesian[37][43] = -9.39414; qual_mismatch_simple_bayesian[37][44] = -9.43629; qual_mismatch_simple_bayesian[37][45] = -9.4711; qual_mismatch_simple_bayesian[37][46] = -9.49963; qual_mismatch_simple_bayesian[38][0] = -1.09867; qual_mismatch_simple_bayesian[38][1] = -1.32888; qual_mismatch_simple_bayesian[38][2] = -1.55909; qual_mismatch_simple_bayesian[38][3] = -1.78928; qual_mismatch_simple_bayesian[38][4] = -2.01946; qual_mismatch_simple_bayesian[38][5] = -2.24962; qual_mismatch_simple_bayesian[38][6] = -2.47974; qual_mismatch_simple_bayesian[38][7] = -2.70984; qual_mismatch_simple_bayesian[38][8] = -2.93989; qual_mismatch_simple_bayesian[38][9] = -3.16989; qual_mismatch_simple_bayesian[38][10] = -3.39982; qual_mismatch_simple_bayesian[38][11] = -3.62967; qual_mismatch_simple_bayesian[38][12] = -3.85942; qual_mismatch_simple_bayesian[38][13] = -4.08903; qual_mismatch_simple_bayesian[38][14] = -4.31847; qual_mismatch_simple_bayesian[38][15] = -4.5477; qual_mismatch_simple_bayesian[38][16] = -4.77667; qual_mismatch_simple_bayesian[38][17] = -5.0053; qual_mismatch_simple_bayesian[38][18] = -5.23352; qual_mismatch_simple_bayesian[38][19] = -5.46122; qual_mismatch_simple_bayesian[38][20] = -5.68827; 
qual_mismatch_simple_bayesian[38][21] = -5.91449; qual_mismatch_simple_bayesian[38][22] = -6.1397; qual_mismatch_simple_bayesian[38][23] = -6.36363; qual_mismatch_simple_bayesian[38][24] = -6.58598; qual_mismatch_simple_bayesian[38][25] = -6.80637; qual_mismatch_simple_bayesian[38][26] = -7.02435; qual_mismatch_simple_bayesian[38][27] = -7.23935; qual_mismatch_simple_bayesian[38][28] = -7.45073; qual_mismatch_simple_bayesian[38][29] = -7.65772; qual_mismatch_simple_bayesian[38][30] = -7.85943; qual_mismatch_simple_bayesian[38][31] = -8.05488; qual_mismatch_simple_bayesian[38][32] = -8.24296; qual_mismatch_simple_bayesian[38][33] = -8.42253; qual_mismatch_simple_bayesian[38][34] = -8.59243; qual_mismatch_simple_bayesian[38][35] = -8.75154; qual_mismatch_simple_bayesian[38][36] = -8.89888; qual_mismatch_simple_bayesian[38][37] = -9.03366; qual_mismatch_simple_bayesian[38][38] = -9.15539; qual_mismatch_simple_bayesian[38][39] = -9.2639; qual_mismatch_simple_bayesian[38][40] = -9.35935; qual_mismatch_simple_bayesian[38][41] = -9.44225; qual_mismatch_simple_bayesian[38][42] = -9.51338; qual_mismatch_simple_bayesian[38][43] = -9.57372; qual_mismatch_simple_bayesian[38][44] = -9.62438; qual_mismatch_simple_bayesian[38][45] = -9.66654; qual_mismatch_simple_bayesian[38][46] = -9.70135; qual_mismatch_simple_bayesian[39][0] = -1.09865; qual_mismatch_simple_bayesian[39][1] = -1.32888; qual_mismatch_simple_bayesian[39][2] = -1.5591; qual_mismatch_simple_bayesian[39][3] = -1.7893; qual_mismatch_simple_bayesian[39][4] = -2.0195; qual_mismatch_simple_bayesian[39][5] = -2.24967; qual_mismatch_simple_bayesian[39][6] = -2.47983; qual_mismatch_simple_bayesian[39][7] = -2.70996; qual_mismatch_simple_bayesian[39][8] = -2.94005; qual_mismatch_simple_bayesian[39][9] = -3.17011; qual_mismatch_simple_bayesian[39][10] = -3.40011; qual_mismatch_simple_bayesian[39][11] = -3.63004; qual_mismatch_simple_bayesian[39][12] = -3.85989; qual_mismatch_simple_bayesian[39][13] = -4.08963; 
qual_mismatch_simple_bayesian[39][14] = -4.31924; qual_mismatch_simple_bayesian[39][15] = -4.54868; qual_mismatch_simple_bayesian[39][16] = -4.77792; qual_mismatch_simple_bayesian[39][17] = -5.00688; qual_mismatch_simple_bayesian[39][18] = -5.23552; qual_mismatch_simple_bayesian[39][19] = -5.46374; qual_mismatch_simple_bayesian[39][20] = -5.69144; qual_mismatch_simple_bayesian[39][21] = -5.91848; qual_mismatch_simple_bayesian[39][22] = -6.14471; qual_mismatch_simple_bayesian[39][23] = -6.36991; qual_mismatch_simple_bayesian[39][24] = -6.59385; qual_mismatch_simple_bayesian[39][25] = -6.8162; qual_mismatch_simple_bayesian[39][26] = -7.03659; qual_mismatch_simple_bayesian[39][27] = -7.25456; qual_mismatch_simple_bayesian[39][28] = -7.46957; qual_mismatch_simple_bayesian[39][29] = -7.68095; qual_mismatch_simple_bayesian[39][30] = -7.88794; qual_mismatch_simple_bayesian[39][31] = -8.08965; qual_mismatch_simple_bayesian[39][32] = -8.2851; qual_mismatch_simple_bayesian[39][33] = -8.47318; qual_mismatch_simple_bayesian[39][34] = -8.65276; qual_mismatch_simple_bayesian[39][35] = -8.82266; qual_mismatch_simple_bayesian[39][36] = -8.98177; qual_mismatch_simple_bayesian[39][37] = -9.12911; qual_mismatch_simple_bayesian[39][38] = -9.2639; qual_mismatch_simple_bayesian[39][39] = -9.38563; qual_mismatch_simple_bayesian[39][40] = -9.49414; qual_mismatch_simple_bayesian[39][41] = -9.58959; qual_mismatch_simple_bayesian[39][42] = -9.67249; qual_mismatch_simple_bayesian[39][43] = -9.74362; qual_mismatch_simple_bayesian[39][44] = -9.80396; qual_mismatch_simple_bayesian[39][45] = -9.85463; qual_mismatch_simple_bayesian[39][46] = -9.8968; qual_mismatch_simple_bayesian[40][0] = -1.09865; qual_mismatch_simple_bayesian[40][1] = -1.32888; qual_mismatch_simple_bayesian[40][2] = -1.5591; qual_mismatch_simple_bayesian[40][3] = -1.78932; qual_mismatch_simple_bayesian[40][4] = -2.01953; qual_mismatch_simple_bayesian[40][5] = -2.24972; qual_mismatch_simple_bayesian[40][6] = -2.4799; 
qual_mismatch_simple_bayesian[40][7] = -2.71005; qual_mismatch_simple_bayesian[40][8] = -2.94018; qual_mismatch_simple_bayesian[40][9] = -3.17028; qual_mismatch_simple_bayesian[40][10] = -3.40033; qual_mismatch_simple_bayesian[40][11] = -3.63033; qual_mismatch_simple_bayesian[40][12] = -3.86026; qual_mismatch_simple_bayesian[40][13] = -4.09011; qual_mismatch_simple_bayesian[40][14] = -4.31986; qual_mismatch_simple_bayesian[40][15] = -4.54947; qual_mismatch_simple_bayesian[40][16] = -4.77891; qual_mismatch_simple_bayesian[40][17] = -5.00814; qual_mismatch_simple_bayesian[40][18] = -5.23711; qual_mismatch_simple_bayesian[40][19] = -5.46574; qual_mismatch_simple_bayesian[40][20] = -5.69396; qual_mismatch_simple_bayesian[40][21] = -5.92166; qual_mismatch_simple_bayesian[40][22] = -6.14871; qual_mismatch_simple_bayesian[40][23] = -6.37493; qual_mismatch_simple_bayesian[40][24] = -6.60014; qual_mismatch_simple_bayesian[40][25] = -6.82407; qual_mismatch_simple_bayesian[40][26] = -7.04642; qual_mismatch_simple_bayesian[40][27] = -7.26682; qual_mismatch_simple_bayesian[40][28] = -7.48479; qual_mismatch_simple_bayesian[40][29] = -7.6998; qual_mismatch_simple_bayesian[40][30] = -7.91118; qual_mismatch_simple_bayesian[40][31] = -8.11817; qual_mismatch_simple_bayesian[40][32] = -8.31988; qual_mismatch_simple_bayesian[40][33] = -8.51533; qual_mismatch_simple_bayesian[40][34] = -8.70341; qual_mismatch_simple_bayesian[40][35] = -8.88299; qual_mismatch_simple_bayesian[40][36] = -9.05289; qual_mismatch_simple_bayesian[40][37] = -9.21201; qual_mismatch_simple_bayesian[40][38] = -9.35935; qual_mismatch_simple_bayesian[40][39] = -9.49414; qual_mismatch_simple_bayesian[40][40] = -9.61587; qual_mismatch_simple_bayesian[40][41] = -9.72438; qual_mismatch_simple_bayesian[40][42] = -9.81984; qual_mismatch_simple_bayesian[40][43] = -9.90274; qual_mismatch_simple_bayesian[40][44] = -9.97387; qual_mismatch_simple_bayesian[40][45] = -10.0342; qual_mismatch_simple_bayesian[40][46] = -10.0849; 
qual_mismatch_simple_bayesian[41][0] = -1.09864; qual_mismatch_simple_bayesian[41][1] = -1.32888; qual_mismatch_simple_bayesian[41][2] = -1.55911; qual_mismatch_simple_bayesian[41][3] = -1.78934; qual_mismatch_simple_bayesian[41][4] = -2.01955; qual_mismatch_simple_bayesian[41][5] = -2.24976; qual_mismatch_simple_bayesian[41][6] = -2.47995; qual_mismatch_simple_bayesian[41][7] = -2.71013; qual_mismatch_simple_bayesian[41][8] = -2.94029; qual_mismatch_simple_bayesian[41][9] = -3.17041; qual_mismatch_simple_bayesian[41][10] = -3.40051; qual_mismatch_simple_bayesian[41][11] = -3.63056; qual_mismatch_simple_bayesian[41][12] = -3.86056; qual_mismatch_simple_bayesian[41][13] = -4.0905; qual_mismatch_simple_bayesian[41][14] = -4.32034; qual_mismatch_simple_bayesian[41][15] = -4.55009; qual_mismatch_simple_bayesian[41][16] = -4.7797; qual_mismatch_simple_bayesian[41][17] = -5.00914; qual_mismatch_simple_bayesian[41][18] = -5.23837; qual_mismatch_simple_bayesian[41][19] = -5.46734; qual_mismatch_simple_bayesian[41][20] = -5.69598; qual_mismatch_simple_bayesian[41][21] = -5.9242; qual_mismatch_simple_bayesian[41][22] = -6.15189; qual_mismatch_simple_bayesian[41][23] = -6.37894; qual_mismatch_simple_bayesian[41][24] = -6.60516; qual_mismatch_simple_bayesian[41][25] = -6.83037; qual_mismatch_simple_bayesian[41][26] = -7.0543; qual_mismatch_simple_bayesian[41][27] = -7.27666; qual_mismatch_simple_bayesian[41][28] = -7.49705; qual_mismatch_simple_bayesian[41][29] = -7.71502; qual_mismatch_simple_bayesian[41][30] = -7.93003; qual_mismatch_simple_bayesian[41][31] = -8.14141; qual_mismatch_simple_bayesian[41][32] = -8.3484; qual_mismatch_simple_bayesian[41][33] = -8.55012; qual_mismatch_simple_bayesian[41][34] = -8.74556; qual_mismatch_simple_bayesian[41][35] = -8.93365; qual_mismatch_simple_bayesian[41][36] = -9.11323; qual_mismatch_simple_bayesian[41][37] = -9.28313; qual_mismatch_simple_bayesian[41][38] = -9.44225; qual_mismatch_simple_bayesian[41][39] = -9.58959; 
qual_mismatch_simple_bayesian[41][40] = -9.72438; qual_mismatch_simple_bayesian[41][41] = -9.84612; qual_mismatch_simple_bayesian[41][42] = -9.95463; qual_mismatch_simple_bayesian[41][43] = -10.0501; qual_mismatch_simple_bayesian[41][44] = -10.133; qual_mismatch_simple_bayesian[41][45] = -10.2041; qual_mismatch_simple_bayesian[41][46] = -10.2645; qual_mismatch_simple_bayesian[42][0] = -1.09863; qual_mismatch_simple_bayesian[42][1] = -1.32888; qual_mismatch_simple_bayesian[42][2] = -1.55911; qual_mismatch_simple_bayesian[42][3] = -1.78935; qual_mismatch_simple_bayesian[42][4] = -2.01957; qual_mismatch_simple_bayesian[42][5] = -2.24979; qual_mismatch_simple_bayesian[42][6] = -2.48; qual_mismatch_simple_bayesian[42][7] = -2.71019; qual_mismatch_simple_bayesian[42][8] = -2.94037; qual_mismatch_simple_bayesian[42][9] = -3.17052; qual_mismatch_simple_bayesian[42][10] = -3.40065; qual_mismatch_simple_bayesian[42][11] = -3.63075; qual_mismatch_simple_bayesian[42][12] = -3.8608; qual_mismatch_simple_bayesian[42][13] = -4.0908; qual_mismatch_simple_bayesian[42][14] = -4.32073; qual_mismatch_simple_bayesian[42][15] = -4.55058; qual_mismatch_simple_bayesian[42][16] = -4.78032; qual_mismatch_simple_bayesian[42][17] = -5.00993; qual_mismatch_simple_bayesian[42][18] = -5.23938; qual_mismatch_simple_bayesian[42][19] = -5.46861; qual_mismatch_simple_bayesian[42][20] = -5.69758; qual_mismatch_simple_bayesian[42][21] = -5.92621; qual_mismatch_simple_bayesian[42][22] = -6.15443; qual_mismatch_simple_bayesian[42][23] = -6.38213; qual_mismatch_simple_bayesian[42][24] = -6.60917; qual_mismatch_simple_bayesian[42][25] = -6.8354; qual_mismatch_simple_bayesian[42][26] = -7.06061; qual_mismatch_simple_bayesian[42][27] = -7.28454; qual_mismatch_simple_bayesian[42][28] = -7.50689; qual_mismatch_simple_bayesian[42][29] = -7.72729; qual_mismatch_simple_bayesian[42][30] = -7.94526; qual_mismatch_simple_bayesian[42][31] = -8.16027; qual_mismatch_simple_bayesian[42][32] = -8.37165; 
qual_mismatch_simple_bayesian[42][33] = -8.57864; qual_mismatch_simple_bayesian[42][34] = -8.78036; qual_mismatch_simple_bayesian[42][35] = -8.9758; qual_mismatch_simple_bayesian[42][36] = -9.16389; qual_mismatch_simple_bayesian[42][37] = -9.34347; qual_mismatch_simple_bayesian[42][38] = -9.51338; qual_mismatch_simple_bayesian[42][39] = -9.67249; qual_mismatch_simple_bayesian[42][40] = -9.81984; qual_mismatch_simple_bayesian[42][41] = -9.95463; qual_mismatch_simple_bayesian[42][42] = -10.0764; qual_mismatch_simple_bayesian[42][43] = -10.1849; qual_mismatch_simple_bayesian[42][44] = -10.2803; qual_mismatch_simple_bayesian[42][45] = -10.3632; qual_mismatch_simple_bayesian[42][46] = -10.4344; qual_mismatch_simple_bayesian[43][0] = -1.09863; qual_mismatch_simple_bayesian[43][1] = -1.32887; qual_mismatch_simple_bayesian[43][2] = -1.55912; qual_mismatch_simple_bayesian[43][3] = -1.78935; qual_mismatch_simple_bayesian[43][4] = -2.01959; qual_mismatch_simple_bayesian[43][5] = -2.24981; qual_mismatch_simple_bayesian[43][6] = -2.48003; qual_mismatch_simple_bayesian[43][7] = -2.71024; qual_mismatch_simple_bayesian[43][8] = -2.94043; qual_mismatch_simple_bayesian[43][9] = -3.17061; qual_mismatch_simple_bayesian[43][10] = -3.40076; qual_mismatch_simple_bayesian[43][11] = -3.63089; qual_mismatch_simple_bayesian[43][12] = -3.86099; qual_mismatch_simple_bayesian[43][13] = -4.09104; qual_mismatch_simple_bayesian[43][14] = -4.32104; qual_mismatch_simple_bayesian[43][15] = -4.55097; qual_mismatch_simple_bayesian[43][16] = -4.78082; qual_mismatch_simple_bayesian[43][17] = -5.01056; qual_mismatch_simple_bayesian[43][18] = -5.24017; qual_mismatch_simple_bayesian[43][19] = -5.46962; qual_mismatch_simple_bayesian[43][20] = -5.69885; qual_mismatch_simple_bayesian[43][21] = -5.92782; qual_mismatch_simple_bayesian[43][22] = -6.15645; qual_mismatch_simple_bayesian[43][23] = -6.38467; qual_mismatch_simple_bayesian[43][24] = -6.61237; qual_mismatch_simple_bayesian[43][25] = -6.83942; 
qual_mismatch_simple_bayesian[43][26] = -7.06564; qual_mismatch_simple_bayesian[43][27] = -7.29085; qual_mismatch_simple_bayesian[43][28] = -7.51478; qual_mismatch_simple_bayesian[43][29] = -7.73713; qual_mismatch_simple_bayesian[43][30] = -7.95753; qual_mismatch_simple_bayesian[43][31] = -8.1755; qual_mismatch_simple_bayesian[43][32] = -8.39051; qual_mismatch_simple_bayesian[43][33] = -8.60189; qual_mismatch_simple_bayesian[43][34] = -8.80888; qual_mismatch_simple_bayesian[43][35] = -9.0106; qual_mismatch_simple_bayesian[43][36] = -9.20605; qual_mismatch_simple_bayesian[43][37] = -9.39414; qual_mismatch_simple_bayesian[43][38] = -9.57372; qual_mismatch_simple_bayesian[43][39] = -9.74362; qual_mismatch_simple_bayesian[43][40] = -9.90274; qual_mismatch_simple_bayesian[43][41] = -10.0501; qual_mismatch_simple_bayesian[43][42] = -10.1849; qual_mismatch_simple_bayesian[43][43] = -10.3066; qual_mismatch_simple_bayesian[43][44] = -10.4151; qual_mismatch_simple_bayesian[43][45] = -10.5106; qual_mismatch_simple_bayesian[43][46] = -10.5935; qual_mismatch_simple_bayesian[44][0] = -1.09863; qual_mismatch_simple_bayesian[44][1] = -1.32887; qual_mismatch_simple_bayesian[44][2] = -1.55912; qual_mismatch_simple_bayesian[44][3] = -1.78936; qual_mismatch_simple_bayesian[44][4] = -2.0196; qual_mismatch_simple_bayesian[44][5] = -2.24983; qual_mismatch_simple_bayesian[44][6] = -2.48006; qual_mismatch_simple_bayesian[44][7] = -2.71028; qual_mismatch_simple_bayesian[44][8] = -2.94048; qual_mismatch_simple_bayesian[44][9] = -3.17068; qual_mismatch_simple_bayesian[44][10] = -3.40085; qual_mismatch_simple_bayesian[44][11] = -3.63101; qual_mismatch_simple_bayesian[44][12] = -3.86114; qual_mismatch_simple_bayesian[44][13] = -4.09123; qual_mismatch_simple_bayesian[44][14] = -4.32128; qual_mismatch_simple_bayesian[44][15] = -4.55128; qual_mismatch_simple_bayesian[44][16] = -4.78122; qual_mismatch_simple_bayesian[44][17] = -5.01107; qual_mismatch_simple_bayesian[44][18] = -5.24081; 
qual_mismatch_simple_bayesian[44][19] = -5.47042; qual_mismatch_simple_bayesian[44][20] = -5.69986; qual_mismatch_simple_bayesian[44][21] = -5.92909; qual_mismatch_simple_bayesian[44][22] = -6.15806; qual_mismatch_simple_bayesian[44][23] = -6.3867; qual_mismatch_simple_bayesian[44][24] = -6.61492; qual_mismatch_simple_bayesian[44][25] = -6.84262; qual_mismatch_simple_bayesian[44][26] = -7.06966; qual_mismatch_simple_bayesian[44][27] = -7.29589; qual_mismatch_simple_bayesian[44][28] = -7.52109; qual_mismatch_simple_bayesian[44][29] = -7.74503; qual_mismatch_simple_bayesian[44][30] = -7.96738; qual_mismatch_simple_bayesian[44][31] = -8.18777; qual_mismatch_simple_bayesian[44][32] = -8.40575; qual_mismatch_simple_bayesian[44][33] = -8.62076; qual_mismatch_simple_bayesian[44][34] = -8.83214; qual_mismatch_simple_bayesian[44][35] = -9.03913; qual_mismatch_simple_bayesian[44][36] = -9.24085; qual_mismatch_simple_bayesian[44][37] = -9.43629; qual_mismatch_simple_bayesian[44][38] = -9.62438; qual_mismatch_simple_bayesian[44][39] = -9.80396; qual_mismatch_simple_bayesian[44][40] = -9.97387; qual_mismatch_simple_bayesian[44][41] = -10.133; qual_mismatch_simple_bayesian[44][42] = -10.2803; qual_mismatch_simple_bayesian[44][43] = -10.4151; qual_mismatch_simple_bayesian[44][44] = -10.5369; qual_mismatch_simple_bayesian[44][45] = -10.6454; qual_mismatch_simple_bayesian[44][46] = -10.7408; qual_mismatch_simple_bayesian[45][0] = -1.09862; qual_mismatch_simple_bayesian[45][1] = -1.32887; qual_mismatch_simple_bayesian[45][2] = -1.55912; qual_mismatch_simple_bayesian[45][3] = -1.78937; qual_mismatch_simple_bayesian[45][4] = -2.01961; qual_mismatch_simple_bayesian[45][5] = -2.24985; qual_mismatch_simple_bayesian[45][6] = -2.48008; qual_mismatch_simple_bayesian[45][7] = -2.71031; qual_mismatch_simple_bayesian[45][8] = -2.94052; qual_mismatch_simple_bayesian[45][9] = -3.17073; qual_mismatch_simple_bayesian[45][10] = -3.40092; qual_mismatch_simple_bayesian[45][11] = -3.6311; 
qual_mismatch_simple_bayesian[45][12] = -3.86126; qual_mismatch_simple_bayesian[45][13] = -4.09138; qual_mismatch_simple_bayesian[45][14] = -4.32148; qual_mismatch_simple_bayesian[45][15] = -4.55153; qual_mismatch_simple_bayesian[45][16] = -4.78153; qual_mismatch_simple_bayesian[45][17] = -5.01147; qual_mismatch_simple_bayesian[45][18] = -5.24131; qual_mismatch_simple_bayesian[45][19] = -5.47106; qual_mismatch_simple_bayesian[45][20] = -5.70067; qual_mismatch_simple_bayesian[45][21] = -5.93011; qual_mismatch_simple_bayesian[45][22] = -6.15934; qual_mismatch_simple_bayesian[45][23] = -6.38831; qual_mismatch_simple_bayesian[45][24] = -6.61695; qual_mismatch_simple_bayesian[45][25] = -6.84517; qual_mismatch_simple_bayesian[45][26] = -7.07286; qual_mismatch_simple_bayesian[45][27] = -7.29991; qual_mismatch_simple_bayesian[45][28] = -7.52614; qual_mismatch_simple_bayesian[45][29] = -7.75134; qual_mismatch_simple_bayesian[45][30] = -7.97528; qual_mismatch_simple_bayesian[45][31] = -8.19763; qual_mismatch_simple_bayesian[45][32] = -8.41802; qual_mismatch_simple_bayesian[45][33] = -8.636; qual_mismatch_simple_bayesian[45][34] = -8.851; qual_mismatch_simple_bayesian[45][35] = -9.06239; qual_mismatch_simple_bayesian[45][36] = -9.26938; qual_mismatch_simple_bayesian[45][37] = -9.4711; qual_mismatch_simple_bayesian[45][38] = -9.66654; qual_mismatch_simple_bayesian[45][39] = -9.85463; qual_mismatch_simple_bayesian[45][40] = -10.0342; qual_mismatch_simple_bayesian[45][41] = -10.2041; qual_mismatch_simple_bayesian[45][42] = -10.3632; qual_mismatch_simple_bayesian[45][43] = -10.5106; qual_mismatch_simple_bayesian[45][44] = -10.6454; qual_mismatch_simple_bayesian[45][45] = -10.7671; qual_mismatch_simple_bayesian[45][46] = -10.8756; qual_mismatch_simple_bayesian[46][0] = -1.09862; qual_mismatch_simple_bayesian[46][1] = -1.32887; qual_mismatch_simple_bayesian[46][2] = -1.55912; qual_mismatch_simple_bayesian[46][3] = -1.78937; qual_mismatch_simple_bayesian[46][4] = -2.01962; 
qual_mismatch_simple_bayesian[46][5] = -2.24986; qual_mismatch_simple_bayesian[46][6] = -2.4801; qual_mismatch_simple_bayesian[46][7] = -2.71033; qual_mismatch_simple_bayesian[46][8] = -2.94056; qual_mismatch_simple_bayesian[46][9] = -3.17077; qual_mismatch_simple_bayesian[46][10] = -3.40098; qual_mismatch_simple_bayesian[46][11] = -3.63117; qual_mismatch_simple_bayesian[46][12] = -3.86135; qual_mismatch_simple_bayesian[46][13] = -4.09151; qual_mismatch_simple_bayesian[46][14] = -4.32163; qual_mismatch_simple_bayesian[46][15] = -4.55173; qual_mismatch_simple_bayesian[46][16] = -4.78178; qual_mismatch_simple_bayesian[46][17] = -5.01178; qual_mismatch_simple_bayesian[46][18] = -5.24172; qual_mismatch_simple_bayesian[46][19] = -5.47156; qual_mismatch_simple_bayesian[46][20] = -5.70131; qual_mismatch_simple_bayesian[46][21] = -5.93092; qual_mismatch_simple_bayesian[46][22] = -6.16036; qual_mismatch_simple_bayesian[46][23] = -6.38959; qual_mismatch_simple_bayesian[46][24] = -6.61856; qual_mismatch_simple_bayesian[46][25] = -6.8472; qual_mismatch_simple_bayesian[46][26] = -7.07542; qual_mismatch_simple_bayesian[46][27] = -7.30311; qual_mismatch_simple_bayesian[46][28] = -7.53016; qual_mismatch_simple_bayesian[46][29] = -7.75639; qual_mismatch_simple_bayesian[46][30] = -7.98159; qual_mismatch_simple_bayesian[46][31] = -8.20553; qual_mismatch_simple_bayesian[46][32] = -8.42788; qual_mismatch_simple_bayesian[46][33] = -8.64827; qual_mismatch_simple_bayesian[46][34] = -8.86625; qual_mismatch_simple_bayesian[46][35] = -9.08126; qual_mismatch_simple_bayesian[46][36] = -9.29264; qual_mismatch_simple_bayesian[46][37] = -9.49963; qual_mismatch_simple_bayesian[46][38] = -9.70135; qual_mismatch_simple_bayesian[46][39] = -9.8968; qual_mismatch_simple_bayesian[46][40] = -10.0849; qual_mismatch_simple_bayesian[46][41] = -10.2645; qual_mismatch_simple_bayesian[46][42] = -10.4344; qual_mismatch_simple_bayesian[46][43] = -10.5935; qual_mismatch_simple_bayesian[46][44] = -10.7408; 
    qual_mismatch_simple_bayesian[46][45] = -10.8756; qual_mismatch_simple_bayesian[46][46] = -10.9974;

    // qual_score[q] equals ln(1 - 10^(-q/10)): the natural log of the probability that a base
    // with Phred score q was called correctly. q = 0 would give ln(0), so -2 is stored instead.
    vector<double> qual_score;
    qual_score.resize(47);
    qual_score[0] = -2; qual_score[1] = -1.58147; qual_score[2] = -0.996843; qual_score[3] = -0.695524; qual_score[4] = -0.507676; qual_score[5] = -0.38013; qual_score[6] = -0.289268; qual_score[7] = -0.222552;
    qual_score[8] = -0.172557; qual_score[9] = -0.134552; qual_score[10] = -0.105361; qual_score[11] = -0.0827653; qual_score[12] = -0.0651742; qual_score[13] = -0.0514183; qual_score[14] = -0.0406248; qual_score[15] = -0.0321336;
    qual_score[16] = -0.0254397; qual_score[17] = -0.0201544; qual_score[18] = -0.0159759; qual_score[19] = -0.0126692; qual_score[20] = -0.0100503; qual_score[21] = -0.007975; qual_score[22] = -0.00632956; qual_score[23] = -0.00502447;
    qual_score[24] = -0.00398902; qual_score[25] = -0.00316729; qual_score[26] = -0.00251505; qual_score[27] = -0.00199726; qual_score[28] = -0.00158615; qual_score[29] = -0.00125972; qual_score[30] = -0.0010005; qual_score[31] = -0.000794644;
    qual_score[32] = -0.000631156; qual_score[33] = -0.000501313; qual_score[34] = -0.000398186; qual_score[35] = -0.000316278; qual_score[36] = -0.00025122; qual_score[37] = -0.000199546; qual_score[38] = -0.000158502; qual_score[39] = -0.0001259;
    qual_score[40] = -0.000100005; qual_score[41] = -7.9436e-05; qual_score[42] = -6.30977e-05; qual_score[43] = -5.012e-05; qual_score[44] = -3.98115e-05; qual_score[45] = -3.16233e-05; qual_score[46] = -2.51192e-05;

    int longestBase = 1000;
    Alignment* alignment;
    if(pDataArray->align == "gotoh") { alignment = new GotohOverlap(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, longestBase); }
    else if(pDataArray->align == "needleman") { alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, longestBase); }
    else if(pDataArray->align == "kmer") { alignment = new KmerAlign(pDataArray->kmerSize); }

    string thisfqualindexfile,
    thisrqualindexfile, thisffastafile, thisrfastafile;
    thisfqualindexfile = ""; thisrqualindexfile = "";
    thisffastafile = pDataArray->inputFiles[0];
    thisrfastafile = pDataArray->inputFiles[1];
    if (pDataArray->qualOrIndexFiles.size() != 0) {
        thisfqualindexfile = pDataArray->qualOrIndexFiles[0];
        thisrqualindexfile = pDataArray->qualOrIndexFiles[1];
    }

    if (pDataArray->m->debug) { pDataArray->m->mothurOut("[DEBUG]: ffasta = " + thisffastafile + ".\n[DEBUG]: rfasta = " + thisrfastafile + ".\n[DEBUG]: fqualindex = " + thisfqualindexfile + ".\n[DEBUG]: rqualindex = " + thisrqualindexfile + ".\n"); }

    ifstream inFFasta, inRFasta, inFQualIndex, inRQualIndex;
#ifdef USE_BOOST
    boost::iostreams::filtering_istream inFF, inRF, inFQ, inRQ;
#endif
    if (!pDataArray->gz) { //plain text files
        pDataArray->m->openInputFile(thisffastafile, inFFasta);
        pDataArray->m->openInputFile(thisrfastafile, inRFasta);
    }else { //compressed files - no need to seekg because compressed files divide workload differently
#ifdef USE_BOOST
        pDataArray->m->openInputFileBinary(thisffastafile, inFFasta, inFF);
        pDataArray->m->openInputFileBinary(thisrfastafile, inRFasta, inRF);
#endif
    }

    ofstream outFasta, outMisMatch, outScrapFasta, outQual, outScrapQual;

    if (thisfqualindexfile != "") {
        if (thisfqualindexfile != "NONE") {
            if (!pDataArray->gz) { //plain text files
                pDataArray->m->openInputFile(thisfqualindexfile, inFQualIndex);
            }else {
#ifdef USE_BOOST
                pDataArray->m->openInputFileBinary(thisfqualindexfile, inFQualIndex, inFQ);
#endif
            } //compressed files - no need to seekg because compressed files divide workload differently
        }
        else { thisfqualindexfile = ""; }
        if (thisrqualindexfile != "NONE") {
            if (!pDataArray->gz) { //plain text files
                pDataArray->m->openInputFile(thisrqualindexfile, inRQualIndex);
            }else {
#ifdef USE_BOOST
                pDataArray->m->openInputFileBinary(thisrqualindexfile, inRQualIndex, inRQ);
#endif
            } //compressed files - no need to seekg because compressed files divide workload differently
        }
        else {
            thisrqualindexfile = "";
        }
    }

    pDataArray->m->openOutputFile(pDataArray->outputFasta, outFasta);
    pDataArray->m->openOutputFile(pDataArray->outputScrapFasta, outScrapFasta);
    pDataArray->m->openOutputFile(pDataArray->outputMisMatches, outMisMatch);
    bool hasQuality = false;
    bool hasIndex = false;
    outMisMatch << "Name\tLength\tOverlap_Length\tOverlap_Start\tOverlap_End\tMisMatches\tNum_Ns\n";
    if (pDataArray->delim == '@') { //fastq files so make an output quality
        pDataArray->m->openOutputFile(pDataArray->outputQual, outQual);
        pDataArray->m->openOutputFile(pDataArray->outputScrapQual, outScrapQual);
        if (thisfqualindexfile != "") { if (thisfqualindexfile != "NONE") { hasIndex = true; } }
        if (thisrqualindexfile != "") { if (thisrqualindexfile != "NONE") { hasIndex = true; } }
        hasQuality = true;
    }else if ((pDataArray->delim == '>') && (pDataArray->qualOrIndexFiles.size() != 0)) { //fasta and qual files
        pDataArray->m->openOutputFile(pDataArray->outputQual, outQual);
        pDataArray->m->openOutputFile(pDataArray->outputScrapQual, outScrapQual);
        hasQuality = true;
    }

    if(pDataArray->allFiles){
        for (int i = 0; i < pDataArray->fastaFileNames.size(); i++) { //clears old file
            for (int j = 0; j < pDataArray->fastaFileNames[i].size(); j++) { //clears old file
                if (pDataArray->fastaFileNames[i][j] != "") {
                    ofstream temp, temp2;
                    pDataArray->m->openOutputFile(pDataArray->fastaFileNames[i][j], temp); temp.close();
                    pDataArray->m->openOutputFile(pDataArray->qualFileNames[i][j], temp2); temp2.close();
                }
            }
        }
    }

    Oligos oligos;
    if (pDataArray->oligosfile != "") { oligos.read(pDataArray->oligosfile, false); }
    int numFPrimers = oligos.getPairedPrimers().size();
    int numBarcodes = oligos.getPairedBarcodes().size();
    TrimOligos trimOligos(pDataArray->pdiffs, pDataArray->bdiffs, 0, 0, oligos.getPairedPrimers(), oligos.getPairedBarcodes(), hasIndex);
    TrimOligos* rtrimOligos = NULL;
    if (pDataArray->reorient) {
        rtrimOligos = new TrimOligos(pDataArray->pdiffs, pDataArray->bdiffs, 0, 0,
            oligos.getReorientedPairedPrimers(), oligos.getReorientedPairedBarcodes(), hasIndex);
        numBarcodes = oligos.getReorientedPairedBarcodes().size();
    }

    //for(int i = 0; i < pDataArray->linesInput_end; i++){ //end is the number of sequences to process
    bool good = true;
    while (good) {

        if (pDataArray->m->control_pressed) { break; }

        int success = 1;
        string trashCode = "";
        string commentString = "";
        int currentSeqsDiffs = 0;

        bool ignore; ignore = false;
        Sequence fSeq, rSeq;
        QualityScores* fQual = NULL; QualityScores* rQual = NULL;
        QualityScores* savedFQual = NULL; QualityScores* savedRQual = NULL;
        Sequence findexBarcode("findex", "NONE"); Sequence rindexBarcode("rindex", "NONE");
        if (!pDataArray->gz) {
            if (pDataArray->delim == '@') { //fastq files
                bool tignore;
                FastqRead fread(inFFasta, tignore, pDataArray->format); pDataArray->m->gobble(inFFasta);
                FastqRead rread(inRFasta, ignore, pDataArray->format); pDataArray->m->gobble(inRFasta);
                ///bool fixed = checkName(fread, rread);
                //////////////////////////////////////////////////////////////
                bool fixed = false;
                if (fread.getName() == rread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                    string tempFRead = fread.getName().substr(0, fread.getName().length()-1);
                    string tempRRead = rread.getName().substr(0, rread.getName().length()-1);
                    if (tempFRead == tempRRead) {
                        if ((fread.getName()[fread.getName().length()-1] == '1') && (rread.getName()[rread.getName().length()-1] == '2')) {
                            fread.setName(tempFRead);
                            rread.setName(tempRRead);
                            fixed = true;
                        }
                    }
                }
                /////////////////////////////////////////////////////////////
                if (!fixed) {
                    FastqRead f2read(inFFasta, tignore, pDataArray->format); pDataArray->m->gobble(inFFasta);
                    ///bool fixed = checkName(f2read, rread);
                    //////////////////////////////////////////////////////////////
                    fixed = false;
                    if (f2read.getName() == rread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                        string tempFRead = f2read.getName().substr(0, f2read.getName().length()-1);
                        string tempRRead = rread.getName().substr(0, rread.getName().length()-1);
                        if (tempFRead == tempRRead) {
                            if ((f2read.getName()[f2read.getName().length()-1] == '1') && (rread.getName()[rread.getName().length()-1] == '2')) {
                                f2read.setName(tempFRead);
                                rread.setName(tempRRead);
                                fixed = true;
                            }
                        }
                    }
                    if (!fixed) {
                        FastqRead r2read(inRFasta, ignore, pDataArray->format); pDataArray->m->gobble(inRFasta);
                        ///bool fixed = checkName(fread, r2read);
                        //////////////////////////////////////////////////////////////
                        fixed = false;
                        if (fread.getName() == r2read.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                            string tempFRead = fread.getName().substr(0, fread.getName().length()-1);
                            string tempRRead = r2read.getName().substr(0, r2read.getName().length()-1);
                            if (tempFRead == tempRRead) {
                                if ((fread.getName()[fread.getName().length()-1] == '1') && (r2read.getName()[r2read.getName().length()-1] == '2')) {
                                    fread.setName(tempFRead);
                                    r2read.setName(tempRRead);
                                    fixed = true;
                                }
                            }
                        }
                        if (!fixed) {
                            pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, " + fread.getName() + ".\n"); ignore = true;
                        } else { rread = r2read; }
                    }else { fread = f2read; }
                    /////////////////////////////////////////////////////////////
                }
                if (tignore) { ignore=true; }
                fSeq.setName(fread.getName()); fSeq.setAligned(fread.getSeq());
                rSeq.setName(rread.getName()); rSeq.setAligned(rread.getSeq());
                fQual = new QualityScores(fread.getName(), fread.getScores());
                rQual = new QualityScores(rread.getName(), rread.getScores());
                savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores());
                savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores());
                if (thisfqualindexfile != "") { //forward index file
                    FastqRead firead(inFQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inFQualIndex);
                    if (tignore) { ignore=true; }
                    findexBarcode.setAligned(firead.getSeq());
                    ///bool fixed = checkName(fread, firead);
                    //////////////////////////////////////////////////////////////
                    bool fixed = false;
                    if (fread.getName() == firead.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                        string tempFRead = fread.getName().substr(0, fread.getName().length()-1);
                        string tempRRead = firead.getName().substr(0, firead.getName().length()-1);
                        if (tempFRead == tempRRead) {
                            if ((fread.getName()[fread.getName().length()-1] == '1') && (firead.getName()[firead.getName().length()-1] == '2')) {
                                fread.setName(tempFRead);
                                firead.setName(tempRRead);
                                fixed = true;
                            }
                        }
                    }
                    /////////////////////////////////////////////////////////////
                    if (!fixed) {
                        FastqRead f2iread(inFQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inFQualIndex);
                        fixed = false;
                        if (fread.getName() == f2iread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                            string tempFRead = fread.getName().substr(0, fread.getName().length()-1);
                            string tempRRead = f2iread.getName().substr(0, f2iread.getName().length()-1);
                            if (tempFRead == tempRRead) {
                                if ((fread.getName()[fread.getName().length()-1] == '1') && (f2iread.getName()[f2iread.getName().length()-1] == '2')) {
                                    fread.setName(tempFRead);
                                    f2iread.setName(tempRRead);
                                    fixed = true;
                                }
                            }
                        }
                        /////////////////////////////////////////////////////////////
                        if (!fixed) {
                            pDataArray->m->mothurOut("[WARNING]: name mismatch in forward index file. Ignoring, " + fread.getName() + ".\n"); ignore = true;
                        }else { firead = f2iread; findexBarcode.setAligned(f2iread.getSeq()); }
                    }
                }
                if (thisrqualindexfile != "") { //reverse index file
                    FastqRead riread(inRQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inRQualIndex);
                    if (tignore) { ignore=true; }
                    rindexBarcode.setAligned(riread.getSeq());
                    ///bool fixed = checkName(fread, riread);
                    //////////////////////////////////////////////////////////////
                    bool fixed = false;
                    if (fread.getName() == riread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                        string tempFRead = fread.getName().substr(0, fread.getName().length()-1);
                        string tempRRead = riread.getName().substr(0, riread.getName().length()-1);
                        if (tempFRead == tempRRead) {
                            if ((fread.getName()[fread.getName().length()-1] == '1') && (riread.getName()[riread.getName().length()-1] == '2')) {
                                fread.setName(tempFRead);
                                riread.setName(tempRRead);
                                fixed = true;
                            }
                        }
                    }
                    /////////////////////////////////////////////////////////////
                    if (!fixed) {
                        FastqRead r2iread(inRQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inRQualIndex);
                        fixed = false;
                        if (fread.getName() == r2iread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                            string tempFRead = fread.getName().substr(0, fread.getName().length()-1);
                            string tempRRead = r2iread.getName().substr(0, r2iread.getName().length()-1);
                            if (tempFRead == tempRRead) {
                                if ((fread.getName()[fread.getName().length()-1] == '1') && (r2iread.getName()[r2iread.getName().length()-1] == '2')) {
                                    fread.setName(tempFRead);
                                    r2iread.setName(tempRRead);
                                    fixed = true;
                                }
                            }
                        }
                        /////////////////////////////////////////////////////////////
                        if (!fixed) {
                            pDataArray->m->mothurOut("[WARNING]: name mismatch in reverse index file. Ignoring, " + fread.getName() + ".\n"); ignore = true;
                        }else { riread = r2iread; rindexBarcode.setAligned(riread.getSeq()); }
                    }
                }
            }else { //reading fasta and maybe qual
                Sequence tfSeq(inFFasta); pDataArray->m->gobble(inFFasta);
                Sequence trSeq(inRFasta); pDataArray->m->gobble(inRFasta);
                ///bool fixed = checkName(fread, rread);
                //////////////////////////////////////////////////////////////
                bool fixed = false;
                if (tfSeq.getName() == trSeq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                    string tempFRead = tfSeq.getName().substr(0, tfSeq.getName().length()-1);
                    string tempRRead = trSeq.getName().substr(0, trSeq.getName().length()-1);
                    if (tempFRead == tempRRead) {
                        if ((tfSeq.getName()[tfSeq.getName().length()-1] == '1') && (trSeq.getName()[trSeq.getName().length()-1] == '2')) {
                            tfSeq.setName(tempFRead);
                            trSeq.setName(tempRRead);
                            fixed = true;
                        }
                    }
                }
                /////////////////////////////////////////////////////////////
                if (!fixed) {
                    Sequence tf2Seq(inFFasta); pDataArray->m->gobble(inFFasta);
                    ///bool fixed = checkName(f2read, rread);
                    //////////////////////////////////////////////////////////////
                    fixed = false;
                    if (tf2Seq.getName() == trSeq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                        string tempFRead = tf2Seq.getName().substr(0, tf2Seq.getName().length()-1);
                        string tempRRead = trSeq.getName().substr(0, trSeq.getName().length()-1);
                        if (tempFRead == tempRRead) {
                            if ((tf2Seq.getName()[tf2Seq.getName().length()-1] == '1') && (trSeq.getName()[trSeq.getName().length()-1] == '2')) {
                                tf2Seq.setName(tempFRead);
                                trSeq.setName(tempRRead);
                                fixed = true;
                            }
                        }
                    }
                    if (!fixed) {
                        Sequence tr2Seq(inRFasta); pDataArray->m->gobble(inRFasta);
                        ///bool fixed = checkName(fread, r2read);
                        //////////////////////////////////////////////////////////////
                        fixed = false;
                        if (tfSeq.getName() == tr2Seq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
                            string tempFRead = tfSeq.getName().substr(0, tfSeq.getName().length()-1);
                            string tempRRead = tr2Seq.getName().substr(0, tr2Seq.getName().length()-1);
                            if (tempFRead == tempRRead) {
                                if ((tfSeq.getName()[tfSeq.getName().length()-1] == '1') && (tr2Seq.getName()[tr2Seq.getName().length()-1] == '2')) {
                                    tfSeq.setName(tempFRead);
                                    tr2Seq.setName(tempRRead);
                                    fixed = true;
                                }
                            }
                        }
                        if (!fixed) {
                            pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true;
                        } else { trSeq = tr2Seq; }
                    }else { tfSeq = tf2Seq; }
                    /////////////////////////////////////////////////////////////
                }
                fSeq.setName(tfSeq.getName()); fSeq.setAligned(tfSeq.getAligned());
                rSeq.setName(trSeq.getName()); rSeq.setAligned(trSeq.getAligned());
                if (thisfqualindexfile != "") {
                    fQual = new QualityScores(inFQualIndex); pDataArray->m->gobble(inFQualIndex);
                    rQual = new QualityScores(inRQualIndex); pDataArray->m->gobble(inRQualIndex);
                    if (fQual->getName() != rQual->getName()) {
                        ///bool fixed = checkName(fread, rread);
                        //////////////////////////////////////////////////////////////
                        bool fixed = false;
                        if (fQual->getName() == rQual->getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
string tempFRead = fQual->getName().substr(0, fQual->getName().length()-1); string tempRRead = rQual->getName().substr(0, rQual->getName().length()-1); if (tempFRead == tempRRead) { if ((fQual->getName()[fQual->getName().length()-1] == '1') && (rQual->getName()[rQual->getName().length()-1] == '2')) { fQual->setName(tempFRead); rQual->setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse qfile file. Ignoring, " + fQual->getName() + ".\n"); ignore = true; } } savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (fQual->getName() != tfSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward quality file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } if (rQual->getName() != trSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in reverse quality file. Ignoring, " + trSeq.getName() + ".\n"); ignore = true; } } if (tfSeq.getName() != trSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } } }else { #ifdef USE_BOOST if (pDataArray->delim == '@') { //fastq files bool tignore; FastqRead fread(inFF, tignore, pDataArray->format); FastqRead rread(inRF, ignore, pDataArray->format); ///bool fixed = checkName(fread, rread); ////////////////////////////////////////////////////////////// bool fixed = false; if (fread.getName() == rread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = rread.getName().substr(0, rread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (rread.getName()[rread.getName().length()-1] == '2')) { fread.setName(tempFRead); rread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { FastqRead f2read(inFF, tignore, pDataArray->format); ///bool fixed = checkName(f2read, rread); ////////////////////////////////////////////////////////////// fixed = false; if (f2read.getName() == rread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = f2read.getName().substr(0, f2read.getName().length()-1); string tempRRead = rread.getName().substr(0, rread.getName().length()-1); if (tempFRead == tempRRead) { if ((f2read.getName()[f2read.getName().length()-1] == '1') && (rread.getName()[rread.getName().length()-1] == '2')) { f2read.setName(tempFRead); rread.setName(tempRRead); fixed = true; } } } if (!fixed) { FastqRead r2read(inRF, ignore, pDataArray->format); ///bool fixed = checkName(fread, r2read); ////////////////////////////////////////////////////////////// fixed = false; if (fread.getName() == r2read.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = r2read.getName().substr(0, r2read.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (r2read.getName()[r2read.getName().length()-1] == '2')) { fread.setName(tempFRead); r2read.setName(tempRRead); fixed = true; } } } if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }
else { rread = r2read; } }else { fread = f2read; } ///////////////////////////////////////////////////////////// } if (tignore) { ignore=true; } fSeq.setName(fread.getName()); fSeq.setAligned(fread.getSeq()); rSeq.setName(rread.getName()); rSeq.setAligned(rread.getSeq()); fQual = new QualityScores(fread.getName(), fread.getScores()); rQual = new QualityScores(rread.getName(), rread.getScores()); savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (thisfqualindexfile != "") { //forward index file FastqRead firead(inFQ, tignore, pDataArray->format); if (tignore) { ignore=true; } findexBarcode.setAligned(firead.getSeq()); ///bool fixed = checkName(fread, firead); ////////////////////////////////////////////////////////////// bool fixed = false; if (fread.getName() == firead.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = firead.getName().substr(0, firead.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (firead.getName()[firead.getName().length()-1] == '2')) { fread.setName(tempFRead); firead.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { FastqRead f2iread(inFQ, tignore, pDataArray->format); fixed = false; if (fread.getName() == f2iread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = f2iread.getName().substr(0, f2iread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (f2iread.getName()[f2iread.getName().length()-1] == '2')) { fread.setName(tempFRead); f2iread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward index file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { firead = f2iread; findexBarcode.setAligned(f2iread.getSeq()); } } } if (thisrqualindexfile != "") { //reverse index file FastqRead riread(inRQ, tignore, pDataArray->format); if (tignore) { ignore=true; } rindexBarcode.setAligned(riread.getSeq()); ///bool fixed = checkName(fread, riread); ////////////////////////////////////////////////////////////// bool fixed = false; if (fread.getName() == riread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = riread.getName().substr(0, riread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (riread.getName()[riread.getName().length()-1] == '2')) { fread.setName(tempFRead); riread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { FastqRead r2iread(inRQ, tignore, pDataArray->format); fixed = false; if (fread.getName() == r2iread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = r2iread.getName().substr(0, r2iread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (r2iread.getName()[r2iread.getName().length()-1] == '2')) { fread.setName(tempFRead); r2iread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in reverse index file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { riread = r2iread; rindexBarcode.setAligned(riread.getSeq()); } } } }else { //reading fasta and maybe qual Sequence tfSeq(inFF); Sequence trSeq(inRF); ///bool fixed = checkName(fread, rread); ////////////////////////////////////////////////////////////// bool fixed = false; if (tfSeq.getName() == trSeq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = tfSeq.getName().substr(0, tfSeq.getName().length()-1); string tempRRead = trSeq.getName().substr(0, trSeq.getName().length()-1); if (tempFRead == tempRRead) { if ((tfSeq.getName()[tfSeq.getName().length()-1] == '1') && (trSeq.getName()[trSeq.getName().length()-1] == '2')) { tfSeq.setName(tempFRead); trSeq.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { Sequence tf2Seq(inFF); ///bool fixed = checkName(f2read, rread); ////////////////////////////////////////////////////////////// fixed = false; if (tf2Seq.getName() == trSeq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2?
string tempFRead = tf2Seq.getName().substr(0, tf2Seq.getName().length()-1); string tempRRead = trSeq.getName().substr(0, trSeq.getName().length()-1); if (tempFRead == tempRRead) { if ((tf2Seq.getName()[tf2Seq.getName().length()-1] == '1') && (trSeq.getName()[trSeq.getName().length()-1] == '2')) { tf2Seq.setName(tempFRead); trSeq.setName(tempRRead); fixed = true; } } } if (!fixed) { Sequence tr2Seq(inRF); ///bool fixed = checkName(fread, r2read); ////////////////////////////////////////////////////////////// fixed = false; if (tfSeq.getName() == tr2Seq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = tfSeq.getName().substr(0, tfSeq.getName().length()-1); string tempRRead = tr2Seq.getName().substr(0, tr2Seq.getName().length()-1); if (tempFRead == tempRRead) { if ((tfSeq.getName()[tfSeq.getName().length()-1] == '1') && (tr2Seq.getName()[tr2Seq.getName().length()-1] == '2')) { tfSeq.setName(tempFRead); tr2Seq.setName(tempRRead); fixed = true; } } } if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } else { trSeq = tr2Seq; } }else { tfSeq = tf2Seq; } ///////////////////////////////////////////////////////////// } fSeq.setName(tfSeq.getName()); fSeq.setAligned(tfSeq.getAligned()); rSeq.setName(trSeq.getName()); rSeq.setAligned(trSeq.getAligned()); if (thisfqualindexfile != "") { fQual = new QualityScores(inFQ); rQual = new QualityScores(inRQ); if (fQual->getName() != rQual->getName()) { ///bool fixed = checkName(fread, rread); ////////////////////////////////////////////////////////////// bool fixed = false; if (fQual->getName() == rQual->getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fQual->getName().substr(0, fQual->getName().length()-1); string tempRRead = rQual->getName().substr(0, rQual->getName().length()-1); if (tempFRead == tempRRead) { if ((fQual->getName()[fQual->getName().length()-1] == '1') && (rQual->getName()[rQual->getName().length()-1] == '2')) { fQual->setName(tempFRead); rQual->setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse qfile file. Ignoring, " + fQual->getName() + ".\n"); ignore = true; } } savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (fQual->getName() != tfSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward quality file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } if (rQual->getName() != trSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in reverse quality file. Ignoring, " + trSeq.getName() + ".\n"); ignore = true; } } if (tfSeq.getName() != trSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } }
#endif
} int barcodeIndex = 0; int primerIndex = 0; if (!ignore) { Sequence savedFSeq(fSeq.getName(), fSeq.getAligned()); Sequence savedRSeq(rSeq.getName(), rSeq.getAligned()); Sequence savedFindex(findexBarcode.getName(), findexBarcode.getAligned()); Sequence savedRIndex(rindexBarcode.getName(), rindexBarcode.getAligned()); if(numBarcodes != 0){ vector<int> results; if (hasQuality) { if (hasIndex) { results = trimOligos.stripBarcode(findexBarcode, rindexBarcode, *fQual, *rQual, barcodeIndex); }else { results = trimOligos.stripBarcode(fSeq, rSeq, *fQual, *rQual, barcodeIndex); } }else { results = trimOligos.stripBarcode(fSeq, rSeq, barcodeIndex); } success = results[0] + results[2]; commentString += "fbdiffs=" + toString(results[0]) + "(" + trimOligos.getCodeValue(results[1], pDataArray->bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + trimOligos.getCodeValue(results[3], pDataArray->bdiffs) + ") "; if(success > pDataArray->bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } if(numFPrimers != 0){ vector<int> results; if (hasQuality) { results = trimOligos.stripForward(fSeq, rSeq, *fQual, *rQual, primerIndex); }else { results = trimOligos.stripForward(fSeq, rSeq, primerIndex); } success = results[0] + results[2]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos.getCodeValue(results[1], pDataArray->pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + trimOligos.getCodeValue(results[3], pDataArray->pdiffs) + ") "; if(success > pDataArray->pdiffs) { trashCode += 'f'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > pDataArray->tdiffs) { trashCode += 't'; } if (pDataArray->reorient && (trashCode != "")) { //if you failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; string thiscommentString = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; if(numBarcodes != 0){ vector<int> results; if (hasQuality) {
if (hasIndex) { results = rtrimOligos->stripBarcode(savedFindex, savedRIndex, *savedFQual, *savedRQual, thisBarcodeIndex); }else { results = rtrimOligos->stripBarcode(savedFSeq, savedRSeq, *savedFQual, *savedRQual, thisBarcodeIndex); } }else { results = rtrimOligos->stripBarcode(savedFSeq, savedRSeq, thisBarcodeIndex); } thisSuccess = results[0] + results[2]; thiscommentString += "fbdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pDataArray->bdiffs) + ") "; if(thisSuccess > pDataArray->bdiffs) { thisTrashCode += 'b'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if(numFPrimers != 0){ vector<int> results; if (hasQuality) { results = rtrimOligos->stripForward(savedFSeq, savedRSeq, *savedFQual, *savedRQual, thisPrimerIndex); }else { results = rtrimOligos->stripForward(savedFSeq, savedRSeq, thisPrimerIndex); } thisSuccess = results[0] + results[2]; thiscommentString += "fpdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pDataArray->pdiffs) + ") "; if(thisSuccess > pDataArray->pdiffs) { thisTrashCode += 'f'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if (thisCurrentSeqsDiffs > pDataArray->tdiffs) { thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; commentString = thiscommentString; barcodeIndex = thisBarcodeIndex; primerIndex = thisPrimerIndex; savedFSeq.reverseComplement(); savedRSeq.reverseComplement(); fSeq.setAligned(savedFSeq.getAligned()); rSeq.setAligned(savedRSeq.getAligned()); if(hasQuality){ savedFQual->flipQScores(); savedRQual->flipQScores(); fQual->setScores(savedFQual->getScores()); rQual->setScores(savedRQual->getScores()); } }else { trashCode += "(" + thisTrashCode + ")"; } } //flip the reverse reads
rSeq.reverseComplement(); if (hasQuality) { rQual->flipQScores(); } //pairwise align alignment->align(fSeq.getUnaligned(), rSeq.getUnaligned()); map<int, int> ABaseMap = alignment->getSeqAAlnBaseMap(); map<int, int> BBaseMap = alignment->getSeqBAlnBaseMap(); fSeq.setAligned(alignment->getSeqAAln()); rSeq.setAligned(alignment->getSeqBAln()); int length = fSeq.getAligned().length(); //traverse alignments merging into one contiguous seq string contig = ""; int numMismatches = 0; string seq1 = fSeq.getAligned(); string seq2 = rSeq.getAligned(); vector<int> scores1, scores2, contigScores; if (hasQuality) { scores1 = fQual->getQualityScores(); scores2 = rQual->getQualityScores(); delete fQual; delete rQual; delete savedFQual; delete savedRQual; } // if (num < 5) { cout << fSeq.getStartPos() << '\t' << fSeq.getEndPos() << '\t' << rSeq.getStartPos() << '\t' << rSeq.getEndPos() << endl; } int overlapStart = fSeq.getStartPos()-1; int seq2Start = rSeq.getStartPos()-1; //bigger of the 2 starting positions is the location of the overlapping start if (overlapStart < seq2Start) { //seq2 starts later so take from 0 to seq2Start from seq1 overlapStart = seq2Start; for (int i = 0; i < overlapStart; i++) { contig += seq1[i]; if (hasQuality) { if (((seq1[i] != '-') && (seq1[i] != '.'))) { contigScores.push_back(scores1[ABaseMap[i]]); } } } }else { //seq1 starts later so take from 0 to overlapStart from seq2 for (int i = 0; i < overlapStart; i++) { contig += seq2[i]; if (hasQuality) { if (((seq2[i] != '-') && (seq2[i] != '.'))) { contigScores.push_back(scores2[BBaseMap[i]]); } } } } int seq1End = fSeq.getEndPos(); int seq2End = rSeq.getEndPos(); int overlapEnd = seq1End; if (seq2End < overlapEnd) { overlapEnd = seq2End; } //smallest end position is where overlapping ends int firstForward = 0; int seq2FirstForward = 0; int lastReverse = seq1.length(); int seq2lastReverse = seq2.length(); bool firstChooseSeq1 = false; bool lastChooseSeq1 = false; if (hasQuality) { for (int i = 0; i < seq1.length(); i++) {
if ((seq1[i] != '.') && (seq1[i] != '-')) { if (scores1[ABaseMap[i]] == 2) { firstForward++; }else { break; } } } for (int i = 0; i < seq2.length(); i++) { if ((seq2[i] != '.') && (seq2[i] != '-')) { if (scores2[BBaseMap[i]] == 2) { seq2FirstForward++; }else { break; } } } if (seq2FirstForward > firstForward) { firstForward = seq2FirstForward; firstChooseSeq1 = true; } for (int i = seq1.length()-1; i >= 0; i--) { if ((seq1[i] != '.') && (seq1[i] != '-')) { if (scores1[ABaseMap[i]] == 2) { lastReverse--; }else { break; } } } for (int i = seq2.length()-1; i >= 0; i--) { if ((seq2[i] != '.') && (seq2[i] != '-')) { if (scores2[BBaseMap[i]] == 2) { seq2lastReverse--; }else { break; } } } if (lastReverse > seq2lastReverse) { lastReverse = seq2lastReverse; lastChooseSeq1 = true; } } int oStart = contig.length(); //cout << fSeq.getAligned() << endl; cout << rSeq.getAligned() << endl; for (int i = overlapStart; i < overlapEnd; i++) { //cout << seq1[i] << ' ' << seq2[i] << ' ' << scores1[ABaseMap[i]] << ' ' << scores2[BBaseMap[i]] << endl; if (seq1[i] == seq2[i]) { //match, add base and choose highest score contig += seq1[i]; if (hasQuality) { //contigScores.push_back(convertProb(qual_match_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])])); ///////////////////////////////////////////////////////////// int qualScore = 1; double qProb = qual_match_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])]; int lower = 0; int upper = 46; if (qProb < qual_score[0]) { qualScore = 1; } else { while (lower < upper) { int mid = lower + (upper - lower) / 2; if (qual_score[mid] == qProb) { qualScore = mid; lower = upper; } if (mid == lower) { qualScore = lower; lower = upper; } else if (qual_score[mid] > qProb) { upper = mid; } else if (qual_score[mid] < qProb) { lower = mid + 1; } } } qualScore = lower; contigScores.push_back(qualScore); //////////////////////////////////////////////////////////// } }else if (((seq1[i] == 
'.') || (seq1[i] == '-')) && ((seq2[i] != '-') && (seq2[i] != '.'))) { //seq1 is a gap and seq2 is a base, choose seq2, unless quality score for base is below insert. In that case eliminate base if (hasQuality) { if (scores2[BBaseMap[i]] <= pDataArray->insert) { } // else { contig += seq2[i]; contigScores.push_back(scores2[BBaseMap[i]]); } }else { contig += seq2[i]; } //with no quality info, then we keep it? }else if (((seq2[i] == '.') || (seq2[i] == '-')) && ((seq1[i] != '-') && (seq1[i] != '.'))) { //seq2 is a gap and seq1 is a base, choose seq1, unless quality score for base is below insert. In that case eliminate base if (hasQuality) { if (scores1[ABaseMap[i]] <= pDataArray->insert) { } // else { contig += seq1[i]; contigScores.push_back(scores1[ABaseMap[i]]); } }else { contig += seq1[i]; } //with no quality info, then we keep it? }else if (((seq1[i] != '-') && (seq1[i] != '.')) && ((seq2[i] != '-') && (seq2[i] != '.'))) { //both bases choose one with better quality if (hasQuality) { if (abs(scores1[ABaseMap[i]] - scores2[BBaseMap[i]]) >= pDataArray->deltaq) { //is the difference in qual scores >= deltaq, if yes choose base with higher score char c = seq1[i]; if (scores1[ABaseMap[i]] < scores2[BBaseMap[i]]) { c = seq2[i]; } contig += c; if ((i >= firstForward) && (i <= lastReverse)) { //in unmasked section //contigScores.push_back(convertProb(qual_mismatch_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])])); ///////////////////////////////////////////////////////////// int qualScore = 1; double qProb = qual_mismatch_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])]; int lower = 0; int upper = 46; if (qProb < qual_score[0]) { qualScore = 1; } else { while (lower < upper) { int mid = lower + (upper - lower) / 2; if (qual_score[mid] == qProb) { qualScore = mid; lower = upper; } if (mid == lower) { qualScore = lower; lower = upper; } else if (qual_score[mid] > qProb) { upper = mid; } else if 
(qual_score[mid] < qProb) { lower = mid + 1; } } } qualScore = lower; contigScores.push_back(qualScore); //////////////////////////////////////////////////////////// }else if (i < firstForward) { if (firstChooseSeq1) { contigScores.push_back(scores1[ABaseMap[i]]); } else { contigScores.push_back(scores2[BBaseMap[i]]); } }else if ((i > lastReverse)) { if (lastChooseSeq1) { contigScores.push_back(scores1[ABaseMap[i]]); } else { contigScores.push_back(scores2[BBaseMap[i]]); } }else { contigScores.push_back(2); } //N }else { //if no, base becomes n contig += 'N'; contigScores.push_back(2); } numMismatches++; }else { numMismatches++; } //cant decide, so eliminate and mark as mismatch }else { //should never get here pDataArray->m->mothurOut("[ERROR]: case I didn't think of seq1 = " + toString(seq1[i]) + " and seq2 = " + toString(seq2[i]) + "\n"); } } int oend = contig.length(); if (seq1End < seq2End) { //seq1 ends before seq2 so take from overlap to length from seq2 for (int i = overlapEnd; i < length; i++) { contig += seq2[i]; if (hasQuality) { if (((seq2[i] != '-') && (seq2[i] != '.'))) { contigScores.push_back(scores2[BBaseMap[i]]); } } } }else { //seq2 ends before seq1 so take from overlap to length from seq1 for (int i = overlapEnd; i < length; i++) { contig += seq1[i]; if (hasQuality) { if (((seq1[i] != '-') && (seq1[i] != '.'))) { contigScores.push_back(scores1[ABaseMap[i]]); } } } } //cout << contig << endl; //exit(1); if (pDataArray->trimOverlap) { contig = contig.substr(overlapStart, oend-oStart); if (contig.length() == 0) { trashCode += "l"; } if (hasQuality) { vector<int> newContigScores; for (int i = overlapStart; i < oend; i++) { newContigScores.push_back(contigScores[i]); } contigScores = newContigScores; } } if(trashCode.length() == 0){ bool ignore = false; if (pDataArray->m->debug) { pDataArray->m->mothurOut(fSeq.getName()); } if (pDataArray->createOligosGroup) { string thisGroup = oligos.getGroupName(barcodeIndex, primerIndex); if (pDataArray->m->debug) {
pDataArray->m->mothurOut(", group= " + thisGroup + "\n"); } int pos = thisGroup.find("ignore"); if (pos == string::npos) { pDataArray->groupMap[fSeq.getName()] = thisGroup; map<string, int>::iterator it = pDataArray->groupCounts.find(thisGroup); if (it == pDataArray->groupCounts.end()) { pDataArray->groupCounts[thisGroup] = 1; } else { pDataArray->groupCounts[it->first] ++; } }else { ignore = true; } }else if (pDataArray->createFileGroup) { //for 3 column file option int pos = pDataArray->group.find("ignore"); if (pos == string::npos) { pDataArray->groupMap[fSeq.getName()] = pDataArray->group; map<string, int>::iterator it = pDataArray->groupCounts.find(pDataArray->group); if (it == pDataArray->groupCounts.end()) { pDataArray->groupCounts[pDataArray->group] = 1; } else { pDataArray->groupCounts[it->first] ++; } }else { ignore = true; } } if (pDataArray->m->debug) { pDataArray->m->mothurOut("\n"); } if(!ignore){ //output outFasta << ">" << fSeq.getName() << '\t' << commentString << endl << contig << endl; if (hasQuality) { outQual << ">" << fSeq.getName() << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { outQual << contigScores[i] << " "; } outQual << endl; } int numNs = 0; for (int i = 0; i < contig.length(); i++) { if (contig[i] == 'N') { numNs++; } } outMisMatch << fSeq.getName() << '\t' << contig.length() << '\t' << (oend-oStart) << '\t' << oStart << '\t' << oend << '\t' << numMismatches << '\t' << numNs << endl; if (pDataArray->allFiles) { ofstream output; pDataArray->m->openOutputFileAppend(pDataArray->fastaFileNames[barcodeIndex][primerIndex], output); output << ">" << fSeq.getName() << '\t' << commentString << endl << contig << endl; output.close(); if (hasQuality) { ofstream output2; pDataArray->m->openOutputFileAppend(pDataArray->qualFileNames[barcodeIndex][primerIndex], output2); output2 << ">" << fSeq.getName() << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { output2 << contigScores[i] << " "; } output2 << endl;
output2.close(); } } } }else { //output outScrapFasta << ">" << fSeq.getName() << " | " << trashCode << '\t' << commentString << endl << contig << endl; if (hasQuality) { outScrapQual << ">" << fSeq.getName() << " | " << trashCode << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { outScrapQual << contigScores[i] << " "; } outScrapQual << endl; } } } if (!pDataArray->gz) { if ((inFFasta.eof()) || (inRFasta.eof())) { good = false; break; } }else { #ifdef USE_BOOST if (inFF.eof() || inRF.eof()) { good = false; break; } #endif } thisNumReads++; //report progress if((thisNumReads) % 1000 == 0){ pDataArray->m->mothurOutJustToScreen(toString(thisNumReads)+"\n"); } } //report progress if((thisNumReads) % 1000 != 0){ pDataArray->m->mothurOutJustToScreen(toString(thisNumReads)+"\n"); } inFFasta.close(); inRFasta.close(); if (pDataArray->gz) { #ifdef USE_BOOST inFF.pop(); inRF.pop(); #endif } outFasta.close(); outScrapFasta.close(); outMisMatch.close(); if (pDataArray->delim == '@') { if (thisfqualindexfile != "") { inFQualIndex.close(); if (pDataArray->gz) { #ifdef USE_BOOST inFQ.pop(); #endif } } if (thisrqualindexfile != "") { inRQualIndex.close(); if (pDataArray->gz) { #ifdef USE_BOOST inRQ.pop(); #endif } } outQual.close(); outScrapQual.close(); }else{ if (hasQuality) { inFQualIndex.close(); inRQualIndex.close(); if (pDataArray->gz) { #ifdef USE_BOOST inFQ.pop(); inRQ.pop(); #endif } outQual.close(); outScrapQual.close(); } } delete alignment; if (pDataArray->reorient) { delete rtrimOligos; } pDataArray->done = true; if (pDataArray->m->control_pressed) { pDataArray->m->mothurRemove(pDataArray->outputFasta); pDataArray->m->mothurRemove(pDataArray->outputMisMatches); pDataArray->m->mothurRemove(pDataArray->outputScrapFasta); if (hasQuality) { pDataArray->m->mothurRemove(pDataArray->outputQual); pDataArray->m->mothurRemove(pDataArray->outputScrapQual); } } } 
///////////////////////////////////////////////////////////////////////////////////////
numReads += thisNumReads; pDataArray->m->mothurOut("Done.\n"); if (pDataArray->m->control_pressed) { for (int i = 0; i < pDataArray->outputNames.size(); i++) { pDataArray->m->mothurRemove(pDataArray->outputNames[i]); } return 0; } if(pDataArray->allFiles){ // so we don't add the same groupfile multiple times map<string, string>::iterator it; set<string> namesToRemove; for(int i=0;i<pDataArray->fastaFileNames.size();i++){ for(int j=0;j<pDataArray->fastaFileNames[0].size();j++){ if (pDataArray->fastaFileNames[i][j] != "") { if (namesToRemove.count(pDataArray->fastaFileNames[i][j]) == 0) { if(pDataArray->m->isBlank(pDataArray->fastaFileNames[i][j])){ pDataArray->m->mothurRemove(pDataArray->fastaFileNames[i][j]); namesToRemove.insert(pDataArray->fastaFileNames[i][j]); uniqueFastaNames.erase(pDataArray->fastaFileNames[i][j]); //remove from list for group file print pDataArray->m->mothurRemove(pDataArray->qualFileNames[i][j]); namesToRemove.insert(pDataArray->qualFileNames[i][j]); } } } } } //remove names for outputFileNames, just cleans up the output vector<string> outputNames2; for(int i = 0; i < pDataArray->outputNames.size(); i++) { if (namesToRemove.count(pDataArray->outputNames[i]) == 0) { outputNames2.push_back(pDataArray->outputNames[i]); } } pDataArray->outputNames = outputNames2; for (it = uniqueFastaNames.begin(); it != uniqueFastaNames.end(); it++) { ifstream in; pDataArray->m->openInputFile(it->first, in); ofstream out; string thisroot = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(it->first)); string thisGroupName = thisroot + ".group"; pDataArray->outputNames.push_back(thisGroupName); pDataArray->m->openOutputFile(thisGroupName, out); while (!in.eof()){ if (pDataArray->m->control_pressed) { break; } Sequence currSeq(in); pDataArray->m->gobble(in); out << currSeq.getName() << '\t' << it->second << endl; } out.close(); in.close(); } } //append to combo files if (pDataArray->createFileGroup ||
pDataArray->createOligosGroup) { ofstream outCGroup; if (l == 0) { pDataArray->m->openOutputFile(pDataArray->compositeGroupFile, outCGroup); pDataArray->outputNames.push_back(pDataArray->compositeGroupFile); } else { pDataArray->m->openOutputFileAppend(pDataArray->compositeGroupFile, outCGroup); } if (!pDataArray->allFiles) { pDataArray->m->mothurRemove(outputGroupFileName); }else { ofstream outGroup; pDataArray->m->openOutputFile(outputGroupFileName, outGroup); for (map<string, string>::iterator itGroup = pDataArray->groupMap.begin(); itGroup != pDataArray->groupMap.end(); itGroup++) { outCGroup << itGroup->first << '\t' << itGroup->second << endl; outGroup << itGroup->first << '\t' << itGroup->second << endl; } outGroup.close(); } outCGroup.close(); for (map<string, int>::iterator itGroups = pDataArray->groupCounts.begin(); itGroups != pDataArray->groupCounts.end(); itGroups++) { map<string, int>::iterator itTemp = pDataArray->totalGroupCounts.find(itGroups->first); if (itTemp == pDataArray->totalGroupCounts.end()) { pDataArray->totalGroupCounts[itGroups->first] = itGroups->second; } //new group create it in totalGroups else { itTemp->second += itGroups->second; } //existing group, update total } } if (l == 0) { pDataArray->m->appendFiles(pDataArray->outputMisMatches, pDataArray->compositeMisMatchFile); } else { pDataArray->m->appendFilesWithoutHeaders(pDataArray->outputMisMatches, pDataArray->compositeMisMatchFile); } pDataArray->m->appendFiles(pDataArray->outputFasta, pDataArray->compositeFastaFile); pDataArray->m->appendFiles(pDataArray->outputScrapFasta, pDataArray->compositeScrapFastaFile); pDataArray->m->appendFiles(pDataArray->outputQual, pDataArray->compositeQualFile); pDataArray->m->appendFiles(pDataArray->outputScrapQual, pDataArray->compositeScrapQualFile); if (!pDataArray->allFiles) { pDataArray->m->mothurRemove(pDataArray->outputMisMatches); pDataArray->m->mothurRemove(pDataArray->outputFasta); pDataArray->m->mothurRemove(pDataArray->outputScrapFasta);
pDataArray->m->mothurRemove(pDataArray->outputQual); pDataArray->m->mothurRemove(pDataArray->outputScrapQual); }else { pDataArray->outputNames.push_back(pDataArray->outputFasta); pDataArray->outputNames.push_back(pDataArray->outputScrapFasta); pDataArray->outputNames.push_back(pDataArray->outputQual); pDataArray->outputNames.push_back(pDataArray->outputScrapQual); pDataArray->outputNames.push_back(pDataArray->outputMisMatches); } pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("It took " + toString(time(NULL) - startTime) + " secs to assemble " + toString(thisNumReads) + " reads.\n"); pDataArray->m->mothurOutEndLine(); } pDataArray->count = numReads; return numReads; } catch(exception& e) { pDataArray->m->errorOut(e, "MakeContigsCommand", "driverGroups"); exit(1); } }
//**********************************************************************************************************************
static DWORD WINAPI MyContigsThreadFunction(LPVOID lpParam){ contigsData* pDataArray; pDataArray = (contigsData*)lpParam; try { vector< vector<double> > qual_match_simple_bayesian; qual_match_simple_bayesian.resize(47); for (int i = 0; i < qual_match_simple_bayesian.size(); i++) { qual_match_simple_bayesian[i].resize(47); } qual_match_simple_bayesian[0][0] = -1.09861; qual_match_simple_bayesian[0][1] = -1.32887; qual_match_simple_bayesian[0][2] = -1.55913; qual_match_simple_bayesian[0][3] = -1.78939; qual_match_simple_bayesian[0][4] = -2.01965; qual_match_simple_bayesian[0][5] = -2.2499; qual_match_simple_bayesian[0][6] = -2.48016; qual_match_simple_bayesian[0][7] = -2.71042; qual_match_simple_bayesian[0][8] = -2.94068; qual_match_simple_bayesian[0][9] = -3.17094; qual_match_simple_bayesian[0][10] = -3.4012; qual_match_simple_bayesian[0][11] = -3.63146; qual_match_simple_bayesian[0][12] = -3.86171; qual_match_simple_bayesian[0][13] = -4.09197; qual_match_simple_bayesian[0][14] = -4.32223; qual_match_simple_bayesian[0][15] = -4.55249; qual_match_simple_bayesian[0][16] = -4.78275;
qual_match_simple_bayesian[0][17] = -5.01301; qual_match_simple_bayesian[0][18] = -5.24327; qual_match_simple_bayesian[0][19] = -5.47352; qual_match_simple_bayesian[0][20] = -5.70378; qual_match_simple_bayesian[0][21] = -5.93404; qual_match_simple_bayesian[0][22] = -6.1643; qual_match_simple_bayesian[0][23] = -6.39456; qual_match_simple_bayesian[0][24] = -6.62482; qual_match_simple_bayesian[0][25] = -6.85508; qual_match_simple_bayesian[0][26] = -7.08533; qual_match_simple_bayesian[0][27] = -7.31559; qual_match_simple_bayesian[0][28] = -7.54585; qual_match_simple_bayesian[0][29] = -7.77611; qual_match_simple_bayesian[0][30] = -8.00637; qual_match_simple_bayesian[0][31] = -8.23663; qual_match_simple_bayesian[0][32] = -8.46688; qual_match_simple_bayesian[0][33] = -8.69714; qual_match_simple_bayesian[0][34] = -8.9274; qual_match_simple_bayesian[0][35] = -9.15766; qual_match_simple_bayesian[0][36] = -9.38792; qual_match_simple_bayesian[0][37] = -9.61818; qual_match_simple_bayesian[0][38] = -9.84844; qual_match_simple_bayesian[0][39] = -10.0787; qual_match_simple_bayesian[0][40] = -10.309; qual_match_simple_bayesian[0][41] = -10.5392; qual_match_simple_bayesian[0][42] = -10.7695; qual_match_simple_bayesian[0][43] = -10.9997; qual_match_simple_bayesian[0][44] = -11.23; qual_match_simple_bayesian[0][45] = -11.4602; qual_match_simple_bayesian[0][46] = -11.6905; qual_match_simple_bayesian[1][0] = -1.32887; qual_match_simple_bayesian[1][1] = -1.37587; qual_match_simple_bayesian[1][2] = -1.41484; qual_match_simple_bayesian[1][3] = -1.44692; qual_match_simple_bayesian[1][4] = -1.47315; qual_match_simple_bayesian[1][5] = -1.49449; qual_match_simple_bayesian[1][6] = -1.51178; qual_match_simple_bayesian[1][7] = -1.52572; qual_match_simple_bayesian[1][8] = -1.53694; qual_match_simple_bayesian[1][9] = -1.54593; qual_match_simple_bayesian[1][10] = -1.55314; qual_match_simple_bayesian[1][11] = -1.5589; qual_match_simple_bayesian[1][12] = -1.5635; qual_match_simple_bayesian[1][13] = 
-1.56717; qual_match_simple_bayesian[1][14] = -1.5701; qual_match_simple_bayesian[1][15] = -1.57243; qual_match_simple_bayesian[1][16] = -1.57428; qual_match_simple_bayesian[1][17] = -1.57576; qual_match_simple_bayesian[1][18] = -1.57693; qual_match_simple_bayesian[1][19] = -1.57786; qual_match_simple_bayesian[1][20] = -1.5786; qual_match_simple_bayesian[1][21] = -1.57919; qual_match_simple_bayesian[1][22] = -1.57966; qual_match_simple_bayesian[1][23] = -1.58003; qual_match_simple_bayesian[1][24] = -1.58033; qual_match_simple_bayesian[1][25] = -1.58057; qual_match_simple_bayesian[1][26] = -1.58075; qual_match_simple_bayesian[1][27] = -1.5809; qual_match_simple_bayesian[1][28] = -1.58102; qual_match_simple_bayesian[1][29] = -1.58111; qual_match_simple_bayesian[1][30] = -1.58119; qual_match_simple_bayesian[1][31] = -1.58125; qual_match_simple_bayesian[1][32] = -1.58129; qual_match_simple_bayesian[1][33] = -1.58133; qual_match_simple_bayesian[1][34] = -1.58136; qual_match_simple_bayesian[1][35] = -1.58138; qual_match_simple_bayesian[1][36] = -1.5814; qual_match_simple_bayesian[1][37] = -1.58142; qual_match_simple_bayesian[1][38] = -1.58143; qual_match_simple_bayesian[1][39] = -1.58144; qual_match_simple_bayesian[1][40] = -1.58145; qual_match_simple_bayesian[1][41] = -1.58145; qual_match_simple_bayesian[1][42] = -1.58146; qual_match_simple_bayesian[1][43] = -1.58146; qual_match_simple_bayesian[1][44] = -1.58146; qual_match_simple_bayesian[1][45] = -1.58146; qual_match_simple_bayesian[1][46] = -1.58147; qual_match_simple_bayesian[2][0] = -1.55913; qual_match_simple_bayesian[2][1] = -1.41484; qual_match_simple_bayesian[2][2] = -1.31343; qual_match_simple_bayesian[2][3] = -1.23963; qual_match_simple_bayesian[2][4] = -1.18465; qual_match_simple_bayesian[2][5] = -1.14303; qual_match_simple_bayesian[2][6] = -1.11117; qual_match_simple_bayesian[2][7] = -1.08657; qual_match_simple_bayesian[2][8] = -1.06744; qual_match_simple_bayesian[2][9] = -1.05251; 
qual_match_simple_bayesian[2][10] = -1.0408; qual_match_simple_bayesian[2][11] = -1.0316; qual_match_simple_bayesian[2][12] = -1.02436; qual_match_simple_bayesian[2][13] = -1.01863; qual_match_simple_bayesian[2][14] = -1.01411; qual_match_simple_bayesian[2][15] = -1.01054; qual_match_simple_bayesian[2][16] = -1.00771; qual_match_simple_bayesian[2][17] = -1.00546; qual_match_simple_bayesian[2][18] = -1.00368; qual_match_simple_bayesian[2][19] = -1.00227; qual_match_simple_bayesian[2][20] = -1.00115; qual_match_simple_bayesian[2][21] = -1.00027; qual_match_simple_bayesian[2][22] = -0.99956; qual_match_simple_bayesian[2][23] = -0.999001; qual_match_simple_bayesian[2][24] = -0.998557; qual_match_simple_bayesian[2][25] = -0.998204; qual_match_simple_bayesian[2][26] = -0.997924; qual_match_simple_bayesian[2][27] = -0.997702; qual_match_simple_bayesian[2][28] = -0.997525; qual_match_simple_bayesian[2][29] = -0.997385; qual_match_simple_bayesian[2][30] = -0.997273; qual_match_simple_bayesian[2][31] = -0.997185; qual_match_simple_bayesian[2][32] = -0.997114; qual_match_simple_bayesian[2][33] = -0.997059; qual_match_simple_bayesian[2][34] = -0.997014; qual_match_simple_bayesian[2][35] = -0.996979; qual_match_simple_bayesian[2][36] = -0.996951; qual_match_simple_bayesian[2][37] = -0.996929; qual_match_simple_bayesian[2][38] = -0.996911; qual_match_simple_bayesian[2][39] = -0.996897; qual_match_simple_bayesian[2][40] = -0.996886; qual_match_simple_bayesian[2][41] = -0.996877; qual_match_simple_bayesian[2][42] = -0.99687; qual_match_simple_bayesian[2][43] = -0.996865; qual_match_simple_bayesian[2][44] = -0.99686; qual_match_simple_bayesian[2][45] = -0.996857; qual_match_simple_bayesian[2][46] = -0.996854; qual_match_simple_bayesian[3][0] = -1.78939; qual_match_simple_bayesian[3][1] = -1.44692; qual_match_simple_bayesian[3][2] = -1.23963; qual_match_simple_bayesian[3][3] = -1.10098; qual_match_simple_bayesian[3][4] = -1.0031; qual_match_simple_bayesian[3][5] = -0.931648; 
qual_match_simple_bayesian[3][6] = -0.878319; qual_match_simple_bayesian[3][7] = -0.837896; qual_match_simple_bayesian[3][8] = -0.806912; qual_match_simple_bayesian[3][9] = -0.782967; qual_match_simple_bayesian[3][10] = -0.764347; qual_match_simple_bayesian[3][11] = -0.7498; qual_match_simple_bayesian[3][12] = -0.738394; qual_match_simple_bayesian[3][13] = -0.729426; qual_match_simple_bayesian[3][14] = -0.722359; qual_match_simple_bayesian[3][15] = -0.71678; qual_match_simple_bayesian[3][16] = -0.712372; qual_match_simple_bayesian[3][17] = -0.708883; qual_match_simple_bayesian[3][18] = -0.706121; qual_match_simple_bayesian[3][19] = -0.703933; qual_match_simple_bayesian[3][20] = -0.702197; qual_match_simple_bayesian[3][21] = -0.700821; qual_match_simple_bayesian[3][22] = -0.69973; qual_match_simple_bayesian[3][23] = -0.698863; qual_match_simple_bayesian[3][24] = -0.698176; qual_match_simple_bayesian[3][25] = -0.69763; qual_match_simple_bayesian[3][26] = -0.697196; qual_match_simple_bayesian[3][27] = -0.696852; qual_match_simple_bayesian[3][28] = -0.696579; qual_match_simple_bayesian[3][29] = -0.696362; qual_match_simple_bayesian[3][30] = -0.69619; qual_match_simple_bayesian[3][31] = -0.696053; qual_match_simple_bayesian[3][32] = -0.695944; qual_match_simple_bayesian[3][33] = -0.695858; qual_match_simple_bayesian[3][34] = -0.695789; qual_match_simple_bayesian[3][35] = -0.695735; qual_match_simple_bayesian[3][36] = -0.695692; qual_match_simple_bayesian[3][37] = -0.695657; qual_match_simple_bayesian[3][38] = -0.69563; qual_match_simple_bayesian[3][39] = -0.695608; qual_match_simple_bayesian[3][40] = -0.695591; qual_match_simple_bayesian[3][41] = -0.695577; qual_match_simple_bayesian[3][42] = -0.695566; qual_match_simple_bayesian[3][43] = -0.695558; qual_match_simple_bayesian[3][44] = -0.695551; qual_match_simple_bayesian[3][45] = -0.695546; qual_match_simple_bayesian[3][46] = -0.695541; qual_match_simple_bayesian[4][0] = -2.01965; qual_match_simple_bayesian[4][1] = 
-1.47315; qual_match_simple_bayesian[4][2] = -1.18465; qual_match_simple_bayesian[4][3] = -1.0031; qual_match_simple_bayesian[4][4] = -0.879224; qual_match_simple_bayesian[4][5] = -0.790712; qual_match_simple_bayesian[4][6] = -0.725593; qual_match_simple_bayesian[4][7] = -0.676729; qual_match_simple_bayesian[4][8] = -0.639547; qual_match_simple_bayesian[4][9] = -0.610968; qual_match_simple_bayesian[4][10] = -0.588834; qual_match_simple_bayesian[4][11] = -0.571596; qual_match_simple_bayesian[4][12] = -0.558111; qual_match_simple_bayesian[4][13] = -0.547528; qual_match_simple_bayesian[4][14] = -0.539201; qual_match_simple_bayesian[4][15] = -0.532636; qual_match_simple_bayesian[4][16] = -0.527451; qual_match_simple_bayesian[4][17] = -0.523352; qual_match_simple_bayesian[4][18] = -0.520107; qual_match_simple_bayesian[4][19] = -0.517538; qual_match_simple_bayesian[4][20] = -0.515502; qual_match_simple_bayesian[4][21] = -0.513887; qual_match_simple_bayesian[4][22] = -0.512606; qual_match_simple_bayesian[4][23] = -0.51159; qual_match_simple_bayesian[4][24] = -0.510784; qual_match_simple_bayesian[4][25] = -0.510144; qual_match_simple_bayesian[4][26] = -0.509636; qual_match_simple_bayesian[4][27] = -0.509232; qual_match_simple_bayesian[4][28] = -0.508912; qual_match_simple_bayesian[4][29] = -0.508658; qual_match_simple_bayesian[4][30] = -0.508456; qual_match_simple_bayesian[4][31] = -0.508295; qual_match_simple_bayesian[4][32] = -0.508168; qual_match_simple_bayesian[4][33] = -0.508067; qual_match_simple_bayesian[4][34] = -0.507986; qual_match_simple_bayesian[4][35] = -0.507922; qual_match_simple_bayesian[4][36] = -0.507872; qual_match_simple_bayesian[4][37] = -0.507831; qual_match_simple_bayesian[4][38] = -0.507799; qual_match_simple_bayesian[4][39] = -0.507774; qual_match_simple_bayesian[4][40] = -0.507754; qual_match_simple_bayesian[4][41] = -0.507738; qual_match_simple_bayesian[4][42] = -0.507725; qual_match_simple_bayesian[4][43] = -0.507715; 
qual_match_simple_bayesian[4][44] = -0.507707; qual_match_simple_bayesian[4][45] = -0.507701; qual_match_simple_bayesian[4][46] = -0.507695; qual_match_simple_bayesian[5][0] = -2.2499; qual_match_simple_bayesian[5][1] = -1.49449; qual_match_simple_bayesian[5][2] = -1.14303; qual_match_simple_bayesian[5][3] = -0.931648; qual_match_simple_bayesian[5][4] = -0.790712; qual_match_simple_bayesian[5][5] = -0.691393; qual_match_simple_bayesian[5][6] = -0.618979; qual_match_simple_bayesian[5][7] = -0.564976; qual_match_simple_bayesian[5][8] = -0.524066; qual_match_simple_bayesian[5][9] = -0.492723; qual_match_simple_bayesian[5][10] = -0.468507; qual_match_simple_bayesian[5][11] = -0.449682; qual_match_simple_bayesian[5][12] = -0.434976; qual_match_simple_bayesian[5][13] = -0.423448; qual_match_simple_bayesian[5][14] = -0.414384; qual_match_simple_bayesian[5][15] = -0.407243; qual_match_simple_bayesian[5][16] = -0.401606; qual_match_simple_bayesian[5][17] = -0.397151; qual_match_simple_bayesian[5][18] = -0.393627; qual_match_simple_bayesian[5][19] = -0.390836; qual_match_simple_bayesian[5][20] = -0.388625; qual_match_simple_bayesian[5][21] = -0.386872; qual_match_simple_bayesian[5][22] = -0.385482; qual_match_simple_bayesian[5][23] = -0.384379; qual_match_simple_bayesian[5][24] = -0.383503; qual_match_simple_bayesian[5][25] = -0.382809; qual_match_simple_bayesian[5][26] = -0.382257; qual_match_simple_bayesian[5][27] = -0.38182; qual_match_simple_bayesian[5][28] = -0.381472; qual_match_simple_bayesian[5][29] = -0.381196; qual_match_simple_bayesian[5][30] = -0.380977; qual_match_simple_bayesian[5][31] = -0.380803; qual_match_simple_bayesian[5][32] = -0.380664; qual_match_simple_bayesian[5][33] = -0.380554; qual_match_simple_bayesian[5][34] = -0.380467; qual_match_simple_bayesian[5][35] = -0.380398; qual_match_simple_bayesian[5][36] = -0.380343; qual_match_simple_bayesian[5][37] = -0.380299; qual_match_simple_bayesian[5][38] = -0.380264; qual_match_simple_bayesian[5][39] = 
-0.380237; qual_match_simple_bayesian[5][40] = -0.380215; qual_match_simple_bayesian[5][41] = -0.380198; qual_match_simple_bayesian[5][42] = -0.380184; qual_match_simple_bayesian[5][43] = -0.380173; qual_match_simple_bayesian[5][44] = -0.380164; qual_match_simple_bayesian[5][45] = -0.380157; qual_match_simple_bayesian[5][46] = -0.380152; qual_match_simple_bayesian[6][0] = -2.48016; qual_match_simple_bayesian[6][1] = -1.51178; qual_match_simple_bayesian[6][2] = -1.11117; qual_match_simple_bayesian[6][3] = -0.878319; qual_match_simple_bayesian[6][4] = -0.725593; qual_match_simple_bayesian[6][5] = -0.618979; qual_match_simple_bayesian[6][6] = -0.541714; qual_match_simple_bayesian[6][7] = -0.48433; qual_match_simple_bayesian[6][8] = -0.440984; qual_match_simple_bayesian[6][9] = -0.407844; qual_match_simple_bayesian[6][10] = -0.382281; qual_match_simple_bayesian[6][11] = -0.362431; qual_match_simple_bayesian[6][12] = -0.34694; qual_match_simple_bayesian[6][13] = -0.334804; qual_match_simple_bayesian[6][14] = -0.325268; qual_match_simple_bayesian[6][15] = -0.317757; qual_match_simple_bayesian[6][16] = -0.311831; qual_match_simple_bayesian[6][17] = -0.307149; qual_match_simple_bayesian[6][18] = -0.303445; qual_match_simple_bayesian[6][19] = -0.300513; qual_match_simple_bayesian[6][20] = -0.29819; qual_match_simple_bayesian[6][21] = -0.296348; qual_match_simple_bayesian[6][22] = -0.294888; qual_match_simple_bayesian[6][23] = -0.29373; qual_match_simple_bayesian[6][24] = -0.29281; qual_match_simple_bayesian[6][25] = -0.292081; qual_match_simple_bayesian[6][26] = -0.291502; qual_match_simple_bayesian[6][27] = -0.291042; qual_match_simple_bayesian[6][28] = -0.290677; qual_match_simple_bayesian[6][29] = -0.290387; qual_match_simple_bayesian[6][30] = -0.290157; qual_match_simple_bayesian[6][31] = -0.289974; qual_match_simple_bayesian[6][32] = -0.289829; qual_match_simple_bayesian[6][33] = -0.289713; qual_match_simple_bayesian[6][34] = -0.289622; 
qual_match_simple_bayesian[6][35] = -0.289549; qual_match_simple_bayesian[6][36] = -0.289491; qual_match_simple_bayesian[6][37] = -0.289445; qual_match_simple_bayesian[6][38] = -0.289409; qual_match_simple_bayesian[6][39] = -0.28938; qual_match_simple_bayesian[6][40] = -0.289357; qual_match_simple_bayesian[6][41] = -0.289339; qual_match_simple_bayesian[6][42] = -0.289324; qual_match_simple_bayesian[6][43] = -0.289313; qual_match_simple_bayesian[6][44] = -0.289304; qual_match_simple_bayesian[6][45] = -0.289296; qual_match_simple_bayesian[6][46] = -0.28929; qual_match_simple_bayesian[7][0] = -2.71042; qual_match_simple_bayesian[7][1] = -1.52572; qual_match_simple_bayesian[7][2] = -1.08657; qual_match_simple_bayesian[7][3] = -0.837896; qual_match_simple_bayesian[7][4] = -0.676729; qual_match_simple_bayesian[7][5] = -0.564976; qual_match_simple_bayesian[7][6] = -0.48433; qual_match_simple_bayesian[7][7] = -0.424604; qual_match_simple_bayesian[7][8] = -0.379581; qual_match_simple_bayesian[7][9] = -0.345208; qual_match_simple_bayesian[7][10] = -0.318723; qual_match_simple_bayesian[7][11] = -0.298173; qual_match_simple_bayesian[7][12] = -0.282146; qual_match_simple_bayesian[7][13] = -0.269595; qual_match_simple_bayesian[7][14] = -0.259737; qual_match_simple_bayesian[7][15] = -0.251976; qual_match_simple_bayesian[7][16] = -0.245853; qual_match_simple_bayesian[7][17] = -0.241016; qual_match_simple_bayesian[7][18] = -0.23719; qual_match_simple_bayesian[7][19] = -0.234162; qual_match_simple_bayesian[7][20] = -0.231763; qual_match_simple_bayesian[7][21] = -0.229861; qual_match_simple_bayesian[7][22] = -0.228354; qual_match_simple_bayesian[7][23] = -0.227158; qual_match_simple_bayesian[7][24] = -0.226208; qual_match_simple_bayesian[7][25] = -0.225455; qual_match_simple_bayesian[7][26] = -0.224857; qual_match_simple_bayesian[7][27] = -0.224383; qual_match_simple_bayesian[7][28] = -0.224006; qual_match_simple_bayesian[7][29] = -0.223707; qual_match_simple_bayesian[7][30] = 
-0.223469; qual_match_simple_bayesian[7][31] = -0.22328; qual_match_simple_bayesian[7][32] = -0.22313; qual_match_simple_bayesian[7][33] = -0.223011; qual_match_simple_bayesian[7][34] = -0.222917; qual_match_simple_bayesian[7][35] = -0.222842; qual_match_simple_bayesian[7][36] = -0.222782; qual_match_simple_bayesian[7][37] = -0.222734; qual_match_simple_bayesian[7][38] = -0.222697; qual_match_simple_bayesian[7][39] = -0.222667; qual_match_simple_bayesian[7][40] = -0.222643; qual_match_simple_bayesian[7][41] = -0.222624; qual_match_simple_bayesian[7][42] = -0.222609; qual_match_simple_bayesian[7][43] = -0.222597; qual_match_simple_bayesian[7][44] = -0.222588; qual_match_simple_bayesian[7][45] = -0.222581; qual_match_simple_bayesian[7][46] = -0.222575; qual_match_simple_bayesian[8][0] = -2.94068; qual_match_simple_bayesian[8][1] = -1.53694; qual_match_simple_bayesian[8][2] = -1.06744; qual_match_simple_bayesian[8][3] = -0.806912; qual_match_simple_bayesian[8][4] = -0.639547; qual_match_simple_bayesian[8][5] = -0.524066; qual_match_simple_bayesian[8][6] = -0.440984; qual_match_simple_bayesian[8][7] = -0.379581; qual_match_simple_bayesian[8][8] = -0.333359; qual_match_simple_bayesian[8][9] = -0.298107; qual_match_simple_bayesian[8][10] = -0.270966; qual_match_simple_bayesian[8][11] = -0.249919; qual_match_simple_bayesian[8][12] = -0.233512; qual_match_simple_bayesian[8][13] = -0.220668; qual_match_simple_bayesian[8][14] = -0.210582; qual_match_simple_bayesian[8][15] = -0.202642; qual_match_simple_bayesian[8][16] = -0.19638; qual_match_simple_bayesian[8][17] = -0.191434; qual_match_simple_bayesian[8][18] = -0.187522; qual_match_simple_bayesian[8][19] = -0.184426; qual_match_simple_bayesian[8][20] = -0.181973; qual_match_simple_bayesian[8][21] = -0.180029; qual_match_simple_bayesian[8][22] = -0.178488; qual_match_simple_bayesian[8][23] = -0.177265; qual_match_simple_bayesian[8][24] = -0.176295; qual_match_simple_bayesian[8][25] = -0.175525; 
qual_match_simple_bayesian[8][26] = -0.174914; qual_match_simple_bayesian[8][27] = -0.174428; qual_match_simple_bayesian[8][28] = -0.174043; qual_match_simple_bayesian[8][29] = -0.173737; qual_match_simple_bayesian[8][30] = -0.173494; qual_match_simple_bayesian[8][31] = -0.173301; qual_match_simple_bayesian[8][32] = -0.173148; qual_match_simple_bayesian[8][33] = -0.173026; qual_match_simple_bayesian[8][34] = -0.17293; qual_match_simple_bayesian[8][35] = -0.172853; qual_match_simple_bayesian[8][36] = -0.172792; qual_match_simple_bayesian[8][37] = -0.172744; qual_match_simple_bayesian[8][38] = -0.172705; qual_match_simple_bayesian[8][39] = -0.172675; qual_match_simple_bayesian[8][40] = -0.17265; qual_match_simple_bayesian[8][41] = -0.172631; qual_match_simple_bayesian[8][42] = -0.172616; qual_match_simple_bayesian[8][43] = -0.172604; qual_match_simple_bayesian[8][44] = -0.172594; qual_match_simple_bayesian[8][45] = -0.172586; qual_match_simple_bayesian[8][46] = -0.17258; qual_match_simple_bayesian[9][0] = -3.17094; qual_match_simple_bayesian[9][1] = -1.54593; qual_match_simple_bayesian[9][2] = -1.05251; qual_match_simple_bayesian[9][3] = -0.782967; qual_match_simple_bayesian[9][4] = -0.610968; qual_match_simple_bayesian[9][5] = -0.492723; qual_match_simple_bayesian[9][6] = -0.407844; qual_match_simple_bayesian[9][7] = -0.345208; qual_match_simple_bayesian[9][8] = -0.298107; qual_match_simple_bayesian[9][9] = -0.262213; qual_match_simple_bayesian[9][10] = -0.234592; qual_match_simple_bayesian[9][11] = -0.213183; qual_match_simple_bayesian[9][12] = -0.196498; qual_match_simple_bayesian[9][13] = -0.18344; qual_match_simple_bayesian[9][14] = -0.173188; qual_match_simple_bayesian[9][15] = -0.165119; qual_match_simple_bayesian[9][16] = -0.158755; qual_match_simple_bayesian[9][17] = -0.153729; qual_match_simple_bayesian[9][18] = -0.149755; qual_match_simple_bayesian[9][19] = -0.146609; qual_match_simple_bayesian[9][20] = -0.144117; qual_match_simple_bayesian[9][21] = 
-0.142143; qual_match_simple_bayesian[9][22] = -0.140577; qual_match_simple_bayesian[9][23] = -0.139335; qual_match_simple_bayesian[9][24] = -0.138349; qual_match_simple_bayesian[9][25] = -0.137567; qual_match_simple_bayesian[9][26] = -0.136946; qual_match_simple_bayesian[9][27] = -0.136453; qual_match_simple_bayesian[9][28] = -0.136062; qual_match_simple_bayesian[9][29] = -0.135751; qual_match_simple_bayesian[9][30] = -0.135504; qual_match_simple_bayesian[9][31] = -0.135308; qual_match_simple_bayesian[9][32] = -0.135153; qual_match_simple_bayesian[9][33] = -0.135029; qual_match_simple_bayesian[9][34] = -0.134931; qual_match_simple_bayesian[9][35] = -0.134853; qual_match_simple_bayesian[9][36] = -0.134791; qual_match_simple_bayesian[9][37] = -0.134742; qual_match_simple_bayesian[9][38] = -0.134703; qual_match_simple_bayesian[9][39] = -0.134672; qual_match_simple_bayesian[9][40] = -0.134647; qual_match_simple_bayesian[9][41] = -0.134628; qual_match_simple_bayesian[9][42] = -0.134612; qual_match_simple_bayesian[9][43] = -0.1346; qual_match_simple_bayesian[9][44] = -0.13459; qual_match_simple_bayesian[9][45] = -0.134582; qual_match_simple_bayesian[9][46] = -0.134576; qual_match_simple_bayesian[10][0] = -3.4012; qual_match_simple_bayesian[10][1] = -1.55314; qual_match_simple_bayesian[10][2] = -1.0408; qual_match_simple_bayesian[10][3] = -0.764347; qual_match_simple_bayesian[10][4] = -0.588834; qual_match_simple_bayesian[10][5] = -0.468507; qual_match_simple_bayesian[10][6] = -0.382281; qual_match_simple_bayesian[10][7] = -0.318723; qual_match_simple_bayesian[10][8] = -0.270966; qual_match_simple_bayesian[10][9] = -0.234592; qual_match_simple_bayesian[10][10] = -0.206614; qual_match_simple_bayesian[10][11] = -0.184935; qual_match_simple_bayesian[10][12] = -0.168044; qual_match_simple_bayesian[10][13] = -0.154827; qual_match_simple_bayesian[10][14] = -0.144451; qual_match_simple_bayesian[10][15] = -0.136285; qual_match_simple_bayesian[10][16] = -0.129846; 
qual_match_simple_bayesian[10][17] = -0.124761; qual_match_simple_bayesian[10][18] = -0.12074; qual_match_simple_bayesian[10][19] = -0.117558; qual_match_simple_bayesian[10][20] = -0.115037; qual_match_simple_bayesian[10][21] = -0.113039; qual_match_simple_bayesian[10][22] = -0.111455; qual_match_simple_bayesian[10][23] = -0.110198; qual_match_simple_bayesian[10][24] = -0.109202; qual_match_simple_bayesian[10][25] = -0.10841; qual_match_simple_bayesian[10][26] = -0.107782; qual_match_simple_bayesian[10][27] = -0.107284; qual_match_simple_bayesian[10][28] = -0.106888; qual_match_simple_bayesian[10][29] = -0.106574; qual_match_simple_bayesian[10][30] = -0.106324; qual_match_simple_bayesian[10][31] = -0.106126; qual_match_simple_bayesian[10][32] = -0.105968; qual_match_simple_bayesian[10][33] = -0.105843; qual_match_simple_bayesian[10][34] = -0.105744; qual_match_simple_bayesian[10][35] = -0.105665; qual_match_simple_bayesian[10][36] = -0.105602; qual_match_simple_bayesian[10][37] = -0.105553; qual_match_simple_bayesian[10][38] = -0.105513; qual_match_simple_bayesian[10][39] = -0.105482; qual_match_simple_bayesian[10][40] = -0.105457; qual_match_simple_bayesian[10][41] = -0.105437; qual_match_simple_bayesian[10][42] = -0.105421; qual_match_simple_bayesian[10][43] = -0.105409; qual_match_simple_bayesian[10][44] = -0.105399; qual_match_simple_bayesian[10][45] = -0.105391; qual_match_simple_bayesian[10][46] = -0.105385; qual_match_simple_bayesian[11][0] = -3.63146; qual_match_simple_bayesian[11][1] = -1.5589; qual_match_simple_bayesian[11][2] = -1.0316; qual_match_simple_bayesian[11][3] = -0.7498; qual_match_simple_bayesian[11][4] = -0.571596; qual_match_simple_bayesian[11][5] = -0.449682; qual_match_simple_bayesian[11][6] = -0.362431; qual_match_simple_bayesian[11][7] = -0.298173; qual_match_simple_bayesian[11][8] = -0.249919; qual_match_simple_bayesian[11][9] = -0.213183; qual_match_simple_bayesian[11][10] = -0.184935; qual_match_simple_bayesian[11][11] = -0.163052; 
qual_match_simple_bayesian[11][12] = -0.146004; qual_match_simple_bayesian[11][13] = -0.132667; qual_match_simple_bayesian[11][14] = -0.122198; qual_match_simple_bayesian[11][15] = -0.11396; qual_match_simple_bayesian[11][16] = -0.107464; qual_match_simple_bayesian[11][17] = -0.102334; qual_match_simple_bayesian[11][18] = -0.0982781; qual_match_simple_bayesian[11][19] = -0.0950678; qual_match_simple_bayesian[11][20] = -0.0925252; qual_match_simple_bayesian[11][21] = -0.09051; qual_match_simple_bayesian[11][22] = -0.0889123; qual_match_simple_bayesian[11][23] = -0.0876449; qual_match_simple_bayesian[11][24] = -0.0866394; qual_match_simple_bayesian[11][25] = -0.0858414; qual_match_simple_bayesian[11][26] = -0.0852079; qual_match_simple_bayesian[11][27] = -0.0847051; qual_match_simple_bayesian[11][28] = -0.0843058; qual_match_simple_bayesian[11][29] = -0.0839888; qual_match_simple_bayesian[11][30] = -0.083737; qual_match_simple_bayesian[11][31] = -0.0835371; qual_match_simple_bayesian[11][32] = -0.0833783; qual_match_simple_bayesian[11][33] = -0.0832522; qual_match_simple_bayesian[11][34] = -0.083152; qual_match_simple_bayesian[11][35] = -0.0830725; qual_match_simple_bayesian[11][36] = -0.0830093; qual_match_simple_bayesian[11][37] = -0.0829591; qual_match_simple_bayesian[11][38] = -0.0829192; qual_match_simple_bayesian[11][39] = -0.0828876; qual_match_simple_bayesian[11][40] = -0.0828624; qual_match_simple_bayesian[11][41] = -0.0828425; qual_match_simple_bayesian[11][42] = -0.0828266; qual_match_simple_bayesian[11][43] = -0.082814; qual_match_simple_bayesian[11][44] = -0.082804; qual_match_simple_bayesian[11][45] = -0.082796; qual_match_simple_bayesian[11][46] = -0.0827897; qual_match_simple_bayesian[12][0] = -3.86171; qual_match_simple_bayesian[12][1] = -1.5635; qual_match_simple_bayesian[12][2] = -1.02436; qual_match_simple_bayesian[12][3] = -0.738394; qual_match_simple_bayesian[12][4] = -0.558111; qual_match_simple_bayesian[12][5] = -0.434976; 
qual_match_simple_bayesian[12][6] = -0.34694; qual_match_simple_bayesian[12][7] = -0.282146; qual_match_simple_bayesian[12][8] = -0.233512; qual_match_simple_bayesian[12][9] = -0.196498; qual_match_simple_bayesian[12][10] = -0.168044; qual_match_simple_bayesian[12][11] = -0.146004; qual_match_simple_bayesian[12][12] = -0.128838; qual_match_simple_bayesian[12][13] = -0.115409; qual_match_simple_bayesian[12][14] = -0.104869; qual_match_simple_bayesian[12][15] = -0.096575; qual_match_simple_bayesian[12][16] = -0.0900357; qual_match_simple_bayesian[12][17] = -0.0848716; qual_match_simple_bayesian[12][18] = -0.0807886; qual_match_simple_bayesian[12][19] = -0.0775572; qual_match_simple_bayesian[12][20] = -0.0749978; qual_match_simple_bayesian[12][21] = -0.0729694; qual_match_simple_bayesian[12][22] = -0.0713612; qual_match_simple_bayesian[12][23] = -0.0700856; qual_match_simple_bayesian[12][24] = -0.0690735; qual_match_simple_bayesian[12][25] = -0.0682703; qual_match_simple_bayesian[12][26] = -0.0676327; qual_match_simple_bayesian[12][27] = -0.0671265; qual_match_simple_bayesian[12][28] = -0.0667247; qual_match_simple_bayesian[12][29] = -0.0664056; qual_match_simple_bayesian[12][30] = -0.0661522; qual_match_simple_bayesian[12][31] = -0.065951; qual_match_simple_bayesian[12][32] = -0.0657912; qual_match_simple_bayesian[12][33] = -0.0656642; qual_match_simple_bayesian[12][34] = -0.0655634; qual_match_simple_bayesian[12][35] = -0.0654833; qual_match_simple_bayesian[12][36] = -0.0654198; qual_match_simple_bayesian[12][37] = -0.0653692; qual_match_simple_bayesian[12][38] = -0.0653291; qual_match_simple_bayesian[12][39] = -0.0652972; qual_match_simple_bayesian[12][40] = -0.0652719; qual_match_simple_bayesian[12][41] = -0.0652518; qual_match_simple_bayesian[12][42] = -0.0652359; qual_match_simple_bayesian[12][43] = -0.0652232; qual_match_simple_bayesian[12][44] = -0.0652131; qual_match_simple_bayesian[12][45] = -0.0652051; qual_match_simple_bayesian[12][46] = -0.0651987; 
qual_match_simple_bayesian[13][0] = -4.09197; qual_match_simple_bayesian[13][1] = -1.56717; qual_match_simple_bayesian[13][2] = -1.01863; qual_match_simple_bayesian[13][3] = -0.729426; qual_match_simple_bayesian[13][4] = -0.547528; qual_match_simple_bayesian[13][5] = -0.423448; qual_match_simple_bayesian[13][6] = -0.334804; qual_match_simple_bayesian[13][7] = -0.269595; qual_match_simple_bayesian[13][8] = -0.220668; qual_match_simple_bayesian[13][9] = -0.18344; qual_match_simple_bayesian[13][10] = -0.154827; qual_match_simple_bayesian[13][11] = -0.132667; qual_match_simple_bayesian[13][12] = -0.115409; qual_match_simple_bayesian[13][13] = -0.101909; qual_match_simple_bayesian[13][14] = -0.0913142; qual_match_simple_bayesian[13][15] = -0.0829777; qual_match_simple_bayesian[13][16] = -0.0764049; qual_match_simple_bayesian[13][17] = -0.0712146; qual_match_simple_bayesian[13][18] = -0.0671109; qual_match_simple_bayesian[13][19] = -0.0638632; qual_match_simple_bayesian[13][20] = -0.061291; qual_match_simple_bayesian[13][21] = -0.0592525; qual_match_simple_bayesian[13][22] = -0.0576362; qual_match_simple_bayesian[13][23] = -0.0563542; qual_match_simple_bayesian[13][24] = -0.055337; qual_match_simple_bayesian[13][25] = -0.0545298; qual_match_simple_bayesian[13][26] = -0.053889; qual_match_simple_bayesian[13][27] = -0.0533804; qual_match_simple_bayesian[13][28] = -0.0529765; qual_match_simple_bayesian[13][29] = -0.0526558; qual_match_simple_bayesian[13][30] = -0.0524012; qual_match_simple_bayesian[13][31] = -0.0521989; qual_match_simple_bayesian[13][32] = -0.0520383; qual_match_simple_bayesian[13][33] = -0.0519108; qual_match_simple_bayesian[13][34] = -0.0518095; qual_match_simple_bayesian[13][35] = -0.051729; qual_match_simple_bayesian[13][36] = -0.0516651; qual_match_simple_bayesian[13][37] = -0.0516143; qual_match_simple_bayesian[13][38] = -0.051574; qual_match_simple_bayesian[13][39] = -0.051542; qual_match_simple_bayesian[13][40] = -0.0515165; 
qual_match_simple_bayesian[13][41] = -0.0514963; qual_match_simple_bayesian[13][42] = -0.0514803; qual_match_simple_bayesian[13][43] = -0.0514675; qual_match_simple_bayesian[13][44] = -0.0514574; qual_match_simple_bayesian[13][45] = -0.0514493; qual_match_simple_bayesian[13][46] = -0.051443; qual_match_simple_bayesian[14][0] = -4.32223; qual_match_simple_bayesian[14][1] = -1.5701; qual_match_simple_bayesian[14][2] = -1.01411; qual_match_simple_bayesian[14][3] = -0.722359; qual_match_simple_bayesian[14][4] = -0.539201; qual_match_simple_bayesian[14][5] = -0.414384; qual_match_simple_bayesian[14][6] = -0.325268; qual_match_simple_bayesian[14][7] = -0.259737; qual_match_simple_bayesian[14][8] = -0.210582; qual_match_simple_bayesian[14][9] = -0.173188; qual_match_simple_bayesian[14][10] = -0.144451; qual_match_simple_bayesian[14][11] = -0.122198; qual_match_simple_bayesian[14][12] = -0.104869; qual_match_simple_bayesian[14][13] = -0.0913142; qual_match_simple_bayesian[14][14] = -0.0806768; qual_match_simple_bayesian[14][15] = -0.0723072; qual_match_simple_bayesian[14][16] = -0.0657085; qual_match_simple_bayesian[14][17] = -0.0604979; qual_match_simple_bayesian[14][18] = -0.0563782; qual_match_simple_bayesian[14][19] = -0.0531178; qual_match_simple_bayesian[14][20] = -0.0505356; qual_match_simple_bayesian[14][21] = -0.0484892; qual_match_simple_bayesian[14][22] = -0.0468667; qual_match_simple_bayesian[14][23] = -0.0455797; qual_match_simple_bayesian[14][24] = -0.0445586; qual_match_simple_bayesian[14][25] = -0.0437483; qual_match_simple_bayesian[14][26] = -0.0431051; qual_match_simple_bayesian[14][27] = -0.0425945; qual_match_simple_bayesian[14][28] = -0.0421891; qual_match_simple_bayesian[14][29] = -0.0418671; qual_match_simple_bayesian[14][30] = -0.0416115; qual_match_simple_bayesian[14][31] = -0.0414085; qual_match_simple_bayesian[14][32] = -0.0412473; qual_match_simple_bayesian[14][33] = -0.0411192; qual_match_simple_bayesian[14][34] = -0.0410175; 
qual_match_simple_bayesian[14][35] = -0.0409368; qual_match_simple_bayesian[14][36] = -0.0408726; qual_match_simple_bayesian[14][37] = -0.0408216; qual_match_simple_bayesian[14][38] = -0.0407812; qual_match_simple_bayesian[14][39] = -0.040749; qual_match_simple_bayesian[14][40] = -0.0407235; qual_match_simple_bayesian[14][41] = -0.0407032; qual_match_simple_bayesian[14][42] = -0.0406871; qual_match_simple_bayesian[14][43] = -0.0406743; qual_match_simple_bayesian[14][44] = -0.0406641; qual_match_simple_bayesian[14][45] = -0.040656; qual_match_simple_bayesian[14][46] = -0.0406496; qual_match_simple_bayesian[15][0] = -4.55249; qual_match_simple_bayesian[15][1] = -1.57243; qual_match_simple_bayesian[15][2] = -1.01054; qual_match_simple_bayesian[15][3] = -0.71678; qual_match_simple_bayesian[15][4] = -0.532636; qual_match_simple_bayesian[15][5] = -0.407243; qual_match_simple_bayesian[15][6] = -0.317757; qual_match_simple_bayesian[15][7] = -0.251976; qual_match_simple_bayesian[15][8] = -0.202642; qual_match_simple_bayesian[15][9] = -0.165119; qual_match_simple_bayesian[15][10] = -0.136285; qual_match_simple_bayesian[15][11] = -0.11396; qual_match_simple_bayesian[15][12] = -0.096575; qual_match_simple_bayesian[15][13] = -0.0829777; qual_match_simple_bayesian[15][14] = -0.0723072; qual_match_simple_bayesian[15][15] = -0.0639118; qual_match_simple_bayesian[15][16] = -0.0572929; qual_match_simple_bayesian[15][17] = -0.0520664; qual_match_simple_bayesian[15][18] = -0.0479342; qual_match_simple_bayesian[15][19] = -0.044664; qual_match_simple_bayesian[15][20] = -0.042074; qual_match_simple_bayesian[15][21] = -0.0400214; qual_match_simple_bayesian[15][22] = -0.038394; qual_match_simple_bayesian[15][23] = -0.0371032; qual_match_simple_bayesian[15][24] = -0.0360791; qual_match_simple_bayesian[15][25] = -0.0352663; qual_match_simple_bayesian[15][26] = -0.0346212; qual_match_simple_bayesian[15][27] = -0.0341091; qual_match_simple_bayesian[15][28] = -0.0337024; 
qual_match_simple_bayesian[15][29] = -0.0333796; qual_match_simple_bayesian[15][30] = -0.0331232; qual_match_simple_bayesian[15][31] = -0.0329196; qual_match_simple_bayesian[15][32] = -0.0327579; qual_match_simple_bayesian[15][33] = -0.0326294; qual_match_simple_bayesian[15][34] = -0.0325274; qual_match_simple_bayesian[15][35] = -0.0324464; qual_match_simple_bayesian[15][36] = -0.0323821; qual_match_simple_bayesian[15][37] = -0.0323309; qual_match_simple_bayesian[15][38] = -0.0322904; qual_match_simple_bayesian[15][39] = -0.0322581; qual_match_simple_bayesian[15][40] = -0.0322325; qual_match_simple_bayesian[15][41] = -0.0322121; qual_match_simple_bayesian[15][42] = -0.032196; qual_match_simple_bayesian[15][43] = -0.0321831; qual_match_simple_bayesian[15][44] = -0.032173; qual_match_simple_bayesian[15][45] = -0.0321649; qual_match_simple_bayesian[15][46] = -0.0321584; qual_match_simple_bayesian[16][0] = -4.78275; qual_match_simple_bayesian[16][1] = -1.57428; qual_match_simple_bayesian[16][2] = -1.00771; qual_match_simple_bayesian[16][3] = -0.712372; qual_match_simple_bayesian[16][4] = -0.527451; qual_match_simple_bayesian[16][5] = -0.401606; qual_match_simple_bayesian[16][6] = -0.311831; qual_match_simple_bayesian[16][7] = -0.245853; qual_match_simple_bayesian[16][8] = -0.19638; qual_match_simple_bayesian[16][9] = -0.158755; qual_match_simple_bayesian[16][10] = -0.129846; qual_match_simple_bayesian[16][11] = -0.107464; qual_match_simple_bayesian[16][12] = -0.0900357; qual_match_simple_bayesian[16][13] = -0.0764049; qual_match_simple_bayesian[16][14] = -0.0657085; qual_match_simple_bayesian[16][15] = -0.0572929; qual_match_simple_bayesian[16][16] = -0.0506582; qual_match_simple_bayesian[16][17] = -0.0454193; qual_match_simple_bayesian[16][18] = -0.0412773; qual_match_simple_bayesian[16][19] = -0.0379994; qual_match_simple_bayesian[16][20] = -0.0354033; qual_match_simple_bayesian[16][21] = -0.033346; qual_match_simple_bayesian[16][22] = -0.0317148; 
qual_match_simple_bayesian[16][23] = -0.0304209; qual_match_simple_bayesian[16][24] = -0.0293944; qual_match_simple_bayesian[16][25] = -0.0285798; qual_match_simple_bayesian[16][26] = -0.0279331; qual_match_simple_bayesian[16][27] = -0.0274198; qual_match_simple_bayesian[16][28] = -0.0270122; qual_match_simple_bayesian[16][29] = -0.0266886; qual_match_simple_bayesian[16][30] = -0.0264316; qual_match_simple_bayesian[16][31] = -0.0262275; qual_match_simple_bayesian[16][32] = -0.0260655; qual_match_simple_bayesian[16][33] = -0.0259367; qual_match_simple_bayesian[16][34] = -0.0258345; qual_match_simple_bayesian[16][35] = -0.0257533; qual_match_simple_bayesian[16][36] = -0.0256888; qual_match_simple_bayesian[16][37] = -0.0256376; qual_match_simple_bayesian[16][38] = -0.0255969; qual_match_simple_bayesian[16][39] = -0.0255645; qual_match_simple_bayesian[16][40] = -0.0255389; qual_match_simple_bayesian[16][41] = -0.0255185; qual_match_simple_bayesian[16][42] = -0.0255023; qual_match_simple_bayesian[16][43] = -0.0254894; qual_match_simple_bayesian[16][44] = -0.0254792; qual_match_simple_bayesian[16][45] = -0.0254711; qual_match_simple_bayesian[16][46] = -0.0254646; qual_match_simple_bayesian[17][0] = -5.01301; qual_match_simple_bayesian[17][1] = -1.57576; qual_match_simple_bayesian[17][2] = -1.00546; qual_match_simple_bayesian[17][3] = -0.708883; qual_match_simple_bayesian[17][4] = -0.523352; qual_match_simple_bayesian[17][5] = -0.397151; qual_match_simple_bayesian[17][6] = -0.307149; qual_match_simple_bayesian[17][7] = -0.241016; qual_match_simple_bayesian[17][8] = -0.191434; qual_match_simple_bayesian[17][9] = -0.153729; qual_match_simple_bayesian[17][10] = -0.124761; qual_match_simple_bayesian[17][11] = -0.102334; qual_match_simple_bayesian[17][12] = -0.0848716; qual_match_simple_bayesian[17][13] = -0.0712146; qual_match_simple_bayesian[17][14] = -0.0604979; qual_match_simple_bayesian[17][15] = -0.0520664; qual_match_simple_bayesian[17][16] = -0.0454193; 
qual_match_simple_bayesian[17][17] = -0.0401706; qual_match_simple_bayesian[17][18] = -0.036021; qual_match_simple_bayesian[17][19] = -0.032737; qual_match_simple_bayesian[17][20] = -0.0301362; qual_match_simple_bayesian[17][21] = -0.028075; qual_match_simple_bayesian[17][22] = -0.0264408; qual_match_simple_bayesian[17][23] = -0.0251447; qual_match_simple_bayesian[17][24] = -0.0241163; qual_match_simple_bayesian[17][25] = -0.0233001; qual_match_simple_bayesian[17][26] = -0.0226523; qual_match_simple_bayesian[17][27] = -0.0221381; qual_match_simple_bayesian[17][28] = -0.0217297; qual_match_simple_bayesian[17][29] = -0.0214055; qual_match_simple_bayesian[17][30] = -0.0211481; qual_match_simple_bayesian[17][31] = -0.0209436; qual_match_simple_bayesian[17][32] = -0.0207812; qual_match_simple_bayesian[17][33] = -0.0206523; qual_match_simple_bayesian[17][34] = -0.0205498; qual_match_simple_bayesian[17][35] = -0.0204685; qual_match_simple_bayesian[17][36] = -0.0204039; qual_match_simple_bayesian[17][37] = -0.0203526; qual_match_simple_bayesian[17][38] = -0.0203118; qual_match_simple_bayesian[17][39] = -0.0202794; qual_match_simple_bayesian[17][40] = -0.0202537; qual_match_simple_bayesian[17][41] = -0.0202333; qual_match_simple_bayesian[17][42] = -0.020217; qual_match_simple_bayesian[17][43] = -0.0202041; qual_match_simple_bayesian[17][44] = -0.0201939; qual_match_simple_bayesian[17][45] = -0.0201858; qual_match_simple_bayesian[17][46] = -0.0201793; qual_match_simple_bayesian[18][0] = -5.24327; qual_match_simple_bayesian[18][1] = -1.57693; qual_match_simple_bayesian[18][2] = -1.00368; qual_match_simple_bayesian[18][3] = -0.706121; qual_match_simple_bayesian[18][4] = -0.520107; qual_match_simple_bayesian[18][5] = -0.393627; qual_match_simple_bayesian[18][6] = -0.303445; qual_match_simple_bayesian[18][7] = -0.23719; qual_match_simple_bayesian[18][8] = -0.187522; qual_match_simple_bayesian[18][9] = -0.149755; qual_match_simple_bayesian[18][10] = -0.12074; 
qual_match_simple_bayesian[18][11] = -0.0982781; qual_match_simple_bayesian[18][12] = -0.0807886; qual_match_simple_bayesian[18][13] = -0.0671109; qual_match_simple_bayesian[18][14] = -0.0563782; qual_match_simple_bayesian[18][15] = -0.0479342; qual_match_simple_bayesian[18][16] = -0.0412773; qual_match_simple_bayesian[18][17] = -0.036021; qual_match_simple_bayesian[18][18] = -0.0318653; qual_match_simple_bayesian[18][19] = -0.0285766; qual_match_simple_bayesian[18][20] = -0.025972; qual_match_simple_bayesian[18][21] = -0.0239079; qual_match_simple_bayesian[18][22] = -0.0222713; qual_match_simple_bayesian[18][23] = -0.0209733; qual_match_simple_bayesian[18][24] = -0.0199434; qual_match_simple_bayesian[18][25] = -0.0191261; qual_match_simple_bayesian[18][26] = -0.0184774; qual_match_simple_bayesian[18][27] = -0.0179624; qual_match_simple_bayesian[18][28] = -0.0175535; qual_match_simple_bayesian[18][29] = -0.0172288; qual_match_simple_bayesian[18][30] = -0.016971; qual_match_simple_bayesian[18][31] = -0.0167662; qual_match_simple_bayesian[18][32] = -0.0166036; qual_match_simple_bayesian[18][33] = -0.0164745; qual_match_simple_bayesian[18][34] = -0.0163719; qual_match_simple_bayesian[18][35] = -0.0162904; qual_match_simple_bayesian[18][36] = -0.0162257; qual_match_simple_bayesian[18][37] = -0.0161743; qual_match_simple_bayesian[18][38] = -0.0161335; qual_match_simple_bayesian[18][39] = -0.0161011; qual_match_simple_bayesian[18][40] = -0.0160753; qual_match_simple_bayesian[18][41] = -0.0160549; qual_match_simple_bayesian[18][42] = -0.0160386; qual_match_simple_bayesian[18][43] = -0.0160257; qual_match_simple_bayesian[18][44] = -0.0160155; qual_match_simple_bayesian[18][45] = -0.0160073; qual_match_simple_bayesian[18][46] = -0.0160009; qual_match_simple_bayesian[19][0] = -5.47352; qual_match_simple_bayesian[19][1] = -1.57786; qual_match_simple_bayesian[19][2] = -1.00227; qual_match_simple_bayesian[19][3] = -0.703933; qual_match_simple_bayesian[19][4] = -0.517538; 
qual_match_simple_bayesian[19][5] = -0.390836; qual_match_simple_bayesian[19][6] = -0.300513; qual_match_simple_bayesian[19][7] = -0.234162; qual_match_simple_bayesian[19][8] = -0.184426; qual_match_simple_bayesian[19][9] = -0.146609; qual_match_simple_bayesian[19][10] = -0.117558; qual_match_simple_bayesian[19][11] = -0.0950678; qual_match_simple_bayesian[19][12] = -0.0775572; qual_match_simple_bayesian[19][13] = -0.0638632; qual_match_simple_bayesian[19][14] = -0.0531178; qual_match_simple_bayesian[19][15] = -0.044664; qual_match_simple_bayesian[19][16] = -0.0379994; qual_match_simple_bayesian[19][17] = -0.032737; qual_match_simple_bayesian[19][18] = -0.0285766; qual_match_simple_bayesian[19][19] = -0.0252842; qual_match_simple_bayesian[19][20] = -0.0226766; qual_match_simple_bayesian[19][21] = -0.0206101; qual_match_simple_bayesian[19][22] = -0.0189717; qual_match_simple_bayesian[19][23] = -0.0176722; qual_match_simple_bayesian[19][24] = -0.0166412; qual_match_simple_bayesian[19][25] = -0.015823; qual_match_simple_bayesian[19][26] = -0.0151735; qual_match_simple_bayesian[19][27] = -0.0146579; qual_match_simple_bayesian[19][28] = -0.0142486; qual_match_simple_bayesian[19][29] = -0.0139235; qual_match_simple_bayesian[19][30] = -0.0136654; qual_match_simple_bayesian[19][31] = -0.0134604; qual_match_simple_bayesian[19][32] = -0.0132976; qual_match_simple_bayesian[19][33] = -0.0131684; qual_match_simple_bayesian[19][34] = -0.0130657; qual_match_simple_bayesian[19][35] = -0.0129841; qual_match_simple_bayesian[19][36] = -0.0129193; qual_match_simple_bayesian[19][37] = -0.0128679; qual_match_simple_bayesian[19][38] = -0.012827; qual_match_simple_bayesian[19][39] = -0.0127945; qual_match_simple_bayesian[19][40] = -0.0127688; qual_match_simple_bayesian[19][41] = -0.0127483; qual_match_simple_bayesian[19][42] = -0.012732; qual_match_simple_bayesian[19][43] = -0.0127191; qual_match_simple_bayesian[19][44] = -0.0127088; qual_match_simple_bayesian[19][45] = -0.0127007; 
qual_match_simple_bayesian[19][46] = -0.0126942; qual_match_simple_bayesian[20][0] = -5.70378; qual_match_simple_bayesian[20][1] = -1.5786; qual_match_simple_bayesian[20][2] = -1.00115; qual_match_simple_bayesian[20][3] = -0.702197; qual_match_simple_bayesian[20][4] = -0.515502; qual_match_simple_bayesian[20][5] = -0.388625; qual_match_simple_bayesian[20][6] = -0.29819; qual_match_simple_bayesian[20][7] = -0.231763; qual_match_simple_bayesian[20][8] = -0.181973; qual_match_simple_bayesian[20][9] = -0.144117; qual_match_simple_bayesian[20][10] = -0.115037; qual_match_simple_bayesian[20][11] = -0.0925252; qual_match_simple_bayesian[20][12] = -0.0749978; qual_match_simple_bayesian[20][13] = -0.061291; qual_match_simple_bayesian[20][14] = -0.0505356; qual_match_simple_bayesian[20][15] = -0.042074; qual_match_simple_bayesian[20][16] = -0.0354033; qual_match_simple_bayesian[20][17] = -0.0301362; qual_match_simple_bayesian[20][18] = -0.025972; qual_match_simple_bayesian[20][19] = -0.0226766; qual_match_simple_bayesian[20][20] = -0.0200667; qual_match_simple_bayesian[20][21] = -0.0179984; qual_match_simple_bayesian[20][22] = -0.0163585; qual_match_simple_bayesian[20][23] = -0.0150578; qual_match_simple_bayesian[20][24] = -0.0140259; qual_match_simple_bayesian[20][25] = -0.0132069; qual_match_simple_bayesian[20][26] = -0.0125569; qual_match_simple_bayesian[20][27] = -0.0120409; qual_match_simple_bayesian[20][28] = -0.0116311; qual_match_simple_bayesian[20][29] = -0.0113058; qual_match_simple_bayesian[20][30] = -0.0110475; qual_match_simple_bayesian[20][31] = -0.0108423; qual_match_simple_bayesian[20][32] = -0.0106794; qual_match_simple_bayesian[20][33] = -0.01055; qual_match_simple_bayesian[20][34] = -0.0104472; qual_match_simple_bayesian[20][35] = -0.0103655; qual_match_simple_bayesian[20][36] = -0.0103007; qual_match_simple_bayesian[20][37] = -0.0102492; qual_match_simple_bayesian[20][38] = -0.0102083; qual_match_simple_bayesian[20][39] = -0.0101758; 
qual_match_simple_bayesian[20][40] = -0.01015; qual_match_simple_bayesian[20][41] = -0.0101295; qual_match_simple_bayesian[20][42] = -0.0101132; qual_match_simple_bayesian[20][43] = -0.0101003; qual_match_simple_bayesian[20][44] = -0.01009; qual_match_simple_bayesian[20][45] = -0.0100819; qual_match_simple_bayesian[20][46] = -0.0100754; qual_match_simple_bayesian[21][0] = -5.93404; qual_match_simple_bayesian[21][1] = -1.57919; qual_match_simple_bayesian[21][2] = -1.00027; qual_match_simple_bayesian[21][3] = -0.700821; qual_match_simple_bayesian[21][4] = -0.513887; qual_match_simple_bayesian[21][5] = -0.386872; qual_match_simple_bayesian[21][6] = -0.296348; qual_match_simple_bayesian[21][7] = -0.229861; qual_match_simple_bayesian[21][8] = -0.180029; qual_match_simple_bayesian[21][9] = -0.142143; qual_match_simple_bayesian[21][10] = -0.113039; qual_match_simple_bayesian[21][11] = -0.09051; qual_match_simple_bayesian[21][12] = -0.0729694; qual_match_simple_bayesian[21][13] = -0.0592525; qual_match_simple_bayesian[21][14] = -0.0484892; qual_match_simple_bayesian[21][15] = -0.0400214; qual_match_simple_bayesian[21][16] = -0.033346; qual_match_simple_bayesian[21][17] = -0.028075; qual_match_simple_bayesian[21][18] = -0.0239079; qual_match_simple_bayesian[21][19] = -0.0206101; qual_match_simple_bayesian[21][20] = -0.0179984; qual_match_simple_bayesian[21][21] = -0.0159286; qual_match_simple_bayesian[21][22] = -0.0142876; qual_match_simple_bayesian[21][23] = -0.012986; qual_match_simple_bayesian[21][24] = -0.0119533; qual_match_simple_bayesian[21][25] = -0.0111338; qual_match_simple_bayesian[21][26] = -0.0104833; qual_match_simple_bayesian[21][27] = -0.00996692; qual_match_simple_bayesian[21][28] = -0.00955691; qual_match_simple_bayesian[21][29] = -0.00923135; qual_match_simple_bayesian[21][30] = -0.00897283; qual_match_simple_bayesian[21][31] = -0.00876752; qual_match_simple_bayesian[21][32] = -0.00860447; qual_match_simple_bayesian[21][33] = -0.00847497; 
qual_match_simple_bayesian[21][34] = -0.00837212; qual_match_simple_bayesian[21][35] = -0.00829043; qual_match_simple_bayesian[21][36] = -0.00822555; qual_match_simple_bayesian[21][37] = -0.00817401; qual_match_simple_bayesian[21][38] = -0.00813308; qual_match_simple_bayesian[21][39] = -0.00810056; qual_match_simple_bayesian[21][40] = -0.00807474; qual_match_simple_bayesian[21][41] = -0.00805422; qual_match_simple_bayesian[21][42] = -0.00803793; qual_match_simple_bayesian[21][43] = -0.00802498; qual_match_simple_bayesian[21][44] = -0.0080147; qual_match_simple_bayesian[21][45] = -0.00800654; qual_match_simple_bayesian[21][46] = -0.00800005; qual_match_simple_bayesian[22][0] = -6.1643; qual_match_simple_bayesian[22][1] = -1.57966; qual_match_simple_bayesian[22][2] = -0.99956; qual_match_simple_bayesian[22][3] = -0.69973; qual_match_simple_bayesian[22][4] = -0.512606; qual_match_simple_bayesian[22][5] = -0.385482; qual_match_simple_bayesian[22][6] = -0.294888; qual_match_simple_bayesian[22][7] = -0.228354; qual_match_simple_bayesian[22][8] = -0.178488; qual_match_simple_bayesian[22][9] = -0.140577; qual_match_simple_bayesian[22][10] = -0.111455; qual_match_simple_bayesian[22][11] = -0.0889123; qual_match_simple_bayesian[22][12] = -0.0713612; qual_match_simple_bayesian[22][13] = -0.0576362; qual_match_simple_bayesian[22][14] = -0.0468667; qual_match_simple_bayesian[22][15] = -0.038394; qual_match_simple_bayesian[22][16] = -0.0317148; qual_match_simple_bayesian[22][17] = -0.0264408; qual_match_simple_bayesian[22][18] = -0.0222713; qual_match_simple_bayesian[22][19] = -0.0189717; qual_match_simple_bayesian[22][20] = -0.0163585; qual_match_simple_bayesian[22][21] = -0.0142876; qual_match_simple_bayesian[22][22] = -0.0126457; qual_match_simple_bayesian[22][23] = -0.0113434; qual_match_simple_bayesian[22][24] = -0.0103101; qual_match_simple_bayesian[22][25] = -0.00949014; qual_match_simple_bayesian[22][26] = -0.00883928; qual_match_simple_bayesian[22][27] = -0.00832259; 
qual_match_simple_bayesian[22][28] = -0.00791235; qual_match_simple_bayesian[22][29] = -0.00758661; qual_match_simple_bayesian[22][30] = -0.00732794; qual_match_simple_bayesian[22][31] = -0.00712252; qual_match_simple_bayesian[22][32] = -0.00695938; qual_match_simple_bayesian[22][33] = -0.00682981; qual_match_simple_bayesian[22][34] = -0.00672691; qual_match_simple_bayesian[22][35] = -0.00664517; qual_match_simple_bayesian[22][36] = -0.00658025; qual_match_simple_bayesian[22][37] = -0.00652869; qual_match_simple_bayesian[22][38] = -0.00648773; qual_match_simple_bayesian[22][39] = -0.0064552; qual_match_simple_bayesian[22][40] = -0.00642936; qual_match_simple_bayesian[22][41] = -0.00640883; qual_match_simple_bayesian[22][42] = -0.00639253; qual_match_simple_bayesian[22][43] = -0.00637958; qual_match_simple_bayesian[22][44] = -0.00636929; qual_match_simple_bayesian[22][45] = -0.00636112; qual_match_simple_bayesian[22][46] = -0.00635463; qual_match_simple_bayesian[23][0] = -6.39456; qual_match_simple_bayesian[23][1] = -1.58003; qual_match_simple_bayesian[23][2] = -0.999001; qual_match_simple_bayesian[23][3] = -0.698863; qual_match_simple_bayesian[23][4] = -0.51159; qual_match_simple_bayesian[23][5] = -0.384379; qual_match_simple_bayesian[23][6] = -0.29373; qual_match_simple_bayesian[23][7] = -0.227158; qual_match_simple_bayesian[23][8] = -0.177265; qual_match_simple_bayesian[23][9] = -0.139335; qual_match_simple_bayesian[23][10] = -0.110198; qual_match_simple_bayesian[23][11] = -0.0876449; qual_match_simple_bayesian[23][12] = -0.0700856; qual_match_simple_bayesian[23][13] = -0.0563542; qual_match_simple_bayesian[23][14] = -0.0455797; qual_match_simple_bayesian[23][15] = -0.0371032; qual_match_simple_bayesian[23][16] = -0.0304209; qual_match_simple_bayesian[23][17] = -0.0251447; qual_match_simple_bayesian[23][18] = -0.0209733; qual_match_simple_bayesian[23][19] = -0.0176722; qual_match_simple_bayesian[23][20] = -0.0150578; qual_match_simple_bayesian[23][21] = 
-0.012986; qual_match_simple_bayesian[23][22] = -0.0113434; qual_match_simple_bayesian[23][23] = -0.0100405; qual_match_simple_bayesian[23][24] = -0.00900678; qual_match_simple_bayesian[23][25] = -0.00818644; qual_match_simple_bayesian[23][26] = -0.00753529; qual_match_simple_bayesian[23][27] = -0.00701837; qual_match_simple_bayesian[23][28] = -0.00660796; qual_match_simple_bayesian[23][29] = -0.00628208; qual_match_simple_bayesian[23][30] = -0.00602329; qual_match_simple_bayesian[23][31] = -0.00581778; qual_match_simple_bayesian[23][32] = -0.00565457; qual_match_simple_bayesian[23][33] = -0.00552494; qual_match_simple_bayesian[23][34] = -0.00542199; qual_match_simple_bayesian[23][35] = -0.00534022; qual_match_simple_bayesian[23][36] = -0.00527527; qual_match_simple_bayesian[23][37] = -0.00522368; qual_match_simple_bayesian[23][38] = -0.00518271; qual_match_simple_bayesian[23][39] = -0.00515016; qual_match_simple_bayesian[23][40] = -0.00512431; qual_match_simple_bayesian[23][41] = -0.00510378; qual_match_simple_bayesian[23][42] = -0.00508747; qual_match_simple_bayesian[23][43] = -0.00507451; qual_match_simple_bayesian[23][44] = -0.00506422; qual_match_simple_bayesian[23][45] = -0.00505604; qual_match_simple_bayesian[23][46] = -0.00504955; qual_match_simple_bayesian[24][0] = -6.62482; qual_match_simple_bayesian[24][1] = -1.58033; qual_match_simple_bayesian[24][2] = -0.998557; qual_match_simple_bayesian[24][3] = -0.698176; qual_match_simple_bayesian[24][4] = -0.510784; qual_match_simple_bayesian[24][5] = -0.383503; qual_match_simple_bayesian[24][6] = -0.29281; qual_match_simple_bayesian[24][7] = -0.226208; qual_match_simple_bayesian[24][8] = -0.176295; qual_match_simple_bayesian[24][9] = -0.138349; qual_match_simple_bayesian[24][10] = -0.109202; qual_match_simple_bayesian[24][11] = -0.0866394; qual_match_simple_bayesian[24][12] = -0.0690735; qual_match_simple_bayesian[24][13] = -0.055337; qual_match_simple_bayesian[24][14] = -0.0445586; 
qual_match_simple_bayesian[24][15] = -0.0360791; qual_match_simple_bayesian[24][16] = -0.0293944; qual_match_simple_bayesian[24][17] = -0.0241163; qual_match_simple_bayesian[24][18] = -0.0199434; qual_match_simple_bayesian[24][19] = -0.0166412; qual_match_simple_bayesian[24][20] = -0.0140259; qual_match_simple_bayesian[24][21] = -0.0119533; qual_match_simple_bayesian[24][22] = -0.0103101; qual_match_simple_bayesian[24][23] = -0.00900678; qual_match_simple_bayesian[24][24] = -0.00797271; qual_match_simple_bayesian[24][25] = -0.00715208; qual_match_simple_bayesian[24][26] = -0.00650071; qual_match_simple_bayesian[24][27] = -0.00598361; qual_match_simple_bayesian[24][28] = -0.00557305; qual_match_simple_bayesian[24][29] = -0.00524706; qual_match_simple_bayesian[24][30] = -0.00498818; qual_match_simple_bayesian[24][31] = -0.0047826; qual_match_simple_bayesian[24][32] = -0.00461933; qual_match_simple_bayesian[24][33] = -0.00448966; qual_match_simple_bayesian[24][34] = -0.00438667; qual_match_simple_bayesian[24][35] = -0.00430487; qual_match_simple_bayesian[24][36] = -0.0042399; qual_match_simple_bayesian[24][37] = -0.0041883; qual_match_simple_bayesian[24][38] = -0.00414731; qual_match_simple_bayesian[24][39] = -0.00411475; qual_match_simple_bayesian[24][40] = -0.00408889; qual_match_simple_bayesian[24][41] = -0.00406835; qual_match_simple_bayesian[24][42] = -0.00405203; qual_match_simple_bayesian[24][43] = -0.00403907; qual_match_simple_bayesian[24][44] = -0.00402878; qual_match_simple_bayesian[24][45] = -0.0040206; qual_match_simple_bayesian[24][46] = -0.0040141; qual_match_simple_bayesian[25][0] = -6.85508; qual_match_simple_bayesian[25][1] = -1.58057; qual_match_simple_bayesian[25][2] = -0.998204; qual_match_simple_bayesian[25][3] = -0.69763; qual_match_simple_bayesian[25][4] = -0.510144; qual_match_simple_bayesian[25][5] = -0.382809; qual_match_simple_bayesian[25][6] = -0.292081; qual_match_simple_bayesian[25][7] = -0.225455; qual_match_simple_bayesian[25][8] = 
-0.175525; qual_match_simple_bayesian[25][9] = -0.137567; qual_match_simple_bayesian[25][10] = -0.10841; qual_match_simple_bayesian[25][11] = -0.0858414; qual_match_simple_bayesian[25][12] = -0.0682703; qual_match_simple_bayesian[25][13] = -0.0545298; qual_match_simple_bayesian[25][14] = -0.0437483; qual_match_simple_bayesian[25][15] = -0.0352663; qual_match_simple_bayesian[25][16] = -0.0285798; qual_match_simple_bayesian[25][17] = -0.0233001; qual_match_simple_bayesian[25][18] = -0.0191261; qual_match_simple_bayesian[25][19] = -0.015823; qual_match_simple_bayesian[25][20] = -0.0132069; qual_match_simple_bayesian[25][21] = -0.0111338; qual_match_simple_bayesian[25][22] = -0.00949014; qual_match_simple_bayesian[25][23] = -0.00818644; qual_match_simple_bayesian[25][24] = -0.00715208; qual_match_simple_bayesian[25][25] = -0.00633122; qual_match_simple_bayesian[25][26] = -0.00567967; qual_match_simple_bayesian[25][27] = -0.00516243; qual_match_simple_bayesian[25][28] = -0.00475176; qual_match_simple_bayesian[25][29] = -0.00442567; qual_match_simple_bayesian[25][30] = -0.00416673; qual_match_simple_bayesian[25][31] = -0.00396109; qual_match_simple_bayesian[25][32] = -0.00379778; qual_match_simple_bayesian[25][33] = -0.00366807; qual_match_simple_bayesian[25][34] = -0.00356505; qual_match_simple_bayesian[25][35] = -0.00348323; qual_match_simple_bayesian[25][36] = -0.00341824; qual_match_simple_bayesian[25][37] = -0.00336662; qual_match_simple_bayesian[25][38] = -0.00332562; qual_match_simple_bayesian[25][39] = -0.00329306; qual_match_simple_bayesian[25][40] = -0.00326719; qual_match_simple_bayesian[25][41] = -0.00324664; qual_match_simple_bayesian[25][42] = -0.00323032; qual_match_simple_bayesian[25][43] = -0.00321736; qual_match_simple_bayesian[25][44] = -0.00320706; qual_match_simple_bayesian[25][45] = -0.00319888; qual_match_simple_bayesian[25][46] = -0.00319238; qual_match_simple_bayesian[26][0] = -7.08533; qual_match_simple_bayesian[26][1] = -1.58075; 
qual_match_simple_bayesian[26][2] = -0.997924; qual_match_simple_bayesian[26][3] = -0.697196; qual_match_simple_bayesian[26][4] = -0.509636; qual_match_simple_bayesian[26][5] = -0.382257; qual_match_simple_bayesian[26][6] = -0.291502; qual_match_simple_bayesian[26][7] = -0.224857; qual_match_simple_bayesian[26][8] = -0.174914; qual_match_simple_bayesian[26][9] = -0.136946; qual_match_simple_bayesian[26][10] = -0.107782; qual_match_simple_bayesian[26][11] = -0.0852079; qual_match_simple_bayesian[26][12] = -0.0676327; qual_match_simple_bayesian[26][13] = -0.053889; qual_match_simple_bayesian[26][14] = -0.0431051; qual_match_simple_bayesian[26][15] = -0.0346212; qual_match_simple_bayesian[26][16] = -0.0279331; qual_match_simple_bayesian[26][17] = -0.0226523; qual_match_simple_bayesian[26][18] = -0.0184774; qual_match_simple_bayesian[26][19] = -0.0151735; qual_match_simple_bayesian[26][20] = -0.0125569; qual_match_simple_bayesian[26][21] = -0.0104833; qual_match_simple_bayesian[26][22] = -0.00883928; qual_match_simple_bayesian[26][23] = -0.00753529; qual_match_simple_bayesian[26][24] = -0.00650071; qual_match_simple_bayesian[26][25] = -0.00567967; qual_match_simple_bayesian[26][26] = -0.00502798; qual_match_simple_bayesian[26][27] = -0.00451062; qual_match_simple_bayesian[26][28] = -0.00409986; qual_match_simple_bayesian[26][29] = -0.00377371; qual_match_simple_bayesian[26][30] = -0.00351471; qual_match_simple_bayesian[26][31] = -0.00330902; qual_match_simple_bayesian[26][32] = -0.00314567; qual_match_simple_bayesian[26][33] = -0.00301594; qual_match_simple_bayesian[26][34] = -0.0029129; qual_match_simple_bayesian[26][35] = -0.00283106; qual_match_simple_bayesian[26][36] = -0.00276606; qual_match_simple_bayesian[26][37] = -0.00271443; qual_match_simple_bayesian[26][38] = -0.00267342; qual_match_simple_bayesian[26][39] = -0.00264084; qual_match_simple_bayesian[26][40] = -0.00261497; qual_match_simple_bayesian[26][41] = -0.00259442; qual_match_simple_bayesian[26][42] = 
-0.00257809; qual_match_simple_bayesian[26][43] = -0.00256512; qual_match_simple_bayesian[26][44] = -0.00255482; qual_match_simple_bayesian[26][45] = -0.00254664; qual_match_simple_bayesian[26][46] = -0.00254014; qual_match_simple_bayesian[27][0] = -7.31559; qual_match_simple_bayesian[27][1] = -1.5809; qual_match_simple_bayesian[27][2] = -0.997702; qual_match_simple_bayesian[27][3] = -0.696852; qual_match_simple_bayesian[27][4] = -0.509232; qual_match_simple_bayesian[27][5] = -0.38182; qual_match_simple_bayesian[27][6] = -0.291042; qual_match_simple_bayesian[27][7] = -0.224383; qual_match_simple_bayesian[27][8] = -0.174428; qual_match_simple_bayesian[27][9] = -0.136453; qual_match_simple_bayesian[27][10] = -0.107284; qual_match_simple_bayesian[27][11] = -0.0847051; qual_match_simple_bayesian[27][12] = -0.0671265; qual_match_simple_bayesian[27][13] = -0.0533804; qual_match_simple_bayesian[27][14] = -0.0425945; qual_match_simple_bayesian[27][15] = -0.0341091; qual_match_simple_bayesian[27][16] = -0.0274198; qual_match_simple_bayesian[27][17] = -0.0221381; qual_match_simple_bayesian[27][18] = -0.0179624; qual_match_simple_bayesian[27][19] = -0.0146579; qual_match_simple_bayesian[27][20] = -0.0120409; qual_match_simple_bayesian[27][21] = -0.00996692; qual_match_simple_bayesian[27][22] = -0.00832259; qual_match_simple_bayesian[27][23] = -0.00701837; qual_match_simple_bayesian[27][24] = -0.00598361; qual_match_simple_bayesian[27][25] = -0.00516243; qual_match_simple_bayesian[27][26] = -0.00451062; qual_match_simple_bayesian[27][27] = -0.00399318; qual_match_simple_bayesian[27][28] = -0.00358235; qual_match_simple_bayesian[27][29] = -0.00325613; qual_match_simple_bayesian[27][30] = -0.00299709; qual_match_simple_bayesian[27][31] = -0.00279137; qual_match_simple_bayesian[27][32] = -0.00262799; qual_match_simple_bayesian[27][33] = -0.00249823; qual_match_simple_bayesian[27][34] = -0.00239518; qual_match_simple_bayesian[27][35] = -0.00231332; 
qual_match_simple_bayesian[27][36] = -0.00224831; qual_match_simple_bayesian[27][37] = -0.00219667; qual_match_simple_bayesian[27][38] = -0.00215565; qual_match_simple_bayesian[27][39] = -0.00212307; qual_match_simple_bayesian[27][40] = -0.00209719; qual_match_simple_bayesian[27][41] = -0.00207664; qual_match_simple_bayesian[27][42] = -0.00206031; qual_match_simple_bayesian[27][43] = -0.00204734; qual_match_simple_bayesian[27][44] = -0.00203704; qual_match_simple_bayesian[27][45] = -0.00202886; qual_match_simple_bayesian[27][46] = -0.00202236; qual_match_simple_bayesian[28][0] = -7.54585; qual_match_simple_bayesian[28][1] = -1.58102; qual_match_simple_bayesian[28][2] = -0.997525; qual_match_simple_bayesian[28][3] = -0.696579; qual_match_simple_bayesian[28][4] = -0.508912; qual_match_simple_bayesian[28][5] = -0.381472; qual_match_simple_bayesian[28][6] = -0.290677; qual_match_simple_bayesian[28][7] = -0.224006; qual_match_simple_bayesian[28][8] = -0.174043; qual_match_simple_bayesian[28][9] = -0.136062; qual_match_simple_bayesian[28][10] = -0.106888; qual_match_simple_bayesian[28][11] = -0.0843058; qual_match_simple_bayesian[28][12] = -0.0667247; qual_match_simple_bayesian[28][13] = -0.0529765; qual_match_simple_bayesian[28][14] = -0.0421891; qual_match_simple_bayesian[28][15] = -0.0337024; qual_match_simple_bayesian[28][16] = -0.0270122; qual_match_simple_bayesian[28][17] = -0.0217297; qual_match_simple_bayesian[28][18] = -0.0175535; qual_match_simple_bayesian[28][19] = -0.0142486; qual_match_simple_bayesian[28][20] = -0.0116311; qual_match_simple_bayesian[28][21] = -0.00955691; qual_match_simple_bayesian[28][22] = -0.00791235; qual_match_simple_bayesian[28][23] = -0.00660796; qual_match_simple_bayesian[28][24] = -0.00557305; qual_match_simple_bayesian[28][25] = -0.00475176; qual_match_simple_bayesian[28][26] = -0.00409986; qual_match_simple_bayesian[28][27] = -0.00358235; qual_match_simple_bayesian[28][28] = -0.00317146; qual_match_simple_bayesian[28][29] = 
-0.0028452; qual_match_simple_bayesian[28][30] = -0.00258612; qual_match_simple_bayesian[28][31] = -0.00238037; qual_match_simple_bayesian[28][32] = -0.00221697; qual_match_simple_bayesian[28][33] = -0.0020872; qual_match_simple_bayesian[28][34] = -0.00198413; qual_match_simple_bayesian[28][35] = -0.00190226; qual_match_simple_bayesian[28][36] = -0.00183724; qual_match_simple_bayesian[28][37] = -0.00178559; qual_match_simple_bayesian[28][38] = -0.00174457; qual_match_simple_bayesian[28][39] = -0.00171198; qual_match_simple_bayesian[28][40] = -0.0016861; qual_match_simple_bayesian[28][41] = -0.00166554; qual_match_simple_bayesian[28][42] = -0.00164921; qual_match_simple_bayesian[28][43] = -0.00163624; qual_match_simple_bayesian[28][44] = -0.00162594; qual_match_simple_bayesian[28][45] = -0.00161776; qual_match_simple_bayesian[28][46] = -0.00161126; qual_match_simple_bayesian[29][0] = -7.77611; qual_match_simple_bayesian[29][1] = -1.58111; qual_match_simple_bayesian[29][2] = -0.997385; qual_match_simple_bayesian[29][3] = -0.696362; qual_match_simple_bayesian[29][4] = -0.508658; qual_match_simple_bayesian[29][5] = -0.381196; qual_match_simple_bayesian[29][6] = -0.290387; qual_match_simple_bayesian[29][7] = -0.223707; qual_match_simple_bayesian[29][8] = -0.173737; qual_match_simple_bayesian[29][9] = -0.135751; qual_match_simple_bayesian[29][10] = -0.106574; qual_match_simple_bayesian[29][11] = -0.0839888; qual_match_simple_bayesian[29][12] = -0.0664056; qual_match_simple_bayesian[29][13] = -0.0526558; qual_match_simple_bayesian[29][14] = -0.0418671; qual_match_simple_bayesian[29][15] = -0.0333796; qual_match_simple_bayesian[29][16] = -0.0266886; qual_match_simple_bayesian[29][17] = -0.0214055; qual_match_simple_bayesian[29][18] = -0.0172288; qual_match_simple_bayesian[29][19] = -0.0139235; qual_match_simple_bayesian[29][20] = -0.0113058; qual_match_simple_bayesian[29][21] = -0.00923135; qual_match_simple_bayesian[29][22] = -0.00758661; 
qual_match_simple_bayesian[29][23] = -0.00628208; qual_match_simple_bayesian[29][24] = -0.00524706; qual_match_simple_bayesian[29][25] = -0.00442567; qual_match_simple_bayesian[29][26] = -0.00377371; qual_match_simple_bayesian[29][27] = -0.00325613; qual_match_simple_bayesian[29][28] = -0.0028452; qual_match_simple_bayesian[29][29] = -0.00251891; qual_match_simple_bayesian[29][30] = -0.0022598; qual_match_simple_bayesian[29][31] = -0.00205403; qual_match_simple_bayesian[29][32] = -0.00189061; qual_match_simple_bayesian[29][33] = -0.00176082; qual_match_simple_bayesian[29][34] = -0.00165774; qual_match_simple_bayesian[29][35] = -0.00157586; qual_match_simple_bayesian[29][36] = -0.00151083; qual_match_simple_bayesian[29][37] = -0.00145918; qual_match_simple_bayesian[29][38] = -0.00141815; qual_match_simple_bayesian[29][39] = -0.00138557; qual_match_simple_bayesian[29][40] = -0.00135968; qual_match_simple_bayesian[29][41] = -0.00133912; qual_match_simple_bayesian[29][42] = -0.00132279; qual_match_simple_bayesian[29][43] = -0.00130982; qual_match_simple_bayesian[29][44] = -0.00129951; qual_match_simple_bayesian[29][45] = -0.00129133; qual_match_simple_bayesian[29][46] = -0.00128483; qual_match_simple_bayesian[30][0] = -8.00637; qual_match_simple_bayesian[30][1] = -1.58119; qual_match_simple_bayesian[30][2] = -0.997273; qual_match_simple_bayesian[30][3] = -0.69619; qual_match_simple_bayesian[30][4] = -0.508456; qual_match_simple_bayesian[30][5] = -0.380977; qual_match_simple_bayesian[30][6] = -0.290157; qual_match_simple_bayesian[30][7] = -0.223469; qual_match_simple_bayesian[30][8] = -0.173494; qual_match_simple_bayesian[30][9] = -0.135504; qual_match_simple_bayesian[30][10] = -0.106324; qual_match_simple_bayesian[30][11] = -0.083737; qual_match_simple_bayesian[30][12] = -0.0661522; qual_match_simple_bayesian[30][13] = -0.0524012; qual_match_simple_bayesian[30][14] = -0.0416115; qual_match_simple_bayesian[30][15] = -0.0331232; qual_match_simple_bayesian[30][16] = 
-0.0264316; qual_match_simple_bayesian[30][17] = -0.0211481; qual_match_simple_bayesian[30][18] = -0.016971; qual_match_simple_bayesian[30][19] = -0.0136654; qual_match_simple_bayesian[30][20] = -0.0110475; qual_match_simple_bayesian[30][21] = -0.00897283; qual_match_simple_bayesian[30][22] = -0.00732794; qual_match_simple_bayesian[30][23] = -0.00602329; qual_match_simple_bayesian[30][24] = -0.00498818; qual_match_simple_bayesian[30][25] = -0.00416673; qual_match_simple_bayesian[30][26] = -0.00351471; qual_match_simple_bayesian[30][27] = -0.00299709; qual_match_simple_bayesian[30][28] = -0.00258612; qual_match_simple_bayesian[30][29] = -0.0022598; qual_match_simple_bayesian[30][30] = -0.00200067; qual_match_simple_bayesian[30][31] = -0.00179488; qual_match_simple_bayesian[30][32] = -0.00163145; qual_match_simple_bayesian[30][33] = -0.00150165; qual_match_simple_bayesian[30][34] = -0.00139855; qual_match_simple_bayesian[30][35] = -0.00131667; qual_match_simple_bayesian[30][36] = -0.00125164; qual_match_simple_bayesian[30][37] = -0.00119998; qual_match_simple_bayesian[30][38] = -0.00115895; qual_match_simple_bayesian[30][39] = -0.00112636; qual_match_simple_bayesian[30][40] = -0.00110047; qual_match_simple_bayesian[30][41] = -0.00107991; qual_match_simple_bayesian[30][42] = -0.00106358; qual_match_simple_bayesian[30][43] = -0.0010506; qual_match_simple_bayesian[30][44] = -0.0010403; qual_match_simple_bayesian[30][45] = -0.00103211; qual_match_simple_bayesian[30][46] = -0.00102561; qual_match_simple_bayesian[31][0] = -8.23663; qual_match_simple_bayesian[31][1] = -1.58125; qual_match_simple_bayesian[31][2] = -0.997185; qual_match_simple_bayesian[31][3] = -0.696053; qual_match_simple_bayesian[31][4] = -0.508295; qual_match_simple_bayesian[31][5] = -0.380803; qual_match_simple_bayesian[31][6] = -0.289974; qual_match_simple_bayesian[31][7] = -0.22328; qual_match_simple_bayesian[31][8] = -0.173301; qual_match_simple_bayesian[31][9] = -0.135308; 
qual_match_simple_bayesian[31][10] = -0.106126; qual_match_simple_bayesian[31][11] = -0.0835371; qual_match_simple_bayesian[31][12] = -0.065951; qual_match_simple_bayesian[31][13] = -0.0521989; qual_match_simple_bayesian[31][14] = -0.0414085; qual_match_simple_bayesian[31][15] = -0.0329196; qual_match_simple_bayesian[31][16] = -0.0262275; qual_match_simple_bayesian[31][17] = -0.0209436; qual_match_simple_bayesian[31][18] = -0.0167662; qual_match_simple_bayesian[31][19] = -0.0134604; qual_match_simple_bayesian[31][20] = -0.0108423; qual_match_simple_bayesian[31][21] = -0.00876752; qual_match_simple_bayesian[31][22] = -0.00712252; qual_match_simple_bayesian[31][23] = -0.00581778; qual_match_simple_bayesian[31][24] = -0.0047826; qual_match_simple_bayesian[31][25] = -0.00396109; qual_match_simple_bayesian[31][26] = -0.00330902; qual_match_simple_bayesian[31][27] = -0.00279137; qual_match_simple_bayesian[31][28] = -0.00238037; qual_match_simple_bayesian[31][29] = -0.00205403; qual_match_simple_bayesian[31][30] = -0.00179488; qual_match_simple_bayesian[31][31] = -0.00158908; qual_match_simple_bayesian[31][32] = -0.00142563; qual_match_simple_bayesian[31][33] = -0.00129582; qual_match_simple_bayesian[31][34] = -0.00119272; qual_match_simple_bayesian[31][35] = -0.00111084; qual_match_simple_bayesian[31][36] = -0.0010458; qual_match_simple_bayesian[31][37] = -0.000994137; qual_match_simple_bayesian[31][38] = -0.000953104; qual_match_simple_bayesian[31][39] = -0.000920511; qual_match_simple_bayesian[31][40] = -0.000894622; qual_match_simple_bayesian[31][41] = -0.000874059; qual_match_simple_bayesian[31][42] = -0.000857725; qual_match_simple_bayesian[31][43] = -0.000844751; qual_match_simple_bayesian[31][44] = -0.000834445; qual_match_simple_bayesian[31][45] = -0.000826259; qual_match_simple_bayesian[31][46] = -0.000819756; qual_match_simple_bayesian[32][0] = -8.46688; qual_match_simple_bayesian[32][1] = -1.58129; qual_match_simple_bayesian[32][2] = -0.997114; 
qual_match_simple_bayesian[32][3] = -0.695944; qual_match_simple_bayesian[32][4] = -0.508168; qual_match_simple_bayesian[32][5] = -0.380664; qual_match_simple_bayesian[32][6] = -0.289829; qual_match_simple_bayesian[32][7] = -0.22313; qual_match_simple_bayesian[32][8] = -0.173148; qual_match_simple_bayesian[32][9] = -0.135153; qual_match_simple_bayesian[32][10] = -0.105968; qual_match_simple_bayesian[32][11] = -0.0833783; qual_match_simple_bayesian[32][12] = -0.0657912; qual_match_simple_bayesian[32][13] = -0.0520383; qual_match_simple_bayesian[32][14] = -0.0412473; qual_match_simple_bayesian[32][15] = -0.0327579; qual_match_simple_bayesian[32][16] = -0.0260655; qual_match_simple_bayesian[32][17] = -0.0207812; qual_match_simple_bayesian[32][18] = -0.0166036; qual_match_simple_bayesian[32][19] = -0.0132976; qual_match_simple_bayesian[32][20] = -0.0106794; qual_match_simple_bayesian[32][21] = -0.00860447; qual_match_simple_bayesian[32][22] = -0.00695938; qual_match_simple_bayesian[32][23] = -0.00565457; qual_match_simple_bayesian[32][24] = -0.00461933; qual_match_simple_bayesian[32][25] = -0.00379778; qual_match_simple_bayesian[32][26] = -0.00314567; qual_match_simple_bayesian[32][27] = -0.00262799; qual_match_simple_bayesian[32][28] = -0.00221697; qual_match_simple_bayesian[32][29] = -0.00189061; qual_match_simple_bayesian[32][30] = -0.00163145; qual_match_simple_bayesian[32][31] = -0.00142563; qual_match_simple_bayesian[32][32] = -0.00126218; qual_match_simple_bayesian[32][33] = -0.00113236; qual_match_simple_bayesian[32][34] = -0.00102926; qual_match_simple_bayesian[32][35] = -0.000947368; qual_match_simple_bayesian[32][36] = -0.000882324; qual_match_simple_bayesian[32][37] = -0.000830661; qual_match_simple_bayesian[32][38] = -0.000789625; qual_match_simple_bayesian[32][39] = -0.00075703; qual_match_simple_bayesian[32][40] = -0.00073114; qual_match_simple_bayesian[32][41] = -0.000710576; qual_match_simple_bayesian[32][42] = -0.000694241; 
qual_match_simple_bayesian[32][43] = -0.000681266; qual_match_simple_bayesian[32][44] = -0.00067096; qual_match_simple_bayesian[32][45] = -0.000662773; qual_match_simple_bayesian[32][46] = -0.00065627; qual_match_simple_bayesian[33][0] = -8.69714; qual_match_simple_bayesian[33][1] = -1.58133; qual_match_simple_bayesian[33][2] = -0.997059; qual_match_simple_bayesian[33][3] = -0.695858; qual_match_simple_bayesian[33][4] = -0.508067; qual_match_simple_bayesian[33][5] = -0.380554; qual_match_simple_bayesian[33][6] = -0.289713; qual_match_simple_bayesian[33][7] = -0.223011; qual_match_simple_bayesian[33][8] = -0.173026; qual_match_simple_bayesian[33][9] = -0.135029; qual_match_simple_bayesian[33][10] = -0.105843; qual_match_simple_bayesian[33][11] = -0.0832522; qual_match_simple_bayesian[33][12] = -0.0656642; qual_match_simple_bayesian[33][13] = -0.0519108; qual_match_simple_bayesian[33][14] = -0.0411192; qual_match_simple_bayesian[33][15] = -0.0326294; qual_match_simple_bayesian[33][16] = -0.0259367; qual_match_simple_bayesian[33][17] = -0.0206523; qual_match_simple_bayesian[33][18] = -0.0164745; qual_match_simple_bayesian[33][19] = -0.0131684; qual_match_simple_bayesian[33][20] = -0.01055; qual_match_simple_bayesian[33][21] = -0.00847497; qual_match_simple_bayesian[33][22] = -0.00682981; qual_match_simple_bayesian[33][23] = -0.00552494; qual_match_simple_bayesian[33][24] = -0.00448966; qual_match_simple_bayesian[33][25] = -0.00366807; qual_match_simple_bayesian[33][26] = -0.00301594; qual_match_simple_bayesian[33][27] = -0.00249823; qual_match_simple_bayesian[33][28] = -0.0020872; qual_match_simple_bayesian[33][29] = -0.00176082; qual_match_simple_bayesian[33][30] = -0.00150165; qual_match_simple_bayesian[33][31] = -0.00129582; qual_match_simple_bayesian[33][32] = -0.00113236; qual_match_simple_bayesian[33][33] = -0.00100254; qual_match_simple_bayesian[33][34] = -0.000899433; qual_match_simple_bayesian[33][35] = -0.000817538; qual_match_simple_bayesian[33][36] = 
-0.000752491; qual_match_simple_bayesian[33][37] = -0.000700826; qual_match_simple_bayesian[33][38] = -0.000659788; qual_match_simple_bayesian[33][39] = -0.000627192; qual_match_simple_bayesian[33][40] = -0.000601301; qual_match_simple_bayesian[33][41] = -0.000580736; qual_match_simple_bayesian[33][42] = -0.0005644; qual_match_simple_bayesian[33][43] = -0.000551424; qual_match_simple_bayesian[33][44] = -0.000541118; qual_match_simple_bayesian[33][45] = -0.000532931; qual_match_simple_bayesian[33][46] = -0.000526428; qual_match_simple_bayesian[34][0] = -8.9274; qual_match_simple_bayesian[34][1] = -1.58136; qual_match_simple_bayesian[34][2] = -0.997014; qual_match_simple_bayesian[34][3] = -0.695789; qual_match_simple_bayesian[34][4] = -0.507986; qual_match_simple_bayesian[34][5] = -0.380467; qual_match_simple_bayesian[34][6] = -0.289622; qual_match_simple_bayesian[34][7] = -0.222917; qual_match_simple_bayesian[34][8] = -0.17293; qual_match_simple_bayesian[34][9] = -0.134931; qual_match_simple_bayesian[34][10] = -0.105744; qual_match_simple_bayesian[34][11] = -0.083152; qual_match_simple_bayesian[34][12] = -0.0655634; qual_match_simple_bayesian[34][13] = -0.0518095; qual_match_simple_bayesian[34][14] = -0.0410175; qual_match_simple_bayesian[34][15] = -0.0325274; qual_match_simple_bayesian[34][16] = -0.0258345; qual_match_simple_bayesian[34][17] = -0.0205498; qual_match_simple_bayesian[34][18] = -0.0163719; qual_match_simple_bayesian[34][19] = -0.0130657; qual_match_simple_bayesian[34][20] = -0.0104472; qual_match_simple_bayesian[34][21] = -0.00837212; qual_match_simple_bayesian[34][22] = -0.00672691; qual_match_simple_bayesian[34][23] = -0.00542199; qual_match_simple_bayesian[34][24] = -0.00438667; qual_match_simple_bayesian[34][25] = -0.00356505; qual_match_simple_bayesian[34][26] = -0.0029129; qual_match_simple_bayesian[34][27] = -0.00239518; qual_match_simple_bayesian[34][28] = -0.00198413; qual_match_simple_bayesian[34][29] = -0.00165774; 
qual_match_simple_bayesian[34][30] = -0.00139855; qual_match_simple_bayesian[34][31] = -0.00119272; qual_match_simple_bayesian[34][32] = -0.00102926; qual_match_simple_bayesian[34][33] = -0.000899433; qual_match_simple_bayesian[34][34] = -0.00079632; qual_match_simple_bayesian[34][35] = -0.000714422; qual_match_simple_bayesian[34][36] = -0.000649373; qual_match_simple_bayesian[34][37] = -0.000597706; qual_match_simple_bayesian[34][38] = -0.000556667; qual_match_simple_bayesian[34][39] = -0.00052407; qual_match_simple_bayesian[34][40] = -0.000498178; qual_match_simple_bayesian[34][41] = -0.000477612; qual_match_simple_bayesian[34][42] = -0.000461276; qual_match_simple_bayesian[34][43] = -0.0004483; qual_match_simple_bayesian[34][44] = -0.000437993; qual_match_simple_bayesian[34][45] = -0.000429806; qual_match_simple_bayesian[34][46] = -0.000423302; qual_match_simple_bayesian[35][0] = -9.15766; qual_match_simple_bayesian[35][1] = -1.58138; qual_match_simple_bayesian[35][2] = -0.996979; qual_match_simple_bayesian[35][3] = -0.695735; qual_match_simple_bayesian[35][4] = -0.507922; qual_match_simple_bayesian[35][5] = -0.380398; qual_match_simple_bayesian[35][6] = -0.289549; qual_match_simple_bayesian[35][7] = -0.222842; qual_match_simple_bayesian[35][8] = -0.172853; qual_match_simple_bayesian[35][9] = -0.134853; qual_match_simple_bayesian[35][10] = -0.105665; qual_match_simple_bayesian[35][11] = -0.0830725; qual_match_simple_bayesian[35][12] = -0.0654833; qual_match_simple_bayesian[35][13] = -0.051729; qual_match_simple_bayesian[35][14] = -0.0409368; qual_match_simple_bayesian[35][15] = -0.0324464; qual_match_simple_bayesian[35][16] = -0.0257533; qual_match_simple_bayesian[35][17] = -0.0204685; qual_match_simple_bayesian[35][18] = -0.0162904; qual_match_simple_bayesian[35][19] = -0.0129841; qual_match_simple_bayesian[35][20] = -0.0103655; qual_match_simple_bayesian[35][21] = -0.00829043; qual_match_simple_bayesian[35][22] = -0.00664517; qual_match_simple_bayesian[35][23] 
= -0.00534022; qual_match_simple_bayesian[35][24] = -0.00430487; qual_match_simple_bayesian[35][25] = -0.00348323; qual_match_simple_bayesian[35][26] = -0.00283106; qual_match_simple_bayesian[35][27] = -0.00231332; qual_match_simple_bayesian[35][28] = -0.00190226; qual_match_simple_bayesian[35][29] = -0.00157586; qual_match_simple_bayesian[35][30] = -0.00131667; qual_match_simple_bayesian[35][31] = -0.00111084; qual_match_simple_bayesian[35][32] = -0.000947368; qual_match_simple_bayesian[35][33] = -0.000817538; qual_match_simple_bayesian[35][34] = -0.000714422; qual_match_simple_bayesian[35][35] = -0.000632522; qual_match_simple_bayesian[35][36] = -0.000567471; qual_match_simple_bayesian[35][37] = -0.000515803; qual_match_simple_bayesian[35][38] = -0.000474763; qual_match_simple_bayesian[35][39] = -0.000442165; qual_match_simple_bayesian[35][40] = -0.000416272; qual_match_simple_bayesian[35][41] = -0.000395705; qual_match_simple_bayesian[35][42] = -0.000379369; qual_match_simple_bayesian[35][43] = -0.000366392; qual_match_simple_bayesian[35][44] = -0.000356085; qual_match_simple_bayesian[35][45] = -0.000347898; qual_match_simple_bayesian[35][46] = -0.000341394; qual_match_simple_bayesian[36][0] = -9.38792; qual_match_simple_bayesian[36][1] = -1.5814; qual_match_simple_bayesian[36][2] = -0.996951; qual_match_simple_bayesian[36][3] = -0.695692; qual_match_simple_bayesian[36][4] = -0.507872; qual_match_simple_bayesian[36][5] = -0.380343; qual_match_simple_bayesian[36][6] = -0.289491; qual_match_simple_bayesian[36][7] = -0.222782; qual_match_simple_bayesian[36][8] = -0.172792; qual_match_simple_bayesian[36][9] = -0.134791; qual_match_simple_bayesian[36][10] = -0.105602; qual_match_simple_bayesian[36][11] = -0.0830093; qual_match_simple_bayesian[36][12] = -0.0654198; qual_match_simple_bayesian[36][13] = -0.0516651; qual_match_simple_bayesian[36][14] = -0.0408726; qual_match_simple_bayesian[36][15] = -0.0323821; qual_match_simple_bayesian[36][16] = -0.0256888; 
qual_match_simple_bayesian[36][17] = -0.0204039; qual_match_simple_bayesian[36][18] = -0.0162257; qual_match_simple_bayesian[36][19] = -0.0129193; qual_match_simple_bayesian[36][20] = -0.0103007; qual_match_simple_bayesian[36][21] = -0.00822555; qual_match_simple_bayesian[36][22] = -0.00658025; qual_match_simple_bayesian[36][23] = -0.00527527; qual_match_simple_bayesian[36][24] = -0.0042399; qual_match_simple_bayesian[36][25] = -0.00341824; qual_match_simple_bayesian[36][26] = -0.00276606; qual_match_simple_bayesian[36][27] = -0.00224831; qual_match_simple_bayesian[36][28] = -0.00183724; qual_match_simple_bayesian[36][29] = -0.00151083; qual_match_simple_bayesian[36][30] = -0.00125164; qual_match_simple_bayesian[36][31] = -0.0010458; qual_match_simple_bayesian[36][32] = -0.000882324; qual_match_simple_bayesian[36][33] = -0.000752491; qual_match_simple_bayesian[36][34] = -0.000649373; qual_match_simple_bayesian[36][35] = -0.000567471; qual_match_simple_bayesian[36][36] = -0.000502419; qual_match_simple_bayesian[36][37] = -0.00045075; qual_match_simple_bayesian[36][38] = -0.000409709; qual_match_simple_bayesian[36][39] = -0.00037711; qual_match_simple_bayesian[36][40] = -0.000351217; qual_match_simple_bayesian[36][41] = -0.00033065; qual_match_simple_bayesian[36][42] = -0.000314313; qual_match_simple_bayesian[36][43] = -0.000301336; qual_match_simple_bayesian[36][44] = -0.000291028; qual_match_simple_bayesian[36][45] = -0.000282841; qual_match_simple_bayesian[36][46] = -0.000276337; qual_match_simple_bayesian[37][0] = -9.61818; qual_match_simple_bayesian[37][1] = -1.58142; qual_match_simple_bayesian[37][2] = -0.996929; qual_match_simple_bayesian[37][3] = -0.695657; qual_match_simple_bayesian[37][4] = -0.507831; qual_match_simple_bayesian[37][5] = -0.380299; qual_match_simple_bayesian[37][6] = -0.289445; qual_match_simple_bayesian[37][7] = -0.222734; qual_match_simple_bayesian[37][8] = -0.172744; qual_match_simple_bayesian[37][9] = -0.134742; 
qual_match_simple_bayesian[37][10] = -0.105553; qual_match_simple_bayesian[37][11] = -0.0829591; qual_match_simple_bayesian[37][12] = -0.0653692; qual_match_simple_bayesian[37][13] = -0.0516143; qual_match_simple_bayesian[37][14] = -0.0408216; qual_match_simple_bayesian[37][15] = -0.0323309; qual_match_simple_bayesian[37][16] = -0.0256376; qual_match_simple_bayesian[37][17] = -0.0203526; qual_match_simple_bayesian[37][18] = -0.0161743; qual_match_simple_bayesian[37][19] = -0.0128679; qual_match_simple_bayesian[37][20] = -0.0102492; qual_match_simple_bayesian[37][21] = -0.00817401; qual_match_simple_bayesian[37][22] = -0.00652869; qual_match_simple_bayesian[37][23] = -0.00522368; qual_match_simple_bayesian[37][24] = -0.0041883; qual_match_simple_bayesian[37][25] = -0.00336662; qual_match_simple_bayesian[37][26] = -0.00271443; qual_match_simple_bayesian[37][27] = -0.00219667; qual_match_simple_bayesian[37][28] = -0.00178559; qual_match_simple_bayesian[37][29] = -0.00145918; qual_match_simple_bayesian[37][30] = -0.00119998; qual_match_simple_bayesian[37][31] = -0.000994137; qual_match_simple_bayesian[37][32] = -0.000830661; qual_match_simple_bayesian[37][33] = -0.000700826; qual_match_simple_bayesian[37][34] = -0.000597706; qual_match_simple_bayesian[37][35] = -0.000515803; qual_match_simple_bayesian[37][36] = -0.00045075; qual_match_simple_bayesian[37][37] = -0.000399079; qual_match_simple_bayesian[37][38] = -0.000358037; qual_match_simple_bayesian[37][39] = -0.000325438; qual_match_simple_bayesian[37][40] = -0.000299544; qual_match_simple_bayesian[37][41] = -0.000278977; qual_match_simple_bayesian[37][42] = -0.00026264; qual_match_simple_bayesian[37][43] = -0.000249663; qual_match_simple_bayesian[37][44] = -0.000239355; qual_match_simple_bayesian[37][45] = -0.000231167; qual_match_simple_bayesian[37][46] = -0.000224664; qual_match_simple_bayesian[38][0] = -9.84844; qual_match_simple_bayesian[38][1] = -1.58143; qual_match_simple_bayesian[38][2] = -0.996911; 
qual_match_simple_bayesian[38][3] = -0.69563; qual_match_simple_bayesian[38][4] = -0.507799; qual_match_simple_bayesian[38][5] = -0.380264; qual_match_simple_bayesian[38][6] = -0.289409; qual_match_simple_bayesian[38][7] = -0.222697; qual_match_simple_bayesian[38][8] = -0.172705; qual_match_simple_bayesian[38][9] = -0.134703; qual_match_simple_bayesian[38][10] = -0.105513; qual_match_simple_bayesian[38][11] = -0.0829192; qual_match_simple_bayesian[38][12] = -0.0653291; qual_match_simple_bayesian[38][13] = -0.051574; qual_match_simple_bayesian[38][14] = -0.0407812; qual_match_simple_bayesian[38][15] = -0.0322904; qual_match_simple_bayesian[38][16] = -0.0255969; qual_match_simple_bayesian[38][17] = -0.0203118; qual_match_simple_bayesian[38][18] = -0.0161335; qual_match_simple_bayesian[38][19] = -0.012827; qual_match_simple_bayesian[38][20] = -0.0102083; qual_match_simple_bayesian[38][21] = -0.00813308; qual_match_simple_bayesian[38][22] = -0.00648773; qual_match_simple_bayesian[38][23] = -0.00518271; qual_match_simple_bayesian[38][24] = -0.00414731; qual_match_simple_bayesian[38][25] = -0.00332562; qual_match_simple_bayesian[38][26] = -0.00267342; qual_match_simple_bayesian[38][27] = -0.00215565; qual_match_simple_bayesian[38][28] = -0.00174457; qual_match_simple_bayesian[38][29] = -0.00141815; qual_match_simple_bayesian[38][30] = -0.00115895; qual_match_simple_bayesian[38][31] = -0.000953104; qual_match_simple_bayesian[38][32] = -0.000789625; qual_match_simple_bayesian[38][33] = -0.000659788; qual_match_simple_bayesian[38][34] = -0.000556667; qual_match_simple_bayesian[38][35] = -0.000474763; qual_match_simple_bayesian[38][36] = -0.000409709; qual_match_simple_bayesian[38][37] = -0.000358037; qual_match_simple_bayesian[38][38] = -0.000316995; qual_match_simple_bayesian[38][39] = -0.000284396; qual_match_simple_bayesian[38][40] = -0.000258502; qual_match_simple_bayesian[38][41] = -0.000237934; qual_match_simple_bayesian[38][42] = -0.000221596; 
qual_match_simple_bayesian[38][43] = -0.000208619; qual_match_simple_bayesian[38][44] = -0.000198311; qual_match_simple_bayesian[38][45] = -0.000190123; qual_match_simple_bayesian[38][46] = -0.00018362; qual_match_simple_bayesian[39][0] = -10.0787; qual_match_simple_bayesian[39][1] = -1.58144; qual_match_simple_bayesian[39][2] = -0.996897; qual_match_simple_bayesian[39][3] = -0.695608; qual_match_simple_bayesian[39][4] = -0.507774; qual_match_simple_bayesian[39][5] = -0.380237; qual_match_simple_bayesian[39][6] = -0.28938; qual_match_simple_bayesian[39][7] = -0.222667; qual_match_simple_bayesian[39][8] = -0.172675; qual_match_simple_bayesian[39][9] = -0.134672; qual_match_simple_bayesian[39][10] = -0.105482; qual_match_simple_bayesian[39][11] = -0.0828876; qual_match_simple_bayesian[39][12] = -0.0652972; qual_match_simple_bayesian[39][13] = -0.051542; qual_match_simple_bayesian[39][14] = -0.040749; qual_match_simple_bayesian[39][15] = -0.0322581; qual_match_simple_bayesian[39][16] = -0.0255645; qual_match_simple_bayesian[39][17] = -0.0202794; qual_match_simple_bayesian[39][18] = -0.0161011; qual_match_simple_bayesian[39][19] = -0.0127945; qual_match_simple_bayesian[39][20] = -0.0101758; qual_match_simple_bayesian[39][21] = -0.00810056; qual_match_simple_bayesian[39][22] = -0.0064552; qual_match_simple_bayesian[39][23] = -0.00515016; qual_match_simple_bayesian[39][24] = -0.00411475; qual_match_simple_bayesian[39][25] = -0.00329306; qual_match_simple_bayesian[39][26] = -0.00264084; qual_match_simple_bayesian[39][27] = -0.00212307; qual_match_simple_bayesian[39][28] = -0.00171198; qual_match_simple_bayesian[39][29] = -0.00138557; qual_match_simple_bayesian[39][30] = -0.00112636; qual_match_simple_bayesian[39][31] = -0.000920511; qual_match_simple_bayesian[39][32] = -0.00075703; qual_match_simple_bayesian[39][33] = -0.000627192; qual_match_simple_bayesian[39][34] = -0.00052407; qual_match_simple_bayesian[39][35] = -0.000442165; qual_match_simple_bayesian[39][36] = 
-0.00037711; qual_match_simple_bayesian[39][37] = -0.000325438; qual_match_simple_bayesian[39][38] = -0.000284396; qual_match_simple_bayesian[39][39] = -0.000251796; qual_match_simple_bayesian[39][40] = -0.000225901; qual_match_simple_bayesian[39][41] = -0.000205333; qual_match_simple_bayesian[39][42] = -0.000188996; qual_match_simple_bayesian[39][43] = -0.000176018; qual_match_simple_bayesian[39][44] = -0.00016571; qual_match_simple_bayesian[39][45] = -0.000157522; qual_match_simple_bayesian[39][46] = -0.000151019; qual_match_simple_bayesian[40][0] = -10.309; qual_match_simple_bayesian[40][1] = -1.58145; qual_match_simple_bayesian[40][2] = -0.996886; qual_match_simple_bayesian[40][3] = -0.695591; qual_match_simple_bayesian[40][4] = -0.507754; qual_match_simple_bayesian[40][5] = -0.380215; qual_match_simple_bayesian[40][6] = -0.289357; qual_match_simple_bayesian[40][7] = -0.222643; qual_match_simple_bayesian[40][8] = -0.17265; qual_match_simple_bayesian[40][9] = -0.134647; qual_match_simple_bayesian[40][10] = -0.105457; qual_match_simple_bayesian[40][11] = -0.0828624; qual_match_simple_bayesian[40][12] = -0.0652719; qual_match_simple_bayesian[40][13] = -0.0515165; qual_match_simple_bayesian[40][14] = -0.0407235; qual_match_simple_bayesian[40][15] = -0.0322325; qual_match_simple_bayesian[40][16] = -0.0255389; qual_match_simple_bayesian[40][17] = -0.0202537; qual_match_simple_bayesian[40][18] = -0.0160753; qual_match_simple_bayesian[40][19] = -0.0127688; qual_match_simple_bayesian[40][20] = -0.01015; qual_match_simple_bayesian[40][21] = -0.00807474; qual_match_simple_bayesian[40][22] = -0.00642936; qual_match_simple_bayesian[40][23] = -0.00512431; qual_match_simple_bayesian[40][24] = -0.00408889; qual_match_simple_bayesian[40][25] = -0.00326719; qual_match_simple_bayesian[40][26] = -0.00261497; qual_match_simple_bayesian[40][27] = -0.00209719; qual_match_simple_bayesian[40][28] = -0.0016861; qual_match_simple_bayesian[40][29] = -0.00135968; 
qual_match_simple_bayesian[40][30] = -0.00110047; qual_match_simple_bayesian[40][31] = -0.000894622; qual_match_simple_bayesian[40][32] = -0.00073114; qual_match_simple_bayesian[40][33] = -0.000601301; qual_match_simple_bayesian[40][34] = -0.000498178; qual_match_simple_bayesian[40][35] = -0.000416272; qual_match_simple_bayesian[40][36] = -0.000351217; qual_match_simple_bayesian[40][37] = -0.000299544; qual_match_simple_bayesian[40][38] = -0.000258502; qual_match_simple_bayesian[40][39] = -0.000225901; qual_match_simple_bayesian[40][40] = -0.000200007; qual_match_simple_bayesian[40][41] = -0.000179438; qual_match_simple_bayesian[40][42] = -0.000163101; qual_match_simple_bayesian[40][43] = -0.000150123; qual_match_simple_bayesian[40][44] = -0.000139815; qual_match_simple_bayesian[40][45] = -0.000131627; qual_match_simple_bayesian[40][46] = -0.000125123; qual_match_simple_bayesian[41][0] = -10.5392; qual_match_simple_bayesian[41][1] = -1.58145; qual_match_simple_bayesian[41][2] = -0.996877; qual_match_simple_bayesian[41][3] = -0.695577; qual_match_simple_bayesian[41][4] = -0.507738; qual_match_simple_bayesian[41][5] = -0.380198; qual_match_simple_bayesian[41][6] = -0.289339; qual_match_simple_bayesian[41][7] = -0.222624; qual_match_simple_bayesian[41][8] = -0.172631; qual_match_simple_bayesian[41][9] = -0.134628; qual_match_simple_bayesian[41][10] = -0.105437; qual_match_simple_bayesian[41][11] = -0.0828425; qual_match_simple_bayesian[41][12] = -0.0652518; qual_match_simple_bayesian[41][13] = -0.0514963; qual_match_simple_bayesian[41][14] = -0.0407032; qual_match_simple_bayesian[41][15] = -0.0322121; qual_match_simple_bayesian[41][16] = -0.0255185; qual_match_simple_bayesian[41][17] = -0.0202333; qual_match_simple_bayesian[41][18] = -0.0160549; qual_match_simple_bayesian[41][19] = -0.0127483; qual_match_simple_bayesian[41][20] = -0.0101295; qual_match_simple_bayesian[41][21] = -0.00805422; qual_match_simple_bayesian[41][22] = -0.00640883; 
qual_match_simple_bayesian[41][23] = -0.00510378; qual_match_simple_bayesian[41][24] = -0.00406835; qual_match_simple_bayesian[41][25] = -0.00324664; qual_match_simple_bayesian[41][26] = -0.00259442; qual_match_simple_bayesian[41][27] = -0.00207664; qual_match_simple_bayesian[41][28] = -0.00166554; qual_match_simple_bayesian[41][29] = -0.00133912; qual_match_simple_bayesian[41][30] = -0.00107991; qual_match_simple_bayesian[41][31] = -0.000874059; qual_match_simple_bayesian[41][32] = -0.000710576; qual_match_simple_bayesian[41][33] = -0.000580736; qual_match_simple_bayesian[41][34] = -0.000477612; qual_match_simple_bayesian[41][35] = -0.000395705; qual_match_simple_bayesian[41][36] = -0.00033065; qual_match_simple_bayesian[41][37] = -0.000278977; qual_match_simple_bayesian[41][38] = -0.000237934; qual_match_simple_bayesian[41][39] = -0.000205333; qual_match_simple_bayesian[41][40] = -0.000179438; qual_match_simple_bayesian[41][41] = -0.00015887; qual_match_simple_bayesian[41][42] = -0.000142532; qual_match_simple_bayesian[41][43] = -0.000129555; qual_match_simple_bayesian[41][44] = -0.000119246; qual_match_simple_bayesian[41][45] = -0.000111058; qual_match_simple_bayesian[41][46] = -0.000104554; qual_match_simple_bayesian[42][0] = -10.7695; qual_match_simple_bayesian[42][1] = -1.58146; qual_match_simple_bayesian[42][2] = -0.99687; qual_match_simple_bayesian[42][3] = -0.695566; qual_match_simple_bayesian[42][4] = -0.507725; qual_match_simple_bayesian[42][5] = -0.380184; qual_match_simple_bayesian[42][6] = -0.289324; qual_match_simple_bayesian[42][7] = -0.222609; qual_match_simple_bayesian[42][8] = -0.172616; qual_match_simple_bayesian[42][9] = -0.134612; qual_match_simple_bayesian[42][10] = -0.105421; qual_match_simple_bayesian[42][11] = -0.0828266; qual_match_simple_bayesian[42][12] = -0.0652359; qual_match_simple_bayesian[42][13] = -0.0514803; qual_match_simple_bayesian[42][14] = -0.0406871; qual_match_simple_bayesian[42][15] = -0.032196; 
qual_match_simple_bayesian[42][16] = -0.0255023; qual_match_simple_bayesian[42][17] = -0.020217; qual_match_simple_bayesian[42][18] = -0.0160386; qual_match_simple_bayesian[42][19] = -0.012732; qual_match_simple_bayesian[42][20] = -0.0101132; qual_match_simple_bayesian[42][21] = -0.00803793; qual_match_simple_bayesian[42][22] = -0.00639253; qual_match_simple_bayesian[42][23] = -0.00508747; qual_match_simple_bayesian[42][24] = -0.00405203; qual_match_simple_bayesian[42][25] = -0.00323032; qual_match_simple_bayesian[42][26] = -0.00257809; qual_match_simple_bayesian[42][27] = -0.00206031; qual_match_simple_bayesian[42][28] = -0.00164921; qual_match_simple_bayesian[42][29] = -0.00132279; qual_match_simple_bayesian[42][30] = -0.00106358; qual_match_simple_bayesian[42][31] = -0.000857725; qual_match_simple_bayesian[42][32] = -0.000694241; qual_match_simple_bayesian[42][33] = -0.0005644; qual_match_simple_bayesian[42][34] = -0.000461276; qual_match_simple_bayesian[42][35] = -0.000379369; qual_match_simple_bayesian[42][36] = -0.000314313; qual_match_simple_bayesian[42][37] = -0.00026264; qual_match_simple_bayesian[42][38] = -0.000221596; qual_match_simple_bayesian[42][39] = -0.000188996; qual_match_simple_bayesian[42][40] = -0.000163101; qual_match_simple_bayesian[42][41] = -0.000142532; qual_match_simple_bayesian[42][42] = -0.000126194; qual_match_simple_bayesian[42][43] = -0.000113217; qual_match_simple_bayesian[42][44] = -0.000102908; qual_match_simple_bayesian[42][45] = -9.47203e-05; qual_match_simple_bayesian[42][46] = -8.82164e-05; qual_match_simple_bayesian[43][0] = -10.9997; qual_match_simple_bayesian[43][1] = -1.58146; qual_match_simple_bayesian[43][2] = -0.996865; qual_match_simple_bayesian[43][3] = -0.695558; qual_match_simple_bayesian[43][4] = -0.507715; qual_match_simple_bayesian[43][5] = -0.380173; qual_match_simple_bayesian[43][6] = -0.289313; qual_match_simple_bayesian[43][7] = -0.222597; qual_match_simple_bayesian[43][8] = -0.172604; 
qual_match_simple_bayesian[43][9] = -0.1346; qual_match_simple_bayesian[43][10] = -0.105409; qual_match_simple_bayesian[43][11] = -0.082814; qual_match_simple_bayesian[43][12] = -0.0652232; qual_match_simple_bayesian[43][13] = -0.0514675; qual_match_simple_bayesian[43][14] = -0.0406743; qual_match_simple_bayesian[43][15] = -0.0321831; qual_match_simple_bayesian[43][16] = -0.0254894; qual_match_simple_bayesian[43][17] = -0.0202041; qual_match_simple_bayesian[43][18] = -0.0160257; qual_match_simple_bayesian[43][19] = -0.0127191; qual_match_simple_bayesian[43][20] = -0.0101003; qual_match_simple_bayesian[43][21] = -0.00802498; qual_match_simple_bayesian[43][22] = -0.00637958; qual_match_simple_bayesian[43][23] = -0.00507451; qual_match_simple_bayesian[43][24] = -0.00403907; qual_match_simple_bayesian[43][25] = -0.00321736; qual_match_simple_bayesian[43][26] = -0.00256512; qual_match_simple_bayesian[43][27] = -0.00204734; qual_match_simple_bayesian[43][28] = -0.00163624; qual_match_simple_bayesian[43][29] = -0.00130982; qual_match_simple_bayesian[43][30] = -0.0010506; qual_match_simple_bayesian[43][31] = -0.000844751; qual_match_simple_bayesian[43][32] = -0.000681266; qual_match_simple_bayesian[43][33] = -0.000551424; qual_match_simple_bayesian[43][34] = -0.0004483; qual_match_simple_bayesian[43][35] = -0.000366392; qual_match_simple_bayesian[43][36] = -0.000301336; qual_match_simple_bayesian[43][37] = -0.000249663; qual_match_simple_bayesian[43][38] = -0.000208619; qual_match_simple_bayesian[43][39] = -0.000176018; qual_match_simple_bayesian[43][40] = -0.000150123; qual_match_simple_bayesian[43][41] = -0.000129555; qual_match_simple_bayesian[43][42] = -0.000113217; qual_match_simple_bayesian[43][43] = -0.000100239; qual_match_simple_bayesian[43][44] = -8.99308e-05; qual_match_simple_bayesian[43][45] = -8.17427e-05; qual_match_simple_bayesian[43][46] = -7.52387e-05; qual_match_simple_bayesian[44][0] = -11.23; qual_match_simple_bayesian[44][1] = -1.58146; 
qual_match_simple_bayesian[44][2] = -0.99686; qual_match_simple_bayesian[44][3] = -0.695551; qual_match_simple_bayesian[44][4] = -0.507707; qual_match_simple_bayesian[44][5] = -0.380164; qual_match_simple_bayesian[44][6] = -0.289304; qual_match_simple_bayesian[44][7] = -0.222588; qual_match_simple_bayesian[44][8] = -0.172594; qual_match_simple_bayesian[44][9] = -0.13459; qual_match_simple_bayesian[44][10] = -0.105399; qual_match_simple_bayesian[44][11] = -0.082804; qual_match_simple_bayesian[44][12] = -0.0652131; qual_match_simple_bayesian[44][13] = -0.0514574; qual_match_simple_bayesian[44][14] = -0.0406641; qual_match_simple_bayesian[44][15] = -0.032173; qual_match_simple_bayesian[44][16] = -0.0254792; qual_match_simple_bayesian[44][17] = -0.0201939; qual_match_simple_bayesian[44][18] = -0.0160155; qual_match_simple_bayesian[44][19] = -0.0127088; qual_match_simple_bayesian[44][20] = -0.01009; qual_match_simple_bayesian[44][21] = -0.0080147; qual_match_simple_bayesian[44][22] = -0.00636929; qual_match_simple_bayesian[44][23] = -0.00506422; qual_match_simple_bayesian[44][24] = -0.00402878; qual_match_simple_bayesian[44][25] = -0.00320706; qual_match_simple_bayesian[44][26] = -0.00255482; qual_match_simple_bayesian[44][27] = -0.00203704; qual_match_simple_bayesian[44][28] = -0.00162594; qual_match_simple_bayesian[44][29] = -0.00129951; qual_match_simple_bayesian[44][30] = -0.0010403; qual_match_simple_bayesian[44][31] = -0.000834445; qual_match_simple_bayesian[44][32] = -0.00067096; qual_match_simple_bayesian[44][33] = -0.000541118; qual_match_simple_bayesian[44][34] = -0.000437993; qual_match_simple_bayesian[44][35] = -0.000356085; qual_match_simple_bayesian[44][36] = -0.000291028; qual_match_simple_bayesian[44][37] = -0.000239355; qual_match_simple_bayesian[44][38] = -0.000198311; qual_match_simple_bayesian[44][39] = -0.00016571; qual_match_simple_bayesian[44][40] = -0.000139815; qual_match_simple_bayesian[44][41] = -0.000119246; qual_match_simple_bayesian[44][42] 
= -0.000102908; qual_match_simple_bayesian[44][43] = -8.99308e-05; qual_match_simple_bayesian[44][44] = -7.96225e-05; qual_match_simple_bayesian[44][45] = -7.14344e-05; qual_match_simple_bayesian[44][46] = -6.49304e-05; qual_match_simple_bayesian[45][0] = -11.4602; qual_match_simple_bayesian[45][1] = -1.58146; qual_match_simple_bayesian[45][2] = -0.996857; qual_match_simple_bayesian[45][3] = -0.695546; qual_match_simple_bayesian[45][4] = -0.507701; qual_match_simple_bayesian[45][5] = -0.380157; qual_match_simple_bayesian[45][6] = -0.289296; qual_match_simple_bayesian[45][7] = -0.222581; qual_match_simple_bayesian[45][8] = -0.172586; qual_match_simple_bayesian[45][9] = -0.134582; qual_match_simple_bayesian[45][10] = -0.105391; qual_match_simple_bayesian[45][11] = -0.082796; qual_match_simple_bayesian[45][12] = -0.0652051; qual_match_simple_bayesian[45][13] = -0.0514493; qual_match_simple_bayesian[45][14] = -0.040656; qual_match_simple_bayesian[45][15] = -0.0321649; qual_match_simple_bayesian[45][16] = -0.0254711; qual_match_simple_bayesian[45][17] = -0.0201858; qual_match_simple_bayesian[45][18] = -0.0160073; qual_match_simple_bayesian[45][19] = -0.0127007; qual_match_simple_bayesian[45][20] = -0.0100819; qual_match_simple_bayesian[45][21] = -0.00800654; qual_match_simple_bayesian[45][22] = -0.00636112; qual_match_simple_bayesian[45][23] = -0.00505604; qual_match_simple_bayesian[45][24] = -0.0040206; qual_match_simple_bayesian[45][25] = -0.00319888; qual_match_simple_bayesian[45][26] = -0.00254664; qual_match_simple_bayesian[45][27] = -0.00202886; qual_match_simple_bayesian[45][28] = -0.00161776; qual_match_simple_bayesian[45][29] = -0.00129133; qual_match_simple_bayesian[45][30] = -0.00103211; qual_match_simple_bayesian[45][31] = -0.000826259; qual_match_simple_bayesian[45][32] = -0.000662773; qual_match_simple_bayesian[45][33] = -0.000532931; qual_match_simple_bayesian[45][34] = -0.000429806; qual_match_simple_bayesian[45][35] = -0.000347898; 
qual_match_simple_bayesian[45][36] = -0.000282841; qual_match_simple_bayesian[45][37] = -0.000231167; qual_match_simple_bayesian[45][38] = -0.000190123; qual_match_simple_bayesian[45][39] = -0.000157522; qual_match_simple_bayesian[45][40] = -0.000131627; qual_match_simple_bayesian[45][41] = -0.000111058; qual_match_simple_bayesian[45][42] = -9.47203e-05; qual_match_simple_bayesian[45][43] = -8.17427e-05; qual_match_simple_bayesian[45][44] = -7.14344e-05; qual_match_simple_bayesian[45][45] = -6.32462e-05; qual_match_simple_bayesian[45][46] = -5.67422e-05; qual_match_simple_bayesian[46][0] = -11.6905; qual_match_simple_bayesian[46][1] = -1.58147; qual_match_simple_bayesian[46][2] = -0.996854; qual_match_simple_bayesian[46][3] = -0.695541; qual_match_simple_bayesian[46][4] = -0.507695; qual_match_simple_bayesian[46][5] = -0.380152; qual_match_simple_bayesian[46][6] = -0.28929; qual_match_simple_bayesian[46][7] = -0.222575; qual_match_simple_bayesian[46][8] = -0.17258; qual_match_simple_bayesian[46][9] = -0.134576; qual_match_simple_bayesian[46][10] = -0.105385; qual_match_simple_bayesian[46][11] = -0.0827897; qual_match_simple_bayesian[46][12] = -0.0651987; qual_match_simple_bayesian[46][13] = -0.051443; qual_match_simple_bayesian[46][14] = -0.0406496; qual_match_simple_bayesian[46][15] = -0.0321584; qual_match_simple_bayesian[46][16] = -0.0254646; qual_match_simple_bayesian[46][17] = -0.0201793; qual_match_simple_bayesian[46][18] = -0.0160009; qual_match_simple_bayesian[46][19] = -0.0126942; qual_match_simple_bayesian[46][20] = -0.0100754; qual_match_simple_bayesian[46][21] = -0.00800005; qual_match_simple_bayesian[46][22] = -0.00635463; qual_match_simple_bayesian[46][23] = -0.00504955; qual_match_simple_bayesian[46][24] = -0.0040141; qual_match_simple_bayesian[46][25] = -0.00319238; qual_match_simple_bayesian[46][26] = -0.00254014; qual_match_simple_bayesian[46][27] = -0.00202236; qual_match_simple_bayesian[46][28] = -0.00161126; qual_match_simple_bayesian[46][29] = 
-0.00128483; qual_match_simple_bayesian[46][30] = -0.00102561; qual_match_simple_bayesian[46][31] = -0.000819756; qual_match_simple_bayesian[46][32] = -0.00065627; qual_match_simple_bayesian[46][33] = -0.000526428; qual_match_simple_bayesian[46][34] = -0.000423302; qual_match_simple_bayesian[46][35] = -0.000341394; qual_match_simple_bayesian[46][36] = -0.000276337; qual_match_simple_bayesian[46][37] = -0.000224664; qual_match_simple_bayesian[46][38] = -0.00018362; qual_match_simple_bayesian[46][39] = -0.000151019; qual_match_simple_bayesian[46][40] = -0.000125123; qual_match_simple_bayesian[46][41] = -0.000104554; qual_match_simple_bayesian[46][42] = -8.82164e-05; qual_match_simple_bayesian[46][43] = -7.52387e-05; qual_match_simple_bayesian[46][44] = -6.49304e-05; qual_match_simple_bayesian[46][45] = -5.67422e-05; qual_match_simple_bayesian[46][46] = -5.02381e-05; vector< vector<double> > qual_mismatch_simple_bayesian; qual_mismatch_simple_bayesian.resize(47); for (int i = 0; i < qual_mismatch_simple_bayesian.size(); i++) { qual_mismatch_simple_bayesian[i].resize(47); } qual_mismatch_simple_bayesian[0][0] = -1.50408; qual_mismatch_simple_bayesian[0][1] = -1.40619; qual_mismatch_simple_bayesian[0][2] = -1.33474; qual_mismatch_simple_bayesian[0][3] = -1.28141; qual_mismatch_simple_bayesian[0][4] = -1.24099; qual_mismatch_simple_bayesian[0][5] = -1.21; qual_mismatch_simple_bayesian[0][6] = -1.18606; qual_mismatch_simple_bayesian[0][7] = -1.16744; qual_mismatch_simple_bayesian[0][8] = -1.15289; qual_mismatch_simple_bayesian[0][9] = -1.14148; qual_mismatch_simple_bayesian[0][10] = -1.13251; qual_mismatch_simple_bayesian[0][11] = -1.12545; qual_mismatch_simple_bayesian[0][12] = -1.11987; qual_mismatch_simple_bayesian[0][13] = -1.11546; qual_mismatch_simple_bayesian[0][14] = -1.11197; qual_mismatch_simple_bayesian[0][15] = -1.10921; qual_mismatch_simple_bayesian[0][16] = -1.10702; qual_mismatch_simple_bayesian[0][17] = -1.10529; qual_mismatch_simple_bayesian[0][18] = -1.10391; 
qual_mismatch_simple_bayesian[0][19] = -1.10282; qual_mismatch_simple_bayesian[0][20] = -1.10195; qual_mismatch_simple_bayesian[0][21] = -1.10126; qual_mismatch_simple_bayesian[0][22] = -1.10072; qual_mismatch_simple_bayesian[0][23] = -1.10028; qual_mismatch_simple_bayesian[0][24] = -1.09994; qual_mismatch_simple_bayesian[0][25] = -1.09967; qual_mismatch_simple_bayesian[0][26] = -1.09945; qual_mismatch_simple_bayesian[0][27] = -1.09928; qual_mismatch_simple_bayesian[0][28] = -1.09914; qual_mismatch_simple_bayesian[0][29] = -1.09903; qual_mismatch_simple_bayesian[0][30] = -1.09895; qual_mismatch_simple_bayesian[0][31] = -1.09888; qual_mismatch_simple_bayesian[0][32] = -1.09882; qual_mismatch_simple_bayesian[0][33] = -1.09878; qual_mismatch_simple_bayesian[0][34] = -1.09874; qual_mismatch_simple_bayesian[0][35] = -1.09872; qual_mismatch_simple_bayesian[0][36] = -1.0987; qual_mismatch_simple_bayesian[0][37] = -1.09868; qual_mismatch_simple_bayesian[0][38] = -1.09867; qual_mismatch_simple_bayesian[0][39] = -1.09865; qual_mismatch_simple_bayesian[0][40] = -1.09865; qual_mismatch_simple_bayesian[0][41] = -1.09864; qual_mismatch_simple_bayesian[0][42] = -1.09863; qual_mismatch_simple_bayesian[0][43] = -1.09863; qual_mismatch_simple_bayesian[0][44] = -1.09863; qual_mismatch_simple_bayesian[0][45] = -1.09862; qual_mismatch_simple_bayesian[0][46] = -1.09862; qual_mismatch_simple_bayesian[1][0] = -1.40619; qual_mismatch_simple_bayesian[1][1] = -1.38979; qual_mismatch_simple_bayesian[1][2] = -1.37696; qual_mismatch_simple_bayesian[1][3] = -1.36688; qual_mismatch_simple_bayesian[1][4] = -1.35894; qual_mismatch_simple_bayesian[1][5] = -1.35268; qual_mismatch_simple_bayesian[1][6] = -1.34774; qual_mismatch_simple_bayesian[1][7] = -1.34383; qual_mismatch_simple_bayesian[1][8] = -1.34073; qual_mismatch_simple_bayesian[1][9] = -1.33828; qual_mismatch_simple_bayesian[1][10] = -1.33634; qual_mismatch_simple_bayesian[1][11] = -1.3348; qual_mismatch_simple_bayesian[1][12] = -1.33358; 
qual_mismatch_simple_bayesian[1][13] = -1.33261; qual_mismatch_simple_bayesian[1][14] = -1.33184; qual_mismatch_simple_bayesian[1][15] = -1.33123; qual_mismatch_simple_bayesian[1][16] = -1.33074; qual_mismatch_simple_bayesian[1][17] = -1.33036; qual_mismatch_simple_bayesian[1][18] = -1.33005; qual_mismatch_simple_bayesian[1][19] = -1.32981; qual_mismatch_simple_bayesian[1][20] = -1.32962; qual_mismatch_simple_bayesian[1][21] = -1.32946; qual_mismatch_simple_bayesian[1][22] = -1.32934; qual_mismatch_simple_bayesian[1][23] = -1.32924; qual_mismatch_simple_bayesian[1][24] = -1.32917; qual_mismatch_simple_bayesian[1][25] = -1.32911; qual_mismatch_simple_bayesian[1][26] = -1.32906; qual_mismatch_simple_bayesian[1][27] = -1.32902; qual_mismatch_simple_bayesian[1][28] = -1.32899; qual_mismatch_simple_bayesian[1][29] = -1.32896; qual_mismatch_simple_bayesian[1][30] = -1.32895; qual_mismatch_simple_bayesian[1][31] = -1.32893; qual_mismatch_simple_bayesian[1][32] = -1.32892; qual_mismatch_simple_bayesian[1][33] = -1.32891; qual_mismatch_simple_bayesian[1][34] = -1.3289; qual_mismatch_simple_bayesian[1][35] = -1.32889; qual_mismatch_simple_bayesian[1][36] = -1.32889; qual_mismatch_simple_bayesian[1][37] = -1.32889; qual_mismatch_simple_bayesian[1][38] = -1.32888; qual_mismatch_simple_bayesian[1][39] = -1.32888; qual_mismatch_simple_bayesian[1][40] = -1.32888; qual_mismatch_simple_bayesian[1][41] = -1.32888; qual_mismatch_simple_bayesian[1][42] = -1.32888; qual_mismatch_simple_bayesian[1][43] = -1.32887; qual_mismatch_simple_bayesian[1][44] = -1.32887; qual_mismatch_simple_bayesian[1][45] = -1.32887; qual_mismatch_simple_bayesian[1][46] = -1.32887; qual_mismatch_simple_bayesian[2][0] = -1.33474; qual_mismatch_simple_bayesian[2][1] = -1.37696; qual_mismatch_simple_bayesian[2][2] = -1.41181; qual_mismatch_simple_bayesian[2][3] = -1.44039; qual_mismatch_simple_bayesian[2][4] = -1.46368; qual_mismatch_simple_bayesian[2][5] = -1.48258; qual_mismatch_simple_bayesian[2][6] = 
-1.49786; qual_mismatch_simple_bayesian[2][7] = -1.51016; qual_mismatch_simple_bayesian[2][8] = -1.52003; qual_mismatch_simple_bayesian[2][9] = -1.52795; qual_mismatch_simple_bayesian[2][10] = -1.53428; qual_mismatch_simple_bayesian[2][11] = -1.53934; qual_mismatch_simple_bayesian[2][12] = -1.54338; qual_mismatch_simple_bayesian[2][13] = -1.5466; qual_mismatch_simple_bayesian[2][14] = -1.54916; qual_mismatch_simple_bayesian[2][15] = -1.55121; qual_mismatch_simple_bayesian[2][16] = -1.55283; qual_mismatch_simple_bayesian[2][17] = -1.55412; qual_mismatch_simple_bayesian[2][18] = -1.55515; qual_mismatch_simple_bayesian[2][19] = -1.55597; qual_mismatch_simple_bayesian[2][20] = -1.55662; qual_mismatch_simple_bayesian[2][21] = -1.55713; qual_mismatch_simple_bayesian[2][22] = -1.55754; qual_mismatch_simple_bayesian[2][23] = -1.55787; qual_mismatch_simple_bayesian[2][24] = -1.55813; qual_mismatch_simple_bayesian[2][25] = -1.55833; qual_mismatch_simple_bayesian[2][26] = -1.5585; qual_mismatch_simple_bayesian[2][27] = -1.55863; qual_mismatch_simple_bayesian[2][28] = -1.55873; qual_mismatch_simple_bayesian[2][29] = -1.55881; qual_mismatch_simple_bayesian[2][30] = -1.55888; qual_mismatch_simple_bayesian[2][31] = -1.55893; qual_mismatch_simple_bayesian[2][32] = -1.55897; qual_mismatch_simple_bayesian[2][33] = -1.559; qual_mismatch_simple_bayesian[2][34] = -1.55903; qual_mismatch_simple_bayesian[2][35] = -1.55905; qual_mismatch_simple_bayesian[2][36] = -1.55907; qual_mismatch_simple_bayesian[2][37] = -1.55908; qual_mismatch_simple_bayesian[2][38] = -1.55909; qual_mismatch_simple_bayesian[2][39] = -1.5591; qual_mismatch_simple_bayesian[2][40] = -1.5591; qual_mismatch_simple_bayesian[2][41] = -1.55911; qual_mismatch_simple_bayesian[2][42] = -1.55911; qual_mismatch_simple_bayesian[2][43] = -1.55912; qual_mismatch_simple_bayesian[2][44] = -1.55912; qual_mismatch_simple_bayesian[2][45] = -1.55912; qual_mismatch_simple_bayesian[2][46] = -1.55912; qual_mismatch_simple_bayesian[3][0] = 
-1.28141; qual_mismatch_simple_bayesian[3][1] = -1.36688; qual_mismatch_simple_bayesian[3][2] = -1.44039; qual_mismatch_simple_bayesian[3][3] = -1.50289; qual_mismatch_simple_bayesian[3][4] = -1.55549; qual_mismatch_simple_bayesian[3][5] = -1.59933; qual_mismatch_simple_bayesian[3][6] = -1.63558; qual_mismatch_simple_bayesian[3][7] = -1.66534; qual_mismatch_simple_bayesian[3][8] = -1.68963; qual_mismatch_simple_bayesian[3][9] = -1.70935; qual_mismatch_simple_bayesian[3][10] = -1.72529; qual_mismatch_simple_bayesian[3][11] = -1.73814; qual_mismatch_simple_bayesian[3][12] = -1.74847; qual_mismatch_simple_bayesian[3][13] = -1.75675; qual_mismatch_simple_bayesian[3][14] = -1.76338; qual_mismatch_simple_bayesian[3][15] = -1.76867; qual_mismatch_simple_bayesian[3][16] = -1.7729; qual_mismatch_simple_bayesian[3][17] = -1.77627; qual_mismatch_simple_bayesian[3][18] = -1.77895; qual_mismatch_simple_bayesian[3][19] = -1.78109; qual_mismatch_simple_bayesian[3][20] = -1.78279; qual_mismatch_simple_bayesian[3][21] = -1.78414; qual_mismatch_simple_bayesian[3][22] = -1.78522; qual_mismatch_simple_bayesian[3][23] = -1.78608; qual_mismatch_simple_bayesian[3][24] = -1.78676; qual_mismatch_simple_bayesian[3][25] = -1.7873; qual_mismatch_simple_bayesian[3][26] = -1.78773; qual_mismatch_simple_bayesian[3][27] = -1.78807; qual_mismatch_simple_bayesian[3][28] = -1.78834; qual_mismatch_simple_bayesian[3][29] = -1.78855; qual_mismatch_simple_bayesian[3][30] = -1.78873; qual_mismatch_simple_bayesian[3][31] = -1.78886; qual_mismatch_simple_bayesian[3][32] = -1.78897; qual_mismatch_simple_bayesian[3][33] = -1.78906; qual_mismatch_simple_bayesian[3][34] = -1.78912; qual_mismatch_simple_bayesian[3][35] = -1.78918; qual_mismatch_simple_bayesian[3][36] = -1.78922; qual_mismatch_simple_bayesian[3][37] = -1.78926; qual_mismatch_simple_bayesian[3][38] = -1.78928; qual_mismatch_simple_bayesian[3][39] = -1.7893; qual_mismatch_simple_bayesian[3][40] = -1.78932; qual_mismatch_simple_bayesian[3][41] = 
-1.78934; qual_mismatch_simple_bayesian[3][42] = -1.78935; qual_mismatch_simple_bayesian[3][43] = -1.78935; qual_mismatch_simple_bayesian[3][44] = -1.78936; qual_mismatch_simple_bayesian[3][45] = -1.78937; qual_mismatch_simple_bayesian[3][46] = -1.78937; qual_mismatch_simple_bayesian[4][0] = -1.24099; qual_mismatch_simple_bayesian[4][1] = -1.35894; qual_mismatch_simple_bayesian[4][2] = -1.46368; qual_mismatch_simple_bayesian[4][3] = -1.55549; qual_mismatch_simple_bayesian[4][4] = -1.63493; qual_mismatch_simple_bayesian[4][5] = -1.70287; qual_mismatch_simple_bayesian[4][6] = -1.76033; qual_mismatch_simple_bayesian[4][7] = -1.80845; qual_mismatch_simple_bayesian[4][8] = -1.8484; qual_mismatch_simple_bayesian[4][9] = -1.8813; qual_mismatch_simple_bayesian[4][10] = -1.90823; qual_mismatch_simple_bayesian[4][11] = -1.93016; qual_mismatch_simple_bayesian[4][12] = -1.94792; qual_mismatch_simple_bayesian[4][13] = -1.96226; qual_mismatch_simple_bayesian[4][14] = -1.97379; qual_mismatch_simple_bayesian[4][15] = -1.98305; qual_mismatch_simple_bayesian[4][16] = -1.99047; qual_mismatch_simple_bayesian[4][17] = -1.9964; qual_mismatch_simple_bayesian[4][18] = -2.00114; qual_mismatch_simple_bayesian[4][19] = -2.00492; qual_mismatch_simple_bayesian[4][20] = -2.00793; qual_mismatch_simple_bayesian[4][21] = -2.01033; qual_mismatch_simple_bayesian[4][22] = -2.01224; qual_mismatch_simple_bayesian[4][23] = -2.01376; qual_mismatch_simple_bayesian[4][24] = -2.01497; qual_mismatch_simple_bayesian[4][25] = -2.01593; qual_mismatch_simple_bayesian[4][26] = -2.01669; qual_mismatch_simple_bayesian[4][27] = -2.0173; qual_mismatch_simple_bayesian[4][28] = -2.01778; qual_mismatch_simple_bayesian[4][29] = -2.01816; qual_mismatch_simple_bayesian[4][30] = -2.01847; qual_mismatch_simple_bayesian[4][31] = -2.01871; qual_mismatch_simple_bayesian[4][32] = -2.0189; qual_mismatch_simple_bayesian[4][33] = -2.01906; qual_mismatch_simple_bayesian[4][34] = -2.01918; qual_mismatch_simple_bayesian[4][35] = 
-2.01927; qual_mismatch_simple_bayesian[4][36] = -2.01935; qual_mismatch_simple_bayesian[4][37] = -2.01941; qual_mismatch_simple_bayesian[4][38] = -2.01946; qual_mismatch_simple_bayesian[4][39] = -2.0195; qual_mismatch_simple_bayesian[4][40] = -2.01953; qual_mismatch_simple_bayesian[4][41] = -2.01955; qual_mismatch_simple_bayesian[4][42] = -2.01957; qual_mismatch_simple_bayesian[4][43] = -2.01959; qual_mismatch_simple_bayesian[4][44] = -2.0196; qual_mismatch_simple_bayesian[4][45] = -2.01961; qual_mismatch_simple_bayesian[4][46] = -2.01962; qual_mismatch_simple_bayesian[5][0] = -1.21; qual_mismatch_simple_bayesian[5][1] = -1.35268; qual_mismatch_simple_bayesian[5][2] = -1.48258; qual_mismatch_simple_bayesian[5][3] = -1.59933; qual_mismatch_simple_bayesian[5][4] = -1.70287; qual_mismatch_simple_bayesian[5][5] = -1.79352; qual_mismatch_simple_bayesian[5][6] = -1.87187; qual_mismatch_simple_bayesian[5][7] = -1.93881; qual_mismatch_simple_bayesian[5][8] = -1.99536; qual_mismatch_simple_bayesian[5][9] = -2.04269; qual_mismatch_simple_bayesian[5][10] = -2.08194; qual_mismatch_simple_bayesian[5][11] = -2.11426; qual_mismatch_simple_bayesian[5][12] = -2.14069; qual_mismatch_simple_bayesian[5][13] = -2.1622; qual_mismatch_simple_bayesian[5][14] = -2.17962; qual_mismatch_simple_bayesian[5][15] = -2.19368; qual_mismatch_simple_bayesian[5][16] = -2.20499; qual_mismatch_simple_bayesian[5][17] = -2.21406; qual_mismatch_simple_bayesian[5][18] = -2.22133; qual_mismatch_simple_bayesian[5][19] = -2.22714; qual_mismatch_simple_bayesian[5][20] = -2.23178; qual_mismatch_simple_bayesian[5][21] = -2.23548; qual_mismatch_simple_bayesian[5][22] = -2.23843; qual_mismatch_simple_bayesian[5][23] = -2.24078; qual_mismatch_simple_bayesian[5][24] = -2.24265; qual_mismatch_simple_bayesian[5][25] = -2.24414; qual_mismatch_simple_bayesian[5][26] = -2.24532; qual_mismatch_simple_bayesian[5][27] = -2.24626; qual_mismatch_simple_bayesian[5][28] = -2.24701; qual_mismatch_simple_bayesian[5][29] = 
-2.2476; qual_mismatch_simple_bayesian[5][30] = -2.24808; qual_mismatch_simple_bayesian[5][31] = -2.24845; qual_mismatch_simple_bayesian[5][32] = -2.24875; qual_mismatch_simple_bayesian[5][33] = -2.24899; qual_mismatch_simple_bayesian[5][34] = -2.24918; qual_mismatch_simple_bayesian[5][35] = -2.24933; qual_mismatch_simple_bayesian[5][36] = -2.24945; qual_mismatch_simple_bayesian[5][37] = -2.24954; qual_mismatch_simple_bayesian[5][38] = -2.24962; qual_mismatch_simple_bayesian[5][39] = -2.24967; qual_mismatch_simple_bayesian[5][40] = -2.24972; qual_mismatch_simple_bayesian[5][41] = -2.24976; qual_mismatch_simple_bayesian[5][42] = -2.24979; qual_mismatch_simple_bayesian[5][43] = -2.24981; qual_mismatch_simple_bayesian[5][44] = -2.24983; qual_mismatch_simple_bayesian[5][45] = -2.24985; qual_mismatch_simple_bayesian[5][46] = -2.24986; qual_mismatch_simple_bayesian[6][0] = -1.18606; qual_mismatch_simple_bayesian[6][1] = -1.34774; qual_mismatch_simple_bayesian[6][2] = -1.49786; qual_mismatch_simple_bayesian[6][3] = -1.63558; qual_mismatch_simple_bayesian[6][4] = -1.76033; qual_mismatch_simple_bayesian[6][5] = -1.87187; qual_mismatch_simple_bayesian[6][6] = -1.97029; qual_mismatch_simple_bayesian[6][7] = -2.05601; qual_mismatch_simple_bayesian[6][8] = -2.12976; qual_mismatch_simple_bayesian[6][9] = -2.19248; qual_mismatch_simple_bayesian[6][10] = -2.24527; qual_mismatch_simple_bayesian[6][11] = -2.28928; qual_mismatch_simple_bayesian[6][12] = -2.32567; qual_mismatch_simple_bayesian[6][13] = -2.35556; qual_mismatch_simple_bayesian[6][14] = -2.37995; qual_mismatch_simple_bayesian[6][15] = -2.39976; qual_mismatch_simple_bayesian[6][16] = -2.41577; qual_mismatch_simple_bayesian[6][17] = -2.42868; qual_mismatch_simple_bayesian[6][18] = -2.43906; qual_mismatch_simple_bayesian[6][19] = -2.44737; qual_mismatch_simple_bayesian[6][20] = -2.45403; qual_mismatch_simple_bayesian[6][21] = -2.45935; qual_mismatch_simple_bayesian[6][22] = -2.4636; qual_mismatch_simple_bayesian[6][23] = 
-2.46698; qual_mismatch_simple_bayesian[6][24] = -2.46968; qual_mismatch_simple_bayesian[6][25] = -2.47183; qual_mismatch_simple_bayesian[6][26] = -2.47353; qual_mismatch_simple_bayesian[6][27] = -2.47489; qual_mismatch_simple_bayesian[6][28] = -2.47598; qual_mismatch_simple_bayesian[6][29] = -2.47684; qual_mismatch_simple_bayesian[6][30] = -2.47752; qual_mismatch_simple_bayesian[6][31] = -2.47806; qual_mismatch_simple_bayesian[6][32] = -2.47849; qual_mismatch_simple_bayesian[6][33] = -2.47884; qual_mismatch_simple_bayesian[6][34] = -2.47911; qual_mismatch_simple_bayesian[6][35] = -2.47933; qual_mismatch_simple_bayesian[6][36] = -2.4795; qual_mismatch_simple_bayesian[6][37] = -2.47964; qual_mismatch_simple_bayesian[6][38] = -2.47974; qual_mismatch_simple_bayesian[6][39] = -2.47983; qual_mismatch_simple_bayesian[6][40] = -2.4799; qual_mismatch_simple_bayesian[6][41] = -2.47995; qual_mismatch_simple_bayesian[6][42] = -2.48; qual_mismatch_simple_bayesian[6][43] = -2.48003; qual_mismatch_simple_bayesian[6][44] = -2.48006; qual_mismatch_simple_bayesian[6][45] = -2.48008; qual_mismatch_simple_bayesian[6][46] = -2.4801; qual_mismatch_simple_bayesian[7][0] = -1.16744; qual_mismatch_simple_bayesian[7][1] = -1.34383; qual_mismatch_simple_bayesian[7][2] = -1.51016; qual_mismatch_simple_bayesian[7][3] = -1.66534; qual_mismatch_simple_bayesian[7][4] = -1.80845; qual_mismatch_simple_bayesian[7][5] = -1.93881; qual_mismatch_simple_bayesian[7][6] = -2.05601; qual_mismatch_simple_bayesian[7][7] = -2.16001; qual_mismatch_simple_bayesian[7][8] = -2.25109; qual_mismatch_simple_bayesian[7][9] = -2.32986; qual_mismatch_simple_bayesian[7][10] = -2.39718; qual_mismatch_simple_bayesian[7][11] = -2.45408; qual_mismatch_simple_bayesian[7][12] = -2.5017; qual_mismatch_simple_bayesian[7][13] = -2.54122; qual_mismatch_simple_bayesian[7][14] = -2.57376; qual_mismatch_simple_bayesian[7][15] = -2.60038; qual_mismatch_simple_bayesian[7][16] = -2.62204; qual_mismatch_simple_bayesian[7][17] = 
-2.63959; qual_mismatch_simple_bayesian[7][18] = -2.65376; qual_mismatch_simple_bayesian[7][19] = -2.66515; qual_mismatch_simple_bayesian[7][20] = -2.6743; qual_mismatch_simple_bayesian[7][21] = -2.68162; qual_mismatch_simple_bayesian[7][22] = -2.68748; qual_mismatch_simple_bayesian[7][23] = -2.69215; qual_mismatch_simple_bayesian[7][24] = -2.69588; qual_mismatch_simple_bayesian[7][25] = -2.69886; qual_mismatch_simple_bayesian[7][26] = -2.70122; qual_mismatch_simple_bayesian[7][27] = -2.70311; qual_mismatch_simple_bayesian[7][28] = -2.70461; qual_mismatch_simple_bayesian[7][29] = -2.7058; qual_mismatch_simple_bayesian[7][30] = -2.70675; qual_mismatch_simple_bayesian[7][31] = -2.7075; qual_mismatch_simple_bayesian[7][32] = -2.7081; qual_mismatch_simple_bayesian[7][33] = -2.70858; qual_mismatch_simple_bayesian[7][34] = -2.70896; qual_mismatch_simple_bayesian[7][35] = -2.70926; qual_mismatch_simple_bayesian[7][36] = -2.7095; qual_mismatch_simple_bayesian[7][37] = -2.70969; qual_mismatch_simple_bayesian[7][38] = -2.70984; qual_mismatch_simple_bayesian[7][39] = -2.70996; qual_mismatch_simple_bayesian[7][40] = -2.71005; qual_mismatch_simple_bayesian[7][41] = -2.71013; qual_mismatch_simple_bayesian[7][42] = -2.71019; qual_mismatch_simple_bayesian[7][43] = -2.71024; qual_mismatch_simple_bayesian[7][44] = -2.71028; qual_mismatch_simple_bayesian[7][45] = -2.71031; qual_mismatch_simple_bayesian[7][46] = -2.71033; qual_mismatch_simple_bayesian[8][0] = -1.15289; qual_mismatch_simple_bayesian[8][1] = -1.34073; qual_mismatch_simple_bayesian[8][2] = -1.52003; qual_mismatch_simple_bayesian[8][3] = -1.68963; qual_mismatch_simple_bayesian[8][4] = -1.8484; qual_mismatch_simple_bayesian[8][5] = -1.99536; qual_mismatch_simple_bayesian[8][6] = -2.12976; qual_mismatch_simple_bayesian[8][7] = -2.25109; qual_mismatch_simple_bayesian[8][8] = -2.3592; qual_mismatch_simple_bayesian[8][9] = -2.45427; qual_mismatch_simple_bayesian[8][10] = -2.5368; qual_mismatch_simple_bayesian[8][11] = 
-2.60759; qual_mismatch_simple_bayesian[8][12] = -2.66762; qual_mismatch_simple_bayesian[8][13] = -2.71801; qual_mismatch_simple_bayesian[8][14] = -2.75994; qual_mismatch_simple_bayesian[8][15] = -2.79454; qual_mismatch_simple_bayesian[8][16] = -2.8229; qual_mismatch_simple_bayesian[8][17] = -2.84602; qual_mismatch_simple_bayesian[8][18] = -2.86477; qual_mismatch_simple_bayesian[8][19] = -2.87992; qual_mismatch_simple_bayesian[8][20] = -2.89212; qual_mismatch_simple_bayesian[8][21] = -2.90191; qual_mismatch_simple_bayesian[8][22] = -2.90977; qual_mismatch_simple_bayesian[8][23] = -2.91605; qual_mismatch_simple_bayesian[8][24] = -2.92106; qual_mismatch_simple_bayesian[8][25] = -2.92507; qual_mismatch_simple_bayesian[8][26] = -2.92826; qual_mismatch_simple_bayesian[8][27] = -2.9308; qual_mismatch_simple_bayesian[8][28] = -2.93282; qual_mismatch_simple_bayesian[8][29] = -2.93444; qual_mismatch_simple_bayesian[8][30] = -2.93572; qual_mismatch_simple_bayesian[8][31] = -2.93674; qual_mismatch_simple_bayesian[8][32] = -2.93755; qual_mismatch_simple_bayesian[8][33] = -2.93819; qual_mismatch_simple_bayesian[8][34] = -2.9387; qual_mismatch_simple_bayesian[8][35] = -2.93911; qual_mismatch_simple_bayesian[8][36] = -2.93943; qual_mismatch_simple_bayesian[8][37] = -2.93969; qual_mismatch_simple_bayesian[8][38] = -2.93989; qual_mismatch_simple_bayesian[8][39] = -2.94005; qual_mismatch_simple_bayesian[8][40] = -2.94018; qual_mismatch_simple_bayesian[8][41] = -2.94029; qual_mismatch_simple_bayesian[8][42] = -2.94037; qual_mismatch_simple_bayesian[8][43] = -2.94043; qual_mismatch_simple_bayesian[8][44] = -2.94048; qual_mismatch_simple_bayesian[8][45] = -2.94052; qual_mismatch_simple_bayesian[8][46] = -2.94056; qual_mismatch_simple_bayesian[9][0] = -1.14148; qual_mismatch_simple_bayesian[9][1] = -1.33828; qual_mismatch_simple_bayesian[9][2] = -1.52795; qual_mismatch_simple_bayesian[9][3] = -1.70935; qual_mismatch_simple_bayesian[9][4] = -1.8813; qual_mismatch_simple_bayesian[9][5] = 
-2.04269; qual_mismatch_simple_bayesian[9][6] = -2.19248; qual_mismatch_simple_bayesian[9][7] = -2.32986; qual_mismatch_simple_bayesian[9][8] = -2.45427; qual_mismatch_simple_bayesian[9][9] = -2.56545; qual_mismatch_simple_bayesian[9][10] = -2.66352; qual_mismatch_simple_bayesian[9][11] = -2.74891; qual_mismatch_simple_bayesian[9][12] = -2.82235; qual_mismatch_simple_bayesian[9][13] = -2.8848; qual_mismatch_simple_bayesian[9][14] = -2.93733; qual_mismatch_simple_bayesian[9][15] = -2.98112; qual_mismatch_simple_bayesian[9][16] = -3.01733; qual_mismatch_simple_bayesian[9][17] = -3.04705; qual_mismatch_simple_bayesian[9][18] = -3.07131; qual_mismatch_simple_bayesian[9][19] = -3.09101; qual_mismatch_simple_bayesian[9][20] = -3.10693; qual_mismatch_simple_bayesian[9][21] = -3.11977; qual_mismatch_simple_bayesian[9][22] = -3.13008; qual_mismatch_simple_bayesian[9][23] = -3.13835; qual_mismatch_simple_bayesian[9][24] = -3.14496; qual_mismatch_simple_bayesian[9][25] = -3.15025; qual_mismatch_simple_bayesian[9][26] = -3.15447; qual_mismatch_simple_bayesian[9][27] = -3.15784; qual_mismatch_simple_bayesian[9][28] = -3.16052; qual_mismatch_simple_bayesian[9][29] = -3.16265; qual_mismatch_simple_bayesian[9][30] = -3.16435; qual_mismatch_simple_bayesian[9][31] = -3.1657; qual_mismatch_simple_bayesian[9][32] = -3.16678; qual_mismatch_simple_bayesian[9][33] = -3.16763; qual_mismatch_simple_bayesian[9][34] = -3.16831; qual_mismatch_simple_bayesian[9][35] = -3.16885; qual_mismatch_simple_bayesian[9][36] = -3.16928; qual_mismatch_simple_bayesian[9][37] = -3.16962; qual_mismatch_simple_bayesian[9][38] = -3.16989; qual_mismatch_simple_bayesian[9][39] = -3.17011; qual_mismatch_simple_bayesian[9][40] = -3.17028; qual_mismatch_simple_bayesian[9][41] = -3.17041; qual_mismatch_simple_bayesian[9][42] = -3.17052; qual_mismatch_simple_bayesian[9][43] = -3.17061; qual_mismatch_simple_bayesian[9][44] = -3.17068; qual_mismatch_simple_bayesian[9][45] = -3.17073; 
qual_mismatch_simple_bayesian[9][46] = -3.17077; qual_mismatch_simple_bayesian[10][0] = -1.13251; qual_mismatch_simple_bayesian[10][1] = -1.33634; qual_mismatch_simple_bayesian[10][2] = -1.53428; qual_mismatch_simple_bayesian[10][3] = -1.72529; qual_mismatch_simple_bayesian[10][4] = -1.90823; qual_mismatch_simple_bayesian[10][5] = -2.08194; qual_mismatch_simple_bayesian[10][6] = -2.24527; qual_mismatch_simple_bayesian[10][7] = -2.39718; qual_mismatch_simple_bayesian[10][8] = -2.5368; qual_mismatch_simple_bayesian[10][9] = -2.66352; qual_mismatch_simple_bayesian[10][10] = -2.77704; qual_mismatch_simple_bayesian[10][11] = -2.87741; qual_mismatch_simple_bayesian[10][12] = -2.96499; qual_mismatch_simple_bayesian[10][13] = -3.04048; qual_mismatch_simple_bayesian[10][14] = -3.10478; qual_mismatch_simple_bayesian[10][15] = -3.15899; qual_mismatch_simple_bayesian[10][16] = -3.20424; qual_mismatch_simple_bayesian[10][17] = -3.2417; qual_mismatch_simple_bayesian[10][18] = -3.27249; qual_mismatch_simple_bayesian[10][19] = -3.29764; qual_mismatch_simple_bayesian[10][20] = -3.31808; qual_mismatch_simple_bayesian[10][21] = -3.33462; qual_mismatch_simple_bayesian[10][22] = -3.34796; qual_mismatch_simple_bayesian[10][23] = -3.35868; qual_mismatch_simple_bayesian[10][24] = -3.36728; qual_mismatch_simple_bayesian[10][25] = -3.37416; qual_mismatch_simple_bayesian[10][26] = -3.37966; qual_mismatch_simple_bayesian[10][27] = -3.38405; qual_mismatch_simple_bayesian[10][28] = -3.38756; qual_mismatch_simple_bayesian[10][29] = -3.39035; qual_mismatch_simple_bayesian[10][30] = -3.39257; qual_mismatch_simple_bayesian[10][31] = -3.39434; qual_mismatch_simple_bayesian[10][32] = -3.39574; qual_mismatch_simple_bayesian[10][33] = -3.39686; qual_mismatch_simple_bayesian[10][34] = -3.39775; qual_mismatch_simple_bayesian[10][35] = -3.39846; qual_mismatch_simple_bayesian[10][36] = -3.39902; qual_mismatch_simple_bayesian[10][37] = -3.39947; qual_mismatch_simple_bayesian[10][38] = -3.39982; 
qual_mismatch_simple_bayesian[10][39] = -3.40011; qual_mismatch_simple_bayesian[10][40] = -3.40033; qual_mismatch_simple_bayesian[10][41] = -3.40051; qual_mismatch_simple_bayesian[10][42] = -3.40065; qual_mismatch_simple_bayesian[10][43] = -3.40076; qual_mismatch_simple_bayesian[10][44] = -3.40085; qual_mismatch_simple_bayesian[10][45] = -3.40092; qual_mismatch_simple_bayesian[10][46] = -3.40098; qual_mismatch_simple_bayesian[11][0] = -1.12545; qual_mismatch_simple_bayesian[11][1] = -1.3348; qual_mismatch_simple_bayesian[11][2] = -1.53934; qual_mismatch_simple_bayesian[11][3] = -1.73814; qual_mismatch_simple_bayesian[11][4] = -1.93016; qual_mismatch_simple_bayesian[11][5] = -2.11426; qual_mismatch_simple_bayesian[11][6] = -2.28928; qual_mismatch_simple_bayesian[11][7] = -2.45408; qual_mismatch_simple_bayesian[11][8] = -2.60759; qual_mismatch_simple_bayesian[11][9] = -2.74891; qual_mismatch_simple_bayesian[11][10] = -2.87741; qual_mismatch_simple_bayesian[11][11] = -2.99272; qual_mismatch_simple_bayesian[11][12] = -3.09485; qual_mismatch_simple_bayesian[11][13] = -3.18412; qual_mismatch_simple_bayesian[11][14] = -3.2612; qual_mismatch_simple_bayesian[11][15] = -3.32696; qual_mismatch_simple_bayesian[11][16] = -3.38246; qual_mismatch_simple_bayesian[11][17] = -3.42885; qual_mismatch_simple_bayesian[11][18] = -3.4673; qual_mismatch_simple_bayesian[11][19] = -3.49893; qual_mismatch_simple_bayesian[11][20] = -3.52479; qual_mismatch_simple_bayesian[11][21] = -3.54582; qual_mismatch_simple_bayesian[11][22] = -3.56284; qual_mismatch_simple_bayesian[11][23] = -3.57658; qual_mismatch_simple_bayesian[11][24] = -3.58762; qual_mismatch_simple_bayesian[11][25] = -3.59648; qual_mismatch_simple_bayesian[11][26] = -3.60357; qual_mismatch_simple_bayesian[11][27] = -3.60925; qual_mismatch_simple_bayesian[11][28] = -3.61377; qual_mismatch_simple_bayesian[11][29] = -3.61738; qual_mismatch_simple_bayesian[11][30] = -3.62026; qual_mismatch_simple_bayesian[11][31] = -3.62255; 
qual_mismatch_simple_bayesian[11][32] = -3.62438; qual_mismatch_simple_bayesian[11][33] = -3.62583; qual_mismatch_simple_bayesian[11][34] = -3.62698; qual_mismatch_simple_bayesian[11][35] = -3.6279; qual_mismatch_simple_bayesian[11][36] = -3.62863; qual_mismatch_simple_bayesian[11][37] = -3.62921; qual_mismatch_simple_bayesian[11][38] = -3.62967; qual_mismatch_simple_bayesian[11][39] = -3.63004; qual_mismatch_simple_bayesian[11][40] = -3.63033; qual_mismatch_simple_bayesian[11][41] = -3.63056; qual_mismatch_simple_bayesian[11][42] = -3.63075; qual_mismatch_simple_bayesian[11][43] = -3.63089; qual_mismatch_simple_bayesian[11][44] = -3.63101; qual_mismatch_simple_bayesian[11][45] = -3.6311; qual_mismatch_simple_bayesian[11][46] = -3.63117; qual_mismatch_simple_bayesian[12][0] = -1.11987; qual_mismatch_simple_bayesian[12][1] = -1.33358; qual_mismatch_simple_bayesian[12][2] = -1.54338; qual_mismatch_simple_bayesian[12][3] = -1.74847; qual_mismatch_simple_bayesian[12][4] = -1.94792; qual_mismatch_simple_bayesian[12][5] = -2.14069; qual_mismatch_simple_bayesian[12][6] = -2.32567; qual_mismatch_simple_bayesian[12][7] = -2.5017; qual_mismatch_simple_bayesian[12][8] = -2.66762; qual_mismatch_simple_bayesian[12][9] = -2.82235; qual_mismatch_simple_bayesian[12][10] = -2.96499; qual_mismatch_simple_bayesian[12][11] = -3.09485; qual_mismatch_simple_bayesian[12][12] = -3.21154; qual_mismatch_simple_bayesian[12][13] = -3.31504; qual_mismatch_simple_bayesian[12][14] = -3.40563; qual_mismatch_simple_bayesian[12][15] = -3.48395; qual_mismatch_simple_bayesian[12][16] = -3.55084; qual_mismatch_simple_bayesian[12][17] = -3.60736; qual_mismatch_simple_bayesian[12][18] = -3.65465; qual_mismatch_simple_bayesian[12][19] = -3.69388; qual_mismatch_simple_bayesian[12][20] = -3.72617; qual_mismatch_simple_bayesian[12][21] = -3.75259; qual_mismatch_simple_bayesian[12][22] = -3.77408; qual_mismatch_simple_bayesian[12][23] = -3.79149; qual_mismatch_simple_bayesian[12][24] = -3.80553; 
qual_mismatch_simple_bayesian[12][25] = -3.81683; qual_mismatch_simple_bayesian[12][26] = -3.8259; qual_mismatch_simple_bayesian[12][27] = -3.83316; qual_mismatch_simple_bayesian[12][28] = -3.83897; qual_mismatch_simple_bayesian[12][29] = -3.84361; qual_mismatch_simple_bayesian[12][30] = -3.8473; qual_mismatch_simple_bayesian[12][31] = -3.85025; qual_mismatch_simple_bayesian[12][32] = -3.8526; qual_mismatch_simple_bayesian[12][33] = -3.85447; qual_mismatch_simple_bayesian[12][34] = -3.85595; qual_mismatch_simple_bayesian[12][35] = -3.85713; qual_mismatch_simple_bayesian[12][36] = -3.85807; qual_mismatch_simple_bayesian[12][37] = -3.85882; qual_mismatch_simple_bayesian[12][38] = -3.85942; qual_mismatch_simple_bayesian[12][39] = -3.85989; qual_mismatch_simple_bayesian[12][40] = -3.86026; qual_mismatch_simple_bayesian[12][41] = -3.86056; qual_mismatch_simple_bayesian[12][42] = -3.8608; qual_mismatch_simple_bayesian[12][43] = -3.86099; qual_mismatch_simple_bayesian[12][44] = -3.86114; qual_mismatch_simple_bayesian[12][45] = -3.86126; qual_mismatch_simple_bayesian[12][46] = -3.86135; qual_mismatch_simple_bayesian[13][0] = -1.11546; qual_mismatch_simple_bayesian[13][1] = -1.33261; qual_mismatch_simple_bayesian[13][2] = -1.5466; qual_mismatch_simple_bayesian[13][3] = -1.75675; qual_mismatch_simple_bayesian[13][4] = -1.96226; qual_mismatch_simple_bayesian[13][5] = -2.1622; qual_mismatch_simple_bayesian[13][6] = -2.35556; qual_mismatch_simple_bayesian[13][7] = -2.54122; qual_mismatch_simple_bayesian[13][8] = -2.71801; qual_mismatch_simple_bayesian[13][9] = -2.8848; qual_mismatch_simple_bayesian[13][10] = -3.04048; qual_mismatch_simple_bayesian[13][11] = -3.18412; qual_mismatch_simple_bayesian[13][12] = -3.31504; qual_mismatch_simple_bayesian[13][13] = -3.43281; qual_mismatch_simple_bayesian[13][14] = -3.53737; qual_mismatch_simple_bayesian[13][15] = -3.629; qual_mismatch_simple_bayesian[13][16] = -3.70828; qual_mismatch_simple_bayesian[13][17] = -3.77607; 
qual_mismatch_simple_bayesian[13][18] = -3.83339; qual_mismatch_simple_bayesian[13][19] = -3.88139; qual_mismatch_simple_bayesian[13][20] = -3.92122; qual_mismatch_simple_bayesian[13][21] = -3.95404; qual_mismatch_simple_bayesian[13][22] = -3.9809; qual_mismatch_simple_bayesian[13][23] = -4.00276; qual_mismatch_simple_bayesian[13][24] = -4.02047; qual_mismatch_simple_bayesian[13][25] = -4.03476; qual_mismatch_simple_bayesian[13][26] = -4.04626; qual_mismatch_simple_bayesian[13][27] = -4.0555; qual_mismatch_simple_bayesian[13][28] = -4.06289; qual_mismatch_simple_bayesian[13][29] = -4.0688; qual_mismatch_simple_bayesian[13][30] = -4.07352; qual_mismatch_simple_bayesian[13][31] = -4.07729; qual_mismatch_simple_bayesian[13][32] = -4.08029; qual_mismatch_simple_bayesian[13][33] = -4.08268; qual_mismatch_simple_bayesian[13][34] = -4.08459; qual_mismatch_simple_bayesian[13][35] = -4.0861; qual_mismatch_simple_bayesian[13][36] = -4.08731; qual_mismatch_simple_bayesian[13][37] = -4.08826; qual_mismatch_simple_bayesian[13][38] = -4.08903; qual_mismatch_simple_bayesian[13][39] = -4.08963; qual_mismatch_simple_bayesian[13][40] = -4.09011; qual_mismatch_simple_bayesian[13][41] = -4.0905; qual_mismatch_simple_bayesian[13][42] = -4.0908; qual_mismatch_simple_bayesian[13][43] = -4.09104; qual_mismatch_simple_bayesian[13][44] = -4.09123; qual_mismatch_simple_bayesian[13][45] = -4.09138; qual_mismatch_simple_bayesian[13][46] = -4.09151; qual_mismatch_simple_bayesian[14][0] = -1.11197; qual_mismatch_simple_bayesian[14][1] = -1.33184; qual_mismatch_simple_bayesian[14][2] = -1.54916; qual_mismatch_simple_bayesian[14][3] = -1.76338; qual_mismatch_simple_bayesian[14][4] = -1.97379; qual_mismatch_simple_bayesian[14][5] = -2.17962; qual_mismatch_simple_bayesian[14][6] = -2.37995; qual_mismatch_simple_bayesian[14][7] = -2.57376; qual_mismatch_simple_bayesian[14][8] = -2.75994; qual_mismatch_simple_bayesian[14][9] = -2.93733; qual_mismatch_simple_bayesian[14][10] = -3.10478; 
qual_mismatch_simple_bayesian[14][11] = -3.2612; qual_mismatch_simple_bayesian[14][12] = -3.40563; qual_mismatch_simple_bayesian[14][13] = -3.53737; qual_mismatch_simple_bayesian[14][14] = -3.65598; qual_mismatch_simple_bayesian[14][15] = -3.76138; qual_mismatch_simple_bayesian[14][16] = -3.85381; qual_mismatch_simple_bayesian[14][17] = -3.93386; qual_mismatch_simple_bayesian[14][18] = -4.00234; qual_mismatch_simple_bayesian[14][19] = -4.0603; qual_mismatch_simple_bayesian[14][20] = -4.10885; qual_mismatch_simple_bayesian[14][21] = -4.14917; qual_mismatch_simple_bayesian[14][22] = -4.1824; qual_mismatch_simple_bayesian[14][23] = -4.20961; qual_mismatch_simple_bayesian[14][24] = -4.23176; qual_mismatch_simple_bayesian[14][25] = -4.24971; qual_mismatch_simple_bayesian[14][26] = -4.2642; qual_mismatch_simple_bayesian[14][27] = -4.27586; qual_mismatch_simple_bayesian[14][28] = -4.28523; qual_mismatch_simple_bayesian[14][29] = -4.29273; qual_mismatch_simple_bayesian[14][30] = -4.29872; qual_mismatch_simple_bayesian[14][31] = -4.30351; qual_mismatch_simple_bayesian[14][32] = -4.30734; qual_mismatch_simple_bayesian[14][33] = -4.31038; qual_mismatch_simple_bayesian[14][34] = -4.31281; qual_mismatch_simple_bayesian[14][35] = -4.31474; qual_mismatch_simple_bayesian[14][36] = -4.31627; qual_mismatch_simple_bayesian[14][37] = -4.3175; qual_mismatch_simple_bayesian[14][38] = -4.31847; qual_mismatch_simple_bayesian[14][39] = -4.31924; qual_mismatch_simple_bayesian[14][40] = -4.31986; qual_mismatch_simple_bayesian[14][41] = -4.32034; qual_mismatch_simple_bayesian[14][42] = -4.32073; qual_mismatch_simple_bayesian[14][43] = -4.32104; qual_mismatch_simple_bayesian[14][44] = -4.32128; qual_mismatch_simple_bayesian[14][45] = -4.32148; qual_mismatch_simple_bayesian[14][46] = -4.32163; qual_mismatch_simple_bayesian[15][0] = -1.10921; qual_mismatch_simple_bayesian[15][1] = -1.33123; qual_mismatch_simple_bayesian[15][2] = -1.55121; qual_mismatch_simple_bayesian[15][3] = -1.76867; 
qual_mismatch_simple_bayesian[15][4] = -1.98305; qual_mismatch_simple_bayesian[15][5] = -2.19368; qual_mismatch_simple_bayesian[15][6] = -2.39976; qual_mismatch_simple_bayesian[15][7] = -2.60038; qual_mismatch_simple_bayesian[15][8] = -2.79454; qual_mismatch_simple_bayesian[15][9] = -2.98112; qual_mismatch_simple_bayesian[15][10] = -3.15899; qual_mismatch_simple_bayesian[15][11] = -3.32696; qual_mismatch_simple_bayesian[15][12] = -3.48395; qual_mismatch_simple_bayesian[15][13] = -3.629; qual_mismatch_simple_bayesian[15][14] = -3.76138; qual_mismatch_simple_bayesian[15][15] = -3.88065; qual_mismatch_simple_bayesian[15][16] = -3.9867; qual_mismatch_simple_bayesian[15][17] = -4.07977; qual_mismatch_simple_bayesian[15][18] = -4.16041; qual_mismatch_simple_bayesian[15][19] = -4.22945; qual_mismatch_simple_bayesian[15][20] = -4.2879; qual_mismatch_simple_bayesian[15][21] = -4.3369; qual_mismatch_simple_bayesian[15][22] = -4.3776; qual_mismatch_simple_bayesian[15][23] = -4.41116; qual_mismatch_simple_bayesian[15][24] = -4.43864; qual_mismatch_simple_bayesian[15][25] = -4.46102; qual_mismatch_simple_bayesian[15][26] = -4.47916; qual_mismatch_simple_bayesian[15][27] = -4.49381; qual_mismatch_simple_bayesian[15][28] = -4.5056; qual_mismatch_simple_bayesian[15][29] = -4.51507; qual_mismatch_simple_bayesian[15][30] = -4.52265; qual_mismatch_simple_bayesian[15][31] = -4.52872; qual_mismatch_simple_bayesian[15][32] = -4.53356; qual_mismatch_simple_bayesian[15][33] = -4.53742; qual_mismatch_simple_bayesian[15][34] = -4.5405; qual_mismatch_simple_bayesian[15][35] = -4.54296; qual_mismatch_simple_bayesian[15][36] = -4.54491; qual_mismatch_simple_bayesian[15][37] = -4.54646; qual_mismatch_simple_bayesian[15][38] = -4.5477; qual_mismatch_simple_bayesian[15][39] = -4.54868; qual_mismatch_simple_bayesian[15][40] = -4.54947; qual_mismatch_simple_bayesian[15][41] = -4.55009; qual_mismatch_simple_bayesian[15][42] = -4.55058; qual_mismatch_simple_bayesian[15][43] = -4.55097; 
qual_mismatch_simple_bayesian[15][44] = -4.55128; qual_mismatch_simple_bayesian[15][45] = -4.55153; qual_mismatch_simple_bayesian[15][46] = -4.55173; qual_mismatch_simple_bayesian[16][0] = -1.10702; qual_mismatch_simple_bayesian[16][1] = -1.33074; qual_mismatch_simple_bayesian[16][2] = -1.55283; qual_mismatch_simple_bayesian[16][3] = -1.7729; qual_mismatch_simple_bayesian[16][4] = -1.99047; qual_mismatch_simple_bayesian[16][5] = -2.20499; qual_mismatch_simple_bayesian[16][6] = -2.41577; qual_mismatch_simple_bayesian[16][7] = -2.62204; qual_mismatch_simple_bayesian[16][8] = -2.8229; qual_mismatch_simple_bayesian[16][9] = -3.01733; qual_mismatch_simple_bayesian[16][10] = -3.20424; qual_mismatch_simple_bayesian[16][11] = -3.38246; qual_mismatch_simple_bayesian[16][12] = -3.55084; qual_mismatch_simple_bayesian[16][13] = -3.70828; qual_mismatch_simple_bayesian[16][14] = -3.85381; qual_mismatch_simple_bayesian[16][15] = -3.9867; qual_mismatch_simple_bayesian[16][16] = -4.10649; qual_mismatch_simple_bayesian[16][17] = -4.21306; qual_mismatch_simple_bayesian[16][18] = -4.30662; qual_mismatch_simple_bayesian[16][19] = -4.38774; qual_mismatch_simple_bayesian[16][20] = -4.45721; qual_mismatch_simple_bayesian[16][21] = -4.51606; qual_mismatch_simple_bayesian[16][22] = -4.5654; qual_mismatch_simple_bayesian[16][23] = -4.60641; qual_mismatch_simple_bayesian[16][24] = -4.64022; qual_mismatch_simple_bayesian[16][25] = -4.66792; qual_mismatch_simple_bayesian[16][26] = -4.69049; qual_mismatch_simple_bayesian[16][27] = -4.70878; qual_mismatch_simple_bayesian[16][28] = -4.72355; qual_mismatch_simple_bayesian[16][29] = -4.73544; qual_mismatch_simple_bayesian[16][30] = -4.74499; qual_mismatch_simple_bayesian[16][31] = -4.75264; qual_mismatch_simple_bayesian[16][32] = -4.75876; qual_mismatch_simple_bayesian[16][33] = -4.76365; qual_mismatch_simple_bayesian[16][34] = -4.76755; qual_mismatch_simple_bayesian[16][35] = -4.77065; qual_mismatch_simple_bayesian[16][36] = -4.77313; 
qual_mismatch_simple_bayesian[16][37] = -4.7751; qual_mismatch_simple_bayesian[16][38] = -4.77667; qual_mismatch_simple_bayesian[16][39] = -4.77792; qual_mismatch_simple_bayesian[16][40] = -4.77891; qual_mismatch_simple_bayesian[16][41] = -4.7797; qual_mismatch_simple_bayesian[16][42] = -4.78032; qual_mismatch_simple_bayesian[16][43] = -4.78082; qual_mismatch_simple_bayesian[16][44] = -4.78122; qual_mismatch_simple_bayesian[16][45] = -4.78153; qual_mismatch_simple_bayesian[16][46] = -4.78178; qual_mismatch_simple_bayesian[17][0] = -1.10529; qual_mismatch_simple_bayesian[17][1] = -1.33036; qual_mismatch_simple_bayesian[17][2] = -1.55412; qual_mismatch_simple_bayesian[17][3] = -1.77627; qual_mismatch_simple_bayesian[17][4] = -1.9964; qual_mismatch_simple_bayesian[17][5] = -2.21406; qual_mismatch_simple_bayesian[17][6] = -2.42868; qual_mismatch_simple_bayesian[17][7] = -2.63959; qual_mismatch_simple_bayesian[17][8] = -2.84602; qual_mismatch_simple_bayesian[17][9] = -3.04705; qual_mismatch_simple_bayesian[17][10] = -3.2417; qual_mismatch_simple_bayesian[17][11] = -3.42885; qual_mismatch_simple_bayesian[17][12] = -3.60736; qual_mismatch_simple_bayesian[17][13] = -3.77607; qual_mismatch_simple_bayesian[17][14] = -3.93386; qual_mismatch_simple_bayesian[17][15] = -4.07977; qual_mismatch_simple_bayesian[17][16] = -4.21306; qual_mismatch_simple_bayesian[17][17] = -4.33325; qual_mismatch_simple_bayesian[17][18] = -4.44022; qual_mismatch_simple_bayesian[17][19] = -4.53419; qual_mismatch_simple_bayesian[17][20] = -4.61567; qual_mismatch_simple_bayesian[17][21] = -4.68549; qual_mismatch_simple_bayesian[17][22] = -4.74465; qual_mismatch_simple_bayesian[17][23] = -4.79427; qual_mismatch_simple_bayesian[17][24] = -4.83552; qual_mismatch_simple_bayesian[17][25] = -4.86954; qual_mismatch_simple_bayesian[17][26] = -4.89741; qual_mismatch_simple_bayesian[17][27] = -4.92012; qual_mismatch_simple_bayesian[17][28] = -4.93853; qual_mismatch_simple_bayesian[17][29] = -4.9534; 
qual_mismatch_simple_bayesian[17][30] = -4.96537; qual_mismatch_simple_bayesian[17][31] = -4.97499; qual_mismatch_simple_bayesian[17][32] = -4.98269; qual_mismatch_simple_bayesian[17][33] = -4.98885; qual_mismatch_simple_bayesian[17][34] = -4.99377; qual_mismatch_simple_bayesian[17][35] = -4.9977; qual_mismatch_simple_bayesian[17][36] = -5.00083; qual_mismatch_simple_bayesian[17][37] = -5.00332; qual_mismatch_simple_bayesian[17][38] = -5.0053; qual_mismatch_simple_bayesian[17][39] = -5.00688; qual_mismatch_simple_bayesian[17][40] = -5.00814; qual_mismatch_simple_bayesian[17][41] = -5.00914; qual_mismatch_simple_bayesian[17][42] = -5.00993; qual_mismatch_simple_bayesian[17][43] = -5.01056; qual_mismatch_simple_bayesian[17][44] = -5.01107; qual_mismatch_simple_bayesian[17][45] = -5.01147; qual_mismatch_simple_bayesian[17][46] = -5.01178; qual_mismatch_simple_bayesian[18][0] = -1.10391; qual_mismatch_simple_bayesian[18][1] = -1.33005; qual_mismatch_simple_bayesian[18][2] = -1.55515; qual_mismatch_simple_bayesian[18][3] = -1.77895; qual_mismatch_simple_bayesian[18][4] = -2.00114; qual_mismatch_simple_bayesian[18][5] = -2.22133; qual_mismatch_simple_bayesian[18][6] = -2.43906; qual_mismatch_simple_bayesian[18][7] = -2.65376; qual_mismatch_simple_bayesian[18][8] = -2.86477; qual_mismatch_simple_bayesian[18][9] = -3.07131; qual_mismatch_simple_bayesian[18][10] = -3.27249; qual_mismatch_simple_bayesian[18][11] = -3.4673; qual_mismatch_simple_bayesian[18][12] = -3.65465; qual_mismatch_simple_bayesian[18][13] = -3.83339; qual_mismatch_simple_bayesian[18][14] = -4.00234; qual_mismatch_simple_bayesian[18][15] = -4.16041; qual_mismatch_simple_bayesian[18][16] = -4.30662; qual_mismatch_simple_bayesian[18][17] = -4.44022; qual_mismatch_simple_bayesian[18][18] = -4.56074; qual_mismatch_simple_bayesian[18][19] = -4.66803; qual_mismatch_simple_bayesian[18][20] = -4.76231; qual_mismatch_simple_bayesian[18][21] = -4.84409; qual_mismatch_simple_bayesian[18][22] = -4.91418; 
qual_mismatch_simple_bayesian[18][23] = -4.97359; qual_mismatch_simple_bayesian[18][24] = -5.02342; qual_mismatch_simple_bayesian[18][25] = -5.06486; qual_mismatch_simple_bayesian[18][26] = -5.09904; qual_mismatch_simple_bayesian[18][27] = -5.12706; qual_mismatch_simple_bayesian[18][28] = -5.14988; qual_mismatch_simple_bayesian[18][29] = -5.16839; qual_mismatch_simple_bayesian[18][30] = -5.18334; qual_mismatch_simple_bayesian[18][31] = -5.19537; qual_mismatch_simple_bayesian[18][32] = -5.20504; qual_mismatch_simple_bayesian[18][33] = -5.21278; qual_mismatch_simple_bayesian[18][34] = -5.21897; qual_mismatch_simple_bayesian[18][35] = -5.22392; qual_mismatch_simple_bayesian[18][36] = -5.22787; qual_mismatch_simple_bayesian[18][37] = -5.23102; qual_mismatch_simple_bayesian[18][38] = -5.23352; qual_mismatch_simple_bayesian[18][39] = -5.23552; qual_mismatch_simple_bayesian[18][40] = -5.23711; qual_mismatch_simple_bayesian[18][41] = -5.23837; qual_mismatch_simple_bayesian[18][42] = -5.23938; qual_mismatch_simple_bayesian[18][43] = -5.24017; qual_mismatch_simple_bayesian[18][44] = -5.24081; qual_mismatch_simple_bayesian[18][45] = -5.24131; qual_mismatch_simple_bayesian[18][46] = -5.24172; qual_mismatch_simple_bayesian[19][0] = -1.10282; qual_mismatch_simple_bayesian[19][1] = -1.32981; qual_mismatch_simple_bayesian[19][2] = -1.55597; qual_mismatch_simple_bayesian[19][3] = -1.78109; qual_mismatch_simple_bayesian[19][4] = -2.00492; qual_mismatch_simple_bayesian[19][5] = -2.22714; qual_mismatch_simple_bayesian[19][6] = -2.44737; qual_mismatch_simple_bayesian[19][7] = -2.66515; qual_mismatch_simple_bayesian[19][8] = -2.87992; qual_mismatch_simple_bayesian[19][9] = -3.09101; qual_mismatch_simple_bayesian[19][10] = -3.29764; qual_mismatch_simple_bayesian[19][11] = -3.49893; qual_mismatch_simple_bayesian[19][12] = -3.69388; qual_mismatch_simple_bayesian[19][13] = -3.88139; qual_mismatch_simple_bayesian[19][14] = -4.0603; qual_mismatch_simple_bayesian[19][15] = -4.22945; 
qual_mismatch_simple_bayesian[19][16] = -4.38774; qual_mismatch_simple_bayesian[19][17] = -4.53419; qual_mismatch_simple_bayesian[19][18] = -4.66803; qual_mismatch_simple_bayesian[19][19] = -4.78881; qual_mismatch_simple_bayesian[19][20] = -4.89635; qual_mismatch_simple_bayesian[19][21] = -4.99087; qual_mismatch_simple_bayesian[19][22] = -5.07289; qual_mismatch_simple_bayesian[19][23] = -5.1432; qual_mismatch_simple_bayesian[19][24] = -5.2028; qual_mismatch_simple_bayesian[19][25] = -5.25281; qual_mismatch_simple_bayesian[19][26] = -5.29439; qual_mismatch_simple_bayesian[19][27] = -5.32871; qual_mismatch_simple_bayesian[19][28] = -5.35683; qual_mismatch_simple_bayesian[19][29] = -5.37974; qual_mismatch_simple_bayesian[19][30] = -5.39832; qual_mismatch_simple_bayesian[19][31] = -5.41334; qual_mismatch_simple_bayesian[19][32] = -5.42542; qual_mismatch_simple_bayesian[19][33] = -5.43513; qual_mismatch_simple_bayesian[19][34] = -5.44291; qual_mismatch_simple_bayesian[19][35] = -5.44913; qual_mismatch_simple_bayesian[19][36] = -5.4541; qual_mismatch_simple_bayesian[19][37] = -5.45806; qual_mismatch_simple_bayesian[19][38] = -5.46122; qual_mismatch_simple_bayesian[19][39] = -5.46374; qual_mismatch_simple_bayesian[19][40] = -5.46574; qual_mismatch_simple_bayesian[19][41] = -5.46734; qual_mismatch_simple_bayesian[19][42] = -5.46861; qual_mismatch_simple_bayesian[19][43] = -5.46962; qual_mismatch_simple_bayesian[19][44] = -5.47042; qual_mismatch_simple_bayesian[19][45] = -5.47106; qual_mismatch_simple_bayesian[19][46] = -5.47156; qual_mismatch_simple_bayesian[20][0] = -1.10195; qual_mismatch_simple_bayesian[20][1] = -1.32962; qual_mismatch_simple_bayesian[20][2] = -1.55662; qual_mismatch_simple_bayesian[20][3] = -1.78279; qual_mismatch_simple_bayesian[20][4] = -2.00793; qual_mismatch_simple_bayesian[20][5] = -2.23178; qual_mismatch_simple_bayesian[20][6] = -2.45403; qual_mismatch_simple_bayesian[20][7] = -2.6743; qual_mismatch_simple_bayesian[20][8] = -2.89212; 
qual_mismatch_simple_bayesian[20][9] = -3.10693; qual_mismatch_simple_bayesian[20][10] = -3.31808; qual_mismatch_simple_bayesian[20][11] = -3.52479; qual_mismatch_simple_bayesian[20][12] = -3.72617; qual_mismatch_simple_bayesian[20][13] = -3.92122; qual_mismatch_simple_bayesian[20][14] = -4.10885; qual_mismatch_simple_bayesian[20][15] = -4.2879; qual_mismatch_simple_bayesian[20][16] = -4.45721; qual_mismatch_simple_bayesian[20][17] = -4.61567; qual_mismatch_simple_bayesian[20][18] = -4.76231; qual_mismatch_simple_bayesian[20][19] = -4.89635; qual_mismatch_simple_bayesian[20][20] = -5.01732; qual_mismatch_simple_bayesian[20][21] = -5.12507; qual_mismatch_simple_bayesian[20][22] = -5.21979; qual_mismatch_simple_bayesian[20][23] = -5.30199; qual_mismatch_simple_bayesian[20][24] = -5.37247; qual_mismatch_simple_bayesian[20][25] = -5.43222; qual_mismatch_simple_bayesian[20][26] = -5.48237; qual_mismatch_simple_bayesian[20][27] = -5.52408; qual_mismatch_simple_bayesian[20][28] = -5.55849; qual_mismatch_simple_bayesian[20][29] = -5.5867; qual_mismatch_simple_bayesian[20][30] = -5.60969; qual_mismatch_simple_bayesian[20][31] = -5.62833; qual_mismatch_simple_bayesian[20][32] = -5.64339; qual_mismatch_simple_bayesian[20][33] = -5.65552; qual_mismatch_simple_bayesian[20][34] = -5.66525; qual_mismatch_simple_bayesian[20][35] = -5.67306; qual_mismatch_simple_bayesian[20][36] = -5.6793; qual_mismatch_simple_bayesian[20][37] = -5.68429; qual_mismatch_simple_bayesian[20][38] = -5.68827; qual_mismatch_simple_bayesian[20][39] = -5.69144; qual_mismatch_simple_bayesian[20][40] = -5.69396; qual_mismatch_simple_bayesian[20][41] = -5.69598; qual_mismatch_simple_bayesian[20][42] = -5.69758; qual_mismatch_simple_bayesian[20][43] = -5.69885; qual_mismatch_simple_bayesian[20][44] = -5.69986; qual_mismatch_simple_bayesian[20][45] = -5.70067; qual_mismatch_simple_bayesian[20][46] = -5.70131; qual_mismatch_simple_bayesian[21][0] = -1.10126; qual_mismatch_simple_bayesian[21][1] = -1.32946; 
qual_mismatch_simple_bayesian[21][2] = -1.55713; qual_mismatch_simple_bayesian[21][3] = -1.78414; qual_mismatch_simple_bayesian[21][4] = -2.01033; qual_mismatch_simple_bayesian[21][5] = -2.23548; qual_mismatch_simple_bayesian[21][6] = -2.45935; qual_mismatch_simple_bayesian[21][7] = -2.68162; qual_mismatch_simple_bayesian[21][8] = -2.90191; qual_mismatch_simple_bayesian[21][9] = -3.11977; qual_mismatch_simple_bayesian[21][10] = -3.33462; qual_mismatch_simple_bayesian[21][11] = -3.54582; qual_mismatch_simple_bayesian[21][12] = -3.75259; qual_mismatch_simple_bayesian[21][13] = -3.95404; qual_mismatch_simple_bayesian[21][14] = -4.14917; qual_mismatch_simple_bayesian[21][15] = -4.3369; qual_mismatch_simple_bayesian[21][16] = -4.51606; qual_mismatch_simple_bayesian[21][17] = -4.68549; qual_mismatch_simple_bayesian[21][18] = -4.84409; qual_mismatch_simple_bayesian[21][19] = -4.99087; qual_mismatch_simple_bayesian[21][20] = -5.12507; qual_mismatch_simple_bayesian[21][21] = -5.2462; qual_mismatch_simple_bayesian[21][22] = -5.35411; qual_mismatch_simple_bayesian[21][23] = -5.44898; qual_mismatch_simple_bayesian[21][24] = -5.53133; qual_mismatch_simple_bayesian[21][25] = -5.60194; qual_mismatch_simple_bayesian[21][26] = -5.66182; qual_mismatch_simple_bayesian[21][27] = -5.71208; qual_mismatch_simple_bayesian[21][28] = -5.75388; qual_mismatch_simple_bayesian[21][29] = -5.78837; qual_mismatch_simple_bayesian[21][30] = -5.81665; qual_mismatch_simple_bayesian[21][31] = -5.83969; qual_mismatch_simple_bayesian[21][32] = -5.85838; qual_mismatch_simple_bayesian[21][33] = -5.87348; qual_mismatch_simple_bayesian[21][34] = -5.88564; qual_mismatch_simple_bayesian[21][35] = -5.89541; qual_mismatch_simple_bayesian[21][36] = -5.90323; qual_mismatch_simple_bayesian[21][37] = -5.90949; qual_mismatch_simple_bayesian[21][38] = -5.91449; qual_mismatch_simple_bayesian[21][39] = -5.91848; qual_mismatch_simple_bayesian[21][40] = -5.92166; qual_mismatch_simple_bayesian[21][41] = -5.9242; 
qual_mismatch_simple_bayesian[21][42] = -5.92621; qual_mismatch_simple_bayesian[21][43] = -5.92782; qual_mismatch_simple_bayesian[21][44] = -5.92909; qual_mismatch_simple_bayesian[21][45] = -5.93011; qual_mismatch_simple_bayesian[21][46] = -5.93092; qual_mismatch_simple_bayesian[22][0] = -1.10072; qual_mismatch_simple_bayesian[22][1] = -1.32934; qual_mismatch_simple_bayesian[22][2] = -1.55754; qual_mismatch_simple_bayesian[22][3] = -1.78522; qual_mismatch_simple_bayesian[22][4] = -2.01224; qual_mismatch_simple_bayesian[22][5] = -2.23843; qual_mismatch_simple_bayesian[22][6] = -2.4636; qual_mismatch_simple_bayesian[22][7] = -2.68748; qual_mismatch_simple_bayesian[22][8] = -2.90977; qual_mismatch_simple_bayesian[22][9] = -3.13008; qual_mismatch_simple_bayesian[22][10] = -3.34796; qual_mismatch_simple_bayesian[22][11] = -3.56284; qual_mismatch_simple_bayesian[22][12] = -3.77408; qual_mismatch_simple_bayesian[22][13] = -3.9809; qual_mismatch_simple_bayesian[22][14] = -4.1824; qual_mismatch_simple_bayesian[22][15] = -4.3776; qual_mismatch_simple_bayesian[22][16] = -4.5654; qual_mismatch_simple_bayesian[22][17] = -4.74465; qual_mismatch_simple_bayesian[22][18] = -4.91418; qual_mismatch_simple_bayesian[22][19] = -5.07289; qual_mismatch_simple_bayesian[22][20] = -5.21979; qual_mismatch_simple_bayesian[22][21] = -5.35411; qual_mismatch_simple_bayesian[22][22] = -5.47537; qual_mismatch_simple_bayesian[22][23] = -5.5834; qual_mismatch_simple_bayesian[22][24] = -5.67839; qual_mismatch_simple_bayesian[22][25] = -5.76086; qual_mismatch_simple_bayesian[22][26] = -5.83158; qual_mismatch_simple_bayesian[22][27] = -5.89155; qual_mismatch_simple_bayesian[22][28] = -5.9419; qual_mismatch_simple_bayesian[22][29] = -5.98377; qual_mismatch_simple_bayesian[22][30] = -6.01833; qual_mismatch_simple_bayesian[22][31] = -6.04666; qual_mismatch_simple_bayesian[22][32] = -6.06975; qual_mismatch_simple_bayesian[22][33] = -6.08848; qual_mismatch_simple_bayesian[22][34] = -6.10361; 
qual_mismatch_simple_bayesian[22][35] = -6.1158; qual_mismatch_simple_bayesian[22][36] = -6.12558; qual_mismatch_simple_bayesian[22][37] = -6.13342; qual_mismatch_simple_bayesian[22][38] = -6.1397; qual_mismatch_simple_bayesian[22][39] = -6.14471; qual_mismatch_simple_bayesian[22][40] = -6.14871; qual_mismatch_simple_bayesian[22][41] = -6.15189; qual_mismatch_simple_bayesian[22][42] = -6.15443; qual_mismatch_simple_bayesian[22][43] = -6.15645; qual_mismatch_simple_bayesian[22][44] = -6.15806; qual_mismatch_simple_bayesian[22][45] = -6.15934; qual_mismatch_simple_bayesian[22][46] = -6.16036; qual_mismatch_simple_bayesian[23][0] = -1.10028; qual_mismatch_simple_bayesian[23][1] = -1.32924; qual_mismatch_simple_bayesian[23][2] = -1.55787; qual_mismatch_simple_bayesian[23][3] = -1.78608; qual_mismatch_simple_bayesian[23][4] = -2.01376; qual_mismatch_simple_bayesian[23][5] = -2.24078; qual_mismatch_simple_bayesian[23][6] = -2.46698; qual_mismatch_simple_bayesian[23][7] = -2.69215; qual_mismatch_simple_bayesian[23][8] = -2.91605; qual_mismatch_simple_bayesian[23][9] = -3.13835; qual_mismatch_simple_bayesian[23][10] = -3.35868; qual_mismatch_simple_bayesian[23][11] = -3.57658; qual_mismatch_simple_bayesian[23][12] = -3.79149; qual_mismatch_simple_bayesian[23][13] = -4.00276; qual_mismatch_simple_bayesian[23][14] = -4.20961; qual_mismatch_simple_bayesian[23][15] = -4.41116; qual_mismatch_simple_bayesian[23][16] = -4.60641; qual_mismatch_simple_bayesian[23][17] = -4.79427; qual_mismatch_simple_bayesian[23][18] = -4.97359; qual_mismatch_simple_bayesian[23][19] = -5.1432; qual_mismatch_simple_bayesian[23][20] = -5.30199; qual_mismatch_simple_bayesian[23][21] = -5.44898; qual_mismatch_simple_bayesian[23][22] = -5.5834; qual_mismatch_simple_bayesian[23][23] = -5.70476; qual_mismatch_simple_bayesian[23][24] = -5.81289; qual_mismatch_simple_bayesian[23][25] = -5.90798; qual_mismatch_simple_bayesian[23][26] = -5.99054; qual_mismatch_simple_bayesian[23][27] = -6.06134; 
qual_mismatch_simple_bayesian[23][28] = -6.12139; qual_mismatch_simple_bayesian[23][29] = -6.17181; qual_mismatch_simple_bayesian[23][30] = -6.21374; qual_mismatch_simple_bayesian[23][31] = -6.24836; qual_mismatch_simple_bayesian[23][32] = -6.27673; qual_mismatch_simple_bayesian[23][33] = -6.29986; qual_mismatch_simple_bayesian[23][34] = -6.31861; qual_mismatch_simple_bayesian[23][35] = -6.33377; qual_mismatch_simple_bayesian[23][36] = -6.34597; qual_mismatch_simple_bayesian[23][37] = -6.35578; qual_mismatch_simple_bayesian[23][38] = -6.36363; qual_mismatch_simple_bayesian[23][39] = -6.36991; qual_mismatch_simple_bayesian[23][40] = -6.37493; qual_mismatch_simple_bayesian[23][41] = -6.37894; qual_mismatch_simple_bayesian[23][42] = -6.38213; qual_mismatch_simple_bayesian[23][43] = -6.38467; qual_mismatch_simple_bayesian[23][44] = -6.3867; qual_mismatch_simple_bayesian[23][45] = -6.38831; qual_mismatch_simple_bayesian[23][46] = -6.38959; qual_mismatch_simple_bayesian[24][0] = -1.09994; qual_mismatch_simple_bayesian[24][1] = -1.32917; qual_mismatch_simple_bayesian[24][2] = -1.55813; qual_mismatch_simple_bayesian[24][3] = -1.78676; qual_mismatch_simple_bayesian[24][4] = -2.01497; qual_mismatch_simple_bayesian[24][5] = -2.24265; qual_mismatch_simple_bayesian[24][6] = -2.46968; qual_mismatch_simple_bayesian[24][7] = -2.69588; qual_mismatch_simple_bayesian[24][8] = -2.92106; qual_mismatch_simple_bayesian[24][9] = -3.14496; qual_mismatch_simple_bayesian[24][10] = -3.36728; qual_mismatch_simple_bayesian[24][11] = -3.58762; qual_mismatch_simple_bayesian[24][12] = -3.80553; qual_mismatch_simple_bayesian[24][13] = -4.02047; qual_mismatch_simple_bayesian[24][14] = -4.23176; qual_mismatch_simple_bayesian[24][15] = -4.43864; qual_mismatch_simple_bayesian[24][16] = -4.64022; qual_mismatch_simple_bayesian[24][17] = -4.83552; qual_mismatch_simple_bayesian[24][18] = -5.02342; qual_mismatch_simple_bayesian[24][19] = -5.2028; qual_mismatch_simple_bayesian[24][20] = -5.37247; 
qual_mismatch_simple_bayesian[24][21] = -5.53133; qual_mismatch_simple_bayesian[24][22] = -5.67839; qual_mismatch_simple_bayesian[24][23] = -5.81289; qual_mismatch_simple_bayesian[24][24] = -5.93433; qual_mismatch_simple_bayesian[24][25] = -6.04254; qual_mismatch_simple_bayesian[24][26] = -6.1377; qual_mismatch_simple_bayesian[24][27] = -6.22033; qual_mismatch_simple_bayesian[24][28] = -6.29121; qual_mismatch_simple_bayesian[24][29] = -6.35132; qual_mismatch_simple_bayesian[24][30] = -6.40179; qual_mismatch_simple_bayesian[24][31] = -6.44377; qual_mismatch_simple_bayesian[24][32] = -6.47843; qual_mismatch_simple_bayesian[24][33] = -6.50683; qual_mismatch_simple_bayesian[24][34] = -6.52999; qual_mismatch_simple_bayesian[24][35] = -6.54877; qual_mismatch_simple_bayesian[24][36] = -6.56395; qual_mismatch_simple_bayesian[24][37] = -6.57617; qual_mismatch_simple_bayesian[24][38] = -6.58598; qual_mismatch_simple_bayesian[24][39] = -6.59385; qual_mismatch_simple_bayesian[24][40] = -6.60014; qual_mismatch_simple_bayesian[24][41] = -6.60516; qual_mismatch_simple_bayesian[24][42] = -6.60917; qual_mismatch_simple_bayesian[24][43] = -6.61237; qual_mismatch_simple_bayesian[24][44] = -6.61492; qual_mismatch_simple_bayesian[24][45] = -6.61695; qual_mismatch_simple_bayesian[24][46] = -6.61856; qual_mismatch_simple_bayesian[25][0] = -1.09967; qual_mismatch_simple_bayesian[25][1] = -1.32911; qual_mismatch_simple_bayesian[25][2] = -1.55833; qual_mismatch_simple_bayesian[25][3] = -1.7873; qual_mismatch_simple_bayesian[25][4] = -2.01593; qual_mismatch_simple_bayesian[25][5] = -2.24414; qual_mismatch_simple_bayesian[25][6] = -2.47183; qual_mismatch_simple_bayesian[25][7] = -2.69886; qual_mismatch_simple_bayesian[25][8] = -2.92507; qual_mismatch_simple_bayesian[25][9] = -3.15025; qual_mismatch_simple_bayesian[25][10] = -3.37416; qual_mismatch_simple_bayesian[25][11] = -3.59648; qual_mismatch_simple_bayesian[25][12] = -3.81683; qual_mismatch_simple_bayesian[25][13] = -4.03476; 
qual_mismatch_simple_bayesian[25][14] = -4.24971; qual_mismatch_simple_bayesian[25][15] = -4.46102; qual_mismatch_simple_bayesian[25][16] = -4.66792; qual_mismatch_simple_bayesian[25][17] = -4.86954; qual_mismatch_simple_bayesian[25][18] = -5.06486; qual_mismatch_simple_bayesian[25][19] = -5.25281; qual_mismatch_simple_bayesian[25][20] = -5.43222; qual_mismatch_simple_bayesian[25][21] = -5.60194; qual_mismatch_simple_bayesian[25][22] = -5.76086; qual_mismatch_simple_bayesian[25][23] = -5.90798; qual_mismatch_simple_bayesian[25][24] = -6.04254; qual_mismatch_simple_bayesian[25][25] = -6.16404; qual_mismatch_simple_bayesian[25][26] = -6.27231; qual_mismatch_simple_bayesian[25][27] = -6.36754; qual_mismatch_simple_bayesian[25][28] = -6.45023; qual_mismatch_simple_bayesian[25][29] = -6.52116; qual_mismatch_simple_bayesian[25][30] = -6.58132; qual_mismatch_simple_bayesian[25][31] = -6.63183; qual_mismatch_simple_bayesian[25][32] = -6.67385; qual_mismatch_simple_bayesian[25][33] = -6.70854; qual_mismatch_simple_bayesian[25][34] = -6.73697; qual_mismatch_simple_bayesian[25][35] = -6.76015; qual_mismatch_simple_bayesian[25][36] = -6.77895; qual_mismatch_simple_bayesian[25][37] = -6.79414; qual_mismatch_simple_bayesian[25][38] = -6.80637; qual_mismatch_simple_bayesian[25][39] = -6.8162; qual_mismatch_simple_bayesian[25][40] = -6.82407; qual_mismatch_simple_bayesian[25][41] = -6.83037; qual_mismatch_simple_bayesian[25][42] = -6.8354; qual_mismatch_simple_bayesian[25][43] = -6.83942; qual_mismatch_simple_bayesian[25][44] = -6.84262; qual_mismatch_simple_bayesian[25][45] = -6.84517; qual_mismatch_simple_bayesian[25][46] = -6.8472; qual_mismatch_simple_bayesian[26][0] = -1.09945; qual_mismatch_simple_bayesian[26][1] = -1.32906; qual_mismatch_simple_bayesian[26][2] = -1.5585; qual_mismatch_simple_bayesian[26][3] = -1.78773; qual_mismatch_simple_bayesian[26][4] = -2.01669; qual_mismatch_simple_bayesian[26][5] = -2.24532; qual_mismatch_simple_bayesian[26][6] = -2.47353; 
qual_mismatch_simple_bayesian[26][7] = -2.70122; qual_mismatch_simple_bayesian[26][8] = -2.92826; qual_mismatch_simple_bayesian[26][9] = -3.15447; qual_mismatch_simple_bayesian[26][10] = -3.37966; qual_mismatch_simple_bayesian[26][11] = -3.60357; qual_mismatch_simple_bayesian[26][12] = -3.8259; qual_mismatch_simple_bayesian[26][13] = -4.04626; qual_mismatch_simple_bayesian[26][14] = -4.2642; qual_mismatch_simple_bayesian[26][15] = -4.47916; qual_mismatch_simple_bayesian[26][16] = -4.69049; qual_mismatch_simple_bayesian[26][17] = -4.89741; qual_mismatch_simple_bayesian[26][18] = -5.09904; qual_mismatch_simple_bayesian[26][19] = -5.29439; qual_mismatch_simple_bayesian[26][20] = -5.48237; qual_mismatch_simple_bayesian[26][21] = -5.66182; qual_mismatch_simple_bayesian[26][22] = -5.83158; qual_mismatch_simple_bayesian[26][23] = -5.99054; qual_mismatch_simple_bayesian[26][24] = -6.1377; qual_mismatch_simple_bayesian[26][25] = -6.27231; qual_mismatch_simple_bayesian[26][26] = -6.39386; qual_mismatch_simple_bayesian[26][27] = -6.50219; qual_mismatch_simple_bayesian[26][28] = -6.59746; qual_mismatch_simple_bayesian[26][29] = -6.6802; qual_mismatch_simple_bayesian[26][30] = -6.75117; qual_mismatch_simple_bayesian[26][31] = -6.81137; qual_mismatch_simple_bayesian[26][32] = -6.86191; qual_mismatch_simple_bayesian[26][33] = -6.90396; qual_mismatch_simple_bayesian[26][34] = -6.93867; qual_mismatch_simple_bayesian[26][35] = -6.96713; qual_mismatch_simple_bayesian[26][36] = -6.99033; qual_mismatch_simple_bayesian[26][37] = -7.00914; qual_mismatch_simple_bayesian[26][38] = -7.02435; qual_mismatch_simple_bayesian[26][39] = -7.03659; qual_mismatch_simple_bayesian[26][40] = -7.04642; qual_mismatch_simple_bayesian[26][41] = -7.0543; qual_mismatch_simple_bayesian[26][42] = -7.06061; qual_mismatch_simple_bayesian[26][43] = -7.06564; qual_mismatch_simple_bayesian[26][44] = -7.06966; qual_mismatch_simple_bayesian[26][45] = -7.07286; qual_mismatch_simple_bayesian[26][46] = -7.07542; 
qual_mismatch_simple_bayesian[27][0] = -1.09928; qual_mismatch_simple_bayesian[27][1] = -1.32902; qual_mismatch_simple_bayesian[27][2] = -1.55863; qual_mismatch_simple_bayesian[27][3] = -1.78807; qual_mismatch_simple_bayesian[27][4] = -2.0173; qual_mismatch_simple_bayesian[27][5] = -2.24626; qual_mismatch_simple_bayesian[27][6] = -2.47489; qual_mismatch_simple_bayesian[27][7] = -2.70311; qual_mismatch_simple_bayesian[27][8] = -2.9308; qual_mismatch_simple_bayesian[27][9] = -3.15784; qual_mismatch_simple_bayesian[27][10] = -3.38405; qual_mismatch_simple_bayesian[27][11] = -3.60925; qual_mismatch_simple_bayesian[27][12] = -3.83316; qual_mismatch_simple_bayesian[27][13] = -4.0555; qual_mismatch_simple_bayesian[27][14] = -4.27586; qual_mismatch_simple_bayesian[27][15] = -4.49381; qual_mismatch_simple_bayesian[27][16] = -4.70878; qual_mismatch_simple_bayesian[27][17] = -4.92012; qual_mismatch_simple_bayesian[27][18] = -5.12706; qual_mismatch_simple_bayesian[27][19] = -5.32871; qual_mismatch_simple_bayesian[27][20] = -5.52408; qual_mismatch_simple_bayesian[27][21] = -5.71208; qual_mismatch_simple_bayesian[27][22] = -5.89155; qual_mismatch_simple_bayesian[27][23] = -6.06134; qual_mismatch_simple_bayesian[27][24] = -6.22033; qual_mismatch_simple_bayesian[27][25] = -6.36754; qual_mismatch_simple_bayesian[27][26] = -6.50219; qual_mismatch_simple_bayesian[27][27] = -6.62378; qual_mismatch_simple_bayesian[27][28] = -6.73214; qual_mismatch_simple_bayesian[27][29] = -6.82745; qual_mismatch_simple_bayesian[27][30] = -6.91022; qual_mismatch_simple_bayesian[27][31] = -6.98123; qual_mismatch_simple_bayesian[27][32] = -7.04146; qual_mismatch_simple_bayesian[27][33] = -7.09203; qual_mismatch_simple_bayesian[27][34] = -7.13411; qual_mismatch_simple_bayesian[27][35] = -7.16884; qual_mismatch_simple_bayesian[27][36] = -7.19731; qual_mismatch_simple_bayesian[27][37] = -7.22052; qual_mismatch_simple_bayesian[27][38] = -7.23935; qual_mismatch_simple_bayesian[27][39] = -7.25456; 
qual_mismatch_simple_bayesian[27][40] = -7.26682; qual_mismatch_simple_bayesian[27][41] = -7.27666; qual_mismatch_simple_bayesian[27][42] = -7.28454; qual_mismatch_simple_bayesian[27][43] = -7.29085; qual_mismatch_simple_bayesian[27][44] = -7.29589; qual_mismatch_simple_bayesian[27][45] = -7.29991; qual_mismatch_simple_bayesian[27][46] = -7.30311; qual_mismatch_simple_bayesian[28][0] = -1.09914; qual_mismatch_simple_bayesian[28][1] = -1.32899; qual_mismatch_simple_bayesian[28][2] = -1.55873; qual_mismatch_simple_bayesian[28][3] = -1.78834; qual_mismatch_simple_bayesian[28][4] = -2.01778; qual_mismatch_simple_bayesian[28][5] = -2.24701; qual_mismatch_simple_bayesian[28][6] = -2.47598; qual_mismatch_simple_bayesian[28][7] = -2.70461; qual_mismatch_simple_bayesian[28][8] = -2.93282; qual_mismatch_simple_bayesian[28][9] = -3.16052; qual_mismatch_simple_bayesian[28][10] = -3.38756; qual_mismatch_simple_bayesian[28][11] = -3.61377; qual_mismatch_simple_bayesian[28][12] = -3.83897; qual_mismatch_simple_bayesian[28][13] = -4.06289; qual_mismatch_simple_bayesian[28][14] = -4.28523; qual_mismatch_simple_bayesian[28][15] = -4.5056; qual_mismatch_simple_bayesian[28][16] = -4.72355; qual_mismatch_simple_bayesian[28][17] = -4.93853; qual_mismatch_simple_bayesian[28][18] = -5.14988; qual_mismatch_simple_bayesian[28][19] = -5.35683; qual_mismatch_simple_bayesian[28][20] = -5.55849; qual_mismatch_simple_bayesian[28][21] = -5.75388; qual_mismatch_simple_bayesian[28][22] = -5.9419; qual_mismatch_simple_bayesian[28][23] = -6.12139; qual_mismatch_simple_bayesian[28][24] = -6.29121; qual_mismatch_simple_bayesian[28][25] = -6.45023; qual_mismatch_simple_bayesian[28][26] = -6.59746; qual_mismatch_simple_bayesian[28][27] = -6.73214; qual_mismatch_simple_bayesian[28][28] = -6.85376; qual_mismatch_simple_bayesian[28][29] = -6.96216; qual_mismatch_simple_bayesian[28][30] = -7.0575; qual_mismatch_simple_bayesian[28][31] = -7.1403; qual_mismatch_simple_bayesian[28][32] = -7.21133; 
qual_mismatch_simple_bayesian[28][33] = -7.27159; qual_mismatch_simple_bayesian[28][34] = -7.32218; qual_mismatch_simple_bayesian[28][35] = -7.36428; qual_mismatch_simple_bayesian[28][36] = -7.39902; qual_mismatch_simple_bayesian[28][37] = -7.42751; qual_mismatch_simple_bayesian[28][38] = -7.45073; qual_mismatch_simple_bayesian[28][39] = -7.46957; qual_mismatch_simple_bayesian[28][40] = -7.48479; qual_mismatch_simple_bayesian[28][41] = -7.49705; qual_mismatch_simple_bayesian[28][42] = -7.50689; qual_mismatch_simple_bayesian[28][43] = -7.51478; qual_mismatch_simple_bayesian[28][44] = -7.52109; qual_mismatch_simple_bayesian[28][45] = -7.52614; qual_mismatch_simple_bayesian[28][46] = -7.53016; qual_mismatch_simple_bayesian[29][0] = -1.09903; qual_mismatch_simple_bayesian[29][1] = -1.32896; qual_mismatch_simple_bayesian[29][2] = -1.55881; qual_mismatch_simple_bayesian[29][3] = -1.78855; qual_mismatch_simple_bayesian[29][4] = -2.01816; qual_mismatch_simple_bayesian[29][5] = -2.2476; qual_mismatch_simple_bayesian[29][6] = -2.47684; qual_mismatch_simple_bayesian[29][7] = -2.7058; qual_mismatch_simple_bayesian[29][8] = -2.93444; qual_mismatch_simple_bayesian[29][9] = -3.16265; qual_mismatch_simple_bayesian[29][10] = -3.39035; qual_mismatch_simple_bayesian[29][11] = -3.61738; qual_mismatch_simple_bayesian[29][12] = -3.84361; qual_mismatch_simple_bayesian[29][13] = -4.0688; qual_mismatch_simple_bayesian[29][14] = -4.29273; qual_mismatch_simple_bayesian[29][15] = -4.51507; qual_mismatch_simple_bayesian[29][16] = -4.73544; qual_mismatch_simple_bayesian[29][17] = -4.9534; qual_mismatch_simple_bayesian[29][18] = -5.16839; qual_mismatch_simple_bayesian[29][19] = -5.37974; qual_mismatch_simple_bayesian[29][20] = -5.5867; qual_mismatch_simple_bayesian[29][21] = -5.78837; qual_mismatch_simple_bayesian[29][22] = -5.98377; qual_mismatch_simple_bayesian[29][23] = -6.17181; qual_mismatch_simple_bayesian[29][24] = -6.35132; qual_mismatch_simple_bayesian[29][25] = -6.52116; 
qual_mismatch_simple_bayesian[29][26] = -6.6802; qual_mismatch_simple_bayesian[29][27] = -6.82745; qual_mismatch_simple_bayesian[29][28] = -6.96216; qual_mismatch_simple_bayesian[29][29] = -7.0838; qual_mismatch_simple_bayesian[29][30] = -7.19222; qual_mismatch_simple_bayesian[29][31] = -7.28759; qual_mismatch_simple_bayesian[29][32] = -7.37041; qual_mismatch_simple_bayesian[29][33] = -7.44147; qual_mismatch_simple_bayesian[29][34] = -7.50174; qual_mismatch_simple_bayesian[29][35] = -7.55235; qual_mismatch_simple_bayesian[29][36] = -7.59446; qual_mismatch_simple_bayesian[29][37] = -7.62922; qual_mismatch_simple_bayesian[29][38] = -7.65772; qual_mismatch_simple_bayesian[29][39] = -7.68095; qual_mismatch_simple_bayesian[29][40] = -7.6998; qual_mismatch_simple_bayesian[29][41] = -7.71502; qual_mismatch_simple_bayesian[29][42] = -7.72729; qual_mismatch_simple_bayesian[29][43] = -7.73713; qual_mismatch_simple_bayesian[29][44] = -7.74503; qual_mismatch_simple_bayesian[29][45] = -7.75134; qual_mismatch_simple_bayesian[29][46] = -7.75639; qual_mismatch_simple_bayesian[30][0] = -1.09895; qual_mismatch_simple_bayesian[30][1] = -1.32895; qual_mismatch_simple_bayesian[30][2] = -1.55888; qual_mismatch_simple_bayesian[30][3] = -1.78873; qual_mismatch_simple_bayesian[30][4] = -2.01847; qual_mismatch_simple_bayesian[30][5] = -2.24808; qual_mismatch_simple_bayesian[30][6] = -2.47752; qual_mismatch_simple_bayesian[30][7] = -2.70675; qual_mismatch_simple_bayesian[30][8] = -2.93572; qual_mismatch_simple_bayesian[30][9] = -3.16435; qual_mismatch_simple_bayesian[30][10] = -3.39257; qual_mismatch_simple_bayesian[30][11] = -3.62026; qual_mismatch_simple_bayesian[30][12] = -3.8473; qual_mismatch_simple_bayesian[30][13] = -4.07352; qual_mismatch_simple_bayesian[30][14] = -4.29872; qual_mismatch_simple_bayesian[30][15] = -4.52265; qual_mismatch_simple_bayesian[30][16] = -4.74499; qual_mismatch_simple_bayesian[30][17] = -4.96537; qual_mismatch_simple_bayesian[30][18] = -5.18334; 
qual_mismatch_simple_bayesian[30][19] = -5.39832; qual_mismatch_simple_bayesian[30][20] = -5.60969; qual_mismatch_simple_bayesian[30][21] = -5.81665; qual_mismatch_simple_bayesian[30][22] = -6.01833; qual_mismatch_simple_bayesian[30][23] = -6.21374; qual_mismatch_simple_bayesian[30][24] = -6.40179; qual_mismatch_simple_bayesian[30][25] = -6.58132; qual_mismatch_simple_bayesian[30][26] = -6.75117; qual_mismatch_simple_bayesian[30][27] = -6.91022; qual_mismatch_simple_bayesian[30][28] = -7.0575; qual_mismatch_simple_bayesian[30][29] = -7.19222; qual_mismatch_simple_bayesian[30][30] = -7.31389; qual_mismatch_simple_bayesian[30][31] = -7.42233; qual_mismatch_simple_bayesian[30][32] = -7.51772; qual_mismatch_simple_bayesian[30][33] = -7.60056; qual_mismatch_simple_bayesian[30][34] = -7.67163; qual_mismatch_simple_bayesian[30][35] = -7.73192; qual_mismatch_simple_bayesian[30][36] = -7.78254; qual_mismatch_simple_bayesian[30][37] = -7.82466; qual_mismatch_simple_bayesian[30][38] = -7.85943; qual_mismatch_simple_bayesian[30][39] = -7.88794; qual_mismatch_simple_bayesian[30][40] = -7.91118; qual_mismatch_simple_bayesian[30][41] = -7.93003; qual_mismatch_simple_bayesian[30][42] = -7.94526; qual_mismatch_simple_bayesian[30][43] = -7.95753; qual_mismatch_simple_bayesian[30][44] = -7.96738; qual_mismatch_simple_bayesian[30][45] = -7.97528; qual_mismatch_simple_bayesian[30][46] = -7.98159; qual_mismatch_simple_bayesian[31][0] = -1.09888; qual_mismatch_simple_bayesian[31][1] = -1.32893; qual_mismatch_simple_bayesian[31][2] = -1.55893; qual_mismatch_simple_bayesian[31][3] = -1.78886; qual_mismatch_simple_bayesian[31][4] = -2.01871; qual_mismatch_simple_bayesian[31][5] = -2.24845; qual_mismatch_simple_bayesian[31][6] = -2.47806; qual_mismatch_simple_bayesian[31][7] = -2.7075; qual_mismatch_simple_bayesian[31][8] = -2.93674; qual_mismatch_simple_bayesian[31][9] = -3.1657; qual_mismatch_simple_bayesian[31][10] = -3.39434; qual_mismatch_simple_bayesian[31][11] = -3.62255; 
qual_mismatch_simple_bayesian[31][12] = -3.85025; qual_mismatch_simple_bayesian[31][13] = -4.07729; qual_mismatch_simple_bayesian[31][14] = -4.30351; qual_mismatch_simple_bayesian[31][15] = -4.52872; qual_mismatch_simple_bayesian[31][16] = -4.75264; qual_mismatch_simple_bayesian[31][17] = -4.97499; qual_mismatch_simple_bayesian[31][18] = -5.19537; qual_mismatch_simple_bayesian[31][19] = -5.41334; qual_mismatch_simple_bayesian[31][20] = -5.62833; qual_mismatch_simple_bayesian[31][21] = -5.83969; qual_mismatch_simple_bayesian[31][22] = -6.04666; qual_mismatch_simple_bayesian[31][23] = -6.24836; qual_mismatch_simple_bayesian[31][24] = -6.44377; qual_mismatch_simple_bayesian[31][25] = -6.63183; qual_mismatch_simple_bayesian[31][26] = -6.81137; qual_mismatch_simple_bayesian[31][27] = -6.98123; qual_mismatch_simple_bayesian[31][28] = -7.1403; qual_mismatch_simple_bayesian[31][29] = -7.28759; qual_mismatch_simple_bayesian[31][30] = -7.42233; qual_mismatch_simple_bayesian[31][31] = -7.54401; qual_mismatch_simple_bayesian[31][32] = -7.65246; qual_mismatch_simple_bayesian[31][33] = -7.74787; qual_mismatch_simple_bayesian[31][34] = -7.83072; qual_mismatch_simple_bayesian[31][35] = -7.90181; qual_mismatch_simple_bayesian[31][36] = -7.96211; qual_mismatch_simple_bayesian[31][37] = -8.01274; qual_mismatch_simple_bayesian[31][38] = -8.05488; qual_mismatch_simple_bayesian[31][39] = -8.08965; qual_mismatch_simple_bayesian[31][40] = -8.11817; qual_mismatch_simple_bayesian[31][41] = -8.14141; qual_mismatch_simple_bayesian[31][42] = -8.16027; qual_mismatch_simple_bayesian[31][43] = -8.1755; qual_mismatch_simple_bayesian[31][44] = -8.18777; qual_mismatch_simple_bayesian[31][45] = -8.19763; qual_mismatch_simple_bayesian[31][46] = -8.20553; qual_mismatch_simple_bayesian[32][0] = -1.09882; qual_mismatch_simple_bayesian[32][1] = -1.32892; qual_mismatch_simple_bayesian[32][2] = -1.55897; qual_mismatch_simple_bayesian[32][3] = -1.78897; qual_mismatch_simple_bayesian[32][4] = -2.0189; 
qual_mismatch_simple_bayesian[32][5] = -2.24875; qual_mismatch_simple_bayesian[32][6] = -2.47849; qual_mismatch_simple_bayesian[32][7] = -2.7081; qual_mismatch_simple_bayesian[32][8] = -2.93755; qual_mismatch_simple_bayesian[32][9] = -3.16678; qual_mismatch_simple_bayesian[32][10] = -3.39574; qual_mismatch_simple_bayesian[32][11] = -3.62438; qual_mismatch_simple_bayesian[32][12] = -3.8526; qual_mismatch_simple_bayesian[32][13] = -4.08029; qual_mismatch_simple_bayesian[32][14] = -4.30734; qual_mismatch_simple_bayesian[32][15] = -4.53356; qual_mismatch_simple_bayesian[32][16] = -4.75876; qual_mismatch_simple_bayesian[32][17] = -4.98269; qual_mismatch_simple_bayesian[32][18] = -5.20504; qual_mismatch_simple_bayesian[32][19] = -5.42542; qual_mismatch_simple_bayesian[32][20] = -5.64339; qual_mismatch_simple_bayesian[32][21] = -5.85838; qual_mismatch_simple_bayesian[32][22] = -6.06975; qual_mismatch_simple_bayesian[32][23] = -6.27673; qual_mismatch_simple_bayesian[32][24] = -6.47843; qual_mismatch_simple_bayesian[32][25] = -6.67385; qual_mismatch_simple_bayesian[32][26] = -6.86191; qual_mismatch_simple_bayesian[32][27] = -7.04146; qual_mismatch_simple_bayesian[32][28] = -7.21133; qual_mismatch_simple_bayesian[32][29] = -7.37041; qual_mismatch_simple_bayesian[32][30] = -7.51772; qual_mismatch_simple_bayesian[32][31] = -7.65246; qual_mismatch_simple_bayesian[32][32] = -7.77416; qual_mismatch_simple_bayesian[32][33] = -7.88263; qual_mismatch_simple_bayesian[32][34] = -7.97804; qual_mismatch_simple_bayesian[32][35] = -8.06091; qual_mismatch_simple_bayesian[32][36] = -8.132; qual_mismatch_simple_bayesian[32][37] = -8.19232; qual_mismatch_simple_bayesian[32][38] = -8.24296; qual_mismatch_simple_bayesian[32][39] = -8.2851; qual_mismatch_simple_bayesian[32][40] = -8.31988; qual_mismatch_simple_bayesian[32][41] = -8.3484; qual_mismatch_simple_bayesian[32][42] = -8.37165; qual_mismatch_simple_bayesian[32][43] = -8.39051; qual_mismatch_simple_bayesian[32][44] = -8.40575; 
qual_mismatch_simple_bayesian[32][45] = -8.41802; qual_mismatch_simple_bayesian[32][46] = -8.42788; qual_mismatch_simple_bayesian[33][0] = -1.09878; qual_mismatch_simple_bayesian[33][1] = -1.32891; qual_mismatch_simple_bayesian[33][2] = -1.559; qual_mismatch_simple_bayesian[33][3] = -1.78906; qual_mismatch_simple_bayesian[33][4] = -2.01906; qual_mismatch_simple_bayesian[33][5] = -2.24899; qual_mismatch_simple_bayesian[33][6] = -2.47884; qual_mismatch_simple_bayesian[33][7] = -2.70858; qual_mismatch_simple_bayesian[33][8] = -2.93819; qual_mismatch_simple_bayesian[33][9] = -3.16763; qual_mismatch_simple_bayesian[33][10] = -3.39686; qual_mismatch_simple_bayesian[33][11] = -3.62583; qual_mismatch_simple_bayesian[33][12] = -3.85447; qual_mismatch_simple_bayesian[33][13] = -4.08268; qual_mismatch_simple_bayesian[33][14] = -4.31038; qual_mismatch_simple_bayesian[33][15] = -4.53742; qual_mismatch_simple_bayesian[33][16] = -4.76365; qual_mismatch_simple_bayesian[33][17] = -4.98885; qual_mismatch_simple_bayesian[33][18] = -5.21278; qual_mismatch_simple_bayesian[33][19] = -5.43513; qual_mismatch_simple_bayesian[33][20] = -5.65552; qual_mismatch_simple_bayesian[33][21] = -5.87348; qual_mismatch_simple_bayesian[33][22] = -6.08848; qual_mismatch_simple_bayesian[33][23] = -6.29986; qual_mismatch_simple_bayesian[33][24] = -6.50683; qual_mismatch_simple_bayesian[33][25] = -6.70854; qual_mismatch_simple_bayesian[33][26] = -6.90396; qual_mismatch_simple_bayesian[33][27] = -7.09203; qual_mismatch_simple_bayesian[33][28] = -7.27159; qual_mismatch_simple_bayesian[33][29] = -7.44147; qual_mismatch_simple_bayesian[33][30] = -7.60056; qual_mismatch_simple_bayesian[33][31] = -7.74787; qual_mismatch_simple_bayesian[33][32] = -7.88263; qual_mismatch_simple_bayesian[33][33] = -8.00433; qual_mismatch_simple_bayesian[33][34] = -8.11281; qual_mismatch_simple_bayesian[33][35] = -8.20823; qual_mismatch_simple_bayesian[33][36] = -8.29111; qual_mismatch_simple_bayesian[33][37] = -8.36221; 
qual_mismatch_simple_bayesian[33][38] = -8.42253; qual_mismatch_simple_bayesian[33][39] = -8.47318; qual_mismatch_simple_bayesian[33][40] = -8.51533; qual_mismatch_simple_bayesian[33][41] = -8.55012; qual_mismatch_simple_bayesian[33][42] = -8.57864; qual_mismatch_simple_bayesian[33][43] = -8.60189; qual_mismatch_simple_bayesian[33][44] = -8.62076; qual_mismatch_simple_bayesian[33][45] = -8.636; qual_mismatch_simple_bayesian[33][46] = -8.64827; qual_mismatch_simple_bayesian[34][0] = -1.09874; qual_mismatch_simple_bayesian[34][1] = -1.3289; qual_mismatch_simple_bayesian[34][2] = -1.55903; qual_mismatch_simple_bayesian[34][3] = -1.78912; qual_mismatch_simple_bayesian[34][4] = -2.01918; qual_mismatch_simple_bayesian[34][5] = -2.24918; qual_mismatch_simple_bayesian[34][6] = -2.47911; qual_mismatch_simple_bayesian[34][7] = -2.70896; qual_mismatch_simple_bayesian[34][8] = -2.9387; qual_mismatch_simple_bayesian[34][9] = -3.16831; qual_mismatch_simple_bayesian[34][10] = -3.39775; qual_mismatch_simple_bayesian[34][11] = -3.62698; qual_mismatch_simple_bayesian[34][12] = -3.85595; qual_mismatch_simple_bayesian[34][13] = -4.08459; qual_mismatch_simple_bayesian[34][14] = -4.31281; qual_mismatch_simple_bayesian[34][15] = -4.5405; qual_mismatch_simple_bayesian[34][16] = -4.76755; qual_mismatch_simple_bayesian[34][17] = -4.99377; qual_mismatch_simple_bayesian[34][18] = -5.21897; qual_mismatch_simple_bayesian[34][19] = -5.44291; qual_mismatch_simple_bayesian[34][20] = -5.66525; qual_mismatch_simple_bayesian[34][21] = -5.88564; qual_mismatch_simple_bayesian[34][22] = -6.10361; qual_mismatch_simple_bayesian[34][23] = -6.31861; qual_mismatch_simple_bayesian[34][24] = -6.52999; qual_mismatch_simple_bayesian[34][25] = -6.73697; qual_mismatch_simple_bayesian[34][26] = -6.93867; qual_mismatch_simple_bayesian[34][27] = -7.13411; qual_mismatch_simple_bayesian[34][28] = -7.32218; qual_mismatch_simple_bayesian[34][29] = -7.50174; qual_mismatch_simple_bayesian[34][30] = -7.67163; 
qual_mismatch_simple_bayesian[34][31] = -7.83072; qual_mismatch_simple_bayesian[34][32] = -7.97804; qual_mismatch_simple_bayesian[34][33] = -8.11281; qual_mismatch_simple_bayesian[34][34] = -8.23452; qual_mismatch_simple_bayesian[34][35] = -8.34301; qual_mismatch_simple_bayesian[34][36] = -8.43844; qual_mismatch_simple_bayesian[34][37] = -8.52132; qual_mismatch_simple_bayesian[34][38] = -8.59243; qual_mismatch_simple_bayesian[34][39] = -8.65276; qual_mismatch_simple_bayesian[34][40] = -8.70341; qual_mismatch_simple_bayesian[34][41] = -8.74556; qual_mismatch_simple_bayesian[34][42] = -8.78036; qual_mismatch_simple_bayesian[34][43] = -8.80888; qual_mismatch_simple_bayesian[34][44] = -8.83214; qual_mismatch_simple_bayesian[34][45] = -8.851; qual_mismatch_simple_bayesian[34][46] = -8.86625; qual_mismatch_simple_bayesian[35][0] = -1.09872; qual_mismatch_simple_bayesian[35][1] = -1.32889; qual_mismatch_simple_bayesian[35][2] = -1.55905; qual_mismatch_simple_bayesian[35][3] = -1.78918; qual_mismatch_simple_bayesian[35][4] = -2.01927; qual_mismatch_simple_bayesian[35][5] = -2.24933; qual_mismatch_simple_bayesian[35][6] = -2.47933; qual_mismatch_simple_bayesian[35][7] = -2.70926; qual_mismatch_simple_bayesian[35][8] = -2.93911; qual_mismatch_simple_bayesian[35][9] = -3.16885; qual_mismatch_simple_bayesian[35][10] = -3.39846; qual_mismatch_simple_bayesian[35][11] = -3.6279; qual_mismatch_simple_bayesian[35][12] = -3.85713; qual_mismatch_simple_bayesian[35][13] = -4.0861; qual_mismatch_simple_bayesian[35][14] = -4.31474; qual_mismatch_simple_bayesian[35][15] = -4.54296; qual_mismatch_simple_bayesian[35][16] = -4.77065; qual_mismatch_simple_bayesian[35][17] = -4.9977; qual_mismatch_simple_bayesian[35][18] = -5.22392; qual_mismatch_simple_bayesian[35][19] = -5.44913; qual_mismatch_simple_bayesian[35][20] = -5.67306; qual_mismatch_simple_bayesian[35][21] = -5.89541; qual_mismatch_simple_bayesian[35][22] = -6.1158; qual_mismatch_simple_bayesian[35][23] = -6.33377; 
qual_mismatch_simple_bayesian[35][24] = -6.54877; qual_mismatch_simple_bayesian[35][25] = -6.76015; qual_mismatch_simple_bayesian[35][26] = -6.96713; qual_mismatch_simple_bayesian[35][27] = -7.16884; qual_mismatch_simple_bayesian[35][28] = -7.36428; qual_mismatch_simple_bayesian[35][29] = -7.55235; qual_mismatch_simple_bayesian[35][30] = -7.73192; qual_mismatch_simple_bayesian[35][31] = -7.90181; qual_mismatch_simple_bayesian[35][32] = -8.06091; qual_mismatch_simple_bayesian[35][33] = -8.20823; qual_mismatch_simple_bayesian[35][34] = -8.34301; qual_mismatch_simple_bayesian[35][35] = -8.46472; qual_mismatch_simple_bayesian[35][36] = -8.57322; qual_mismatch_simple_bayesian[35][37] = -8.66866; qual_mismatch_simple_bayesian[35][38] = -8.75154; qual_mismatch_simple_bayesian[35][39] = -8.82266; qual_mismatch_simple_bayesian[35][40] = -8.88299; qual_mismatch_simple_bayesian[35][41] = -8.93365; qual_mismatch_simple_bayesian[35][42] = -8.9758; qual_mismatch_simple_bayesian[35][43] = -9.0106; qual_mismatch_simple_bayesian[35][44] = -9.03913; qual_mismatch_simple_bayesian[35][45] = -9.06239; qual_mismatch_simple_bayesian[35][46] = -9.08126; qual_mismatch_simple_bayesian[36][0] = -1.0987; qual_mismatch_simple_bayesian[36][1] = -1.32889; qual_mismatch_simple_bayesian[36][2] = -1.55907; qual_mismatch_simple_bayesian[36][3] = -1.78922; qual_mismatch_simple_bayesian[36][4] = -2.01935; qual_mismatch_simple_bayesian[36][5] = -2.24945; qual_mismatch_simple_bayesian[36][6] = -2.4795; qual_mismatch_simple_bayesian[36][7] = -2.7095; qual_mismatch_simple_bayesian[36][8] = -2.93943; qual_mismatch_simple_bayesian[36][9] = -3.16928; qual_mismatch_simple_bayesian[36][10] = -3.39902; qual_mismatch_simple_bayesian[36][11] = -3.62863; qual_mismatch_simple_bayesian[36][12] = -3.85807; qual_mismatch_simple_bayesian[36][13] = -4.08731; qual_mismatch_simple_bayesian[36][14] = -4.31627; qual_mismatch_simple_bayesian[36][15] = -4.54491; qual_mismatch_simple_bayesian[36][16] = -4.77313; 
qual_mismatch_simple_bayesian[36][17] = -5.00083; qual_mismatch_simple_bayesian[36][18] = -5.22787; qual_mismatch_simple_bayesian[36][19] = -5.4541; qual_mismatch_simple_bayesian[36][20] = -5.6793; qual_mismatch_simple_bayesian[36][21] = -5.90323; qual_mismatch_simple_bayesian[36][22] = -6.12558; qual_mismatch_simple_bayesian[36][23] = -6.34597; qual_mismatch_simple_bayesian[36][24] = -6.56395; qual_mismatch_simple_bayesian[36][25] = -6.77895; qual_mismatch_simple_bayesian[36][26] = -6.99033; qual_mismatch_simple_bayesian[36][27] = -7.19731; qual_mismatch_simple_bayesian[36][28] = -7.39902; qual_mismatch_simple_bayesian[36][29] = -7.59446; qual_mismatch_simple_bayesian[36][30] = -7.78254; qual_mismatch_simple_bayesian[36][31] = -7.96211; qual_mismatch_simple_bayesian[36][32] = -8.132; qual_mismatch_simple_bayesian[36][33] = -8.29111; qual_mismatch_simple_bayesian[36][34] = -8.43844; qual_mismatch_simple_bayesian[36][35] = -8.57322; qual_mismatch_simple_bayesian[36][36] = -8.69494; qual_mismatch_simple_bayesian[36][37] = -8.80344; qual_mismatch_simple_bayesian[36][38] = -8.89888; qual_mismatch_simple_bayesian[36][39] = -8.98177; qual_mismatch_simple_bayesian[36][40] = -9.05289; qual_mismatch_simple_bayesian[36][41] = -9.11323; qual_mismatch_simple_bayesian[36][42] = -9.16389; qual_mismatch_simple_bayesian[36][43] = -9.20605; qual_mismatch_simple_bayesian[36][44] = -9.24085; qual_mismatch_simple_bayesian[36][45] = -9.26938; qual_mismatch_simple_bayesian[36][46] = -9.29264; qual_mismatch_simple_bayesian[37][0] = -1.09868; qual_mismatch_simple_bayesian[37][1] = -1.32889; qual_mismatch_simple_bayesian[37][2] = -1.55908; qual_mismatch_simple_bayesian[37][3] = -1.78926; qual_mismatch_simple_bayesian[37][4] = -2.01941; qual_mismatch_simple_bayesian[37][5] = -2.24954; qual_mismatch_simple_bayesian[37][6] = -2.47964; qual_mismatch_simple_bayesian[37][7] = -2.70969; qual_mismatch_simple_bayesian[37][8] = -2.93969; qual_mismatch_simple_bayesian[37][9] = -3.16962; 
qual_mismatch_simple_bayesian[37][10] = -3.39947; qual_mismatch_simple_bayesian[37][11] = -3.62921; qual_mismatch_simple_bayesian[37][12] = -3.85882; qual_mismatch_simple_bayesian[37][13] = -4.08826; qual_mismatch_simple_bayesian[37][14] = -4.3175; qual_mismatch_simple_bayesian[37][15] = -4.54646; qual_mismatch_simple_bayesian[37][16] = -4.7751; qual_mismatch_simple_bayesian[37][17] = -5.00332; qual_mismatch_simple_bayesian[37][18] = -5.23102; qual_mismatch_simple_bayesian[37][19] = -5.45806; qual_mismatch_simple_bayesian[37][20] = -5.68429; qual_mismatch_simple_bayesian[37][21] = -5.90949; qual_mismatch_simple_bayesian[37][22] = -6.13342; qual_mismatch_simple_bayesian[37][23] = -6.35578; qual_mismatch_simple_bayesian[37][24] = -6.57617; qual_mismatch_simple_bayesian[37][25] = -6.79414; qual_mismatch_simple_bayesian[37][26] = -7.00914; qual_mismatch_simple_bayesian[37][27] = -7.22052; qual_mismatch_simple_bayesian[37][28] = -7.42751; qual_mismatch_simple_bayesian[37][29] = -7.62922; qual_mismatch_simple_bayesian[37][30] = -7.82466; qual_mismatch_simple_bayesian[37][31] = -8.01274; qual_mismatch_simple_bayesian[37][32] = -8.19232; qual_mismatch_simple_bayesian[37][33] = -8.36221; qual_mismatch_simple_bayesian[37][34] = -8.52132; qual_mismatch_simple_bayesian[37][35] = -8.66866; qual_mismatch_simple_bayesian[37][36] = -8.80344; qual_mismatch_simple_bayesian[37][37] = -8.92516; qual_mismatch_simple_bayesian[37][38] = -9.03366; qual_mismatch_simple_bayesian[37][39] = -9.12911; qual_mismatch_simple_bayesian[37][40] = -9.21201; qual_mismatch_simple_bayesian[37][41] = -9.28313; qual_mismatch_simple_bayesian[37][42] = -9.34347; qual_mismatch_simple_bayesian[37][43] = -9.39414; qual_mismatch_simple_bayesian[37][44] = -9.43629; qual_mismatch_simple_bayesian[37][45] = -9.4711; qual_mismatch_simple_bayesian[37][46] = -9.49963; qual_mismatch_simple_bayesian[38][0] = -1.09867; qual_mismatch_simple_bayesian[38][1] = -1.32888; qual_mismatch_simple_bayesian[38][2] = -1.55909; 
qual_mismatch_simple_bayesian[38][3] = -1.78928; qual_mismatch_simple_bayesian[38][4] = -2.01946; qual_mismatch_simple_bayesian[38][5] = -2.24962; qual_mismatch_simple_bayesian[38][6] = -2.47974; qual_mismatch_simple_bayesian[38][7] = -2.70984; qual_mismatch_simple_bayesian[38][8] = -2.93989; qual_mismatch_simple_bayesian[38][9] = -3.16989; qual_mismatch_simple_bayesian[38][10] = -3.39982; qual_mismatch_simple_bayesian[38][11] = -3.62967; qual_mismatch_simple_bayesian[38][12] = -3.85942; qual_mismatch_simple_bayesian[38][13] = -4.08903; qual_mismatch_simple_bayesian[38][14] = -4.31847; qual_mismatch_simple_bayesian[38][15] = -4.5477; qual_mismatch_simple_bayesian[38][16] = -4.77667; qual_mismatch_simple_bayesian[38][17] = -5.0053; qual_mismatch_simple_bayesian[38][18] = -5.23352; qual_mismatch_simple_bayesian[38][19] = -5.46122; qual_mismatch_simple_bayesian[38][20] = -5.68827; qual_mismatch_simple_bayesian[38][21] = -5.91449; qual_mismatch_simple_bayesian[38][22] = -6.1397; qual_mismatch_simple_bayesian[38][23] = -6.36363; qual_mismatch_simple_bayesian[38][24] = -6.58598; qual_mismatch_simple_bayesian[38][25] = -6.80637; qual_mismatch_simple_bayesian[38][26] = -7.02435; qual_mismatch_simple_bayesian[38][27] = -7.23935; qual_mismatch_simple_bayesian[38][28] = -7.45073; qual_mismatch_simple_bayesian[38][29] = -7.65772; qual_mismatch_simple_bayesian[38][30] = -7.85943; qual_mismatch_simple_bayesian[38][31] = -8.05488; qual_mismatch_simple_bayesian[38][32] = -8.24296; qual_mismatch_simple_bayesian[38][33] = -8.42253; qual_mismatch_simple_bayesian[38][34] = -8.59243; qual_mismatch_simple_bayesian[38][35] = -8.75154; qual_mismatch_simple_bayesian[38][36] = -8.89888; qual_mismatch_simple_bayesian[38][37] = -9.03366; qual_mismatch_simple_bayesian[38][38] = -9.15539; qual_mismatch_simple_bayesian[38][39] = -9.2639; qual_mismatch_simple_bayesian[38][40] = -9.35935; qual_mismatch_simple_bayesian[38][41] = -9.44225; qual_mismatch_simple_bayesian[38][42] = -9.51338; 
qual_mismatch_simple_bayesian[38][43] = -9.57372; qual_mismatch_simple_bayesian[38][44] = -9.62438; qual_mismatch_simple_bayesian[38][45] = -9.66654; qual_mismatch_simple_bayesian[38][46] = -9.70135; qual_mismatch_simple_bayesian[39][0] = -1.09865; qual_mismatch_simple_bayesian[39][1] = -1.32888; qual_mismatch_simple_bayesian[39][2] = -1.5591; qual_mismatch_simple_bayesian[39][3] = -1.7893; qual_mismatch_simple_bayesian[39][4] = -2.0195; qual_mismatch_simple_bayesian[39][5] = -2.24967; qual_mismatch_simple_bayesian[39][6] = -2.47983; qual_mismatch_simple_bayesian[39][7] = -2.70996; qual_mismatch_simple_bayesian[39][8] = -2.94005; qual_mismatch_simple_bayesian[39][9] = -3.17011; qual_mismatch_simple_bayesian[39][10] = -3.40011; qual_mismatch_simple_bayesian[39][11] = -3.63004; qual_mismatch_simple_bayesian[39][12] = -3.85989; qual_mismatch_simple_bayesian[39][13] = -4.08963; qual_mismatch_simple_bayesian[39][14] = -4.31924; qual_mismatch_simple_bayesian[39][15] = -4.54868; qual_mismatch_simple_bayesian[39][16] = -4.77792; qual_mismatch_simple_bayesian[39][17] = -5.00688; qual_mismatch_simple_bayesian[39][18] = -5.23552; qual_mismatch_simple_bayesian[39][19] = -5.46374; qual_mismatch_simple_bayesian[39][20] = -5.69144; qual_mismatch_simple_bayesian[39][21] = -5.91848; qual_mismatch_simple_bayesian[39][22] = -6.14471; qual_mismatch_simple_bayesian[39][23] = -6.36991; qual_mismatch_simple_bayesian[39][24] = -6.59385; qual_mismatch_simple_bayesian[39][25] = -6.8162; qual_mismatch_simple_bayesian[39][26] = -7.03659; qual_mismatch_simple_bayesian[39][27] = -7.25456; qual_mismatch_simple_bayesian[39][28] = -7.46957; qual_mismatch_simple_bayesian[39][29] = -7.68095; qual_mismatch_simple_bayesian[39][30] = -7.88794; qual_mismatch_simple_bayesian[39][31] = -8.08965; qual_mismatch_simple_bayesian[39][32] = -8.2851; qual_mismatch_simple_bayesian[39][33] = -8.47318; qual_mismatch_simple_bayesian[39][34] = -8.65276; qual_mismatch_simple_bayesian[39][35] = -8.82266; 
qual_mismatch_simple_bayesian[39][36] = -8.98177; qual_mismatch_simple_bayesian[39][37] = -9.12911; qual_mismatch_simple_bayesian[39][38] = -9.2639; qual_mismatch_simple_bayesian[39][39] = -9.38563; qual_mismatch_simple_bayesian[39][40] = -9.49414; qual_mismatch_simple_bayesian[39][41] = -9.58959; qual_mismatch_simple_bayesian[39][42] = -9.67249; qual_mismatch_simple_bayesian[39][43] = -9.74362; qual_mismatch_simple_bayesian[39][44] = -9.80396; qual_mismatch_simple_bayesian[39][45] = -9.85463; qual_mismatch_simple_bayesian[39][46] = -9.8968; qual_mismatch_simple_bayesian[40][0] = -1.09865; qual_mismatch_simple_bayesian[40][1] = -1.32888; qual_mismatch_simple_bayesian[40][2] = -1.5591; qual_mismatch_simple_bayesian[40][3] = -1.78932; qual_mismatch_simple_bayesian[40][4] = -2.01953; qual_mismatch_simple_bayesian[40][5] = -2.24972; qual_mismatch_simple_bayesian[40][6] = -2.4799; qual_mismatch_simple_bayesian[40][7] = -2.71005; qual_mismatch_simple_bayesian[40][8] = -2.94018; qual_mismatch_simple_bayesian[40][9] = -3.17028; qual_mismatch_simple_bayesian[40][10] = -3.40033; qual_mismatch_simple_bayesian[40][11] = -3.63033; qual_mismatch_simple_bayesian[40][12] = -3.86026; qual_mismatch_simple_bayesian[40][13] = -4.09011; qual_mismatch_simple_bayesian[40][14] = -4.31986; qual_mismatch_simple_bayesian[40][15] = -4.54947; qual_mismatch_simple_bayesian[40][16] = -4.77891; qual_mismatch_simple_bayesian[40][17] = -5.00814; qual_mismatch_simple_bayesian[40][18] = -5.23711; qual_mismatch_simple_bayesian[40][19] = -5.46574; qual_mismatch_simple_bayesian[40][20] = -5.69396; qual_mismatch_simple_bayesian[40][21] = -5.92166; qual_mismatch_simple_bayesian[40][22] = -6.14871; qual_mismatch_simple_bayesian[40][23] = -6.37493; qual_mismatch_simple_bayesian[40][24] = -6.60014; qual_mismatch_simple_bayesian[40][25] = -6.82407; qual_mismatch_simple_bayesian[40][26] = -7.04642; qual_mismatch_simple_bayesian[40][27] = -7.26682; qual_mismatch_simple_bayesian[40][28] = -7.48479; 
qual_mismatch_simple_bayesian[40][29] = -7.6998; qual_mismatch_simple_bayesian[40][30] = -7.91118; qual_mismatch_simple_bayesian[40][31] = -8.11817; qual_mismatch_simple_bayesian[40][32] = -8.31988; qual_mismatch_simple_bayesian[40][33] = -8.51533; qual_mismatch_simple_bayesian[40][34] = -8.70341; qual_mismatch_simple_bayesian[40][35] = -8.88299; qual_mismatch_simple_bayesian[40][36] = -9.05289; qual_mismatch_simple_bayesian[40][37] = -9.21201; qual_mismatch_simple_bayesian[40][38] = -9.35935; qual_mismatch_simple_bayesian[40][39] = -9.49414; qual_mismatch_simple_bayesian[40][40] = -9.61587; qual_mismatch_simple_bayesian[40][41] = -9.72438; qual_mismatch_simple_bayesian[40][42] = -9.81984; qual_mismatch_simple_bayesian[40][43] = -9.90274; qual_mismatch_simple_bayesian[40][44] = -9.97387; qual_mismatch_simple_bayesian[40][45] = -10.0342; qual_mismatch_simple_bayesian[40][46] = -10.0849; qual_mismatch_simple_bayesian[41][0] = -1.09864; qual_mismatch_simple_bayesian[41][1] = -1.32888; qual_mismatch_simple_bayesian[41][2] = -1.55911; qual_mismatch_simple_bayesian[41][3] = -1.78934; qual_mismatch_simple_bayesian[41][4] = -2.01955; qual_mismatch_simple_bayesian[41][5] = -2.24976; qual_mismatch_simple_bayesian[41][6] = -2.47995; qual_mismatch_simple_bayesian[41][7] = -2.71013; qual_mismatch_simple_bayesian[41][8] = -2.94029; qual_mismatch_simple_bayesian[41][9] = -3.17041; qual_mismatch_simple_bayesian[41][10] = -3.40051; qual_mismatch_simple_bayesian[41][11] = -3.63056; qual_mismatch_simple_bayesian[41][12] = -3.86056; qual_mismatch_simple_bayesian[41][13] = -4.0905; qual_mismatch_simple_bayesian[41][14] = -4.32034; qual_mismatch_simple_bayesian[41][15] = -4.55009; qual_mismatch_simple_bayesian[41][16] = -4.7797; qual_mismatch_simple_bayesian[41][17] = -5.00914; qual_mismatch_simple_bayesian[41][18] = -5.23837; qual_mismatch_simple_bayesian[41][19] = -5.46734; qual_mismatch_simple_bayesian[41][20] = -5.69598; qual_mismatch_simple_bayesian[41][21] = -5.9242; 
qual_mismatch_simple_bayesian[41][22] = -6.15189; qual_mismatch_simple_bayesian[41][23] = -6.37894; qual_mismatch_simple_bayesian[41][24] = -6.60516; qual_mismatch_simple_bayesian[41][25] = -6.83037; qual_mismatch_simple_bayesian[41][26] = -7.0543; qual_mismatch_simple_bayesian[41][27] = -7.27666; qual_mismatch_simple_bayesian[41][28] = -7.49705; qual_mismatch_simple_bayesian[41][29] = -7.71502; qual_mismatch_simple_bayesian[41][30] = -7.93003; qual_mismatch_simple_bayesian[41][31] = -8.14141; qual_mismatch_simple_bayesian[41][32] = -8.3484; qual_mismatch_simple_bayesian[41][33] = -8.55012; qual_mismatch_simple_bayesian[41][34] = -8.74556; qual_mismatch_simple_bayesian[41][35] = -8.93365; qual_mismatch_simple_bayesian[41][36] = -9.11323; qual_mismatch_simple_bayesian[41][37] = -9.28313; qual_mismatch_simple_bayesian[41][38] = -9.44225; qual_mismatch_simple_bayesian[41][39] = -9.58959; qual_mismatch_simple_bayesian[41][40] = -9.72438; qual_mismatch_simple_bayesian[41][41] = -9.84612; qual_mismatch_simple_bayesian[41][42] = -9.95463; qual_mismatch_simple_bayesian[41][43] = -10.0501; qual_mismatch_simple_bayesian[41][44] = -10.133; qual_mismatch_simple_bayesian[41][45] = -10.2041; qual_mismatch_simple_bayesian[41][46] = -10.2645; qual_mismatch_simple_bayesian[42][0] = -1.09863; qual_mismatch_simple_bayesian[42][1] = -1.32888; qual_mismatch_simple_bayesian[42][2] = -1.55911; qual_mismatch_simple_bayesian[42][3] = -1.78935; qual_mismatch_simple_bayesian[42][4] = -2.01957; qual_mismatch_simple_bayesian[42][5] = -2.24979; qual_mismatch_simple_bayesian[42][6] = -2.48; qual_mismatch_simple_bayesian[42][7] = -2.71019; qual_mismatch_simple_bayesian[42][8] = -2.94037; qual_mismatch_simple_bayesian[42][9] = -3.17052; qual_mismatch_simple_bayesian[42][10] = -3.40065; qual_mismatch_simple_bayesian[42][11] = -3.63075; qual_mismatch_simple_bayesian[42][12] = -3.8608; qual_mismatch_simple_bayesian[42][13] = -4.0908; qual_mismatch_simple_bayesian[42][14] = -4.32073; 
qual_mismatch_simple_bayesian[42][15] = -4.55058; qual_mismatch_simple_bayesian[42][16] = -4.78032; qual_mismatch_simple_bayesian[42][17] = -5.00993; qual_mismatch_simple_bayesian[42][18] = -5.23938; qual_mismatch_simple_bayesian[42][19] = -5.46861; qual_mismatch_simple_bayesian[42][20] = -5.69758; qual_mismatch_simple_bayesian[42][21] = -5.92621; qual_mismatch_simple_bayesian[42][22] = -6.15443; qual_mismatch_simple_bayesian[42][23] = -6.38213; qual_mismatch_simple_bayesian[42][24] = -6.60917; qual_mismatch_simple_bayesian[42][25] = -6.8354; qual_mismatch_simple_bayesian[42][26] = -7.06061; qual_mismatch_simple_bayesian[42][27] = -7.28454; qual_mismatch_simple_bayesian[42][28] = -7.50689; qual_mismatch_simple_bayesian[42][29] = -7.72729; qual_mismatch_simple_bayesian[42][30] = -7.94526; qual_mismatch_simple_bayesian[42][31] = -8.16027; qual_mismatch_simple_bayesian[42][32] = -8.37165; qual_mismatch_simple_bayesian[42][33] = -8.57864; qual_mismatch_simple_bayesian[42][34] = -8.78036; qual_mismatch_simple_bayesian[42][35] = -8.9758; qual_mismatch_simple_bayesian[42][36] = -9.16389; qual_mismatch_simple_bayesian[42][37] = -9.34347; qual_mismatch_simple_bayesian[42][38] = -9.51338; qual_mismatch_simple_bayesian[42][39] = -9.67249; qual_mismatch_simple_bayesian[42][40] = -9.81984; qual_mismatch_simple_bayesian[42][41] = -9.95463; qual_mismatch_simple_bayesian[42][42] = -10.0764; qual_mismatch_simple_bayesian[42][43] = -10.1849; qual_mismatch_simple_bayesian[42][44] = -10.2803; qual_mismatch_simple_bayesian[42][45] = -10.3632; qual_mismatch_simple_bayesian[42][46] = -10.4344; qual_mismatch_simple_bayesian[43][0] = -1.09863; qual_mismatch_simple_bayesian[43][1] = -1.32887; qual_mismatch_simple_bayesian[43][2] = -1.55912; qual_mismatch_simple_bayesian[43][3] = -1.78935; qual_mismatch_simple_bayesian[43][4] = -2.01959; qual_mismatch_simple_bayesian[43][5] = -2.24981; qual_mismatch_simple_bayesian[43][6] = -2.48003; qual_mismatch_simple_bayesian[43][7] = -2.71024; 
qual_mismatch_simple_bayesian[43][8] = -2.94043; qual_mismatch_simple_bayesian[43][9] = -3.17061; qual_mismatch_simple_bayesian[43][10] = -3.40076; qual_mismatch_simple_bayesian[43][11] = -3.63089; qual_mismatch_simple_bayesian[43][12] = -3.86099; qual_mismatch_simple_bayesian[43][13] = -4.09104; qual_mismatch_simple_bayesian[43][14] = -4.32104; qual_mismatch_simple_bayesian[43][15] = -4.55097; qual_mismatch_simple_bayesian[43][16] = -4.78082; qual_mismatch_simple_bayesian[43][17] = -5.01056; qual_mismatch_simple_bayesian[43][18] = -5.24017; qual_mismatch_simple_bayesian[43][19] = -5.46962; qual_mismatch_simple_bayesian[43][20] = -5.69885; qual_mismatch_simple_bayesian[43][21] = -5.92782; qual_mismatch_simple_bayesian[43][22] = -6.15645; qual_mismatch_simple_bayesian[43][23] = -6.38467; qual_mismatch_simple_bayesian[43][24] = -6.61237; qual_mismatch_simple_bayesian[43][25] = -6.83942; qual_mismatch_simple_bayesian[43][26] = -7.06564; qual_mismatch_simple_bayesian[43][27] = -7.29085; qual_mismatch_simple_bayesian[43][28] = -7.51478; qual_mismatch_simple_bayesian[43][29] = -7.73713; qual_mismatch_simple_bayesian[43][30] = -7.95753; qual_mismatch_simple_bayesian[43][31] = -8.1755; qual_mismatch_simple_bayesian[43][32] = -8.39051; qual_mismatch_simple_bayesian[43][33] = -8.60189; qual_mismatch_simple_bayesian[43][34] = -8.80888; qual_mismatch_simple_bayesian[43][35] = -9.0106; qual_mismatch_simple_bayesian[43][36] = -9.20605; qual_mismatch_simple_bayesian[43][37] = -9.39414; qual_mismatch_simple_bayesian[43][38] = -9.57372; qual_mismatch_simple_bayesian[43][39] = -9.74362; qual_mismatch_simple_bayesian[43][40] = -9.90274; qual_mismatch_simple_bayesian[43][41] = -10.0501; qual_mismatch_simple_bayesian[43][42] = -10.1849; qual_mismatch_simple_bayesian[43][43] = -10.3066; qual_mismatch_simple_bayesian[43][44] = -10.4151; qual_mismatch_simple_bayesian[43][45] = -10.5106; qual_mismatch_simple_bayesian[43][46] = -10.5935; qual_mismatch_simple_bayesian[44][0] = -1.09863; 
qual_mismatch_simple_bayesian[44][1] = -1.32887; qual_mismatch_simple_bayesian[44][2] = -1.55912; qual_mismatch_simple_bayesian[44][3] = -1.78936; qual_mismatch_simple_bayesian[44][4] = -2.0196; qual_mismatch_simple_bayesian[44][5] = -2.24983; qual_mismatch_simple_bayesian[44][6] = -2.48006; qual_mismatch_simple_bayesian[44][7] = -2.71028; qual_mismatch_simple_bayesian[44][8] = -2.94048; qual_mismatch_simple_bayesian[44][9] = -3.17068; qual_mismatch_simple_bayesian[44][10] = -3.40085; qual_mismatch_simple_bayesian[44][11] = -3.63101; qual_mismatch_simple_bayesian[44][12] = -3.86114; qual_mismatch_simple_bayesian[44][13] = -4.09123; qual_mismatch_simple_bayesian[44][14] = -4.32128; qual_mismatch_simple_bayesian[44][15] = -4.55128; qual_mismatch_simple_bayesian[44][16] = -4.78122; qual_mismatch_simple_bayesian[44][17] = -5.01107; qual_mismatch_simple_bayesian[44][18] = -5.24081; qual_mismatch_simple_bayesian[44][19] = -5.47042; qual_mismatch_simple_bayesian[44][20] = -5.69986; qual_mismatch_simple_bayesian[44][21] = -5.92909; qual_mismatch_simple_bayesian[44][22] = -6.15806; qual_mismatch_simple_bayesian[44][23] = -6.3867; qual_mismatch_simple_bayesian[44][24] = -6.61492; qual_mismatch_simple_bayesian[44][25] = -6.84262; qual_mismatch_simple_bayesian[44][26] = -7.06966; qual_mismatch_simple_bayesian[44][27] = -7.29589; qual_mismatch_simple_bayesian[44][28] = -7.52109; qual_mismatch_simple_bayesian[44][29] = -7.74503; qual_mismatch_simple_bayesian[44][30] = -7.96738; qual_mismatch_simple_bayesian[44][31] = -8.18777; qual_mismatch_simple_bayesian[44][32] = -8.40575; qual_mismatch_simple_bayesian[44][33] = -8.62076; qual_mismatch_simple_bayesian[44][34] = -8.83214; qual_mismatch_simple_bayesian[44][35] = -9.03913; qual_mismatch_simple_bayesian[44][36] = -9.24085; qual_mismatch_simple_bayesian[44][37] = -9.43629; qual_mismatch_simple_bayesian[44][38] = -9.62438; qual_mismatch_simple_bayesian[44][39] = -9.80396; qual_mismatch_simple_bayesian[44][40] = -9.97387; 
qual_mismatch_simple_bayesian[44][41] = -10.133; qual_mismatch_simple_bayesian[44][42] = -10.2803; qual_mismatch_simple_bayesian[44][43] = -10.4151; qual_mismatch_simple_bayesian[44][44] = -10.5369; qual_mismatch_simple_bayesian[44][45] = -10.6454; qual_mismatch_simple_bayesian[44][46] = -10.7408; qual_mismatch_simple_bayesian[45][0] = -1.09862; qual_mismatch_simple_bayesian[45][1] = -1.32887; qual_mismatch_simple_bayesian[45][2] = -1.55912; qual_mismatch_simple_bayesian[45][3] = -1.78937; qual_mismatch_simple_bayesian[45][4] = -2.01961; qual_mismatch_simple_bayesian[45][5] = -2.24985; qual_mismatch_simple_bayesian[45][6] = -2.48008; qual_mismatch_simple_bayesian[45][7] = -2.71031; qual_mismatch_simple_bayesian[45][8] = -2.94052; qual_mismatch_simple_bayesian[45][9] = -3.17073; qual_mismatch_simple_bayesian[45][10] = -3.40092; qual_mismatch_simple_bayesian[45][11] = -3.6311; qual_mismatch_simple_bayesian[45][12] = -3.86126; qual_mismatch_simple_bayesian[45][13] = -4.09138; qual_mismatch_simple_bayesian[45][14] = -4.32148; qual_mismatch_simple_bayesian[45][15] = -4.55153; qual_mismatch_simple_bayesian[45][16] = -4.78153; qual_mismatch_simple_bayesian[45][17] = -5.01147; qual_mismatch_simple_bayesian[45][18] = -5.24131; qual_mismatch_simple_bayesian[45][19] = -5.47106; qual_mismatch_simple_bayesian[45][20] = -5.70067; qual_mismatch_simple_bayesian[45][21] = -5.93011; qual_mismatch_simple_bayesian[45][22] = -6.15934; qual_mismatch_simple_bayesian[45][23] = -6.38831; qual_mismatch_simple_bayesian[45][24] = -6.61695; qual_mismatch_simple_bayesian[45][25] = -6.84517; qual_mismatch_simple_bayesian[45][26] = -7.07286; qual_mismatch_simple_bayesian[45][27] = -7.29991; qual_mismatch_simple_bayesian[45][28] = -7.52614; qual_mismatch_simple_bayesian[45][29] = -7.75134; qual_mismatch_simple_bayesian[45][30] = -7.97528; qual_mismatch_simple_bayesian[45][31] = -8.19763; qual_mismatch_simple_bayesian[45][32] = -8.41802; qual_mismatch_simple_bayesian[45][33] = -8.636; 
qual_mismatch_simple_bayesian[45][34] = -8.851; qual_mismatch_simple_bayesian[45][35] = -9.06239; qual_mismatch_simple_bayesian[45][36] = -9.26938; qual_mismatch_simple_bayesian[45][37] = -9.4711; qual_mismatch_simple_bayesian[45][38] = -9.66654; qual_mismatch_simple_bayesian[45][39] = -9.85463; qual_mismatch_simple_bayesian[45][40] = -10.0342; qual_mismatch_simple_bayesian[45][41] = -10.2041; qual_mismatch_simple_bayesian[45][42] = -10.3632; qual_mismatch_simple_bayesian[45][43] = -10.5106; qual_mismatch_simple_bayesian[45][44] = -10.6454; qual_mismatch_simple_bayesian[45][45] = -10.7671; qual_mismatch_simple_bayesian[45][46] = -10.8756; qual_mismatch_simple_bayesian[46][0] = -1.09862; qual_mismatch_simple_bayesian[46][1] = -1.32887; qual_mismatch_simple_bayesian[46][2] = -1.55912; qual_mismatch_simple_bayesian[46][3] = -1.78937; qual_mismatch_simple_bayesian[46][4] = -2.01962; qual_mismatch_simple_bayesian[46][5] = -2.24986; qual_mismatch_simple_bayesian[46][6] = -2.4801; qual_mismatch_simple_bayesian[46][7] = -2.71033; qual_mismatch_simple_bayesian[46][8] = -2.94056; qual_mismatch_simple_bayesian[46][9] = -3.17077; qual_mismatch_simple_bayesian[46][10] = -3.40098; qual_mismatch_simple_bayesian[46][11] = -3.63117; qual_mismatch_simple_bayesian[46][12] = -3.86135; qual_mismatch_simple_bayesian[46][13] = -4.09151; qual_mismatch_simple_bayesian[46][14] = -4.32163; qual_mismatch_simple_bayesian[46][15] = -4.55173; qual_mismatch_simple_bayesian[46][16] = -4.78178; qual_mismatch_simple_bayesian[46][17] = -5.01178; qual_mismatch_simple_bayesian[46][18] = -5.24172; qual_mismatch_simple_bayesian[46][19] = -5.47156; qual_mismatch_simple_bayesian[46][20] = -5.70131; qual_mismatch_simple_bayesian[46][21] = -5.93092; qual_mismatch_simple_bayesian[46][22] = -6.16036; qual_mismatch_simple_bayesian[46][23] = -6.38959; qual_mismatch_simple_bayesian[46][24] = -6.61856; qual_mismatch_simple_bayesian[46][25] = -6.8472; qual_mismatch_simple_bayesian[46][26] = -7.07542; 
qual_mismatch_simple_bayesian[46][27] = -7.30311; qual_mismatch_simple_bayesian[46][28] = -7.53016; qual_mismatch_simple_bayesian[46][29] = -7.75639; qual_mismatch_simple_bayesian[46][30] = -7.98159; qual_mismatch_simple_bayesian[46][31] = -8.20553; qual_mismatch_simple_bayesian[46][32] = -8.42788; qual_mismatch_simple_bayesian[46][33] = -8.64827; qual_mismatch_simple_bayesian[46][34] = -8.86625; qual_mismatch_simple_bayesian[46][35] = -9.08126; qual_mismatch_simple_bayesian[46][36] = -9.29264; qual_mismatch_simple_bayesian[46][37] = -9.49963; qual_mismatch_simple_bayesian[46][38] = -9.70135; qual_mismatch_simple_bayesian[46][39] = -9.8968; qual_mismatch_simple_bayesian[46][40] = -10.0849; qual_mismatch_simple_bayesian[46][41] = -10.2645; qual_mismatch_simple_bayesian[46][42] = -10.4344; qual_mismatch_simple_bayesian[46][43] = -10.5935; qual_mismatch_simple_bayesian[46][44] = -10.7408; qual_mismatch_simple_bayesian[46][45] = -10.8756; qual_mismatch_simple_bayesian[46][46] = -10.9974; vector<double> qual_score; qual_score.resize(47); qual_score[0] = -2; qual_score[1] = -1.58147; qual_score[2] = -0.996843; qual_score[3] = -0.695524; qual_score[4] = -0.507676; qual_score[5] = -0.38013; qual_score[6] = -0.289268; qual_score[7] = -0.222552; qual_score[8] = -0.172557; qual_score[9] = -0.134552; qual_score[10] = -0.105361; qual_score[11] = -0.0827653; qual_score[12] = -0.0651742; qual_score[13] = -0.0514183; qual_score[14] = -0.0406248; qual_score[15] = -0.0321336; qual_score[16] = -0.0254397; qual_score[17] = -0.0201544; qual_score[18] = -0.0159759; qual_score[19] = -0.0126692; qual_score[20] = -0.0100503; qual_score[21] = -0.007975; qual_score[22] = -0.00632956; qual_score[23] = -0.00502447; qual_score[24] = -0.00398902; qual_score[25] = -0.00316729; qual_score[26] = -0.00251505; qual_score[27] = -0.00199726; qual_score[28] = -0.00158615; qual_score[29] = -0.00125972; qual_score[30] = -0.0010005; qual_score[31] = -0.000794644; qual_score[32] = -0.000631156; qual_score[33] = 
-0.000501313; qual_score[34] = -0.000398186; qual_score[35] = -0.000316278; qual_score[36] = -0.00025122; qual_score[37] = -0.000199546; qual_score[38] = -0.000158502; qual_score[39] = -0.0001259; qual_score[40] = -0.000100005; qual_score[41] = -7.9436e-05; qual_score[42] = -6.30977e-05; qual_score[43] = -5.012e-05; qual_score[44] = -3.98115e-05; qual_score[45] = -3.16233e-05; qual_score[46] = -2.51192e-05; /*const double qual_match_simple_bayesian[][47] = { { -1.09861, -1.32887, -1.55913, -1.78939, -2.01965, -2.2499, -2.48016, -2.71042, -2.94068, -3.17094, -3.4012, -3.63146, -3.86171, -4.09197, -4.32223, -4.55249, -4.78275, -5.01301, -5.24327, -5.47352, -5.70378, -5.93404, -6.1643, -6.39456, -6.62482, -6.85508, -7.08533, -7.31559, -7.54585, -7.77611, -8.00637, -8.23663, -8.46688, -8.69714, -8.9274, -9.15766, -9.38792, -9.61818, -9.84844, -10.0787, -10.309, -10.5392, -10.7695, -10.9997, -11.23, -11.4602, -11.6905}, { -1.32887, -1.37587, -1.41484, -1.44692, -1.47315, -1.49449, -1.51178, -1.52572, -1.53694, -1.54593, -1.55314, -1.5589, -1.5635, -1.56717, -1.5701, -1.57243, -1.57428, -1.57576, -1.57693, -1.57786, -1.5786, -1.57919, -1.57966, -1.58003, -1.58033, -1.58057, -1.58075, -1.5809, -1.58102, -1.58111, -1.58119, -1.58125, -1.58129, -1.58133, -1.58136, -1.58138, -1.5814, -1.58142, -1.58143, -1.58144, -1.58145, -1.58145, -1.58146, -1.58146, -1.58146, -1.58146, -1.58147}, { -1.55913, -1.41484, -1.31343, -1.23963, -1.18465, -1.14303, -1.11117, -1.08657, -1.06744, -1.05251, -1.0408, -1.0316, -1.02436, -1.01863, -1.01411, -1.01054, -1.00771, -1.00546, -1.00368, -1.00227, -1.00115, -1.00027, -0.99956, -0.999001, -0.998557, -0.998204, -0.997924, -0.997702, -0.997525, -0.997385, -0.997273, -0.997185, -0.997114, -0.997059, -0.997014, -0.996979, -0.996951, -0.996929, -0.996911, -0.996897, -0.996886, -0.996877, -0.99687, -0.996865, -0.99686, -0.996857, -0.996854}, { -1.78939, -1.44692, -1.23963, -1.10098, -1.0031, -0.931648, -0.878319, -0.837896, -0.806912, -0.782967, 
-0.764347, -0.7498, -0.738394, -0.729426, -0.722359, -0.71678, -0.712372, -0.708883, -0.706121, -0.703933, -0.702197, -0.700821, -0.69973, -0.698863, -0.698176, -0.69763, -0.697196, -0.696852, -0.696579, -0.696362, -0.69619, -0.696053, -0.695944, -0.695858, -0.695789, -0.695735, -0.695692, -0.695657, -0.69563, -0.695608, -0.695591, -0.695577, -0.695566, -0.695558, -0.695551, -0.695546, -0.695541}, { -2.01965, -1.47315, -1.18465, -1.0031, -0.879224, -0.790712, -0.725593, -0.676729, -0.639547, -0.610968, -0.588834, -0.571596, -0.558111, -0.547528, -0.539201, -0.532636, -0.527451, -0.523352, -0.520107, -0.517538, -0.515502, -0.513887, -0.512606, -0.51159, -0.510784, -0.510144, -0.509636, -0.509232, -0.508912, -0.508658, -0.508456, -0.508295, -0.508168, -0.508067, -0.507986, -0.507922, -0.507872, -0.507831, -0.507799, -0.507774, -0.507754, -0.507738, -0.507725, -0.507715, -0.507707, -0.507701, -0.507695}, { -2.2499, -1.49449, -1.14303, -0.931648, -0.790712, -0.691393, -0.618979, -0.564976, -0.524066, -0.492723, -0.468507, -0.449682, -0.434976, -0.423448, -0.414384, -0.407243, -0.401606, -0.397151, -0.393627, -0.390836, -0.388625, -0.386872, -0.385482, -0.384379, -0.383503, -0.382809, -0.382257, -0.38182, -0.381472, -0.381196, -0.380977, -0.380803, -0.380664, -0.380554, -0.380467, -0.380398, -0.380343, -0.380299, -0.380264, -0.380237, -0.380215, -0.380198, -0.380184, -0.380173, -0.380164, -0.380157, -0.380152}, { -2.48016, -1.51178, -1.11117, -0.878319, -0.725593, -0.618979, -0.541714, -0.48433, -0.440984, -0.407844, -0.382281, -0.362431, -0.34694, -0.334804, -0.325268, -0.317757, -0.311831, -0.307149, -0.303445, -0.300513, -0.29819, -0.296348, -0.294888, -0.29373, -0.29281, -0.292081, -0.291502, -0.291042, -0.290677, -0.290387, -0.290157, -0.289974, -0.289829, -0.289713, -0.289622, -0.289549, -0.289491, -0.289445, -0.289409, -0.28938, -0.289357, -0.289339, -0.289324, -0.289313, -0.289304, -0.289296, -0.28929}, { -2.71042, -1.52572, -1.08657, -0.837896, -0.676729, 
-0.564976, -0.48433, -0.424604, -0.379581, -0.345208, -0.318723, -0.298173, -0.282146, -0.269595, -0.259737, -0.251976, -0.245853, -0.241016, -0.23719, -0.234162, -0.231763, -0.229861, -0.228354, -0.227158, -0.226208, -0.225455, -0.224857, -0.224383, -0.224006, -0.223707, -0.223469, -0.22328, -0.22313, -0.223011, -0.222917, -0.222842, -0.222782, -0.222734, -0.222697, -0.222667, -0.222643, -0.222624, -0.222609, -0.222597, -0.222588, -0.222581, -0.222575}, { -2.94068, -1.53694, -1.06744, -0.806912, -0.639547, -0.524066, -0.440984, -0.379581, -0.333359, -0.298107, -0.270966, -0.249919, -0.233512, -0.220668, -0.210582, -0.202642, -0.19638, -0.191434, -0.187522, -0.184426, -0.181973, -0.180029, -0.178488, -0.177265, -0.176295, -0.175525, -0.174914, -0.174428, -0.174043, -0.173737, -0.173494, -0.173301, -0.173148, -0.173026, -0.17293, -0.172853, -0.172792, -0.172744, -0.172705, -0.172675, -0.17265, -0.172631, -0.172616, -0.172604, -0.172594, -0.172586, -0.17258}, { -3.17094, -1.54593, -1.05251, -0.782967, -0.610968, -0.492723, -0.407844, -0.345208, -0.298107, -0.262213, -0.234592, -0.213183, -0.196498, -0.18344, -0.173188, -0.165119, -0.158755, -0.153729, -0.149755, -0.146609, -0.144117, -0.142143, -0.140577, -0.139335, -0.138349, -0.137567, -0.136946, -0.136453, -0.136062, -0.135751, -0.135504, -0.135308, -0.135153, -0.135029, -0.134931, -0.134853, -0.134791, -0.134742, -0.134703, -0.134672, -0.134647, -0.134628, -0.134612, -0.1346, -0.13459, -0.134582, -0.134576}, { -3.4012, -1.55314, -1.0408, -0.764347, -0.588834, -0.468507, -0.382281, -0.318723, -0.270966, -0.234592, -0.206614, -0.184935, -0.168044, -0.154827, -0.144451, -0.136285, -0.129846, -0.124761, -0.12074, -0.117558, -0.115037, -0.113039, -0.111455, -0.110198, -0.109202, -0.10841, -0.107782, -0.107284, -0.106888, -0.106574, -0.106324, -0.106126, -0.105968, -0.105843, -0.105744, -0.105665, -0.105602, -0.105553, -0.105513, -0.105482, -0.105457, -0.105437, -0.105421, -0.105409, -0.105399, -0.105391, -0.105385}, { 
-3.63146, -1.5589, -1.0316, -0.7498, -0.571596, -0.449682, -0.362431, -0.298173, -0.249919, -0.213183, -0.184935, -0.163052, -0.146004, -0.132667, -0.122198, -0.11396, -0.107464, -0.102334, -0.0982781, -0.0950678, -0.0925252, -0.09051, -0.0889123, -0.0876449, -0.0866394, -0.0858414, -0.0852079, -0.0847051, -0.0843058, -0.0839888, -0.083737, -0.0835371, -0.0833783, -0.0832522, -0.083152, -0.0830725, -0.0830093, -0.0829591, -0.0829192, -0.0828876, -0.0828624, -0.0828425, -0.0828266, -0.082814, -0.082804, -0.082796, -0.0827897}, { -3.86171, -1.5635, -1.02436, -0.738394, -0.558111, -0.434976, -0.34694, -0.282146, -0.233512, -0.196498, -0.168044, -0.146004, -0.128838, -0.115409, -0.104869, -0.096575, -0.0900357, -0.0848716, -0.0807886, -0.0775572, -0.0749978, -0.0729694, -0.0713612, -0.0700856, -0.0690735, -0.0682703, -0.0676327, -0.0671265, -0.0667247, -0.0664056, -0.0661522, -0.065951, -0.0657912, -0.0656642, -0.0655634, -0.0654833, -0.0654198, -0.0653692, -0.0653291, -0.0652972, -0.0652719, -0.0652518, -0.0652359, -0.0652232, -0.0652131, -0.0652051, -0.0651987}, { -4.09197, -1.56717, -1.01863, -0.729426, -0.547528, -0.423448, -0.334804, -0.269595, -0.220668, -0.18344, -0.154827, -0.132667, -0.115409, -0.101909, -0.0913142, -0.0829777, -0.0764049, -0.0712146, -0.0671109, -0.0638632, -0.061291, -0.0592525, -0.0576362, -0.0563542, -0.055337, -0.0545298, -0.053889, -0.0533804, -0.0529765, -0.0526558, -0.0524012, -0.0521989, -0.0520383, -0.0519108, -0.0518095, -0.051729, -0.0516651, -0.0516143, -0.051574, -0.051542, -0.0515165, -0.0514963, -0.0514803, -0.0514675, -0.0514574, -0.0514493, -0.051443}, { -4.32223, -1.5701, -1.01411, -0.722359, -0.539201, -0.414384, -0.325268, -0.259737, -0.210582, -0.173188, -0.144451, -0.122198, -0.104869, -0.0913142, -0.0806768, -0.0723072, -0.0657085, -0.0604979, -0.0563782, -0.0531178, -0.0505356, -0.0484892, -0.0468667, -0.0455797, -0.0445586, -0.0437483, -0.0431051, -0.0425945, -0.0421891, -0.0418671, -0.0416115, -0.0414085, -0.0412473, 
-0.0411192, -0.0410175, -0.0409368, -0.0408726, -0.0408216, -0.0407812, -0.040749, -0.0407235, -0.0407032, -0.0406871, -0.0406743, -0.0406641, -0.040656, -0.0406496}, { -4.55249, -1.57243, -1.01054, -0.71678, -0.532636, -0.407243, -0.317757, -0.251976, -0.202642, -0.165119, -0.136285, -0.11396, -0.096575, -0.0829777, -0.0723072, -0.0639118, -0.0572929, -0.0520664, -0.0479342, -0.044664, -0.042074, -0.0400214, -0.038394, -0.0371032, -0.0360791, -0.0352663, -0.0346212, -0.0341091, -0.0337024, -0.0333796, -0.0331232, -0.0329196, -0.0327579, -0.0326294, -0.0325274, -0.0324464, -0.0323821, -0.0323309, -0.0322904, -0.0322581, -0.0322325, -0.0322121, -0.032196, -0.0321831, -0.032173, -0.0321649, -0.0321584}, { -4.78275, -1.57428, -1.00771, -0.712372, -0.527451, -0.401606, -0.311831, -0.245853, -0.19638, -0.158755, -0.129846, -0.107464, -0.0900357, -0.0764049, -0.0657085, -0.0572929, -0.0506582, -0.0454193, -0.0412773, -0.0379994, -0.0354033, -0.033346, -0.0317148, -0.0304209, -0.0293944, -0.0285798, -0.0279331, -0.0274198, -0.0270122, -0.0266886, -0.0264316, -0.0262275, -0.0260655, -0.0259367, -0.0258345, -0.0257533, -0.0256888, -0.0256376, -0.0255969, -0.0255645, -0.0255389, -0.0255185, -0.0255023, -0.0254894, -0.0254792, -0.0254711, -0.0254646}, { -5.01301, -1.57576, -1.00546, -0.708883, -0.523352, -0.397151, -0.307149, -0.241016, -0.191434, -0.153729, -0.124761, -0.102334, -0.0848716, -0.0712146, -0.0604979, -0.0520664, -0.0454193, -0.0401706, -0.036021, -0.032737, -0.0301362, -0.028075, -0.0264408, -0.0251447, -0.0241163, -0.0233001, -0.0226523, -0.0221381, -0.0217297, -0.0214055, -0.0211481, -0.0209436, -0.0207812, -0.0206523, -0.0205498, -0.0204685, -0.0204039, -0.0203526, -0.0203118, -0.0202794, -0.0202537, -0.0202333, -0.020217, -0.0202041, -0.0201939, -0.0201858, -0.0201793}, { -5.24327, -1.57693, -1.00368, -0.706121, -0.520107, -0.393627, -0.303445, -0.23719, -0.187522, -0.149755, -0.12074, -0.0982781, -0.0807886, -0.0671109, -0.0563782, -0.0479342, -0.0412773, 
-0.036021, -0.0318653, -0.0285766, -0.025972, -0.0239079, -0.0222713, -0.0209733, -0.0199434, -0.0191261, -0.0184774, -0.0179624, -0.0175535, -0.0172288, -0.016971, -0.0167662, -0.0166036, -0.0164745, -0.0163719, -0.0162904, -0.0162257, -0.0161743, -0.0161335, -0.0161011, -0.0160753, -0.0160549, -0.0160386, -0.0160257, -0.0160155, -0.0160073, -0.0160009}, { -5.47352, -1.57786, -1.00227, -0.703933, -0.517538, -0.390836, -0.300513, -0.234162, -0.184426, -0.146609, -0.117558, -0.0950678, -0.0775572, -0.0638632, -0.0531178, -0.044664, -0.0379994, -0.032737, -0.0285766, -0.0252842, -0.0226766, -0.0206101, -0.0189717, -0.0176722, -0.0166412, -0.015823, -0.0151735, -0.0146579, -0.0142486, -0.0139235, -0.0136654, -0.0134604, -0.0132976, -0.0131684, -0.0130657, -0.0129841, -0.0129193, -0.0128679, -0.012827, -0.0127945, -0.0127688, -0.0127483, -0.012732, -0.0127191, -0.0127088, -0.0127007, -0.0126942}, { -5.70378, -1.5786, -1.00115, -0.702197, -0.515502, -0.388625, -0.29819, -0.231763, -0.181973, -0.144117, -0.115037, -0.0925252, -0.0749978, -0.061291, -0.0505356, -0.042074, -0.0354033, -0.0301362, -0.025972, -0.0226766, -0.0200667, -0.0179984, -0.0163585, -0.0150578, -0.0140259, -0.0132069, -0.0125569, -0.0120409, -0.0116311, -0.0113058, -0.0110475, -0.0108423, -0.0106794, -0.01055, -0.0104472, -0.0103655, -0.0103007, -0.0102492, -0.0102083, -0.0101758, -0.01015, -0.0101295, -0.0101132, -0.0101003, -0.01009, -0.0100819, -0.0100754}, { -5.93404, -1.57919, -1.00027, -0.700821, -0.513887, -0.386872, -0.296348, -0.229861, -0.180029, -0.142143, -0.113039, -0.09051, -0.0729694, -0.0592525, -0.0484892, -0.0400214, -0.033346, -0.028075, -0.0239079, -0.0206101, -0.0179984, -0.0159286, -0.0142876, -0.012986, -0.0119533, -0.0111338, -0.0104833, -0.00996692, -0.00955691, -0.00923135, -0.00897283, -0.00876752, -0.00860447, -0.00847497, -0.00837212, -0.00829043, -0.00822555, -0.00817401, -0.00813308, -0.00810056, -0.00807474, -0.00805422, -0.00803793, -0.00802498, -0.0080147, 
-0.00800654, -0.00800005}, { -6.1643, -1.57966, -0.99956, -0.69973, -0.512606, -0.385482, -0.294888, -0.228354, -0.178488, -0.140577, -0.111455, -0.0889123, -0.0713612, -0.0576362, -0.0468667, -0.038394, -0.0317148, -0.0264408, -0.0222713, -0.0189717, -0.0163585, -0.0142876, -0.0126457, -0.0113434, -0.0103101, -0.00949014, -0.00883928, -0.00832259, -0.00791235, -0.00758661, -0.00732794, -0.00712252, -0.00695938, -0.00682981, -0.00672691, -0.00664517, -0.00658025, -0.00652869, -0.00648773, -0.0064552, -0.00642936, -0.00640883, -0.00639253, -0.00637958, -0.00636929, -0.00636112, -0.00635463}, { -6.39456, -1.58003, -0.999001, -0.698863, -0.51159, -0.384379, -0.29373, -0.227158, -0.177265, -0.139335, -0.110198, -0.0876449, -0.0700856, -0.0563542, -0.0455797, -0.0371032, -0.0304209, -0.0251447, -0.0209733, -0.0176722, -0.0150578, -0.012986, -0.0113434, -0.0100405, -0.00900678, -0.00818644, -0.00753529, -0.00701837, -0.00660796, -0.00628208, -0.00602329, -0.00581778, -0.00565457, -0.00552494, -0.00542199, -0.00534022, -0.00527527, -0.00522368, -0.00518271, -0.00515016, -0.00512431, -0.00510378, -0.00508747, -0.00507451, -0.00506422, -0.00505604, -0.00504955}, { -6.62482, -1.58033, -0.998557, -0.698176, -0.510784, -0.383503, -0.29281, -0.226208, -0.176295, -0.138349, -0.109202, -0.0866394, -0.0690735, -0.055337, -0.0445586, -0.0360791, -0.0293944, -0.0241163, -0.0199434, -0.0166412, -0.0140259, -0.0119533, -0.0103101, -0.00900678, -0.00797271, -0.00715208, -0.00650071, -0.00598361, -0.00557305, -0.00524706, -0.00498818, -0.0047826, -0.00461933, -0.00448966, -0.00438667, -0.00430487, -0.0042399, -0.0041883, -0.00414731, -0.00411475, -0.00408889, -0.00406835, -0.00405203, -0.00403907, -0.00402878, -0.0040206, -0.0040141}, { -6.85508, -1.58057, -0.998204, -0.69763, -0.510144, -0.382809, -0.292081, -0.225455, -0.175525, -0.137567, -0.10841, -0.0858414, -0.0682703, -0.0545298, -0.0437483, -0.0352663, -0.0285798, -0.0233001, -0.0191261, -0.015823, -0.0132069, -0.0111338, 
-0.00949014, -0.00818644, -0.00715208, -0.00633122, -0.00567967, -0.00516243, -0.00475176, -0.00442567, -0.00416673, -0.00396109, -0.00379778, -0.00366807, -0.00356505, -0.00348323, -0.00341824, -0.00336662, -0.00332562, -0.00329306, -0.00326719, -0.00324664, -0.00323032, -0.00321736, -0.00320706, -0.00319888, -0.00319238}, { -7.08533, -1.58075, -0.997924, -0.697196, -0.509636, -0.382257, -0.291502, -0.224857, -0.174914, -0.136946, -0.107782, -0.0852079, -0.0676327, -0.053889, -0.0431051, -0.0346212, -0.0279331, -0.0226523, -0.0184774, -0.0151735, -0.0125569, -0.0104833, -0.00883928, -0.00753529, -0.00650071, -0.00567967, -0.00502798, -0.00451062, -0.00409986, -0.00377371, -0.00351471, -0.00330902, -0.00314567, -0.00301594, -0.0029129, -0.00283106, -0.00276606, -0.00271443, -0.00267342, -0.00264084, -0.00261497, -0.00259442, -0.00257809, -0.00256512, -0.00255482, -0.00254664, -0.00254014}, { -7.31559, -1.5809, -0.997702, -0.696852, -0.509232, -0.38182, -0.291042, -0.224383, -0.174428, -0.136453, -0.107284, -0.0847051, -0.0671265, -0.0533804, -0.0425945, -0.0341091, -0.0274198, -0.0221381, -0.0179624, -0.0146579, -0.0120409, -0.00996692, -0.00832259, -0.00701837, -0.00598361, -0.00516243, -0.00451062, -0.00399318, -0.00358235, -0.00325613, -0.00299709, -0.00279137, -0.00262799, -0.00249823, -0.00239518, -0.00231332, -0.00224831, -0.00219667, -0.00215565, -0.00212307, -0.00209719, -0.00207664, -0.00206031, -0.00204734, -0.00203704, -0.00202886, -0.00202236}, { -7.54585, -1.58102, -0.997525, -0.696579, -0.508912, -0.381472, -0.290677, -0.224006, -0.174043, -0.136062, -0.106888, -0.0843058, -0.0667247, -0.0529765, -0.0421891, -0.0337024, -0.0270122, -0.0217297, -0.0175535, -0.0142486, -0.0116311, -0.00955691, -0.00791235, -0.00660796, -0.00557305, -0.00475176, -0.00409986, -0.00358235, -0.00317146, -0.0028452, -0.00258612, -0.00238037, -0.00221697, -0.0020872, -0.00198413, -0.00190226, -0.00183724, -0.00178559, -0.00174457, -0.00171198, -0.0016861, -0.00166554, 
-0.00164921, -0.00163624, -0.00162594, -0.00161776, -0.00161126}, { -7.77611, -1.58111, -0.997385, -0.696362, -0.508658, -0.381196, -0.290387, -0.223707, -0.173737, -0.135751, -0.106574, -0.0839888, -0.0664056, -0.0526558, -0.0418671, -0.0333796, -0.0266886, -0.0214055, -0.0172288, -0.0139235, -0.0113058, -0.00923135, -0.00758661, -0.00628208, -0.00524706, -0.00442567, -0.00377371, -0.00325613, -0.0028452, -0.00251891, -0.0022598, -0.00205403, -0.00189061, -0.00176082, -0.00165774, -0.00157586, -0.00151083, -0.00145918, -0.00141815, -0.00138557, -0.00135968, -0.00133912, -0.00132279, -0.00130982, -0.00129951, -0.00129133, -0.00128483}, { -8.00637, -1.58119, -0.997273, -0.69619, -0.508456, -0.380977, -0.290157, -0.223469, -0.173494, -0.135504, -0.106324, -0.083737, -0.0661522, -0.0524012, -0.0416115, -0.0331232, -0.0264316, -0.0211481, -0.016971, -0.0136654, -0.0110475, -0.00897283, -0.00732794, -0.00602329, -0.00498818, -0.00416673, -0.00351471, -0.00299709, -0.00258612, -0.0022598, -0.00200067, -0.00179488, -0.00163145, -0.00150165, -0.00139855, -0.00131667, -0.00125164, -0.00119998, -0.00115895, -0.00112636, -0.00110047, -0.00107991, -0.00106358, -0.0010506, -0.0010403, -0.00103211, -0.00102561}, { -8.23663, -1.58125, -0.997185, -0.696053, -0.508295, -0.380803, -0.289974, -0.22328, -0.173301, -0.135308, -0.106126, -0.0835371, -0.065951, -0.0521989, -0.0414085, -0.0329196, -0.0262275, -0.0209436, -0.0167662, -0.0134604, -0.0108423, -0.00876752, -0.00712252, -0.00581778, -0.0047826, -0.00396109, -0.00330902, -0.00279137, -0.00238037, -0.00205403, -0.00179488, -0.00158908, -0.00142563, -0.00129582, -0.00119272, -0.00111084, -0.0010458, -0.000994137, -0.000953104, -0.000920511, -0.000894622, -0.000874059, -0.000857725, -0.000844751, -0.000834445, -0.000826259, -0.000819756}, { -8.46688, -1.58129, -0.997114, -0.695944, -0.508168, -0.380664, -0.289829, -0.22313, -0.173148, -0.135153, -0.105968, -0.0833783, -0.0657912, -0.0520383, -0.0412473, -0.0327579, -0.0260655, 
-0.0207812, -0.0166036, -0.0132976, -0.0106794, -0.00860447, -0.00695938, -0.00565457, -0.00461933, -0.00379778, -0.00314567, -0.00262799, -0.00221697, -0.00189061, -0.00163145, -0.00142563, -0.00126218, -0.00113236, -0.00102926, -0.000947368, -0.000882324, -0.000830661, -0.000789625, -0.00075703, -0.00073114, -0.000710576, -0.000694241, -0.000681266, -0.00067096, -0.000662773, -0.00065627}, { -8.69714, -1.58133, -0.997059, -0.695858, -0.508067, -0.380554, -0.289713, -0.223011, -0.173026, -0.135029, -0.105843, -0.0832522, -0.0656642, -0.0519108, -0.0411192, -0.0326294, -0.0259367, -0.0206523, -0.0164745, -0.0131684, -0.01055, -0.00847497, -0.00682981, -0.00552494, -0.00448966, -0.00366807, -0.00301594, -0.00249823, -0.0020872, -0.00176082, -0.00150165, -0.00129582, -0.00113236, -0.00100254, -0.000899433, -0.000817538, -0.000752491, -0.000700826, -0.000659788, -0.000627192, -0.000601301, -0.000580736, -0.0005644, -0.000551424, -0.000541118, -0.000532931, -0.000526428}, { -8.9274, -1.58136, -0.997014, -0.695789, -0.507986, -0.380467, -0.289622, -0.222917, -0.17293, -0.134931, -0.105744, -0.083152, -0.0655634, -0.0518095, -0.0410175, -0.0325274, -0.0258345, -0.0205498, -0.0163719, -0.0130657, -0.0104472, -0.00837212, -0.00672691, -0.00542199, -0.00438667, -0.00356505, -0.0029129, -0.00239518, -0.00198413, -0.00165774, -0.00139855, -0.00119272, -0.00102926, -0.000899433, -0.00079632, -0.000714422, -0.000649373, -0.000597706, -0.000556667, -0.00052407, -0.000498178, -0.000477612, -0.000461276, -0.0004483, -0.000437993, -0.000429806, -0.000423302}, { -9.15766, -1.58138, -0.996979, -0.695735, -0.507922, -0.380398, -0.289549, -0.222842, -0.172853, -0.134853, -0.105665, -0.0830725, -0.0654833, -0.051729, -0.0409368, -0.0324464, -0.0257533, -0.0204685, -0.0162904, -0.0129841, -0.0103655, -0.00829043, -0.00664517, -0.00534022, -0.00430487, -0.00348323, -0.00283106, -0.00231332, -0.00190226, -0.00157586, -0.00131667, -0.00111084, -0.000947368, -0.000817538, -0.000714422, 
-0.000632522, -0.000567471, -0.000515803, -0.000474763, -0.000442165, -0.000416272, -0.000395705, -0.000379369, -0.000366392, -0.000356085, -0.000347898, -0.000341394}, { -9.38792, -1.5814, -0.996951, -0.695692, -0.507872, -0.380343, -0.289491, -0.222782, -0.172792, -0.134791, -0.105602, -0.0830093, -0.0654198, -0.0516651, -0.0408726, -0.0323821, -0.0256888, -0.0204039, -0.0162257, -0.0129193, -0.0103007, -0.00822555, -0.00658025, -0.00527527, -0.0042399, -0.00341824, -0.00276606, -0.00224831, -0.00183724, -0.00151083, -0.00125164, -0.0010458, -0.000882324, -0.000752491, -0.000649373, -0.000567471, -0.000502419, -0.00045075, -0.000409709, -0.00037711, -0.000351217, -0.00033065, -0.000314313, -0.000301336, -0.000291028, -0.000282841, -0.000276337}, { -9.61818, -1.58142, -0.996929, -0.695657, -0.507831, -0.380299, -0.289445, -0.222734, -0.172744, -0.134742, -0.105553, -0.0829591, -0.0653692, -0.0516143, -0.0408216, -0.0323309, -0.0256376, -0.0203526, -0.0161743, -0.0128679, -0.0102492, -0.00817401, -0.00652869, -0.00522368, -0.0041883, -0.00336662, -0.00271443, -0.00219667, -0.00178559, -0.00145918, -0.00119998, -0.000994137, -0.000830661, -0.000700826, -0.000597706, -0.000515803, -0.00045075, -0.000399079, -0.000358037, -0.000325438, -0.000299544, -0.000278977, -0.00026264, -0.000249663, -0.000239355, -0.000231167, -0.000224664}, { -9.84844, -1.58143, -0.996911, -0.69563, -0.507799, -0.380264, -0.289409, -0.222697, -0.172705, -0.134703, -0.105513, -0.0829192, -0.0653291, -0.051574, -0.0407812, -0.0322904, -0.0255969, -0.0203118, -0.0161335, -0.012827, -0.0102083, -0.00813308, -0.00648773, -0.00518271, -0.00414731, -0.00332562, -0.00267342, -0.00215565, -0.00174457, -0.00141815, -0.00115895, -0.000953104, -0.000789625, -0.000659788, -0.000556667, -0.000474763, -0.000409709, -0.000358037, -0.000316995, -0.000284396, -0.000258502, -0.000237934, -0.000221596, -0.000208619, -0.000198311, -0.000190123, -0.00018362}, { -10.0787, -1.58144, -0.996897, -0.695608, -0.507774, 
-0.380237, -0.28938, -0.222667, -0.172675, -0.134672, -0.105482, -0.0828876, -0.0652972, -0.051542, -0.040749, -0.0322581, -0.0255645, -0.0202794, -0.0161011, -0.0127945, -0.0101758, -0.00810056, -0.0064552, -0.00515016, -0.00411475, -0.00329306, -0.00264084, -0.00212307, -0.00171198, -0.00138557, -0.00112636, -0.000920511, -0.00075703, -0.000627192, -0.00052407, -0.000442165, -0.00037711, -0.000325438, -0.000284396, -0.000251796, -0.000225901, -0.000205333, -0.000188996, -0.000176018, -0.00016571, -0.000157522, -0.000151019}, { -10.309, -1.58145, -0.996886, -0.695591, -0.507754, -0.380215, -0.289357, -0.222643, -0.17265, -0.134647, -0.105457, -0.0828624, -0.0652719, -0.0515165, -0.0407235, -0.0322325, -0.0255389, -0.0202537, -0.0160753, -0.0127688, -0.01015, -0.00807474, -0.00642936, -0.00512431, -0.00408889, -0.00326719, -0.00261497, -0.00209719, -0.0016861, -0.00135968, -0.00110047, -0.000894622, -0.00073114, -0.000601301, -0.000498178, -0.000416272, -0.000351217, -0.000299544, -0.000258502, -0.000225901, -0.000200007, -0.000179438, -0.000163101, -0.000150123, -0.000139815, -0.000131627, -0.000125123}, { -10.5392, -1.58145, -0.996877, -0.695577, -0.507738, -0.380198, -0.289339, -0.222624, -0.172631, -0.134628, -0.105437, -0.0828425, -0.0652518, -0.0514963, -0.0407032, -0.0322121, -0.0255185, -0.0202333, -0.0160549, -0.0127483, -0.0101295, -0.00805422, -0.00640883, -0.00510378, -0.00406835, -0.00324664, -0.00259442, -0.00207664, -0.00166554, -0.00133912, -0.00107991, -0.000874059, -0.000710576, -0.000580736, -0.000477612, -0.000395705, -0.00033065, -0.000278977, -0.000237934, -0.000205333, -0.000179438, -0.00015887, -0.000142532, -0.000129555, -0.000119246, -0.000111058, -0.000104554}, { -10.7695, -1.58146, -0.99687, -0.695566, -0.507725, -0.380184, -0.289324, -0.222609, -0.172616, -0.134612, -0.105421, -0.0828266, -0.0652359, -0.0514803, -0.0406871, -0.032196, -0.0255023, -0.020217, -0.0160386, -0.012732, -0.0101132, -0.00803793, -0.00639253, -0.00508747, 
-0.00405203, -0.00323032, -0.00257809, -0.00206031, -0.00164921, -0.00132279, -0.00106358, -0.000857725, -0.000694241, -0.0005644, -0.000461276, -0.000379369, -0.000314313, -0.00026264, -0.000221596, -0.000188996, -0.000163101, -0.000142532, -0.000126194, -0.000113217, -0.000102908, -9.47203e-05, -8.82164e-05}, { -10.9997, -1.58146, -0.996865, -0.695558, -0.507715, -0.380173, -0.289313, -0.222597, -0.172604, -0.1346, -0.105409, -0.082814, -0.0652232, -0.0514675, -0.0406743, -0.0321831, -0.0254894, -0.0202041, -0.0160257, -0.0127191, -0.0101003, -0.00802498, -0.00637958, -0.00507451, -0.00403907, -0.00321736, -0.00256512, -0.00204734, -0.00163624, -0.00130982, -0.0010506, -0.000844751, -0.000681266, -0.000551424, -0.0004483, -0.000366392, -0.000301336, -0.000249663, -0.000208619, -0.000176018, -0.000150123, -0.000129555, -0.000113217, -0.000100239, -8.99308e-05, -8.17427e-05, -7.52387e-05}, { -11.23, -1.58146, -0.99686, -0.695551, -0.507707, -0.380164, -0.289304, -0.222588, -0.172594, -0.13459, -0.105399, -0.082804, -0.0652131, -0.0514574, -0.0406641, -0.032173, -0.0254792, -0.0201939, -0.0160155, -0.0127088, -0.01009, -0.0080147, -0.00636929, -0.00506422, -0.00402878, -0.00320706, -0.00255482, -0.00203704, -0.00162594, -0.00129951, -0.0010403, -0.000834445, -0.00067096, -0.000541118, -0.000437993, -0.000356085, -0.000291028, -0.000239355, -0.000198311, -0.00016571, -0.000139815, -0.000119246, -0.000102908, -8.99308e-05, -7.96225e-05, -7.14344e-05, -6.49304e-05}, { -11.4602, -1.58146, -0.996857, -0.695546, -0.507701, -0.380157, -0.289296, -0.222581, -0.172586, -0.134582, -0.105391, -0.082796, -0.0652051, -0.0514493, -0.040656, -0.0321649, -0.0254711, -0.0201858, -0.0160073, -0.0127007, -0.0100819, -0.00800654, -0.00636112, -0.00505604, -0.0040206, -0.00319888, -0.00254664, -0.00202886, -0.00161776, -0.00129133, -0.00103211, -0.000826259, -0.000662773, -0.000532931, -0.000429806, -0.000347898, -0.000282841, -0.000231167, -0.000190123, -0.000157522, -0.000131627, 
-0.000111058, -9.47203e-05, -8.17427e-05, -7.14344e-05, -6.32462e-05, -5.67422e-05}, { -11.6905, -1.58147, -0.996854, -0.695541, -0.507695, -0.380152, -0.28929, -0.222575, -0.17258, -0.134576, -0.105385, -0.0827897, -0.0651987, -0.051443, -0.0406496, -0.0321584, -0.0254646, -0.0201793, -0.0160009, -0.0126942, -0.0100754, -0.00800005, -0.00635463, -0.00504955, -0.0040141, -0.00319238, -0.00254014, -0.00202236, -0.00161126, -0.00128483, -0.00102561, -0.000819756, -0.00065627, -0.000526428, -0.000423302, -0.000341394, -0.000276337, -0.000224664, -0.00018362, -0.000151019, -0.000125123, -0.000104554, -8.82164e-05, -7.52387e-05, -6.49304e-05, -5.67422e-05, -5.02381e-05}}; const double qual_mismatch_simple_bayesian[][47] = { { -1.50408, -1.40619, -1.33474, -1.28141, -1.24099, -1.21, -1.18606, -1.16744, -1.15289, -1.14148, -1.13251, -1.12545, -1.11987, -1.11546, -1.11197, -1.10921, -1.10702, -1.10529, -1.10391, -1.10282, -1.10195, -1.10126, -1.10072, -1.10028, -1.09994, -1.09967, -1.09945, -1.09928, -1.09914, -1.09903, -1.09895, -1.09888, -1.09882, -1.09878, -1.09874, -1.09872, -1.0987, -1.09868, -1.09867, -1.09865, -1.09865, -1.09864, -1.09863, -1.09863, -1.09863, -1.09862, -1.09862}, { -1.40619, -1.38979, -1.37696, -1.36688, -1.35894, -1.35268, -1.34774, -1.34383, -1.34073, -1.33828, -1.33634, -1.3348, -1.33358, -1.33261, -1.33184, -1.33123, -1.33074, -1.33036, -1.33005, -1.32981, -1.32962, -1.32946, -1.32934, -1.32924, -1.32917, -1.32911, -1.32906, -1.32902, -1.32899, -1.32896, -1.32895, -1.32893, -1.32892, -1.32891, -1.3289, -1.32889, -1.32889, -1.32889, -1.32888, -1.32888, -1.32888, -1.32888, -1.32888, -1.32887, -1.32887, -1.32887, -1.32887}, { -1.33474, -1.37696, -1.41181, -1.44039, -1.46368, -1.48258, -1.49786, -1.51016, -1.52003, -1.52795, -1.53428, -1.53934, -1.54338, -1.5466, -1.54916, -1.55121, -1.55283, -1.55412, -1.55515, -1.55597, -1.55662, -1.55713, -1.55754, -1.55787, -1.55813, -1.55833, -1.5585, -1.55863, -1.55873, -1.55881, -1.55888, -1.55893, -1.55897, 
-1.559, -1.55903, -1.55905, -1.55907, -1.55908, -1.55909, -1.5591, -1.5591, -1.55911, -1.55911, -1.55912, -1.55912, -1.55912, -1.55912}, { -1.28141, -1.36688, -1.44039, -1.50289, -1.55549, -1.59933, -1.63558, -1.66534, -1.68963, -1.70935, -1.72529, -1.73814, -1.74847, -1.75675, -1.76338, -1.76867, -1.7729, -1.77627, -1.77895, -1.78109, -1.78279, -1.78414, -1.78522, -1.78608, -1.78676, -1.7873, -1.78773, -1.78807, -1.78834, -1.78855, -1.78873, -1.78886, -1.78897, -1.78906, -1.78912, -1.78918, -1.78922, -1.78926, -1.78928, -1.7893, -1.78932, -1.78934, -1.78935, -1.78935, -1.78936, -1.78937, -1.78937}, { -1.24099, -1.35894, -1.46368, -1.55549, -1.63493, -1.70287, -1.76033, -1.80845, -1.8484, -1.8813, -1.90823, -1.93016, -1.94792, -1.96226, -1.97379, -1.98305, -1.99047, -1.9964, -2.00114, -2.00492, -2.00793, -2.01033, -2.01224, -2.01376, -2.01497, -2.01593, -2.01669, -2.0173, -2.01778, -2.01816, -2.01847, -2.01871, -2.0189, -2.01906, -2.01918, -2.01927, -2.01935, -2.01941, -2.01946, -2.0195, -2.01953, -2.01955, -2.01957, -2.01959, -2.0196, -2.01961, -2.01962}, { -1.21, -1.35268, -1.48258, -1.59933, -1.70287, -1.79352, -1.87187, -1.93881, -1.99536, -2.04269, -2.08194, -2.11426, -2.14069, -2.1622, -2.17962, -2.19368, -2.20499, -2.21406, -2.22133, -2.22714, -2.23178, -2.23548, -2.23843, -2.24078, -2.24265, -2.24414, -2.24532, -2.24626, -2.24701, -2.2476, -2.24808, -2.24845, -2.24875, -2.24899, -2.24918, -2.24933, -2.24945, -2.24954, -2.24962, -2.24967, -2.24972, -2.24976, -2.24979, -2.24981, -2.24983, -2.24985, -2.24986}, { -1.18606, -1.34774, -1.49786, -1.63558, -1.76033, -1.87187, -1.97029, -2.05601, -2.12976, -2.19248, -2.24527, -2.28928, -2.32567, -2.35556, -2.37995, -2.39976, -2.41577, -2.42868, -2.43906, -2.44737, -2.45403, -2.45935, -2.4636, -2.46698, -2.46968, -2.47183, -2.47353, -2.47489, -2.47598, -2.47684, -2.47752, -2.47806, -2.47849, -2.47884, -2.47911, -2.47933, -2.4795, -2.47964, -2.47974, -2.47983, -2.4799, -2.47995, -2.48, -2.48003, -2.48006, -2.48008, 
-2.4801}, { -1.16744, -1.34383, -1.51016, -1.66534, -1.80845, -1.93881, -2.05601, -2.16001, -2.25109, -2.32986, -2.39718, -2.45408, -2.5017, -2.54122, -2.57376, -2.60038, -2.62204, -2.63959, -2.65376, -2.66515, -2.6743, -2.68162, -2.68748, -2.69215, -2.69588, -2.69886, -2.70122, -2.70311, -2.70461, -2.7058, -2.70675, -2.7075, -2.7081, -2.70858, -2.70896, -2.70926, -2.7095, -2.70969, -2.70984, -2.70996, -2.71005, -2.71013, -2.71019, -2.71024, -2.71028, -2.71031, -2.71033}, { -1.15289, -1.34073, -1.52003, -1.68963, -1.8484, -1.99536, -2.12976, -2.25109, -2.3592, -2.45427, -2.5368, -2.60759, -2.66762, -2.71801, -2.75994, -2.79454, -2.8229, -2.84602, -2.86477, -2.87992, -2.89212, -2.90191, -2.90977, -2.91605, -2.92106, -2.92507, -2.92826, -2.9308, -2.93282, -2.93444, -2.93572, -2.93674, -2.93755, -2.93819, -2.9387, -2.93911, -2.93943, -2.93969, -2.93989, -2.94005, -2.94018, -2.94029, -2.94037, -2.94043, -2.94048, -2.94052, -2.94056}, { -1.14148, -1.33828, -1.52795, -1.70935, -1.8813, -2.04269, -2.19248, -2.32986, -2.45427, -2.56545, -2.66352, -2.74891, -2.82235, -2.8848, -2.93733, -2.98112, -3.01733, -3.04705, -3.07131, -3.09101, -3.10693, -3.11977, -3.13008, -3.13835, -3.14496, -3.15025, -3.15447, -3.15784, -3.16052, -3.16265, -3.16435, -3.1657, -3.16678, -3.16763, -3.16831, -3.16885, -3.16928, -3.16962, -3.16989, -3.17011, -3.17028, -3.17041, -3.17052, -3.17061, -3.17068, -3.17073, -3.17077}, { -1.13251, -1.33634, -1.53428, -1.72529, -1.90823, -2.08194, -2.24527, -2.39718, -2.5368, -2.66352, -2.77704, -2.87741, -2.96499, -3.04048, -3.10478, -3.15899, -3.20424, -3.2417, -3.27249, -3.29764, -3.31808, -3.33462, -3.34796, -3.35868, -3.36728, -3.37416, -3.37966, -3.38405, -3.38756, -3.39035, -3.39257, -3.39434, -3.39574, -3.39686, -3.39775, -3.39846, -3.39902, -3.39947, -3.39982, -3.40011, -3.40033, -3.40051, -3.40065, -3.40076, -3.40085, -3.40092, -3.40098}, { -1.12545, -1.3348, -1.53934, -1.73814, -1.93016, -2.11426, -2.28928, -2.45408, -2.60759, -2.74891, -2.87741, 
-2.99272, -3.09485, -3.18412, -3.2612, -3.32696, -3.38246, -3.42885, -3.4673, -3.49893, -3.52479, -3.54582, -3.56284, -3.57658, -3.58762, -3.59648, -3.60357, -3.60925, -3.61377, -3.61738, -3.62026, -3.62255, -3.62438, -3.62583, -3.62698, -3.6279, -3.62863, -3.62921, -3.62967, -3.63004, -3.63033, -3.63056, -3.63075, -3.63089, -3.63101, -3.6311, -3.63117}, { -1.11987, -1.33358, -1.54338, -1.74847, -1.94792, -2.14069, -2.32567, -2.5017, -2.66762, -2.82235, -2.96499, -3.09485, -3.21154, -3.31504, -3.40563, -3.48395, -3.55084, -3.60736, -3.65465, -3.69388, -3.72617, -3.75259, -3.77408, -3.79149, -3.80553, -3.81683, -3.8259, -3.83316, -3.83897, -3.84361, -3.8473, -3.85025, -3.8526, -3.85447, -3.85595, -3.85713, -3.85807, -3.85882, -3.85942, -3.85989, -3.86026, -3.86056, -3.8608, -3.86099, -3.86114, -3.86126, -3.86135}, { -1.11546, -1.33261, -1.5466, -1.75675, -1.96226, -2.1622, -2.35556, -2.54122, -2.71801, -2.8848, -3.04048, -3.18412, -3.31504, -3.43281, -3.53737, -3.629, -3.70828, -3.77607, -3.83339, -3.88139, -3.92122, -3.95404, -3.9809, -4.00276, -4.02047, -4.03476, -4.04626, -4.0555, -4.06289, -4.0688, -4.07352, -4.07729, -4.08029, -4.08268, -4.08459, -4.0861, -4.08731, -4.08826, -4.08903, -4.08963, -4.09011, -4.0905, -4.0908, -4.09104, -4.09123, -4.09138, -4.09151}, { -1.11197, -1.33184, -1.54916, -1.76338, -1.97379, -2.17962, -2.37995, -2.57376, -2.75994, -2.93733, -3.10478, -3.2612, -3.40563, -3.53737, -3.65598, -3.76138, -3.85381, -3.93386, -4.00234, -4.0603, -4.10885, -4.14917, -4.1824, -4.20961, -4.23176, -4.24971, -4.2642, -4.27586, -4.28523, -4.29273, -4.29872, -4.30351, -4.30734, -4.31038, -4.31281, -4.31474, -4.31627, -4.3175, -4.31847, -4.31924, -4.31986, -4.32034, -4.32073, -4.32104, -4.32128, -4.32148, -4.32163}, { -1.10921, -1.33123, -1.55121, -1.76867, -1.98305, -2.19368, -2.39976, -2.60038, -2.79454, -2.98112, -3.15899, -3.32696, -3.48395, -3.629, -3.76138, -3.88065, -3.9867, -4.07977, -4.16041, -4.22945, -4.2879, -4.3369, -4.3776, -4.41116, 
-4.43864, -4.46102, -4.47916, -4.49381, -4.5056, -4.51507, -4.52265, -4.52872, -4.53356, -4.53742, -4.5405, -4.54296, -4.54491, -4.54646, -4.5477, -4.54868, -4.54947, -4.55009, -4.55058, -4.55097, -4.55128, -4.55153, -4.55173}, { -1.10702, -1.33074, -1.55283, -1.7729, -1.99047, -2.20499, -2.41577, -2.62204, -2.8229, -3.01733, -3.20424, -3.38246, -3.55084, -3.70828, -3.85381, -3.9867, -4.10649, -4.21306, -4.30662, -4.38774, -4.45721, -4.51606, -4.5654, -4.60641, -4.64022, -4.66792, -4.69049, -4.70878, -4.72355, -4.73544, -4.74499, -4.75264, -4.75876, -4.76365, -4.76755, -4.77065, -4.77313, -4.7751, -4.77667, -4.77792, -4.77891, -4.7797, -4.78032, -4.78082, -4.78122, -4.78153, -4.78178}, { -1.10529, -1.33036, -1.55412, -1.77627, -1.9964, -2.21406, -2.42868, -2.63959, -2.84602, -3.04705, -3.2417, -3.42885, -3.60736, -3.77607, -3.93386, -4.07977, -4.21306, -4.33325, -4.44022, -4.53419, -4.61567, -4.68549, -4.74465, -4.79427, -4.83552, -4.86954, -4.89741, -4.92012, -4.93853, -4.9534, -4.96537, -4.97499, -4.98269, -4.98885, -4.99377, -4.9977, -5.00083, -5.00332, -5.0053, -5.00688, -5.00814, -5.00914, -5.00993, -5.01056, -5.01107, -5.01147, -5.01178}, { -1.10391, -1.33005, -1.55515, -1.77895, -2.00114, -2.22133, -2.43906, -2.65376, -2.86477, -3.07131, -3.27249, -3.4673, -3.65465, -3.83339, -4.00234, -4.16041, -4.30662, -4.44022, -4.56074, -4.66803, -4.76231, -4.84409, -4.91418, -4.97359, -5.02342, -5.06486, -5.09904, -5.12706, -5.14988, -5.16839, -5.18334, -5.19537, -5.20504, -5.21278, -5.21897, -5.22392, -5.22787, -5.23102, -5.23352, -5.23552, -5.23711, -5.23837, -5.23938, -5.24017, -5.24081, -5.24131, -5.24172}, { -1.10282, -1.32981, -1.55597, -1.78109, -2.00492, -2.22714, -2.44737, -2.66515, -2.87992, -3.09101, -3.29764, -3.49893, -3.69388, -3.88139, -4.0603, -4.22945, -4.38774, -4.53419, -4.66803, -4.78881, -4.89635, -4.99087, -5.07289, -5.1432, -5.2028, -5.25281, -5.29439, -5.32871, -5.35683, -5.37974, -5.39832, -5.41334, -5.42542, -5.43513, -5.44291, -5.44913, 
-5.4541, -5.45806, -5.46122, -5.46374, -5.46574, -5.46734, -5.46861, -5.46962, -5.47042, -5.47106, -5.47156}, { -1.10195, -1.32962, -1.55662, -1.78279, -2.00793, -2.23178, -2.45403, -2.6743, -2.89212, -3.10693, -3.31808, -3.52479, -3.72617, -3.92122, -4.10885, -4.2879, -4.45721, -4.61567, -4.76231, -4.89635, -5.01732, -5.12507, -5.21979, -5.30199, -5.37247, -5.43222, -5.48237, -5.52408, -5.55849, -5.5867, -5.60969, -5.62833, -5.64339, -5.65552, -5.66525, -5.67306, -5.6793, -5.68429, -5.68827, -5.69144, -5.69396, -5.69598, -5.69758, -5.69885, -5.69986, -5.70067, -5.70131}, { -1.10126, -1.32946, -1.55713, -1.78414, -2.01033, -2.23548, -2.45935, -2.68162, -2.90191, -3.11977, -3.33462, -3.54582, -3.75259, -3.95404, -4.14917, -4.3369, -4.51606, -4.68549, -4.84409, -4.99087, -5.12507, -5.2462, -5.35411, -5.44898, -5.53133, -5.60194, -5.66182, -5.71208, -5.75388, -5.78837, -5.81665, -5.83969, -5.85838, -5.87348, -5.88564, -5.89541, -5.90323, -5.90949, -5.91449, -5.91848, -5.92166, -5.9242, -5.92621, -5.92782, -5.92909, -5.93011, -5.93092}, { -1.10072, -1.32934, -1.55754, -1.78522, -2.01224, -2.23843, -2.4636, -2.68748, -2.90977, -3.13008, -3.34796, -3.56284, -3.77408, -3.9809, -4.1824, -4.3776, -4.5654, -4.74465, -4.91418, -5.07289, -5.21979, -5.35411, -5.47537, -5.5834, -5.67839, -5.76086, -5.83158, -5.89155, -5.9419, -5.98377, -6.01833, -6.04666, -6.06975, -6.08848, -6.10361, -6.1158, -6.12558, -6.13342, -6.1397, -6.14471, -6.14871, -6.15189, -6.15443, -6.15645, -6.15806, -6.15934, -6.16036}, { -1.10028, -1.32924, -1.55787, -1.78608, -2.01376, -2.24078, -2.46698, -2.69215, -2.91605, -3.13835, -3.35868, -3.57658, -3.79149, -4.00276, -4.20961, -4.41116, -4.60641, -4.79427, -4.97359, -5.1432, -5.30199, -5.44898, -5.5834, -5.70476, -5.81289, -5.90798, -5.99054, -6.06134, -6.12139, -6.17181, -6.21374, -6.24836, -6.27673, -6.29986, -6.31861, -6.33377, -6.34597, -6.35578, -6.36363, -6.36991, -6.37493, -6.37894, -6.38213, -6.38467, -6.3867, -6.38831, -6.38959}, { -1.09994, 
-1.32917, -1.55813, -1.78676, -2.01497, -2.24265, -2.46968, -2.69588, -2.92106, -3.14496, -3.36728, -3.58762, -3.80553, -4.02047, -4.23176, -4.43864, -4.64022, -4.83552, -5.02342, -5.2028, -5.37247, -5.53133, -5.67839, -5.81289, -5.93433, -6.04254, -6.1377, -6.22033, -6.29121, -6.35132, -6.40179, -6.44377, -6.47843, -6.50683, -6.52999, -6.54877, -6.56395, -6.57617, -6.58598, -6.59385, -6.60014, -6.60516, -6.60917, -6.61237, -6.61492, -6.61695, -6.61856}, { -1.09967, -1.32911, -1.55833, -1.7873, -2.01593, -2.24414, -2.47183, -2.69886, -2.92507, -3.15025, -3.37416, -3.59648, -3.81683, -4.03476, -4.24971, -4.46102, -4.66792, -4.86954, -5.06486, -5.25281, -5.43222, -5.60194, -5.76086, -5.90798, -6.04254, -6.16404, -6.27231, -6.36754, -6.45023, -6.52116, -6.58132, -6.63183, -6.67385, -6.70854, -6.73697, -6.76015, -6.77895, -6.79414, -6.80637, -6.8162, -6.82407, -6.83037, -6.8354, -6.83942, -6.84262, -6.84517, -6.8472}, { -1.09945, -1.32906, -1.5585, -1.78773, -2.01669, -2.24532, -2.47353, -2.70122, -2.92826, -3.15447, -3.37966, -3.60357, -3.8259, -4.04626, -4.2642, -4.47916, -4.69049, -4.89741, -5.09904, -5.29439, -5.48237, -5.66182, -5.83158, -5.99054, -6.1377, -6.27231, -6.39386, -6.50219, -6.59746, -6.6802, -6.75117, -6.81137, -6.86191, -6.90396, -6.93867, -6.96713, -6.99033, -7.00914, -7.02435, -7.03659, -7.04642, -7.0543, -7.06061, -7.06564, -7.06966, -7.07286, -7.07542}, { -1.09928, -1.32902, -1.55863, -1.78807, -2.0173, -2.24626, -2.47489, -2.70311, -2.9308, -3.15784, -3.38405, -3.60925, -3.83316, -4.0555, -4.27586, -4.49381, -4.70878, -4.92012, -5.12706, -5.32871, -5.52408, -5.71208, -5.89155, -6.06134, -6.22033, -6.36754, -6.50219, -6.62378, -6.73214, -6.82745, -6.91022, -6.98123, -7.04146, -7.09203, -7.13411, -7.16884, -7.19731, -7.22052, -7.23935, -7.25456, -7.26682, -7.27666, -7.28454, -7.29085, -7.29589, -7.29991, -7.30311}, { -1.09914, -1.32899, -1.55873, -1.78834, -2.01778, -2.24701, -2.47598, -2.70461, -2.93282, -3.16052, -3.38756, -3.61377, -3.83897, 
-4.06289, -4.28523, -4.5056, -4.72355, -4.93853, -5.14988, -5.35683, -5.55849, -5.75388, -5.9419, -6.12139, -6.29121, -6.45023, -6.59746, -6.73214, -6.85376, -6.96216, -7.0575, -7.1403, -7.21133, -7.27159, -7.32218, -7.36428, -7.39902, -7.42751, -7.45073, -7.46957, -7.48479, -7.49705, -7.50689, -7.51478, -7.52109, -7.52614, -7.53016}, { -1.09903, -1.32896, -1.55881, -1.78855, -2.01816, -2.2476, -2.47684, -2.7058, -2.93444, -3.16265, -3.39035, -3.61738, -3.84361, -4.0688, -4.29273, -4.51507, -4.73544, -4.9534, -5.16839, -5.37974, -5.5867, -5.78837, -5.98377, -6.17181, -6.35132, -6.52116, -6.6802, -6.82745, -6.96216, -7.0838, -7.19222, -7.28759, -7.37041, -7.44147, -7.50174, -7.55235, -7.59446, -7.62922, -7.65772, -7.68095, -7.6998, -7.71502, -7.72729, -7.73713, -7.74503, -7.75134, -7.75639}, { -1.09895, -1.32895, -1.55888, -1.78873, -2.01847, -2.24808, -2.47752, -2.70675, -2.93572, -3.16435, -3.39257, -3.62026, -3.8473, -4.07352, -4.29872, -4.52265, -4.74499, -4.96537, -5.18334, -5.39832, -5.60969, -5.81665, -6.01833, -6.21374, -6.40179, -6.58132, -6.75117, -6.91022, -7.0575, -7.19222, -7.31389, -7.42233, -7.51772, -7.60056, -7.67163, -7.73192, -7.78254, -7.82466, -7.85943, -7.88794, -7.91118, -7.93003, -7.94526, -7.95753, -7.96738, -7.97528, -7.98159}, { -1.09888, -1.32893, -1.55893, -1.78886, -2.01871, -2.24845, -2.47806, -2.7075, -2.93674, -3.1657, -3.39434, -3.62255, -3.85025, -4.07729, -4.30351, -4.52872, -4.75264, -4.97499, -5.19537, -5.41334, -5.62833, -5.83969, -6.04666, -6.24836, -6.44377, -6.63183, -6.81137, -6.98123, -7.1403, -7.28759, -7.42233, -7.54401, -7.65246, -7.74787, -7.83072, -7.90181, -7.96211, -8.01274, -8.05488, -8.08965, -8.11817, -8.14141, -8.16027, -8.1755, -8.18777, -8.19763, -8.20553}, { -1.09882, -1.32892, -1.55897, -1.78897, -2.0189, -2.24875, -2.47849, -2.7081, -2.93755, -3.16678, -3.39574, -3.62438, -3.8526, -4.08029, -4.30734, -4.53356, -4.75876, -4.98269, -5.20504, -5.42542, -5.64339, -5.85838, -6.06975, -6.27673, -6.47843, 
-6.67385, -6.86191, -7.04146, -7.21133, -7.37041, -7.51772, -7.65246, -7.77416, -7.88263, -7.97804, -8.06091, -8.132, -8.19232, -8.24296, -8.2851, -8.31988, -8.3484, -8.37165, -8.39051, -8.40575, -8.41802, -8.42788}, { -1.09878, -1.32891, -1.559, -1.78906, -2.01906, -2.24899, -2.47884, -2.70858, -2.93819, -3.16763, -3.39686, -3.62583, -3.85447, -4.08268, -4.31038, -4.53742, -4.76365, -4.98885, -5.21278, -5.43513, -5.65552, -5.87348, -6.08848, -6.29986, -6.50683, -6.70854, -6.90396, -7.09203, -7.27159, -7.44147, -7.60056, -7.74787, -7.88263, -8.00433, -8.11281, -8.20823, -8.29111, -8.36221, -8.42253, -8.47318, -8.51533, -8.55012, -8.57864, -8.60189, -8.62076, -8.636, -8.64827}, { -1.09874, -1.3289, -1.55903, -1.78912, -2.01918, -2.24918, -2.47911, -2.70896, -2.9387, -3.16831, -3.39775, -3.62698, -3.85595, -4.08459, -4.31281, -4.5405, -4.76755, -4.99377, -5.21897, -5.44291, -5.66525, -5.88564, -6.10361, -6.31861, -6.52999, -6.73697, -6.93867, -7.13411, -7.32218, -7.50174, -7.67163, -7.83072, -7.97804, -8.11281, -8.23452, -8.34301, -8.43844, -8.52132, -8.59243, -8.65276, -8.70341, -8.74556, -8.78036, -8.80888, -8.83214, -8.851, -8.86625}, { -1.09872, -1.32889, -1.55905, -1.78918, -2.01927, -2.24933, -2.47933, -2.70926, -2.93911, -3.16885, -3.39846, -3.6279, -3.85713, -4.0861, -4.31474, -4.54296, -4.77065, -4.9977, -5.22392, -5.44913, -5.67306, -5.89541, -6.1158, -6.33377, -6.54877, -6.76015, -6.96713, -7.16884, -7.36428, -7.55235, -7.73192, -7.90181, -8.06091, -8.20823, -8.34301, -8.46472, -8.57322, -8.66866, -8.75154, -8.82266, -8.88299, -8.93365, -8.9758, -9.0106, -9.03913, -9.06239, -9.08126}, { -1.0987, -1.32889, -1.55907, -1.78922, -2.01935, -2.24945, -2.4795, -2.7095, -2.93943, -3.16928, -3.39902, -3.62863, -3.85807, -4.08731, -4.31627, -4.54491, -4.77313, -5.00083, -5.22787, -5.4541, -5.6793, -5.90323, -6.12558, -6.34597, -6.56395, -6.77895, -6.99033, -7.19731, -7.39902, -7.59446, -7.78254, -7.96211, -8.132, -8.29111, -8.43844, -8.57322, -8.69494, -8.80344, 
-8.89888, -8.98177, -9.05289, -9.11323, -9.16389, -9.20605, -9.24085, -9.26938, -9.29264}, { -1.09868, -1.32889, -1.55908, -1.78926, -2.01941, -2.24954, -2.47964, -2.70969, -2.93969, -3.16962, -3.39947, -3.62921, -3.85882, -4.08826, -4.3175, -4.54646, -4.7751, -5.00332, -5.23102, -5.45806, -5.68429, -5.90949, -6.13342, -6.35578, -6.57617, -6.79414, -7.00914, -7.22052, -7.42751, -7.62922, -7.82466, -8.01274, -8.19232, -8.36221, -8.52132, -8.66866, -8.80344, -8.92516, -9.03366, -9.12911, -9.21201, -9.28313, -9.34347, -9.39414, -9.43629, -9.4711, -9.49963}, { -1.09867, -1.32888, -1.55909, -1.78928, -2.01946, -2.24962, -2.47974, -2.70984, -2.93989, -3.16989, -3.39982, -3.62967, -3.85942, -4.08903, -4.31847, -4.5477, -4.77667, -5.0053, -5.23352, -5.46122, -5.68827, -5.91449, -6.1397, -6.36363, -6.58598, -6.80637, -7.02435, -7.23935, -7.45073, -7.65772, -7.85943, -8.05488, -8.24296, -8.42253, -8.59243, -8.75154, -8.89888, -9.03366, -9.15539, -9.2639, -9.35935, -9.44225, -9.51338, -9.57372, -9.62438, -9.66654, -9.70135}, { -1.09865, -1.32888, -1.5591, -1.7893, -2.0195, -2.24967, -2.47983, -2.70996, -2.94005, -3.17011, -3.40011, -3.63004, -3.85989, -4.08963, -4.31924, -4.54868, -4.77792, -5.00688, -5.23552, -5.46374, -5.69144, -5.91848, -6.14471, -6.36991, -6.59385, -6.8162, -7.03659, -7.25456, -7.46957, -7.68095, -7.88794, -8.08965, -8.2851, -8.47318, -8.65276, -8.82266, -8.98177, -9.12911, -9.2639, -9.38563, -9.49414, -9.58959, -9.67249, -9.74362, -9.80396, -9.85463, -9.8968}, { -1.09865, -1.32888, -1.5591, -1.78932, -2.01953, -2.24972, -2.4799, -2.71005, -2.94018, -3.17028, -3.40033, -3.63033, -3.86026, -4.09011, -4.31986, -4.54947, -4.77891, -5.00814, -5.23711, -5.46574, -5.69396, -5.92166, -6.14871, -6.37493, -6.60014, -6.82407, -7.04642, -7.26682, -7.48479, -7.6998, -7.91118, -8.11817, -8.31988, -8.51533, -8.70341, -8.88299, -9.05289, -9.21201, -9.35935, -9.49414, -9.61587, -9.72438, -9.81984, -9.90274, -9.97387, -10.0342, -10.0849}, { -1.09864, -1.32888, -1.55911, 
-1.78934, -2.01955, -2.24976, -2.47995, -2.71013, -2.94029, -3.17041, -3.40051, -3.63056, -3.86056, -4.0905, -4.32034, -4.55009, -4.7797, -5.00914, -5.23837, -5.46734, -5.69598, -5.9242, -6.15189, -6.37894, -6.60516, -6.83037, -7.0543, -7.27666, -7.49705, -7.71502, -7.93003, -8.14141, -8.3484, -8.55012, -8.74556, -8.93365, -9.11323, -9.28313, -9.44225, -9.58959, -9.72438, -9.84612, -9.95463, -10.0501, -10.133, -10.2041, -10.2645}, { -1.09863, -1.32888, -1.55911, -1.78935, -2.01957, -2.24979, -2.48, -2.71019, -2.94037, -3.17052, -3.40065, -3.63075, -3.8608, -4.0908, -4.32073, -4.55058, -4.78032, -5.00993, -5.23938, -5.46861, -5.69758, -5.92621, -6.15443, -6.38213, -6.60917, -6.8354, -7.06061, -7.28454, -7.50689, -7.72729, -7.94526, -8.16027, -8.37165, -8.57864, -8.78036, -8.9758, -9.16389, -9.34347, -9.51338, -9.67249, -9.81984, -9.95463, -10.0764, -10.1849, -10.2803, -10.3632, -10.4344}, { -1.09863, -1.32887, -1.55912, -1.78935, -2.01959, -2.24981, -2.48003, -2.71024, -2.94043, -3.17061, -3.40076, -3.63089, -3.86099, -4.09104, -4.32104, -4.55097, -4.78082, -5.01056, -5.24017, -5.46962, -5.69885, -5.92782, -6.15645, -6.38467, -6.61237, -6.83942, -7.06564, -7.29085, -7.51478, -7.73713, -7.95753, -8.1755, -8.39051, -8.60189, -8.80888, -9.0106, -9.20605, -9.39414, -9.57372, -9.74362, -9.90274, -10.0501, -10.1849, -10.3066, -10.4151, -10.5106, -10.5935}, { -1.09863, -1.32887, -1.55912, -1.78936, -2.0196, -2.24983, -2.48006, -2.71028, -2.94048, -3.17068, -3.40085, -3.63101, -3.86114, -4.09123, -4.32128, -4.55128, -4.78122, -5.01107, -5.24081, -5.47042, -5.69986, -5.92909, -6.15806, -6.3867, -6.61492, -6.84262, -7.06966, -7.29589, -7.52109, -7.74503, -7.96738, -8.18777, -8.40575, -8.62076, -8.83214, -9.03913, -9.24085, -9.43629, -9.62438, -9.80396, -9.97387, -10.133, -10.2803, -10.4151, -10.5369, -10.6454, -10.7408}, { -1.09862, -1.32887, -1.55912, -1.78937, -2.01961, -2.24985, -2.48008, -2.71031, -2.94052, -3.17073, -3.40092, -3.6311, -3.86126, -4.09138, -4.32148, 
-4.55153, -4.78153, -5.01147, -5.24131, -5.47106, -5.70067, -5.93011, -6.15934, -6.38831, -6.61695, -6.84517, -7.07286, -7.29991, -7.52614, -7.75134, -7.97528, -8.19763, -8.41802, -8.636, -8.851, -9.06239, -9.26938, -9.4711, -9.66654, -9.85463, -10.0342, -10.2041, -10.3632, -10.5106, -10.6454, -10.7671, -10.8756}, { -1.09862, -1.32887, -1.55912, -1.78937, -2.01962, -2.24986, -2.4801, -2.71033, -2.94056, -3.17077, -3.40098, -3.63117, -3.86135, -4.09151, -4.32163, -4.55173, -4.78178, -5.01178, -5.24172, -5.47156, -5.70131, -5.93092, -6.16036, -6.38959, -6.61856, -6.8472, -7.07542, -7.30311, -7.53016, -7.75639, -7.98159, -8.20553, -8.42788, -8.64827, -8.86625, -9.08126, -9.29264, -9.49963, -9.70135, -9.8968, -10.0849, -10.2645, -10.4344, -10.5935, -10.7408, -10.8756, -10.9974}}; const double qual_score[47] = { -2, -1.58147, -0.996843, -0.695524, -0.507676, -0.38013, -0.289268, -0.222552, -0.172557, -0.134552, -0.105361, -0.0827653, -0.0651742, -0.0514183, -0.0406248, -0.0321336, -0.0254397, -0.0201544, -0.0159759, -0.0126692, -0.0100503, -0.007975, -0.00632956, -0.00502447, -0.00398902, -0.00316729, -0.00251505, -0.00199726, -0.00158615, -0.00125972, -0.0010005, -0.000794644, -0.000631156, -0.000501313, -0.000398186, -0.000316278, -0.00025122, -0.000199546, -0.000158502, -0.0001259, -0.000100005, -7.9436e-05, -6.30977e-05, -5.012e-05, -3.98115e-05, -3.16233e-05, -2.51192e-05}; */ int longestBase = 1000; Alignment* alignment; if(pDataArray->align == "gotoh") { alignment = new GotohOverlap(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, longestBase); } else if(pDataArray->align == "needleman") { alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, longestBase); } else if(pDataArray->align == "kmer") { alignment = new KmerAlign(pDataArray->kmerSize); } pDataArray->count = 0; int num = 0; string thisfqualindexfile, thisrqualindexfile, thisffastafile, thisrfastafile; thisfqualindexfile = ""; 
    thisrqualindexfile = "";
    thisffastafile = pDataArray->inputFiles[0];
    thisrfastafile = pDataArray->inputFiles[1];
    if (pDataArray->qualOrIndexFiles.size() != 0) {
        thisfqualindexfile = pDataArray->qualOrIndexFiles[0];
        thisrqualindexfile = pDataArray->qualOrIndexFiles[1];
    }

    if (pDataArray->m->debug) {
        pDataArray->m->mothurOut("[DEBUG]: ffasta = " + thisffastafile + ".\n[DEBUG]: rfasta = " + thisrfastafile + ".\n[DEBUG]: fqualindex = " + thisfqualindexfile + ".\n[DEBUG]: rqualindex = " + thisrqualindexfile + ".\n");
    }

    ifstream inFFasta, inRFasta, inFQualIndex, inRQualIndex;
    ofstream outFasta, outMisMatch, outScrapFasta, outQual, outScrapQual;
    pDataArray->m->openInputFile(thisffastafile, inFFasta);
    pDataArray->m->openInputFile(thisrfastafile, inRFasta);

    bool begin = false;
    //print header if you are process 0
    if ((pDataArray->linesInput_start == 0) || (pDataArray->linesInput_start == 1)) {
        begin = true;
        inFFasta.seekg(0);
        inRFasta.seekg(0);
        pDataArray->m->zapGremlins(inFFasta);
        pDataArray->m->zapGremlins(inRFasta);
    }else { //this accounts for the difference in line endings.
        inFFasta.seekg(pDataArray->linesInput_start-1); pDataArray->m->gobble(inFFasta);
        inRFasta.seekg(pDataArray->linesInputReverse_start-1); pDataArray->m->gobble(inRFasta);
    }

    bool hasIndex = false;
    if (thisfqualindexfile != "") {
        if (thisfqualindexfile != "NONE") {
            pDataArray->m->openInputFile(thisfqualindexfile, inFQualIndex);
            if (begin) {
                inFQualIndex.seekg(0);
                pDataArray->m->zapGremlins(inFQualIndex);
            } else {
                inFQualIndex.seekg(pDataArray->qlinesInput_start-1); pDataArray->m->gobble(inFQualIndex);
            }
            hasIndex = true;
        }else { thisfqualindexfile = ""; }
        if (thisrqualindexfile != "NONE") {
            pDataArray->m->openInputFile(thisrqualindexfile, inRQualIndex);
            if (begin) {
                inRQualIndex.seekg(0);
                pDataArray->m->zapGremlins(inRQualIndex);
            } else {
                inRQualIndex.seekg(pDataArray->qlinesInputReverse_start-1); pDataArray->m->gobble(inRQualIndex);
            }
            hasIndex = true;
        }else { thisrqualindexfile = ""; }
    }

    pDataArray->m->openOutputFile(pDataArray->outputFasta, outFasta);
    pDataArray->m->openOutputFile(pDataArray->outputScrapFasta, outScrapFasta);
    pDataArray->m->openOutputFile(pDataArray->outputMisMatches, outMisMatch);
    bool hasQuality = false;
    outMisMatch << "Name\tLength\tOverlap_Length\tOverlap_Start\tOverlap_End\tMisMatches\tNum_Ns\n";
    if (pDataArray->delim == '@') { //fastq files so make an output quality
        pDataArray->m->openOutputFile(pDataArray->outputQual, outQual);
        pDataArray->m->openOutputFile(pDataArray->outputScrapQual, outScrapQual);
        hasQuality = true;
    }else if ((pDataArray->delim == '>') && (pDataArray->qualOrIndexFiles.size() != 0)) { //fasta and qual files
        pDataArray->m->openOutputFile(pDataArray->outputQual, outQual);
        pDataArray->m->openOutputFile(pDataArray->outputScrapQual, outScrapQual);
        hasQuality = true;
    }

    if(pDataArray->allFiles){
        for (int i = 0; i < pDataArray->fastaFileNames.size(); i++) { //clears old file
            for (int j = 0; j < pDataArray->fastaFileNames[i].size(); j++) { //clears old file
                if (pDataArray->fastaFileNames[i][j] != "") {
                    ofstream temp, temp2;
pDataArray->m->openOutputFile(pDataArray->fastaFileNames[i][j], temp); temp.close(); pDataArray->m->openOutputFile(pDataArray->qualFileNames[i][j], temp2); temp2.close(); } } } } Oligos oligos; if (pDataArray->oligosfile != "") { oligos.read(pDataArray->oligosfile, false); } int numFPrimers = oligos.getPairedPrimers().size(); int numBarcodes = oligos.getPairedBarcodes().size(); TrimOligos trimOligos(pDataArray->pdiffs, pDataArray->bdiffs, 0, 0, oligos.getPairedPrimers(), oligos.getPairedBarcodes(), hasIndex); TrimOligos* rtrimOligos = NULL; if (pDataArray->reorient) { rtrimOligos = new TrimOligos(pDataArray->pdiffs, pDataArray->bdiffs, 0, 0, oligos.getReorientedPairedPrimers(), oligos.getReorientedPairedBarcodes(), hasIndex); numBarcodes = oligos.getReorientedPairedBarcodes().size(); } for(int i = 0; i < pDataArray->linesInput_end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { break; } int success = 1; string trashCode = ""; string commentString = ""; int currentSeqsDiffs = 0; bool hasIndex = false; bool ignore; ignore = false; Sequence fSeq, rSeq; QualityScores* fQual = NULL; QualityScores* rQual = NULL; QualityScores* savedFQual = NULL; QualityScores* savedRQual = NULL; Sequence findexBarcode("findex", "NONE"); Sequence rindexBarcode("rindex", "NONE"); if (pDataArray->delim == '@') { //fastq files bool tignore; FastqRead fread(inFFasta, tignore, pDataArray->format); pDataArray->m->gobble(inFFasta); FastqRead rread(inRFasta, ignore, pDataArray->format); pDataArray->m->gobble(inRFasta); ///bool fixed = checkName(fread, rread); ////////////////////////////////////////////////////////////// bool fixed = false; if (fread.getName() == rread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = rread.getName().substr(0, rread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (rread.getName()[rread.getName().length()-1] == '2')) { fread.setName(tempFRead); rread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { FastqRead f2read(inFFasta, tignore, pDataArray->format); pDataArray->m->gobble(inFFasta); ///bool fixed = checkName(f2read, rread); ////////////////////////////////////////////////////////////// fixed = false; if (f2read.getName() == rread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = f2read.getName().substr(0, f2read.getName().length()-1); string tempRRead = rread.getName().substr(0, rread.getName().length()-1); if (tempFRead == tempRRead) { if ((f2read.getName()[f2read.getName().length()-1] == '1') && (rread.getName()[rread.getName().length()-1] == '2')) { f2read.setName(tempFRead); rread.setName(tempRRead); fixed = true; } } } if (!fixed) { FastqRead r2read(inRFasta, ignore, pDataArray->format); pDataArray->m->gobble(inRFasta); ///bool fixed = checkName(fread, r2read); ////////////////////////////////////////////////////////////// fixed = false; if (fread.getName() == r2read.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = r2read.getName().substr(0, r2read.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (r2read.getName()[r2read.getName().length()-1] == '2')) { fread.setName(tempFRead); r2read.setName(tempRRead); fixed = true; } } } if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. 
Ignoring, " + fread.getName() + ".\n"); ignore = true; } else { rread = r2read; } }else { fread = f2read; } ///////////////////////////////////////////////////////////// } if (tignore) { ignore=true; } fSeq.setName(fread.getName()); fSeq.setAligned(fread.getSeq()); rSeq.setName(rread.getName()); rSeq.setAligned(rread.getSeq()); fQual = new QualityScores(fread.getName(), fread.getScores()); rQual = new QualityScores(rread.getName(), rread.getScores()); savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (thisfqualindexfile != "") { //forward index file FastqRead firead(inFQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inFQualIndex); if (tignore) { ignore=true; } findexBarcode.setAligned(firead.getSeq()); ///bool fixed = checkName(fread, firead); ////////////////////////////////////////////////////////////// bool fixed = false; if (fread.getName() == firead.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = firead.getName().substr(0, firead.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (firead.getName()[firead.getName().length()-1] == '2')) { fread.setName(tempFRead); firead.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { FastqRead f2iread(inFQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inFQualIndex); if (tignore) { ignore=true; } ///bool fixed = checkName(fread, f2iread); ///////////////////////////////////////////////////////////// fixed = false; if (fread.getName() == f2iread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = f2iread.getName().substr(0, f2iread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (f2iread.getName()[f2iread.getName().length()-1] == '2')) { fread.setName(tempFRead); f2iread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward index file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { firead = f2iread; findexBarcode.setAligned(f2iread.getSeq()); } } hasIndex = true; } if (thisrqualindexfile != "") { //reverse index file FastqRead riread(inRQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inRQualIndex); if (tignore) { ignore=true; } rindexBarcode.setAligned(riread.getSeq()); ///bool fixed = checkName(fread, riread); ////////////////////////////////////////////////////////////// bool fixed = false; if (fread.getName() == riread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = riread.getName().substr(0, riread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (riread.getName()[riread.getName().length()-1] == '2')) { fread.setName(tempFRead); riread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { FastqRead r2iread(inRQualIndex, tignore, pDataArray->format); pDataArray->m->gobble(inRQualIndex); ///bool fixed = checkName(fread, r2iread); ///////////////////////////////////////////////////////////// fixed = false; if (fread.getName() == r2iread.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fread.getName().substr(0, fread.getName().length()-1); string tempRRead = r2iread.getName().substr(0, r2iread.getName().length()-1); if (tempFRead == tempRRead) { if ((fread.getName()[fread.getName().length()-1] == '1') && (r2iread.getName()[r2iread.getName().length()-1] == '2')) { fread.setName(tempFRead); r2iread.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in reverse index file. Ignoring, " + fread.getName() + ".\n"); ignore = true; }else { riread = r2iread; rindexBarcode.setAligned(riread.getSeq()); } } hasIndex = true; } }else { //reading fasta and maybe qual Sequence tfSeq(inFFasta); pDataArray->m->gobble(inFFasta); Sequence trSeq(inRFasta); pDataArray->m->gobble(inRFasta); ///bool fixed = checkName(fread, rread); ////////////////////////////////////////////////////////////// bool fixed = false; if (tfSeq.getName() == trSeq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = tfSeq.getName().substr(0, tfSeq.getName().length()-1); string tempRRead = trSeq.getName().substr(0, trSeq.getName().length()-1); if (tempFRead == tempRRead) { if ((tfSeq.getName()[tfSeq.getName().length()-1] == '1') && (trSeq.getName()[trSeq.getName().length()-1] == '2')) { tfSeq.setName(tempFRead); trSeq.setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { Sequence tf2Seq(inFFasta); pDataArray->m->gobble(inFFasta); ///bool fixed = checkName(f2read, rread); ////////////////////////////////////////////////////////////// fixed = false; if (tf2Seq.getName() == trSeq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = tf2Seq.getName().substr(0, tf2Seq.getName().length()-1); string tempRRead = trSeq.getName().substr(0, trSeq.getName().length()-1); if (tempFRead == tempRRead) { if ((tf2Seq.getName()[tf2Seq.getName().length()-1] == '1') && (trSeq.getName()[trSeq.getName().length()-1] == '2')) { tf2Seq.setName(tempFRead); trSeq.setName(tempRRead); fixed = true; } } } if (!fixed) { Sequence tr2Seq(inRFasta); pDataArray->m->gobble(inRFasta); ///bool fixed = checkName(fread, r2read); ////////////////////////////////////////////////////////////// fixed = false; if (tfSeq.getName() == tr2Seq.getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? string tempFRead = tfSeq.getName().substr(0, tfSeq.getName().length()-1); string tempRRead = tr2Seq.getName().substr(0, tr2Seq.getName().length()-1); if (tempFRead == tempRRead) { if ((tfSeq.getName()[tfSeq.getName().length()-1] == '1') && (tr2Seq.getName()[tr2Seq.getName().length()-1] == '2')) { tfSeq.setName(tempFRead); tr2Seq.setName(tempRRead); fixed = true; } } } if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fastq file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } else { trSeq = tr2Seq; } }else { tfSeq = tf2Seq; } ///////////////////////////////////////////////////////////// } fSeq.setName(tfSeq.getName()); fSeq.setAligned(tfSeq.getAligned()); rSeq.setName(trSeq.getName()); rSeq.setAligned(trSeq.getAligned()); if (thisfqualindexfile != "") { fQual = new QualityScores(inFQualIndex); pDataArray->m->gobble(inFQualIndex); rQual = new QualityScores(inRQualIndex); pDataArray->m->gobble(inRQualIndex); if (fQual->getName() != rQual->getName()) { ///bool fixed = checkName(fread, rread); ////////////////////////////////////////////////////////////// bool fixed = false; if (fQual->getName() == rQual->getName()) { fixed = true; }else { //if no match are the names only different by 1 and 2? 
string tempFRead = fQual->getName().substr(0, fQual->getName().length()-1); string tempRRead = rQual->getName().substr(0, rQual->getName().length()-1); if (tempFRead == tempRRead) { if ((fQual->getName()[fQual->getName().length()-1] == '1') && (rQual->getName()[rQual->getName().length()-1] == '2')) { fQual->setName(tempFRead); rQual->setName(tempRRead); fixed = true; } } } ///////////////////////////////////////////////////////////// if (!fixed) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse qfile file. Ignoring, " + fQual->getName() + ".\n"); ignore = true; } } savedFQual = new QualityScores(fQual->getName(), fQual->getQualityScores()); savedRQual = new QualityScores(rQual->getName(), rQual->getQualityScores()); if (fQual->getName() != tfSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward quality file. Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } if (rQual->getName() != trSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in reverse quality file. Ignoring, " + trSeq.getName() + ".\n"); ignore = true; } } if (tfSeq.getName() != trSeq.getName()) { pDataArray->m->mothurOut("[WARNING]: name mismatch in forward and reverse fasta file. 
Ignoring, " + tfSeq.getName() + ".\n"); ignore = true; } } int barcodeIndex = 0; int primerIndex = 0; if (!ignore) { Sequence savedFSeq(fSeq.getName(), fSeq.getAligned()); Sequence savedRSeq(rSeq.getName(), rSeq.getAligned()); Sequence savedFindex(findexBarcode.getName(), findexBarcode.getAligned()); Sequence savedRIndex(rindexBarcode.getName(), rindexBarcode.getAligned()); if(numBarcodes != 0){ vector results; if (hasQuality) { if (hasIndex) { results = trimOligos.stripBarcode(findexBarcode, rindexBarcode, *fQual, *rQual, barcodeIndex); }else { results = trimOligos.stripBarcode(fSeq, rSeq, *fQual, *rQual, barcodeIndex); } }else { results = trimOligos.stripBarcode(fSeq, rSeq, barcodeIndex); } success = results[0] + results[2]; commentString += "fbdiffs=" + toString(results[0]) + "(" + trimOligos.getCodeValue(results[1], pDataArray->bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + trimOligos.getCodeValue(results[3], pDataArray->bdiffs) + ") "; if(success > pDataArray->bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } if(numFPrimers != 0){ vector results; if (hasQuality) { results = trimOligos.stripForward(fSeq, rSeq, *fQual, *rQual, primerIndex); }else { results = trimOligos.stripForward(fSeq, rSeq, primerIndex); } success = results[0] + results[2]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos.getCodeValue(results[1], pDataArray->pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + trimOligos.getCodeValue(results[3], pDataArray->pdiffs) + ") "; if(success > pDataArray->pdiffs) { trashCode += 'f'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > pDataArray->tdiffs) { trashCode += 't'; } if (pDataArray->reorient && (trashCode != "")) { //if you failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; string thiscommentString = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; if(numBarcodes != 0){ vector results; if (hasQuality) { if 
(hasIndex) { results = rtrimOligos->stripBarcode(savedFindex, savedRIndex, *savedFQual, *savedRQual, thisBarcodeIndex); }else { results = rtrimOligos->stripBarcode(savedFSeq, savedRSeq, *savedFQual, *savedRQual, thisBarcodeIndex); } }else { results = rtrimOligos->stripBarcode(savedFSeq, savedRSeq, thisBarcodeIndex); } thisSuccess = results[0] + results[2]; thiscommentString += "fbdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pDataArray->bdiffs) + ") "; if(thisSuccess > pDataArray->bdiffs) { thisTrashCode += 'b'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if(numFPrimers != 0){ vector results; if (hasQuality) { results = rtrimOligos->stripForward(savedFSeq, savedRSeq, *savedFQual, *savedRQual, thisPrimerIndex); }else { results = rtrimOligos->stripForward(savedFSeq, savedRSeq, thisPrimerIndex); } thisSuccess = results[0] + results[2]; thiscommentString += "fpdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pDataArray->pdiffs) + ") "; if(thisSuccess > pDataArray->pdiffs) { thisTrashCode += 'f'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if (thisCurrentSeqsDiffs > pDataArray->tdiffs) { thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; commentString = thiscommentString; barcodeIndex = thisBarcodeIndex; primerIndex = thisPrimerIndex; savedFSeq.reverseComplement(); savedRSeq.reverseComplement(); fSeq.setAligned(savedFSeq.getAligned()); rSeq.setAligned(savedRSeq.getAligned()); if(hasQuality){ savedFQual->flipQScores(); savedRQual->flipQScores(); fQual->setScores(savedFQual->getScores()); rQual->setScores(savedRQual->getScores()); } }else { trashCode += "(" + thisTrashCode + ")"; } } //flip the reverse reads 
rSeq.reverseComplement(); if (hasQuality) { rQual->flipQScores(); } //pairwise align alignment->align(fSeq.getUnaligned(), rSeq.getUnaligned()); map ABaseMap = alignment->getSeqAAlnBaseMap(); map BBaseMap = alignment->getSeqBAlnBaseMap(); fSeq.setAligned(alignment->getSeqAAln()); rSeq.setAligned(alignment->getSeqBAln()); int length = fSeq.getAligned().length(); //traverse alignments merging into one contiguous seq string contig = ""; int numMismatches = 0; string seq1 = fSeq.getAligned(); string seq2 = rSeq.getAligned(); vector scores1, scores2, contigScores; if (hasQuality) { scores1 = fQual->getQualityScores(); scores2 = rQual->getQualityScores(); delete fQual; delete rQual; delete savedFQual; delete savedRQual; } // if (num < 5) { cout << fSeq.getStartPos() << '\t' << fSeq.getEndPos() << '\t' << rSeq.getStartPos() << '\t' << rSeq.getEndPos() << endl; } int overlapStart = fSeq.getStartPos()-1; int seq2Start = rSeq.getStartPos()-1; //bigger of the 2 starting positions is the location of the overlapping start if (overlapStart < seq2Start) { //seq2 starts later so take from 0 to seq2Start from seq1 overlapStart = seq2Start; for (int i = 0; i < overlapStart; i++) { contig += seq1[i]; if (hasQuality) { if (((seq1[i] != '-') && (seq1[i] != '.'))) { contigScores.push_back(scores1[ABaseMap[i]]); } } } }else { //seq1 starts later so take from 0 to overlapStart from seq2 for (int i = 0; i < overlapStart; i++) { contig += seq2[i]; if (hasQuality) { if (((seq2[i] != '-') && (seq2[i] != '.'))) { contigScores.push_back(scores2[BBaseMap[i]]); } } } } int seq1End = fSeq.getEndPos(); int seq2End = rSeq.getEndPos(); int overlapEnd = seq1End; if (seq2End < overlapEnd) { overlapEnd = seq2End; } //smallest end position is where overlapping ends int firstForward = 0; int seq2FirstForward = 0; int lastReverse = seq1.length(); int seq2lastReverse = seq2.length(); bool firstChooseSeq1 = false; bool lastChooseSeq1 = false; if (hasQuality) { for (int i = 0; i < seq1.length(); i++) { if 
((seq1[i] != '.') && (seq1[i] != '-')) { if (scores1[ABaseMap[i]] == 2) { firstForward++; }else { break; } } } for (int i = 0; i < seq2.length(); i++) { if ((seq2[i] != '.') && (seq2[i] != '-')) { if (scores2[BBaseMap[i]] == 2) { seq2FirstForward++; }else { break; } } } if (seq2FirstForward > firstForward) { firstForward = seq2FirstForward; firstChooseSeq1 = true; } for (int i = seq1.length()-1; i >= 0; i--) { if ((seq1[i] != '.') && (seq1[i] != '-')) { if (scores1[ABaseMap[i]] == 2) { lastReverse--; }else { break; } } } for (int i = seq2.length()-1; i >= 0; i--) { if ((seq2[i] != '.') && (seq2[i] != '-')) { if (scores2[BBaseMap[i]] == 2) { seq2lastReverse--; }else { break; } } } if (lastReverse > seq2lastReverse) { lastReverse = seq2lastReverse; lastChooseSeq1 = true; } } int oStart = contig.length(); //cout << fSeq.getAligned() << endl; cout << rSeq.getAligned() << endl; for (int i = overlapStart; i < overlapEnd; i++) { //cout << seq1[i] << ' ' << seq2[i] << ' ' << scores1[ABaseMap[i]] << ' ' << scores2[BBaseMap[i]] << endl; if (seq1[i] == seq2[i]) { //match, add base and choose highest score contig += seq1[i]; if (hasQuality) { //contigScores.push_back(convertProb(qual_match_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])])); ///////////////////////////////////////////////////////////// int qualScore = 1; double qProb = qual_match_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])]; int lower = 0; int upper = 46; if (qProb < qual_score[0]) { qualScore = 1; } else { while (lower < upper) { int mid = lower + (upper - lower) / 2; if (qual_score[mid] == qProb) { qualScore = mid; lower = upper; } if (mid == lower) { qualScore = lower; lower = upper; } else if (qual_score[mid] > qProb) { upper = mid; } else if (qual_score[mid] < qProb) { lower = mid + 1; } } } qualScore = lower; contigScores.push_back(qualScore); //////////////////////////////////////////////////////////// } }else if (((seq1[i] == '.') 
|| (seq1[i] == '-')) && ((seq2[i] != '-') && (seq2[i] != '.'))) { //seq1 is a gap and seq2 is a base, choose seq2, unless quality score for base is below insert. In that case eliminate base if (hasQuality) { if (scores2[BBaseMap[i]] <= pDataArray->insert) { } // else { contig += seq2[i]; contigScores.push_back(scores2[BBaseMap[i]]); } }else { contig += seq2[i]; } //with no quality info, then we keep it? }else if (((seq2[i] == '.') || (seq2[i] == '-')) && ((seq1[i] != '-') && (seq1[i] != '.'))) { //seq2 is a gap and seq1 is a base, choose seq1, unless quality score for base is below insert. In that case eliminate base if (hasQuality) { if (scores1[ABaseMap[i]] <= pDataArray->insert) { } // else { contig += seq1[i]; contigScores.push_back(scores1[ABaseMap[i]]); } }else { contig += seq1[i]; } //with no quality info, then we keep it? }else if (((seq1[i] != '-') && (seq1[i] != '.')) && ((seq2[i] != '-') && (seq2[i] != '.'))) { //both bases choose one with better quality if (hasQuality) { if (abs(scores1[ABaseMap[i]] - scores2[BBaseMap[i]]) >= pDataArray->deltaq) { //is the difference in qual scores >= deltaq, if yes choose base with higher score char c = seq1[i]; if (scores1[ABaseMap[i]] < scores2[BBaseMap[i]]) { c = seq2[i]; } contig += c; if ((i >= firstForward) && (i <= lastReverse)) { //in unmasked section //contigScores.push_back(convertProb(qual_mismatch_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])])); ///////////////////////////////////////////////////////////// int qualScore = 1; double qProb = qual_mismatch_simple_bayesian[PHREDCLAMP(scores1[ABaseMap[i]])][PHREDCLAMP(scores2[BBaseMap[i]])]; int lower = 0; int upper = 46; if (qProb < qual_score[0]) { qualScore = 1; } else { while (lower < upper) { int mid = lower + (upper - lower) / 2; if (qual_score[mid] == qProb) { qualScore = mid; lower = upper; } if (mid == lower) { qualScore = lower; lower = upper; } else if (qual_score[mid] > qProb) { upper = mid; } else if 
(qual_score[mid] < qProb) { lower = mid + 1; } } } qualScore = lower; contigScores.push_back(qualScore); //////////////////////////////////////////////////////////// }else if (i < firstForward) { if (firstChooseSeq1) { contigScores.push_back(scores1[ABaseMap[i]]); } else { contigScores.push_back(scores2[BBaseMap[i]]); } }else if ((i > lastReverse)) { if (lastChooseSeq1) { contigScores.push_back(scores1[ABaseMap[i]]); } else { contigScores.push_back(scores2[BBaseMap[i]]); } }else { contigScores.push_back(2); } //N }else { //if no, base becomes n contig += 'N'; contigScores.push_back(2); } numMismatches++; }else { numMismatches++; } //cant decide, so eliminate and mark as mismatch }else { //should never get here pDataArray->m->mothurOut("[ERROR]: case I didn't think of seq1 = " + toString(seq1[i]) + " and seq2 = " + toString(seq2[i]) + "\n"); } } int oend = contig.length(); if (seq1End < seq2End) { //seq1 ends before seq2 so take from overlap to length from seq2 for (int i = overlapEnd; i < length; i++) { contig += seq2[i]; if (hasQuality) { if (((seq2[i] != '-') && (seq2[i] != '.'))) { contigScores.push_back(scores2[BBaseMap[i]]); } } } }else { //seq2 ends before seq1 so take from overlap to length from seq1 for (int i = overlapEnd; i < length; i++) { contig += seq1[i]; if (hasQuality) { if (((seq1[i] != '-') && (seq1[i] != '.'))) { contigScores.push_back(scores1[ABaseMap[i]]); } } } } //cout << contig << endl; //exit(1); if (pDataArray->trimOverlap) { contig = contig.substr(overlapStart, oend-oStart); if (contig.length() == 0) { trashCode += "l"; } if (hasQuality) { vector newContigScores; for (int i = overlapStart; i < oend; i++) { newContigScores.push_back(contigScores[i]); } contigScores = newContigScores; } } if(trashCode.length() == 0){ bool ignore = false; if (pDataArray->m->debug) { pDataArray->m->mothurOut(fSeq.getName()); } if (pDataArray->createOligosGroup) { string thisGroup = oligos.getGroupName(barcodeIndex, primerIndex); if (pDataArray->m->debug) { 
pDataArray->m->mothurOut(", group= " + thisGroup + "\n"); } int pos = thisGroup.find("ignore"); if (pos == string::npos) { pDataArray->groupMap[fSeq.getName()] = thisGroup; map::iterator it = pDataArray->groupCounts.find(thisGroup); if (it == pDataArray->groupCounts.end()) { pDataArray->groupCounts[thisGroup] = 1; } else { pDataArray->groupCounts[it->first] ++; } }else { ignore = true; } }else if (pDataArray->createFileGroup) { //for 3 column file option int pos = pDataArray->group.find("ignore"); if (pos == string::npos) { pDataArray->groupMap[fSeq.getName()] = pDataArray->group; map::iterator it = pDataArray->groupCounts.find(pDataArray->group); if (it == pDataArray->groupCounts.end()) { pDataArray->groupCounts[pDataArray->group] = 1; } else { pDataArray->groupCounts[it->first] ++; } }else { ignore = true; } } if (pDataArray->m->debug) { pDataArray->m->mothurOut("\n"); } if(!ignore){ //output outFasta << ">" << fSeq.getName() << '\t' << commentString << endl << contig << endl; if (hasQuality) { outQual << ">" << fSeq.getName() << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { outQual << contigScores[i] << " "; } outQual << endl; } int numNs = 0; for (int i = 0; i < contig.length(); i++) { if (contig[i] == 'N') { numNs++; } } outMisMatch << fSeq.getName() << '\t' << contig.length() << '\t' << (oend-oStart) << '\t' << oStart << '\t' << oend << '\t' << numMismatches << '\t' << numNs << endl; if (pDataArray->allFiles) { ofstream output; pDataArray->m->openOutputFileAppend(pDataArray->fastaFileNames[barcodeIndex][primerIndex], output); output << ">" << fSeq.getName() << '\t' << commentString << endl << contig << endl; output.close(); if (hasQuality) { ofstream output2; pDataArray->m->openOutputFileAppend(pDataArray->qualFileNames[barcodeIndex][primerIndex], output2); output2 << ">" << fSeq.getName() << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { output2 << contigScores[i] << " "; } output2 << endl; 
output2.close(); } } } }else { //output outScrapFasta << ">" << fSeq.getName() << " | " << trashCode << '\t' << commentString << endl << contig << endl; if (hasQuality) { outScrapQual << ">" << fSeq.getName() << " | " << trashCode << '\t' << commentString << endl; for (int i = 0; i < contigScores.size(); i++) { outScrapQual << contigScores[i] << " "; } outScrapQual << endl; } } } pDataArray->count++; //report progress if((pDataArray->count) % 1000 == 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); } } //report progress if((pDataArray->count) % 1000 != 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); } inFFasta.close(); inRFasta.close(); outFasta.close(); outScrapFasta.close(); outMisMatch.close(); if (pDataArray->delim == '@') { if (thisfqualindexfile != "") { inFQualIndex.close(); } if (thisrqualindexfile != "") { inRQualIndex.close(); } outQual.close(); outScrapQual.close(); }else{ if (hasQuality) { inFQualIndex.close(); inRQualIndex.close(); outQual.close(); outScrapQual.close(); } } delete alignment; if (pDataArray->reorient) { delete rtrimOligos; } pDataArray->done = true; if (pDataArray->m->control_pressed) { pDataArray->m->mothurRemove(pDataArray->outputFasta); pDataArray->m->mothurRemove(pDataArray->outputMisMatches); pDataArray->m->mothurRemove(pDataArray->outputScrapFasta); if (hasQuality) { pDataArray->m->mothurRemove(pDataArray->outputQual); pDataArray->m->mothurRemove(pDataArray->outputScrapQual); } } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "MakeContigsCommand", "MyContigsThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/makefastqcommand.cpp000066400000000000000000000230701255543666200222600ustar00rootroot00000000000000/* * makefastqcommand.cpp * mothur * * Created by westcott on 2/14/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 *
 */

#include "makefastqcommand.h"
#include "sequence.hpp"
#include "qualityscores.h"

//**********************************************************************************************************************
vector<string> MakeFastQCommand::setParameters(){
	try {
		CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fastq",false,true,true); parameters.push_back(pfasta);
		CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "none", "none","fastq",false,true,true); parameters.push_back(pqfile);
		CommandParameter pformat("format", "Multiple", "sanger-illumina-illumina1.8+", "sanger", "", "", "","",false,false); parameters.push_back(pformat);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) {	myArray.push_back(parameters[i].name);		}
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeFastQCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string MakeFastQCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The make.fastq command reads a fasta and quality file and creates a fastq file.\n";
		helpString += "The make.fastq command parameters are fasta, qfile and format. fasta and qfile are required.\n";
		helpString += "The format parameter is used to indicate whether your sequences are sanger, illumina1.8+ or illumina, default=sanger.\n";
		helpString += "The make.fastq command should be in the following format: make.fastq(qfile=yourQualityFile, fasta=yourFasta).\n";
		helpString += "Example make.fastq(fasta=amazon.fasta, qfile=amazon.qual).\n";
		helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFasta).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeFastQCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string MakeFastQCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "fastq") {  pattern = "[filename],fastq"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true;  }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeFastQCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
MakeFastQCommand::MakeFastQCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["fastq"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "MakeFastQCommand", "MakeFastQCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
MakeFastQCommand::MakeFastQCommand(string option)  {
	try {
		abort = false; calledHelp = false;

		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option == "citation") { citation(); abort = true; calledHelp = true;}

		else {
			vector<string> myArray = setParameters();

			OptionParser parser(option);
			map<string, string> parameters = parser.getParameters();

			ValidParameters validParameter;
			map<string, string>::iterator it;

			//check to make sure all parameters are valid for command
			for (it = parameters.begin(); it != parameters.end(); it++) {
				if (validParameter.isValidParameter(it->first, myArray, it->second) != true) {  abort = true;  }
			}

			//initialize outputTypes
			vector<string> tempOutNames;
			outputTypes["fastq"] = tempOutNames;

			//if the user changes the input directory command factory will send this info to us in the output parameter
			string inputDir = validParameter.validFile(parameters, "inputdir", false);
			if (inputDir == "not found"){	inputDir = "";		}
			else {
				string path;
				it = parameters.find("fasta");
				//user has given a template file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path then, add inputdir. else leave path alone.
					if (path == "") {	parameters["fasta"] = inputDir + it->second;		}
				}

				it = parameters.find("qfile");
				//user has given a template file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path then, add inputdir. else leave path alone.
if (path == "") { parameters["qfile"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; fastafile = ""; } else if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastafile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { abort = true; qualfile = ""; } else if (qualfile == "not found") { qualfile = m->getQualFile(); if (qualfile != "") { m->mothurOut("Using " + qualfile + " as input file for the qfile parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current qualfile and the qfile parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setQualFile(qualfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(fastafile); } format = validParameter.validFile(parameters, "format", false); if (format == "not found"){ format = "sanger"; } if ((format != "sanger") && (format != "illumina") && (format != "illumina1.8+")) { m->mothurOut(format + " is not a valid format. Your format choices are sanger, illumina1.8+ and illumina, aborting." 
); m->mothurOutEndLine(); abort=true; } } } catch(exception& e) { m->errorOut(e, "MakeFastQCommand", "MakeFastQCommand"); exit(1); } } //********************************************************************************************************************** int MakeFastQCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); string outputFile = getOutputFileName("fastq",variables); outputNames.push_back(outputFile); outputTypes["fastq"].push_back(outputFile); ofstream out; m->openOutputFile(outputFile, out); ifstream qFile; m->openInputFile(qualfile, qFile); ifstream fFile; m->openInputFile(fastafile, fFile); while (!fFile.eof() && !qFile.eof()) { if (m->control_pressed) { break; } Sequence currSeq(fFile); m->gobble(fFile); QualityScores currQual(qFile); m->gobble(qFile); if (currSeq.getName() != currQual.getName()) { m->mothurOut("[ERROR]: mismatch between fasta and quality files. 
Found " + currSeq.getName() + " in fasta file and " + currQual.getName() + " in quality file."); m->mothurOutEndLine(); m->control_pressed = true; } else { //print sequence out << '@' << currSeq.getName() << endl << currSeq.getAligned() << endl; string qualityString = convertQual(currQual.getQualityScores()); //print quality info out << '+' << currQual.getName() << endl << qualityString << endl; } } fFile.close(); qFile.close(); out.close(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MakeFastQCommand", "execute"); exit(1); } } //********************************************************************************************************************** string MakeFastQCommand::convertQual(vector qual) { try { string qualScores; for (int i = 0; i < qual.size(); i++) { int controlChar = int('!'); if (format == "illumina") { controlChar = int('@'); } int temp = qual[i] + controlChar; char qualChar = (char) temp; qualScores += qualChar; } return qualScores; } catch(exception& e) { m->errorOut(e, "MakeFastQCommand", "convertQual"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/makefastqcommand.h000066400000000000000000000016721255543666200217310ustar00rootroot00000000000000#ifndef MAKEFASTQCOMMAND_H #define MAKEFASTQCOMMAND_H /* * makefastqcommand.h * mothur * * Created by westcott on 2/14/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" class MakeFastQCommand : public Command { public: MakeFastQCommand(string); MakeFastQCommand(); ~MakeFastQCommand(){} vector setParameters(); string getCommandName() { return "make.fastq"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Make.fastq"; } string getDescription() { return "creates a fastq file from a fasta and quality file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string fastafile, qualfile, outputDir, format; bool abort; vector outputNames; string convertQual(vector); }; #endif mothur-1.36.1/source/commands/makefilecommand.cpp000066400000000000000000000253401255543666200220630ustar00rootroot00000000000000// // makefilecommand.cpp // Mothur // // Created by Sarah Westcott on 6/24/15. // Copyright (c) 2015 Schloss Lab. All rights reserved. // #include "makefilecommand.h" //********************************************************************************************************************** vector MakeFileCommand::setParameters(){ try { CommandParameter ptype("type", "Multiple", "fastq-gz", "fastq", "", "", "","",false,false); parameters.push_back(ptype); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MakeFileCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MakeFileCommand::getHelpString(){ try { 
string helpString = ""; helpString += "The make.file command takes an input directory and creates a file file containing the fastq or gz files in the directory.\n"; helpString += "The make.file command parameters are inputdir and type. inputdir is required.\n"; helpString += "May create more than one file. Mothur will attempt to match paired files. \n"; helpString += "The type parameter allows you to set the type of files to look for. Options are fastq or gz. Default=fastq. \n"; helpString += "The make.file command should be in the following format: \n"; helpString += "make.file(inputdir=yourInputDirectory). \n"; helpString += "Example make.file(inputdir=fastqFiles)\n"; helpString += "Note: No spaces between parameter labels (i.e. inputdir), '=' and parameters (i.e. yourInputDirectory).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MakeFileCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MakeFileCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "file") { pattern = "[filename],[tag],file"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MakeFileCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MakeFileCommand::MakeFileCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["file"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MakeFileCommand", "MakeFileCommand"); exit(1); } } //********************************************************************************************************************** MakeFileCommand::MakeFileCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help 
if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["file"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; m->mothurOut("[ERROR]: The inputdir parameter is required, aborting."); m->mothurOutEndLine(); abort = true; } else { if (m->dirCheck(inputDir)) {} // all set else { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter typeFile = validParameter.validFile(parameters, "type", false); if (typeFile == "not found"){ typeFile = "fastq"; } if ((typeFile != "fastq") && (typeFile != "gz")) { m->mothurOut(typeFile + " is not a valid type. Options are fastq or gz. 
I will use fastq."); m->mothurOutEndLine(); typeFile = "fastq"; } } } catch(exception& e) { m->errorOut(e, "MakeFileCommand", "MakeFileCommand"); exit(1); } } //********************************************************************************************************************** int MakeFileCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //find all .fastq files string tempFile = inputDir + "fileList.temp"; string findCommand = "find \"" + inputDir.substr(0, inputDir.length()-1) + "\" -maxdepth 1 -name \"*." + typeFile + "\" > \"" + tempFile + "\""; system(findCommand.c_str()); //read in list of files vector fastqFiles; m->readAccnos(tempFile, fastqFiles, "no error"); m->mothurRemove(tempFile); if (fastqFiles.size() == 0) { m->mothurOut("[WARNING]: Unable to find any " + typeFile + " files in your directory.\n"); } else { //sort into alpha order to put pairs togther if they exist sort(fastqFiles.begin(), fastqFiles.end()); vector< vector > paired; vector singles; string lastFile = ""; for (int i = 0; i < fastqFiles.size()-1; i++) { if (m->debug) { m->mothurOut("[DEBUG]: File " + toString(i) + " = " + fastqFiles[i] + ".\n"); } if (m->control_pressed) { break; } string simpleName1 = m->getRootName(m->getSimpleName(fastqFiles[i])); string simpleName2 = m->getRootName(m->getSimpleName(fastqFiles[i+1])); //possible pair if (simpleName1.length() == simpleName2.length()) { int numDiffs = 0; for (int j = 0; j < simpleName1.length(); j++) { if (numDiffs > 1) { break; } else if (simpleName1[j] != simpleName2[j]) { numDiffs++; } } if (numDiffs > 1) { singles.push_back(fastqFiles[i]); lastFile = fastqFiles[i]; } else { //only one diff = paired files int pos = simpleName1.find("R1"); int pos2 = simpleName2.find("R2"); if ((pos != string::npos) && (pos2 != string::npos)){ vector temp; temp.push_back(fastqFiles[i]); temp.push_back(fastqFiles[i+1]); lastFile = fastqFiles[i+1]; paired.push_back(temp); i++; }else { 
singles.push_back(fastqFiles[i]); lastFile = fastqFiles[i]; } } }else{ singles.push_back(fastqFiles[i]); lastFile = fastqFiles[i]; } } if (lastFile != fastqFiles[fastqFiles.size()-1]) { singles.push_back(fastqFiles[fastqFiles.size()-1]); } if (singles.size() != 0) { map variables; variables["[filename]"] = outputDir + "fileList."; variables["[tag]"] = "single"; string filename = getOutputFileName("file",variables); ofstream out; m->openOutputFile(filename, out); outputNames.push_back(filename); outputTypes["file"].push_back(filename); m->setFileFile(filename); for (int i = 0; i < singles.size(); i++) { out << singles[i] << endl; } out.close(); } if (paired.size() != 0) { map variables; variables["[filename]"] = outputDir + "fileList."; variables["[tag]"] = "paired"; string filename = getOutputFileName("file",variables); ofstream out; m->openOutputFile(filename, out); outputNames.push_back(filename); outputTypes["file"].push_back(filename); m->setFileFile(filename); for (int i = 0; i < paired.size(); i++) { for (int j = 0; j < paired[i].size(); j++) { out << paired[i][j] << '\t'; } out << endl; } out.close(); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MakeFileCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/makefilecommand.h000066400000000000000000000017401255543666200215260ustar00rootroot00000000000000// // makefilecommand.h // Mothur // // Created by Sarah Westcott on 6/24/15. // Copyright (c) 2015 Schloss Lab. All rights reserved. 
// #ifndef __Mothur__makefilecommand__ #define __Mothur__makefilecommand__ #include "command.hpp" class MakeFileCommand : public Command { public: MakeFileCommand(string); MakeFileCommand(); ~MakeFileCommand(){} vector setParameters(); string getCommandName() { return "make.file"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Make.file"; } string getDescription() { return "creates a file file containing fastq filenames"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string inputDir, outputDir, typeFile; vector outputNames; bool abort; }; #endif /* defined(__Mothur__makefilecommand__) */ mothur-1.36.1/source/commands/makegroupcommand.cpp000066400000000000000000000250171255543666200223010ustar00rootroot00000000000000/* * makegroupcommand.cpp * Mothur * * Created by westcott on 5/7/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "makegroupcommand.h" #include "sequence.hpp" //********************************************************************************************************************** vector MakeGroupCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","group",false,true,true); parameters.push_back(pfasta); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false,true); parameters.push_back(pgroups); CommandParameter poutput("output", "String", "", "", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { 
myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MakeGroupCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MakeGroupCommand::getHelpString(){ try { string helpString = ""; helpString += "The make.group command reads a fasta file or series of fasta files and creates a groupfile.\n"; helpString += "The make.group command parameters are fasta, groups and output. fasta and groups are required.\n"; helpString += "The output parameter allows you to specify the name of the groupfile created. \n"; helpString += "The make.group command should be in the following format: \n"; helpString += "make.group(fasta=yourFastaFiles, groups=yourGroups). \n"; helpString += "Example make.group(fasta=seqs1.fasta-seqs2.fasta-seqs3.fasta, groups=A-B-C)\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFastaFiles).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MakeGroupCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MakeGroupCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "group") { pattern = "[filename],groups"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MakeGroupCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MakeGroupCommand::MakeGroupCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["group"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MakeGroupCommand", "MakeGroupCommand"); exit(1); } } 
//********************************************************************************************************************** MakeGroupCommand::MakeGroupCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["group"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } fastaFileName = validParameter.validFile(parameters, "fasta", false); if (fastaFileName == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->splitAtDash(fastaFileName, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if 
(fastaFileNames[i] == "current") { fastaFileNames[i] = m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); filename += m->getRootName(m->getSimpleName(fastaFileNames[i])); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } ifstream in; int ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else{ filename += m->getRootName(m->getSimpleName(fastaFileNames[i])); m->setFastaFile(fastaFileNames[i]); } } } //prevent gigantic file name map<string, string> variables; variables["[filename]"] = filename; if (fastaFileNames.size() > 3) { variables["[filename]"] = outputDir + "merge"; } filename = getOutputFileName("group",variables); //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } output = validParameter.validFile(parameters, "output", false); if (output == "not found") { output = ""; } else{ filename = output; } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { m->mothurOut("groups is a required parameter for the make.group command."); m->mothurOutEndLine(); abort = true; } else { m->splitAtDash(groups, groupsNames); } if (groupsNames.size() != fastaFileNames.size()) { m->mothurOut("You do not have the same number of valid fasta files as groups. 
This could be because we could not open a fastafile."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "MakeGroupCommand", "MakeGroupCommand"); exit(1); } } //********************************************************************************************************************** int MakeGroupCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (outputDir == "") { outputDir = m->hasPath(fastaFileNames[0]); } filename = outputDir + filename; ofstream out; m->openOutputFile(filename, out); for (int i = 0; i < fastaFileNames.size(); i++) { if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(filename); return 0; } ifstream in; m->openInputFile(fastaFileNames[i], in); while (!in.eof()) { Sequence seq(in, "no align"); m->gobble(in); if (m->control_pressed) { outputTypes.clear(); in.close(); out.close(); m->mothurRemove(filename); return 0; } if (seq.getName() != "") { out << seq.getName() << '\t' << groupsNames[i] << endl; } } in.close(); } out.close(); m->mothurOutEndLine(); m->mothurOut("Output File Names: " + filename); m->mothurOutEndLine(); outputNames.push_back(filename); outputTypes["group"].push_back(filename); m->mothurOutEndLine(); //set group file as new current groupfile string current = ""; itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "MakeGroupCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/makegroupcommand.h000066400000000000000000000016321255543666200217430ustar00rootroot00000000000000#ifndef MAKEGROUPCOMMAND_H #define MAKEGROUPCOMMAND_H /* * makegroupcommand.h * Mothur * * Created by westcott on 5/7/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" class MakeGroupCommand : public Command { public: MakeGroupCommand(string); MakeGroupCommand(); ~MakeGroupCommand(){} vector setParameters(); string getCommandName() { return "make.group"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Make.group"; } string getDescription() { return "creates a group file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string fastaFileName, groups, outputDir, filename, output; vector fastaFileNames; vector groupsNames, outputNames; bool abort; }; #endif mothur-1.36.1/source/commands/makelefsecommand.cpp000066400000000000000000000572551255543666200222540ustar00rootroot00000000000000// // makelefse.cpp // Mothur // // Created by SarahsWork on 6/3/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "makelefsecommand.h" #include "designmap.h" //********************************************************************************************************************** vector MakeLefseCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "SharedRel", "SharedRel", "none","lefse",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRel", "SharedRel", "none","lefse",false,false,true); parameters.push_back(prelabund); CommandParameter pconstaxonomy("constaxonomy", "InputTypes", "", "", "none", "none", "none","",false,false,false); parameters.push_back(pconstaxonomy); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,false, true); parameters.push_back(pdesign); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pscale("scale", "Multiple", "totalgroup-totalotu-averagegroup-averageotu", "totalgroup", "", "", "","",false,false); 
parameters.push_back(pscale); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MakeLefseCommand::getHelpString(){ try { string helpString = ""; helpString += "The make.lefse command allows you to create a lefse formatted input file from mothur's output files.\n"; helpString += "The make.lefse command parameters are: shared, relabund, constaxonomy, design, scale, groups and label. Either shared or relabund is required.\n"; helpString += "The shared parameter is used to input your shared file, http://www.wiki.mothur.org/wiki/Shared_file.\n"; helpString += "The relabund parameter is used to input your relabund file, http://www.wiki.mothur.org/wiki/Relabund_file.\n"; helpString += "The design parameter is used to input your design file, http://www.wiki.mothur.org/wiki/Design_File.\n"; helpString += "The constaxonomy parameter is used to input your taxonomy file. http://www.wiki.mothur.org/wiki/Constaxonomy_file. The constaxonomy file is the taxonomy file outputted by classify.otu(list=yourListfile, taxonomy=yourTaxonomyFile). Be SURE that the constaxonomy file distance matches the shared file distance, i.e., for *.0.03.cons.taxonomy set label=0.03. Mothur is smart enough to handle shared files that have been subsampled. 
\n"; helpString += "The scale parameter allows you to select what scale you would like to use to convert your shared file abundances to relative abundances. Choices are totalgroup, totalotu, averagegroup, averageotu, default is totalgroup.\n"; helpString += "The label parameter allows you to select what distance level you would like used, if none is given the first distance is used.\n"; helpString += "The make.lefse command should be in the following format: make.lefse(shared=yourSharedFile)\n"; helpString += "make.lefse(shared=final.an.shared)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MakeLefseCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "lefse") { pattern = "[filename],[distance],lefse"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MakeLefseCommand::MakeLefseCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["lefse"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "MakeLefseCommand"); exit(1); } } //********************************************************************************************************************** MakeLefseCommand::MakeLefseCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser 
parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["lefse"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("constaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["constaxonomy"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["relabund"] = inputDir + it->second; } } } //check for parameters designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { abort = true; } else if (designfile == "not found") { designfile = ""; } else { m->setDesignFile(designfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { m->setRelAbundFile(relabundfile); } constaxonomyfile = validParameter.validFile(parameters, "constaxonomy", true); if (constaxonomyfile == "not open") { constaxonomyfile = ""; abort = true; } else if (constaxonomyfile == "not found") { constaxonomyfile = ""; } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your input file."); m->mothurOutEndLine(); } string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } if ((relabundfile == "") && (sharedfile == "")) { //is there a current file available for either of these? //give priority to shared, then relabund sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a shared or relabund."); m->mothurOutEndLine(); abort = true; } } } if ((relabundfile != "") && (sharedfile != "")) { m->mothurOut("[ERROR]: You may not use both a shared and relabund file."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } scale = validParameter.validFile(parameters, "scale", false); if (scale == "not found") { scale = "totalgroup"; } if ((scale != "totalgroup") && (scale != "totalotu") && (scale != "averagegroup") && (scale != "averageotu")) { m->mothurOut(scale + " is not a valid scaling option for the make.lefse command. Choices are totalgroup, totalotu, averagegroup, averageotu."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "MakeLefseCommand"); exit(1); } } //********************************************************************************************************************** int MakeLefseCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } map consTax; if (constaxonomyfile != "") { m->readConsTax(constaxonomyfile, consTax); } if (m->control_pressed) { return 0; } if (sharedfile != "") { inputFile = sharedfile; vector lookup = getSharedRelabund(); runRelabund(consTax, lookup); }else { inputFile = relabundfile; vector lookup = getRelabund(); runRelabund(consTax, lookup); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "execute"); exit(1); } } 
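// The four scale choices validated in the constructor above (totalgroup, totalotu,
// averagegroup, averageotu) differ only in the denominator applied to each raw count.
// What follows is a minimal standalone sketch of that arithmetic, not mothur's own code:
// the name toRelabund, the counts[group][otu] layout, and the decision to return 0
// when a denominator is zero are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: counts[g][o] is the raw count of OTU o in group g.
// Every group is assumed to hold the same number of OTUs.
std::vector<std::vector<double> > toRelabund(const std::vector<std::vector<int> >& counts,
                                             const std::string& scale) {
    if (scale != "totalgroup" && scale != "totalotu" &&
        scale != "averagegroup" && scale != "averageotu") {
        throw std::invalid_argument(scale + " is not a valid scaling option");
    }

    const std::size_t numGroups = counts.size();
    const std::size_t numOtus = numGroups == 0 ? 0 : counts[0].size();

    // Row sums (sequences per group) and column sums (sequences per OTU).
    std::vector<double> groupTotal(numGroups, 0.0);
    std::vector<double> otuTotal(numOtus, 0.0);
    for (std::size_t g = 0; g < numGroups; ++g) {
        for (std::size_t o = 0; o < numOtus; ++o) {
            groupTotal[g] += counts[g][o];
            otuTotal[o] += counts[g][o];
        }
    }

    std::vector<std::vector<double> > result(numGroups, std::vector<double>(numOtus, 0.0));
    for (std::size_t g = 0; g < numGroups; ++g) {
        for (std::size_t o = 0; o < numOtus; ++o) {
            double denom = 0.0;
            if (scale == "totalgroup")        { denom = groupTotal[g]; }            // total in this group
            else if (scale == "totalotu")     { denom = otuTotal[o]; }              // total in this OTU
            else if (scale == "averagegroup") { denom = groupTotal[g] / numOtus; }  // mean count per OTU in this group
            else                              { denom = otuTotal[o] / numGroups; }  // mean count per group for this OTU
            // Assumption of this sketch: report 0 rather than divide by zero.
            result[g][o] = denom > 0.0 ? counts[g][o] / denom : 0.0;
        }
    }
    return result;
}
```

// With counts {{2, 6}, {2, 0}}, totalgroup scales the first group to 0.25 and 0.75,
// while averagegroup scales it to 0.5 and 1.5, because the mean count per OTU in that group is 4.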
//********************************************************************************************************************** int MakeLefseCommand::runRelabund(map& consTax, vector& lookup){ try { if (outputDir == "") { outputDir = m->hasPath(inputFile); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFile)); variables["[distance]"] = lookup[0]->getLabel(); string outputFile = getOutputFileName("lefse",variables); outputNames.push_back(outputFile); outputTypes["lefse"].push_back(outputFile); ofstream out; m->openOutputFile(outputFile, out); DesignMap* designMap = NULL; if (designfile != "") { designMap = new DesignMap(designfile); vector categories = designMap->getNamesOfCategories(); if (categories.size() > 3) { m->mothurOut("\n[NOTE]: LEfSe input files allow for a class, subclass and subject. More than 3 categories can cause formatting errors.\n\n"); } for (int j = 0; j < categories.size(); j++) { out << categories[j]; for (int i = 0; i < lookup.size()-1; i++) { if (m->control_pressed) { out.close(); delete designMap; return 0; } string value = designMap->get(lookup[i]->getGroup(), categories[j]); if (value == "not found") { m->mothurOut("[ERROR]: " + lookup[i]->getGroup() + " is not in your design file, please correct.\n"); m->control_pressed = true; }else { out << '\t' << value; } } string value = designMap->get(lookup[lookup.size()-1]->getGroup(), categories[j]); if (value == "not found") { m->mothurOut("[ERROR]: " + lookup[lookup.size()-1]->getGroup() + " is not in your design file, please correct.\n"); m->control_pressed = true; }else { out << '\t' << value; } out << endl; } } out << "group"; for (int i = 0; i < lookup.size(); i++) { out << '\t' << lookup[i]->getGroup(); } out << endl; for (int i = 0; i < lookup[0]->getNumBins(); i++) { //process each otu if (m->control_pressed) { break; } string nameOfOtu = m->currentSharedBinLabels[i]; if (constaxonomyfile != "") { //try to find the otuName in consTax to replace with 
consensus taxonomy int simpleLabel; m->mothurConvert(m->getSimpleLabel(nameOfOtu), simpleLabel); map::iterator it = consTax.find(simpleLabel); if (it != consTax.end()) { nameOfOtu = it->second.taxonomy; //add sanity check abundances here?? string fixedName = ""; //remove confidences and change ; to | m->removeConfidences(nameOfOtu); for (int j = 0; j < (int)nameOfOtu.length()-1; j++) { if (nameOfOtu[j] == ';') { fixedName += "_" + m->currentSharedBinLabels[i] + '|'; } else { fixedName += nameOfOtu[j]; } } nameOfOtu = fixedName; }else { m->mothurOut("[ERROR]: can't find " + nameOfOtu + " in constaxonomy file. Do the distances match, did you forget to use the label parameter?\n"); m->control_pressed = true; } } //print name out << nameOfOtu; //print out relabunds for each otu for (int j = 0; j < lookup.size(); j++) { out << '\t' << lookup[j]->getAbundance(i); } out << endl; } out.close(); if (designMap != NULL) { delete designMap; } return 0; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "runRelabund"); exit(1); } } //********************************************************************************************************************** vector MakeLefseCommand::getSharedRelabund(){ try { InputData input(sharedfile, "sharedfile"); vector templookup = input.getSharedRAbundVectors(); string lastLabel = templookup[0]->getLabel(); vector lookup; if (label == "") { label = lastLabel; } else { //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((templookup[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { for (int i = 0; i < templookup.size(); i++) { delete templookup[i]; } return lookup; } if(labels.count(templookup[0]->getLabel()) == 1){ processedLabels.insert(templookup[0]->getLabel()); userLabels.erase(templookup[0]->getLabel()); break; } if ((m->anyLabelsToProcess(templookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = templookup[0]->getLabel(); for (int i = 0; i < templookup.size(); i++) { delete templookup[i]; } templookup = input.getSharedRAbundVectors(lastLabel); processedLabels.insert(templookup[0]->getLabel()); userLabels.erase(templookup[0]->getLabel()); //restore real lastlabel to save below templookup[0]->setLabel(saveLabel); break; } lastLabel = templookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < templookup.size(); i++) { delete templookup[i]; } templookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < templookup.size(); i++) { delete templookup[i]; } return lookup; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < templookup.size(); i++) { if (templookup[i] != NULL) { delete templookup[i]; } } templookup = input.getSharedRAbundVectors(lastLabel); } } for (int i = 0; i < templookup.size(); i++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(templookup[i]->getLabel()); temp->setGroup(templookup[i]->getGroup()); lookup.push_back(temp); } //convert to relabund for (int i = 0; i < templookup.size(); i++) { for (int j = 0; j < templookup[i]->getNumBins(); j++) { if (m->control_pressed) { for (int k = 0; k < templookup.size(); k++) { delete templookup[k]; } return lookup; } int abund = templookup[i]->getAbundance(j); float relabund = 0.0; if (scale == "totalgroup") { relabund = abund / (float) templookup[i]->getNumSeqs(); }else if (scale == "totalotu") { //calc the total in this otu int totalOtu = 0; for (int l = 0; l < templookup.size(); l++) { totalOtu += templookup[l]->getAbundance(j); } relabund = abund / (float) totalOtu; }else if (scale == "averagegroup") { relabund = abund / (float) (templookup[i]->getNumSeqs() / (float) templookup[i]->getNumBins()); }else if (scale == "averageotu") { //calc the total in this otu int totalOtu = 0; for (int l = 0; l < templookup.size(); l++) { totalOtu += templookup[l]->getAbundance(j); } float averageOtu = totalOtu / (float) templookup.size(); relabund = abund / (float) averageOtu; }else{ m->mothurOut(scale + " is not a valid scaling option."); m->mothurOutEndLine(); m->control_pressed = true; } lookup[i]->push_back(relabund, lookup[i]->getGroup()); } } for (int k = 0; k < templookup.size(); k++) { delete templookup[k]; } return lookup; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "getSharedRelabund"); exit(1); } } //********************************************************************************************************************** vector 
MakeLefseCommand::getRelabund(){ try { InputData input(relabundfile, "relabund"); vector lookupFloat = input.getSharedRAbundFloatVectors(); string lastLabel = lookupFloat[0]->getLabel(); if (label == "") { label = lastLabel; return lookupFloat; } //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((lookupFloat[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return lookupFloat; } if(labels.count(lookupFloat[0]->getLabel()) == 1){ processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookupFloat[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupFloat[0]->getLabel(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input.getSharedRAbundFloatVectors(lastLabel); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); //restore real lastlabel to save below lookupFloat[0]->setLabel(saveLabel); break; } lastLabel = lookupFloat[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input.getSharedRAbundFloatVectors(); } if (m->control_pressed) { return lookupFloat; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat = input.getSharedRAbundFloatVectors(lastLabel); } return lookupFloat; } catch(exception& e) { m->errorOut(e, "MakeLefseCommand", "getRelabund"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/makelefsecommand.h000066400000000000000000000031321255543666200217020ustar00rootroot00000000000000// // makelefse.h // Mothur // // Created by SarahsWork on 6/3/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #ifndef __Mothur__makelefse__ #define __Mothur__makelefse__ #include "mothurout.h" #include "command.hpp" #include "inputdata.h" #include "sharedutilities.h" #include "phylosummary.h" /**************************************************************************************************/ class MakeLefseCommand : public Command { public: MakeLefseCommand(string); MakeLefseCommand(); ~MakeLefseCommand(){} vector setParameters(); string getCommandName() { return "make.lefse"; } string getCommandCategory() { return "General"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "http://huttenhower.sph.harvard.edu/galaxy/root?tool_id=lefse_upload http://www.mothur.org/wiki/Make.lefse"; } string getDescription() { return "creates LEfSe input file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, allLines, otulabel, hasGroupInfo; string outputDir; vector outputNames, Groups; string sharedfile, designfile, constaxonomyfile, relabundfile, scale, label, inputFile; int runRelabund(map&, vector&); vector getRelabund(); vector getSharedRelabund(); }; /**************************************************************************************************/ 
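// When a constaxonomy file is supplied, runRelabund in makelefsecommand.cpp replaces each OTU
// label with its consensus taxonomy: confidence scores are removed, every ';' becomes
// "_<OTU label>|", and the final character (normally the trailing ';') is dropped.
// The sketch below mirrors that string handling in isolation. toLefseName and
// stripConfidences are hypothetical names; stripConfidences is only a stand-in for
// m->removeConfidences and assumes confidences appear as parenthesised text.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for m->removeConfidences: drops every "(...)" run.
std::string stripConfidences(const std::string& taxonomy) {
    std::string out;
    int depth = 0;
    for (std::size_t i = 0; i < taxonomy.size(); ++i) {
        const char c = taxonomy[i];
        if (c == '(') { ++depth; }
        else if (c == ')') { if (depth > 0) { --depth; } }
        else if (depth == 0) { out += c; }
    }
    return out;
}

// Builds a LEfSe-style row name from a consensus taxonomy and an OTU label.
std::string toLefseName(const std::string& taxonomy, const std::string& otuLabel) {
    const std::string tax = stripConfidences(taxonomy);
    std::string fixedName;
    // Stop one character early so the last character (the trailing ';') is dropped.
    // Writing the bound as j + 1 < size() avoids unsigned underflow on an empty string.
    for (std::size_t j = 0; j + 1 < tax.size(); ++j) {
        if (tax[j] == ';') { fixedName += "_" + otuLabel + "|"; }
        else { fixedName += tax[j]; }
    }
    return fixedName;
}
```

// Example: toLefseName("Bacteria(100);Firmicutes(98);", "Otu001") yields
// "Bacteria_Otu001|Firmicutes". Like the loop it mirrors, the sketch drops the last
// character unconditionally, so a taxonomy without a trailing ';' loses its final letter.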
#endif /* defined(__Mothur__makelefse__) */ mothur-1.36.1/source/commands/makelookupcommand.cpp000066400000000000000000001040621255543666200224540ustar00rootroot00000000000000// // makelookupcommand.cpp // Mothur // // Created by SarahsWork on 5/14/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "makelookupcommand.h" //********************************************************************************************************************** vector MakeLookupCommand::setParameters(){ try { CommandParameter ptemplate("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(ptemplate); CommandParameter pflow("flow", "InputTypes", "", "", "none", "none", "none","lookup",false,true,true); parameters.push_back(pflow); CommandParameter perrors("error", "InputTypes", "", "", "none", "none", "none","none",false,true,true); parameters.push_back(perrors); CommandParameter pbarcode("barcode", "String", "", "AACCGTGTC", "", "", "","",false,false); parameters.push_back(pbarcode); CommandParameter pkey("key", "String", "", "TCAG", "", "", "","",false,false); parameters.push_back(pkey); CommandParameter pthreshold("threshold", "Number", "", "10000", "", "", "","",false,false); parameters.push_back(pthreshold); CommandParameter porder("order", "Multiple", "A-B-I", "A", "", "", "","",false,false, true); parameters.push_back(porder); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "setParameters"); exit(1); } } 
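// setParameters above registers the reference, key, barcode and order inputs. In execute(),
// the key, barcode and each reference sequence are concatenated and passed, together with
// the flow order, to convertSeqToFlow to obtain a reference flowgram.
// The sketch below shows the general idea of such a conversion under an idealised,
// noise-free model: at each flow, the signal is the length of the homopolymer run of the
// flowed base at the current read position. It is not mothur's convertSeqToFlow.
// seqToFlowgram and maxFlows are hypothetical, and cycling through the flow order
// when it is shorter than the number of flows is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: ideal (integer) flowgram for seq under a cyclic flow order.
std::vector<int> seqToFlowgram(const std::string& seq,
                               const std::string& flowOrder,
                               std::size_t maxFlows) {
    std::vector<int> flows;
    if (flowOrder.empty()) { return flows; }

    std::size_t pos = 0;  // next unread base in seq
    while (pos < seq.size() && flows.size() < maxFlows) {
        // Base flowed at this step; the order wraps around when exhausted.
        const char flowed = flowOrder[flows.size() % flowOrder.size()];
        int run = 0;
        // Consume the whole homopolymer run that matches the flowed base.
        while (pos < seq.size() && seq[pos] == flowed) { ++run; ++pos; }
        flows.push_back(run);  // 0 means no incorporation at this flow
    }
    return flows;
}
```

// Example: with the default order A (TACG), the key "TCAG" becomes 1 0 1 0 0 1 0 1.
// A full reference flowgram would start from keySequence + barcodeSequence + reference.
// maxFlows bounds the loop, so a base that never appears in the flow order cannot stall it.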
//********************************************************************************************************************** string MakeLookupCommand::getHelpString(){ try { string helpString = ""; helpString += "The make.lookup command allows you to create custom lookup files for use with shhh.flows.\n"; helpString += "The make.lookup command parameters are: reference, flow, error, barcode, key, threshold and order.\n"; helpString += "The reference file needs to be in the same direction as the flow data and it must start with the forward primer sequence. It is required.\n"; helpString += "The flow parameter is used to provide the flow data. It is required.\n"; helpString += "The error parameter is used to provide the error summary. It is required.\n"; helpString += "The barcode parameter is used to provide the barcode sequence. Default=AACCGTGTC.\n"; helpString += "The key parameter is used to provide the key sequence. Default=TCAG.\n"; helpString += "The threshold parameter is ....Default=10000.\n"; helpString += "The order parameter options are A, B or I. Default=A. 
A = TACG and B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"; helpString += "The make.lookup should be in the following format: make.lookup(reference=HMP_MOCK.v53.fasta, flow=H3YD4Z101.mock3.flow_450.flow, error=H3YD4Z101.mock3.flow_450.error.summary, barcode=AACCTGGC)\n"; helpString += "new(...)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MakeLookupCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "lookup") { pattern = "[filename],lookup"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MakeLookupCommand::MakeLookupCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["lookup"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "MakeLookupCommand"); exit(1); } } //********************************************************************************************************************** 
MakeLookupCommand::MakeLookupCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["lookup"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("flow"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["flow"] = inputDir + it->second; } } it = parameters.find("error"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["error"] = inputDir + it->second; } } it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["reference"] = inputDir + it->second; } } } //check for parameters errorFileName = validParameter.validFile(parameters, "error", true); if (errorFileName == "not open") { errorFileName = ""; abort = true; } else if (errorFileName == "not found") { errorFileName = ""; m->mothurOut("[ERROR]: error parameter is required."); m->mothurOutEndLine(); abort = true; } flowFileName = validParameter.validFile(parameters, "flow", true); if (flowFileName == "not open") { flowFileName = ""; abort = true; } else if (flowFileName == "not found") { flowFileName = ""; m->mothurOut("[ERROR]: flow parameter is required."); m->mothurOutEndLine(); abort = true; } else { m->setFlowFile(flowFileName); } refFastaFileName = validParameter.validFile(parameters, "reference", true); if (refFastaFileName == "not open") { abort = true; } else if (refFastaFileName == "not found") { refFastaFileName = ""; m->mothurOut("[ERROR]: reference parameter is required."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(flowFileName); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "threshold", false); if (temp == "not found"){ temp = "10000"; } m->mothurConvert(temp, thresholdCount); barcodeSequence = validParameter.validFile(parameters, "barcode", false); if (barcodeSequence == "not found"){ barcodeSequence = "AACCGTGTC"; } keySequence = validParameter.validFile(parameters, "key", false); if (keySequence == "not found"){ keySequence = "TCAG"; } temp = validParameter.validFile(parameters, "order", false); if (temp == "not found"){ temp = "A"; } if (temp.length() > 1) { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. 
A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } else { if (toupper(temp[0]) == 'A') { flowOrder = "TACG"; } else if(toupper(temp[0]) == 'B'){ flowOrder = "TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATG
CAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC"; } else if(toupper(temp[0]) == 'I'){ flowOrder = "TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC"; } else { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. 
A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } } } } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "MakeLookupCommand"); exit(1); } } //********************************************************************************************************************** int MakeLookupCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); double gapOpening = 10; int maxHomoP = 101; vector > penaltyMatrix; penaltyMatrix.resize(maxHomoP); for(int i=0;iopenInputFile(refFastaFileName, refFASTA); // * open reference sequence file map > refFlowgrams; while(!refFASTA.eof()){ if (m->control_pressed) { refFASTA.close(); return 0; } Sequence seq(refFASTA); m->gobble(refFASTA); if (m->debug) { m->mothurOut("[DEBUG]: seq = " + seq.getName() + ".\n"); } string fullSequence = keySequence + barcodeSequence + seq.getAligned(); // * concatenate the keySequence, barcodeSequence, and // referenceSequences refFlowgrams[seq.getName()] = convertSeqToFlow(fullSequence, flowOrder); // * translate concatenated sequences into flowgram } refFASTA.close(); vector > lookupTable; lookupTable.resize(1000); for(int i=0;i<1000;i++){ lookupTable[i].resize(11, 0); } if (m->debug) { m->mothurOut("[DEBUG]: here .\n"); } //Loop through each sequence in the flow file and the error summary file. 
ifstream flowFile; m->openInputFile(flowFileName, flowFile); int numFlows; flowFile >> numFlows; if (m->debug) { m->mothurOut("[DEBUG]: numflows = " + toString(numFlows) + ".\n"); } ifstream errorFile; m->openInputFile(errorFileName, errorFile); m->getline(errorFile); //grab headers string errorQuery, flowQuery, referenceName, dummy; string chimera; float intensity; vector std; std.resize(11, 0); while(errorFile && flowFile){ if (m->control_pressed) { errorFile.close(); flowFile.close(); return 0; } // * if it's chimeric, chuck it errorFile >> errorQuery >> referenceName; for(int i=2;i<40;i++){ errorFile >> dummy; } errorFile >> chimera; if(chimera == "2"){ m->getline(flowFile); } else{ flowFile >> flowQuery >> dummy; if(flowQuery != errorQuery){ cout << flowQuery << " != " << errorQuery << endl; } map >::iterator it = refFlowgrams.find(referenceName); // * compare sequence to its closest reference if (it == refFlowgrams.end()) { m->mothurOut("[WARNING]: missing reference flow " + referenceName + ", ignoring flow " + flowQuery + ".\n"); m->getline(flowFile); m->gobble(flowFile); }else { vector refFlow = it->second; vector flowgram; flowgram.resize(numFlows); if (m->debug) { m->mothurOut("[DEBUG]: flowQuery = " + flowQuery + ".\t" + "refName " + referenceName+ ".\n"); } for(int i=0;i> intensity; flowgram[i] = intensity;// (int)round(100 * intensity); } m->gobble(flowFile); if (m->debug) { m->mothurOut("[DEBUG]: before align.\n"); } alignFlowGrams(flowgram, refFlow, gapOpening, penaltyMatrix, flowOrder); if (m->debug) { m->mothurOut("[DEBUG]: after align.\n"); } if (m->control_pressed) { errorFile.close(); flowFile.close(); return 0; } for(int i=0;i 1000){count = 999;} if(abs(flowgram[i]-refFlow[i])<=0.50){ lookupTable[count][int(refFlow[i])]++; // * build table std[int(refFlow[i])] += (100*refFlow[i]-count)*(100*refFlow[i]-count); } } } } m->gobble(errorFile); m->gobble(flowFile); } errorFile.close(); flowFile.close(); //get probabilities vector counts; 
counts.resize(11, 0); int totalCount = 0; for(int i=0;i<1000;i++){ for(int j=0;j<11;j++){ counts[j] += lookupTable[i][j]; totalCount += lookupTable[i][j]; } } int N = 11; for(int i=0;i<11;i++){ if(counts[i] < thresholdCount){ N = i; break; } //bring back std[i] = sqrt(std[i]/(double)(counts[i])); //bring back } regress(std, N); //bring back if (m->control_pressed) { return 0; } double minProbability = 0.1 / (double)totalCount; //calculate the negative log probabilities of each intensity given the actual homopolymer length; impute with a Gaussian when counts are too low double sqrtTwoPi = 2.50662827463;//pow(2.0 * 3.14159, 0.5); for(int i=0;i<1000;i++){ if (m->control_pressed) { return 0; } for(int j=0;j minProbability){ lookupTable[i][j] = -log(normalProbability); } else{ lookupTable[i][j] = -log(minProbability); } } } //calculate the probability of each homopolymer length vector<double> negLogHomoProb; negLogHomoProb.resize(11, 0.00); //bring back for(int i=0;icontrol_pressed) { return 0; } //output data table. 
column one is the probability of each homopolymer length map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(flowFileName)); string outputFile = getOutputFileName("lookup",variables); outputNames.push_back(outputFile); outputTypes["lookup"].push_back(outputFile); ofstream lookupFile; m->openOutputFile(outputFile, lookupFile); lookupFile.precision(8); for(int j=0;j<11;j++){ // lookupFile << counts[j]; lookupFile << showpoint << negLogHomoProb[j]; //bring back for(int i=0;i<1000;i++){ lookupFile << '\t' << lookupTable[i][j]; } lookupFile << endl; } lookupFile.close(); m->mothurOut("\nData for homopolymer lengths of " + toString(N) + " and longer were imputed for this analysis\n\n"); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "execute"); exit(1); } } //****************************************************************************************************************************** vector MakeLookupCommand::convertSeqToFlow(string sequence, string order){ try { int seqLength = (int)sequence.length(); int numFlows = (int)order.length(); vector flowgram; int orderIndex = 0; int sequenceIndex = 0; while(orderIndex < numFlows && sequenceIndex < seqLength){ if (m->control_pressed) { return flowgram; } int homopolymerLength = 1; char base = sequence[sequenceIndex]; while(base == sequence[sequenceIndex+1] && sequenceIndex < seqLength){ homopolymerLength++; sequenceIndex++; } sequenceIndex++; for(int i=orderIndex; ierrorOut(e, "MakeLookupCommand", "convertSeqToFlow"); exit(1); } } 
//****************************************************************************************************************************** int MakeLookupCommand::alignFlowGrams(vector& flowgram, vector& refFlow, double gapOpening, vector > penaltyMatrix, string flowOrder){ try { int numQueryFlows = (int)flowgram.size(); int numRefFlows = (int)refFlow.size(); //cout << numQueryFlows << '\t' << numRefFlows << endl; vector > scoreMatrix; scoreMatrix.resize(numQueryFlows+1); vector > directMatrix; directMatrix.resize(numQueryFlows+1); for(int i=0;i<=numQueryFlows;i++){ if (m->control_pressed) { return 0; } scoreMatrix[i].resize(numRefFlows+1, 0.00); directMatrix[i].resize(numRefFlows+1, 'x'); scoreMatrix[i][0] = i * gapOpening; directMatrix[i][0] = 'u'; } //cout << numQueryFlows << '\t' << numRefFlows << endl; for(int i=0;i<=numRefFlows;i++){ scoreMatrix[0][i] = i * gapOpening; directMatrix[0][i] = 'l'; } for(int i=1;i<=numQueryFlows;i++){ for(int j=1;j<=numRefFlows;j++){ if (m->control_pressed) { return 0; } double diagonal = 1000000000; if(flowOrder[i%flowOrder.length()] == flowOrder[j%flowOrder.length()]){ diagonal = scoreMatrix[i-1][j-1] + penaltyMatrix[round(flowgram[i-1])][refFlow[j-1]]; } double up = scoreMatrix[i-1][j] + gapOpening; double left = scoreMatrix[i][j-1] + gapOpening; double minScore = diagonal; char direction = 'd'; if(left < diagonal && left < up){ minScore = left; direction = 'l'; } else if(up < diagonal && up < left){ minScore = up; direction = 'u'; } scoreMatrix[i][j] = minScore; directMatrix[i][j] = direction; } } int minRowIndex = numQueryFlows; double minRowScore = scoreMatrix[numQueryFlows][numRefFlows]; for(int i=0;icontrol_pressed) { return 0; } if(scoreMatrix[i][numRefFlows] < minRowScore){ minRowScore = scoreMatrix[i][numRefFlows]; minRowIndex = i; } } int minColumnIndex = numRefFlows; double minColumnScore = scoreMatrix[numQueryFlows][numRefFlows]; for(int i=0;icontrol_pressed) { return 0; } if(scoreMatrix[numQueryFlows][i] < minColumnScore){ 
minColumnScore = scoreMatrix[numQueryFlows][i]; minColumnIndex = i; } } int i=minRowIndex; int j= minColumnIndex; vector newFlowgram; vector newRefFlowgram; while(i > 0 && j > 0){ if (m->control_pressed) { return 0; } if(directMatrix[i][j] == 'd'){ newFlowgram.push_back(flowgram[i-1]); newRefFlowgram.push_back(refFlow[j-1]); i--; j--; } else if(directMatrix[i][j] == 'l'){ newFlowgram.push_back(0); newRefFlowgram.push_back(refFlow[j-1]); j--; } else if(directMatrix[i][j] == 'u'){ newFlowgram.push_back(flowgram[i-1]); newRefFlowgram.push_back(0); i--; } } flowgram = newFlowgram; refFlow = newRefFlowgram; return 0; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "alignFlowGrams"); exit(1); } } //****************************************************************************************************************************** int MakeLookupCommand::regress(vector& data, int N){ try { //fit data for larger values of N double xMean = 0; double yMean = 0; for(int i=1;icontrol_pressed) { return 0; } xMean += i; yMean += data[i]; } xMean /= (N-1); yMean /= (N-1); double numerator = 0; double denomenator = 0; for(int i=1;icontrol_pressed) { return 0; } numerator += (i-xMean)*(data[i] - yMean); denomenator += (i-xMean) * (i-xMean); } double slope = numerator / denomenator; double intercept = yMean - slope * xMean; for(int i=N;i<11;i++){ data[i] = intercept + i * slope; } return 0; } catch(exception& e) { m->errorOut(e, "MakeLookupCommand", "regress"); exit(1); } } //****************************************************************************************************************************** //********************************************************************************************************************** mothur-1.36.1/source/commands/makelookupcommand.h000066400000000000000000000034311255543666200221170ustar00rootroot00000000000000// // makelookupcommand.h // Mothur // // Created by SarahsWork on 5/14/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_makelookupcommand_h #define Mothur_makelookupcommand_h #include "command.hpp" #include "sequence.hpp" /**************************************************************************************************/ class MakeLookupCommand : public Command { public: MakeLookupCommand(string); MakeLookupCommand(); ~MakeLookupCommand(){} vector setParameters(); string getCommandName() { return "make.lookup"; } string getCommandCategory() { return "Sequence Processing"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "Quince, C., A. Lanzén, T. P. Curtis, R. J. Davenport, N. Hall, I. M. Head, L. F. Read, and W. T. Sloan. 2009. Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6:639-41. http://www.mothur.org/wiki/Make.lookup"; } string getDescription() { return "Creates a lookup file for use with shhh.flows using user-supplied mock community data and flow grams"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string outputDir, flowFileName, errorFileName, flowOrder, refFastaFileName, barcodeSequence, keySequence; vector outputNames; int thresholdCount; vector convertSeqToFlow(string sequence, string order); int alignFlowGrams(vector& flowgram, vector& refFlow, double gapOpening, vector > penaltyMatrix, string flowOrder); int regress(vector& data, int N); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/mantelcommand.cpp000066400000000000000000000251401255543666200215640ustar00rootroot00000000000000/* * mantelcommand.cpp * mothur * * Created by westcott on 2/9/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "mantelcommand.h" #include "readphylipvector.h" //********************************************************************************************************************** vector MantelCommand::setParameters(){ try { CommandParameter pphylip1("phylip1", "InputTypes", "", "", "none", "none", "none","mantel",false,true,true); parameters.push_back(pphylip1); CommandParameter pphylip2("phylip2", "InputTypes", "", "", "none", "none", "none","mantel",false,true,true); parameters.push_back(pphylip2); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pmethod("method", "Multiple", "pearson-spearman-kendall", "pearson", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MantelCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MantelCommand::getHelpString(){ try { string helpString = ""; helpString += "Sokal, R. R., & Rohlf, F. J. (1995). Biometry, 3rd edn. New York: Freeman.\n"; helpString += "The mantel command reads two distance matrices and calculates the mantel correlation coefficient.\n"; helpString += "The mantel command parameters are phylip1, phylip2, iters and method. The phylip1 and phylip2 parameters are required. 
Matrices must be the same size and contain the same names.\n"; helpString += "The method parameter allows you to select what method you would like to use. Options are pearson, spearman and kendall. Default=pearson.\n"; helpString += "The iters parameter allows you to set the number of randomizations used to compute the P value. The default is 1000.\n"; helpString += "The mantel command should be in the following format: mantel(phylip1=veg.dist, phylip2=env.dist).\n"; helpString += "The mantel command outputs a .mantel file.\n"; helpString += "Note: No spaces between parameter labels (i.e. phylip1), '=' and parameters (i.e. veg.dist).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MantelCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MantelCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "mantel") { pattern = "[filename],mantel"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MantelCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MantelCommand::MantelCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["mantel"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MantelCommand", "MantelCommand"); exit(1); } } //********************************************************************************************************************** MantelCommand::MantelCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); 
OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["mantel"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip1"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip1"] = inputDir + it->second; } } it = parameters.find("phylip2"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["phylip2"] = inputDir + it->second; } } } //check for required parameters phylipfile1 = validParameter.validFile(parameters, "phylip1", true); if (phylipfile1 == "not open") { phylipfile1 = ""; abort = true; } else if (phylipfile1 == "not found") { phylipfile1 = ""; m->mothurOut("phylip1 is a required parameter for the mantel command."); m->mothurOutEndLine(); abort = true; } phylipfile2 = validParameter.validFile(parameters, "phylip2", true); if (phylipfile2 == "not open") { phylipfile2 = ""; abort = true; } else if (phylipfile2 == "not found") { phylipfile2 = ""; m->mothurOut("phylip2 is a required parameter for the mantel command."); m->mothurOutEndLine(); abort = true; } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(phylipfile1); } method = validParameter.validFile(parameters, "method", false); if (method == "not found"){ method = "pearson"; } string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); if ((method != "pearson") && (method != "spearman") && (method != "kendall")) { m->mothurOut(method + " is not a valid method. 
Valid methods are pearson, spearman, and kendall."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "MantelCommand", "MantelCommand"); exit(1); } } //********************************************************************************************************************** int MantelCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } /***************************************************/ // reading distance files // /***************************************************/ //read phylip1 ReadPhylipVector readMatrix(phylipfile1); vector< vector<double> > matrix1; vector<string> names1 = readMatrix.read(matrix1); if (m->control_pressed) { return 0; } //read phylip2 ReadPhylipVector readMatrix2(phylipfile2); vector< vector<double> > matrix2; vector<string> names2 = readMatrix2.read(matrix2); if (m->control_pressed) { return 0; } //make sure matrix2 and matrix1 are in the same order if (names1 == names2) { //then everything is in same order and same size }else if (names1.size() != names2.size()) { //wrong size no need to order, abort m->mothurOut("[ERROR]: distance matrices are not the same size, aborting."); m->mothurOutEndLine(); m->control_pressed = true; }else { //sizes are the same, but either the names are different or they are in different order m->mothurOut("[WARNING]: Names do not match between distance files. 
Comparing based on order in files."); m->mothurOutEndLine(); } if (m->control_pressed) { return 0; } /***************************************************/ // calculating mantel and significance // /***************************************************/ //calc mantel coefficient LinearAlgebra linear; double mantel = 0.0; if (method == "pearson") { mantel = linear.calcPearson(matrix1, matrix2); } else if (method == "spearman") { mantel = linear.calcSpearman(matrix1, matrix2); } else if (method == "kendall") { mantel = linear.calcKendall(matrix1, matrix2); } //calc significance int count = 0; for (int i = 0; i < iters; i++) { if (m->control_pressed) { return 0; } //randomize matrix2 vector< vector<double> > matrix2Copy = matrix2; random_shuffle(matrix2Copy.begin(), matrix2Copy.end()); //calc random mantel double randomMantel = 0.0; if (method == "pearson") { randomMantel = linear.calcPearson(matrix1, matrix2Copy); } else if (method == "spearman") { randomMantel = linear.calcSpearman(matrix1, matrix2Copy); } else if (method == "kendall") { randomMantel = linear.calcKendall(matrix1, matrix2Copy); } if (randomMantel >= mantel) { count++; } } double pValue = count / (float) iters; if (m->control_pressed) { return 0; } map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipfile1)); string outputFile = getOutputFileName("mantel",variables); outputNames.push_back(outputFile); outputTypes["mantel"].push_back(outputFile); ofstream out; m->openOutputFile(outputFile, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); out << "Mantel\tpValue" << endl; out << mantel << '\t' << pValue << endl; out.close(); cout << "\nmantel = " << mantel << "\tpValue = " << pValue << endl; m->mothurOutJustToLog("\nmantel = " + toString(mantel) + "\tpValue = " + toString(pValue) + "\n"); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < 
outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MantelCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/mantelcommand.h000066400000000000000000000021241255543666200212260ustar00rootroot00000000000000#ifndef MANTELCOMMAND_H #define MANTELCOMMAND_H /* * mantelcommand.h * mothur * * Created by westcott on 2/9/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "linearalgebra.h" class MantelCommand : public Command { public: MantelCommand(string); MantelCommand(); ~MantelCommand(){} vector setParameters(); string getCommandName() { return "mantel"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "McCune B, Grace JB, Urban DL (2002). Analysis of ecological communities. MjM Software Design: Gleneden Beach, OR. \nLegendre P, Legendre L (1998). Numerical Ecology. Elsevier: New York. \nhttp://www.mothur.org/wiki/Mantel"; } string getDescription() { return "Mantel’s test for correlation between matrices"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string phylipfile1, phylipfile2, outputDir, method; bool abort; int iters; vector outputNames; }; #endif mothur-1.36.1/source/commands/matrixoutputcommand.cpp000066400000000000000000001241771255543666200231030ustar00rootroot00000000000000/* * matrixoutputcommand.cpp * Mothur * * Created by Sarah Westcott on 5/20/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "matrixoutputcommand.h" #include "subsample.h" //********************************************************************************************************************** vector MatrixOutputCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","phylip",false,true,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pcalc("calc", "Multiple", "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-kstest-sharednseqs-ochiai-anderberg-kulczynski-kulczynskicody-lennon-morisitahorn-braycurtis-whittaker-odum-canberra-structeuclidean-structchord-hellinger-manhattan-structpearson-soergel-spearman-structkulczynski-speciesprofile-hamming-structchi2-gower-memchi2-memchord-memeuclidean-mempearson-jsd-rjsd", "jclass-thetayc", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter poutput("output", "Multiple", "lt-square-column", "lt", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter pmode("mode", "Multiple", "average-median", "average", "", "", "","",false,false); parameters.push_back(pmode); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter 
poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MatrixOutputCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The dist.shared command parameters are shared, groups, calc, output, processors, subsample, iters, mode, and label. The shared parameter is required, unless you have a valid current file.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included.\n"; helpString += "The group names are separated by dashes. The label parameter allows you to select what distance levels you would like distance matrices created for, and is also separated by dashes.\n"; helpString += "The iters parameter allows you to choose the number of times you would like to run the subsample.\n"; helpString += "The subsample parameter allows you to enter the size per group of the sample or you can set subsample=T and mothur will use the size of your smallest group.\n"; helpString += "The dist.shared command should be in the following format: dist.shared(groups=yourGroups, calc=yourCalcs, label=yourLabels).\n"; helpString += "The output parameter allows you to specify the format of your distance matrix. Options are lt, column and square. The default is lt.\n"; helpString += "The mode parameter allows you to specify if you want the average or the median values reported when subsampling. Options are average and median. 
The default is average.\n"; helpString += "Example dist.shared(groups=A-B-C, calc=jabund-sorabund).\n"; helpString += "The default value for groups is all the groups in your groupfile.\n"; helpString += "The default value for calc is jclass and thetayc.\n"; helpString += validCalculator.printCalc("matrix"); helpString += "The dist.shared command outputs a .dist file for each calculator you specify at each distance you choose.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e. yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MatrixOutputCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "phylip") { pattern = "[filename],[calc],[distance],[outputtag],dist-[filename],[calc],[distance],[outputtag],[tag2],dist"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MatrixOutputCommand::MatrixOutputCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["phylip"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "MatrixOutputCommand"); exit(1); } } //********************************************************************************************************************** MatrixOutputCommand::MatrixOutputCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; 
calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); map<string,string>::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["phylip"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a shared file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else if (sharedfile == "not open") { sharedfile = ""; abort = true; } else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(sharedfile); //if user entered a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point we should add some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } output = validParameter.validFile(parameters, "output", false); if(output == "not found"){ output = "lt"; } if ((output != "lt") && (output != "square") && (output != "column")) { m->mothurOut(output + " is not a valid output form. Options are lt, column and square. I will use lt."); m->mothurOutEndLine(); output = "lt"; } mode = validParameter.validFile(parameters, "mode", false); if(mode == "not found"){ mode = "average"; } if ((mode != "average") && (mode != "median")) { m->mothurOut(mode + " is not a valid mode. Options are average and median. I will use average."); m->mothurOutEndLine(); mode = "average"; } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "jclass-thetayc"; } else { if (calc == "default") { calc = "jclass-thetayc"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = 
true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (subsample == false) { iters = 0; } if (abort == false) { ValidCalculators validCalculator; int i; for (i=0; ierrorOut(e, "MatrixOutputCommand", "MatrixOutputCommand"); exit(1); } } //********************************************************************************************************************** MatrixOutputCommand::~MatrixOutputCommand(){} //********************************************************************************************************************** int MatrixOutputCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //if the users entered no valid calculators don't execute command if (matrixCalculators.size() == 0) { m->mothurOut("No valid calculators."); m->mothurOutEndLine(); return 0; } input = new InputData(sharedfile, "sharedfile"); lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. I cannot run the command."); m->mothurOutEndLine(); delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0;} if (subsample) { if (subsampleSize == -1) { //user has not set size, set size = smallest samples size subsampleSize = lookup[0]->getNumSeqs(); for (int i = 1; i < lookup.size(); i++) { int thisSize = lookup[i]->getNumSeqs(); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } }else { m->clearGroups(); Groups.clear(); vector temp; for (int i = 0; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < subsampleSize) { m->mothurOut(lookup[i]->getGroup() + " contains " + toString(lookup[i]->getNumSeqs()) + ". 
Eliminating."); m->mothurOutEndLine(); delete lookup[i]; }else { Groups.push_back(lookup[i]->getGroup()); temp.push_back(lookup[i]); } } lookup = temp; m->setGroups(Groups); } if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. I cannot run the command."); m->mothurOutEndLine(); m->control_pressed = true; delete input; return 0; } } numGroups = lookup.size(); lines.resize(processors); for (int i = 0; i < processors; i++) { lines[i].start = int (sqrt(float(i)/float(processors)) * numGroups); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numGroups); } if (m->control_pressed) { delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); return 0; } //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { outputTypes.clear(); delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->clearGroups(); return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //get next line to process for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = 
input->getSharedRAbundVectors(); } if (m->control_pressed) { outputTypes.clear(); delete input; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->clearGroups(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } if (m->control_pressed) { outputTypes.clear(); delete input; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->clearGroups(); return 0; } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } if (m->control_pressed) { outputTypes.clear(); delete input; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->clearGroups(); return 0; } //reset groups parameter m->clearGroups(); //set phylip file as new current phylipfile string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; if (!subsample) { m->setPhylipFile(current); } } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "execute"); exit(1); } } 
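/*
 Illustrative note on the lines[i].start / lines[i].end computation in execute() above. Row k of the lower-triangular comparison costs k pairwise comparisons, so equal-sized row ranges would overload the last worker; scaling the boundaries by sqrt(i/processors) gives each worker roughly the same number of pairs. The sketch below is a minimal stand-alone version of that idea, not mothur code: splitRows and pairsInRange are hypothetical helper names introduced only for this example.

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Hypothetical helper (not part of mothur): split rows [0, numGroups) of a
// lower-triangular comparison into one [start, end) range per worker, using
// the same square-root scaling as the lines[] setup in execute().
std::vector<std::pair<int, int> > splitRows(int processors, int numGroups) {
    std::vector<std::pair<int, int> > ranges(processors);
    for (int i = 0; i < processors; i++) {
        ranges[i].first  = int(std::sqrt(float(i) / float(processors)) * numGroups);
        ranges[i].second = int(std::sqrt(float(i + 1) / float(processors)) * numGroups);
    }
    return ranges;
}

// Hypothetical helper: row k is compared against rows 0..k-1, so the cost of
// a range is the sum of k over [start, end).
long pairsInRange(int start, int end) {
    long total = 0;
    for (int k = start; k < end; k++) { total += k; }
    return total;
}
```

 Under these assumptions, splitRows(4, 100) yields the ranges [0,50), [50,70), [70,86) and [86,100), which cover 1225, 1190, 1240 and 1295 of the 4950 group pairs respectively.
*/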
/***********************************************************/ void MatrixOutputCommand::printSims(ostream& out, vector< vector<double> >& simMatrix) { try { out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); if (output == "lt") { out << simMatrix.size() << endl; for (int b = 0; b < simMatrix.size(); b++) { out << lookup[b]->getGroup(); for (int n = 0; n < b; n++) { out << '\t' << simMatrix[b][n]; } out << endl; } }else if (output == "column") { for (int b = 0; b < simMatrix.size(); b++) { for (int n = 0; n < b; n++) { out << lookup[b]->getGroup() << '\t' << lookup[n]->getGroup() << '\t' << simMatrix[b][n] << endl; } } }else{ out << simMatrix.size() << endl; for (int b = 0; b < simMatrix.size(); b++) { out << lookup[b]->getGroup(); for (int n = 0; n < simMatrix[b].size(); n++) { out << '\t' << simMatrix[b][n]; } out << endl; } } } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "printSims"); exit(1); } } /***********************************************************/ int MatrixOutputCommand::process(vector<SharedRAbundVector*> thisLookup){ try { vector< vector< vector<seqDist> > > calcDistsTotals; //each iter, one for each calc, then each groupCombos dists. 
this will be used to make .dist files vector< vector > calcDists; calcDists.resize(matrixCalculators.size()); for (int thisIter = 0; thisIter < iters+1; thisIter++) { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = thisLookup[0]->getLabel(); variables["[tag2]"] = ""; vector thisItersLookup = thisLookup; if (subsample && (thisIter != 0)) { SubSample sample; vector tempLabels; //dont need since we arent printing the sampled sharedRabunds //make copy of lookup so we don't get access violations vector newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } tempLabels = sample.getSample(newLookup, subsampleSize); thisItersLookup = newLookup; } if(processors == 1){ driver(thisItersLookup, 0, numGroups, calcDists); }else{ int process = 1; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ driver(thisItersLookup, lines[process].start, lines[process].end, calcDists); string tempdistFileName = m->getRootName(m->getSimpleName(sharedfile)) + m->mothurGetpid(process) + ".dist"; ofstream outtemp; m->openOutputFile(tempdistFileName, outtemp); for (int i = 0; i < calcDists.size(); i++) { outtemp << calcDists[i].size() << endl; for 
(int j = 0; j < calcDists[i].size(); j++) { outtemp << calcDists[i][j].seq1 << '\t' << calcDists[i][j].seq2 << '\t' << calcDists[i][j].dist << endl; } } outtemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(m->getRootName(m->getSimpleName(sharedfile)) + m->mothurGetpid(process) + ".dist"); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(m->getRootName(m->getSimpleName(sharedfile)) + m->mothurGetpid(process) + ".dist");}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); lines.resize(processors); for (int i = 0; i < processors; i++) { lines[i].start = int (sqrt(float(i)/float(processors)) * numGroups); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numGroups); } processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ driver(thisItersLookup, lines[process].start, lines[process].end, calcDists); string tempdistFileName = m->getRootName(m->getSimpleName(sharedfile)) + m->mothurGetpid(process) + ".dist"; ofstream outtemp; m->openOutputFile(tempdistFileName, outtemp); for (int i = 0; i < calcDists.size(); i++) { outtemp << calcDists[i].size() << endl; for (int j = 0; j < calcDists[i].size(); j++) { outtemp << calcDists[i][j].seq1 << '\t' << calcDists[i][j].seq2 << '\t' << calcDists[i][j].dist << endl; } } outtemp.close(); exit(0); }else { 
m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent do your part driver(thisItersLookup, lines[0].start, lines[0].end, calcDists); //force parent to wait until all the processes are done for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { string tempdistFileName = m->getRootName(m->getSimpleName(sharedfile)) + toString(processIDS[i]) + ".dist"; ifstream intemp; m->openInputFile(tempdistFileName, intemp); for (int k = 0; k < calcDists.size(); k++) { int size = 0; intemp >> size; m->gobble(intemp); for (int j = 0; j < size; j++) { int seq1 = 0; int seq2 = 0; float dist = 1.0; intemp >> seq1 >> seq2 >> dist; m->gobble(intemp); seqDist tempDist(seq1, seq2, dist); calcDists[k].push_back(tempDist); } } intemp.close(); m->mothurRemove(tempdistFileName); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the distSharedData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to pass results vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. 
for( int i=1; i newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } // Allocate memory for thread data. distSharedData* tempSum = new distSharedData(m, lines[i].start, lines[i].end, Estimators, newLookup); pDataArray.push_back(tempSum); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyDistSharedThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //parent do your part driver(thisItersLookup, lines[0].start, lines[0].end, calcDists); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ if (pDataArray[i]->count != (pDataArray[i]->end-pDataArray[i]->start)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end-pDataArray[i]->start) + " groups assigned to it, quitting. 
\n"); m->control_pressed = true; } for (int j = 0; j < pDataArray[i]->thisLookup.size(); j++) { delete pDataArray[i]->thisLookup[j]; } for (int k = 0; k < calcDists.size(); k++) { int size = pDataArray[i]->calcDists[k].size(); for (int j = 0; j < size; j++) { calcDists[k].push_back(pDataArray[i]->calcDists[k][j]); } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif } if (subsample && (thisIter != 0)) { if((thisIter) % 100 == 0){ m->mothurOutJustToScreen(toString(thisIter)+"\n"); } calcDistsTotals.push_back(calcDists); for (int i = 0; i < calcDists.size(); i++) { for (int j = 0; j < calcDists[i].size(); j++) { if (m->debug) { m->mothurOut("[DEBUG]: Results: iter = " + toString(thisIter) + ", " + thisLookup[calcDists[i][j].seq1]->getGroup() + " - " + thisLookup[calcDists[i][j].seq2]->getGroup() + " distance = " + toString(calcDists[i][j].dist) + ".\n"); } } } //clean up memory for (int i = 0; i < thisItersLookup.size(); i++) { delete thisItersLookup[i]; } thisItersLookup.clear(); }else { //print results for whole dataset for (int i = 0; i < calcDists.size(); i++) { if (m->control_pressed) { break; } //initialize matrix vector< vector > matrix; //square matrix to represent the distance matrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcDists[i].size(); j++) { int row = calcDists[i][j].seq1; int column = calcDists[i][j].seq2; double dist = calcDists[i][j].dist; matrix[row][column] = dist; matrix[column][row] = dist; } variables["[outputtag]"] = output; variables["[calc]"] = matrixCalculators[i]->getName(); string distFileName = getOutputFileName("phylip",variables); outputNames.push_back(distFileName); outputTypes["phylip"].push_back(distFileName); ofstream outDist; m->openOutputFile(distFileName, outDist); outDist.setf(ios::fixed, ios::floatfield); outDist.setf(ios::showpoint); printSims(outDist, matrix); outDist.close(); } } for (int i = 0; i < 
calcDists.size(); i++) { calcDists[i].clear(); } } if (iters != 0) { //we need to find the average distance and standard deviation for each groups distance vector< vector > calcAverages = m->getAverages(calcDistsTotals, mode); //find standard deviation vector< vector > stdDev = m->getStandardDeviation(calcDistsTotals, calcAverages); //print results for (int i = 0; i < calcDists.size(); i++) { vector< vector > matrix; //square matrix to represent the distance matrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); } vector< vector > stdmatrix; //square matrix to represent the stdDev stdmatrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { stdmatrix[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcAverages[i].size(); j++) { int row = calcAverages[i][j].seq1; int column = calcAverages[i][j].seq2; float dist = calcAverages[i][j].dist; float stdDist = stdDev[i][j].dist; matrix[row][column] = dist; matrix[column][row] = dist; stdmatrix[row][column] = stdDist; stdmatrix[column][row] = stdDist; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = thisLookup[0]->getLabel(); variables["[outputtag]"] = output; variables["[tag2]"] = "ave"; variables["[calc]"] = matrixCalculators[i]->getName(); string distFileName = getOutputFileName("phylip",variables); outputNames.push_back(distFileName); outputTypes["phylip"].push_back(distFileName); //set current phylip file to average distance matrix m->setPhylipFile(distFileName); ofstream outAve; m->openOutputFile(distFileName, outAve); outAve.setf(ios::fixed, ios::floatfield); outAve.setf(ios::showpoint); printSims(outAve, matrix); outAve.close(); variables["[tag2]"] = "std"; distFileName = getOutputFileName("phylip",variables); outputNames.push_back(distFileName); outputTypes["phylip"].push_back(distFileName); ofstream outSTD; m->openOutputFile(distFileName, 
outSTD); outSTD.setf(ios::fixed, ios::floatfield); outSTD.setf(ios::showpoint); printSims(outSTD, stdmatrix); outSTD.close(); } } return 0; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "process"); exit(1); } } /**************************************************************************************************/ int MatrixOutputCommand::driver(vector thisLookup, int start, int end, vector< vector >& calcDists) { try { vector subset; for (int k = start; k < end; k++) { // pass cdd each set of groups to compare for (int l = 0; l < k; l++) { if (k != l) { //we dont need to similiarity of a groups to itself subset.clear(); //clear out old pair of sharedrabunds //add new pair of sharedrabunds subset.push_back(thisLookup[k]); subset.push_back(thisLookup[l]); for(int i=0;igetNeedsAll()) { //load subset with rest of lookup for those calcs that need everyone to calc for a pair for (int w = 0; w < thisLookup.size(); w++) { if ((w != k) && (w != l)) { subset.push_back(thisLookup[w]); } } } vector tempdata = matrixCalculators[i]->getValues(subset); //saves the calculator outputs if (m->control_pressed) { return 1; } seqDist temp(l, k, tempdata[0]); calcDists[i].push_back(temp); } } } } return 0; } catch(exception& e) { m->errorOut(e, "MatrixOutputCommand", "driver"); exit(1); } } /***********************************************************/ mothur-1.36.1/source/commands/matrixoutputcommand.h000066400000000000000000000266221255543666200225440ustar00rootroot00000000000000#ifndef MATRIXOUTPUTCOMMAND_H #define MATRIXOUTPUTCOMMAND_H /* * matrixoutputcommand.h * Mothur * * Created by Sarah Westcott on 5/20/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "command.hpp" #include "inputdata.h" #include "groupmap.h" #include "validcalculator.h" #include "sharedsobscollectsummary.h" #include "sharedchao1.h" #include "sharedace.h" #include "sharednseqs.h" #include "sharedjabund.h" #include "sharedsorabund.h" #include "sharedjclass.h" #include "sharedsorclass.h" #include "sharedjest.h" #include "sharedsorest.h" #include "sharedthetayc.h" #include "sharedthetan.h" #include "sharedkstest.h" #include "whittaker.h" #include "sharedochiai.h" #include "sharedanderbergs.h" #include "sharedkulczynski.h" #include "sharedkulczynskicody.h" #include "sharedlennon.h" #include "sharedmorisitahorn.h" #include "sharedbraycurtis.h" #include "sharedjackknife.h" #include "odum.h" #include "canberra.h" #include "structeuclidean.h" #include "structchord.h" #include "hellinger.h" #include "manhattan.h" #include "structpearson.h" #include "soergel.h" #include "spearman.h" #include "structkulczynski.h" #include "structchi2.h" #include "speciesprofile.h" #include "hamming.h" #include "gower.h" #include "memchi2.h" #include "memchord.h" #include "memeuclidean.h" #include "mempearson.h" #include "sharedjsd.h" #include "sharedrjsd.h" // aka. dist.shared() /* This command creates a distance matrix file for each selected calculator at each distance level (label), using that calculator to measure the similarity between groups. The user can select the labels they wish to use as well as the groups they would like included. They can also use as many or as few calculators as they wish. 
*/ class MatrixOutputCommand : public Command { public: MatrixOutputCommand(string); MatrixOutputCommand(); ~MatrixOutputCommand(); vector setParameters(); string getCommandName() { return "dist.shared"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Dist.shared"; } string getDescription() { return "generate a distance matrix that describes the dissimilarity among multiple groups"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector lines; void printSims(ostream&, vector< vector >&); int process(vector); vector matrixCalculators; //vector< vector > simMatrix; InputData* input; vector lookup; string exportFileName, output, sharedfile; int numGroups, processors, iters, subsampleSize; ofstream out; bool abort, allLines, subsample; set labels; //holds labels to be used string outputFile, calc, groups, label, outputDir, mode; vector Estimators, Groups, outputNames; //holds estimators to be used int process(vector, string, string); int driver(vector, int, int, vector< vector >&); }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct distSharedData { vector thisLookup; vector< vector > calcDists; vector Estimators; unsigned long long start; unsigned long long end; MothurOut* m; int count; distSharedData(){} distSharedData(MothurOut* mout, unsigned long long st, unsigned long long en, vector est, vector lu) { m = mout; start = st; end = en; Estimators = est; thisLookup = lu; count = 0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyDistSharedThreadFunction(LPVOID lpParam){ distSharedData* pDataArray; pDataArray = (distSharedData*)lpParam; try { vector matrixCalculators; ValidCalculators validCalculator; for (int i=0; iEstimators.size(); i++) { if (validCalculator.isValidCalculator("matrix", pDataArray->Estimators[i]) == true) { if (pDataArray->Estimators[i] == "sharedsobs") { matrixCalculators.push_back(new SharedSobsCS()); }else if (pDataArray->Estimators[i] == "sharedchao") { matrixCalculators.push_back(new SharedChao1()); }else if (pDataArray->Estimators[i] == "sharedace") { matrixCalculators.push_back(new SharedAce()); }else if (pDataArray->Estimators[i] == "jabund") { matrixCalculators.push_back(new JAbund()); }else if (pDataArray->Estimators[i] == "sorabund") { matrixCalculators.push_back(new SorAbund()); }else if (pDataArray->Estimators[i] == "jclass") { matrixCalculators.push_back(new Jclass()); }else if (pDataArray->Estimators[i] == "sorclass") { matrixCalculators.push_back(new SorClass()); }else if (pDataArray->Estimators[i] == "jest") { matrixCalculators.push_back(new Jest()); }else if (pDataArray->Estimators[i] == "sorest") { matrixCalculators.push_back(new SorEst()); }else if (pDataArray->Estimators[i] == "thetayc") { matrixCalculators.push_back(new ThetaYC()); }else if (pDataArray->Estimators[i] == "thetan") { matrixCalculators.push_back(new ThetaN()); }else if (pDataArray->Estimators[i] 
== "kstest") { matrixCalculators.push_back(new KSTest()); }else if (pDataArray->Estimators[i] == "sharednseqs") { matrixCalculators.push_back(new SharedNSeqs()); }else if (pDataArray->Estimators[i] == "ochiai") { matrixCalculators.push_back(new Ochiai()); }else if (pDataArray->Estimators[i] == "anderberg") { matrixCalculators.push_back(new Anderberg()); }else if (pDataArray->Estimators[i] == "kulczynski") { matrixCalculators.push_back(new Kulczynski()); }else if (pDataArray->Estimators[i] == "kulczynskicody") { matrixCalculators.push_back(new KulczynskiCody()); }else if (pDataArray->Estimators[i] == "lennon") { matrixCalculators.push_back(new Lennon()); }else if (pDataArray->Estimators[i] == "morisitahorn") { matrixCalculators.push_back(new MorHorn()); }else if (pDataArray->Estimators[i] == "braycurtis") { matrixCalculators.push_back(new BrayCurtis()); }else if (pDataArray->Estimators[i] == "whittaker") { matrixCalculators.push_back(new Whittaker()); }else if (pDataArray->Estimators[i] == "odum") { matrixCalculators.push_back(new Odum()); }else if (pDataArray->Estimators[i] == "canberra") { matrixCalculators.push_back(new Canberra()); }else if (pDataArray->Estimators[i] == "structeuclidean") { matrixCalculators.push_back(new StructEuclidean()); }else if (pDataArray->Estimators[i] == "structchord") { matrixCalculators.push_back(new StructChord()); }else if (pDataArray->Estimators[i] == "hellinger") { matrixCalculators.push_back(new Hellinger()); }else if (pDataArray->Estimators[i] == "manhattan") { matrixCalculators.push_back(new Manhattan()); }else if (pDataArray->Estimators[i] == "structpearson") { matrixCalculators.push_back(new StructPearson()); }else if (pDataArray->Estimators[i] == "soergel") { matrixCalculators.push_back(new Soergel()); }else if (pDataArray->Estimators[i] == "spearman") { matrixCalculators.push_back(new Spearman()); }else if (pDataArray->Estimators[i] == "structkulczynski") { matrixCalculators.push_back(new StructKulczynski()); }else if 
(pDataArray->Estimators[i] == "speciesprofile") { matrixCalculators.push_back(new SpeciesProfile()); }else if (pDataArray->Estimators[i] == "hamming") { matrixCalculators.push_back(new Hamming()); }else if (pDataArray->Estimators[i] == "structchi2") { matrixCalculators.push_back(new StructChi2()); }else if (pDataArray->Estimators[i] == "gower") { matrixCalculators.push_back(new Gower()); }else if (pDataArray->Estimators[i] == "memchi2") { matrixCalculators.push_back(new MemChi2()); }else if (pDataArray->Estimators[i] == "memchord") { matrixCalculators.push_back(new MemChord()); }else if (pDataArray->Estimators[i] == "memeuclidean") { matrixCalculators.push_back(new MemEuclidean()); }else if (pDataArray->Estimators[i] == "mempearson") { matrixCalculators.push_back(new MemPearson()); }else if (pDataArray->Estimators[i] == "jsd") { matrixCalculators.push_back(new JSD()); }else if (pDataArray->Estimators[i] == "rjsd") { matrixCalculators.push_back(new RJSD()); } } } pDataArray->calcDists.resize(matrixCalculators.size()); vector subset; for (int k = pDataArray->start; k < pDataArray->end; k++) { // pass cdd each set of groups to compare pDataArray->count++; for (int l = 0; l < k; l++) { if (k != l) { //we dont need to similiarity of a groups to itself subset.clear(); //clear out old pair of sharedrabunds //add new pair of sharedrabunds subset.push_back(pDataArray->thisLookup[k]); subset.push_back(pDataArray->thisLookup[l]); for(int i=0;igetNeedsAll()) { //load subset with rest of lookup for those calcs that need everyone to calc for a pair for (int w = 0; w < pDataArray->thisLookup.size(); w++) { if ((w != k) && (w != l)) { subset.push_back(pDataArray->thisLookup[w]); } } } vector tempdata = matrixCalculators[i]->getValues(subset); //saves the calculator outputs if (pDataArray->m->control_pressed) { return 1; } seqDist temp(l, k, tempdata[0]); pDataArray->calcDists[i].push_back(temp); } } } } for(int i=0;im->errorOut(e, "MatrixOutputCommand", 
"MyDistSharedThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/mergefilecommand.cpp000066400000000000000000000135131255543666200222440ustar00rootroot00000000000000/* * mergefilecommand.cpp * Mothur * * Created by Pat Schloss on 6/14/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. * */ #include "mergefilecommand.h" //********************************************************************************************************************** vector<string> MergeFileCommand::setParameters(){ try { CommandParameter pinput("input", "String", "", "", "", "", "","",false,true,true); parameters.push_back(pinput); CommandParameter poutput("output", "String", "", "", "", "", "","",false,true,true); parameters.push_back(poutput); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MergeFileCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MergeFileCommand::getHelpString(){ try { string helpString = ""; helpString += "The merge.files command takes a list of files separated by dashes and merges them into one file.\n"; helpString += "The merge.files command parameters are input and output.\n"; helpString += "Example merge.files(input=small.fasta-large.fasta, output=all.fasta).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
output), '=' and parameters (i.e.yourOutputFileName).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MergeFileCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** MergeFileCommand::MergeFileCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["merge"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MergeFileCommand", "MergeFileCommand"); exit(1); } } //********************************************************************************************************************** MergeFileCommand::MergeFileCommand(string option) { try { abort = false; calledHelp = false; if(option == "help") { help(); abort = true; calledHelp = true; }else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["merge"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } string fileList = validParameter.validFile(parameters, "input", false); if(fileList == "not found") { m->mothurOut("you must enter two or more file names"); m->mothurOutEndLine(); abort=true; } else{ m->splitAtDash(fileList, fileNames); } //if the user changes the output directory command factory will send this info to us in the output parameter string outputDir = validParameter.validFile(parameters, "outputdir", 
false); if (outputDir == "not found") { outputDir = ""; } numInputFiles = fileNames.size(); ifstream testFile; if(numInputFiles == 0){ m->mothurOut("you must enter two or more file names and you entered " + toString(fileNames.size()) + " file names"); m->mothurOutEndLine(); abort=true; } else{ for(int i=0;i<numInputFiles;i++){ if (inputDir != "") { string path = m->hasPath(fileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fileNames[i] = inputDir + fileNames[i]; } } if(m->openInputFile(fileNames[i], testFile)){ abort = true; } testFile.close(); } } outputFileName = validParameter.validFile(parameters, "output", false); if (outputFileName == "not found") { m->mothurOut("you must enter an output file name"); m->mothurOutEndLine(); abort=true; } else if (outputDir != "") { outputFileName = outputDir + m->getSimpleName(outputFileName); } } } catch(exception& e) { m->errorOut(e, "MergeFileCommand", "MergeFileCommand"); exit(1); } } //********************************************************************************************************************** int MergeFileCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } m->mothurRemove(outputFileName); for(int i=0;i<numInputFiles;i++){ m->appendFiles(fileNames[i], outputFileName); } if (m->control_pressed) { m->mothurRemove(outputFileName); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); outputNames.push_back(outputFileName); outputTypes["merge"].push_back(outputFileName); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MergeFileCommand", "execute"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/mergefilecommand.h000066400000000000000000000016311255543666200217070ustar00rootroot00000000000000#ifndef MERGEFILECOMMAND_H #define MERGEFILECOMMAND_H /* * mergefilecommand.h *
Mothur * * Created by Pat Schloss on 6/14/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. * */ #include "mothur.h" #include "command.hpp" class MergeFileCommand : public Command { public: MergeFileCommand(string); MergeFileCommand(); ~MergeFileCommand(){} vector<string> setParameters(); string getCommandName() { return "merge.files"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string){ return ""; } string getCitation() { return "http://www.mothur.org/wiki/Merge.files"; } string getDescription() { return "appends files creating one file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector<string> fileNames, outputNames; string outputFileName; int numInputFiles; bool abort; }; #endif mothur-1.36.1/source/commands/mergegroupscommand.cpp000066400000000000000000000436001255543666200226440ustar00rootroot00000000000000/* * mergegroupscommand.cpp * mothur * * Created by westcott on 1/24/11. * Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "mergegroupscommand.h" #include "sharedutilities.h" //********************************************************************************************************************** vector<string> MergeGroupsCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "sharedGroup", "none","shared",false,false,true); parameters.push_back(pshared); CommandParameter pgroup("group", "InputTypes", "", "", "none", "sharedGroup", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pdesign); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MergeGroupsCommand::getHelpString(){ try { string helpString = ""; helpString += "The merge.groups command reads a shared or group file and a design file and merges the groups that are in the same grouping in the design file.\n"; helpString += "The merge.groups command outputs a .shared file. \n"; helpString += "The merge.groups command parameters are shared, group, groups, label and design.
The design parameter is required.\n"; helpString += "The design parameter allows you to assign your groups to sets. It is required. \n"; helpString += "The design file looks like the group file. It is a 2 column tab delimited file, where the first column is the group name and the second column is the set the group belongs to.\n"; helpString += "The groups parameter allows you to specify which of the groups in your shared or group file you would like included. The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance levels you would like, and are also separated by dashes.\n"; helpString += "The merge.groups command should be in the following format: merge.groups(design=yourDesignFile, shared=yourSharedFile).\n"; helpString += "Example merge.groups(design=temp.design, groups=A-B-C, shared=temp.shared).\n"; helpString += "The default value for groups is all the groups in your sharedfile, and all labels in your inputfile will be used.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e. yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MergeGroupsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "shared") { pattern = "[filename],merge,[extension]"; } else if (type == "group") { pattern = "[filename],merge,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MergeGroupsCommand::MergeGroupsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["group"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "MergeGroupsCommand"); exit(1); } } //********************************************************************************************************************** MergeGroupsCommand::MergeGroupsCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command map<string,string>::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector<string> tempOutNames; outputTypes["shared"] =
tempOutNames; outputTypes["group"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["group"] = inputDir + it->second; } } } //check for required parameters designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { abort = true; } else if (designfile == "not found") { //if there is a current shared file, use it designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current designfile and the design parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setDesignFile(designfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; sharedfile = ""; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; groupfile = ""; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = "all"; } m->splitAtDash(groups, Groups); m->setGroups(Groups); if ((sharedfile == "") && (groupfile == "")) { //give priority to group, then shared groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile or sharedfile and one is required."); m->mothurOutEndLine(); abort = true; } } } } } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "MergeGroupsCommand"); exit(1); } } //********************************************************************************************************************** int MergeGroupsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } designMap = new DesignMap(designfile); if (groupfile != "") { processGroupFile(designMap); } if (sharedfile != "") { processSharedFile(designMap); } //reset groups parameter m->clearGroups(); delete designMap; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0;} //set shared file as new current sharedfile string current = ""; itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } 
m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int MergeGroupsCommand::process(vector<SharedRAbundVector*>& thisLookUp, ofstream& out){ try { map<string, SharedRAbundVector> merged; map<string, SharedRAbundVector>::iterator it; for (int i = 0; i < thisLookUp.size(); i++) { if (m->control_pressed) { return 0; } //what grouping does this group belong to string grouping = designMap->get(thisLookUp[i]->getGroup()); if (grouping == "not found") { m->mothurOut("[ERROR]: " + thisLookUp[i]->getGroup() + " is not in your design file. Ignoring!"); m->mothurOutEndLine(); grouping = "NOTFOUND"; } else { //do we already have a member of this grouping? it = merged.find(grouping); if (it == merged.end()) { //nope, so create it merged[grouping] = *thisLookUp[i]; merged[grouping].setGroup(grouping); }else { //yes, merge it for (int j = 0; j < thisLookUp[i]->getNumBins(); j++) { int abund = (it->second).getAbundance(j); abund += thisLookUp[i]->getAbundance(j); (it->second).set(j, abund, grouping); } } } } //print new file for (it = merged.begin(); it != merged.end(); it++) { out << (it->second).getLabel() << '\t' << it->first << '\t'; (it->second).print(out); } return 0; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "process"); exit(1); } } //********************************************************************************************************************** int MergeGroupsCommand::processSharedFile(DesignMap*& designMap){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] =
m->getExtension(sharedfile); string outputFileName = getOutputFileName("shared", variables); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { out.close(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); delete designMap; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } process(lookup, out); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } process(lookup, out); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { out.close(); m->clearGroups(); delete designMap; for (int i = 
0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { out.close(); m->clearGroups(); delete designMap; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } process(lookup, out); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "processSharedFile"); exit(1); } } //********************************************************************************************************************** int MergeGroupsCommand::processGroupFile(DesignMap*& designMap){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); outputTypes["group"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); //read groupfile GroupMap groupMap(groupfile); 
groupMap.readMap(); //fill Groups - checks for "all" and for any typo groups SharedUtil* util = new SharedUtil(); vector<string> nameGroups = groupMap.getNamesOfGroups(); util->setGroups(Groups, nameGroups); delete util; vector<string> namesOfSeqs = groupMap.getNamesSeqs(); bool error = false; for (int i = 0; i < namesOfSeqs.size(); i++) { if (m->control_pressed) { break; } string thisGroup = groupMap.getGroup(namesOfSeqs[i]); //are you in a group the user wants if (m->inUsersGroups(thisGroup, Groups)) { string thisGrouping = designMap->get(thisGroup); if (thisGrouping == "not found") { m->mothurOut("[ERROR]: " + namesOfSeqs[i] + " is from group " + thisGroup + " which is not in your design file, please correct."); m->mothurOutEndLine(); error = true; } else { out << namesOfSeqs[i] << '\t' << thisGrouping << endl; } } } if (error) { m->control_pressed = true; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "MergeGroupsCommand", "processGroupFile"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/mergegroupscommand.h000066400000000000000000000025301255543666200223060ustar00rootroot00000000000000#ifndef MERGEGROUPSCOMMAND_H #define MERGEGROUPSCOMMAND_H /* * mergegroupscommand.h * mothur * * Created by westcott on 1/24/11. * Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "command.hpp" #include "inputdata.h" #include "sharedrabundvector.h" #include "designmap.h" class MergeGroupsCommand : public Command { public: MergeGroupsCommand(string); MergeGroupsCommand(); ~MergeGroupsCommand() {} vector<string> setParameters(); string getCommandName() { return "merge.groups"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Merge.groups"; } string getDescription() { return "reads shared file and a design file and merges the groups in the shared file that are in the same grouping in the design file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: DesignMap* designMap; vector<SharedRAbundVector*> lookup; bool abort, allLines, pickedGroups; set<string> labels; //holds labels to be used string groups, label, outputDir, inputDir, designfile, sharedfile, groupfile; vector<string> Groups, outputNames; int process(vector<SharedRAbundVector*>&, ofstream&); int processSharedFile(DesignMap*&); int processGroupFile(DesignMap*&); }; #endif mothur-1.36.1/source/commands/mergesfffilecommand.cpp000066400000000000000000001160011255543666200227370ustar00rootroot00000000000000// // mergesfffilecommand.cpp // Mothur // // Created by Sarah Westcott on 1/31/14. // Copyright (c) 2014 Schloss Lab. All rights reserved.
// #include "mergesfffilecommand.h" #include "endiannessmacros.h" //********************************************************************************************************************** vector<string> MergeSfffilesCommand::setParameters(){ try { CommandParameter psff("sff", "InputTypes", "", "", "sffFile", "sffFile", "none","sff",false,false); parameters.push_back(psff); CommandParameter pfile("file", "InputTypes", "", "", "sffFile", "sffFile", "none","sff",false,false); parameters.push_back(pfile); CommandParameter pkeytrim("keytrim", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pkeytrim); CommandParameter poutput("output", "String", "", "", "", "", "","",false,true,true); parameters.push_back(poutput); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MergeSfffilesCommand::getHelpString(){ try { string helpString = ""; helpString += "The merge.sfffiles command reads a sff file or a file containing a list of sff files and merges the individual files into a single sff file. \n"; helpString += "The merge.sfffiles command parameters are sff, file, keytrim and output. sff or file is required, and output is required. \n"; helpString += "The sff parameter allows you to enter a list of sff files separated by dashes.\n"; helpString += "The file parameter allows you to provide a file containing a list of sff files to merge.
\n"; helpString += "The keytrim parameter allows you to merge sff files with different key sequences by trimming them to the first 4 characters, provided the first 4 match. \n"; helpString += "The output parameter allows you to provide an output filename. \n"; helpString += "Example merge.sfffiles(sff=mySffFile.sff-mySecond.sff, output=myMerged.sff).\n"; helpString += "Note: No spaces between parameter labels (i.e. sff), '=' and parameters (i.e. yourSffFileName).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MergeSfffilesCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "sff") { pattern = "[filename],"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MergeSfffilesCommand::MergeSfffilesCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["sff"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "MergeSfffilesCommand"); exit(1); } } //********************************************************************************************************************** MergeSfffilesCommand::MergeSfffilesCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid parameters for this command vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> parameters = parser.getParameters(); map<string,string>::iterator it; ValidParameters
validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["sff"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ string path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["file"] = inputDir + it->second; } } } sffFilename = validParameter.validFile(parameters, "sff", false); if (sffFilename == "not found") { sffFilename = ""; } else { m->splitAtDash(sffFilename, filenames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < filenames.size(); i++) { bool ignore = false; if (filenames[i] == "current") { filenames[i] = m->getSFFFile(); if (filenames[i] != "") { m->mothurOut("Using " + filenames[i] + " as input file for the sff parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sfffile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list filenames.erase(filenames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(filenames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { filenames[i] = inputDir + filenames[i]; } } ifstream in; int ableToOpen = m->openInputFile(filenames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(filenames[i]); m->mothurOut("Unable to open " + filenames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); filenames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(filenames[i]); m->mothurOut("Unable to open " + filenames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); filenames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + filenames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list filenames.erase(filenames.begin()+i); i--; }else { m->setSFFFile(filenames[i]); } } } } file = validParameter.validFile(parameters, "file", true); if (file == "not open") { abort = true; } else if (file == "not found") { file = ""; } if ((file == "") && (filenames.size() == 0)) { m->mothurOut("[ERROR]: no valid files."); m->mothurOutEndLine(); abort = true; } if ((file != "") && (filenames.size() != 0)) { //both are given m->mothurOut("[ERROR]: cannot use file option and sff option at the same time, choose one."); m->mothurOutEndLine(); abort = true; } outputFile = validParameter.validFile(parameters, "output", false); if (outputFile == "not found") { m->mothurOut("you must enter an output file name"); m->mothurOutEndLine(); abort=true; } if (outputDir != "") { outputFile = outputDir + m->getSimpleName(outputFile); } string temp = validParameter.validFile(parameters, "keytrim", false); if (temp == "not found") { temp = "F"; } keyTrim = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "MergeSfffilesCommand"); exit(1); } } //********************************************************************************************************************** int MergeSfffilesCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (file != "") { readFile(); if (outputDir == "") { outputDir = m->hasPath(file); } } ofstream out; map variables; string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(outputFile); } variables["[filename]"] = thisOutputDir + m->getSimpleName(outputFile); outputFile = getOutputFileName("sff",variables); m->openOutputFileBinary(outputFile, out); outputNames.push_back(outputFile); outputTypes["sff"].push_back(outputFile); outputFileHeader = outputFile + ".headers"; numTotalReads = 0; for (int s = 0; s < filenames.size(); s++) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); 
i++) { m->mothurRemove(outputNames[i]); } return 0; } int start = time(NULL); filenames[s] = m->getFullPathName(filenames[s]); m->mothurOut("\nMerging info from " + filenames[s] + " ..." ); m->mothurOutEndLine(); int numReads = mergeSffInfo(filenames[s], out); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to merge " + toString(numReads) + ".\n"); } out.close(); //create new common header and add to merged file adjustCommonHeader(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set sff file as new current sff file string current = ""; itTypes = outputTypes.find("sff"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSFFFile(current); } } //report output filenames m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "execute"); exit(1); } } //********************************************************************************************************************** int MergeSfffilesCommand::mergeSffInfo(string input, ofstream& out){ try { currentFileName = input; ifstream in; m->openInputFileBinary(input, in); CommonHeader header; readCommonHeader(in, header); int count = 0; //check magic number and version if (header.magicNumber != 779314790) { m->mothurOut("Magic Number is not correct, not a valid .sff file"); m->mothurOutEndLine(); return count; } if (header.version != "0001") { m->mothurOut("Version is not supported, only support version 0001."); m->mothurOutEndLine(); return count; } //save for adjustHeader sanity check commonHeaders.push_back(header); //read through the sff file while (!in.eof()) { //read data seqRead read; Header readheader; readSeqData(in, read, header.numFlowsPerRead, 
readheader, out); bool okay = sanityCheck(readheader, read); if (!okay) { break; } count++; //report progress if((count+1) % 10000 == 0){ m->mothurOut(toString(count+1)); m->mothurOutEndLine(); } if (m->control_pressed) { count = 0; break; } if (count >= header.numReads) { break; } } //report progress if (!m->control_pressed) { if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } } in.close(); return count; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "mergeSffInfo"); exit(1); } } //********************************************************************************************************************** int MergeSfffilesCommand::readCommonHeader(ifstream& in, CommonHeader& header){ try { if (!in.eof()) { //read magic number char buffer[4]; in.read(buffer, 4); header.magicNumber = be_int4(*(unsigned int *)(&buffer)); //read version char buffer9[4]; in.read(buffer9, 4); header.version = ""; for (int i = 0; i < 4; i++) { header.version += toString((int)(buffer9[i])); } //read offset char buffer2 [8]; in.read(buffer2, 8); header.indexOffset = be_int8(*(unsigned long long *)(&buffer2)); //read index length char buffer3 [4]; in.read(buffer3, 4); header.indexLength = be_int4(*(unsigned int *)(&buffer3)); //read num reads char buffer4 [4]; in.read(buffer4, 4); header.numReads = be_int4(*(unsigned int *)(&buffer4)); if (m->debug) { m->mothurOut("[DEBUG]: numReads = " + toString(header.numReads) + "\n"); } //read header length char buffer5 [2]; in.read(buffer5, 2); header.headerLength = be_int2(*(unsigned short *)(&buffer5)); //read key length char buffer6 [2]; in.read(buffer6, 2); header.keyLength = be_int2(*(unsigned short *)(&buffer6)); //cout << "header key length = " << header.keyLength << endl; //read number of flow reads char buffer7 [2]; in.read(buffer7, 2); header.numFlowsPerRead = be_int2(*(unsigned short *)(&buffer7)); //read format code char buffer8 [1]; in.read(buffer8, 1); header.flogramFormatCode = (int)(buffer8[0]); 
        //read flow chars
        char* tempBuffer = new char[header.numFlowsPerRead];
        in.read(&(*tempBuffer), header.numFlowsPerRead);
        header.flowChars = tempBuffer;
        if (header.flowChars.length() > header.numFlowsPerRead) { header.flowChars = header.flowChars.substr(0, header.numFlowsPerRead); }
        delete[] tempBuffer;

        //read key
        char* tempBuffer2 = new char[header.keyLength];
        in.read(&(*tempBuffer2), header.keyLength);
        header.keySequence = tempBuffer2;
        //cout << "key sequence " << header.keySequence << endl;
        if (header.keySequence.length() > header.keyLength) { header.keySequence = header.keySequence.substr(0, header.keyLength); }
        delete[] tempBuffer2;
        //cout << "key sequence " << header.keySequence << endl;

        /* Pad to 8 chars */
        unsigned long long spotInFile = in.tellg();
        unsigned long long spot = (spotInFile + 7)& ~7;  // ~ inverts
        in.seekg(spot);

        }else{
            m->mothurOut("Error reading sff common header."); m->mothurOutEndLine();
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "MergeSfffilesCommand", "readCommonHeader");
        exit(1);
    }
}
//**********************************************************************************************************************
int MergeSfffilesCommand::adjustCommonHeader(){
    try {
        //sanity check
        bool okayMagic = true;
        bool okayVersion = true;
        bool okayHeader = true;
        bool okayKeyLength = true;
        bool okayNumFlows = true;
        bool okayformatCode = true;
        bool okayflowChar = true;
        bool okayKeySequence = true;
        if (commonHeaders.size() != 0) {
            unsigned int magicN = commonHeaders[0].magicNumber;
            string version = commonHeaders[0].version;
            unsigned short headerLength = commonHeaders[0].headerLength;
            unsigned short keyLength = commonHeaders[0].keyLength;
            unsigned short numFlows = commonHeaders[0].numFlowsPerRead;
            int flowCode = commonHeaders[0].flogramFormatCode;
            string flowChars = commonHeaders[0].flowChars;
            string keySeq = commonHeaders[0].keySequence;

            for (int i = 1; i < commonHeaders.size(); i++) {
                if (commonHeaders[i].magicNumber != magicN) { okayMagic = false; m->mothurOut("[ERROR]: merge issue with common headers. Magic numbers do not match. 
" + filenames[0] + " magic number is " + toString(commonHeaders[0].magicNumber) + ", but " + filenames[i] + " magic number is " + toString(commonHeaders[i].magicNumber) + ".\n"); } if (commonHeaders[i].version != version) { okayVersion = false; m->mothurOut("[ERROR]: merge issue with common headers. Versions do not match. " + filenames[0] + " version is " + commonHeaders[0].version + ", but " + filenames[i] + " version is " + commonHeaders[i].version + ".\n"); } if (commonHeaders[i].headerLength != headerLength) { okayHeader = false; m->mothurOut("[ERROR]: merge issue with common headers. Header lengths do not match. " + filenames[0] + " header length is " + toString(commonHeaders[0].headerLength) + ", but " + filenames[i] + " header length is " + toString(commonHeaders[i].headerLength) + ".\n"); } if (commonHeaders[i].keyLength != keyLength) { okayKeyLength = false; m->mothurOut("[ERROR]: merge issue with common headers. Key Lengths do not match. " + filenames[0] + " Key length is " + toString(commonHeaders[0].keyLength) + ", but " + filenames[i] + " key length is " + toString(commonHeaders[i].keyLength) + ".\n"); } if (commonHeaders[i].numFlowsPerRead != numFlows) { okayNumFlows = false; m->mothurOut("[ERROR]: merge issue with common headers. Number of flows per read do not match. " + filenames[0] + " number of flows is " + toString(commonHeaders[0].numFlowsPerRead) + ", but " + filenames[i] + " number of flows is " + toString(commonHeaders[i].numFlowsPerRead) + ".\n"); } if (commonHeaders[i].flogramFormatCode != flowCode) { okayformatCode = false; m->mothurOut("[ERROR]: merge issue with common headers. Flow format codes do not match. " + filenames[0] + " Flow format code is " + toString(commonHeaders[0].flogramFormatCode) + ", but " + filenames[i] + " flow format code is " + toString(commonHeaders[i].flogramFormatCode) + ".\n"); } if (commonHeaders[i].flowChars != flowChars) { okayflowChar = false; m->mothurOut("[ERROR]: merge issue with common headers. 
Flow characters do not match. " + filenames[0] + " Flow characters are " + commonHeaders[0].flowChars + ", but " + filenames[i] + " flow characters are " + commonHeaders[i].flowChars + ".\n"); } if (commonHeaders[i].keySequence != keySeq) { okayKeySequence = false; if (keyTrim) { m->mothurOut("[WARNING]: merge issue with common headers. Key sequences do not match. " + filenames[0] + " Key sequence is " + commonHeaders[0].keySequence + ", but " + filenames[i] + " key sequence is " + commonHeaders[i].keySequence + ". We will attempt to trim them.\n"); }else { m->mothurOut("[ERROR]: merge issue with common headers. Key sequences do not match. " + filenames[0] + " Key sequence is " + commonHeaders[0].keySequence + ", but " + filenames[i] + " key sequence is " + commonHeaders[i].keySequence + ".\n"); } } } }else { m->control_pressed = true; return 0; } //should never get here bool modify = false; if (!okayMagic || !okayVersion || !okayHeader || !okayKeyLength || !okayNumFlows || !okayformatCode || !okayflowChar) { m->control_pressed = true; return 0; } if (!okayKeySequence) { bool okayKeySequence2 = true; string keySeq = commonHeaders[0].keySequence.substr(0,4); for (int i = 1; i < commonHeaders.size(); i++) { if ((commonHeaders[i].keySequence.substr(0,4)) != keySeq) { okayKeySequence2 = false; } } if (okayKeySequence2 && keyTrim) { modify = true; m->mothurOut("We are able to trim the key sequences. 
Merged key sequence will be " + keySeq + ".\n"); }
        }

        string endian = m->findEdianness();
        char* mybuffer = new char[4];
        ifstream in;
        m->openInputFileBinary(currentFileName, in);

        //magic number
        in.read(mybuffer,4);
        ofstream out;
        m->openOutputFileBinaryAppend(outputFileHeader, out);
        out.write(mybuffer, in.gcount());
        delete[] mybuffer;

        //version
        mybuffer = new char[4];
        in.read(mybuffer,4);
        out.write(mybuffer, in.gcount());
        delete[] mybuffer;

        //offset
        mybuffer = new char[8];
        in.read(mybuffer,8);
        unsigned long long offset = 0;
        char* thisbuffer = new char[8];
        thisbuffer[0] = (offset >> 56) & 0xFF;
        thisbuffer[1] = (offset >> 48) & 0xFF;
        thisbuffer[2] = (offset >> 40) & 0xFF;
        thisbuffer[3] = (offset >> 32) & 0xFF;
        thisbuffer[4] = (offset >> 24) & 0xFF;
        thisbuffer[5] = (offset >> 16) & 0xFF;
        thisbuffer[6] = (offset >> 8) & 0xFF;
        thisbuffer[7] = offset & 0xFF;
        out.write(thisbuffer, 8);
        delete[] thisbuffer;
        delete[] mybuffer;

        //read index length
        mybuffer = new char[4];
        in.read(mybuffer,4);
        offset = 0;
        char* thisbuffer2 = new char[4];
        thisbuffer2[0] = (offset >> 24) & 0xFF;
        thisbuffer2[1] = (offset >> 16) & 0xFF;
        thisbuffer2[2] = (offset >> 8) & 0xFF;
        thisbuffer2[3] = offset & 0xFF;
        out.write(thisbuffer2, 4);
        delete[] thisbuffer2;
        delete[] mybuffer;

        //change num reads
        mybuffer = new char[4];
        in.read(mybuffer,4);
        delete[] mybuffer;
        thisbuffer2 = new char[4];
        thisbuffer2[0] = (numTotalReads >> 24) & 0xFF;
        thisbuffer2[1] = (numTotalReads >> 16) & 0xFF;
        thisbuffer2[2] = (numTotalReads >> 8) & 0xFF;
        thisbuffer2[3] = numTotalReads & 0xFF;
        out.write(thisbuffer2, 4);
        delete[] thisbuffer2;

        //read header length
        mybuffer = new char[2];
        in.read(mybuffer,2);
        out.write(mybuffer, in.gcount());
        delete[] mybuffer;

        //read key length
        mybuffer = new char[2];
        in.read(mybuffer,2);
        if (modify) {
            unsigned short fourL = 4;
            thisbuffer2 = new char[2];
            thisbuffer2[0] = (fourL >> 8) & 0xFF;
            thisbuffer2[1] = fourL & 0xFF;
            out.write(thisbuffer2, in.gcount());
            delete[] thisbuffer2;
        }else { out.write(mybuffer, 
in.gcount()); }
        delete[] mybuffer;

        //read number of flow reads
        mybuffer = new char[2];
        in.read(mybuffer,2);
        out.write(mybuffer, in.gcount());
        delete[] mybuffer;

        //read format code
        mybuffer = new char[1];
        in.read(mybuffer,1);
        out.write(mybuffer, in.gcount());
        delete[] mybuffer;

        //read flow chars
        mybuffer = new char[commonHeaders[0].numFlowsPerRead];
        in.read(mybuffer,commonHeaders[0].numFlowsPerRead);
        out.write(mybuffer, in.gcount());
        delete[] mybuffer;

        //read key
        mybuffer = new char[commonHeaders[0].keyLength];
        in.read(mybuffer,commonHeaders[0].keyLength);
        if (modify) { out.write(mybuffer, 4); }
        else { out.write(mybuffer, in.gcount()); }
        delete[] mybuffer;

        /* Pad to 8 chars */
        unsigned long long spotInFile = in.tellg();
        if (modify) { spotInFile -= commonHeaders[0].keyLength - 4; }
        unsigned long long spot = (spotInFile + 7)& ~7;  // ~ inverts
        in.seekg(spot);

        //value-initialize so the padding bytes are written as zeros rather than uninitialized memory
        mybuffer = new char[spot-spotInFile]();
        out.write(mybuffer, spot-spotInFile);
        delete[] mybuffer;
        in.close();
        out.close();

        m->appendSFFFiles(outputFile, outputFileHeader);
        m->renameFile(outputFileHeader, outputFile);
        m->mothurRemove(outputFileHeader);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "MergeSfffilesCommand", "adjustCommonHeader");
        exit(1);
    }
}
//**********************************************************************************************************************
bool MergeSfffilesCommand::readSeqData(ifstream& in, seqRead& read, int numFlowReads, Header& header, ofstream& out){
    try {
        unsigned long long startSpotInFile = in.tellg();
        if (!in.eof()) {

            /*****************************************/
            //read header

            //read header length
            char buffer [2];
            in.read(buffer, 2);
            header.headerLength = be_int2(*(unsigned short *)(&buffer));

            //read name length
            char buffer2 [2];
            in.read(buffer2, 2);
            header.nameLength = be_int2(*(unsigned short *)(&buffer2));

            //read num bases
            char buffer3 [4];
            in.read(buffer3, 4);
            header.numBases = be_int4(*(unsigned int *)(&buffer3));

            //read clip qual left
            char buffer4 [2];
            in.read(buffer4, 
2); header.clipQualLeft = be_int2(*(unsigned short *)(&buffer4)); header.clipQualLeft = 5; //read clip qual right char buffer5 [2]; in.read(buffer5, 2); header.clipQualRight = be_int2(*(unsigned short *)(&buffer5)); //read clipAdapterLeft char buffer6 [2]; in.read(buffer6, 2); header.clipAdapterLeft = be_int2(*(unsigned short *)(&buffer6)); //read clipAdapterRight char buffer7 [2]; in.read(buffer7, 2); header.clipAdapterRight = be_int2(*(unsigned short *)(&buffer7)); //read name char* tempBuffer = new char[header.nameLength]; in.read(&(*tempBuffer), header.nameLength); header.name = tempBuffer; if (header.name.length() > header.nameLength) { header.name = header.name.substr(0, header.nameLength); } delete[] tempBuffer; /* Pad to 8 chars */ unsigned long long spotInFile = in.tellg(); unsigned long long spot = (spotInFile + 7)& ~7; in.seekg(spot); /*****************************************/ //sequence read //read flowgram read.flowgram.resize(numFlowReads); for (int i = 0; i < numFlowReads; i++) { char buffer [2]; in.read(buffer, 2); read.flowgram[i] = be_int2(*(unsigned short *)(&buffer)); } //read flowIndex read.flowIndex.resize(header.numBases); for (int i = 0; i < header.numBases; i++) { char temp[1]; in.read(temp, 1); read.flowIndex[i] = be_int1(*(unsigned char *)(&temp)); } //read bases char* tempBuffer6 = new char[header.numBases]; in.read(&(*tempBuffer6), header.numBases); read.bases = tempBuffer6; if (read.bases.length() > header.numBases) { read.bases = read.bases.substr(0, header.numBases); } delete[] tempBuffer6; //read qual scores read.qualScores.resize(header.numBases); for (int i = 0; i < header.numBases; i++) { char temp[1]; in.read(temp, 1); read.qualScores[i] = be_int1(*(unsigned char *)(&temp)); } /* Pad to 8 chars */ spotInFile = in.tellg(); spot = (spotInFile + 7)& ~7; in.seekg(spot); char * mybuffer; mybuffer = new char [spot-startSpotInFile]; ifstream in2; m->openInputFileBinary(currentFileName, in2); in2.seekg(startSpotInFile); 
in2.read(mybuffer,spot-startSpotInFile); out.write(mybuffer, in2.gcount()); numTotalReads++; delete[] mybuffer; in2.close(); }else{ m->mothurOut("Error reading."); m->mothurOutEndLine(); } if (in.eof()) { return true; } return false; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "readSeqData"); exit(1); } } //********************************************************************************************************************** bool MergeSfffilesCommand::sanityCheck(Header& header, seqRead& read) { try { bool okay = true; string message = "[WARNING]: Your sff file may be corrupted! Sequence: " + header.name + "\n"; if (header.clipQualLeft > read.bases.length()) { okay = false; message += "Clip Qual Left = " + toString(header.clipQualLeft) + ", but we only read " + toString(read.bases.length()) + " bases.\n"; } if (header.clipQualRight > read.bases.length()) { okay = false; message += "Clip Qual Right = " + toString(header.clipQualRight) + ", but we only read " + toString(read.bases.length()) + " bases.\n"; } if (header.clipQualLeft > read.qualScores.size()) { okay = false; message += "Clip Qual Left = " + toString(header.clipQualLeft) + ", but we only read " + toString(read.qualScores.size()) + " quality scores.\n"; } if (header.clipQualRight > read.qualScores.size()) { okay = false; message += "Clip Qual Right = " + toString(header.clipQualRight) + ", but we only read " + toString(read.qualScores.size()) + " quality scores.\n"; } if (okay == false) { m->mothurOut(message); m->mothurOutEndLine(); } return okay; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "sanityCheck"); exit(1); } } //********************************************************************************************************************** int MergeSfffilesCommand::readFile(){ try { string filename; ifstream in; m->openInputFile(file, in); while(!in.eof()) { if (m->control_pressed) { return 0; } in >> filename; m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: 
filename = " + filename + ".\n"); } //check to make sure both are able to be opened ifstream in2; int openForward = m->openInputFile(filename, in2, "noerror"); //if you can't open it, try default location if (openForward == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(filename); m->mothurOut("Unable to open " + filename + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openForward = m->openInputFile(tryPath, in3, "noerror"); in3.close(); filename = tryPath; } } //if you can't open it, try output location if (openForward == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(filename); m->mothurOut("Unable to open " + filename + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openForward = m->openInputFile(tryPath, in4, "noerror"); filename = tryPath; in4.close(); } } if (openForward == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + filename + ", ignoring.\n"); }else{ filenames.push_back(filename); } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "readFileNames"); exit(1); } } //********************************************************************************************************************** int MergeSfffilesCommand::printCommonHeaderForDebug(CommonHeader& header, ofstream& out, int numReads){ try { string endian = m->findEdianness(); ifstream in; m->openInputFileBinary(currentFileName, in); //magic number char* mybuffer = new char[4]; in.read(mybuffer,4); out.write(mybuffer, in.gcount()); string contents = mybuffer; m->mothurOut("magicNumber = " + contents + "\n"); delete[] mybuffer; //version char* mybuffer1 = new char[4]; in.read(mybuffer1,4); out.write(mybuffer1, in.gcount()); contents = mybuffer1; m->mothurOut("version = " + contents + "\n"); m->mothurOut("version = " + header.version + "\n"); delete[] mybuffer1; //offset 
char* mybuffer2 = new char[8]; in.read(mybuffer2,8); unsigned long long offset = 0; char* thisbuffer = new char[8]; thisbuffer[0] = (offset >> 56) & 0xFF; thisbuffer[1] = (offset >> 48) & 0xFF; thisbuffer[2] = (offset >> 40) & 0xFF; thisbuffer[3] = (offset >> 32) & 0xFF; thisbuffer[4] = (offset >> 24) & 0xFF; thisbuffer[5] = (offset >> 16) & 0xFF; thisbuffer[6] = (offset >> 8) & 0xFF; thisbuffer[7] = offset & 0xFF; out.write(thisbuffer, 8); delete[] thisbuffer; delete[] mybuffer2; m->mothurOut("index offset = " + toString(header.indexOffset) + "\n"); //read index length char* mybuffer3 = new char[4]; in.read(mybuffer3,4); offset = 0; char* thisbuffer2 = new char[4]; thisbuffer2[0] = (offset >> 24) & 0xFF; thisbuffer2[1] = (offset >> 16) & 0xFF; thisbuffer2[2] = (offset >> 8) & 0xFF; thisbuffer2[3] = offset & 0xFF; out.write(thisbuffer2, 4); delete[] thisbuffer2; delete[] mybuffer3; m->mothurOut("index read length = " + toString(header.indexLength) + "\n"); //change num reads char* mybuffer4 = new char[4]; in.read(mybuffer4,4); char* thisbuffer3 = new char[4]; if (endian == "BIG_ENDIAN") { thisbuffer3[0] = (numReads >> 24) & 0xFF; thisbuffer3[1] = (numReads >> 16) & 0xFF; thisbuffer3[2] = (numReads >> 8) & 0xFF; thisbuffer3[3] = numReads & 0xFF; }else { thisbuffer3[0] = numReads & 0xFF; thisbuffer3[1] = (numReads >> 8) & 0xFF; thisbuffer3[2] = (numReads >> 16) & 0xFF; thisbuffer3[3] = (numReads >> 24) & 0xFF; } out.write(thisbuffer3, 4); contents = mybuffer4; m->mothurOut("numReads = " + contents + "\n"); unsigned int numTReads = be_int4(*(unsigned int *)(mybuffer4)); m->mothurOut("numReads = " + toString(numTReads) + "\n"); m->mothurOut("numReads = " + toString(header.numReads) + "\n"); delete[] thisbuffer3; delete[] mybuffer4; //read header length char* mybuffer5 = new char[2]; in.read(mybuffer5,2); out.write(mybuffer5, in.gcount()); contents = mybuffer5; m->mothurOut("readLength = " + contents + "\n"); m->mothurOut("readLength = " + toString(header.headerLength) 
+ "\n"); delete[] mybuffer5; //read key length char* mybuffer6 = new char[2]; in.read(mybuffer6,2); out.write(mybuffer6, in.gcount()); contents = mybuffer6; m->mothurOut("key length = " + contents + "\n"); m->mothurOut("key length = " + toString(header.keyLength) + "\n"); delete[] mybuffer6; //read number of flow reads char* mybuffer7 = new char[2]; in.read(mybuffer7,2); out.write(mybuffer7, in.gcount()); contents = mybuffer7; m->mothurOut("num flow reads = " + contents + "\n"); int numFlowReads = be_int2(*(unsigned short *)(mybuffer7)); m->mothurOut("num flow Reads = " + toString(numFlowReads) + "\n"); delete[] mybuffer7; //read format code char* mybuffer8 = new char[1]; in.read(mybuffer8,1); out.write(mybuffer8, in.gcount()); contents = mybuffer8; m->mothurOut("read format code = " + contents + "\n"); m->mothurOut("read format code = " + toString(header.flogramFormatCode) + "\n"); delete[] mybuffer8; //read flow chars char* mybuffer9 = new char[header.numFlowsPerRead]; in.read(mybuffer9,header.numFlowsPerRead); out.write(mybuffer9, in.gcount()); contents = mybuffer9; m->mothurOut("flow chars = " + contents + "\n"); m->mothurOut("flow chars = " + header.flowChars + "\n"); delete[] mybuffer9; //read key char* mybuffer10 = new char[header.keyLength]; in.read(mybuffer10,header.keyLength); out.write(mybuffer10, in.gcount()); contents = mybuffer10; m->mothurOut("key = " + contents + "\n"); m->mothurOut("key = " + header.keySequence + "\n"); delete[] mybuffer10; /* Pad to 8 chars */ unsigned long long spotInFile = in.tellg(); unsigned long long spot = (spotInFile + 7)& ~7; // ~ inverts in.seekg(spot); char* mybuffer11 = new char[spot-spotInFile]; out.write(mybuffer11, spot-spotInFile); delete[] mybuffer11; in.close(); return 0; } catch(exception& e) { m->errorOut(e, "MergeSfffilesCommand", "printCommonHeaderForDebug"); exit(1); } } //********************************************************************************************************************** 
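// The header-rewriting code above relies on two small pieces of arithmetic: rounding a
// file offset up to the next multiple of 8 with (spot + 7) & ~7, and writing integers
// most-significant byte first. The sketch below isolates both so they can be checked on
// their own. It is an illustration only: padTo8, paddingBytes and encodeBE32 are
// hypothetical helper names and are not part of mothur.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not in mothur): round a byte offset up to the next
// multiple of 8, the same arithmetic as (spotInFile + 7) & ~7 above.
inline std::uint64_t padTo8(std::uint64_t offset) {
    return (offset + 7ULL) & ~7ULL;
}

// Hypothetical helper: how many padding bytes must follow a section that
// ends at `offset` so that the next section starts on an 8-byte boundary.
inline std::uint64_t paddingBytes(std::uint64_t offset) {
    std::uint64_t aligned = padTo8(offset);
    assert(aligned >= offset); // guards against wrap-around near UINT64_MAX
    return aligned - offset;
}

// Hypothetical helper: encode a 32-bit value as four big-endian bytes
// (most significant byte first), as done for the merged read count.
inline std::vector<unsigned char> encodeBE32(std::uint32_t value) {
    std::vector<unsigned char> bytes(4);
    bytes[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    bytes[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    bytes[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    bytes[3] = static_cast<unsigned char>(value & 0xFF);
    return bytes;
}
```

// For example, a header that ends at byte 31 needs paddingBytes(31) == 1 byte of padding,
// while one that ends at byte 32 needs none.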
mothur-1.36.1/source/commands/mergesfffilecommand.h000066400000000000000000000032511255543666200224060ustar00rootroot00000000000000
//
//  mergesfffilecommand.h
//  Mothur
//
//  Created by Sarah Westcott on 1/31/14.
//  Copyright (c) 2014 Schloss Lab. All rights reserved.
//

#ifndef Mothur_mergesfffilecommand_h
#define Mothur_mergesfffilecommand_h

#include "command.hpp"

/**********************************************************/
class MergeSfffilesCommand : public Command {

public:
    MergeSfffilesCommand(string);
    MergeSfffilesCommand();
    ~MergeSfffilesCommand(){}

    vector<string> setParameters();
    string getCommandName()         { return "merge.sfffiles"; }
    string getCommandCategory()     { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/merge.sfffiles"; }
    string getDescription() { return "merge individual sfffiles into a single .sff file"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    string sffFilename, outputDir, file, currentFileName;
    vector<string> filenames, outputNames;
    bool abort, keyTrim;
    int numTotalReads, allFilesnumFlowReads, allFileskeyLength;
    string outputFile, outputFileHeader;
    vector<CommonHeader> commonHeaders;

    //extract sff file functions
    int mergeSffInfo(string, ofstream&);
    int readCommonHeader(ifstream&, CommonHeader&);
    int readHeader(ifstream&, Header&);
    bool readSeqData(ifstream&, seqRead&, int, Header&, ofstream&);
    int decodeName(string&, string&, string&, string);
    bool sanityCheck(Header&, seqRead&);
    int adjustCommonHeader();
    int readFile();
    int printCommonHeaderForDebug(CommonHeader& header, ofstream& out, int numReads);
};

/**********************************************************/

#endif
mothur-1.36.1/source/commands/mergetaxsummarycommand.cpp000066400000000000000000000372311255543666200235420ustar00rootroot00000000000000
//
//  mergetaxsummarycommand.cpp
//  Mothur
//
//  Created by Sarah Westcott on 2/13/13.
//  Copyright (c) 2013 Schloss Lab. All rights reserved.
// #include "mergetaxsummarycommand.h" //********************************************************************************************************************** vector MergeTaxSummaryCommand::setParameters(){ try { CommandParameter pinput("input", "String", "", "", "", "", "","",false,true,true); parameters.push_back(pinput); CommandParameter poutput("output", "String", "", "", "", "", "","",false,true,true); parameters.push_back(poutput); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MergeTaxSummaryCommand::getHelpString(){ try { string helpString = ""; helpString += "The merge.taxsummary command takes a list of tax.summary files separated by dashes and merges them into one file."; helpString += "The merge.taxsummary command parameters are input and output."; helpString += "Example merge.taxsummary(input=small.tax.summary-large.tax.summary, output=all.tax.summary)."; helpString += "Note: No spaces between parameter labels (i.e. 
output), '=' and parameters (i.e.yourOutputFileName).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** MergeTaxSummaryCommand::MergeTaxSummaryCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["taxsummary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "MergeTaxSummaryCommand"); exit(1); } } //********************************************************************************************************************** MergeTaxSummaryCommand::MergeTaxSummaryCommand(string option) { try { abort = false; calledHelp = false; if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true; } else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["taxsummary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } string fileList = validParameter.validFile(parameters, "input", false); if(fileList == "not found") { m->mothurOut("you must enter two or more file names"); m->mothurOutEndLine(); abort=true; } else{ m->splitAtDash(fileList, fileNames); } //if the user changes the output directory command factory will send this info to us in the output parameter string outputDir = 
validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found") { outputDir = ""; }

            numInputFiles = fileNames.size();
            ifstream testFile;
            if(numInputFiles == 0){
                m->mothurOut("you must enter two or more file names and you entered " + toString(fileNames.size()) + " file names");
                m->mothurOutEndLine();
                abort=true;
            }
            else{
                //loop over the live size, since unopenable files are erased from fileNames below
                for(int i=0;i<fileNames.size();i++){
                    if (inputDir != "") {
                        string path = m->hasPath(fileNames[i]);
                        //if the user has not given a path then, add inputdir. else leave path alone.
                        if (path == "") { fileNames[i] = inputDir + fileNames[i]; }
                    }

                    int ableToOpen;
                    ifstream in;
                    ableToOpen = m->openInputFile(fileNames[i], in, "noerror");
                    in.close();

                    //if you can't open it, try default location
                    if (ableToOpen == 1) {
                        if (m->getDefaultPath() != "") { //default path is set
                            string tryPath = m->getDefaultPath() + m->getSimpleName(fileNames[i]);
                            m->mothurOut("Unable to open " + fileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine();
                            ifstream in2;
                            ableToOpen = m->openInputFile(tryPath, in2, "noerror");
                            in2.close();
                            fileNames[i] = tryPath;
                        }
                    }

                    //if you can't open it, try output location
                    if (ableToOpen == 1) {
                        if (m->getOutputDir() != "") { //output directory is set
                            string tryPath = m->getOutputDir() + m->getSimpleName(fileNames[i]);
                            m->mothurOut("Unable to open " + fileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine();
                            ifstream in2;
                            ableToOpen = m->openInputFile(tryPath, in2, "noerror");
                            in2.close();
                            fileNames[i] = tryPath;
                        }
                    }

                    if (ableToOpen == 1) {
                        m->mothurOut("Unable to open " + fileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fileNames.erase(fileNames.begin()+i); i--; } } } outputFileName = validParameter.validFile(parameters, "output", false); if (outputFileName == "not found") { m->mothurOut("you must enter an output file name"); m->mothurOutEndLine(); abort=true; } else if (outputDir != "") { outputFileName = outputDir + m->getSimpleName(outputFileName); } } } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "MergeTaxSummaryCommand"); exit(1); } } //********************************************************************************************************************** int MergeTaxSummaryCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } outputFileName = m->getFullPathName(outputFileName); m->mothurRemove(outputFileName); vector tree; tree.push_back(rawTaxNode("Root")); tree[0].rank = "0"; bool hasGroups = true; set groups; for (int i = 0; i < fileNames.size(); i++) { ifstream in; m->openInputFile(fileNames[i], in); string temp = m->getline(in); vector headers = m->splitWhiteSpace(temp); vector thisFilesGroups; if (headers.size() == 5) { hasGroups = false; } else { for (int j = 5; j < headers.size(); j++) { groups.insert(headers[j]); thisFilesGroups.push_back(headers[j]); } } int level, daugterLevels, total; float totalFloat; string rankId, tax; map levelToCurrentNode; levelToCurrentNode[0] = 0; while (!in.eof()) { if (m->control_pressed) { return 0; } in >> level >> rankId >> tax >> daugterLevels >> totalFloat; m->gobble(in); if ((totalFloat < 1) && (totalFloat > 0)) { m->mothurOut("[ERROR]: cannot merge tax.summary files with relative abundances.\n"); m->control_pressed = true; in.close(); return 0; }else { total = int(totalFloat); } map groupCounts; if (thisFilesGroups.size() != 0) { for (int j = 0; j < thisFilesGroups.size(); j++) { int tempNum; in >> tempNum; m->gobble(in); groupCounts[thisFilesGroups[j]] = tempNum; } } if (level == 0) {} else { map::iterator 
itParent = levelToCurrentNode.find(level-1); int parent = 0; if (itParent == levelToCurrentNode.end()) { m->mothurOut("[ERROR]: situation I didnt expect.\n"); } else { parent = itParent->second; } levelToCurrentNode[level] = addTaxToTree(tree, level, parent, tax, total, groupCounts); } } in.close(); } if (!hasGroups && (groups.size() != 0)) { groups.clear(); m->mothurOut("[WARNING]: not all files contain group breakdown, ignoring group counts.\n"); } ofstream out; m->openOutputFile(outputFileName, out); print(out, tree, groups); if (m->control_pressed) { m->mothurRemove(outputFileName); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); outputNames.push_back(outputFileName); outputTypes["taxsummary"].push_back(outputFileName); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "execute"); exit(1); } } /**************************************************************************************************/ int MergeTaxSummaryCommand::addTaxToTree(vector& tree, int level, int currentNode, string taxon, int total, map groups){ try { map::iterator childPointer; childPointer = tree[currentNode].children.find(taxon); int nodeToIncrement = 0; if(childPointer != tree[currentNode].children.end()){ //if the node already exists, increment counts nodeToIncrement = childPointer->second; tree[nodeToIncrement].total += total; for (map::iterator itGroups = groups.begin(); itGroups != groups.end(); itGroups++) { map::iterator it = tree[nodeToIncrement].groupCount.find(itGroups->first); if (it == tree[nodeToIncrement].groupCount.end()) { tree[nodeToIncrement].groupCount[itGroups->first] = itGroups->second; } else { it->second += itGroups->second; } } } else{ //otherwise, create it tree.push_back(rawTaxNode(taxon)); tree[currentNode].children[taxon] = tree.size()-1; tree[tree.size()-1].parent = currentNode; nodeToIncrement = tree.size()-1; 
tree[nodeToIncrement].total = total;
            tree[nodeToIncrement].level = level;
            for (map<string, int>::iterator itGroups = groups.begin(); itGroups != groups.end(); itGroups++) {
                tree[nodeToIncrement].groupCount[itGroups->first] = itGroups->second;
            }
        }

        return nodeToIncrement;
    }
    catch(exception& e) {
        m->errorOut(e, "MergeTaxSummaryCommand", "addTaxToTree");
        exit(1);
    }
}
/**************************************************************************************************/
int MergeTaxSummaryCommand::assignRank(int index, vector<rawTaxNode>& tree){
    try {
        map<string, int>::iterator it;
        int counter = 1;

        for(it=tree[index].children.begin();it!=tree[index].children.end();it++){
            if (m->control_pressed) { return 0; }
            tree[it->second].rank = tree[index].rank + '.' + toString(counter);
            counter++;

            assignRank(it->second, tree);
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "MergeTaxSummaryCommand", "assignRank");
        exit(1);
    }
}
/**************************************************************************************************/
int MergeTaxSummaryCommand::print(ofstream& out, vector<rawTaxNode>& tree, set<string> groups){
    try {
        assignRank(0, tree);
        vector<string> mGroups;

        //print labels
        out << "taxlevel\trankID\ttaxon\tdaughterlevels\ttotal";
        for (set<string>::iterator it = groups.begin(); it != groups.end(); it++) { out << '\t' << (*it) ; }
        out << endl;

        for (set<string>::iterator it2 = groups.begin(); it2 != groups.end(); it2++) { tree[0].groupCount[*it2] = 0; }

        map<string, int>::iterator it;
        for(it=tree[0].children.begin();it!=tree[0].children.end();it++){
            tree[0].total += tree[it->second].total;
            for (set<string>::iterator it2 = groups.begin(); it2 != groups.end(); it2++) {
                map<string, int>::iterator itGroups = tree[it->second].groupCount.find(*it2);
                if (itGroups != tree[it->second].groupCount.end()) {
                    tree[0].groupCount[*it2] += itGroups->second;
                }
            }
        }

        //print root
        out << tree[0].level << "\t" << tree[0].rank << "\t" << tree[0].name << "\t" << tree[0].children.size() << "\t" << tree[0].total;
        for (set<string>::iterator it = groups.begin(); it != groups.end(); it++) {
            map<string, int>::iterator itGroups = 
tree[0].groupCount.find(*it); int num = 0; if (itGroups != tree[0].groupCount.end()) { num = itGroups->second; } out << '\t' << num; } out << endl; //print rest print(0, out, tree, groups); return 0; } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "print"); exit(1); } } /**************************************************************************************************/ int MergeTaxSummaryCommand::print(int i, ofstream& out, vector& tree, set groups){ try { map::iterator it; for(it=tree[i].children.begin();it!=tree[i].children.end();it++){ //print root out << tree[it->second].level << "\t" << tree[it->second].rank << "\t" << tree[it->second].name << "\t" << tree[it->second].children.size() << "\t" << tree[it->second].total; for (set::iterator it2 = groups.begin(); it2 != groups.end(); it2++) { map:: iterator itGroups = tree[it->second].groupCount.find(*it2); int num = 0; if (itGroups != tree[it->second].groupCount.end()) { num = itGroups->second; } out << '\t' << num ; } out << endl; print(it->second, out, tree, groups); } return 0; } catch(exception& e) { m->errorOut(e, "MergeTaxSummaryCommand", "print"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/mergetaxsummarycommand.h000066400000000000000000000024761255543666200232120ustar00rootroot00000000000000// // mergetaxsummarycommand.h // Mothur // // Created by Sarah Westcott on 2/13/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_mergetaxsummarycommand_h #define Mothur_mergetaxsummarycommand_h #include "mothur.h" #include "command.hpp" #include "phylosummary.h" class MergeTaxSummaryCommand : public Command { public: MergeTaxSummaryCommand(string); MergeTaxSummaryCommand(); ~MergeTaxSummaryCommand(){} vector setParameters(); string getCommandName() { return "merge.taxsummary"; } string getCommandCategory() { return "Phylotype Analysis"; } string getHelpString(); string getOutputPattern(string){ return ""; } string getCitation() { return "http://www.mothur.org/wiki/Merge.taxsummary"; } string getDescription() { return "merges tax summary files creating one file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector fileNames, outputNames; string outputFileName; int numInputFiles; bool abort; int addTaxToTree(vector&, int, int, string, int, map); int assignRank(int index, vector& tree); int print(ofstream& out, vector& tree, set groups); int print(int, ofstream& out, vector& tree, set groups); }; #endif mothur-1.36.1/source/commands/metastatscommand.cpp000066400000000000000000000762431255543666200223230ustar00rootroot00000000000000/* * metastatscommand.cpp * Mothur * * Created by westcott on 9/16/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "metastatscommand.h" #include "sharedutilities.h" #include "sharedrabundfloatvector.h" //********************************************************************************************************************** vector MetaStatsCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","metastats",false,true,true); parameters.push_back(pshared); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pdesign); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pthreshold("threshold", "Number", "", "0.05", "", "", "","",false,false); parameters.push_back(pthreshold); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter psets("sets", "String", "", "", "", "", "","",false,false); parameters.push_back(psets); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MetaStatsCommand::getHelpString(){ try { string helpString = ""; helpString += 
"This command is based on the Metastats program, White, J.R., Nagarajan, N. & Pop, M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5, e1000352 (2009).\n"; helpString += "The metastats command outputs a .metastats file. \n"; helpString += "The metastats command parameters are shared, iters, threshold, groups, label, design, sets and processors. The shared and design parameters are required, unless you have valid current files.\n"; helpString += "The design parameter allows you to assign your groups to sets when you are running metastat. mothur will run all pairwise comparisons of the sets. It is required. \n"; helpString += "The design file looks like the group file. It is a 2 column tab delimited file, where the first column is the group name and the second column is the set the group belongs to.\n"; helpString += "The sets parameter allows you to specify which of the sets in your designfile you would like to analyze. The set names are separated by dashes. THe default is all sets in the designfile.\n"; helpString += "The iters parameter allows you to set number of bootstrap permutations for estimating null distribution of t statistic. The default is 1000. \n"; helpString += "The threshold parameter allows you to set the significance level to reject null hypotheses (default 0.05).\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included. The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance levels you would like, and are also separated by dashes.\n"; helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. 
\n"; helpString += "The metastats command should be in the following format: metastats(design=yourDesignFile).\n"; helpString += "Example metastats(design=temp.design, groups=A-B-C).\n"; helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MetaStatsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "metastats") { pattern = "[filename],[distance],[group],metastats"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** MetaStatsCommand::MetaStatsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["metastats"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "MetaStatsCommand"); exit(1); } } //********************************************************************************************************************** MetaStatsCommand::MetaStatsCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters 
are valid for command map::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["metastats"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } //check for required parameters sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //check for required parameters designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { abort = true; } else if (designfile == "not found") { //if there is a current design file, use it designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the design parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current designfile and the design parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setDesignFile(designfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(sharedfile); //if user entered a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); m->setGroups(Groups); } sets = validParameter.validFile(parameters, "sets", false); if (sets == "not found") { sets = ""; } else { m->splitAtDash(sets, Sets); } string temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "threshold", false); if (temp == "not found") { temp = "0.05"; } m->mothurConvert(temp, threshold); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); } } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "MetaStatsCommand"); exit(1); } } //********************************************************************************************************************** int MetaStatsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //just used to convert files to test metastats online /****************************************************/ bool convertInputToShared = false; convertSharedToInput = false; if (convertInputToShared) { convertToShared(sharedfile); return 0; } /****************************************************/ designMap = new DesignMap(designfile); input = new InputData(sharedfile, "sharedfile"); lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> processedLabels;
        set<string> userLabels = labels;

        //set up the pairwise comparisons of sets for metastats
        //calculate number of comparisons i.e. with groups A,B,C = AB, AC, BC = 3;
        //make sure sets are all in designMap
        SharedUtil* util = new SharedUtil();
        vector<string> dGroups = designMap->getCategory();
        util->setGroups(Sets, dGroups);
        delete util;

        int numGroups = Sets.size();
        for (int a=0; a<numGroups; a++) {
            for (int l = 0; l < a; l++) {
                vector<string> groups; groups.push_back(Sets[a]); groups.push_back(Sets[l]);
                namesOfGroupCombos.push_back(groups);
            }
        }

        //only 1 combo
        if (numGroups == 2) { processors = 1; }
        else if (numGroups < 2) { m->mothurOut("Not enough sets, I need at least 2 valid sets. Unable to complete command."); m->mothurOutEndLine(); m->control_pressed = true; }

        if(processors != 1){
            int remainingPairs = namesOfGroupCombos.size();
            int startIndex = 0;
            for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) {
                int numPairs = remainingPairs; //case for last processor
                if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); }
                lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs
                startIndex = startIndex + numPairs;
                remainingPairs = remainingPairs - numPairs;
            }
        }

        //as long as you are not at the end of the file or done with the lines you want
        while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {

            if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); delete input; delete designMap; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

            if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){

                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                process(lookup);

                processedLabels.insert(lookup[0]->getLabel());
                userLabels.erase(lookup[0]->getLabel());
            }

            if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = lookup[0]->getLabel();
for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { outputTypes.clear(); m->clearGroups(); delete input; delete designMap; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //get next line to process lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { outputTypes.clear(); m->clearGroups(); delete input; delete designMap; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } }
            lookup = input->getSharedRAbundVectors(lastLabel);

            m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();

            process(lookup);

            for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
        }

        //reset groups parameter
        m->clearGroups();
        delete input;
        delete designMap;

        if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0;}

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "MetaStatsCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************

int MetaStatsCommand::process(vector<SharedRAbundVector*>& thisLookUp){
    try {

        if(processors == 1){
            driver(0, namesOfGroupCombos.size(), thisLookUp);
        }else{
            int process = 1;
            vector<int> processIDS;
            bool recalc = false;
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
            //loop through and create all the processes you want
            while (process != processors) {
                pid_t pid = fork();

                if (pid > 0) {
                    processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                    process++;
                }else if (pid == 0){
                    driver(lines[process].start, lines[process].end, thisLookUp);
                    exit(0);
                }else {
                    m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                    for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                    //wait to die
                    for (int i=0;i<processIDS.size();i++) {
                        int temp = processIDS[i];
                        wait(&temp);
                    }
                    m->control_pressed = false;
                    recalc = true;
                    break;
                }
            }

            if (recalc) {
//test line, also set recalc to true.
                //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n");

                //redo file divide
                lines.clear();
                int remainingPairs = namesOfGroupCombos.size();
                int startIndex = 0;
                for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) {
                    int numPairs = remainingPairs; //case for last processor
                    if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); }
                    lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs
                    startIndex = startIndex + numPairs;
                    remainingPairs = remainingPairs - numPairs;
                }

                processIDS.resize(0);
                process = 1;

                //loop through and create all the processes you want
                while (process != processors) {
                    pid_t pid = fork();

                    if (pid > 0) {
                        processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later
                        process++;
                    }else if (pid == 0){
                        driver(lines[process].start, lines[process].end, thisLookUp);
                        exit(0);
                    }else {
                        m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine();
                        for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                        exit(0);
                    }
                }
            }

            //do my part
            driver(lines[0].start, lines[0].end, thisLookUp);

            //force parent to wait until all the processes are done
            for (int i=0;i<(processors-1);i++) {
                int temp = processIDS[i];
                wait(&temp);
            }
#else
            //////////////////////////////////////////////////////////////////////////////////////////////////////
            //Windows version shared memory, so be careful when passing variables through the summarySharedData struct.
            //Above fork() will clone, so memory is separate, but that's not the case with windows,
            //Taking advantage of shared memory to pass results vectors.
            //////////////////////////////////////////////////////////////////////////////////////////////////////

            vector<metastatsData*> pDataArray;
            DWORD dwThreadIdArray[processors-1];
            HANDLE hThreadArray[processors-1];

            //Create processor worker threads.
            for( int i=1; i<processors; i++ ){

                vector<SharedRAbundVector*> newLookup;
                vector<string> designMapGroups;
                for (int k = 0; k < thisLookUp.size(); k++) {
                    SharedRAbundVector* temp = new SharedRAbundVector();
                    temp->setLabel(thisLookUp[k]->getLabel());
                    temp->setGroup(thisLookUp[k]->getGroup());
                    newLookup.push_back(temp);
                    designMapGroups.push_back(designMap->get(thisLookUp[k]->getGroup()));
                }

                //for each bin
                for (int k = 0; k < thisLookUp[0]->getNumBins(); k++) {
                    if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; }
                    for (int j = 0; j < thisLookUp.size(); j++) { newLookup[j]->push_back(thisLookUp[j]->getAbundance(k), thisLookUp[j]->getGroup()); }
                }

                // Allocate memory for thread data.
                metastatsData* tempSum = new metastatsData(sharedfile, outputDir, m, lines[i].start, lines[i].end, namesOfGroupCombos, newLookup, designMapGroups, iters, threshold);
                pDataArray.push_back(tempSum);
                processIDS.push_back(i);

                hThreadArray[i-1] = CreateThread(NULL, 0, MyMetastatsThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]);
            }

            //do my part
            driver(lines[0].start, lines[0].end, thisLookUp);

            //Wait until all threads have terminated.
            WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE);

            //Close all thread handles and free memory allocations.
            for(int i=0; i < pDataArray.size(); i++){
                if (pDataArray[i]->count != (pDataArray[i]->num)) {
                    m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->num) + " groups assigned to it, quitting. 
\n"); m->control_pressed = true; } for (int j = 0; j < pDataArray[i]->thisLookUp.size(); j++) { delete pDataArray[i]->thisLookUp[j]; } for (int j = 0; j < pDataArray[i]->outputNames.size(); j++) { outputNames.push_back(pDataArray[i]->outputNames[j]); outputTypes["metastats"].push_back(pDataArray[i]->outputNames[j]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif } return 0; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "process"); exit(1); } } //********************************************************************************************************************** int MetaStatsCommand::driver(unsigned long long start, unsigned long long num, vector& thisLookUp) { try { //for each combo for (int c = start; c < (start+num); c++) { //get set names string setA = namesOfGroupCombos[c][0]; string setB = namesOfGroupCombos[c][1]; //get filename map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = thisLookUp[0]->getLabel(); variables["[group]"] = setA + "-" + setB; string outputFileName = getOutputFileName("metastats",variables); outputNames.push_back(outputFileName); outputTypes["metastats"].push_back(outputFileName); //int nameLength = outputFileName.length(); //char * output = new char[nameLength]; //strcpy(output, outputFileName.c_str()); //build matrix from shared rabunds //double** data; //data = new double*[thisLookUp[0]->getNumBins()]; vector< vector > data2; data2.resize(thisLookUp[0]->getNumBins()); vector subset; int setACount = 0; int setBCount = 0; for (int i = 0; i < thisLookUp.size(); i++) { string thisGroup = thisLookUp[i]->getGroup(); //is this group for a set we want to compare?? 
//sorting the sets by putting setB at the back and setA in the front if ((designMap->get(thisGroup) == setB)) { subset.push_back(thisLookUp[i]); setBCount++; }else if ((designMap->get(thisGroup) == setA)) { subset.insert(subset.begin()+setACount, thisLookUp[i]); setACount++; } } if ((setACount == 0) || (setBCount == 0)) { m->mothurOut("Missing shared info for " + setA + " or " + setB + ". Skipping comparison."); m->mothurOutEndLine(); outputNames.pop_back(); }else { //fill data for (int j = 0; j < thisLookUp[0]->getNumBins(); j++) { //data[j] = new double[subset.size()]; data2[j].resize(subset.size(), 0.0); for (int i = 0; i < subset.size(); i++) { data2[j][i] = (subset[i]->getAbundance(j)); } } m->mothurOut("Comparing " + setA + " and " + setB + "..."); m->mothurOutEndLine(); //metastat_main(output, thisLookUp[0]->getNumBins(), subset.size(), threshold, iters, data, setACount); if (convertSharedToInput) { convertToInput(subset, outputFileName); } m->mothurOutEndLine(); MothurMetastats mothurMeta(threshold, iters); mothurMeta.runMetastats(outputFileName , data2, setACount); m->mothurOutEndLine(); m->mothurOutEndLine(); } //free memory //delete output; //for(int i = 0; i < thisLookUp[0]->getNumBins(); i++) { delete[] data[i]; } //delete[] data; } return 0; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "driver"); exit(1); } } //********************************************************************************************************************** /*Metastats files look like: 13_0 14_0 13_52 14_52 70S 71S 72S M1 M2 M3 C11 C12 C21 C15 C16 C19 C3 C4 C9 Alphaproteobacteria 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 Mollicutes 0 0 2 0 0 59 5 11 4 1 0 2 8 1 0 1 0 3 0 Verrucomicrobiae 0 0 0 0 0 1 6 0 0 0 0 0 0 0 0 0 0 0 0 Deltaproteobacteria 0 0 0 0 0 6 1 0 1 0 1 1 7 0 0 0 0 0 0 Cyanobacteria 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 Epsilonproteobacteria 0 0 0 0 0 0 0 0 6 0 0 3 1 0 0 0 0 0 0 Clostridia 75 65 207 226 801 280 267 210 162 197 81 120 106 148 120 94 84 98 121 
Bacilli 3 2 16 8 21 52 31 70 46 65 4 28 5 23 62 26 20 30 25 Bacteroidetes (class) 21 25 22 64 226 193 296 172 98 55 19 149 201 85 50 76 113 92 82 Gammaproteobacteria 0 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 0 0 0 TM7_genera_incertae_sedis 0 0 0 0 0 0 0 0 1 0 1 2 0 2 0 0 0 0 0 Actinobacteria (class) 1 1 1 2 0 0 0 9 3 7 1 1 1 3 1 2 1 2 3 Betaproteobacteria 0 0 3 3 0 0 9 1 1 0 1 2 3 1 1 0 0 0 0 */ //this function is just used to convert files to test the differences between the metastats version and mothurs version int MetaStatsCommand::convertToShared(string filename) { try { ifstream in; m->openInputFile(filename, in); string header = m->getline(in); m->gobble(in); vector groups = m->splitWhiteSpace(header); vector newLookup; cout << groups.size() << endl; for (int i = 0; i < groups.size(); i++) { cout << "creating group " << groups[i] << endl; SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel("0.03"); temp->setGroup(groups[i]); newLookup.push_back(temp); } int otuCount = 0; while (!in.eof()) { if (m->control_pressed) { break; } string otuname; in >> otuname; m->gobble(in); otuCount++; cout << otuname << endl; for (int i = 0; i < groups.size(); i++) { double temp; in >> temp; m->gobble(in); newLookup[i]->push_back(temp, groups[i]); } m->gobble(in); } in.close(); ofstream out; m->openOutputFile(filename+".shared", out); out << "label\tgroup\tnumOTUs"; string snumBins = toString(otuCount); for (int i = 0; i < otuCount; i++) { string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; out << '\t' << binLabel; } out << endl; for (int i = 0; i < groups.size(); i++) { out << "0.03" << '\t' << groups[i] << '\t'; newLookup[i]->print(out); } out.close(); cout << filename+".shared" << endl; return 0; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "convertToShared"); 
exit(1); } } //********************************************************************************************************************** int MetaStatsCommand::convertToInput(vector& subset, string thisfilename) { try { ofstream out; m->openOutputFile(thisfilename+".matrix", out); for (int i = 0; i < subset.size(); i++) { out << '\t' << subset[i]->getGroup(); } out << endl; for (int i = 0; i < subset[0]->getNumBins(); i++) { out << m->currentSharedBinLabels[i]; for (int j = 0; j < subset.size(); j++) { out << '\t' << subset[j]->getAbundance(i); } out << endl; } out.close(); cout << thisfilename+".matrix" << endl; return 0; } catch(exception& e) { m->errorOut(e, "MetaStatsCommand", "convertToInput"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/metastatscommand.h000066400000000000000000000123641255543666200217620ustar00rootroot00000000000000#ifndef METASTATSCOMMAND_H #define METASTATSCOMMAND_H /* * metastatscommand.h * Mothur * * Created by westcott on 9/16/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "inputdata.h" #include "sharedrabundvector.h" #include "mothurmetastats.h" #include "designmap.h" class MetaStatsCommand : public Command { public: MetaStatsCommand(string); MetaStatsCommand(); ~MetaStatsCommand() {} vector setParameters(); string getCommandName() { return "metastats"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "White JR, Nagarajan N, Pop M (2009). Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 5: e1000352. 
\nhttp://www.mothur.org/wiki/Metastats"; } string getDescription() { return "detects differentially abundant features in clinical metagenomic samples"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector lines; DesignMap* designMap; InputData* input; vector lookup; bool abort, allLines, pickedGroups; set labels; //holds labels to be used string groups, label, outputDir, inputDir, designfile, sets, sharedfile; vector Groups, outputNames, Sets; vector< vector > namesOfGroupCombos; int iters, processors; float threshold; int process(vector&); int driver(unsigned long long, unsigned long long, vector&); int convertToShared(string filename); int convertToInput(vector&, string); bool convertSharedToInput; }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct metastatsData { vector thisLookUp; vector< vector > namesOfGroupCombos; vector designMapGroups; vector outputNames; int start; int num, iters, count; float threshold; MothurOut* m; string sharedfile; string outputDir; metastatsData(){} metastatsData(string sf, string oDir, MothurOut* mout, int st, int en, vector< vector > ns, vector lu, vector dg, int i, float thr) { sharedfile = sf; outputDir = oDir; m = mout; start = st; num = en; namesOfGroupCombos = ns; thisLookUp = lu; designMapGroups = dg; iters = i; threshold = thr; count=0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyMetastatsThreadFunction(LPVOID lpParam){ metastatsData* pDataArray; pDataArray = (metastatsData*)lpParam; try { //for each combo for (int c = pDataArray->start; c < (pDataArray->start+pDataArray->num); c++) { pDataArray->count++; //get set names string setA = pDataArray->namesOfGroupCombos[c][0]; string setB = pDataArray->namesOfGroupCombos[c][1]; //get filename string outputFileName = pDataArray->outputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(pDataArray->sharedfile)) + pDataArray->thisLookUp[0]->getLabel() + "." + setA + "-" + setB + ".metastats"; pDataArray->outputNames.push_back(outputFileName); vector< vector > data2; data2.resize(pDataArray->thisLookUp[0]->getNumBins()); vector subset; int setACount = 0; int setBCount = 0; for (int i = 0; i < pDataArray->thisLookUp.size(); i++) { //is this group for a set we want to compare?? 
//sorting the sets by putting setB at the back and setA in the front if (pDataArray->designMapGroups[i] == setB) { subset.push_back(pDataArray->thisLookUp[i]); setBCount++; }else if (pDataArray->designMapGroups[i] == setA) { subset.insert(subset.begin()+setACount, pDataArray->thisLookUp[i]); setACount++; } } if ((setACount == 0) || (setBCount == 0)) { pDataArray->m->mothurOut("Missing shared info for " + setA + " or " + setB + ". Skipping comparison."); pDataArray->m->mothurOutEndLine(); pDataArray->outputNames.pop_back(); }else { //fill data for (int j = 0; j < pDataArray->thisLookUp[0]->getNumBins(); j++) { data2[j].resize(subset.size(), 0.0); for (int i = 0; i < subset.size(); i++) { data2[j][i] = (subset[i]->getAbundance(j)); } } pDataArray->m->mothurOut("Comparing " + setA + " and " + setB + "..."); pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOutEndLine(); MothurMetastats mothurMeta(pDataArray->threshold, pDataArray->iters); mothurMeta.runMetastats(outputFileName, data2, setACount); pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOutEndLine(); } } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "MetaStatsCommand", "MyMetastatsThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/mgclustercommand.cpp000066400000000000000000001004021255543666200223040ustar00rootroot00000000000000/* * mgclustercommand.cpp * Mothur * * Created by westcott on 12/11/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "mgclustercommand.h" //********************************************************************************************************************** vector MGClusterCommand::setParameters(){ try { CommandParameter pblast("blast", "InputTypes", "", "", "none", "none", "none","list",false,true,true); parameters.push_back(pblast); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "ColumnName","rabund-sabund",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter plength("length", "Number", "", "5", "", "", "","",false,false); parameters.push_back(plength); CommandParameter ppenalty("penalty", "Number", "", "0.10", "", "", "","",false,false); parameters.push_back(ppenalty); CommandParameter pcutoff("cutoff", "Number", "", "0.70", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter pmethod("method", "Multiple", "furthest-nearest-average", "average", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter phard("hard", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(phard); CommandParameter pmin("min", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pmin); CommandParameter pmerge("merge", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pmerge); CommandParameter padjust("adjust", "String", "", "F", "", "", "","",false,false); parameters.push_back(padjust); CommandParameter phcluster("hcluster", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(phcluster); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); 
parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MGClusterCommand::getHelpString(){ try { string helpString = ""; helpString += "The mgcluster command parameter options are blast, name, cutoff, precision, hard, method, merge, min, length, penalty, adjust and hcluster. The blast parameter is required.\n"; helpString += "The mgcluster command reads a blast and name file and clusters the sequences into OPF units similar to OTUs.\n"; helpString += "This command outputs a .list, .rabund and .sabund file that can be used with other mothur commands to estimate richness.\n"; helpString += "The cutoff parameter is used to specify the maximum distance you would like to cluster to. The default is 0.70.\n"; helpString += "The precision parameter's default value is 100. \n"; helpString += "The acceptable mgcluster methods are furthest, nearest and average. If no method is provided then average is assumed.\n"; helpString += "The min parameter allows you to specify if you want the minimum or maximum blast score ratio used in calculating the distance. The default is true, meaning you want the minimum.\n"; helpString += "The length parameter is used to specify the minimum overlap required. The default is 5.\n"; helpString += "The adjust parameter is used to handle missing distances. If you set a cutoff, adjust=f by default. If not, adjust=t by default. Adjust=f means ignore missing distances and adjust the cutoff as needed with the average neighbor method. Adjust=t will treat missing distances as 1.0. 
You can also set the value that missing distances should be given; for example, adjust=0.5 would give missing distances a value of 0.5.\n"; helpString += "The penalty parameter is used to adjust the error rate. The default is 0.10.\n"; helpString += "The merge parameter allows you to shut off merging based on overlaps and just cluster. By default merge is true, meaning you want to merge.\n"; helpString += "The hcluster parameter allows you to use the hcluster algorithm when clustering. This may be necessary if your file is too large to fit into RAM. The default is false.\n"; helpString += "The mgcluster command should be in the following format: \n"; helpString += "mgcluster(blast=yourBlastfile, name=yourNameFile, cutoff=yourCutOff).\n"; helpString += "Note: No spaces between parameter labels (i.e. blast), '=' and parameters (i.e. yourBlastfile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string MGClusterCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "list") { pattern = "[filename],[clustertag],list-[filename],[clustertag],[tag2],list"; } else if (type == "rabund") { pattern = "[filename],[clustertag],rabund"; } else if (type == "sabund") { pattern = "[filename],[clustertag],sabund"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "getOutputPattern"); exit(1); } } //******************************************************************************************************************* MGClusterCommand::MGClusterCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; } 
catch(exception& e) { m->errorOut(e, "MGClusterCommand", "MGClusterCommand"); exit(1); } } //********************************************************************************************************************** MGClusterCommand::MGClusterCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("blast"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["blast"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters blastfile = validParameter.validFile(parameters, "blast", true); if (blastfile == "not open") { blastfile = ""; abort = true; } else if (blastfile == "not found") { blastfile = ""; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(blastfile); //if user entered a file with a path then preserve it } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if (countfile != "" && namefile != "") { m->mothurOut("[ERROR]: Cannot have both a name file and count file. 
Please use one or the other."); m->mothurOutEndLine(); abort = true; } if ((blastfile == "")) { m->mothurOut("When executing a mgcluster command you must provide a blastfile."); m->mothurOutEndLine(); abort = true; } //check for optional parameter and set defaults string temp; temp = validParameter.validFile(parameters, "precision", false); if (temp == "not found") { temp = "100"; } precisionLength = temp.length(); m->mothurConvert(temp, precision); cutoffSet = false; temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "0.70"; } else { cutoffSet = true; } m->mothurConvert(temp, cutoff); cutoff += (5 / (precision * 10.0)); method = validParameter.validFile(parameters, "method", false); if (method == "not found") { method = "average"; } if ((method == "furthest") || (method == "nearest") || (method == "average")) { } else { m->mothurOut("Not a valid clustering method. Valid clustering algorithms are furthest, nearest or average."); m->mothurOutEndLine(); abort = true; } temp = validParameter.validFile(parameters, "length", false); if (temp == "not found") { temp = "5"; } m->mothurConvert(temp, length); temp = validParameter.validFile(parameters, "penalty", false); if (temp == "not found") { temp = "0.10"; } m->mothurConvert(temp, penalty); temp = validParameter.validFile(parameters, "min", false); if (temp == "not found") { temp = "true"; } minWanted = m->isTrue(temp); temp = validParameter.validFile(parameters, "merge", false); if (temp == "not found") { temp = "true"; } merge = m->isTrue(temp); temp = validParameter.validFile(parameters, "hcluster", false); if (temp == "not found") { temp = "false"; } hclusterWanted = m->isTrue(temp); temp = validParameter.validFile(parameters, "hard", false); if (temp == "not found") { temp = "T"; } hard = m->isTrue(temp); temp = validParameter.validFile(parameters, "adjust", false); if (temp == "not found") { if (cutoffSet) { temp = "F"; }else { temp="T"; } } if (m->isNumeric1(temp)) { 
m->mothurConvert(temp, adjust); } else if (m->isTrue(temp)) { adjust = 1.0; } else { adjust = -1.0; } } } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "MGClusterCommand"); exit(1); } } //********************************************************************************************************************** int MGClusterCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //read names file map counts; if (namefile != "") { nameMap = new NameAssignment(namefile); nameMap->readMap(); }else if (countfile != "") { ct = new CountTable(); ct->readTable(countfile, false, false); nameMap= new NameAssignment(); vector tempNames = ct->getNamesOfSeqs(); for (int i = 0; i < tempNames.size(); i++) { nameMap->push_back(tempNames[i]); } counts = ct->getNameMap(); }else{ nameMap= new NameAssignment(); } string fileroot = outputDir + m->getRootName(m->getSimpleName(blastfile)); string tag = ""; time_t start; float previousDist = 0.00000; float rndPreviousDist = 0.00000; //read blastfile - creates sparsematrices for the distances and overlaps as well as a listvector //must remember to delete those objects here since readBlast does not read = new ReadBlast(blastfile, cutoff, penalty, length, minWanted, hclusterWanted); read->read(nameMap); list = new ListVector(nameMap->getListVector()); RAbundVector* rabund = NULL; if(countfile != "") { rabund = new RAbundVector(); createRabund(ct, list, rabund); }else { rabund = new RAbundVector(list->getRAbundVector()); } //list = new ListVector(nameMap->getListVector()); //rabund = new RAbundVector(list->getRAbundVector()); if (m->control_pressed) { outputTypes.clear(); delete nameMap; delete read; delete list; delete rabund; return 0; } start = time(NULL); oldList = *list; map Seq2Bin; map oldSeq2Bin; if (method == "furthest") { tag = "fn"; } else if (method == "nearest") { tag = "nn"; } else { tag = "an"; } map variables; variables["[filename]"] = fileroot; variables["[clustertag]"] = tag; string 
sabundFileName = getOutputFileName("sabund", variables); string rabundFileName = getOutputFileName("rabund", variables); if (countfile != "") { variables["[tag2]"] = "unique_list"; } string listFileName = getOutputFileName("list", variables); if (countfile == "") { m->openOutputFile(sabundFileName, sabundFile); m->openOutputFile(rabundFileName, rabundFile); } m->openOutputFile(listFileName, listFile); list->printHeaders(listFile); if (m->control_pressed) { delete nameMap; delete read; delete list; delete rabund; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } double saveCutoff = cutoff; if (!hclusterWanted) { //get distmatrix and overlap SparseDistanceMatrix* distMatrix = read->getDistMatrix(); overlapMatrix = read->getOverlapMatrix(); //already sorted by read delete read; //create cluster if (method == "furthest") { cluster = new CompleteLinkage(rabund, list, distMatrix, cutoff, method, adjust); } else if(method == "nearest"){ cluster = new SingleLinkage(rabund, list, distMatrix, cutoff, method, adjust); } else if(method == "average"){ cluster = new AverageLinkage(rabund, list, distMatrix, cutoff, method, adjust); } cluster->setMapWanted(true); Seq2Bin = cluster->getSeqtoBin(); oldSeq2Bin = Seq2Bin; if (m->control_pressed) { delete nameMap; delete distMatrix; delete list; delete rabund; delete cluster; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } //cluster using cluster classes while (distMatrix->getSmallDist() < cutoff && distMatrix->getNNodes() > 0){ if (m->debug) { cout << "numNodes=" << distMatrix->getNNodes() << " smallDist = " << 
distMatrix->getSmallDist() << endl; } cluster->update(cutoff); if (m->control_pressed) { delete nameMap; delete distMatrix; delete list; delete rabund; delete cluster; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } float dist = distMatrix->getSmallDist(); float rndDist; if (hard) { rndDist = m->ceilDist(dist, precision); }else{ rndDist = m->roundDist(dist, precision); } if(previousDist <= 0.0000 && dist != previousDist){ oldList.setLabel("unique"); printData(&oldList, counts); } else if(rndDist != rndPreviousDist){ if (merge) { ListVector* temp = mergeOPFs(oldSeq2Bin, rndPreviousDist); if (m->control_pressed) { delete nameMap; delete distMatrix; delete list; delete rabund; delete cluster; delete temp; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } temp->setLabel(toString(rndPreviousDist, precisionLength-1)); printData(temp, counts); delete temp; }else{ oldList.setLabel(toString(rndPreviousDist, precisionLength-1)); printData(&oldList, counts); } } previousDist = dist; rndPreviousDist = rndDist; oldList = *list; Seq2Bin = cluster->getSeqtoBin(); oldSeq2Bin = Seq2Bin; } if(previousDist <= 0.0000){ oldList.setLabel("unique"); printData(&oldList, counts); } else if(rndPreviousDist<cutoff){ if (merge) { ListVector* temp = mergeOPFs(oldSeq2Bin, rndPreviousDist); if (m->control_pressed) { delete nameMap; delete distMatrix; delete list; delete rabund; delete cluster; delete temp; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } 
temp->setLabel(toString(rndPreviousDist, precisionLength-1)); printData(temp, counts); delete temp; }else{ oldList.setLabel(toString(rndPreviousDist, precisionLength-1)); printData(&oldList, counts); } } //free memory overlapMatrix.clear(); delete distMatrix; delete cluster; }else { //use hcluster to cluster //get distmatrix and overlap overlapFile = read->getOverlapFile(); distFile = read->getDistFile(); delete read; //sort the distance and overlap files sortHclusterFiles(distFile, overlapFile); if (m->control_pressed) { delete nameMap; delete list; delete rabund; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } //create cluster hcluster = new HCluster(rabund, list, method, distFile, nameMap, cutoff); hcluster->setMapWanted(true); Seq2Bin = cluster->getSeqtoBin(); oldSeq2Bin = Seq2Bin; vector seqs; seqs.resize(1); // to start loop //ifstream inHcluster; //m->openInputFile(distFile, inHcluster); if (m->control_pressed) { delete nameMap; delete list; delete rabund; delete hcluster; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } while (seqs.size() != 0){ seqs = hcluster->getSeqs(); //to account for cutoff change in average neighbor if (seqs.size() != 0) { if (seqs[0].dist > cutoff) { break; } } if (m->control_pressed) { delete nameMap; delete list; delete rabund; delete hcluster; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); m->mothurRemove(distFile); m->mothurRemove(overlapFile); 
outputTypes.clear(); return 0; } for (int i = 0; i < seqs.size(); i++) { //-1 means skip me if (seqs[i].seq1 != seqs[i].seq2) { cutoff = hcluster->update(seqs[i].seq1, seqs[i].seq2, seqs[i].dist); if (m->control_pressed) { delete nameMap; delete list; delete rabund; delete hcluster; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); m->mothurRemove(distFile); m->mothurRemove(overlapFile); outputTypes.clear(); return 0; } float rndDist; if (hard) { rndDist = m->ceilDist(seqs[i].dist, precision); }else{ rndDist = m->roundDist(seqs[i].dist, precision); } if((previousDist <= 0.0000) && (seqs[i].dist != previousDist)){ oldList.setLabel("unique"); printData(&oldList, counts); } else if((rndDist != rndPreviousDist)){ if (merge) { ListVector* temp = mergeOPFs(oldSeq2Bin, rndPreviousDist); if (m->control_pressed) { delete nameMap; delete list; delete rabund; delete hcluster; delete temp; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); m->mothurRemove(distFile); m->mothurRemove(overlapFile); outputTypes.clear(); return 0; } temp->setLabel(toString(rndPreviousDist, precisionLength-1)); printData(temp, counts); delete temp; }else{ oldList.setLabel(toString(rndPreviousDist, precisionLength-1)); printData(&oldList, counts); } } previousDist = seqs[i].dist; rndPreviousDist = rndDist; oldList = *list; Seq2Bin = cluster->getSeqtoBin(); oldSeq2Bin = Seq2Bin; } } } //inHcluster.close(); if(previousDist <= 0.0000){ oldList.setLabel("unique"); printData(&oldList, counts); } else if(rndPreviousDist<cutoff){ if (merge) { ListVector* temp = mergeOPFs(oldSeq2Bin, rndPreviousDist); if (m->control_pressed) { delete nameMap; delete list; delete rabund; delete hcluster; delete temp; listFile.close(); if (countfile == "") { rabundFile.close(); 
sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); m->mothurRemove(distFile); m->mothurRemove(overlapFile); outputTypes.clear(); return 0; } temp->setLabel(toString(rndPreviousDist, precisionLength-1)); printData(temp, counts); delete temp; }else{ oldList.setLabel(toString(rndPreviousDist, precisionLength-1)); printData(&oldList, counts); } } delete hcluster; m->mothurRemove(distFile); m->mothurRemove(overlapFile); } delete list; delete rabund; listFile.close(); if (countfile == "") { sabundFile.close(); rabundFile.close(); } if (m->control_pressed) { delete nameMap; listFile.close(); if (countfile == "") { rabundFile.close(); sabundFile.close(); m->mothurRemove((fileroot+ tag + ".rabund")); m->mothurRemove((fileroot+ tag + ".sabund")); } m->mothurRemove((fileroot+ tag + ".list")); outputTypes.clear(); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(listFileName); m->mothurOutEndLine(); outputNames.push_back(listFileName); outputTypes["list"].push_back(listFileName); if (countfile == "") { m->mothurOut(rabundFileName); m->mothurOutEndLine(); outputNames.push_back(rabundFileName); outputTypes["rabund"].push_back(rabundFileName); m->mothurOut(sabundFileName); m->mothurOutEndLine(); outputNames.push_back(sabundFileName); outputTypes["sabund"].push_back(sabundFileName); } m->mothurOutEndLine(); if (saveCutoff != cutoff) { if (hard) { saveCutoff = m->ceilDist(saveCutoff, precision); } else { saveCutoff = m->roundDist(saveCutoff, precision); } m->mothurOut("changed cutoff to " + toString(cutoff)); m->mothurOutEndLine(); } //set list file as new current listfile string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } //set rabund file as new current rabundfile itTypes = 
outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } //set sabund file as new current sabundfile itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } m->mothurOut("It took " + toString(time(NULL) - start) + " seconds to cluster."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "execute"); exit(1); } } //********************************************************************************************************************** void MGClusterCommand::printData(ListVector* mergedList, map<string, int>& counts){ try { if (countfile != "") { mergedList->print(listFile, counts); }else { mergedList->print(listFile); } SAbundVector sabund = mergedList->getSAbundVector(); if (countfile == "") { mergedList->getRAbundVector().print(rabundFile); sabund.print(sabundFile); } sabund.print(cout); } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "printData"); exit(1); } } //********************************************************************************************************************** //this merging is only at the reporting level; once this info is printed to the file it is gone and does not affect the data structures //that are used to cluster by distance. this is done so that the overlapping data does not have more influence than the distance data. 
ListVector* MGClusterCommand::mergeOPFs(map binInfo, float dist){ try { //create new listvector so you don't overwrite the clustering ListVector* newList = new ListVector(oldList); bool done = false; ifstream inOverlap; int count = 0; if (hclusterWanted) { m->openInputFile(overlapFile, inOverlap); if (inOverlap.eof()) { done = true; } }else { if (overlapMatrix.size() == 0) { done = true; } } while (!done) { if (m->control_pressed) { if (hclusterWanted) { inOverlap.close(); } return newList; } //get next overlap seqDist overlapNode; if (!hclusterWanted) { if (count < overlapMatrix.size()) { //do we have another node in the matrix overlapNode = overlapMatrix[count]; count++; }else { break; } }else { if (!inOverlap.eof()) { string firstName, secondName; float overlapDistance; inOverlap >> firstName >> secondName >> overlapDistance; m->gobble(inOverlap); //commented out because we check this in readblast already //map::iterator itA = nameMap->find(firstName); //map::iterator itB = nameMap->find(secondName); //if(itA == nameMap->end()){ cerr << "AAError: Sequence '" << firstName << "' was not found in the names file, please correct\n"; exit(1); } //if(itB == nameMap->end()){ cerr << "ABError: Sequence '" << secondName << "' was not found in the names file, please correct\n"; exit(1); } //overlapNode.seq1 = itA->second; //overlapNode.seq2 = itB->second; overlapNode.seq1 = nameMap->get(firstName); overlapNode.seq2 = nameMap->get(secondName); overlapNode.dist = overlapDistance; }else { inOverlap.close(); break; } } if (overlapNode.dist < dist) { //get names of seqs that overlap string name1 = nameMap->get(overlapNode.seq1); string name2 = nameMap->get(overlapNode.seq2); //use binInfo to find out if they are already in the same bin //map::iterator itBin1 = binInfo.find(name1); //map::iterator itBin2 = binInfo.find(name2); //if(itBin1 == binInfo.end()){ cerr << "AAError: Sequence '" << name1 << "' does not have any bin info.\n"; exit(1); } //if(itBin2 == binInfo.end()){ cerr 
<< "ABError: Sequence '" << name2 << "' does not have any bin info.\n"; exit(1); } //int binKeep = itBin1->second; //int binRemove = itBin2->second; int binKeep = binInfo[name1]; int binRemove = binInfo[name2]; //if not merge bins and update binInfo if(binKeep != binRemove) { //save names in old bin string names = newList->get(binRemove); //merge bins into name1s bin newList->set(binKeep, newList->get(binRemove)+','+newList->get(binKeep)); newList->set(binRemove, ""); //update binInfo while (names.find_first_of(',') != -1) { //get name from bin string name = names.substr(0,names.find_first_of(',')); //save name and bin number binInfo[name] = binKeep; names = names.substr(names.find_first_of(',')+1, names.length()); } //get last name binInfo[names] = binKeep; } }else { done = true; } } //return listvector return newList; } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "mergeOPFs"); exit(1); } } //********************************************************************************************************************** void MGClusterCommand::sortHclusterFiles(string unsortedDist, string unsortedOverlap) { try { //sort distFile string sortedDistFile = m->sortFile(unsortedDist, outputDir); m->mothurRemove(unsortedDist); //delete unsorted file distFile = sortedDistFile; //sort overlap file string sortedOverlapFile = m->sortFile(unsortedOverlap, outputDir); m->mothurRemove(unsortedOverlap); //delete unsorted file overlapFile = sortedOverlapFile; } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "sortHclusterFiles"); exit(1); } } //********************************************************************************************************************** void MGClusterCommand::createRabund(CountTable*& ct, ListVector*& list, RAbundVector*& rabund){ try { //vector names = ct.getNamesOfSeqs(); //for ( int i; i < ct.getNumGroups(); i++ ) { rav.push_back( ct.getNumSeqs(names[i]) ); } //return rav; for(int i = 0; i < list->getNumBins(); i++) { vector binNames; string 
bin = list->get(i); m->splitAtComma(bin, binNames); int total = 0; for (int j = 0; j < binNames.size(); j++) { total += ct->getNumSeqs(binNames[j]); } rabund->push_back(total); } } catch(exception& e) { m->errorOut(e, "MGClusterCommand", "createRabund"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/mgclustercommand.h000066400000000000000000000036721255543666200217640ustar00rootroot00000000000000#ifndef MGCLUSTERCOMMAND_H #define MGCLUSTERCOMMAND_H /* * mgclustercommand.h * Mothur * * Created by westcott on 12/11/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "readblast.h" #include "nameassignment.hpp" #include "cluster.hpp" #include "hcluster.h" #include "rabundvector.hpp" #include "sabundvector.hpp" #include "counttable.h" /**********************************************************************/ class MGClusterCommand : public Command { public: MGClusterCommand(string); MGClusterCommand(); ~MGClusterCommand(){} vector setParameters(); string getCommandName() { return "mgcluster"; } string getCommandCategory() { return "Clustering"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Handelsman J (2008). A statistical toolbox for metagenomics. BMC Bioinformatics 9: 34. 
\nhttp://www.mothur.org/wiki/Mgcluster"; } string getDescription() { return "cluster your sequences into OTUs using a blast file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: ReadBlast* read; NameAssignment* nameMap; Cluster* cluster; HCluster* hcluster; ListVector* list; CountTable* ct; ListVector oldList; RAbundVector rav; vector overlapMatrix; vector outputNames; string blastfile, method, namefile, countfile, overlapFile, distFile, outputDir; ofstream sabundFile, rabundFile, listFile; double cutoff; float penalty, adjust; int precision, length, precisionLength; bool abort, minWanted, hclusterWanted, merge, hard, cutoffSet; void printData(ListVector*, map&); ListVector* mergeOPFs(map, float); void sortHclusterFiles(string, string); vector getSeqs(ifstream&); void createRabund(CountTable*&, ListVector*&, RAbundVector*&); }; /**********************************************************************/ #endif mothur-1.36.1/source/commands/mimarksattributescommand.cpp000066400000000000000000000567501255543666200240710ustar00rootroot00000000000000// // mimarksattributescommand.cpp // Mothur // // Created by Sarah Westcott on 3/17/15. // Copyright (c) 2015 Schloss Lab. All rights reserved. 
// #include "mimarksattributescommand.h" //********************************************************************************************************************** vector MimarksAttributesCommand::setParameters(){ try { CommandParameter pxml("xml", "InputTypes", "", "", "none", "none", "none","summary",false,false,true); parameters.push_back(pxml); CommandParameter psets("package", "String", "", "", "", "", "","",false,false); parameters.push_back(psets); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string MimarksAttributesCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "source") { pattern = "[filename],source"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** string MimarksAttributesCommand::getHelpString(){ try { string helpString = ""; helpString += "Reads bioSample Attributes xml and generates source for get.mimarkspackage command. Only parameter required is xml.\n"; helpString += "The package parameter allows you to set the package you want. 
Default MIMARKS.survey.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** MimarksAttributesCommand::MimarksAttributesCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["source"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "MimarksAttributesCommand"); exit(1); } } //********************************************************************************************************************** MimarksAttributesCommand::MimarksAttributesCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("xml"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["xml"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["source"] = tempOutNames; //check for required parameters xmlFile = validParameter.validFile(parameters, "xml", true); if (xmlFile == "not open") { abort = true; } else if (xmlFile == "not found") { xmlFile = ""; abort=true; m->mothurOut("You must provide an xml file. It is required."); m->mothurOutEndLine(); } selectedPackage = validParameter.validFile(parameters, "package", false); if (selectedPackage == "not found") { selectedPackage = "MIMARKS.survey."; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(xmlFile); } } } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "MimarksAttributesCommand"); exit(1); } } //********************************************************************************************************************** int MimarksAttributesCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } ifstream in; m->openInputFile(xmlFile, in); string header = m->getline(in); m->gobble(in); if (header != "") { m->mothurOut("[ERROR]: " + header + " is not a bioSample attribute file.\n"); m->control_pressed = true; } map categories; map::iterator it; while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } Attribute attribute = readAttribute(in); if (attribute.name != "") { if (m->debug) { m->mothurOut("[DEBUG]: name=" + attribute.name + " harmonizedName=" + attribute.harmonizedName + " format=" + attribute.format + " description=" + attribute.description + " package=" + attribute.getPackagesString() + "\n"); } if (attribute.format == "") { attribute.format = "{none}"; } if (attribute.description == "") { attribute.description = "none"; } for (int i = 0; i < attribute.packages.size(); i++) { for (int j = 0; j < 
attribute.packages[i].groupName.length(); j++) { if (attribute.packages[i].groupName[j] == '-') { attribute.packages[i].groupName[j] = '_'; } } it = categories.find(attribute.packages[i].groupName); if (it != categories.end()) { //we already have this category, ie air, soil... if (attribute.packages[i].name == (it->second).packageName) { //add attribute to category (it->second).values[attribute.harmonizedName].required = attribute.packages[i].required; (it->second).values[attribute.harmonizedName].format = attribute.format; string newDescription = ""; for (int j = 0; j < attribute.description.length(); j++) { if (attribute.description[j] == '"') { newDescription += "\\\""; } else { newDescription += attribute.description[j]; } } (it->second).values[attribute.harmonizedName].description = newDescription; } }else { if ((attribute.packages[i].groupName == "\"Built\"") || (attribute.packages[i].groupName == "\"Nucleic Acid Sequence Source\"")) {} else { Group thisGroup(attribute.packages[i].name); thisGroup.values[attribute.harmonizedName].required = attribute.packages[i].required; thisGroup.values[attribute.harmonizedName].format = attribute.format; string newDescription = ""; for (int j = 0; j < attribute.description.length(); j++) { if (attribute.description[j] == '"') { newDescription += "\\\""; } else { newDescription += attribute.description[j]; } } thisGroup.values[attribute.harmonizedName].description = newDescription; categories[attribute.packages[i].groupName] = thisGroup; } } } } } in.close(); string requiredByALL = "*sample_name\t*description\t*sample_title\t*seq_methods\t*organism"; string rFormatALL = "#{text}\t{text}\t{text}\t{text}\t{controlled vacabulary}"; string rDescriptionALL = "#{sample name}\t{description of sample}\t{sample title}\t{description of library_construction_protocol}\t{http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock}"; string environment = "\"Environment\""; it = 
categories.find(environment); if (it != categories.end()) { map::iterator itValue = (it->second).values.begin(); if (itValue->second.required) { requiredByALL += "\t*" + itValue->first; rFormatALL += "\t{" + (itValue->second.format) + "}"; rDescriptionALL += "\t{" + (itValue->second.description) + "}"; } itValue++; for (; itValue != (it->second).values.end(); itValue++) { if (itValue->second.required) { requiredByALL += "\t*" + itValue->first; rFormatALL += "\t{" + (itValue->second.format) + "}"; rDescriptionALL += "\t{" + (itValue->second.description) + "}"; } } } ofstream out; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(xmlFile)); string outputFileName = getOutputFileName("source",variables); outputNames.push_back(outputFileName); outputTypes["source"].push_back(outputFileName); m->openOutputFile(outputFileName, out); //create outputs string requiredValues = requiredByALL; string nonRequiredValues = ""; string rFormat = rFormatALL; string nonRFormat = ""; string rDescription = rDescriptionALL; string nonRDescription = ""; it = categories.begin(); map::iterator itValue = (it->second).values.begin(); if (itValue->second.required) { requiredValues += "\t*" + itValue->first; rFormat += "\t{" + (itValue->second.format) + "}"; rDescription += "\t{" + (itValue->second.description) + "}"; }else { nonRequiredValues += itValue->first; nonRFormat += "{" + itValue->second.format + "}"; nonRDescription += "{" + (itValue->second.description) + "}"; } itValue++; for (; itValue != (it->second).values.end(); itValue++) { if (itValue->second.required) { requiredValues += "\t*" + itValue->first; rFormat += "\t{" + (itValue->second.format) + "}"; rDescription += "\t{" + (itValue->second.description) + "}"; }else { nonRequiredValues += "\t" + itValue->first; nonRFormat += "\t{" + itValue->second.format + "}"; nonRDescription += "\t{" + (itValue->second.description) + "}"; } } out << "if (package == " + it->first + ") {\n"; out << "\tout << 
\"#" + it->second.packageName + "\" << endl;\n"; out << "\t if (requiredonly) {\n"; out << "\t\tout << \"" + rDescription + "\" << endl;\n"; out << "\t\tout << \"" + rFormat + "\" << endl;\n"; out << "\t\tout << \"" + requiredValues + "\" << endl;\n"; out << "\t}else {\n"; out << "\t\tout << \"" + rDescription + '\t' + nonRDescription + "\" << endl;\n"; out << "\t\tout << \"" + rFormat + '\t' + nonRFormat + "\" << endl;\n"; out << "\t\tout << \"" + requiredValues + '\t' + nonRequiredValues + "\" << endl;\n"; out << "\t}\n"; out << "}"; it++; for (; it != categories.end(); it++) { if ((it->first == "\"Environment\"")) {} else { //create outputs string requiredValues = requiredByALL; string nonRequiredValues = ""; string rFormat = rFormatALL; string nonRFormat = ""; string rDescription = rDescriptionALL; string nonRDescription = ""; map::iterator itValue = (it->second).values.begin(); if (itValue->second.required) { requiredValues += "\t*" + itValue->first; rFormat += "\t{" + (itValue->second.format)+ "}"; rDescription += "\t{" + (itValue->second.description) + "}"; }else { nonRequiredValues += itValue->first; nonRFormat += "{" + itValue->second.format+ "}"; nonRDescription += "{" + (itValue->second.description) + "}"; } itValue++; for (; itValue != (it->second).values.end(); itValue++) { if (itValue->second.required) { requiredValues += "\t*" + itValue->first; rFormat += "\t{" + (itValue->second.format)+ "}"; rDescription += "\t{" + (itValue->second.description) + "}"; }else { nonRequiredValues += "\t" + itValue->first; nonRFormat += "\t{" + itValue->second.format+ "}"; nonRDescription += "\t{" + (itValue->second.description) + "}"; } } out << "else if (package == " + it->first + ") {\n"; out << "\tout << \"#" + it->second.packageName + "\" << endl;\n"; out << "\t if (requiredonly) {\n"; out << "\t\tout << \"" + rDescription + "\" << endl;\n"; out << "\t\tout << \"" + rFormat + "\" << endl;\n"; out << "\t\tout << \"" + requiredValues + "\" << endl;\n"; out << 
"\t}else {\n"; out << "\t\tout << \"" + rDescription + '\t' + nonRDescription + "\" << endl;\n"; out << "\t\tout << \"" + rFormat + '\t' + nonRFormat + "\" << endl;\n"; out << "\t\tout << \"" + requiredValues + '\t' + nonRequiredValues + "\" << endl;\n"; out << "\t}\n"; out << "}"; } } out << endl << endl; it = categories.begin(); out << "if ((package == " << it->first << ") "; it++; for (; it != categories.end(); it++) { out << "|| (package == " << it->first << ") "; } out << ") {}\n\n"; out << "vector requiredFieldsForPackage;\n"; vector rAll; m->splitAtChar(requiredByALL, rAll, '\t'); for (int i = 0; i < rAll.size(); i++) { out << "requiredFieldsForPackage.push_back(\"" + rAll[i].substr(1) + "\");\n"; } out << "\n\n"; for (it = categories.begin(); it != categories.end(); it++) { out << "if (packageType == \"" << it->second.packageName << "\") {"; for (map::iterator itValue = (it->second).values.begin(); itValue != (it->second).values.end(); itValue++) { if (itValue->second.required) { out << "\trequiredFieldsForPackage.push_back(\"" + itValue->first + "\");"; } } out << "}\n"; } out.close(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "execute"); exit(1); } } //********************************************************************************************************************** Attribute MimarksAttributesCommand::readAttribute(ifstream& in){ try { //read string header = m->getline(in); m->gobble(in); if (header == "") { Attribute temp; return temp; } if (header != "") { m->mothurOut("[ERROR]: " + header + ", expected '' in file.\n"); m->control_pressed = true; } //read name //wastewater type m->gobble(in); string name = m->getline(in); m->gobble(in); trimTags(name); //read hamonized name //wastewater_type m->gobble(in); 
string hname = m->getline(in); m->gobble(in); trimTags(hname); //read description // //the origin of wastewater such as human waste, rainfall, storm drains, etc. // string description = ""; unsigned long long spot = in.tellg(); m->gobble(in); char c = in.get(); c = in.get(); if (c == 'D') { //description description += "gobble(in); string thisLine = m->getline(in); m->gobble(in); description += thisLine; if (thisLine.find("") != string::npos) { break; } } trimTags(description); }else { //package in.seekg(spot); } //read format //{text} spot = in.tellg(); m->gobble(in); c = in.get(); c = in.get(); string format = ""; if (c == 'F') { //format format += "getline(in); m->gobble(in); if (format.find("") == string::npos) { //format is not on oneline while (!in.eof()) { m->gobble(in); string thisLine = m->getline(in); m->gobble(in); format += thisLine; if (thisLine.find("") != string::npos) { break; } } } trimTags(format); }else { //package in.seekg(spot); } Attribute attribute(hname, description, name, format); //read Synonym - may be none //ref biomaterial bool FirstTime = true; while (!in.eof()) { unsigned long long thisspot = in.tellg(); m->gobble(in); char c = in.get(); c = in.get(); if (c == 'S') { //synonym FirstTime = false; m->getline(in); m->gobble(in); }else { //package if (FirstTime) { in.seekg(spot); } else { in.seekg(thisspot); } break; } } //read packages - may be none //MIGS.ba.air.4.0 while (!in.eof()) { string package = m->getline(in); m->gobble(in); if (package == "") { break; } else { Package thisPackage = parsePackage(package); if (thisPackage.groupName != "ignore") { attribute.packages.push_back(thisPackage); } } } return attribute; } catch(exception& e) { m->errorOut(e, "MimarksAttributesCommand", "execute"); exit(1); } } //********************************************************************************************************************** Package MimarksAttributesCommand::parsePackage(string package){ try { string openingTag = trimTags(package); 
        Package thispackage; thispackage.name = package;

        //only care about packages from our selection
        if (thispackage.name.find(selectedPackage) == string::npos) { thispackage.groupName = "ignore"; return thispackage; }

        //string::find returns size_t, so keep the position unsigned for the npos comparisons
        size_t pos = openingTag.find("use");
        if (pos != string::npos) {
            //read required or not
            string use = openingTag.substr(openingTag.find_first_of("\""), 11);
            if (use == "\"mandatory\"") { thispackage.required = true; }
        }else {
            m->mothurOut("[ERROR]: parsing error - " + openingTag + ". Expected something like in file.\n"); m->control_pressed = true; return thispackage;
        }

        //skip past selectedPackage (default MIMARKS.survey.) rather than a hard-coded 15 characters
        pos = package.find(selectedPackage);
        if (pos != string::npos) {
            //read groupname
            string group = package.substr(pos + selectedPackage.length());
            group = group.substr(0, (group.find_first_of(".")));
            thispackage.groupName = "\"" + group + "\"";
        }else {
            thispackage.groupName = "ignore";
        }

        return thispackage;
    }
    catch(exception& e) {
        m->errorOut(e, "MimarksAttributesCommand", "parsePackage");
        exit(1);
    }
}
//**********************************************************************************************************************
string MimarksAttributesCommand::trimTags(string& value){
    try {
        string forwardTag = "";
        string thisValue = "";
        int openCarrot = 0;
        int closedCarrot = 0;

        for (int i = 0; i < value.length(); i++) {
            if (m->control_pressed) { return forwardTag; }

            if (value[i] == '<') { openCarrot++; }
            else if (value[i] == '>') { closedCarrot++; }

            //you are reading front tag
            if ((openCarrot == 1) && (closedCarrot == 0)) { forwardTag += value[i]; }

            if (openCarrot == closedCarrot) { //reading value
                if (value[i] != '>') { thisValue += value[i]; }
            }

            if (openCarrot > 1) { break; }
        }

        value = thisValue;
        return (forwardTag + '>');
    }
    catch(exception& e) {
        m->errorOut(e, "MimarksAttributesCommand", "trimTags");
        exit(1);
    }
}
//**********************************************************************************************************************
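// Illustrative sketch only, not part of the mothur sources: the tag/value split that
// trimTags performs can be tried in isolation with the standalone function below.
// The name splitTagAndValue and the sample input are assumptions made for this example;
// the counting of '<' and '>' mirrors the loop in trimTags above, without the
// m->control_pressed check.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: splits a one-line element such as
//   <Package use="mandatory">MIGS.ba.air.4.0</Package>
// into its opening tag (first) and its text value (second).
std::pair<std::string, std::string> splitTagAndValue(const std::string& line) {
    std::string openingTag;
    std::string value;
    int opened = 0;   // number of '<' seen so far
    int closed = 0;   // number of '>' seen so far

    for (char c : line) {
        if (c == '<') { ++opened; }
        else if (c == '>') { ++closed; }

        if (opened > 1) { break; }  // second '<' starts the closing tag, so stop

        if (opened == 1 && closed == 0) {
            openingTag += c;        // still inside the opening tag
        } else if (opened == closed && c != '>') {
            value += c;             // between the opening and closing tags
        }
    }
    // The terminating '>' of the opening tag is never stored above, so add it back.
    return std::make_pair(openingTag + '>', value);
}
```

// For the sample element above, first holds the opening tag including its use attribute
// and second holds MIGS.ba.air.4.0; parsePackage then searches the opening tag for "use".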
mothur-1.36.1/source/commands/mimarksattributescommand.h000066400000000000000000000054711255543666200235300ustar00rootroot00000000000000
//
//  mimarksattributescommand.h
//  Mothur
//
//  Created by Sarah Westcott on 3/17/15.
//  Copyright (c) 2015 Schloss Lab. All rights reserved.
//

#ifndef __Mothur__mimarksattributescommand__
#define __Mothur__mimarksattributescommand__

#include "command.hpp"

struct Package {
    bool required;
    string groupName;
    string name;

    Package() { required=false; groupName=""; name=""; }
    Package(bool r, string g, string n) : required(r), groupName(g), name(n) {}
    ~Package() {}

    string getPackageString() {
        string r = "mandatory";
        if (!required) { r = "optional"; }
        string packageString = name + '\t' + groupName + '\t' + r;
        return packageString;
    }
};

struct Value {
    bool required;
    string format, description;

    Value() { format=""; description=""; required=false; }
    Value(bool r, string d, string f) : format(f), description(d), required(r) {}
    ~Value() {}
};

struct Group {
    string packageName;
    map<string, Value> values;

    Group() { packageName = ""; }
    Group(string p) : packageName(p) {}
    ~Group() {}
};

struct Attribute {
    string name, harmonizedName, description, format;
    vector<Package> packages;

    string getPackagesString() {
        string packagesString = "";
        for (int i = 0; i < packages.size(); i++) { packagesString += packages[i].getPackageString() + "\n"; }
        return packagesString;
    }

    Attribute() { format=""; description=""; harmonizedName=""; name=""; }
    Attribute(string hn, string d, string n, string f) : format(f), harmonizedName(hn), name(n), description(d) {}
    ~Attribute() {}
};

/**************************************************************************************************/

class MimarksAttributesCommand : public Command {
public:
    MimarksAttributesCommand(string);
    MimarksAttributesCommand();
    ~MimarksAttributesCommand(){}

    vector<string> setParameters();
    string getCommandName()     { return "mimarks.attributes"; }
    string getCommandCategory() { return "Hidden"; }
    string getOutputPattern(string);
    string getHelpString();
    string getCitation() { return "http://www.mothur.org/wiki/Mimarks.attributes"; }
    string getDescription() { return "Reads bioSample Attributes xml and generates source for get.mimarkspackage command."; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    Attribute readAttribute(ifstream& in);
    Package parsePackage(string package);
    string trimTags(string& value);

    bool abort;
    string outputDir, xmlFile, selectedPackage;
    vector<string> outputNames;
};

/**************************************************************************************************/

#endif /* defined(__Mothur__mimarksattributescommand__) */
mothur-1.36.1/source/commands/newcommandtemplate.cpp000066400000000000000000000526731255543666200226440ustar00rootroot00000000000000
//
//  newcommandtemplate.cpp
//  Mothur
//
//  Created by Sarah Westcott on 5/3/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//
//

#include "newcommandtemplate.h"

// Test Change.

//**********************************************************************************************************************
vector<string> NewCommand::setParameters(){
    try {
        //examples of each type of parameter.
        //more info on the types of parameters can be found in commandparameter.h
        CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pprocessors);

        //files that have dependencies
        CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "none","outputType",false,false); parameters.push_back(pphylip);
        CommandParameter pname("name", "InputTypes", "", "", "none", "none", "ColumnName","outputType",false,false); parameters.push_back(pname);
        CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "ColumnName","outputType",false,false); parameters.push_back(pcolumn);

        //files that do not have dependencies - fasta is set to not be required whereas shared is set to be required
        CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","outputType",false,false); parameters.push_back(pfasta);
        CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","outputType",false,true); parameters.push_back(pshared);

        CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);

        //Multiple type where the user may choose more than one of the options
        CommandParameter pcalc("calc", "Multiple", "jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-morisitahorn-braycurtis", "jest-thetayc", "", "", "","",true,false); parameters.push_back(pcalc);

        //Multiple type where the user may choose only one of the options
        CommandParameter pdistance("distance", "Multiple", "column-lt-square", "column", "", "", "","",false,false); parameters.push_back(pdistance);

        CommandParameter ptiming("timing", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ptiming);

        //every command must have inputdir and outputdir. This allows mothur users to redirect input and output files.
CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "NewCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string NewCommand::getHelpString(){ try { string helpString = ""; helpString += "The new command allows you to ....\n"; helpString += "The new command parameters are: ....\n"; helpString += "The whatever parameter is used to ....\n"; helpString += "The new command should be in the following format: \n"; helpString += "new(...)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "NewCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string NewCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fileType1") { pattern = "[filename],tag1"; } else if (type == "fileType2") { pattern = "[filename],tag2"; } else if (type == "fileType3") { pattern = "[filename],tag3"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "NewCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** NewCommand::NewCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fileType1"] = tempOutNames; //filetypes should be 
        //things like: shared, fasta, accnos...
        outputTypes["fileType2"] = tempOutNames;
        outputTypes["fileType3"] = tempOutNames; //key must match the type names used in getOutputPattern
    }
    catch(exception& e) {
        m->errorOut(e, "NewCommand", "NewCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
NewCommand::NewCommand(string option) {
    try {
        ////////////////////////////////////////////////////////
        /////////////////// start leave alone block ////////////
        ////////////////////////////////////////////////////////
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            //valid parameters for this command
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string,string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string,string>::iterator it;
            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //if the user changes the input directory command factory will send this info to us in the output parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {

                ///////////////////////////////////////////////////////////////
                //////////////// stop leave alone block ///////////////////////
                ///////////////////////////////////////////////////////////////

                //edit file types below to include only the types you added as parameters

                string path;
                it = parameters.find("phylip");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } /////////////////////////////////////////////////////////////////////////////// /////////// example of getting filenames and checking dependancies //////////// // the validParameter class will make sure file exists, fill with correct // // and name is current is given /////////////////////////////////////////////// /////////////////////////////////////////////////////////////////////////////// ///variables for examples below that you will most likely want to put in the header for //use by the other class functions. string phylipfile, columnfile, namefile, fastafile, sharedfile, method, countfile; int processors; bool useTiming, allLines; vector Estimators, Groups; set labels; //if allLines is used it should be initialized to 1 above. 
//check for parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { columnfile = ""; abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { m->setColumnFile(columnfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } //get fastafile - it is not required fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort=true; } else if (fastafile == "not found") { fastafile = ""; } if (fastafile != "") { m->setFastaFile(fastafile); } if ((phylipfile == "") && (columnfile == "")) { //is there are current file available for either of these? //give priority to column, then phylip columnfile = m->getColumnFile(); if (columnfile != "") { m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a phylip or column file before you can use the cluster command."); m->mothurOutEndLine(); abort = true; } } } else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When executing a cluster command you must enter ONLY ONE of the following: phylip or column."); m->mothurOutEndLine(); abort = true; } if (columnfile != "") { if (namefile == "") { namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile if you are going to use the column format."); m->mothurOutEndLine(); abort = true; } } } //get shared file, it is required sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } ////////////////////////////////////////////////////////////////////// ////////// example of getting other types of parameters ////////////// ////////////////////////////////////////////////////////////////////// //use only one Mutliple type method = validParameter.validFile(parameters, "method", false); if (method == "not found") { method = "average"; } if ((method == "furthest") || (method == "nearest") || (method == "average") || 
(method == "weighted")) { }
            else { m->mothurOut("Not a valid clustering method. Valid clustering algorithms are furthest, nearest, average, and weighted."); m->mothurOutEndLine(); abort = true; }

            //use more than one multiple type. do not check to make sure the entry is valid.
            string calc = validParameter.validFile(parameters, "calc", false);
            if (calc == "not found") { calc = "sobs-chao-ace-jack-shannon-npshannon-simpson"; }
            else {
                if (calc == "default") { calc = "sobs-chao-ace-jack-shannon-npshannon-simpson"; }
            }
            m->splitAtDash(calc, Estimators);

            //Boolean type - m->isTrue looks for t, true, f or false and is case insensitive
            string timing = validParameter.validFile(parameters, "timing", false);
            if (timing == "not found") { timing = "F"; }
            useTiming = m->isTrue(timing);

            //Number type - mothurConvert makes sure the conversion can happen to avoid a crash.
            string temp = validParameter.validFile(parameters, "processors", false);
            if (temp == "not found"){ temp = m->getProcessors(); }
            m->setProcessors(temp);
            m->mothurConvert(temp, processors);

            //Groups must be checked later to make sure they are valid. SharedUtilities has functions to check the validity; just make sure to call m->setGroups() after the checks. If you are using these with a shared file there is no need to check, because the SharedRAbundVector class will call SharedUtilities for you.
            string groups = validParameter.validFile(parameters, "groups", false);
            if (groups == "not found") { groups = ""; }
            else { m->splitAtDash(groups, Groups); }
            m->setGroups(Groups);

            //Commonly used to process list, rabund, sabund, shared and relabund files. Look at "smart distancing" examples below in the execute function.
            string label = validParameter.validFile(parameters, "label", false);
            if (label == "not found") { label = ""; }
            else {
                if(label != "all") { m->splitAtDash(label, labels); allLines = 0; }
                else { allLines = 1; }
            }

            //if your command has a namefile as an option, you may want to check to see if there is a current namefile
            //saved by mothur that is associated with the other files you are using as inputs.
            //You can do so by adding the files associated with the namefile to the files vector and then asking parser to check.
            //This saves our users headaches over file mismatches because they forgot to include the namefile, :)
            if (countfile == "") {
                if (namefile == "") {
                    vector<string> files; files.push_back(fastafile);
                    parser.getNameFile(files);
                }
            }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "NewCommand", "NewCommand");
        exit(1);
    }
}
//**********************************************************************************************************************

int NewCommand::execute(){
    try {

        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        // reading and processing a shared file code example
        // Note: As long as you set groups and labels as shown in the constructor, you can use this code without modification other than adding your function call, which is passed the lookup vector.
        // The classes used below will handle the checking of groups to make sure they are valid and return only the groups you selected. The while loop implements mothur "smart distancing", so as long as you filled label as shown above in the constructor, the code below will handle bad labels or labels not included in the sharedfile.

        //Reads the sharedfile. binLabels are stored in m->currentBinLabels; lookup will be filled with the groups in m->getGroups(), or all groups in the file if m->getGroups() is empty. If groups are selected, some bins may be eliminated if they only contained seqs from groups not included. No need to worry about the details of this, SharedRAbundVector takes care of it.
Just make sure to use m->currentBinLabels if you are outputting OTU labels so that if otus are eliminated you still have the correct names. /* InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); /////////////////////////////////////////////////////////////////////////////////// //// Call your function to process specific distance in sharedfile, ie lookup ///// /////////////////////////////////////////////////////////////////////////////////// processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); /////////////////////////////////////////////////////////////////////////////////// //// Call your function to process specific distance in sharedfile, ie lookup ///// /////////////////////////////////////////////////////////////////////////////////// processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete 
lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); /////////////////////////////////////////////////////////////////////////////////// //// Call your function to process specific distance in sharedfile, ie lookup ///// /////////////////////////////////////////////////////////////////////////////////// for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } */ //if you make a new file or a type that mothur keeps track of the current version, you can update it with something like the following. 
string currentFasta = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { currentFasta = (itTypes->second)[0]; m->setFastaFile(currentFasta); } } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "NewCommand", "NewCommand"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/newcommandtemplate.h000066400000000000000000000035031255543666200222750ustar00rootroot00000000000000#ifndef Mothur_newcommandtemplate_h #define Mothur_newcommandtemplate_h // // newcommandtemplate.h // Mothur // // Created by westcott on 5/3/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // //test //*********Be sure to change ifdef and define to a unique name.**************// /* This class is designed to provide a template for creating new commands. It includes code snippets to make creating the command classes virtually pure functions easier. It includes sample parameter declaration and parameter checking, as well as reference to other classes you may find helpful. It also includes the code needed to read a sharedfile. It is a work in progress so please add things you may find helpful to yourself or other developers trying to add commands to mothur. 
*/

#include "command.hpp"

/**************************************************************************************************/

class NewCommand : public Command {
public:
    NewCommand(string);
    NewCommand();
    ~NewCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "newCommandNameToBeSeenByUser"; }
    string getCommandCategory() { return "commandCategory"; }
    string getOutputPattern(string);
    //command category choices: Sequence Processing, OTU-Based Approaches, Hypothesis Testing, Phylotype Analysis, General, Clustering and Hidden
    string getHelpString();
    string getCitation() { return "http://www.mothur.org/wiki/newCommandNameToBeSeenByUser"; }
    string getDescription() { return "brief description"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort;
    string outputDir;
    vector<string> outputNames;
};

/**************************************************************************************************/

#endif
mothur-1.36.1/source/commands/nmdscommand.cpp000066400000000000000000000534361255543666200212560ustar00rootroot00000000000000/*
 * nmdscommand.cpp
 * mothur
 *
 * Created by westcott on 1/11/11.
 * Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "nmdscommand.h" #include "readphylipvector.h" //********************************************************************************************************************** vector NMDSCommand::setParameters(){ try { CommandParameter paxes("axes", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(paxes); CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","nmds-stress",false,true,true); parameters.push_back(pphylip); CommandParameter pmaxdim("maxdim", "Number", "", "2", "", "", "","",false,false); parameters.push_back(pmaxdim); CommandParameter pmindim("mindim", "Number", "", "2", "", "", "","",false,false); parameters.push_back(pmindim); CommandParameter piters("iters", "Number", "", "10", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pmaxiters("maxiters", "Number", "", "500", "", "", "","",false,false); parameters.push_back(pmaxiters); CommandParameter pepsilon("epsilon", "Number", "", "0.000000000001", "", "", "","",false,false); parameters.push_back(pepsilon); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "NMDSCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string NMDSCommand::getHelpString(){ try { string helpString = ""; helpString += "The nmds command is modelled after the nmds code written in R by Sarah Goslee, using Non-metric multidimensional scaling function using the majorization algorithm 
from Borg & Groenen 1997, Modern Multidimensional Scaling.\n";
        helpString += "The nmds command parameters are phylip, axes, mindim, maxdim, maxiters, iters and epsilon.\n";
        helpString += "The phylip parameter allows you to enter your distance file.\n";
        helpString += "The axes parameter allows you to enter a file containing a starting configuration.\n";
        helpString += "The maxdim parameter allows you to select the maximum dimensions to use. Default=2\n";
        helpString += "The mindim parameter allows you to select the minimum dimensions to use. Default=2\n";
        helpString += "The maxiters parameter allows you to select the maximum number of iterations to try with each random configuration. Default=500\n";
        helpString += "The iters parameter allows you to select the number of random configurations to try. Default=10\n";
        helpString += "The epsilon parameter allows you to set an acceptable stopping point. Default=1e-12.\n";
        helpString += "Example nmds(phylip=yourDistanceFile).\n";
        helpString += "Note: No spaces between parameter labels (i.e.
phylip), '=' and parameters (i.e. yourDistanceFile).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "NMDSCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string NMDSCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "nmds") { pattern = "[filename],nmds.axes"; }
        else if (type == "stress") { pattern = "[filename],nmds.stress"; }
        else if (type == "iters") { pattern = "[filename],nmds.iters"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "NMDSCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
NMDSCommand::NMDSCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["nmds"] = tempOutNames;
        outputTypes["stress"] = tempOutNames;
        outputTypes["iters"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "NMDSCommand", "NMDSCommand");
        exit(1);
    }
}
//**********************************************************************************************************************

NMDSCommand::NMDSCommand(string option) {
    try {
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.
getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("axes"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["axes"] = inputDir + it->second; } } } //initialize outputTypes vector tempOutNames; outputTypes["nmds"] = tempOutNames; outputTypes["iters"] = tempOutNames; outputTypes["stress"] = tempOutNames; //required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { //if there is a current phylip file, use it phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current phylip file and the phylip parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setPhylipFile(phylipfile); } axesfile = validParameter.validFile(parameters, "axes", true); if (axesfile == "not open") { axesfile = ""; abort = true; } else if (axesfile == "not found") { axesfile = ""; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(phylipfile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "mindim", false); if (temp == "not found") { temp = "2"; } m->mothurConvert(temp, mindim); temp = validParameter.validFile(parameters, "maxiters", false); if (temp == "not found") { temp = "500"; } m->mothurConvert(temp, maxIters); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "maxdim", false); if (temp == "not found") { temp = "2"; } m->mothurConvert(temp, maxdim); temp = validParameter.validFile(parameters, "epsilon", false); if (temp == "not found") { temp = "0.000000000001"; } m->mothurConvert(temp, epsilon); if 
(mindim < 1) { m->mothurOut("mindim must be at least 1."); m->mothurOutEndLine(); abort = true; } if (maxdim < mindim) { maxdim = mindim; } } } catch(exception& e) { m->errorOut(e, "NMDSCommand", "NMDSCommand"); exit(1); } } //********************************************************************************************************************** int NMDSCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); vector names; vector< vector< double> > matrix; //read in phylip file ReadPhylipVector readFile(phylipfile); names = readFile.read(matrix); if (m->control_pressed) { return 0; } //read axes vector< vector > axes; if (axesfile != "") { axes = readAxes(names); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(phylipfile)); string outputFileName = getOutputFileName("iters",variables); string stressFileName = getOutputFileName("stress",variables); outputNames.push_back(outputFileName); outputTypes["iters"].push_back(outputFileName); outputNames.push_back(stressFileName); outputTypes["stress"].push_back(stressFileName); ofstream out, out2; m->openOutputFile(outputFileName, out); m->openOutputFile(stressFileName, out2); out2.setf(ios::fixed, ios::floatfield); out2.setf(ios::showpoint); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out2 << "Dimension\tIter\tStress\tRsq" << endl; double bestStress = 10000000; double bestR2 = 10000000; vector< vector > bestConfig; int bestDim = 0; for (int i = mindim; i <= maxdim; i++) { m->mothurOut("Processing Dimension: " + toString(i)); m->mothurOutEndLine(); for (int j = 0; j < iters; j++) { m->mothurOut(toString(j+1)); m->mothurOutEndLine(); //get configuration - either randomly generate or resize to this dimension vector< vector > thisConfig; if (axesfile == "") { thisConfig = generateStartingConfiguration(names.size(), i); } else { thisConfig = getConfiguration(axes, i); } if 
(m->control_pressed) { out.close(); out2.close(); for (int k = 0; k < outputNames.size(); k++) { m->mothurRemove(outputNames[k]); } return 0; } //calc nmds for this dimension double stress; vector< vector > endConfig = nmdsCalc(matrix, thisConfig, stress); if (m->control_pressed) { out.close(); out2.close(); for (int k = 0; k < outputNames.size(); k++) { m->mothurRemove(outputNames[k]); } return 0; } //calc euclid distances for new config vector< vector > newEuclid = linearCalc.calculateEuclidianDistance(endConfig); if (m->control_pressed) { out.close(); out2.close(); for (int k = 0; k < outputNames.size(); k++) { m->mothurRemove(outputNames[k]); } return 0; } //calc correlation between original distances and euclidean distances from this config double rsquared = linearCalc.calcPearson(newEuclid, matrix); rsquared *= rsquared; if (m->control_pressed) { out.close(); out2.close(); for (int k = 0; k < outputNames.size(); k++) { m->mothurRemove(outputNames[k]); } return 0; } //output results out << "Config" << (j+1); for (int k = 0; k < i; k++) { out << '\t' << "axis" << (k+1); } out << endl; out2 << i << '\t' << (j+1) << '\t' << stress << '\t' << rsquared << endl; output(endConfig, names, out); //save best if (stress < bestStress) { bestDim = i; bestStress = stress; bestR2 = rsquared; bestConfig = endConfig; } if (m->control_pressed) { out.close(); out2.close(); for (int k = 0; k < outputNames.size(); k++) { m->mothurRemove(outputNames[k]); } return 0; } } } out.close(); out2.close(); //output best config string BestFileName = getOutputFileName("nmds",variables); outputNames.push_back(BestFileName); outputTypes["nmds"].push_back(BestFileName); m->mothurOut("\nNumber of dimensions:\t" + toString(bestDim) + "\n"); m->mothurOut("Lowest stress :\t" + toString(bestStress) + "\n"); m->mothurOut("R-squared for configuration:\t" + toString(bestR2) + "\n"); ofstream outBest; m->openOutputFile(BestFileName, outBest); outBest.setf(ios::fixed, ios::floatfield); 
outBest.setf(ios::showpoint); outBest << "group"; for (int k = 0; k < bestConfig.size(); k++) { outBest << '\t' << "axis" << (k+1); } outBest << endl; output(bestConfig, names, outBest); outBest.close(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "NMDSCommand", "execute"); exit(1); } } //********************************************************************************************************************** vector< vector > NMDSCommand::nmdsCalc(vector< vector >& matrix, vector< vector >& config, double& stress1) { try { vector< vector > newConfig = config; //calc euclid distances vector< vector > euclid = linearCalc.calculateEuclidianDistance(newConfig); if (m->control_pressed) { return newConfig; } double stress2 = calculateStress(matrix, euclid); stress1 = stress2 + 1.0 + epsilon; int count = 0; while ((count < maxIters) && (abs(stress1 - stress2) > epsilon)) { count++; stress1 = stress2; if (m->control_pressed) { return newConfig; } vector< vector > b; b.resize(euclid.size()); for (int i = 0; i < b.size(); i++) { b[i].resize(euclid[i].size(), 0.0); } vector columnSums; columnSums.resize(euclid.size(), 0.0); for (int i = 0; i < euclid.size(); i++) { for (int j = 0; j < euclid[i].size(); j++) { //eliminate divide by zero error if (euclid[i][j] != 0) { b[i][j] = matrix[i][j] / euclid[i][j]; columnSums[j] += b[i][j]; b[i][j] *= -1.0; } } } //put in diagonal sums for (int i = 0; i < euclid.size(); i++) { b[i][i] = columnSums[i]; } int numInLowerTriangle = matrix.size() * (matrix.size()-1) / 2.0; double n = (1.0 + sqrt(1.0 + 8.0 * numInLowerTriangle)) / 2.0; //matrix mult newConfig = linearCalc.matrix_mult(newConfig, b); for (int i = 0; i < 
newConfig.size(); i++) {
                for (int j = 0; j < newConfig[i].size(); j++) {
                    newConfig[i][j] *= (1.0 / n);
                }
            }

            euclid = linearCalc.calculateEuclidianDistance(newConfig);

            stress2 = calculateStress(matrix, euclid);
        }

        return newConfig;
    }
    catch(exception& e) {
        m->errorOut(e, "NMDSCommand", "nmdsCalc");
        exit(1);
    }
}
//**********************************************************************************************************************
//generate random config
vector< vector<double> > NMDSCommand::generateStartingConfiguration(int numNames, int dimension) {
    try {
        vector< vector<double> > axes; axes.resize(dimension);
        for (int i = 0; i < axes.size(); i++) { axes[i].resize(numNames); }

        //generate random number between -1 and 1, precision 6
        for (int i = 0; i < axes.size(); i++) {
            for (int j = 0; j < axes[i].size(); j++) {

                if (m->control_pressed) { return axes; }

                //generate random int between 0 and 99999
                int myrand = (int)((float)(rand()) / ((RAND_MAX / 99998) + 1));

                //generate random sign
                int mysign = (int)((float)(rand()) / ((RAND_MAX / 99998) + 1));

                //if mysign is even then sign = positive, else sign = negative
                if ((mysign % 2) == 0) { mysign = 1; }
                else { mysign = -1; }

                axes[i][j] = mysign * myrand / (float) 100000;
            }
        }

        return axes;
    }
    catch(exception& e) {
        m->errorOut(e, "NMDSCommand", "generateStartingConfiguration");
        exit(1);
    }
}
//**********************************************************************************************************************
//normalize configuration
int NMDSCommand::normalizeConfiguration(vector< vector<double> >& axes, int numNames, int dimension) {
    try {
        vector<double> averageAxes; averageAxes.resize(dimension, 0.0);

        //find average
        for (int i = 0; i < axes.size(); i++) {
            for (int j = 0; j < axes[i].size(); j++) { averageAxes[i] += axes[i][j]; }

            averageAxes[i] /= (float) numNames;
        }

        //normalize axes
        double sumDenom = 0.0;
        for (int i = 0; i < axes.size(); i++) {
            for (int j = 0; j < axes[i].size(); j++) {
                sumDenom += ((axes[i][j] - averageAxes[i]) *
(axes[i][j] - averageAxes[i])); } } double denom = sqrt((sumDenom / (float) (axes.size() * numNames))); for (int i = 0; i < axes.size(); i++) { for (int j = 0; j < axes[i].size(); j++) { axes[i][j] = (axes[i][j] - averageAxes[i]) / denom; } } return 0; } catch(exception& e) { m->errorOut(e, "NMDSCommand", "normalizeConfiguration"); exit(1); } } //********************************************************************************************************************** //get configuration vector< vector > NMDSCommand::getConfiguration(vector< vector >& axes, int dimension) { try { vector< vector > newAxes; newAxes.resize(dimension); for (int i = 0; i < dimension; i++) { newAxes[i] = axes[i]; } return newAxes; } catch(exception& e) { m->errorOut(e, "NMDSCommand", "getConfiguration"); exit(1); } } //********************************************************************************************************************** //find raw stress, and normalize using double NMDSCommand::calculateStress(vector< vector >& matrix, vector< vector >& config) { try { double normStress = 0.0; double denom = 0.0; double rawStress = 0.0; //find raw stress for (int i = 0; i < matrix.size(); i++) { for (int j = 0; j < matrix[i].size(); j++) { if (m->control_pressed) { return normStress; } rawStress += ((matrix[i][j] - config[i][j]) * (matrix[i][j] - config[i][j])); denom += (config[i][j] * config[i][j]); } } //normalize stress if ((rawStress != 0.0) && (denom != 0.0)) { normStress = sqrt((rawStress / denom)); } return normStress; } catch(exception& e) { m->errorOut(e, "NMDSCommand", "calculateStress"); exit(1); } } //********************************************************************************************************************** int NMDSCommand::output(vector< vector >& config, vector& names, ofstream& out) { try { for (int i = 0; i < names.size(); i++) { out << names[i]; for (int j = 0; j < config.size(); j++) { out << '\t' << config[j][i]; } out << endl; } out << endl << endl; return 0; } 
    catch(exception& e) {
        m->errorOut(e, "NMDSCommand", "output");
        exit(1);
    }
}
/*****************************************************************/
vector< vector<double> > NMDSCommand::readAxes(vector<string> names){
    try {

        ifstream in;
        m->openInputFile(axesfile, in);

        string headerLine = m->getline(in); m->gobble(in);

        //count the number of axes you are reading
        bool done = false;
        int count = 0;
        while (!done) {
            int pos = headerLine.find("axis");
            if (pos != string::npos) {
                count++;
                headerLine = headerLine.substr(pos+4);
            }else { done = true; }
        }

        if (maxdim > count) {
            m->mothurOut("You requested maxdim = " + toString(maxdim) + ", but your file only includes " + toString(count) + ". Using " + toString(count) + "."); m->mothurOutEndLine();
            maxdim = count;
            if (maxdim < mindim) {
                //keep mindim <= maxdim so execute() still has a dimension to process
                mindim = maxdim;
                m->mothurOut("Also adjusting mindim to " + toString(mindim) + "."); m->mothurOutEndLine();
            }
        }

        vector< vector<double> > axes; axes.resize(maxdim);
        for (int i = 0; i < axes.size(); i++) { axes[i].resize(names.size(), 0.0); }

        map<string, vector<double> > orderedAxes;
        map<string, vector<double> >::iterator it;

        while (!in.eof()) {

            if (m->control_pressed) { in.close(); return axes; }

            string group = "";
            in >> group; m->gobble(in);

            bool ignore = false;
            if (!m->inUsersGroups(group, names)) { ignore = true; m->mothurOut(group + " is in your axes file and not in your distance file, ignoring."); m->mothurOutEndLine(); }

            vector<double> thisGroupsAxes;
            for (int i = 0; i < count; i++) {
                float temp = 0.0;
                in >> temp;

                //only save the axes we want
                if (i < maxdim) { thisGroupsAxes.push_back(temp); }
            }

            if (!ignore) { orderedAxes[group] = thisGroupsAxes; }

            m->gobble(in);
        }
        in.close();

        //sanity check
        if (names.size() != orderedAxes.size()) { m->mothurOut("[ERROR]: your axes file does not match your distance file, aborting."); m->mothurOutEndLine(); m->control_pressed = true; return axes; }

        //put axes info in same order as distance file, just in case
        for (int i = 0; i < names.size(); i++) {
            it = orderedAxes.find(names[i]);

            if (it != orderedAxes.end()) {
                vector<double> thisGroupsAxes = it->second;

                for (int
j = 0; j < thisGroupsAxes.size(); j++) { axes[j][i] = thisGroupsAxes[j]; } }else { m->mothurOut("[ERROR]: your axes file does not match your distance file, aborting."); m->mothurOutEndLine(); m->control_pressed = true; return axes; } } return axes; } catch(exception& e) { m->errorOut(e, "NMDSCommand", "readAxes"); exit(1); } } /**********************************************************************************************************************/ mothur-1.36.1/source/commands/nmdscommand.h000066400000000000000000000037751255543666200207240ustar00rootroot00000000000000#ifndef NMDSCOMMAND_H #define NMDSCOMMAND_H /* * nmdscommand.h * mothur * * Created by westcott on 1/11/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "linearalgebra.h" /* Translated from the nmds.R code written by Sarah Goslee using, # Non-metric multidimensional scaling function # using the majorization algorithm from # Borg & Groenen 1997, Modern Multidimensional Scaling. # # also referenced (Kruskal 1964) */ /*****************************************************************/ class NMDSCommand : public Command { public: NMDSCommand(string); NMDSCommand(); ~NMDSCommand(){} vector setParameters(); string getCommandName() { return "nmds"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Borg, Groenen (1997). Non-metric multidimensional scaling function using the majorization algorithm, in Modern Multidimensional Scaling. Ed. T.F. Cox and M.A.A. Cox. Chapman and Hall. 
\nhttp://www.mothur.org/wiki/Nmds"; }
    string getDescription() { return "nmds"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:

    bool abort;
    string phylipfile, outputDir, axesfile;
    int maxdim, mindim, maxIters, iters;
    double epsilon;
    vector<string> outputNames;
    LinearAlgebra linearCalc;

    vector< vector<double> > nmdsCalc(vector< vector<double> >&, vector< vector<double> >&, double&);
    vector< vector<double> > getConfiguration(vector< vector<double> >&, int);
    vector< vector<double> > generateStartingConfiguration(int, int); //pass in numNames, return axes
    int normalizeConfiguration(vector< vector<double> >&, int, int);
    double calculateStress(vector< vector<double> >&, vector< vector<double> >&);
    vector< vector<double> > readAxes(vector<string>);
    int output(vector< vector<double> >&, vector<string>&, ofstream&);
};

/*****************************************************************/

#endif
mothur-1.36.1/source/commands/nocommands.cpp000066400000000000000000000014251255543666200211030ustar00rootroot00000000000000/*
 * nocommand.cpp
 * Dotur
 *
 * Created by Sarah Westcott on 1/2/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "nocommands.h"

//**********************************************************************************************************************

NoCommand::NoCommand(string option) {}

//**********************************************************************************************************************

int NoCommand::execute(){
    //Could choose to give more help here?
    cout << "Invalid command.\n";

    CommandFactory* valid = CommandFactory::getInstance();
    valid->printCommands(cout);

    return 0;
}

//**********************************************************************************************************************
mothur-1.36.1/source/commands/nocommands.h000066400000000000000000000020151255543666200205440ustar00rootroot00000000000000#ifndef NOCOMMAND_H
#define NOCOMMAND_H
/*
 * nocommand.h
 * Dotur
 *
 * Created by Sarah Westcott on 1/2/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ /* This command is run if the user enters an invalid command. */ #include "command.hpp" #include "commandfactory.hpp" class NoCommand : public Command { public: NoCommand(string); NoCommand() {} ~NoCommand(){} vector setParameters() { return outputNames; } //dummy, doesn't really do anything string getCommandName() { return "NoCommand"; } string getCommandCategory() { return "Hidden"; } string getHelpString() { return "No Command"; } string getOutputPattern(string) { return ""; } string getCitation() { return "no citation"; } string getDescription() { return "no description"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector outputNames; }; #endif mothur-1.36.1/source/commands/normalizesharedcommand.cpp000066400000000000000000000722421255543666200235000ustar00rootroot00000000000000/* * normalizesharedcommand.cpp * Mothur * * Created by westcott on 9/15/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "normalizesharedcommand.h" //********************************************************************************************************************** vector NormalizeSharedCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","shared",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "LRSS", "LRSS", "none","shared",false,false,true); parameters.push_back(prelabund); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pmethod("method", "Multiple", "totalgroup-zscore", "totalgroup", "", "", "","",false,false,true); parameters.push_back(pmethod); CommandParameter pnorm("norm", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pnorm); CommandParameter pmakerelabund("makerelabund", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pmakerelabund); CommandParameter plabel("label", "String", "", 
"", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string NormalizeSharedCommand::getHelpString(){ try { string helpString = ""; helpString += "The normalize.shared command parameters are shared, relabund, groups, method, norm, makerelabund and label. shared or relabund is required, unless you have a valid current file.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included. The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance levels you would like, and are also separated by dashes.\n"; helpString += "The method parameter allows you to select what method you would like to use to normalize. The options are totalgroup and zscore. We hope to add more ways to normalize in the future, suggestions are welcome!\n"; helpString += "The makerelabund parameter allows you to convert a shared file to a relabund file before you normalize. default=f.\n"; helpString += "The norm parameter allows you to number you would like to normalize to. 
By default this is set to the number of sequences in your smallest group.\n"; helpString += "The normalize.shared command should be in the following format: normalize.shared(groups=yourGroups, label=yourLabels).\n"; helpString += "Example normalize.shared(groups=A-B-C, method=totalgroup).\n"; helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n"; helpString += "The normalize.shared command outputs a .norm.shared file.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string NormalizeSharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "shared") { pattern = "[filename],[distance],norm.shared"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** NormalizeSharedCommand::NormalizeSharedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["shared"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "NormalizeSharedCommand"); exit(1); } } //********************************************************************************************************************** NormalizeSharedCommand::NormalizeSharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { 
citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["shared"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "sharedfile"; inputfile = sharedfile; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { relabundfile = ""; abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { format = "relabund"; inputfile = relabundfile; m->setRelAbundFile(relabundfile); } if ((sharedfile == "") && (relabundfile == "")) { //is there are current file available for any of these? 
//give priority to shared, then relabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputfile = relabundfile; format = "relabund"; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared or relabund file."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } //check for optional parameter and set defaults // ...at some point should add some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); m->setGroups(Groups); } method = validParameter.validFile(parameters, "method", false); if (method == "not found") { method = "totalgroup"; } if ((method != "totalgroup") && (method != "zscore")) { m->mothurOut(method + " is not a valid scaling option for the normalize.shared command. The options are totalgroup and zscore. 
We hope to add more ways to normalize in the future, suggestions are welcome!"); m->mothurOutEndLine(); abort = true; } string temp = validParameter.validFile(parameters, "norm", false); if (temp == "not found") { norm = 0; //once you have read, set norm to smallest group number }else { m->mothurConvert(temp, norm); if (norm < 0) { m->mothurOut("norm must be positive."); m->mothurOutEndLine(); abort=true; } } temp = validParameter.validFile(parameters, "makerelabund", false); if (temp == "") { temp = "f"; } makeRelabund = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "NormalizeSharedCommand"); exit(1); } } //********************************************************************************************************************** int NormalizeSharedCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } input = new InputData(inputfile, format); //you are reading a sharedfile and you do not want to make relabund if ((format == "sharedfile") && (!makeRelabund)) { lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //look for groups whose numseqs is below norm and remove them, warning the user if (norm != 0) { m->clearGroups(); vector mGroups; vector temp; for (int i = 0; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < norm) { m->mothurOut(lookup[i]->getGroup() + " contains " + toString(lookup[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine(); delete lookup[i]; }else { mGroups.push_back(lookup[i]->getGroup()); temp.push_back(lookup[i]); } } lookup = temp; m->setGroups(mGroups); } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (method == "totalgroup") { //set norm to smallest group number if (norm == 0) { norm = lookup[0]->getNumSeqs(); for (int i = 1; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < norm) { norm = lookup[i]->getNumSeqs(); } } } m->mothurOut("Normalizing to " + toString(norm) + "."); m->mothurOutEndLine(); } //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); normalize(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); normalize(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); m->clearGroups(); return 0; } //get next line to process lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); m->clearGroups(); 
return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); normalize(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } }else{ //relabund values lookupFloat = input->getSharedRAbundFloatVectors(); string lastLabel = lookupFloat[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; //look for groups whose numseqs is below norm and remove them, warning the user if (norm != 0) { m->clearGroups(); vector mGroups; vector temp; for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i]->getNumSeqs() < norm) { m->mothurOut(lookupFloat[i]->getGroup() + " contains " + toString(lookupFloat[i]->getNumSeqs()) + ". 
Eliminating."); m->mothurOutEndLine(); delete lookupFloat[i]; }else { mGroups.push_back(lookupFloat[i]->getGroup()); temp.push_back(lookupFloat[i]); } } lookupFloat = temp; m->setGroups(mGroups); } //set norm to smallest group number if (method == "totalgroup") { if (norm == 0) { norm = lookupFloat[0]->getNumSeqs(); for (int i = 1; i < lookupFloat.size(); i++) { if (lookupFloat[i]->getNumSeqs() < norm) { norm = lookupFloat[i]->getNumSeqs(); } } } m->mothurOut("Normalizing to " + toString(norm) + "."); m->mothurOutEndLine(); } //as long as you are not at the end of the file or done wih the lines you want while((lookupFloat[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } m->clearGroups(); return 0; } if(allLines == 1 || labels.count(lookupFloat[0]->getLabel()) == 1){ m->mothurOut(lookupFloat[0]->getLabel()); m->mothurOutEndLine(); normalize(lookupFloat); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); } if ((m->anyLabelsToProcess(lookupFloat[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupFloat[0]->getLabel(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookupFloat[0]->getLabel()); m->mothurOutEndLine(); normalize(lookupFloat); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); //restore real lastlabel to save below lookupFloat[0]->setLabel(saveLabel); } lastLabel = lookupFloat[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; lookupFloat[i] = NULL; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { 
m->mothurRemove(outputNames[i]); } outputTypes.clear(); m->clearGroups(); return 0; } //get next line to process lookupFloat = input->getSharedRAbundFloatVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); m->clearGroups(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookupFloat[0]->getLabel()); m->mothurOutEndLine(); normalize(lookupFloat); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } } } //reset groups parameter m->clearGroups(); delete input; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0;} m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); //m->mothurOut(outputFileName); m->mothurOutEndLine(); outputNames.push_back(outputFileName); outputTypes["shared"].push_back(outputFileName); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set shared file as new current sharedfile string current = ""; itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, 
"NormalizeSharedCommand", "execute"); exit(1); } } //********************************************************************************************************************** int NormalizeSharedCommand::normalize(vector& thisLookUp){ try { //save mothurOut's binLabels to restore for next label vector saveBinLabels = m->currentSharedBinLabels; if (pickedGroups) { eliminateZeroOTUS(thisLookUp); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[distance]"] = thisLookUp[0]->getLabel(); string outputFileName = getOutputFileName("shared",variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["shared"].push_back(outputFileName); if (method == "totalgroup") { //save numSeqs since they will change as the data is normalized vector sizes; for (int i = 0; i < thisLookUp.size(); i++) { sizes.push_back(thisLookUp[i]->getNumSeqs()); } for (int j = 0; j < thisLookUp[0]->getNumBins(); j++) { for (int i = 0; i < thisLookUp.size(); i++) { if (m->control_pressed) { out.close(); return 0; } int abund = thisLookUp[i]->getAbundance(j); float relabund = abund / (float) sizes[i]; float newNorm = relabund * norm; //round to nearest int int finalNorm = (int) floor((newNorm + 0.5)); thisLookUp[i]->set(j, finalNorm, thisLookUp[i]->getGroup()); } } }else if (method == "zscore") { for (int j = 0; j < thisLookUp[0]->getNumBins(); j++) { if (m->control_pressed) { out.close(); return 0; } //calc mean float mean = 0.0; for (int i = 0; i < thisLookUp.size(); i++) { mean += thisLookUp[i]->getAbundance(j); } mean /= (float) thisLookUp.size(); //calc standard deviation float sumSquared = 0.0; for (int i = 0; i < thisLookUp.size(); i++) { sumSquared += (((float)thisLookUp[i]->getAbundance(j) - mean) * ((float)thisLookUp[i]->getAbundance(j) - mean)); } sumSquared /= (float) thisLookUp.size(); float standardDev = sqrt(sumSquared); for (int i = 0; i < thisLookUp.size(); i++) { int 
finalNorm = 0; if (standardDev != 0) { // stop divide by zero float newNorm = ((float)thisLookUp[i]->getAbundance(j) - mean) / standardDev; //round to nearest int finalNorm = (int) floor((newNorm + 0.5)); } thisLookUp[i]->set(j, finalNorm, thisLookUp[i]->getGroup()); } } }else{ m->mothurOut(method + " is not a valid scaling option."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } eliminateZeroOTUS(thisLookUp); thisLookUp[0]->printHeaders(out); for (int i = 0; i < thisLookUp.size(); i++) { out << thisLookUp[i]->getLabel() << '\t' << thisLookUp[i]->getGroup() << '\t'; thisLookUp[i]->print(out); } out.close(); m->currentSharedBinLabels = saveBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "normalize"); exit(1); } } //********************************************************************************************************************** int NormalizeSharedCommand::normalize(vector& thisLookUp){ try { //save mothurOut's binLabels to restore for next label vector saveBinLabels = m->currentSharedBinLabels; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[distance]"] = thisLookUp[0]->getLabel(); string outputFileName = getOutputFileName("shared",variables); ofstream out; m->openOutputFile(outputFileName, out); outputNames.push_back(outputFileName); outputTypes["shared"].push_back(outputFileName); if (pickedGroups) { eliminateZeroOTUS(thisLookUp); } if (method == "totalgroup") { //save numSeqs since they will change as the data is normalized vector sizes; for (int i = 0; i < thisLookUp.size(); i++) { sizes.push_back(thisLookUp[i]->getNumSeqs()); } for (int j = 0; j < thisLookUp[0]->getNumBins(); j++) { for (int i = 0; i < thisLookUp.size(); i++) { if (m->control_pressed) { out.close(); return 0; } float abund = thisLookUp[i]->getAbundance(j); float relabund = abund / (float) sizes[i]; float newNorm = relabund * norm; thisLookUp[i]->set(j, newNorm, 
thisLookUp[i]->getGroup()); } } }else if (method == "zscore") { for (int j = 0; j < thisLookUp[0]->getNumBins(); j++) { if (m->control_pressed) { out.close(); return 0; } //calc mean float mean = 0.0; for (int i = 0; i < thisLookUp.size(); i++) { mean += thisLookUp[i]->getAbundance(j); } mean /= (float) thisLookUp.size(); //calc standard deviation float sumSquared = 0.0; for (int i = 0; i < thisLookUp.size(); i++) { sumSquared += ((thisLookUp[i]->getAbundance(j) - mean) * (thisLookUp[i]->getAbundance(j) - mean)); } sumSquared /= (float) thisLookUp.size(); float standardDev = sqrt(sumSquared); for (int i = 0; i < thisLookUp.size(); i++) { float newNorm = 0.0; if (standardDev != 0) { // stop divide by zero newNorm = (thisLookUp[i]->getAbundance(j) - mean) / standardDev; } thisLookUp[i]->set(j, newNorm, thisLookUp[i]->getGroup()); } } }else{ m->mothurOut(method + " is not a valid scaling option."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } eliminateZeroOTUS(thisLookUp); thisLookUp[0]->printHeaders(out); for (int i = 0; i < thisLookUp.size(); i++) { out << thisLookUp[i]->getLabel() << '\t' << thisLookUp[i]->getGroup() << '\t'; thisLookUp[i]->print(out); } out.close(); m->currentSharedBinLabels = saveBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "normalize"); exit(1); } } //********************************************************************************************************************** int NormalizeSharedCommand::eliminateZeroOTUS(vector& thislookup) { try { vector newLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector newBinLabels; string snumBins = toString(thislookup[0]->getNumBins()); for (int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { 
delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not all zero add this bin if (!allZero) { for (int j = 0; j < thislookup.size(); j++) { newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); } //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } newBinLabels.push_back(binLabel); } } for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; } thislookup = newLookup; m->currentSharedBinLabels = newBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "eliminateZeroOTUS"); exit(1); } } //********************************************************************************************************************** int NormalizeSharedCommand::eliminateZeroOTUS(vector& thislookup) { try { vector newLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector newBinLabels; string snumBins = toString(thislookup[0]->getNumBins()); for (int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not 
all zero add this bin if (!allZero) { for (int j = 0; j < thislookup.size(); j++) { newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); } //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } newBinLabels.push_back(binLabel); } } for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; } thislookup = newLookup; m->currentSharedBinLabels = newBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "NormalizeSharedCommand", "eliminateZeroOTUS"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/normalizesharedcommand.h000066400000000000000000000027031255543666200231400ustar00rootroot00000000000000#ifndef NORMALIZESHAREDCOMMAND_H #define NORMALIZESHAREDCOMMAND_H /* * normalizesharedcommand.h * Mothur * * Created by westcott on 9/15/10. * Copyright 2010 Schloss Lab. All rights reserved. 
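Note on the two normalize() overloads in normalizesharedcommand.cpp: stripped of file handling and label bookkeeping, the integer path reduces to the arithmetic sketched below. This is a standalone illustration under stated assumptions, not code from mothur. CountTable, normalizeTotalGroup and normalizeZScore are hypothetical names, and plain nested vectors stand in for SharedRAbundVector. The rounding (floor of value + 0.5) and the population standard deviation (dividing by the number of groups) follow what the source does.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a set of SharedRAbundVectors:
// counts[g][b] is the abundance of OTU (bin) b in group g.
typedef std::vector<std::vector<int> > CountTable;

// "totalgroup": rescale every group so that its total is roughly `norm`.
CountTable normalizeTotalGroup(const CountTable& counts, int norm) {
    CountTable result(counts.size());
    for (std::size_t g = 0; g < counts.size(); g++) {
        // Take the group total first, because the values change as we go.
        long total = 0;
        for (std::size_t b = 0; b < counts[g].size(); b++) { total += counts[g][b]; }

        result[g].resize(counts[g].size(), 0);
        if (total == 0) { continue; } // empty group: leave zeros (assumption of this sketch)

        for (std::size_t b = 0; b < counts[g].size(); b++) {
            double relabund = counts[g][b] / static_cast<double>(total);
            // Round to the nearest integer.
            result[g][b] = static_cast<int>(std::floor(relabund * norm + 0.5));
        }
    }
    return result;
}

// "zscore": for each OTU, subtract the mean across groups and divide by
// the population standard deviation, then round to the nearest integer.
CountTable normalizeZScore(const CountTable& counts) {
    CountTable result(counts);
    if (counts.empty()) { return result; }

    const std::size_t numGroups = counts.size();
    for (std::size_t b = 0; b < counts[0].size(); b++) {
        double mean = 0.0;
        for (std::size_t g = 0; g < numGroups; g++) { mean += counts[g][b]; }
        mean /= static_cast<double>(numGroups);

        double sumSquared = 0.0;
        for (std::size_t g = 0; g < numGroups; g++) {
            double diff = counts[g][b] - mean;
            sumSquared += diff * diff;
        }
        double standardDev = std::sqrt(sumSquared / static_cast<double>(numGroups));

        for (std::size_t g = 0; g < numGroups; g++) {
            int z = 0; // stays 0 when every group has the same count
            if (standardDev != 0.0) {
                z = static_cast<int>(std::floor((counts[g][b] - mean) / standardDev + 0.5));
            }
            result[g][b] = z;
        }
    }
    return result;
}
```

The float overload in the source skips the rounding step and stores the scaled value directly, which is why relabund input keeps fractional results.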
* */ #include "command.hpp" #include "inputdata.h" #include "sharedrabundvector.h" class NormalizeSharedCommand : public Command { public: NormalizeSharedCommand(string); NormalizeSharedCommand(); ~NormalizeSharedCommand() {} vector setParameters(); string getCommandName() { return "normalize.shared"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Normalize.shared"; } string getDescription() { return "normalize samples in a shared or relabund file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: InputData* input; vector lookup; vector lookupFloat; bool abort, allLines, pickedGroups, makeRelabund; set labels; //holds labels to be used string groups, label, outputDir, method, sharedfile, relabundfile, format, inputfile; int norm; vector Groups, outputNames; int normalize(vector&); int normalize(vector&); int eliminateZeroOTUS(vector&); int eliminateZeroOTUS(vector&); }; #endif mothur-1.36.1/source/commands/otuassociationcommand.cpp000066400000000000000000000671541255543666200233630ustar00rootroot00000000000000/* * otuassociationcommand.cpp * Mothur * * Created by westcott on 1/19/12. * Copyright 2012 Schloss Lab. All rights reserved. 
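Note on what this file computes: the otu.association command reports correlation coefficients between OTUs, with pearson as the default method. As a rough illustration of a Pearson coefficient between two OTU abundance profiles, here is a standalone sketch. The pearson helper is a hypothetical name and is not the routine from linearalgebra.h that the command actually calls. Returning 0.0 for degenerate input is a choice made for this sketch only.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation between two equal-length abundance profiles,
// one value per group (sample).
// Returns 0.0 when the lengths differ, when there are fewer than two
// samples, or when either profile is constant (assumption of this sketch).
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    const std::size_t n = x.size();
    if (n < 2 || y.size() != n) { return 0.0; }

    double meanX = 0.0;
    double meanY = 0.0;
    for (std::size_t i = 0; i < n; i++) { meanX += x[i]; meanY += y[i]; }
    meanX /= static_cast<double>(n);
    meanY /= static_cast<double>(n);

    // Accumulate the covariance term and both variance terms.
    double sxy = 0.0;
    double sxx = 0.0;
    double syy = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        double dx = x[i] - meanX;
        double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }

    if (sxx == 0.0 || syy == 0.0) { return 0.0; } // constant profile
    return sxy / std::sqrt(sxx * syy);
}
```

Spearman applies the same formula to ranks instead of raw abundances, so a ranking step in front of this helper would cover that method as well.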
* */ #include "otuassociationcommand.h" #include "linearalgebra.h" //********************************************************************************************************************** vector OTUAssociationCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "SharedRelMeta", "SharedRelMeta", "none","otucorr",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "SharedRelMeta", "SharedRelMeta", "none","otucorr",false,false); parameters.push_back(prelabund); CommandParameter pmetadata("metadata", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pmetadata); CommandParameter pcutoff("cutoff", "Number", "", "10", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pmethod("method", "Multiple", "pearson-spearman-kendall", "pearson", "", "", "","",false,false,true); parameters.push_back(pmethod); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string OTUAssociationCommand::getHelpString(){ try { string helpString = ""; helpString += "The otu.association command reads a shared 
or relabund file and calculates the correlation coefficients between otus.\n"; helpString += "If you provide a metadata file, mothur will calculate the correlation between the metadata and the otus.\n"; helpString += "The otu.association command parameters are shared, relabund, metadata, groups, method, cutoff and label. The shared or relabund parameter is required.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like included. The group names are separated by dashes.\n"; helpString += "The label parameter allows you to select what distance levels you would like used, and are also separated by dashes.\n"; helpString += "The cutoff parameter allows you to set a pvalue at which the otu will be reported.\n"; helpString += "The method parameter allows you to select what method you would like to use. Options are pearson, spearman and kendall. Default=pearson.\n"; helpString += "The otu.association command should be in the following format: otu.association(shared=yourSharedFile, method=yourMethod).\n"; helpString += "Example otu.association(shared=genus.pool.shared, method=kendall).\n"; helpString += "The otu.association command outputs a .otu.corr file.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string OTUAssociationCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "otucorr") { pattern = "[filename],[distance],[tag],otu.corr"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** OTUAssociationCommand::OTUAssociationCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["otucorr"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "OTUAssociationCommand"); exit(1); } } //********************************************************************************************************************** OTUAssociationCommand::OTUAssociationCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["otucorr"] = tempOutNames; //if the user changes the input directory command factory will send this info to us 
in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("metadata"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["metadata"] = inputDir + it->second; } } } //check for required parameters sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputFileName = sharedfile; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { inputFileName = relabundfile; m->setRelAbundFile(relabundfile); } metadatafile = validParameter.validFile(parameters, "metadata", true); if (metadatafile == "not open") { abort = true; metadatafile = ""; } else if (metadatafile == "not found") { metadatafile = ""; } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); } m->setGroups(Groups); outputDir = validParameter.validFile(parameters, "outputdir", false); if 
(outputDir == "not found"){ outputDir = m->hasPath(inputFileName); } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } if ((relabundfile == "") && (sharedfile == "")) { //is there are current file available for any of these? //give priority to shared, then relabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFileName = sharedfile; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFileName = relabundfile; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You must provide either a shared or relabund file."); m->mothurOutEndLine(); abort = true; } } } if ((relabundfile != "") && (sharedfile != "")) { m->mothurOut("You may only use one of the following : shared or relabund file."); m->mothurOutEndLine(); abort = true; } method = validParameter.validFile(parameters, "method", false); if (method == "not found"){ method = "pearson"; } string temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, cutoff); if ((method != "pearson") && (method != "spearman") && (method != "kendall")) { m->mothurOut(method + " is not a valid method. 
Valid methods are pearson, spearman, and kendall."); m->mothurOutEndLine(); abort = true; }
		}
	}
	catch(exception& e) {
		m->errorOut(e, "OTUAssociationCommand", "OTUAssociationCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
int OTUAssociationCommand::execute(){
	try {
		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		if (metadatafile != "") { readMetadata(); }

		//the two functions are identical apart from their datatypes
		if (sharedfile != "") { processShared(); }
		else if (relabundfile != "") { processRelabund(); }

		if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "OTUAssociationCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
int OTUAssociationCommand::processShared(){
	try {
		InputData* input = new InputData(sharedfile, "sharedfile");
		vector<SharedRAbundVector*> lookup = input->getSharedRAbundVectors();
		string lastLabel = lookup[0]->getLabel();

		if (metadatafile != "") {
			getMetadata();
			bool error = false;
			if (metadata[0].size() != lookup.size()) {
				m->mothurOut("[ERROR]: You have selected to use " + toString(metadata[0].size()) + " data rows from the metadata file, but " + toString(lookup.size()) + " from the shared file.\n");
				m->control_pressed = true;
				error = true;
			}
			if (error) {
				//maybe add extra info here?? compare groups in each file??
			}
		}

		//if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label.
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete input; return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); } delete input; return 0; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "processShared"); exit(1); } } //********************************************************************************************************************** int OTUAssociationCommand::process(vector& lookup){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[distance]"] = lookup[0]->getLabel(); variables["[tag]"] = method; string outputFileName = getOutputFileName("otucorr",variables); outputNames.push_back(outputFileName); outputTypes["otucorr"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); //column headings if (metadatafile == "") { out << "OTUA\tOTUB\t" << method << "Coef\tSignificance\n"; } else { out << "OTUA\tMetadata\t" << method << "Coef\tSignificance\n"; } vector< vector > xy; xy.resize(lookup[0]->getNumBins()); for (int i = 0; i < lookup[0]->getNumBins(); i++) { for (int j = 0; j < lookup.size(); j++) { xy[i].push_back(lookup[j]->getAbundance(i)); } } LinearAlgebra linear; if (metadatafile == "") {//compare otus for (int i = 0; i < xy.size(); i++) { for (int k = 0; k < i; k++) { if (m->control_pressed) { out.close(); return 0; } double coef = 0.0; double sig = 0.0; if (method == "spearman") { coef = linear.calcSpearman(xy[i], xy[k], sig); } else if (method == "pearson") { coef = linear.calcPearson(xy[i], xy[k], sig); } else if (method == "kendall") { coef = linear.calcKendall(xy[i], xy[k], sig); } else { m->mothurOut("[ERROR]: invalid method, choices are spearman, pearson 
or kendall."); m->mothurOutEndLine(); m->control_pressed = true; } if (sig < cutoff) { out << m->currentSharedBinLabels[i] << '\t' << m->currentSharedBinLabels[k] << '\t' << coef << '\t' << sig << endl; } } } }else { //compare otus to metadata for (int i = 0; i < xy.size(); i++) { for (int k = 0; k < metadata.size(); k++) { if (m->control_pressed) { out.close(); return 0; } double coef = 0.0; double sig = 0.0; if (method == "spearman") { coef = linear.calcSpearman(xy[i], metadata[k], sig); } else if (method == "pearson") { coef = linear.calcPearson(xy[i], metadata[k], sig); } else if (method == "kendall") { coef = linear.calcKendall(xy[i], metadata[k], sig); } else { m->mothurOut("[ERROR]: invalid method, choices are spearman, pearson or kendall."); m->mothurOutEndLine(); m->control_pressed = true; } if (sig < cutoff) { out << m->currentSharedBinLabels[i] << '\t' << metadataLabels[k] << '\t' << coef << '\t' << sig << endl; } } } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "process"); exit(1); } } //********************************************************************************************************************** int OTUAssociationCommand::processRelabund(){ try { InputData* input = new InputData(relabundfile, "relabund"); vector lookup = input->getSharedRAbundFloatVectors(); string lastLabel = lookup[0]->getLabel(); if (metadatafile != "") { getMetadata(); bool error = false; if (metadata[0].size() != lookup.size()) { m->mothurOut("[ERROR]: You have selected to use " + toString(metadata[0].size()) + " data rows from the metadata file, but " + toString(lookup.size()) + " from the relabund file.\n"); m->control_pressed = true; error=true;} if (error) { //maybe add extra info here?? compare groups in each file?? } } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete input; return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundFloatVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundFloatVectors(); } if (m->control_pressed) { delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundFloatVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); } delete input; return 0; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "processRelabund"); exit(1); } } //********************************************************************************************************************** int OTUAssociationCommand::process(vector& lookup){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileName)); variables["[distance]"] = lookup[0]->getLabel(); variables["[tag]"] = method; string outputFileName = getOutputFileName("otucorr",variables); outputNames.push_back(outputFileName); outputTypes["otucorr"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); //column headings if (metadatafile == "") { out << "OTUA\tOTUB\t" << method << "Coef\tSignificance\n"; } else { out << "OTUA\tMetadata\t" << method << "Coef\tSignificance\n"; } vector< vector > xy; xy.resize(lookup[0]->getNumBins()); for (int i = 0; i < lookup[0]->getNumBins(); i++) { for (int j = 0; j < lookup.size(); j++) { xy[i].push_back(lookup[j]->getAbundance(i)); } } LinearAlgebra linear; if (metadatafile == "") {//compare otus for (int i = 0; i < xy.size(); i++) { for (int k = 0; k < i; k++) { if (m->control_pressed) { out.close(); return 0; } double coef = 0.0; double sig = 0.0; if (method == "spearman") { coef = linear.calcSpearman(xy[i], xy[k], sig); } else if (method == "pearson") { coef = linear.calcPearson(xy[i], xy[k], sig); } else if (method == "kendall") { coef = linear.calcKendall(xy[i], xy[k], sig); } else { m->mothurOut("[ERROR]: invalid method, choices are spearman, 
pearson or kendall."); m->mothurOutEndLine(); m->control_pressed = true; } if (sig < cutoff) { out << m->currentSharedBinLabels[i] << '\t' << m->currentSharedBinLabels[k] << '\t' << coef << '\t' << sig << endl; } } } }else { //compare otus to metadata for (int i = 0; i < xy.size(); i++) { for (int k = 0; k < metadata.size(); k++) { if (m->control_pressed) { out.close(); return 0; } double coef = 0.0; double sig = 0.0; if (method == "spearman") { coef = linear.calcSpearman(xy[i], metadata[k], sig); } else if (method == "pearson") { coef = linear.calcPearson(xy[i], metadata[k], sig); } else if (method == "kendall") { coef = linear.calcKendall(xy[i], metadata[k], sig); } else { m->mothurOut("[ERROR]: invalid method, choices are spearman, pearson or kendall."); m->mothurOutEndLine(); m->control_pressed = true; } if (sig < cutoff) { out << m->currentSharedBinLabels[i] << '\t' << metadataLabels[k] << '\t' << coef << '\t' << sig << endl; } } } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "process"); exit(1); } } /*****************************************************************/ int OTUAssociationCommand::readMetadata(){ try { ifstream in; m->openInputFile(metadatafile, in); string headerLine = m->getline(in); m->gobble(in); istringstream iss (headerLine,istringstream::in); //read the first label, because it refers to the groups string columnLabel; iss >> columnLabel; m->gobble(iss); //save names of columns you are reading while (!iss.eof()) { iss >> columnLabel; m->gobble(iss); if (m->debug) { m->mothurOut("[DEBUG]: metadata column Label = " + columnLabel + "\n"); } metadataLabels.push_back(columnLabel); } int count = metadataLabels.size(); //read rest of file while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } string group = ""; in >> group; m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: metadata group = " + group + "\n"); } SharedRAbundFloatVector* tempLookup = new SharedRAbundFloatVector(); 
tempLookup->setGroup(group); tempLookup->setLabel("1"); for (int i = 0; i < count; i++) { float temp = 0.0; in >> temp; if (m->debug) { m->mothurOut("[DEBUG]: metadata value = " + toString(temp) + "\n"); } tempLookup->push_back(temp, group); } metadataLookup.push_back(tempLookup); m->gobble(in); } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "readMetadata"); exit(1); } } /*****************************************************************/ //eliminate groups user did not pick, remove zeroed out otus, fill metadata vector. int OTUAssociationCommand::getMetadata(){ try { vector mGroups = m->getGroups(); bool remove = false; for (int i = 0; i < metadataLookup.size(); i++) { //if this sharedrabund is not from a group the user wants then delete it. if (!(m->inUsersGroups(metadataLookup[i]->getGroup(), mGroups))) { delete metadataLookup[i]; metadataLookup[i] = NULL; metadataLookup.erase(metadataLookup.begin()+i); i--; remove = true; } } vector newLookup; for (int i = 0; i < metadataLookup.size(); i++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); temp->setLabel(metadataLookup[i]->getLabel()); temp->setGroup(metadataLookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector newBinLabels; for (int i = 0; i < metadataLookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < metadataLookup.size(); j++) { if (metadataLookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not all zero add this bin if (!allZero) { for (int j = 0; j < metadataLookup.size(); j++) { newLookup[j]->push_back(metadataLookup[j]->getAbundance(i), metadataLookup[j]->getGroup()); } newBinLabels.push_back(metadataLabels[i]); } } metadataLabels = newBinLabels; for (int j = 0; j < metadataLookup.size(); j++) { delete 
metadataLookup[j]; } metadataLookup.clear(); metadata.resize(newLookup[0]->getNumBins()); for (int i = 0; i < newLookup[0]->getNumBins(); i++) { for (int j = 0; j < newLookup.size(); j++) { metadata[i].push_back(newLookup[j]->getAbundance(i)); } } for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } catch(exception& e) { m->errorOut(e, "OTUAssociationCommand", "getMetadata"); exit(1); } } /*****************************************************************/ mothur-1.36.1/source/commands/otuassociationcommand.h000066400000000000000000000026611255543666200230200ustar00rootroot00000000000000#ifndef OTUASSOCIATIONCOMMAND_H #define OTUASSOCIATIONCOMMAND_H /* * otuassociationcommand.h * Mothur * * Created by westcott on 1/19/12. * Copyright 2012 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "sharedrabundfloatvector.h" #include "inputdata.h" class OTUAssociationCommand : public Command { public: OTUAssociationCommand(string); OTUAssociationCommand(); ~OTUAssociationCommand(){} vector setParameters(); string getCommandName() { return "otu.association"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Otu.association"; } string getDescription() { return "calculate the correlation coefficient for the otus in a shared/relabund file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string sharedfile, relabundfile, metadatafile, groups, label, inputFileName, outputDir, method; bool abort, pickedGroups, allLines; double cutoff; set labels; vector metadataLookup; vector< vector< double> > metadata; vector outputNames, Groups, metadataLabels; int processShared(); int process(vector&); int processRelabund(); int process(vector&); int readMetadata(); int getMetadata(); }; #endif 
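The pairwise loop in `OTUAssociationCommand::process` builds one abundance vector per OTU (`xy[i]` holds OTU `i` across all groups) and then compares every OTU with each lower-indexed OTU (`k < i`). A minimal, standalone sketch of that pattern is given below for the Pearson case. It is an illustration only: `pearsonSketch`, `OtuPair` and `pairwiseSketch` are hypothetical names, not part of mothur, and the sketch does not reproduce `LinearAlgebra::calcPearson` or its significance value, so the `sig < cutoff` filter is omitted.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not mothur's LinearAlgebra::calcPearson):
// Pearson correlation coefficient between two equally sized vectors.
double pearsonSketch(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());

    const double n = static_cast<double>(x.size());
    double meanX = 0.0, meanY = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) { meanX += x[i]; meanY += y[i]; }
    meanX /= n;
    meanY /= n;

    double sxy = 0.0, sxx = 0.0, syy = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - meanX;
        const double dy = y[i] - meanY;
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }

    // Assumption of this sketch: a constant vector has no defined
    // correlation, so 0.0 is returned instead of dividing by zero.
    if (sxx == 0.0 || syy == 0.0) { return 0.0; }
    return sxy / std::sqrt(sxx * syy);
}

// Hypothetical result record for one OTU pair.
struct OtuPair {
    std::size_t otuA;
    std::size_t otuB;
    double coef;
};

// Compare each OTU with every lower-indexed OTU, mirroring the
// "for i / for k < i" loop in process(). xy[i] is OTU i across groups.
std::vector<OtuPair> pairwiseSketch(const std::vector< std::vector<double> >& xy) {
    std::vector<OtuPair> results;
    for (std::size_t i = 0; i < xy.size(); ++i) {
        for (std::size_t k = 0; k < i; ++k) {
            OtuPair p;
            p.otuA = i;
            p.otuB = k;
            p.coef = pearsonSketch(xy[i], xy[k]);
            results.push_back(p);
        }
    }
    return results;
}
```

Visiting only `k < i` means each unordered pair is computed once, so `n` OTUs yield `n*(n-1)/2` rows, which matches the shape of the `.otu.corr` output when no metadata file is given.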
mothur-1.36.1/source/commands/otuhierarchycommand.cpp000066400000000000000000000337531255543666200230230ustar00rootroot00000000000000/* * otuhierarchycommand.cpp * Mothur * * Created by westcott on 1/19/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "otuhierarchycommand.h" #include "inputdata.h" //********************************************************************************************************************** vector OtuHierarchyCommand::setParameters(){ try { CommandParameter poutput("output", "Multiple", "name-number", "name", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","otuheirarchy",false,true,true); parameters.push_back(plist); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string OtuHierarchyCommand::getHelpString(){ try { string helpString = ""; helpString += "The otu.hierarchy command is used to see how otus relate at two distances. \n"; helpString += "The otu.hierarchy command parameters are list, label and output. list and label parameters are required. \n"; helpString += "The output parameter allows you to output the names of the sequence in the OTUs or the OTU numbers. 
Options are name and number, default is name. \n"; helpString += "The otu.hierarchy command should be in the following format: \n"; helpString += "otu.hierarchy(list=yourListFile, label=yourLabels).\n"; helpString += "Example otu.hierarchy(list=amazon.fn.list, label=0.01-0.03).\n"; helpString += "The otu.hierarchy command outputs a .otu.hierarchy file which is described on the wiki.\n"; helpString += "Note: No spaces between parameter labels (i.e. list), '=' and parameters (i.e.yourListFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string OtuHierarchyCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "otuheirarchy") { pattern = "[filename],[distance1],[tag],[distance2],otu.hierarchy"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** OtuHierarchyCommand::OtuHierarchyCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["otuheirarchy"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "OtuHierarchyCommand"); exit(1); } } //********************************************************************************************************************** OtuHierarchyCommand::OtuHierarchyCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map 
parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["otuheirarchy"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } } listFile = validParameter.validFile(parameters, "list", true); if (listFile == "not found") { listFile = m->getListFile(); if (listFile != "") { m->mothurOut("Using " + listFile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current list file. You must provide a list file."); m->mothurOutEndLine(); abort = true; } }else if (listFile == "not open") { abort = true; } else { m->setListFile(listFile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(listFile); //if user entered a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { m->mothurOut("label is a required parameter for the otu.hierarchy command."); m->mothurOutEndLine(); abort = true; } else { m->splitAtDash(label, mylabels); if (mylabels.size() != 2) { m->mothurOut("You must provide 2 labels."); m->mothurOutEndLine(); abort = true; } } output = validParameter.validFile(parameters, "output", false); if (output == "not found") { output = "name"; } if ((output != "name") && (output != "number")) { m->mothurOut("output options are name and number. I will use name."); m->mothurOutEndLine(); output = "name"; } } } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "OtuHierarchyCommand"); exit(1); } } //********************************************************************************************************************** int OtuHierarchyCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get listvectors that correspond to labels requested, (or use smart distancing to get closest listvector) vector< vector > lists = getListVectors(); if (m->control_pressed) { outputTypes.clear(); return 0; } //determine which is little and which is big, putting little first if (lists.size() == 4) { //if big is first swap them if (lists[0].size() < lists[2].size()) { vector< vector > tempLists; tempLists.push_back(lists[2]); tempLists.push_back(lists[3]); tempLists.push_back(lists[0]); tempLists.push_back(lists[1]); lists = tempLists; string tempLabel = list2Label; list2Label = list1Label; list1Label = tempLabel; } }else{ m->mothurOut("error getting listvectors, unable to read 2 different vectors, check your label inputs."); m->mothurOutEndLine(); return 0; } //map sequences to bin number in the "little" otu map littleBins; vector binLabels0 = lists[0]; for (int i = 0; i < lists[0].size(); i++) { if (m->control_pressed) { return 0; } string bin = lists[1][i]; vector names; m->splitAtComma(bin, names); for (int j = 0; j < 
names.size(); j++) { littleBins[names[j]] = i; } } ofstream out; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listFile)); variables["[distance1]"] = list1Label; variables["[tag]"] = "-"; variables["[distance2]"] = list2Label; string outputFileName = getOutputFileName("otuheirarchy",variables); m->openOutputFile(outputFileName, out); //go through each bin in "big" otu and output the bins in "little" otu which created it vector binLabels1 = lists[2]; for (int i = 0; i < lists[2].size(); i++) { if (m->control_pressed) { outputTypes.clear(); out.close(); m->mothurRemove(outputFileName); return 0; } string binnames = lists[3][i]; vector names; m->splitAtComma(binnames, names); //output column 1 if (output == "name") { out << binnames << '\t'; } else { out << binLabels1[i] << '\t'; } map bins; //bin numbers in little that are in this bin in big map::iterator it; //parse bin for (int j = 0; j < names.size(); j++) { bins[littleBins[names[j]]] = littleBins[names[j]]; } string col2 = ""; for (it = bins.begin(); it != bins.end(); it++) { if (output == "name") { col2 += lists[1][it->first] + "\t"; } else { col2 += binLabels0[it->first] + "\t"; } } //output column 2 out << col2 << endl; } out.close(); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFileName); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); outputNames.push_back(outputFileName); outputTypes["otuheirarchy"].push_back(outputFileName); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "execute"); exit(1); } } //********************************************************************************************************************** //returns a vector of listVectors where "little" vector is first vector< vector > OtuHierarchyCommand::getListVectors() { //return value [0] -> otulabelsFirstLabel [1] -> binsFirstLabel 
[2] -> otulabelsSecondLabel [3] -> binsSecondLabel try { vector< vector<string> > lists; int count = 0; for (set<string>::iterator it = mylabels.begin(); it != mylabels.end(); it++) { string realLabel; vector< vector<string> > thisList = getListVector(*it, realLabel); if (m->control_pressed) { return lists; } for (int i = 0; i < thisList.size(); i++) { lists.push_back(thisList[i]); } if (count == 0) { list1Label = realLabel; count++; } else { list2Label = realLabel; } } return lists; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "getListVectors"); exit(1); } } //********************************************************************************************************************** vector< vector<string> > OtuHierarchyCommand::getListVector(string label, string& realLabel){ //return value [0] -> otulabels [1] -> bins try { vector< vector<string> > myList; InputData input(listFile, "list"); ListVector* list = input.getListVector(); string lastLabel = list->getLabel(); //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> labels; labels.insert(label); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return myList; } if(labels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below //list->setLabel(saveLabel); break; } lastLabel = list->getLabel(); //get next line to process //prevent memory leak delete list; list = input.getListVector(); } if (m->control_pressed) { return myList; } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input.getListVector(lastLabel); } //at this point the list vector has the right distance myList.push_back(list->getLabels()); vector bins; for (int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { return myList; } bins.push_back(list->get(i)); } myList.push_back(bins); realLabel = list->getLabel(); delete list; return myList; } catch(exception& e) { m->errorOut(e, "OtuHierarchyCommand", "getListVector"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/otuhierarchycommand.h000066400000000000000000000025301255543666200224550ustar00rootroot00000000000000#ifndef OTUHIERARCHYCOMMAND_H #define OTUHIERARCHYCOMMAND_H /* * otuhierarchycommand.h * Mothur * * Created by westcott on 1/19/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 * */ #include "command.hpp" #include "listvector.hpp" //********************************************************************************************************************** class OtuHierarchyCommand : public Command { public: OtuHierarchyCommand(string); OtuHierarchyCommand(); ~OtuHierarchyCommand(){} vector<string> setParameters(); string getCommandName() { return "otu.hierarchy"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Otu.hierarchy"; } string getDescription() { return "relates OTUs at different distances"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; set<string> mylabels; //holds labels to be used string label, listFile, outputDir, output, list1Label, list2Label; vector<string> outputNames; vector< vector<string> > getListVectors(); vector< vector<string> > getListVector(string, string&); }; //********************************************************************************************************************** #endif mothur-1.36.1/source/commands/pairwiseseqscommand.cpp000066400000000000000000001366531255543666200230370ustar00rootroot00000000000000/* * pairwiseseqscommand.cpp * Mothur * * Created by westcott on 10/20/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "pairwiseseqscommand.h" //********************************************************************************************************************** vector PairwiseSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","phylip-column",false,true,true); parameters.push_back(pfasta); CommandParameter palign("align", "Multiple", "needleman-gotoh-blast-noalign", "needleman", "", "", "","",false,false); parameters.push_back(palign); CommandParameter pmatch("match", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pmatch); CommandParameter pmismatch("mismatch", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pmismatch); CommandParameter pgapopen("gapopen", "Number", "", "-2.0", "", "", "","",false,false); parameters.push_back(pgapopen); CommandParameter pgapextend("gapextend", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pgapextend); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter poutput("output", "Multiple", "column-lt-square-phylip", "column", "", "", "","phylip-column",false,false,true); parameters.push_back(poutput); CommandParameter pcalc("calc", "Multiple", "nogaps-eachgap-onegap", "onegap", "", "", "","",false,false); parameters.push_back(pcalc); CommandParameter pcountends("countends", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pcountends); CommandParameter pcompress("compress", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pcompress); CommandParameter pcutoff("cutoff", "Number", "", "1.0", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); 
CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string PairwiseSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The pairwise.seqs command reads a fasta file and creates a distance matrix.\n"; helpString += "The pairwise.seqs command parameters are fasta, align, match, mismatch, gapopen, gapextend, calc, countends, output, compress, cutoff and processors.\n"; helpString += "The fasta parameter is required. You may enter multiple fasta files by separating their names with dashes, e.g. fasta=abrecovery.fasta-amzon.fasta \n"; helpString += "The align parameter allows you to specify the alignment method to use. Your options are: gotoh, needleman, blast and noalign. The default is needleman.\n"; helpString += "The match parameter allows you to specify the bonus for having the same base. The default is 1.0.\n"; helpString += "The mismatch parameter allows you to specify the penalty for having different bases. The default is -1.0.\n"; helpString += "The gapopen parameter allows you to specify the penalty for opening a gap in an alignment. The default is -2.0.\n"; helpString += "The gapextend parameter allows you to specify the penalty for extending a gap in an alignment. The default is -1.0.\n"; helpString += "The calc parameter allows you to specify the method of calculating the distances. Your options are: nogaps, onegap or eachgap. The default is onegap.\n"; helpString += "The countends parameter allows you to specify whether to include terminal gaps in distance. Your options are: T or F. 
The default is T.\n"; helpString += "The cutoff parameter allows you to specify the maximum distance to keep. The default is 1.0.\n"; helpString += "The output parameter allows you to specify the format of your distance matrix. Options are column, lt, and square. The default is column.\n"; helpString += "The compress parameter allows you to indicate that you want the resulting distance file compressed. The default is false.\n"; helpString += "The pairwise.seqs command should be in the following format: \n"; helpString += "pairwise.seqs(fasta=yourFastaFile, align=yourAlignmentMethod) \n"; helpString += "Example pairwise.seqs(fasta=candidate.fasta, align=blast)\n"; helpString += "Note: No spaces between parameter labels (e.g. fasta), '=' and parameters (e.g. yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PairwiseSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "phylip") { pattern = "[filename],[outputtag],dist"; } else if (type == "column") { pattern = "[filename],dist"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PairwiseSeqsCommand::PairwiseSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "PairwiseSeqsCommand"); exit(1); } } //********************************************************************************************************************** 
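/*
A rough sketch of how execute() in this file divides the rows of the distance matrix among processors. For column and lt output the boundaries are placed at sqrt(k/processors) times numSeqs; assuming each row i is compared only against earlier rows j < i (as a lower-triangle matrix implies), this gives every worker a similar number of pairs. For square output the rows are split evenly. RowRange, splitRows and lowerTrianglePairs are hypothetical names used for illustration only; they are not part of mothur.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper type (not in mothur): half-open row range [start, end).
struct RowRange {
    int start;
    int end;
};

// Same boundary arithmetic as the start/end computation in execute():
// square-root spacing for lower-triangle style output, even spacing for square.
std::vector<RowRange> splitRows(int numSeqs, int processors, bool square) {
    std::vector<RowRange> ranges;
    for (int i = 0; i < processors; i++) {
        RowRange r;
        if (!square) {
            r.start = int(std::sqrt(float(i) / float(processors)) * numSeqs);
            r.end   = int(std::sqrt(float(i + 1) / float(processors)) * numSeqs);
        } else {
            r.start = int((float(i) / float(processors)) * numSeqs);
            r.end   = int((float(i + 1) / float(processors)) * numSeqs);
        }
        ranges.push_back(r);
    }
    return ranges;
}

// Assumption: row i costs i comparisons (one per j < i).
// Returns the number of pairs a worker handles for its row range.
long long lowerTrianglePairs(const RowRange& r) {
    long long pairs = 0;
    for (int i = r.start; i < r.end; i++) {
        pairs += i;
    }
    return pairs;
}
```

With 1000 sequences and 4 processors, the first worker gets rows 0 to 499 and the last gets rows 866 to 999, yet each handles roughly a quarter of the pairs, because later rows are longer.
*/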
PairwiseSeqsCommand::PairwiseSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("pairwise.seqs"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } fastaFileName = validParameter.validFile(parameters, "fasta", false); if (fastaFileName == "not found") { //if there is a current fasta file, use it string filename = m->getFastaFile(); if (filename != "") { fastaFileNames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else { m->splitAtDash(fastaFileName, fastaFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < fastaFileNames.size(); i++) { bool ignore = false; if (fastaFileNames[i] == "current") { fastaFileNames[i] = 
m->getFastaFile(); if (fastaFileNames[i] != "") { m->mothurOut("Using " + fastaFileNames[i] + " as input file for the fasta parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(fastaFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { fastaFileNames[i] = inputDir + fastaFileNames[i]; } } int ableToOpen; ifstream in; ableToOpen = m->openInputFile(fastaFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } //if you can't open it, try output location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(fastaFileNames[i]); m->mothurOut("Unable to open " + fastaFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fastaFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + fastaFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list fastaFileNames.erase(fastaFileNames.begin()+i); i--; }else { m->setFastaFile(fastaFileNames[i]); } } } //make sure there is at least one valid file left if (fastaFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; temp = validParameter.validFile(parameters, "match", false); if (temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, match); temp = validParameter.validFile(parameters, "mismatch", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, misMatch); if (misMatch > 0) { m->mothurOut("[ERROR]: mismatch must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "gapopen", false); if (temp == "not found"){ temp = "-2.0"; } m->mothurConvert(temp, gapOpen); if (gapOpen > 0) { m->mothurOut("[ERROR]: gapopen must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "gapextend", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, gapExtend); if (gapExtend > 0) { m->mothurOut("[ERROR]: gapextend must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "cutoff", false); if(temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "countends", false); if(temp == "not found"){ temp = "T"; } countends = m->isTrue(temp); temp = validParameter.validFile(parameters, "compress", false); if(temp == "not found"){ temp = "F"; } compress = m->isTrue(temp); align = validParameter.validFile(parameters, "align", false); if (align == "not found"){ align = "needleman"; } output = 
validParameter.validFile(parameters, "output", false); if(output == "not found"){ output = "column"; } if (output=="phylip") { output = "lt"; } if ((output != "column") && (output != "lt") && (output != "square")) { m->mothurOut(output + " is not a valid output format. Options are column, lt and square. I will use column."); m->mothurOutEndLine(); output = "column"; } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "onegap"; } else { if (calc == "default") { calc = "onegap"; } } m->splitAtDash(calc, Estimators); } } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "PairwiseSeqsCommand"); exit(1); } } //********************************************************************************************************************** int PairwiseSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } longestBase = 2000; //will need to update this in driver if we find sequences with more bases. hardcoded so we don't have to pre-read the user's fasta file. cutoff += 0.005; for (int s = 0; s < fastaFileNames.size(); s++) { if (m->control_pressed) { outputTypes.clear(); return 0; } m->mothurOut("Processing sequences from " + fastaFileNames[s] + " ..." 
); m->mothurOutEndLine(); if (outputDir == "") { outputDir += m->hasPath(fastaFileNames[s]); } ifstream inFASTA; m->openInputFile(fastaFileNames[s], inFASTA); alignDB = SequenceDB(inFASTA); inFASTA.close(); int numSeqs = alignDB.getNumSeqs(); int startTime = time(NULL); string outputFile = ""; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFileNames[s])); if (output == "lt") { //does the user want lower triangle phylip formatted file variables["[outputtag]"] = "phylip"; outputFile = getOutputFileName("phylip", variables); m->mothurRemove(outputFile); outputTypes["phylip"].push_back(outputFile); }else if (output == "column") { //user wants column format outputFile = getOutputFileName("column", variables); outputTypes["column"].push_back(outputFile); m->mothurRemove(outputFile); }else { //assume square variables["[outputtag]"] = "square"; outputFile = getOutputFileName("phylip", variables); m->mothurRemove(outputFile); outputTypes["phylip"].push_back(outputFile); } #ifdef USE_MPI int pid, start, end; int tag = 2001; MPI_Status status; MPI_Comm_size(MPI_COMM_WORLD, &processors); //set processors to the number of mpi processes running MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are //each process gets where it should start and stop in the file if (output != "square") { start = int (sqrt(float(pid)/float(processors)) * numSeqs); end = int (sqrt(float(pid+1)/float(processors)) * numSeqs); }else{ start = int ((float(pid)/float(processors)) * numSeqs); end = int ((float(pid+1)/float(processors)) * numSeqs); } if (output == "column") { MPI_File outMPI; int amode=MPI_MODE_CREATE|MPI_MODE_WRONLY; char filename[1024]; strcpy(filename, outputFile.c_str()); MPI_File_open(MPI_COMM_WORLD, filename, amode, MPI_INFO_NULL, &outMPI); if (pid == 0) { //you are the root process //do your part string outputMyPart; driverMPI(start, end, outMPI, cutoff); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); 
m->mothurRemove(outputFile); return 0; } //wait on children for(int i = 1; i < processors; i++) { if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); m->mothurRemove(outputFile); return 0; } char buf[5]; MPI_Recv(buf, 5, MPI_CHAR, i, tag, MPI_COMM_WORLD, &status); } }else { //you are a child process //do your part driverMPI(start, end, outMPI, cutoff); if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); m->mothurRemove(outputFile); return 0; } char buf[5]; strcpy(buf, "done"); //tell parent you are done. MPI_Send(buf, 5, MPI_CHAR, 0, tag, MPI_COMM_WORLD); } MPI_File_close(&outMPI); }else { //lower triangle format if (pid == 0) { //you are the root process //do your part string outputMyPart; unsigned long long mySize; if (output != "square"){ driverMPI(start, end, outputFile, mySize); } else { driverMPI(start, end, outputFile, mySize, output); } if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFile); return 0; } int amode=MPI_MODE_APPEND|MPI_MODE_WRONLY|MPI_MODE_CREATE; // MPI_File outMPI; MPI_File inMPI; char filename[1024]; strcpy(filename, outputFile.c_str()); MPI_File_open(MPI_COMM_SELF, filename, amode, MPI_INFO_NULL, &outMPI); //wait on children for(int b = 1; b < processors; b++) { unsigned long long fileSize; if (m->control_pressed) { outputTypes.clear(); MPI_File_close(&outMPI); m->mothurRemove(outputFile); return 0; } MPI_Recv(&fileSize, 1, MPI_LONG, b, tag, MPI_COMM_WORLD, &status); string outTemp = outputFile + toString(b) + ".temp"; //copy the name including its terminating null so MPI_File_open sees a valid C string char* buf = new char[outTemp.length()+1]; memcpy(buf, outTemp.c_str(), outTemp.length()+1); MPI_File_open(MPI_COMM_SELF, buf, MPI_MODE_DELETE_ON_CLOSE|MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); delete[] buf; unsigned long long count = 0; while (count < fileSize) { char buf2[1]; MPI_File_read(inMPI, buf2, 1, MPI_CHAR, &status); MPI_File_write(outMPI, buf2, 1, MPI_CHAR, &status); count += 1; } MPI_File_close(&inMPI); //deleted on close } MPI_File_close(&outMPI); }else { //you are a child 
process //do your part unsigned long long size; if (output != "square"){ driverMPI(start, end, (outputFile + toString(pid) + ".temp"), size); } else { driverMPI(start, end, (outputFile + toString(pid) + ".temp"), size, output); } if (m->control_pressed) { return 0; } //tell parent you are done. MPI_Send(&size, 1, MPI_LONG, 0, tag, MPI_COMM_WORLD); } } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else //#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //if you don't need to fork anything if(processors == 1){ if (output != "square") { driver(0, numSeqs, outputFile, cutoff); } else { driver(0, numSeqs, outputFile, "square"); } }else{ //you have multiple processors for (int i = 0; i < processors; i++) { distlinePair tempLine; lines.push_back(tempLine); if (output != "square") { lines[i].start = int (sqrt(float(i)/float(processors)) * numSeqs); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numSeqs); }else{ lines[i].start = int ((float(i)/float(processors)) * numSeqs); lines[i].end = int ((float(i+1)/float(processors)) * numSeqs); } } createProcesses(outputFile); } //#else //ifstream inFASTA; //if (output != "square") { driver(0, numSeqs, outputFile, cutoff); } //else { driver(0, numSeqs, outputFile, "square"); } //#endif #endif if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFile); return 0; } #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif ifstream fileHandle; fileHandle.open(outputFile.c_str()); if(fileHandle) { m->gobble(fileHandle); if (fileHandle.eof()) { m->mothurOut(outputFile + " is blank. 
This can result if there are no distances below your cutoff."); m->mothurOutEndLine(); } } if (compress) { m->mothurOut("Compressing..."); m->mothurOutEndLine(); m->mothurOut("(Replacing " + outputFile + " with " + outputFile + ".gz)"); m->mothurOutEndLine(); system(("gzip -v " + outputFile).c_str()); outputNames.push_back(outputFile + ".gz"); }else { outputNames.push_back(outputFile); } #ifdef USE_MPI } #endif m->mothurOut("It took " + toString(time(NULL) - startTime) + " to calculate the distances for " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); if (m->control_pressed) { outputTypes.clear(); m->mothurRemove(outputFile); return 0; } } //set phylip file as new current phylipfile string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setPhylipFile(current); } } //set column file as new current columnfile itTypes = outputTypes.find("column"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setColumnFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "execute"); exit(1); } } /**************************************************************************************************/ void PairwiseSeqsCommand::createProcesses(string filename) { try { int process = 1; processIDS.clear(); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ if (output != "square") { driver(lines[process].start, 
lines[process].end, filename + m->mothurGetpid(process) + ".temp", cutoff); } else { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", "square"); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide int numSeqs = alignDB.getNumSeqs(); lines.clear(); for (int i = 0; i < processors; i++) { distlinePair tempLine; lines.push_back(tempLine); if (output != "square") { lines[i].start = int (sqrt(float(i)/float(processors)) * numSeqs); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numSeqs); }else{ lines[i].start = int ((float(i)/float(processors)) * numSeqs); lines[i].end = int ((float(i+1)/float(processors)) * numSeqs); } } processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ if (output != "square") { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", cutoff); } else { driver(lines[process].start, lines[process].end, filename + m->mothurGetpid(process) + ".temp", "square"); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i=0;i pDataArray; //[processors-1]; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor-1 
worker threads. for( int i=0; icount != (pDataArray[i]->end-pDataArray[i]->start)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end-pDataArray[i]->start) + " sequences assigned to it, quitting. \n"); m->control_pressed = true; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append and remove temp files for (int i=0;iappendFiles((filename + toString(processIDS[i]) + ".temp"), filename); m->mothurRemove((filename + toString(processIDS[i]) + ".temp")); } } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int PairwiseSeqsCommand::driver(int startLine, int endLine, string dFileName, float cutoff){ try { int startTime = time(NULL); Alignment* alignment; if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (countends) { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap") { distCalculator = new eachGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } //column file ofstream outFile(dFileName.c_str(), ios::trunc); outFile.setf(ios::fixed, ios::showpoint); outFile << setprecision(4); if((output == "lt") && startLine == 0){ outFile << alignDB.getNumSeqs() << endl; } for(int i=startLine;icontrol_pressed) { outFile.close(); delete alignment; delete distCalculator; return 0; } if (alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(i).getUnaligned().length()+1); } if (alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(alignDB.get(i).getName(), alignDB.get(i).getAligned()); Sequence seqJ(alignDB.get(j).getName(), alignDB.get(j).getAligned()); alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); if (m->debug) { m->mothurOut("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); } 
if(dist <= cutoff){ if (output == "column") { outFile << alignDB.get(i).getName() << ' ' << alignDB.get(j).getName() << ' ' << dist << endl; } } if (output == "lt") { outFile << '\t' << dist; } } if (output == "lt") { outFile << endl; } if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); outFile.close(); delete alignment; delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "driver"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int PairwiseSeqsCommand::driver(int startLine, int endLine, string dFileName, string square){ try { int startTime = time(NULL); Alignment* alignment; if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (countends) { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap") { distCalculator = new eachGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } //column file ofstream outFile(dFileName.c_str(), ios::trunc); outFile.setf(ios::fixed, ios::showpoint); outFile << setprecision(4); if(startLine == 0){ outFile << alignDB.getNumSeqs() << endl; } for(int i=startLine;icontrol_pressed) { outFile.close(); delete alignment; delete distCalculator; return 0; } if (alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(i).getUnaligned().length()+1); } if (alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(alignDB.get(i).getName(), alignDB.get(i).getAligned()); Sequence seqJ(alignDB.get(j).getName(), alignDB.get(j).getAligned()); alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); outFile << '\t' << dist; if (m->debug) { m->mothurOut("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); 
} } outFile << endl; if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } m->mothurOutJustToScreen(toString(endLine-1) + "\t" + toString(time(NULL) - startTime)+"\n"); outFile.close(); delete alignment; delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "driver"); exit(1); } } #ifdef USE_MPI /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int PairwiseSeqsCommand::driverMPI(int startLine, int endLine, MPI_File& outMPI, float cutoff){ try { MPI_Status status; int startTime = time(NULL); Alignment* alignment; if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (countends) { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap") { distCalculator = new eachGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } string outputString = ""; for(int i=startLine;icontrol_pressed) { delete alignment; delete distCalculator; return 0; } if (alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(i).getUnaligned().length()+1); } if (alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(alignDB.get(i).getName(), alignDB.get(i).getAligned()); Sequence seqJ(alignDB.get(j).getName(), alignDB.get(j).getAligned()); alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); if (m->debug) { cout << ("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); } if(dist <= cutoff){ outputString += (alignDB.get(i).getName() + ' ' + alignDB.get(j).getName() + ' ' + toString(dist) + '\n'); } } if(i % 100 == 0){ //m->mothurOut(toString(i) + "\t" + toString(time(NULL) - 
startTime)); m->mothurOutEndLine(); cout << i << '\t' << (time(NULL) - startTime) << endl; } //send results to parent int length = outputString.length(); char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write_shared(outMPI, buf, length, MPI_CHAR, &status); outputString = ""; delete[] buf; } delete alignment; delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "driverMPI"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int PairwiseSeqsCommand::driverMPI(int startLine, int endLine, string file, unsigned long long& size){ try { MPI_Status status; MPI_File outMPI; int amode=MPI_MODE_CREATE|MPI_MODE_WRONLY; char filename[1024]; strcpy(filename, file.c_str()); MPI_File_open(MPI_COMM_SELF, filename, amode, MPI_INFO_NULL, &outMPI); Alignment* alignment; if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (countends) { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap") { distCalculator = new eachGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } string outputString = ""; size = 0; if(startLine == 0){ outputString += toString(alignDB.getNumSeqs()) + "\n"; } for(int i=startLine;icontrol_pressed) { delete alignment; delete distCalculator; return 0; } if (alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(i).getUnaligned().length()+1); } if (alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(alignDB.get(i).getName(), alignDB.get(i).getAligned()); Sequence seqJ(alignDB.get(j).getName(), alignDB.get(j).getAligned()); alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); if (m->debug) { cout << ("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); } outputString += + "\t" + toString(dist); } outputString += "\n"; //send results to parent int length = outputString.length(); 
char* buf = new char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write(outMPI, buf, length, MPI_CHAR, &status); size += outputString.length(); outputString = ""; delete[] buf; } MPI_File_close(&outMPI); delete alignment; delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "driverMPI"); exit(1); } } /**************************************************************************************************/ /////// need to fix to work with calcs and sequencedb int PairwiseSeqsCommand::driverMPI(int startLine, int endLine, string file, unsigned long long& size, string square){ try { MPI_Status status; MPI_File outMPI; int amode=MPI_MODE_CREATE|MPI_MODE_WRONLY; char filename[1024]; strcpy(filename, file.c_str()); MPI_File_open(MPI_COMM_SELF, filename, amode, MPI_INFO_NULL, &outMPI); Alignment* alignment; if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, longestBase); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (countends) { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap") { distCalculator = new eachGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", Estimators[0]) == true) { if (Estimators[0] == "nogaps") { distCalculator = new ignoreGaps(); } else if (Estimators[0] == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (Estimators[0] == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } string outputString = ""; size = 0; if(startLine == 0){ outputString += toString(alignDB.getNumSeqs()) + "\n"; } for(int i=startLine;icontrol_pressed) { delete alignment; return 0; } if (alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(i).getUnaligned().length()+1); } if (alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(alignDB.get(i).getName(), alignDB.get(i).getAligned()); Sequence seqJ(alignDB.get(j).getName(), alignDB.get(j).getAligned()); alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); outputString += + "\t" + toString(dist); if (m->debug) { cout << ("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); } } outputString += "\n"; //send results to parent int length = outputString.length(); char* buf = new 
char[length]; memcpy(buf, outputString.c_str(), length); MPI_File_write(outMPI, buf, length, MPI_CHAR, &status); size += outputString.length(); outputString = ""; delete[] buf; } MPI_File_close(&outMPI); delete alignment; delete distCalculator; return 1; } catch(exception& e) { m->errorOut(e, "PairwiseSeqsCommand", "driverMPI"); exit(1); } } #endif /**************************************************************************************************/ mothur-1.36.1/source/commands/pairwiseseqscommand.h000066400000000000000000000313751255543666200224770ustar00rootroot00000000000000#ifndef PAIRWISESEQSCOMMAND_H #define PAIRWISESEQSCOMMAND_H /* * pairwiseseqscommand.h * Mothur * * Created by westcott on 10/20/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "command.hpp" #include "database.hpp" #include "alignment.hpp" #include "validcalculator.h" #include "dist.h" #include "sequencedb.h" #include "sequence.hpp" #include "gotohoverlap.hpp" #include "needlemanoverlap.hpp" #include "blastalign.hpp" #include "noalign.hpp" #include "ignoregaps.h" #include "eachgapdist.h" #include "eachgapignore.h" #include "onegapdist.h" #include "onegapignore.h" class PairwiseSeqsCommand : public Command { public: PairwiseSeqsCommand(string); PairwiseSeqsCommand(); ~PairwiseSeqsCommand() {} vector<string> setParameters(); string getCommandName() { return "pairwise.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Needleman SB, Wunsch CD (1970). A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-53. [ for needleman ]\nGotoh O (1982). An improved algorithm for matching biological sequences. J Mol Biol 162: 705-8. 
[ for gotoh ] \nhttp://www.mothur.org/wiki/Pairwise.seqs"; } string getDescription() { return "calculates pairwise distances from an unaligned fasta file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: struct distlinePair { int start; int end; }; vector processIDS; //end line, processid vector lines; SequenceDB alignDB; void createProcesses(string); int driver(int, int, string, float); int driver(int, int, string, string); #ifdef USE_MPI int driverMPI(int, int, MPI_File&, float); int driverMPI(int, int, string, unsigned long long&); int driverMPI(int, int, string, unsigned long long&, string); #endif string fastaFileName, align, calc, outputDir, output; float match, misMatch, gapOpen, gapExtend, cutoff; int processors, longestBase; vector fastaFileNames, Estimators; vector outputNames; bool abort, countends, compress; }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct pairwiseData { string outputFileName; string align, square, distcalcType, output; unsigned long long start; unsigned long long end; MothurOut* m; float match, misMatch, gapOpen, gapExtend, cutoff; int count, threadID, longestBase; bool countends; SequenceDB alignDB; pairwiseData(){} pairwiseData(string ofn, string al, string sq, string di, bool co, string op, SequenceDB DB, MothurOut* mout, unsigned long long st, unsigned long long en, float ma, float misMa, float gapO, float gapE, int thr, float cu, int tid) { outputFileName = ofn; m = mout; start = st; end = en; match = ma; misMatch = misMa; gapOpen = gapO; gapExtend = gapE; longestBase = thr; align = al; square = sq; distcalcType = di; countends = co; alignDB = DB; count = 0; output = op; cutoff = cu; threadID = tid; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyPairwiseSquareThreadFunction(LPVOID lpParam){ pairwiseData* pDataArray; pDataArray = (pairwiseData*)lpParam; try { ofstream outFile((pDataArray->outputFileName).c_str(), ios::trunc); outFile.setf(ios::fixed, ios::showpoint); outFile << setprecision(4); pDataArray->count = 0; int startTime = time(NULL); Alignment* alignment; if(pDataArray->align == "gotoh") { alignment = new GotohOverlap(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, pDataArray->longestBase); } else if(pDataArray->align == "needleman") { alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, pDataArray->longestBase); } else if(pDataArray->align == "blast") { alignment = new BlastAlignment(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch); } else if(pDataArray->align == "noalign") { alignment = new NoAlign(); } else { pDataArray->m->mothurOut(pDataArray->align + " is not a valid 
alignment option. I will run the command using needleman."); pDataArray->m->mothurOutEndLine(); alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, pDataArray->longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (pDataArray->countends) { if (validCalculator.isValidCalculator("distance", pDataArray->distcalcType) == true) { if (pDataArray->distcalcType == "nogaps") { distCalculator = new ignoreGaps(); } else if (pDataArray->distcalcType == "eachgap") { distCalculator = new eachGapDist(); } else if (pDataArray->distcalcType == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", pDataArray->distcalcType) == true) { if (pDataArray->distcalcType == "nogaps") { distCalculator = new ignoreGaps(); } else if (pDataArray->distcalcType == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (pDataArray->distcalcType == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } if(pDataArray->start == 0){ outFile << pDataArray->alignDB.getNumSeqs() << endl; } for(int i=pDataArray->start;i<pDataArray->end;i++){ pDataArray->count++; string name = pDataArray->alignDB.get(i).getName(); //pad with spaces to make compatible if (name.length() < 10) { while (name.length() < 10) { name += " "; } } outFile << name; for(int j=0;j<pDataArray->alignDB.getNumSeqs();j++){ if (pDataArray->m->control_pressed) { outFile.close(); delete alignment; delete distCalculator; return 0; } if (pDataArray->alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(pDataArray->alignDB.get(i).getUnaligned().length()+1); } if (pDataArray->alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(pDataArray->alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(pDataArray->alignDB.get(i).getName(), pDataArray->alignDB.get(i).getAligned()); Sequence seqJ(pDataArray->alignDB.get(j).getName(), pDataArray->alignDB.get(j).getAligned()); 
alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); if (pDataArray->m->debug) { pDataArray->m->mothurOut("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); } outFile << '\t' << dist; } outFile << endl; if(i % 100 == 0){ pDataArray->m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count) + "\t" + toString(time(NULL) - startTime)+"\n"); outFile.close(); delete alignment; delete distCalculator; } catch(exception& e) { pDataArray->m->errorOut(e, "PairwiseSeqsCommand", "MyPairwiseSquareThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MyPairwiseThreadFunction(LPVOID lpParam){ pairwiseData* pDataArray; pDataArray = (pairwiseData*)lpParam; try { ofstream outFile((pDataArray->outputFileName).c_str(), ios::trunc); outFile.setf(ios::fixed, ios::showpoint); outFile << setprecision(4); int startTime = time(NULL); Alignment* alignment; if(pDataArray->align == "gotoh") { alignment = new GotohOverlap(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, pDataArray->longestBase); } else if(pDataArray->align == "needleman") { alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, pDataArray->longestBase); } else if(pDataArray->align == "blast") { alignment = new BlastAlignment(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch); } else if(pDataArray->align == "noalign") { alignment = new NoAlign(); } else { pDataArray->m->mothurOut(pDataArray->align + " is not a valid alignment option. 
I will run the command using needleman."); pDataArray->m->mothurOutEndLine(); alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, pDataArray->longestBase); } ValidCalculators validCalculator; Dist* distCalculator; if (pDataArray->countends) { if (validCalculator.isValidCalculator("distance", pDataArray->distcalcType) == true) { if (pDataArray->distcalcType == "nogaps") { distCalculator = new ignoreGaps(); } else if (pDataArray->distcalcType == "eachgap") { distCalculator = new eachGapDist(); } else if (pDataArray->distcalcType == "onegap") { distCalculator = new oneGapDist(); } } }else { if (validCalculator.isValidCalculator("distance", pDataArray->distcalcType) == true) { if (pDataArray->distcalcType == "nogaps") { distCalculator = new ignoreGaps(); } else if (pDataArray->distcalcType == "eachgap"){ distCalculator = new eachGapIgnoreTermGapDist(); } else if (pDataArray->distcalcType == "onegap") { distCalculator = new oneGapIgnoreTermGapDist(); } } } if((pDataArray->output == "lt") && pDataArray->start == 0){ outFile << pDataArray->alignDB.getNumSeqs() << endl; } pDataArray->count = 0; for(int i=pDataArray->start;i<pDataArray->end;i++){ pDataArray->count++; if(pDataArray->output == "lt") { string name = pDataArray->alignDB.get(i).getName(); if (name.length() < 10) { //pad with spaces to make compatible while (name.length() < 10) { name += " "; } } outFile << name; } for(int j=0;j<i;j++){ if (pDataArray->m->control_pressed) { outFile.close(); delete alignment; delete distCalculator; return 0; } if (pDataArray->alignDB.get(i).getUnaligned().length() > alignment->getnRows()) { alignment->resize(pDataArray->alignDB.get(i).getUnaligned().length()+1); } if (pDataArray->alignDB.get(j).getUnaligned().length() > alignment->getnRows()) { alignment->resize(pDataArray->alignDB.get(j).getUnaligned().length()+1); } Sequence seqI(pDataArray->alignDB.get(i).getName(), pDataArray->alignDB.get(i).getAligned()); Sequence seqJ(pDataArray->alignDB.get(j).getName(), 
pDataArray->alignDB.get(j).getAligned()); alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); seqI.setAligned(alignment->getSeqAAln()); seqJ.setAligned(alignment->getSeqBAln()); distCalculator->calcDist(seqI, seqJ); double dist = distCalculator->getDist(); if (pDataArray->m->debug) { pDataArray->m->mothurOut("[DEBUG]: " + seqI.getName() + '\t' + alignment->getSeqAAln() + '\n' + seqJ.getName() + alignment->getSeqBAln() + '\n' + "distance = " + toString(dist) + "\n"); } if(dist <= pDataArray->cutoff){ if (pDataArray->output == "column") { outFile << pDataArray->alignDB.get(i).getName() << ' ' << pDataArray->alignDB.get(j).getName() << ' ' << dist << endl; } } if (pDataArray->output == "lt") { outFile << '\t' << dist; } } if (pDataArray->output == "lt") { outFile << endl; } if(i % 100 == 0){ pDataArray->m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } pDataArray->m->mothurOutJustToScreen(toString(pDataArray->end-1) + "\t" + toString(time(NULL) - startTime)+"\n"); outFile.close(); delete alignment; delete distCalculator; } catch(exception& e) { pDataArray->m->errorOut(e, "PairwiseSeqsCommand", "MyPairwiseThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/parsefastaqcommand.cpp000066400000000000000000002472221255543666200226250ustar00rootroot00000000000000/* * parsefastaqcommand.cpp * Mothur * * Created by westcott on 9/30/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "parsefastaqcommand.h" #include "sequence.hpp" //********************************************************************************************************************** vector ParseFastaQCommand::setParameters(){ try { CommandParameter pfile("file", "InputTypes", "", "", "fastqFile", "fastqFile", "none","",false,false,true); parameters.push_back(pfile); CommandParameter pfastq("fastq", "InputTypes", "", "", "fastqFile", "fastqFile", "none","",false,false,true); parameters.push_back(pfastq); CommandParameter poligos("oligos", "InputTypes", "", "", "oligosGroup", "none", "none","",false,false); parameters.push_back(poligos); CommandParameter pgroup("group", "InputTypes", "", "", "oligosGroup", "none", "none","",false,false); parameters.push_back(pgroup); CommandParameter preorient("checkorient", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(preorient); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pbdiffs); CommandParameter pldiffs("ldiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pldiffs); CommandParameter psdiffs("sdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psdiffs); CommandParameter ptdiffs("tdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ptdiffs); CommandParameter pfasta("fasta", "Boolean", "", "T", "", "", "","fasta",false,false); parameters.push_back(pfasta); CommandParameter pqual("qfile", "Boolean", "", "T", "", "", "","qfile",false,false); parameters.push_back(pqual); CommandParameter ppacbio("pacbio", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(ppacbio); CommandParameter pformat("format", "Multiple", "sanger-illumina-solexa-illumina1.8+", "sanger", "", "", "","",false,false,true); parameters.push_back(pformat); CommandParameter 
pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ParseFastaQCommand::getHelpString(){ try { string helpString = ""; helpString += "The fastq.info command reads a fastq file and creates a fasta and quality file or can be used to parse fastq files by sample.\n"; helpString += "The fastq.info command parameters are file, fastq, fasta, qfile, oligos, group, bdiffs, pdiffs, ldiffs, sdiffs, tdiffs, checkorient, pacbio and format; file or fastq is required.\n"; helpString += "The fastq.info command should be in the following format: fastq.info(fastq=yourFastQFile).\n"; helpString += "The oligos parameter allows you to provide an oligos file to split your fastq file into separate fastq files by barcode and primers. \n"; helpString += "The group parameter allows you to provide a group file to split your fastq file into separate fastq files by group. \n"; helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the reads. The default is pdiffs + bdiffs + sdiffs + ldiffs.\n"; helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. The default is 0.\n"; helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n"; helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. 
The default is 0.\n"; helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n"; helpString += "The checkorient parameter will look for the reverse complement of the barcode or primer in the sequence. If found, the sequence is flipped. The default is false.\n"; helpString += "The format parameter is used to indicate whether your sequences are sanger, solexa, illumina1.8+ or illumina, default=sanger.\n"; helpString += "The fasta parameter allows you to indicate whether you want a fasta file generated. Default=T.\n"; helpString += "The qfile parameter allows you to indicate whether you want a quality file generated. Default=T.\n"; helpString += "The pacbio parameter allows you to indicate .... When set to true, quality scores of 0 will result in a corresponding base of N. Default=F.\n"; helpString += "Example fastq.info(fastq=test.fastq).\n"; helpString += "Note: No spaces between parameter labels (i.e. fastq), '=' and yourFastQFile.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ParseFastaQCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],fasta-[filename],[group],[tag],fasta-[filename],[group],fasta"; } else if (type == "qfile") { pattern = "[filename],qual-[filename],[group],[tag],qual-[filename],[group],qual"; } else if (type == "fastq") { pattern = "[filename],[group],fastq-[filename],[group],[tag],fastq"; } //make.sra assumes the [filename],[group],[tag],fastq format for the 4 column file option. If this changes, may have to modify fixMap function. 
else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ParseFastaQCommand::ParseFastaQCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["fastq"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "ParseFastaQCommand"); exit(1); } } //********************************************************************************************************************** ParseFastaQCommand::ParseFastaQCommand(string option){ try { abort = false; calledHelp = false; fileOption = 0; createFileGroup = false; hasIndex = false; split = 1; if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["fastq"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fastq"); //user has given a template file if(it != parameters.end()){ path = 
m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fastq"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["file"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } } //check for required parameters fastaQFile = validParameter.validFile(parameters, "fastq", true); if (fastaQFile == "not found") { fastaQFile= ""; } else if (fastaQFile == "not open") { fastaQFile = ""; abort = true; } else { inputfile = fastaQFile; } file = validParameter.validFile(parameters, "file", true); if (file == "not found") { file = ""; } else if (file == "not open") { file = ""; abort = true; } else { inputfile = file; fileOption = true; } if ((file == "") && (fastaQFile == "")) { m->mothurOut("You must provide a file or fastq option."); m->mothurOutEndLine(); abort = true; } oligosfile = validParameter.validFile(parameters, "oligos", true); if (oligosfile == "not found") { oligosfile = ""; } else if (oligosfile == "not open") { oligosfile = ""; abort = true; } else { m->setOligosFile(oligosfile); split = 2; } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not found") { groupfile = ""; } else if (groupfile == "not open") { groupfile = ""; abort = true; } else { 
m->setGroupFile(groupfile); split = 2; } if ((groupfile != "") && (oligosfile != "")) { m->mothurOut("You must enter ONLY ONE of the following: oligos or group."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } string temp; temp = validParameter.validFile(parameters, "fasta", false); if(temp == "not found"){ temp = "T"; } fasta = m->isTrue(temp); temp = validParameter.validFile(parameters, "qfile", false); if(temp == "not found"){ temp = "T"; } qual = m->isTrue(temp); temp = validParameter.validFile(parameters, "pacbio", false); if(temp == "not found"){ temp = "F"; } pacbio = m->isTrue(temp); temp = validParameter.validFile(parameters, "bdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, bdiffs); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, pdiffs); temp = validParameter.validFile(parameters, "ldiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, ldiffs); temp = validParameter.validFile(parameters, "sdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, sdiffs); temp = validParameter.validFile(parameters, "tdiffs", false); if (temp == "not found") { int tempTotal = pdiffs + bdiffs + ldiffs + sdiffs; temp = toString(tempTotal); } m->mothurConvert(temp, tdiffs); if(tdiffs == 0){ tdiffs = bdiffs + pdiffs + ldiffs + sdiffs; } format = validParameter.validFile(parameters, "format", false); if (format == "not found"){ format = "sanger"; } if ((format != "sanger") && (format != "illumina") && (format != "illumina1.8+") && (format != "solexa")) { m->mothurOut(format + " is not a valid format. Your format choices are sanger, solexa, illumina1.8+ and illumina, aborting." 
); m->mothurOutEndLine(); abort=true; } if ((!fasta) && (!qual) && (file == "")) { m->mothurOut("[ERROR]: no outputs selected. Aborting."); m->mothurOutEndLine(); abort=true; } temp = validParameter.validFile(parameters, "checkorient", false); if (temp == "not found") { temp = "F"; } reorient = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "ParseFastaQCommand"); exit(1); } } //********************************************************************************************************************** int ParseFastaQCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } vector< vector<string> > files; if (file != "") { //read file files = readFile(); } if (m->control_pressed) { return 0; } TrimOligos* trimOligos = NULL; TrimOligos* rtrimOligos = NULL; pairedOligos = false; numBarcodes = 0; numPrimers= 0; numLinkers= 0; numSpacers = 0; numRPrimers = 0; if (oligosfile != "") { readOligos(oligosfile); //find group read belongs to if (pairedOligos) { trimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, oligos.getPairedPrimers(), oligos.getPairedBarcodes(), hasIndex); numBarcodes = oligos.getPairedBarcodes().size(); numPrimers = oligos.getPairedPrimers().size(); } else { trimOligos = new TrimOligos(pdiffs, bdiffs, ldiffs, sdiffs, oligos.getPrimers(), oligos.getBarcodes(), oligos.getReversePrimers(), oligos.getLinkers(), oligos.getSpacers()); numPrimers = oligos.getPrimers().size(); numBarcodes = oligos.getBarcodes().size(); } if (reorient) { rtrimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, oligos.getReorientedPairedPrimers(), oligos.getReorientedPairedBarcodes(), hasIndex); numBarcodes = oligos.getReorientedPairedBarcodes().size(); } }else if (groupfile != "") { readGroup(groupfile); } if (file != "") { if (m->control_pressed) { return 0; } for (int i = 0; i < files.size(); i++) { //process each pair if (m->control_pressed) { break; } if ((fileOption == 2) || (fileOption == 4)) { processFile(files[i], trimOligos, 
rtrimOligos); } else if (fileOption == 3) { if (m->mothurCalling) { //add group names to fastq files and make copies ofstream temp, temp2; map<string, string> variables; variables["[filename]"] = m->getRootName(files[i][0]); variables["[group]"] = file2Group[i]; variables["[tag]"] = "forward"; string newffqFile = getOutputFileName("fastq", variables); m->openOutputFile(newffqFile, temp); temp.close(); m->appendFiles(files[i][0], newffqFile); outputNames.push_back(newffqFile); outputTypes["fastq"].push_back(newffqFile); variables["[filename]"] = m->getRootName(files[i][1]); variables["[group]"] = file2Group[i]; variables["[tag]"] = "reverse"; string newfrqFile = getOutputFileName("fastq", variables); m->openOutputFile(newfrqFile, temp2); temp2.close(); m->appendFiles(files[i][1], newfrqFile); outputNames.push_back(newfrqFile); outputTypes["fastq"].push_back(newfrqFile); } //if requested, make fasta and qual if (fasta || qual) { processFile(files[i], trimOligos, rtrimOligos); } //split = 1, so no parsing by group will be done. 
} } }else { processFile(fastaQFile, trimOligos, rtrimOligos); } if (split > 1) { map<string, string>::iterator it; set<string> namesToRemove; for(int i=0;i<fastqFileNames.size();i++){ for(int j=0;j<fastqFileNames[i].size();j++){ if (fastqFileNames[i][j] != "") { if (namesToRemove.count(fastqFileNames[i][j]) == 0) { if(m->isBlank(fastqFileNames[i][j])){ m->mothurRemove(fastqFileNames[i][j]); namesToRemove.insert(fastqFileNames[i][j]); if (pairedOligos) { if (fileOption) { m->mothurRemove(rfastqFileNames[i][j]); namesToRemove.insert(rfastqFileNames[i][j]); } } if(fasta){ m->mothurRemove(fastaFileNames[i][j]); namesToRemove.insert(fastaFileNames[i][j]); if (pairedOligos) { if (fileOption) { m->mothurRemove(rfastaFileNames[i][j]); namesToRemove.insert(rfastaFileNames[i][j]); } } } if(qual){ m->mothurRemove(qualFileNames[i][j]); namesToRemove.insert(qualFileNames[i][j]); if (pairedOligos) { if (fileOption) { m->mothurRemove(rqualFileNames[i][j]); namesToRemove.insert(rqualFileNames[i][j]); } } } } } } } } //remove names for outputFileNames, just cleans up the output for(int i = 0; i < outputNames.size(); i++) { if (namesToRemove.count(outputNames[i]) != 0) { outputNames.erase(outputNames.begin()+i); i--; }else { string ending = outputNames[i].substr(outputNames[i].length()-5); if (ending == "fastq") { outputTypes["fastq"].push_back(outputNames[i]); } else if (ending == "fasta") { outputTypes["fasta"].push_back(outputNames[i]); } else if (ending == ".qfile") { /* note: ending holds only 5 characters, so this 6-character comparison never matches */ outputTypes["qfile"].push_back(outputNames[i]); } } } //ffqnoMatchFile, rfqnoMatchFile, ffnoMatchFile, rfnoMatchFile, fqnoMatchFile, rqnoMatchFile if(m->isBlank(ffqnoMatchFile)){ m->mothurRemove(ffqnoMatchFile); } else { outputNames.push_back(ffqnoMatchFile); outputTypes["fastq"].push_back(ffqnoMatchFile); } if(fasta){ if(m->isBlank(ffnoMatchFile)){ m->mothurRemove(ffnoMatchFile); } else { outputNames.push_back(ffnoMatchFile); outputTypes["fasta"].push_back(ffnoMatchFile); } } if(qual){ if(m->isBlank(fqnoMatchFile)){ m->mothurRemove(fqnoMatchFile); } else { outputNames.push_back(fqnoMatchFile); outputTypes["qfile"].push_back(fqnoMatchFile); } } if (pairedOligos) { if (fileOption) { 
if(m->isBlank(rfqnoMatchFile)){ m->mothurRemove(rfqnoMatchFile); } else { outputNames.push_back(rfqnoMatchFile); outputTypes["fastq"].push_back(rfqnoMatchFile); } if(fasta){ if(m->isBlank(rfnoMatchFile)){ m->mothurRemove(rfnoMatchFile); } else { outputNames.push_back(rfnoMatchFile); outputTypes["fasta"].push_back(rfnoMatchFile); } } if(qual){ if(m->isBlank(rqnoMatchFile)){ m->mothurRemove(rqnoMatchFile); } else { outputNames.push_back(rqnoMatchFile); outputTypes["qfile"].push_back(rqnoMatchFile); } } } } } if (groupfile != "") { delete groupMap; } else if (oligosfile != "") { delete trimOligos; if (reorient) { delete rtrimOligos; } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); outputNames.clear(); return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "execute"); exit(1); } } //********************************************************************************************************************** //assumes file option was used int ParseFastaQCommand::processFile(vector<string> files, TrimOligos*& trimOligos, TrimOligos*& rtrimOligos){ try { string inputfile = files[0]; string inputReverse = files[1]; //open Output Files map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); string ffastaFile = 
getOutputFileName("fasta",variables); string fqualFile = getOutputFileName("qfile",variables); variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputReverse)); string rfastaFile = getOutputFileName("fasta",variables); string rqualFile = getOutputFileName("qfile",variables); ofstream outfFasta, outfQual, outrFasta, outrQual; if (fasta) { m->openOutputFile(ffastaFile, outfFasta); outputNames.push_back(ffastaFile); outputTypes["fasta"].push_back(ffastaFile); m->openOutputFile(rfastaFile, outrFasta); outputNames.push_back(rfastaFile); outputTypes["fasta"].push_back(rfastaFile);} if (qual) { m->openOutputFile(fqualFile, outfQual); outputNames.push_back(fqualFile); outputTypes["qfile"].push_back(fqualFile); m->openOutputFile(rqualFile, outrQual); outputNames.push_back(rqualFile); outputTypes["qfile"].push_back(rqualFile); } ifstream inf; m->openInputFile(inputfile, inf); ifstream inr; m->openInputFile(inputReverse, inr); ifstream inFIndex, inRIndex; if (files[2] != "") { m->openInputFile(files[2], inFIndex); } if (files[3] != "") { m->openInputFile(files[3], inRIndex); } //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference. for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } int count = 0; while (!inf.eof() && !inr.eof()) { if (m->control_pressed) { break; } bool ignoref, ignorer; fastqRead2 thisfRead = readFastq(inf, ignoref); fastqRead2 thisrRead = readFastq(inr, ignorer); if (!ignoref && ! 
ignorer) { vector<int> fqualScores; vector<int> rqualScores; if (qual) { fqualScores = convertQual(thisfRead.quality); outfQual << ">" << thisfRead.seq.getName() << endl; for (int i = 0; i < fqualScores.size(); i++) { outfQual << fqualScores[i] << " "; } outfQual << endl; rqualScores = convertQual(thisrRead.quality); outrQual << ">" << thisrRead.seq.getName() << endl; for (int i = 0; i < rqualScores.size(); i++) { outrQual << rqualScores[i] << " "; } outrQual << endl; } if (m->control_pressed) { break; } if (pacbio) { if (!qual) { rqualScores = convertQual(thisrRead.quality); fqualScores = convertQual(thisfRead.quality); } //convert if not done string sequence = thisfRead.seq.getAligned(); for (int i = 0; i < fqualScores.size(); i++) { if (fqualScores[i] == 0){ sequence[i] = 'N'; } } thisfRead.seq.setAligned(sequence); sequence = thisrRead.seq.getAligned(); for (int i = 0; i < rqualScores.size(); i++) { if (rqualScores[i] == 0){ sequence[i] = 'N'; } } thisrRead.seq.setAligned(sequence); } //print sequence info to files if (fasta) { thisfRead.seq.printSequence(outfFasta); thisrRead.seq.printSequence(outrFasta); } if (split > 1) { Sequence findexBarcode("findex", "NONE"); Sequence rindexBarcode("rindex", "NONE"); if (fileOption == 4) { bool ignorefi, ignoreri; if (files[2] != "") { fastqRead2 thisfiRead = readFastq(inFIndex, ignorefi); if (!ignorefi) { findexBarcode.setAligned(thisfiRead.seq.getAligned()); } } if (files[3] != "") { fastqRead2 thisriRead = readFastq(inRIndex, ignoreri); if (!ignoreri) { rindexBarcode.setAligned(thisriRead.seq.getAligned()); } } } int barcodeIndex, primerIndex, trashCodeLength; if (oligosfile != "") { if ((files[2] != "") || (files[3] != "")) { Sequence tempF = thisfRead.seq; Sequence tempR = thisrRead.seq; thisfRead.seq = findexBarcode; thisrRead.seq = rindexBarcode; trashCodeLength = findGroup(thisfRead, thisrRead, barcodeIndex, primerIndex, trimOligos, rtrimOligos, numBarcodes, numPrimers); thisfRead.seq = tempF; thisrRead.seq = tempR; }else 
{ trashCodeLength = findGroup(thisfRead, thisrRead, barcodeIndex, primerIndex, trimOligos, rtrimOligos, numBarcodes, numPrimers); } }else if (groupfile != "") { trashCodeLength = findGroup(thisfRead, barcodeIndex, primerIndex, "groupMode"); } else { m->mothurOut("[ERROR]: uh oh, we shouldn't be here...\n"); } if(trashCodeLength == 0){ ofstream out; m->openOutputFileAppend(fastqFileNames[barcodeIndex][primerIndex], out); out << thisfRead.wholeRead; out.close(); ofstream out2; m->openOutputFileAppend(rfastqFileNames[barcodeIndex][primerIndex], out2); out2 << thisrRead.wholeRead; out2.close(); //print no match fasta, if wanted if (fasta) { ofstream outf, outr; m->openOutputFileAppend(fastaFileNames[barcodeIndex][primerIndex], outf); thisfRead.seq.printSequence(outf); outf.close(); m->openOutputFileAppend(rfastaFileNames[barcodeIndex][primerIndex], outr); thisrRead.seq.printSequence(outr); outr.close(); } //print no match quality parse, if wanted if (qual) { ofstream outq, outq2; m->openOutputFileAppend(qualFileNames[barcodeIndex][primerIndex], outq); outq << ">" << thisfRead.seq.getName() << endl; for (int i = 0; i < fqualScores.size(); i++) { outq << fqualScores[i] << " "; } outq << endl; outq.close(); m->openOutputFileAppend(rqualFileNames[barcodeIndex][primerIndex], outq2); outq2 << ">" << thisrRead.seq.getName() << endl; for (int i = 0; i < rqualScores.size(); i++) { outq2 << rqualScores[i] << " "; } outq2 << endl; outq2.close(); } }else{ //print no match fastq ofstream out, out2; m->openOutputFileAppend(ffqnoMatchFile, out); out << thisfRead.wholeRead; out.close(); m->openOutputFileAppend(rfqnoMatchFile, out2); out2 << thisrRead.wholeRead; out2.close(); //print no match fasta, if wanted if (fasta) { ofstream outf, outr; m->openOutputFileAppend(ffnoMatchFile, outf); thisfRead.seq.printSequence(outf); outf.close(); m->openOutputFileAppend(rfnoMatchFile, outr); thisrRead.seq.printSequence(outr); outr.close(); } //print no match quality parse, if wanted if (qual) { 
ofstream outq, outq2; m->openOutputFileAppend(fqnoMatchFile, outq); outq << ">" << thisfRead.seq.getName() << endl; for (int i = 0; i < fqualScores.size(); i++) { outq << fqualScores[i] << " "; } outq << endl; outq.close(); m->openOutputFileAppend(rqnoMatchFile, outq2); outq2 << ">" << thisrRead.seq.getName() << endl; for (int i = 0; i < rqualScores.size(); i++) { outq2 << rqualScores[i] << " "; } outq2 << endl; outq2.close(); } } } //report progress if((count+1) % 10000 == 0){ m->mothurOut(toString(count+1)); m->mothurOutEndLine(); } count++; } } inf.close(); inr.close(); if (files[2] != "") { inFIndex.close(); } if (files[3] != "") { inRIndex.close(); } if (fasta) { outfFasta.close(); outrFasta.close(); } if (qual) { outfQual.close(); outrQual.close(); } //report progress if (!m->control_pressed) { if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } } return 0; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "processFile"); exit(1); } } //********************************************************************************************************************** int ParseFastaQCommand::processFile(string inputfile, TrimOligos*& trimOligos, TrimOligos*& rtrimOligos){ try { //open Output Files map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); string fastaFile = getOutputFileName("fasta",variables); string qualFile = getOutputFileName("qfile",variables); ofstream outFasta, outQual; //fasta and quality files for whole input file if (fasta) { m->openOutputFile(fastaFile, outFasta); outputNames.push_back(fastaFile); outputTypes["fasta"].push_back(fastaFile); } if (qual) { m->openOutputFile(qualFile, outQual); outputNames.push_back(qualFile); outputTypes["qfile"].push_back(qualFile); } ifstream in; m->openInputFile(inputfile, in); //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference. 
for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } int count = 0; while (!in.eof()) { if (m->control_pressed) { break; } bool ignore; fastqRead2 thisRead = readFastq(in, ignore); if (!ignore) { vector<int> qualScores; if (qual) { qualScores = convertQual(thisRead.quality); outQual << ">" << thisRead.seq.getName() << endl; for (int i = 0; i < qualScores.size(); i++) { outQual << qualScores[i] << " "; } outQual << endl; } if (m->control_pressed) { break; } if (pacbio) { if (!qual) { qualScores = convertQual(thisRead.quality); } //convert if not done string sequence = thisRead.seq.getAligned(); for (int i = 0; i < qualScores.size(); i++) { if (qualScores[i] == 0){ sequence[i] = 'N'; } } thisRead.seq.setAligned(sequence); } //print sequence info to files if (fasta) { thisRead.seq.printSequence(outFasta); } if (split > 1) { int barcodeIndex, primerIndex, trashCodeLength; if (oligosfile != "") { trashCodeLength = findGroup(thisRead, barcodeIndex, primerIndex, trimOligos, rtrimOligos, numBarcodes, numPrimers); } else if (groupfile != "") { trashCodeLength = findGroup(thisRead, barcodeIndex, primerIndex, "groupMode"); } else { m->mothurOut("[ERROR]: uh oh, we shouldn't be here...\n"); } if(trashCodeLength == 0){ //files in here are per group //print fastq to barcode and primer match ofstream out; m->openOutputFileAppend(fastqFileNames[barcodeIndex][primerIndex], out); out << thisRead.wholeRead; out.close(); //print fasta match if wanted if (fasta) { ofstream outf; m->openOutputFileAppend(fastaFileNames[barcodeIndex][primerIndex], outf); thisRead.seq.printSequence(outf); outf.close(); } //print qual match, if wanted if (qual) { ofstream outq; m->openOutputFileAppend(qualFileNames[barcodeIndex][primerIndex], outq); outq << ">" << thisRead.seq.getName() << endl; for (int i = 0; i < qualScores.size(); i++) { outq << qualScores[i] << " "; } outq << endl; outq.close(); } }else{ //print no match fastq ofstream 
out; m->openOutputFileAppend(ffqnoMatchFile, out); out << thisRead.wholeRead; out.close(); //print no match fasta, if wanted if (fasta) { ofstream outf; m->openOutputFileAppend(ffnoMatchFile, outf); thisRead.seq.printSequence(outf); outf.close(); } //print no match quality parse, if wanted if (qual) { ofstream outq; m->openOutputFileAppend(fqnoMatchFile, outq); outq << ">" << thisRead.seq.getName() << endl; for (int i = 0; i < qualScores.size(); i++) { outq << qualScores[i] << " "; } outq << endl; outq.close(); } } } //report progress if((count+1) % 10000 == 0){ m->mothurOut(toString(count+1)); m->mothurOutEndLine(); } count++; } } in.close(); if (fasta) { outFasta.close(); } if (qual) { outQual.close(); } //report progress if (!m->control_pressed) { if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } } return 0; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "processFile"); exit(1); } } //********************************************************************************************************************** fastqRead2 ParseFastaQCommand::readFastq(ifstream& in, bool& ignore){ try { ignore = false; string wholeRead = ""; //read sequence name string line = m->getline(in); m->gobble(in); if (split > 1) { wholeRead += line + "\n"; } vector<string> pieces = m->splitWhiteSpace(line); string name = ""; if (pieces.size() != 0) { name = pieces[0]; } if (name == "") { m->mothurOut("[WARNING]: Blank fasta name, ignoring read."); m->mothurOutEndLine(); ignore=true; } else if (name[0] != '@') { m->mothurOut("[WARNING]: reading " + name + " expected a name with @ as a leading character, ignoring read."); m->mothurOutEndLine(); ignore=true; } else { name = name.substr(1); } //read sequence string sequence = m->getline(in); m->gobble(in); if (split > 1) { wholeRead += sequence + "\n"; } if (sequence == "") { m->mothurOut("[WARNING]: missing sequence for " + name + ", ignoring."); ignore=true; } //read sequence name line = m->getline(in); m->gobble(in); if 
(split > 1) { wholeRead += line + "\n"; } pieces = m->splitWhiteSpace(line); string name2 = ""; if (pieces.size() != 0) { name2 = pieces[0]; } if (name2 == "") { m->mothurOut("[WARNING]: expected a name with + as a leading character, ignoring."); ignore=true; } else if (name2[0] != '+') { m->mothurOut("[WARNING]: reading " + name2 + " expected a name with + as a leading character, ignoring."); ignore=true; } else { name2 = name2.substr(1); if (name2 == "") { name2 = name; } } //read quality scores string quality = m->getline(in); m->gobble(in); if (split > 1) { wholeRead += quality + "\n"; } if (quality == "") { m->mothurOut("[WARNING]: missing quality for " + name2 + ", ignoring."); ignore=true; } //sanity check sequence length and number of quality scores match if (name2 != "") { if (name != name2) { m->mothurOut("[WARNING]: names do not match. read " + name + " for fasta and " + name2 + " for quality, ignoring."); ignore=true; } } if (quality.length() != sequence.length()) { m->mothurOut("[WARNING]: Lengths do not match for sequence " + name + ". Read " + toString(sequence.length()) + " characters for fasta and " + toString(quality.length()) + " characters for quality scores, ignoring read."); ignore=true; } m->checkName(name); Sequence seq(name, sequence); fastqRead2 read(seq, quality, wholeRead); if (m->debug) { m->mothurOut("[DEBUG]: " + read.seq.getName() + " " + read.seq.getAligned() + " " + quality + "\n"); } return read; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "readFastq"); exit(1); } } //********************************************************************************************************************** vector<int> ParseFastaQCommand::convertQual(string qual) { try { vector<int> qualScores; bool negativeScores = false; for (int i = 0; i < qual.length(); i++) { int temp = 0; temp = int(qual[i]); if (format == "illumina") { temp -= 64; //char '@' }else if (format == "illumina1.8+") { temp -= int('!'); //char '!' 
}else if (format == "solexa") { temp = int(convertTable[temp]); //convert to sanger temp -= int('!'); //char '!' }else { temp -= int('!'); //char '!' } if (temp < -5) { negativeScores = true; } qualScores.push_back(temp); } if (negativeScores) { m->mothurOut("[ERROR]: finding negative quality scores, do you have the right format selected? http://en.wikipedia.org/wiki/FASTQ_format#Encoding \n"); m->control_pressed = true; } return qualScores; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "convertQual"); exit(1); } } //********************************************************************************************************************** int ParseFastaQCommand::findGroup(fastqRead2 thisRead, int& barcode, int& primer, TrimOligos*& trimOligos, TrimOligos*& rtrimOligos, int numBarcodes, int numPrimers) { try { int success = 1; string trashCode = ""; int currentSeqsDiffs = 0; Sequence currSeq(thisRead.seq.getName(), thisRead.seq.getAligned()); QualityScores currQual; currQual.setScores(convertQual(thisRead.quality)); //for reorient Sequence savedSeq(currSeq.getName(), currSeq.getAligned()); QualityScores savedQual(currQual.getName(), currQual.getScores()); if(numLinkers != 0){ success = trimOligos->stripLinker(currSeq, currQual); if(success > ldiffs) { trashCode += 'k'; } else{ currentSeqsDiffs += success; } } if(numBarcodes != 0){ vector<int> results = trimOligos->stripBarcode(currSeq, currQual, barcode); if (pairedOligos) { success = results[0] + results[2]; } else { success = results[0]; } if(success > bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } if(numSpacers != 0){ success = trimOligos->stripSpacer(currSeq, currQual); if(success > sdiffs) { trashCode += 's'; } else{ currentSeqsDiffs += success; } } if(numPrimers != 0){ vector<int> results = trimOligos->stripForward(currSeq, currQual, primer, true); if (pairedOligos) { success = results[0] + results[2]; } else { success = results[0]; } if(success > pdiffs) { trashCode += 'f'; } else{ 
currentSeqsDiffs += success; } } if(numRPrimers != 0){ vector<int> results = trimOligos->stripReverse(currSeq, currQual); success = results[0]; if(success > pdiffs) { trashCode += 'r'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > tdiffs) { trashCode += 't'; } if (reorient && (trashCode != "")) { //if you failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl; if(numBarcodes != 0){ vector<int> results = rtrimOligos->stripBarcode(savedSeq, savedQual, thisBarcodeIndex); if (pairedOligos) { thisSuccess = results[0] + results[2]; } else { thisSuccess = results[0]; } if(thisSuccess > bdiffs) { thisTrashCode += "b"; } else{ thisCurrentSeqsDiffs += thisSuccess; } } //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl; if(numPrimers != 0){ vector<int> results = rtrimOligos->stripForward(savedSeq, savedQual, thisPrimerIndex, true); if (pairedOligos) { thisSuccess = results[0] + results[2]; } else { thisSuccess = results[0]; } if(thisSuccess > pdiffs) { thisTrashCode += "f"; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if (thisCurrentSeqsDiffs > tdiffs) { thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; barcode = thisBarcodeIndex; primer = thisPrimerIndex; //savedSeq.reverseComplement(); //currSeq.setAligned(savedSeq.getAligned()); //savedQual.flipQScores(); //currQual.setScores(savedQual.getScores()); }else { trashCode += "(" + thisTrashCode + ")"; } } if (trashCode.length() == 0) { //is this sequence in the ignore group string thisGroup = oligos.getGroupName(barcode, primer); int pos = thisGroup.find("ignore"); if (pos != string::npos) { trashCode += "i"; } } return trashCode.length(); } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "findGroup"); exit(1); } } 
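/*
The offset arithmetic in convertQual above can be sketched as a standalone function. This is a minimal illustration, not mothur code: decodeQuality and checkDecodeQuality are hypothetical names, only the fixed-offset formats (sanger, illumina1.8+, illumina) are covered, and the solexa path, which needs the convertTable lookup, is deliberately left out.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of mothur): turn a FASTQ quality string into
// integer scores by subtracting the ASCII offset implied by the format name.
std::vector<int> decodeQuality(const std::string& quality, const std::string& format) {
    int offset = 0;
    if (format == "sanger" || format == "illumina1.8+") { offset = 33; }  // char '!'
    else if (format == "illumina") { offset = 64; }                       // char '@'
    else { throw std::invalid_argument("unsupported format: " + format); }

    std::vector<int> scores;
    scores.reserve(quality.size());
    for (std::string::size_type i = 0; i < quality.size(); i++) {
        scores.push_back(static_cast<int>(quality[i]) - offset);
    }
    return scores;
}

// Self-check with hand-computed values: 'I' is ASCII 73 and 'h' is ASCII 104.
void checkDecodeQuality() {
    assert(decodeQuality("!I", "sanger")[0] == 0);
    assert(decodeQuality("!I", "sanger")[1] == 40);
    assert(decodeQuality("@h", "illumina")[1] == 40);
}
```

Throwing on an unknown format is a choice made for this sketch only; the command itself validates the format parameter up front and treats anything other than illumina or solexa as a '!' offset.
*/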
//********************************************************************************************************************** int ParseFastaQCommand::findGroup(fastqRead2 thisRead, int& barcode, int& primer, string groupMode) { try { string trashCode = ""; primer = 0; string group = groupMap->getGroup(thisRead.seq.getName()); if (group == "not found") { trashCode += "g"; } //scrap for group else { barcode = GroupToFile[group]; } return trashCode.length(); } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "findGroup"); exit(1); } } //********************************************************************************************************************** int ParseFastaQCommand::findGroup(fastqRead2 thisfRead, fastqRead2 thisrRead, int& barcode, int& primer, TrimOligos*& trimOligos, TrimOligos*& rtrimOligos, int numBarcodes, int numPrimers) { try { int success = 1; string trashCode = ""; int currentSeqsDiffs = 0; Sequence fcurrSeq(thisfRead.seq.getName(), thisfRead.seq.getAligned()); QualityScores fcurrQual; fcurrQual.setScores(convertQual(thisfRead.quality)); Sequence rcurrSeq(thisrRead.seq.getName(), thisrRead.seq.getAligned()); QualityScores rcurrQual; rcurrQual.setScores(convertQual(thisrRead.quality)); //for reorient Sequence fsavedSeq(fcurrSeq.getName(), fcurrSeq.getAligned()); QualityScores fsavedQual(fcurrQual.getName(), fcurrQual.getScores()); Sequence rsavedSeq(rcurrSeq.getName(), rcurrSeq.getAligned()); QualityScores rsavedQual(rcurrQual.getName(), rcurrQual.getScores()); if(numBarcodes != 0){ vector<int> results = trimOligos->stripBarcode(fcurrSeq, rcurrSeq, fcurrQual, rcurrQual, barcode); if (pairedOligos) { success = results[0] + results[2]; } else { success = results[0]; } if(success > bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } if(numPrimers != 0){ vector<int> results = trimOligos->stripForward(fcurrSeq, rcurrSeq, fcurrQual, rcurrQual, primer); if (pairedOligos) { success = results[0] + results[2]; } else { success = results[0]; } 
if(success > pdiffs) { trashCode += 'f'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > tdiffs) { trashCode += 't'; } if (reorient && (trashCode != "")) { //if you failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; if(numBarcodes != 0){ vector<int> results = rtrimOligos->stripBarcode(fsavedSeq, rsavedSeq, fsavedQual, rsavedQual, thisBarcodeIndex); if (pairedOligos) { thisSuccess = results[0] + results[2]; } else { thisSuccess = results[0]; } if(thisSuccess > bdiffs) { thisTrashCode += 'b'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if(numPrimers != 0){ vector<int> results = rtrimOligos->stripForward(fsavedSeq, rsavedSeq, fsavedQual, rsavedQual, thisPrimerIndex); if (pairedOligos) { thisSuccess = results[0] + results[2]; } else { thisSuccess = results[0]; } if(thisSuccess > pdiffs) { thisTrashCode += 'f'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if (thisCurrentSeqsDiffs > tdiffs) { thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; barcode = thisBarcodeIndex; primer = thisPrimerIndex; //fsavedSeq.reverseComplement(); //rsavedSeq.reverseComplement(); //fcurrSeq.setAligned(fsavedSeq.getAligned()); //rcurrSeq.setAligned(rsavedSeq.getAligned()); //fsavedQual.flipQScores(); rsavedQual.flipQScores(); // fcurrQual.setScores(fsavedQual.getScores()); rcurrQual.setScores(rsavedQual.getScores()); }else { trashCode += "(" + thisTrashCode + ")"; } } if (trashCode.length() == 0) { //is this sequence in the ignore group string thisGroup = oligos.getGroupName(barcode, primer); int pos = thisGroup.find("ignore"); if (pos != string::npos) { trashCode += "i"; } } return trashCode.length(); } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "findGroup"); exit(1); } } 
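/*
The 2, 3 and 4 column layouts accepted by the file option, described in the comment ahead of readFile, can be sketched as a standalone parser. This is an illustration under stated assumptions, not mothur code: FileLine, noneToEmpty, parseFileLine and checkParseFileLine are hypothetical names, and the path resolution and file-open checks that readFile performs are omitted.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record (not a mothur type) holding one parsed line.
struct FileLine {
    bool valid;
    std::string group, forward, reverse, findex, rindex;
};

// An index column given as "none" or "NONE" means that index file is absent.
static std::string noneToEmpty(const std::string& s) {
    return (s == "none" || s == "NONE") ? std::string() : s;
}

// Split on whitespace and assign columns by count:
// 2 -> forward reverse; 3 -> group forward reverse; 4 -> forward reverse findex rindex.
FileLine parseFileLine(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> pieces;
    std::string token;
    while (iss >> token) { pieces.push_back(token); }

    FileLine result;
    result.valid = true;
    if (pieces.size() == 2) {
        result.forward = pieces[0]; result.reverse = pieces[1];
    } else if (pieces.size() == 3) {
        result.group = pieces[0]; result.forward = pieces[1]; result.reverse = pieces[2];
    } else if (pieces.size() == 4) {
        result.forward = pieces[0]; result.reverse = pieces[1];
        result.findex = noneToEmpty(pieces[2]);
        result.rindex = noneToEmpty(pieces[3]);
    } else {
        result.valid = false;  // any other column count is rejected
    }
    return result;
}

// Self-check covering each accepted layout and one rejected line.
void checkParseFileLine() {
    assert(parseFileLine("f.fastq r.fastq").valid);
    assert(parseFileLine("A f.fastq r.fastq").group == "A");
    assert(parseFileLine("f.fastq r.fastq none ri.fastq").findex.empty());
    assert(parseFileLine("f.fastq r.fastq none ri.fastq").rindex == "ri.fastq");
    assert(!parseFileLine("only.fastq").valid);
}
```

Returning a flag instead of setting a global abort state keeps the sketch self-contained; the command itself reports the error and sets control_pressed.
*/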
//********************************************************************************************************************** /* file option 1 ffastqfile1 rfastqfile1 ffastqfile2 rfastqfile2 ... file option 2 group ffastqfile rfastqfile group ffastqfile rfastqfile group ffastqfile rfastqfile ... file option 3 My.forward.fastq My.reverse.fastq none My.rindex.fastq //none is an option if there is no forward or reverse index file */ //lines can be 2, 3, or 4 columns // forward.fastq reverse.fastq -> 2 column // groupName forward.fastq reverse.fastq -> 3 column // forward.fastq reverse.fastq forward.index.fastq reverse.index.fastq -> 4 column // forward.fastq reverse.fastq none reverse.index.fastq -> 4 column // forward.fastq reverse.fastq forward.index.fastq none -> 4 column vector< vector<string> > ParseFastaQCommand::readFile(){ try { vector< vector<string> > files; string forward, reverse, findex, rindex; ifstream in; m->openInputFile(inputfile, in); while(!in.eof()) { if (m->control_pressed) { return files; } string line = m->getline(in); m->gobble(in); vector<string> pieces = m->splitWhiteSpace(line); string group = ""; if (pieces.size() == 2) { forward = pieces[0]; reverse = pieces[1]; group = ""; findex = ""; rindex = ""; fileOption = 2; }else if (pieces.size() == 3) { if (oligosfile != "") { m->mothurOut("[ERROR]: You cannot have an oligosfile and 3 column file option at the same time. Aborting. \n"); m->control_pressed = true; } if (groupfile != "") { m->mothurOut("[ERROR]: You cannot have a groupfile and 3 column file option at the same time. Aborting. \n"); m->control_pressed = true; } group = pieces[0]; forward = pieces[1]; reverse = pieces[2]; findex = ""; rindex = ""; createFileGroup = true; fileOption = 3; }else if (pieces.size() == 4) { if (oligosfile == "") { m->mothurOut("[ERROR]: You must have an oligosfile with the index file option. Aborting. 
\n"); m->control_pressed = true; } forward = pieces[0]; reverse = pieces[1]; findex = pieces[2]; rindex = pieces[3]; fileOption = 4; hasIndex = true; if ((findex == "none") || (findex == "NONE")){ findex = ""; } if ((rindex == "none") || (rindex == "NONE")){ rindex = ""; } }else { m->mothurOut("[ERROR]: file lines can be 2, 3, or 4 columns. The forward fastq files in the first column and their matching reverse fastq files in the second column, or a groupName then forward fastq file and reverse fastq file, or forward fastq file then reverse fastq then forward index and reverse index file. If you only have one index file add 'none' for the other one. \n"); m->control_pressed = true; } if (m->debug) { m->mothurOut("[DEBUG]: group = " + group + ", forward = " + forward + ", reverse = " + reverse + ", forwardIndex = " + findex + ", reverseIndex = " + rindex + ".\n"); } if (inputDir != "") { string path = m->hasPath(forward); if (path == "") { forward = inputDir + forward; } path = m->hasPath(reverse); if (path == "") { reverse = inputDir + reverse; } if (findex != "") { path = m->hasPath(findex); if (path == "") { findex = inputDir + findex; } } if (rindex != "") { path = m->hasPath(rindex); if (path == "") { rindex = inputDir + rindex; } } } //check to make sure both are able to be opened ifstream in2; int openForward = m->openInputFile(forward, in2, "noerror"); //if you can't open it, try default location if (openForward == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(forward); m->mothurOut("Unable to open " + forward + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openForward = m->openInputFile(tryPath, in3, "noerror"); in3.close(); forward = tryPath; } } //if you can't open it, try output location if (openForward == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(forward); m->mothurOut("Unable to open " + forward + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openForward = m->openInputFile(tryPath, in4, "noerror"); forward = tryPath; in4.close(); } } if (openForward == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + forward + ", ignoring pair.\n"); }else{ in2.close(); } ifstream in3; int openReverse = m->openInputFile(reverse, in3, "noerror"); //if you can't open it, try default location if (openReverse == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(reverse); m->mothurOut("Unable to open " + reverse + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openReverse = m->openInputFile(tryPath, in3, "noerror"); in3.close(); reverse = tryPath; } } //if you can't open it, try output location if (openReverse == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(reverse); m->mothurOut("Unable to open " + reverse + ". 
Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openReverse = m->openInputFile(tryPath, in4, "noerror"); reverse = tryPath; in4.close(); } } if (openReverse == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + reverse + ", ignoring pair.\n"); }else{ in3.close(); } int openFindex = 0; if (findex != "") { ifstream in4; openFindex = m->openInputFile(findex, in4, "noerror"); in4.close(); //if you can't open it, try default location if (openFindex == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(findex); m->mothurOut("Unable to open " + findex + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in5; openFindex = m->openInputFile(tryPath, in5, "noerror"); in5.close(); findex = tryPath; } } //if you can't open it, try output location if (openFindex == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(findex); m->mothurOut("Unable to open " + findex + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in6; openFindex = m->openInputFile(tryPath, in6, "noerror"); findex = tryPath; in6.close(); } } if (openFindex == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + findex + ", ignoring pair.\n"); } } int openRindex = 0; if (rindex != "") { ifstream in7; openRindex = m->openInputFile(rindex, in7, "noerror"); in7.close(); //if you can't open it, try default location if (openRindex == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(rindex); m->mothurOut("Unable to open " + rindex + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in8; openRindex = m->openInputFile(tryPath, in8, "noerror"); in8.close(); rindex = tryPath; } } //if you can't open it, try output location if (openRindex == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(rindex); m->mothurOut("Unable to open " + rindex + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in9; openRindex = m->openInputFile(tryPath, in9, "noerror"); rindex = tryPath; in9.close(); } } if (openRindex == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + rindex + ", ignoring pair.\n"); } } if ((openForward != 1) && (openReverse != 1) && (openFindex != 1) && (openRindex != 1)) { //good pair file2Group[files.size()] = group; vector pair; pair.push_back(forward); pair.push_back(reverse); pair.push_back(findex); pair.push_back(rindex); if (((findex != "") || (rindex != "")) && (oligosfile == "")) { m->mothurOut("[ERROR]: You need to provide an oligos file if you are going to use an index file.\n"); m->control_pressed = true; } files.push_back(pair); } } in.close(); return files; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "readFileNames"); exit(1); } } //*************************************************************************************************************** bool ParseFastaQCommand::readOligos(string oligoFile){ try { bool allBlank = false; if (fileOption) { oligos.read(oligosfile, false); } // like make.contigs else { oligos.read(oligosfile); } if (m->control_pressed) { return false; } //error in reading oligos if (oligos.hasPairedPrimers() || oligos.hasPairedBarcodes()) { pairedOligos = true; numPrimers = oligos.getPairedPrimers().size(); numBarcodes = oligos.getPairedBarcodes().size(); }else { pairedOligos = false; numPrimers = oligos.getPrimers().size(); numBarcodes = oligos.getBarcodes().size(); } numLinkers = oligos.getLinkers().size(); numSpacers = 
oligos.getSpacers().size(); numRPrimers = oligos.getReversePrimers().size(); vector groupNames = oligos.getGroupNames(); if (groupNames.size() == 0) { allBlank = true; } if (m->control_pressed) { return false; } fastqFileNames.resize(oligos.getBarcodeNames().size()); for(int i=0;i uniqueNames; //used to cleanup outputFileNames if (pairedOligos) { map barcodes = oligos.getPairedBarcodes(); map primers = oligos.getPairedPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->first); string barcodeName = oligos.getBarcodeName(itBar->first); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." 
+ primerName; } } if(((itPrimer->second).forward+(itPrimer->second).reverse) == ""){ if ((itBar->second).forward != "NONE") { comboName += (itBar->second).forward; } if ((itBar->second).reverse != "NONE") { if (comboName == "") { comboName += (itBar->second).reverse; } else { comboName += ("."+(itBar->second).reverse); } } }else{ if(((itBar->second).forward+(itBar->second).reverse) == ""){ if ((itPrimer->second).forward != "NONE") { comboName += (itPrimer->second).forward; } if ((itPrimer->second).reverse != "NONE") { if (comboName == "") { comboName += (itPrimer->second).reverse; } else { comboName += ("."+(itPrimer->second).reverse); } } } else{ if ((itBar->second).forward != "NONE") { comboName += (itBar->second).forward; } if ((itBar->second).reverse != "NONE") { if (comboName == "") { comboName += (itBar->second).reverse; } else { comboName += ("."+(itBar->second).reverse); } } if ((itPrimer->second).forward != "NONE") { if (comboName == "") { comboName += (itPrimer->second).forward; } else { comboName += ("."+(itPrimer->second).forward); } } if ((itPrimer->second).reverse != "NONE") { if (comboName == "") { comboName += (itPrimer->second).reverse; } else { comboName += ("."+(itPrimer->second).reverse); } } } } if (comboName != "") { comboGroupName += "_" + comboName; } ofstream temp; map variables; if (fileOption) { variables["[tag]"] = "forward"; } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = comboGroupName; string fastqFileName = getOutputFileName("fastq", variables); if (uniqueNames.count(fastqFileName) == 0) { outputNames.push_back(fastqFileName); uniqueNames.insert(fastqFileName); } fastqFileNames[itBar->first][itPrimer->first] = fastqFileName; m->openOutputFile(fastqFileName, temp); temp.close(); if (fileOption) { variables["[tag]"] = "reverse"; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); string rfastqFileName = getOutputFileName("fastq", variables); if 
(uniqueNames.count(rfastqFileName) == 0) { outputNames.push_back(rfastqFileName); uniqueNames.insert(rfastqFileName); } ofstream temp2; rfastqFileNames[itBar->first][itPrimer->first] = rfastqFileName; m->openOutputFile(rfastqFileName, temp2); temp2.close(); } if(fasta){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = comboGroupName; if (fileOption) { variables["[tag]"] = "forward"; } string fastaFileName = getOutputFileName("fasta", variables); if (uniqueNames.count(fastaFileName) == 0) { outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); } ofstream temp3; fastaFileNames[itBar->first][itPrimer->first] = fastaFileName; m->openOutputFile(fastaFileName, temp3); temp3.close(); if (fileOption) { variables["[tag]"] = "reverse"; string fastaFileName2 = getOutputFileName("fasta", variables); if (uniqueNames.count(fastaFileName2) == 0) { outputNames.push_back(fastaFileName2); outputTypes["fasta"].push_back(fastaFileName2); } ofstream temp4; rfastaFileNames[itBar->first][itPrimer->first] = fastaFileName2; m->openOutputFile(fastaFileName2, temp4); temp4.close(); } } if(qual){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = comboGroupName; if (fileOption) { variables["[tag]"] = "forward"; } string qualFileName = getOutputFileName("qfile", variables); if (uniqueNames.count(qualFileName) == 0) { outputNames.push_back(qualFileName); outputTypes["qfile"].push_back(qualFileName); } ofstream temp4; qualFileNames[itBar->first][itPrimer->first] = qualFileName; m->openOutputFile(qualFileName, temp4); temp4.close(); if (fileOption) { variables["[tag]"] = "reverse"; string qualFileName2 = getOutputFileName("qfile", variables); if (uniqueNames.count(qualFileName2) == 0) { outputNames.push_back(qualFileName2); outputTypes["qfile"].push_back(qualFileName2); } ofstream temp5; rqualFileNames[itBar->first][itPrimer->first] = qualFileName2; 
m->openOutputFile(qualFileName2, temp5); temp5.close(); } } } } } //make blank files for no matches ofstream temp, tempff, tempfq, rtemp, temprf, temprq; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = "scrap"; if (fileOption) { variables["[tag]"] = "forward"; } ffqnoMatchFile = getOutputFileName("fastq", variables); m->openOutputFile(ffqnoMatchFile, temp); temp.close(); if (fileOption) { variables["[tag]"] = "reverse"; rfqnoMatchFile = getOutputFileName("fastq", variables); m->openOutputFile(rfqnoMatchFile, rtemp); rtemp.close(); } if (fasta) { if (fileOption) { variables["[tag]"] = "forward"; } ffnoMatchFile = getOutputFileName("fasta", variables); m->openOutputFile(ffnoMatchFile, tempff); tempff.close(); if (fileOption) { variables["[tag]"] = "reverse"; rfnoMatchFile = getOutputFileName("fasta", variables); m->openOutputFile(rfnoMatchFile, temprf); temprf.close(); } } if (qual) { if (fileOption) { variables["[tag]"] = "forward"; } fqnoMatchFile = getOutputFileName("qfile", variables); m->openOutputFile(fqnoMatchFile, tempfq); tempfq.close(); if (fileOption) { variables["[tag]"] = "reverse"; rqnoMatchFile = getOutputFileName("qfile", variables); m->openOutputFile(rqnoMatchFile, temprq); temprq.close(); } } }else { map barcodes = oligos.getBarcodes() ; map primers = oligos.getPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->second); string barcodeName = oligos.getBarcodeName(itBar->second); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName 
= barcodeName + "." + primerName; } } if(itPrimer->first == ""){ comboName = itBar->first; }else{ if(itBar->first == ""){ comboName = itPrimer->first; } else{ comboName = itBar->first + "." + itPrimer->first; } } if (comboName != "") { comboGroupName += "_" + comboName; } ofstream temp; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = comboGroupName; string fastqFileName = getOutputFileName("fastq", variables); if (uniqueNames.count(fastqFileName) == 0) { outputNames.push_back(fastqFileName); uniqueNames.insert(fastqFileName); } fastqFileNames[itBar->second][itPrimer->second] = fastqFileName; m->openOutputFile(fastqFileName, temp); temp.close(); if(fasta){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = comboGroupName; string fastaFileName = getOutputFileName("fasta", variables); if (uniqueNames.count(fastaFileName) == 0) { outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); } ofstream temp3; fastaFileNames[itBar->second][itPrimer->second] = fastaFileName; m->openOutputFile(fastaFileName, temp3); temp3.close(); } if(qual){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = comboGroupName; string qualFileName = getOutputFileName("qfile", variables); if (uniqueNames.count(qualFileName) == 0) { outputNames.push_back(qualFileName); outputTypes["qfile"].push_back(qualFileName); } ofstream temp4; qualFileNames[itBar->second][itPrimer->second] = qualFileName; m->openOutputFile(qualFileName, temp4); temp4.close(); } } } } //make blank files for no matches ofstream temp, tempff, tempfq; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = "scrap"; ffqnoMatchFile = getOutputFileName("fastq", variables); m->openOutputFile(ffqnoMatchFile, temp); temp.close(); if (fasta) { ffnoMatchFile = 
getOutputFileName("fasta", variables); m->openOutputFile(ffnoMatchFile, tempff); tempff.close(); } if (qual) { fqnoMatchFile = getOutputFileName("qfile", variables); m->openOutputFile(fqnoMatchFile, tempfq); tempfq.close(); } } if (allBlank) { m->mothurOut("[WARNING]: your oligos file does not contain any group names. mothur will not create a groupfile."); m->mothurOutEndLine(); return false; } return true; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "readOligos"); exit(1); } } //*************************************************************************************************************** bool ParseFastaQCommand::readGroup(string groupfile){ try { fastqFileNames.clear(); groupMap = new GroupMap(); groupMap->readMap(groupfile); //like barcodeNameVector - no primer names vector<string> groups = groupMap->getNamesOfGroups(); fastqFileNames.resize(groups.size()); for (int i = 0; i < fastqFileNames.size(); i++) { for (int j = 0; j < 1; j++) { map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaQFile)); variables["[group]"] = groups[i]; string thisFilename = getOutputFileName("fastq",variables); outputNames.push_back(thisFilename); ofstream temp; m->openOutputFileBinary(thisFilename, temp); temp.close(); fastqFileNames[i].push_back(thisFilename); GroupToFile[groups[i]] = i; } } //make blank files for no matches ofstream temp, tempff, tempfq; map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[group]"] = "scrap"; ffqnoMatchFile = getOutputFileName("fastq", variables); m->openOutputFile(ffqnoMatchFile, temp); temp.close(); if (fasta) { ffnoMatchFile = getOutputFileName("fasta", variables); m->openOutputFile(ffnoMatchFile, tempff); tempff.close(); } if (qual) { fqnoMatchFile = getOutputFileName("qfile", variables); m->openOutputFile(fqnoMatchFile, tempfq); tempfq.close(); } return true; } catch(exception& e) { m->errorOut(e, "ParseFastaQCommand", "readGroup"); exit(1); 
} } //********************************************************************************************************************** mothur-1.36.1/source/commands/parsefastaqcommand.h000066400000000000000000000050501255543666200222610ustar00rootroot00000000000000#ifndef PARSEFASTAQCOMMAND_H #define PARSEFASTAQCOMMAND_H /* * parsefastaqcommand.h * Mothur * * Created by westcott on 9/30/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "trimoligos.h" #include "sequence.hpp" #include "groupmap.h" #include "oligos.h" struct fastqRead2 { string quality; Sequence seq; string wholeRead; fastqRead2() { }; fastqRead2(Sequence s, string q, string w) : seq(s), quality(q), wholeRead(w){}; ~fastqRead2() {}; }; class ParseFastaQCommand : public Command { public: ParseFastaQCommand(string); ParseFastaQCommand(); ~ParseFastaQCommand() {} vector setParameters(); string getCommandName() { return "fastq.info"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Parse.fastq"; } string getDescription() { return "reads a fastq file and creates a fasta and quality file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector outputNames; string outputDir, inputDir, fastaQFile, format, oligosfile, groupfile, file, inputfile, ffastq, rfastq; bool abort, fasta, qual, pacbio, pairedOligos, reorient, createFileGroup, hasIndex; int pdiffs, bdiffs, ldiffs, sdiffs, tdiffs, split, numBarcodes, numPrimers, numLinkers, numSpacers, numRPrimers, fileOption; GroupMap* groupMap; Oligos oligos; map file2Group; vector< vector > readFile(); vector > fastqFileNames; vector > rfastqFileNames; vector > fastaFileNames; vector > qualFileNames; vector > rfastaFileNames; vector > rqualFileNames; string ffqnoMatchFile, rfqnoMatchFile, ffnoMatchFile, rfnoMatchFile, fqnoMatchFile, rqnoMatchFile; vector Groups; map GroupToFile; int 
processFile(string inputfile, TrimOligos*&, TrimOligos*&); int processFile(vector inputfiles, TrimOligos*&, TrimOligos*&); vector convertQual(string); vector convertTable; bool readOligos(string oligosFile); bool readGroup(string oligosFile); fastqRead2 readFastq(ifstream&, bool&); int findGroup(fastqRead2, int&, int&, TrimOligos*&, TrimOligos*&, int, int); int findGroup(fastqRead2, int&, int&, string); int findGroup(fastqRead2, fastqRead2, int&, int&, TrimOligos*&, TrimOligos*&, int, int); }; #endif mothur-1.36.1/source/commands/parselistscommand.cpp000066400000000000000000000411031255543666200224720ustar00rootroot00000000000000/* * parselistcommand.cpp * Mothur * * Created by westcott on 2/24/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "parselistscommand.h" //********************************************************************************************************************** vector ParseListCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","list",false,true,true); parameters.push_back(plist); CommandParameter pcount("count", "InputTypes", "", "", "CountGroup", "CountGroup", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "CountGroup", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, 
"ParseListCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ParseListCommand::getHelpString(){ try { string helpString = ""; helpString += "The parse.list command reads a list and group or count file and generates a list file for each group in the group or count file. \n"; helpString += "The parse.list command parameters are list, group, count and label.\n"; helpString += "The list and group or count parameters are required.\n"; helpString += "If a count file is provided, mothur assumes the list file contains only unique names.\n"; helpString += "If a group file is provided, mothur assumes the list file contains all names.\n"; helpString += "The label parameter is used to read specific labels in your input you want to use.\n"; helpString += "The parse.list command should be used in the following format: parse.list(list=yourListFile, group=yourGroupFile, label=yourLabels).\n"; helpString += "Example: parse.list(list=abrecovery.fn.list, group=abrecovery.groups, label=0.03).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
list), '=' and parameters (i.e.yourListfile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ParseListCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ParseListCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "list") { pattern = "[filename],[group],[distance],list"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ParseListCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ParseListCommand::ParseListCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["list"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ParseListCommand", "ParseListCommand"); exit(1); } } //********************************************************************************************************************** ParseListCommand::ParseListCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["list"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string 
inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current list file. 
You must provide a list file."); m->mothurOutEndLine(); abort = true; } }else { m->setListFile(listfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(listfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not found") { groupfile = ""; groupMap = NULL; } else if (groupfile == "not open") { abort = true; groupfile = ""; groupMap = NULL; } else { m->setGroupFile(groupfile); groupMap = new GroupMap(groupfile); int error = groupMap->readMap(); if (error == 1) { abort = true; } } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not found") { countfile = ""; } else if (countfile == "not open") { abort = true; countfile = ""; } else { m->setCountTableFile(countfile); ct.readTable(countfile, true, false); if (!ct.hasGroupInfo()) { abort = true; m->mothurOut("[ERROR]: The parse.list command requires group info to be present in your countfile, quitting."); m->mothurOutEndLine(); } } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; }else if ((groupfile == "") && (countfile == "")) { m->mothurOut("[ERROR]: you must provide one of the following: group or count."); m->mothurOutEndLine(); abort=true; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; allLines = 1; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } } } catch(exception& e) { m->errorOut(e, "ParseListCommand", "ParseListCommand"); exit(1); } } //********************************************************************************************************************** int ParseListCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); if (m->control_pressed) { delete list; if (groupfile != "") { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete list; if (groupfile != "") { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); parse(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); list = input.getListVector(lastLabel); //get new list vector to process m->mothurOut(list->getLabel()); m->mothurOutEndLine(); parse(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input.getListVector(); //get new list vector to process 
} if (m->control_pressed) { if (groupfile != "") { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } if (m->control_pressed) { if (groupfile != "") { delete groupMap; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input.getListVector(lastLabel); //get new list vector to process m->mothurOut(list->getLabel()); m->mothurOutEndLine(); parse(list); delete list; } if (groupfile != "") { delete groupMap; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ParseListCommand", "execute"); exit(1); } } /**********************************************************************************************************************/ int ParseListCommand::parse(ListVector* thisList) { try { map filehandles; map::iterator it3; //set 
fileroot map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[distance]"] = thisList->getLabel(); //fill filehandles with necessary ofstreams ofstream* temp; vector<string> gGroups; if (groupfile != "") { gGroups = groupMap->getNamesOfGroups(); } else { gGroups = ct.getNamesOfGroups(); } for (int i=0; i<gGroups.size(); i++) { temp = new ofstream; filehandles[gGroups[i]] = temp; variables["[group]"] = gGroups[i]; string filename = getOutputFileName("list", variables); m->openOutputFile(filename, *temp); outputNames.push_back(filename); outputTypes["list"].push_back(filename); } map<string, string> groupVector; map<string, string> groupLabels; map<string, string>::iterator itGroup; map<string, int> groupNumBins; //print label for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { groupNumBins[it3->first] = 0; groupVector[it3->first] = ""; groupLabels[it3->first] = "label\tnumOtus"; } vector<string> binLabels = thisList->getLabels(); for (int i = 0; i < thisList->getNumBins(); i++) { if (m->control_pressed) { break; } map<string, string> groupBins; string bin = thisList->get(i); vector<string> names; m->splitAtComma(bin, names); //parses bin into individual sequence names //parse bin into list of sequences in each group for (int j = 0; j < names.size(); j++) { if (groupfile != "") { string group = groupMap->getGroup(names[j]); if (group == "not found") { m->mothurOut(names[j] + " is not in your groupfile. 
please correct."); m->mothurOutEndLine(); exit(1); } itGroup = groupBins.find(group); if(itGroup == groupBins.end()) { groupBins[group] = names[j]; //add first name groupNumBins[group]++; }else{ //add another name groupBins[group] = groupBins[group] + "," + names[j]; } }else{ vector thisSeqsGroups = ct.getGroups(names[j]); for (int k = 0; k < thisSeqsGroups.size(); k++) { string group = thisSeqsGroups[k]; itGroup = groupBins.find(group); if(itGroup == groupBins.end()) { groupBins[group] = names[j]; //add first name groupNumBins[group]++; }else{ //add another name groupBins[group] = groupBins[group] + "," + names[j]; } } } } //print parsed bin info to files for (itGroup = groupBins.begin(); itGroup != groupBins.end(); itGroup++) { groupVector[itGroup->first] += '\t' + itGroup->second; groupLabels[itGroup->first] += '\t' + binLabels[i]; } } if (m->control_pressed) { for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])).close(); delete it3->second; } return 0; } //end list vector for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])) << groupLabels[it3->first] << endl; (*(filehandles[it3->first])) << thisList->getLabel() << '\t' << groupNumBins[it3->first] << groupVector[it3->first] << endl; // label numBins listvector for that group (*(filehandles[it3->first])).close(); delete it3->second; } return 0; } catch(exception& e) { m->errorOut(e, "ParseListCommand", "parse"); exit(1); } } /**********************************************************************************************************************/ mothur-1.36.1/source/commands/parselistscommand.h000066400000000000000000000024031255543666200221370ustar00rootroot00000000000000#ifndef PARSELISTCOMMAND_H #define PARSELISTCOMMAND_H /* * parselistcommand.h * Mothur * * Created by westcott on 2/24/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "groupmap.h" #include "inputdata.h" #include "listvector.hpp" /***************************************************************************************/ class ParseListCommand : public Command { public: ParseListCommand(string); ParseListCommand(); ~ParseListCommand() {} vector<string> setParameters(); string getCommandName() { return "parse.list"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Parse.list"; } string getDescription() { return "parses a list file by group"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: int parse(ListVector*); ListVector* list; GroupMap* groupMap; CountTable ct; ofstream out; string outputDir, listfile, groupfile, label, countfile; set<string> labels; bool abort, allLines; vector<string> outputNames; }; /***************************************************************************************/ #endif mothur-1.36.1/source/commands/parsimonycommand.cpp000066400000000000000000000565671255543666200223460ustar00rootroot00000000000000/* * parsimonycommand.cpp * Mothur * * Created by Sarah Westcott on 1/26/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "parsimonycommand.h" #include "treereader.h" //********************************************************************************************************************** vector ParsimonyCommand::setParameters(){ try { CommandParameter ptree("tree", "InputTypes", "", "", "none", "none", "none","parsimony-psummary",false,true,true); parameters.push_back(ptree); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter prandom("random", "String", "", "", "", "", "","",false,false); parameters.push_back(prandom); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ParsimonyCommand::getHelpString(){ try { string helpString = 
""; helpString += "The parsimony command parameters are tree, group, name, count, random, groups, processors and iters. tree parameter is required unless you have valid current tree file or are using random.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 1 valid group.\n"; helpString += "The group names are separated by dashes. The iters parameter allows you to specify how many random trees you would like compared to your tree.\n"; helpString += "The parsimony command should be in the following format: parsimony(random=yourOutputFilename, groups=yourGroups, iters=yourIters).\n"; helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n"; helpString += "Example parsimony(random=out, iters=500).\n"; helpString += "The default value for random is "" (meaning you want to use the trees in your inputfile, randomtree=out means you just want the random distribution of trees outputted to out.rd_parsimony),\n"; helpString += "and iters is 1000. The parsimony command output two files: .parsimony and .psummary their descriptions are in the manual.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
random), '=' and parameters (i.e. yourOutputFilename).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ParsimonyCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "parsimony") { pattern = "[filename],parsimony"; } else if (type == "psummary") { pattern = "[filename],psummary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ParsimonyCommand::ParsimonyCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["parsimony"] = tempOutNames; outputTypes["psummary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "ParsimonyCommand"); exit(1); } } /***********************************************************/ ParsimonyCommand::ParsimonyCommand(string option) { try { abort = false; calledHelp = false; Groups.clear(); //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["parsimony"] = tempOutNames; outputTypes["psummary"] = tempOutNames; //if the user 
changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } randomtree = validParameter.validFile(parameters, "random", false); if (randomtree == "not found") { randomtree = ""; } //are you trying to use parsimony without reading a tree or saying you want random distribution if (randomtree == "") { //check for required parameters treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { treefile = ""; abort = true; } else if (treefile == "not found") { //if there is a current design file, use it treefile = m->getTreeFile(); if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current tree file and the tree parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setTreeFile(treefile); } //check for required parameters groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } } //if the user 
changes the output directory command factory will send this info to us in the output parameter string outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; if (randomtree == "") { outputDir += m->hasPath(treefile); } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; m->clearGroups(); } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } itersString = validParameter.validFile(parameters, "iters", false); if (itersString == "not found") { itersString = "1000"; } m->mothurConvert(itersString, iters); string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if (countfile=="") { if (namefile == "") { vector files; files.push_back(treefile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "ParsimonyCommand"); exit(1); } } /***********************************************************/ int ParsimonyCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //randomtree will tell us if user had their own treefile or if they just want the random distribution //user has entered their own tree if (randomtree == "") { m->setTreeFile(treefile); TreeReader* reader; if (countfile == "") { reader = new TreeReader(treefile, groupfile, namefile); } else { reader = new TreeReader(treefile, countfile); } T = reader->getTrees(); ct = T[0]->getCountTable(); delete reader; if(outputDir == "") { outputDir += m->hasPath(treefile); } map variables; variables["[filename]"] = outputDir + m->getSimpleName(treefile) + "."; output = new ColumnFile(getOutputFileName("parsimony",variables), itersString); 
outputNames.push_back(getOutputFileName("parsimony",variables)); outputTypes["parsimony"].push_back(getOutputFileName("parsimony",variables)); sumFile = getOutputFileName("psummary",variables); m->openOutputFile(sumFile, outSum); outputNames.push_back(sumFile); outputTypes["psummary"].push_back(sumFile); }else { //user wants random distribution getUserInput(); if(outputDir == "") { outputDir += m->hasPath(randomtree); } output = new ColumnFile(outputDir+ m->getSimpleName(randomtree), itersString); outputNames.push_back(outputDir+ m->getSimpleName(randomtree)); outputTypes["parsimony"].push_back(outputDir+ m->getSimpleName(randomtree)); } //set users groups to analyze SharedUtil util; vector mGroups = m->getGroups(); vector tGroups = ct->getNamesOfGroups(); util.setGroups(mGroups, tGroups, allGroups, numGroups, "parsimony"); //sets the groups the user wants to analyze util.getCombos(groupComb, mGroups, numComp); m->setGroups(mGroups); if (numGroups == 1) { numComp++; groupComb.push_back(allGroups); } Parsimony pars; counter = 0; Progress* reading; reading = new Progress("Comparing to random:", iters); if (m->control_pressed) { delete reading; delete output; delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (randomtree == "") { outSum.close(); } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); m->clearGroups(); return 0; } //get pscore for users tree userData.resize(numComp,0); //data = AB, AC, BC, ABC. randomData.resize(numComp,0); //data = AB, AC, BC, ABC. rscoreFreq.resize(numComp); uscoreFreq.resize(numComp); rCumul.resize(numComp); uCumul.resize(numComp); userTreeScores.resize(numComp); UScoreSig.resize(numComp); if (randomtree == "") { //get pscores for users trees for (int i = 0; i < T.size(); i++) { userData = pars.getValues(T[i], processors, outputDir); //data = AB, AC, BC, ABC. 
if (m->control_pressed) { delete reading; delete output; delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (randomtree == "") { outSum.close(); } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); m->clearGroups(); return 0; } //output scores for each combination for(int k = 0; k < numComp; k++) { //update uscoreFreq map::iterator it = uscoreFreq[k].find(userData[k]); if (it == uscoreFreq[k].end()) {//new score uscoreFreq[k][userData[k]] = 1; }else{ uscoreFreq[k][userData[k]]++; } //add users score to valid scores validScores[userData[k]] = userData[k]; //save score for summary file userTreeScores[k].push_back(userData[k]); } } //get pscores for random trees for (int j = 0; j < iters; j++) { //create new tree with same num nodes and leaves as users randT = new Tree(ct); //create random relationships between nodes randT->assembleRandomTree(); //get pscore of random tree randomData = pars.getValues(randT, processors, outputDir); if (m->control_pressed) { delete reading; delete output; delete randT; if (randomtree == "") { outSum.close(); } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } m->clearGroups(); return 0; } for(int r = 0; r < numComp; r++) { //add trees pscore to map of scores map::iterator it = rscoreFreq[r].find(randomData[r]); if (it != rscoreFreq[r].end()) {//already have that score rscoreFreq[r][randomData[r]]++; }else{//first time we have seen this score rscoreFreq[r][randomData[r]] = 1; } //add randoms score to validscores validScores[randomData[r]] = randomData[r]; } //update progress bar reading->update(j); delete randT; } }else { //get pscores for random trees for (int j = 0; j < iters; j++) { //create new tree with same num nodes and leaves as users randT = new Tree(ct); //create random relationships between nodes randT->assembleRandomTree(); if 
(m->control_pressed) { delete reading; delete output; delete randT; delete ct; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } //get pscore of random tree randomData = pars.getValues(randT, processors, outputDir); if (m->control_pressed) { delete reading; delete output; delete randT; delete ct; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } for(int r = 0; r < numComp; r++) { //add trees pscore to map of scores map::iterator it = rscoreFreq[r].find(randomData[r]); if (it != rscoreFreq[r].end()) {//already have that score rscoreFreq[r][randomData[r]]++; }else{//first time we have seen this score rscoreFreq[r][randomData[r]] = 1; } //add randoms score to validscores validScores[randomData[r]] = randomData[r]; } //update progress bar reading->update(j); delete randT; } } for(int a = 0; a < numComp; a++) { float rcumul = 0.0000; float ucumul = 0.0000; //this loop fills the cumulative maps and put 0.0000 in the score freq map to make it easier to print. 
for (map::iterator it = validScores.begin(); it != validScores.end(); it++) { if (randomtree == "") { map::iterator it2 = uscoreFreq[a].find(it->first); //user data has that score if (it2 != uscoreFreq[a].end()) { uscoreFreq[a][it->first] /= T.size(); ucumul+= it2->second; } else { uscoreFreq[a][it->first] = 0.0000; } //no user trees with that score //make uCumul map uCumul[a][it->first] = ucumul; } //make rscoreFreq map and rCumul map::iterator it2 = rscoreFreq[a].find(it->first); //get percentage of random trees with that score if (it2 != rscoreFreq[a].end()) { rscoreFreq[a][it->first] /= iters; rcumul+= it2->second; } else { rscoreFreq[a][it->first] = 0.0000; } //no random trees with that score rCumul[a][it->first] = rcumul; } //find the significance of each user tree's score when compared to the random trees and save it for printing the summary file for (int h = 0; h < userTreeScores[a].size(); h++) { UScoreSig[a].push_back(rCumul[a][userTreeScores[a][h]]); } } if (m->control_pressed) { delete reading; delete output; delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (randomtree == "") { outSum.close(); } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0; } //finish progress bar reading->finish(); delete reading; printParsimonyFile(); if (randomtree == "") { printUSummaryFile(); } delete output; delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } outputTypes.clear(); return 0;} m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "execute"); exit(1); } } /***********************************************************/ void 
ParsimonyCommand::printParsimonyFile() { try { vector data; vector tags; if (randomtree == "") { tags.push_back("Score"); tags.push_back("UserFreq"); tags.push_back("UserCumul"); tags.push_back("RandFreq"); tags.push_back("RandCumul"); }else { tags.push_back("Score"); tags.push_back("RandFreq"); tags.push_back("RandCumul"); } for(int a = 0; a < numComp; a++) { output->initFile(groupComb[a], tags); //print each line for (map::iterator it = validScores.begin(); it != validScores.end(); it++) { if (randomtree == "") { data.push_back(it->first); data.push_back(uscoreFreq[a][it->first]); data.push_back(uCumul[a][it->first]); data.push_back(rscoreFreq[a][it->first]); data.push_back(rCumul[a][it->first]); }else{ data.push_back(it->first); data.push_back(rscoreFreq[a][it->first]); data.push_back(rCumul[a][it->first]); } output->output(data); data.clear(); } output->resetFile(); } } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "printParsimonyFile"); exit(1); } } /***********************************************************/ int ParsimonyCommand::printUSummaryFile() { try { //column headers outSum << "Tree#" << '\t' << "Groups" << '\t' << "ParsScore" << '\t' << "ParsSig" << endl; m->mothurOut("Tree#\tGroups\tParsScore\tParsSig"); m->mothurOutEndLine(); //format output outSum.setf(ios::fixed, ios::floatfield); outSum.setf(ios::showpoint); //print each line for (int i = 0; i< T.size(); i++) { for(int a = 0; a < numComp; a++) { if (m->control_pressed) { outSum.close(); return 0; } if (UScoreSig[a][i] > (1/(float)iters)) { outSum << setprecision(6) << i+1 << '\t' << groupComb[a] << '\t' << userTreeScores[a][i] << setprecision(itersString.length()) << '\t' << UScoreSig[a][i] << endl; cout << setprecision(6) << i+1 << '\t' << groupComb[a] << '\t' << userTreeScores[a][i] << setprecision(itersString.length()) << '\t' << UScoreSig[a][i] << endl; m->mothurOutJustToLog(toString(i+1) + "\t" + groupComb[a] + "\t" + toString(userTreeScores[a][i]) + "\t" + 
toString(UScoreSig[a][i])); m->mothurOutEndLine(); }else { outSum << setprecision(6) << i+1 << '\t' << groupComb[a] << '\t' << userTreeScores[a][i] << setprecision(itersString.length()) << '\t' << "<" << (1/float(iters)) << endl; cout << setprecision(6) << i+1 << '\t' << groupComb[a] << '\t' << userTreeScores[a][i] << setprecision(itersString.length()) << '\t' << "<" << (1/float(iters)) << endl; m->mothurOutJustToLog(toString(i+1) + "\t" + groupComb[a] + "\t" + toString(userTreeScores[a][i]) + "\t<" + toString((1/float(iters)))); m->mothurOutEndLine(); } } } outSum.close(); return 0; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "printUSummaryFile"); exit(1); } } /***********************************************************/ void ParsimonyCommand::getUserInput() { try { //create count table ct = new CountTable(); m->mothurOut("Please enter the number of groups you would like to analyze: "); cin >> numGroups; m->mothurOutJustToLog(toString(numGroups)); m->mothurOutEndLine(); int num, count; count = 1; numEachGroup.resize(numGroups, 0); set nameMap; map groupMap; set gps; for (int i = 1; i <= numGroups; i++) { m->mothurOut("Please enter the number of sequences in group " + toString(i) + ": "); cin >> num; m->mothurOutJustToLog(toString(num)); m->mothurOutEndLine(); gps.insert(toString(i)); //assign sequence names to this group for (int j = 0; j < num; j++) { groupMap[toString(count)] = toString(i); nameMap.insert(toString(count)); count++; } } ct->createTable(nameMap, groupMap, gps); //clears buffer so next command doesn't have error string s; getline(cin, s); m->Treenames = ct->getNamesOfSeqs(); m->runParse = false; } catch(exception& e) { m->errorOut(e, "ParsimonyCommand", "getUserInput"); exit(1); } } /***********************************************************/ mothur-1.36.1/source/commands/parsimonycommand.h000066400000000000000000000063251255543666200217760ustar00rootroot00000000000000#ifndef PARSIMONYCOMMAND_H #define PARSIMONYCOMMAND_H /* * parsimonycommand.h * 
Mothur * * Created by Sarah Westcott on 1/26/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "parsimony.h" #include "counttable.h" #include "progress.hpp" #include "sharedutilities.h" #include "fileoutput.h" #include "readtree.h" class ParsimonyCommand : public Command { public: ParsimonyCommand(string); ParsimonyCommand(); ~ParsimonyCommand(){} vector setParameters(); string getCommandName() { return "parsimony"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Slatkin M, Maddison WP (1989). A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123: 603-13. \nSlatkin M, Maddison WP (1990). Detecting isolation by distance using phylogenies of genes. Genetics 126: 249-60. \nMartin AP (2002). Phylogenetic approaches for describing and comparing the diversity of microbial communities. Appl Environ Microbiol 68: 3673-82. \nSchloss PD, Handelsman J (2006). Introducing TreeClimber, a test to compare microbial community structure. Appl Environ Microbiol 72: 2379-84.\nhttp://www.mothur.org/wiki/Parsimony"; } string getDescription() { return "generic test that describes whether two or more communities have the same structure"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: FileOutput* output; vector T; //user trees Tree* randT; //random tree Tree* copyUserTree; CountTable* ct; CountTable* savect; vector groupComb; // AB, AC, BC... string sumFile, randomtree, allGroups, outputDir, treefile, groupfile, namefile, countfile; int iters, numGroups, numComp, counter, processors, numUniquesInName; vector numEachGroup; //vector containing the number of sequences in each group the user wants for the random distribution. vector< vector > userTreeScores; //scores for the user's trees for each combination. 
vector< vector > UScoreSig; //tree score significance when compared to random trees - percentage of random trees with that score or lower. EstOutput userData; //pscore info for user tree EstOutput randomData; //pscore info for random trees map validScores; //map contains scores from both user and random vector< map > rscoreFreq; //map -vector entry for each combination. vector< map > uscoreFreq; //map -vector entry for each combination. vector< map > rCumul; //map -vector entry for each combination. vector< map > uCumul; //map -vector entry for each combination. ofstream outSum; bool abort; string groups, itersString; vector Groups, outputNames; //holds groups to be used map nameMap; void printParsimonyFile(); int printUSummaryFile(); void getUserInput(); int readNamesFile(); }; #endif mothur-1.36.1/source/commands/pcacommand.cpp000066400000000000000000000431261255543666200210530ustar00rootroot00000000000000/* * pcacommand.cpp * mothur * * Created by westcott on 1/7/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "pcacommand.h" #include "inputdata.h" //********************************************************************************************************************** vector PCACommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","pca-loadings",false,false,true); parameters.push_back(pshared); CommandParameter prelabund("relabund", "InputTypes", "", "", "LRSS", "LRSS", "none","pca-loadings",false,false,true); parameters.push_back(prelabund); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pmetric("metric", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pmetric); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "PCACommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string PCACommand::getHelpString(){ try { string helpString = ""; helpString += "The pca command parameters are shared, relabund, label, groups and metric. shared or relabund is required unless you have a valid current file."; helpString += "The label parameter is used to analyze specific labels in your input. Default is the first label in your shared or relabund file. 
Multiple labels may be separated by dashes.\n"; helpString += "The groups parameter allows you to specify which groups you would like analyzed. Group names are separated by dashes.\n"; helpString += "The metric parameter allows you to indicate if you would like the Pearson correlation coefficient calculated. Default=True.\n"; helpString += "Example pca(groups=yourGroups).\n"; helpString += "Example pca(groups=A-B-C).\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e. yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PCACommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PCACommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "pca") { pattern = "[filename],[distance],pca.axes"; } else if (type == "loadings") { pattern = "[filename],[distance],pca.loadings"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PCACommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PCACommand::PCACommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["pca"] = tempOutNames; outputTypes["loadings"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PCACommand", "PCACommand"); exit(1); } } //********************************************************************************************************************** PCACommand::PCACommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); 
OptionParser parser(option); map parameters = parser. getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["pca"] = tempOutNames; outputTypes["loadings"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } } //check for required parameters sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { mode = "sharedfile"; inputFile = sharedfile; m->setSharedFile(sharedfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { relabundfile = ""; abort = true; } else if (relabundfile == "not found") { relabundfile = ""; } else { mode = "relabund"; inputFile = relabundfile; m->setRelAbundFile(relabundfile); } if ((sharedfile == "") && (relabundfile == "")) { //is there are current file available for any of these? 
//give priority to shared, then relabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputFile = sharedfile; mode = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { relabundfile = m->getRelAbundFile(); if (relabundfile != "") { inputFile = relabundfile; mode = "relabund"; m->mothurOut("Using " + relabundfile + " as input file for the relabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a relabund or shared file."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(inputFile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "metric", false); if (temp == "not found"){ temp = "T"; } metric = m->isTrue(temp); label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; if(labels.size() == 0) { m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); } } else { m->splitAtDash(label, labels); } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); } } catch(exception& e) { m->errorOut(e, "PCACommand", "PCACommand"); exit(1); } } //********************************************************************************************************************** int PCACommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); cerr.setf(ios::fixed, ios::floatfield); 
cerr.setf(ios::showpoint); //get first line of shared file vector< vector > matrix; InputData* input; if (mode == "sharedfile") { input = new InputData(inputFile, "sharedfile"); }else if (mode == "relabund") { input = new InputData(inputFile, "relabund"); }else { m->mothurOut("[ERROR]: filetype not recognized."); m->mothurOutEndLine(); return 0; } vector lookupFloat = input->getSharedRAbundFloatVectors(); string lastLabel = lookupFloat[0]->getLabel(); set processedLabels; set userLabels = labels; //if the user gave no labels, then use the first one read if (labels.size() == 0) { label = lastLabel; process(lookupFloat); } //as long as you are not at the end of the file or done wih the lines you want while((lookupFloat[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete input; for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat.clear(); return 0; } if(labels.count(lookupFloat[0]->getLabel()) == 1){ processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); process(lookupFloat); } if ((m->anyLabelsToProcess(lookupFloat[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookupFloat[0]->getLabel(); for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat.clear(); lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); process(lookupFloat); processedLabels.insert(lookupFloat[0]->getLabel()); userLabels.erase(lookupFloat[0]->getLabel()); //restore real lastlabel to save below lookupFloat[0]->setLabel(saveLabel); } lastLabel = lookupFloat[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat.clear(); lookupFloat = input->getSharedRAbundFloatVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); 
i++) { m->mothurRemove(outputNames[i]); } delete input; for (int i = 0; i < lookupFloat.size(); i++) { delete lookupFloat[i]; } lookupFloat.clear(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat.clear(); lookupFloat = input->getSharedRAbundFloatVectors(lastLabel); process(lookupFloat); for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat.clear(); } for (int i = 0; i < lookupFloat.size(); i++) { if (lookupFloat[i] != NULL) { delete lookupFloat[i]; } } lookupFloat.clear(); delete input; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PCACommand", "execute"); exit(1); } } /********************************************************************************************************************** vector< vector > PCACommand::createMatrix(vector lookupFloat){ try { vector< vector > matrix; matrix.resize(lookupFloat.size()); //fill matrix with shared files relative abundances for (int i = 0; i < lookupFloat.size(); i++) { for (int j = 0; j < lookupFloat[i]->getNumBins(); j++) { 
				matrix[i].push_back(lookupFloat[i]->getAbundance(j));
			}
		}
		
		vector< vector<double> > transposeMatrix;
		transposeMatrix.resize(matrix[0].size());
		for (int i = 0; i < transposeMatrix.size(); i++) {
			for (int j = 0; j < matrix.size(); j++) {
				transposeMatrix[i].push_back(matrix[j][i]);
			}
		}
		
		matrix = linearCalc.matrix_mult(matrix, transposeMatrix);
		
		return matrix;
	}
	catch(exception& e) {
		m->errorOut(e, "PCACommand", "createMatrix");
		exit(1);
	}
}*/
//**********************************************************************************************************************
int PCACommand::process(vector<SharedRAbundFloatVector*>& lookupFloat){
	try {
		m->mothurOut("\nProcessing " + lookupFloat[0]->getLabel()); m->mothurOutEndLine();
		
		int numOTUs = lookupFloat[0]->getNumBins();
		int numSamples = lookupFloat.size();
		
		vector< vector<double> > matrix(numSamples);
		vector<double> colMeans(numOTUs);
		
		//fill matrix with shared relative abundances, re-center
		for (int i = 0; i < lookupFloat.size(); i++) {
			matrix[i].resize(numOTUs, 0);
			
			for (int j = 0; j < numOTUs; j++) {
				matrix[i][j] = lookupFloat[i]->getAbundance(j);
				colMeans[j] += matrix[i][j];
			}
		}
		
		//convert the column sums into column means
		for(int j=0;j<numOTUs;j++){
			colMeans[j] = colMeans[j] / (double)numSamples;
		}
		
		//subtract each OTU's mean so every column is centered on zero
		vector<vector<double> > centered = matrix;
		for(int i=0;i<numSamples;i++){
			for(int j=0;j<numOTUs;j++){
				centered[i][j] = centered[i][j] - colMeans[j];
			}
		}
		
		vector< vector<double> > transpose(numOTUs);
		for (int i = 0; i < numOTUs; i++) {
			transpose[i].resize(numSamples, 0);
			
			for (int j = 0; j < numSamples; j++) {
				transpose[i][j] = centered[j][i];
			}
		}
		
		vector<vector<double> > crossProduct = linearCalc.matrix_mult(transpose, centered);
		
		vector<double> d;
		vector<double> e;
		
		linearCalc.tred2(crossProduct, d, e);		if (m->control_pressed) { return 0; }
		linearCalc.qtli(d, e, crossProduct);		if (m->control_pressed) { return 0; }
		
		vector<vector<double> > X = linearCalc.matrix_mult(centered, crossProduct);
		
		if (m->control_pressed) { return 0; }
		
		string fbase = outputDir + m->getRootName(m->getSimpleName(inputFile));
		//string outputFileName = fbase + lookupFloat[0]->getLabel();
		output(fbase, lookupFloat[0]->getLabel(), m->getGroups(), X, d);
		
		if (metric) {
			
			vector<vector<double> > observedEuclideanDistance = linearCalc.getObservedEuclideanDistance(centered);
			
			for (int i = 1; i < 4; i++) {
				
				vector< vector<double> > 
PCAEuclidDists = linearCalc.calculateEuclidianDistance(X, i); //G is the pca file if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } double corr = linearCalc.calcPearson(PCAEuclidDists, observedEuclideanDistance); m->mothurOut("Rsq " + toString(i) + " axis: " + toString(corr * corr)); m->mothurOutEndLine(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } } } return 0; } catch(exception& e) { m->errorOut(e, "PCACommand", "process"); exit(1); } } /*********************************************************************************************************************************/ void PCACommand::output(string fbase, string label, vector name_list, vector >& G, vector d) { try { int numEigenValues = d.size(); double dsum = 0.0000; for(int i=0;i variables; variables["[filename]"] = fbase; variables["[distance]"] = label; string pcaFileName = getOutputFileName("pca",variables); m->openOutputFile(pcaFileName, pcaData); pcaData.setf(ios::fixed, ios::floatfield); pcaData.setf(ios::showpoint); outputNames.push_back(pcaFileName); outputTypes["pca"].push_back(pcaFileName); ofstream pcaLoadings; string loadingsFilename = getOutputFileName("loadings",variables); m->openOutputFile(loadingsFilename, pcaLoadings); pcaLoadings.setf(ios::fixed, ios::floatfield); pcaLoadings.setf(ios::showpoint); outputNames.push_back(loadingsFilename); outputTypes["loadings"].push_back(loadingsFilename); pcaLoadings << "axis\tloading\n"; for(int i=0;ierrorOut(e, "PCACommand", "output"); exit(1); } } /*********************************************************************************************************************************/ mothur-1.36.1/source/commands/pcacommand.h000066400000000000000000000027411255543666200205160ustar00rootroot00000000000000#ifndef PCACOMMAND_H #define PCACOMMAND_H /* * pcacommand.h * mothur * * Created by westcott on 1/7/11. 
* Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "linearalgebra.h" #include "sharedrabundfloatvector.h" /*****************************************************************/ class PCACommand : public Command { public: PCACommand(string); PCACommand(); ~PCACommand() {} vector setParameters(); string getCommandName() { return "pca"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "McCune B, Grace JB, Urban DL (2002). Analysis of ecological communities. MjM Software Design: Gleneden Beach, OR. \nLegendre P, Legendre L (1998). Numerical Ecology. Elsevier: New York. \nhttp://www.mothur.org/wiki/Pca"; } string getDescription() { return "pca"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, metric; string outputDir, mode, inputFile, label, groups, sharedfile, relabundfile; vector outputNames, Groups; set labels; LinearAlgebra linearCalc; //vector< vector > createMatrix(vector); int process(vector&); void output(string, string, vector, vector >&, vector); }; /*****************************************************************/ #endif mothur-1.36.1/source/commands/pcoacommand.cpp000066400000000000000000000240771255543666200212360ustar00rootroot00000000000000 /* * pcacommand.cpp * Mothur * * Created by westcott on 1/4/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "pcoacommand.h"
#include "readphylipvector.h"

//**********************************************************************************************************************
vector<string> PCOACommand::setParameters(){
	try {
		CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","pcoa-loadings",false,true,true); parameters.push_back(pphylip);
		CommandParameter pmetric("metric", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pmetric);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);
		
		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "PCOACommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string PCOACommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The pcoa command parameters are phylip and metric.\n";
		helpString += "The phylip parameter allows you to enter your distance file.\n";
		helpString += "The metric parameter allows you to indicate whether you would like the Pearson correlation coefficient calculated. Default=True.\n";
		helpString += "Example pcoa(phylip=yourDistanceFile).\n";
		helpString += "Note: No spaces between parameter labels (i.e. 
phylip), '=' and parameters (i.e.yourDistanceFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PCOACommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PCOACommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "pcoa") { pattern = "[filename],pcoa.axes"; } else if (type == "loadings") { pattern = "[filename],pcoa.loadings"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PCOACommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PCOACommand::PCOACommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["pcoa"] = tempOutNames; outputTypes["loadings"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PCOACommand", "PCOACommand"); exit(1); } } //********************************************************************************************************************** PCOACommand::PCOACommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser. 
getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } } //initialize outputTypes vector tempOutNames; outputTypes["pcoa"] = tempOutNames; outputTypes["loadings"] = tempOutNames; //required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { abort = true; } else if (phylipfile == "not found") { //if there is a current phylip file, use it phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current phylip file and the phylip parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setPhylipFile(phylipfile); } filename = phylipfile; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(phylipfile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "metric", false); if (temp == "not found"){ temp = "T"; } metric = m->isTrue(temp); } } catch(exception& e) { 
m->errorOut(e, "PCOACommand", "PCOACommand"); exit(1); } } //********************************************************************************************************************** int PCOACommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); cerr.setf(ios::fixed, ios::floatfield); cerr.setf(ios::showpoint); vector names; vector > D; fbase = outputDir + m->getRootName(m->getSimpleName(filename)); ReadPhylipVector readFile(filename); names = readFile.read(D); if (m->control_pressed) { return 0; } double offset = 0.0000; vector d; vector e; vector > G = D; //vector > copy_G; m->mothurOut("\nProcessing...\n"); for(int count=0;count<2;count++){ linearCalc.recenter(offset, D, G); if (m->control_pressed) { return 0; } linearCalc.tred2(G, d, e); if (m->control_pressed) { return 0; } linearCalc.qtli(d, e, G); if (m->control_pressed) { return 0; } offset = d[d.size()-1]; if(offset > 0.0) break; } if (m->control_pressed) { return 0; } output(fbase, names, G, d); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (metric) { for (int i = 1; i < 4; i++) { vector< vector > EuclidDists = linearCalc.calculateEuclidianDistance(G, i); //G is the pcoa file if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } double corr = linearCalc.calcPearson(EuclidDists, D); //G is the pcoa file, D is the users distance matrix m->mothurOut("Rsq " + toString(i) + " axis: " + toString(corr * corr)); m->mothurOutEndLine(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } 
catch(exception& e) { m->errorOut(e, "PCOACommand", "execute"); exit(1); } } /*********************************************************************************************************************************/ void PCOACommand::get_comment(istream& f, char begin, char end){ try { char d=f.get(); while(d != end){ d = f.get(); } d = f.peek(); } catch(exception& e) { m->errorOut(e, "PCOACommand", "get_comment"); exit(1); } } /*********************************************************************************************************************************/ void PCOACommand::output(string fnameRoot, vector name_list, vector >& G, vector d) { try { int rank = name_list.size(); double dsum = 0.0000; for(int i=0;i= 0) { G[i][j] *= pow(d[j],0.5); } else { G[i][j] = 0.00000; } } } ofstream pcaData; map variables; variables["[filename]"] = fnameRoot; string pcoaDataFile = getOutputFileName("pcoa",variables); m->openOutputFile(pcoaDataFile, pcaData); pcaData.setf(ios::fixed, ios::floatfield); pcaData.setf(ios::showpoint); outputNames.push_back(pcoaDataFile); outputTypes["pcoa"].push_back(pcoaDataFile); ofstream pcaLoadings; string loadingsFile = getOutputFileName("loadings",variables); m->openOutputFile(loadingsFile, pcaLoadings); pcaLoadings.setf(ios::fixed, ios::floatfield); pcaLoadings.setf(ios::showpoint); outputNames.push_back(loadingsFile); outputTypes["loadings"].push_back(loadingsFile); pcaLoadings << "axis\tloading\n"; for(int i=0;ierrorOut(e, "PCOACommand", "output"); exit(1); } } /*********************************************************************************************************************************/ mothur-1.36.1/source/commands/pcoacommand.h000066400000000000000000000024571255543666200207010ustar00rootroot00000000000000#ifndef PCOACOMMAND_H #define PCOACOMMAND_H /* * pcoacommand.h * Mothur * * Created by westcott on 1/4/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "linearalgebra.h" /*****************************************************************/ class PCOACommand : public Command { public: PCOACommand(string); PCOACommand(); ~PCOACommand(){} vector setParameters(); string getCommandName() { return "pcoa"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "McCune B, Grace JB, Urban DL (2002). Analysis of ecological communities. MjM Software Design: Gleneden Beach, OR. \nLegendre P, Legendre L (1998). Numerical Ecology. Elsevier: New York. \nhttp://www.mothur.org/wiki/Pcoa"; } string getDescription() { return "pcoa"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, metric; string phylipfile, filename, fbase, outputDir; vector outputNames; LinearAlgebra linearCalc; void get_comment(istream&, char, char); void output(string, vector, vector >&, vector); }; /*****************************************************************/ #endif mothur-1.36.1/source/commands/pcrseqscommand.cpp000066400000000000000000001657541255543666200220040ustar00rootroot00000000000000// // prcseqscommand.cpp // Mothur // // Created by Sarah Westcott on 3/14/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "pcrseqscommand.h" //********************************************************************************************************************** vector PcrSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter poligos("oligos", "InputTypes", "", "", "ecolioligos", "none", "none","",false,false,true); parameters.push_back(poligos); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter ptax("taxonomy", "InputTypes", "", "", "none", "none", "none","taxonomy",false,false,true); parameters.push_back(ptax); CommandParameter pecoli("ecoli", "InputTypes", "", "", "ecolioligos", "none", "none","",false,false); parameters.push_back(pecoli); CommandParameter pstart("start", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pstart); CommandParameter pend("end", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pend); CommandParameter pnomatch("nomatch", "Multiple", "reject-keep", "reject", "", "", "","",false,false); parameters.push_back(pnomatch); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(ppdiffs); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pkeepprimer("keepprimer", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pkeepprimer); CommandParameter pkeepdots("keepdots", "Boolean", "", "T", "", "", "","",false,false); 
		parameters.push_back(pkeepdots);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);
		
		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "PcrSeqsCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string PcrSeqsCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The pcr.seqs command reads a fasta file.\n";
		helpString += "The pcr.seqs command parameters are fasta, oligos, name, group, count, taxonomy, ecoli, start, end, nomatch, pdiffs, processors, keepprimer and keepdots.\n";
		helpString += "The ecoli parameter is used to provide a fasta file containing a single reference sequence (e.g. for E. coli); this file must be aligned. Mothur will trim to the start and end positions of the reference sequence.\n";
		helpString += "The start parameter allows you to provide a starting position to trim to.\n";
		helpString += "The end parameter allows you to provide an ending position to trim from.\n";
		helpString += "The nomatch parameter allows you to decide what to do with sequences where the primer is not found. Default=reject, meaning remove from fasta file. 
If nomatch=keep, then do nothing to sequence.\n";
		helpString += "The processors parameter allows you to use multiple processors.\n";
		helpString += "The keepprimer parameter allows you to keep the primer, default=false.\n";
		helpString += "The keepdots parameter allows you to keep the leading and trailing .'s, default=true.\n";
		helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n";
		helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFasta).\n";
		helpString += "For more details please check out the wiki http://www.mothur.org/wiki/Pcr.seqs .\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "PcrSeqsCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string PcrSeqsCommand::getOutputPattern(string type) {
	try {
		string pattern = "";
		
		if (type == "fasta") { pattern = "[filename],pcr,[extension]-[filename],[tag],pcr,[extension]"; }
		else if (type == "taxonomy") { pattern = "[filename],pcr,[extension]"; }
		else if (type == "name") { pattern = "[filename],pcr,[extension]"; }
		else if (type == "group") { pattern = "[filename],pcr,[extension]"; }
		else if (type == "count") { pattern = "[filename],pcr,[extension]"; }
		else if (type == "accnos") { pattern = "[filename],bad.accnos"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }
		
		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "PcrSeqsCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
PcrSeqsCommand::PcrSeqsCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["fasta"] = tempOutNames;
		outputTypes["taxonomy"] = tempOutNames;
		outputTypes["group"] = tempOutNames;
outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["accnos"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "PcrSeqsCommand"); exit(1); } } //*************************************************************************************************************** PcrSeqsCommand::PcrSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("ecoli"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["ecoli"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else if (fastafile == "not open") { fastafile = ""; abort = true; } else { m->setFastaFile(fastafile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(fastafile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; temp = validParameter.validFile(parameters, "keepprimer", false); if (temp == "not found") { temp = "f"; } keepprimer = m->isTrue(temp); temp = validParameter.validFile(parameters, "keepdots", false); if (temp == "not found") { temp = "t"; } keepdots = m->isTrue(temp); temp = validParameter.validFile(parameters, "oligos", true); if (temp == "not found"){ oligosfile = ""; } else if(temp == "not open"){ oligosfile = ""; abort = true; } else { oligosfile = temp; m->setOligosFile(oligosfile); } ecolifile = validParameter.validFile(parameters, "ecoli", true); if (ecolifile == "not found"){ ecolifile = ""; } else if(ecolifile == "not open"){ ecolifile = ""; abort = true; } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not found"){ namefile = ""; } else if(namefile == "not open"){ namefile = ""; abort = true; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not found"){ groupfile = ""; } else if(groupfile == "not open"){ 
groupfile = ""; abort = true; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not found"){ taxfile = ""; } else if(taxfile == "not open"){ taxfile = ""; abort = true; } else { m->setTaxonomyFile(taxfile); } temp = validParameter.validFile(parameters, "start", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, start); temp = validParameter.validFile(parameters, "end", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, end); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, pdiffs); nomatch = validParameter.validFile(parameters, "nomatch", false); if (nomatch == "not found") { nomatch = "reject"; } if ((nomatch != "reject") && (nomatch != "keep")) { m->mothurOut("[ERROR]: " + nomatch + " is not a valid entry for nomatch. Choices are reject and keep.\n"); abort = true; } //didnt set anything if ((oligosfile == "") && (ecolifile == "") && (start == -1) && (end == -1)) { m->mothurOut("[ERROR]: You did not set any options. 
Please provide an oligos or ecoli file, or set start or end.\n"); abort = true; } if ((oligosfile == "") && (ecolifile == "") && (start < 0) && (end == -1)) { m->mothurOut("[ERROR]: Invalid start value.\n"); abort = true; } if ((ecolifile != "") && (start != -1) && (end != -1)) { m->mothurOut("[ERROR]: You provided an ecoli file , but set the start or end parameters. Unsure what you intend. When you provide the ecoli file, mothur thinks you want to use the start and end of the sequence in the ecoli file.\n"); abort = true; } if ((oligosfile != "") && (ecolifile != "")) { m->mothurOut("[ERROR]: You can not use an ecoli file at the same time as an oligos file.\n"); abort = true; } //check to make sure you didn't forget the name file by mistake if (countfile == "") { if (namefile == "") { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "PcrSeqsCommand"); exit(1); } } //*************************************************************************************************************** int PcrSeqsCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); fileAligned = true; pairedOligos = false; string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string trimSeqFile = getOutputFileName("fasta",variables); outputNames.push_back(trimSeqFile); outputTypes["fasta"].push_back(trimSeqFile); variables["[tag]"] = "scrap"; string badSeqFile = getOutputFileName("fasta",variables); length = 0; if(oligosfile != ""){ readOligos(); if (m->debug) { m->mothurOut("[DEBUG]: read oligos file. 
numprimers = " + toString(numFPrimers) + ", revprimers = " + toString(numRPrimers) + ".\n"); } } if (m->control_pressed) { return 0; } if(ecolifile != "") { readEcoli(); } if (m->control_pressed) { return 0; } vector positions; int numFastaSeqs = 0; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else if (processors == 1) { lines.push_back(linePair(0, 1000)); }else { positions = m->setFilePosFasta(fastafile, numFastaSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif if (m->control_pressed) { return 0; } set badNames; numFastaSeqs = createProcesses(fastafile, trimSeqFile, badSeqFile, badNames); if (m->control_pressed) { return 0; } //don't write or keep if blank if (badNames.size() != 0) { writeAccnos(badNames); } if (m->isBlank(badSeqFile)) { m->mothurRemove(badSeqFile); } else { outputNames.push_back(badSeqFile); outputTypes["fasta"].push_back(badSeqFile); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (namefile != "") { readName(badNames); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (groupfile != "") { readGroup(badNames); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (taxfile != "") { readTax(badNames); } if (m->control_pressed) { for (int 
i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (countfile != "") { readCount(badNames); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to screen " + toString(numFastaSeqs) + " sequences."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "execute"); exit(1); } } /**************************************************************************************************/ int PcrSeqsCommand::createProcesses(string filename, string goodFileName, 
string badFileName, set<string>& badSeqNames) {
    try {

        vector<int> processIDS;
        int process = 1;
        int num = 0;
        int pstart = -1; int pend = -1;
        bool adjustNeeded = false;
        bool recalc = false;

#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)

        //loop through and create all the processes you want
        while (process != processors) {
            pid_t pid = fork();

            if (pid > 0) {
                processIDS.push_back(pid);  //create map from line number to pid so you can append files in correct order later
                process++;
            }else if (pid == 0){
                string locationsFile = m->mothurGetpid(process) + ".temp";
                num = driverPcr(filename, goodFileName + m->mothurGetpid(process) + ".temp", badFileName + m->mothurGetpid(process) + ".temp", locationsFile, badSeqNames, lines[process], pstart, adjustNeeded);

                //pass numSeqs to parent
                ofstream out;
                string tempFile = filename + m->mothurGetpid(process) + ".num.temp";
                m->openOutputFile(tempFile, out);
                out << pstart << '\t' << adjustNeeded << endl;
                out << num << '\t' << badSeqNames.size() << endl;
                for (set<string>::iterator it = badSeqNames.begin(); it != badSeqNames.end(); it++) {
                    out << (*it) << endl;
                }
                out.close();

                exit(0);
            }else {
                m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process;
                for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                //wait to die
                for (int i=0;i<processIDS.size();i++) {
                    int temp = processIDS[i];
                    wait(&temp);
                }
                m->control_pressed = false;
                for (int i=0;i<processIDS.size();i++) {
                    m->mothurRemove(filename + (toString(processIDS[i]) + ".num.temp"));
                }
                recalc = true;
                break;
            }
        }

        if (recalc) {
            //test line, also set recalc to true.
            //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) {m->mothurRemove(filename + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n");

            //redo file divide
            lines.clear();
            vector<unsigned long long> positions = m->divideFile(filename, processors);
            for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); }

            num = 0;
            processIDS.resize(0);
            process = 1;

            //loop through and create all the processes you want
            while (process != processors) {
                pid_t pid = fork();

                if (pid > 0) {
                    processIDS.push_back(pid);  //create map from line number to pid so you can append files in correct order later
                    process++;
                }else if (pid == 0){
                    string locationsFile = m->mothurGetpid(process) + ".temp";
                    num = driverPcr(filename, goodFileName + m->mothurGetpid(process) + ".temp", badFileName + m->mothurGetpid(process) + ".temp", locationsFile, badSeqNames, lines[process], pstart, adjustNeeded);

                    //pass numSeqs to parent
                    ofstream out;
                    string tempFile = filename + m->mothurGetpid(process) + ".num.temp";
                    m->openOutputFile(tempFile, out);
                    out << pstart << '\t' << adjustNeeded << endl;
                    out << num << '\t' << badSeqNames.size() << endl;
                    for (set<string>::iterator it = badSeqNames.begin(); it != badSeqNames.end(); it++) {
                        out << (*it) << endl;
                    }
                    out.close();

                    exit(0);
                }else {
                    m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine();
                    for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); }
                    exit(0);
                }
            }
        }

        string locationsFile = m->mothurGetpid(process) + ".temp";
        num = driverPcr(filename, goodFileName, badFileName, locationsFile, badSeqNames, lines[0], pstart, adjustNeeded);

        //force parent to wait until all the processes are done
        for (int i=0;i<processIDS.size();i++) {
            int temp = processIDS[i];
            wait(&temp);
        }

        for (int i = 0; i < processIDS.size(); i++) {
            ifstream in;
            string tempFile = filename + toString(processIDS[i]) + ".num.temp";
            m->openInputFile(tempFile, in);
            int numBadNames = 0; string name = "";
            int tpstart = -1;
            bool
tempAdjust = false; if (!in.eof()) { in >> tpstart >> tempAdjust; m->gobble(in); if (tempAdjust) { adjustNeeded = true; } if (tpstart != -1) { if (tpstart != pstart) { adjustNeeded = true; } if (tpstart < pstart) { pstart = tpstart; } //smallest start } int tempNum = 0; in >> tempNum >> numBadNames; num += tempNum; m->gobble(in); } for (int j = 0; j < numBadNames; j++) { in >> name; m->gobble(in); badSeqNames.insert(name); } in.close(); m->mothurRemove(tempFile); m->appendFiles((goodFileName + toString(processIDS[i]) + ".temp"), goodFileName); m->mothurRemove((goodFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((badFileName + toString(processIDS[i]) + ".temp"), badFileName); m->mothurRemove((badFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((toString(processIDS[i]) + ".temp"), locationsFile); m->mothurRemove((toString(processIDS[i]) + ".temp")); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the sumScreenData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to badSeqNames. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; string locationsFile = "locationsFile.txt"; m->mothurRemove(locationsFile); m->mothurRemove(goodFileName); m->mothurRemove(badFileName); //Create processor worker threads. for( int i=0; icount; if (pDataArray[i]->count != pDataArray[i]->fend) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->fend) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } if (pDataArray[i]->adjustNeeded) { adjustNeeded = true; } if (pDataArray[i]->pstart != -1) { if (pDataArray[i]->pstart != pstart) { adjustNeeded = true; } if (pDataArray[i]->pstart < pstart) { pstart = pDataArray[i]->pstart; } } //smallest start for (set::iterator it = pDataArray[i]->badSeqNames.begin(); it != pDataArray[i]->badSeqNames.end(); it++) { badSeqNames.insert(*it); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } for (int i = 0; i < processIDS.size(); i++) { m->appendFiles((goodFileName + toString(processIDS[i]) + ".temp"), goodFileName); m->mothurRemove((goodFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((badFileName + toString(processIDS[i]) + ".temp"), badFileName); m->mothurRemove((badFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((locationsFile+toString(processIDS[i]) + ".temp"), locationsFile); m->mothurRemove((locationsFile+toString(processIDS[i]) + ".temp")); } #endif if (fileAligned && adjustNeeded) { //find pend - pend is the biggest ending value, but we must account for when we adjust the start. That adjustment may make the "new" end larger then the largest end. So lets find out what that "new" end will be. 
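/* Illustration only, not part of mothur. The search for pend described in the comment above can be sketched as a standalone function. The name largestAdjustedEnd and the vector-of-pairs input are assumptions made for this sketch; the real code streams the (start, end) records from the locations file instead. A value of -1 means "not set", as in the surrounding code.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper. Each pair holds the (start, end) trim positions
// recorded for one sequence; pstart is the smallest start seen overall.
int largestAdjustedEnd(const std::vector<std::pair<int, int> >& locations, int pstart) {
    assert(pstart >= -1);
    int pend = -1;
    for (std::size_t i = 0; i < locations.size(); i++) {
        int thisStart = locations[i].first;
        int thisEnd = locations[i].second;
        if (thisEnd == -1) { continue; }  // no end recorded for this sequence

        // A sequence that starts later than pstart gets padded on the left
        // by (thisStart - pstart) columns, which pushes its end to the right.
        int shift = 0;
        if ((pstart != -1) && (thisStart != -1)) { shift = thisStart - pstart; }

        int adjustedEnd = thisEnd + shift;
        if (adjustedEnd > pend) { pend = adjustedEnd; }
    }
    return pend;
}
```

With records (5,100) and (8,98) and pstart = 5, the second sequence is padded by 3 columns, so its end moves to 101 and becomes the largest end even though 98 is smaller than 100. */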
ifstream inLocations; m->openInputFile(locationsFile, inLocations); while(!inLocations.eof()) { if (m->control_pressed) { break; } string name = ""; int thisStart = -1; int thisEnd = -1; if (numFPrimers != 0) { inLocations >> name >> thisStart; m->gobble(inLocations); } if (numRPrimers != 0) { inLocations >> name >> thisEnd; m->gobble(inLocations); } else { pend = -1; break; } int myDiff = 0; if (pstart != -1) { if (thisStart != -1) { if (thisStart != pstart) { myDiff += (thisStart - pstart); } } } int myEnd = thisEnd + myDiff; //cout << name << '\t' << thisStart << '\t' << thisEnd << " diff = " << myDiff << '\t' << myEnd << endl; if (thisEnd != -1) { if (myEnd > pend) { pend = myEnd; } } } inLocations.close(); adjustDots(goodFileName, locationsFile, pstart, pend); }else { m->mothurRemove(locationsFile); } return num; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** int PcrSeqsCommand::driverPcr(string filename, string goodFasta, string badFasta, string locationsName, set& badSeqNames, linePair filePos, int& pstart, bool& adjustNeeded){ try { ofstream goodFile; m->openOutputFile(goodFasta, goodFile); ofstream badFile; m->openOutputFile(badFasta, badFile); ofstream locationsFile; m->openOutputFile(locationsName, locationsFile); ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos.start); bool done = false; int count = 0; set lengths; set locations; //locations[0] = beginning locations, //pdiffs, bdiffs, primers, barcodes, revPrimers map primers; map barcodes; //not used vector revPrimer; if (pairedOligos) { map primerPairs = oligos.getPairedPrimers(); for (map::iterator it = primerPairs.begin(); it != primerPairs.end(); it++) { primers[(it->second).forward] = it->first; revPrimer.push_back((it->second).reverse); } }else{ primers = oligos.getPrimers(); revPrimer = 
oligos.getReversePrimers(); } TrimOligos trim(pdiffs, 0, primers, barcodes, revPrimer); while (!done) { if (m->control_pressed) { break; } Sequence currSeq(inFASTA); m->gobble(inFASTA); if (fileAligned) { //assume aligned until proven otherwise lengths.insert(currSeq.getAligned().length()); if (lengths.size() > 1) { fileAligned = false; } } string trashCode = ""; string locationsString = ""; int thisPStart = -1; int thisPEnd = -1; int totalDiffs = 0; string commentString = ""; if (m->control_pressed) { break; } if (currSeq.getName() != "") { if (m->debug) { m->mothurOut("[DEBUG]: seq name = " + currSeq.getName() + ".\n"); } bool goodSeq = true; if (oligosfile != "") { map mapAligned; bool aligned = isAligned(currSeq.getAligned(), mapAligned); //process primers if (primers.size() != 0) { int primerStart = 0; int primerEnd = 0; vector results = trim.findForward(currSeq, primerStart, primerEnd); bool good = true; if (results[0] > pdiffs) { good = false; } totalDiffs += results[0]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trim.getCodeValue(results[1], pdiffs) + ") "; if(!good){ if (nomatch == "reject") { goodSeq = false; } trashCode += "f"; } else{ //are you aligned if (aligned) { if (!keepprimer) { if (keepdots) { currSeq.filterToPos(mapAligned[primerEnd-1]+1); } //mapAligned[primerEnd-1] is the location of the last base in the primer. we want to trim to the space just after that. The -1 & +1 ensures if the primer is followed by gaps they are not trimmed causing an aligned sequence dataset to become unaligned. 
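/* Illustration only, not part of mothur. The "-1 & +1" reasoning in the comment above can be sketched with a small stand-in for isAligned(). The names mapBasesToColumns and columnAfterPrimer are assumptions made for this sketch; primerEnd is taken to be the index of the first base after the primer in the unaligned sequence, as in the surrounding code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for isAligned(): maps the index of each base in the
// unaligned sequence to its column in the aligned sequence.
std::map<int, int> mapBasesToColumns(const std::string& aligned) {
    std::map<int, int> baseToColumn;
    int countBases = 0;
    for (std::size_t i = 0; i < aligned.length(); i++) {
        if (std::isalpha(static_cast<unsigned char>(aligned[i]))) {
            baseToColumn[countBases] = static_cast<int>(i);
            countBases++;
        }
    }
    return baseToColumn;
}

// Column at which trimming starts when the primer covers bases [0, primerEnd).
// Looking up the last primer base and adding one keeps any gap columns that
// directly follow the primer.
int columnAfterPrimer(const std::string& aligned, int primerEnd) {
    std::map<int, int> baseToColumn = mapBasesToColumns(aligned);
    assert(primerEnd >= 1 && baseToColumn.count(primerEnd - 1) == 1);
    return baseToColumn[primerEnd - 1] + 1;
}
```

For the aligned string "..AC-G--TTA" and a three-base primer (ACG), the last primer base sits in column 5, so trimming starts at column 6. Looking up base 3 directly would give column 8 and silently drop the two gap columns in between, which would shorten this sequence relative to the rest of the alignment. */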
else { currSeq.setAligned(currSeq.getAligned().substr(mapAligned[primerEnd-1]+1)); if (fileAligned) { thisPStart = mapAligned[primerEnd-1]+1; //locations[0].insert(mapAligned[primerEnd-1]+1); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerEnd-1]+1) + "\n"; } } } else { if (keepdots) { currSeq.filterToPos(mapAligned[primerStart]); } else { currSeq.setAligned(currSeq.getAligned().substr(mapAligned[primerStart])); if (fileAligned) { thisPStart = mapAligned[primerStart]; //locations[0].insert(mapAligned[primerStart]); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerStart]) + "\n"; } } } isAligned(currSeq.getAligned(), mapAligned); }else { if (!keepprimer) { currSeq.setAligned(currSeq.getUnaligned().substr(primerEnd)); } else { currSeq.setAligned(currSeq.getUnaligned().substr(primerStart)); } } } } //process reverse primers if (revPrimer.size() != 0) { int primerStart = 0; int primerEnd = 0; vector results = trim.findReverse(currSeq, primerStart, primerEnd); bool good = true; if (results[0] > pdiffs) { good = false; } totalDiffs += results[0]; commentString += "rpdiffs=" + toString(results[0]) + "(" + trim.getCodeValue(results[1], pdiffs) + ") "; if(!good){ if (nomatch == "reject") { goodSeq = false; } trashCode += "r"; } else{ //are you aligned if (aligned) { if (!keepprimer) { if (keepdots) { currSeq.filterFromPos(mapAligned[primerStart]); } else { currSeq.setAligned(currSeq.getAligned().substr(0, mapAligned[primerStart])); if (fileAligned) { thisPEnd = mapAligned[primerStart]; //locations[1].insert(mapAligned[primerStart]); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerStart]) + "\n"; } } } else { if (keepdots) { currSeq.filterFromPos(mapAligned[primerEnd-1]+1); } else { currSeq.setAligned(currSeq.getAligned().substr(0, mapAligned[primerEnd-1]+1)); if (fileAligned) { thisPEnd = mapAligned[primerEnd-1]+1; //locations[1].insert(mapAligned[primerEnd-1]+1); locationsString += currSeq.getName() + 
"\t" + toString(mapAligned[primerEnd-1]+1) + "\n"; } } } } else { if (!keepprimer) { currSeq.setAligned(currSeq.getUnaligned().substr(0, primerStart)); } else { currSeq.setAligned(currSeq.getUnaligned().substr(0, primerEnd)); } } } } }else if (ecolifile != "") { //make sure the seqs are aligned if (!fileAligned) { m->mothurOut("[ERROR]: seqs are not aligned. When using start and end your sequences must be aligned.\n"); m->control_pressed = true; break; } else if (currSeq.getAligned().length() != length) { m->mothurOut("[ERROR]: seqs are not the same length as ecoli seq. When using ecoli option your sequences must be aligned and the same length as the ecoli sequence.\n"); m->control_pressed = true; break; }else { if (keepdots) { currSeq.filterToPos(start); currSeq.filterFromPos(end); }else { string seqString = currSeq.getAligned().substr(0, end); seqString = seqString.substr(start); currSeq.setAligned(seqString); } } }else{ //using start and end to trim //make sure the seqs are aligned if (!fileAligned) { m->mothurOut("[ERROR]: seqs are not aligned. 
When using start and end your sequences must be aligned.\n"); m->control_pressed = true; break; } else { if (end != -1) { if (end > currSeq.getAligned().length()) { m->mothurOut("[ERROR]: end is longer than your sequence length, aborting.\n"); m->control_pressed = true; break; } else { if (keepdots) { currSeq.filterFromPos(end); } else { string seqString = currSeq.getAligned().substr(0, end); currSeq.setAligned(seqString); } } } if (start != -1) { if (keepdots) { currSeq.filterToPos(start); } else { string seqString = currSeq.getAligned().substr(start); currSeq.setAligned(seqString); } } } } if (commentString != "") { string seqComment = currSeq.getComment(); currSeq.setComment("\t" + commentString + "\t" + seqComment); } if (totalDiffs > pdiffs) { trashCode += "t"; goodSeq = false; } //trimming removed all bases // if (currSeq.getUnaligned() == "") { goodSeq = false; } if(goodSeq == 1) { currSeq.printSequence(goodFile); if (m->debug) { m->mothurOut("[DEBUG]: " + locationsString + "\n"); } if (thisPStart != -1) { locations.insert(thisPStart); } if (locationsString != "") { locationsFile << locationsString; } } else { badSeqNames.insert(currSeq.getName()); currSeq.setName(currSeq.getName() + '|' + trashCode); currSeq.printSequence(badFile); } count++; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = inFASTA.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count)+"\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count)+"\n"); } badFile.close(); goodFile.close(); inFASTA.close(); locationsFile.close(); if (m->debug) { m->mothurOut("[DEBUG]: fileAligned = " + toString(fileAligned) +'\n'); } if (fileAligned && !keepdots) { //print out smallest start value and largest end 
//value
            if (locations.size() > 1) { adjustNeeded = true; }
            //guard against an empty set so begin() is never dereferenced when no start was recorded
            if ((primers.size() != 0) && (!locations.empty())) { set<int>::iterator it = locations.begin(); pstart = *it; }
        }

        return count;
    }
    catch(exception& e) {
        m->errorOut(e, "PcrSeqsCommand", "driverPcr");
        exit(1);
    }
}
//********************************************************************/
bool PcrSeqsCommand::isAligned(string seq, map<int, int>& aligned){
    try {
        aligned.clear();
        bool isAligned = false;

        int countBases = 0;
        for (int i = 0; i < seq.length(); i++) {
            if (!isalpha(seq[i])) { isAligned = true; }
            else { aligned[countBases] = i; countBases++; } //maps location in unaligned -> location in aligned.
        }                                                   //ie. the 3rd base may be at spot 10 in the alignment
                                                            //later when we trim we want to trim from spot 10.
        return isAligned;
    }
    catch(exception& e) {
        m->errorOut(e, "PcrSeqsCommand", "isAligned");
        exit(1);
    }
}
//**********************************************************************************************************************
int PcrSeqsCommand::adjustDots(string goodFasta, string locations, int pstart, int pend){
    try {
        ifstream inFasta;
        m->openInputFile(goodFasta, inFasta);

        ifstream inLocations;
        m->openInputFile(locations, inLocations);

        ofstream out;
        m->openOutputFile(goodFasta+".temp", out);

        set<int> lengths;
        //cout << pstart << '\t' << pend << endl;
        //if (pstart > pend) { //swap them

        while(!inFasta.eof()) {
            if(m->control_pressed) { break; }

            Sequence seq(inFasta); m->gobble(inFasta);

            string name = "";
            int thisStart = -1; int thisEnd = -1;
            if (numFPrimers != 0) { inLocations >> name >> thisStart; m->gobble(inLocations); }
            if (numRPrimers != 0) { inLocations >> name >> thisEnd; m->gobble(inLocations); }

            //cout << seq.getName() << '\t' << thisStart << '\t' << thisEnd << '\t' << seq.getAligned().length() << endl;
            //cout << seq.getName() << '\t' << pstart << '\t' << pend << endl;

            if (name != seq.getName()) { m->mothurOut("[ERROR]: name mismatch in pcr.seqs.\n"); }
            else {
                if (pstart != -1) {
                    if (thisStart != -1) {
                        if (thisStart != pstart) {
                            string dots = "";
                            for
(int i = pstart; i < thisStart; i++) { dots += "."; } thisEnd += dots.length(); dots += seq.getAligned(); seq.setAligned(dots); } } } if (pend != -1) { if (thisEnd != -1) { if (thisEnd != pend) { string dots = seq.getAligned(); for (int i = thisEnd; i < pend; i++) { dots += "."; } seq.setAligned(dots); } } } lengths.insert(seq.getAligned().length()); } seq.printSequence(out); } inFasta.close(); inLocations.close(); out.close(); m->mothurRemove(locations); m->mothurRemove(goodFasta); m->renameFile(goodFasta+".temp", goodFasta); //cout << "final lengths = \n"; //for (set::iterator it = lengths.begin(); it != lengths.end(); it++) { //cout << *it << endl; // cout << lengths.count(*it) << endl; // } return 0; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "adjustDots"); exit(1); } } //*************************************************************************************************************** bool PcrSeqsCommand::readEcoli(){ try { ifstream in; m->openInputFile(ecolifile, in); //read seq if (!in.eof()){ Sequence ecoli(in); length = ecoli.getAligned().length(); start = ecoli.getStartPos(); end = ecoli.getEndPos(); }else { in.close(); m->control_pressed = true; return false; } in.close(); return true; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "readEcoli"); exit(1); } } //*************************************************************************************************************** int PcrSeqsCommand::writeAccnos(set badNames){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); string outputFileName = getOutputFileName("accnos",variables); outputNames.push_back(outputFileName); outputTypes["accnos"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); for (set::iterator it = badNames.begin(); it != badNames.end(); it++) { if (m->control_pressed) { break; } out << 
(*it) << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "writeAccnos"); exit(1); } } //*************************************************************************************************************** int PcrSeqsCommand::readName(set& names){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; string savedSecond = secondCol; vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; validSecond.clear(); for (int i = 0; i < parsedNames.size(); i++) { if (names.count(parsedNames[i]) == 0) { validSecond.push_back(parsedNames[i]); } } if (validSecond.size() != parsedNames.size()) { //we want to get rid of someone, so get rid of everyone for (int i = 0; i < parsedNames.size(); i++) { names.insert(parsedNames[i]); } removedCount += parsedNames.size(); }else { out << firstCol << '\t' << savedSecond << endl; wroteSomething = true; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["name"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your name file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "readName"); exit(1); } } 
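/* Illustration only, not part of mothur. The row filter used by readName above (if any name in a row was rejected, the whole row is dropped and every name in it is added to the rejected set) can be sketched as follows. The names splitNames and keepNameRow are assumptions made for this sketch; the real code calls m->splitAtComma and writes kept rows to the output file.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split "a,b,c" into its comma-separated names.
std::vector<std::string> splitNames(const std::string& text) {
    std::vector<std::string> parts;
    std::stringstream stream(text);
    std::string item;
    while (std::getline(stream, item, ',')) { parts.push_back(item); }
    return parts;
}

// Returns true when the row should be written to the new name file.
// When any name in the row is already rejected, all names in the row are
// added to badNames and the row is dropped.
bool keepNameRow(const std::string& secondCol, std::set<std::string>& badNames) {
    std::vector<std::string> names = splitNames(secondCol);

    bool anyBad = false;
    for (std::size_t i = 0; i < names.size(); i++) {
        if (badNames.count(names[i]) != 0) { anyBad = true; }
    }

    if (anyBad) { badNames.insert(names.begin(), names.end()); }
    return !anyBad;
}
```

Because badNames is taken by reference, names added here are also seen by the later group, taxonomy and count filters, which matches readName taking its set by reference in the source. */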
//********************************************************************************************************************** int PcrSeqsCommand::readGroup(set names){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(groupfile, in); string name, group; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> group; //read from second column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' << group << endl; }else { removedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["group"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your group file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int PcrSeqsCommand::readTax(set names){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile)); variables["[extension]"] = m->getExtension(taxfile); string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, 
out); ifstream in; m->openInputFile(taxfile, in); string name, tax; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> tax; //read from second column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' << tax << endl; }else { removedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["taxonomy"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your taxonomy file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PcrSeqsCommand", "readTax"); exit(1); } } //*************************************************************************************************************** int PcrSeqsCommand::readCount(set badSeqNames){ try { ifstream in; m->openInputFile(countfile, in); set::iterator it; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string goodCountFile = getOutputFileName("count", variables); outputNames.push_back(goodCountFile); outputTypes["count"].push_back(goodCountFile); ofstream goodCountOut; m->openOutputFile(goodCountFile, goodCountOut); string headers = m->getline(in); m->gobble(in); goodCountOut << headers << endl; string test = headers; vector pieces = m->splitWhiteSpace(test); string name, rest; int thisTotal, removedCount; removedCount = 0; rest = ""; bool wroteSomething = false; while (!in.eof()) { if (m->control_pressed) { goodCountOut.close(); in.close(); m->mothurRemove(goodCountFile); return 0; } in >> name; m->gobble(in); in >> thisTotal; m->gobble(in); if (pieces.size() > 2) { rest = 
m->getline(in); m->gobble(in); }

            if (badSeqNames.count(name) != 0) { removedCount+=thisTotal; }
            else{
                wroteSomething = true;
                goodCountOut << name << '\t' << thisTotal << '\t' << rest << endl;
            }
        }
        in.close();
        goodCountOut.close();

        if (m->control_pressed) { m->mothurRemove(goodCountFile); }

        if (wroteSomething == false) { m->mothurOut("Your count file contains only sequences from the .accnos file."); m->mothurOutEndLine(); }

        //check for groups that have been eliminated
        CountTable ct;
        if (ct.testGroups(goodCountFile)) {
            ct.readTable(goodCountFile, true, false);
            ct.printTable(goodCountFile);
        }

        if (m->control_pressed) { m->mothurRemove(goodCountFile); }

        m->mothurOut("Removed " + toString(removedCount) + " sequences from your count file."); m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "PcrSeqsCommand", "readCount");
        exit(1);
    }
}
//***************************************************************************************************************
int PcrSeqsCommand::readOligos(){
    try {
        oligos.read(oligosfile);

        if (m->control_pressed) { return false; } //error in reading oligos

        if (oligos.hasPairedPrimers()) {
            pairedOligos = true;
            numFPrimers = oligos.getPairedPrimers().size();
        }else {
            pairedOligos = false;
            numFPrimers = oligos.getPrimers().size();
        }
        numRPrimers = oligos.getReversePrimers().size();

        if (oligos.getLinkers().size() != 0) { m->mothurOut("[WARNING]: pcr.seqs is not setup to remove linkers, ignoring.\n"); }
        if (oligos.getSpacers().size() != 0) { m->mothurOut("[WARNING]: pcr.seqs is not setup to remove spacers, ignoring.\n"); }

        return true;
    }
    catch(exception& e) {
        m->errorOut(e, "PcrSeqsCommand", "readOligos");
        exit(1);
    }
}
/**************************************************************************************/
mothur-1.36.1/source/commands/pcrseqscommand.h000066400000000000000000000432261255543666200214360ustar00rootroot00000000000000
#ifndef Mothur_pcrseqscommand_h
#define Mothur_pcrseqscommand_h

//
//  pcrseqscommand.h
//  Mothur
//
Created by Sarah Westcott on 3/14/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "command.hpp" #include "sequence.hpp" #include "trimoligos.h" #include "alignment.hpp" #include "needlemanoverlap.hpp" #include "counttable.h" #include "oligos.h" class PcrSeqsCommand : public Command { public: PcrSeqsCommand(string); PcrSeqsCommand(); ~PcrSeqsCommand(){} vector setParameters(); string getCommandName() { return "pcr.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Pcr.seqs"; } string getDescription() { return "pcr.seqs"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector lines; bool abort, keepprimer, keepdots, fileAligned, pairedOligos; string fastafile, oligosfile, taxfile, groupfile, namefile, countfile, ecolifile, outputDir, nomatch; int start, end, processors, length, pdiffs, numFPrimers, numRPrimers; Oligos oligos; vector outputNames; int writeAccnos(set); int readName(set&); int readGroup(set); int readTax(set); int readCount(set); int readOligos(); bool readEcoli(); int driverPcr(string, string, string, string, set&, linePair, int&, bool&); int createProcesses(string, string, string, set&); bool isAligned(string, map&); int adjustDots(string, string, int, int); }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct pcrData { string filename; string goodFasta, badFasta, oligosfile, ecolifile, nomatch, locationsName; unsigned long long fstart; unsigned long long fend; int count, start, end, length, pdiffs, pstart, pend; MothurOut* m; set badSeqNames; bool keepprimer, keepdots, fileAligned, adjustNeeded; pcrData(){} pcrData(string f, string gf, string bfn, string loc, MothurOut* mout, string ol, string ec, string nm, bool kp, bool kd, int st, int en, int l, int pd, unsigned long long fst, unsigned long long fen) { filename = f; goodFasta = gf; badFasta = bfn; m = mout; oligosfile = ol; ecolifile = ec; nomatch = nm; keepprimer = kp; keepdots = kd; end = en; start = st; length = l; fstart = fst; fend = fen; pdiffs = pd; locationsName = loc; count = 0; fileAligned = true; adjustNeeded = false; pstart = -1; pend = -1; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyPcrThreadFunction(LPVOID lpParam){ pcrData* pDataArray; pDataArray = (pcrData*)lpParam; try { ofstream goodFile; pDataArray->m->openOutputFile(pDataArray->goodFasta, goodFile); ofstream badFile; pDataArray->m->openOutputFile(pDataArray->badFasta, badFile); ofstream locationsFile; pDataArray->m->openOutputFile(pDataArray->locationsName, locationsFile); ifstream inFASTA; pDataArray->m->openInputFile(pDataArray->filename, inFASTA); //print header if you are process 0 if ((pDataArray->fstart == 0) || (pDataArray->fstart == 1)) { inFASTA.seekg(0); }else { //this accounts for the difference in line endings. 
inFASTA.seekg(pDataArray->fstart-1); pDataArray->m->gobble(inFASTA); } set lengths; //pdiffs, bdiffs, primers, barcodes, revPrimers map faked; set locations; //locations = beginning locations Oligos oligos; int numFPrimers, numRPrimers; numFPrimers = 0; numRPrimers = 0; map primers; map barcodes; //not used vector revPrimer; if (pDataArray->oligosfile != "") { oligos.read(pDataArray->oligosfile); if (oligos.hasPairedPrimers()) { map primerPairs = oligos.getPairedPrimers(); for (map::iterator it = primerPairs.begin(); it != primerPairs.end(); it++) { primers[(it->second).forward] = it->first; revPrimer.push_back((it->second).reverse); } }else { primers = oligos.getPrimers(); revPrimer = oligos.getReversePrimers(); } numRPrimers = revPrimer.size(); numFPrimers = primers.size(); } TrimOligos trim(pDataArray->pdiffs, 0, primers, barcodes, revPrimer); for(int i = 0; i < pDataArray->fend; i++){ //end is the number of sequences to process pDataArray->count++; if (pDataArray->m->control_pressed) { break; } Sequence currSeq(inFASTA); pDataArray->m->gobble(inFASTA); if (pDataArray->fileAligned) { //assume aligned until proven otherwise lengths.insert(currSeq.getAligned().length()); if (lengths.size() > 1) { pDataArray->fileAligned = false; } } string trashCode = ""; string locationsString = ""; int thisPStart = -1; int thisPEnd = -1; int totalDiffs = 0; string commentString = ""; if (currSeq.getName() != "") { bool goodSeq = true; if (pDataArray->oligosfile != "") { map mapAligned; //bool aligned = isAligned(currSeq.getAligned(), mapAligned); /////////////////////////////////////////////////////////////// bool aligned = false; string seq = currSeq.getAligned(); int countBases = 0; for (int k = 0; k < seq.length(); k++) { if (!isalpha(seq[k])) { aligned = true; } else { mapAligned[countBases] = k; countBases++; } //maps location in unaligned -> location in aligned. } //ie. the 3rd base may be at spot 10 in the alignment //later when we trim we want to trim from spot 10. 
/////////////////////////////////////////////////////////////// //process primers if (numFPrimers != 0) { int primerStart = 0; int primerEnd = 0; vector results = trim.findForward(currSeq, primerStart, primerEnd); bool good = true; if (results[0] > pDataArray->pdiffs) { good = false; } totalDiffs += results[0]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trim.getCodeValue(results[1], pDataArray->pdiffs) + ") "; if(!good){ if (pDataArray->nomatch == "reject") { goodSeq = false; } trashCode += "f"; } else{ //are you aligned if (aligned) { if (!pDataArray->keepprimer) { if (pDataArray->keepdots) { currSeq.filterToPos(mapAligned[primerEnd-1]+1); } else { currSeq.setAligned(currSeq.getAligned().substr(mapAligned[primerEnd-1]+1)); if (pDataArray->fileAligned) { thisPStart = mapAligned[primerEnd-1]+1; //locations.insert(mapAligned[primerEnd-1]+1); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerEnd-1]+1) + "\n"; } } } else { if (pDataArray->keepdots) { currSeq.filterToPos(mapAligned[primerStart]); } else { currSeq.setAligned(currSeq.getAligned().substr(mapAligned[primerStart])); if (pDataArray->fileAligned) { thisPStart = mapAligned[primerStart]; //locations.insert(mapAligned[primerStart]); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerStart]) + "\n"; } } } /////////////////////////////////////////////////////////////// mapAligned.clear(); string seq = currSeq.getAligned(); int countBases = 0; for (int k = 0; k < seq.length(); k++) { if (!isalpha(seq[k])) { ; } else { mapAligned[countBases] = k; countBases++; } } /////////////////////////////////////////////////////////////// }else { if (!pDataArray->keepprimer) { currSeq.setAligned(currSeq.getUnaligned().substr(primerEnd)); } else { currSeq.setAligned(currSeq.getUnaligned().substr(primerStart)); } } } } //process reverse primers if (numRPrimers != 0) { int primerStart = 0; int primerEnd = 0; vector results = trim.findReverse(currSeq, primerStart, 
primerEnd); bool good = true; if (results[0] > pDataArray->pdiffs) { good = false; } totalDiffs += results[0]; commentString += "rpdiffs=" + toString(results[0]) + "(" + trim.getCodeValue(results[1], pDataArray->pdiffs) + ") "; if(!good){ if (pDataArray->nomatch == "reject") { goodSeq = false; } trashCode += "r"; } else{ //are you aligned if (aligned) { if (!pDataArray->keepprimer) { if (pDataArray->keepdots) { currSeq.filterFromPos(mapAligned[primerStart]); } else { currSeq.setAligned(currSeq.getAligned().substr(0, mapAligned[primerStart])); if (pDataArray->fileAligned) { thisPEnd = mapAligned[primerStart]; //locations.insert(mapAligned[primerStart]); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerStart]) + "\n"; } } } else { if (pDataArray->keepdots) { currSeq.filterFromPos(mapAligned[primerEnd-1]+1); } else { currSeq.setAligned(currSeq.getAligned().substr(0, mapAligned[primerEnd-1]+1)); if (pDataArray->fileAligned) { thisPEnd = mapAligned[primerEnd-1]+1; //locations.insert(mapAligned[primerEnd-1]+1); locationsString += currSeq.getName() + "\t" + toString(mapAligned[primerEnd-1]+1) + "\n"; } } } } else { if (!pDataArray->keepprimer) { currSeq.setAligned(currSeq.getUnaligned().substr(0, primerStart)); } else { currSeq.setAligned(currSeq.getUnaligned().substr(0, primerEnd)); } } } } }else if (pDataArray->ecolifile != "") { //make sure the seqs are aligned if (!pDataArray->fileAligned) { pDataArray->m->mothurOut("[ERROR]: seqs are not aligned. When using start and end your sequences must be aligned.\n"); pDataArray->m->control_pressed = true; break; } else if (currSeq.getAligned().length() != pDataArray->length) { pDataArray->m->mothurOut("[ERROR]: seqs are not the same length as ecoli seq. 
When using ecoli option your sequences must be aligned and the same length as the ecoli sequence.\n"); pDataArray->m->control_pressed = true; break; }else { if (pDataArray->keepdots) { currSeq.filterToPos(pDataArray->start); currSeq.filterFromPos(pDataArray->end); }else { string seqString = currSeq.getAligned().substr(0, pDataArray->end); seqString = seqString.substr(pDataArray->start); currSeq.setAligned(seqString); } } }else{ //using start and end to trim //make sure the seqs are aligned if (!pDataArray->fileAligned) { pDataArray->m->mothurOut("[ERROR]: seqs are not aligned. When using start and end your sequences must be aligned.\n"); pDataArray->m->control_pressed = true; break; } else { if (pDataArray->end != -1) { if (pDataArray->end > currSeq.getAligned().length()) { pDataArray->m->mothurOut("[ERROR]: end is longer than your sequence length, aborting.\n"); pDataArray->m->control_pressed = true; break; } else { if (pDataArray->keepdots) { currSeq.filterFromPos(pDataArray->end); } else { string seqString = currSeq.getAligned().substr(0, pDataArray->end); currSeq.setAligned(seqString); } } } if (pDataArray->start != -1) { if (pDataArray->keepdots) { currSeq.filterToPos(pDataArray->start); } else { string seqString = currSeq.getAligned().substr(pDataArray->start); currSeq.setAligned(seqString); } } } } if (commentString != "") { string seqComment = currSeq.getComment(); currSeq.setComment("\t" + commentString + "\t" + seqComment); } if (totalDiffs > pDataArray->pdiffs) { trashCode += "t"; goodSeq = false; } //trimming removed all bases if (currSeq.getUnaligned() == "") { goodSeq = false; } if(goodSeq == 1) { currSeq.printSequence(goodFile); if (locationsString != "") { locationsFile << locationsString; } if (thisPStart != -1) { locations.insert(thisPStart); } } else { pDataArray->badSeqNames.insert(currSeq.getName()); currSeq.setName(currSeq.getName() + '|' + trashCode); currSeq.printSequence(badFile); } } //report progress if((i+1) % 100 == 0){ 
pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(i+1)+"\n"); } } //report progress if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen("Thread Processing sequence: " + toString(pDataArray->count)+"\n"); } goodFile.close(); inFASTA.close(); badFile.close(); locationsFile.close(); if (pDataArray->m->debug) { pDataArray->m->mothurOut("[DEBUG]: fileAligned = " + toString(pDataArray->fileAligned) +'\n'); } if (pDataArray->fileAligned && !pDataArray->keepdots) { //print out smallest start value and largest end value if (locations.size() > 1) { pDataArray->adjustNeeded = true; } if (numFPrimers != 0) { set::iterator it = locations.begin(); pDataArray->pstart = *it; } } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "PcrSeqsCommand", "MyPcrThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/phylodiversitycommand.cpp000066400000000000000000001116651255543666200234120ustar00rootroot00000000000000/* * phylodiversitycommand.cpp * Mothur * * Created by westcott on 4/30/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "phylodiversitycommand.h" #include "treereader.h" //********************************************************************************************************************** vector PhyloDiversityCommand::setParameters(){ try { CommandParameter ptree("tree", "InputTypes", "", "", "none", "none", "none","phylodiv",false,true,true); parameters.push_back(ptree); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pfreq("freq", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pfreq); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter prarefy("rarefy", "Boolean", "", "F", "", "", "","rarefy",false,false); parameters.push_back(prarefy); CommandParameter psubsample("sampledepth", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter psummary("summary", "Boolean", "", "T", "", "", "","summary",false,false); parameters.push_back(psummary); CommandParameter pcollect("collect", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pcollect); CommandParameter pscale("scale", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pscale); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", 
"String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string PhyloDiversityCommand::getHelpString(){ try { string helpString = ""; helpString += "The phylo.diversity command parameters are tree, group, name, count, groups, iters, freq, processors, scale, rarefy, collect and summary. tree and group are required, unless you have valid current files.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. The group names are separated by dashes. By default all groups are used.\n"; helpString += "The iters parameter allows you to specify the number of randomizations to preform, by default iters=1000, if you set rarefy to true.\n"; helpString += "The freq parameter is used indicate when to output your data, by default it is set to 100. But you can set it to a percentage of the number of sequence. For example freq=0.10, means 10%. \n"; helpString += "The sampledepth parameter allows you to enter the number of sequences you want to sample.\n"; helpString += "The scale parameter is used indicate that you want your output scaled to the number of sequences sampled, default = false. \n"; helpString += "The rarefy parameter allows you to create a rarefaction curve. The default is false.\n"; helpString += "The collect parameter allows you to create a collectors curve. The default is false.\n"; helpString += "The summary parameter allows you to create a .summary file. 
The default is true.\n"; helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n"; helpString += "The phylo.diversity command should be in the following format: phylo.diversity(groups=yourGroups, rarefy=yourRarefy, iters=yourIters).\n"; helpString += "Example phylo.diversity(groups=A-B-C, rarefy=T, iters=500).\n"; helpString += "The phylo.diversity command outputs two files: .phylo.diversity and, if rarefy=T, .rarefaction.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e. yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PhyloDiversityCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "phylodiv") { pattern = "[filename],[tag],phylodiv"; } else if (type == "rarefy") { pattern = "[filename],[tag],phylodiv.rarefaction"; } else if (type == "summary") { pattern = "[filename],[tag],phylodiv.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PhyloDiversityCommand::PhyloDiversityCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["phylodiv"] = tempOutNames; outputTypes["rarefy"] = tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "PhyloDiversityCommand"); exit(1); } } //********************************************************************************************************************** 
PhyloDiversityCommand::PhyloDiversityCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters();; OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["phylodiv"] = tempOutNames; outputTypes["rarefy"] = tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { treefile = ""; abort = true; } else if (treefile == "not found") { //if there is a current design file, use it treefile = m->getTreeFile(); if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current tree file and the tree parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setTreeFile(treefile); } //check for required parameters groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not 
found"){ outputDir = m->hasPath(treefile); } string temp; temp = validParameter.validFile(parameters, "freq", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, freq); temp = validParameter.validFile(parameters, "rarefy", false); if (temp == "not found") { temp = "F"; } rarefy = m->isTrue(temp); temp = validParameter.validFile(parameters, "sampledepth", false); if (temp == "not found") { temp = "0"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); if (subsampleSize == 0) { subsample = false; } else { subsample = true; } }else { subsample = false; m->mothurOut("[ERROR]: sampledepth must be numeric, aborting.\n"); m->mothurOutEndLine(); abort=true; } if (subsample) { rarefy = true; } temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); if (!rarefy) { iters = 1; } temp = validParameter.validFile(parameters, "summary", false); if (temp == "not found") { temp = "T"; } summary = m->isTrue(temp); temp = validParameter.validFile(parameters, "scale", false); if (temp == "not found") { temp = "F"; } scale = m->isTrue(temp); temp = validParameter.validFile(parameters, "collect", false); if (temp == "not found") { temp = "F"; } collect = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } if ((!collect) && (!rarefy) && (!summary)) { m->mothurOut("No outputs selected. 
You must set either collect, rarefy or summary to true, summary=T by default."); m->mothurOutEndLine(); abort=true; } if (countfile=="") { if (namefile == "") { vector files; files.push_back(treefile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "PhyloDiversityCommand"); exit(1); } } //********************************************************************************************************************** int PhyloDiversityCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); m->setTreeFile(treefile); TreeReader* reader; if (countfile == "") { reader = new TreeReader(treefile, groupfile, namefile); } else { reader = new TreeReader(treefile, countfile); } vector trees = reader->getTrees(); ct = trees[0]->getCountTable(); delete reader; SharedUtil util; vector mGroups = m->getGroups(); vector tGroups = ct->getNamesOfGroups(); util.setGroups(mGroups, tGroups, "phylo.diversity"); //sets the groups the user wants to analyze //incase the user had some mismatches between the tree and group files we don't want group xxx to be analyzed for (int i = 0; i < mGroups.size(); i++) { if (mGroups[i] == "xxx") { mGroups.erase(mGroups.begin()+i); break; } } m->setGroups(mGroups); vector outputNames; //for each of the users trees for(int i = 0; i < trees.size(); i++) { if (m->control_pressed) { delete ct; for (int j = 0; j < trees.size(); j++) { delete trees[j]; } for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; } ofstream outSum, outRare, outCollect; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); variables["[tag]"] = toString(i+1); string outSumFile = getOutputFileName("summary",variables); string outRareFile = getOutputFileName("rarefy",variables); string outCollectFile = getOutputFileName("phylodiv",variables); if (summary) { m->openOutputFile(outSumFile, outSum); 
outputNames.push_back(outSumFile); outputTypes["summary"].push_back(outSumFile); } if (rarefy) { m->openOutputFile(outRareFile, outRare); outputNames.push_back(outRareFile); outputTypes["rarefy"].push_back(outRareFile); } if (collect) { m->openOutputFile(outCollectFile, outCollect); outputNames.push_back(outCollectFile); outputTypes["phylodiv"].push_back(outCollectFile); } int numLeafNodes = trees[i]->getNumLeaves(); //create a vector containing indexes of leaf nodes, randomize it, select nodes to send to calculator vector randomLeaf; for (int j = 0; j < numLeafNodes; j++) { if (m->inUsersGroups(trees[i]->tree[j].getGroup(), mGroups) == true) { //is this a node from the group the user selected. randomLeaf.push_back(j); } } numLeafNodes = randomLeaf.size(); //reset the number of leaf nodes you are using //each group, each sampling, if no rarefy iters = 1; map > diversity; //each group, each sampling, if no rarefy iters = 1; map > sumDiversity; //find largest group total int largestGroup = 0; for (int j = 0; j < mGroups.size(); j++) { int numSeqsThisGroup = ct->getGroupCount(mGroups[j]); if (numSeqsThisGroup > largestGroup) { largestGroup = numSeqsThisGroup; } //initialize diversity diversity[mGroups[j]].resize(numSeqsThisGroup+1, 0.0); //numSampled //groupA 0.0 0.0 //initialize sumDiversity sumDiversity[mGroups[j]].resize(numSeqsThisGroup+1, 0.0); } //convert freq percentage to number if (subsample) { largestGroup = subsampleSize; } int increment = 100; if (freq < 1.0) { increment = largestGroup * freq; }else { increment = freq; } //initialize sampling spots set numSampledList; for(int k = 1; k <= largestGroup; k++){ if((k == 1) || (k % increment == 0)){ numSampledList.insert(k); } } if(largestGroup % increment != 0){ numSampledList.insert(largestGroup); } //add other groups ending points if (!subsample) { for (int j = 0; j < mGroups.size(); j++) { if (numSampledList.count(diversity[mGroups[j]].size()-1) == 0) { numSampledList.insert(diversity[mGroups[j]].size()-1); 
} } } if (rarefy) { vector<int> procIters; int numItersPerProcessor = iters / processors; //divide iters between processes for (int h = 0; h < processors; h++) { if(h == processors - 1){ numItersPerProcessor = iters - h * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } createProcesses(procIters, trees[i], diversity, sumDiversity, iters, increment, randomLeaf, numSampledList, outCollect, outSum); }else{ //no need to parallelize if you don't want to rarefy driver(trees[i], diversity, sumDiversity, iters, increment, randomLeaf, numSampledList, outCollect, outSum, true); } if (rarefy) { printData(numSampledList, sumDiversity, outRare, iters); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to run phylo.diversity."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "execute"); exit(1); } } //********************************************************************************************************************** int PhyloDiversityCommand::createProcesses(vector<int>& procIters, Tree* t, map< string, vector<float> >& div, map<string, vector<float> >& sumDiv, int numIters, int increment, vector<int>& randomLeaf, set<int>& numSampledList, ofstream& outCollect, ofstream& outSum){ try { int process = 1; vector<int> processIDS; map< string, vector<float> >::iterator itSum; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; 
}else if (pid == 0){ driver(t, div, sumDiv, procIters[process], increment, randomLeaf, numSampledList, outCollect, outSum, false); string outTemp = outputDir + m->mothurGetpid(process) + ".sumDiv.temp"; ofstream out; m->openOutputFile(outTemp, out); //output the sumDiversity for (itSum = sumDiv.begin(); itSum != sumDiv.end(); itSum++) { out << itSum->first << '\t' << (itSum->second).size() << '\t'; for (int k = 0; k < (itSum->second).size(); k++) { out << (itSum->second)[k] << '\t'; } out << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) { m->mothurRemove(outputDir + (toString(processIDS[i])) + ".sumDiv.temp"); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) {m->mothurRemove(outputDir + (toString(processIDS[i])) + ".sumDiv.temp");}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //divide iters between processes procIters.clear(); int numItersPerProcessor = iters / processors; for (int h = 0; h < processors; h++) { if(h == processors - 1){ numItersPerProcessor = iters - h * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ driver(t, div, sumDiv, procIters[process], increment, randomLeaf, numSampledList, outCollect, outSum, 
false); string outTemp = outputDir + m->mothurGetpid(process) + ".sumDiv.temp"; ofstream out; m->openOutputFile(outTemp, out); //output the sumDIversity for (itSum = sumDiv.begin(); itSum != sumDiv.end(); itSum++) { out << itSum->first << '\t' << (itSum->second).size() << '\t'; for (int k = 0; k < (itSum->second).size(); k++) { out << (itSum->second)[k] << '\t'; } out << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } driver(t, div, sumDiv, procIters[0], increment, randomLeaf, numSampledList, outCollect, outSum, true); //force parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } //get data created by processes for (int i=0;i<(processors-1);i++) { //input the sumDIversity string inTemp = outputDir + toString(processIDS[i]) + ".sumDiv.temp"; ifstream in; m->openInputFile(inTemp, in); //output the sumDIversity for (int j = 0; j < sumDiv.size(); j++) { string group = ""; int size = 0; in >> group >> size; m->gobble(in); for (int k = 0; k < size; k++) { float tempVal; in >> tempVal; sumDiv[group][k] += tempVal; } m->gobble(in); } in.close(); m->mothurRemove(inTemp); } #else //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; vector cts; vector trees; map rootForGroup = getRootForGroups(t); //Create processor worker threads. 
for( int i=1; i<processors; i++ ){ CountTable* copyCount = new CountTable(); copyCount->copy(ct); Tree* copyTree = new Tree(copyCount); copyTree->getCopy(t); cts.push_back(copyCount); trees.push_back(copyTree); map<string, vector<float> > copydiv = div; map<string, vector<float> > copysumDiv = sumDiv; vector<int> copyrandomLeaf = randomLeaf; set<int> copynumSampledList = numSampledList; map<string, int> copyRootForGrouping = rootForGroup; phylodivData* temp = new phylodivData(m, procIters[i], copydiv, copysumDiv, copyTree, copyCount, increment, copyrandomLeaf, copynumSampledList, copyRootForGrouping, subsample, subsampleSize); pDataArray.push_back(temp); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyPhyloDivThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } driver(t, div, sumDiv, procIters[0], increment, randomLeaf, numSampledList, outCollect, outSum, true); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ for (itSum = pDataArray[i]->sumDiv.begin(); itSum != pDataArray[i]->sumDiv.end(); itSum++) { for (int k = 0; k < (itSum->second).size(); k++) { sumDiv[itSum->first][k] += (itSum->second)[k]; } } delete cts[i]; delete trees[i]; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif return 0; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** int PhyloDiversityCommand::driver(Tree* t, map< string, vector<float> >& div, map<string, vector<float> >& sumDiv, int numIters, int increment, vector<int>& randomLeaf, set<int>& numSampledList, ofstream& outCollect, ofstream& outSum, bool doSumCollect){ try { int numLeafNodes = randomLeaf.size(); vector<string> mGroups = m->getGroups(); map<string, int> rootForGroup = getRootForGroups(t); //maps groupName to root node in tree. "root" for group may not be the tree's root and we don't want to include the extra branches. 
for (int l = 0; l < numIters; l++) { random_shuffle(randomLeaf.begin(), randomLeaf.end()); //initialize counts map counts; vector< map > countedBranch; for (int i = 0; i < t->getNumNodes(); i++) { map temp; for (int j = 0; j < mGroups.size(); j++) { temp[mGroups[j]] = false; } countedBranch.push_back(temp); } for (int j = 0; j < mGroups.size(); j++) { counts[mGroups[j]] = 0; } map metCount; bool allDone = false; for (int j = 0; j < mGroups.size(); j++) { counts[mGroups[j]] = false; } for(int k = 0; k < numLeafNodes; k++){ if (m->control_pressed) { return 0; } //calc branch length of randomLeaf k vector br = calcBranchLength(t, randomLeaf[k], countedBranch, rootForGroup); //for each group in the groups update the total branch length accounting for the names file vector groups = t->tree[randomLeaf[k]].getGroup(); for (int j = 0; j < groups.size(); j++) { if (m->inUsersGroups(groups[j], mGroups)) { int numSeqsInGroupJ = 0; map::iterator it; it = t->tree[randomLeaf[k]].pcount.find(groups[j]); if (it != t->tree[randomLeaf[k]].pcount.end()) { //this leaf node contains seqs from group j numSeqsInGroupJ = it->second; } if (numSeqsInGroupJ != 0) { div[groups[j]][(counts[groups[j]]+1)] = div[groups[j]][counts[groups[j]]] + br[j]; } for (int s = (counts[groups[j]]+2); s <= (counts[groups[j]]+numSeqsInGroupJ); s++) { div[groups[j]][s] = div[groups[j]][s-1]; //update counts, but don't add in redundant branch lengths } counts[groups[j]] += numSeqsInGroupJ; if (subsample) { if (counts[groups[j]] >= subsampleSize) { metCount[groups[j]] = true; } bool allTrue = true; for (int h = 0; h < mGroups.size(); h++) { if (!metCount[mGroups[h]]) { allTrue = false; } } if (allTrue) { allDone = true; } } if (allDone) { j+=groups.size(); k+=numLeafNodes; } } } } //if you subsample then rarefy=t if (rarefy) { //add this diversity to the sum for (int j = 0; j < mGroups.size(); j++) { for (int g = 0; g < div[mGroups[j]].size(); g++) { sumDiv[mGroups[j]][g] += div[mGroups[j]][g]; } } } if 
((collect) && (l == 0) && doSumCollect) { printData(numSampledList, div, outCollect, 1); } if ((summary) && (l == 0) && doSumCollect) { printSumData(div, outSum, 1); } } return 0; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "driver"); exit(1); } } //********************************************************************************************************************** void PhyloDiversityCommand::printSumData(map< string, vector >& div, ofstream& out, int numIters){ try { out << "Groups\tnumSampled\tphyloDiversity" << endl; out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); vector mGroups = m->getGroups(); int numSampled = 0; for (int j = 0; j < mGroups.size(); j++) { if (subsample) { numSampled = subsampleSize; } else { numSampled = (div[mGroups[j]].size()-1); } out << mGroups[j] << '\t' << numSampled << '\t'; float score; if (scale) { score = (div[mGroups[j]][numSampled] / (float)numIters) / (float)numSampled; } else { score = div[mGroups[j]][numSampled] / (float)numIters; } out << setprecision(4) << score << endl; //cout << mGroups[j] << '\t' << numSampled << '\t'<< setprecision(4) << score << endl; } out.close(); } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "printSumData"); exit(1); } } //********************************************************************************************************************** void PhyloDiversityCommand::printData(set& num, map< string, vector >& div, ofstream& out, int numIters){ try { out << "numSampled"; vector mGroups = m->getGroups(); for (int i = 0; i < mGroups.size(); i++) { out << '\t' << mGroups[i]; } out << endl; out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); for (set::iterator it = num.begin(); it != num.end(); it++) { int numSampled = *it; out << numSampled; for (int j = 0; j < mGroups.size(); j++) { if (numSampled < div[mGroups[j]].size()) { float score; if (scale) { score = (div[mGroups[j]][numSampled] / (float)numIters) / (float)numSampled; } else { 
score = div[mGroups[j]][numSampled] / (float)numIters; } out << '\t' << setprecision(4) << score ; }else { out << "\tNA" ; } } out << endl; } out.close(); } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "printData"); exit(1); } } //********************************************************************************************************************** //need a vector of floats one branch length for every group the node represents. vector PhyloDiversityCommand::calcBranchLength(Tree* t, int leaf, vector< map >& counted, map roots){ try { //calc the branch length //while you aren't at root vector sums; int index = leaf; vector groups = t->tree[leaf].getGroup(); sums.resize(groups.size(), 0.0); //you are a leaf if(t->tree[index].getBranchLength() != -1){ for (int k = 0; k < groups.size(); k++) { sums[k] += abs(t->tree[index].getBranchLength()); } } index = t->tree[index].getParent(); //while you aren't at root while(t->tree[index].getParent() != -1){ if (m->control_pressed) { return sums; } for (int k = 0; k < groups.size(); k++) { if (index >= roots[groups[k]]) { counted[index][groups[k]] = true; } //if you are at this groups "root", then say we are done if (!counted[index][groups[k]]){ //if counted[index][groups[k] is true this groups has already added all br from here to root, so quit early if (t->tree[index].getBranchLength() != -1) { sums[k] += abs(t->tree[index].getBranchLength()); } counted[index][groups[k]] = true; } } index = t->tree[index].getParent(); } return sums; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "calcBranchLength"); exit(1); } } //********************************************************************************************************************** map PhyloDiversityCommand::getRootForGroups(Tree* t){ try { map roots; //maps group to root for group, may not be root of tree map done; //initialize root for all groups to -1 for (int k = 0; k < (t->getCountTable())->getNamesOfGroups().size(); k++) { 
done[(t->getCountTable())->getNamesOfGroups()[k]] = false; } for (int i = 0; i < t->getNumLeaves(); i++) { vector<string> groups = t->tree[i].getGroup(); int index = t->tree[i].getParent(); for (int j = 0; j < groups.size(); j++) { if (done[groups[j]] == false) { //we haven't found the root for this group yet, initialize it done[groups[j]] = true; roots[groups[j]] = i; //set root to self to start } //while you aren't at root while(t->tree[index].getParent() != -1){ if (m->control_pressed) { return roots; } //do both your children have descendants from the user's groups? int lc = t->tree[index].getLChild(); int rc = t->tree[index].getRChild(); int LpcountSize = 0; map<string, int>:: iterator itGroup = t->tree[lc].pcount.find(groups[j]); if (itGroup != t->tree[lc].pcount.end()) { LpcountSize++; } int RpcountSize = 0; itGroup = t->tree[rc].pcount.find(groups[j]); if (itGroup != t->tree[rc].pcount.end()) { RpcountSize++; } if ((LpcountSize != 0) && (RpcountSize != 0)) { //possible root if (index > roots[groups[j]]) { roots[groups[j]] = index; } }else { ;} index = t->tree[index].getParent(); } //} } } return roots; } catch(exception& e) { m->errorOut(e, "PhyloDiversityCommand", "getRootForGroups"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/phylodiversitycommand.h000066400000000000000000000214321255543666200230470ustar00rootroot00000000000000#ifndef PHYLODIVERSITYCOMMAND_H #define PHYLODIVERSITYCOMMAND_H /* * phylodiversitycommand.h * Mothur * * Created by westcott on 4/30/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "counttable.h" #include "sharedutilities.h" #include "tree.h" class PhyloDiversityCommand : public Command { public: PhyloDiversityCommand(string); PhyloDiversityCommand(); ~PhyloDiversityCommand(){} vector setParameters(); string getCommandName() { return "phylo.diversity"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Faith DP (1994). Phylogenetic pattern and the quantification of organismal biodiversity. Philos Trans R Soc Lond B Biol Sci 345: 45-58. \nhttp://www.mothur.org/wiki/Phylo.diversity"; } string getDescription() { return "phylo.diversity"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CountTable* ct; float freq; int iters, processors, numUniquesInName, subsampleSize; bool abort, rarefy, summary, collect, scale, subsample; string groups, outputDir, treefile, groupfile, namefile, countfile; vector Groups, outputNames; //holds groups to be used, and outputFile names map getRootForGroups(Tree* t); int readNamesFile(); void printData(set&, map< string, vector >&, ofstream&, int); void printSumData(map< string, vector >&, ofstream&, int); vector calcBranchLength(Tree*, int, vector< map >&, map); int driver(Tree*, map< string, vector >&, map >&, int, int, vector&, set&, ofstream&, ofstream&, bool); int createProcesses(vector&, Tree*, map< string, vector >&, map >&, int, int, vector&, set&, ofstream&, ofstream&); }; /***********************************************************************/ struct phylodivData { int numIters; MothurOut* m; map< string, vector > div; map > sumDiv; map rootForGroup; vector randomLeaf; set numSampledList; int increment, subsampleSize; Tree* t; CountTable* ct; bool includeRoot, subsample; phylodivData(){} phylodivData(MothurOut* mout, int ni, map< string, vector > cd, map< string, vector > csd, Tree* tree, CountTable* count, int incre, vector crl, set nsl, map rfg, 
bool su, int suS) { m = mout; t = tree; ct = count; div = cd; numIters = ni; sumDiv = csd; increment = incre; randomLeaf = crl; numSampledList = nsl; rootForGroup = rfg; subsample = su; subsampleSize = suS; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyPhyloDivThreadFunction(LPVOID lpParam){ phylodivData* pDataArray; pDataArray = (phylodivData*)lpParam; try { int numLeafNodes = pDataArray->randomLeaf.size(); vector mGroups = pDataArray->m->getGroups(); for (int l = 0; l < pDataArray->numIters; l++) { random_shuffle(pDataArray->randomLeaf.begin(), pDataArray->randomLeaf.end()); //initialize counts map counts; vector< map > countedBranch; for (int i = 0; i < pDataArray->t->getNumNodes(); i++) { map temp; for (int j = 0; j < mGroups.size(); j++) { temp[mGroups[j]] = false; } countedBranch.push_back(temp); } for (int j = 0; j < mGroups.size(); j++) { counts[mGroups[j]] = 0; } map metCount; bool allDone = false; for (int j = 0; j < mGroups.size(); j++) { counts[mGroups[j]] = false; } for(int k = 0; k < numLeafNodes; k++){ if (pDataArray->m->control_pressed) { return 0; } //calc branch length of randomLeaf k //vector br = calcBranchLength(t, randomLeaf[k], countedBranch, rootForGroup); //(Tree* t, int leaf, vector< map >& counted, map roots ///////////////////////////////////////////////////////////////////////////////////// vector br; int index = pDataArray->randomLeaf[k]; vector groups = pDataArray->t->tree[pDataArray->randomLeaf[k]].getGroup(); br.resize(groups.size(), 0.0); //you are a leaf if(pDataArray->t->tree[index].getBranchLength() != -1){ for (int k = 0; k < groups.size(); k++) { br[k] += abs(pDataArray->t->tree[index].getBranchLength()); } } index = pDataArray->t->tree[index].getParent(); //while you aren't at root while(pDataArray->t->tree[index].getParent() != -1){ if 
(pDataArray->m->control_pressed) { return 0; } for (int k = 0; k < groups.size(); k++) { if (index >= pDataArray->rootForGroup[groups[k]]) { countedBranch[index][groups[k]] = true; } //if you are at this groups "root", then say we are done if (!countedBranch[index][groups[k]]){ //if counted[index][groups[k] is true this groups has already added all br from here to root, so quit early if (pDataArray->t->tree[index].getBranchLength() != -1) { br[k] += abs(pDataArray->t->tree[index].getBranchLength()); } countedBranch[index][groups[k]] = true; } } index = pDataArray->t->tree[index].getParent(); } ///////////////////////////////////////////////////////////////////////////////////// //for each group in the groups update the total branch length accounting for the names file groups = pDataArray->t->tree[pDataArray->randomLeaf[k]].getGroup(); for (int j = 0; j < groups.size(); j++) { if (pDataArray->m->inUsersGroups(groups[j], mGroups)) { int numSeqsInGroupJ = 0; map::iterator it; it = pDataArray->t->tree[pDataArray->randomLeaf[k]].pcount.find(groups[j]); if (it != pDataArray->t->tree[pDataArray->randomLeaf[k]].pcount.end()) { //this leaf node contains seqs from group j numSeqsInGroupJ = it->second; } if (numSeqsInGroupJ != 0) { pDataArray->div[groups[j]][(counts[groups[j]]+1)] = pDataArray->div[groups[j]][counts[groups[j]]] + br[j]; } for (int s = (counts[groups[j]]+2); s <= (counts[groups[j]]+numSeqsInGroupJ); s++) { pDataArray->div[groups[j]][s] = pDataArray->div[groups[j]][s-1]; //update counts, but don't add in redundant branch lengths } counts[groups[j]] += numSeqsInGroupJ; if (pDataArray->subsample) { if (counts[groups[j]] >= pDataArray->subsampleSize) { metCount[groups[j]] = true; } bool allTrue = true; for (int h = 0; h < mGroups.size(); h++) { if (!metCount[mGroups[h]]) { allTrue = false; } } if (allTrue) { allDone = true; } } if (allDone) { j+=groups.size(); k+=numLeafNodes; } } } } //add this diversity to the sum for (int j = 0; j < mGroups.size(); j++) { for 
(int g = 0; g < pDataArray->div[mGroups[j]].size(); g++) { pDataArray->sumDiv[mGroups[j]][g] += pDataArray->div[mGroups[j]][g]; } } } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "PhyloDiversityCommand", "MyPhyloDivThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/phylotypecommand.cpp000066400000000000000000000411021255543666200223350ustar00rootroot00000000000000/* * phylotypecommand.cpp * Mothur * * Created by westcott on 11/20/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "phylotypecommand.h" #include "phylotree.h" #include "listvector.hpp" #include "rabundvector.hpp" #include "sabundvector.hpp" #include "counttable.h" //********************************************************************************************************************** vector PhylotypeCommand::setParameters(){ try { CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "none","list-rabund-sabund",false,true,true); parameters.push_back(ptaxonomy); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "ColumnName","rabund-sabund",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pcutoff("cutoff", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pcutoff); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } 
catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string PhylotypeCommand::getHelpString(){ try { string helpString = ""; helpString += "The phylotype command reads a taxonomy file and outputs a .list, .rabund and .sabund file. \n"; helpString += "The phylotype command parameter options are taxonomy, name, count, cutoff and label. The taxonomy parameter is required.\n"; helpString += "The cutoff parameter allows you to specify the level you want to stop at. The default is the highest level in your taxonomy file. \n"; helpString += "For example: taxonomy = Bacteria;Bacteroidetes-Chlorobi;Bacteroidetes; - cutoff=2, would truncate the taxonomy to Bacteria;Bacteroidetes-Chlorobi; \n"; helpString += "For the cutoff parameter levels count up from the root of the phylotree. This enables you to look at the grouping down to a specific resolution, say the genus level.\n"; helpString += "The label parameter allows you to specify which levels you would like; multiple levels are separated by dashes. The default is all levels in your taxonomy file. 
\n"; helpString += "For the label parameter, levels count down from the root to keep the output similar to mothur's other commands which report information from finer resolution to coarser resolutions.\n"; helpString += "The phylotype command should be in the following format: \n"; helpString += "phylotype(taxonomy=yourTaxonomyFile, cutoff=yourCutoff, label=yourLabels) \n"; helpString += "Example: phylotype(taxonomy=amazon.taxonomy, cutoff=5, label=1-3-5).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PhylotypeCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "list") { pattern = "[filename],[clustertag],list-[filename],[clustertag],[tag2],list"; } else if (type == "rabund") { pattern = "[filename],[clustertag],rabund"; } else if (type == "sabund") { pattern = "[filename],[clustertag],sabund"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PhylotypeCommand::PhylotypeCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["rabund"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "PhylotypeCommand"); exit(1); } } /**********************************************************************************************************************/ PhylotypeCommand::PhylotypeCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = 
true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["rabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } taxonomyFileName = validParameter.validFile(parameters, "taxonomy", true); if (taxonomyFileName == "not found") { taxonomyFileName = m->getTaxonomyFile(); if (taxonomyFileName != "") { m->mothurOut("Using " + taxonomyFileName + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. taxonomy is a required parameter."); m->mothurOutEndLine(); abort = true; } }else if (taxonomyFileName == "not open") { taxonomyFileName = ""; abort = true; } else { m->setTaxonomyFile(taxonomyFileName); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { readNamesFile(); m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(taxonomyFileName); //if user entered a file with a path then preserve it } if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } string temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, cutoff); label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; allLines = 1; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } if (countfile == "") { if (namefile == "") { vector files; 
files.push_back(taxonomyFileName); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "PhylotypeCommand"); exit(1); } } /**********************************************************************************************************************/ int PhylotypeCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //reads in taxonomy file and makes all the taxonomies the same length //by appending the last taxon to a given taxonomy as many times as needed to //make it as long as the longest taxonomy in the file TaxEqualizer* taxEqual = new TaxEqualizer(taxonomyFileName, cutoff, outputDir); if (m->control_pressed) { delete taxEqual; return 0; } string equalizedTaxFile = taxEqual->getEqualizedTaxFile(); delete taxEqual; //build taxonomy tree from equalized file PhyloTree* tree = new PhyloTree(equalizedTaxFile); vector leaves = tree->getGenusNodes(); //store leaf nodes in current map for (int i = 0; i < leaves.size(); i++) { currentNodes[leaves[i]] = leaves[i]; } bool done = false; if (tree->get(leaves[0]).parent == -1) { m->mothurOut("Empty Tree"); m->mothurOutEndLine(); done = true; } if (m->control_pressed) { delete tree; return 0; } ofstream outList, outRabund, outSabund; map variables; string fileroot = outputDir + m->getRootName(m->getSimpleName(taxonomyFileName)); variables["[filename]"] = fileroot; variables["[clustertag]"] = "tx"; string sabundFileName = getOutputFileName("sabund", variables); string rabundFileName = getOutputFileName("rabund", variables); if (countfile != "") { variables["[tag2]"] = "unique_list"; } string listFileName = getOutputFileName("list", variables); map counts; if (countfile == "") { m->openOutputFile(sabundFileName, outSabund); m->openOutputFile(rabundFileName, outRabund); outputNames.push_back(sabundFileName); outputTypes["sabund"].push_back(sabundFileName); outputNames.push_back(rabundFileName); outputTypes["rabund"].push_back(rabundFileName); }else { CountTable 
ct; ct.readTable(countfile, false, false); counts = ct.getNameMap(); } m->openOutputFile(listFileName, outList); outputNames.push_back(listFileName); outputTypes["list"].push_back(listFileName); int count = 1; //start at leaves of tree and work towards root, processing the labels the user wants while((!done) && ((allLines == 1) || (labels.size() != 0))) { string level = toString(count); count++; if (m->control_pressed) { if (countfile == "") { outRabund.close(); outSabund.close(); } outList.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete tree; return 0; } //is this a level the user want output for if(allLines == 1 || labels.count(level) == 1){ //output level m->mothurOut(level); m->mothurOutEndLine(); ListVector list; list.setLabel(level); //go through nodes and build listvector for (itCurrent = currentNodes.begin(); itCurrent != currentNodes.end(); itCurrent++) { //get parents TaxNode node = tree->get(itCurrent->first); parentNodes[node.parent] = node.parent; vector names = node.accessions; //make the names compatable with listvector string name = ""; for (int i = 0; i < names.size(); i++) { if (names[i] != "unknown") { if (namefile != "") { map::iterator itNames = namemap.find(names[i]); //make sure this name is in namefile if (itNames != namemap.end()) { name += namemap[names[i]] + ","; } //you found it in namefile else { m->mothurOut("[ERROR]: " + names[i] + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } }else{ name += names[i] + ","; } } } if (m->control_pressed) { break; } name = name.substr(0, name.length()-1); //rip off extra ',' //add bin to list vector if (name != "") { list.push_back(name); } //caused by unknown } //print listvector if (!m->printedListHeaders) { list.printHeaders(outList); } if (countfile == "") { list.print(outList); } else { list.print(outList, counts); } if (countfile == "") { //print rabund list.getRAbundVector().print(outRabund); 
//print sabund list.getSAbundVector().print(outSabund); } labels.erase(level); }else { //just get parents for (itCurrent = currentNodes.begin(); itCurrent != currentNodes.end(); itCurrent++) { int parent = tree->get(itCurrent->first).parent; parentNodes[parent] = parent; } } //move up a level currentNodes = parentNodes; parentNodes.clear(); //have we reached the rootnode if (tree->get(currentNodes.begin()->first).parent == -1) { done = true; } } outList.close(); if (countfile == "") { outSabund.close(); outRabund.close(); } delete tree; if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set list file as new current listfile string current = ""; itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } //set rabund file as new current rabundfile itTypes = outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } //set sabund file as new current sabundfile itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "execute"); exit(1); } } /*****************************************************************/ int PhylotypeCommand::readNamesFile() { try { ifstream in; m->openInputFile(namefile, in); string first, second; map::iterator itNames; while(!in.eof()) { in >> first >> second; m->gobble(in); itNames = namemap.find(first); if (itNames == namemap.end()) { namemap[first] = second; }else { 
m->mothurOut(first + " has already been seen in namefile, disregarding names file."); m->mothurOutEndLine(); in.close(); namemap.clear(); namefile = ""; return 1; } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "PhylotypeCommand", "readNamesFile"); exit(1); } } /**********************************************************************************************************************/ mothur-1.36.1/source/commands/phylotypecommand.h000066400000000000000000000024641255543666200220120ustar00rootroot00000000000000#ifndef PHYLOTYPECOMMAND_H #define PHYLOTYPECOMMAND_H /* * phylotypecommand.h * Mothur * * Created by westcott on 11/20/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "taxonomyequalizer.h" #include "command.hpp" /*************************************************************************/ class PhylotypeCommand : public Command { public: PhylotypeCommand(string); PhylotypeCommand(); ~PhylotypeCommand(){} vector setParameters(); string getCommandName() { return "phylotype"; } string getCommandCategory() { return "Clustering"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Phylotype"; } string getDescription() { return "cluster your sequences into OTUs based on their classifications"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, allLines; string taxonomyFileName, label, outputDir, namefile, countfile; set labels; //holds labels to be used int cutoff; map namemap; vector outputNames; map currentNodes; map parentNodes; map::iterator itCurrent; int readNamesFile(); }; /*************************************************************************/ #endif mothur-1.36.1/source/commands/pipelinepdscommand.cpp000066400000000000000000001055171255543666200226270ustar00rootroot00000000000000/* * pipelinepdscommand.cpp * Mothur * * Created by westcott on 10/5/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "pipelinepdscommand.h"
#include "sffinfocommand.h"
#include "commandoptionparser.hpp"

//**********************************************************************************************************************
vector<string> PipelineCommand::setParameters(){
	try {
		CommandParameter psff("sff", "InputTypes", "", "", "none", "oneRequired", "pipe","",false,false,true); parameters.push_back(psff);
		CommandParameter poligos("oligos", "InputTypes", "", "", "none", "oneRequired", "pipe","",false,false,true); parameters.push_back(poligos);
		CommandParameter palign("align", "InputTypes", "", "", "none", "oneRequired", "pipe","",false,false,true); parameters.push_back(palign);
		CommandParameter pchimera("chimera", "InputTypes", "", "", "none", "oneRequired", "pipe","",false,false,true); parameters.push_back(pchimera);
		CommandParameter pclassify("classify", "InputTypes", "", "", "none", "oneRequired", "pipe","",false,false,true); parameters.push_back(pclassify);
		CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "oneRequired", "pipe","",false,false,true); parameters.push_back(ptaxonomy);
		CommandParameter ppipeline("pipeline", "InputTypes", "", "", "none", "oneRequired", "none","",false,false,true); parameters.push_back(ppipeline);
		CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "PipelineCommand", "setParameters");
		exit(1);
	}
}
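// Illustrative sketch, not part of mothur. The pipeline parameter registered above names a
// batch-style file in which the keyword current stands for a file made by an earlier command.
// The block below shows one way such a substitution could work. The names trimBlanks and
// resolveCurrent are hypothetical, and the choice of "most recently made file" is an assumption:
// the real command asks the user to pick when several files of one type exist.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not in mothur): strip leading and trailing blanks.
static std::string trimBlanks(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t");
    if (first == std::string::npos) { return ""; }
    const std::string::size_type last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Hypothetical helper: rewrite every "key=current" option using the files made so far.
// Returns false when no file of the requested type has been made yet; in that case
// the contents of resolved are unspecified.
// Assumption: the most recently made file of a type is the one to use.
bool resolveCurrent(const std::string& options,
                    const std::map<std::string, std::vector<std::string> >& made,
                    std::string& resolved) {
    // Split "key=value, key=value" into a sorted key/value map.
    std::map<std::string, std::string> parameters;
    std::istringstream stream(options);
    std::string piece;
    while (std::getline(stream, piece, ',')) {
        const std::string::size_type eq = piece.find('=');
        if (eq == std::string::npos) { continue; } // skip pieces without '='
        parameters[trimBlanks(piece.substr(0, eq))] = trimBlanks(piece.substr(eq + 1));
    }

    // Rebuild the option string, replacing "current" with a real filename.
    resolved.clear();
    std::map<std::string, std::string>::const_iterator it;
    for (it = parameters.begin(); it != parameters.end(); ++it) {
        std::string value = it->second;
        if (value == "current") {
            std::map<std::string, std::vector<std::string> >::const_iterator itMade = made.find(it->first);
            if (itMade == made.end() || itMade->second.empty()) { return false; }
            value = itMade->second.back();
        }
        if (!resolved.empty()) { resolved += ", "; }
        resolved += it->first + "=" + value;
    }
    return true;
}
```

// Usage note: with made["fasta"] holding "may1.v13.trim.fasta", the option string
// "fasta=current, processors=8" would be rewritten with the trimmed fasta filename in place
// of current, while a request such as "name=current" would fail because no name file exists yet.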
//********************************************************************************************************************** string PipelineCommand::getHelpString(){ try { string helpString = ""; helpString += "The pipeline.pds command is designed to guide you through your analysis using mothur.\n"; helpString += "The pipeline.pds command parameters are pipeline, sff, oligos, align, chimera, classify, taxonomy and processors.\n"; helpString += "The sff parameter allows you to enter your sff file. It is required, if not using pipeline parameter.\n"; helpString += "The oligos parameter allows you to enter your oligos file. It is required, if not using pipeline parameter.\n"; helpString += "The align parameter allows you to enter a template to use with the aligner. It is required, if not using pipeline parameter.\n"; helpString += "The chimera parameter allows you to enter a template to use for chimera detection. It is required, if not using pipeline parameter.\n"; helpString += "The classify parameter allows you to enter a template to use for classification. It is required, if not using pipeline parameter.\n"; helpString += "The taxonomy parameter allows you to enter a taxonomy file for the classify template to use for classification. It is required, if not using pipeline parameter.\n"; helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n"; helpString += "The pipeline parameter allows you to enter your own pipeline file. This file should look like a mothur batchfile, but where you would be using a mothur generated file, you can use current instead.\n"; helpString += "Example: trim.seqs(processors=8, allfiles=T, maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50, fasta=may1.v13.fasta, oligos=may1.v13.oligos, qfile=may1.v13.qual)\n"; helpString += "then, you could enter unique.seqs(fasta=current), and mothur would use the .trim.fasta file from the trim.seqs command. 
\n"; helpString += "then you could enter align.seqs(candidate=current, template=silva.v13.align, processors=8), and mothur would use the .trim.unique.fasta file from the unique.seqs command. \n"; helpString += "If no pipeline file is given then mothur will use Pat's pipeline. \n"; helpString += "Here is a list of the commands used in Pat's pipeline.\n"; helpString += "All parallelized commands will use the processors you entered.\n"; helpString += "The sffinfo command takes your sff file and extracts the fasta and quality files.\n"; helpString += "The trim.seqs command uses your oligos file and the quality and fasta files generated by sffinfo.\n"; helpString += "The trim.seqs command sets the following parameters: allfiles=T, maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50.\n"; helpString += "The unique.seqs command uses the trimmed fasta file and removes redundant sequences. Don't worry: the names file generated by unique.seqs will be used in the pipeline to make sure they are included.\n"; helpString += "The align.seqs command aligns the unique sequences using the aligner's default options. \n"; helpString += "The screen.seqs command screens the sequences using optimize=end-minlength. \n"; helpString += "The pipeline uses chimera.slayer to detect chimeras using the default options. \n"; helpString += "The pipeline removes all sequences determined to be chimeric by chimera.slayer. \n"; helpString += "The filter.seqs command filters the sequences using vertical=T, trump=. \n"; helpString += "The unique.seqs command uses the filtered fasta file and name file to remove sequences that have become redundant after filtering.\n"; helpString += "The pre.cluster command clusters sequences that have no more than 2 differences.\n"; helpString += "The dist.seqs command is used to generate a column and phylip formatted distance matrix using cutoff=0.20 for column.\n"; helpString += "The pipeline uses cluster with method=average, hard=T. 
\n"; helpString += "The classify.seqs command is used to classify the sequences using the Bayesian method with a cutoff of 80.\n"; helpString += "The phylotype command is used to cluster the sequences based on their classification.\n"; helpString += "The clearcut command is used to generate a tree using neighbor=T. \n"; helpString += "The summary.single and summary.shared commands are run on the otu files from cluster and phylotype commands. \n"; helpString += "The summary.shared command uses calc=sharednseqs-sharedsobs-sharedchao-sharedace-anderberg-jclass-jest-kulczynski-kulczynskicody-lennon-ochiai-sorclass-sorest-whittaker-braycurtis-jabund-morisitahorn-sorabund-thetan-thetayc. \n"; helpString += "The summary.single command uses calc=nseqs-sobs-coverage-bergerparker-chao-ace-jack-bootstrap-boneh-efron-shen-solow-shannon-npshannon-invsimpson-qstat-simpsoneven-shannoneven-heip-smithwilson. \n"; helpString += "The classify.otu command is used to get the consensus taxonomy for otu files from cluster and phylotype commands. \n"; helpString += "The phylo.diversity command is run on the tree generated by clearcut with rarefy=T, iters=100. \n"; helpString += "The unifrac commands are also run on the tree generated by clearcut with random=F, distance=T. 
\n"; helpString += "\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PipelineCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** PipelineCommand::PipelineCommand(string option) { try { cFactory = CommandFactory::getInstance(); abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("sff"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sff"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("align"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["align"] = inputDir + it->second; } } it = parameters.find("chimera"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["chimera"] = inputDir + it->second; } } it = parameters.find("classify"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["classify"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("pipeline"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["pipeline"] = inputDir + it->second; } } } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } pipeFilename = validParameter.validFile(parameters, "pipeline", true); if (pipeFilename == "not found") { pipeFilename = ""; } else if (pipeFilename == "not open") { pipeFilename = ""; abort = true; } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if (pipeFilename != "") { abort = readUsersPipeline(); }else{ sffFile = validParameter.validFile(parameters, "sff", true); if (sffFile == "not found") { m->mothurOut("sff is a required parameter for the pipeline command."); m->mothurOutEndLine(); abort = true; } else if (sffFile == "not open") { sffFile = ""; abort = true; } else { m->setSFFFile(sffFile); } oligosFile = validParameter.validFile(parameters, "oligos", true); if (oligosFile == "not found") { m->mothurOut("oligos is a required parameter for the pipeline command."); m->mothurOutEndLine(); abort = true; } else if (oligosFile == "not open") { oligosFile = ""; abort = true; } alignFile = validParameter.validFile(parameters, "align", true); if (alignFile == "not found") { m->mothurOut("align is a required parameter for the pipeline command. Please provide the template to align with."); m->mothurOutEndLine(); abort = true; } else if (alignFile == "not open") { alignFile = ""; abort = true; } chimeraFile = validParameter.validFile(parameters, "chimera", true); if (chimeraFile == "not found") { m->mothurOut("chimera is a required parameter for the pipeline command. 
Please provide the template to check for chimeras with."); m->mothurOutEndLine(); abort = true; } else if (chimeraFile == "not open") { chimeraFile = ""; abort = true; } classifyFile = validParameter.validFile(parameters, "classify", true); if (classifyFile == "not found") { m->mothurOut("classify is a required parameter for the pipeline command. Please provide the template to use with the classifier."); m->mothurOutEndLine(); abort = true; } else if (classifyFile == "not open") { classifyFile = ""; abort = true; } taxonomyFile = validParameter.validFile(parameters, "taxonomy", true); if (taxonomyFile == "not found") { m->mothurOut("taxonomy is a required parameter for the pipeline command."); m->mothurOutEndLine(); abort = true; } else if (taxonomyFile == "not open") { taxonomyFile = ""; abort = true; } } } } catch(exception& e) { m->errorOut(e, "PipelineCommand", "PipelineCommand"); exit(1); } } //********************************************************************************************************************** int PipelineCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); if (pipeFilename == "") { createPatsPipeline(); //run Pats pipeline for (int i = 0; i < commands.size(); i++) { m->mothurOutEndLine(); m->mothurOut("mothur > " + commands[i]); m->mothurOutEndLine(); if (m->control_pressed) { return 0; } CommandOptionParser parser(commands[i]); string commandName = parser.getCommandString(); string options = parser.getOptionString(); #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if ((cFactory->MPIEnabled(commandName)) || (pid == 0)) { #endif //executes valid command Command* command = cFactory->getCommand(commandName, options, "pipe"); command->execute(); //add output files to list map > thisCommandsFile = command->getOutputFiles(); map >::iterator itMade; for (itMade = thisCommandsFile.begin(); itMade != thisCommandsFile.end(); itMade++) { vector temp = itMade->second; for (int j = 
0; j < temp.size(); j++) { outputNames.push_back(temp[j]); } } #ifdef USE_MPI } #endif } }else { runUsersPipeline(); } if (m->control_pressed) { return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to run the pipeline analysis."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PipelineCommand", "execute"); exit(1); } } //********************************************************************************************************************** bool PipelineCommand::readUsersPipeline(){ try { ifstream in; m->openInputFile(pipeFilename, in); string nextCommand = ""; map > mothurMadeFiles; while(!in.eof()) { nextCommand = m->getline(in); m->gobble(in); if (nextCommand[0] != '#') { bool error = false; string commandName, options; error = parseCommand(nextCommand, commandName, options); if (error) { in.close(); return error; } if (commandName == "pipeline.pds") { m->mothurOut("Cannot run the pipeline.pds command from inside the pipeline.pds command."); m->mothurOutEndLine(); in.close(); return true; } error = checkForValidAndRequiredParameters(commandName, options, mothurMadeFiles); if (error) { in.close(); return error; } } } in.close(); return false; } catch(exception& e) { m->errorOut(e, "PipelineCommand", "readUsersPipeline"); exit(1); } } //********************************************************************************************************************** bool PipelineCommand::parseCommand(string nextCommand, string& name, string& options){ try { CommandOptionParser parser(nextCommand); name = parser.getCommandString(); options = parser.getOptionString(); if (name == "") { return true; } //name == "" if () are not right return false; } catch(exception& e) { m->errorOut(e, "PipelineCommand", 
"parseCommand"); exit(1); } } //********************************************************************************************************************** bool PipelineCommand::checkForValidAndRequiredParameters(string name, string options, map >& mothurMadeFiles){ try { if (name == "system") { return false; } //get shell of the command so we can check to make sure its valid without running it Command* command = cFactory->getCommand(name); //check to make sure all parameters are valid for command vector validParameters = command->setParameters(); OptionParser parser(options); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; map >::iterator itMade; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, validParameters, it->second) != true) { return true; } // not valid if (it->second == "current") { itMade = mothurMadeFiles.find(it->first); if (itMade == mothurMadeFiles.end()) { m->mothurOut("You have the " + it->first + " listed as a current file for the " + name + " command, but it seems mothur will not make that file in your current pipeline, please correct."); m->mothurOutEndLine(); return true; } } } //is the command missing any required vector commandParameters = command->getParameters(); vector requiredParameters; for (int i = 0; i < commandParameters.size(); i++) { if (commandParameters[i].required) { requiredParameters.push_back(commandParameters[i].name); } } for (int i = 0; i < requiredParameters.size(); i++) { it = parameters.find(requiredParameters[i]); if (it == parameters.end()) { string paraToLookFor = requiredParameters[i]; //does mothur have a current file for this? 
itMade = mothurMadeFiles.find(requiredParameters[i]); if (itMade == mothurMadeFiles.end()) { m->mothurOut(name + " requires the " + requiredParameters[i] + " parameter, please correct."); m->mothurOutEndLine(); } } } //update MothurMade map > thisCommandsFile = command->getOutputFiles(); for (itMade = thisCommandsFile.begin(); itMade != thisCommandsFile.end(); itMade++) { mothurMadeFiles[itMade->first] = itMade->second; //adds any new types } return false; } catch(exception& e) { m->errorOut(e, "PipelineCommand", "checkForValidAndRequiredParameters"); exit(1); } } //********************************************************************************************************************** int PipelineCommand::runUsersPipeline(){ try { ifstream in; m->openInputFile(pipeFilename, in); string nextCommand = ""; map > mothurMadeFiles; while(!in.eof()) { nextCommand = m->getline(in); m->gobble(in); if (nextCommand[0] != '#') { CommandOptionParser parser(nextCommand); string commandName = parser.getCommandString(); string options = parser.getOptionString(); if ((options != "") && (commandName != "system")) { bool error = fillInMothurMade(options, mothurMadeFiles); if (error) { in.close(); return 0; } } m->mothurOutEndLine(); m->mothurOut("mothur > " + commandName + "(" + options + ")"); m->mothurOutEndLine(); if (m->control_pressed) { return 0; } #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if ((cFactory->MPIEnabled(commandName)) || (pid == 0)) { #endif //executes valid command Command* command = cFactory->getCommand(commandName, options, "pipe"); command->execute(); //add output files to list map > thisCommandsFile = command->getOutputFiles(); map >::iterator itMade; map >::iterator it; for (itMade = thisCommandsFile.begin(); itMade != thisCommandsFile.end(); itMade++) { vector temp = itMade->second; for (int k = 0; k < temp.size(); k++) { outputNames.push_back(temp[k]); } // //update Mothur Made for each file it = mothurMadeFiles.find(itMade->first); if (it == 
mothurMadeFiles.end()) { //new type mothurMadeFiles[itMade->first] = temp; }else{ //update existing type vector oldFileNames = it->second; //look at new files, see if an old version of the file exists, if so update, else just add. //for example you may have abrecovery.fasta and amazon.fasta as old files and you created a new amazon.trim.fasta. for (int k = 0; k < temp.size(); k++) { //get base name string root = m->getSimpleName(temp[k]); string individual = ""; for(int i=0;ifirst][spot] = temp[k]; }else{ mothurMadeFiles[it->first].push_back(temp[k]); } } } } #ifdef USE_MPI } #endif } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "PipelineCommand", "runUsersPipeline"); exit(1); } } //********************************************************************************************************************** bool PipelineCommand::fillInMothurMade(string& options, map >& mothurMadeFiles){ try { OptionParser parser(options); map parameters = parser.getParameters(); map::iterator it; map >::iterator itMade; options = ""; //fill in mothurmade filenames for (it = parameters.begin(); it != parameters.end(); it++) { string paraType = it->first; string tempOption = it->second; if (tempOption == "current") { itMade = mothurMadeFiles.find(paraType); if (itMade == mothurMadeFiles.end()) { m->mothurOut("Looking for a current " + paraType + " file, but it seems mothur has not made that file type in your current pipeline, please correct."); m->mothurOutEndLine(); return true; }else{ vector temp = itMade->second; if (temp.size() > 1) { //ask user which file to use m->mothurOut("More than one file has been created for the " + paraType + " parameter. 
"); m->mothurOutEndLine(); for (int i = 0; i < temp.size(); i++) { m->mothurOut(toString(i) + " - " + temp[i]); m->mothurOutEndLine(); } m->mothurOut("Please select the number of the file you would like to use: "); int num = 0; cin >> num; m->mothurOutJustToLog(toString(num)); m->mothurOutEndLine(); if ((num < 0) || (num > (temp.size()-1))) { m->mothurOut("Not a valid response, quitting."); m->mothurOutEndLine(); return true; } else { tempOption = temp[num]; } //clears buffer so next command doesn't have error string s; getline(cin, s); vector newTemp; for (int i = 0; i < temp.size(); i++) { if (i == num) { newTemp.push_back(temp[i]); } else { m->mothurOut("Would you like to remove " + temp[i] + " as an option for " + paraType + ", (y/n): "); m->mothurOutEndLine(); string response; cin >> response; m->mothurOutJustToLog(response); m->mothurOutEndLine(); if (response == "n") { newTemp.push_back(temp[i]); } //clears buffer so next command doesn't have error string s; getline(cin, s); } } mothurMadeFiles[paraType] = newTemp; }else if (temp.size() == 0){ m->mothurOut("Sorry, we seem to think you created a " + paraType + " file, but it seems mothur doesn't have a filename."); m->mothurOutEndLine(); return true; }else{ tempOption = temp[0]; } } } options += it->first + "=" + tempOption + ", "; } //rip off extra comma options = options.substr(0, (options.length()-2)); return false; } catch(exception& e) { m->errorOut(e, "PipelineCommand", "fillInMothurMade"); exit(1); } } //********************************************************************************************************************** void PipelineCommand::createPatsPipeline(){ try { //sff.info command string thisCommand = "sffinfo(sff=" + sffFile + ")"; commands.push_back(thisCommand); //trim.seqs command string fastaFile = m->getRootName(m->getSimpleName(sffFile)) + "fasta"; string qualFile = m->getRootName(m->getSimpleName(sffFile)) + "qual"; thisCommand = "trim.seqs(processors=" + toString(processors) + ", 
fasta=current, allfiles=T, maxambig=0, maxhomop=8, flip=T, bdiffs=1, pdiffs=2, qwindowaverage=35, qwindowsize=50, oligos=" + oligosFile + ", qfile=current)"; commands.push_back(thisCommand); //unique.seqs string groupFile = m->getRootName(m->getSimpleName(fastaFile)) + "groups"; qualFile = m->getRootName(m->getSimpleName(fastaFile)) + "trim.qual"; fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "trim.fasta"; thisCommand = "unique.seqs(fasta=current)"; commands.push_back(thisCommand); //align.seqs string nameFile = m->getRootName(m->getSimpleName(fastaFile)) + "names"; fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "unique" + m->getExtension(fastaFile); thisCommand = "align.seqs(processors=" + toString(processors) + ", candidate=current, template=" + alignFile + ")"; commands.push_back(thisCommand); //screen.seqs fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "align"; thisCommand = "screen.seqs(processors=" + toString(processors) + ", fasta=current, name=current, group=current, optimize=end-minlength)"; commands.push_back(thisCommand); //chimera.slayer fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "good" + m->getExtension(fastaFile); nameFile = m->getRootName(m->getSimpleName(nameFile)) + "good" + m->getExtension(nameFile); groupFile = m->getRootName(m->getSimpleName(groupFile)) + "good" + m->getExtension(groupFile); thisCommand = "chimera.slayer(processors=" + toString(processors) + ", fasta=current, template=" + chimeraFile + ")"; commands.push_back(thisCommand); //remove.seqs string accnosFile = m->getRootName(m->getSimpleName(fastaFile)) + "slayer.accnos"; thisCommand = "remove.seqs(fasta=current, name=current, group=current, accnos=current, dups=T)"; commands.push_back(thisCommand); //filter.seqs nameFile = m->getRootName(m->getSimpleName(nameFile)) + "pick" + m->getExtension(nameFile); groupFile = m->getRootName(m->getSimpleName(groupFile)) + "pick" + m->getExtension(groupFile); fastaFile = 
m->getRootName(m->getSimpleName(fastaFile)) + "pick" + m->getExtension(fastaFile); thisCommand = "filter.seqs(processors=" + toString(processors) + ", fasta=current, vertical=T, trump=.)"; commands.push_back(thisCommand); //unique.seqs fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "filter.fasta"; thisCommand = "unique.seqs(fasta=current, name=current)"; commands.push_back(thisCommand); //pre.cluster nameFile = m->getRootName(m->getSimpleName(fastaFile)) + "names"; fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "unique" + m->getExtension(fastaFile); thisCommand = "pre.cluster(fasta=current, name=current, diffs=2)"; commands.push_back(thisCommand); //dist.seqs nameFile = m->getRootName(m->getSimpleName(fastaFile)) + "precluster.names"; fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "precluster" + m->getExtension(fastaFile); thisCommand = "dist.seqs(processors=" + toString(processors) + ", fasta=current, cutoff=0.20)"; commands.push_back(thisCommand); //dist.seqs string columnFile = m->getRootName(m->getSimpleName(fastaFile)) + "dist"; thisCommand = "dist.seqs(processors=" + toString(processors) + ", fasta=current, output=lt)"; commands.push_back(thisCommand); //read.dist string phylipFile = m->getRootName(m->getSimpleName(fastaFile)) + "phylip.dist"; thisCommand = "read.dist(column=current, name=current)"; commands.push_back(thisCommand); //cluster thisCommand = "cluster(method=average, hard=T)"; commands.push_back(thisCommand); string listFile = m->getRootName(m->getSimpleName(columnFile)) + "an.list"; string rabundFile = m->getRootName(m->getSimpleName(columnFile)) + "an.rabund"; //degap.seqs thisCommand = "degap.seqs(fasta=current)"; commands.push_back(thisCommand); //classify.seqs fastaFile = m->getRootName(m->getSimpleName(fastaFile)) + "ng.fasta"; thisCommand = "classify.seqs(processors=" + toString(processors) + ", fasta=current, name=current, template=" + classifyFile + ", taxonomy=" + taxonomyFile + ", cutoff=80)"; 
commands.push_back(thisCommand); string RippedTaxName = m->getRootName(m->getSimpleName(taxonomyFile)); RippedTaxName = m->getExtension(RippedTaxName.substr(0, RippedTaxName.length()-1)); if (RippedTaxName[0] == '.') { RippedTaxName = RippedTaxName.substr(1, RippedTaxName.length()); } RippedTaxName += "."; string fastaTaxFile = m->getRootName(m->getSimpleName(fastaFile)) + RippedTaxName + "taxonomy"; string taxSummaryFile = m->getRootName(m->getSimpleName(fastaFile)) + RippedTaxName + "tax.summary"; //phylotype thisCommand = "phylotype(taxonomy=current, name=current)"; commands.push_back(thisCommand); string phyloListFile = m->getRootName(m->getSimpleName(fastaTaxFile)) + "tx.list"; string phyloRabundFile = m->getRootName(m->getSimpleName(fastaTaxFile)) + "tx.rabund"; //clearcut thisCommand = "clearcut(phylip=current, neighbor=T)"; commands.push_back(thisCommand); string treeFile = m->getRootName(m->getSimpleName(phylipFile)) + "tre"; //read.otu thisCommand = "make.shared(list=" + listFile + ", group=" + groupFile + ", label=0.03)"; commands.push_back(thisCommand); string sharedFile = m->getRootName(m->getSimpleName(listFile)) + "shared"; //read.otu thisCommand = "make.shared(list=" + phyloListFile + ", group=" + groupFile + ", label=1)"; commands.push_back(thisCommand); string phyloSharedFile = m->getRootName(m->getSimpleName(phyloListFile)) + "shared"; //read.otu thisCommand = "set.current(shared=" + sharedFile + ")"; commands.push_back(thisCommand); //summary.single thisCommand = "summary.single(shared=current, calc=nseqs-sobs-coverage-bergerparker-chao-ace-jack-bootstrap-boneh-efron-shen-solow-shannon-npshannon-invsimpson-qstat-simpsoneven-shannoneven-heip-smithwilson, size=5000)"; commands.push_back(thisCommand); //summary.shared thisCommand = "summary.shared(shared=current, 
calc=sharednseqs-sharedsobs-sharedchao-sharedace-anderberg-jclass-jest-kulczynski-kulczynskicody-lennon-ochiai-sorclass-sorest-whittaker-braycurtis-jabund-morisitahorn-sorabund-thetan-thetayc)"; commands.push_back(thisCommand); //read.otu //thisCommand = "read.otu(rabund=" + rabundFile + ", label=0.03)"; //commands.push_back(thisCommand); //summary.single thisCommand = "summary.single(rabund=" + rabundFile + ", label=0.03, calc=nseqs-sobs-coverage-bergerparker-chao-ace-jack-bootstrap-boneh-efron-shen-solow-shannon-npshannon-invsimpson-qstat-simpsoneven-shannoneven-heip-smithwilson, size=5000)"; commands.push_back(thisCommand); //read.otu thisCommand = "set.current(shared=" + phyloSharedFile + ")"; commands.push_back(thisCommand); //summary.single thisCommand = "summary.single(shared=current, calc=nseqs-sobs-coverage-bergerparker-chao-ace-jack-bootstrap-boneh-efron-shen-solow-shannon-npshannon-invsimpson-qstat-simpsoneven-shannoneven-heip-smithwilson, size=5000)"; commands.push_back(thisCommand); //summary.shared thisCommand = "summary.shared(shared=current, calc=sharednseqs-sharedsobs-sharedchao-sharedace-anderberg-jclass-jest-kulczynski-kulczynskicody-lennon-ochiai-sorclass-sorest-whittaker-braycurtis-jabund-morisitahorn-sorabund-thetan-thetayc)"; commands.push_back(thisCommand); //read.otu //thisCommand = "read.otu(rabund=" + phyloRabundFile + ", label=1)"; //commands.push_back(thisCommand); //summary.single thisCommand = "summary.single(rabund=" + phyloRabundFile + ", label=1, calc=nseqs-sobs-coverage-bergerparker-chao-ace-jack-bootstrap-boneh-efron-shen-solow-shannon-npshannon-invsimpson-qstat-simpsoneven-shannoneven-heip-smithwilson, size=5000)"; commands.push_back(thisCommand); //classify.otu thisCommand = "classify.otu(taxonomy=" + fastaTaxFile + ", name=" + nameFile + ", list=" + listFile + ", cutoff=51, label=0.03)"; commands.push_back(thisCommand); //classify.otu thisCommand = "classify.otu(taxonomy=" + fastaTaxFile + ", name=" + nameFile + ", list=" + 
phyloListFile + ", cutoff=51, label=1)"; commands.push_back(thisCommand); //read.tree thisCommand = "set.current(tree=" + treeFile + ", name=" + nameFile + ", group=" + groupFile + ")"; commands.push_back(thisCommand); //phylo.diversity thisCommand = "phylo.diversity(tree=current, group=current, name=current, iters=100,rarefy=T)"; commands.push_back(thisCommand); //unifrac.weighted thisCommand = "unifrac.weighted(tree=current, group=current, name=current, random=false, distance=true, groups=all, processors=" + toString(processors) + ")"; commands.push_back(thisCommand); //unifrac.unweighted thisCommand = "unifrac.unweighted(tree=current, group=current, name=current, random=false, distance=true, processors=" + toString(processors) + ")"; commands.push_back(thisCommand); } catch(exception& e) { m->errorOut(e, "PipelineCommand", "createPatsPipeline"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/pipelinepdscommand.h000066400000000000000000000031471255543666200222700ustar00rootroot00000000000000#ifndef PIPELINEPDSCOMMAND_H #define PIPELINEPDSCOMMAND_H /* * pipelinepdscommand.h * Mothur * * Created by westcott on 10/5/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "commandfactory.hpp" /****************************************************/ class PipelineCommand : public Command { public: PipelineCommand(string); PipelineCommand() { abort = true; calledHelp = true; setParameters(); } ~PipelineCommand(){} vector setParameters(); string getCommandName() { return "pipeline.pds"; } string getCommandCategory() { return "Hidden"; } string getHelpString(); string getOutputPattern(string) { return ""; } string getCitation() { return "Schloss PD, Gevers D, Westcott SL (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 
6:e27310.\nhttp://www.mothur.org/wiki/Pipeline.pds"; } string getDescription() { return "pat's pipeline"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; CommandFactory* cFactory; vector outputNames; vector commands; string outputDir, sffFile, alignFile, oligosFile, taxonomyFile, pipeFilename, classifyFile, chimeraFile; int processors; bool readUsersPipeline(); int runUsersPipeline(); void createPatsPipeline(); bool parseCommand(string, string&, string&); bool checkForValidAndRequiredParameters(string, string, map >&); bool fillInMothurMade(string&, map >&); }; /****************************************************/ #endif mothur-1.36.1/source/commands/preclustercommand.cpp000066400000000000000000001334541255543666200225040ustar00rootroot00000000000000/* * preclustercommand.cpp * Mothur * * Created by westcott on 12/21/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "preclustercommand.h" #include "deconvolutecommand.h" //********************************************************************************************************************** vector PreClusterCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta-name",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pdiffs("diffs", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pdiffs); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter palign("align", 
"Multiple", "needleman-gotoh-blast-noalign", "needleman", "", "", "","",false,false); parameters.push_back(palign); CommandParameter pmatch("match", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pmatch); CommandParameter pmismatch("mismatch", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pmismatch); CommandParameter pgapopen("gapopen", "Number", "", "-2.0", "", "", "","",false,false); parameters.push_back(pgapopen); CommandParameter pgapextend("gapextend", "Number", "", "-1.0", "", "", "","",false,false); parameters.push_back(pgapextend); CommandParameter ptopdown("topdown", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(ptopdown); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string PreClusterCommand::getHelpString(){ try { string helpString = ""; helpString += "The pre.cluster command groups sequences that are within a given number of base mismatches.\n"; helpString += "The pre.cluster command outputs a new fasta and name file.\n"; helpString += "The pre.cluster command parameters are fasta, name, group, count, topdown, processors and diffs. The fasta parameter is required. \n"; helpString += "The name parameter allows you to give a list of seqs that are identical. 
This file is 2 columns, first column is name or representative sequence, second column is a list of its identical sequences separated by commas.\n";
        helpString += "The group parameter allows you to provide a group file so you can cluster by group. \n";
        helpString += "The count parameter allows you to provide a count file so you can cluster by group. \n";
        helpString += "The diffs parameter allows you to specify the maximum number of mismatched bases allowed between sequences in a grouping. The default is 1.\n";
        helpString += "The topdown parameter allows you to specify whether to cluster from largest abundance to smallest or smallest to largest. Default=T, meaning largest to smallest.\n";
        helpString += "The align parameter allows you to specify the alignment method to use. Your options are: gotoh, needleman, blast and noalign. The default is needleman.\n";
        helpString += "The match parameter allows you to specify the bonus for having the same base. The default is 1.0.\n";
        helpString += "The mismatch parameter allows you to specify the penalty for having different bases. The default is -1.0.\n";
        helpString += "The gapopen parameter allows you to specify the penalty for opening a gap in an alignment. The default is -2.0.\n";
        helpString += "The gapextend parameter allows you to specify the penalty for extending a gap in an alignment. The default is -1.0.\n";
        helpString += "The pre.cluster command should be in the following format: \n";
        helpString += "pre.cluster(fasta=yourFastaFile, name=yourNamesFile, diffs=yourMaxDiffs) \n";
        helpString += "Example pre.cluster(fasta=amazon.fasta, diffs=2).\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PreClusterCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],precluster,[extension]"; } else if (type == "name") { pattern = "[filename],precluster.names"; } else if (type == "count") { pattern = "[filename],precluster.count_table"; } else if (type == "map") { pattern = "[filename],precluster.map"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PreClusterCommand::PreClusterCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["map"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "PreClusterCommand"); exit(1); } } //********************************************************************************************************************** PreClusterCommand::PreClusterCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (map::iterator it2 = 
parameters.begin(); it2 != parameters.end(); it2++) { if (validParameter.isValidParameter(it2->first, myArray, it2->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["map"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (fastafile == "not open") { abort = true; } else { m->setFastaFile(fastafile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(fastafile); //if user entered a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not found") { namefile = ""; } else if (namefile == "not open") { namefile = ""; abort = true; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not found") { groupfile = ""; bygroup = false; } else if (groupfile == "not open") { abort = true; groupfile = ""; } else { m->setGroupFile(groupfile); bygroup = true; } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not found") { countfile = ""; } else if (countfile == "not open") { abort = true; countfile = ""; } else { m->setCountTableFile(countfile); ct.readTable(countfile, true, false); if (ct.hasGroupInfo()) { bygroup = true; } else { bygroup = false; } } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } string temp = validParameter.validFile(parameters, "diffs", false); if(temp == "not found"){ temp = "1"; } m->mothurConvert(temp, diffs); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "topdown", false); if(temp == "not found"){ temp = "T"; } topdown = m->isTrue(temp); temp = validParameter.validFile(parameters, "match", false); if (temp == "not found"){ temp = "1.0"; } m->mothurConvert(temp, match); temp = validParameter.validFile(parameters, "mismatch", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, misMatch); if (misMatch > 0) { m->mothurOut("[ERROR]: mismatch must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "gapopen", false); 
if (temp == "not found"){ temp = "-2.0"; } m->mothurConvert(temp, gapOpen); if (gapOpen > 0) { m->mothurOut("[ERROR]: gapopen must be negative.\n"); abort=true; } temp = validParameter.validFile(parameters, "gapextend", false); if (temp == "not found"){ temp = "-1.0"; } m->mothurConvert(temp, gapExtend); if (gapExtend > 0) { m->mothurOut("[ERROR]: gapextend must be negative.\n"); abort=true; } align = validParameter.validFile(parameters, "align", false); if (align == "not found"){ align = "needleman"; } method = "unaligned"; if (countfile == "") { if (namefile == "") { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "PreClusterCommand"); exit(1); } } //********************************************************************************************************************** int PreClusterCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); if(align == "gotoh") { alignment = new GotohOverlap(gapOpen, gapExtend, match, misMatch, 1000); } else if(align == "needleman") { alignment = new NeedlemanOverlap(gapOpen, match, misMatch, 1000); } else if(align == "blast") { alignment = new BlastAlignment(gapOpen, gapExtend, match, misMatch); } else if(align == "noalign") { alignment = new NoAlign(); } else { m->mothurOut(align + " is not a valid alignment option. 
I will run the command using needleman."); m->mothurOutEndLine(); alignment = new NeedlemanOverlap(gapOpen, match, misMatch, 1000); } string fileroot = outputDir + m->getRootName(m->getSimpleName(fastafile)); map variables; variables["[filename]"] = fileroot; string newNamesFile = getOutputFileName("name",variables); string newCountFile = getOutputFileName("count",variables); string newMapFile = getOutputFileName("map",variables); //add group name if by group variables["[extension]"] = m->getExtension(fastafile); string newFastaFile = getOutputFileName("fasta", variables); outputNames.push_back(newFastaFile); outputTypes["fasta"].push_back(newFastaFile); if (countfile == "") { outputNames.push_back(newNamesFile); outputTypes["name"].push_back(newNamesFile); } else { outputNames.push_back(newCountFile); outputTypes["count"].push_back(newCountFile); } if (bygroup) { //clear out old files ofstream outFasta; m->openOutputFile(newFastaFile, outFasta); outFasta.close(); ofstream outNames; m->openOutputFile(newNamesFile, outNames); outNames.close(); newMapFile = fileroot + "precluster."; //parse fasta and name file by group vector groups; if (countfile != "") { cparser = new SequenceCountParser(countfile, fastafile); groups = cparser->getNamesOfGroups(); }else { if (namefile != "") { parser = new SequenceParser(groupfile, fastafile, namefile); } else { parser = new SequenceParser(groupfile, fastafile); } groups = parser->getNamesOfGroups(); } if(processors == 1) { driverGroups(newFastaFile, newNamesFile, newMapFile, 0, groups.size(), groups); } else { createProcessesGroups(newFastaFile, newNamesFile, newMapFile, groups); } if (countfile != "") { mergeGroupCounts(newCountFile, newNamesFile, newFastaFile); delete cparser; }else { delete parser; //run unique.seqs for deconvolute results string inputString = "fasta=" + newFastaFile; if (namefile != "") { inputString += ", name=" + newNamesFile; } m->mothurOutEndLine(); 
                m->mothurOut("/******************************************/"); m->mothurOutEndLine();
                m->mothurOut("Running command: unique.seqs(" + inputString + ")"); m->mothurOutEndLine();
                m->mothurCalling = true;

                Command* uniqueCommand = new DeconvoluteCommand(inputString);
                uniqueCommand->execute();

                map<string, vector<string> > filenames = uniqueCommand->getOutputFiles();

                delete uniqueCommand;
                m->mothurCalling = false;
                m->mothurOut("/******************************************/"); m->mothurOutEndLine();

                //replace the per-group output with the deconvoluted results
                m->renameFile(filenames["fasta"][0], newFastaFile);
                m->renameFile(filenames["name"][0], newNamesFile);
            }

            if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete alignment; return 0; }

            m->mothurOut("It took " + toString(time(NULL) - start) + " secs to run pre.cluster."); m->mothurOutEndLine();

        }else {
            if (processors != 1) { m->mothurOut("When running without group information mothur can only use 1 processor, continuing."); m->mothurOutEndLine(); processors = 1; }
            if (namefile != "") { readNameFile(); }

            //reads fasta file and return number of seqs
            int numSeqs = readFASTA(); //fills alignSeqs and makes all seqs active

            if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete alignment; return 0; }

            if (numSeqs == 0) { m->mothurOut("Error reading fasta file...please correct."); m->mothurOutEndLine(); delete alignment; return 0; }
            if (diffs > length) { m->mothurOut("Error: diffs is greater than your sequence length."); m->mothurOutEndLine(); delete alignment; return 0; }

            int count = process(newMapFile);
            outputNames.push_back(newMapFile); outputTypes["map"].push_back(newMapFile);

            if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete alignment; return 0; }

            m->mothurOut("Total number of sequences before pre.cluster was " + toString(alignSeqs.size()) + "."); m->mothurOutEndLine();
            m->mothurOut("pre.cluster removed " + toString(count) 
+ " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); if (countfile != "") { newNamesFile = newCountFile; } printData(newFastaFile, newNamesFile, ""); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to cluster " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete alignment; return 0; } delete alignment; m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "execute"); exit(1); } } /**************************************************************************************************/ int PreClusterCommand::createProcessesGroups(string newFName, string newNName, string newMFile, vector groups) { try { vector processIDS; int process = 1; int num = 0; bool recalc = false; //sanity check if (groups.size() < processors) { processors = groups.size(); } //divide the groups between the processors vector lines; int remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors 
!= 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ outputNames.clear(); num = driverGroups(newFName + m->mothurGetpid(process) + ".temp", newNName + m->mothurGetpid(process) + ".temp", newMFile, lines[process].start, lines[process].end, groups); string tempFile = m->mothurGetpid(process) + ".outputNames.temp"; ofstream outTemp; m->openOutputFile(tempFile, outTemp); outTemp << outputNames.size(); for (int i = 0; i < outputNames.size(); i++) { outTemp << outputNames[i] << endl; } outTemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".outputNames.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".outputNames.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); num = 0; processIDS.resize(0); process = 1; int remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ outputNames.clear(); num = driverGroups(newFName + m->mothurGetpid(process) + ".temp", newNName + m->mothurGetpid(process) + ".temp", newMFile, lines[process].start, lines[process].end, groups); string tempFile = m->mothurGetpid(process) + ".outputNames.temp"; ofstream outTemp; m->openOutputFile(tempFile, outTemp); outTemp << outputNames.size(); for (int i = 0; i < outputNames.size(); i++) { outTemp << outputNames[i] << endl; } outTemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part num = driverGroups(newFName, newNName, newMFile, lines[0].start, lines[0].end, groups); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, intemp); int num; intemp >> num; for (int k = 0; k < num; k++) { string name = ""; 
intemp >> name; m->gobble(intemp); outputNames.push_back(name); outputTypes["map"].push_back(name); } intemp.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the preClusterData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; icount != (pDataArray[i]->end-pDataArray[i]->start)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end-pDataArray[i]->start) + " groups assigned to it, quitting. \n"); m->control_pressed = true; } for (int j = 0; j < pDataArray[i]->mapFileNames.size(); j++) { outputNames.push_back(pDataArray[i]->mapFileNames[j]); outputTypes["map"].push_back(pDataArray[i]->mapFileNames[j]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append output files for(int i=0;igetFullPathName(".\\" + newFName); //newNName = m->getFullPathName(".\\" + newNName); m->appendFiles((newFName + toString(processIDS[i]) + ".temp"), newFName); m->mothurRemove((newFName + toString(processIDS[i]) + ".temp")); m->appendFiles((newNName + toString(processIDS[i]) + ".temp"), newNName); m->mothurRemove((newNName + toString(processIDS[i]) + ".temp")); } return num; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "createProcessesGroups"); exit(1); } } /**************************************************************************************************/ int PreClusterCommand::driverGroups(string newFFile, string newNFile, string newMFile, int start, int end, vector groups){ try { int numSeqs = 0; //precluster 
each group for (int i = start; i < end; i++) { start = time(NULL); if (m->control_pressed) { return 0; } m->mothurOutEndLine(); m->mothurOut("Processing group " + groups[i] + ":"); m->mothurOutEndLine(); map thisNameMap; vector thisSeqs; if (groupfile != "") { thisSeqs = parser->getSeqs(groups[i]); }else if (countfile != "") { thisSeqs = cparser->getSeqs(groups[i]); } if (namefile != "") { thisNameMap = parser->getNameMap(groups[i]); } //fill alignSeqs with this groups info. numSeqs = loadSeqs(thisNameMap, thisSeqs, groups[i]); if (m->control_pressed) { return 0; } if (method == "aligned") { if (diffs > length) { m->mothurOut("Error: diffs is greater than your sequence length."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } } int count= process(newMFile+groups[i]+".map"); outputNames.push_back(newMFile+groups[i]+".map"); outputTypes["map"].push_back(newMFile+groups[i]+".map"); if (m->control_pressed) { return 0; } m->mothurOut("Total number of sequences before pre.cluster was " + toString(alignSeqs.size()) + "."); m->mothurOutEndLine(); m->mothurOut("pre.cluster removed " + toString(count) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); printData(newFFile, newNFile, groups[i]); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to cluster " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); } return numSeqs; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "driverGroups"); exit(1); } } /**************************************************************************************************/ int PreClusterCommand::process(string newMapFile){ try { ofstream out; m->openOutputFile(newMapFile, out); //sort seqs by number of identical seqs if (topdown) { sort(alignSeqs.begin(), alignSeqs.end(), comparePriorityTopDown); } else { sort(alignSeqs.begin(), alignSeqs.end(), comparePriorityDownTop); } int count = 0; int numSeqs = alignSeqs.size(); if (topdown) { //think about running through twice... 
            for (int i = 0; i < numSeqs; i++) {
                if (alignSeqs[i].active) { //this sequence has not been merged yet
                    string chunk = alignSeqs[i].seq.getName() + "\t" + toString(alignSeqs[i].numIdentical) + "\t" + toString(0) + "\t" + alignSeqs[i].seq.getAligned() + "\n";

                    //try to merge it with all smaller seqs
                    for (int j = i+1; j < numSeqs; j++) {

                        if (m->control_pressed) { out.close(); return 0; }

                        if (alignSeqs[j].active) { //this sequence has not been merged yet
                            //are you within "diff" bases
                            int mismatch = calcMisMatches(alignSeqs[i].seq.getAligned(), alignSeqs[j].seq.getAligned());

                            if (mismatch <= diffs) {
                                //merge
                                alignSeqs[i].names += ',' + alignSeqs[j].names;
                                alignSeqs[i].numIdentical += alignSeqs[j].numIdentical;

                                chunk += alignSeqs[j].seq.getName() + "\t" + toString(alignSeqs[j].numIdentical) + "\t" + toString(mismatch) + "\t" + alignSeqs[j].seq.getAligned() + "\n";

                                alignSeqs[j].active = 0;
                                alignSeqs[j].numIdentical = 0;
                                count++;
                            }
                        }//end if j active
                    }//end for loop j

                    //remove from active list
                    alignSeqs[i].active = 0;

                    out << "ideal_seq_" << (i+1) << '\t' << alignSeqs[i].numIdentical << endl << chunk << endl;

                }//end if active i
                if(i % 100 == 0) { m->mothurOutJustToScreen(toString(i) + "\t" + toString(numSeqs - count) + "\t" + toString(count)+"\n"); }
            }
        }else {
            map<int, string> mapFile;
            map<int, int> originalCount;
            map<int, int>::iterator itCount;
            for (int i = 0; i < numSeqs; i++) { mapFile[i] = ""; originalCount[i] = alignSeqs[i].numIdentical; }

            //think about running through twice... 
for (int i = 0; i < numSeqs; i++) { //try to merge it into larger seqs for (int j = i+1; j < numSeqs; j++) { if (m->control_pressed) { out.close(); return 0; } if (originalCount[j] > originalCount[i]) { //this sequence is more abundant than I am //are you within "diff" bases int mismatch = calcMisMatches(alignSeqs[i].seq.getAligned(), alignSeqs[j].seq.getAligned()); if (mismatch <= diffs) { //merge alignSeqs[j].names += ',' + alignSeqs[i].names; alignSeqs[j].numIdentical += alignSeqs[i].numIdentical; mapFile[j] = alignSeqs[i].seq.getName() + "\t" + toString(alignSeqs[i].numIdentical) + "\t" + toString(mismatch) + "\t" + alignSeqs[i].seq.getAligned() + "\n" + mapFile[i]; alignSeqs[i].numIdentical = 0; originalCount.erase(i); mapFile[i] = ""; count++; j+=numSeqs; //exit search, we merged this one in. } }//end abundance check }//end for loop j if(i % 100 == 0) { m->mothurOutJustToScreen(toString(i) + "\t" + toString(numSeqs - count) + "\t" + toString(count)+"\n"); } } for (int i = 0; i < numSeqs; i++) { if (alignSeqs[i].numIdentical != 0) { out << "ideal_seq_" << (i+1) << '\t' << alignSeqs[i].numIdentical << endl << alignSeqs[i].seq.getName() + "\t" + toString(alignSeqs[i].numIdentical) + "\t" + toString(0) + "\t" + alignSeqs[i].seq.getAligned() + "\n" << mapFile[i] << endl; } } } out.close(); if(numSeqs % 100 != 0) { m->mothurOut(toString(numSeqs) + "\t" + toString(numSeqs - count) + "\t" + toString(count)); m->mothurOutEndLine(); } return count; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "process"); exit(1); } } /**************************************************************************************************/ int PreClusterCommand::readFASTA(){ try { //ifstream inNames; ifstream inFasta; m->openInputFile(fastafile, inFasta); set lengths; while (!inFasta.eof()) { if (m->control_pressed) { inFasta.close(); return 0; } Sequence seq(inFasta); m->gobble(inFasta); if (seq.getName() != "") { //can get "" if commented line is at end of fasta file if 
(namefile != "") { itSize = sizes.find(seq.getName()); if (itSize == sizes.end()) { m->mothurOut(seq.getName() + " is not in your names file, please correct."); m->mothurOutEndLine(); exit(1); } else{ seqPNode tempNode(itSize->second, seq, names[seq.getName()]); alignSeqs.push_back(tempNode); lengths.insert(seq.getAligned().length()); } }else { //no names file, you are identical to yourself int numRep = 1; if (countfile != "") { numRep = ct.getNumSeqs(seq.getName()); } seqPNode tempNode(numRep, seq, seq.getName()); alignSeqs.push_back(tempNode); lengths.insert(seq.getAligned().length()); } } } inFasta.close(); if (lengths.size() > 1) { method = "unaligned"; } else if (lengths.size() == 1) { method = "aligned"; } length = *(lengths.begin()); return alignSeqs.size(); } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "readFASTA"); exit(1); } } /**************************************************************************************************/ int PreClusterCommand::loadSeqs(map& thisName, vector& thisSeqs, string group){ try { set lengths; alignSeqs.clear(); map::iterator it; bool error = false; map thisCount; if (countfile != "") { thisCount = cparser->getCountTable(group); } for (int i = 0; i < thisSeqs.size(); i++) { if (m->control_pressed) { return 0; } if (namefile != "") { it = thisName.find(thisSeqs[i].getName()); //should never be true since parser checks for this if (it == thisName.end()) { m->mothurOut(thisSeqs[i].getName() + " is not in your names file, please correct."); m->mothurOutEndLine(); error = true; } else{ //get number of reps int numReps = 1; for(int j=0;j<(it->second).length();j++){ if((it->second)[j] == ','){ numReps++; } } seqPNode tempNode(numReps, thisSeqs[i], it->second); alignSeqs.push_back(tempNode); lengths.insert(thisSeqs[i].getAligned().length()); } }else { //no names file, you are identical to yourself int numRep = 1; if (countfile != "") { map::iterator it2 = thisCount.find(thisSeqs[i].getName()); //should never be true 
since parser checks for this if (it2 == thisCount.end()) { m->mothurOut(thisSeqs[i].getName() + " is not in your count file, please correct."); m->mothurOutEndLine(); error = true; } else { numRep = it2->second; } } seqPNode tempNode(numRep, thisSeqs[i], thisSeqs[i].getName()); alignSeqs.push_back(tempNode); lengths.insert(thisSeqs[i].getAligned().length()); } } if (lengths.size() > 1) { method = "unaligned"; } else if (lengths.size() == 1) { method = "aligned"; } length = *(lengths.begin()); //sanity check if (error) { m->control_pressed = true; } thisSeqs.clear(); return alignSeqs.size(); } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "loadSeqs"); exit(1); } } /**************************************************************************************************/ int PreClusterCommand::calcMisMatches(string seq1, string seq2){ try { int numBad = 0; if (method == "unaligned") { //align to each other Sequence seqI("seq1", seq1); Sequence seqJ("seq2", seq2); //align seq2 to seq1 - less abundant to more abundant alignment->align(seqJ.getUnaligned(), seqI.getUnaligned()); seq2 = alignment->getSeqAAln(); seq1 = alignment->getSeqBAln(); //chop gap ends int startPos = 0; int endPos = seq2.length()-1; for (int i = 0; i < seq2.length(); i++) { if (isalpha(seq2[i])) { startPos = i; break; } } for (int i = seq2.length()-1; i >= 0; i--) { if (isalpha(seq2[i])) { endPos = i; break; } } //count number of diffs for (int i = startPos; i <= endPos; i++) { if (seq2[i] != seq1[i]) { numBad++; } if (numBad > diffs) { return length; } //too far to cluster } }else { //count diffs for (int i = 0; i < seq1.length(); i++) { //do they match if (seq1[i] != seq2[i]) { numBad++; } if (numBad > diffs) { return length; } //too far to cluster } } return numBad; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "calcMisMatches"); exit(1); } } /**************************************************************************************************/ int
PreClusterCommand::mergeGroupCounts(string newcount, string newname, string newfasta){ try { ifstream inNames; m->openInputFile(newname, inNames); string group, first, second; set uniqueNames; while (!inNames.eof()) { if (m->control_pressed) { break; } inNames >> group; m->gobble(inNames); inNames >> first; m->gobble(inNames); inNames >> second; m->gobble(inNames); vector names; m->splitAtComma(second, names); uniqueNames.insert(first); int total = ct.getGroupCount(first, group); for (int i = 1; i < names.size(); i++) { total += ct.getGroupCount(names[i], group); ct.setAbund(names[i], group, 0); } ct.setAbund(first, group, total); } inNames.close(); vector namesOfSeqs = ct.getNamesOfSeqs(); for (int i = 0; i < namesOfSeqs.size(); i++) { if (ct.getNumSeqs(namesOfSeqs[i]) == 0) { ct.remove(namesOfSeqs[i]); } } ct.printTable(newcount); m->mothurRemove(newname); if (bygroup) { //if by group, must remove the duplicate seqs that are named the same ifstream in; m->openInputFile(newfasta, in); ofstream out; m->openOutputFile(newfasta+"temp", out); int count = 0; set already; while(!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { count++; if (already.count(seq.getName()) == 0) { seq.printSequence(out); already.insert(seq.getName()); } } } in.close(); out.close(); m->mothurRemove(newfasta); m->renameFile(newfasta+"temp", newfasta); } return 0; } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "mergeGroupCounts"); exit(1); } } /**************************************************************************************************/ void PreClusterCommand::printData(string newfasta, string newname, string group){ try { ofstream outFasta; ofstream outNames; if (bygroup) { m->openOutputFileAppend(newfasta, outFasta); m->openOutputFileAppend(newname, outNames); }else { m->openOutputFile(newfasta, outFasta); m->openOutputFile(newname, outNames); } if ((countfile != "") && (group == "")) { outNames << 
"Representative_Sequence\ttotal\n"; } for (int i = 0; i < alignSeqs.size(); i++) { if (alignSeqs[i].numIdentical != 0) { alignSeqs[i].seq.printSequence(outFasta); if (countfile != "") { if (group != "") { outNames << group << '\t' << alignSeqs[i].seq.getName() << '\t' << alignSeqs[i].names << endl; } else { outNames << alignSeqs[i].seq.getName() << '\t' << alignSeqs[i].numIdentical << endl; } }else { outNames << alignSeqs[i].seq.getName() << '\t' << alignSeqs[i].names << endl; } } } outFasta.close(); outNames.close(); } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "printData"); exit(1); } } /**************************************************************************************************/ void PreClusterCommand::readNameFile(){ try { ifstream in; m->openInputFile(namefile, in); string firstCol, secondCol; while (!in.eof()) { in >> firstCol >> secondCol; m->gobble(in); m->checkName(firstCol); m->checkName(secondCol); int size = m->getNumNames(secondCol); names[firstCol] = secondCol; sizes[firstCol] = size; } in.close(); } catch(exception& e) { m->errorOut(e, "PreClusterCommand", "readNameFile"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/commands/preclustercommand.h000066400000000000000000000543011255543666200221420ustar00rootroot00000000000000#ifndef PRECLUSTERCOMMAND_H #define PRECLUSTERCOMMAND_H /* * preclustercommand.h * Mothur * * Created by westcott on 12/21/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "command.hpp" #include "sequence.hpp" #include "sequenceparser.h" #include "sequencecountparser.h" #include "alignment.hpp" #include "gotohoverlap.hpp" #include "needlemanoverlap.hpp" #include "blastalign.hpp" #include "noalign.hpp" /************************************************************/ struct seqPNode { int numIdentical; Sequence seq; string names; bool active; int diffs; seqPNode() {} seqPNode(int n, Sequence s, string nm) : numIdentical(n), seq(s), names(nm), active(1) { diffs = 0; } ~seqPNode() {} }; /************************************************************/ inline bool comparePriorityTopDown(seqPNode first, seqPNode second) { if (first.numIdentical > second.numIdentical) { return true; } else if (first.numIdentical == second.numIdentical) { if (first.seq.getName() > second.seq.getName()) { return true; } } return false; } /************************************************************/ inline bool comparePriorityDownTop(seqPNode first, seqPNode second) { if (first.numIdentical < second.numIdentical) { return true; } else if (first.numIdentical == second.numIdentical) { if (first.seq.getName() > second.seq.getName()) { return true; } } return false; } //************************************************************/ class PreClusterCommand : public Command { public: PreClusterCommand(string); PreClusterCommand(); ~PreClusterCommand(){} vector setParameters(); string getCommandName() { return "pre.cluster"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Gevers D, Westcott SL (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 
6:e27310.\nhttp://www.mothur.org/wiki/Pre.cluster"; } string getDescription() { return "implements a pseudo-single linkage algorithm with the goal of removing sequences that are likely due to pyrosequencing errors"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: SequenceParser* parser; SequenceCountParser* cparser; CountTable ct; Alignment* alignment; int diffs, length, processors; float match, misMatch, gapOpen, gapExtend; bool abort, bygroup, topdown; string fastafile, namefile, outputDir, groupfile, countfile, method, align; vector alignSeqs; //maps the number of identical seqs to a sequence map names; //represents the names file first column maps to second column map sizes; //this map a seq name to the number of identical seqs in the names file map::iterator itSize; // map active; //maps sequence name to whether it has already been merged or not. vector outputNames; int readFASTA(); void readNameFile(); //int readNamesFASTA(); int calcMisMatches(string, string); void printData(string, string, string); //fasta filename, names file name int process(string); int loadSeqs(map&, vector&, string); int driverGroups(string, string, string, int, int, vector groups); int createProcessesGroups(string, string, string, vector); int mergeGroupCounts(string, string, string); }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct preClusterData { string fastafile; string namefile; string groupfile, countfile; string newFName, newNName, newMName, method, align; MothurOut* m; int start; int end, count; int diffs, threadID; vector groups; vector mapFileNames; bool topdown; float match, misMatch, gapOpen, gapExtend; preClusterData(){} preClusterData(string f, string n, string g, string c, string nff, string nnf, string nmf, vector gr, MothurOut* mout, int st, int en, int d, bool td, int tid, string me, string al, float ma, float misma, float gpOp, float gpEx) { fastafile = f; namefile = n; groupfile = g; newFName = nff; newNName = nnf; newMName = nmf; m = mout; start = st; end = en; diffs = d; threadID = tid; groups = gr; countfile = c; topdown = td; count=0; method = me; align = al; match = ma; misMatch = misma; gapExtend = gpEx; gapOpen = gpOp; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyPreclusterThreadFunction(LPVOID lpParam){ preClusterData* pDataArray; pDataArray = (preClusterData*)lpParam; try { Alignment* alignment; if(pDataArray->align == "gotoh") { alignment = new GotohOverlap(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch, 1000); } else if(pDataArray->align == "needleman") { alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, 1000); } else if(pDataArray->align == "blast") { alignment = new BlastAlignment(pDataArray->gapOpen, pDataArray->gapExtend, pDataArray->match, pDataArray->misMatch); } else if(pDataArray->align == "noalign") { alignment = new NoAlign(); } else { pDataArray->m->mothurOut(pDataArray->align + " is not a valid alignment option. 
I will run the command using needleman."); pDataArray->m->mothurOutEndLine(); alignment = new NeedlemanOverlap(pDataArray->gapOpen, pDataArray->match, pDataArray->misMatch, 1000); } //parse fasta and name file by group SequenceParser* parser; SequenceCountParser* cparser; if (pDataArray->countfile != "") { cparser = new SequenceCountParser(pDataArray->countfile, pDataArray->fastafile); }else { if (pDataArray->namefile != "") { parser = new SequenceParser(pDataArray->groupfile, pDataArray->fastafile, pDataArray->namefile); } else { parser = new SequenceParser(pDataArray->groupfile, pDataArray->fastafile); } } int numSeqs = 0; vector alignSeqs; //clear out old files ofstream outF; pDataArray->m->openOutputFile(pDataArray->newFName, outF); outF.close(); ofstream outN; pDataArray->m->openOutputFile(pDataArray->newNName, outN); outN.close(); //precluster each group for (int k = pDataArray->start; k < pDataArray->end; k++) { pDataArray->count++; int start = time(NULL); if (pDataArray->m->control_pressed) { delete parser; delete alignment;return 0; } pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Processing group " + pDataArray->groups[k] + ":"); pDataArray->m->mothurOutEndLine(); map thisNameMap; vector thisSeqs; if (pDataArray->groupfile != "") { thisSeqs = parser->getSeqs(pDataArray->groups[k]); }else if (pDataArray->countfile != "") { thisSeqs = cparser->getSeqs(pDataArray->groups[k]); } if (pDataArray->namefile != "") { thisNameMap = parser->getNameMap(pDataArray->groups[k]); } //fill alignSeqs with this groups info. 
//////////////////////////////////////////////////// //numSeqs = loadSeqs(thisNameMap, thisSeqs); same function below int length = 0; set lengths; alignSeqs.clear(); map::iterator it; bool error = false; map thisCount; if (pDataArray->countfile != "") { thisCount = cparser->getCountTable(pDataArray->groups[k]); } for (int i = 0; i < thisSeqs.size(); i++) { if (pDataArray->m->control_pressed) { delete parser; delete alignment; return 0; } if (pDataArray->namefile != "") { it = thisNameMap.find(thisSeqs[i].getName()); //should never be true since parser checks for this if (it == thisNameMap.end()) { pDataArray->m->mothurOut(thisSeqs[i].getName() + " is not in your names file, please correct."); pDataArray->m->mothurOutEndLine(); error = true; } else{ //get number of reps int numReps = 1; for(int j=0;j<(it->second).length();j++){ if((it->second)[j] == ','){ numReps++; } } seqPNode tempNode(numReps, thisSeqs[i], it->second); alignSeqs.push_back(tempNode); lengths.insert(thisSeqs[i].getAligned().length()); } }else { //no names file, you are identical to yourself int numRep = 1; if (pDataArray->countfile != "") { map::iterator it2 = thisCount.find(thisSeqs[i].getName()); //should never be true since parser checks for this if (it2 == thisCount.end()) { pDataArray->m->mothurOut(thisSeqs[i].getName() + " is not in your count file, please correct."); pDataArray->m->mothurOutEndLine(); error = true; } else { numRep = it2->second; } } seqPNode tempNode(numRep, thisSeqs[i], thisSeqs[i].getName()); alignSeqs.push_back(tempNode); lengths.insert(thisSeqs[i].getAligned().length()); } } if (lengths.size() > 1) { pDataArray->method = "unaligned"; } else if (lengths.size() == 1) { pDataArray->method = "aligned"; } length = *(lengths.begin()); //sanity check if (error) { pDataArray->m->control_pressed = true; } thisSeqs.clear(); numSeqs = alignSeqs.size(); //////////////////////////////////////////////////// if (pDataArray->m->control_pressed) { delete parser; delete alignment; return 
0; } if (pDataArray->method == "aligned") { if (pDataArray->diffs > length) { pDataArray->m->mothurOut("Error: diffs is greater than your sequence length."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; delete alignment; return 0; } } //////////////////////////////////////////////////// //int count = process(); - same function below ofstream out; pDataArray->m->openOutputFile(pDataArray->newMName+pDataArray->groups[k]+".map", out); pDataArray->mapFileNames.push_back(pDataArray->newMName+pDataArray->groups[k]+".map"); //sort seqs by number of identical seqs if (pDataArray->topdown) { sort(alignSeqs.begin(), alignSeqs.end(), comparePriorityTopDown); } else { sort(alignSeqs.begin(), alignSeqs.end(), comparePriorityDownTop); } int count = 0; if (pDataArray->topdown) { //think about running through twice... for (int i = 0; i < numSeqs; i++) { //are you active // itActive = active.find(alignSeqs[i].seq.getName()); if (alignSeqs[i].active) { //this sequence has not been merged yet string chunk = alignSeqs[i].seq.getName() + "\t" + toString(alignSeqs[i].numIdentical) + "\t" + toString(0) + "\t" + alignSeqs[i].seq.getAligned() + "\n"; //try to merge it with all smaller seqs for (int j = i+1; j < numSeqs; j++) { if (pDataArray->m->control_pressed) { delete parser; delete alignment; return 0; } if (alignSeqs[j].active) { //this sequence has not been merged yet //are you within "diff" bases //int mismatch = calcMisMatches(alignSeqs[i].seq.getAligned(), alignSeqs[j].seq.getAligned()); //////////////////////////////////////////////////// int mismatch = 0; if (pDataArray->method == "unaligned") { //align to eachother Sequence seqI("seq1", alignSeqs[i].seq.getAligned()); Sequence seqJ("seq2", alignSeqs[j].seq.getAligned()); //align seq2 to seq1 - less abundant to more abundant alignment->align(seqJ.getUnaligned(), seqI.getUnaligned()); string seq2 = alignment->getSeqAAln(); string seq1 = alignment->getSeqBAln(); //chop gap ends int startPos = 0; int 
endPos = seq2.length()-1; for (int i = 0; i < seq2.length(); i++) { if (isalpha(seq2[i])) { startPos = i; break; } } for (int i = seq2.length()-1; i >= 0; i--) { if (isalpha(seq2[i])) { endPos = i; break; } } //count number of diffs for (int i = startPos; i <= endPos; i++) { if (seq2[i] != seq1[i]) { mismatch++; } if (mismatch > pDataArray->diffs) { mismatch = length; break; } //to far to cluster } }else { for (int k = 0; k < alignSeqs[i].seq.getAligned().length(); k++) { //do they match if (alignSeqs[i].seq.getAligned()[k] != alignSeqs[j].seq.getAligned()[k]) { mismatch++; } if (mismatch > pDataArray->diffs) { mismatch = length; break; } //to far to cluster } } //////////////////////////////////////////////////// if (mismatch <= pDataArray->diffs) { //merge alignSeqs[i].names += ',' + alignSeqs[j].names; alignSeqs[i].numIdentical += alignSeqs[j].numIdentical; alignSeqs[j].active = 0; alignSeqs[j].numIdentical = 0; alignSeqs[j].diffs = mismatch; count++; chunk += alignSeqs[j].seq.getName() + "\t" + toString(alignSeqs[j].numIdentical) + "\t" + toString(mismatch) + "\t" + alignSeqs[j].seq.getAligned() + "\n"; } }//end if j active }//end for loop j //remove from active list alignSeqs[i].active = 0; out << "ideal_seq_" << (i+1) << '\t' << alignSeqs[i].numIdentical << endl << chunk << endl; }//end if active i if(i % 100 == 0) { pDataArray->m->mothurOutJustToScreen(toString(i) + "\t" + toString(numSeqs - count) + "\t" + toString(count)+"\n"); } } }else { map mapFile; map originalCount; map::iterator itCount; for (int i = 0; i < numSeqs; i++) { mapFile[i] = ""; originalCount[i] = alignSeqs[i].numIdentical; } //think about running through twice... 
for (int i = 0; i < numSeqs; i++) { //try to merge it into larger seqs for (int j = i+1; j < numSeqs; j++) { if (pDataArray->m->control_pressed) { out.close(); delete alignment; return 0; } if (originalCount[j] > originalCount[i]) { //this sequence is more abundant than I am //are you within "diff" bases //int mismatch = calcMisMatches(alignSeqs[i].seq.getAligned(), alignSeqs[j].seq.getAligned()); int mismatch = 0; if (pDataArray->method == "unaligned") { //align to eachother Sequence seqI("seq1", alignSeqs[i].seq.getAligned()); Sequence seqJ("seq2", alignSeqs[j].seq.getAligned()); //align seq2 to seq1 - less abundant to more abundant alignment->align(seqI.getUnaligned(), seqJ.getUnaligned()); string seq2 = alignment->getSeqAAln(); string seq1 = alignment->getSeqBAln(); //chop gap ends int startPos = 0; int endPos = seq2.length()-1; for (int i = 0; i < seq2.length(); i++) { if (isalpha(seq2[i])) { startPos = i; break; } } for (int i = seq2.length()-1; i >= 0; i--) { if (isalpha(seq2[i])) { endPos = i; break; } } //count number of diffs for (int i = startPos; i <= endPos; i++) { if (seq2[i] != seq1[i]) { mismatch++; } if (mismatch > pDataArray->diffs) { mismatch = length; break; } //to far to cluster } }else { for (int k = 0; k < alignSeqs[i].seq.getAligned().length(); k++) { //do they match if (alignSeqs[i].seq.getAligned()[k] != alignSeqs[j].seq.getAligned()[k]) { mismatch++; } if (mismatch > pDataArray->diffs) { mismatch = length; break; } //to far to cluster } } if (mismatch <= pDataArray->diffs) { //merge alignSeqs[j].names += ',' + alignSeqs[i].names; alignSeqs[j].numIdentical += alignSeqs[i].numIdentical; mapFile[j] = alignSeqs[i].seq.getName() + "\t" + toString(alignSeqs[i].numIdentical) + "\t" + toString(mismatch) + "\t" + alignSeqs[i].seq.getAligned() + "\n" + mapFile[i]; alignSeqs[i].numIdentical = 0; originalCount.erase(i); mapFile[i] = ""; count++; j+=numSeqs; //exit search, we merged this one in. 
} }//end abundance check }//end for loop j if(i % 100 == 0) { pDataArray->m->mothurOutJustToScreen(toString(i) + "\t" + toString(numSeqs - count) + "\t" + toString(count)+"\n"); } } for (int i = 0; i < numSeqs; i++) { if (alignSeqs[i].numIdentical != 0) { out << "ideal_seq_" << (i+1) << '\t' << alignSeqs[i].numIdentical << endl << alignSeqs[i].seq.getName() + "\t" + toString(alignSeqs[i].numIdentical) + "\t" + toString(0) + "\t" + alignSeqs[i].seq.getAligned() + "\n" << mapFile[i] << endl; } } } out.close(); if(numSeqs % 100 != 0) { pDataArray->m->mothurOut(toString(numSeqs) + "\t" + toString(numSeqs - count) + "\t" + toString(count)); pDataArray->m->mothurOutEndLine(); } //////////////////////////////////////////////////// if (pDataArray->m->control_pressed) { delete parser; return 0; } pDataArray->m->mothurOut("Total number of sequences before pre.cluster was " + toString(alignSeqs.size()) + ".");pDataArray-> m->mothurOutEndLine(); pDataArray->m->mothurOut("pre.cluster removed " + toString(count) + " sequences."); pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOutEndLine(); //////////////////////////////////////////////////// //printData(pDataArray->newFFile, pDataArray->newNFile); - same as below ofstream outFasta; ofstream outNames; pDataArray->m->openOutputFileAppend(pDataArray->newFName, outFasta); pDataArray->m->openOutputFileAppend(pDataArray->newNName, outNames); for (int i = 0; i < alignSeqs.size(); i++) { if (alignSeqs[i].numIdentical != 0) { alignSeqs[i].seq.printSequence(outFasta); if (pDataArray->countfile != "") { outNames << pDataArray->groups[k] << '\t' << alignSeqs[i].seq.getName() << '\t' << alignSeqs[i].names << endl; }else { outNames << alignSeqs[i].seq.getName() << '\t' << alignSeqs[i].names << endl; } } } outFasta.close(); outNames.close(); //////////////////////////////////////////////////// pDataArray->m->mothurOut("It took " + toString(time(NULL) - start) + " secs to cluster " + toString(numSeqs) + " sequences."); 
pDataArray->m->mothurOutEndLine(); } delete alignment; return numSeqs; } catch(exception& e) { pDataArray->m->errorOut(e, "PreClusterCommand", "MyPreclusterThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/primerdesigncommand.cpp000066400000000000000000001772701255543666200230100ustar00rootroot00000000000000// // primerdesigncommand.cpp // Mothur // // Created by Sarah Westcott on 1/18/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "primerdesigncommand.h" //********************************************************************************************************************** vector PrimerDesignCommand::setParameters(){ try { CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","summary-list",false,true,true); parameters.push_back(plist); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","",false,true, true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter plength("length", "Number", "", "18", "", "", "","",false,false); parameters.push_back(plength); CommandParameter pmintm("mintm", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmintm); CommandParameter pmaxtm("maxtm", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxtm); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pprocessors); CommandParameter potunumber("otulabel", "String", "", "", "", "", "","",false,true,true); 
parameters.push_back(potunumber); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(ppdiffs); CommandParameter pcutoff("cutoff", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pcutoff); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string PrimerDesignCommand::getHelpString(){ try { string helpString = ""; helpString += "The primer.design allows you to identify sequence fragments that are specific to particular OTUs.\n"; helpString += "The primer.design command parameters are: list, fasta, name, count, otulabel, cutoff, length, pdiffs, mintm, maxtm, processors and label.\n"; helpString += "The list parameter allows you to provide a list file and is required.\n"; helpString += "The fasta parameter allows you to provide a fasta file and is required.\n"; helpString += "The name parameter allows you to provide a name file associated with your fasta file.\n"; helpString += "The count parameter allows you to provide a count file associated with your fasta file.\n"; helpString += "The label parameter is used to indicate the label you want to use from your list file.\n"; helpString += "The otulabel parameter is used to indicate the otu you want to use from your list file. 
It is required.\n"; helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n"; helpString += "The length parameter is used to indicate the length of the primer. The default is 18.\n"; helpString += "The mintm parameter is used to indicate the minimum melting temperature.\n"; helpString += "The maxtm parameter is used to indicate the maximum melting temperature.\n"; helpString += "The processors parameter allows you to indicate the number of processors you want to use. Default=1.\n"; helpString += "The cutoff parameter allows you to set a percentage of sequences that support the base. For example: cutoff=97 would only return a sequence that only showed ambiguities for bases that were not supported by at least 97% of sequences.\n"; helpString += "The primer.design command should be in the following format: primer.design(list=yourListFile, fasta=yourFastaFile, name=yourNameFile)\n"; helpString += "primer.design(list=final.an.list, fasta=final.fasta, name=final.names, label=0.03)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string PrimerDesignCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],[distance],otu.cons.fasta"; } else if (type == "summary") { pattern = "[filename],[distance],primer.summary"; } else if (type == "list") { pattern = "[filename],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** PrimerDesignCommand::PrimerDesignCommand(){ try {
abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["list"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "PrimerDesignCommand"); exit(1); } } //********************************************************************************************************************** PrimerDesignCommand::PrimerDesignCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["list"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } } //check for parameters namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } //get fastafile - it is required fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort=true; } else if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastafile); } //get listfile - it is required listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort=true; } else if (listfile == "not found") { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current listfile and the list parameter is 
required."); m->mothurOutEndLine(); abort = true; } }else { m->setListFile(listfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(listfile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, pdiffs); temp = validParameter.validFile(parameters, "length", false); if (temp == "not found") { temp = "18"; } m->mothurConvert(temp, length); temp = validParameter.validFile(parameters, "mintm", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, minTM); temp = validParameter.validFile(parameters, "maxtm", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxTM); otulabel = validParameter.validFile(parameters, "otulabel", false); if (otulabel == "not found") { otulabel = ""; } if (otulabel == "") { m->mothurOut("[ERROR]: You must provide an OTU label, aborting.\n"); abort = true; } temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } if (countfile == "") { if (namefile == "") { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } 
catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "PrimerDesignCommand"); exit(1); } } //********************************************************************************************************************** int PrimerDesignCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); ////////////////////////////////////////////////////////////////////////////// // get file inputs // ////////////////////////////////////////////////////////////////////////////// //reads list file and selects the label the user specified or the first label getListVector(); vector<string> binLabels = list->getLabels(); int binIndex = findIndex(otulabel, binLabels); if (binIndex == -1) { m->mothurOut("[ERROR]: You selected an OTU label that is not in your list file, quitting.\n"); return 0; } map<string, int> nameMap; unsigned long int numSeqs; //used to sanity check the files. numSeqs = total seqs for namefile and uniques for count. //list file should have all seqs if a name file was used to create it and only uniques if a count file was used. if (namefile != "") { nameMap = m->readNames(namefile, numSeqs); } else if (countfile != "") { nameMap = readCount(numSeqs); } else { numSeqs = list->getNumSeqs(); } //sanity check if (numSeqs != list->getNumSeqs()) { if (namefile != "") { m->mothurOut("[ERROR]: Your list file contains " + toString(list->getNumSeqs()) + " sequences, and your name file contains " + toString(numSeqs) + " sequences, aborting. Do you have the correct files? Perhaps you forgot to include the name file when you clustered? \n"); } else if (countfile != "") { m->mothurOut("[ERROR]: Your list file contains " + toString(list->getNumSeqs()) + " sequences, and your count file contains " + toString(numSeqs) + " unique sequences, aborting. Do you have the correct files? Perhaps you forgot to include the count file when you clustered? 
\n"); } m->control_pressed = true; } if (m->control_pressed) { delete list; return 0; } ////////////////////////////////////////////////////////////////////////////// // process data // ////////////////////////////////////////////////////////////////////////////// m->mothurOut("\nFinding consensus sequences for each otu..."); cout.flush(); vector conSeqs = createProcessesConSeqs(nameMap, numSeqs); map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[distance]"] = list->getLabel(); string consFastaFile = getOutputFileName("fasta", variables); outputNames.push_back(consFastaFile); outputTypes["fasta"].push_back(consFastaFile); ofstream out; m->openOutputFile(consFastaFile, out); for (int i = 0; i < conSeqs.size(); i++) { conSeqs[i].printSequence(out); } out.close(); m->mothurOut("Done.\n\n"); set primers = getPrimer(conSeqs[binIndex]); if (m->control_pressed) { delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } string consSummaryFile = getOutputFileName("summary", variables); outputNames.push_back(consSummaryFile); outputTypes["summary"].push_back(consSummaryFile); ofstream outSum; m->openOutputFile(consSummaryFile, outSum); outSum << "PrimerOtu: " << otulabel << " Members: " << list->get(binIndex) << endl << "Primers\tminTm\tmaxTm" << endl; //find min and max melting points vector minTms; vector maxTms; string primerString = ""; for (set::iterator it = primers.begin(); it != primers.end();) { double minTm, maxTm; findMeltingPoint(*it, minTm, maxTm); if ((minTM == -1) && (maxTM == -1)) { //user did not set min or max Tm so save this primer minTms.push_back(minTm); maxTms.push_back(maxTm); outSum << *it << '\t' << minTm << '\t' << maxTm << endl; it++; }else if ((minTM == -1) && (maxTm <= maxTM)){ //user set max and no min, keep if below max minTms.push_back(minTm); maxTms.push_back(maxTm); outSum << *it << '\t' << minTm << '\t' << maxTm << endl; it++; }else 
if ((maxTM == -1) && (minTm >= minTM)){ //user set min and no max, keep if above min minTms.push_back(minTm); maxTms.push_back(maxTm); outSum << *it << '\t' << minTm << '\t' << maxTm << endl; it++; }else if ((maxTm <= maxTM) && (minTm >= minTM)) { //keep if above min and below max minTms.push_back(minTm); maxTms.push_back(maxTm); outSum << *it << '\t' << minTm << '\t' << maxTm << endl; it++; }else { primers.erase(it++); } //erase because it didn't qualify } outSum << "\nOTUNumber\tPrimer\tStart\tEnd\tLength\tMismatches\tminTm\tmaxTm\n"; outSum.close(); //check each otu's conseq for each primer in otunumber set otuToRemove = createProcesses(consSummaryFile, minTms, maxTms, primers, conSeqs, binIndex); if (m->control_pressed) { delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //print new list file map mvariables; mvariables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); mvariables["[extension]"] = m->getExtension(listfile); string newListFile = getOutputFileName("list", mvariables); ofstream outListTemp; m->openOutputFile(newListFile+".temp", outListTemp); outListTemp << list->getLabel() << '\t' << (list->getNumBins()-otuToRemove.size()); string headers = "label\tnumOtus"; for (int j = 0; j < list->getNumBins(); j++) { if (m->control_pressed) { break; } //good otus if (otuToRemove.count(j) == 0) { string bin = list->get(j); if (bin != "") { outListTemp << '\t' << bin; headers += '\t' + binLabels[j]; } } } outListTemp << endl; outListTemp.close(); ofstream outList; m->openOutputFile(newListFile, outList); outList << headers << endl; outList.close(); m->appendFiles(newListFile+".temp", newListFile); m->mothurRemove(newListFile+".temp"); outputNames.push_back(newListFile); outputTypes["list"].push_back(newListFile); if (m->control_pressed) { delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } delete list; m->mothurOut("It took " + 
toString(time(NULL) - start) + " secs to process " + toString(conSeqs.size()) + " OTUs.\n"); /* list has already been deleted at this point, so report the OTU count from conSeqs (one consensus sequence per OTU) */ //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "execute"); exit(1); } } //********************************************************************/ //used http://www.biophp.org/minitools/melting_temperature/ as a reference to substitute degenerate bases // in order to find the min and max Tm values. //Tm = 64.9°C + 41°C x (number of G’s and C’s in the primer – 16.4)/N /* A = adenine * C = cytosine * G = guanine * T = thymine * R = G A (purine) * Y = T C (pyrimidine) * K = G T (keto) * M = A C (amino) * S = G C (strong bonds) * W = A T (weak bonds) * B = G T C (all but A) * D = G A T (all but C) * H = A C T (all but G) * V = G C A (all but T) * N = A G C T (any) */ int PrimerDesignCommand::findMeltingPoint(string primer, double& minTm, double& maxTm){ try { string minTmprimer = primer; string maxTmprimer = primer; //find minimum Tm string substituting for degenerate bases for (int i = 0; i < minTmprimer.length(); i++) { minTmprimer[i] = toupper(minTmprimer[i]); if (minTmprimer[i] == 'Y') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'R') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'W') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'K') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'M') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'D') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'V') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'H') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'B') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'N') { minTmprimer[i] = 'A'; } else if (minTmprimer[i] == 'S') { minTmprimer[i] = 'G'; } } //find maximum Tm string 
substituting for degenerate bases for (int i = 0; i < maxTmprimer.length(); i++) { maxTmprimer[i] = toupper(maxTmprimer[i]); if (maxTmprimer[i] == 'Y') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'R') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'W') { maxTmprimer[i] = 'A'; } else if (maxTmprimer[i] == 'K') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'M') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'D') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'V') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'H') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'B') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'N') { maxTmprimer[i] = 'G'; } else if (maxTmprimer[i] == 'S') { maxTmprimer[i] = 'G'; } } int numGC = 0; for (int i = 0; i < minTmprimer.length(); i++) { if (minTmprimer[i] == 'G') { numGC++; } else if (minTmprimer[i] == 'C') { numGC++; } } minTm = 64.9 + 41 * (numGC - 16.4) / (double) minTmprimer.length(); numGC = 0; for (int i = 0; i < maxTmprimer.length(); i++) { if (maxTmprimer[i] == 'G') { numGC++; } else if (maxTmprimer[i] == 'C') { numGC++; } } maxTm = 64.9 + 41 * (numGC - 16.4) / (double) maxTmprimer.length(); return 0; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "findMeltingPoint"); exit(1); } } //********************************************************************/ //search for a primer over the sequence string bool PrimerDesignCommand::findPrimer(string rawSequence, string primer, vector& primerStart, vector& primerEnd, vector& mismatches){ try { bool foundAtLeastOne = false; //innocent til proven guilty //look for exact match if(rawSequence.length() < primer.length()) { return false; } //search for primer for (int j = 0; j < rawSequence.length()-length; j++){ if (m->control_pressed) { return foundAtLeastOne; } string rawChunk = rawSequence.substr(j, length); int numDiff = countDiffs(primer, rawChunk); if(numDiff <= pdiffs){ primerStart.push_back(j); 
primerEnd.push_back(j+length); mismatches.push_back(numDiff); foundAtLeastOne = true; } } return foundAtLeastOne; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "findPrimer"); exit(1); } } //********************************************************************/ //find all primers for the given sequence set<string> PrimerDesignCommand::getPrimer(Sequence primerSeq){ try { set<string> primers; string rawSequence = primerSeq.getUnaligned(); /* a sequence shorter than the primer length has no candidate primers; returning early also avoids unsigned underflow in the loop bound below */ if ((int) rawSequence.length() < length) { return primers; } for (int j = 0; j < rawSequence.length()-length; j++){ if (m->control_pressed) { break; } string primer = rawSequence.substr(j, length); primers.insert(primer); } return primers; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "getPrimer"); exit(1); } } /**************************************************************************************************/ set<int> PrimerDesignCommand::createProcesses(string newSummaryFile, vector<double>& minTms, vector<double>& maxTms, set<string>& primers, vector<Sequence>& conSeqs, int binIndex) { try { vector<int> processIDS; int process = 1; set<int> otusToRemove; int numBinsProcessed = 0; bool recalc = false; //sanity check int numBins = conSeqs.size(); if (numBins < processors) { processors = numBins; } //divide the otus between the processors vector<linePair> lines; int numOtusPerProcessor = numBins / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numOtusPerProcessor; int endIndex = (i+1) * numOtusPerProcessor; if(i == (processors - 1)){ endIndex = numBins; } lines.push_back(linePair(startIndex, endIndex)); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ //clear old file because we append in driver m->mothurRemove(newSummaryFile + m->mothurGetpid(process) + ".temp"); otusToRemove = 
driver(newSummaryFile + m->mothurGetpid(process) + ".temp", minTms, maxTms, primers, conSeqs, lines[process].start, lines[process].end, numBinsProcessed, binIndex); string tempFile = m->mothurGetpid(process) + ".otus2Remove.temp"; ofstream outTemp; m->openOutputFile(tempFile, outTemp); outTemp << numBinsProcessed << endl; outTemp << otusToRemove.size() << endl; for (set<int>::iterator it = otusToRemove.begin(); it != otusToRemove.end(); it++) { outTemp << *it << endl; } outTemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) { m->mothurRemove((toString(processIDS[i]) + ".otus2Remove.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) {m->mothurRemove((toString(processIDS[i]) + ".otus2Remove.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); int numOtusPerProcessor = numBins / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numOtusPerProcessor; int endIndex = (i+1) * numOtusPerProcessor; if(i == (processors - 1)){ endIndex = numBins; } lines.push_back(linePair(startIndex, endIndex)); } processIDS.clear(); process = 1; otusToRemove.clear(); numBinsProcessed = 0; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ //clear old file because we append in driver m->mothurRemove(newSummaryFile + 
m->mothurGetpid(process) + ".temp"); otusToRemove = driver(newSummaryFile + m->mothurGetpid(process) + ".temp", minTms, maxTms, primers, conSeqs, lines[process].start, lines[process].end, numBinsProcessed, binIndex); string tempFile = m->mothurGetpid(process) + ".otus2Remove.temp"; ofstream outTemp; m->openOutputFile(tempFile, outTemp); outTemp << numBinsProcessed << endl; outTemp << otusToRemove.size() << endl; for (set<int>::iterator it = otusToRemove.begin(); it != otusToRemove.end(); it++) { outTemp << *it << endl; } outTemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part otusToRemove = driver(newSummaryFile, minTms, maxTms, primers, conSeqs, lines[0].start, lines[0].end, numBinsProcessed, binIndex); //force parent to wait until all the processes are done for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { string tempFile = toString(processIDS[i]) + ".otus2Remove.temp"; ifstream intemp; m->openInputFile(tempFile, intemp); int num; intemp >> num; m->gobble(intemp); if (num != (lines[i+1].end - lines[i+1].start)) { m->mothurOut("[ERROR]: process " + toString(processIDS[i]) + " did not complete processing all OTUs assigned to it, quitting.\n"); m->control_pressed = true; } intemp >> num; m->gobble(intemp); for (int k = 0; k < num; k++) { int otu; intemp >> otu; m->gobble(intemp); otusToRemove.insert(otu); } intemp.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the primerDesignData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector<primerDesignData*> pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. 
for( int i=1; i<processors; i++ ){ string extension = toString(i) + ".temp"; m->mothurRemove(newSummaryFile+extension); primerDesignData* tempPrimer = new primerDesignData((newSummaryFile+extension), m, lines[i].start, lines[i].end, minTms, maxTms, primers, conSeqs, pdiffs, binIndex, length, i); pDataArray.push_back(tempPrimer); processIDS.push_back(i); //MyPrimerThreadFunction is in header. It must be global or static to work with the threads. //default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i-1] = CreateThread(NULL, 0, MyPrimerThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //using the main process as a worker saves time and memory otusToRemove = driver(newSummaryFile, minTms, maxTms, primers, conSeqs, lines[0].start, lines[0].end, numBinsProcessed, binIndex); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ for (set<int>::iterator it = pDataArray[i]->otusToRemove.begin(); it != pDataArray[i]->otusToRemove.end(); it++) { otusToRemove.insert(*it); } int num = pDataArray[i]->numBinsProcessed; if (num != (lines[processIDS[i]].end - lines[processIDS[i]].start)) { m->mothurOut("[ERROR]: process " + toString(processIDS[i]) + " did not complete processing all OTUs assigned to it, quitting.\n"); m->control_pressed = true; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append output files for(int i=0;i<processIDS.size();i++){ m->appendFiles((newSummaryFile + toString(processIDS[i]) + ".temp"), newSummaryFile); m->mothurRemove((newSummaryFile + toString(processIDS[i]) + ".temp")); } return otusToRemove; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** set PrimerDesignCommand::driver(string summaryFileName, vector& 
minTms, vector& maxTms, set& primers, vector& conSeqs, int start, int end, int& numBinsProcessed, int binIndex){ try { set otuToRemove; ofstream outSum; m->openOutputFileAppend(summaryFileName, outSum); for (int i = start; i < end; i++) { if (m->control_pressed) { break; } if (i != (binIndex)) { int primerIndex = 0; for (set::iterator it = primers.begin(); it != primers.end(); it++) { vector primerStarts; vector primerEnds; vector mismatches; bool found = findPrimer(conSeqs[i].getUnaligned(), (*it), primerStarts, primerEnds, mismatches); //if we found it report to the table if (found) { for (int j = 0; j < primerStarts.size(); j++) { outSum << (i+1) << '\t' << *it << '\t' << primerStarts[j] << '\t' << primerEnds[j] << '\t' << length << '\t' << mismatches[j] << '\t' << minTms[primerIndex] << '\t' << maxTms[primerIndex] << endl; } otuToRemove.insert(i); } primerIndex++; } } numBinsProcessed++; } outSum.close(); return otuToRemove; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "driver"); exit(1); } } /**************************************************************************************************/ vector< vector< vector > > PrimerDesignCommand::driverGetCounts(map& nameMap, unsigned long int& fastaCount, vector& otuCounts, unsigned long long& start, unsigned long long& end){ try { vector< vector< vector > > counts; map seq2Bin; alignedLength = 0; ifstream in; m->openInputFile(fastafile, in); in.seekg(start); //adjust start if null strings if (start == 0) { m->zapGremlins(in); m->gobble(in); } bool done = false; fastaCount = 0; while (!done) { if (m->control_pressed) { in.close(); return counts; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { if (fastaCount == 0) { alignedLength = seq.getAligned().length(); initializeCounts(counts, alignedLength, seq2Bin, nameMap, otuCounts); } else if (alignedLength != seq.getAligned().length()) { m->mothurOut("[ERROR]: your sequences are not all the same length. 
primer.design requires sequences to be aligned."); m->mothurOutEndLine(); m->control_pressed = true; break; } int num = 1; map::iterator itCount; if (namefile != "") { itCount = nameMap.find(seq.getName()); if (itCount == nameMap.end()) { m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your name file, aborting."); m->mothurOutEndLine(); m->control_pressed = true; break; } else { num = itCount->second; } fastaCount+=num; }else if (countfile != "") { itCount = nameMap.find(seq.getName()); if (itCount == nameMap.end()) { m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your count file, aborting."); m->mothurOutEndLine(); m->control_pressed = true; break; } else { num = itCount->second; } fastaCount++; }else { fastaCount++; } //increment counts itCount = seq2Bin.find(seq.getName()); if (itCount == seq2Bin.end()) { if ((namefile != "") || (countfile != "")) { m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your list file, aborting. 
Perhaps you forgot to include your name or count file while clustering.\n"); m->mothurOutEndLine(); m->control_pressed = true; break; }else{ m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your list file, aborting."); m->mothurOutEndLine(); m->control_pressed = true; break; } }else { otuCounts[itCount->second] += num; string aligned = seq.getAligned(); for (int i = 0; i < alignedLength; i++) { char base = toupper(aligned[i]); if (base == 'A') { counts[itCount->second][i][0]+=num; } else if (base == 'T') { counts[itCount->second][i][1]+=num; } else if (base == 'G') { counts[itCount->second][i][2]+=num; } else if (base == 'C') { counts[itCount->second][i][3]+=num; } else { counts[itCount->second][i][4]+=num; } } } } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); return counts; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "driverGetCounts"); exit(1); } } /**************************************************************************************************/ vector PrimerDesignCommand::createProcessesConSeqs(map& nameMap, unsigned long int& numSeqs) { try { vector< vector< vector > > counts; vector otuCounts; vector processIDS; int process = 1; unsigned long int fastaCount = 0; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) vector positions; vector lines; positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(fastaLinePair(positions[i], positions[(i+1)])); } //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if 
(pid == 0){ counts = driverGetCounts(nameMap, fastaCount, otuCounts, lines[process].start, lines[process].end); string tempFile = m->mothurGetpid(process) + ".cons_counts.temp"; ofstream outTemp; m->openOutputFile(tempFile, outTemp); outTemp << fastaCount << endl; //pass counts outTemp << counts.size() << endl; for (int i = 0; i < counts.size(); i++) { outTemp << counts[i].size() << endl; for (int j = 0; j < counts[i].size(); j++) { for (int k = 0; k < 5; k++) { outTemp << counts[i][j][k] << '\t'; } outTemp << endl; } } //pass otuCounts outTemp << otuCounts.size() << endl; for (int i = 0; i < otuCounts.size(); i++) { outTemp << otuCounts[i] << '\t'; } outTemp << endl; outTemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) { m->mothurRemove((toString(processIDS[i]) + ".cons_counts.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i=0;i<processIDS.size();i++) {m->mothurRemove((toString(processIDS[i]) + ".cons_counts.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); positions.clear(); lines.clear(); positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(fastaLinePair(positions[i], positions[(i+1)])); } counts.clear(); otuCounts.clear(); processIDS.clear(); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ counts = driverGetCounts(nameMap, fastaCount, otuCounts, lines[process].start, lines[process].end); string tempFile = m->mothurGetpid(process) + ".cons_counts.temp"; ofstream outTemp; m->openOutputFile(tempFile, outTemp); outTemp << fastaCount << endl; //pass counts outTemp << counts.size() << endl; for (int i = 0; i < counts.size(); i++) { outTemp << counts[i].size() << endl; for (int j = 0; j < counts[i].size(); j++) { for (int k = 0; k < 5; k++) { outTemp << counts[i][j][k] << '\t'; } outTemp << endl; } } //pass otuCounts outTemp << otuCounts.size() << endl; for (int i = 0; i < otuCounts.size(); i++) { outTemp << otuCounts[i] << '\t'; } outTemp << endl; outTemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part counts = driverGetCounts(nameMap, fastaCount, otuCounts, lines[0].start, lines[0].end); //force parent to wait until all the processes are done for (int i=0;i<processIDS.size();i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { string tempFile = toString(processIDS[i]) + ".cons_counts.temp"; ifstream intemp; m->openInputFile(tempFile, intemp); unsigned long int num; intemp >> num; 
m->gobble(intemp); fastaCount += num; intemp >> num; m->gobble(intemp); if (num != counts.size()) { m->mothurOut("[ERROR]: " + tempFile + " was not built correctly by the child process, quitting.\n"); m->control_pressed = true; } else { //read counts for (int k = 0; k < num; k++) { int alength; intemp >> alength; m->gobble(intemp); if (alength != alignedLength) { m->mothurOut("[ERROR]: your sequences are not all the same length. primer.design requires sequences to be aligned."); m->mothurOutEndLine(); m->control_pressed = true; } else { for (int j = 0; j < alength; j++) { for (int l = 0; l < 5; l++) { unsigned int numTemp; intemp >> numTemp; m->gobble(intemp); counts[k][j][l] += numTemp; } } } } //read otuCounts intemp >> num; m->gobble(intemp); for (int k = 0; k < num; k++) { unsigned int numTemp; intemp >> numTemp; m->gobble(intemp); otuCounts[k] += numTemp; } } intemp.close(); m->mothurRemove(tempFile); } #else unsigned long long start = 0; unsigned long long end = 1000; counts = driverGetCounts(nameMap, fastaCount, otuCounts, start, end); #endif //you will have a nameMap error if there is a namefile or countfile, but if those aren't given we want to make sure the fasta and list file match. if (fastaCount != numSeqs) { if ((namefile == "") && (countfile == "")) { m->mothurOut("[ERROR]: Your list file contains " + toString(list->getNumSeqs()) + " sequences, and your fasta file contains " + toString(fastaCount) + " sequences, aborting. Do you have the correct files? Perhaps you forgot to include the name or count file? 
\n"); } m->control_pressed = true; } vector conSeqs; if (m->control_pressed) { return conSeqs; } //build consensus seqs string snumBins = toString(counts.size()); for (int i = 0; i < counts.size(); i++) { if (m->control_pressed) { break; } string otuLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { otuLabel += "0"; } } otuLabel += sbinNumber; string cons = ""; for (int j = 0; j < counts[i].size(); j++) { cons += getBase(counts[i][j], otuCounts[i]); } Sequence consSeq(otuLabel, cons); conSeqs.push_back(consSeq); } if (m->control_pressed) { conSeqs.clear(); return conSeqs; } return conSeqs; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "createProcessesConSeqs"); exit(1); } } //*************************************************************************************************************** char PrimerDesignCommand::getBase(vector counts, int size){ //A,T,G,C,Gap try{ /* A = adenine * C = cytosine * G = guanine * T = thymine * R = G A (purine) * Y = T C (pyrimidine) * K = G T (keto) * M = A C (amino) * S = G C (strong bonds) * W = A T (weak bonds) * B = G T C (all but A) * D = G A T (all but C) * H = A C T (all but G) * V = G C A (all but T) * N = A G C T (any) */ char conBase = 'N'; //zero out counts that don't make the cutoff float percentage = (100.0 - cutoff) / 100.0; for (int i = 0; i < counts.size(); i++) { float countPercentage = counts[i] / (float) size; if (countPercentage < percentage) { counts[i] = 0; } } //any if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'n'; } //any no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'N'; } //all but T else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'v'; } //all but T 
no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'V'; } //all but G else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'h'; } //all but G no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'H'; } //all but C else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'd'; } //all but C no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'D'; } //all but A else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'b'; } //all but A no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'B'; } //W = A T (weak bonds) else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'w'; } //W = A T (weak bonds) no gap else if ((counts[0] != 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'W'; } //S = G C (strong bonds) else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 's'; } //S = G C (strong bonds) no gap else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'S'; } //M = A C (amino) else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'm'; } //M = A C (amino) no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'M'; } //K = G T (keto) else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && 
(counts[4] != 0)) { conBase = 'k'; } //K = G T (keto) no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'K'; } //Y = T C (pyrimidine) else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'y'; } //Y = T C (pyrimidine) no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'Y'; } //R = G A (purine) else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'r'; } //R = G A (purine) no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'R'; } //only A else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'a'; } //only A no gap else if ((counts[0] != 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'A'; } //only T else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 't'; } //only T no gap else if ((counts[0] == 0) && (counts[1] != 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'T'; } //only G else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] != 0)) { conBase = 'g'; } //only G no gap else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] != 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'G'; } //only C else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] != 0)) { conBase = 'c'; } //only C no gap else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] != 0) && (counts[4] == 0)) { conBase = 'C'; } //only gap else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && 
(counts[4] != 0)) { conBase = '-'; } //cutoff removed all counts else if ((counts[0] == 0) && (counts[1] == 0) && (counts[2] == 0) && (counts[3] == 0) && (counts[4] == 0)) { conBase = 'N'; } else{ m->mothurOut("[ERROR]: cannot find consensus base."); m->mothurOutEndLine(); } return conBase; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "getBase"); exit(1); } } //********************************************************************************************************************** int PrimerDesignCommand::initializeCounts(vector< vector< vector > >& counts, int length, map& seq2Bin, map& nameMap, vector& otuCounts){ try { counts.clear(); otuCounts.clear(); seq2Bin.clear(); //vector< vector< vector > > counts - otu < spot_in_alignment < counts_for_A,T,G,C,Gap > > > for (int i = 0; i < list->getNumBins(); i++) { string binNames = list->get(i); vector names; m->splitAtComma(binNames, names); otuCounts.push_back(0); //lets be smart and only map the unique names if a name or count file was given to save search time and memory if ((namefile != "") || (countfile != "")) { for (int j = 0; j < names.size(); j++) { map::iterator itNames = nameMap.find(names[j]); if (itNames != nameMap.end()) { //add name because its a unique one seq2Bin[names[j]] = i; } } }else { //map everyone for (int j = 0; j < names.size(); j++) { seq2Bin[names[j]] = i; } } vector temp; temp.resize(5, 0); //A,T,G,C,Gap vector< vector > temp2; for (int j = 0; j < length; j++) { temp2.push_back(temp); } counts.push_back(temp2); } return 0; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "initializeCounts"); exit(1); } } //********************************************************************************************************************** map PrimerDesignCommand::readCount(unsigned long int& numSeqs){ try { map nameMap; CountTable ct; ct.readTable(countfile, false, false); vector namesOfSeqs = ct.getNamesOfSeqs(); numSeqs = ct.getNumUniqueSeqs(); for (int i = 0; i < 
namesOfSeqs.size(); i++) { if (m->control_pressed) { break; } nameMap[namesOfSeqs[i]] = ct.getNumSeqs(namesOfSeqs[i]); } return nameMap; } catch(exception& e) { m->errorOut(e, "PrimerDesignCommand", "readCount"); exit(1); } } //********************************************************************************************************************** int PrimerDesignCommand::getListVector(){ try { InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); break; } lastLabel = list->getLabel(); //get next line to process //prevent memory leak delete list; list = input.getListVector(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
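Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            delete list;
            list = input.getListVector(lastLabel);
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "PrimerDesignCommand", "getListVector");
        exit(1);
    }
}
//********************************************************************/
/* A = adenine
 * C = cytosine
 * G = guanine
 * T = thymine
 * R = G A (purine)
 * Y = T C (pyrimidine)
 * K = G T (keto)
 * M = A C (amino)
 * S = G C (strong bonds)
 * W = A T (weak bonds)
 * B = G T C (all but A)
 * D = G A T (all but C)
 * H = A C T (all but G)
 * V = G C A (all but T)
 * N = A G C T (any)
 */
int PrimerDesignCommand::countDiffs(string oligo, string seq){
    try {
        int length = oligo.length();
        int countDiffs = 0;

        //same comparison as the inlined copy in MyPrimerThreadFunction (primerdesigncommand.h)
        for(int i=0;i<length;i++){

            oligo[i] = toupper(oligo[i]);
            seq[i] = toupper(seq[i]);

            if(oligo[i] != seq[i]){
                if((oligo[i] == 'N' || oligo[i] == 'I') && (seq[i] == 'N')) { countDiffs++; }
                else if(oligo[i] == 'R' && (seq[i] != 'A' && seq[i] != 'G')) { countDiffs++; }
                else if(oligo[i] == 'Y' && (seq[i] != 'C' && seq[i] != 'T')) { countDiffs++; }
                else if(oligo[i] == 'M' && (seq[i] != 'C' && seq[i] != 'A')) { countDiffs++; }
                else if(oligo[i] == 'K' && (seq[i] != 'T' && seq[i] != 'G')) { countDiffs++; }
                else if(oligo[i] == 'W' && (seq[i] != 'T' && seq[i] != 'A')) { countDiffs++; }
                else if(oligo[i] == 'S' && (seq[i] != 'C' && seq[i] != 'G')) { countDiffs++; }
                else if(oligo[i] == 'B' && (seq[i] != 'C' && seq[i] != 'T' && seq[i] != 'G')) { countDiffs++; }
                else if(oligo[i] == 'D' && (seq[i] != 'A' && seq[i] != 'T' && seq[i] != 'G')) { countDiffs++; }
                else if(oligo[i] == 'H' && (seq[i] != 'A' && seq[i] != 'T' && seq[i] != 'C')) { countDiffs++; }
                else if(oligo[i] == 'V' && (seq[i] != 'A' && seq[i] != 'C' && seq[i] != 'G')) { countDiffs++; }
                else if(oligo[i] == 'A' && (seq[i] != 'A' && seq[i] != 'M' && seq[i] != 'R' && seq[i] != 'W' && seq[i] != 'D' && seq[i] != 'H' && seq[i] != 'V')) { countDiffs++; }
                else if(oligo[i] == 'C' && (seq[i] != 'C' && seq[i] != 'Y' && seq[i] != 'M' && seq[i] != 'S' && seq[i] != 'B' && seq[i] != 'H' && seq[i] != 'V')) { countDiffs++; }
                else if(oligo[i] == 'G' && (seq[i] != 'G' && seq[i] != 'R' && seq[i] != 'K' && seq[i] != 'S' && seq[i] != 'B' && seq[i] != 'D' && seq[i] != 'V')) { countDiffs++; }
                else if(oligo[i] == 'T' && (seq[i] != 'T' && seq[i] != 'Y' && seq[i] != 'K' && seq[i] != 'W' && seq[i] != 'B' && seq[i] != 'D' && seq[i] != 'H')) { countDiffs++; }
                else if((oligo[i] == '.' || oligo[i] == '-')) { countDiffs++; }
            }
        }

        return countDiffs;
    }
    catch(exception& e) {
        m->errorOut(e, "PrimerDesignCommand", "countDiffs");
        exit(1);
    }
}
//**********************************************************************************************************************
int PrimerDesignCommand::findIndex(string binLabel, vector<string> binLabels){
    try {
        int index = -1;
        for (int i = 0; i < binLabels.size(); i++){
            if (m->control_pressed) { return index; }
            if (m->isLabelEquivalent(binLabel, binLabels[i])) { index = i; break; }
        }
        return index;
    }
    catch(exception& e) {
        m->errorOut(e, "PrimerDesignCommand", "findIndex");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/primerdesigncommand.h000066400000000000000000000237001255543666200224410ustar00rootroot00000000000000
//
//  primerdesigncommand.h
//  Mothur
//
//  Created by Sarah Westcott on 1/18/13.
//  Copyright (c) 2013 Schloss Lab. All rights reserved.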
// #ifndef Mothur_primerdesigncommand_h #define Mothur_primerdesigncommand_h #include "command.hpp" #include "listvector.hpp" #include "inputdata.h" #include "sequence.hpp" #include "alignment.hpp" #include "needlemanoverlap.hpp" /**************************************************************************************************/ class PrimerDesignCommand : public Command { public: PrimerDesignCommand(string); PrimerDesignCommand(); ~PrimerDesignCommand(){} vector setParameters(); string getCommandName() { return "primer.design"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "http://www.mothur.org/wiki/Primer.design"; } string getDescription() { return "identify sequence fragments that are specific to particular OTUs"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: struct fastaLinePair { unsigned long long start; unsigned long long end; fastaLinePair(unsigned long long i, unsigned long long j) : start(i), end(j) {} }; bool abort, allLines, large; int cutoff, pdiffs, length, processors, alignedLength; string outputDir, listfile, otulabel, namefile, countfile, fastafile, label; double minTM, maxTM; ListVector* list; vector outputNames; int initializeCounts(vector< vector< vector > >& counts, int length, map&, map&, vector&); map readCount(unsigned long int&); char getBase(vector counts, int size); int getListVector(); int countDiffs(string, string); set getPrimer(Sequence); bool findPrimer(string, string, vector&, vector&, vector&); int findMeltingPoint(string primer, double&, double&); set createProcesses(string, vector&, vector&, set&, vector&, int); set driver(string, vector&, vector&, set&, vector&, int, int, int&, int); vector< vector< vector > > driverGetCounts(map&, unsigned long int&, vector&, unsigned long long&, unsigned long long&); vector createProcessesConSeqs(map&, unsigned long int&); int findIndex(string binLabel, vector 
binLabels); }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). struct primerDesignData { string summaryFileName; MothurOut* m; int start; int end; int pdiffs, threadID, length, binIndex; set primers; vector minTms, maxTms; set otusToRemove; vector consSeqs; int numBinsProcessed; primerDesignData(){} primerDesignData(string sf, MothurOut* mout, int st, int en, vector min, vector max, set pri, vector seqs, int d, int otun, int l, int tid) { summaryFileName = sf; m = mout; start = st; end = en; pdiffs = d; minTms = min; maxTms = max; primers = pri; consSeqs = seqs; binIndex = otun; length = l; threadID = tid; numBinsProcessed = 0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyPrimerThreadFunction(LPVOID lpParam){ primerDesignData* pDataArray; pDataArray = (primerDesignData*)lpParam; try { ofstream outSum; pDataArray->m->openOutputFileAppend(pDataArray->summaryFileName, outSum); for (int i = pDataArray->start; i < pDataArray->end; i++) { if (pDataArray->m->control_pressed) { break; } if (i != (pDataArray->binIndex)) { int primerIndex = 0; for (set::iterator it = pDataArray->primers.begin(); it != pDataArray->primers.end(); it++) { vector primerStarts; vector primerEnds; vector mismatches; //bool found = findPrimer(conSeqs[i].getUnaligned(), (*it), primerStarts, primerEnds, mismatches); /////////////////////////////////////////////////////////////////////////////////////////////////// bool found = false; //innocent til proven guilty string rawSequence = pDataArray->consSeqs[i].getUnaligned(); string primer = *it; //look for exact match if(rawSequence.length() < 
primer.length()) { found = false; }
                    else {
                        //search for primer
                        for (int j = 0; j < rawSequence.length()-pDataArray->length; j++){

                            if (pDataArray->m->control_pressed) { found = false; break; }

                            string rawChunk = rawSequence.substr(j, pDataArray->length);

                            //int numDiff = countDiffs(primer, rawchuck);
                            ///////////////////////////////////////////////////////////////////////
                            int numDiff = 0;
                            string oligo = primer;
                            string seq = rawChunk;

                            for(int k=0;k<pDataArray->length;k++){

                                oligo[k] = toupper(oligo[k]);
                                seq[k] = toupper(seq[k]);

                                if(oligo[k] != seq[k]){
                                    if((oligo[k] == 'N' || oligo[k] == 'I') && (seq[k] == 'N')) { numDiff++; }
                                    else if(oligo[k] == 'R' && (seq[k] != 'A' && seq[k] != 'G')) { numDiff++; }
                                    else if(oligo[k] == 'Y' && (seq[k] != 'C' && seq[k] != 'T')) { numDiff++; }
                                    else if(oligo[k] == 'M' && (seq[k] != 'C' && seq[k] != 'A')) { numDiff++; }
                                    else if(oligo[k] == 'K' && (seq[k] != 'T' && seq[k] != 'G')) { numDiff++; }
                                    else if(oligo[k] == 'W' && (seq[k] != 'T' && seq[k] != 'A')) { numDiff++; }
                                    else if(oligo[k] == 'S' && (seq[k] != 'C' && seq[k] != 'G')) { numDiff++; }
                                    else if(oligo[k] == 'B' && (seq[k] != 'C' && seq[k] != 'T' && seq[k] != 'G')) { numDiff++; }
                                    else if(oligo[k] == 'D' && (seq[k] != 'A' && seq[k] != 'T' && seq[k] != 'G')) { numDiff++; }
                                    else if(oligo[k] == 'H' && (seq[k] != 'A' && seq[k] != 'T' && seq[k] != 'C')) { numDiff++; }
                                    else if(oligo[k] == 'V' && (seq[k] != 'A' && seq[k] != 'C' && seq[k] != 'G')) { numDiff++; }
                                    else if(oligo[k] == 'A' && (seq[k] != 'A' && seq[k] != 'M' && seq[k] != 'R' && seq[k] != 'W' && seq[k] != 'D' && seq[k] != 'H' && seq[k] != 'V')) { numDiff++; }
                                    else if(oligo[k] == 'C' && (seq[k] != 'C' && seq[k] != 'Y' && seq[k] != 'M' && seq[k] != 'S' && seq[k] != 'B' && seq[k] != 'H' && seq[k] != 'V')) { numDiff++; }
                                    else if(oligo[k] == 'G' && (seq[k] != 'G' && seq[k] != 'R' && seq[k] != 'K' && seq[k] != 'S' && seq[k] != 'B' && seq[k] != 'D' && seq[k] != 'V')) { numDiff++; }
                                    else if(oligo[k] == 'T' && (seq[k] != 'T' && seq[k] != 'Y' && seq[k] != 
'K' && seq[k] != 'W' && seq[k] != 'B' && seq[k] != 'D' && seq[k] != 'H')) { numDiff++; } else if((oligo[k] == '.' || oligo[k] == '-')) { numDiff++; } } } /////////////////////////////////////////////////////////////////////// if(numDiff <= pDataArray->pdiffs){ primerStarts.push_back(j); primerEnds.push_back(j+pDataArray->length); mismatches.push_back(numDiff); found = true; } } } /////////////////////////////////////////////////////////////////////////////////////////////////// //if we found it report to the table if (found) { for (int j = 0; j < primerStarts.size(); j++) { outSum << (i+1) << '\t' << *it << '\t' << primerStarts[j] << '\t' << primerEnds[j] << '\t' << pDataArray->length << '\t' << mismatches[j] << '\t' << pDataArray->minTms[primerIndex] << '\t' << pDataArray->maxTms[primerIndex] << endl; } pDataArray->otusToRemove.insert(i); } primerIndex++; } } pDataArray->numBinsProcessed++; } outSum.close(); } catch(exception& e) { pDataArray->m->errorOut(e, "PrimerDesignCommand", "MyPrimerThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/quitcommand.cpp000066400000000000000000000020061255543666200212620ustar00rootroot00000000000000/* * quitcommand.cpp * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 *
 */

#include "quitcommand.h"

//**********************************************************************************************************************

QuitCommand::QuitCommand(string option) {
    abort = false; calledHelp = false;

    //allow user to run help
    if(option == "help") { help(); abort = true; calledHelp = true; }
    else if(option == "citation") { citation(); abort = true; calledHelp = true;}
}

//**********************************************************************************************************************

QuitCommand::~QuitCommand(){}

//**********************************************************************************************************************

int QuitCommand::execute(){
    if (abort == true) { return 0; }
    return 1;
}

//**********************************************************************************************************************
mothur-1.36.1/source/commands/quitcommand.h000066400000000000000000000020761255543666200207360ustar00rootroot00000000000000
#ifndef QUITCOMMAND_H
#define QUITCOMMAND_H
/*
 *  quitcommand.h
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"

/* The quit() command:
    The quit command terminates the mothur program.
    The quit command should be in the following format: quit(). */

class QuitCommand : public Command {

public:
    QuitCommand(string);
    QuitCommand() {}
    ~QuitCommand();

    vector<string> setParameters() { return outputNames; } //dummy, doesn't really do anything
    string getCommandName() { return "quit"; }
    string getCommandCategory() { return "Hidden"; }
    string getHelpString() { return "The quit command will terminate mothur and should be in the following format: quit() or quit. 
\n"; } string getOutputPattern(string) { return ""; } string getCitation() { return "no citation"; } string getDescription() { return "quit"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; vector outputNames; }; #endif mothur-1.36.1/source/commands/rarefactcommand.cpp000066400000000000000000001071071255543666200220770ustar00rootroot00000000000000/* * rarefactcommand.cpp * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "rarefactcommand.h" #include "ace.h" #include "sobs.h" #include "nseqs.h" #include "chao1.h" #include "bootstrap.h" #include "simpson.h" #include "simpsoneven.h" #include "heip.h" #include "smithwilson.h" #include "invsimpson.h" #include "npshannon.h" #include "shannoneven.h" #include "shannon.h" #include "jackknife.h" #include "coverage.h" #include "shannonrange.h" //********************************************************************************************************************** vector RareFactCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false,true); parameters.push_back(plist); CommandParameter prabund("rabund", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false); parameters.push_back(psabund); CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","",false,false,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pfreq("freq", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pfreq); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pcalc("calc", "Multiple", 
"sobs-chao-nseqs-coverage-ace-jack-shannon-shannoneven-npshannon-heip-smithwilson-simpson-simpsoneven-invsimpson-bootstrap-shannonrange", "sobs", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter pabund("abund", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pabund); CommandParameter palpha("alpha", "Multiple", "0-1-2", "1", "", "", "","",false,false,true); parameters.push_back(palpha); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pgroupmode("groupmode", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pgroupmode); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RareFactCommand::getHelpString(){ try { ValidCalculators validCalculator; string helpString = ""; helpString += "The rarefaction.single command parameters are list, sabund, rabund, shared, label, iters, freq, calc, processors, groupmode and abund. list, sabund, rabund or shared is required unless you have a valid current file. \n"; helpString += "The freq parameter is used indicate when to output your data, by default it is set to 100. But you can set it to a percentage of the number of sequence. For example freq=0.10, means 10%. 
\n";
        helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n";
        helpString += "The rarefaction.single command should be in the following format: \n";
        helpString += "rarefaction.single(label=yourLabel, iters=yourIters, freq=yourFreq, calc=yourEstimators).\n";
        helpString += "Example rarefaction.single(label=unique-.01-.03, iters=10000, freq=10, calc=sobs-rchao-race-rjack-rbootstrap-rshannon-rnpshannon-rsimpson).\n";
        helpString += "The default value for iters is 1000, for freq is 100, and for calc is rarefaction, which calculates the rarefaction curve for the observed richness.\n";
        validCalculator.printCalc("rarefaction");
        helpString += "If you are running rarefaction.single with a shared file and would like your results collated in one file, set groupmode=t. (Default=true).\n";
        helpString += "The label parameter is used to analyze specific labels in your input.\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
freq), '=' and parameters (i.e.yourFreq).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RareFactCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "rarefaction") { pattern = "[filename],rarefaction"; } else if (type == "r_chao") { pattern = "[filename],r_chao"; } else if (type == "r_ace") { pattern = "[filename],r_ace"; } else if (type == "r_jack") { pattern = "[filename],r_jack"; } else if (type == "r_shannon") { pattern = "[filename],r_shannon"; } else if (type == "r_shannoneven") { pattern = "[filename],r_shannoneven"; } else if (type == "r_smithwilson") { pattern = "[filename],r_smithwilson"; } else if (type == "r_npshannon") { pattern = "[filename],r_npshannon"; } else if (type == "r_shannonrange"){ pattern = "[filename],r_shannonrange"; } else if (type == "r_simpson") { pattern = "[filename],r_simpson"; } else if (type == "r_simpsoneven") { pattern = "[filename],r_simpsoneven"; } else if (type == "r_invsimpson") { pattern = "[filename],r_invsimpson"; } else if (type == "r_bootstrap") { pattern = "[filename],r_bootstrap"; } else if (type == "r_coverage") { pattern = "[filename],r_coverage"; } else if (type == "r_nseqs") { pattern = "[filename],r_nseqs"; } else if (type == "r_heip") { pattern = "[filename],r_heip"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RareFactCommand::RareFactCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["rarefaction"] = tempOutNames; outputTypes["r_chao"] = 
tempOutNames; outputTypes["r_ace"] = tempOutNames; outputTypes["r_jack"] = tempOutNames; outputTypes["r_shannon"] = tempOutNames; outputTypes["r_shannoneven"] = tempOutNames; outputTypes["r_shannonrange"] = tempOutNames; outputTypes["r_heip"] = tempOutNames; outputTypes["r_smithwilson"] = tempOutNames; outputTypes["r_npshannon"] = tempOutNames; outputTypes["r_simpson"] = tempOutNames; outputTypes["r_simpsoneven"] = tempOutNames; outputTypes["r_invsimpson"] = tempOutNames; outputTypes["r_bootstrap"] = tempOutNames; outputTypes["r_coverage"] = tempOutNames; outputTypes["r_nseqs"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "RareFactCommand"); exit(1); } } //********************************************************************************************************************** RareFactCommand::RareFactCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["rarefaction"] = tempOutNames; outputTypes["r_chao"] = tempOutNames; outputTypes["r_ace"] = tempOutNames; outputTypes["r_jack"] = tempOutNames; outputTypes["r_shannon"] = tempOutNames; outputTypes["r_shannoneven"] = tempOutNames; outputTypes["r_shannonrange"] = tempOutNames; outputTypes["r_heip"] = tempOutNames; outputTypes["r_smithwilson"] = tempOutNames; outputTypes["r_npshannon"] = tempOutNames; outputTypes["r_simpson"] = tempOutNames; outputTypes["r_simpsoneven"] = 
tempOutNames; outputTypes["r_invsimpson"] = tempOutNames; outputTypes["r_bootstrap"] = tempOutNames; outputTypes["r_coverage"] = tempOutNames; outputTypes["r_nseqs"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { format = "sabund"; inputfile = sabundfile; m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { rabundfile = ""; abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { format = "rabund"; inputfile = rabundfile; m->setRabundFile(rabundfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "sharedfile"; inputfile = sharedfile; m->setSharedFile(sharedfile); } if ((sharedfile == "") && (listfile == "") && (rabundfile == "") && (sabundfile == "")) { //is there are current file available for any of these? 
                //give priority to shared, then list, then rabund, then sabund
                //if there is a current shared file, use it
                sharedfile = m->getSharedFile();
                if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); }
                else {
                    listfile = m->getListFile();
                    if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); }
                    else {
                        rabundfile = m->getRabundFile();
                        if (rabundfile != "") { inputfile = rabundfile; format = "rabund"; m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); }
                        else {
                            sabundfile = m->getSabundFile();
                            if (sabundfile != "") { inputfile = sabundfile; format = "sabund"; m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); }
                            else {
                                m->mothurOut("No valid current files. You must provide a list, sabund, rabund or shared file before you can use the rarefaction.single command."); m->mothurOutEndLine();
                                abort = true;
                            }
                        }
                    }
                }
            }

            //if the user changes the output directory command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); }

            //check for optional parameters and set defaults
            // ...at some point we should add some additional type checking...
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "sobs"; } else { if (calc == "default") { calc = "sobs"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } string temp; temp = validParameter.validFile(parameters, "freq", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, freq); temp = validParameter.validFile(parameters, "abund", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, abund); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, nIters); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "alpha", false); if (temp == "not found") { temp = "1"; } m->mothurConvert(temp, alpha); if ((alpha != 0) && (alpha != 1) && (alpha != 2)) { m->mothurOut("[ERROR]: Not a valid alpha value. 
Valid values are 0, 1 and 2."); m->mothurOutEndLine(); abort=true; } temp = validParameter.validFile(parameters, "groupmode", false); if (temp == "not found") { temp = "T"; } groupMode = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "RareFactCommand", "RareFactCommand"); exit(1); } } //********************************************************************************************************************** int RareFactCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } map > labelToEnds; if ((format != "sharedfile")) { inputFileNames.push_back(inputfile); } else { inputFileNames = parseSharedFile(sharedfile, labelToEnds); format = "rabund"; } if (m->control_pressed) { return 0; } map file2Group; //index in outputNames[i] -> group for (int p = 0; p < inputFileNames.size(); p++) { string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(inputFileNames[p])); if (m->control_pressed) { outputTypes.clear(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } m->clearGroups(); return 0; } if (inputFileNames.size() > 1) { m->mothurOutEndLine(); m->mothurOut("Processing group " + groups[p]); m->mothurOutEndLine(); m->mothurOutEndLine(); } int i; ValidCalculators validCalculator; map variables; variables["[filename]"] = fileNameRoot; for (i=0; i 1) { file2Group[outputNames.size()-1] = groups[p]; } } } //if the users entered no valid calculators don't execute command if (rDisplays.size() == 0) { for(int i=0;igetOrderVector(); string lastLabel = order->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } return 0; } //as long as you are not at the end of the file or done wih the lines you want while((order != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(order->getLabel()) == 1){ m->mothurOut(order->getLabel()); m->mothurOutEndLine(); map >::iterator itEndings = labelToEnds.find(order->getLabel()); set ends; if (itEndings != labelToEnds.end()) { ends = itEndings->second; } rCurve = new Rarefact(order, rDisplays, processors, ends); rCurve->getCurve(freq, nIters); delete rCurve; processedLabels.insert(order->getLabel()); userLabels.erase(order->getLabel()); } if ((m->anyLabelsToProcess(order->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = order->getLabel(); delete order; order = (input->getOrderVector(lastLabel)); m->mothurOut(order->getLabel()); m->mothurOutEndLine(); map >::iterator itEndings = labelToEnds.find(order->getLabel()); set ends; if (itEndings != labelToEnds.end()) { ends = itEndings->second; } rCurve = new Rarefact(order, rDisplays, processors, ends); rCurve->getCurve(freq, nIters); delete rCurve; processedLabels.insert(order->getLabel()); userLabels.erase(order->getLabel()); //restore real lastlabel to save below order->setLabel(saveLabel); } lastLabel = order->getLabel(); delete order; order = (input->getOrderVector()); } if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". 
I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } return 0; } //run last label if you need to if (needToRun == true) { if (order != NULL) { delete order; } order = (input->getOrderVector(lastLabel)); m->mothurOut(order->getLabel()); m->mothurOutEndLine(); map >::iterator itEndings = labelToEnds.find(order->getLabel()); set ends; if (itEndings != labelToEnds.end()) { ends = itEndings->second; } rCurve = new Rarefact(order, rDisplays, processors, ends); rCurve->getCurve(freq, nIters); delete rCurve; delete order; } for(int i=0;icontrol_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //create summary file containing all the groups data for each label - this function just combines the info from the files already created. if ((sharedfile != "") && (groupMode)) { outputNames = createGroupFile(outputNames, file2Group); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "execute"); exit(1); } } //********************************************************************************************************************** vector RareFactCommand::createGroupFile(vector& outputNames, map file2Group) { try { vector newFileNames; //find different types of files map > typesFiles; map > > fileLabels; //combofile name to labels. each label is a vector because it may be unique lci hci. 
vector groupNames; for (int i = 0; i < outputNames.size(); i++) { string extension = m->getExtension(outputNames[i]); string combineFileName = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "groups" + extension; m->mothurRemove(combineFileName); //remove old file ifstream in; m->openInputFile(outputNames[i], in); string labels = m->getline(in); istringstream iss (labels,istringstream::in); string newLabel = ""; vector theseLabels; while(!iss.eof()) { iss >> newLabel; m->gobble(iss); theseLabels.push_back(newLabel); } vector< vector > allLabels; vector thisSet; thisSet.push_back(theseLabels[0]); allLabels.push_back(thisSet); thisSet.clear(); //makes "numSampled" its own grouping for (int j = 1; j < theseLabels.size()-1; j++) { if (theseLabels[j+1] == "lci") { thisSet.push_back(theseLabels[j]); thisSet.push_back(theseLabels[j+1]); thisSet.push_back(theseLabels[j+2]); j++; j++; }else{ //no lci or hci for this calc. thisSet.push_back(theseLabels[j]); } allLabels.push_back(thisSet); thisSet.clear(); } fileLabels[combineFileName] = allLabels; map >::iterator itfind = typesFiles.find(extension); if (itfind != typesFiles.end()) { (itfind->second)[outputNames[i]] = file2Group[i]; }else { map temp; temp[outputNames[i]] = file2Group[i]; typesFiles[extension] = temp; } if (!(m->inUsersGroups(file2Group[i], groupNames))) { groupNames.push_back(file2Group[i]); } } //for each type create a combo file for (map >::iterator it = typesFiles.begin(); it != typesFiles.end(); it++) { ofstream out; string combineFileName = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "groups" + it->first; m->openOutputFileAppend(combineFileName, out); newFileNames.push_back(combineFileName); map thisTypesFiles = it->second; //it->second maps filename to group set numSampledSet; //open each type summary file map > > > files; //maps file name to lines in file int maxLines = 0; for (map::iterator itFileNameGroup = thisTypesFiles.begin(); itFileNameGroup != thisTypesFiles.end(); 
itFileNameGroup++) { string thisfilename = itFileNameGroup->first; string group = itFileNameGroup->second; ifstream temp; m->openInputFile(thisfilename, temp); //read through first line - labels m->getline(temp); m->gobble(temp); map > > thisFilesLines; while (!temp.eof()){ int numSampled = 0; temp >> numSampled; m->gobble(temp); vector< vector > theseReads; vector thisSet; thisSet.push_back(toString(numSampled)); theseReads.push_back(thisSet); thisSet.clear(); for (int k = 1; k < fileLabels[combineFileName].size(); k++) { //output thing like 0.03-A lci-A hci-A vector reads; string next = ""; for (int l = 0; l < fileLabels[combineFileName][k].size(); l++) { //output modified labels temp >> next; m->gobble(temp); reads.push_back(next); } theseReads.push_back(reads); } thisFilesLines[numSampled] = theseReads; m->gobble(temp); numSampledSet.insert(numSampled); } files[group] = thisFilesLines; //save longest file for below if (maxLines < thisFilesLines.size()) { maxLines = thisFilesLines.size(); } temp.close(); m->mothurRemove(thisfilename); } //output new labels line out << fileLabels[combineFileName][0][0]; for (int k = 1; k < fileLabels[combineFileName].size(); k++) { //output thing like 0.03-A lci-A hci-A for (int n = 0; n < groupNames.size(); n++) { // for each group for (int l = 0; l < fileLabels[combineFileName][k].size(); l++) { //output modified labels out << '\t' << fileLabels[combineFileName][k][l] << '-' << groupNames[n]; } } } out << endl; //for each label for (set::iterator itNumSampled = numSampledSet.begin(); itNumSampled != numSampledSet.end(); itNumSampled++) { out << (*itNumSampled); if (m->control_pressed) { break; } for (int k = 1; k < fileLabels[combineFileName].size(); k++) { //each chunk //grab data for each group for (int n = 0; n < groupNames.size(); n++) { string group = groupNames[n]; map > >::iterator itLine = files[group].find(*itNumSampled); if (itLine != files[group].end()) { for (int l = 0; l < (itLine->second)[k].size(); l++) { out << 
'\t' << (itLine->second)[k][l]; } }else { for (int l = 0; l < fileLabels[combineFileName][k].size(); l++) { out << "\tNA"; } } } } out << endl; } out.close(); } //return combine file name return newFileNames; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "createGroupFile"); exit(1); } } //********************************************************************************************************************** vector RareFactCommand::parseSharedFile(string filename, map >& label2Ends) { try { vector filenames; map filehandles; map::iterator it3; input = new InputData(filename, "sharedfile"); vector lookup = input->getSharedRAbundVectors(); string sharedFileRoot = m->getRootName(filename); //clears file before we start to write to it below for (int i=0; imothurRemove((sharedFileRoot + lookup[i]->getGroup() + ".rabund")); filenames.push_back((sharedFileRoot + lookup[i]->getGroup() + ".rabund")); } ofstream* temp; for (int i=0; igetGroup()] = temp; groups.push_back(lookup[i]->getGroup()); } while(lookup[0] != NULL) { for (int i = 0; i < lookup.size(); i++) { RAbundVector rav = lookup[i]->getRAbundVector(); m->openOutputFileAppend(sharedFileRoot + lookup[i]->getGroup() + ".rabund", *(filehandles[lookup[i]->getGroup()])); rav.print(*(filehandles[lookup[i]->getGroup()])); (*(filehandles[lookup[i]->getGroup()])).close(); label2Ends[lookup[i]->getLabel()].insert(rav.getNumSeqs()); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } //free memory for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { delete it3->second; } delete input; m->clearGroups(); return filenames; } catch(exception& e) { m->errorOut(e, "RareFactCommand", "parseSharedFile"); exit(1); } } //********************************************************************************************************************** 
mothur-1.36.1/source/commands/rarefactcommand.h000066400000000000000000000031561255543666200215430ustar00rootroot00000000000000
#ifndef RAREFACTCOMMAND_H
#define RAREFACTCOMMAND_H
/*
 *  rarefactcommand.h
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"
#include "ordervector.hpp"
#include "inputdata.h"
#include "rarefact.h"
#include "display.h"
#include "validcalculator.h"

class RareFactCommand : public Command {

public:
	RareFactCommand(string);
	RareFactCommand();
	~RareFactCommand(){}

	vector<string> setParameters();
	string getCommandName() { return "rarefaction.single"; }
	string getCommandCategory() { return "OTU-Based Approaches"; }
	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Magurran AE (2004). Measuring biological diversity. Blackwell Pub.: Malden, Ma. \nhttp://www.mothur.org/wiki/Rarefaction.single"; }
	string getDescription() { return "generate intra-sample rarefaction curves using a re-sampling without replacement approach"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:
	vector<Display*> rDisplays;
	OrderVector* order;
	InputData* input;
	Rarefact* rCurve;
	int nIters, abund, processors, alpha;
	float freq;

	bool abort, allLines, groupMode;
	set<string> labels; //holds labels to be used
	string label, calc, sharedfile, listfile, rabundfile, sabundfile, format, inputfile;
	vector<string> Estimators;
	vector<string> inputFileNames, outputNames;
	vector<string> groups;
	string outputDir;

	vector<string> parseSharedFile(string, map<string, set<int> >&);
	vector<string> createGroupFile(vector<string>&, map<int, string>);
};

#endif
mothur-1.36.1/source/commands/rarefactsharedcommand.cpp000066400000000000000000001126031255543666200232630ustar00rootroot00000000000000/* * rarefactsharedcommand.cpp * Dotur * * Created by Sarah Westcott on 1/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "rarefactsharedcommand.h" #include "sharedsobs.h" #include "sharednseqs.h" #include "sharedutilities.h" #include "subsample.h" //********************************************************************************************************************** vector RareFactSharedCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pshared); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pdesign); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pfreq("freq", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pfreq); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pcalc("calc", "Multiple", "sharednseqs-sharedobserved", "sharedobserved", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter psubsampleiters("subsampleiters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(psubsampleiters); CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter pjumble("jumble", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pjumble); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter psets("sets", "String", "", "", "", "", "","",false,false); parameters.push_back(psets); CommandParameter pgroupmode("groupmode", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pgroupmode); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); 
CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RareFactSharedCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The rarefaction.shared command parameters are shared, design, label, iters, groups, sets, jumble, groupmode and calc. shared is required if there is no current sharedfile. \n"; helpString += "The design parameter allows you to assign your groups to sets. If provided, mothur will run rarefaction.shared on a per set basis. \n"; helpString += "The sets parameter allows you to specify which of the sets in your designfile you would like to analyze. The set names are separated by dashes. The default is all sets in the designfile.\n"; helpString += "The rarefaction command should be in the following format: \n"; helpString += "rarefaction.shared(label=yourLabel, iters=yourIters, calc=yourEstimators, jumble=yourJumble, groups=yourGroups).\n"; helpString += "The freq parameter is used to indicate when to output your data; by default it is set to 100. You can also set it to a fraction of the number of sequences. For example, freq=0.10 means 10%. 
\n"; helpString += "Example rarefaction.shared(label=unique-0.01-0.03, iters=10000, groups=B-C, jumble=T, calc=sharedobserved).\n"; helpString += "The default value for iters is 1000, for freq is 100, and for calc is sharedobserved, which calculates the shared rarefaction curve for the observed richness.\n"; helpString += "The subsampleiters parameter allows you to choose the number of times you would like to run the subsample.\n"; helpString += "The subsample parameter allows you to enter the size per group of the sample, or you can set subsample=T and mothur will use the size of your smallest group.\n"; helpString += "The default value for groups is all the groups in your groupfile, and jumble is true.\n"; helpString += validCalculator.printCalc("sharedrarefaction"); helpString += "The label parameter is used to analyze specific labels in your input.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 2 valid groups.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
freq), '=' and parameters (i.e.yourFreq).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RareFactSharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "sharedrarefaction") { pattern = "[filename],shared.rarefaction"; } else if (type == "sharedr_nseqs") { pattern = "[filename],shared.r_nseqs"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RareFactSharedCommand::RareFactSharedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["sharedrarefaction"] = tempOutNames; outputTypes["sharedr_nseqs"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "RareFactSharedCommand"); exit(1); } } //********************************************************************************************************************** RareFactSharedCommand::RareFactSharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize 
outputTypes vector tempOutNames; outputTypes["sharedrarefaction"] = tempOutNames; outputTypes["sharedr_nseqs"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } } //get shared file sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { abort = true; designfile = ""; } else if (designfile == "not found") { designfile = ""; } else { m->setDesignFile(designfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); } //check for optional parameter and 
set defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "sharedobserved"; } else { if (calc == "default") { calc = "sharedobserved"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string sets = validParameter.validFile(parameters, "sets", false); if (sets == "not found") { sets = ""; } else { m->splitAtDash(sets, Sets); } string temp; temp = validParameter.validFile(parameters, "freq", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, freq); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, nIters); temp = validParameter.validFile(parameters, "jumble", false); if (temp == "not found") { temp = "T"; } if (m->isTrue(temp)) { jumble = true; } else { jumble = false; } m->jumble = jumble; temp = validParameter.validFile(parameters, "groupmode", false); if (temp == "not found") { temp = "T"; } groupMode = m->isTrue(temp); temp = validParameter.validFile(parameters, "subsampleiters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, 
subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (subsample == false) { iters = 1; } } } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "RareFactSharedCommand"); exit(1); } } //********************************************************************************************************************** int RareFactSharedCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } DesignMap designMap; if (designfile == "") { //fake out designMap to run with process process(designMap, ""); }else { designMap.read(designfile); //fill Sets - checks for "all" and for any typo groups SharedUtil util; vector nameSets = designMap.getCategory(); util.setGroups(Sets, nameSets); for (int i = 0; i < Sets.size(); i++) { process(designMap, Sets[i]); } if (groupMode) { outputNames = createGroupFile(outputNames); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "execute"); exit(1); } } //********************************************************************************************************************** int RareFactSharedCommand::process(DesignMap& designMap, string thisSet){ try { Rarefact* rCurve; vector rDisplays; InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); if (lookup.size() < 2) { m->mothurOut("I cannot run the command without at least 2 valid groups."); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(sharedfile)); vector newGroups = 
m->getGroups(); if (thisSet != "") { //make groups only filled with groups from this set so that's all inputdata will read vector thisSets; thisSets.push_back(thisSet); newGroups = designMap.getNamesGroups(thisSets); fileNameRoot += thisSet + "."; } vector subset; if (thisSet == "") { subset.clear(); subset = lookup; } else {//fill subset with this sets groups subset.clear(); for (int i = 0; i < lookup.size(); i++) { if (m->inUsersGroups(lookup[i]->getGroup(), newGroups)) { subset.push_back(lookup[i]); } } } /******************************************************/ if (subsample) { if (subsampleSize == -1) { //user has not set size, set size = smallest samples size subsampleSize = subset[0]->getNumSeqs(); for (int i = 1; i < subset.size(); i++) { int thisSize = subset[i]->getNumSeqs(); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } }else { newGroups.clear(); vector temp; for (int i = 0; i < subset.size(); i++) { if (subset[i]->getNumSeqs() < subsampleSize) { m->mothurOut(subset[i]->getGroup() + " contains " + toString(subset[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine(); delete subset[i]; }else { newGroups.push_back(subset[i]->getGroup()); temp.push_back(subset[i]); } } subset = temp; } if (subset.size() < 2) { m->mothurOut("You have not provided enough valid groups. I cannot run the command."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } } /******************************************************/ map variables; variables["[filename]"] = fileNameRoot; ValidCalculators validCalculator; for (int i=0; icontrol_pressed) { for(int i=0;imothurRemove(outputNames[i]); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
string lastLabel = subset[0]->getLabel(); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((subset[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(subset[0]->getLabel()) == 1){ m->mothurOut(subset[0]->getLabel() + '\t' + thisSet); m->mothurOutEndLine(); rCurve = new Rarefact(subset, rDisplays); rCurve->getSharedCurve(freq, nIters); delete rCurve; if (subsample) { subsampleLookup(subset, fileNameRoot); } processedLabels.insert(subset[0]->getLabel()); userLabels.erase(subset[0]->getLabel()); } if ((m->anyLabelsToProcess(subset[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = subset[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); if (thisSet == "") { subset.clear(); subset = lookup; } else {//fill subset with this sets groups subset.clear(); for (int i = 0; i < lookup.size(); i++) { if (m->inUsersGroups(lookup[i]->getGroup(), newGroups)) { subset.push_back(lookup[i]); } } } m->mothurOut(subset[0]->getLabel() + '\t' + thisSet); m->mothurOutEndLine(); rCurve = new Rarefact(subset, rDisplays); rCurve->getSharedCurve(freq, nIters); delete rCurve; if (subsample) { subsampleLookup(subset, fileNameRoot); } processedLabels.insert(subset[0]->getLabel()); userLabels.erase(subset[0]->getLabel()); //restore real lastlabel to save below subset[0]->setLabel(saveLabel); } lastLabel = subset[0]->getLabel(); //get next line to process for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(); if (lookup[0] != NULL) { if (thisSet == "") { subset.clear(); subset = lookup; } else {//fill subset with this sets groups subset.clear(); for 
(int i = 0; i < lookup.size(); i++) { if (m->inUsersGroups(lookup[i]->getGroup(), newGroups)) { subset.push_back(lookup[i]); } } } }else { subset.clear(); subset.push_back(NULL); } } if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } if (m->control_pressed) { for(int i=0;imothurRemove(outputNames[i]); } return 0; } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); if (thisSet == "") { subset.clear(); subset = lookup; } else {//fill subset with this sets groups subset.clear(); for (int i = 0; i < lookup.size(); i++) { if (m->inUsersGroups(lookup[i]->getGroup(), newGroups)) { subset.push_back(lookup[i]); } } } m->mothurOut(subset[0]->getLabel() + '\t' + thisSet); m->mothurOutEndLine(); rCurve = new Rarefact(subset, rDisplays); rCurve->getSharedCurve(freq, nIters); delete rCurve; if (subsample) { subsampleLookup(subset, fileNameRoot); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } for(int i=0;ierrorOut(e, "RareFactSharedCommand", "process"); exit(1); } } //********************************************************************************************************************** int RareFactSharedCommand::subsampleLookup(vector& thisLookup, string fileNameRoot) { try { map > filenames; for (int thisIter = 0; thisIter < iters; thisIter++) { vector thisItersLookup = thisLookup; //we want the summary results for the whole dataset, then the 
subsampling SubSample sample; vector tempLabels; //dont need since we arent printing the sampled sharedRabunds //make copy of lookup so we don't get access violations vector newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } tempLabels = sample.getSample(newLookup, subsampleSize); thisItersLookup = newLookup; Rarefact* rCurve; vector rDisplays; string thisfileNameRoot = fileNameRoot + toString(thisIter); map variables; variables["[filename]"] = thisfileNameRoot; ValidCalculators validCalculator; for (int i=0; igetSharedCurve(freq, nIters); delete rCurve; //clean up memory for (int i = 0; i < thisItersLookup.size(); i++) { delete thisItersLookup[i]; } thisItersLookup.clear(); for(int i=0;i > > results; //iter -> numSampled -> data for (map >::iterator it = filenames.begin(); it != filenames.end(); it++) { vector thisTypesFiles = it->second; vector columnHeaders; for (int i = 0; i < thisTypesFiles.size(); i++) { ifstream in; m->openInputFile(thisTypesFiles[i], in); string headers = m->getline(in); m->gobble(in); columnHeaders = m->splitWhiteSpace(headers); int numCols = columnHeaders.size(); vector > thisFilesLines; while (!in.eof()) { if (m->control_pressed) { break; } vector data; data.resize(numCols, 0); //read numSampled line for (int j = 0; j < numCols; j++) { in >> data[j]; m->gobble(in); } thisFilesLines.push_back(data); } in.close(); results.push_back(thisFilesLines); m->mothurRemove(thisTypesFiles[i]); } if (!m->control_pressed) { //process results 
map variables; variables["[filename]"] = fileNameRoot + "ave-std." + thisLookup[0]->getLabel() + "."; string outputFile = getOutputFileName(it->first,variables); ofstream out; m->openOutputFile(outputFile, out); outputNames.push_back(outputFile); outputTypes[it->first].push_back(outputFile); out << columnHeaders[0] << '\t' << "method"; for (int i = 1; i < columnHeaders.size(); i++) { out << '\t' << columnHeaders[i]; } out << endl; vector< vector > aveResults; aveResults.resize(results[0].size()); for (int i = 0; i < aveResults.size(); i++) { aveResults[i].resize(results[0][i].size(), 0.0); } for (int thisIter = 0; thisIter < iters; thisIter++) { //sum all groups dists for each calculator for (int i = 0; i < aveResults.size(); i++) { //initialize sums to zero. aveResults[i][0] = results[thisIter][i][0]; for (int j = 1; j < aveResults[i].size(); j++) { aveResults[i][j] += results[thisIter][i][j]; } } } for (int i = 0; i < aveResults.size(); i++) { //finds average. for (int j = 1; j < aveResults[i].size(); j++) { aveResults[i][j] /= (float) iters; } } //standard deviation vector< vector > stdResults; stdResults.resize(results[0].size()); for (int i = 0; i < stdResults.size(); i++) { stdResults[i].resize(results[0][i].size(), 0.0); } for (int thisIter = 0; thisIter < iters; thisIter++) { //compute the difference of each dist from the mean, and square the result of each for (int i = 0; i < stdResults.size(); i++) { stdResults[i][0] = aveResults[i][0]; for (int j = 1; j < stdResults[i].size(); j++) { stdResults[i][j] += ((results[thisIter][i][j] - aveResults[i][j]) * (results[thisIter][i][j] - aveResults[i][j])); } } } for (int i = 0; i < stdResults.size(); i++) { //finds average. 
out << aveResults[i][0] << '\t' << "ave"; for (int j = 1; j < aveResults[i].size(); j++) { out << '\t' << aveResults[i][j]; } out << endl; out << stdResults[i][0] << '\t' << "std"; for (int j = 1; j < stdResults[i].size(); j++) { stdResults[i][j] /= (float) iters; stdResults[i][j] = sqrt(stdResults[i][j]); out << '\t' << stdResults[i][j]; } out << endl; } out.close(); } } return 0; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "subsample"); exit(1); } } //********************************************************************************************************************** vector RareFactSharedCommand::createGroupFile(vector& outputNames) { try { vector newFileNames; //find different types of files map > typesFiles; map > > fileLabels; //combofile name to labels. each label is a vector because it may be unique lci hci. vector groupNames; for (int i = 0; i < outputNames.size(); i++) { string extension = m->getExtension(outputNames[i]); string combineFileName = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "groups" + extension; m->mothurRemove(combineFileName); //remove old file ifstream in; m->openInputFile(outputNames[i], in); string labels = m->getline(in); istringstream iss (labels,istringstream::in); string newLabel = ""; vector theseLabels; while(!iss.eof()) { iss >> newLabel; m->gobble(iss); theseLabels.push_back(newLabel); } vector< vector > allLabels; vector thisSet; thisSet.push_back(theseLabels[0]); allLabels.push_back(thisSet); thisSet.clear(); //makes "numSampled" its own grouping for (int j = 1; j < theseLabels.size()-1; j++) { if (theseLabels[j+1] == "lci") { thisSet.push_back(theseLabels[j]); thisSet.push_back(theseLabels[j+1]); thisSet.push_back(theseLabels[j+2]); j++; j++; }else{ //no lci or hci for this calc. 
thisSet.push_back(theseLabels[j]); } allLabels.push_back(thisSet); thisSet.clear(); } fileLabels[combineFileName] = allLabels; map >::iterator itfind = typesFiles.find(extension); if (itfind != typesFiles.end()) { (itfind->second)[outputNames[i]] = file2Group[i]; }else { map temp; temp[outputNames[i]] = file2Group[i]; typesFiles[extension] = temp; } if (!(m->inUsersGroups(file2Group[i], groupNames))) { groupNames.push_back(file2Group[i]); } } //for each type create a combo file for (map >::iterator it = typesFiles.begin(); it != typesFiles.end(); it++) { ofstream out; string combineFileName = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "groups" + it->first; m->openOutputFileAppend(combineFileName, out); newFileNames.push_back(combineFileName); map thisTypesFiles = it->second; //it->second maps filename to group set numSampledSet; //open each type summary file map > > > files; //maps file name to lines in file int maxLines = 0; for (map::iterator itFileNameGroup = thisTypesFiles.begin(); itFileNameGroup != thisTypesFiles.end(); itFileNameGroup++) { string thisfilename = itFileNameGroup->first; string group = itFileNameGroup->second; ifstream temp; m->openInputFile(thisfilename, temp); //read through first line - labels m->getline(temp); m->gobble(temp); map > > thisFilesLines; while (!temp.eof()){ int numSampled = 0; temp >> numSampled; m->gobble(temp); vector< vector > theseReads; vector thisSet; thisSet.push_back(toString(numSampled)); theseReads.push_back(thisSet); thisSet.clear(); for (int k = 1; k < fileLabels[combineFileName].size(); k++) { //output thing like 0.03-A lci-A hci-A vector reads; string next = ""; for (int l = 0; l < fileLabels[combineFileName][k].size(); l++) { //output modified labels temp >> next; m->gobble(temp); reads.push_back(next); } theseReads.push_back(reads); } thisFilesLines[numSampled] = theseReads; m->gobble(temp); numSampledSet.insert(numSampled); } files[group] = thisFilesLines; //save longest file for below if 
(maxLines < thisFilesLines.size()) { maxLines = thisFilesLines.size(); } temp.close(); m->mothurRemove(thisfilename); } //output new labels line out << fileLabels[combineFileName][0][0]; for (int k = 1; k < fileLabels[combineFileName].size(); k++) { //output thing like 0.03-A lci-A hci-A for (int n = 0; n < groupNames.size(); n++) { // for each group for (int l = 0; l < fileLabels[combineFileName][k].size(); l++) { //output modified labels out << '\t' << fileLabels[combineFileName][k][l] << '-' << groupNames[n]; } } } out << endl; //for each label for (set::iterator itNumSampled = numSampledSet.begin(); itNumSampled != numSampledSet.end(); itNumSampled++) { out << (*itNumSampled); if (m->control_pressed) { break; } for (int k = 1; k < fileLabels[combineFileName].size(); k++) { //each chunk //grab data for each group for (map > > >::iterator itFileNameGroup = files.begin(); itFileNameGroup != files.end(); itFileNameGroup++) { string group = itFileNameGroup->first; map > >::iterator itLine = files[group].find(*itNumSampled); if (itLine != files[group].end()) { for (int l = 0; l < (itLine->second)[k].size(); l++) { out << '\t' << (itLine->second)[k][l]; } }else { for (int l = 0; l < fileLabels[combineFileName][k].size(); l++) { out << "\tNA"; } } } } out << endl; } out.close(); } //return combine file name return newFileNames; } catch(exception& e) { m->errorOut(e, "RareFactSharedCommand", "createGroupFile"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/rarefactsharedcommand.h000066400000000000000000000032131255543666200227240ustar00rootroot00000000000000#ifndef RAREFACTSHAREDCOMMAND_H #define RAREFACTSHAREDCOMMAND_H /* * rarefactsharedcommand.h * Dotur * * Created by Sarah Westcott on 1/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "command.hpp" #include "inputdata.h" #include "rarefact.h" #include "display.h" #include "validcalculator.h" #include "designmap.h" class RareFactSharedCommand : public Command { public: RareFactSharedCommand(string); RareFactSharedCommand(); ~RareFactSharedCommand() {} vector setParameters(); string getCommandName() { return "rarefaction.shared"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Magurran AE (2004). Measuring biological diversity. Blackwell Pub.: Malden, Ma. \nhttp://www.mothur.org/wiki/Rarefaction.shared"; } string getDescription() { return "generate inter-sample rarefaction curves using a re-sampling without replacement approach"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector lookup; int nIters, subsampleSize, iters; string format; float freq; map file2Group; //index in outputNames[i] -> group bool abort, allLines, jumble, groupMode, subsample; set labels; //holds labels to be used string label, calc, groups, outputDir, sharedfile, designfile; vector Estimators, Groups, outputNames, Sets; int process(DesignMap&, string); vector createGroupFile(vector&); int subsampleLookup(vector&, string); }; #endif mothur-1.36.1/source/commands/removedistscommand.cpp000066400000000000000000000423361255543666200226560ustar00rootroot00000000000000// // removedistscommand.cpp // Mothur // // Created by Sarah Westcott on 1/29/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "removedistscommand.h" //********************************************************************************************************************** vector<string> RemoveDistsCommand::setParameters(){ try { CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "PhylipColumn", "none","phylip",false,false,true); parameters.push_back(pphylip); CommandParameter pcolumn("column", "InputTypes", "", "", "none", "PhylipColumn", "none","column",false,false,true); parameters.push_back(pcolumn); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(paccnos); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RemoveDistsCommand::getHelpString(){ try { string helpString = ""; helpString += "The remove.dists command removes distances from a phylip or column file related to groups or sequences listed in an accnos file.\n"; helpString += "The remove.dists command parameters are accnos, phylip and column.\n"; helpString += "The remove.dists command should be in the following format: remove.dists(accnos=yourAccnos, phylip=yourPhylip).\n"; helpString += "Example remove.dists(accnos=final.accnos, phylip=final.an.thetayc.0.03.lt.ave.dist).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
accnos), '=' and parameters (i.e.final.accnos).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RemoveDistsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "phylip") { pattern = "[filename],pick,[extension]"; } else if (type == "column") { pattern = "[filename],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RemoveDistsCommand::RemoveDistsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "RemoveDistsCommand"); exit(1); } } //********************************************************************************************************************** RemoveDistsCommand::RemoveDistsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["column"] = tempOutNames; 
outputTypes["phylip"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["accnos"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = m->getAccnosFile(); if (accnosfile != "") { m->mothurOut("Using " + accnosfile + " as input file for the accnos parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no valid accnos file and accnos is required."); m->mothurOutEndLine(); abort = true; } }else { m->setAccnosFile(accnosfile); } phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { columnfile = ""; abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { m->setColumnFile(columnfile); } if ((phylipfile == "") && (columnfile == "")) { //is there are current file available for either of these? //give priority to column, then phylip columnfile = m->getColumnFile(); if (columnfile != "") { m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a phylip or column file."); m->mothurOutEndLine(); abort = true; } } } } } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "RemoveDistsCommand"); exit(1); } } //********************************************************************************************************************** int RemoveDistsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get names you want to remove names = m->readAccnos(accnosfile); if (m->control_pressed) { return 0; } //read through the correct file and output lines you want to keep if (phylipfile != "") { readPhylip(); } if (columnfile != "") { readColumn(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set phylip or column file as new current file string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setPhylipFile(current); } } itTypes = outputTypes.find("column"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setColumnFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int RemoveDistsCommand::readPhylip(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(phylipfile); } map<string, string> variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(phylipfile)); variables["[extension]"] = m->getExtension(phylipfile); string outputFileName = 
getOutputFileName("phylip", variables); ifstream in; m->openInputFile(phylipfile, in); float distance; int square, nseqs; string name; unsigned int row; set rows; //converts names in names to a index row = 0; string numTest; in >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } //not one we want to remove if (names.count(name) == 0) { rows.insert(row); } row++; //is the matrix square? char d; while((d=in.get()) != EOF){ if(isalnum(d)){ square = 1; in.putback(d); for(int i=0;i> distance; } break; } if(d == '\n'){ square = 0; break; } } //map name to row/column if(square == 0){ for(int i=1;i> name; if (names.count(name) == 0) { rows.insert(row); } row++; for(int j=0;jcontrol_pressed) { in.close(); return 0; } in >> distance; } } } else{ for(int i=1;i> name; if (names.count(name) == 0) { rows.insert(row); } row++; for(int j=0;jcontrol_pressed) { in.close(); return 0; } in >> distance; } } } in.close(); if (m->control_pressed) { return 0; } //read through file only printing rows and columns of seqs in names ifstream inPhylip; m->openInputFile(phylipfile, inPhylip); inPhylip >> numTest; ofstream out; m->openOutputFile(outputFileName, out); outputTypes["phylip"].push_back(outputFileName); outputNames.push_back(outputFileName); out << names.size() << endl; unsigned int count = 0; unsigned int keptCount = 0; if(square == 0){ for(int i=0;i> name; bool ignoreRow = false; if (names.count(name) != 0) { ignoreRow = true; count++; } else{ out << name; keptCount++; } for(int j=0;jcontrol_pressed) { inPhylip.close(); out.close(); return 0; } inPhylip >> distance; if (!ignoreRow) { //is this a column we want if(rows.count(j) != 0) { out << '\t' << distance; } } } if (!ignoreRow) { out << endl; } } } else{ for(int i=0;i> name; bool ignoreRow = false; if (names.count(name) != 0) { ignoreRow = true; count++; } else{ out << name; 
keptCount++; } for(int j=0;jcontrol_pressed) { inPhylip.close(); out.close(); return 0; } inPhylip >> distance; if (!ignoreRow) { //is this a column we want if(rows.count(j) != 0) { out << '\t' << distance; } } } if (!ignoreRow) { out << endl; } } } inPhylip.close(); out.close(); if (keptCount == 0) { m->mothurOut("Your file contains ONLY distances related to groups or sequences listed in the accnos file."); m->mothurOutEndLine(); } else if (count != names.size()) { m->mothurOut("[WARNING]: Your accnos file contains " + toString(names.size()) + " groups or sequences, but I only found " + toString(count) + " of them in the phylip file."); m->mothurOutEndLine(); //rewrite with new number m->renameFile(outputFileName, outputFileName+".temp"); ofstream out2; m->openOutputFile(outputFileName, out2); out2 << keptCount << endl; ifstream in3; m->openInputFile(outputFileName+".temp", in3); in3 >> nseqs; m->gobble(in3); char buffer[4096]; while (!in3.eof()) { in3.read(buffer, 4096); out2.write(buffer, in3.gcount()); } in3.close(); out2.close(); m->mothurRemove(outputFileName+".temp"); } m->mothurOut("Removed " + toString(count) + " groups or sequences from your phylip file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "readPhylip"); exit(1); } } //********************************************************************************************************************** int RemoveDistsCommand::readColumn(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(columnfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(columnfile)); variables["[extension]"] = m->getExtension(columnfile); string outputFileName = getOutputFileName("column", variables); outputTypes["column"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(columnfile, in); set 
removeNames; string firstName, secondName; float distance; bool wrote = false; while (!in.eof()) { if (m->control_pressed) { out.close(); in.close(); return 0; } in >> firstName >> secondName >> distance; m->gobble(in); //is either names in the accnos file if (names.count(firstName) != 0) { removeNames.insert(firstName); if (names.count(secondName) != 0) { removeNames.insert(secondName); } } else if (names.count(secondName) != 0) { removeNames.insert(secondName); if (names.count(firstName) != 0) { removeNames.insert(firstName); } } else { wrote = true; out << firstName << '\t' << secondName << '\t' << distance << endl; } } in.close(); out.close(); if (!wrote) { m->mothurOut("Your file contains ONLY distances related to groups or sequences listed in the accnos file."); m->mothurOutEndLine(); } else if (removeNames.size() != names.size()) { m->mothurOut("[WARNING]: Your accnos file contains " + toString(names.size()) + " groups or sequences, but I only found " + toString(removeNames.size()) + " of them in the column file."); m->mothurOutEndLine(); } m->mothurOut("Removed " + toString(removeNames.size()) + " groups or sequences from your column file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveDistsCommand", "readColumn"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/removedistscommand.h000066400000000000000000000020651255543666200223160ustar00rootroot00000000000000// // removedistscommand.h // Mothur // // Created by Sarah Westcott on 1/29/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_removedistscommand_h #define Mothur_removedistscommand_h #include "command.hpp" class RemoveDistsCommand : public Command { public: RemoveDistsCommand(string); RemoveDistsCommand(); ~RemoveDistsCommand(){} vector setParameters(); string getCommandName() { return "remove.dists"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Remove.dists"; } string getDescription() { return "removes distances from a phylip or column file related to groups or sequences listed in an accnos file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: set names; string accnosfile, phylipfile, columnfile, outputDir; bool abort; vector outputNames; int readPhylip(); int readColumn(); }; #endif mothur-1.36.1/source/commands/removegroupscommand.cpp000066400000000000000000001244331255543666200230460ustar00rootroot00000000000000/* * removegroupscommand.cpp * Mothur * * Created by westcott on 11/10/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "removegroupscommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "sharedutilities.h" #include "inputdata.h" #include "designmap.h" //********************************************************************************************************************** vector RemoveGroupsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "FNGLT","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter pshared("shared", "InputTypes", "", "", "none", "sharedGroup", "none","shared",false,false,true); parameters.push_back(pshared); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "sharedGroup", "FNGLT","group",false,false,true); parameters.push_back(pgroup); CommandParameter pdesign("design", "InputTypes", "", "", "none", "sharedGroup", "FNGLT","design",false,false); parameters.push_back(pdesign); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "FNGLT","list",false,false,true); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "FNGLT","taxonomy",false,false,true); parameters.push_back(ptaxonomy); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", 
"", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RemoveGroupsCommand::getHelpString(){ try { string helpString = ""; helpString += "The remove.groups command removes sequences from a specific group or set of groups from the following file types: fasta, name, group, count, list, taxonomy, design or sharedfile.\n"; helpString += "It outputs a file containing the sequences NOT in those specified groups, or with a sharedfile eliminates the groups you selected.\n"; helpString += "The remove.groups command parameters are accnos, fasta, name, group, count, list, taxonomy, shared, design and groups. The group or count parameter is required, unless you have a current group or count file or are using a sharedfile.\n"; helpString += "You must also provide an accnos file containing the list of groups to remove or set the groups parameter to the groups you wish to remove.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like removed. You can separate group names with dashes.\n"; helpString += "The remove.groups command should be in the following format: remove.groups(accnos=yourAccnos, fasta=yourFasta, group=yourGroupFile).\n"; helpString += "Example remove.groups(accnos=amazon.accnos, fasta=amazon.fasta, group=amazon.groups).\n"; helpString += "or remove.groups(groups=pasture, fasta=amazon.fasta, group=amazon.groups).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RemoveGroupsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],pick,[extension]"; } else if (type == "taxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "name") { pattern = "[filename],pick,[extension]"; } else if (type == "group") { pattern = "[filename],pick,[extension]"; } else if (type == "count") { pattern = "[filename],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[tag],pick,[extension]"; } else if (type == "shared") { pattern = "[filename],[tag],pick,[extension]"; } else if (type == "design") { pattern = "[filename],[tag],pick,[extension]-[filename],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RemoveGroupsCommand::RemoveGroupsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["design"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "RemoveGroupsCommand"); exit(1); } } //********************************************************************************************************************** 
RemoveGroupsCommand::RemoveGroupsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["design"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { accnosfile = ""; abort = true; } else if (accnosfile == "not found") { accnosfile = ""; } else { m->setAccnosFile(accnosfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { taxfile = ""; abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { designfile = ""; abort = true; } else if (designfile == "not found") { designfile = ""; } else { m->setDesignFile(designfile); } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else 
{ m->setSharedFile(sharedfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } if ((sharedfile == "") && (groupfile == "") && (designfile == "") && (countfile == "")) { //is there are current file available for any of these? if ((namefile != "") || (fastafile != "") || (listfile != "") || (taxfile != "")) { //give priority to group, then shared groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile, countfile or sharedfile and one is required."); m->mothurOutEndLine(); abort = true; } } } }else { //give priority to shared, then group sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { designfile = m->getDesignFile(); if (designfile != "") { m->mothurOut("Using " + designfile + " as input file for the 
design parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current groupfile, designfile, countfile or sharedfile and one is required."); m->mothurOutEndLine(); abort = true; } } } } } } if ((accnosfile == "") && (Groups.size() == 0)) { m->mothurOut("You must provide an accnos file containing group names or specify groups using the groups parameter."); m->mothurOutEndLine(); abort = true; } if ((fastafile == "") && (namefile == "") && (countfile == "") && (groupfile == "") && (designfile == "") && (sharedfile == "") && (listfile == "") && (taxfile == "")) { m->mothurOut("You must provide at least one of the following: fasta, name, taxonomy, group, shared, design, count or list."); m->mothurOutEndLine(); abort = true; } if (((groupfile == "") && (countfile == "")) && ((namefile != "") || (fastafile != "") || (listfile != "") || (taxfile != ""))) { m->mothurOut("If using a fasta, name, taxonomy, group or list, then you must provide a group or count file."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((namefile == "") && ((fastafile != "") || (taxfile != ""))){ vector files; files.push_back(fastafile); files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "RemoveGroupsCommand"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get groups you want to remove if (accnosfile != "") { m->readAccnos(accnosfile, Groups); m->setGroups(Groups); } if (groupfile != "") { groupMap = new GroupMap(groupfile); groupMap->readMap(); //make sure groups are valid //takes care of user setting groupNames that are invalid or 
setting groups=all vector namesGroups = groupMap->getNamesOfGroups(); vector checkedGroups; for (int i = 0; i < Groups.size(); i++) { if (m->inUsersGroups(Groups[i], namesGroups)) { checkedGroups.push_back(Groups[i]); } else { m->mothurOut("[WARNING]: " + Groups[i] + " is not a valid group in your groupfile, ignoring.\n"); } } if (checkedGroups.size() == 0) { m->mothurOut("[ERROR]: no valid groups, aborting.\n"); delete groupMap; return 0; } else { Groups = checkedGroups; m->setGroups(Groups); } //fill names with names of sequences that are from the groups we want to remove fillNames(); delete groupMap; }else if (countfile != ""){ if ((fastafile != "") || (listfile != "") || (taxfile != "")) { m->mothurOut("\n[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.\n\n"); } CountTable ct; ct.readTable(countfile, true, false); if (!ct.hasGroupInfo()) { m->mothurOut("[ERROR]: your count file does not contain group info, aborting.\n"); return 0; } vector gNamesOfGroups = ct.getNamesOfGroups(); SharedUtil util; util.setGroups(Groups, gNamesOfGroups); vector namesOfSeqs = ct.getNamesOfSeqs(); sort(Groups.begin(), Groups.end()); for (int i = 0; i < namesOfSeqs.size(); i++) { vector thisSeqsGroups = ct.getGroups(namesOfSeqs[i]); if (m->isSubset(Groups, thisSeqsGroups)) { //you only have seqs from these groups so remove you names.insert(namesOfSeqs[i]); } } } if (m->control_pressed) { return 0; } //read through the correct file and output lines you want to keep if (namefile != "") { readName(); } if (fastafile != "") { readFasta(); } if (groupfile != "") { readGroup(); } if (countfile != "") { readCount(); } if (listfile != "") { readList(); } if (taxfile != "") { readTax(); } if (sharedfile != "") { readShared(); } if (designfile != "") { readDesign(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if 
(outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } itTypes = outputTypes.find("design"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setDesignFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readFasta(){ try { string thisOutputDir = outputDir; if (outputDir 
== "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } Sequence currSeq(in); name = currSeq.getName(); if (name != "") { //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; currSeq.printSequence(out); }else { //if you are not in the accnos file check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(name); if (it != uniqueToRedundant.end()) { wroteSomething = true; currSeq.setName(it->second); currSeq.printSequence(out); }else { removedCount++; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the groups you wish to remove."); m->mothurOutEndLine(); } outputTypes["fasta"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your fasta file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readShared(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } //get group names from sharedfile so we can set Groups to the groupNames we want to keep //that way we can take advantage of the reads in inputdata and sharedRabundVector InputData* tempInput = new InputData(sharedfile, 
"sharedfile"); vector lookup = tempInput->getSharedRAbundVectors(); map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); //save m->Groups vector allGroupsNames = m->getAllGroups(); vector mothurOutGroups = m->getGroups(); vector groupsToKeep; for (int i = 0; i < allGroupsNames.size(); i++) { if (!m->inUsersGroups(allGroupsNames[i], m->getGroups())) { groupsToKeep.push_back(allGroupsNames[i]); } } if (allGroupsNames.size() == groupsToKeep.size()) { m->mothurOut("Your file does not contain any groups you wish to remove."); m->mothurOutEndLine(); m->setGroups(mothurOutGroups); delete tempInput; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } //reset read for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } delete tempInput; m->setGroups(groupsToKeep); m->clearAllGroups(); m->saveNextLabel = ""; m->printedSharedHeaders = false; m->currentSharedBinLabels.clear(); m->sharedBinLabelsInFile.clear(); InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); bool wroteSomething = false; while(lookup[0] != NULL) { variables["[tag]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); if (m->control_pressed) { out.close(); m->mothurRemove(outputFileName); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } lookup[0]->printHeaders(out); for (int i = 0; i < lookup.size(); i++) { out << lookup[i]->getLabel() << '\t' << lookup[i]->getGroup() << '\t'; lookup[i]->print(out); wroteSomething = true; } //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(); out.close(); } m->setGroups(mothurOutGroups); if (wroteSomething == false) 
{ m->mothurOut("Your file contains only the groups you wish to remove."); m->mothurOutEndLine(); } string groupsString = ""; for (int i = 0; i < Groups.size()-1; i++) { groupsString += Groups[i] + ", "; } groupsString += Groups[Groups.size()-1]; m->mothurOut("Removed groups: " + groupsString + " from your shared file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readShared"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readList(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); ifstream in; m->openInputFile(listfile, in); bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ removedCount = 0; //read in list vector ListVector list(in); variables["[tag]"] = list.getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); vector binLabels = list.getLabels(); vector newBinLabels; //make a new list vector ListVector newList; newList.setLabel(list.getLabel()); //for each bin for (int i = 0; i < list.getNumBins(); i++) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } //parse out names that are in accnos file string binnames = list.get(i); string newNames = ""; while (binnames.find_first_of(',') != -1) { string name = binnames.substr(0,binnames.find_first_of(',')); binnames = binnames.substr(binnames.find_first_of(',')+1, binnames.length()); //if that name is in the .accnos file, add it if (names.count(name) == 0) { newNames += name + ","; } else { //if you are not in the accnos file 
check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(name); if (it != uniqueToRedundant.end()) { newNames += it->second + ","; }else { removedCount++; } } } //get last name if (names.count(binnames) == 0) { newNames += binnames + ","; } else { //if you are not in the accnos file check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(binnames); if (it != uniqueToRedundant.end()) { newNames += it->second + ","; }else { removedCount++; } } //if there are names in this bin add to new list if (newNames != "") { newNames = newNames.substr(0, newNames.length()-1); //rip off extra comma newList.push_back(newNames); newBinLabels.push_back(binLabels[i]); } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->gobble(in); out.close(); } in.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the groups you wish to remove."); m->mothurOutEndLine(); } m->mothurOut("Removed " + toString(removedCount) + " sequences from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { 
in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; validSecond.clear(); for (int i = 0; i < parsedNames.size(); i++) { if (names.count(parsedNames[i]) == 0) { validSecond.push_back(parsedNames[i]); } } removedCount += parsedNames.size()-validSecond.size(); //if the name in the first column is in the set then print it and any other names in second column also in set if (names.count(firstCol) == 0) { wroteSomething = true; out << firstCol << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; //make first name in set you come to first column and then add the remaining names to second column }else { //you want part of this row if (validSecond.size() != 0) { wroteSomething = true; out << validSecond[0] << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; uniqueToRedundant[firstCol] = validSecond[0]; } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the groups you wish to remove."); m->mothurOutEndLine(); } outputTypes["name"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your name file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readName"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readGroup(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += 
m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(groupfile, in); string name, group; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> group; //read from second column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' << group << endl; }else { removedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the groups you wish to remove."); m->mothurOutEndLine(); } outputTypes["group"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your group file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readCount(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string outputFileName = getOutputFileName("count", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(countfile, in); bool wroteSomething = false; int removedCount = 0; string headers = m->getline(in); m->gobble(in); vector columnHeaders = m->splitWhiteSpace(headers); vector groups; map 
<int, string> originalGroupIndexes;
        map<string, int> GroupIndexes;
        set<int> indexOfGroupsChosen;
        for (int i = 2; i < columnHeaders.size(); i++) { groups.push_back(columnHeaders[i]); originalGroupIndexes[i-2] = columnHeaders[i]; }
        //sort groups to keep consistent with how we store the groups in groupmap
        sort(groups.begin(), groups.end());
        for (int i = 0; i < groups.size(); i++) { GroupIndexes[groups[i]] = i; }

        vector<string> groupsToKeep;
        for (int i = 0; i < groups.size(); i++) {
            if (!m->inUsersGroups(groups[i], Groups)) { groupsToKeep.push_back(groups[i]); }
        }

        sort(groupsToKeep.begin(), groupsToKeep.end());
        out << "Representative_Sequence\ttotal";
        for (int i = 0; i < groupsToKeep.size(); i++) { out << '\t' << groupsToKeep[i]; indexOfGroupsChosen.insert(GroupIndexes[groupsToKeep[i]]); }
        out << endl;

        string name; int oldTotal;
        while (!in.eof()) {

            if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

            in >> name; m->gobble(in); in >> oldTotal; m->gobble(in);
            if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + toString(oldTotal) + "\n"); }

            if (names.count(name) == 0) {
                //this sequence is kept: read its per-group counts and keep only the columns for the remaining groups
                vector<int> selectedCounts; int thisTotal = 0; int temp;
                for (int i = 0; i < groups.size(); i++) {
                    int thisIndex = GroupIndexes[originalGroupIndexes[i]];
                    in >> temp; m->gobble(in);
                    if (indexOfGroupsChosen.count(thisIndex) != 0) { //we want this group
                        selectedCounts.push_back(temp); thisTotal += temp;
                    }
                }

                out << name << '\t' << thisTotal;
                for (int i = 0; i < selectedCounts.size(); i++) { out << '\t' << selectedCounts[i]; }
                out << endl;

                wroteSomething = true;
                removedCount+= (oldTotal - thisTotal);
            }else { m->getline(in); removedCount += oldTotal; }

            m->gobble(in);
        }
        in.close();
        out.close();

        if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the groups you wish to remove."); m->mothurOutEndLine(); }
        outputTypes["count"].push_back(outputFileName); outputNames.push_back(outputFileName);

        m->mothurOut("Removed " + toString(removedCount) + " 
sequences from your count file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readCount"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readDesign(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(designfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(designfile)); variables["[extension]"] = m->getExtension(designfile); string outputFileName = getOutputFileName("design", variables); DesignMap designMap(designfile); vector groupsToKeep; vector allGroups = designMap.getNamesGroups(); for(int i = 0; i < allGroups.size(); i++) { if (!m->inUsersGroups(allGroups[i], Groups)) { groupsToKeep.push_back(allGroups[i]); } } bool wroteSomething = false; ofstream out; m->openOutputFile(outputFileName, out); int numGroupsFound = designMap.printGroups(out, groupsToKeep); if (numGroupsFound > 0) { wroteSomething = true; } out.close(); int removedCount = allGroups.size() - numGroupsFound; if (wroteSomething == false) { m->mothurOut("Your file contains only groups from the groups you wish to remove."); m->mothurOutEndLine(); } outputTypes["design"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " groups from your design file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readDesign"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::readTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile)); variables["[extension]"] = m->getExtension(taxfile); 
string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(taxfile, in); string name, tax; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> tax; //read from second column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' << tax << endl; }else { //if you are not in the accnos file check if you are a name that needs to be changed map::iterator it = uniqueToRedundant.find(name); if (it != uniqueToRedundant.end()) { wroteSomething = true; out << it->second << '\t' << tax << endl; }else { removedCount++; } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the groups you wish to remove."); m->mothurOutEndLine(); } outputTypes["taxonomy"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your taxonomy file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "readTax"); exit(1); } } //********************************************************************************************************************** int RemoveGroupsCommand::fillNames(){ try { vector seqs = groupMap->getNamesSeqs(); for (int i = 0; i < seqs.size(); i++) { if (m->control_pressed) { return 0; } string group = groupMap->getGroup(seqs[i]); if (m->inUsersGroups(group, Groups)) { names.insert(seqs[i]); } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveGroupsCommand", "fillNames"); exit(1); } } //********************************************************************************************************************** 
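// The row handling in readName() above can be sketched in isolation. This is a
// minimal illustration and is not part of mothur: filterNameRow is a hypothetical
// helper, and the names "removed" and "row" are assumptions made for the example.
// It shows how a name-file row is filtered and how, when the representative
// (first column) is itself removed, the first surviving redundant name is promoted
// and recorded in uniqueToRedundant so that fasta, list and taxonomy files can be
// renamed to match. Unlike readName(), the sketch returns an empty string for a
// row with no surviving names instead of assuming one exists.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a mothur function): filters one name-file row.
// firstCol is the representative name, secondCol the comma-separated list of
// all names it stands for, removed the set of names to drop.
// Returns the rewritten row, or "" when no name in the row survives.
std::string filterNameRow(const std::string& firstCol,
                          const std::string& secondCol,
                          const std::set<std::string>& removed,
                          std::map<std::string, std::string>& uniqueToRedundant) {
    // Split the second column at commas and keep the names that are not removed.
    std::vector<std::string> kept;
    std::stringstream ss(secondCol);
    std::string name;
    while (std::getline(ss, name, ',')) {
        if (!name.empty() && removed.count(name) == 0) { kept.push_back(name); }
    }
    if (kept.empty()) { return ""; }

    // If the representative is removed, promote the first survivor and remember
    // the renaming so the other file types can apply the same change.
    std::string representative = firstCol;
    if (removed.count(firstCol) != 0) {
        representative = kept[0];
        uniqueToRedundant[firstCol] = representative;
    }

    // Rebuild the row: representative, tab, surviving names joined by commas.
    std::string row = representative + '\t';
    for (std::size_t i = 0; i < kept.size(); i++) {
        if (i != 0) { row += ','; }
        row += kept[i];
    }
    return row;
}
```

// Keeping the renaming in a separate map, rather than rewriting every file
// eagerly, matches the note in removegroupscommand.h: only the names that
// actually change are stored, so lookups in the other readers stay cheap.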
mothur-1.36.1/source/commands/removegroupscommand.h000066400000000000000000000030651255543666200225100ustar00rootroot00000000000000
#ifndef REMOVEGROUPSCOMMAND_H
#define REMOVEGROUPSCOMMAND_H

/*
 *  removegroupscommand.h
 *  Mothur
 *
 *  Created by westcott on 11/10/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "groupmap.h"

class RemoveGroupsCommand : public Command {

public:

    RemoveGroupsCommand(string);
    RemoveGroupsCommand();
    ~RemoveGroupsCommand(){}

    vector<string> setParameters();
    string getCommandName()         { return "remove.groups"; }
    string getCommandCategory()     { return "OTU-Based Approaches"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Remove.groups"; }
    string getDescription()     { return "removes sequences from a list, fasta, name, group, shared, design or taxonomy file from a given group or set of groups"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    set<string> names;
    string accnosfile, fastafile, namefile, groupfile, countfile, designfile, listfile, taxfile, outputDir, groups, sharedfile;
    bool abort;
    vector<string> outputNames, Groups;
    GroupMap* groupMap;
    map<string, string> uniqueToRedundant; //if a namefile is given and the first column name is not selected
                                           //then the other files need to change the unique name in their file to match.
                                           //only add the names that need to be changed to keep the map search quick

    int readFasta();
    int readShared();
    int readName();
    int readGroup();
    int readCount();
    int readList();
    int readTax();
    int fillNames();
    int readDesign();
};

#endif
mothur-1.36.1/source/commands/removelineagecommand.cpp000066400000000000000000001562031255543666200231330ustar00rootroot00000000000000
/*
 *  removelineagecommand.cpp
 *  Mothur
 *
 *  Created by westcott on 9/24/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "removelineagecommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "counttable.h" #include "inputdata.h" //********************************************************************************************************************** vector RemoveLineageCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "FNGLT", "none","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "FNGLT", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "FNGLT", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "FNGLT", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false,true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "none", "FNGLT", "none","shared",false,false, true); parameters.push_back(pshared); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "tax", "FNGLT", "none","taxonomy",false,false, true); parameters.push_back(ptaxonomy); CommandParameter pconstaxonomy("constaxonomy", "InputTypes", "", "", "tax", "FNGLT", "none","constaxonomy",false,false, true); parameters.push_back(pconstaxonomy); CommandParameter palignreport("alignreport", "InputTypes", "", "", "none", "FNGLT", "none","alignreport",false,false); parameters.push_back(palignreport); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter ptaxon("taxon", "String", "", "", "", "", "","",false,true,true); parameters.push_back(ptaxon); CommandParameter pdups("dups", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pdups); CommandParameter pseed("seed", "Number", "", "0", "", 
"", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RemoveLineageCommand::getHelpString(){ try { string helpString = ""; helpString += "The remove.lineage command reads a taxonomy or constaxonomy file and any of the following file types: fasta, name, group, count, list, shared or alignreport file. The constaxonomy can only be used with a shared or list file.\n"; helpString += "It outputs a file containing only the sequences or OTUS from the taxonomy file that are not from the taxon you requested to be removed.\n"; helpString += "The remove.lineage command parameters are taxon, fasta, name, group, count, list, shared, taxonomy, alignreport, label and dups. You must provide taxonomy or constaxonomy unless you have a valid current taxonomy file.\n"; helpString += "The dups parameter allows you to add the entire line from a name file if you add any name from the line. default=false. 
\n";
        helpString += "The taxon parameter allows you to select the taxons you would like to remove, and is required.\n";
        helpString += "You may enter your taxons with confidence scores; doing so will remove only those sequences that belong to the taxonomy and whose confidence scores fall below the scores you give.\n";
        helpString += "If they belong to the taxonomy and have confidences above those you provide, the sequence will not be removed.\n";
        helpString += "The label parameter is used to analyze specific labels in your input. \n";
        helpString += "The remove.lineage command should be in the following format: remove.lineage(taxonomy=yourTaxonomyFile, taxon=yourTaxons).\n";
        helpString += "Example remove.lineage(taxonomy=amazon.silva.taxonomy, taxon=Bacteria;Firmicutes;Bacilli;Lactobacillales;).\n";
        helpString += "Note: If you are running mothur in script mode you must wrap the taxon in ' characters so mothur will ignore the ; in the taxon.\n";
        helpString += "Example remove.lineage(taxonomy=amazon.silva.taxonomy, taxon='Bacteria;Firmicutes;Bacilli;Lactobacillales;').\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RemoveLineageCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],pick,[extension]"; } else if (type == "taxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "constaxonomy") { pattern = "[filename],pick,[extension]"; } else if (type == "name") { pattern = "[filename],pick,[extension]"; } else if (type == "group") { pattern = "[filename],pick,[extension]"; } else if (type == "count") { pattern = "[filename],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[distance],pick,[extension]"; } else if (type == "shared") { pattern = "[filename],[distance],pick,[extension]"; } else if (type == "alignreport") { pattern = "[filename],pick.align.report"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RemoveLineageCommand::RemoveLineageCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["shared"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "RemoveLineageCommand"); exit(1); } } 
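/*
 Illustration only, kept inside a comment so it does not affect the build.
 getOutputPattern() above returns comma-separated patterns such as
 "[filename],pick,[extension]". The sketch below shows one plausible way such a
 pattern could be expanded into a file name. expandPattern is a hypothetical
 helper, not mothur's getOutputFileName, and the separator rules are assumptions:
 "[filename]" is taken to end with '.', "[extension]" to start with '.', and
 every other piece to be followed by a '.' separator.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: expands a comma-separated output pattern using the
// tag values in `variables`. Unknown bracketed tags are skipped.
std::string expandPattern(const std::string& pattern,
                          const std::map<std::string, std::string>& variables) {
    std::string result;
    std::istringstream pieces(pattern);
    std::string piece;
    while (std::getline(pieces, piece, ',')) {
        std::string value = piece;  // literal pieces are used as-is
        if (!piece.empty() && piece[0] == '[') {
            std::map<std::string, std::string>::const_iterator it = variables.find(piece);
            if (it == variables.end()) { continue; }  // unknown tag: skip
            value = it->second;
        }
        if (piece == "[filename]") {
            result += value;  // assumed to carry its own trailing '.'
        } else if (piece == "[extension]") {
            // the extension brings its own leading '.', so drop ours
            if (!result.empty() && result[result.size() - 1] == '.') {
                result.erase(result.size() - 1);
            }
            result += value;
        } else {
            result += value + ".";
        }
    }
    // patterns without "[extension]" end with a separator we do not want
    if (!result.empty() && result[result.size() - 1] == '.') {
        result.erase(result.size() - 1);
    }
    return result;
}

// Usage sketch with made-up file names.
inline void expandPatternExample() {
    std::map<std::string, std::string> variables;
    variables["[filename]"] = "final.";
    variables["[extension]"] = ".fasta";
    assert(expandPattern("[filename],pick,[extension]", variables) == "final.pick.fasta");
}
```

 Under these assumptions the "list" and "shared" patterns, which add a
 "[distance]" tag, would place the label between the root name and "pick".
*/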
//********************************************************************************************************************** RemoveLineageCommand::RemoveLineageCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["shared"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("alignreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["alignreport"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("constaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["constaxonomy"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } alignfile = validParameter.validFile(parameters, "alignreport", true); if (alignfile == "not open") { abort = true; } else if (alignfile == "not found") { alignfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { taxfile = ""; abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } constaxonomy = validParameter.validFile(parameters, "constaxonomy", true); if (constaxonomy == "not open") { constaxonomy = ""; abort = true; } else if (constaxonomy == "not 
found") { constaxonomy = ""; } if ((constaxonomy == "") && (taxfile == "")) { taxfile = m->getTaxonomyFile(); if (taxfile != "") { m->mothurOut("Using " + taxfile + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current taxonomy file and did not provide a constaxonomy file. The taxonomy or constaxonomy parameter is required."); m->mothurOutEndLine(); abort = true; } } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } string usedDups = "true"; string temp = validParameter.validFile(parameters, "dups", false); if (temp == "not found") { if (namefile != "") { temp = "true"; } else { temp = "false"; usedDups = ""; } } dups = m->isTrue(temp); taxons = validParameter.validFile(parameters, "taxon", false); if (taxons == "not found") { taxons = ""; m->mothurOut("No taxons given, please correct."); m->mothurOutEndLine(); abort = true; } else { //rip off quotes if (taxons[0] == '\'') { taxons = taxons.substr(1); } if (taxons[(taxons.length()-1)] == '\'') { taxons = taxons.substr(0, (taxons.length()-1)); } } m->splitAtChar(taxons, listOfTaxons, '-'); if ((fastafile == "") && (constaxonomy == "") && (namefile == "") && (groupfile == "") && (alignfile == "") && (listfile == "") && (taxfile == "") && (countfile == "")) { m->mothurOut("You must provide one of the following: fasta, name, group, count, alignreport, taxonomy, constaxonomy, shared or listfile."); m->mothurOutEndLine(); abort = true; } if ((constaxonomy != "") && 
((fastafile != "") || (namefile != "") || (groupfile != "") || (alignfile != "") || (taxfile != "") || (countfile != ""))) { m->mothurOut("[ERROR]: can only use constaxonomy file with a list or shared file, aborting.\n"); abort = true; } if ((constaxonomy != "") && (taxfile != "")) { m->mothurOut("[ERROR]: Choose only one: taxonomy or constaxonomy, aborting.\n"); abort = true; } if ((sharedfile != "") && (taxfile != "")) { m->mothurOut("[ERROR]: sharedfile can only be used with constaxonomy file, aborting.\n"); abort = true; } if ((sharedfile != "") || (listfile != "")) { label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } } if ((usedDups != "") && (namefile == "")) { m->mothurOut("You may only use dups with the name option."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((namefile == "") && ((fastafile != "") || (taxfile != ""))){ vector files; files.push_back(fastafile); files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "RemoveLineageCommand"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (m->control_pressed) { return 0; } if (countfile != "") { if ((fastafile != "") || (listfile != "") || (taxfile != "")) { m->mothurOut("\n[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.\n\n"); } } //read through the correct file and output lines you want to keep if (taxfile != "") { readTax(); //fills the set of names to get if (namefile != "") { readName(); } if (fastafile != "") { readFasta(); } if (countfile != "") { 
readCount(); } if (groupfile != "") { readGroup(); } if (alignfile != "") { readAlign(); } if (listfile != "") { readList(); } }else { readConsTax(); if (listfile != "") { readConsList(); } if (sharedfile != "") { readShared(); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "execute"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readFasta(){ try { string thisOutputDir = outputDir; if 
(outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; bool wroteSomething = false; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } Sequence currSeq(in); name = currSeq.getName(); if (name != "") { //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; currSeq.printSequence(out); } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your fasta file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readFasta"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readList(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); ifstream in; m->openInputFile(listfile, in); bool wroteSomething = false; while(!in.eof()){ //read in list vector ListVector list(in); //make a new list vector ListVector newList; newList.setLabel(list.getLabel()); variables["[distance]"] = list.getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); if (m->control_pressed) { 
in.close(); out.close(); return 0; } vector binLabels = list.getLabels(); vector newBinLabels; //for each bin for (int i = 0; i < list.getNumBins(); i++) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } //parse out names that are in accnos file string binnames = list.get(i); vector bnames; m->splitAtComma(binnames, bnames); string newNames = ""; for (int j = 0; j < bnames.size(); j++) { string name = bnames[j]; //if that name is in the .accnos file, add it if (names.count(name) == 0) { newNames += name + ","; } } //if there are names in this bin add to new list if (newNames != "") { newNames = newNames.substr(0, newNames.length()-1); //rip off extra comma newList.push_back(newNames); newBinLabels.push_back(binLabels[i]); } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->gobble(in); out.close(); } in.close(); if (wroteSomething == false) { m->mothurOut("Your list file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); } return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readList"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; in >> 
secondCol; vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; validSecond.clear(); for (int i = 0; i < parsedNames.size(); i++) { if (names.count(parsedNames[i]) == 0) { validSecond.push_back(parsedNames[i]); } } if ((dups) && (validSecond.size() != parsedNames.size())) { //if dups is true and we want to get rid of anyone, get rid of everyone for (int i = 0; i < parsedNames.size(); i++) { names.insert(parsedNames[i]); } }else { //if the name in the first column is in the set then print it and any other names in second column also in set if (names.count(firstCol) == 0) { wroteSomething = true; out << firstCol << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; //make first name in set you come to first column and then add the remaining names to second column }else { //you want part of this row if (validSecond.size() != 0) { wroteSomething = true; out << validSecond[0] << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your name file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["name"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readName"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readCount(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = thisOutputDir + 
m->getRootName(m->getSimpleName(countfile));
        variables["[extension]"] = m->getExtension(countfile);
        string outputFileName = getOutputFileName("count", variables);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(countfile, in);

        bool wroteSomething = false;

        string headers = m->getline(in); m->gobble(in);
        out << headers << endl;
        string test = headers;
        vector<string> pieces = m->splitWhiteSpace(test);

        string name, rest; int thisTotal; rest = "";
        while (!in.eof()) {

            if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

            in >> name; m->gobble(in);
            in >> thisTotal; m->gobble(in);
            if (pieces.size() > 2) { rest = m->getline(in); m->gobble(in); }
            if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + rest + "\n"); }

            //keep the row only if this name was not marked for removal
            if (names.count(name) == 0) {
                out << name << '\t' << thisTotal << '\t' << rest << endl;
                wroteSomething = true;
            }
        }
        in.close();
        out.close();

        //check for groups that have been eliminated
        CountTable ct;
        if (ct.testGroups(outputFileName)) {
            ct.readTable(outputFileName, true, false);
            ct.printTable(outputFileName);
        }

        if (wroteSomething == false) { m->mothurOut("Your count file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); }
        outputTypes["count"].push_back(outputFileName); outputNames.push_back(outputFileName);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveLineageCommand", "readCount");
        exit(1);
    }
}
//**********************************************************************************************************************
int RemoveLineageCommand::readConsList(){
    try {
        getListVector();

        if (m->control_pressed) { delete list; return 0;}

        ListVector newList;
        newList.setLabel(list->getLabel());
        int removedCount = 0;
        bool wroteSomething = false;
        string snumBins = toString(list->getNumBins());

        vector<string> binLabels = list->getLabels();
        vector<string> newBinLabels;
        for (int i = 0; i < list->getNumBins(); i++) {

            if (m->control_pressed) { delete list; return 0;}

            //create a label for this otu
string otuLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { otuLabel += "0"; } } otuLabel += sbinNumber; if (names.count(m->getSimpleLabel(otuLabel)) == 0) { newList.push_back(list->get(i)); newBinLabels.push_back(binLabels[i]); }else { removedCount++; } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); variables["[distance]"] = list->getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); delete list; //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } out.close(); if (wroteSomething == false) { m->mothurOut("Your file only contains OTUs from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["list"].push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " OTUs from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readConsList"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::getListVector(){ try { InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
        set<string> labels; labels.insert(label);
        set<string> processedLabels;
        set<string> userLabels = labels;

        //as long as you are not at the end of the file or done with the lines you want
        while((list != NULL) && (userLabels.size() != 0)) {
            if (m->control_pressed) { return 0; }

            if(labels.count(list->getLabel()) == 1){
                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());
                break;
            }

            if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = list->getLabel();

                delete list;
                list = input.getListVector(lastLabel);

                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());

                //restore real lastlabel to save below
                list->setLabel(saveLabel);
                break;
            }

            lastLabel = list->getLabel();

            //get next line to process
            //prevent memory leak
            delete list;
            list = input.getListVector();
        }

        if (m->control_pressed) { return 0; }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input.getListVector(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "getListVector"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readShared(){ try { getShared(); if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } vector newLabels; //create new "filtered" lookup vector newLookup; for (int i = 0; i < lookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); newLookup.push_back(temp); } bool wroteSomething = false; int numRemoved = 0; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } return 0; } //is this otu on the list if (names.count(m->getSimpleLabel(m->currentSharedBinLabels[i])) == 0) { wroteSomething = true; newLabels.push_back(m->currentSharedBinLabels[i]); for (int j = 0; j < newLookup.size(); j++) { //add this OTU to the new lookup newLookup[j]->push_back(lookup[j]->getAbundance(i), lookup[j]->getGroup()); } }else { numRemoved++; } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); for (int j = 0; j < 
lookup.size(); j++) { delete lookup[j]; } m->currentSharedBinLabels = newLabels; newLookup[0]->printHeaders(out); for (int i = 0; i < newLookup.size(); i++) { out << newLookup[i]->getLabel() << '\t' << newLookup[i]->getGroup() << '\t'; newLookup[i]->print(out); } out.close(); for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } if (wroteSomething == false) { m->mothurOut("Your file only contains OTUs from " + taxons + "."); m->mothurOutEndLine(); } m->mothurOut("Removed " + toString(numRemoved) + " OTUs from your shared file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readShared"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::getShared(){ try { InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
        set<string> labels; labels.insert(label);
        set<string> processedLabels;
        set<string> userLabels = labels;

        //as long as you are not at the end of the file or done with the lines you want
        while((lookup[0] != NULL) && (userLabels.size() != 0)) {
            if (m->control_pressed) { return 0; }

            if(labels.count(lookup[0]->getLabel()) == 1){
                processedLabels.insert(lookup[0]->getLabel());
                userLabels.erase(lookup[0]->getLabel());
                break;
            }

            if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = lookup[0]->getLabel();

                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
                lookup = input.getSharedRAbundVectors(lastLabel);

                processedLabels.insert(lookup[0]->getLabel());
                userLabels.erase(lookup[0]->getLabel());

                //restore real lastlabel to save below
                lookup[0]->setLabel(saveLabel);
                break;
            }

            lastLabel = lookup[0]->getLabel();

            //get next line to process
            //prevent memory leak
            for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
            lookup = input.getSharedRAbundVectors();
        }

        if (m->control_pressed) { return 0; }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "getShared"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readGroup(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(groupfile, in); string name, group; bool wroteSomething = false; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> group; //read from second column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' << group << endl; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your group file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["group"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + 
m->getRootName(m->getSimpleName(taxfile)); variables["[extension]"] = m->getExtension(taxfile); string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(taxfile, in); string name, tax; bool wroteSomething = false; vector taxonsHasConfidence; taxonsHasConfidence.resize(listOfTaxons.size(), false); vector< vector< map > > searchTaxons; searchTaxons.resize(listOfTaxons.size()); vector noConfidenceTaxons; noConfidenceTaxons.resize(listOfTaxons.size(), ""); for (int i = 0; i < listOfTaxons.size(); i++) { noConfidenceTaxons[i] = listOfTaxons[i]; int hasConPos = listOfTaxons[i].find_first_of('('); if (hasConPos != string::npos) { taxonsHasConfidence[i] = true; searchTaxons[i] = getTaxons(listOfTaxons[i]); noConfidenceTaxons[i] = listOfTaxons[i]; m->removeConfidences(noConfidenceTaxons[i]); } } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column in >> tax; //read from second column bool remove = false; string noQuotesTax = m->removeQuotes(tax); for (int j = 0; j < listOfTaxons.size(); j++) { string newtax = noQuotesTax; //if the users file contains confidence scores we want to ignore them when searching for the taxons, unless the taxon has them if (!taxonsHasConfidence[j]) { int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { newtax = noQuotesTax; m->removeConfidences(newtax); } int pos = newtax.find(noConfidenceTaxons[j]); if (pos == string::npos) { //wroteSomething = true; //out << name << '\t' << tax << endl; }else{ //this sequence contains the taxon the user wants to remove names.insert(name); remove=true; break; } }else{//if taxons has them and you don't them remove taxons int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences == string::npos) { int pos = newtax.find(noConfidenceTaxons[j]); if (pos == string::npos) { 
//wroteSomething = true; //out << name << '\t' << tax << endl; }else{ //this sequence contains the taxon the user wants to remove names.insert(name); remove=true; break; } }else { //both have confidences so we want to make sure the users confidences are greater then or equal to the taxons //first remove confidences from both and see if the taxonomy exists string noNewTax = noQuotesTax; int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { noNewTax = noQuotesTax; m->removeConfidences(noNewTax); } int pos = noNewTax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //if yes, then are the confidences okay vector< map > usersTaxon = getTaxons(newtax); //the usersTaxon is most likely longer than the searchTaxons, and searchTaxon[0] may relate to userTaxon[4] //we want to "line them up", so we will find the the index where the searchstring starts int index = 0; for (int i = 0; i < usersTaxon.size(); i++) { if (usersTaxon[i].begin()->first == searchTaxons[j][0].begin()->first) { index = i; int spot = 0; bool goodspot = true; //is this really the start, or are we dealing with a taxon of the same name? 
while ((spot < searchTaxons[j].size()) && ((i+spot) < usersTaxon.size())) { if (usersTaxon[i+spot].begin()->first != searchTaxons[j][spot].begin()->first) { goodspot = false; break; } else { spot++; } } if (goodspot) { break; } } } for (int i = 0; i < searchTaxons[j].size(); i++) { if ((i+index) < usersTaxon.size()) { //just in case, should never be false if (usersTaxon[i+index].begin()->second < searchTaxons[j][i].begin()->second) { //is the users cutoff less than the search taxons remove = true; break; } }else { remove = true; break; } } //passed the test so remove you if (remove) { names.insert(name); remove=true; break; }else { //wroteSomething = true; //out << name << '\t' << tax << endl; } }else { //wroteSomething = true; //out << name << '\t' << tax << endl; } } } } if (!remove) { wroteSomething = true; out << name << '\t' << tax << endl; } m->gobble(in); } in.close(); out.close(); if (!wroteSomething) { m->mothurOut("Your taxonomy file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["taxonomy"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readTax"); exit(1); } } //********************************************************************************************************************** int RemoveLineageCommand::readConsTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(constaxonomy); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(constaxonomy)); variables["[extension]"] = m->getExtension(constaxonomy); string outputFileName = getOutputFileName("constaxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(constaxonomy, in); string otuLabel, tax; int numReps; bool wroteSomething = false; //read headers string headers = m->getline(in); out << headers << endl; //bool wroteSomething = false; vector 
taxonsHasConfidence; taxonsHasConfidence.resize(listOfTaxons.size(), false); vector< vector< map > > searchTaxons; searchTaxons.resize(listOfTaxons.size()); vector noConfidenceTaxons; noConfidenceTaxons.resize(listOfTaxons.size(), ""); for (int i = 0; i < listOfTaxons.size(); i++) { noConfidenceTaxons[i] = listOfTaxons[i]; int hasConPos = listOfTaxons[i].find_first_of('('); if (hasConPos != string::npos) { taxonsHasConfidence[i] = true; searchTaxons[i] = getTaxons(listOfTaxons[i]); noConfidenceTaxons[i] = listOfTaxons[i]; m->removeConfidences(noConfidenceTaxons[i]); } } while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> otuLabel; m->gobble(in); in >> numReps; m->gobble(in); in >> tax; m->gobble(in); bool remove = false; string noQuotesTax = m->removeQuotes(tax); for (int j = 0; j < listOfTaxons.size(); j++) { string newtax = noQuotesTax; //if the users file contains confidence scores we want to ignore them when searching for the taxons, unless the taxon has them if (!taxonsHasConfidence[j]) { int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { newtax = noQuotesTax; m->removeConfidences(newtax); } int pos = newtax.find(noConfidenceTaxons[j]); if (pos == string::npos) { //wroteSomething = true; //out << name << '\t' << tax << endl; }else{ //this sequence contains the taxon the user wants to remove names.insert(m->getSimpleLabel(otuLabel)); remove=true; break; } }else{//if taxons has them and you don't them remove taxons int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences == string::npos) { int pos = newtax.find(noConfidenceTaxons[j]); if (pos == string::npos) { //wroteSomething = true; //out << name << '\t' << tax << endl; }else{ //this sequence contains the taxon the user wants to remove names.insert(m->getSimpleLabel(otuLabel)); remove=true; break; } }else { //both have confidences so we want to make sure the users confidences are greater 
then or equal to the taxons //first remove confidences from both and see if the taxonomy exists string noNewTax = noQuotesTax; int hasConfidences = noQuotesTax.find_first_of('('); if (hasConfidences != string::npos) { noNewTax = noQuotesTax; m->removeConfidences(noNewTax); } int pos = noNewTax.find(noConfidenceTaxons[j]); if (pos != string::npos) { //if yes, then are the confidences okay vector< map > usersTaxon = getTaxons(newtax); //the usersTaxon is most likely longer than the searchTaxons, and searchTaxon[0] may relate to userTaxon[4] //we want to "line them up", so we will find the the index where the searchstring starts int index = 0; for (int i = 0; i < usersTaxon.size(); i++) { if (usersTaxon[i].begin()->first == searchTaxons[j][0].begin()->first) { index = i; int spot = 0; bool goodspot = true; //is this really the start, or are we dealing with a taxon of the same name? while ((spot < searchTaxons[j].size()) && ((i+spot) < usersTaxon.size())) { if (usersTaxon[i+spot].begin()->first != searchTaxons[j][spot].begin()->first) { goodspot = false; break; } else { spot++; } } if (goodspot) { break; } } } for (int i = 0; i < searchTaxons[j].size(); i++) { if ((i+index) < usersTaxon.size()) { //just in case, should never be false if (usersTaxon[i+index].begin()->second < searchTaxons[j][i].begin()->second) { //is the users cutoff less than the search taxons remove = true; break; } }else { remove = true; break; } } //passed the test so remove you if (remove) { names.insert(m->getSimpleLabel(otuLabel)); remove=true; break; }else { //wroteSomething = true; //out << name << '\t' << tax << endl; } }else { //wroteSomething = true; //out << name << '\t' << tax << endl; } } } } if (!remove) { wroteSomething = true; out << otuLabel << '\t' << numReps << '\t' << tax << endl; } } in.close(); out.close(); if (names.size() == 0) { m->mothurOut("Your constaxonomy file contains OTUs only from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); 
        outputTypes["constaxonomy"].push_back(outputFileName);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveLineageCommand", "readConsTax");
        exit(1);
    }
}
/**************************************************************************************************/
vector< map<string, float> > RemoveLineageCommand::getTaxons(string tax) {
    try {
        vector< map<string, float> > t;
        string taxon = "";
        int taxLength = tax.length();
        for(int i=0;i<taxLength;i++){
            if(tax[i] == ';'){
                int openParen = taxon.find_last_of('(');
                int closeParen = taxon.find_last_of(')');

                string newtaxon, confidence;
                if ((openParen != string::npos) && (closeParen != string::npos)) {
                    string confidenceScore = taxon.substr(openParen+1, (closeParen-(openParen+1)));
                    if (m->isNumeric1(confidenceScore)) { //its a confidence
                        newtaxon = taxon.substr(0, openParen); //rip off confidence
                        confidence = taxon.substr((openParen+1), (closeParen-openParen-1));
                    }else { //its part of the taxon
                        newtaxon = taxon;
                        confidence = "0";
                    }
                }else{
                    newtaxon = taxon;
                    confidence = "0";
                }
                float con = 0;
                convert(confidence, con);

                map<string, float> temp;
                temp[newtaxon] = con;
                t.push_back(temp);

                taxon = "";
            }
            else{
                taxon += tax[i];
            }
        }

        return t;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveLineageCommand", "getTaxons");
        exit(1);
    }
}
//**********************************************************************************************************************
//alignreport file has a column header line then all other lines contain 16 columns.
we just want the first column since that contains the name int RemoveLineageCommand::readAlign(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(alignfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(alignfile)); variables["[extension]"] = m->getExtension(alignfile); string outputFileName = getOutputFileName("alignreport", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(alignfile, in); string name, junk; bool wroteSomething = false; //read column headers for (int i = 0; i < 16; i++) { if (!in.eof()) { in >> junk; out << junk << '\t'; } else { break; } } out << endl; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t'; //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; out << junk << '\t'; } else { break; } } out << endl; }else {//still read just don't do anything with it //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; } else { break; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your align file contains only sequences from " + taxons + "."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["alignreport"].push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveLineageCommand", "readAlign"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/removelineagecommand.h000066400000000000000000000031111255543666200225650ustar00rootroot00000000000000#ifndef REMOVELINEAGECOMMAND_H #define REMOVELINEAGECOMMAND_H /* * removelineagecommand.h * Mothur * * Created by westcott on 
9/24/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "sharedrabundvector.h"
#include "listvector.hpp"

class RemoveLineageCommand : public Command {

    public:
        RemoveLineageCommand(string);
        RemoveLineageCommand();
        ~RemoveLineageCommand(){};

        vector<string> setParameters();
        string getCommandName()     { return "remove.lineage"; }
        string getCommandCategory() { return "Phylotype Analysis"; }
        string getHelpString();
        string getOutputPattern(string);
        string getCitation()        { return "http://www.mothur.org/wiki/Remove.lineage"; }
        string getDescription()     { return "removes sequences from a list, fasta, name, group, alignreport or taxonomy file from a given taxonomy or set of taxonomies"; }

        int execute();
        void help() { m->mothurOut(getHelpString()); }

    private:
        set<string> names;
        vector<string> outputNames, listOfTaxons;
        string fastafile, namefile, groupfile, alignfile, listfile, countfile, taxfile, outputDir, taxons, sharedfile, constaxonomy, label;
        bool abort, dups;
        vector<SharedRAbundVector*> lookup;
        ListVector* list;

        int readFasta();
        int readName();
        int readGroup();
        int readCount();
        int readAlign();
        int readList();
        int readTax();
        int readShared();
        int readConsTax();
        int readConsList();
        int getShared();
        int getListVector();
        vector< map<string, float> > getTaxons(string);
};

#endif
mothur-1.36.1/source/commands/removeotulabelscommand.cpp000066400000000000000000000704311255543666200235170ustar00rootroot00000000000000
//
//  removeotulabels.cpp
//  Mothur
//
//  Created by Sarah Westcott on 5/21/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "removeotulabelscommand.h"

//**********************************************************************************************************************
vector<string> RemoveOtuLabelsCommand::setParameters(){
    try {
        CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(paccnos);
        CommandParameter pconstaxonomy("constaxonomy", "InputTypes", "", "", "none", "FNGLT", "none","constaxonomy",false,false); parameters.push_back(pconstaxonomy);
        CommandParameter potucorr("otucorr", "InputTypes", "", "", "none", "FNGLT", "none","otucorr",false,false); parameters.push_back(potucorr);
        CommandParameter pcorraxes("corraxes", "InputTypes", "", "", "none", "FNGLT", "none","corraxes",false,false); parameters.push_back(pcorraxes);
        CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false, true); parameters.push_back(plist);
        CommandParameter pshared("shared", "InputTypes", "", "", "none", "FNGLT", "none","shared",false,false, true); parameters.push_back(pshared);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveOtuLabelsCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string RemoveOtuLabelsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The remove.otulabels command can be used to remove 
specific otus from the output of classify.otu, otu.association, or corr.axes. It can also be used to remove a set of otus from a shared or list file.\n";
        helpString += "The remove.otulabels parameters are: constaxonomy, otucorr, corraxes, shared, list, label and accnos.\n";
        helpString += "The constaxonomy parameter is used to input the results of the classify.otu command.\n";
        helpString += "The otucorr parameter is used to input the results of the otu.association command.\n";
        helpString += "The corraxes parameter is used to input the results of the corr.axes command.\n";
        helpString += "The label parameter is used to analyze specific labels in your input. \n";
        helpString += "The remove.otulabels command should be in the following format: \n";
        helpString += "remove.otulabels(accnos=yourListOfOTULabels, corraxes=yourCorrAxesFile)\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveOtuLabelsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string RemoveOtuLabelsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "constaxonomy")     { pattern = "[filename],pick,[extension]"; }
        else if (type == "otucorr")     { pattern = "[filename],pick,[extension]"; }
        else if (type == "corraxes")    { pattern = "[filename],pick,[extension]"; }
        else if (type == "list")        { pattern = "[filename],[distance],pick,[extension]"; }
        else if (type == "shared")      { pattern = "[filename],[distance],pick,[extension]"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveOtuLabelsCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
RemoveOtuLabelsCommand::RemoveOtuLabelsCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string>
tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["otucorr"] = tempOutNames; outputTypes["corraxes"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["list"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "RemoveOtuLabelsCommand"); exit(1); } } //********************************************************************************************************************** RemoveOtuLabelsCommand::RemoveOtuLabelsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { //edit file types below to include only the types you added as parameters string path; it = parameters.find("constaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["constaxonomy"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("corraxes"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["corraxes"] = inputDir + it->second; } } it = parameters.find("otucorr"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["otucorr"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } vector tempOutNames; outputTypes["constaxonomy"] = tempOutNames; outputTypes["otucorr"] = tempOutNames; outputTypes["corraxes"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["list"] = tempOutNames; //check for parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = m->getAccnosFile(); if (accnosfile != "") { m->mothurOut("Using " + accnosfile + " as input file for the accnos parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no valid accnos file and accnos is required."); m->mothurOutEndLine(); abort = true; } }else { m->setAccnosFile(accnosfile); } constaxonomyfile = validParameter.validFile(parameters, "constaxonomy", true); if (constaxonomyfile == "not open") { constaxonomyfile = ""; abort = true; } else if (constaxonomyfile == "not found") { constaxonomyfile = ""; } corraxesfile = validParameter.validFile(parameters, "corraxes", true); if (corraxesfile == "not open") { corraxesfile = ""; abort = true; } else if (corraxesfile == "not found") { corraxesfile = ""; } otucorrfile = validParameter.validFile(parameters, "otucorr", true); if (otucorrfile == "not open") { otucorrfile = ""; abort = true; } else if (otucorrfile == "not found") { otucorrfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = 
validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){ outputDir = ""; }

            if ((constaxonomyfile == "") && (corraxesfile == "") && (otucorrfile == "") && (sharedfile == "") && (listfile == "")) {
                m->mothurOut("You must provide one of the following: constaxonomy, corraxes, otucorr, shared or list."); m->mothurOutEndLine(); abort = true;
            }

            if ((sharedfile != "") || (listfile != "")) {
                label = validParameter.validFile(parameters, "label", false);
                if (label == "not found") {
                    label = "";
                    m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine();
                }
            }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveOtuLabelsCommand", "RemoveOtuLabelsCommand");
        exit(1);
    }
}
//**********************************************************************************************************************

int RemoveOtuLabelsCommand::execute(){
    try {

        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        //get labels you want to remove
        otulabels = m->readAccnos(accnosfile);

        //simplify labels
        set<string> newLabels;
        for (set<string>::iterator it = otulabels.begin(); it != otulabels.end(); it++) { newLabels.insert(m->getSimpleLabel(*it)); }
        otulabels = newLabels;

        if (m->debug) { m->mothurOut("[DEBUG]: numlabels = " + toString(otulabels.size()) + "\n"); }

        if (m->control_pressed) { return 0; }

        //read through the correct file and output lines you want to keep
        if (constaxonomyfile != "") { readClassifyOtu(); }
        if (corraxesfile != "")     { readCorrAxes(); }
        if (otucorrfile != "")      { readOtuAssociation(); }
        if (listfile != "")         { readList(); }
        if (sharedfile != "")       { readShared(); }

        if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; }

        //output files created by command
        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        string current = "";
        itTypes = outputTypes.find("list");
        if (itTypes != outputTypes.end()) {
            if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); }
        }

        itTypes = outputTypes.find("shared");
        if (itTypes != outputTypes.end()) {
            if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveOtuLabelsCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************
int RemoveOtuLabelsCommand::readClassifyOtu(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(constaxonomyfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(constaxonomyfile));
        variables["[extension]"] = m->getExtension(constaxonomyfile);
        string outputFileName = getOutputFileName("constaxonomy", variables);
        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(constaxonomyfile, in);

        bool wroteSomething = false;
        int removedCount = 0;

        //read headers
        string headers = m->getline(in); m->gobble(in);
        out << headers << endl;

        while (!in.eof()) {

            if (m->control_pressed) { break; }

            string otu = ""; string tax = "unknown";
            int size = 0;

            in >> otu >> size >> tax; m->gobble(in);

            if (m->debug) { m->mothurOut("[DEBUG]: " + otu + "\t" + toString(size) + "\t" + tax + "\n"); }

            //keep only the OTUs that are not listed in the accnos file
            if (otulabels.count(m->getSimpleLabel(otu)) == 0) {
                wroteSomething = true;
                out << otu << '\t' << size << '\t' << tax << endl;
            }else { removedCount++; }
        }
        in.close();
        out.close();

        if (wroteSomething == false) { m->mothurOut("Your file only contains labels from the .accnos file."); m->mothurOutEndLine(); }
        outputNames.push_back(outputFileName); outputTypes["constaxonomy"].push_back(outputFileName);

        m->mothurOut("Removed " + toString(removedCount) + " otus from your constaxonomy file."); m->mothurOutEndLine();
return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "readClassifyOtu"); exit(1); } } //********************************************************************************************************************** int RemoveOtuLabelsCommand::readOtuAssociation(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(otucorrfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(otucorrfile)); variables["[extension]"] = m->getExtension(otucorrfile); string outputFileName = getOutputFileName("otucorr", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(otucorrfile, in); bool wroteSomething = false; int removedCount = 0; //read headers string headers = m->getline(in); m->gobble(in); out << headers << endl; while (!in.eof()) { if (m->control_pressed) { break; } string otu1 = ""; string otu2 = ""; in >> otu1 >> otu2; string line = m->getline(in); m->gobble(in); if ((otulabels.count(m->getSimpleLabel(otu1)) == 0) && (otulabels.count(m->getSimpleLabel(otu2)) == 0)){ wroteSomething = true; out << otu1 << '\t' << otu2 << '\t' << line << endl; }else { removedCount++; } } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file only contains labels from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["otucorr"].push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " lines from your otu.corr file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "readOtuAssociation"); exit(1); } } //********************************************************************************************************************** int RemoveOtuLabelsCommand::readCorrAxes(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(corraxesfile); } map variables; variables["[filename]"] = 
thisOutputDir + m->getRootName(m->getSimpleName(corraxesfile)); variables["[extension]"] = m->getExtension(corraxesfile); string outputFileName = getOutputFileName("corraxes", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(corraxesfile, in); bool wroteSomething = false; int removedCount = 0; //read headers string headers = m->getline(in); m->gobble(in); out << headers << endl; while (!in.eof()) { if (m->control_pressed) { break; } string otu = ""; in >> otu; string line = m->getline(in); m->gobble(in); if (otulabels.count(m->getSimpleLabel(otu)) == 0) { wroteSomething = true; out << otu << '\t' << line << endl; }else { removedCount++; } } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file only contains labels from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["corraxes"].push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " lines from your corr.axes file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "readCorrAxes"); exit(1); } } //********************************************************************************************************************** int RemoveOtuLabelsCommand::readShared(){ try { getShared(); if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } vector newLabels; //create new "filtered" lookup vector newLookup; for (int i = 0; i < lookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); newLookup.push_back(temp); } bool wroteSomething = false; int numRemoved = 0; for (int i = 0; i < lookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } return 0; } //is this otu on the list 
if (otulabels.count(m->getSimpleLabel(m->currentSharedBinLabels[i])) == 0) { wroteSomething = true; newLabels.push_back(m->currentSharedBinLabels[i]); for (int j = 0; j < newLookup.size(); j++) { //add this OTU to the new lookup newLookup[j]->push_back(lookup[j]->getAbundance(i), lookup[j]->getGroup()); } }else { numRemoved++; } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[extension]"] = m->getExtension(sharedfile); variables["[distance]"] = lookup[0]->getLabel(); string outputFileName = getOutputFileName("shared", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } m->currentSharedBinLabels = newLabels; newLookup[0]->printHeaders(out); for (int i = 0; i < newLookup.size(); i++) { out << newLookup[i]->getLabel() << '\t' << newLookup[i]->getGroup() << '\t'; newLookup[i]->print(out); } out.close(); for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } if (wroteSomething == false) { m->mothurOut("Your file contains only OTUs from the .accnos file."); m->mothurOutEndLine(); } m->mothurOut("Removed " + toString(numRemoved) + " OTUs from your shared file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "readShared"); exit(1); } } //********************************************************************************************************************** int RemoveOtuLabelsCommand::readList(){ try { getListVector(); if (m->control_pressed) { delete list; return 0;} ListVector newList; newList.setLabel(list->getLabel()); int removedCount = 0; bool wroteSomething = false; vector binLabels = list->getLabels(); vector newLabels; for (int i = 0; i < list->getNumBins(); i++) { if 
(m->control_pressed) { delete list; return 0;} if (otulabels.count(m->getSimpleLabel(binLabels[i])) == 0) { newList.push_back(list->get(i)); newLabels.push_back(binLabels[i]); }else { removedCount++; } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); variables["[distance]"] = list->getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); delete list; //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newLabels); newList.printHeaders(out); newList.print(out); } out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only OTUs from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["list"].push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " OTUs from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int RemoveOtuLabelsCommand::getListVector(){ try { InputData input(listfile, "list"); list = input.getListVector(); string lastLabel = list->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set<string> labels; labels.insert(label); set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); break; } lastLabel = list->getLabel(); //get next line to process //prevent memory leak delete list; list = input.getListVector(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set<string>::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { delete list; list = input.getListVector(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "getListVector"); exit(1); } } //********************************************************************************************************************** int RemoveOtuLabelsCommand::getShared(){ try { InputData input(sharedfile, "sharedfile"); lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); if (label == "") { label = lastLabel; return 0; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set labels; labels.insert(label); set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { return 0; } if(labels.count(lookup[0]->getLabel()) == 1){ processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); break; } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); break; } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the 
label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); } return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtuLabelsCommand", "getShared"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/removeotulabelscommand.h000066400000000000000000000032341255543666200231610ustar00rootroot00000000000000#ifndef Mothur_removeotulabelscommand_h #define Mothur_removeotulabelscommand_h // // removeotulabelscommand.h // Mothur // // Created by Sarah Westcott on 5/21/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "command.hpp" #include "inputdata.h" #include "listvector.hpp" #include "sharedrabundvector.h" /**************************************************************************************************/ class RemoveOtuLabelsCommand : public Command { public: RemoveOtuLabelsCommand(string); RemoveOtuLabelsCommand(); ~RemoveOtuLabelsCommand(){} vector setParameters(); string getCommandName() { return "remove.otulabels"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Get.otulabels"; } string getDescription() { return "Can be used with output from classify.otu, otu.association, or corr.axes to remove specific otus."; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string outputDir, accnosfile, constaxonomyfile, otucorrfile, corraxesfile, listfile, sharedfile, label; vector outputNames; set otulabels; ListVector* list; vector lookup; int readClassifyOtu(); int readOtuAssociation(); int readCorrAxes(); int readList(); int readShared(); int getListVector(); int getShared(); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/removeotuscommand.cpp000066400000000000000000000420211255543666200225110ustar00rootroot00000000000000/* * removeotuscommand.cpp * Mothur * * Created by westcott on 11/12/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 * */ #include "removeotuscommand.h" #include "inputdata.h" #include "sharedutilities.h" //********************************************************************************************************************** vector<string> RemoveOtusCommand::setParameters(){ try { CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none","group",false,true,true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","list",false,true,true); parameters.push_back(plist); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RemoveOtusCommand::getHelpString(){ try { string helpString = ""; helpString += "The remove.otus command removes otus containing sequences from a specific group or set of groups.\n"; helpString += "It outputs a new list file containing only the otus that do NOT contain sequences from the specified groups.\n"; helpString += "The remove.otus command parameters are accnos, group, list, label and groups. 
The group and list parameters are required, unless you have valid current files.\n"; helpString += "You must also provide an accnos file containing the list of groups to remove, or set the groups parameter to the groups you wish to remove.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like to remove. You can separate group names with dashes.\n"; helpString += "The label parameter allows you to specify which distance you want to process.\n"; helpString += "The remove.otus command should be in the following format: remove.otus(accnos=yourAccnos, list=yourListFile, group=yourGroupFile, label=yourLabel).\n"; helpString += "Example remove.otus(accnos=amazon.accnos, list=amazon.fn.list, group=amazon.groups, label=0.03).\n"; helpString += "or remove.otus(groups=pasture, list=amazon.fn.list, group=amazon.groups, label=0.03).\n"; helpString += "Note: No spaces between parameter labels (i.e. list), '=' and parameters (i.e. yourListFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RemoveOtusCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "group") { pattern = "[filename],[tag],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[tag],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RemoveOtusCommand::RemoveOtusCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["list"] = tempOutNames; } 
catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "RemoveOtusCommand"); exit(1); } } //********************************************************************************************************************** RemoveOtusCommand::RemoveOtusCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["list"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = ""; } else { m->setAccnosFile(accnosfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current group file and the group parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setGroupFile(groupfile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current list file and the list parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setListFile(listfile); } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; m->mothurOut("You did not provide a label, I will use the first label in your inputfile."); m->mothurOutEndLine(); label=""; } if ((accnosfile == "") && (Groups.size() == 0)) { m->mothurOut("You must provide an accnos file or 
specify groups using the groups parameter."); m->mothurOutEndLine(); abort = true; } } } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "RemoveOtusCommand"); exit(1); } } //********************************************************************************************************************** int RemoveOtusCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } groupMap = new GroupMap(groupfile); groupMap->readMap(); //get groups you want to remove if (accnosfile != "") { m->readAccnos(accnosfile, Groups); m->setGroups(Groups); } //make sure groups are valid //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil* util = new SharedUtil(); vector allGroups = groupMap->getNamesOfGroups(); util->setGroups(Groups, allGroups); delete util; if (m->control_pressed) { delete groupMap; return 0; } //read through the list file keeping any otus that contain any sequence from the groups selected readListGroup(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "execute"); exit(1); } } //********************************************************************************************************************** int 
RemoveOtusCommand::readListGroup(){ try { InputData* input = new InputData(listfile, "list"); ListVector* list = input->getListVector(); string lastLabel = list->getLabel(); //using first label seen if none is provided if (label == "") { label = lastLabel; } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[tag]"] = label; variables["[extension]"] = m->getExtension(listfile); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); string GroupOutputDir = outputDir; if (outputDir == "") { GroupOutputDir += m->hasPath(groupfile); } variables["[filename]"] = GroupOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputGroupFileName = getOutputFileName("group", variables); ofstream outGroup; m->openOutputFile(outputGroupFileName, outGroup); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set labels; labels.insert(label); set processedLabels; set userLabels = labels; bool wroteSomething = false; //as long as you are not at the end of the file or done wih the lines you want while((list != NULL) && (userLabels.size() != 0)) { if (m->control_pressed) { delete list; delete input; out.close(); outGroup.close(); m->mothurRemove(outputFileName); m->mothurRemove(outputGroupFileName);return 0; } if(labels.count(list->getLabel()) == 1){ processList(list, groupMap, out, outGroup, wroteSomething); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); processList(list, groupMap, out, outGroup, wroteSomething); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = NULL; //get next line to process list = input->getListVector(); } if (m->control_pressed) { if (list != NULL) { delete list; } delete input; out.close(); outGroup.close(); m->mothurRemove(outputFileName); m->mothurRemove(outputGroupFileName); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input->getListVector(lastLabel); processList(list, groupMap, out, outGroup, wroteSomething); delete list; list = NULL; } out.close(); outGroup.close(); if (wroteSomething == false) { m->mothurOut("At distance " + label + " your file ONLY contains otus containing sequences from the groups you wish to remove."); m->mothurOutEndLine(); } outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); outputTypes["group"].push_back(outputGroupFileName); outputNames.push_back(outputGroupFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "readListGroup"); exit(1); } } //********************************************************************************************************************** int RemoveOtusCommand::processList(ListVector*& list, GroupMap*& groupMap, ofstream& out, ofstream& outGroup, bool& wroteSomething){ try { //make a new list vector ListVector newList; newList.setLabel(list->getLabel()); int numOtus = 0; //for each bin vector<string> binLabels = list->getLabels(); vector<string> newBinLabels; for (int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { return 0; } //check each sequence name in the bin against the groups to remove string binnames = list->get(i); bool removeBin = false; string groupFileOutput = ""; //parse the comma-separated names string individual = ""; int length = binnames.length(); for(int j=0;j<length;j++){ if(binnames[j] == ','){ string group = groupMap->getGroup(individual); if (group == "not found") { m->mothurOut("[ERROR]: " + individual + " is not in your groupfile. 
please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } if (m->inUsersGroups(group, Groups)) { removeBin = true; break; } groupFileOutput += individual + "\t" + group + "\n"; individual = ""; } else{ individual += binnames[j]; } } if (!removeBin) { //get last name string group = groupMap->getGroup(individual); if (group == "not found") { m->mothurOut("[ERROR]: " + individual + " is not in your groupfile. please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } if (m->inUsersGroups(group, Groups)) { removeBin = true; } groupFileOutput += individual + "\t" + group + "\n"; if (!removeBin) { //if there are no sequences from the groups we want to remove in this bin add to new list, output to groupfile newList.push_back(binnames); newBinLabels.push_back(binLabels[i]); outGroup << groupFileOutput; }else { numOtus++; } }else { numOtus++; } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->mothurOut(newList.getLabel() + " - removed " + toString(numOtus) + " of the " + toString(list->getNumBins()) + " OTUs."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveOtusCommand", "processList"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/removeotuscommand.h000066400000000000000000000022101255543666200221520ustar00rootroot00000000000000#ifndef REMOVEOTUSCOMMAND_H #define REMOVEOTUSCOMMAND_H /* * removeotuscommand.h * Mothur * * Created by westcott on 11/12/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 * */ #include "command.hpp" #include "groupmap.h" #include "listvector.hpp" class RemoveOtusCommand : public Command { public: RemoveOtusCommand(string); RemoveOtusCommand(); ~RemoveOtusCommand(){} vector<string> setParameters(); string getCommandName() { return "remove.otus"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Remove.otus"; } string getDescription() { return "outputs a new list file containing the otus NOT containing sequences from the groups specified"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string accnosfile, groupfile, listfile, outputDir, groups, label; bool abort; vector<string> outputNames, Groups; GroupMap* groupMap; int readListGroup(); int processList(ListVector*&, GroupMap*&, ofstream&, ofstream&, bool&); }; #endif mothur-1.36.1/source/commands/removerarecommand.cpp000066400000000000000000001071241255543666200224560ustar00rootroot00000000000000/* * removerarecommand.cpp * mothur * * Created by westcott on 1/21/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "removerarecommand.h" #include "sequence.hpp" #include "groupmap.h" #include "sharedutilities.h" #include "inputdata.h" //********************************************************************************************************************** vector RemoveRareCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "none", "atleast", "none","list",false,false,true); parameters.push_back(plist); CommandParameter prabund("rabund", "InputTypes", "", "", "none", "atleast", "none","rabund",false,false,true); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "none", "atleast", "none","sabund",false,false,true); parameters.push_back(psabund); CommandParameter pshared("shared", "InputTypes", "", "", "none", "atleast", "none","shared",false,false,true); parameters.push_back(pshared); CommandParameter pcount("count", "InputTypes", "", "", "CountGroup", "none", "none","count",false,false); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","group",false,false); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pnseqs("nseqs", "Number", "", "0", "", "", "","",false,true,true); parameters.push_back(pnseqs); CommandParameter pbygroup("bygroup", "Boolean", "", "f", "", "", "","",false,false); parameters.push_back(pbygroup); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { 
myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RemoveRareCommand::getHelpString(){ try { string helpString = ""; helpString += "The remove.rare command parameters are list, rabund, sabund, shared, group, count, label, groups, bygroup and nseqs.\n"; helpString += "The remove.rare command reads one of the following file types: list, rabund, sabund or shared file. It outputs a new file after removing the rare otus.\n"; helpString += "The groups parameter allows you to specify which of the groups you would like analyzed. Default=all. You may separate group names with dashes.\n"; helpString += "The label parameter is used to analyze specific labels in your input. Default=all. You may separate label names with dashes.\n"; helpString += "The bygroup parameter is only valid with the shared file. Default=f, meaning remove any OTU that has nseqs or fewer sequences across all groups.\n"; helpString += "bygroup=T means remove any OTU that has nseqs or fewer sequences in each group (if groupA has 1 sequence and groupB has 100 sequences in OTUZ and nseqs=1, then set the groupA count for OTUZ to 0 and keep groupB's count at 100).\n"; helpString += "The nseqs parameter allows you to set the cutoff for an otu to be deemed rare. It is required.\n"; helpString += "The remove.rare command should be in the following format: remove.rare(shared=yourSharedFile, nseqs=yourRareCutoff).\n"; helpString += "Example remove.rare(shared=amazon.fn.shared, nseqs=2).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
shared), '=' and parameters (i.e.yourSharedFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string RemoveRareCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "rabund") { pattern = "[filename],pick,[extension]"; } else if (type == "sabund") { pattern = "[filename],pick,[extension]"; } else if (type == "group") { pattern = "[filename],pick,[extension]"; } else if (type == "count") { pattern = "[filename],pick,[extension]"; } else if (type == "list") { pattern = "[filename],[tag],pick,[extension]"; } else if (type == "shared") { pattern = "[filename],[tag],pick,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** RemoveRareCommand::RemoveRareCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["shared"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "RemoveRareCommand"); exit(1); } } //********************************************************************************************************************** RemoveRareCommand::RemoveRareCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { 
vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for file parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { m->setRabundFile(rabundfile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } countfile = 
validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } if ((sharedfile == "") && (listfile == "") && (rabundfile == "") && (sabundfile == "")) { //is there are current file available for any of these? //give priority to shared, then list, then rabund, then sabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { rabundfile = m->getRabundFile(); if (rabundfile != "") { m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); } else { sabundfile = m->getSabundFile(); if (sabundfile != "") { m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. 
You must provide a list, sabund, rabund or shared file."); m->mothurOutEndLine(); abort = true; } } } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = "all"; } m->splitAtDash(groups, Groups); label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } string temp = validParameter.validFile(parameters, "nseqs", false); if (temp == "not found") { m->mothurOut("nseqs is a required parameter."); m->mothurOutEndLine(); abort = true; } else { m->mothurConvert(temp, nseqs); } temp = validParameter.validFile(parameters, "bygroup", false); if (temp == "not found") { temp = "f"; } byGroup = m->isTrue(temp); if (byGroup && (sharedfile == "")) { m->mothurOut("The byGroup parameter is only valid with a shared file."); m->mothurOutEndLine(); } if (((groupfile != "") || (countfile != "")) && (listfile == "")) { m->mothurOut("A group or count file is only valid with a list file."); m->mothurOutEndLine(); groupfile = ""; countfile = ""; } } } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "RemoveRareCommand"); exit(1); } } //********************************************************************************************************************** int RemoveRareCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (m->control_pressed) { return 0; } //read through the correct file and output lines you want to keep if (sabundfile != "") { processSabund(); } if (rabundfile != "") { processRabund(); } if (listfile != "") { processList(); } if (sharedfile != "") { processShared(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < 
outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set rabund file as new current rabundfile string current = ""; itTypes = outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "execute"); exit(1); } } //********************************************************************************************************************** int RemoveRareCommand::processList(){ try { //you must provide a label because the names in the listfile need to be consistent string thisLabel = ""; if (allLines) { m->mothurOut("For the listfile you must select one label, using first label in your listfile."); m->mothurOutEndLine(); } else if (labels.size() > 1) { m->mothurOut("For the listfile you must select one label, using " + (*labels.begin()) + "."); m->mothurOutEndLine(); thisLabel = *labels.begin(); } else { thisLabel = *labels.begin(); } InputData input(listfile, "list"); ListVector* list = 
input.getListVector(); //get first one or the one we want if (thisLabel != "") { //use smart distancing set userLabels; userLabels.insert(thisLabel); set processedLabels; string lastLabel = list->getLabel(); while((list != NULL) && (userLabels.size() != 0)) { if(userLabels.count(list->getLabel()) == 1){ processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); break; } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); delete list; list = input.getListVector(lastLabel); break; } lastLabel = list->getLabel(); delete list; list = input.getListVector(); } if (userLabels.size() != 0) { m->mothurOut("Your file does not include the label " + thisLabel + ". I will use " + lastLabel + "."); m->mothurOutEndLine(); list = input.getListVector(lastLabel); } } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); variables["[tag]"] = list->getLabel(); string outputFileName = getOutputFileName("list", variables); variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputGroupFileName = getOutputFileName("group", variables); variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string outputCountFileName = getOutputFileName("count", variables); ofstream out, outGroup; m->openOutputFile(outputFileName, out); bool wroteSomething = false; //if groupfile is given then use it GroupMap* groupMap; CountTable ct; if (groupfile != "") { groupMap = new GroupMap(groupfile); groupMap->readMap(); SharedUtil util; vector namesGroups = 
groupMap->getNamesOfGroups(); util.setGroups(Groups, namesGroups); m->openOutputFile(outputGroupFileName, outGroup); }else if (countfile != "") { ct.readTable(countfile, true, false); if (ct.hasGroupInfo()) { vector namesGroups = ct.getNamesOfGroups(); SharedUtil util; util.setGroups(Groups, namesGroups); } } if (list != NULL) { vector binLabels = list->getLabels(); vector newLabels; //make a new list vector ListVector newList; newList.setLabel(list->getLabel()); //for each bin for (int i = 0; i < list->getNumBins(); i++) { if (m->control_pressed) { if (groupfile != "") { delete groupMap; outGroup.close(); m->mothurRemove(outputGroupFileName); } out.close(); m->mothurRemove(outputFileName); return 0; } //parse out names that are in accnos file string binnames = list->get(i); vector names; string saveBinNames = binnames; m->splitAtComma(binnames, names); int binsize = names.size(); vector newGroupFile; if (groupfile != "") { vector newNames; saveBinNames = ""; for(int k = 0; k < names.size(); k++) { string group = groupMap->getGroup(names[k]); if (m->inUsersGroups(group, Groups)) { newGroupFile.push_back(names[k] + "\t" + group); newNames.push_back(names[k]); saveBinNames += names[k] + ","; } } names = newNames; binsize = names.size(); saveBinNames = saveBinNames.substr(0, saveBinNames.length()-1); }else if (countfile != "") { saveBinNames = ""; binsize = 0; for(int k = 0; k < names.size(); k++) { if (ct.hasGroupInfo()) { vector thisSeqsGroups = ct.getGroups(names[k]); int thisSeqsCount = 0; for (int n = 0; n < thisSeqsGroups.size(); n++) { if (m->inUsersGroups(thisSeqsGroups[n], Groups)) { thisSeqsCount += ct.getGroupCount(names[k], thisSeqsGroups[n]); } } binsize += thisSeqsCount; //if you don't have any seqs from the groups the user wants, then remove you. 
if (thisSeqsCount == 0) { newGroupFile.push_back(names[k]); } else { saveBinNames += names[k] + ","; } }else { binsize += ct.getNumSeqs(names[k]); saveBinNames += names[k] + ","; } } saveBinNames = saveBinNames.substr(0, saveBinNames.length()-1); } if (binsize > nseqs) { //keep bin newList.push_back(saveBinNames); newLabels.push_back(binLabels[i]); if (groupfile != "") { for(int k = 0; k < newGroupFile.size(); k++) { outGroup << newGroupFile[k] << endl; } } else if (countfile != "") { for(int k = 0; k < newGroupFile.size(); k++) { ct.remove(newGroupFile[k]); } } }else { if (countfile != "") { for(int k = 0; k < names.size(); k++) { ct.remove(names[k]); } } } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newLabels); newList.printHeaders(out); newList.print(out); } } out.close(); if (groupfile != "") { outGroup.close(); outputTypes["group"].push_back(outputGroupFileName); outputNames.push_back(outputGroupFileName); } if (countfile != "") { if (ct.hasGroupInfo()) { vector allGroups = ct.getNamesOfGroups(); for (int i = 0; i < allGroups.size(); i++) { if (!m->inUsersGroups(allGroups[i], Groups)) { ct.removeGroup(allGroups[i]); } } } ct.printTable(outputCountFileName); outputTypes["count"].push_back(outputCountFileName); outputNames.push_back(outputCountFileName); } if (wroteSomething == false) { m->mothurOut("Your file contains only rare sequences."); m->mothurOutEndLine(); } outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); return 0; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "processList"); exit(1); } } //********************************************************************************************************************** int RemoveRareCommand::processSabund(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sabundfile); } map variables; variables["[filename]"] = thisOutputDir + 
m->getRootName(m->getSimpleName(sabundfile)); variables["[extension]"] = m->getExtension(sabundfile); string outputFileName = getOutputFileName("sabund", variables); outputTypes["sabund"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. InputData input(sabundfile, "sabund"); SAbundVector* sabund = input.getSAbundVector(); string lastLabel = sabund->getLabel(); set processedLabels; set userLabels = labels; while((sabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete sabund; out.close(); return 0; } if(allLines == 1 || labels.count(sabund->getLabel()) == 1){ m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); if (sabund->getMaxRank() > nseqs) { for(int i = 1; i <=nseqs; i++) { sabund->set(i, 0); } }else { sabund->clear(); } if (sabund->getNumBins() > 0) { sabund->print(out); } } if ((m->anyLabelsToProcess(sabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = sabund->getLabel(); delete sabund; sabund = input.getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); if (sabund->getMaxRank() > nseqs) { for(int i = 1; i <=nseqs; i++) { sabund->set(i, 0); } }else { sabund->clear(); } if (sabund->getNumBins() > 0) { sabund->print(out); } //restore real lastlabel to save below sabund->setLabel(saveLabel); } lastLabel = sabund->getLabel(); delete sabund; sabund = input.getSAbundVector(); } if (m->control_pressed) { out.close(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your 
file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (sabund != NULL) { delete sabund; } sabund = input.getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); if (sabund->getMaxRank() > nseqs) { for(int i = 1; i <=nseqs; i++) { sabund->set(i, 0); } }else { sabund->clear(); } if (sabund->getNumBins() > 0) { sabund->print(out); } delete sabund; } return 0; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "processSabund"); exit(1); } } //********************************************************************************************************************** int RemoveRareCommand::processRabund(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(rabundfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(rabundfile)); variables["[extension]"] = m->getExtension(rabundfile); string outputFileName = getOutputFileName("rabund", variables); outputTypes["rabund"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
InputData input(rabundfile, "rabund"); RAbundVector* rabund = input.getRAbundVector(); string lastLabel = rabund->getLabel(); set processedLabels; set userLabels = labels; while((rabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete rabund; out.close(); return 0; } if(allLines == 1 || labels.count(rabund->getLabel()) == 1){ m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); RAbundVector newRabund; newRabund.setLabel(rabund->getLabel()); for (int i = 0; i < rabund->getNumBins(); i++) { if (rabund->get(i) > nseqs) { newRabund.push_back(rabund->get(i)); } } if (newRabund.getNumBins() > 0) { newRabund.print(out); } } if ((m->anyLabelsToProcess(rabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = rabund->getLabel(); delete rabund; rabund = input.getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); RAbundVector newRabund; newRabund.setLabel(rabund->getLabel()); for (int i = 0; i < rabund->getNumBins(); i++) { if (rabund->get(i) > nseqs) { newRabund.push_back(rabund->get(i)); } } if (newRabund.getNumBins() > 0) { newRabund.print(out); } //restore real lastlabel to save below rabund->setLabel(saveLabel); } lastLabel = rabund->getLabel(); delete rabund; rabund = input.getRAbundVector(); } if (m->control_pressed) { out.close(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (rabund != NULL) { delete rabund; } rabund = input.getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); RAbundVector newRabund; newRabund.setLabel(rabund->getLabel()); for (int i = 0; i < rabund->getNumBins(); i++) { if (rabund->get(i) > nseqs) { newRabund.push_back(rabund->get(i)); } } if (newRabund.getNumBins() > 0) { newRabund.print(out); } delete rabund; } return 0; } catch(exception& e) { m->errorOut(e, "RemoveRareCommand", "processRabund"); exit(1); } } //********************************************************************************************************************** int RemoveRareCommand::processShared(){ try { m->setGroups(Groups); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); set processedLabels; set userLabels = labels; while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); processLookup(lookup); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); processLookup(lookup); //restore real lastlabel to save below 
				lookup[0]->setLabel(saveLabel);
			}
			
			lastLabel = lookup[0]->getLabel();
			for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
			lookup = input.getSharedRAbundVectors();
		}
		
		if (m->control_pressed) { return 0; }
		
		//output error messages about any remaining user labels
		set<string>::iterator it;
		bool needToRun = false;
		for (it = userLabels.begin(); it != userLabels.end(); it++) {
			m->mothurOut("Your file does not include the label " + *it);
			if (processedLabels.count(lastLabel) != 1) {
				m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
				needToRun = true;
			}else {
				m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
			}
		}
		
		//run last label if you need to
		if (needToRun == true) {
			for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } }
			lookup = input.getSharedRAbundVectors(lastLabel);
			
			m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
			processLookup(lookup);
			
			for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
		}
		
		return 0;
	}
	catch(exception& e) {
		//report the function that actually failed (was mislabeled "processSabund")
		m->errorOut(e, "RemoveRareCommand", "processShared");
		exit(1);
	}
}
//**********************************************************************************************************************
int RemoveRareCommand::processLookup(vector<SharedRAbundVector*>& lookup){
	try {
		string thisOutputDir = outputDir;
		if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); }
		map<string, string> variables;
		variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile));
		variables["[extension]"] = m->getExtension(sharedfile);
		variables["[tag]"] = lookup[0]->getLabel();
		string outputFileName = getOutputFileName("shared", variables);
		outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName);
		
		ofstream out;
		m->openOutputFile(outputFileName, out);
		
		vector<SharedRAbundVector> newRabunds; newRabunds.resize(lookup.size());
		vector<string> headers;
		for (int i = 0; i < lookup.size(); i++) {
			newRabunds[i].setGroup(lookup[i]->getGroup());
			newRabunds[i].setLabel(lookup[i]->getLabel());
		}
		if (byGroup) {
			//for each otu
			for (int i = 0; i < lookup[0]->getNumBins(); i++) {
				bool allZero = true;
				
				if (m->control_pressed) { out.close(); return 0; }
				
				//for each group
				for (int j = 0; j < lookup.size(); j++) {
					//are you rare?
					if (lookup[j]->getAbundance(i) > nseqs) {
						newRabunds[j].push_back(lookup[j]->getAbundance(i), newRabunds[j].getGroup());
						allZero = false;
					}else {
						newRabunds[j].push_back(0, newRabunds[j].getGroup());
					}
				}
				
				//eliminates zero otus
				if (allZero) { for (int j = 0; j < newRabunds.size(); j++) { newRabunds[j].pop_back(); } }
				else { headers.push_back(m->currentSharedBinLabels[i]); }
			}
		}else {
			//for each otu
			for (int i = 0; i < lookup[0]->getNumBins(); i++) {
				
				if (m->control_pressed) { out.close(); return 0; }
				
				int totalAbund = 0;
				//get total otu abundance
				for (int j = 0; j < lookup.size(); j++) {
					newRabunds[j].push_back(lookup[j]->getAbundance(i), newRabunds[j].getGroup());
					totalAbund += lookup[j]->getAbundance(i);
				}
				
				//eliminates otus below rare cutoff
				if (totalAbund <= nseqs) { for (int j = 0; j < newRabunds.size(); j++) { newRabunds[j].pop_back(); } }
				else { headers.push_back(m->currentSharedBinLabels[i]); }
			}
		}
		
		//do we have any otus above the rare cutoff
		if (newRabunds[0].getNumBins() != 0) {
			out << "label\tGroup\tnumOtus";
			for (int j = 0; j < headers.size(); j++) { out << '\t' << headers[j]; }
			out << endl;
			
			for (int j = 0; j < newRabunds.size(); j++) {
				out << newRabunds[j].getLabel() << '\t' << newRabunds[j].getGroup() << '\t';
				newRabunds[j].print(out);
			}
		}
		
		out.close();
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "RemoveRareCommand", "processLookup");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/removerarecommand.h000066400000000000000000000023151255543666200221170ustar00rootroot00000000000000
#ifndef REMOVERARECOMMAND_H
#define REMOVERARECOMMAND_H

/*
 * removerarecommand.h
 * mothur
 *
 * Created by westcott on 1/21/11.
 * Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "listvector.hpp"

class RemoveRareCommand : public Command {
	
public:
	
	RemoveRareCommand(string);
	RemoveRareCommand();
	~RemoveRareCommand(){}
	
	vector<string> setParameters();
	string getCommandName() { return "remove.rare"; }
	string getCommandCategory() { return "OTU-Based Approaches"; }
	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "http://www.mothur.org/wiki/Remove.rare"; }
	string getDescription() { return "removes rare sequences from a sabund, rabund, shared or list and group file"; }
	
	int execute();
	void help() { m->mothurOut(getHelpString()); }
	
private:
	
	string sabundfile, rabundfile, sharedfile, groupfile, countfile, listfile, outputDir, groups, label;
	int nseqs, allLines;
	bool abort, byGroup;
	vector<string> outputNames, Groups;
	set<string> labels;
	
	int processSabund();
	int processRabund();
	int processList();
	int processShared();
	int processLookup(vector<SharedRAbundVector*>&);
};

#endif
mothur-1.36.1/source/commands/removeseqscommand.cpp000066400000000000000001125571255543666200225060ustar00rootroot00000000000000
/*
 * removeseqscommand.cpp
 * Mothur
 *
 * Created by Sarah Westcott on 7/8/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "removeseqscommand.h" #include "sequence.hpp" #include "listvector.hpp" #include "counttable.h" //********************************************************************************************************************** vector RemoveSeqsCommand::setParameters(){ try { CommandParameter pfastq("fastq", "InputTypes", "", "", "none", "FNGLT", "none","fastq",false,false,true); parameters.push_back(pfastq); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "FNGLT", "none","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "FNGLT", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "FNGLT", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "FNGLT", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false,true); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "FNGLT", "none","taxonomy",false,false,true); parameters.push_back(ptaxonomy); CommandParameter palignreport("alignreport", "InputTypes", "", "", "none", "FNGLT", "none","alignreport",false,false); parameters.push_back(palignreport); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "FNGLT", "none","qfile",false,false); parameters.push_back(pqfile); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(paccnos); CommandParameter pdups("dups", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pdups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); 
parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RemoveSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The remove.seqs command reads an .accnos file and at least one of the following file types: fasta, name, group, count, list, taxonomy, quality, fastq or alignreport file.\n"; helpString += "It outputs a file containing the sequences NOT in the .accnos file.\n"; helpString += "The remove.seqs command parameters are accnos, fasta, name, group, count, list, taxonomy, qfile, alignreport, fastq and dups. You must provide accnos and at least one of the file parameters.\n"; helpString += "The dups parameter allows you to remove the entire line from a name file if you remove any name from the line. default=true. \n"; helpString += "The remove.seqs command should be in the following format: remove.seqs(accnos=yourAccnos, fasta=yourFasta).\n"; helpString += "Example remove.seqs(accnos=amazon.accnos, fasta=amazon.fasta).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e. yourFasta).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "RemoveSeqsCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string RemoveSeqsCommand::getOutputPattern(string type) {
	try {
		string pattern = "";
		
		if (type == "fasta") { pattern = "[filename],pick,[extension]"; }
		else if (type == "fastq") { pattern = "[filename],pick,[extension]"; }
		else if (type == "taxonomy") { pattern = "[filename],pick,[extension]"; }
		else if (type == "name") { pattern = "[filename],pick,[extension]"; }
		else if (type == "group") { pattern = "[filename],pick,[extension]"; }
		else if (type == "count") { pattern = "[filename],pick,[extension]"; }
		else if (type == "list") { pattern = "[filename],[distance],pick,[extension]"; }
		else if (type == "qfile") { pattern = "[filename],pick,[extension]"; }
		else if (type == "alignreport") { pattern = "[filename],pick.align.report"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }
		
		return pattern;
	}
	catch(exception& e) {
		//report this class, not GetSeqsCommand, as the source of the error
		m->errorOut(e, "RemoveSeqsCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
RemoveSeqsCommand::RemoveSeqsCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["fasta"] = tempOutNames;
		outputTypes["fastq"] = tempOutNames;
		outputTypes["taxonomy"] = tempOutNames;
		outputTypes["name"] = tempOutNames;
		outputTypes["group"] = tempOutNames;
		outputTypes["alignreport"] = tempOutNames;
		outputTypes["list"] = tempOutNames;
		outputTypes["qfile"] = tempOutNames;
		outputTypes["count"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "RemoveSeqsCommand", "RemoveSeqsCommand");
		exit(1);
	}
}
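//**********************************************************************************************************************
// Illustrative sketch only, not part of mothur. The help text for remove.seqs says the command
// "outputs a file containing the sequences NOT in the .accnos file". The helper below shows that
// rule on plain lists of sequence names. The namespace removeseqs_sketch, the function name
// keepNamesNotInAccnos and its signature are assumptions made for this example; the real command
// reads each file type (fasta, name, group, ...) through its own reader.
#include <cstddef>
#include <set>
#include <string>
#include <vector>

namespace removeseqs_sketch {
	// Returns the names from allNames that are absent from accnosNames, preserving input order.
	inline std::vector<std::string> keepNamesNotInAccnos(const std::vector<std::string>& allNames,
	                                                     const std::set<std::string>& accnosNames) {
		std::vector<std::string> kept;
		for (std::size_t i = 0; i < allNames.size(); i++) {
			// a name listed in the accnos set is dropped; everything else is kept
			if (accnosNames.count(allNames[i]) == 0) { kept.push_back(allNames[i]); }
		}
		return kept;
	}
}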
//********************************************************************************************************************** RemoveSeqsCommand::RemoveSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["fastq"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("alignreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["alignreport"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("fastq"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fastq"] = inputDir + it->second; } } } //check for required parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { abort = true; } else if (accnosfile == "not found") { accnosfile = m->getAccnosFile(); if (accnosfile != "") { m->mothurOut("Using " + accnosfile + " as input file for the accnos parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no valid accnos file and accnos is required."); m->mothurOutEndLine(); abort = true; } }else { m->setAccnosFile(accnosfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } alignfile = validParameter.validFile(parameters, "alignreport", true); if (alignfile == "not open") { abort = true; } else if (alignfile == "not found") { alignfile = ""; } listfile = validParameter.validFile(parameters, "list", true); if (listfile == 
"not open") { abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { abort = true; } else if (qualfile == "not found") { qualfile = ""; } else { m->setQualFile(qualfile); } fastqfile = validParameter.validFile(parameters, "fastq", true); if (fastqfile == "not open") { abort = true; } else if (fastqfile == "not found") { fastqfile = ""; } string usedDups = "true"; string temp = validParameter.validFile(parameters, "dups", false); if (temp == "not found") { if (namefile != "") { temp = "true"; } else { temp = "false"; usedDups = ""; } } dups = m->isTrue(temp); countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } if ((fastqfile == "") && (countfile == "") && (fastafile == "") && (namefile == "") && (groupfile == "") && (alignfile == "") && (listfile == "") && (taxfile == "") && (qualfile == "")) { m->mothurOut("You must provide at least one of the following: fasta, name, group, taxonomy, quality, alignreport, fastq or list."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((fastafile != "") && (namefile == "")) { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { 
m->errorOut(e, "RemoveSeqsCommand", "RemoveSeqsCommand"); exit(1); } } //********************************************************************************************************************** int RemoveSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get names you want to keep names = m->readAccnos(accnosfile); if (m->control_pressed) { return 0; } if (countfile != "") { if ((fastafile != "") || (listfile != "") || (taxfile != "")) { m->mothurOut("\n[NOTE]: The count file should contain only unique names, so mothur assumes your fasta, list and taxonomy files also contain only uniques.\n\n"); } } //read through the correct file and output lines you want to keep if (namefile != "") { readName(); } if (fastafile != "") { readFasta(); } if (fastqfile != "") { readFastq(); } if (groupfile != "") { readGroup(); } if (alignfile != "") { readAlign(); } if (listfile != "") { readList(); } if (taxfile != "") { readTax(); } if (qualfile != "") { readQual(); } if (countfile != "") { readCount(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = 
outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int RemoveSeqsCommand::readFasta(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } Sequence currSeq(in); if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(currSeq.getName()); if (it != uniqueMap.end()) { currSeq.setName(it->second); } } name = currSeq.getName(); if (name != "") { //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; currSeq.printSequence(out); }else { removedCount++; } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your 
file contains only sequences from the .accnos file."); m->mothurOutEndLine(); }
        outputTypes["fasta"].push_back(outputFileName); outputNames.push_back(outputFileName);

        m->mothurOut("Removed " + toString(removedCount) + " sequences from your fasta file."); m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveSeqsCommand", "readFasta");
        exit(1);
    }
}
//**********************************************************************************************************************
int RemoveSeqsCommand::readFastq(){
    try {
        bool wroteSomething = false;
        int removedCount = 0;

        ifstream in;
        m->openInputFile(fastqfile, in);

        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(fastqfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastqfile));
        variables["[extension]"] = m->getExtension(fastqfile);
        string outputFileName = getOutputFileName("fastq", variables);
        ofstream out;
        m->openOutputFile(outputFileName, out);

        while(!in.eof()){

            if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

            //read sequence name
            string input = m->getline(in); m->gobble(in);

            string outputString = input + "\n";

            if (input[0] == '@') {
                //get rest of lines
                outputString += m->getline(in) + "\n"; m->gobble(in);
                outputString += m->getline(in) + "\n"; m->gobble(in);
                outputString += m->getline(in) + "\n"; m->gobble(in);

                vector<string> splits = m->splitWhiteSpace(input);
                string name = splits[0];
                name = name.substr(1);
                m->checkName(name);

                //keep the record only if its name is not in the accnos file
                if (names.count(name) == 0) {
                    wroteSomething = true;
                    out << outputString;
                }else { removedCount++; }
            }

            m->gobble(in);
        }
        in.close();
        out.close();

        if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); }
        //register the output under "fastq" so it is not later mistaken for the current fasta file
        outputTypes["fastq"].push_back(outputFileName); outputNames.push_back(outputFileName);

        m->mothurOut("Removed " + toString(removedCount) + " sequences from your fastq file.");
m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readFastq"); exit(1); } } //********************************************************************************************************************** int RemoveSeqsCommand::readQual(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(qualfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(qualfile)); variables["[extension]"] = m->getExtension(qualfile); string outputFileName = getOutputFileName("qfile", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(qualfile, in); string name; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ string saveName = ""; string name = ""; string scores = ""; in >> name; if (name.length() != 0) { saveName = name.substr(1); while (!in.eof()) { char c = in.get(); if (c == 10 || c == 13 || c == -1){ break; } else { name += c; } } m->gobble(in); } while(in){ char letter= in.get(); if(letter == '>'){ in.putback(letter); break; } else{ scores += letter; } } m->gobble(in); if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(saveName); if (it != uniqueMap.end()) { name = ">" + it->second; saveName = it->second; } } if (names.count(saveName) == 0) { wroteSomething = true; out << name << endl << scores; }else { removedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputNames.push_back(outputFileName); outputTypes["qfile"].push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your quality file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readQual"); exit(1); } } 
//********************************************************************************************************************** int RemoveSeqsCommand::readCount(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string outputFileName = getOutputFileName("count", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(countfile, in); bool wroteSomething = false; int removedCount = 0; string headers = m->getline(in); m->gobble(in); out << headers << endl; string test = headers; vector pieces = m->splitWhiteSpace(test); string name, rest; int thisTotal; rest = ""; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; m->gobble(in); in >> thisTotal; m->gobble(in); if (pieces.size() > 2) { rest = m->getline(in); m->gobble(in); } if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + rest + "\n"); } if (names.count(name) == 0) { out << name << '\t' << thisTotal << '\t' << rest << endl; wroteSomething = true; }else { removedCount += thisTotal; } } in.close(); out.close(); //check for groups that have been eliminated CountTable ct; if (ct.testGroups(outputFileName)) { ct.readTable(outputFileName, true, false); ct.printTable(outputFileName); } if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["count"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your count file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readCount"); exit(1); } } 
//********************************************************************************************************************** int RemoveSeqsCommand::readList(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(listfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile)); variables["[extension]"] = m->getExtension(listfile); ifstream in; m->openInputFile(listfile, in); bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ removedCount = 0; //read in list vector ListVector list(in); //make a new list vector ListVector newList; newList.setLabel(list.getLabel()); variables["[distance]"] = list.getLabel(); string outputFileName = getOutputFileName("list", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName); vector binLabels = list.getLabels(); vector newBinLabels; if (m->control_pressed) { in.close(); out.close(); return 0; } //for each bin for (int i = 0; i < list.getNumBins(); i++) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } //parse out names that are in accnos file string bin = list.get(i); vector bnames; m->splitAtComma(bin, bnames); string newNames = ""; for (int j = 0; j < bnames.size(); j++) { string name = bnames[j]; //if that name is in the .accnos file, add it if (names.count(name) == 0) { newNames += name + ","; } else { removedCount++; } } //if there are names in this bin add to new list if (newNames != "") { newNames = newNames.substr(0, newNames.length()-1); //rip off extra comma newList.push_back(newNames); newBinLabels.push_back(binLabels[i]); } } //print new listvector if (newList.getNumBins() != 0) { wroteSomething = true; newList.setLabels(newBinLabels); newList.printHeaders(out); newList.print(out); } m->gobble(in); out.close(); } in.close(); if (wroteSomething == false) { m->mothurOut("Your file 
contains only sequences from the .accnos file."); m->mothurOutEndLine(); } m->mothurOut("Removed " + toString(removedCount) + " sequences from your list file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readList"); exit(1); } } //********************************************************************************************************************** int RemoveSeqsCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; vector parsedNames; m->splitAtComma(secondCol, parsedNames); vector validSecond; validSecond.clear(); for (int i = 0; i < parsedNames.size(); i++) { if (names.count(parsedNames[i]) == 0) { validSecond.push_back(parsedNames[i]); } } if ((dups) && (validSecond.size() != parsedNames.size())) { //if dups is true and we want to get rid of anyone, get rid of everyone for (int i = 0; i < parsedNames.size(); i++) { names.insert(parsedNames[i]); } removedCount += parsedNames.size(); }else { removedCount += parsedNames.size()-validSecond.size(); //if the name in the first column is in the set then print it and any other names in second column also in set if (names.count(firstCol) == 0) { wroteSomething = true; out << firstCol << '\t'; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out 
<< validSecond[validSecond.size()-1] << endl; //make first name in set you come to first column and then add the remaining names to second column }else { //you want part of this row if (validSecond.size() != 0) { wroteSomething = true; out << validSecond[0] << '\t'; //we are changing the unique name in the fasta file uniqueMap[firstCol] = validSecond[0]; //you know you have at least one valid second since first column is valid for (int i = 0; i < validSecond.size()-1; i++) { out << validSecond[i] << ','; } out << validSecond[validSecond.size()-1] << endl; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["name"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your name file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readName"); exit(1); } } //********************************************************************************************************************** int RemoveSeqsCommand::readGroup(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string outputFileName = getOutputFileName("group", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(groupfile, in); string name, group; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; m->gobble(in); //read from first column in >> group; //read from second column //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' 
<< group << endl; }else { removedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["group"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your group file."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readGroup"); exit(1); } } //********************************************************************************************************************** int RemoveSeqsCommand::readTax(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile)); variables["[extension]"] = m->getExtension(taxfile); string outputFileName = getOutputFileName("taxonomy", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(taxfile, in); string name, tax; bool wroteSomething = false; int removedCount = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; m->gobble(in); //read from first column in >> tax; //read from second column if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(name); if (it != uniqueMap.end()) { name = it->second; } } //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t' << tax << endl; }else { removedCount++; } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["taxonomy"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " sequences from your taxonomy file."); 
m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "RemoveSeqsCommand", "readTax"); exit(1); } } //********************************************************************************************************************** //alignreport file has a column header line then all other lines contain 16 columns. we just want the first column since that contains the name int RemoveSeqsCommand::readAlign(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(alignfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(alignfile)); string outputFileName = getOutputFileName("alignreport", variables); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(alignfile, in); string name, junk; bool wroteSomething = false; int removedCount = 0; //read column headers for (int i = 0; i < 16; i++) { if (!in.eof()) { in >> junk; out << junk << '\t'; } else { break; } } out << endl; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; //read from first column if (!dups) {//adjust name if needed map::iterator it = uniqueMap.find(name); if (it != uniqueMap.end()) { name = it->second; } } //if this name is in the accnos file if (names.count(name) == 0) { wroteSomething = true; out << name << '\t'; //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; out << junk << '\t'; } else { break; } } out << endl; }else {//still read just don't do anything with it removedCount++; //read rest for (int i = 0; i < 15; i++) { if (!in.eof()) { in >> junk; } else { break; } } } m->gobble(in); } in.close(); out.close(); if (wroteSomething == false) { m->mothurOut("Your file contains only sequences from the .accnos file."); m->mothurOutEndLine(); } outputTypes["alignreport"].push_back(outputFileName); outputNames.push_back(outputFileName); m->mothurOut("Removed " + toString(removedCount) + " 
sequences from your alignreport file."); m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RemoveSeqsCommand", "readAlign");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/removeseqscommand.h
#ifndef REMOVESEQSCOMMAND_H
#define REMOVESEQSCOMMAND_H

/*
 *  removeseqscommand.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 7/8/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"

class RemoveSeqsCommand : public Command {

    public:

        RemoveSeqsCommand(string);
        RemoveSeqsCommand();
        ~RemoveSeqsCommand(){}

        vector<string> setParameters();
        string getCommandName()         { return "remove.seqs"; }
        string getCommandCategory()     { return "Sequence Processing"; }

        string getHelpString();
        string getOutputPattern(string);
        string getCitation() { return "http://www.mothur.org/wiki/Remove.seqs"; }
        string getDescription()     { return "removes sequences from a list, fasta, name, group, alignreport, quality or taxonomy file"; }

        int execute();
        void help() { m->mothurOut(getHelpString()); }

    private:
        set<string> names;
        string accnosfile, fastafile, fastqfile, namefile, groupfile, countfile, alignfile, listfile, taxfile, qualfile, outputDir;
        bool abort, dups;
        vector<string> outputNames;
        map<string, string> uniqueMap;

        int readFasta();
        int readFastq();
        int readName();
        int readGroup();
        int readCount();
        int readAlign();
        int readList();
        int readTax();
        int readQual();

};

#endif

mothur-1.36.1/source/commands/renameseqscommand.cpp
//
//  renameseqscommand.cpp
//  Mothur
//
//  Created by SarahsWork on 5/28/13.
//  Copyright (c) 2013 Schloss Lab. All rights reserved.
// #include "renameseqscommand.h" #include "sequence.hpp" #include "groupmap.h" #include "counttable.h" //********************************************************************************************************************** vector RenameSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "GroupCount", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "GroupCount", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pdelim("delim", "String", "", "_", "", "", "","",false,false); parameters.push_back(pdelim); CommandParameter pplacement("placement", "Multiple", "front-back", "back", "", "", "","",false,false); parameters.push_back(pplacement); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "RenameSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string RenameSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The rename.seqs command reads a fastafile and groupfile or count file with an optional namefile. 
It creates files with the sequence names concatenated with the group.\n";
        helpString += "The rename.seqs command parameters are fasta, name, group, count, placement, delim. Fasta and group or count are required, unless a current file is available for both.\n";
        helpString += "The placement parameter allows you to indicate whether you would like the group name appended to the front or back of the sequence name. Options are front or back. Default=back.\n";
        helpString += "The delim parameter allows you to enter the character or characters you would like to separate the sequence name from the group name. Default='_'.\n";
        helpString += "The rename.seqs command should be in the following format: \n";
        helpString += "rename.seqs(fasta=yourFastaFile, group=yourGroupFile) \n";
        helpString += "Example rename.seqs(fasta=abrecovery.unique.fasta, group=abrecovery.group).\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e. yourFasta).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "RenameSeqsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string RenameSeqsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fasta")        { pattern = "[filename],renamed,[extension]"; }
        else if (type == "name")    { pattern = "[filename],renamed,[extension]"; }
        else if (type == "group")   { pattern = "[filename],renamed,[extension]"; }
        else if (type == "count")   { pattern = "[filename],renamed,[extension]"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "RenameSeqsCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
RenameSeqsCommand::RenameSeqsCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["fasta"] = tempOutNames;
        outputTypes["name"] = tempOutNames;
        outputTypes["group"] = tempOutNames;
        outputTypes["count"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "RenameSeqsCommand", "RenameSeqsCommand");
        exit(1);
    }
}
/**************************************************************************************/
RenameSeqsCommand::RenameSeqsCommand(string option) {
    try {
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;

            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if 
(validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters fastaFile = validParameter.validFile(parameters, "fasta", true); if (fastaFile == "not open") { abort = true; } else if (fastaFile == "not found") { fastaFile = m->getFastaFile(); if (fastaFile != "") { m->mothurOut("Using " + fastaFile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastaFile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); CountTable temp; if (!temp.testGroups(countfile)) { m->mothurOut("[ERROR]: Your count file does not have group info, aborting.\n"); abort=true; } } nameFile = validParameter.validFile(parameters, "name", true); if (nameFile == "not open") { abort = true; } else if (nameFile == "not found"){ nameFile =""; } else { m->setNameFile(nameFile); } //look for current files if ((groupfile == "") && (countfile == "")) { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } CountTable temp; if 
(countfile == "") { m->mothurOut("[ERROR]: You need to provide a groupfile or countfile."); m->mothurOutEndLine(); abort = true; } else if (!temp.testGroups(countfile)) { m->mothurOut("[ERROR]: Your count file does not have group info, aborting.\n"); abort=true; } } } if ((countfile != "") && (nameFile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } placement = validParameter.validFile(parameters, "placement", false); if (placement == "not found") { placement = "back"; } if ((placement == "front") || (placement == "back")) { } else { m->mothurOut("[ERROR]: " + placement + " is not a valid placement option. Valid placement options are front or back.\n"); abort = true; } delim = validParameter.validFile(parameters, "delim", false); if (delim == "not found") { delim = "_"; } if (countfile == "") { if (nameFile == "") { vector<string> files; files.push_back(fastaFile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "RenameSeqsCommand", "RenameSeqsCommand"); exit(1); } } /**************************************************************************************/ int RenameSeqsCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //prepare filenames and open files string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastaFile); } string outFastaFile = thisOutputDir + m->getRootName(m->getSimpleName(fastaFile)); map<string, string> variables; variables["[filename]"] = outFastaFile; variables["[extension]"] = m->getExtension(fastaFile); outFastaFile = getOutputFileName("fasta", variables); outputNames.push_back(outFastaFile); outputTypes["fasta"].push_back(outFastaFile); ofstream outFasta; m->openOutputFile(outFastaFile, outFasta); ifstream in; m->openInputFile(fastaFile, in); GroupMap* groupMap = NULL; CountTable* countTable = NULL; if (groupfile != "") { groupMap = new GroupMap(groupfile); int groupError = groupMap->readMap(); if (groupError == 1) { 
delete groupMap; return 0; } vector<string> allGroups = groupMap->getNamesOfGroups(); m->setAllGroups(allGroups); }else{ countTable = new CountTable(); countTable->readTable(countfile, true, false); } while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); string group = "not found"; if (groupfile != "") { group = groupMap->getGroup(seq.getName()); } else { vector<string> groups = countTable->getGroups(seq.getName()); if (groups.size() == 0) { group = "not found"; } else { group = groups[0]; for (int i = 1; i < groups.size(); i++) { group += "_" + groups[i]; } } } if (group == "not found") { m->mothurOut("[ERROR]: " + seq.getName() + " is not in your file, please correct.\n"); m->control_pressed = true; } else { string newName = ""; if (placement == "back") { newName = seq.getName() + delim + group; } else { newName = group + delim + seq.getName(); } //rename sequence in count table if (countfile != "") { countTable->renameSeq(seq.getName(), newName); } seq.setName(newName); seq.printSequence(outFasta); } } in.close(); if (m->control_pressed) { if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } bool notDone = true; if (nameFile != "") { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(nameFile); } string outNameFile = thisOutputDir + m->getRootName(m->getSimpleName(nameFile)); variables["[filename]"] = outNameFile; variables["[extension]"] = m->getExtension(nameFile); outNameFile = getOutputFileName("name", variables); outputNames.push_back(outNameFile); outputTypes["name"].push_back(outNameFile); ofstream outName; m->openOutputFile(outNameFile, outName); map<string, vector<string> > nameMap; m->readNames(nameFile, nameMap); //process name file changing names for (map<string, vector<string> >::iterator it = nameMap.begin(); it != nameMap.end(); it++) { for (int i = 0; i < (it->second).size()-1; i++) { if (m->control_pressed) { break; } 
string group = groupMap->getGroup((it->second)[i]); if (group == "not found") { m->mothurOut("[ERROR]: " + (it->second)[i] + " is not in your group file, please correct.\n"); m->control_pressed = true; } else { string newName = ""; if (placement == "back") { newName = (it->second)[i] + delim + group; } else { newName = group + delim + (it->second)[i]; } groupMap->renameSeq((it->second)[i], newName); //change in group file (it->second)[i] = newName; //change in namefile } if (i == 0) { outName << (it->second)[i] << '\t' << (it->second)[i] << ','; } else { outName << (it->second)[i] << ','; } } //print last one if ((it->second).size() == 1) { string group = groupMap->getGroup((it->second)[0]); if (group == "not found") { m->mothurOut("[ERROR]: " + (it->second)[0] + " is not in your group file, please correct.\n"); m->control_pressed = true; } else { string newName = ""; if (placement == "back") { newName = (it->second)[0] + delim + group; } else { newName = group + delim + (it->second)[0]; } groupMap->renameSeq((it->second)[0], newName); //change in group file (it->second)[0] = newName; //change in namefile outName << (it->second)[0] << '\t' << (it->second)[0] << endl; } } else { string group = groupMap->getGroup((it->second)[(it->second).size()-1]); if (group == "not found") { m->mothurOut("[ERROR]: " + (it->second)[(it->second).size()-1] + " is not in your group file, please correct.\n"); m->control_pressed = true; } else { string newName = ""; if (placement == "back") { newName = (it->second)[(it->second).size()-1] + delim + group; } else { newName = group + delim + (it->second)[(it->second).size()-1]; } groupMap->renameSeq((it->second)[(it->second).size()-1], newName); //change in group file (it->second)[(it->second).size()-1] = newName; //change in namefile outName << (it->second)[(it->second).size()-1] << endl; } } } notDone = false; outName.close(); } if (m->control_pressed) { if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete 
countTable; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (groupfile != "") { if (notDone) { vector seqs = groupMap->getNamesSeqs(); for (int i = 0; i < seqs.size(); i++) { if (m->control_pressed) { break; } string group = groupMap->getGroup(seqs[i]); string newName = ""; if (placement == "back") { newName = seqs[i] + delim + group; } else { newName = group + delim + seqs[i]; } groupMap->renameSeq(seqs[i], newName); } } thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); } string outGroupFile = thisOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[filename]"] = outGroupFile; variables["[extension]"] = m->getExtension(groupfile); outGroupFile = getOutputFileName("group", variables); outputNames.push_back(outGroupFile); outputTypes["group"].push_back(outGroupFile); ofstream outGroup; m->openOutputFile(outGroupFile, outGroup); groupMap->print(outGroup); outGroup.close(); }else { thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(countfile); } string outCountFile = thisOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[filename]"] = outCountFile; variables["[extension]"] = m->getExtension(countfile); outCountFile = getOutputFileName("count", variables); outputNames.push_back(outCountFile); outputTypes["count"].push_back(outCountFile); countTable->printTable(outCountFile); } if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { 
if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "RenameSeqsCommand", "execute"); exit(1); } } /**************************************************************************************/ mothur-1.36.1/source/commands/renameseqscommand.h000066400000000000000000000017141255543666200221150ustar00rootroot00000000000000// // renameseqscommand.h // Mothur // // Created by SarahsWork on 5/28/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_renameseqscommand_h #define Mothur_renameseqscommand_h #include "command.hpp" class RenameSeqsCommand : public Command { public: RenameSeqsCommand(string); RenameSeqsCommand(); ~RenameSeqsCommand() {} vector setParameters(); string getCommandName() { return "rename.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Rename.seqs"; } string getDescription() { return "rename sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: string fastaFile, nameFile, groupfile, outputDir, placement, delim, countfile; vector outputNames; bool abort; map nameMap; }; #endif mothur-1.36.1/source/commands/reversecommand.cpp000066400000000000000000000236121255543666200217610ustar00rootroot00000000000000/* * reversecommand.cpp * Mothur * * Created by Pat Schloss on 6/6/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "reversecommand.h" #include "sequence.hpp" #include "qualityscores.h" //********************************************************************************************************************** vector<string> ReverseSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "fastaQual", "none","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "fastaQual", "none","qfile",false,false,true); parameters.push_back(pqfile); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector<string> myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ReverseSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ReverseSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The reverse.seqs command reads a fastafile and outputs a fasta file containing the reverse complement.\n"; helpString += "The reverse.seqs command parameters fasta or qfile are required.\n"; helpString += "The reverse.seqs command should be in the following format: \n"; helpString += "reverse.seqs(fasta=yourFastaFile) \n"; return helpString; } catch(exception& e) { m->errorOut(e, "ReverseSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ReverseSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = 
"[filename],rc,[extension]"; } else if (type == "qfile") { pattern = "[filename],rc,[extension]"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ReverseSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ReverseSeqsCommand::ReverseSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ReverseSeqsCommand", "ReverseSeqsCommand"); exit(1); } } //*************************************************************************************************************** ReverseSeqsCommand::ReverseSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the 
user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } } //check for required parameters fastaFileName = validParameter.validFile(parameters, "fasta", true); if (fastaFileName == "not open") { abort = true; } else if (fastaFileName == "not found") { fastaFileName = "";}// m->mothurOut("fasta is a required parameter for the reverse.seqs command."); m->mothurOutEndLine(); abort = true; } else { m->setFastaFile(fastaFileName); } qualFileName = validParameter.validFile(parameters, "qfile", true); if (qualFileName == "not open") { abort = true; } else if (qualFileName == "not found") { qualFileName = ""; }//m->mothurOut("fasta is a required parameter for the reverse.seqs command."); m->mothurOutEndLine(); abort = true; } else { m->setQualFile(qualFileName); } if((fastaFileName == "") && (qualFileName == "")){ fastaFileName = m->getFastaFile(); if (fastaFileName != "") { m->mothurOut("Using " + fastaFileName + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { qualFileName = m->getQualFile(); if (qualFileName != "") { m->mothurOut("Using " + qualFileName + " as input file for the qfile parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current files for fasta or qfile, and fasta or qfile is a required parameter for the reverse.seqs command."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } } } catch(exception& e) { m->errorOut(e, "ReverseSeqsCommand", 
"ReverseSeqsCommand"); exit(1); } } //*************************************************************************************************************** int ReverseSeqsCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } string fastaReverseFileName; if(fastaFileName != ""){ ifstream inFASTA; m->openInputFile(fastaFileName, inFASTA); ofstream outFASTA; string tempOutputDir = outputDir; if (outputDir == "") { tempOutputDir += m->hasPath(fastaFileName); } //if user entered a file with a path then preserve it map variables; variables["[filename]"] = tempOutputDir + m->getRootName(m->getSimpleName(fastaFileName)); variables["[extension]"] = m->getExtension(fastaFileName); fastaReverseFileName = getOutputFileName("fasta", variables); m->openOutputFile(fastaReverseFileName, outFASTA); while(!inFASTA.eof()){ if (m->control_pressed) { inFASTA.close(); outFASTA.close(); m->mothurRemove(fastaReverseFileName); return 0; } Sequence currSeq(inFASTA); m->gobble(inFASTA); if (currSeq.getName() != "") { currSeq.reverseComplement(); currSeq.printSequence(outFASTA); } } inFASTA.close(); outFASTA.close(); outputNames.push_back(fastaReverseFileName); outputTypes["fasta"].push_back(fastaReverseFileName); } string qualReverseFileName; if(qualFileName != ""){ QualityScores currQual; ifstream inQual; m->openInputFile(qualFileName, inQual); ofstream outQual; string tempOutputDir = outputDir; if (outputDir == "") { tempOutputDir += m->hasPath(qualFileName); } //if user entered a file with a path then preserve it map variables; variables["[filename]"] = tempOutputDir + m->getRootName(m->getSimpleName(qualFileName)); variables["[extension]"] = m->getExtension(qualFileName); string qualReverseFileName = getOutputFileName("qfile", variables); m->openOutputFile(qualReverseFileName, outQual); while(!inQual.eof()){ if (m->control_pressed) { inQual.close(); outQual.close(); m->mothurRemove(qualReverseFileName); return 0; } currQual = QualityScores(inQual); 
m->gobble(inQual); currQual.flipQScores(); currQual.printQScores(outQual); } inQual.close(); outQual.close(); outputNames.push_back(qualReverseFileName); outputTypes["qfile"].push_back(qualReverseFileName); } if (m->control_pressed) { m->mothurRemove(qualReverseFileName); m->mothurRemove(fastaReverseFileName); return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for(int i=0;i<outputNames.size();i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } return 0; } catch(exception& e) { m->errorOut(e, "ReverseSeqsCommand", "execute"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/commands/reversecommand.h000066400000000000000000000016371255543666200214310ustar00rootroot00000000000000#ifndef REVERSECOMMAND_H #define REVERSECOMMAND_H /* * reversecommand.h * Mothur * * Created by Pat Schloss on 6/6/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "command.hpp" class ReverseSeqsCommand : public Command { public: ReverseSeqsCommand(string); ReverseSeqsCommand(); ~ReverseSeqsCommand() {} vector setParameters(); string getCommandName() { return "reverse.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Reverse.seqs"; } string getDescription() { return "outputs a fasta file containing the reverse-complements"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string fastaFileName, qualFileName, outputDir; vector outputNames; }; #endif mothur-1.36.1/source/commands/screenseqscommand.cpp000066400000000000000000004070351255543666200224660ustar00rootroot00000000000000/* * screenseqscommand.cpp * Mothur * * Created by Pat Schloss on 6/3/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. * */ #include "screenseqscommand.h" #include "counttable.h" //********************************************************************************************************************** vector ScreenSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter pcontigsreport("contigsreport", "InputTypes", "", "", "report", "none", "none","contigsreport",false,false,true); parameters.push_back(pcontigsreport); CommandParameter palignreport("alignreport", "InputTypes", "", "", "report", "none", "none","alignreport",false,false); parameters.push_back(palignreport); CommandParameter psummary("summary", "InputTypes", "", "", "report", "none", "none","summary",false,false); parameters.push_back(psummary); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", 
"none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "none", "none","qfile",false,false); parameters.push_back(pqfile); CommandParameter ptax("taxonomy", "InputTypes", "", "", "none", "none", "none","taxonomy",false,false); parameters.push_back(ptax); CommandParameter pstart("start", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pstart); CommandParameter pend("end", "Number", "", "-1", "", "", "","",false,false,true); parameters.push_back(pend); CommandParameter pmaxambig("maxambig", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxambig); CommandParameter pmaxhomop("maxhomop", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxhomop); CommandParameter pminlength("minlength", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pminlength); CommandParameter pmaxlength("maxlength", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxlength); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pcriteria("criteria", "Number", "", "90", "", "", "","",false,false); parameters.push_back(pcriteria); CommandParameter poptimize("optimize", "Multiple", "none-start-end-maxambig-maxhomop-minlength-maxlength", "none", "", "", "","",true,false); parameters.push_back(poptimize); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); //report parameters CommandParameter 
pminoverlap("minoverlap", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pminoverlap); CommandParameter postart("ostart", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(postart); CommandParameter poend("oend", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(poend); CommandParameter pmismatches("mismatches", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmismatches); CommandParameter pmaxn("maxn", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxn); CommandParameter pminscore("minscore", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pminscore); CommandParameter pmaxinsert("maxinsert", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxinsert); CommandParameter pminsim("minsim", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pminsim); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ScreenSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The screen.seqs command reads a fastafile and screens sequences.\n"; helpString += "The screen.seqs command parameters are fasta, start, end, maxambig, maxhomop, minlength, maxlength, name, group, count, qfile, alignreport, contigsreport, summary, taxonomy, optimize, criteria and processors.\n"; helpString += "The fasta parameter is required.\n"; helpString += "The contigsreport parameter allows you to use the contigsreport file to determine if a sequence is good. Screening parameters include: minoverlap, ostart, oend and mismatches. 
\n"; helpString += "The alignreport parameter allows you to use the alignreport file to determine if a sequence is good. Screening parameters include: minsim, minscore and maxinsert. \n"; helpString += "The summary parameter allows you to use the summary file from summary.seqs to save time processing.\n"; helpString += "The taxonomy parameter allows you to remove bad seqs from taxonomy files.\n"; helpString += "The start parameter is used to set a position the \"good\" sequences must start by. The default is -1.\n"; helpString += "The end parameter is used to set a position the \"good\" sequences must end after. The default is -1.\n"; helpString += "The maxambig parameter allows you to set the maximum number of ambiguous bases allowed. The default is -1.\n"; helpString += "The maxhomop parameter allows you to set a maximum homopolymer length. \n"; helpString += "The minlength parameter allows you to set a minimum sequence length. \n"; helpString += "The maxn parameter allows you to set a maximum number of N's allowed in a sequence. \n"; helpString += "The minoverlap parameter allows you to set a minimum overlap. The default is -1. \n"; helpString += "The ostart parameter is used to set an overlap position the \"good\" sequences must start by. The default is -1. \n"; helpString += "The oend parameter is used to set an overlap position the \"good\" sequences must end after. The default is -1.\n"; helpString += "The mismatches parameter allows you to set a maximum number of mismatches in the contigs.report. \n"; helpString += "The minsim parameter allows you to set the minimum similarity to template sequences during alignment. Found in column \'SimBtwnQuery&Template\' in align.report file.\n"; helpString += "The minscore parameter allows you to set the minimum search score during alignment. Found in column \'SearchScore\' in align.report file.\n"; helpString += "The maxinsert parameter allows you to set the maximum number of insertions during alignment. 
Found in column \'LongestInsert\' in align.report file.\n"; helpString += "The processors parameter allows you to specify the number of processors to use while running the command. The default is 1.\n"; helpString += "The optimize and criteria parameters allow you to set the start, end, maxambig, maxhomop, minlength and maxlength parameters relative to your set of sequences.\n"; helpString += "For example optimize=start-end, criteria=90, would set the start and end values to the position 90% of your sequences started and ended.\n"; helpString += "The name parameter allows you to provide a namesfile, and the group parameter allows you to provide a groupfile.\n"; helpString += "The screen.seqs command should be in the following format: \n"; helpString += "screen.seqs(fasta=yourFastaFile, name=yourNameFile, group=yourGroupFile, start=yourStart, end=yourEnd, maxambig=yourMaxambig, \n"; helpString += "maxhomop=yourMaxhomop, minlength=yourMinlength, maxlength=yourMaxlength) \n"; helpString += "Example screen.seqs(fasta=abrecovery.fasta, name=abrecovery.names, group=abrecovery.groups, start=..., end=..., maxambig=..., maxhomop=..., minlength=..., maxlength=...).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ScreenSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],good,[extension]"; } else if (type == "taxonomy") { pattern = "[filename],good,[extension]"; } else if (type == "name") { pattern = "[filename],good,[extension]"; } else if (type == "group") { pattern = "[filename],good,[extension]"; } else if (type == "count") { pattern = "[filename],good,[extension]"; } else if (type == "accnos") { pattern = "[filename],bad.accnos"; } else if (type == "qfile") { pattern = "[filename],good,[extension]"; } else if (type == "alignreport") { pattern = "[filename],good.align.report"; } else if (type == "contigsreport") { pattern = "[filename],good.contigs.report"; } else if (type == "summary") { pattern = "[filename],good.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ScreenSeqsCommand::ScreenSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["contigsreport"] = tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", 
"ScreenSeqsCommand"); exit(1); } } //*************************************************************************************************************** ScreenSeqsCommand::ScreenSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("screen.seqs"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["alignreport"] = tempOutNames; outputTypes["accnos"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["contigsreport"] = tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("alignreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["alignreport"] = inputDir + it->second; } } it = parameters.find("contigsreport"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["contigsreport"] = inputDir + it->second; } } it = parameters.find("summary"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["summary"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (fastafile == "not open") { abort = true; } else { m->setFastaFile(fastafile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { abort = true; } else if (qualfile == "not found") { qualfile = ""; } else { m->setQualFile(qualfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } contigsreport = validParameter.validFile(parameters, "contigsreport", true); if (contigsreport == "not open") { contigsreport = ""; abort = true; } else if (contigsreport == "not found") { contigsreport = ""; } summaryfile = validParameter.validFile(parameters, "summary", true); if (summaryfile == "not open") { summaryfile = ""; abort = true; } else if (summaryfile == "not found") { summaryfile = ""; } else { m->setSummaryFile(summaryfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); 
m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } alignreport = validParameter.validFile(parameters, "alignreport", true); if (alignreport == "not open") { abort = true; } else if (alignreport == "not found") { alignreport = ""; } taxonomy = validParameter.validFile(parameters, "taxonomy", true); if (taxonomy == "not open") { abort = true; } else if (taxonomy == "not found") { taxonomy = ""; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(fastafile); //if user entered a file with a path then preserve it } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; temp = validParameter.validFile(parameters, "start", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, startPos); temp = validParameter.validFile(parameters, "end", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, endPos); temp = validParameter.validFile(parameters, "maxambig", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxAmbig); temp = validParameter.validFile(parameters, "maxhomop", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxHomoP); temp = validParameter.validFile(parameters, "minlength", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, minLength); temp = validParameter.validFile(parameters, "maxlength", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxLength); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, 
processors); temp = validParameter.validFile(parameters, "minoverlap", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, minOverlap); temp = validParameter.validFile(parameters, "ostart", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, oStart); temp = validParameter.validFile(parameters, "oend", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, oEnd); temp = validParameter.validFile(parameters, "mismatches", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, mismatches); temp = validParameter.validFile(parameters, "maxn", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxN); temp = validParameter.validFile(parameters, "minscore", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, minScore); temp = validParameter.validFile(parameters, "maxinsert", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxInsert); temp = validParameter.validFile(parameters, "minsim", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, minSim); temp = validParameter.validFile(parameters, "optimize", false); //optimizing trumps the optimized values original value if (temp == "not found"){ temp = "none"; } m->splitAtDash(temp, optimize); if ((contigsreport != "") && ((summaryfile != "") || ( alignreport != ""))) { m->mothurOut("[ERROR]: You may only provide one of the following: contigsreport, alignreport or summary, aborting.\n"); abort=true; } if ((alignreport != "") && ((summaryfile != "") || ( contigsreport != ""))) { m->mothurOut("[ERROR]: You may only provide one of the following: contigsreport, alignreport or summary, aborting.\n"); abort=true; } if ((summaryfile != "") && ((alignreport != "") || ( contigsreport != ""))) { m->mothurOut("[ERROR]: You may only provide one of the following: contigsreport, alignreport or summary, aborting.\n"); abort=true; } //check to make sure you have the files you need 
for certain screening if ((contigsreport == "") && ((minOverlap != -1) || (oStart != -1) || (oEnd != -1) || (mismatches != -1))) { m->mothurOut("[ERROR]: minoverlap, ostart, oend and mismatches can only be used with a contigs.report file, aborting.\n"); abort=true; } if ((alignreport == "") && ((minScore != -1) || (maxInsert != -1) || (minSim != -1))) { m->mothurOut("[ERROR]: minscore, maxinsert and minsim can only be used with a align.report file, aborting.\n"); abort=true; } //check for invalid optimize options set validOptimizers; validOptimizers.insert("none"); validOptimizers.insert("start"); validOptimizers.insert("end"); validOptimizers.insert("maxambig"); validOptimizers.insert("maxhomop"); validOptimizers.insert("minlength"); validOptimizers.insert("maxlength"); validOptimizers.insert("maxn"); if (contigsreport != "") { validOptimizers.insert("minoverlap"); validOptimizers.insert("ostart"); validOptimizers.insert("oend"); validOptimizers.insert("mismatches"); } if (alignreport != "") { validOptimizers.insert("minscore"); validOptimizers.insert("maxinsert"); validOptimizers.insert("minsim"); } for (int i = 0; i < optimize.size(); i++) { if (validOptimizers.count(optimize[i]) == 0) { m->mothurOut(optimize[i] + " is not a valid optimizer with your input files. 
Valid options are "); string valid = ""; for (set::iterator it = validOptimizers.begin(); it != validOptimizers.end(); it++) { valid += (*it) + ", "; } if (valid.length() != 0) { valid = valid.substr(0, valid.length()-2); } m->mothurOut(valid + "."); m->mothurOutEndLine(); optimize.erase(optimize.begin()+i); i--; } } if (optimize.size() == 1) { if (optimize[0] == "none") { optimize.clear(); } } temp = validParameter.validFile(parameters, "criteria", false); if (temp == "not found"){ temp = "90"; } m->mothurConvert(temp, criteria); if (countfile == "") { if (namefile == "") { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "ScreenSeqsCommand"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } map badSeqNames; int start = time(NULL); int numFastaSeqs = 0; if ((contigsreport == "") && (summaryfile == "") && (alignreport == "")) { numFastaSeqs = screenFasta(badSeqNames); } else { numFastaSeqs = screenReports(badSeqNames); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should fix files #endif if(namefile != "" && groupfile != "") { screenNameGroupFile(badSeqNames); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } }else if(namefile != "") { screenNameGroupFile(badSeqNames); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } }else if(groupfile != "") { screenGroupFile(badSeqNames); } // this screens just the group else if (countfile != "") { screenCountFile(badSeqNames); } if (m->control_pressed) { for 
(int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(qualfile != "") { screenQual(badSeqNames); } if(taxonomy != "") { screenTaxonomy(badSeqNames); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } #ifdef USE_MPI } #endif m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to screen " + toString(numFastaSeqs) + " sequences."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "execute"); exit(1); } } 
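/*
 Illustrative sketch, not part of mothur. screenSummary(), screenContigs() and screenAlignReport() in this file all apply the same rule: a threshold of -1 means the user did not set it, and any set threshold that a sequence violates marks the sequence as bad and appends a short reason to its "trash code", which is then stored in badSeqNames. The standalone example below mirrors that rule for four of the summary-file thresholds. SummaryRow, Thresholds and trashCodeFor are hypothetical names introduced here for illustration only; they do not exist in the mothur source.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for one row of a summary file.
struct SummaryRow {
    std::string name;
    int start;
    int end;
    int length;
    int ambigs;
    int polymer;
};

// Hypothetical container for the thresholds; -1 means "not set",
// matching the defaults the command assigns to unset parameters.
struct Thresholds {
    int startPos = -1;
    int endPos = -1;
    int maxAmbig = -1;
    int maxHomoP = -1;
};

// Returns an empty string for a sequence that passes every set threshold,
// otherwise the '|'-terminated reasons for which it failed.
std::string trashCodeFor(const SummaryRow& row, const Thresholds& t) {
    std::string code;
    if (t.startPos != -1 && t.startPos < row.start)   { code += "start|"; }
    if (t.endPos   != -1 && t.endPos   > row.end)     { code += "end|"; }
    if (t.maxAmbig != -1 && t.maxAmbig < row.ambigs)  { code += "ambig|"; }
    if (t.maxHomoP != -1 && t.maxHomoP < row.polymer) { code += "homop|"; }
    return code;
}
```

 With startPos=100 and maxAmbig=0, a row that starts at position 120 and has 2 ambiguous bases gets the code "start|ambig|", while a row that starts at 90 with no ambiguous bases gets an empty code and is kept. With no thresholds set, every row is kept.
*/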
//***************************************************************************************************************/ int ScreenSeqsCommand::runFastaScreening(map& badSeqNames){ try{ int numFastaSeqs = 0; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); string badAccnosFile = getOutputFileName("accnos",variables); variables["[extension]"] = m->getExtension(fastafile); string goodSeqFile = getOutputFileName("fasta", variables); outputNames.push_back(goodSeqFile); outputTypes["fasta"].push_back(goodSeqFile); outputNames.push_back(badAccnosFile); outputTypes["accnos"].push_back(badAccnosFile); #ifdef USE_MPI int pid, numSeqsPerProcessor; int tag = 2001; vector MPIPos; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); MPI_File inMPI; MPI_File outMPIGood; MPI_File outMPIBadAccnos; int outMode=MPI_MODE_CREATE|MPI_MODE_WRONLY; int inMode=MPI_MODE_RDONLY; char outGoodFilename[1024]; strcpy(outGoodFilename, goodSeqFile.c_str()); char outBadAccnosFilename[1024]; strcpy(outBadAccnosFilename, badAccnosFile.c_str()); char inFileName[1024]; strcpy(inFileName, fastafile.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, inMode, MPI_INFO_NULL, &inMPI); //comm, filename, mode, info, filepointer MPI_File_open(MPI_COMM_WORLD, outGoodFilename, outMode, MPI_INFO_NULL, &outMPIGood); MPI_File_open(MPI_COMM_WORLD, outBadAccnosFilename, outMode, MPI_INFO_NULL, &outMPIBadAccnos); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPIGood); MPI_File_close(&outMPIBadAccnos); return 0; } if (pid == 0) { //you are the root process MPIPos = m->setFilePosFasta(fastafile, numFastaSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numFastaSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&MPIPos[0], (numFastaSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } //figure out how many sequences you 
have to align numSeqsPerProcessor = numFastaSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPIGood, outMPIBadAccnos, MPIPos, badSeqNames); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPIGood); MPI_File_close(&outMPIBadAccnos); return 0; } for (int i = 1; i < processors; i++) { //get bad lists int badSize; MPI_Recv(&badSize, 1, MPI_INT, i, tag, MPI_COMM_WORLD, &status); } }else{ //you are a child process MPI_Recv(&numFastaSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPIPos.resize(numFastaSeqs+1); MPI_Recv(&MPIPos[0], (numFastaSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); //figure out how many sequences you have to align numSeqsPerProcessor = numFastaSeqs / processors; int startIndex = pid * numSeqsPerProcessor; if(pid == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - pid * numSeqsPerProcessor; } //align your part driverMPI(startIndex, numSeqsPerProcessor, inMPI, outMPIGood, outMPIBadAccnos, MPIPos, badSeqNames); if (m->control_pressed) { MPI_File_close(&inMPI); MPI_File_close(&outMPIGood); MPI_File_close(&outMPIBadAccnos); return 0; } //send bad list int badSize = badSeqNames.size(); MPI_Send(&badSize, 1, MPI_INT, 0, tag, MPI_COMM_WORLD); } //close files MPI_File_close(&inMPI); MPI_File_close(&outMPIGood); MPI_File_close(&outMPIBadAccnos); MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #else if(processors == 1){ numFastaSeqs = driver(lines[0], goodSeqFile, badAccnosFile, fastafile, badSeqNames); } else{ numFastaSeqs = createProcesses(goodSeqFile, badAccnosFile, fastafile, badSeqNames); } if (m->control_pressed) { m->mothurRemove(goodSeqFile); return numFastaSeqs; } #endif #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should fix files //read accnos file with all names in it, process 
			// 0 just has its names
			MPI_File inMPIAccnos;
			MPI_Offset size;

			char inFileName[1024];
			strcpy(inFileName, badAccnosFile.c_str());

			MPI_File_open(MPI_COMM_SELF, inFileName, inMode, MPI_INFO_NULL, &inMPIAccnos); //comm, filename, mode, info, filepointer
			MPI_File_get_size(inMPIAccnos, &size);

			char* buffer = new char[size];
			MPI_File_read(inMPIAccnos, buffer, size, MPI_CHAR, &status);

			//the buffer is not null-terminated, so copy exactly size bytes
			string tempBuf(buffer, size);
			istringstream iss (tempBuf,istringstream::in);
			delete[] buffer; //allocated with new[]

			MPI_File_close(&inMPIAccnos);

			badSeqNames.clear();
			string tempName, trashCode;
			while (!iss.eof()) {
				iss >> tempName >> trashCode; m->gobble(iss);
				badSeqNames[tempName] = trashCode;
			}
		}
		#endif

		return numFastaSeqs;
	}
	catch(exception& e) {
		m->errorOut(e, "ScreenSeqsCommand", "runFastaScreening");
		exit(1);
	}
}
//***************************************************************************************************************/
int ScreenSeqsCommand::screenReports(map<string, string>& badSeqNames){
	try{
		int numFastaSeqs = 0;
		bool summarizedFasta = false;

		//did not provide a summary file, but set a parameter that requires summarizing the fasta file
		//or did provide a summary file, but set maxn parameter so we must summarize the fasta file
		vector positions;
		if (((summaryfile == "") && ((m->inUsersGroups("maxambig", optimize)) ||(m->inUsersGroups("maxhomop", optimize)) ||(m->inUsersGroups("maxlength", optimize)) || (m->inUsersGroups("minlength", optimize)) || (m->inUsersGroups("start", optimize)) || (m->inUsersGroups("end", optimize)))) || ((summaryfile != "") && m->inUsersGroups("maxn", optimize))) {
			//use the namefile to optimize correctly
			if (namefile != "") { nameMap = m->readNames(namefile); }
			else if (countfile != "") { CountTable ct; ct.readTable(countfile, true, false); nameMap = ct.getNameMap(); }
			getSummary(positions);
			summarizedFasta = true;
		} else {
			#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else if(processors == 1){ lines.push_back(linePair(0, 1000)); } else { int numFastaSeqs = 0; positions = m->setFilePosFasta(fastafile, numFastaSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif } if ((summaryfile != "") && ((m->inUsersGroups("maxambig", optimize)) ||(m->inUsersGroups("maxhomop", optimize)) ||(m->inUsersGroups("maxlength", optimize)) || (m->inUsersGroups("minlength", optimize)) || (m->inUsersGroups("start", optimize)) || (m->inUsersGroups("end", optimize))) && !summarizedFasta) { //summarize based on summaryfile if (namefile != "") { nameMap = m->readNames(namefile); } else if (countfile != "") { CountTable ct; ct.readTable(countfile, true, false); nameMap = ct.getNameMap(); } getSummaryReport(); }else if ((contigsreport != "") && ((m->inUsersGroups("minoverlap", optimize)) || (m->inUsersGroups("ostart", optimize)) || (m->inUsersGroups("oend", optimize)) || (m->inUsersGroups("mismatches", optimize)))) { //optimize settings based on contigs file optimizeContigs(); }else if ((alignreport != "") && ((m->inUsersGroups("minsim", optimize)) || (m->inUsersGroups("minscore", optimize)) || (m->inUsersGroups("maxinsert", optimize)))) { //optimize settings based on contigs file optimizeAlign(); } //provided summary file, and did not set maxn so no need to summarize fasta if (summaryfile != "") { numFastaSeqs = screenSummary(badSeqNames); } //add in any seqs that fail due to contigs report results else if (contigsreport != 
"") { numFastaSeqs = screenContigs(badSeqNames); } //add in any seqs that fail due to align report else if (alignreport != "") { numFastaSeqs = screenAlignReport(badSeqNames); } return numFastaSeqs; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenReports"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::screenAlignReport(map& badSeqNames){ try { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(alignreport)); string outSummary = getOutputFileName("alignreport",variables); outputNames.push_back(outSummary); outputTypes["alignreport"].push_back(outSummary); string name, TemplateName, SearchMethod, AlignmentMethod; //QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template //checking for minScore, maxInsert, minSim int length, TemplateLength, QueryStart, QueryEnd, TemplateStart, TemplateEnd, PairwiseAlignmentLength, GapsInQuery, GapsInTemplate, LongestInsert; float SearchScore, SimBtwnQueryTemplate; ofstream out; m->openOutputFile(outSummary, out); //read summary file ifstream in; m->openInputFile(alignreport, in); out << (m->getline(in)) << endl; //skip headers int count = 0; while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); return 0; } //seqname start end nbases ambigs polymer numSeqs in >> name >> length >> TemplateName >> TemplateLength >> SearchMethod >> SearchScore >> AlignmentMethod >> QueryStart >> QueryEnd >> TemplateStart >> TemplateEnd >> PairwiseAlignmentLength >> GapsInQuery >> GapsInTemplate >> LongestInsert >> SimBtwnQueryTemplate; m->gobble(in); bool goodSeq = 1; // innocent until proven guilty string trashCode = ""; if(maxInsert != -1 && maxInsert < LongestInsert) { goodSeq = 0; trashCode += "insert|"; } 
if(minScore != -1 && minScore > SearchScore) { goodSeq = 0; trashCode += "score|"; } if(minSim != -1 && minSim > SimBtwnQueryTemplate) { goodSeq = 0; trashCode += "sim|"; } if(goodSeq == 1){ out << name << '\t' << length << '\t' << TemplateName << '\t' << TemplateLength << '\t' << SearchMethod << '\t' << SearchScore << '\t' << AlignmentMethod << '\t' << QueryStart << '\t' << QueryEnd << '\t' << TemplateStart << '\t' << TemplateEnd << '\t' << PairwiseAlignmentLength << '\t' << GapsInQuery << '\t' << GapsInTemplate << '\t' << LongestInsert << '\t' << SimBtwnQueryTemplate << endl; } else{ badSeqNames[name] = trashCode; } count++; } in.close(); out.close(); int oldBadSeqsCount = badSeqNames.size(); int numFastaSeqs = runFastaScreening(badSeqNames); if (oldBadSeqsCount != badSeqNames.size()) { //more seqs were removed by maxns m->renameFile(outSummary, outSummary+".temp"); ofstream out2; m->openOutputFile(outSummary, out2); //read summary file ifstream in2; m->openInputFile(outSummary+".temp", in2); out2 << (m->getline(in2)) << endl; //skip headers while (!in2.eof()) { if (m->control_pressed) { in2.close(); out2.close(); return 0; } //seqname start end nbases ambigs polymer numSeqs in2 >> name >> length >> TemplateName >> TemplateLength >> SearchMethod >> SearchScore >> AlignmentMethod >> QueryStart >> QueryEnd >> TemplateStart >> TemplateEnd >> PairwiseAlignmentLength >> GapsInQuery >> GapsInTemplate >> LongestInsert >> SimBtwnQueryTemplate; m->gobble(in2); if (badSeqNames.count(name) == 0) { //are you good? 
					out2 << name << '\t' << length << '\t' << TemplateName << '\t' << TemplateLength << '\t' << SearchMethod << '\t' << SearchScore << '\t' << AlignmentMethod << '\t' << QueryStart << '\t' << QueryEnd << '\t' << TemplateStart << '\t' << TemplateEnd << '\t' << PairwiseAlignmentLength << '\t' << GapsInQuery << '\t' << GapsInTemplate << '\t' << LongestInsert << '\t' << SimBtwnQueryTemplate << endl;
				}
			}
			in2.close();
			out2.close();
			m->mothurRemove(outSummary+".temp");
		}

		if (numFastaSeqs != count) { m->mothurOut("[ERROR]: found " + toString(numFastaSeqs) + " sequences in your fasta file, and " + toString(count) + " sequences in your align report file, quitting.\n"); m->control_pressed = true; }

		return count;
	}
	catch(exception& e) {
		m->errorOut(e, "ScreenSeqsCommand", "screenAlignReport");
		exit(1);
	}
}
//***************************************************************************************************************/
int ScreenSeqsCommand::screenContigs(map<string, string>& badSeqNames){
	try{
		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(contigsreport));
		string outSummary = getOutputFileName("contigsreport",variables);
		outputNames.push_back(outSummary); outputTypes["contigsreport"].push_back(outSummary);

		string name;
		//Name Length Overlap_Length Overlap_Start Overlap_End MisMatches Num_Ns
		int length, OLength, thisOStart, thisOEnd, numMisMatches, numNs;

		ofstream out;
		m->openOutputFile(outSummary, out);

		//read contigs report file
		ifstream in;
		m->openInputFile(contigsreport, in);
		out << (m->getline(in)) << endl; //copy the header line

		int count = 0;

		while (!in.eof()) {

			if (m->control_pressed) { in.close(); out.close(); return 0; }

			//name length overlap_length overlap_start overlap_end mismatches num_ns
			in >> name >> length >> OLength >> thisOStart >> thisOEnd >> numMisMatches >> numNs; m->gobble(in);

			bool goodSeq = 1; // innocent until proven guilty
			string trashCode = "";
			if(oStart != -1 && oStart < thisOStart) { goodSeq = 0; trashCode += "ostart|"; }
			if(oEnd != -1 && oEnd > thisOEnd) {
goodSeq = 0; trashCode += "oend|"; } if(maxN != -1 && maxN < numNs) { goodSeq = 0; trashCode += "n|"; } if(minOverlap != -1 && minOverlap > OLength) { goodSeq = 0; trashCode += "olength|"; } if(mismatches != -1 && mismatches < numMisMatches) { goodSeq = 0; trashCode += "mismatches|"; } if(goodSeq == 1){ out << name << '\t' << length << '\t' << OLength << '\t' << thisOStart << '\t' << thisOEnd << '\t' << numMisMatches << '\t' << numNs << endl; } else{ badSeqNames[name] = trashCode; } count++; } in.close(); out.close(); int oldBadSeqsCount = badSeqNames.size(); int numFastaSeqs = runFastaScreening(badSeqNames); if (oldBadSeqsCount != badSeqNames.size()) { //more seqs were removed by maxns m->renameFile(outSummary, outSummary+".temp"); ofstream out2; m->openOutputFile(outSummary, out2); //read summary file ifstream in2; m->openInputFile(outSummary+".temp", in2); out2 << (m->getline(in2)) << endl; //skip headers while (!in2.eof()) { if (m->control_pressed) { in2.close(); out2.close(); return 0; } //seqname start end nbases ambigs polymer numSeqs in2 >> name >> length >> OLength >> thisOStart >> thisOEnd >> numMisMatches >> numNs; m->gobble(in2); if (badSeqNames.count(name) == 0) { //are you good? 
					out2 << name << '\t' << length << '\t' << OLength << '\t' << thisOStart << '\t' << thisOEnd << '\t' << numMisMatches << '\t' << numNs << endl;
				}
			}
			in2.close();
			out2.close();
			m->mothurRemove(outSummary+".temp");
		}

		if (numFastaSeqs != count) { m->mothurOut("[ERROR]: found " + toString(numFastaSeqs) + " sequences in your fasta file, and " + toString(count) + " sequences in your contigs report file, quitting.\n"); m->control_pressed = true; }

		return count;
	}
	catch(exception& e) {
		m->errorOut(e, "ScreenSeqsCommand", "screenContigs");
		exit(1);
	}
}
//***************************************************************************************************************/
int ScreenSeqsCommand::screenSummary(map<string, string>& badSeqNames){
	try{
		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(summaryfile));
		string outSummary = getOutputFileName("summary",variables);
		outputNames.push_back(outSummary); outputTypes["summary"].push_back(outSummary);

		string name;
		int start, end, length, ambigs, polymer, numReps;

		ofstream out;
		m->openOutputFile(outSummary, out);

		//read summary file
		ifstream in;
		m->openInputFile(summaryfile, in);
		out << (m->getline(in)) << endl; //copy the header line

		int count = 0;

		while (!in.eof()) {

			if (m->control_pressed) { in.close(); out.close(); return 0; }

			//seqname start end nbases ambigs polymer numSeqs
			in >> name >> start >> end >> length >> ambigs >> polymer >> numReps; m->gobble(in);

			bool goodSeq = 1; // innocent until proven guilty
			string trashCode = "";
			if(startPos != -1 && startPos < start) { goodSeq = 0; trashCode += "start|"; }
			if(endPos != -1 && endPos > end) { goodSeq = 0; trashCode += "end|"; }
			if(maxAmbig != -1 && maxAmbig < ambigs) { goodSeq = 0; trashCode += "ambig|"; }
			if(maxHomoP != -1 && maxHomoP < polymer) { goodSeq = 0; trashCode += "homop|"; }
			if(minLength != -1 && minLength > length) { goodSeq = 0; trashCode += "<length|"; }
			if(maxLength != -1 && maxLength < length) { goodSeq = 0; trashCode += ">length|"; }

			if(goodSeq == 1){
				out << name << '\t' << start << '\t' << end << '\t' << length << '\t' << ambigs << '\t' << polymer << '\t' << numReps << endl;
			}
			else{ badSeqNames[name] = trashCode; }
			count++;
		}
		in.close();
		out.close();

		int oldBadSeqsCount = badSeqNames.size();

		int numFastaSeqs = runFastaScreening(badSeqNames);

		if (oldBadSeqsCount != badSeqNames.size()) { //more seqs were removed by maxns

			m->renameFile(outSummary, outSummary+".temp");

			ofstream out2;
			m->openOutputFile(outSummary, out2);

			//read summary file
ifstream in2; m->openInputFile(outSummary+".temp", in2); out2 << (m->getline(in2)) << endl; //skip headers while (!in2.eof()) { if (m->control_pressed) { in2.close(); out2.close(); return 0; } //seqname start end nbases ambigs polymer numSeqs in2 >> name >> start >> end >> length >> ambigs >> polymer >> numReps; m->gobble(in2); if (badSeqNames.count(name) == 0) { //are you good? out2 << name << '\t' << start << '\t' << end << '\t' << length << '\t' << ambigs << '\t' << polymer << '\t' << numReps << endl; } } in2.close(); out2.close(); m->mothurRemove(outSummary+".temp"); } if (numFastaSeqs != count) { m->mothurOut("[ERROR]: found " + toString(numFastaSeqs) + " sequences in your fasta file, and " + toString(count) + " sequences in your summary file, quitting.\n"); m->control_pressed = true; } return count; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenSummary"); exit(1); } } //***************************************************************************************************************/ int ScreenSeqsCommand::screenFasta(map& badSeqNames){ try{ //if the user want to optimize we need to know the 90% mark vector positions; if (optimize.size() != 0) { //get summary is paralellized so we need to divideFile, no need to do this step twice so I moved it here //use the namefile to optimize correctly if (namefile != "") { nameMap = m->readNames(namefile); } else if (countfile != "") { CountTable ct; ct.readTable(countfile, true, false); nameMap = ct.getNameMap(); } getSummary(positions); }else { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else if(processors == 1){ lines.push_back(linePair(0, 1000)); } else { int numFastaSeqs = 0; positions = m->setFilePosFasta(fastafile, numFastaSeqs); if (positions.size() < processors) { processors = 
positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif } if (m->control_pressed) { return 0; } int numFastaSeqs = runFastaScreening(badSeqNames); return numFastaSeqs; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenFasta"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::screenNameGroupFile(map badSeqNames){ try { ifstream inputNames; m->openInputFile(namefile, inputNames); map badSeqGroups; string seqName, seqList, group; map::iterator it; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string goodNameFile = getOutputFileName("name", variables); outputNames.push_back(goodNameFile); outputTypes["name"].push_back(goodNameFile); ofstream goodNameOut; m->openOutputFile(goodNameFile, goodNameOut); while(!inputNames.eof()){ if (m->control_pressed) { goodNameOut.close(); inputNames.close(); m->mothurRemove(goodNameFile); return 0; } inputNames >> seqName; m->gobble(inputNames); inputNames >> seqList; it = badSeqNames.find(seqName); if(it != badSeqNames.end()){ if(namefile != ""){ int start = 0; for(int i=0;isecond; start = i+1; } } badSeqGroups[seqList.substr(start,seqList.length()-start)] = it->second; } badSeqNames.erase(it); } else{ goodNameOut << seqName << '\t' << seqList << endl; } m->gobble(inputNames); } inputNames.close(); goodNameOut.close(); //we were unable to remove some of the bad sequences if (badSeqNames.size() != 0) { for (it = badSeqNames.begin(); it != badSeqNames.end(); it++) { m->mothurOut("Your namefile does not 
include the sequence " + it->first + " please correct."); m->mothurOutEndLine(); } } if(groupfile != ""){ ifstream inputGroups; m->openInputFile(groupfile, inputGroups); variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string goodGroupFile = getOutputFileName("group", variables); outputNames.push_back(goodGroupFile); outputTypes["group"].push_back(goodGroupFile); ofstream goodGroupOut; m->openOutputFile(goodGroupFile, goodGroupOut); while(!inputGroups.eof()){ if (m->control_pressed) { goodGroupOut.close(); inputGroups.close(); m->mothurRemove(goodNameFile); m->mothurRemove(goodGroupFile); return 0; } inputGroups >> seqName; m->gobble(inputGroups); inputGroups >> group; it = badSeqGroups.find(seqName); if(it != badSeqGroups.end()){ badSeqGroups.erase(it); } else{ goodGroupOut << seqName << '\t' << group << endl; } m->gobble(inputGroups); } inputGroups.close(); goodGroupOut.close(); //we were unable to remove some of the bad sequences if (badSeqGroups.size() != 0) { for (it = badSeqGroups.begin(); it != badSeqGroups.end(); it++) { m->mothurOut("Your groupfile does not include the sequence " + it->first + " please correct."); m->mothurOutEndLine(); } } } return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenNameGroupFile"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::getSummaryReport(){ try { vector startPosition; vector endPosition; vector seqLength; vector ambigBases; vector longHomoPolymer; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { #endif //read summary file ifstream in; m->openInputFile(summaryfile, in); m->getline(in); string name; int start, end, length, ambigs, polymer, numReps; while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } //seqname start end nbases ambigs polymer numSeqs in >> name >> 
start >> end >> length >> ambigs >> polymer >> numReps; m->gobble(in); int num = 1; if ((namefile != "") || (countfile !="")) { //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(name); if (it == nameMap.end()) { m->mothurOut("[ERROR]: " + name + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents for (int i = 0; i < num; i++) { startPosition.push_back(start); endPosition.push_back(end); seqLength.push_back(length); ambigBases.push_back(ambigs); longHomoPolymer.push_back(polymer); } } in.close(); sort(startPosition.begin(), startPosition.end()); sort(endPosition.begin(), endPosition.end()); sort(seqLength.begin(), seqLength.end()); sort(ambigBases.begin(), ambigBases.end()); sort(longHomoPolymer.begin(), longHomoPolymer.end()); //numSeqs is the number of unique seqs, startPosition.size() is the total number of seqs, we want to optimize using all seqs int criteriaPercentile = int(startPosition.size() * (criteria / (float) 100)); for (int i = 0; i < optimize.size(); i++) { if (optimize[i] == "start") { startPos = startPosition[criteriaPercentile]; m->mothurOut("Optimizing start to " + toString(startPos) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "end") { int endcriteriaPercentile = int(endPosition.size() * ((100 - criteria) / (float) 100)); endPos = endPosition[endcriteriaPercentile]; m->mothurOut("Optimizing end to " + toString(endPos) + "."); m->mothurOutEndLine();} else if (optimize[i] == "maxambig") { maxAmbig = ambigBases[criteriaPercentile]; m->mothurOut("Optimizing maxambig to " + toString(maxAmbig) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxhomop") { maxHomoP = longHomoPolymer[criteriaPercentile]; m->mothurOut("Optimizing maxhomop to " + toString(maxHomoP) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "minlength") { int mincriteriaPercentile = 
int(seqLength.size() * ((100 - criteria) / (float) 100)); minLength = seqLength[mincriteriaPercentile]; m->mothurOut("Optimizing minlength to " + toString(minLength) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxlength") { maxLength = seqLength[criteriaPercentile]; m->mothurOut("Optimizing maxlength to " + toString(maxLength) + "."); m->mothurOutEndLine(); } } #ifdef USE_MPI } MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&startPos, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&endPos, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxAmbig, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxHomoP, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&minLength, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxLength, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); } }else { MPI_Recv(&startPos, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&endPos, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxAmbig, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxHomoP, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&minLength, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxLength, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "getSummaryReport"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::optimizeContigs(){ try { vector olengths; vector oStarts; vector oEnds; vector numMismatches; vector numNs; vector positions; vector contigsLines; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFilePerLine(contigsreport, processors); for (int i = 0; i < 
(positions.size()-1); i++) { contigsLines.push_back(linePair(positions[i], positions[(i+1)])); } #else if(processors == 1){ contigsLines.push_back(linePair(0, 1000)); } else { int numContigsSeqs = 0; positions = m->setFilePosEachLine(contigsreport, numContigsSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numContigsSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numContigsSeqs - i * numSeqsPerProcessor; } contigsLines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { driverContigsSummary(olengths, oStarts, oEnds, numMismatches, numNs, contigsLines[0]); #else createProcessesContigsSummary(olengths, oStarts, oEnds, numMismatches, numNs, contigsLines); if (m->control_pressed) { return 0; } #endif sort(olengths.begin(), olengths.end()); sort(oStarts.begin(), oStarts.end()); sort(oEnds.begin(), oEnds.end()); sort(numMismatches.begin(), numMismatches.end()); sort(numNs.begin(), numNs.end()); //numSeqs is the number of unique seqs, startPosition.size() is the total number of seqs, we want to optimize using all seqs int criteriaPercentile = int(oStarts.size() * (criteria / (float) 100)); for (int i = 0; i < optimize.size(); i++) { if (optimize[i] == "ostart") { oStart = oStarts[criteriaPercentile]; m->mothurOut("Optimizing ostart to " + toString(oStart) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "oend") { int endcriteriaPercentile = int(oEnds.size() * ((100 - criteria) / (float) 100)); oEnd = oEnds[endcriteriaPercentile]; m->mothurOut("Optimizing oend to " + toString(oEnd) + "."); m->mothurOutEndLine();} else if (optimize[i] == "mismatches") { mismatches = numMismatches[criteriaPercentile]; m->mothurOut("Optimizing mismatches to " + toString(mismatches) 
+ "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxn") { maxN = numNs[criteriaPercentile]; m->mothurOut("Optimizing maxn to " + toString(maxN) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "minoverlap") { int mincriteriaPercentile = int(olengths.size() * ((100 - criteria) / (float) 100)); minOverlap = olengths[mincriteriaPercentile]; m->mothurOut("Optimizing minoverlap to " + toString(minOverlap) + "."); m->mothurOutEndLine(); } } #ifdef USE_MPI } MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&minOverlap, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&oStart, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&oEnd, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&mismatches, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxN, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); } }else { MPI_Recv(&minOverlap, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&oStart, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&oEnd, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&mismatches, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxN, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "optimizeContigs"); exit(1); } } /**************************************************************************************/ int ScreenSeqsCommand::driverContigsSummary(vector& oLength, vector& ostartPosition, vector& oendPosition, vector& omismatches, vector& numNs, linePair filePos) { try { string name; //Name Length Overlap_Length Overlap_Start Overlap_End MisMatches Num_Ns int length, OLength, thisOStart, thisOEnd, numMisMatches, numns; ifstream in; m->openInputFile(contigsreport, in); in.seekg(filePos.start); if (filePos.start == 0) { //read headers 
m->zapGremlins(in); m->gobble(in); m->getline(in); m->gobble(in); } bool done = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); return 1; } //seqname start end nbases ambigs polymer numSeqs in >> name >> length >> OLength >> thisOStart >> thisOEnd >> numMisMatches >> numns; m->gobble(in); int num = 1; if ((namefile != "") || (countfile !="")){ //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(name); if (it == nameMap.end()) { m->mothurOut("[ERROR]: " + name + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents for (int i = 0; i < num; i++) { ostartPosition.push_back(thisOStart); oendPosition.push_back(thisOEnd); oLength.push_back(OLength); omismatches.push_back(numMisMatches); numNs.push_back(numns); } count++; //if((count) % 100 == 0){ m->mothurOut("Optimizing sequence: " + toString(count)); m->mothurOutEndLine(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); return count; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "driverContigsSummary"); exit(1); } } /**************************************************************************************************/ int ScreenSeqsCommand::createProcessesContigsSummary(vector& oLength, vector& ostartPosition, vector& oendPosition, vector& omismatches, vector& numNs, vector contigsLines) { try { int process = 1; int num = 0; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line 
number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverContigsSummary(oLength, ostartPosition, oendPosition, omismatches, numNs, contigsLines[process]); //pass numSeqs to parent ofstream out; string tempFile = contigsreport + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << ostartPosition.size() << endl; for (int k = 0; k < ostartPosition.size(); k++) { out << ostartPosition[k] << '\t'; } out << endl; for (int k = 0; k < oendPosition.size(); k++) { out << oendPosition[k] << '\t'; } out << endl; for (int k = 0; k < oLength.size(); k++) { out << oLength[k] << '\t'; } out << endl; for (int k = 0; k < omismatches.size(); k++) { out << omismatches[k] << '\t'; } out << endl; for (int k = 0; k < numNs.size(); k++) { out << numNs[k] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(contigsreport + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(contigsreport + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); vector positions; contigsLines.clear(); positions = m->divideFilePerLine(contigsreport, processors); for (int i = 0; i < (positions.size()-1); i++) { contigsLines.push_back(linePair(positions[i], positions[(i+1)])); } //redo file divide lines.clear(); positions.clear(); positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } ostartPosition.clear(); oendPosition.clear(); oLength.clear(); omismatches.clear(); numNs.clear(); num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverContigsSummary(oLength, ostartPosition, oendPosition, omismatches, numNs, contigsLines[process]); //pass numSeqs to parent ofstream out; string tempFile = contigsreport + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << ostartPosition.size() << endl; for (int k = 0; k < ostartPosition.size(); k++) { out << ostartPosition[k] << '\t'; } out << endl; for (int k = 0; k < oendPosition.size(); k++) { out << oendPosition[k] << '\t'; } out << endl; for (int k = 0; k < oLength.size(); k++) { out << oLength[k] << '\t'; } out << endl; for (int k = 0; k < omismatches.size(); k++) { out << omismatches[k] << '\t'; } out << endl; for (int k = 0; k < numNs.size(); k++) { out << numNs[k] << '\t'; } out << endl; out.close(); exit(0); }else { 
m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driverContigsSummary(oLength, ostartPosition, oendPosition, omismatches, numNs, contigsLines[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFilename, in); int temp, tempNum; in >> tempNum; m->gobble(in); num += tempNum; in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; ostartPosition.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; oendPosition.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; oLength.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; omismatches.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; numNs.push_back(temp); } m->gobble(in); in.close(); m->mothurRemove(tempFilename); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the seqSumData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// /* vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; icount; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } for (int k = 0; k < pDataArray[i]->ostartPosition.size(); k++) { ostartPosition.push_back(pDataArray[i]->ostartPosition[k]); } for (int k = 0; k < pDataArray[i]->oendPosition.size(); k++) { oendPosition.push_back(pDataArray[i]->oendPosition[k]); } for (int k = 0; k < pDataArray[i]->oLength.size(); k++) { oLength.push_back(pDataArray[i]->oLength[k]); } for (int k = 0; k < pDataArray[i]->omismatches.size(); k++) { omismatches.push_back(pDataArray[i]->omismatches[k]); } for (int k = 0; k < pDataArray[i]->numNs.size(); k++) { numNs.push_back(pDataArray[i]->numNs[k]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } */ #endif return num; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "createProcessesContigsSummary"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::optimizeAlign(){ try { vector sims; vector scores; vector inserts; vector positions; vector alignLines; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFilePerLine(alignreport, processors); for (int i = 0; i < (positions.size()-1); i++) { alignLines.push_back(linePair(positions[i], positions[(i+1)])); } #else if(processors == 1){ alignLines.push_back(linePair(0, 1000)); } else { int numAlignSeqs = 0; positions = m->setFilePosEachLine(alignreport, numAlignSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numAlignSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numAlignSeqs - i * numSeqsPerProcessor; } alignLines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { 
driverAlignSummary(sims, scores, inserts, alignLines[0]); #else createProcessesAlignSummary(sims, scores, inserts, alignLines); if (m->control_pressed) { return 0; } #endif sort(sims.begin(), sims.end()); sort(scores.begin(), scores.end()); sort(inserts.begin(), inserts.end()); //numSeqs is the number of unique seqs, sims.size() is the total number of seqs, we want to optimize using all seqs int criteriaPercentile = int(sims.size() * (criteria / (float) 100)); for (int i = 0; i < optimize.size(); i++) { if (optimize[i] == "minsim") { int mincriteriaPercentile = int(sims.size() * ((100 - criteria) / (float) 100)); minSim = sims[mincriteriaPercentile]; m->mothurOut("Optimizing minsim to " + toString(minSim) + "."); m->mothurOutEndLine();} else if (optimize[i] == "minscore") { int mincriteriaPercentile = int(scores.size() * ((100 - criteria) / (float) 100)); minScore = scores[mincriteriaPercentile]; m->mothurOut("Optimizing minscore to " + toString(minScore) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxinsert") { maxInsert = inserts[criteriaPercentile]; m->mothurOut("Optimizing maxinsert to " + toString(maxInsert) + "."); m->mothurOutEndLine(); } } #ifdef USE_MPI } MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&minSim, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&minScore, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxInsert, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); } }else { MPI_Recv(&minSim, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&minScore, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxInsert, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "optimizeAlign"); exit(1); } } 
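// Illustrative sketch only, not part of mothur. The optimize functions above pick a
// cutoff by sorting the collected values and indexing at int(size * (criteria / 100)).
// The helper below shows that arithmetic in isolation. The name percentileValue is
// hypothetical, and clamping the index to the last element is an added assumption so
// that criteria == 100 cannot read one past the end of the vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the value at the given percentile of `values`.
// `criteria` is a percentage in [0, 100], as in the optimize functions above.
inline int percentileValue(std::vector<int> values, float criteria) {
    assert(!values.empty());
    assert(criteria >= 0.0f && criteria <= 100.0f);

    // Work on a sorted copy so the caller's vector is left untouched.
    std::sort(values.begin(), values.end());

    // Same index arithmetic as the code above: truncate size * fraction.
    std::size_t idx = static_cast<std::size_t>(values.size() * (criteria / 100.0f));

    // Assumption added for safety: clamp so criteria == 100 maps to the last element.
    idx = std::min(idx, values.size() - 1);
    return values[idx];
}
```

// A "min" style cutoff (minsim, minscore, minlength) would call the helper with
// (100 - criteria) instead of criteria, matching the mincriteriaPercentile lines above.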
/**************************************************************************************/ int ScreenSeqsCommand::driverAlignSummary(vector& sims, vector& scores, vector& inserts, linePair filePos) { try { string name, TemplateName, SearchMethod, AlignmentMethod; //QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template //checking for minScore, maxInsert, minSim int length, TemplateLength, QueryStart, QueryEnd, TemplateStart, TemplateEnd, PairwiseAlignmentLength, GapsInQuery, GapsInTemplate, LongestInsert; float SearchScore, SimBtwnQueryTemplate; ifstream in; m->openInputFile(alignreport, in); in.seekg(filePos.start); if (filePos.start == 0) { //read headers m->zapGremlins(in); m->gobble(in); m->getline(in); m->gobble(in); } bool done = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); return 1; } in >> name >> length >> TemplateName >> TemplateLength >> SearchMethod >> SearchScore >> AlignmentMethod >> QueryStart >> QueryEnd >> TemplateStart >> TemplateEnd >> PairwiseAlignmentLength >> GapsInQuery >> GapsInTemplate >> LongestInsert >> SimBtwnQueryTemplate; m->gobble(in); int num = 1; if ((namefile != "") || (countfile !="")){ //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(name); if (it == nameMap.end()) { m->mothurOut("[ERROR]: " + name + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents for (int i = 0; i < num; i++) { sims.push_back(SimBtwnQueryTemplate); scores.push_back(SearchScore); inserts.push_back(LongestInsert); } count++; //if((count) % 100 == 0){ m->mothurOut("Optimizing sequence: " + toString(count)); m->mothurOutEndLine(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || 
(__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); return count; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "driverAlignSummary"); exit(1); } } /**************************************************************************************************/ int ScreenSeqsCommand::createProcessesAlignSummary(vector& sims, vector& scores, vector& inserts, vector alignLines) { try { int process = 1; int num = 0; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverAlignSummary(sims, scores, inserts, alignLines[process]); //pass numSeqs to parent ofstream out; string tempFile = alignreport + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << sims.size() << endl; for (int k = 0; k < sims.size(); k++) { out << sims[k] << '\t'; } out << endl; for (int k = 0; k < scores.size(); k++) { out << scores[k] << '\t'; } out << endl; for (int k = 0; k < inserts.size(); k++) { out << inserts[k] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(contigsreport + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(contigsreport + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); vector positions; alignLines.clear(); positions = m->divideFilePerLine(alignreport, processors); for (int i = 0; i < (positions.size()-1); i++) { alignLines.push_back(linePair(positions[i], positions[(i+1)])); } //redo file divide lines.clear(); positions.clear(); positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } sims.clear(); scores.clear(); inserts.clear(); num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverAlignSummary(sims, scores, inserts, alignLines[process]); //pass numSeqs to parent ofstream out; string tempFile = alignreport + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << sims.size() << endl; for (int k = 0; k < sims.size(); k++) { out << sims[k] << '\t'; } out << endl; for (int k = 0; k < scores.size(); k++) { out << scores[k] << '\t'; } out << endl; for (int k = 0; k < inserts.size(); k++) { out << inserts[k] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driverAlignSummary(sims, scores, inserts, alignLines[0]); //force parent to wait until all the processes are done for (int 
i=0;iopenInputFile(tempFilename, in); int temp, tempNum; float temp2; in >> tempNum; m->gobble(in); num += tempNum; in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp2; sims.push_back(temp2); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp2; scores.push_back(temp2); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; inserts.push_back(temp); } m->gobble(in); in.close(); m->mothurRemove(tempFilename); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the seqSumData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// /* vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; icount; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } for (int k = 0; k < pDataArray[i]->sims.size(); k++) { sims.push_back(pDataArray[i]->sims[k]); } for (int k = 0; k < pDataArray[i]->scores.size(); k++) { scores.push_back(pDataArray[i]->scores[k]); } for (int k = 0; k < pDataArray[i]->inserts.size(); k++) { inserts.push_back(pDataArray[i]->inserts[k]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } */ #endif return num; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "createProcessesAlignSummary"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::getSummary(vector& positions){ try { vector startPosition; vector endPosition; vector seqLength; vector ambigBases; vector longHomoPolymer; vector numNs; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else if(processors == 1){ lines.push_back(linePair(0, 1000)); } else { int numFastaSeqs = 0; positions = m->setFilePosFasta(fastafile, numFastaSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numFastaSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { linePair tempLine(0, positions[positions.size()-1]); driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, numNs, fastafile, tempLine); #else int numSeqs = 0; //#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || 
(__linux__) || (__unix__) || (__unix) if(processors == 1){ numSeqs = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, numNs, fastafile, lines[0]); }else{ numSeqs = createProcessesCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, numNs, fastafile); } if (m->control_pressed) { return 0; } #endif sort(startPosition.begin(), startPosition.end()); sort(endPosition.begin(), endPosition.end()); sort(seqLength.begin(), seqLength.end()); sort(ambigBases.begin(), ambigBases.end()); sort(longHomoPolymer.begin(), longHomoPolymer.end()); sort(numNs.begin(), numNs.end()); //numSeqs is the number of unique seqs, startPosition.size() is the total number of seqs, we want to optimize using all seqs int criteriaPercentile = int(startPosition.size() * (criteria / (float) 100)); for (int i = 0; i < optimize.size(); i++) { if (optimize[i] == "start") { startPos = startPosition[criteriaPercentile]; m->mothurOut("Optimizing start to " + toString(startPos) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "end") { int endcriteriaPercentile = int(endPosition.size() * ((100 - criteria) / (float) 100)); endPos = endPosition[endcriteriaPercentile]; m->mothurOut("Optimizing end to " + toString(endPos) + "."); m->mothurOutEndLine();} else if (optimize[i] == "maxambig") { maxAmbig = ambigBases[criteriaPercentile]; m->mothurOut("Optimizing maxambig to " + toString(maxAmbig) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxhomop") { maxHomoP = longHomoPolymer[criteriaPercentile]; m->mothurOut("Optimizing maxhomop to " + toString(maxHomoP) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "minlength") { int mincriteriaPercentile = int(seqLength.size() * ((100 - criteria) / (float) 100)); minLength = seqLength[mincriteriaPercentile]; m->mothurOut("Optimizing minlength to " + toString(minLength) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxlength") { maxLength = 
seqLength[criteriaPercentile]; m->mothurOut("Optimizing maxlength to " + toString(maxLength) + "."); m->mothurOutEndLine(); } else if (optimize[i] == "maxn") { maxN = numNs[criteriaPercentile]; m->mothurOut("Optimizing maxn to " + toString(maxN) + "."); m->mothurOutEndLine(); } } #ifdef USE_MPI } MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&startPos, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&endPos, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxAmbig, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxHomoP, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&minLength, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxLength, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); MPI_Send(&maxN, 1, MPI_INT, i, 2001, MPI_COMM_WORLD); } }else { MPI_Recv(&startPos, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&endPos, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxAmbig, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxHomoP, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&minLength, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxLength, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); MPI_Recv(&maxN, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status); } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "getSummary"); exit(1); } } /**************************************************************************************/ int ScreenSeqsCommand::driverCreateSummary(vector& startPosition, vector& endPosition, vector& seqLength, vector& ambigBases, vector& longHomoPolymer, vector& numNs, string filename, linePair filePos) { try { ifstream in; m->openInputFile(filename, in); in.seekg(filePos.start); //adjust start if null strings if (filePos.start == 0) { m->zapGremlins(in); m->gobble(in); 
} m->gobble(in); bool done = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); return 1; } Sequence current(in); m->gobble(in); if (current.getName() != "") { int num = 1; if ((namefile != "") || (countfile !="")){ //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(current.getName()); if (it == nameMap.end()) { m->mothurOut("[ERROR]: " + current.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents int numns = current.getNumNs(); for (int i = 0; i < num; i++) { startPosition.push_back(current.getStartPos()); endPosition.push_back(current.getEndPos()); seqLength.push_back(current.getNumBases()); ambigBases.push_back(current.getAmbigBases()); longHomoPolymer.push_back(current.getLongHomoPolymer()); numNs.push_back(numns); } count++; } //if((count) % 100 == 0){ m->mothurOut("Optimizing sequence: " + toString(count)); m->mothurOutEndLine(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); return count; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "driverCreateSummary"); exit(1); } } /**************************************************************************************************/ int ScreenSeqsCommand::createProcessesCreateSummary(vector& startPosition, vector& endPosition, vector& seqLength, vector& ambigBases, vector& longHomoPolymer, vector& numNs, string filename) { try { int process = 1; int num = 0; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { 
processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, numNs, fastafile, lines[process]); //pass numSeqs to parent ofstream out; string tempFile = fastafile + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << startPosition.size() << endl; for (int k = 0; k < startPosition.size(); k++) { out << startPosition[k] << '\t'; } out << endl; for (int k = 0; k < endPosition.size(); k++) { out << endPosition[k] << '\t'; } out << endl; for (int k = 0; k < seqLength.size(); k++) { out << seqLength[k] << '\t'; } out << endl; for (int k = 0; k < ambigBases.size(); k++) { out << ambigBases[k] << '\t'; } out << endl; for (int k = 0; k < longHomoPolymer.size(); k++) { out << longHomoPolymer[k] << '\t'; } out << endl; for (int k = 0; k < numNs.size(); k++) { out << numNs[k] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(contigsreport + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(contigsreport + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); vector positions; lines.clear(); positions = m->divideFile(fastafile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } startPosition.clear(); endPosition.clear(); seqLength.clear(); ambigBases.clear(); longHomoPolymer.clear(); numNs.clear(); num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, numNs, fastafile, lines[process]); //pass numSeqs to parent ofstream out; string tempFile = fastafile + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << startPosition.size() << endl; for (int k = 0; k < startPosition.size(); k++) { out << startPosition[k] << '\t'; } out << endl; for (int k = 0; k < endPosition.size(); k++) { out << endPosition[k] << '\t'; } out << endl; for (int k = 0; k < seqLength.size(); k++) { out << seqLength[k] << '\t'; } out << endl; for (int k = 0; k < ambigBases.size(); k++) { out << ambigBases[k] << '\t'; } out << endl; for (int k = 0; k < longHomoPolymer.size(); k++) { out << longHomoPolymer[k] << '\t'; } out << endl; for (int k = 0; k < numNs.size(); k++) { out << numNs[k] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < 
processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, numNs, fastafile, lines[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFilename, in); int temp, tempNum; in >> tempNum; m->gobble(in); num += tempNum; in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; startPosition.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; endPosition.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; seqLength.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; ambigBases.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; longHomoPolymer.push_back(temp); } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; numNs.push_back(temp); } m->gobble(in); in.close(); m->mothurRemove(tempFilename); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the seqSumData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; icount; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } for (int k = 0; k < pDataArray[i]->startPosition.size(); k++) { startPosition.push_back(pDataArray[i]->startPosition[k]); } for (int k = 0; k < pDataArray[i]->endPosition.size(); k++) { endPosition.push_back(pDataArray[i]->endPosition[k]); } for (int k = 0; k < pDataArray[i]->seqLength.size(); k++) { seqLength.push_back(pDataArray[i]->seqLength[k]); } for (int k = 0; k < pDataArray[i]->ambigBases.size(); k++) { ambigBases.push_back(pDataArray[i]->ambigBases[k]); } for (int k = 0; k < pDataArray[i]->longHomoPolymer.size(); k++) { longHomoPolymer.push_back(pDataArray[i]->longHomoPolymer[k]); } for (int k = 0; k < pDataArray[i]->numNs.size(); k++) { numNs.push_back(pDataArray[i]->numNs[k]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif return num; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "createProcessesCreateSummary"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::screenGroupFile(map badSeqNames){ try { ifstream inputGroups; m->openInputFile(groupfile, inputGroups); string seqName, group; map::iterator it; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string goodGroupFile = getOutputFileName("group", variables); outputNames.push_back(goodGroupFile); outputTypes["group"].push_back(goodGroupFile); ofstream goodGroupOut; m->openOutputFile(goodGroupFile, goodGroupOut); while(!inputGroups.eof()){ if (m->control_pressed) { goodGroupOut.close(); inputGroups.close(); m->mothurRemove(goodGroupFile); return 0; } inputGroups >> seqName; m->gobble(inputGroups); inputGroups >> group; m->gobble(inputGroups); it = badSeqNames.find(seqName); if(it != badSeqNames.end()){ badSeqNames.erase(it); } else{ goodGroupOut << seqName << '\t' << group << endl; } } if (m->control_pressed) { goodGroupOut.close(); 
inputGroups.close(); m->mothurRemove(goodGroupFile); return 0; } //we were unable to remove some of the bad sequences if (badSeqNames.size() != 0) { for (it = badSeqNames.begin(); it != badSeqNames.end(); it++) { m->mothurOut("Your groupfile does not include the sequence " + it->first + " please correct."); m->mothurOutEndLine(); } } inputGroups.close(); goodGroupOut.close(); if (m->control_pressed) { m->mothurRemove(goodGroupFile); } return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenGroupFile"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::screenCountFile(map badSeqNames){ try { ifstream in; m->openInputFile(countfile, in); map::iterator it; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string goodCountFile = getOutputFileName("count", variables); outputNames.push_back(goodCountFile); outputTypes["count"].push_back(goodCountFile); ofstream goodCountOut; m->openOutputFile(goodCountFile, goodCountOut); string headers = m->getline(in); m->gobble(in); goodCountOut << headers << endl; string test = headers; vector pieces = m->splitWhiteSpace(test); string name, rest; int thisTotal; rest = ""; while (!in.eof()) { if (m->control_pressed) { goodCountOut.close(); in.close(); m->mothurRemove(goodCountFile); return 0; } in >> name; m->gobble(in); in >> thisTotal; if (pieces.size() > 2) { rest = m->getline(in); m->gobble(in); } it = badSeqNames.find(name); if(it != badSeqNames.end()){ badSeqNames.erase(it); } else{ goodCountOut << name << '\t' << thisTotal << '\t' << rest << endl; } } if (m->control_pressed) { goodCountOut.close(); in.close(); m->mothurRemove(goodCountFile); return 0; } //we were unable to remove some of the bad sequences if (badSeqNames.size() != 0) { for (it = badSeqNames.begin(); it != badSeqNames.end(); it++) { 
m->mothurOut("Your count file does not include the sequence " + it->first + " please correct."); m->mothurOutEndLine(); } } in.close(); goodCountOut.close(); //check for groups that have been eliminated CountTable ct; if (ct.testGroups(goodCountFile)) { ct.readTable(goodCountFile, true, false); ct.printTable(goodCountFile); } if (m->control_pressed) { m->mothurRemove(goodCountFile); } return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenCountFile"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::screenTaxonomy(map badSeqNames){ try { ifstream input; m->openInputFile(taxonomy, input); string seqName, tax; map::iterator it; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(taxonomy)); variables["[extension]"] = m->getExtension(taxonomy); string goodTaxFile = getOutputFileName("taxonomy", variables); outputNames.push_back(goodTaxFile); outputTypes["taxonomy"].push_back(goodTaxFile); ofstream goodTaxOut; m->openOutputFile(goodTaxFile, goodTaxOut); while(!input.eof()){ if (m->control_pressed) { goodTaxOut.close(); input.close(); m->mothurRemove(goodTaxFile); return 0; } input >> seqName; m->gobble(input); input >> tax; it = badSeqNames.find(seqName); if(it != badSeqNames.end()){ badSeqNames.erase(it); } else{ goodTaxOut << seqName << '\t' << tax << endl; } m->gobble(input); } if (m->control_pressed) { goodTaxOut.close(); input.close(); m->mothurRemove(goodTaxFile); return 0; } //we were unable to remove some of the bad sequences if (badSeqNames.size() != 0) { for (it = badSeqNames.begin(); it != badSeqNames.end(); it++) { m->mothurOut("Your taxonomy file does not include the sequence " + it->first + " please correct."); m->mothurOutEndLine(); } } input.close(); goodTaxOut.close(); if (m->control_pressed) { m->mothurRemove(goodTaxFile); return 0; } return 0; } catch(exception& e) { m->errorOut(e, 
"ScreenSeqsCommand", "screenTaxonomy"); exit(1); } } //*************************************************************************************************************** int ScreenSeqsCommand::screenQual(map badSeqNames){ try { ifstream in; m->openInputFile(qualfile, in); map::iterator it; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(qualfile)); variables["[extension]"] = m->getExtension(qualfile); string goodQualFile = getOutputFileName("qfile", variables); outputNames.push_back(goodQualFile); outputTypes["qfile"].push_back(goodQualFile); ofstream goodQual; m->openOutputFile(goodQualFile, goodQual); while(!in.eof()){ if (m->control_pressed) { goodQual.close(); in.close(); m->mothurRemove(goodQualFile); return 0; } string saveName = ""; string name = ""; string scores = ""; in >> name; if (name.length() != 0) { saveName = name.substr(1); while (!in.eof()) { char c = in.get(); if (c == 10 || c == 13 || c == -1){ break; } else { name += c; } } m->gobble(in); } while(in){ char letter= in.get(); if(letter == '>'){ in.putback(letter); break; } else{ scores += letter; } } m->gobble(in); it = badSeqNames.find(saveName); if(it != badSeqNames.end()){ badSeqNames.erase(it); }else{ goodQual << name << endl << scores; } m->gobble(in); } in.close(); goodQual.close(); //we were unable to remove some of the bad sequences if (badSeqNames.size() != 0) { for (it = badSeqNames.begin(); it != badSeqNames.end(); it++) { m->mothurOut("Your qual file does not include the sequence " + it->first + " please correct."); m->mothurOutEndLine(); } } if (m->control_pressed) { m->mothurRemove(goodQualFile); return 0; } return 0; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "screenQual"); exit(1); } } //********************************************************************************************************************** int ScreenSeqsCommand::driver(linePair filePos, string goodFName, string badAccnosFName, string filename, map& badSeqNames){ try 
{ ofstream goodFile; m->openOutputFile(goodFName, goodFile); ofstream badAccnosFile; m->openOutputFile(badAccnosFName, badAccnosFile); ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(filePos.start); bool done = false; int count = 0; while (!done) { if (m->control_pressed) { return 0; } Sequence currSeq(inFASTA); m->gobble(inFASTA); if (currSeq.getName() != "") { bool goodSeq = 1; // innocent until proven guilty string trashCode = ""; //have the report files found you bad map::iterator it = badSeqNames.find(currSeq.getName()); if (it != badSeqNames.end()) { goodSeq = 0; trashCode = it->second; } if (summaryfile == "") { //summaryfile includes these so no need to check again if(startPos != -1 && startPos < currSeq.getStartPos()) { goodSeq = 0; trashCode += "start|"; } if(endPos != -1 && endPos > currSeq.getEndPos()) { goodSeq = 0; trashCode += "end|";} if(maxAmbig != -1 && maxAmbig < currSeq.getAmbigBases()) { goodSeq = 0; trashCode += "ambig|";} if(maxHomoP != -1 && maxHomoP < currSeq.getLongHomoPolymer()) { goodSeq = 0; trashCode += "homop|";} if(minLength != -1 && minLength > currSeq.getNumBases()) { goodSeq = 0; trashCode += "= filePos.end)) { break; } #else if (inFASTA.eof()) { break; } #endif //report progress if((count) % 100 == 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count)+"\n"); } } //report progress if((count) % 100 != 0){ m->mothurOutJustToScreen("Processing sequence: " + toString(count)+"\n"); } goodFile.close(); inFASTA.close(); badAccnosFile.close(); return count; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "driver"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int ScreenSeqsCommand::driverMPI(int start, int num, MPI_File& inMPI, MPI_File& goodFile, MPI_File& badAccnosFile, vector& MPIPos, map& badSeqNames){ try { string outputString = ""; MPI_Status statusGood; MPI_Status statusBadAccnos; 
MPI_Status status; int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are for(int i=0;icontrol_pressed) { return 0; } //read next sequence int length = MPIPos[start+i+1] - MPIPos[start+i]; char* buf4 = new char[length]; MPI_File_read_at(inMPI, MPIPos[start+i], buf4, length, MPI_CHAR, &status); string tempBuf = buf4; delete buf4; if (tempBuf.length() > length) { tempBuf = tempBuf.substr(0, length); } istringstream iss (tempBuf,istringstream::in); Sequence currSeq(iss); //process seq if (currSeq.getName() != "") { bool goodSeq = 1; // innocent until proven guilty string trashCode = ""; //have the report files found you bad map::iterator it = badSeqNames.find(currSeq.getName()); if (it != badSeqNames.end()) { goodSeq = 0; trashCode = it->second; } if (summaryfile == "") { //summaryfile includes these so no need to check again if(startPos != -1 && startPos < currSeq.getStartPos()) { goodSeq = 0; trashCode += "start|"; } if(endPos != -1 && endPos > currSeq.getEndPos()) { goodSeq = 0; trashCode += "end|";} if(maxAmbig != -1 && maxAmbig < currSeq.getAmbigBases()) { goodSeq = 0; trashCode += "ambig|";} if(maxHomoP != -1 && maxHomoP < currSeq.getLongHomoPolymer()) { goodSeq = 0; trashCode += "homop|";} if(minLength != -1 && minLength > currSeq.getNumBases()) { goodSeq = 0; trashCode += "mothurOutJustToScreen("Processing sequence: " + toString(i)+"\n"); } } return 1; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "driverMPI"); exit(1); } } #endif /**************************************************************************************************/ int ScreenSeqsCommand::createProcesses(string goodFileName, string badAccnos, string filename, map& badSeqNames) { try { vector processIDS; int process = 1; int num = 0; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) 
{ processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], goodFileName + m->mothurGetpid(process) + ".temp", badAccnos + m->mothurGetpid(process) + ".temp", filename, badSeqNames); //pass numSeqs to parent ofstream out; string tempFile = filename + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(filename + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide lines.clear(); vector positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(lines[process], goodFileName + m->mothurGetpid(process) + ".temp", badAccnos + m->mothurGetpid(process) + ".temp", filename, badSeqNames); //pass numSeqs to parent ofstream out; string tempFile = filename 
+ m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } num = driver(lines[0], goodFileName, badAccnos, filename, badSeqNames); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; num += tempNum; } in.close(); m->mothurRemove(tempFile); m->appendFiles((goodFileName + toString(processIDS[i]) + ".temp"), goodFileName); m->mothurRemove((goodFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((badAccnos + toString(processIDS[i]) + ".temp"), badAccnos); m->mothurRemove((badAccnos + toString(processIDS[i]) + ".temp")); } //read badSeqs in because root process doesnt know what other "bad" seqs the children found ifstream inBad; int ableToOpen = m->openInputFile(badAccnos, inBad, "no error"); if (ableToOpen == 0) { badSeqNames.clear(); string tempName, trashCode; while (!inBad.eof()) { inBad >> tempName >> trashCode; m->gobble(inBad); badSeqNames[tempName] = trashCode; } inBad.close(); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the sumScreenData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to badSeqNames. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. 
for( int i=0; icount; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. \n"); m->control_pressed = true; } for (map::iterator it = pDataArray[i]->badSeqNames.begin(); it != pDataArray[i]->badSeqNames.end(); it++) { badSeqNames[it->first] = it->second; } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } for (int i = 0; i < processIDS.size(); i++) { m->appendFiles((goodFileName + toString(processIDS[i]) + ".temp"), goodFileName); m->mothurRemove((goodFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((badAccnos + toString(processIDS[i]) + ".temp"), badAccnos); m->mothurRemove((badAccnos + toString(processIDS[i]) + ".temp")); } #endif return num; } catch(exception& e) { m->errorOut(e, "ScreenSeqsCommand", "createProcesses"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/commands/screenseqscommand.h000066400000000000000000000422321255543666200221250ustar00rootroot00000000000000#ifndef SCREENSEQSCOMMAND_H #define SCREENSEQSCOMMAND_H /* * screenseqscommand.h * Mothur * * Created by Pat Schloss on 6/3/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "sequence.hpp"

class ScreenSeqsCommand : public Command {

public:
    ScreenSeqsCommand(string);
    ScreenSeqsCommand();
    ~ScreenSeqsCommand() {}

    vector<string> setParameters();
    string getCommandName()     { return "screen.seqs"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation()    { return "http://www.mothur.org/wiki/Screen.seqs"; }
    string getDescription() { return "enables you to keep sequences that fulfill certain user defined criteria"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    vector<linePair> lines;

    int screenNameGroupFile(map<string, string>);
    int screenGroupFile(map<string, string>);
    int screenCountFile(map<string, string>);
    int screenAlignReport(map<string, string>&);
    int screenQual(map<string, string>);
    int screenTaxonomy(map<string, string>);

    int optimizeContigs();
    int optimizeAlign();
    int driver(linePair, string, string, string, map<string, string>&);
    int createProcesses(string, string, string, map<string, string>&);
    int screenSummary(map<string, string>&);
    int screenContigs(map<string, string>&);
    int runFastaScreening(map<string, string>&);
    int screenFasta(map<string, string>&);
    int screenReports(map<string, string>&);
    int getSummary(vector<unsigned long long>&);
    int createProcessesCreateSummary(vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<int>&, string);
    int driverCreateSummary(vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<int>&, string, linePair);
    int getSummaryReport();
    int driverContigsSummary(vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<int>&, linePair);
    int createProcessesContigsSummary(vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<int>&, vector<linePair>);
    int driverAlignSummary(vector<float>&, vector<float>&, vector<int>&, linePair);
    int createProcessesAlignSummary(vector<float>&, vector<float>&, vector<int>&, vector<linePair>);

#ifdef USE_MPI
    int driverMPI(int, int, MPI_File&, MPI_File&, MPI_File&, vector<unsigned long long>&, map<string, string>&);
#endif

    bool abort;
    string fastafile, namefile, groupfile, alignreport, outputDir, qualfile, taxonomy, countfile, contigsreport, summaryfile;
    int startPos, endPos, maxAmbig, maxHomoP, minLength, maxLength, processors, minOverlap, oStart, oEnd, mismatches, maxN, maxInsert;
    float
minSim, minScore, criteria;
    vector<string> outputNames;
    vector<string> optimize;
    map<string, int> nameMap;
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct sumData {
    vector<int> startPosition;
    vector<int> endPosition;
    vector<int> seqLength;
    vector<int> ambigBases;
    vector<int> longHomoPolymer;
    vector<int> numNs;
    string filename, namefile, countfile;
    unsigned long long start;
    unsigned long long end;
    int count;
    MothurOut* m;
    map<string, int> nameMap;

    sumData(){}
    sumData(string f, MothurOut* mout, unsigned long long st, unsigned long long en, string nf, string cf, map<string, int> nam) {
        filename = f;
        namefile = nf;
        countfile = cf;
        m = mout;
        start = st;
        end = en;
        nameMap = nam;
        count = 0;
    }
};
/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct contigsSumData {
    vector<int> ostartPosition;
    vector<int> oendPosition;
    vector<int> oLength;
    vector<int> omismatches;
    vector<int> numNs;
    string filename, namefile, countfile;
    unsigned long long start;
    unsigned long long end;
    int count;
    MothurOut* m;
    map<string, int> nameMap;

    contigsSumData(){}
    contigsSumData(string f, MothurOut* mout, unsigned long long st, unsigned long long en, string nf, string cf, map<string, int> nam) {
        filename = f;
        namefile = nf;
        countfile = cf;
        m = mout;
        start = st;
        end = en;
        nameMap = nam;
        count = 0;
    }
};
/**************************************************************************************************/
struct alignsData {
    vector<float> sims;
    vector<float> scores;
    vector<int> inserts;
    string filename, namefile, countfile;
    unsigned long long start;
    unsigned long long end;
    int count;
    MothurOut* m;
    map<string, int> nameMap;

    alignsData(){}
    alignsData(string f, MothurOut* mout, unsigned long long st, unsigned long long en, string nf, string cf, map<string, int> nam) {
        filename = f;
        namefile = nf;
        countfile = cf;
        m = mout;
        start = st;
        end = en;
        nameMap = nam;
        count = 0;
    }
};
/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct sumScreenData { int startPos, endPos, maxAmbig, maxHomoP, minLength, maxLength, maxN; unsigned long long start; unsigned long long end; int count; MothurOut* m; string goodFName, badAccnosFName, filename; map badSeqNames; string summaryfile, contigsreport; sumScreenData(){} sumScreenData(int s, int e, int a, int h, int minl, int maxl, int mn, map bs, string f, string sum, string cont, MothurOut* mout, unsigned long long st, unsigned long long en, string gf, string bf) { startPos = s; endPos = e; minLength = minl; maxLength = maxl; maxAmbig = a; maxHomoP = h; maxN = mn; filename = f; goodFName = gf; badAccnosFName = bf; m = mout; start = st; end = en; summaryfile = sum; contigsreport = cont; badSeqNames = bs; count = 0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MySumThreadFunction(LPVOID lpParam){ sumData* pDataArray; pDataArray = (sumData*)lpParam; try { ifstream in; pDataArray->m->openInputFile(pDataArray->filename, in); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->zapGremlins(in); }else { //this accounts for the difference in line endings. 
in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process pDataArray->count++; if (pDataArray->m->control_pressed) { in.close(); pDataArray->count = 1; return 1; } Sequence current(in); pDataArray->m->gobble(in); if (current.getName() != "") { int num = 1; if ((pDataArray->namefile != "") || (pDataArray->countfile !="")){ //make sure this sequence is in the namefile, else error map::iterator it = pDataArray->nameMap.find(current.getName()); if (it == pDataArray->nameMap.end()) { pDataArray->m->mothurOut("[ERROR]: " + current.getName() + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents int numns = current.getNumNs(); for (int i = 0; i < num; i++) { pDataArray->startPosition.push_back(current.getStartPos()); pDataArray->endPosition.push_back(current.getEndPos()); pDataArray->seqLength.push_back(current.getNumBases()); pDataArray->ambigBases.push_back(current.getAmbigBases()); pDataArray->longHomoPolymer.push_back(current.getLongHomoPolymer()); pDataArray->numNs.push_back(numns); } } } in.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ScreenSeqsCommand", "MySumThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MyContigsSumThreadFunction(LPVOID lpParam){ contigsSumData* pDataArray; pDataArray = (contigsSumData*)lpParam; try { string name; //Name Length Overlap_Length Overlap_Start Overlap_End MisMatches Num_Ns int length, OLength, thisOStart, thisOEnd, numMisMatches, numns; ifstream in; pDataArray->m->openInputFile(pDataArray->filename, in); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->getline(in); pDataArray->m->gobble(in); }else { 
//this accounts for the difference in line endings. in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process pDataArray->count++; if (pDataArray->m->control_pressed) { in.close(); pDataArray->count = 1; return 1; } //seqname start end nbases ambigs polymer numSeqs in >> name >> length >> OLength >> thisOStart >> thisOEnd >> numMisMatches >> numns; pDataArray->m->gobble(in); int num = 1; if ((pDataArray->namefile != "") || (pDataArray->countfile !="")){ //make sure this sequence is in the namefile, else error map::iterator it = pDataArray->nameMap.find(name); if (it == pDataArray->nameMap.end()) { pDataArray->m->mothurOut("[ERROR]: " + name + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents for (int i = 0; i < num; i++) { pDataArray->ostartPosition.push_back(thisOStart); pDataArray->oendPosition.push_back(thisOEnd); pDataArray->oLength.push_back(OLength); pDataArray->omismatches.push_back(numMisMatches); pDataArray->numNs.push_back(numns); } } in.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ScreenSeqsCommand", "MyContigsThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MyAlignsThreadFunction(LPVOID lpParam){ alignsData* pDataArray; pDataArray = (alignsData*)lpParam; try { string name, TemplateName, SearchMethod, AlignmentMethod; //QueryName QueryLength TemplateName TemplateLength SearchMethod SearchScore AlignmentMethod QueryStart QueryEnd TemplateStart TemplateEnd PairwiseAlignmentLength GapsInQuery GapsInTemplate LongestInsert SimBtwnQuery&Template //checking for minScore, maxInsert, minSim int length, TemplateLength, QueryStart, QueryEnd, TemplateStart, TemplateEnd, PairwiseAlignmentLength, GapsInQuery, 
GapsInTemplate, LongestInsert; float SearchScore, SimBtwnQueryTemplate; ifstream in; pDataArray->m->openInputFile(pDataArray->filename, in); //print header if you are process 0 if ((pDataArray->start == 0) || (pDataArray->start == 1)) { in.seekg(0); pDataArray->m->zapGremlins(in); pDataArray->m->getline(in); pDataArray->m->gobble(in); }else { //this accounts for the difference in line endings. in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process pDataArray->count++; if (pDataArray->m->control_pressed) { in.close(); pDataArray->count = 1; return 1; } in >> name >> length >> TemplateName >> TemplateLength >> SearchMethod >> SearchScore >> AlignmentMethod >> QueryStart >> QueryEnd >> TemplateStart >> TemplateEnd >> PairwiseAlignmentLength >> GapsInQuery >> GapsInTemplate >> LongestInsert >> SimBtwnQueryTemplate; pDataArray->m->gobble(in); //cout << i << '\t' << name << endl; int num = 1; if ((pDataArray->namefile != "") || (pDataArray->countfile !="")){ //make sure this sequence is in the namefile, else error map::iterator it = pDataArray->nameMap.find(name); if (it == pDataArray->nameMap.end()) { pDataArray->m->mothurOut("[ERROR]: " + name + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents for (int i = 0; i < num; i++) { pDataArray->sims.push_back(SimBtwnQueryTemplate); pDataArray->scores.push_back(SearchScore); pDataArray->inserts.push_back(LongestInsert); } } in.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ScreenSeqsCommand", "MyAlignsThreadFunction"); exit(1); } } /**************************************************************************************************/ static DWORD WINAPI MySumScreenThreadFunction(LPVOID lpParam){ sumScreenData* pDataArray; pDataArray = (sumScreenData*)lpParam; try { ofstream 
goodFile;
        pDataArray->m->openOutputFile(pDataArray->goodFName, goodFile);

        ofstream badAccnosFile;
        pDataArray->m->openOutputFile(pDataArray->badAccnosFName, badAccnosFile);

        ifstream in;
        pDataArray->m->openInputFile(pDataArray->filename, in);

        //print header if you are process 0
        if ((pDataArray->start == 0) || (pDataArray->start == 1)) {
            in.seekg(0);
            pDataArray->m->zapGremlins(in);
        }else { //this accounts for the difference in line endings.
            in.seekg(pDataArray->start-1);
            pDataArray->m->gobble(in);
        }

        for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process

            pDataArray->count++;

            if (pDataArray->m->control_pressed) { in.close(); badAccnosFile.close(); goodFile.close(); pDataArray->count = 1; return 1; }

            Sequence currSeq(in); pDataArray->m->gobble(in);

            if (currSeq.getName() != "") {
                bool goodSeq = 1; // innocent until proven guilty
                string trashCode = "";

                //have the report files found you bad
                map<string, string>::iterator it = pDataArray->badSeqNames.find(currSeq.getName());
                if (it != pDataArray->badSeqNames.end()) { goodSeq = 0; trashCode = it->second; } //found it

                if (pDataArray->summaryfile == "") {
                    if(pDataArray->startPos != -1 && pDataArray->startPos < currSeq.getStartPos()) { goodSeq = 0; trashCode += "start|"; }
                    if(pDataArray->endPos != -1 && pDataArray->endPos > currSeq.getEndPos()) { goodSeq = 0; trashCode += "end|"; }
                    if(pDataArray->maxAmbig != -1 && pDataArray->maxAmbig < currSeq.getAmbigBases()) { goodSeq = 0; trashCode += "ambig|"; }
                    if(pDataArray->maxHomoP != -1 && pDataArray->maxHomoP < currSeq.getLongHomoPolymer()) { goodSeq = 0; trashCode += "homop|"; }
                    if(pDataArray->minLength != -1 && pDataArray->minLength > currSeq.getNumBases()) { goodSeq = 0; trashCode += "<length|"; }
                    if(pDataArray->maxLength != -1 && pDataArray->maxLength < currSeq.getNumBases()) { goodSeq = 0; trashCode += ">length|"; }
                }

                if (pDataArray->contigsreport == "") { //contigs report includes this so no need to check again
                    if(pDataArray->maxN != -1 && pDataArray->maxN < currSeq.getNumNs()) {
                        goodSeq = 0;
trashCode += "n|"; } } if(goodSeq == 1){ currSeq.printSequence(goodFile); } else{ badAccnosFile << currSeq.getName() << '\t' << trashCode.substr(0, trashCode.length()-1) << endl; pDataArray->badSeqNames[currSeq.getName()] = trashCode; } } //report progress if((i+1) % 100 == 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(i+1)+"\n"); } } //report progress if((pDataArray->count) % 100 != 0){ pDataArray->m->mothurOutJustToScreen("Processing sequence: " + toString(pDataArray->count)+"\n"); } in.close(); goodFile.close(); badAccnosFile.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ScreenSeqsCommand", "MySumScreenThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/secondarystructurecommand.cpp000066400000000000000000000423701255543666200242600ustar00rootroot00000000000000/* * secondarystructurecommand.cpp * Mothur * * Created by westcott on 9/18/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "secondarystructurecommand.h" #include "sequence.hpp" #include "counttable.h" //********************************************************************************************************************** vector AlignCheckCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","aligncheck",false,true,true); parameters.push_back(pfasta); CommandParameter pmap("map", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(pmap); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","",false,false); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","",false,false); parameters.push_back(pcount); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "AlignCheckCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string AlignCheckCommand::getHelpString(){ try { string helpString = ""; helpString += "The align.check command reads a fasta file and map file as well as an optional name or count file.\n"; helpString += "It outputs a file containing the secondary structure matches in the .align.check file.\n"; helpString += "The align.check command parameters are fasta and map, both are required.\n"; helpString += "The align.check command should be in the following format: align.check(fasta=yourFasta, map=yourMap).\n"; helpString += "Example 
align.check(map=silva.ss.map, fasta=amazon.fasta).\n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "AlignCheckCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string AlignCheckCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "aligncheck") { pattern = "[filename],align.check"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "AlignCheckCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** AlignCheckCommand::AlignCheckCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["aligncheck"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "AlignCheckCommand", "AlignCheckCommand"); exit(1); } } //********************************************************************************************************************** AlignCheckCommand::AlignCheckCommand(string option) { try { abort = false; calledHelp = false; haderror = 0; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["aligncheck"] = 
tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("map"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["map"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
                    if (path == "") { parameters["count"] = inputDir + it->second; }
                }
            }

            //check for required parameters
            mapfile = validParameter.validFile(parameters, "map", true);
            if (mapfile == "not open") { abort = true; }
            else if (mapfile == "not found") {
                mapfile = "";
                m->mothurOut("You must provide a map file."); m->mothurOutEndLine();
                abort = true;
            }

            fastafile = validParameter.validFile(parameters, "fasta", true);
            if (fastafile == "not open") { fastafile = ""; abort = true; }
            else if (fastafile == "not found") {
                fastafile = m->getFastaFile();
                if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; }
            }else { m->setFastaFile(fastafile); }

            namefile = validParameter.validFile(parameters, "name", true);
            if (namefile == "not open") { namefile = ""; abort = true; }
            else if (namefile == "not found") { namefile = ""; }
            else { m->setNameFile(namefile); }

            countfile = validParameter.validFile(parameters, "count", true);
            if (countfile == "not open") { abort = true; countfile = ""; }
            else if (countfile == "not found") { countfile = ""; }
            else { m->setCountTableFile(countfile); }

            if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; }

            //if the user changes the output directory command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){
                outputDir = "";
                outputDir += m->hasPath(fastafile); //if user entered a file with a path then preserve it
            }

            if (countfile == "") {
                if ((namefile == "") && (fastafile != "")){
                    vector<string> files; files.push_back(fastafile);
                    parser.getNameFile(files);
                }
            }
        }

    }
    catch(exception& e) {
        m->errorOut(e, "AlignCheckCommand", "AlignCheckCommand");
        exit(1);
    }
}
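// A possible refactoring sketch, not part of mothur: the constructor above repeats
// the same "no path given, so prepend inputdir" steps for fasta, map, name and count.
// Under the assumption that m->hasPath() returns the directory portion of a file name
// (or "" when there is none), the four blocks could be folded into one loop.
// hasPathSketch() and applyInputDir() are hypothetical names introduced only for this
// illustration, and the sketch uses C++11 syntax, which the surrounding code does not require.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for m->hasPath(): returns the directory part of a
// file name (including the trailing separator), or "" when there is none.
static std::string hasPathSketch(const std::string& file) {
    std::string::size_type pos = file.find_last_of("/\\");
    return (pos == std::string::npos) ? "" : file.substr(0, pos + 1);
}

// Prepend inputDir to every listed parameter that was supplied without a path.
// Parameters that were not supplied at all are left absent from the map.
static void applyInputDir(std::map<std::string, std::string>& parameters,
                          const std::vector<std::string>& fileParams,
                          const std::string& inputDir) {
    for (const std::string& key : fileParams) {
        std::map<std::string, std::string>::iterator it = parameters.find(key);
        if (it == parameters.end()) { continue; }   // user did not give this parameter
        if (hasPathSketch(it->second) == "") {      // bare file name, so add inputdir
            it->second = inputDir + it->second;
        }
    }
}
```

// Usage, under the same assumptions: applyInputDir(parameters, {"fasta", "map", "name", "count"}, inputDir);
// would stand in for the four find/hasPath blocks, leaving entries that already carry a path untouched.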
//********************************************************************************************************************** int AlignCheckCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //get secondary structure info. readMap(); if (namefile != "") { nameMap = m->readNames(namefile); } else if (countfile != "") { CountTable ct; ct.readTable(countfile, false, false); nameMap = ct.getNameMap(); } if (m->control_pressed) { return 0; } ifstream in; m->openInputFile(fastafile, in); ofstream out; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); string outfile = getOutputFileName("aligncheck",variables); m->openOutputFile(outfile, out); out << "name" << '\t' << "pound" << '\t' << "dash" << '\t' << "plus" << '\t' << "equal" << '\t'; out << "loop" << '\t' << "tilde" << '\t' << "total" << '\t' << "numseqs" << endl; vector pound; vector dash; vector plus; vector equal; vector loop; vector tilde; vector total; int count = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outfile); return 0; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { statData data = getStats(seq.getAligned()); if (haderror == 1) { m->control_pressed = true; break; } int num = 1; if ((namefile != "") || (countfile != "")) { //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(seq.getName()); if (it == nameMap.end()) { m->mothurOut("[ERROR]: " + seq.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } //for each sequence this sequence represents for (int i = 0; i < num; i++) { pound.push_back(data.pound); dash.push_back(data.dash); plus.push_back(data.plus); equal.push_back(data.equal); loop.push_back(data.loop); tilde.push_back(data.tilde); total.push_back(data.total); } count++; out << seq.getName() << '\t' << data.pound << '\t' << data.dash << '\t' 
<< data.plus << '\t' << data.equal << '\t'; out << data.loop << '\t' << data.tilde << '\t' << data.total << '\t' << num << endl; } } in.close(); out.close(); if (m->control_pressed) { m->mothurRemove(outfile); return 0; } sort(pound.begin(), pound.end()); sort(dash.begin(), dash.end()); sort(plus.begin(), plus.end()); sort(equal.begin(), equal.end()); sort(loop.begin(), loop.end()); sort(tilde.begin(), tilde.end()); sort(total.begin(), total.end()); int size = pound.size(); int ptile0_25 = int(size * 0.025); int ptile25 = int(size * 0.250); int ptile50 = int(size * 0.500); int ptile75 = int(size * 0.750); int ptile97_5 = int(size * 0.975); int ptile100 = size - 1; if (m->control_pressed) { m->mothurRemove(outfile); return 0; } m->mothurOutEndLine(); m->mothurOut("\t\tPound\tDash\tPlus\tEqual\tLoop\tTilde\tTotal"); m->mothurOutEndLine(); m->mothurOut("Minimum:\t" + toString(pound[0]) + "\t" + toString(dash[0]) + "\t" + toString(plus[0]) + "\t" + toString(equal[0]) + "\t" + toString(loop[0]) + "\t" + toString(tilde[0]) + "\t" + toString(total[0])); m->mothurOutEndLine(); m->mothurOut("2.5%-tile:\t" + toString(pound[ptile0_25]) + "\t" + toString(dash[ptile0_25]) + "\t" + toString(plus[ptile0_25]) + "\t" + toString(equal[ptile0_25]) + "\t"+ toString(loop[ptile0_25]) + "\t"+ toString(tilde[ptile0_25]) + "\t"+ toString(total[ptile0_25])); m->mothurOutEndLine(); m->mothurOut("25%-tile:\t" + toString(pound[ptile25]) + "\t" + toString(dash[ptile25]) + "\t" + toString(plus[ptile25]) + "\t" + toString(equal[ptile25]) + "\t" + toString(loop[ptile25]) + "\t" + toString(tilde[ptile25]) + "\t" + toString(total[ptile25])); m->mothurOutEndLine(); m->mothurOut("Median: \t" + toString(pound[ptile50]) + "\t" + toString(dash[ptile50]) + "\t" + toString(plus[ptile50]) + "\t" + toString(equal[ptile50]) + "\t" + toString(loop[ptile50]) + "\t" + toString(tilde[ptile50]) + "\t" + toString(total[ptile50])); m->mothurOutEndLine(); m->mothurOut("75%-tile:\t" + toString(pound[ptile75]) + "\t" + 
toString(dash[ptile75]) + "\t" + toString(plus[ptile75]) + "\t" + toString(equal[ptile75]) + "\t" + toString(loop[ptile75]) + "\t" + toString(tilde[ptile75]) + "\t" + toString(total[ptile75])); m->mothurOutEndLine(); m->mothurOut("97.5%-tile:\t" + toString(pound[ptile97_5]) + "\t" + toString(dash[ptile97_5]) + "\t" + toString(plus[ptile97_5]) + "\t" + toString(equal[ptile97_5]) + "\t" + toString(loop[ptile97_5]) + "\t" + toString(tilde[ptile97_5]) + "\t" + toString(total[ptile97_5])); m->mothurOutEndLine(); m->mothurOut("Maximum:\t" + toString(pound[ptile100]) + "\t" + toString(dash[ptile100]) + "\t" + toString(plus[ptile100]) + "\t" + toString(equal[ptile100]) + "\t" + toString(loop[ptile100]) + "\t" + toString(tilde[ptile100]) + "\t" + toString(total[ptile100])); m->mothurOutEndLine(); if ((namefile == "") && (countfile == "")) { m->mothurOut("# of Seqs:\t" + toString(count)); m->mothurOutEndLine(); } else { m->mothurOut("# of unique seqs:\t" + toString(count)); m->mothurOutEndLine(); m->mothurOut("total # of seqs:\t" + toString(size)); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outfile); m->mothurOutEndLine(); outputNames.push_back(outfile); outputTypes["aligncheck"].push_back(outfile); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "AlignCheckCommand", "execute"); exit(1); } } //********************************************************************************************************************** void AlignCheckCommand::readMap(){ try { structMap.resize(1, 0); ifstream in; m->openInputFile(mapfile, in); while(!in.eof()){ int position; in >> position; structMap.push_back(position); m->gobble(in); } in.close(); seqLength = structMap.size(); //check you make sure is structMap[10] = 380 then structMap[380] = 10. 
        for(int i=0;i<seqLength;i++){
            if(structMap[i] != 0){
                if(structMap[structMap[i]] != i){
                    m->mothurOut("Your map file contains an error: line " + toString(i) + " does not match line " + toString(structMap[i]) + "."); m->mothurOutEndLine();
                }
            }
        }

    }
    catch(exception& e) {
        m->errorOut(e, "AlignCheckCommand", "readMap");
        exit(1);
    }
}
/**************************************************************************************************/

statData AlignCheckCommand::getStats(string sequence){
    try {

        statData data;
        sequence = "*" + sequence; // need to pad the sequence so we can index it by 1

        int length = sequence.length();

        if (length != seqLength) { m->mothurOut("your sequences are " + toString(length) + " long, but your map file only contains " + toString(seqLength) + " entries. please correct."); m->mothurOutEndLine(); haderror = 1; return data; }

        for(int i=1;i<length;i++){
            /* ... */
        }

        return data;
    }
    catch(exception& e) {
        m->errorOut(e, "AlignCheckCommand", "getStats");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/secondarystructurecommand.h000066400000000000000000000031731255543666200237230ustar00rootroot00000000000000
#ifndef SECONDARYSTRUCTURECHECKERCOMMAND_H
#define SECONDARYSTRUCTURECHECKERCOMMAND_H

/*
 *  secondarystructurecommand.h
 *  Mothur
 *
 *  Created by westcott on 9/18/09.
 *  Copyright 2009 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"

/**************************************************************************************************/

struct statData {
    int pound;
    int tilde;
    int dash;
    int plus;
    int equal;
    int loop;
    int total;
    statData() : pound(0), tilde(0), dash(0), plus(0), equal(0), loop(0), total(0) {}
};

/**************************************************************************************************/

class AlignCheckCommand : public Command {

public:
    AlignCheckCommand(string);
    AlignCheckCommand();
    ~AlignCheckCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "align.check"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Align.check"; }
    string getDescription() { return "calculate the number of potentially misaligned bases in a 16S rRNA gene sequence alignment"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    vector<int> structMap;
    string mapfile, fastafile, outputDir, namefile, countfile;
    bool abort;
    int seqLength, haderror;
    vector<string> outputNames;
    map<string, int> nameMap;

    void readMap();
    statData getStats(string sequence);
};

/**************************************************************************************************/
#endif

mothur-1.36.1/source/commands/sensspeccommand.cpp000066400000000000000000000670071255543666200221370ustar00rootroot00000000000000/*
 *  sensspeccommand.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 7/6/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "sensspeccommand.h"

//**********************************************************************************************************************
vector<string> SensSpecCommand::setParameters(){
    try {
        CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","sensspec",false,true,true); parameters.push_back(plist);
        CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "none","",false,false); parameters.push_back(pphylip);
        CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumn", "PhylipColumn", "none","",false,false); parameters.push_back(pcolumn);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
        CommandParameter pcutoff("cutoff", "Number", "", "-1.00", "", "", "","",false,false); parameters.push_back(pcutoff);
        CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision);
        CommandParameter phard("hard", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(phard);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string SensSpecCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The sens.spec command....\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "getHelpString");
        exit(1);
    }
}
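// The help string above does not say what the command reports. As a rough orientation only,
// the sketch below shows the textbook definitions of sensitivity and specificity computed from
// four confusion-matrix counts. It is an assumption that sens.spec reports values of this kind;
// how SensSpecCommand derives its counts from the list and distance files is not shown in this
// part of the source. ConfusionCounts, sensitivity() and specificity() are hypothetical names.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical container for the four confusion-matrix counts.
struct ConfusionCounts {
    long truePos;
    long trueNeg;
    long falsePos;
    long falseNeg;
};

// Sensitivity = TP / (TP + FN). Returns 0 when there are no positives to detect.
static double sensitivity(const ConfusionCounts& c) {
    long denom = c.truePos + c.falseNeg;
    return (denom == 0) ? 0.0 : static_cast<double>(c.truePos) / denom;
}

// Specificity = TN / (TN + FP). Returns 0 when there are no negatives to reject.
static double specificity(const ConfusionCounts& c) {
    long denom = c.trueNeg + c.falsePos;
    return (denom == 0) ? 0.0 : static_cast<double>(c.trueNeg) / denom;
}
```

// Returning 0 for an empty denominator is a choice made for this sketch, not something taken from mothur.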
//**********************************************************************************************************************
string SensSpecCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "sensspec") { pattern = "[filename],sensspec"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
SensSpecCommand::SensSpecCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["sensspec"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "SensSpecCommand");
        exit(1);
    }
}
//***************************************************************************************************************
SensSpecCommand::SensSpecCommand(string option) {
    try {
        abort = false; calledHelp = false;
        allLines = 1;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}
        else {
            string temp;
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;

            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //initialize outputTypes
            vector<string> tempOutNames;
            outputTypes["sensspec"] = tempOutNames;

            //if the user changes the input directory command factory will send this info to us in the output parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {
                string path;
                it = parameters.find("list");
                //user has given a list file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["list"] = inputDir + it->second; }
                }

                it = parameters.find("phylip");
                //user has given a phylip file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["phylip"] = inputDir + it->second; }
                }

                it = parameters.find("column");
                //user has given a column file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["column"] = inputDir + it->second; }
                }
            }

            //check for required parameters
            listFile = validParameter.validFile(parameters, "list", true);
            if (listFile == "not found") {
                listFile = m->getListFile();
                if (listFile != "") { m->mothurOut("Using " + listFile + " as input file for the list parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current list file and the list parameter is required."); m->mothurOutEndLine(); abort = true; }
            }
            else if (listFile == "not open") { abort = true; }
            else { m->setListFile(listFile); }

            phylipfile = validParameter.validFile(parameters, "phylip", true);
            if (phylipfile == "not found") { phylipfile = ""; }
            else if (phylipfile == "not open") { abort = true; }
            else { distFile = phylipfile; format = "phylip"; m->setPhylipFile(phylipfile); }

            columnfile = validParameter.validFile(parameters, "column", true);
            if (columnfile == "not found") { columnfile = ""; }
            else if (columnfile == "not open") { abort = true; }
            else { distFile = columnfile; format = "column"; m->setColumnFile(columnfile); }

            if ((phylipfile == "") && (columnfile == "")) {
                //is there a current file available for either of these?
                //give priority to column, then phylip
                columnfile = m->getColumnFile();
                if (columnfile != "") { distFile = columnfile; format = "column"; m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); }
                else {
                    phylipfile = m->getPhylipFile();
                    if (phylipfile != "") { distFile = phylipfile; format = "phylip"; m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); }
                    else {
                        m->mothurOut("No valid current files. You must provide a phylip or column file."); m->mothurOutEndLine();
                        abort = true;
                    }
                }
            }else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When executing a sens.spec command you must enter ONLY ONE of the following: phylip or column."); m->mothurOutEndLine(); abort = true; }

            //if the user changes the output directory command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){
                outputDir = "";
                outputDir += m->hasPath(listFile); //if user entered a file with a path then preserve it
            }

            //check for optional parameter and set defaults
            // ...at some point should add some additional type checking...
            temp = validParameter.validFile(parameters, "hard", false);
            if (temp == "not found"){ hard = 0; }
            else if(!m->isTrue(temp)) { hard = 0; }
            else if(m->isTrue(temp)) { hard = 1; }

            temp = validParameter.validFile(parameters, "cutoff", false);
            if (temp == "not found") { temp = "-1.00"; }
            m->mothurConvert(temp, cutoff);
            // cout << cutoff << endl;

            temp = validParameter.validFile(parameters, "precision", false);
            if (temp == "not found") { temp = "100"; }
            m->mothurConvert(temp, precision);
            // cout << precision << endl;

            string label = validParameter.validFile(parameters, "label", false);
            if (label == "not found") { label = ""; }
            else {
                if(label != "all") { m->splitAtDash(label, labels); allLines = 0; }
                else { allLines = 1; }
            }

            map<string, string> variables;
            variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listFile));
            sensSpecFileName = getOutputFileName("sensspec",variables);
        }
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "SensSpecCommand");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::execute(){
    try{
        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        int startTime = time(NULL);

        //create list file with only unique names, saves time and memory by removing redundant names from list file that are not in the distance file.
        string newListFile = preProcessList();
        if (newListFile != "") { listFile = newListFile; }

        setUpOutput();
        outputNames.push_back(sensSpecFileName); outputTypes["sensspec"].push_back(sensSpecFileName);

        if(format == "phylip") { processPhylip(); }
        else if(format == "column") { processColumn(); }

        //remove temp file if created
        if (newListFile != "") { m->mothurRemove(newListFile); }

        if (m->control_pressed) { m->mothurRemove(sensSpecFileName); return 0; }

        m->mothurOut("It took " + toString(time(NULL) - startTime) + " secs to run sens.spec."); m->mothurOutEndLine();

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        m->mothurOut(sensSpecFileName); m->mothurOutEndLine();
        m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "execute");
        exit(1);
    }
}
//***************************************************************************************************************
bool SensSpecCommand::testFile(){
    try{
        ifstream fileHandle;
        m->openInputFile(phylipfile, fileHandle);

        bool square = false;
        string numTest, name;
        fileHandle >> numTest >> name;

        if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); }

        //get() returns an int so that EOF can be told apart from every valid character
        int d;
        while((d=fileHandle.get()) != EOF){
            if(isalnum(d)){ square = true; break; }
            if(d == '\n'){ square = false; break; }
        }
        fileHandle.close();

        return square;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "testFile");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::processPhylip(){
    try{
        //probably need some checking to confirm that the names in the distance matrix are the same as those in the list file
        square = testFile();
        string origCutoff = "";
        bool getCutoff = 0;
        if(cutoff == -1.00) { getCutoff = 1; }
        else { origCutoff = toString(cutoff); cutoff += (0.49 / double(precision)); }

        map<string, int> seqMap;
        string seqList;

        InputData
        input(listFile, "list");
        ListVector* list = input.getListVector();
        string lastLabel = list->getLabel();

        //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label.
        set<string> processedLabels;
        set<string> userLabels = labels;

        while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {

            if(m->control_pressed){
                for (int i = 0; i < outputNames.size(); i++){ m->mothurRemove(outputNames[i]); }
                delete list;
                return 0;
            }

            if(allLines == 1 || labels.count(list->getLabel()) == 1){
                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());

                //process
                fillSeqMap(seqMap, list);
                process(seqMap, list->getLabel(), getCutoff, origCutoff);
            }

            if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = list->getLabel();

                delete list;
                list = input.getListVector(lastLabel);

                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());

                //process
                fillSeqMap(seqMap, list);
                process(seqMap, list->getLabel(), getCutoff, origCutoff);

                //restore real lastlabel to save below
                list->setLabel(saveLabel);
            }

            lastLabel = list->getLabel();

            delete list;
            list = input.getListVector();
        }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (list != NULL) { delete list; }
            list = input.getListVector(lastLabel);

            //process
            fillSeqMap(seqMap, list);
            process(seqMap, list->getLabel(), getCutoff, origCutoff);

            delete list;
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "processPhylip");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::fillSeqMap(map<string, int>& seqMap, ListVector*& list){
    try {
        //for each otu
        for(int i=0;i<list->getNumBins();i++){

            if (m->control_pressed) { return 0; }

            string seqList = list->get(i);
            int seqListLength = seqList.length();
            string seqName = "";

            //parse bin by name, mapping each name to its otu number
            for(int j=0;j<seqListLength;j++){
                if(seqList[j] == ','){
                    seqMap[seqName] = i;
                    seqName = "";
                }
                else{
                    seqName += seqList[j];
                }
            }
            //the last name in the bin has no trailing comma
            seqMap[seqName] = i;
        }
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "fillSeqMap");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::fillSeqPairSet(set<string>& seqPairSet, ListVector*& list){
    try {
        int numSeqs = 0;

        //for each otu
        for(int i=0;i<list->getNumBins();i++){

            if (m->control_pressed) { return 0; }

            vector<string> seqNameVector;
            string bin = list->get(i);
            m->splitAtComma(bin, seqNameVector);

            numSeqs += seqNameVector.size();

            //store every pair of names in this otu as "smallerName\tlargerName"
            for(int j=0;j<seqNameVector.size();j++){
                string seqPairString = "";
                for(int k=0;k<j;k++){
                    if(seqNameVector[j] < seqNameVector[k]) { seqPairString = seqNameVector[j] + '\t' + seqNameVector[k]; }
                    else { seqPairString = seqNameVector[k] + '\t' + seqNameVector[j]; }
                    seqPairSet.insert(seqPairString);
                }
            }
        }

        return numSeqs;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "fillSeqPairSet");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::process(map<string, int>& seqMap, string label, bool& getCutoff, string& origCutoff){
    try {
        int lNumSeqs = seqMap.size();
        int pNumSeqs = 0;

        ifstream phylipFile;
        m->openInputFile(distFile, phylipFile);
        phylipFile >> pNumSeqs;
        if(pNumSeqs != lNumSeqs){ m->mothurOut("numSeq mismatch!\n"); /*m->control_pressed = true;*/ }

        string seqName;
        double distance;
        vector<int> otuIndices(lNumSeqs, -1);

        truePositives = 0;
        falsePositives = 0;
        trueNegatives = 0;
        falseNegatives = 0;

        if(getCutoff == 1){
            if(label != "unique"){
                origCutoff = label;
                convert(label, cutoff);
                if(hard == 0){ cutoff += (0.49 / double(precision)); }
            }
            else{
                origCutoff = "unique";
                cutoff = 0.0000;
            }
        }

        m->mothurOut(label); m->mothurOutEndLine();

        //one matrix row per sequence; the bound matches the size of otuIndices
        for(int i=0;i<lNumSeqs;i++){

            if (m->control_pressed) { return 0; }

            phylipFile >> seqName;
            otuIndices[i] = seqMap[seqName];

            for(int j=0;j<i;j++){
                phylipFile >> distance;

                if(distance <= cutoff){
                    if(otuIndices[i] == otuIndices[j]) { truePositives++; }
                    else { falseNegatives++; }
                }
                else{
                    if(otuIndices[i] == otuIndices[j]) { falsePositives++; }
                    else { trueNegatives++; }
                }
            }

            if (square) { m->getline(phylipFile); } //get rest of line - redundant distances
            m->gobble(phylipFile);
        }
        phylipFile.close();

        outputStatistics(label, origCutoff);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "process");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::process(set<string>& seqPairSet, string label, bool& getCutoff, string& origCutoff, int numSeqs){
    try {
        int numDists = (numSeqs * (numSeqs-1) / 2);

        ifstream columnFile;
        m->openInputFile(distFile, columnFile);
        string seqNameA, seqNameB, seqPairString;
        double distance;

        truePositives = 0;
        falsePositives = 0;
        trueNegatives = numDists;
        falseNegatives = 0;

        if(getCutoff == 1){
            if(label != "unique"){
                origCutoff = label;
                convert(label, cutoff);
                if(hard == 0){ cutoff += (0.49 / double(precision)); }
            }
            else{
                origCutoff = "unique";
                cutoff = 0.0000;
            }
        }

        m->mothurOut(label); m->mothurOutEndLine();

        while(columnFile){
            columnFile >> seqNameA >> seqNameB >> distance;
            if(seqNameA < seqNameB) { seqPairString = seqNameA + '\t' + seqNameB; }
            else { seqPairString = seqNameB + '\t' + seqNameA; }

            set<string>::iterator it = seqPairSet.find(seqPairString);

            if(distance <= cutoff){
                if(it != seqPairSet.end()){
                    truePositives++;
                    seqPairSet.erase(it);
                }
                else{
                    falseNegatives++;
                }
                trueNegatives--;
            }
            else if(it != seqPairSet.end()){
                falsePositives++;
                trueNegatives--;
                seqPairSet.erase(it);
            }
            m->gobble(columnFile);
        }
        falsePositives += seqPairSet.size();

        outputStatistics(label, origCutoff);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "process");
        exit(1);
    }
}
//***************************************************************************************************************
int SensSpecCommand::processColumn(){
    try{
        string origCutoff = "";
        bool getCutoff = 0;
        if(cutoff == -1.00) { getCutoff = 1; }
        else { origCutoff = toString(cutoff); cutoff += (0.49 / double(precision)); }

        set<string> seqPairSet;
        int numSeqs = 0;

        InputData input(listFile, "list");
        ListVector* list = input.getListVector();
        string lastLabel = list->getLabel();

        //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label.
        set<string> processedLabels;
        set<string> userLabels = labels;

        while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {

            if (m->control_pressed) {
                for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
                delete list;
                return 0;
            }

            if(allLines == 1 || labels.count(list->getLabel()) == 1){
                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());

                //process
                numSeqs = fillSeqPairSet(seqPairSet, list);
                process(seqPairSet, list->getLabel(), getCutoff, origCutoff, numSeqs);
            }

            if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = list->getLabel();

                delete list;
                list = input.getListVector(lastLabel);

                processedLabels.insert(list->getLabel());
                userLabels.erase(list->getLabel());

                //process
                numSeqs = fillSeqPairSet(seqPairSet, list);
                process(seqPairSet, list->getLabel(), getCutoff, origCutoff, numSeqs);

                //restore real lastlabel to save below
                list->setLabel(saveLabel);
            }

            lastLabel = list->getLabel();

            delete list;
            list = input.getListVector();
        }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (list != NULL) { delete list; }
            list = input.getListVector(lastLabel);

            //process
            numSeqs = fillSeqPairSet(seqPairSet, list);
            process(seqPairSet, list->getLabel(), getCutoff, origCutoff, numSeqs);

            //free the list only after its label has been read
            delete list;
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "processColumn");
        exit(1);
    }
}
//***************************************************************************************************************
void SensSpecCommand::setUpOutput(){
    try{
        ofstream sensSpecFile;
        m->openOutputFile(sensSpecFileName, sensSpecFile);

        sensSpecFile << "label\tcutoff\ttp\ttn\tfp\tfn\tsensitivity\tspecificity\tppv\tnpv\tfdr\taccuracy\tmcc\tf1score\n";

        sensSpecFile.close();
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "setUpOutput");
        exit(1);
    }
}
//***************************************************************************************************************
void SensSpecCommand::outputStatistics(string label, string cutoff){
    try{
        double tp = (double) truePositives;
        double fp = (double) falsePositives;
        double tn = (double) trueNegatives;
        double fn = (double) falseNegatives;

        double p = tp + fn;
        double n = fp + tn;
        double pPrime = tp + fp;
        double nPrime = tn + fn;

        double sensitivity = tp / p;
        double specificity = tn / n;
        double positivePredictiveValue = tp / pPrime;
        double negativePredictiveValue = tn / nPrime;
        double falseDiscoveryRate = fp / pPrime;

        double accuracy = (tp + tn) / (p + n);
        double matthewsCorrCoef = (tp * tn - fp * fn) / sqrt(p * n * pPrime * nPrime); if(p == 0 || n == 0){ matthewsCorrCoef = 0; }
        double f1Score = 2.0 * tp / (p + pPrime);

        //guard the ratios whose denominators are zero
        if(p == 0) { sensitivity = 0; matthewsCorrCoef = 0; }
        if(n == 0) {
            specificity = 0; matthewsCorrCoef = 0; }
        if(p + n == 0) { accuracy = 0; }
        if(p + pPrime == 0) { f1Score = 0; }
        if(pPrime == 0) { positivePredictiveValue = 0; falseDiscoveryRate = 0; matthewsCorrCoef = 0; }
        if(nPrime == 0) { negativePredictiveValue = 0; matthewsCorrCoef = 0; }

        ofstream sensSpecFile;
        m->openOutputFileAppend(sensSpecFileName, sensSpecFile);

        sensSpecFile << label << '\t' << cutoff << '\t';
        sensSpecFile << truePositives << '\t' << trueNegatives << '\t' << falsePositives << '\t' << falseNegatives << '\t';
        sensSpecFile << setprecision(4);
        sensSpecFile << sensitivity << '\t' << specificity << '\t' << positivePredictiveValue << '\t' << negativePredictiveValue << '\t';
        sensSpecFile << falseDiscoveryRate << '\t' << accuracy << '\t' << matthewsCorrCoef << '\t' << f1Score << endl;

        sensSpecFile.close();
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "outputStatistics");
        exit(1);
    }
}
//***************************************************************************************************************
string SensSpecCommand::preProcessList(){
    try {
        set<string> uniqueNames;

        //get unique names from distance file
        if (format == "phylip") {

            ifstream phylipFile;
            m->openInputFile(distFile, phylipFile);
            string numTest;
            int pNumSeqs;

            phylipFile >> numTest; m->gobble(phylipFile);

            if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); }
            else { m->mothurConvert(numTest, pNumSeqs); }

            string seqName;
            for(int i=0;i<pNumSeqs;i++){
                if (m->control_pressed) { return ""; }
                phylipFile >> seqName; m->getline(phylipFile); m->gobble(phylipFile);
                uniqueNames.insert(seqName);
            }
            phylipFile.close();
        }else {
            ifstream columnFile;
            m->openInputFile(distFile, columnFile);
            string seqNameA, seqNameB;
            double distance;

            while(columnFile){
                if (m->control_pressed) { return ""; }
                columnFile >> seqNameA >> seqNameB >> distance;
                uniqueNames.insert(seqNameA);
                uniqueNames.insert(seqNameB);
                m->gobble(columnFile);
            }
            columnFile.close();
        }
        //read list file, if numSeqs > unique names then remove redundant names
        string newListFile = listFile + ".temp";
        ofstream out;
        m->openOutputFile(newListFile, out);
        ifstream in;
        m->openInputFile(listFile, in);

        bool wroteSomething = false;

        while(!in.eof()){

            if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(newListFile); return ""; }

            //read in list vector
            ListVector list(in);

            //listfile is already unique
            if (list.getNumSeqs() == uniqueNames.size()) { in.close(); out.close(); m->mothurRemove(newListFile); return ""; }

            //make a new list vector
            ListVector newList;
            newList.setLabel(list.getLabel());
            vector<string> binLabels = list.getLabels();
            vector<string> newLabels;

            //for each bin
            for (int i = 0; i < list.getNumBins(); i++) {

                //parse out names that are in the distance file
                string binnames = list.get(i);
                vector<string> bnames;
                m->splitAtComma(binnames, bnames);

                string newNames = "";
                for (int j = 0; j < bnames.size(); j++) {
                    string name = bnames[j];
                    //if that name is in the distance file, add it
                    if (uniqueNames.count(name) != 0) { newNames += name + ","; }
                }

                //if there are names in this bin add to new list
                if (newNames != "") {
                    newNames = newNames.substr(0, newNames.length()-1); //rip off extra comma
                    newList.push_back(newNames);
                    newLabels.push_back(binLabels[i]);
                }
            }

            //print new listvector
            if (newList.getNumBins() != 0) {
                wroteSomething = true;
                newList.setLabels(newLabels);
                if (!m->printedListHeaders) { newList.printHeaders(out); }
                newList.print(out);
            }

            m->gobble(in);
        }
        in.close();
        out.close();

        if (wroteSomething) { return newListFile; }
        else { m->mothurRemove(newListFile); }

        return "";
    }
    catch(exception& e) {
        m->errorOut(e, "SensSpecCommand", "preProcessList");
        exit(1);
    }
}
//***************************************************************************************************************
mothur-1.36.1/source/commands/sensspeccommand.h000066400000000000000000000033071255543666200215750ustar00rootroot00000000000000
#ifndef SENSSPECCOMMAND_H
#define SENSSPECCOMMAND_H

/*
 *  sensspeccommand.h
 *  Mothur
 *
 *  Created by Pat Schloss on 7/6/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "listvector.hpp"
#include "inputdata.h"

class SensSpecCommand : public Command {

public:
    SensSpecCommand(string);
    SensSpecCommand();
    ~SensSpecCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "sens.spec"; }
    string getCommandCategory() { return "OTU-Based Approaches"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "Schloss PD, Westcott SL (2011). Assessing and improving methods used in OTU-based approaches for 16S rRNA gene sequence analysis. Appl Environ Microbiol 77:3219.\nhttp://www.mothur.org/wiki/Sens.spec"; }
    string getDescription() { return "sens.spec"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    int processPhylip();
    int processColumn();
    void setUpOutput();
    void outputStatistics(string, string);

    string listFile, distFile, nameFile, sensSpecFileName, phylipfile, columnfile;
    string outputDir;
    string format;
    vector<string> outputNames;
    set<string> labels; //holds labels to be used

    long int truePositives, falsePositives, trueNegatives, falseNegatives;
    bool abort, allLines, square;
    bool hard;
    //string lineLabel;
    double cutoff;
    int precision;

    int fillSeqMap(map<string, int>&, ListVector*&);
    int fillSeqPairSet(set<string>&, ListVector*&);
    int process(map<string, int>&, string, bool&, string&);
    int process(set<string>&, string, bool&, string&, int);
    string preProcessList();
    bool testFile();
};

#endif
mothur-1.36.1/source/commands/seqerrorcommand.cpp000066400000000000000000001622321255543666200221520ustar00rootroot00000000000000
/*
 *  seqerrorcommand.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 7/15/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "seqerrorcommand.h" #include "reportfile.h" #include "qualityscores.h" #include "refchimeratest.h" //********************************************************************************************************************** vector SeqErrorCommand::setParameters(){ try { CommandParameter pquery("fasta", "InputTypes", "", "", "none", "none", "none","errorType",false,true,true); parameters.push_back(pquery); CommandParameter preference("reference", "InputTypes", "", "", "none", "none", "none","",false,true,true); parameters.push_back(preference); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "none", "QualReport","",false,false); parameters.push_back(pqfile); CommandParameter preport("report", "InputTypes", "", "", "none", "none", "QualReport","",false,false); parameters.push_back(preport); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pignorechimeras("ignorechimeras", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pignorechimeras); CommandParameter pthreshold("threshold", "Number", "", "1.0", "", "", "","",false,false); parameters.push_back(pthreshold); CommandParameter paligned("aligned", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(paligned); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < 
parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SeqErrorCommand::getHelpString(){ try { string helpString = ""; helpString += "The seq.error command reads a query alignment file and a reference alignment file and creates .....\n"; helpString += "The fasta parameter...\n"; helpString += "The reference parameter...\n"; helpString += "The qfile parameter...\n"; helpString += "The report parameter...\n"; helpString += "The name parameter allows you to provide a name file associated with the fasta file.\n"; helpString += "The count parameter allows you to provide a count file associated with the fasta file.\n"; helpString += "The ignorechimeras parameter...\n"; helpString += "The threshold parameter...\n"; helpString += "The processors parameter...\n"; helpString += "Example seq.error(...).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; helpString += "For more details please check out the wiki http://www.mothur.org/wiki/seq.error .\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SeqErrorCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "errorsummary") { pattern = "[filename],error.summary"; } else if (type == "errorseq") { pattern = "[filename],error.seq"; } else if (type == "errorquality") { pattern = "[filename],error.quality"; } else if (type == "errorqualforward") { pattern = "[filename],error.qual.forward"; } else if (type == "errorqualreverse") { pattern = "[filename],error.qual.reverse"; } else if (type == "errorforward") { pattern = "[filename],error.seq.forward"; } else if (type == "errorreverse") { pattern = "[filename],error.seq.reverse"; } else if (type == "errorcount") { pattern = "[filename],error.count"; } else if (type == "errormatrix") { pattern = "[filename],error.matrix"; } else if (type == "errorchimera") { pattern = "[filename],error.chimera"; } else if (type == "errorref-query") { pattern = "[filename],error.ref-query"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SeqErrorCommand::SeqErrorCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["errorsummary"] = tempOutNames; outputTypes["errorseq"] = tempOutNames; outputTypes["errorquality"] = tempOutNames; outputTypes["errorqualforward"] = tempOutNames; outputTypes["errorqualreverse"] = tempOutNames; 
outputTypes["errorforward"] = tempOutNames; outputTypes["errorreverse"] = tempOutNames; outputTypes["errorcount"] = tempOutNames; outputTypes["errormatrix"] = tempOutNames; outputTypes["errorchimera"] = tempOutNames; outputTypes["errorref-query"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "SeqErrorCommand"); exit(1); } } //*************************************************************************************************************** SeqErrorCommand::SeqErrorCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { string temp; vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["errorsummary"] = tempOutNames; outputTypes["errorseq"] = tempOutNames; outputTypes["errorquality"] = tempOutNames; outputTypes["errorqualforward"] = tempOutNames; outputTypes["errorqualreverse"] = tempOutNames; outputTypes["errorforward"] = tempOutNames; outputTypes["errorreverse"] = tempOutNames; outputTypes["errorcount"] = tempOutNames; outputTypes["errormatrix"] = tempOutNames; outputTypes["errorchimera"] = tempOutNames; outputTypes["errorref-query"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path 
= m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("reference"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["reference"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a names file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a names file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a quality score file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("report"); //user has given a alignment report file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["report"] = inputDir + it->second; } } } //check for required parameters queryFileName = validParameter.validFile(parameters, "fasta", true); if (queryFileName == "not found") { queryFileName = m->getFastaFile(); if (queryFileName != "") { m->mothurOut("Using " + queryFileName + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fasta file and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } } else if (queryFileName == "not open") { queryFileName = ""; abort = true; } else { m->setFastaFile(queryFileName); } referenceFileName = validParameter.validFile(parameters, "reference", true); if (referenceFileName == "not found") { m->mothurOut("reference is a required parameter for the seq.error command."); m->mothurOutEndLine(); abort = true; } else if (referenceFileName == "not open") { abort = true; } //check for optional parameters namesFileName = validParameter.validFile(parameters, "name", true); if(namesFileName == "not found"){ namesFileName = ""; } else if (namesFileName == "not open") { namesFileName = ""; abort = true; } else { m->setNameFile(namesFileName); } //check for optional parameters countfile = validParameter.validFile(parameters, "count", true); if(countfile == "not found"){ countfile = ""; } else if (countfile == "not open") { countfile = ""; abort = true; } else { m->setCountTableFile(countfile); } qualFileName = validParameter.validFile(parameters, "qfile", true); if(qualFileName == "not found"){ qualFileName = ""; } else if (qualFileName == "not open") { qualFileName = ""; abort = true; } else { m->setQualFile(qualFileName); } reportFileName = validParameter.validFile(parameters, "report", true); if(reportFileName == "not found"){ reportFileName = ""; } else if (reportFileName == "not open") { reportFileName = ""; abort = true; } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ //if 
user entered a file with a path then preserve it outputDir = m->hasPath(queryFileName); } if ((countfile != "") && (namesFileName != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... temp = validParameter.validFile(parameters, "threshold", false); if (temp == "not found") { temp = "1.00"; } m->mothurConvert(temp, threshold); temp = validParameter.validFile(parameters, "ignorechimeras", false); if (temp == "not found") { temp = "T"; } ignoreChimeras = m->isTrue(temp); temp = validParameter.validFile(parameters, "aligned", false); if (temp == "not found"){ temp = "t"; } aligned = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if ((namesFileName == "") && (queryFileName != "")){ vector files; files.push_back(queryFileName); parser.getNameFile(files); } if(aligned == true){ if((reportFileName != "" && qualFileName == "") || (reportFileName == "" && qualFileName != "")){ m->mothurOut("if you use either a qual file or a report file, you have to have both."); m->mothurOutEndLine(); abort = true; } } else{ if(reportFileName != ""){ m->mothurOut("we are ignoring the report file if your sequences are not aligned. 
we will check that the sequences in your fasta and and qual file are the same length."); m->mothurOutEndLine(); } } } } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "SeqErrorCommand"); exit(1); } } //*************************************************************************************************************** int SeqErrorCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); maxLength = 5000; totalBases = 0; totalMatches = 0; substitutionMatrix.resize(6); for(int i=0;i<6;i++){ substitutionMatrix[i].resize(6,0); } string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(queryFileName)); map variables; variables["[filename]"] = fileNameRoot; string errorSummaryFileName = getOutputFileName("errorsummary",variables); outputNames.push_back(errorSummaryFileName); outputTypes["errorsummary"].push_back(errorSummaryFileName); string errorSeqFileName = getOutputFileName("errorseq",variables); outputNames.push_back(errorSeqFileName); outputTypes["errorseq"].push_back(errorSeqFileName); string errorChimeraFileName = getOutputFileName("errorchimera",variables); outputNames.push_back(errorChimeraFileName); outputTypes["errorchimera"].push_back(errorChimeraFileName); setLines(queryFileName, qualFileName, reportFileName); if (m->control_pressed) { return 0; } getReferences(); //read in reference sequences - make sure there's no ambiguous bases if(namesFileName != "") { weights = getWeights(); } else if (countfile != "") { CountTable ct; ct.readTable(countfile, false, false); weights = ct.getNameMap(); } if (m->control_pressed) { return 0; } int numSeqs = 0; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if(processors == 1){ numSeqs = driver(queryFileName, qualFileName, reportFileName, errorSummaryFileName, errorSeqFileName, errorChimeraFileName, lines[0], qLines[0], rLines[0]); }else{ numSeqs = createProcesses(queryFileName, qualFileName, 
reportFileName, errorSummaryFileName, errorSeqFileName, errorChimeraFileName); } #else numSeqs = driver(queryFileName, qualFileName, reportFileName, errorSummaryFileName, errorSeqFileName, errorChimeraFileName, lines[0], qLines[0], rLines[0]); #endif if(qualFileName != ""){ printErrorQuality(qScoreErrorMap); printQualityFR(qualForwardMap, qualReverseMap); } printErrorFRFile(errorForward, errorReverse); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } string errorCountFileName = getOutputFileName("errorcount",variables); ofstream errorCountFile; m->openOutputFile(errorCountFileName, errorCountFile); outputNames.push_back(errorCountFileName); outputTypes["errorcount"].push_back(errorCountFileName); m->mothurOut("Overall error rate:\t" + toString((double)(totalBases - totalMatches) / (double)totalBases) + "\n"); m->mothurOut("Errors\tSequences\n"); errorCountFile << "Errors\tSequences\n"; for(int i=0;imothurOut(toString(i) + '\t' + toString(misMatchCounts[i]) + '\n'); errorCountFile << i << '\t' << misMatchCounts[i] << endl; } errorCountFile.close(); // if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } printSubMatrix(); string megAlignmentFileName = getOutputFileName("errorref-query",variables); ofstream megAlignmentFile; m->openOutputFile(megAlignmentFileName, megAlignmentFile); outputNames.push_back(megAlignmentFileName); outputTypes["errorref-query"].push_back(megAlignmentFileName); for(int i=0;imothurOut("It took " + toString(time(NULL) - start) + " secs to check " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "execute"); exit(1); } } 
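// ---------------------------------------------------------------------------
// Illustrative sketch, not part of mothur. execute() above reports the overall
// error rate as (totalBases - totalMatches) / totalBases. The helper below
// repeats that arithmetic with a guard for the case where no bases were
// compared at all. The name sketchOverallErrorRate and the decision to return
// 0.0 for an empty input are assumptions made for this example only.
// ---------------------------------------------------------------------------
namespace {
    double sketchOverallErrorRate(long long totalBases, long long totalMatches) {
        // assumption: report 0 instead of dividing by zero when nothing was compared
        if (totalBases <= 0) { return 0.0; }
        return (double)(totalBases - totalMatches) / (double)totalBases;
    }
}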
//********************************************************************************************************************** int SeqErrorCommand::createProcesses(string filename, string qFileName, string rFileName, string summaryFileName, string errorOutputFileName, string chimeraOutputFileName) { try { int process = 1; processIDS.clear(); map<char, vector<int> >::iterator it; int num = 0; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(filename, qFileName, rFileName, summaryFileName + m->mothurGetpid(process) + ".temp", errorOutputFileName+ m->mothurGetpid(process) + ".temp", chimeraOutputFileName + m->mothurGetpid(process) + ".temp", lines[process], qLines[process], rLines[process]); //pass groupCounts to parent ofstream out; string tempFile = filename + m->mothurGetpid(process) + ".info.temp"; m->openOutputFile(tempFile, out); //output totalBases and totalMatches out << num << '\t' << totalBases << '\t' << totalMatches << endl << endl; //output substitutionMatrix for(int i = 0; i < substitutionMatrix.size(); i++) { for (int j = 0; j < substitutionMatrix[i].size(); j++) { out << substitutionMatrix[i][j] << '\t'; } out << endl; } out << endl; //output qScoreErrorMap for (it = qScoreErrorMap.begin(); it != qScoreErrorMap.end(); it++) { vector<int> thisScoreErrorMap = it->second; out << it->first << '\t'; for (int i = 0; i < thisScoreErrorMap.size(); i++) { out << thisScoreErrorMap[i] << '\t'; } out << endl; } out << endl; //output qualForwardMap for(int i = 0; i < qualForwardMap.size(); i++) { for (int j = 0; j < qualForwardMap[i].size(); j++) { out << qualForwardMap[i][j] << '\t'; } out << endl; } out << endl; //output qualReverseMap for(int i = 0; i < 
qualReverseMap.size(); i++) { for (int j = 0; j < qualReverseMap[i].size(); j++) { out << qualReverseMap[i][j] << '\t'; } out << endl; } out << endl; //output errorForward for (it = errorForward.begin(); it != errorForward.end(); it++) { vector thisErrorForward = it->second; out << it->first << '\t'; for (int i = 0; i < thisErrorForward.size(); i++) { out << thisErrorForward[i] << '\t'; } out << endl; } out << endl; //output errorReverse for (it = errorReverse.begin(); it != errorReverse.end(); it++) { vector thisErrorReverse = it->second; out << it->first << '\t'; for (int i = 0; i < thisErrorReverse.size(); i++) { out << thisErrorReverse[i] << '\t'; } out << endl; } out << endl; //output misMatchCounts out << misMatchCounts.size() << endl; for (int j = 0; j < misMatchCounts.size(); j++) { out << misMatchCounts[j] << '\t'; } out << endl; //output megaAlignVector for (int j = 0; j < megaAlignVector.size(); j++) { out << megaAlignVector[j] << endl; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } //do my part num = driver(filename, qFileName, rFileName, summaryFileName, errorOutputFileName, chimeraOutputFileName, lines[0], qLines[0], rLines[0]); //force parent to wait until all the processes are done for (int i=0;imothurOut("Appending files from process " + toString(processIDS[i])); m->mothurOutEndLine(); m->appendFiles((summaryFileName + toString(processIDS[i]) + ".temp"), summaryFileName); m->mothurRemove((summaryFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((errorOutputFileName + toString(processIDS[i]) + ".temp"), errorOutputFileName); m->mothurRemove((errorOutputFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((chimeraOutputFileName + toString(processIDS[i]) + ".temp"), chimeraOutputFileName); m->mothurRemove((chimeraOutputFileName + toString(processIDS[i]) 
+ ".temp")); ifstream in; string tempFile = filename + toString(processIDS[i]) + ".info.temp"; m->openInputFile(tempFile, in); //input totalBases and totalMatches int tempBases, tempMatches, tempNumSeqs; in >> tempNumSeqs >> tempBases >> tempMatches; m->gobble(in); totalBases += tempBases; totalMatches += tempMatches; num += tempNumSeqs; //input substitutionMatrix int tempNum; for(int i = 0; i < substitutionMatrix.size(); i++) { for (int j = 0; j < substitutionMatrix[i].size(); j++) { in >> tempNum; substitutionMatrix[i][j] += tempNum; } m->gobble(in); } m->gobble(in); //input qScoreErrorMap char first; for (int i = 0; i < qScoreErrorMap.size(); i++) { in >> first; vector thisScoreErrorMap = qScoreErrorMap[first]; for (int i = 0; i < thisScoreErrorMap.size(); i++) { in >> tempNum; thisScoreErrorMap[i] += tempNum; } qScoreErrorMap[first] = thisScoreErrorMap; m->gobble(in); } m->gobble(in); //input qualForwardMap for(int i = 0; i < qualForwardMap.size(); i++) { for (int j = 0; j < qualForwardMap[i].size(); j++) { in >> tempNum; qualForwardMap[i][j] += tempNum; } m->gobble(in); } m->gobble(in); //input qualReverseMap for(int i = 0; i < qualReverseMap.size(); i++) { for (int j = 0; j < qualReverseMap[i].size(); j++) { in >> tempNum; qualReverseMap[i][j] += tempNum; } m->gobble(in); } m->gobble(in); //input errorForward for (int i = 0; i < errorForward.size(); i++) { in >> first; vector thisErrorForward = errorForward[first]; for (int i = 0; i < thisErrorForward.size(); i++) { in >> tempNum; thisErrorForward[i] += tempNum; } errorForward[first] = thisErrorForward; m->gobble(in); } m->gobble(in); //input errorReverse for (int i = 0; i < errorReverse.size(); i++) { in >> first; vector thisErrorReverse = errorReverse[first]; for (int i = 0; i < thisErrorReverse.size(); i++) { in >> tempNum; thisErrorReverse[i] += tempNum; } errorReverse[first] = thisErrorReverse; m->gobble(in); } m->gobble(in); //input misMatchCounts int misMatchSize; in >> misMatchSize; m->gobble(in); if 
(misMatchSize > misMatchCounts.size()) { misMatchCounts.resize(misMatchSize, 0); } for (int j = 0; j < misMatchSize; j++) { in >> tempNum; misMatchCounts[j] += tempNum; } m->gobble(in); //input megaAlignVector string thisLine; for (int j = 0; j < megaAlignVector.size(); j++) { thisLine = m->getline(in); m->gobble(in); megaAlignVector[j] += thisLine + '\n'; } m->gobble(in); in.close(); m->mothurRemove(tempFile); } #endif return num; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** int SeqErrorCommand::driver(string filename, string qFileName, string rFileName, string summaryFileName, string errorOutputFileName, string chimeraOutputFileName, linePair line, linePair qline, linePair rline) { try { ReportFile report; QualityScores quality; misMatchCounts.resize(11, 0); int maxMismatch = 0; int numSeqs = 0; map::iterator it; qScoreErrorMap['m'].assign(101, 0); qScoreErrorMap['s'].assign(101, 0); qScoreErrorMap['i'].assign(101, 0); qScoreErrorMap['a'].assign(101, 0); errorForward['m'].assign(maxLength,0); errorForward['s'].assign(maxLength,0); errorForward['i'].assign(maxLength,0); errorForward['d'].assign(maxLength,0); errorForward['a'].assign(maxLength,0); errorReverse['m'].assign(maxLength,0); errorReverse['s'].assign(maxLength,0); errorReverse['i'].assign(maxLength,0); errorReverse['d'].assign(maxLength,0); errorReverse['a'].assign(maxLength,0); //open inputfiles and go to beginning place for this processor ifstream queryFile; m->openInputFile(filename, queryFile); queryFile.seekg(line.start); ifstream reportFile; ifstream qualFile; if((qFileName != "" && rFileName != "" && aligned)){ m->openInputFile(qFileName, qualFile); qualFile.seekg(qline.start); //gobble headers if (rline.start == 0) { report.readHeaders(reportFile, rFileName); } else{ m->openInputFile(rFileName, reportFile); 
reportFile.seekg(rline.start); } qualForwardMap.resize(maxLength); qualReverseMap.resize(maxLength); for(int i=0;iopenInputFile(qFileName, qualFile); qualFile.seekg(qline.start); qualForwardMap.resize(maxLength); qualReverseMap.resize(maxLength); for(int i=0;iopenOutputFile(chimeraOutputFileName, outChimeraReport); RefChimeraTest chimeraTest = RefChimeraTest(referenceSeqs, aligned); if (line.start == 0) { chimeraTest.printHeader(outChimeraReport); } ofstream errorSummaryFile; m->openOutputFile(summaryFileName, errorSummaryFile); if (line.start == 0) { printErrorHeader(errorSummaryFile); } ofstream errorSeqFile; m->openOutputFile(errorOutputFileName, errorSeqFile); megaAlignVector.assign(numRefs, ""); int index = 0; bool ignoreSeq = 0; bool moreSeqs = 1; while (moreSeqs) { Sequence query(queryFile); int numParentSeqs = -1; int closestRefIndex = -1; string querySeq = query.getAligned(); if (!aligned) { querySeq = query.getUnaligned(); } numParentSeqs = chimeraTest.analyzeQuery(query.getName(), querySeq, outChimeraReport); closestRefIndex = chimeraTest.getClosestRefIndex(); Sequence reference = referenceSeqs[closestRefIndex]; reference.setAligned(chimeraTest.getClosestRefAlignment()); query.setAligned(chimeraTest.getQueryAlignment()); if(numParentSeqs > 1 && ignoreChimeras == 1) { ignoreSeq = 1; } else { ignoreSeq = 0; } Compare minCompare = getErrors(query, reference); if((namesFileName != "") || (countfile != "")){ it = weights.find(query.getName()); minCompare.weight = it->second; } else{ minCompare.weight = 1; } printErrorData(minCompare, numParentSeqs, errorSummaryFile, errorSeqFile); if(!ignoreSeq){ for(int i=0;i maxMismatch){ maxMismatch = minCompare.mismatches; misMatchCounts.resize(maxMismatch + 1, 0); } misMatchCounts[minCompare.mismatches] += minCompare.weight; numSeqs++; megaAlignVector[closestRefIndex] += query.getInlineSeq() + '\n'; } index++; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned 
long long pos = queryFile.tellg(); if ((pos == -1) || (pos >= line.end)) { break; } #else if (queryFile.eof()) { break; } #endif if(index % 100 == 0){ m->mothurOutJustToScreen(toString(index)+"\n"); } } queryFile.close(); outChimeraReport.close(); errorSummaryFile.close(); errorSeqFile.close(); if(qFileName != "" && rFileName != "") { reportFile.close(); qualFile.close(); } else if(qFileName != "" && aligned == false){ qualFile.close(); } //report progress m->mothurOutJustToScreen(toString(index)+"\n"); return index; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "driver"); exit(1); } } //*************************************************************************************************************** void SeqErrorCommand::getReferences(){ try { int numAmbigSeqs = 0; int maxStartPos = 0; int minEndPos = 100000; int start = time(NULL); ifstream referenceFile; m->openInputFile(referenceFileName, referenceFile); while(referenceFile){ Sequence currentSeq(referenceFile); int numAmbigs = currentSeq.getAmbigBases(); if(numAmbigs > 0){ numAmbigSeqs++; } // int startPos = currentSeq.getStartPos(); // if(startPos > maxStartPos) { maxStartPos = startPos; } // // int endPos = currentSeq.getEndPos(); // if(endPos < minEndPos) { minEndPos = endPos; } if (currentSeq.getNumBases() == 0) { m->mothurOut("[WARNING]: " + currentSeq.getName() + " is blank, ignoring.");m->mothurOutEndLine(); }else { referenceSeqs.push_back(currentSeq); } m->gobble(referenceFile); } referenceFile.close(); m->mothurOut("It took " + toString(time(NULL) - start) + " to read " + toString(referenceSeqs.size()) + " sequences.");m->mothurOutEndLine(); numRefs = referenceSeqs.size(); for(int i=0;imothurOut("Warning: " + toString(numAmbigSeqs) + " reference sequences have ambiguous bases, these bases will be ignored\n"); } } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "getReferences"); exit(1); } } 
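// ---------------------------------------------------------------------------
// Illustrative sketch, not part of mothur. getReferences() above counts how
// many reference sequences contain ambiguous bases. The helper below shows one
// way such a per-sequence count could be made on a plain string. Treating
// every character other than A, C, G, T, U and the gap characters '-' and '.'
// as ambiguous is an assumption of this example and may differ from what
// Sequence::getAmbigBases() actually does. The name sketchCountAmbiguousBases
// is hypothetical.
// ---------------------------------------------------------------------------
#include <cctype>
#include <string>
namespace {
    int sketchCountAmbiguousBases(const std::string& seq) {
        int numAmbigs = 0;
        for (std::string::size_type i = 0; i < seq.length(); i++) {
            // compare case-insensitively so 'a' and 'A' are treated alike
            char base = (char)std::toupper((unsigned char)seq[i]);
            bool isGap = (base == '-' || base == '.');
            bool isPlainBase = (base == 'A' || base == 'C' || base == 'G' || base == 'T' || base == 'U');
            if (!isGap && !isPlainBase) { numAmbigs++; }
        }
        return numAmbigs;
    }
}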
//*************************************************************************************************************** Compare SeqErrorCommand::getErrors(Sequence query, Sequence reference){ try { Compare errors; if(query.getAlignLength() != reference.getAlignLength()){ m->mothurOut("Warning: " + toString(query.getName()) + " and " + toString(reference.getName()) + " are different lengths\n"); } int alignLength = query.getAlignLength(); string q = query.getAligned(); string r = reference.getAligned(); int started = 0; errors.sequence = ""; for(int i=0;ierrorOut(e, "SeqErrorCommand", "getErrors"); exit(1); } } //*************************************************************************************************************** map SeqErrorCommand::getWeights(){ ifstream nameFile; m->openInputFile(namesFileName, nameFile); string seqName; string redundantSeqs; map nameCountMap; while(nameFile){ nameFile >> seqName >> redundantSeqs; nameCountMap[seqName] = m->getNumNames(redundantSeqs); m->gobble(nameFile); } nameFile.close(); return nameCountMap; } //*************************************************************************************************************** void SeqErrorCommand::printErrorHeader(ofstream& errorSummaryFile){ try { errorSummaryFile << "query\treference\tweight\t"; errorSummaryFile << "AA\tAT\tAG\tAC\tTA\tTT\tTG\tTC\tGA\tGT\tGG\tGC\tCA\tCT\tCG\tCC\tNA\tNT\tNG\tNC\tAi\tTi\tGi\tCi\tNi\tdA\tdT\tdG\tdC\t"; errorSummaryFile << "insertions\tdeletions\tsubstitutions\tambig\tmatches\tmismatches\ttotal\terror\tnumparents\n"; errorSummaryFile << setprecision(6); errorSummaryFile.setf(ios::fixed); } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "printErrorHeader"); exit(1); } } //*************************************************************************************************************** void SeqErrorCommand::printErrorData(Compare error, int numParentSeqs, ofstream& errorSummaryFile, ofstream& errorSeqFile){ try { errorSummaryFile << error.queryName << '\t' 
<< error.refName << '\t' << error.weight << '\t'; errorSummaryFile << error.AA << '\t' << error.AT << '\t' << error.AG << '\t' << error.AC << '\t'; errorSummaryFile << error.TA << '\t' << error.TT << '\t' << error.TG << '\t' << error.TC << '\t'; errorSummaryFile << error.GA << '\t' << error.GT << '\t' << error.GG << '\t' << error.GC << '\t'; errorSummaryFile << error.CA << '\t' << error.CT << '\t' << error.CG << '\t' << error.CC << '\t'; errorSummaryFile << error.NA << '\t' << error.NT << '\t' << error.NG << '\t' << error.NC << '\t'; errorSummaryFile << error.Ai << '\t' << error.Ti << '\t' << error.Gi << '\t' << error.Ci << '\t' << error.Ni << '\t'; errorSummaryFile << error.dA << '\t' << error.dT << '\t' << error.dG << '\t' << error.dC << '\t'; errorSummaryFile << error.Ai + error.Ti + error.Gi + error.Ci << '\t'; //insertions errorSummaryFile << error.dA + error.dT + error.dG + error.dC << '\t'; //deletions errorSummaryFile << error.mismatches - (error.Ai + error.Ti + error.Gi + error.Ci) - (error.dA + error.dT + error.dG + error.dC) - (error.NA + error.NT + error.NG + error.NC + error.Ni) << '\t'; //substitutions errorSummaryFile << error.NA + error.NT + error.NG + error.NC + error.Ni << '\t'; //ambiguities errorSummaryFile << error.matches << '\t' << error.mismatches << '\t' << error.total << '\t' << error.errorRate << '\t' << numParentSeqs << endl; errorSeqFile << '>' << error.queryName << "\tref:" << error.refName << '\n' << error.sequence << endl; int a=0; int t=1; int g=2; int c=3; int gap=4; int n=5; if(numParentSeqs == 1 || ignoreChimeras == 0){ substitutionMatrix[a][a] += error.weight * error.AA; substitutionMatrix[a][t] += error.weight * error.TA; substitutionMatrix[a][g] += error.weight * error.GA; substitutionMatrix[a][c] += error.weight * error.CA; substitutionMatrix[a][gap] += error.weight * error.dA; substitutionMatrix[a][n] += error.weight * error.NA; substitutionMatrix[t][a] += error.weight * error.AT; substitutionMatrix[t][t] += error.weight * 
error.TT; substitutionMatrix[t][g] += error.weight * error.GT; substitutionMatrix[t][c] += error.weight * error.CT; substitutionMatrix[t][gap] += error.weight * error.dT; substitutionMatrix[t][n] += error.weight * error.NT; substitutionMatrix[g][a] += error.weight * error.AG; substitutionMatrix[g][t] += error.weight * error.TG; substitutionMatrix[g][g] += error.weight * error.GG; substitutionMatrix[g][c] += error.weight * error.CG; substitutionMatrix[g][gap] += error.weight * error.dG; substitutionMatrix[g][n] += error.weight * error.NG; substitutionMatrix[c][a] += error.weight * error.AC; substitutionMatrix[c][t] += error.weight * error.TC; substitutionMatrix[c][g] += error.weight * error.GC; substitutionMatrix[c][c] += error.weight * error.CC; substitutionMatrix[c][gap] += error.weight * error.dC; substitutionMatrix[c][n] += error.weight * error.NC; substitutionMatrix[gap][a] += error.weight * error.Ai; substitutionMatrix[gap][t] += error.weight * error.Ti; substitutionMatrix[gap][g] += error.weight * error.Gi; substitutionMatrix[gap][c] += error.weight * error.Ci; substitutionMatrix[gap][n] += error.weight * error.Ni; } } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "printErrorData"); exit(1); } } //*************************************************************************************************************** void SeqErrorCommand::printSubMatrix(){ try { string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(queryFileName)); map variables; variables["[filename]"] = fileNameRoot; string subMatrixFileName = getOutputFileName("errormatrix",variables); ofstream subMatrixFile; m->openOutputFile(subMatrixFileName, subMatrixFile); outputNames.push_back(subMatrixFileName); outputTypes["errormatrix"].push_back(subMatrixFileName); vector bases(6); bases[0] = "A"; bases[1] = "T"; bases[2] = "G"; bases[3] = "C"; bases[4] = "Gap"; bases[5] = "N"; vector refSums(5,1); for(int i=0;i<5;i++){ subMatrixFile << "\tr" << bases[i]; for(int j=0;j<6;j++){ 
refSums[i] += substitutionMatrix[i][j]; } } subMatrixFile << endl; for(int i=0;i<6;i++){ subMatrixFile << 'q' << bases[i]; for(int j=0;j<5;j++){ subMatrixFile << '\t' << substitutionMatrix[j][i]; } subMatrixFile << endl; } subMatrixFile << "total"; for(int i=0;i<5;i++){ subMatrixFile << '\t' << refSums[i]; } subMatrixFile << endl; subMatrixFile.close(); } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "printSubMatrix"); exit(1); } } //*************************************************************************************************************** void SeqErrorCommand::printErrorFRFile(map > errorForward, map > errorReverse){ try{ string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(queryFileName)); map variables; variables["[filename]"] = fileNameRoot; string errorForwardFileName = getOutputFileName("errorforward",variables); ofstream errorForwardFile; m->openOutputFile(errorForwardFileName, errorForwardFile); outputNames.push_back(errorForwardFileName); outputTypes["errorforward"].push_back(errorForwardFileName); errorForwardFile << "position\ttotalseqs\tmatch\tsubstitution\tinsertion\tdeletion\tambiguous" << endl; for(int i=0;iopenOutputFile(errorReverseFileName, errorReverseFile); outputNames.push_back(errorReverseFileName); outputTypes["errorreverse"].push_back(errorReverseFileName); errorReverseFile << "position\ttotalseqs\tmatch\tsubstitution\tinsertion\tdeletion\tambiguous" << endl; for(int i=0;ierrorOut(e, "SeqErrorCommand", "printErrorFRFile"); exit(1); } } //*************************************************************************************************************** void SeqErrorCommand::printErrorQuality(map > qScoreErrorMap){ try{ string fileNameRoot = outputDir + m->getRootName(m->getSimpleName(queryFileName)); map variables; variables["[filename]"] = fileNameRoot; string errorQualityFileName = getOutputFileName("errorquality",variables); ofstream errorQualityFile; m->openOutputFile(errorQualityFileName, errorQualityFile); 
outputNames.push_back(errorQualityFileName); outputTypes["errorquality"].push_back(errorQualityFileName); errorQualityFile << "qscore\tmatches\tsubstitutions\tinsertions\tambiguous" << endl; for(int i=0;i<101;i++){ errorQualityFile << i << '\t' << qScoreErrorMap['m'][i] << '\t' << qScoreErrorMap['s'][i] << '\t' << qScoreErrorMap['i'][i] << '\t'<< qScoreErrorMap['a'][i] << endl; } errorQualityFile.close(); } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "printErrorQuality"); exit(1); } } //*************************************************************************************************************** void SeqErrorCommand::printQualityFR(vector > qualForwardMap, vector > qualReverseMap){ try{ int numRows = 0; int numColumns = qualForwardMap[0].size(); for(int i=0;igetRootName(m->getSimpleName(queryFileName)); map variables; variables["[filename]"] = fileNameRoot; string qualityForwardFileName = getOutputFileName("errorqualforward",variables); ofstream qualityForwardFile; m->openOutputFile(qualityForwardFileName, qualityForwardFile); outputNames.push_back(qualityForwardFileName); outputTypes["errorqualforward"].push_back(qualityForwardFileName); for(int i=0;iopenOutputFile(qualityReverseFileName, qualityReverseFile); outputNames.push_back(qualityReverseFileName); outputTypes["errorqualreverse"].push_back(qualityReverseFileName); for(int i=0;ierrorOut(e, "SeqErrorCommand", "printQualityFR"); exit(1); } } /**************************************************************************************************/ int SeqErrorCommand::setLines(string filename, string qfilename, string rfilename) { try { vector fastaFilePos; vector qfileFilePos; vector rfileFilePos; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //set file positions for fasta file fastaFilePos = m->divideFile(filename, processors); if (qfilename == "") { } else { //get name of first sequence in each chunk map firstSeqNames; for (int i = 0; i < 
(fastaFilePos.size()-1); i++) { ifstream in; m->openInputFile(filename, in); in.seekg(fastaFilePos[i]); //adjust start if null strings if (i == 0) { m->zapGremlins(in); m->gobble(in); } Sequence temp(in); firstSeqNames[temp.getName()] = i; in.close(); } //make copy to use below map firstSeqNamesReport = firstSeqNames; if (qfilename != "") { //seach for filePos of each first name in the qfile and save in qfileFilePos ifstream inQual; m->openInputFile(qfilename, inQual); string input; while(!inQual.eof()){ input = m->getline(inQual); if (input.length() != 0) { if(input[0] == '>'){ //this is a sequence name line istringstream nameStream(input); string sname = ""; nameStream >> sname; sname = sname.substr(1); m->checkName(sname); map::iterator it = firstSeqNames.find(sname); if(it != firstSeqNames.end()) { //this is the start of a new chunk unsigned long long pos = inQual.tellg(); qfileFilePos.push_back(pos - input.length() - 1); firstSeqNames.erase(it); } } } if (firstSeqNames.size() == 0) { break; } } inQual.close(); if (firstSeqNames.size() != 0) { for (map::iterator it = firstSeqNames.begin(); it != firstSeqNames.end(); it++) { m->mothurOut(it->first + " is in your fasta file and not in your quality file, aborting."); m->mothurOutEndLine(); } m->control_pressed = true; return processors; } //get last file position of qfile FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (qfilename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } qfileFilePos.push_back(size); } if(aligned){ //seach for filePos of each first name in the rfile and save in rfileFilePos string junk, input; ifstream inR; m->openInputFile(rfilename, inR); //read column headers for (int i = 0; i < 16; i++) { if (!inR.eof()) { inR >> junk; } else { break; } } while(!inR.eof()){ input = m->getline(inR); if (input.length() != 0) { istringstream nameStream(input); string sname = ""; nameStream >> 
sname; m->checkName(sname); map<string, int>::iterator it = firstSeqNamesReport.find(sname); if(it != firstSeqNamesReport.end()) { //this is the start of a new chunk unsigned long long pos = inR.tellg(); rfileFilePos.push_back(pos - input.length() - 1); firstSeqNamesReport.erase(it); } } if (firstSeqNamesReport.size() == 0) { break; } m->gobble(inR); } inR.close(); if (firstSeqNamesReport.size() != 0) { for (map<string, int>::iterator it = firstSeqNamesReport.begin(); it != firstSeqNamesReport.end(); it++) { m->mothurOut(it->first + " is in your fasta file and not in your report file, aborting."); m->mothurOutEndLine(); } m->control_pressed = true; return processors; } //get last file position of rfile FILE * rFile; unsigned long long sizeR; //get num bytes in file rFile = fopen (rfilename.c_str(),"rb"); if (rFile==NULL) perror ("Error opening file"); else{ fseek (rFile, 0, SEEK_END); sizeR=ftell (rFile); fclose (rFile); } rfileFilePos.push_back(sizeR); } } #else fastaFilePos.push_back(0); qfileFilePos.push_back(0); //get last file position of fastafile FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (filename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } fastaFilePos.push_back(size); if (qfilename != "") { //get last file position of qualfile FILE * qFile; //get num bytes in file qFile = fopen (qfilename.c_str(),"rb"); if (qFile==NULL) perror ("Error opening file"); else{ fseek (qFile, 0, SEEK_END); size=ftell (qFile); fclose (qFile); } qfileFilePos.push_back(size); } if (reportFileName != "" && aligned == true) { rfileFilePos.push_back(0); //get last file position of report file FILE * rFile; //get num bytes in file rFile = fopen (rfilename.c_str(),"rb"); if (rFile==NULL) perror ("Error opening file"); else{ fseek (rFile, 0, SEEK_END); size=ftell (rFile); fclose (rFile); } rfileFilePos.push_back(size); } processors = 1; #endif if (m->control_pressed) { return 0; } for (int i 
= 0; i < (fastaFilePos.size()-1); i++) { lines.push_back(linePair(fastaFilePos[i], fastaFilePos[(i+1)])); if (qualFileName != "") { qLines.push_back(linePair(qfileFilePos[i], qfileFilePos[(i+1)])); } if (reportFileName != "" && aligned == true) { rLines.push_back(linePair(rfileFilePos[i], rfileFilePos[(i+1)])); } } if(qualFileName == "") { qLines = lines; rLines = lines; } //fills with duds if(aligned == false){ rLines = lines; } return processors; } catch(exception& e) { m->errorOut(e, "SeqErrorCommand", "setLines"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/commands/seqerrorcommand.h000066400000000000000000000045271255543666200216210ustar00rootroot00000000000000#ifndef SEQERRORCOMMAND #define SEQERRORCOMMAND /* * seqerrorcommand.h * Mothur * * Created by Pat Schloss on 7/15/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "sequence.hpp" #include "counttable.h" #include "compare.h" class SeqErrorCommand : public Command { public: SeqErrorCommand(string); SeqErrorCommand(); ~SeqErrorCommand(){} vector setParameters(); string getCommandName() { return "seq.error"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Gevers D, Westcott SL (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 
6:e27310.\nhttp://www.mothur.org/wiki/Seq.error"; } string getDescription() { return "seq.error"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; vector<int> processIDS; //processid vector<linePair> lines; vector<linePair> qLines; vector<linePair> rLines; void getReferences(); map<string, int> getWeights(); Compare getErrors(Sequence, Sequence); void printErrorHeader(ofstream&); void printErrorData(Compare, int, ofstream&, ofstream&); void printSubMatrix(); void printErrorFRFile(map<char, vector<int> >, map<char, vector<int> >); void printErrorQuality(map<char, vector<int> >); void printQualityFR(vector<vector<int> >, vector<vector<int> >); int setLines(string, string, string); int driver(string, string, string, string, string, string, linePair, linePair, linePair); int createProcesses(string, string, string, string, string, string); string queryFileName, referenceFileName, qualFileName, reportFileName, namesFileName, outputDir, countfile; double threshold; bool ignoreChimeras, aligned; int numRefs, processors; int maxLength, totalBases, totalMatches; //ofstream errorSummaryFile, errorSeqFile; vector<string> outputNames; vector<Sequence> referenceSeqs; vector<vector<int> > substitutionMatrix; vector<vector<int> > qualForwardMap; vector<vector<int> > qualReverseMap; vector<int> misMatchCounts; map<char, vector<int> > qScoreErrorMap; map<char, vector<int> > errorForward; map<char, vector<int> > errorReverse; map<string, int> weights; vector<string> megaAlignVector; }; #endif mothur-1.36.1/source/commands/seqsummarycommand.cpp000066400000000000000000001205231255543666200225130ustar00rootroot00000000000000/* * seqcoordcommand.cpp * Mothur * * Created by Pat Schloss on 5/30/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "seqsummarycommand.h" #include "counttable.h" //********************************************************************************************************************** vector SeqSummaryCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SeqSummaryCommand::getHelpString(){ try { string helpString = ""; helpString += "The summary.seqs command reads a fastafile and summarizes the sequences.\n"; helpString += "The summary.seqs command parameters are fasta, name, count and processors, fasta is required, unless you have a valid current fasta file.\n"; helpString += "The name parameter allows you to enter a name file associated with your fasta file. \n"; helpString += "The count parameter allows you to enter a count file associated with your fasta file. 
\n"; helpString += "The summary.seqs command should be in the following format: \n"; helpString += "summary.seqs(fasta=yourFastaFile, processors=2) \n"; helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e.yourFastaFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SeqSummaryCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SeqSummaryCommand::SeqSummaryCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "SeqSummaryCommand"); exit(1); } } //*************************************************************************************************************** SeqSummaryCommand::SeqSummaryCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter("summary.seqs"); map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort 
= true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //initialize outputTypes vector tempOutNames; outputTypes["summary"] = tempOutNames; //check for required parameters fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastafile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } 
else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(fastafile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if (countfile == "") { if (namefile == "") { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "SeqSummaryCommand"); exit(1); } } //*************************************************************************************************************** int SeqSummaryCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); //set current fasta to fastafile m->setFastaFile(fastafile); map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); string summaryFile = getOutputFileName("summary",variables); long long numSeqs = 0; long long size = 0; map startPosition; map endPosition; map seqLength; map ambigBases; map longHomoPolymer; if (namefile != "") { nameMap = m->readNames(namefile); } else if (countfile != "") { CountTable ct; ct.readTable(countfile, false, false); nameMap = ct.getNameMap(); size = ct.getNumSeqs(); } if (m->control_pressed) { return 0; } vector positions; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(fastafile, processors); for (int i = 0; i < 
(positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } #else positions = m->setFilePosFasta(fastafile, numSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numSeqs - i * numSeqsPerProcessor; } lines.push_back(new linePair(positions[startIndex], numSeqsPerProcessor)); } #endif if(processors == 1){ numSeqs = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, fastafile, summaryFile, lines[0]); }else{ numSeqs = createProcessesCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, fastafile, summaryFile); } if (m->control_pressed) { return 0; } //set size if (countfile != "") {}//already set else if (namefile == "") { size = numSeqs; } else { for (map::iterator it = startPosition.begin(); it != startPosition.end(); it++) { size += it->second; } } long long ptile0_25 = 1+(long long)(size * 0.025); //number of sequences at 2.5% long long ptile25 = 1+(long long)(size * 0.250); //number of sequences at 25% long long ptile50 = 1+(long long)(size * 0.500); long long ptile75 = 1+(long long)(size * 0.750); long long ptile97_5 = 1+(long long)(size * 0.975); long long ptile100 = (long long)(size); vector starts; starts.resize(7,0); vector ends; ends.resize(7,0); vector ambigs; ambigs.resize(7,0); vector lengths; lengths.resize(7,0); vector homops; homops.resize(7,0); //find means long long meanStartPosition, meanEndPosition, meanSeqLength, meanAmbigBases, meanLongHomoPolymer; meanStartPosition = 0; meanEndPosition = 0; meanSeqLength = 0; meanAmbigBases = 0; meanLongHomoPolymer = 0; //minimum if ((startPosition.begin())->first == -1) { starts[0] = 0; } else {starts[0] = (startPosition.begin())->first; } long long totalSoFar = 
0; starts[0] = (startPosition.begin())->first; int lastValue = 0; for (map::iterator it = startPosition.begin(); it != startPosition.end(); it++) { int value = it->first; if (value == -1) { value = 0; } meanStartPosition += (value*it->second); totalSoFar += it->second; if (((totalSoFar <= ptile0_25) && (totalSoFar > 1)) || ((lastValue < ptile0_25) && (totalSoFar > ptile0_25))){ starts[1] = value; } //save value if (((totalSoFar <= ptile25) && (totalSoFar > ptile0_25)) || ((lastValue < ptile25) && (totalSoFar > ptile25))) { starts[2] = value; } //save value if (((totalSoFar <= ptile50) && (totalSoFar > ptile25)) || ((lastValue < ptile50) && (totalSoFar > ptile50))) { starts[3] = value; } //save value if (((totalSoFar <= ptile75) && (totalSoFar > ptile50)) || ((lastValue < ptile75) && (totalSoFar > ptile75))) { starts[4] = value; } //save value if (((totalSoFar <= ptile97_5) && (totalSoFar > ptile75)) || ((lastValue < ptile97_5) && (totalSoFar > ptile97_5))) { starts[5] = value; } //save value if ((totalSoFar <= ptile100) && (totalSoFar > ptile97_5)) { starts[6] = value; } //save value lastValue = totalSoFar; } starts[6] = (startPosition.rbegin())->first; if ((endPosition.begin())->first == -1) { ends[0] = 0; } else {ends[0] = (endPosition.begin())->first; } totalSoFar = 0; ends[0] = (endPosition.begin())->first; lastValue = 0; for (map::iterator it = endPosition.begin(); it != endPosition.end(); it++) { int value = it->first; if (value == -1) { value = 0; } meanEndPosition += (value*it->second); totalSoFar += it->second; if (((totalSoFar <= ptile0_25) && (totalSoFar > 1)) || ((lastValue < ptile0_25) && (totalSoFar > ptile0_25))){ ends[1] = value; } //save value if (((totalSoFar <= ptile25) && (totalSoFar > ptile0_25)) || ((lastValue < ptile25) && (totalSoFar > ptile25))) { ends[2] = value; } //save value if (((totalSoFar <= ptile50) && (totalSoFar > ptile25)) || ((lastValue < ptile50) && (totalSoFar > ptile50))) { ends[3] = value; } //save value if (((totalSoFar <= 
ptile75) && (totalSoFar > ptile50)) || ((lastValue < ptile75) && (totalSoFar > ptile75))) { ends[4] = value; } //save value if (((totalSoFar <= ptile97_5) && (totalSoFar > ptile75)) || ((lastValue < ptile97_5) && (totalSoFar > ptile97_5))) { ends[5] = value; } //save value if ((totalSoFar <= ptile100) && (totalSoFar > ptile97_5)) { ends[6] = value; } //save value lastValue = totalSoFar; } ends[6] = (endPosition.rbegin())->first; lengths[0] = (seqLength.begin())->first; totalSoFar = 0; lastValue = 0; for (map::iterator it = seqLength.begin(); it != seqLength.end(); it++) { int value = it->first; meanSeqLength += (value*it->second); totalSoFar += it->second; if (((totalSoFar <= ptile0_25) && (totalSoFar > 1)) || ((lastValue < ptile0_25) && (totalSoFar > ptile0_25))){ lengths[1] = value; } //save value if (((totalSoFar <= ptile25) && (totalSoFar > ptile0_25)) || ((lastValue < ptile25) && (totalSoFar > ptile25))) { lengths[2] = value; } //save value if (((totalSoFar <= ptile50) && (totalSoFar > ptile25)) || ((lastValue < ptile50) && (totalSoFar > ptile50))) { lengths[3] = value; } //save value if (((totalSoFar <= ptile75) && (totalSoFar > ptile50)) || ((lastValue < ptile75) && (totalSoFar > ptile75))) { lengths[4] = value; } //save value if (((totalSoFar <= ptile97_5) && (totalSoFar > ptile75)) || ((lastValue < ptile97_5) && (totalSoFar > ptile97_5))) { lengths[5] = value; } //save value if ((totalSoFar <= ptile100) && (totalSoFar > ptile97_5)) { lengths[6] = value; } //save value lastValue = totalSoFar; } lengths[6] = (seqLength.rbegin())->first; ambigs[0] = (ambigBases.begin())->first; totalSoFar = 0; lastValue = 0; for (map::iterator it = ambigBases.begin(); it != ambigBases.end(); it++) { int value = it->first; meanAmbigBases += (value*it->second); totalSoFar += it->second; if (((totalSoFar <= ptile0_25) && (totalSoFar > 1)) || ((lastValue < ptile0_25) && (totalSoFar > ptile0_25))){ ambigs[1] = value; } //save value if (((totalSoFar <= ptile25) && (totalSoFar > 
ptile0_25)) || ((lastValue < ptile25) && (totalSoFar > ptile25))) { ambigs[2] = value; } //save value if (((totalSoFar <= ptile50) && (totalSoFar > ptile25)) || ((lastValue < ptile50) && (totalSoFar > ptile50))) { ambigs[3] = value; } //save value if (((totalSoFar <= ptile75) && (totalSoFar > ptile50)) || ((lastValue < ptile75) && (totalSoFar > ptile75))) { ambigs[4] = value; } //save value if (((totalSoFar <= ptile97_5) && (totalSoFar > ptile75)) || ((lastValue < ptile97_5) && (totalSoFar > ptile97_5))) { ambigs[5] = value; } //save value if ((totalSoFar <= ptile100) && (totalSoFar > ptile97_5)) { ambigs[6] = value; } //save value lastValue = totalSoFar; } ambigs[6] = (ambigBases.rbegin())->first; homops[0] = (longHomoPolymer.begin())->first; totalSoFar = 0; lastValue = 0; for (map::iterator it = longHomoPolymer.begin(); it != longHomoPolymer.end(); it++) { int value = it->first; meanLongHomoPolymer += (it->first*it->second); totalSoFar += it->second; if (((totalSoFar <= ptile0_25) && (totalSoFar > 1)) || ((lastValue < ptile0_25) && (totalSoFar > ptile0_25))){ homops[1] = value; } //save value if (((totalSoFar <= ptile25) && (totalSoFar > ptile0_25)) || ((lastValue < ptile25) && (totalSoFar > ptile25))) { homops[2] = value; } //save value if (((totalSoFar <= ptile50) && (totalSoFar > ptile25)) || ((lastValue < ptile50) && (totalSoFar > ptile50))) { homops[3] = value; } //save value if (((totalSoFar <= ptile75) && (totalSoFar > ptile50)) || ((lastValue < ptile75) && (totalSoFar > ptile75))) { homops[4] = value; } //save value if (((totalSoFar <= ptile97_5) && (totalSoFar > ptile75)) || ((lastValue < ptile97_5) && (totalSoFar > ptile97_5))) { homops[5] = value; } //save value if ((totalSoFar <= ptile100) && (totalSoFar > ptile97_5)) { homops[6] = value; } //save value lastValue = totalSoFar; } homops[6] = (longHomoPolymer.rbegin())->first; double meanstartPosition, meanendPosition, meanseqLength, meanambigBases, meanlongHomoPolymer; meanstartPosition = 
meanStartPosition / (double) size; meanendPosition = meanEndPosition /(double) size; meanlongHomoPolymer = meanLongHomoPolymer / (double) size; meanseqLength = meanSeqLength / (double) size; meanambigBases = meanAmbigBases /(double) size; //to compensate for blank sequences that would result in startPosition and endPostion equalling -1 if (startPosition[0] == -1) { startPosition[0] = 0; } if (endPosition[0] == -1) { endPosition[0] = 0; } if (m->control_pressed) { m->mothurRemove(summaryFile); return 0; } m->mothurOutEndLine(); m->mothurOut("\t\tStart\tEnd\tNBases\tAmbigs\tPolymer\tNumSeqs"); m->mothurOutEndLine(); m->mothurOut("Minimum:\t" + toString(starts[0]) + "\t" + toString(ends[0]) + "\t" + toString(lengths[0]) + "\t" + toString(ambigs[0]) + "\t" + toString(homops[0]) + "\t" + toString(1)); m->mothurOutEndLine(); m->mothurOut("2.5%-tile:\t" + toString(starts[1]) + "\t" + toString(ends[1]) + "\t" + toString(lengths[1]) + "\t" + toString(ambigs[1]) + "\t" + toString(homops[1]) + "\t" + toString(ptile0_25)); m->mothurOutEndLine(); m->mothurOut("25%-tile:\t" + toString(starts[2]) + "\t" + toString(ends[2]) + "\t" + toString(lengths[2]) + "\t" + toString(ambigs[2]) + "\t" + toString(homops[2]) + "\t" + toString(ptile25)); m->mothurOutEndLine(); m->mothurOut("Median: \t" + toString(starts[3]) + "\t" + toString(ends[3]) + "\t" + toString(lengths[3]) + "\t" + toString(ambigs[3]) + "\t" + toString(homops[3]) + "\t" + toString(ptile50)); m->mothurOutEndLine(); m->mothurOut("75%-tile:\t" + toString(starts[4]) + "\t" + toString(ends[4]) + "\t" + toString(lengths[4]) + "\t" + toString(ambigs[4]) + "\t" + toString(homops[4]) + "\t" + toString(ptile75)); m->mothurOutEndLine(); m->mothurOut("97.5%-tile:\t" + toString(starts[5]) + "\t" + toString(ends[5]) + "\t" + toString(lengths[5]) + "\t" + toString(ambigs[5]) + "\t" + toString(homops[5]) + "\t" + toString(ptile97_5)); m->mothurOutEndLine(); m->mothurOut("Maximum:\t" + toString(starts[6]) + "\t" + toString(ends[6]) + "\t" 
+ toString(lengths[6]) + "\t" + toString(ambigs[6]) + "\t" + toString(homops[6]) + "\t" + toString(ptile100)); m->mothurOutEndLine(); m->mothurOut("Mean:\t" + toString(meanstartPosition) + "\t" + toString(meanendPosition) + "\t" + toString(meanseqLength) + "\t" + toString(meanambigBases) + "\t" + toString(meanlongHomoPolymer)); m->mothurOutEndLine(); if ((namefile == "") && (countfile == "")) { m->mothurOut("# of Seqs:\t" + toString(numSeqs)); m->mothurOutEndLine(); } else { m->mothurOut("# of unique seqs:\t" + toString(numSeqs)); m->mothurOutEndLine(); m->mothurOut("total # of seqs:\t" + toString(size)); m->mothurOutEndLine(); } if (m->control_pressed) { m->mothurRemove(summaryFile); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(summaryFile); m->mothurOutEndLine(); outputNames.push_back(summaryFile); outputTypes["summary"].push_back(summaryFile); m->mothurOutEndLine(); if ((namefile == "") && (countfile == "")) { m->mothurOut("It took " + toString(time(NULL) - start) + " secs to summarize " + toString(numSeqs) + " sequences.\n"); } else{ m->mothurOut("It took " + toString(time(NULL) - start) + " secs to summarize " + toString(size) + " sequences.\n"); } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("summary"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSummaryFile(current); } } return 0; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "execute"); exit(1); } } /**************************************************************************************/ long long SeqSummaryCommand::driverCreateSummary(map& startPosition, map& endPosition, map& seqLength, map& ambigBases, map& longHomoPolymer, string filename, string sumFile, linePair* filePos) { try { ofstream outSummary; m->openOutputFile(sumFile, outSummary); ifstream in; m->openInputFile(filename, in); in.seekg(filePos->start); //print 
header if you are process 0 if (filePos->start == 0) { outSummary << "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs" << endl; m->zapGremlins(in); m->gobble(in); } bool done = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); outSummary.close(); return 1; } if (m->debug) { m->mothurOut("[DEBUG]: count = " + toString(count) + "\n"); } Sequence current(in); m->gobble(in); if (current.getName() != "") { if (m->debug) { m->mothurOut("[DEBUG]: " + current.getName() + '\t' + toString(current.getNumBases()) + "\n"); } int num = 1; if ((namefile != "") || (countfile != "")) { //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(current.getName()); if (it == nameMap.end()) { m->mothurOut("[ERROR]: '" + current.getName() + "' is not in your name or count file, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } int thisStartPosition = current.getStartPos(); map::iterator it = startPosition.find(thisStartPosition); if (it == startPosition.end()) { startPosition[thisStartPosition] = num; } //first finding of this start position, set count. else { it->second += num; } //add counts int thisEndPosition = current.getEndPos(); it = endPosition.find(thisEndPosition); if (it == endPosition.end()) { endPosition[thisEndPosition] = num; } //first finding of this end position, set count. else { it->second += num; } //add counts int thisSeqLength = current.getNumBases(); it = seqLength.find(thisSeqLength); if (it == seqLength.end()) { seqLength[thisSeqLength] = num; } //first finding of this length, set count. else { it->second += num; } //add counts int thisAmbig = current.getAmbigBases(); it = ambigBases.find(thisAmbig); if (it == ambigBases.end()) { ambigBases[thisAmbig] = num; } //first finding of this ambig, set count. 
else { it->second += num; } //add counts int thisHomoP = current.getLongHomoPolymer(); it = longHomoPolymer.find(thisHomoP); if (it == longHomoPolymer.end()) { longHomoPolymer[thisHomoP] = num; } //first finding of this homop, set count. else { it->second += num; } //add counts count++; outSummary << current.getName() << '\t'; outSummary << thisStartPosition << '\t' << thisEndPosition << '\t'; outSummary << thisSeqLength << '\t' << thisAmbig << '\t'; outSummary << thisHomoP << '\t' << num << endl; if (m->debug) { m->mothurOut("[DEBUG]: " + current.getName() + '\t' + toString(num) + "\n"); } } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos->end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); return count; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "driverCreateSummary"); exit(1); } } /**************************************************************************************************/ long long SeqSummaryCommand::createProcessesCreateSummary(map& startPosition, map& endPosition, map& seqLength, map& ambigBases, map& longHomoPolymer, string filename, string sumFile) { try { int process = 1; int num = 0; processIDS.clear(); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, fastafile, sumFile + m->mothurGetpid(process) + ".temp", lines[process]); //pass numSeqs to parent ofstream out; string tempFile = fastafile + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); 
out << num << endl; out << startPosition.size() << endl; for (map::iterator it = startPosition.begin(); it != startPosition.end(); it++) { out << it->first << '\t' << it->second << endl; } out << endPosition.size() << endl; for (map::iterator it = endPosition.begin(); it != endPosition.end(); it++) { out << it->first << '\t' << it->second << endl; } out << seqLength.size() << endl; for (map::iterator it = seqLength.begin(); it != seqLength.end(); it++) { out << it->first << '\t' << it->second << endl; } out << ambigBases.size() << endl; for (map::iterator it = ambigBases.begin(); it != ambigBases.end(); it++) { out << it->first << '\t' << it->second << endl; } out << longHomoPolymer.size() << endl; for (map::iterator it = longHomoPolymer.begin(); it != longHomoPolymer.end(); it++) { out << it->first << '\t' << it->second << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i = 0; i < processIDS.size(); i++) { m->mothurRemove(fastafile + (toString(processIDS[i]) + ".num.temp")); m->mothurRemove(sumFile + (toString(processIDS[i]) + ".temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } m->control_pressed = false; for (int i = 0; i < processIDS.size(); i++) { m->mothurRemove(fastafile + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector positions = m->divideFile(filename, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(new linePair(positions[i], positions[(i+1)])); } startPosition.clear(); endPosition.clear(); seqLength.clear(); ambigBases.clear(); longHomoPolymer.clear(); num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, fastafile, sumFile + m->mothurGetpid(process) + ".temp", lines[process]); //pass numSeqs to parent ofstream out; string tempFile = fastafile + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out << startPosition.size() << endl; for (map::iterator it = startPosition.begin(); it != startPosition.end(); it++) { out << it->first << '\t' << it->second << endl; } out << endPosition.size() << endl; for (map::iterator it = endPosition.begin(); it != endPosition.end(); it++) { out << it->first << '\t' << it->second << endl; } out << seqLength.size() << endl; for (map::iterator it = seqLength.begin(); it != seqLength.end(); it++) { out << it->first << '\t' << it->second << endl; } out << ambigBases.size() << endl; for (map::iterator it = ambigBases.begin(); it != ambigBases.end(); it++) { out << it->first << '\t' << it->second << 
endl; } out << longHomoPolymer.size() << endl; for (map::iterator it = longHomoPolymer.begin(); it != longHomoPolymer.end(); it++) { out << it->first << '\t' << it->second << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do your part num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, fastafile, sumFile, lines[0]); //force parent to wait until all the processes are done for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { string tempFilename = fastafile + toString(processIDS[i]) + ".num.temp"; ifstream in; m->openInputFile(tempFilename, in); long long tempNum; in >> tempNum; m->gobble(in); num += tempNum; in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { long long first, second; in >> first; m->gobble(in); in >> second; m->gobble(in); map::iterator it = startPosition.find(first); if (it == startPosition.end()) { startPosition[first] = second; } //first finding of this start position, set count. else { it->second += second; } //add counts } m->gobble(in); in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { long long first, second; in >> first; m->gobble(in); in >> second; m->gobble(in); map::iterator it = endPosition.find(first); if (it == endPosition.end()) { endPosition[first] = second; } //first finding of this end position, set count. else { it->second += second; } //add counts } m->gobble(in); in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { long long first, second; in >> first; m->gobble(in); in >> second; m->gobble(in); map::iterator it = seqLength.find(first); if (it == seqLength.end()) { seqLength[first] = second; } //first finding of this length, set count. 
else { it->second += second; } //add counts } m->gobble(in); in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { long long first, second; in >> first; m->gobble(in); in >> second; m->gobble(in); map::iterator it = ambigBases.find(first); if (it == ambigBases.end()) { ambigBases[first] = second; } //first finding of this ambig count, set count. else { it->second += second; } //add counts } m->gobble(in); in >> tempNum; m->gobble(in); for (int k = 0; k < tempNum; k++) { long long first, second; in >> first; m->gobble(in); in >> second; m->gobble(in); map::iterator it = longHomoPolymer.find(first); if (it == longHomoPolymer.end()) { longHomoPolymer[first] = second; } //first finding of this homopolymer length, set count. else { it->second += second; } //add counts } m->gobble(in); in.close(); m->mothurRemove(tempFilename); m->appendFiles((sumFile + toString(processIDS[i]) + ".temp"), sumFile); m->mothurRemove((sumFile + toString(processIDS[i]) + ".temp")); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the seqSumData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to allow both threads to add info to vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector<seqSumData*> pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; bool hasNameMap = false; if ((namefile !="") || (countfile != "")) { hasNameMap = true; } //Create processor worker threads. for( int i=0; i < processors-1; i++ ){ string extension = ""; if (i != 0) { extension = toString(i) + ".temp"; processIDS.push_back(i); } seqSumData* tempSum = new seqSumData(filename, (sumFile+extension), m, lines[i]->start, lines[i]->end, hasNameMap, nameMap); pDataArray.push_back(tempSum); //MySeqSumThreadFunction is in header. It must be global or static to work with the threads. 
//default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i] = CreateThread(NULL, 0, MySeqSumThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]); } //do your part num = driverCreateSummary(startPosition, endPosition, seqLength, ambigBases, longHomoPolymer, fastafile, (sumFile+toString(processors-1)+".temp"), lines[processors-1]); processIDS.push_back(processors-1); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. \n"); m->control_pressed = true; } for (map::iterator it = pDataArray[i]->startPosition.begin(); it != pDataArray[i]->startPosition.end(); it++) { map::iterator itMain = startPosition.find(it->first); if (itMain == startPosition.end()) { //newValue startPosition[it->first] = it->second; }else { itMain->second += it->second; } //merge counts } for (map::iterator it = pDataArray[i]->endPosition.begin(); it != pDataArray[i]->endPosition.end(); it++) { map::iterator itMain = endPosition.find(it->first); if (itMain == endPosition.end()) { //newValue endPosition[it->first] = it->second; }else { itMain->second += it->second; } //merge counts } for (map::iterator it = pDataArray[i]->seqLength.begin(); it != pDataArray[i]->seqLength.end(); it++) { map::iterator itMain = seqLength.find(it->first); if (itMain == seqLength.end()) { //newValue seqLength[it->first] = it->second; }else { itMain->second += it->second; } //merge counts } for (map::iterator it = pDataArray[i]->ambigBases.begin(); it != pDataArray[i]->ambigBases.end(); it++) { map::iterator itMain = 
ambigBases.find(it->first); if (itMain == ambigBases.end()) { //newValue ambigBases[it->first] = it->second; }else { itMain->second += it->second; } //merge counts } for (map::iterator it = pDataArray[i]->longHomoPolymer.begin(); it != pDataArray[i]->longHomoPolymer.end(); it++) { map::iterator itMain = longHomoPolymer.find(it->first); if (itMain == longHomoPolymer.end()) { //newValue longHomoPolymer[it->first] = it->second; }else { itMain->second += it->second; } //merge counts } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } //append files for(int i = 0; i < processIDS.size(); i++){ m->appendFiles((sumFile + toString(processIDS[i]) + ".temp"), sumFile); m->mothurRemove((sumFile + toString(processIDS[i]) + ".temp")); } #endif return num; } catch(exception& e) { m->errorOut(e, "SeqSummaryCommand", "createProcessesCreateSummary"); exit(1); } } /**********************************************************************************************************************/ mothur-1.36.1/source/commands/seqsummarycommand.h000066400000000000000000000145121255543666200221600ustar00rootroot00000000000000#ifndef SEQSUMMARYCOMMAND_H #define SEQSUMMARYCOMMAND_H /* * seqcoordcommand.h * Mothur * * Created by Pat Schloss on 5/30/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "sequence.hpp"

/**************************************************************************************************/

class SeqSummaryCommand : public Command {
public:
    SeqSummaryCommand(string);
    SeqSummaryCommand();
    ~SeqSummaryCommand(){}

    vector<string> setParameters();
    string getCommandName()         { return "summary.seqs"; }
    string getCommandCategory()     { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Summary.seqs"; }
    string getDescription() { return "summarize the quality of sequences in an unaligned or aligned fasta file"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort;
    string fastafile, outputDir, namefile, countfile;
    int processors;
    vector<string> outputNames;
    map<string, int> nameMap;

    vector<linePair*> lines;
    vector<int> processIDS;

    long long createProcessesCreateSummary(map<int, int>&, map<int, int>&, map<int, int>&, map<int, int>&, map<int, int>&, string, string);
    long long driverCreateSummary(map<int, int>&, map<int, int>&, map<int, int>&, map<int, int>&, map<int, int>&, string, string, linePair*);
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct seqSumData {
    map<int, int> startPosition;
    map<int, int> endPosition;
    map<int, int> seqLength;
    map<int, int> ambigBases;
    map<int, int> longHomoPolymer;
    string filename;
    string sumFile;
    unsigned long long start;
    unsigned long long end;
    int count;
    MothurOut* m;
    bool hasNameMap;
    map<string, int> nameMap;

    seqSumData(){}
    seqSumData(string f, string sf, MothurOut* mout, unsigned long long st, unsigned long long en, bool na, map<string, int> nam) {
        filename = f;
        sumFile = sf;
        m = mout;
        start = st;
        end = en;
        hasNameMap = na;
        nameMap = nam;
        count = 0;
    }
};

/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MySeqSumThreadFunction(LPVOID lpParam){
    seqSumData* pDataArray;
    pDataArray = (seqSumData*)lpParam;

    try {
        ofstream outSummary;
        pDataArray->m->openOutputFile(pDataArray->sumFile, outSummary);

        ifstream in;
        pDataArray->m->openInputFile(pDataArray->filename, in);

        //print header if you are process 0
        if ((pDataArray->start == 0) || (pDataArray->start == 1)) {
            outSummary << "seqname\tstart\tend\tnbases\tambigs\tpolymer\tnumSeqs" << endl;
            in.seekg(0);
            pDataArray->m->zapGremlins(in);
        }else { //this accounts for the difference in line endings.
in.seekg(pDataArray->start-1); pDataArray->m->gobble(in); } for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { in.close(); outSummary.close(); pDataArray->count = 1; return 1; } Sequence current(in); pDataArray->m->gobble(in); if (current.getName() != "") { int num = 1; if (pDataArray->hasNameMap){ //make sure this sequence is in the namefile, else error map::iterator it = pDataArray->nameMap.find(current.getName()); if (it == pDataArray->nameMap.end()) { pDataArray->m->mothurOut("[ERROR]: " + current.getName() + " is not in your name or count file, please correct."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; } else { num = it->second; } } int thisStartPosition = current.getStartPos(); map::iterator it = pDataArray->startPosition.find(thisStartPosition); if (it == pDataArray->startPosition.end()) { pDataArray->startPosition[thisStartPosition] = num; } //first finding of this start position, set count. else { it->second += num; } //add counts int thisEndPosition = current.getEndPos(); it = pDataArray->endPosition.find(thisEndPosition); if (it == pDataArray->endPosition.end()) { pDataArray->endPosition[thisEndPosition] = num; } //first finding of this end position, set count. else { it->second += num; } //add counts int thisSeqLength = current.getNumBases(); it = pDataArray->seqLength.find(thisSeqLength); if (it == pDataArray->seqLength.end()) { pDataArray->seqLength[thisSeqLength] = num; } //first finding of this length, set count. else { it->second += num; } //add counts int thisAmbig = current.getAmbigBases(); it = pDataArray->ambigBases.find(thisAmbig); if (it == pDataArray->ambigBases.end()) { pDataArray->ambigBases[thisAmbig] = num; } //first finding of this ambig, set count. 
else { it->second += num; } //add counts int thisHomoP = current.getLongHomoPolymer(); it = pDataArray->longHomoPolymer.find(thisHomoP); if (it == pDataArray->longHomoPolymer.end()) { pDataArray->longHomoPolymer[thisHomoP] = num; } //first finding of this homop, set count. else { it->second += num; } //add counts pDataArray->count++; outSummary << current.getName() << '\t'; outSummary << thisStartPosition << '\t' << thisEndPosition << '\t'; outSummary << thisSeqLength << '\t' << thisAmbig << '\t'; outSummary << thisHomoP << '\t' << num << endl; } } in.close(); outSummary.close(); return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "SeqSummaryCommand", "MySeqSumThreadFunction"); exit(1); } } #endif #endif /**************************************************************************************************/ mothur-1.36.1/source/commands/setcurrentcommand.cpp000066400000000000000000000616731255543666200225150ustar00rootroot00000000000000/* * setcurrentcommand.cpp * Mothur * * Created by westcott on 3/16/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "setcurrentcommand.h" //********************************************************************************************************************** vector SetCurrentCommand::setParameters(){ try { CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pflow("flow", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pflow); CommandParameter pfile("file", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pfile); CommandParameter pbiom("biom", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pbiom); CommandParameter pphylip("phylip", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pphylip); CommandParameter pcolumn("column", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pcolumn); CommandParameter psummary("summary", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(psummary); CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pname); CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pgroup); CommandParameter plist("list", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(plist); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(ptaxonomy); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pqfile); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter 
prabund("rabund", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(psabund); CommandParameter pdesign("design", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pdesign); CommandParameter porder("order", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(porder); CommandParameter ptree("tree", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(ptree); CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pshared); CommandParameter pordergroup("ordergroup", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pordergroup); CommandParameter pcount("count", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pcount); CommandParameter prelabund("relabund", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(prelabund); CommandParameter psff("sff", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(psff); CommandParameter poligos("oligos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(poligos); CommandParameter pclear("clear", "String", "", "", "", "", "","",false,false); parameters.push_back(pclear); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { 
        m->errorOut(e, "SetCurrentCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string SetCurrentCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The set.current command allows you to set the current files saved by mothur.\n";
        helpString += "The set.current command parameters are: clear, phylip, column, list, rabund, sabund, name, group, design, order, tree, shared, ordergroup, relabund, fasta, qfile, sff, oligos, accnos, biom, count, summary, file and taxonomy.\n";
        helpString += "The clear parameter is used to indicate which file types you would like to clear values for, multiple types can be separated by dashes.\n";
        helpString += "The set.current command should be in the following format: \n";
        helpString += "set.current(fasta=yourFastaFile) or set.current(fasta=amazon.fasta, clear=name-accnos)\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "SetCurrentCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
SetCurrentCommand::SetCurrentCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
    }
    catch(exception& e) {
        m->errorOut(e, "SetCurrentCommand", "SetCurrentCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
SetCurrentCommand::SetCurrentCommand(string option) {
    try {
        abort = false; calledHelp = false;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            //valid parameters for this command
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;
            //check to make sure all parameters are valid for
command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("design"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["design"] = inputDir + it->second; } } it = parameters.find("order"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["order"] = inputDir + it->second; } } it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("ordergroup"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["ordergroup"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("relabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["relabund"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("sff"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sff"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("flow"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["flow"] = inputDir + it->second; } } it = parameters.find("biom"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["biom"] = inputDir + it->second; } } it = parameters.find("summary"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["summary"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["file"] = inputDir + it->second; } } } //check for parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { m->mothurOut("Ignoring: " + parameters["phylip"]); m->mothurOutEndLine(); phylipfile = ""; } else if (phylipfile == "not found") { phylipfile = ""; } if (phylipfile != "") { m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { m->mothurOut("Ignoring: " + parameters["column"]); m->mothurOutEndLine(); columnfile = ""; } else if (columnfile == "not found") { columnfile = ""; } if (columnfile != "") { m->setColumnFile(columnfile); } listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { m->mothurOut("Ignoring: " + parameters["list"]); m->mothurOutEndLine(); listfile = ""; } else if (listfile == "not found") { listfile = ""; } if (listfile != "") { m->setListFile(listfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { m->mothurOut("Ignoring: " + parameters["rabund"]); m->mothurOutEndLine(); rabundfile = ""; } else if (rabundfile == "not found") { rabundfile = ""; } if (rabundfile != "") { m->setRabundFile(rabundfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { m->mothurOut("Ignoring: " + parameters["sabund"]); m->mothurOutEndLine(); sabundfile = ""; } else if (sabundfile == "not found") { sabundfile = ""; } if (sabundfile != "") { m->setSabundFile(sabundfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { m->mothurOut("Ignoring: " + parameters["name"]); m->mothurOutEndLine(); namefile = ""; } else if (namefile == "not found") { namefile = ""; } if (namefile != "") { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { 
m->mothurOut("Ignoring: " + parameters["group"]); m->mothurOutEndLine(); groupfile = ""; } else if (groupfile == "not found") { groupfile = ""; } if (groupfile != "") { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { m->mothurOut("Ignoring: " + parameters["count"]); m->mothurOutEndLine(); countfile = ""; } else if (countfile == "not found") { countfile = ""; } if (countfile != "") { m->setCountTableFile(countfile); } designfile = validParameter.validFile(parameters, "design", true); if (designfile == "not open") { m->mothurOut("Ignoring: " + parameters["design"]); m->mothurOutEndLine(); designfile = ""; } else if (designfile == "not found") { designfile = ""; } if (designfile != "") { m->setDesignFile(designfile); } orderfile = validParameter.validFile(parameters, "order", true); if (orderfile == "not open") { m->mothurOut("Ignoring: " + parameters["order"]); m->mothurOutEndLine(); orderfile = ""; } else if (orderfile == "not found") { orderfile = ""; } if (orderfile != "") { m->setOrderFile(orderfile); } treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { m->mothurOut("Ignoring: " + parameters["tree"]); m->mothurOutEndLine(); treefile = ""; } else if (treefile == "not found") { treefile = ""; } if (treefile != "") { m->setTreeFile(treefile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { m->mothurOut("Ignoring: " + parameters["shared"]); m->mothurOutEndLine(); sharedfile = ""; } else if (sharedfile == "not found") { sharedfile = ""; } if (sharedfile != "") { m->setSharedFile(sharedfile); } ordergroupfile = validParameter.validFile(parameters, "ordergroup", true); if (ordergroupfile == "not open") { m->mothurOut("Ignoring: " + parameters["ordergroup"]); m->mothurOutEndLine(); ordergroupfile = ""; } else if (ordergroupfile == "not found") { ordergroupfile = ""; } if (ordergroupfile != "") { 
m->setOrderGroupFile(ordergroupfile); } relabundfile = validParameter.validFile(parameters, "relabund", true); if (relabundfile == "not open") { m->mothurOut("Ignoring: " + parameters["relabund"]); m->mothurOutEndLine(); relabundfile = ""; } else if (relabundfile == "not found") { relabundfile = ""; } if (relabundfile != "") { m->setRelAbundFile(relabundfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { m->mothurOut("Ignoring: " + parameters["fasta"]); m->mothurOutEndLine(); fastafile = ""; } else if (fastafile == "not found") { fastafile = ""; } if (fastafile != "") { m->setFastaFile(fastafile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { m->mothurOut("Ignoring: " + parameters["qfile"]); m->mothurOutEndLine(); qualfile = ""; } else if (qualfile == "not found") { qualfile = ""; } if (qualfile != "") { m->setQualFile(qualfile); } sfffile = validParameter.validFile(parameters, "sff", true); if (sfffile == "not open") { m->mothurOut("Ignoring: " + parameters["sff"]); m->mothurOutEndLine(); sfffile = ""; } else if (sfffile == "not found") { sfffile = ""; } if (sfffile != "") { m->setSFFFile(sfffile); } oligosfile = validParameter.validFile(parameters, "oligos", true); if (oligosfile == "not open") { m->mothurOut("Ignoring: " + parameters["oligos"]); m->mothurOutEndLine(); oligosfile = ""; } else if (oligosfile == "not found") { oligosfile = ""; } if (oligosfile != "") { m->setOligosFile(oligosfile); } accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { m->mothurOut("Ignoring: " + parameters["accnos"]); m->mothurOutEndLine(); accnosfile = ""; } else if (accnosfile == "not found") { accnosfile = ""; } if (accnosfile != "") { m->setAccnosFile(accnosfile); } taxonomyfile = validParameter.validFile(parameters, "taxonomy", true); if (taxonomyfile == "not open") { m->mothurOut("Ignoring: " + parameters["taxonomy"]); 
m->mothurOutEndLine(); taxonomyfile = ""; } else if (taxonomyfile == "not found") { taxonomyfile = ""; } if (taxonomyfile != "") { m->setTaxonomyFile(taxonomyfile); } flowfile = validParameter.validFile(parameters, "flow", true); if (flowfile == "not open") { m->mothurOut("Ignoring: " + parameters["flow"]); m->mothurOutEndLine(); flowfile = ""; } else if (flowfile == "not found") { flowfile = ""; } if (flowfile != "") { m->setFlowFile(flowfile); } biomfile = validParameter.validFile(parameters, "biom", true); if (biomfile == "not open") { m->mothurOut("Ignoring: " + parameters["biom"]); m->mothurOutEndLine(); biomfile = ""; } else if (biomfile == "not found") { biomfile = ""; } if (biomfile != "") { m->setBiomFile(biomfile); } summaryfile = validParameter.validFile(parameters, "summary", true); if (summaryfile == "not open") { m->mothurOut("Ignoring: " + parameters["summary"]); m->mothurOutEndLine(); summaryfile = ""; } else if (summaryfile == "not found") { summaryfile = ""; } if (summaryfile != "") { m->setSummaryFile(summaryfile); } filefile = validParameter.validFile(parameters, "file", true); if (filefile == "not open") { m->mothurOut("Ignoring: " + parameters["file"]); m->mothurOutEndLine(); filefile = ""; } else if (filefile == "not found") { filefile = ""; } if (filefile != "") { m->setFileFile(filefile); } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); clearTypes = validParameter.validFile(parameters, "clear", false); if (clearTypes == "not found") { clearTypes = ""; } else { m->splitAtDash(clearTypes, types); } } } catch(exception& e) { m->errorOut(e, "SetCurrentCommand", "SetCurrentCommand"); exit(1); } } //********************************************************************************************************************** int SetCurrentCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //user wants to clear a 
type if (types.size() != 0) { for (int i = 0; i < types.size(); i++) { if (m->control_pressed) { break; } //look for file types if (types[i] == "fasta") { m->setFastaFile(""); }else if (types[i] == "qfile") { m->setQualFile(""); }else if (types[i] == "phylip") { m->setPhylipFile(""); }else if (types[i] == "column") { m->setColumnFile(""); }else if (types[i] == "list") { m->setListFile(""); }else if (types[i] == "rabund") { m->setRabundFile(""); }else if (types[i] == "sabund") { m->setSabundFile(""); }else if (types[i] == "name") { m->setNameFile(""); }else if (types[i] == "group") { m->setGroupFile(""); }else if (types[i] == "order") { m->setOrderFile(""); }else if (types[i] == "ordergroup") { m->setOrderGroupFile(""); }else if (types[i] == "tree") { m->setTreeFile(""); }else if (types[i] == "shared") { m->setSharedFile(""); }else if (types[i] == "relabund") { m->setRelAbundFile(""); }else if (types[i] == "design") { m->setDesignFile(""); }else if (types[i] == "sff") { m->setSFFFile(""); }else if (types[i] == "oligos") { m->setOligosFile(""); }else if (types[i] == "accnos") { m->setAccnosFile(""); }else if (types[i] == "taxonomy") { m->setTaxonomyFile(""); }else if (types[i] == "flow") { m->setFlowFile(""); }else if (types[i] == "biom") { m->setBiomFile(""); }else if (types[i] == "count") { m->setCountTableFile(""); }else if (types[i] == "summary") { m->setSummaryFile(""); }else if (types[i] == "file") { m->setFileFile(""); }else if (types[i] == "processors") { m->setProcessors("1"); }else if (types[i] == "all") { m->clearCurrentFiles(); }else { m->mothurOut("[ERROR]: mothur does not save a current file for " + types[i]); m->mothurOutEndLine(); } } } m->mothurOutEndLine(); m->mothurOut("Current files saved by mothur:"); m->mothurOutEndLine(); m->printCurrentFiles(); return 0; } catch(exception& e) { m->errorOut(e, "SetCurrentCommand", "execute"); exit(1); } } 
//**********************************************************************************************************************
mothur-1.36.1/source/commands/setcurrentcommand.h000066400000000000000000000022611255543666200221460ustar00rootroot00000000000000
#ifndef SETCURRENTCOMMAND_H
#define SETCURRENTCOMMAND_H

/*
 *  setcurrentcommand.h
 *  Mothur
 *
 *  Created by westcott on 3/16/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"

class SetCurrentCommand : public Command {

public:
    SetCurrentCommand(string);
    SetCurrentCommand();
    ~SetCurrentCommand() {}

    vector<string> setParameters();
    string getCommandName()         { return "set.current"; }
    string getCommandCategory()     { return "General"; }

    string getHelpString();
    string getOutputPattern(string){ return ""; }
    string getCitation() { return "http://www.mothur.org/wiki/Set.current"; }
    string getDescription() { return "set current files for mothur"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    vector<string> outputNames;
    bool abort;

    string clearTypes;
    vector<string> types;

    string accnosfile, phylipfile, columnfile, listfile, rabundfile, sabundfile, namefile, groupfile, designfile, taxonomyfile, biomfile, countfile, summaryfile;
    string orderfile, treefile, sharedfile, ordergroupfile, relabundfile, fastafile, qualfile, sfffile, oligosfile, processors, flowfile, filefile;
};

#endif
mothur-1.36.1/source/commands/setdircommand.cpp000066400000000000000000000240671255543666200216050ustar00rootroot00000000000000
/*
 *  setoutdircommand.cpp
 *  Mothur
 *
 *  Created by westcott on 1/21/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "setdircommand.h" //********************************************************************************************************************** vector SetDirectoryCommand::setParameters(){ try { CommandParameter ptempdefault("tempdefault", "String", "", "", "", "", "","",false,false); parameters.push_back(ptempdefault); CommandParameter pdebug("debug", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pdebug); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pmodnames("modifynames", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pmodnames); CommandParameter pinput("input", "String", "", "", "", "", "","",false,false,true); parameters.push_back(pinput); CommandParameter poutput("output", "String", "", "", "", "", "","",false,false,true); parameters.push_back(poutput); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SetDirectoryCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SetDirectoryCommand::getHelpString(){ try { string helpString = ""; helpString += "The set.dir command can be used to direct the output files generated by mothur to a specific place, the directory must exist.\n"; helpString += "The set.dir command can also be used to specify the directory where your input files are located, the directory must exist.\n"; helpString += "The set.dir command can also be used to override or set the default location mothur will look for files if it is unable to find them, the 
directory must exist.\n";
        helpString += "The set.dir command can also be used to run mothur in debug mode.\n";
        helpString += "The set.dir command can also be used to seed random.\n";
        helpString += "The set.dir command can also be used to set the modifynames parameter. Default=T, meaning if your sequence names contain ':' change them to '_' to avoid issues while making trees. modifynames=F will leave sequence names as they are.\n";
        helpString += "The set.dir command parameters are input, output, tempdefault, debug, seed and modifynames, and one is required.\n";
        helpString += "To run mothur in debug mode set debug=true. Default debug=false.\n";
        helpString += "To seed random set seed=yourRandomValue. By default mothur seeds random with the start time.\n";
        helpString += "To return the output to the same directory as the input files you may enter: output=clear.\n";
        helpString += "To return the input to the current working directory you may enter: input=clear.\n";
        helpString += "To set the output to the directory where mothur.exe is located you may enter: output=default.\n";
        helpString += "To set the input to the directory where mothur.exe is located you may enter: input=default.\n";
        helpString += "To return the tempdefault to the default you provided at compile time you may enter: tempdefault=clear.\n";
        helpString += "To set the tempdefault to the directory where mothur.exe is located you may enter: tempdefault=default.\n";
        helpString += "The set.dir command should be in the following format: set.dir(output=yourOutputDirectory, input=yourInputDirectory, tempdefault=yourTempDefault).\n";
        helpString += "Example set.dir(output=/Users/lab/desktop/outputs, input=/Users/lab/desktop/inputs).\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
output), '=' and parameters (i.e.yourOutputDirectory).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SetDirectoryCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** SetDirectoryCommand::SetDirectoryCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } output = validParameter.validFile(parameters, "output", false); if (output == "not found") { output = ""; } input = validParameter.validFile(parameters, "input", false); if (input == "not found") { input = ""; } tempdefault = validParameter.validFile(parameters, "tempdefault", false); if (tempdefault == "not found") { tempdefault = ""; } bool debug = false; bool nodebug = false; debugorSeedOnly = false; string temp = validParameter.validFile(parameters, "debug", false); if (temp == "not found") { debug = false; nodebug=true; } else { debug = m->isTrue(temp); } m->debug = debug; bool nomod = false; temp = validParameter.validFile(parameters, "modifynames", false); if (temp == "not found") { modifyNames = true; nomod=true; } else { modifyNames = m->isTrue(temp); } m->modifyNames = modifyNames; bool seed = false; temp = validParameter.validFile(parameters, "seed", false); if (temp == "not found") { random = 0; } else { if (m->isInteger(temp)) { m->mothurConvert(temp, random); seed = true; } else { m->mothurOut("[ERROR]: Seed must be an integer for the 
set.dir command."); m->mothurOutEndLine(); abort = true; }
			}
			
			if (debug) { m->mothurOut("Setting [DEBUG] flag.\n"); }
			if (seed)  { srand(random); m->mothurOut("Setting random seed to " + toString(random) + ".\n\n"); }
			
			if ((input == "") && (output == "") && (tempdefault == "") && nodebug && nomod && !seed) {
				m->mothurOut("[ERROR]: You must provide at least one of input, output, tempdefault, debug, seed or modifynames for the set.dir command."); m->mothurOutEndLine(); abort = true;
			}else if((input == "") && (output == "") && (tempdefault == "")) { debugorSeedOnly = true; }
		}
	}
	catch(exception& e) {
		m->errorOut(e, "SetDirectoryCommand", "SetDirectoryCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
int SetDirectoryCommand::execute(){
	try {
		
		if (abort == true) { if (calledHelp) { return 0; }  return 2; }
		
		if (debugorSeedOnly) {  }
		else {
			commandFactory = CommandFactory::getInstance();
			
			m->mothurOut("Mothur's directories:"); m->mothurOutEndLine();
			
			//redirect output
			if ((output == "clear") || (output == "")) { output = ""; commandFactory->setOutputDirectory(output); }
			else if (output == "default") {
				string exepath = m->argv;
				output = exepath.substr(0, (exepath.find_last_of('m')));
				
				m->mothurOut("outputDir=" + output); m->mothurOutEndLine();
				commandFactory->setOutputDirectory(output);
			}else {
				if (m->mkDir(output)) {
					m->mothurOut("outputDir=" + output); m->mothurOutEndLine();
					commandFactory->setOutputDirectory(output);
				}
			}
			
			//redirect input
			if ((input == "clear") || (input == "")) { input = ""; commandFactory->setInputDirectory(input); }
			else if (input == "default") {
				string exepath = m->argv;
				input = exepath.substr(0, (exepath.find_last_of('m')));
				
				m->mothurOut("inputDir=" + input); m->mothurOutEndLine();
				commandFactory->setInputDirectory(input);
			}else {
				if (m->dirCheck(input)) {
					m->mothurOut("inputDir=" + input); m->mothurOutEndLine();
					commandFactory->setInputDirectory(input);
				}
			}
			
			//set default
			if 
(tempdefault == "clear") {
				#ifdef MOTHUR_FILES
					string temp = MOTHUR_FILES;
					m->mothurOut("tempDefault=" + temp); m->mothurOutEndLine();
					m->setDefaultPath(temp);
				#else
					string temp = "";
					m->mothurOut("No default directory defined at compile time."); m->mothurOutEndLine();
					m->setDefaultPath(temp);
				#endif
			}else if (tempdefault == "") {
				//do nothing
			}else if (tempdefault == "default") {
				string exepath = m->argv;
				tempdefault = exepath.substr(0, (exepath.find_last_of('m')));
				
				m->mothurOut("tempDefault=" + tempdefault); m->mothurOutEndLine();
				m->setDefaultPath(tempdefault);
			}else {
				if (m->mkDir(tempdefault)) {
					m->mothurOut("tempDefault=" + tempdefault); m->mothurOutEndLine();
					m->setDefaultPath(tempdefault);
				}
			}
		}
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SetDirectoryCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************/
mothur-1.36.1/source/commands/setdircommand.h
#ifndef SETDIRCOMMAND_H
#define SETDIRCOMMAND_H

/*
 * setdircommand.h
 * Mothur
 *
 * Created by westcott on 1/21/10.
 * Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "command.hpp" #include "commandfactory.hpp" /**********************************************************/ class SetDirectoryCommand : public Command { public: SetDirectoryCommand(string); SetDirectoryCommand() { abort = true; calledHelp = true; setParameters(); } ~SetDirectoryCommand(){} vector setParameters(); string getCommandName() { return "set.dir"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string){ return ""; } string getCitation() { return "http://www.mothur.org/wiki/Set.dir"; } string getDescription() { return "set input, output and default directories"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CommandFactory* commandFactory; string output, input, tempdefault; bool abort, debugorSeedOnly, modifyNames; int random; vector outputNames; }; /**********************************************************/ #endif mothur-1.36.1/source/commands/setlogfilecommand.cpp000066400000000000000000000101451255543666200224400ustar00rootroot00000000000000/* * setlogfilecommand.cpp * Mothur * * Created by westcott on 4/27/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "setlogfilecommand.h" //********************************************************************************************************************** vector SetLogFileCommand::setParameters(){ try { CommandParameter pappend("append", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pappend); CommandParameter pname("name", "String", "", "", "", "", "","",false,true,true); parameters.push_back(pname); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SetLogFileCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SetLogFileCommand::getHelpString(){ try { string helpString = ""; helpString += "The set.logfile command can be used to provide a specific name for your logfile and/or to append the log generated by mothur to an existing file.\n"; helpString += "The set.logfile command parameters are name and append, name is required. Append is set to false by default.\n"; helpString += "The set.logfile command should be in the following format: set.logfile(name=yourLogFileName, append=T).\n"; helpString += "Example set.logfile(name=/Users/lab/desktop/output.txt, append=T).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
name), '=' and parameters (i.e.yourLogFileName).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SetLogFileCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** SetLogFileCommand::SetLogFileCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } name = validParameter.validFile(parameters, "name", false); if (name == "not found") { m->mothurOut("name is a required parameter for the set.logfile command."); abort = true; } string temp = validParameter.validFile(parameters, "append", false); if (temp == "not found") { temp = "F"; } append = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "SetLogFileCommand", "SetLogFileCommand"); exit(1); } } //********************************************************************************************************************** int SetLogFileCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } commandFactory = CommandFactory::getInstance(); string directory = m->hasPath(name); if (directory == "") { commandFactory->setLogfileName(name, append); }else if (m->dirCheck(directory)) { commandFactory->setLogfileName(name, append); } return 0; } catch(exception& e) { m->errorOut(e, "SetLogFileCommand", "execute"); exit(1); } } 
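// The execute() step above only hands the name to the command factory once the
// directory part of the path checks out. The sketch below illustrates that idea
// in isolation. It is not mothur's implementation: directoryPart() and
// logOpenMode() are hypothetical helpers standing in for m->hasPath() and for
// whatever setLogfileName() does with the append flag.

```cpp
#include <ios>
#include <string>

// Hypothetical stand-in for m->hasPath(): returns everything up to and
// including the last path separator, or "" when the name has no directory.
std::string directoryPart(const std::string& fileName) {
    const std::string::size_type pos = fileName.find_last_of("/\\");
    if (pos == std::string::npos) { return ""; }
    return fileName.substr(0, pos + 1);
}

// Hypothetical helper: picks the stream mode implied by append=T or append=F.
// Appending keeps the existing log; otherwise the file is truncated.
std::ios_base::openmode logOpenMode(bool append) {
    return append ? (std::ios_base::out | std::ios_base::app)
                  : (std::ios_base::out | std::ios_base::trunc);
}
```

// With these helpers, a name such as /Users/lab/desktop/output.txt would have
// its directory checked first, while a bare output.txt would be used as given.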
//**********************************************************************************************************************/ mothur-1.36.1/source/commands/setlogfilecommand.h000066400000000000000000000021131255543666200221010ustar00rootroot00000000000000#ifndef SETLOGFILECOMMAND_H #define SETLOGFILECOMMAND_H /* * setlogfilecommand.h * Mothur * * Created by westcott on 4/27/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "commandfactory.hpp" /**********************************************************/ class SetLogFileCommand : public Command { public: SetLogFileCommand(string); SetLogFileCommand() { setParameters(); abort = true; calledHelp = true; } ~SetLogFileCommand(){} vector setParameters(); string getCommandName() { return "set.logfile"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string){ return ""; } string getCitation() { return "http://www.mothur.org/wiki/Set.logfile"; } string getDescription() { return "set logfile name"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CommandFactory* commandFactory; string name; bool abort, append; vector outputNames; }; /**********************************************************/ #endif mothur-1.36.1/source/commands/setseedcommand.cpp000066400000000000000000000101271255543666200217370ustar00rootroot00000000000000// // setseedcommand.cpp // Mothur // // Created by Sarah Westcott on 3/24/15. // Copyright (c) 2015 Schloss Lab. All rights reserved. 
// #include "setseedcommand.h" //********************************************************************************************************************** vector SetSeedCommand::setParameters(){ try { CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SetSeedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SetSeedCommand::getHelpString(){ try { string helpString = ""; helpString += "The set.seed command is used to seed random.\n"; helpString += "The set.seed command parameter is seed, and it is required.\n"; helpString += "To seed random set seed=yourRandomValue. By default mothur seeds random with the start time.\n"; helpString += "Example set.seed(seed=12345).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
seed), '=' and parameters (i.e. yourSeedValue).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "SetSeedCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
SetSeedCommand::SetSeedCommand(string option)  {
	try {
		abort = false; calledHelp = false;
		
		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option == "citation") { citation(); abort = true; calledHelp = true;}
		
		else {
			vector<string> myArray = setParameters();
			
			OptionParser parser(option);
			map<string, string> parameters = parser.getParameters();
			
			ValidParameters validParameter;
			//check to make sure all parameters are valid for command
			for (map<string,string>::iterator it = parameters.begin(); it != parameters.end(); it++) {
				if (validParameter.isValidParameter(it->first, myArray, it->second) != true) {  abort = true;  }
			}
			
			bool seed = false;
			string temp = validParameter.validFile(parameters, "seed", false);
			if (temp == "not found") { random = 0; m->mothurOut("[ERROR]: You must provide a seed value or set seed to clear."); m->mothurOutEndLine(); abort = true;}
			else if (temp == "clear") {
				random = time(NULL);
				seed = true;
			}else {
				if (m->isInteger(temp)) { m->mothurConvert(temp, random); seed = true; }
				else { m->mothurOut("[ERROR]: Seed must be an integer for the set.seed command."); m->mothurOutEndLine(); abort = true; }
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "SetSeedCommand", "SetSeedCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
int SetSeedCommand::execute(){
	try {
		
		if (abort == true) { if (calledHelp) { return 0; }  return 2; }
		
		srand(random);
		m->mothurOut("Setting random seed to " + toString(random) + ".\n\n");
		
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SetSeedCommand", "execute");
		exit(1);
	}
}
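// set.seed validates that the seed is an integer and then calls srand() with it,
// which is what makes a later run repeatable. The sketch below shows both steps
// on their own. parseSeed() and drawAfterSeeding() are hypothetical names used
// only for illustration; mothur itself relies on m->isInteger() and
// m->mothurConvert(), whose exact behavior is not reproduced here.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: accepts text only if all of it parses as an int.
bool parseSeed(const std::string& text, int& seed) {
    if (text.empty()) { return false; }
    char* end = nullptr;
    errno = 0;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (*end != '\0' || errno == ERANGE) { return false; }   // trailing junk or overflow
    if (value < INT_MIN || value > INT_MAX) { return false; } // does not fit in int
    seed = static_cast<int>(value);
    return true;
}

// Hypothetical helper: seeds the C generator, then draws `count` values.
// The same seed yields the same sequence within one C library implementation.
std::vector<int> drawAfterSeeding(int seed, int count) {
    std::srand(static_cast<unsigned int>(seed));
    std::vector<int> values;
    for (int i = 0; i < count; i++) { values.push_back(std::rand()); }
    return values;
}
```

// Calling drawAfterSeeding() twice with seed 12345 returns identical vectors,
// which is the property set.seed(seed=12345) is meant to give a mothur session.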
//**********************************************************************************************************************/ mothur-1.36.1/source/commands/setseedcommand.h000066400000000000000000000022461255543666200214070ustar00rootroot00000000000000// // setseedcommand.h // Mothur // // Created by Sarah Westcott on 3/24/15. // Copyright (c) 2015 Schloss Lab. All rights reserved. // #ifndef __Mothur__setseedcommand__ #define __Mothur__setseedcommand__ #include "command.hpp" #include "commandfactory.hpp" /**********************************************************/ class SetSeedCommand : public Command { public: SetSeedCommand(string); SetSeedCommand() { abort = true; calledHelp = true; setParameters(); } ~SetSeedCommand(){} vector setParameters(); string getCommandName() { return "set.seed"; } string getCommandCategory() { return "General"; } string getHelpString(); string getOutputPattern(string){ return ""; } string getCitation() { return "http://www.mothur.org/wiki/Set.seed"; } string getDescription() { return "set random seed"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: CommandFactory* commandFactory; bool abort; int random; vector outputNames; }; /**********************************************************/ #endif /* defined(__Mothur__setseedcommand__) */ mothur-1.36.1/source/commands/sffinfocommand.cpp000066400000000000000000002702001255543666200217350ustar00rootroot00000000000000/* * sffinfocommand.cpp * Mothur * * Created by westcott on 7/7/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "sffinfocommand.h" #include "endiannessmacros.h" #include "trimoligos.h" #include "sequence.hpp" #include "qualityscores.h" //********************************************************************************************************************** vector SffInfoCommand::setParameters(){ try { CommandParameter psff("sff", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(psff); CommandParameter poligos("oligos", "InputTypes", "", "", "oligosGroup", "none", "none","",false,false); parameters.push_back(poligos); CommandParameter preorient("checkorient", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(preorient); CommandParameter pgroup("group", "InputTypes", "", "", "oligosGroup", "none", "none","",false,false); parameters.push_back(pgroup); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter psfftxt("sfftxt", "String", "", "", "", "", "","",false,false); parameters.push_back(psfftxt); CommandParameter pflow("flow", "Boolean", "", "T", "", "", "","flow",false,false); parameters.push_back(pflow); CommandParameter ptrim("trim", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(ptrim); CommandParameter pfasta("fasta", "Boolean", "", "T", "", "", "","fasta",false,false); parameters.push_back(pfasta); CommandParameter pqfile("qfile", "Boolean", "", "T", "", "", "","qfile",false,false); parameters.push_back(pqfile); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pbdiffs); CommandParameter pldiffs("ldiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pldiffs); CommandParameter psdiffs("sdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psdiffs); CommandParameter 
ptdiffs("tdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ptdiffs); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SffInfoCommand::getHelpString(){ try { string helpString = ""; helpString += "The sffinfo command reads a sff file and extracts the sequence data, or you can use it to parse a sfftxt file.\n"; helpString += "The sffinfo command parameters are sff, fasta, qfile, accnos, flow, sfftxt, oligos, group, bdiffs, tdiffs, ldiffs, sdiffs, pdiffs, checkorient and trim. sff is required. \n"; helpString += "The sff parameter allows you to enter the sff file you would like to extract data from. You may enter multiple files by separating them by -'s.\n"; helpString += "The fasta parameter allows you to indicate if you would like a fasta formatted file generated. Default=True. \n"; helpString += "The qfile parameter allows you to indicate if you would like a quality file generated. Default=True. \n"; helpString += "The oligos parameter allows you to provide an oligos file to split your sff file into separate sff files by barcode. \n"; helpString += "The group parameter allows you to provide a group file to split your sff file into separate sff files by group. \n"; helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the sequence. 
The default is pdiffs + bdiffs + sdiffs + ldiffs.\n";
		helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. The default is 0.\n";
		helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n";
		helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. The default is 0.\n";
		helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n";
		helpString += "The checkorient parameter will look for the reverse complement of the barcode or primer in the sequence. The default is false.\n";
		helpString += "The flow parameter allows you to indicate if you would like a flowgram file generated. Default=True. \n";
		helpString += "The sfftxt parameter allows you to indicate if you would like an sff.txt file generated. Default=False. \n";
		helpString += "If you want to parse an existing sfftxt file into flow, fasta and quality files, enter the file name using the sfftxt parameter. \n";
		helpString += "The trim parameter allows you to indicate if you would like sequences and quality scores trimmed to the clipQualLeft and clipQualRight values. Default=True. \n";
		helpString += "The accnos parameter allows you to provide an accnos file containing the names of the sequences you would like extracted. You may enter multiple files by separating them by -'s. \n";
		helpString += "Example sffinfo(sff=mySffFile.sff, trim=F).\n";
		helpString += "Note: No spaces between parameter labels (i.e. 
sff), '=' and parameters (i.e. yourSffFileName).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "SffInfoCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string SffInfoCommand::getOutputPattern(string type) {
	try {
		string pattern = "";
		
		if (type == "fasta")        {  pattern = "[filename],fasta-[filename],[tag],fasta";  }
		else if (type == "flow")    {  pattern = "[filename],flow";  }
		else if (type == "sfftxt")  {  pattern = "[filename],sff.txt";  }
		else if (type == "sff")     {  pattern = "[filename],[group],sff";  }
		else if (type == "qfile")   {  pattern = "[filename],qual-[filename],[tag],qual";  }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true;  }
		
		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "SffInfoCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
SffInfoCommand::SffInfoCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["fasta"] = tempOutNames;
		outputTypes["flow"] = tempOutNames;
		outputTypes["sfftxt"] = tempOutNames;
		outputTypes["qfile"] = tempOutNames;
		outputTypes["sff"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "SffInfoCommand", "SffInfoCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
SffInfoCommand::SffInfoCommand(string option)  {
	try {
		abort = false; calledHelp = false;
		hasAccnos = false; hasOligos = false; hasGroup = false;
		split = 1;
		
		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option == "citation") { citation(); abort = true; calledHelp = true;}
		
		else {
			//valid parameters for this command
			vector<string> myArray = setParameters();
			
			OptionParser parser(option);
			map<string, string> parameters 
= parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["flow"] = tempOutNames; outputTypes["sfftxt"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["sff"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } sffFilename = validParameter.validFile(parameters, "sff", false); if (sffFilename == "not found") { sffFilename = ""; } else { m->splitAtDash(sffFilename, filenames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < filenames.size(); i++) { bool ignore = false; if (filenames[i] == "current") { filenames[i] = m->getSFFFile(); if (filenames[i] != "") { m->mothurOut("Using " + filenames[i] + " as input file for the sff parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sfffile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list filenames.erase(filenames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(filenames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { filenames[i] = inputDir + filenames[i]; } } ifstream in; int ableToOpen = m->openInputFile(filenames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(filenames[i]); m->mothurOut("Unable to open " + filenames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); filenames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(filenames[i]); m->mothurOut("Unable to open " + filenames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); filenames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + filenames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list filenames.erase(filenames.begin()+i); i--; }else { m->setSFFFile(filenames[i]); } } } //make sure there is at least one valid file left if (filenames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } accnosName = validParameter.validFile(parameters, "accnos", false); if (accnosName == "not found") { accnosName = ""; } else { hasAccnos = true; m->splitAtDash(accnosName, accnosFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < accnosFileNames.size(); i++) { bool ignore = false; if (accnosFileNames[i] == "current") { accnosFileNames[i] = m->getAccnosFile(); if (accnosFileNames[i] != "") { m->mothurOut("Using " + accnosFileNames[i] + " as input file for the accnos parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current accnosfile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list accnosFileNames.erase(accnosFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(accnosFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { accnosFileNames[i] = inputDir + accnosFileNames[i]; } } ifstream in; int ableToOpen = m->openInputFile(accnosFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(accnosFileNames[i]); m->mothurOut("Unable to open " + accnosFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); accnosFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(accnosFileNames[i]); m->mothurOut("Unable to open " + accnosFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); accnosFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + accnosFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine(); //erase from file list accnosFileNames.erase(accnosFileNames.begin()+i); i--; } } } //make sure there is at least one valid file left if (accnosFileNames.size() == 0) { m->mothurOut("no valid files."); m->mothurOutEndLine(); abort = true; } } oligosfile = validParameter.validFile(parameters, "oligos", false); if (oligosfile == "not found") { oligosfile = ""; } else { hasOligos = true; m->splitAtDash(oligosfile, oligosFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < oligosFileNames.size(); i++) { bool ignore = false; if (oligosFileNames[i] == "current") { oligosFileNames[i] = m->getOligosFile(); if (oligosFileNames[i] != "") { m->mothurOut("Using " + oligosFileNames[i] + " as input file for the oligos parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current oligosfile, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list oligosFileNames.erase(oligosFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(oligosFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { oligosFileNames[i] = inputDir + oligosFileNames[i]; } } ifstream in; int ableToOpen = m->openInputFile(oligosFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(oligosFileNames[i]); m->mothurOut("Unable to open " + oligosFileNames[i] + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); oligosFileNames[i] = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(oligosFileNames[i]); m->mothurOut("Unable to open " + oligosFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); oligosFileNames[i] = tryPath; } } in.close(); if (ableToOpen == 1) { m->mothurOut("Unable to open " + oligosFileNames[i] + ". 
It will be disregarded."); m->mothurOutEndLine(); //erase from file list oligosFileNames.erase(oligosFileNames.begin()+i); i--; } } } //make sure there is at least one valid file left if (oligosFileNames.size() == 0) { m->mothurOut("no valid oligos files."); m->mothurOutEndLine(); abort = true; } } groupfile = validParameter.validFile(parameters, "group", false); if (groupfile == "not found") { groupfile = ""; } else { hasGroup = true; m->splitAtDash(groupfile, groupFileNames); //go through files and make sure they are good, if not, then disregard them for (int i = 0; i < groupFileNames.size(); i++) { bool ignore = false; if (groupFileNames[i] == "current") { groupFileNames[i] = m->getGroupFile(); if (groupFileNames[i] != "") { m->mothurOut("Using " + groupFileNames[i] + " as input file for the group parameter where you had given current."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current group file, ignoring current."); m->mothurOutEndLine(); ignore=true; //erase from file list groupFileNames.erase(groupFileNames.begin()+i); i--; } } if (!ignore) { if (inputDir != "") { string path = m->hasPath(groupFileNames[i]); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { groupFileNames[i] = inputDir + groupFileNames[i]; } } ifstream in; int ableToOpen = m->openInputFile(groupFileNames[i], in, "noerror"); //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(groupFileNames[i]); m->mothurOut("Unable to open " + groupFileNames[i] + ". 
Trying default " + tryPath); m->mothurOutEndLine();
								ifstream in2;
								ableToOpen = m->openInputFile(tryPath, in2, "noerror");
								in2.close();
								groupFileNames[i] = tryPath;
							}
						}
						
						//if you can't open it, try output location
						if (ableToOpen == 1) {
							if (m->getOutputDir() != "") { //output directory is set
								string tryPath = m->getOutputDir() + m->getSimpleName(groupFileNames[i]);
								m->mothurOut("Unable to open " + groupFileNames[i] + ". Trying output directory " + tryPath); m->mothurOutEndLine();
								ifstream in2;
								ableToOpen = m->openInputFile(tryPath, in2, "noerror");
								in2.close();
								groupFileNames[i] = tryPath;
							}
						}
						
						in.close();
						
						if (ableToOpen == 1) {
							m->mothurOut("Unable to open " + groupFileNames[i] + ". It will be disregarded."); m->mothurOutEndLine();
							//erase from file list
							groupFileNames.erase(groupFileNames.begin()+i);
							i--;
						}
					}
				}
				
				//make sure there is at least one valid file left
				if (groupFileNames.size() == 0) { m->mothurOut("no valid group files."); m->mothurOutEndLine(); abort = true; }
			}
			
			if (hasGroup) {
				split = 2;
				if (groupFileNames.size() != filenames.size()) { abort = true; m->mothurOut("If you provide a group file, you must have one for each sff file."); m->mothurOutEndLine(); }
			}
			
			if (hasOligos) {
				split = 2;
				if (oligosFileNames.size() != filenames.size()) { abort = true; m->mothurOut("If you provide an oligos file, you must have one for each sff file."); m->mothurOutEndLine(); }
			}
			
			if (hasGroup && hasOligos) { m->mothurOut("You must enter ONLY ONE of the following: oligos or group."); m->mothurOutEndLine(); abort = true;}
			
			if (hasAccnos) {
				if (accnosFileNames.size() != filenames.size()) { abort = true; m->mothurOut("If you provide an accnos file, you must have one for each sff file."); m->mothurOutEndLine(); }
			}
			
			string temp = validParameter.validFile(parameters, "qfile", false); if (temp == "not found"){ temp = "T"; }
			qual = m->isTrue(temp);
			
			temp = validParameter.validFile(parameters, "fasta", false); if (temp == "not found"){ temp = "T"; }
			fasta = m->isTrue(temp);
			
			temp =
validParameter.validFile(parameters, "flow", false); if (temp == "not found"){ temp = "T"; }
			flow = m->isTrue(temp);
			
			temp = validParameter.validFile(parameters, "trim", false); if (temp == "not found"){ temp = "T"; }
			trim = m->isTrue(temp);
			
			temp = validParameter.validFile(parameters, "bdiffs", false); if (temp == "not found") { temp = "0"; }
			m->mothurConvert(temp, bdiffs);
			
			temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found") { temp = "0"; }
			m->mothurConvert(temp, pdiffs);
			
			temp = validParameter.validFile(parameters, "ldiffs", false); if (temp == "not found") { temp = "0"; }
			m->mothurConvert(temp, ldiffs);
			
			temp = validParameter.validFile(parameters, "sdiffs", false); if (temp == "not found") { temp = "0"; }
			m->mothurConvert(temp, sdiffs);
			
			//tdiffs defaults to the sum of the individual diff limits
			temp = validParameter.validFile(parameters, "tdiffs", false); if (temp == "not found") { int tempTotal = pdiffs + bdiffs + ldiffs + sdiffs; temp = toString(tempTotal); }
			m->mothurConvert(temp, tdiffs);
			
			if(tdiffs == 0){ tdiffs = bdiffs + pdiffs + ldiffs + sdiffs; }
			
			temp = validParameter.validFile(parameters, "sfftxt", false);
			if (temp == "not found") { temp = "F"; sfftxt = false; sfftxtFilename = ""; }
			else if (m->isTrue(temp)) { sfftxt = true; sfftxtFilename = ""; }
			else {
				//you are a filename
				if (inputDir != "") {
					map<string, string>::iterator it = parameters.find("sfftxt");
					//user has given an sfftxt file
					if(it != parameters.end()){
						string path = m->hasPath(it->second);
						//if the user has not given a path then, add inputdir. else leave path alone.
if (path == "") { parameters["sfftxt"] = inputDir + it->second; } } } sfftxtFilename = validParameter.validFile(parameters, "sfftxt", true); if (sfftxtFilename == "not found") { sfftxtFilename = ""; } else if (sfftxtFilename == "not open") { sfftxtFilename = ""; } } if ((sfftxtFilename == "") && (filenames.size() == 0)) { //if there is a current sff file, use it string filename = m->getSFFFile(); if (filename != "") { filenames.push_back(filename); m->mothurOut("Using " + filename + " as input file for the sff parameter."); m->mothurOutEndLine(); } else { m->mothurOut("[ERROR]: you must provide a valid sff or sfftxt file."); m->mothurOutEndLine(); abort=true; } } temp = validParameter.validFile(parameters, "checkorient", false); if (temp == "not found") { temp = "F"; } reorient = m->isTrue(temp); } } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "SffInfoCommand"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } for (int s = 0; s < filenames.size(); s++) { if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } int start = time(NULL); filenames[s] = m->getFullPathName(filenames[s]); m->mothurOut("Extracting info from " + filenames[s] + " ..." 
); m->mothurOutEndLine(); string accnos = ""; if (hasAccnos) { accnos = accnosFileNames[s]; } string oligos = ""; if (hasOligos) { oligos = oligosFileNames[s]; } if (hasGroup) { oligos = groupFileNames[s]; } int numReads = extractSffInfo(filenames[s], accnos, oligos); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to extract " + toString(numReads) + ".\n"); } if (sfftxtFilename != "") { parseSffTxt(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } itTypes = outputTypes.find("flow"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFlowFile(current); } } //report output filenames m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::extractSffInfo(string input, string accnos, string oligos){ try { oligosObject = new Oligos(); currentFileName = input; if (outputDir == "") { outputDir += m->hasPath(input); } if (accnos != "") { readAccnosFile(accnos); } else { seqNames.clear(); } TrimOligos* trimOligos = NULL; TrimOligos* rtrimOligos = NULL; if (hasOligos) { readOligos(oligos); split = 2; if (m->control_pressed) { delete oligosObject; 
return 0; }
				
				trimOligos = new TrimOligos(pdiffs, bdiffs, ldiffs, sdiffs, oligosObject->getPrimers(), oligosObject->getBarcodes(), oligosObject->getReversePrimers(), oligosObject->getLinkers(), oligosObject->getSpacers());
				numFPrimers = oligosObject->getPrimers().size();
				numBarcodes = oligosObject->getBarcodes().size();
				if (reorient) {
					rtrimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, oligosObject->getReorientedPairedPrimers(), oligosObject->getReorientedPairedBarcodes(), false);
					numBarcodes = oligosObject->getReorientedPairedBarcodes().size();
				}
			}
			if (hasGroup) { readGroup(oligos); split = 2; }
		
		ofstream outSfftxt, outFasta, outQual, outFlow;
		string outFastaFileName, outQualFileName;
		string rootName = outputDir + m->getRootName(m->getSimpleName(input));
		if(rootName.find_last_of(".") == rootName.npos){ rootName += "."; }
		
		map<string, string> variables;
		variables["[filename]"] = rootName;
		string sfftxtFileName = getOutputFileName("sfftxt",variables);
		string outFlowFileName = getOutputFileName("flow",variables);
		if (!trim) { variables["[tag]"] = "raw"; }
		outFastaFileName = getOutputFileName("fasta",variables);
		outQualFileName = getOutputFileName("qfile",variables);
		
		if (sfftxt) { m->openOutputFile(sfftxtFileName, outSfftxt); outSfftxt.setf(ios::fixed, ios::floatfield); outSfftxt.setf(ios::showpoint); outputNames.push_back(sfftxtFileName); outputTypes["sfftxt"].push_back(sfftxtFileName); }
		if (fasta) { m->openOutputFile(outFastaFileName, outFasta); outputNames.push_back(outFastaFileName); outputTypes["fasta"].push_back(outFastaFileName); }
		if (qual) { m->openOutputFile(outQualFileName, outQual); outputNames.push_back(outQualFileName); outputTypes["qfile"].push_back(outQualFileName); }
		if (flow) { m->openOutputFile(outFlowFileName, outFlow); outputNames.push_back(outFlowFileName); outFlow.setf(ios::fixed, ios::floatfield); outFlow.setf(ios::showpoint); outputTypes["flow"].push_back(outFlowFileName); }
		
		ifstream in;
		m->openInputFileBinary(input, in);
		
		CommonHeader header;
readCommonHeader(in, header); int count = 0; //check magic number and version if (header.magicNumber != 779314790) { m->mothurOut("Magic Number is not correct, not a valid .sff file"); m->mothurOutEndLine(); delete oligosObject; if (hasOligos) { delete trimOligos; if (reorient) { delete rtrimOligos; } } return count; } if (header.version != "0001") { m->mothurOut("Version is not supported, only support version 0001."); m->mothurOutEndLine(); delete oligosObject; if (hasOligos) { delete trimOligos; if (reorient) { delete rtrimOligos; } } return count; } //print common header if (sfftxt) { printCommonHeader(outSfftxt, header); } if (flow) { outFlow << header.numFlowsPerRead << endl; } //ofstream outtemp; //m->openOutputFileBinary("./temp", outtemp); //printCommonHeaderForDebug(header, outtemp, 20000); //outtemp.close(); //read through the sff file while (!in.eof()) { bool print = true; //read data seqRead read; Header readheader; readSeqData(in, read, header.numFlowsPerRead, readheader, trimOligos, rtrimOligos); bool okay = sanityCheck(readheader, read); if (!okay) { break; } //cout << readheader.name << endl; //if you have provided an accosfile and this seq is not in it, then dont print if (seqNames.size() != 0) { if (seqNames.count(readheader.name) == 0) { print = false; } } //print if (print) { if (sfftxt) { printHeader(outSfftxt, readheader); printSffTxtSeqData(outSfftxt, read, readheader); } if (fasta) { printFastaSeqData(outFasta, read, readheader); } if (qual) { printQualSeqData(outQual, read, readheader); } if (flow) { printFlowSeqData(outFlow, read, readheader); } } count++; //report progress if((count+1) % 10000 == 0){ m->mothurOut(toString(count+1)); m->mothurOutEndLine(); } if (m->control_pressed) { count = 0; break; } if (count >= header.numReads) { break; } } //report progress if (!m->control_pressed) { if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } } in.close(); if (sfftxt) { outSfftxt.close(); } if (fasta) { 
outFasta.close(); }
		if (qual) { outQual.close(); }
		if (flow) { outFlow.close(); }
		
		if (split > 1) {
			//create new common headers for each file with the correct number of reads
			adjustCommonHeader(header);
			
			if (hasGroup) { delete groupMap; }
			
			//cout << "here" << endl;
			map<string, string>::iterator it;
			set<string> namesToRemove;
			for(int i=0;i<filehandles.size();i++){
				for(int j=0;j<filehandles[i].size();j++){
					if (filehandles[i][j] != "") {
						if (namesToRemove.count(filehandles[i][j]) == 0) {
							if(m->isBlank(filehandles[i][j])){
								//cout << i << '\t' << '\t' << j << '\t' << filehandles[i][j] << " is blank removing" << endl;
								m->mothurRemove(filehandles[i][j]);
								m->mothurRemove(filehandlesHeaders[i][j]);
								namesToRemove.insert(filehandles[i][j]);
							}
						}
					}
				}
			}
			//cout << "here2" << endl;
			//append new header to reads
			for (int i = 0; i < filehandles.size(); i++) {
				for (int j = 0; j < filehandles[i].size(); j++) {
					if (filehandles[i][j] != "") {
						m->appendSFFFiles(filehandles[i][j], filehandlesHeaders[i][j]);
						m->renameFile(filehandlesHeaders[i][j], filehandles[i][j]);
						m->mothurRemove(filehandlesHeaders[i][j]);
						//cout << i << '\t' << '\t' << j << '\t' << filehandles[i][j] << " done appending headers and removing " << filehandlesHeaders[i][j] << endl;
						if (numSplitReads[i][j] == 0) { m->mothurRemove(filehandles[i][j]); }
					}
				}
			}
			//cout << "here3" << endl;
			//remove names for outputFileNames, just cleans up the output
			for(int i = 0; i < outputNames.size(); i++) {
				if (namesToRemove.count(outputNames[i]) != 0) {
					//cout << "erasing " << i << '\t' << outputNames[i] << endl;
					outputNames.erase(outputNames.begin()+i);
					i--;
				}else { outputTypes["sff"].push_back(outputNames[i]); }
			}
			//cout << "here4" << endl;
			if(m->isBlank(noMatchFile)){ m->mothurRemove(noMatchFile); }
			else { outputNames.push_back(noMatchFile); outputTypes["sff"].push_back(noMatchFile); }
		}
		
		delete oligosObject;
		if (hasOligos) { delete trimOligos; if (reorient) { delete rtrimOligos; } }
		
		return count;
	}
	catch(exception& e) {
		m->errorOut(e, "SffInfoCommand", "extractSffInfo");
		exit(1);
	}
}
//**********************************************************************************************************************
int
SffInfoCommand::readCommonHeader(ifstream& in, CommonHeader& header){
	try {
		
		if (!in.eof()) {
			
			//read magic number
			char buffer[4];
			in.read(buffer, 4);
			header.magicNumber = be_int4(*(unsigned int *)(&buffer));
			
			//read version
			char buffer9[4];
			in.read(buffer9, 4);
			header.version = "";
			for (int i = 0; i < 4; i++) { header.version += toString((int)(buffer9[i])); }
			
			//read offset
			char buffer2 [8];
			in.read(buffer2, 8);
			header.indexOffset = be_int8(*(unsigned long long *)(&buffer2));
			
			//read index length
			char buffer3 [4];
			in.read(buffer3, 4);
			header.indexLength = be_int4(*(unsigned int *)(&buffer3));
			
			//read num reads
			char buffer4 [4];
			in.read(buffer4, 4);
			header.numReads = be_int4(*(unsigned int *)(&buffer4));
			
			if (m->debug) { m->mothurOut("[DEBUG]: numReads = " + toString(header.numReads) + "\n"); }
			
			//read header length
			char buffer5 [2];
			in.read(buffer5, 2);
			header.headerLength = be_int2(*(unsigned short *)(&buffer5));
			
			//read key length
			char buffer6 [2];
			in.read(buffer6, 2);
			header.keyLength = be_int2(*(unsigned short *)(&buffer6));
			
			//read number of flow reads
			char buffer7 [2];
			in.read(buffer7, 2);
			header.numFlowsPerRead = be_int2(*(unsigned short *)(&buffer7));
			
			//read format code
			char buffer8 [1];
			in.read(buffer8, 1);
			header.flogramFormatCode = (int)(buffer8[0]);
			
			//read flow chars
			char* tempBuffer = new char[header.numFlowsPerRead];
			in.read(&(*tempBuffer), header.numFlowsPerRead);
			//tempBuffer is not null-terminated, so copy exactly numFlowsPerRead chars
			header.flowChars.assign(tempBuffer, header.numFlowsPerRead);
			delete[] tempBuffer;
			
			//read key
			char* tempBuffer2 = new char[header.keyLength];
			in.read(&(*tempBuffer2), header.keyLength);
			//tempBuffer2 is not null-terminated, so copy exactly keyLength chars
			header.keySequence.assign(tempBuffer2, header.keyLength);
			delete[] tempBuffer2;
			
			/* Pad to 8 chars */
			unsigned long long spotInFile = in.tellg();
			unsigned long long spot = (spotInFile + 7)& ~7;  // round up to the next multiple of 8
in.seekg(spot); }else{ m->mothurOut("Error reading sff common header."); m->mothurOutEndLine(); } return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "readCommonHeader"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::adjustCommonHeader(CommonHeader header){ try { string endian = m->findEdianness(); char* mybuffer = new char[4]; ifstream in; m->openInputFileBinary(currentFileName, in); ofstream outNoMatchHeader; string tempNoHeader = "tempNoMatchHeader"; m->openOutputFileBinary(tempNoHeader, outNoMatchHeader); //magic number in.read(mybuffer,4); streamsize lengthRead = in.gcount(); for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl; out.write(mybuffer, lengthRead); out.close(); } } outNoMatchHeader.write(mybuffer, lengthRead); delete[] mybuffer; //version mybuffer = new char[4]; in.read(mybuffer,4); lengthRead = in.gcount(); for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl; out.write(mybuffer, lengthRead); out.close(); } } outNoMatchHeader.write(mybuffer, lengthRead); delete[] mybuffer; //offset mybuffer = new char[8]; in.read(mybuffer,8); lengthRead = in.gcount(); unsigned long long offset = 0; char* thisbuffer = new char[8]; thisbuffer[0] = (offset >> 56) & 0xFF; thisbuffer[1] = (offset >> 48) & 0xFF; thisbuffer[2] = (offset >> 40) & 0xFF; thisbuffer[3] = (offset >> 32) & 0xFF; thisbuffer[4] = (offset >> 24) & 0xFF; thisbuffer[5] = (offset >> 16) & 0xFF; thisbuffer[6] = (offset >> 8) & 0xFF; 
thisbuffer[7] = offset & 0xFF; for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << thisbuffer << '\t' << filehandlesHeaders[i][j] << endl; out.write(thisbuffer, 8); out.close(); } } outNoMatchHeader.write(thisbuffer, 8); delete[] thisbuffer; delete[] mybuffer; //read index length mybuffer = new char[4]; in.read(mybuffer,4); lengthRead = in.gcount(); offset = 0; char* thisbuffer2 = new char[4]; thisbuffer2[0] = (offset >> 24) & 0xFF; thisbuffer2[1] = (offset >> 16) & 0xFF; thisbuffer2[2] = (offset >> 8) & 0xFF; thisbuffer2[3] = offset & 0xFF; for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << thisbuffer2 << '\t' << filehandlesHeaders[i][j] << endl; out.write(thisbuffer2, 4); out.close(); } } outNoMatchHeader.write(thisbuffer2, 4); delete[] thisbuffer2; delete[] mybuffer; //change num reads mybuffer = new char[4]; in.read(mybuffer,4); lengthRead = in.gcount(); delete[] mybuffer; for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { //cout << filehandlesHeaders[i][j] << '\t' << numSplitReads[i][j] << endl; char* thisbuffer = new char[4]; thisbuffer[0] = (numSplitReads[i][j] >> 24) & 0xFF; thisbuffer[1] = (numSplitReads[i][j] >> 16) & 0xFF; thisbuffer[2] = (numSplitReads[i][j] >> 8) & 0xFF; thisbuffer[3] = numSplitReads[i][j] & 0xFF; ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << thisbuffer << '\t' << filehandlesHeaders[i][j] << endl; //unsigned int numTReads = (be_int4(*(unsigned int *)(thisbuffer))); //cout << "numReads = " << numTReads << endl; out.write(thisbuffer, 4); out.close(); delete[] thisbuffer; } } 
char* thisbuffer3 = new char[4]; thisbuffer3[0] = (numNoMatch >> 24) & 0xFF; thisbuffer3[1] = (numNoMatch >> 16) & 0xFF; thisbuffer3[2] = (numNoMatch >> 8) & 0xFF; thisbuffer3[3] = numNoMatch & 0xFF; outNoMatchHeader.write(thisbuffer3, 4); delete[] thisbuffer3; //read header length mybuffer = new char[2]; in.read(mybuffer,2); lengthRead = in.gcount(); for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl; out.write(mybuffer, lengthRead); out.close(); } } outNoMatchHeader.write(mybuffer, lengthRead); delete[] mybuffer; //read key length mybuffer = new char[2]; in.read(mybuffer,2); lengthRead = in.gcount(); for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl; out.write(mybuffer, lengthRead); out.close(); } } outNoMatchHeader.write(mybuffer, lengthRead); delete[] mybuffer; //read number of flow reads mybuffer = new char[2]; in.read(mybuffer,2); lengthRead = in.gcount(); for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out); //cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl; out.write(mybuffer, lengthRead); out.close(); } } outNoMatchHeader.write(mybuffer, lengthRead); delete[] mybuffer; //read format code mybuffer = new char[1]; in.read(mybuffer,1); lengthRead = in.gcount(); for (int i = 0; i < filehandlesHeaders.size(); i++) { for (int j = 0; j < filehandlesHeaders[i].size(); j++) { ofstream out; int able = 
m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out);
				//cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl;
				out.write(mybuffer, lengthRead);
				out.close();
			}
		}
		outNoMatchHeader.write(mybuffer, lengthRead);
		delete[] mybuffer;
		
		//read flow chars
		mybuffer = new char[header.numFlowsPerRead];
		in.read(mybuffer,header.numFlowsPerRead); lengthRead = in.gcount();
		for (int i = 0; i < filehandlesHeaders.size(); i++) {
			for (int j = 0; j < filehandlesHeaders[i].size(); j++) {
				ofstream out;
				int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out);
				//cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl;
				out.write(mybuffer, lengthRead);
				out.close();
			}
		}
		outNoMatchHeader.write(mybuffer, lengthRead);
		delete[] mybuffer;
		
		//read key
		mybuffer = new char[header.keyLength];
		in.read(mybuffer,header.keyLength); lengthRead = in.gcount();
		for (int i = 0; i < filehandlesHeaders.size(); i++) {
			for (int j = 0; j < filehandlesHeaders[i].size(); j++) {
				ofstream out;
				int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out);
				//cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl;
				out.write(mybuffer, lengthRead);
				out.close();
			}
		}
		outNoMatchHeader.write(mybuffer, lengthRead);
		delete[] mybuffer;
		
		/* Pad to 8 chars */
		unsigned long long spotInFile = in.tellg();
		unsigned long long spot = (spotInFile + 7)& ~7;  // round up to the next multiple of 8
		in.seekg(spot);
		
		//zero-initialize the padding so indeterminate bytes are never written to the new headers
		mybuffer = new char[spot-spotInFile]();
		for (int i = 0; i < filehandlesHeaders.size(); i++) {
			for (int j = 0; j < filehandlesHeaders[i].size(); j++) {
				ofstream out;
				int able = m->openOutputFileBinaryAppend(filehandlesHeaders[i][j], out);
				//cout << able << '\t' << mybuffer << '\t' << filehandlesHeaders[i][j] << endl;
				out.write(mybuffer, spot-spotInFile);
				out.close();
			}
		}
		outNoMatchHeader.write(mybuffer, spot-spotInFile);
		outNoMatchHeader.close();
		delete[] mybuffer;
		in.close();
		
		m->appendSFFFiles(noMatchFile, tempNoHeader);
		m->renameFile(tempNoHeader, noMatchFile);
m->mothurRemove(tempNoHeader); return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "adjustCommonHeader"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printCommonHeaderForDebug(CommonHeader& header, ofstream& out, int numReads){ try { string endian = m->findEdianness(); ifstream in; m->openInputFileBinary(currentFileName, in); //magic number char* mybuffer = new char[4]; in.read(mybuffer,4); out.write(mybuffer, in.gcount()); unsigned int magic = be_int4(*(unsigned int *)(mybuffer)); string contents = toString(magic); m->mothurOut("magicNumber = " + contents + "\n"); delete[] mybuffer; //version char* mybuffer1 = new char[4]; in.read(mybuffer1,4); out.write(mybuffer1, in.gcount()); contents = ""; for (int i = 0; i < 4; i++) { contents += toString((int)(mybuffer1[i])); } m->mothurOut("version = " + contents + "\n"); delete[] mybuffer1; //offset char* mybuffer2 = new char[8]; in.read(mybuffer2,8); unsigned long long offset = 0; char* thisbuffer = new char[8]; thisbuffer[0] = (offset >> 56) & 0xFF; thisbuffer[1] = (offset >> 48) & 0xFF; thisbuffer[2] = (offset >> 40) & 0xFF; thisbuffer[3] = (offset >> 32) & 0xFF; thisbuffer[4] = (offset >> 24) & 0xFF; thisbuffer[5] = (offset >> 16) & 0xFF; thisbuffer[6] = (offset >> 8) & 0xFF; thisbuffer[7] = offset & 0xFF; out.write(thisbuffer, 8); delete[] thisbuffer; delete[] mybuffer2; //read index length char* mybuffer3 = new char[4]; in.read(mybuffer3,4); offset = 0; char* thisbuffer2 = new char[4]; thisbuffer2[0] = (offset >> 24) & 0xFF; thisbuffer2[1] = (offset >> 16) & 0xFF; thisbuffer2[2] = (offset >> 8) & 0xFF; thisbuffer2[3] = offset & 0xFF; out.write(thisbuffer2, 4); delete[] thisbuffer2; delete[] mybuffer3; //change num reads char* mybuffer4 = new char[4]; in.read(mybuffer4,4); char* thisbuffer3 = new char[4]; if (endian == "BIG_ENDIAN") { thisbuffer3[0] = (numReads >> 24) & 0xFF; thisbuffer3[1] = 
(numReads >> 16) & 0xFF; thisbuffer3[2] = (numReads >> 8) & 0xFF; thisbuffer3[3] = numReads & 0xFF; }else { thisbuffer3[0] = numReads & 0xFF; thisbuffer3[1] = (numReads >> 8) & 0xFF; thisbuffer3[2] = (numReads >> 16) & 0xFF; thisbuffer3[3] = (numReads >> 24) & 0xFF; } out.write(thisbuffer3, 4); contents = mybuffer4; //m->mothurOut("numReads = " + contents + "\n"); unsigned int numTReads = be_int4(*(unsigned int *)(mybuffer4)); m->mothurOut("numReads = " + toString(numTReads) + "\n"); //m->mothurOut("numReads = " + toString(header.numReads) + "\n"); delete[] thisbuffer3; delete[] mybuffer4; //read header length char* mybuffer5 = new char[2]; in.read(mybuffer5,2); out.write(mybuffer5, in.gcount()); unsigned short hl = be_int2(*(unsigned short *)(mybuffer5)); contents = toString(hl); m->mothurOut("readLength = " + contents + "\n"); delete[] mybuffer5; //read key length char* mybuffer6 = new char[2]; in.read(mybuffer6,2); out.write(mybuffer6, in.gcount()); unsigned short kl = be_int2(*(unsigned short *)(mybuffer6)); contents = toString(kl); m->mothurOut("key length = " + contents + "\n"); delete[] mybuffer6; //read number of flow reads char* mybuffer7 = new char[2]; in.read(mybuffer7,2); out.write(mybuffer7, in.gcount()); contents = mybuffer7; //m->mothurOut("num flow reads = " + contents + "\n"); int numFlowReads = be_int2(*(unsigned short *)(mybuffer7)); m->mothurOut("num flow Reads = " + toString(numFlowReads) + "\n"); delete[] mybuffer7; //read format code char* mybuffer8 = new char[1]; in.read(mybuffer8,1); out.write(mybuffer8, in.gcount()); int fc = (int)(mybuffer8[0]); contents = toString(fc); m->mothurOut("read format code = " + contents + "\n"); delete[] mybuffer8; //read flow chars char* mybuffer9 = new char[header.numFlowsPerRead]; in.read(mybuffer9,header.numFlowsPerRead); out.write(mybuffer9, in.gcount()); contents = mybuffer9; m->mothurOut("flow chars = " + contents + "\n"); delete[] mybuffer9; //read key char* mybuffer10 = new char[header.keyLength]; 
in.read(mybuffer10,header.keyLength); out.write(mybuffer10, in.gcount()); contents = mybuffer10; m->mothurOut("key = " + contents + "\n"); delete[] mybuffer10; /* Pad to 8 chars */ unsigned long long spotInFile = in.tellg(); unsigned long long spot = (spotInFile + 7)& ~7; // ~ inverts in.seekg(spot); char* mybuffer11 = new char[spot-spotInFile]; out.write(mybuffer11, spot-spotInFile); delete[] mybuffer11; in.close(); return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printCommonHeaderForDebug"); exit(1); } } //********************************************************************************************************************** bool SffInfoCommand::readSeqData(ifstream& in, seqRead& read, int numFlowReads, Header& header, TrimOligos*& trimOligos, TrimOligos*& rtrimOligos){ try { unsigned long long startSpotInFile = in.tellg(); if (!in.eof()) { /*****************************************/ //read header //read header length char buffer [2]; in.read(buffer, 2); header.headerLength = be_int2(*(unsigned short *)(&buffer)); //read name length char buffer2 [2]; in.read(buffer2, 2); header.nameLength = be_int2(*(unsigned short *)(&buffer2)); //read num bases char buffer3 [4]; in.read(buffer3, 4); header.numBases = be_int4(*(unsigned int *)(&buffer3)); //read clip qual left char buffer4 [2]; in.read(buffer4, 2); header.clipQualLeft = be_int2(*(unsigned short *)(&buffer4)); //header.clipQualLeft = 5; //read clip qual right char buffer5 [2]; in.read(buffer5, 2); header.clipQualRight = be_int2(*(unsigned short *)(&buffer5)); //read clipAdapterLeft char buffer6 [2]; in.read(buffer6, 2); header.clipAdapterLeft = be_int2(*(unsigned short *)(&buffer6)); //read clipAdapterRight char buffer7 [2]; in.read(buffer7, 2); header.clipAdapterRight = be_int2(*(unsigned short *)(&buffer7)); //read name char* tempBuffer = new char[header.nameLength]; in.read(&(*tempBuffer), header.nameLength); header.name = tempBuffer; if (header.name.length() > header.nameLength) { header.name 
= header.name.substr(0, header.nameLength); }
			delete[] tempBuffer;
			
			//extract info from name
			decodeName(header.timestamp, header.region, header.xy, header.name);
			
			/* Pad to 8 chars */
			unsigned long long spotInFile = in.tellg();
			unsigned long long spot = (spotInFile + 7)& ~7;  // round up to the next multiple of 8
			in.seekg(spot);
			
			/*****************************************/
			//sequence read
			
			//read flowgram
			read.flowgram.resize(numFlowReads);
			for (int i = 0; i < numFlowReads; i++) {
				char buffer [2];
				in.read(buffer, 2);
				read.flowgram[i] = be_int2(*(unsigned short *)(&buffer));
			}
			
			//read flowIndex
			read.flowIndex.resize(header.numBases);
			for (int i = 0; i < header.numBases; i++) {
				char temp[1];
				in.read(temp, 1);
				read.flowIndex[i] = be_int1(*(unsigned char *)(&temp));
			}
			
			//read bases
			char* tempBuffer6 = new char[header.numBases];
			in.read(&(*tempBuffer6), header.numBases);
			//tempBuffer6 is not null-terminated, so copy exactly numBases chars
			read.bases.assign(tempBuffer6, header.numBases);
			delete[] tempBuffer6;
			
			//read qual scores
			read.qualScores.resize(header.numBases);
			for (int i = 0; i < header.numBases; i++) {
				char temp[1];
				in.read(temp, 1);
				read.qualScores[i] = be_int1(*(unsigned char *)(&temp));
			}
			
			/* Pad to 8 chars */
			spotInFile = in.tellg();
			spot = (spotInFile + 7)& ~7;
			in.seekg(spot);
			
			if (split > 1) {
				//default to "no match" so the error branch below never reads uninitialized values
				int barcodeIndex = 0, primerIndex = 0, trashCodeLength = 1;
				
				if (hasOligos) { trashCodeLength = findGroup(header, read, barcodeIndex, primerIndex, trimOligos, rtrimOligos); }
				else if (hasGroup) { trashCodeLength = findGroup(header, read, barcodeIndex, primerIndex, "groupMode"); }
				else { m->mothurOut("[ERROR]: uh oh, we shouldn't be here...\n"); }
				
				char * mybuffer;
				mybuffer = new char [spot-startSpotInFile];
				
				ifstream in2;
				m->openInputFileBinary(currentFileName, in2);
				in2.seekg(startSpotInFile);
				in2.read(mybuffer,spot-startSpotInFile);
				
				if(trashCodeLength == 0){
					ofstream out;
					m->openOutputFileBinaryAppend(filehandles[barcodeIndex][primerIndex], out);
					out.write(mybuffer, in2.gcount());
					out.close();
numSplitReads[barcodeIndex][primerIndex]++; } else{ ofstream out; m->openOutputFileBinaryAppend(noMatchFile, out); out.write(mybuffer, in2.gcount()); out.close(); numNoMatch++; } delete[] mybuffer; in2.close(); } }else{ m->mothurOut("Error reading."); m->mothurOutEndLine(); } if (in.eof()) { return true; } return false; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "readSeqData"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::findGroup(Header header, seqRead read, int& barcode, int& primer, TrimOligos*& trimOligos, TrimOligos*& rtrimOligos) { try { int success = 1; string trashCode = ""; int currentSeqsDiffs = 0; string seq = read.bases; if (trim) { if(header.clipQualRight < header.clipQualLeft){ if (header.clipQualRight == 0) { //don't trim right seq = seq.substr(header.clipQualLeft-1); }else { seq = "NNNN"; } } else if((header.clipQualRight != 0) && ((header.clipQualRight-header.clipQualLeft) >= 0)){ seq = seq.substr((header.clipQualLeft-1), (header.clipQualRight-header.clipQualLeft+1)); } else { seq = seq.substr(header.clipQualLeft-1); } }else{ //if you wanted the sfftxt then you already converted the bases to the right case if (!sfftxt) { int endValue = header.clipQualRight; //make the bases you want to clip lowercase and the bases you want to keep upper case if(endValue == 0){ endValue = seq.length(); } for (int i = 0; i < (header.clipQualLeft-1); i++) { seq[i] = tolower(seq[i]); } for (int i = (header.clipQualLeft-1); i < (endValue-1); i++) { seq[i] = toupper(seq[i]); } for (int i = (endValue-1); i < seq.length(); i++) { seq[i] = tolower(seq[i]); } } } Sequence currSeq(header.name, seq); QualityScores currQual; //for reorient Sequence savedSeq(currSeq.getName(), currSeq.getAligned()); QualityScores savedQual(currQual.getName(), currQual.getScores()); if(numLinkers != 0){ success = trimOligos->stripLinker(currSeq, currQual); if(success > 
ldiffs) { trashCode += 'k'; }
            else{ currentSeqsDiffs += success; }
        }

        if(numBarcodes != 0){
            vector<int> results = trimOligos->stripBarcode(currSeq, currQual, barcode);
            if (pairedOligos) { success = results[0] + results[2]; }
            else { success = results[0]; }
            if(success > bdiffs) { trashCode += 'b'; }
            else{ currentSeqsDiffs += success; }
        }

        if(numSpacers != 0){
            success = trimOligos->stripSpacer(currSeq, currQual);
            if(success > sdiffs) { trashCode += 's'; }
            else{ currentSeqsDiffs += success; }
        }

        if(numFPrimers != 0){
            vector<int> results = trimOligos->stripForward(currSeq, currQual, primer, true);
            if (pairedOligos) { success = results[0] + results[2]; }
            else { success = results[0]; }
            if(success > pdiffs) { trashCode += 'f'; }
            else{ currentSeqsDiffs += success; }
        }

        if(numRPrimers != 0){
            vector<int> results = trimOligos->stripReverse(currSeq, currQual);
            success = results[0];
            if(success > pdiffs) { trashCode += 'r'; }
            else{ currentSeqsDiffs += success; }
        }

        if (currentSeqsDiffs > tdiffs) { trashCode += 't'; }

        if (reorient && (trashCode != "")) { //if you failed and want to check the reverse
            int thisSuccess = 0;
            string thisTrashCode = "";
            int thisCurrentSeqsDiffs = 0;

            int thisBarcodeIndex = 0;
            int thisPrimerIndex = 0;
            //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl;
            if(numBarcodes != 0){
                vector<int> results = rtrimOligos->stripBarcode(savedSeq, savedQual, thisBarcodeIndex);
                if (pairedOligos) { thisSuccess = results[0] + results[2]; }
                else { thisSuccess = results[0]; }
                if(thisSuccess > bdiffs) { thisTrashCode += "b"; }
                else{ thisCurrentSeqsDiffs += thisSuccess; }
            }
            //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl;
            if(numFPrimers != 0){
                vector<int> results = rtrimOligos->stripForward(savedSeq, savedQual, thisPrimerIndex, true);
                if (pairedOligos) { thisSuccess = results[0] + results[2]; }
                else { thisSuccess = results[0]; }
                if(thisSuccess > pdiffs) { thisTrashCode += "f"; }
                else{ thisCurrentSeqsDiffs += thisSuccess; }
            }

            if (thisCurrentSeqsDiffs > tdiffs) {
thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; barcode = thisBarcodeIndex; primer = thisPrimerIndex; savedSeq.reverseComplement(); currSeq.setAligned(savedSeq.getAligned()); savedQual.flipQScores(); currQual.setScores(savedQual.getScores()); }else { trashCode += "(" + thisTrashCode + ")"; } } if (trashCode.length() == 0) { //is this sequence in the ignore group string thisGroup = oligosObject->getGroupName(barcode, primer); int pos = thisGroup.find("ignore"); if (pos != string::npos) { trashCode += "i"; } } return trashCode.length(); } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "findGroup"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::findGroup(Header header, seqRead read, int& barcode, int& primer, string groupMode) { try { string trashCode = ""; primer = 0; string group = groupMap->getGroup(header.name); if (group == "not found") { trashCode += "g"; } //scrap for group else { barcode = GroupToFile[group]; } return trashCode.length(); } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "findGroup"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::decodeName(string& timestamp, string& region, string& xy, string name) { try { if (name.length() >= 6) { string time = name.substr(0, 6); unsigned int timeNum = m->fromBase36(time); int q1 = timeNum / 60; int sec = timeNum - 60 * q1; int q2 = q1 / 60; int minute = q1 - 60 * q2; int q3 = q2 / 24; int hr = q2 - 24 * q3; int q4 = q3 / 32; int day = q3 - 32 * q4; int q5 = q4 / 13; int mon = q4 - 13 * q5; int year = 2000 + q5; timestamp = toString(year) + "_" + toString(mon) + "_" + toString(day) + "_" + toString(hr) + "_" + toString(minute) + "_" + toString(sec); } if (name.length() >= 9) { region = 
name.substr(7, 2); string xyNum = name.substr(9); unsigned int myXy = m->fromBase36(xyNum); int x = myXy >> 12; int y = myXy & 4095; xy = toString(x) + "_" + toString(y); } return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "decodeName"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printCommonHeader(ofstream& out, CommonHeader& header) { try { out << "Common Header:\nMagic Number: " << header.magicNumber << endl; out << "Version: " << header.version << endl; out << "Index Offset: " << header.indexOffset << endl; out << "Index Length: " << header.indexLength << endl; out << "Number of Reads: " << header.numReads << endl; out << "Header Length: " << header.headerLength << endl; out << "Key Length: " << header.keyLength << endl; out << "Number of Flows: " << header.numFlowsPerRead << endl; out << "Format Code: " << header.flogramFormatCode << endl; out << "Flow Chars: " << header.flowChars << endl; out << "Key Sequence: " << header.keySequence << endl << endl; return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printCommonHeader"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printHeader(ofstream& out, Header& header) { try { out << ">" << header.name << endl; out << "Run Prefix: " << header.timestamp << endl; out << "Region #: " << header.region << endl; out << "XY Location: " << header.xy << endl << endl; out << "Run Name: " << endl; out << "Analysis Name: " << endl; out << "Full Path: " << endl << endl; out << "Read Header Len: " << header.headerLength << endl; out << "Name Length: " << header.nameLength << endl; out << "# of Bases: " << header.numBases << endl; out << "Clip Qual Left: " << header.clipQualLeft << endl; out << "Clip Qual Right: " << header.clipQualRight << endl; out << "Clip Adap Left: " << 
header.clipAdapterLeft << endl; out << "Clip Adap Right: " << header.clipAdapterRight << endl << endl; return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printHeader"); exit(1); } } //********************************************************************************************************************** bool SffInfoCommand::sanityCheck(Header& header, seqRead& read) { try { bool okay = true; string message = "[WARNING]: Your sff file may be corrupted! Sequence: " + header.name + "\n"; if (header.clipQualLeft > read.bases.length()) { okay = false; message += "Clip Qual Left = " + toString(header.clipQualLeft) + ", but we only read " + toString(read.bases.length()) + " bases.\n"; } if (header.clipQualRight > read.bases.length()) { okay = false; message += "Clip Qual Right = " + toString(header.clipQualRight) + ", but we only read " + toString(read.bases.length()) + " bases.\n"; } if (header.clipQualLeft > read.qualScores.size()) { okay = false; message += "Clip Qual Left = " + toString(header.clipQualLeft) + ", but we only read " + toString(read.qualScores.size()) + " quality scores.\n"; } if (header.clipQualRight > read.qualScores.size()) { okay = false; message += "Clip Qual Right = " + toString(header.clipQualRight) + ", but we only read " + toString(read.qualScores.size()) + " quality scores.\n"; } if (okay == false) { m->mothurOut(message); m->mothurOutEndLine(); } return okay; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "sanityCheck"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printSffTxtSeqData(ofstream& out, seqRead& read, Header& header) { try { out << "Flowgram: "; for (int i = 0; i < read.flowgram.size(); i++) { out << setprecision(2) << (read.flowgram[i]/(float)100) << '\t'; } out << endl << "Flow Indexes: "; int sum = 0; for (int i = 0; i < read.flowIndex.size(); i++) { sum += read.flowIndex[i]; out << sum << '\t'; } 
//make the bases you want to clip lowercase and the bases you want to keep upper case int endValue = header.clipQualRight; if(endValue == 0){ endValue = read.bases.length(); } for (int i = 0; i < (header.clipQualLeft-1); i++) { read.bases[i] = tolower(read.bases[i]); } for (int i = (header.clipQualLeft-1); i < (endValue-1); i++) { read.bases[i] = toupper(read.bases[i]); } for (int i = (endValue-1); i < read.bases.length(); i++) { read.bases[i] = tolower(read.bases[i]); } out << endl << "Bases: " << read.bases << endl << "Quality Scores: "; for (int i = 0; i < read.qualScores.size(); i++) { out << read.qualScores[i] << '\t'; } out << endl << endl; return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printSffTxtSeqData"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printFastaSeqData(ofstream& out, seqRead& read, Header& header) { try { string seq = read.bases; if (trim) { if(header.clipQualRight < header.clipQualLeft){ if (header.clipQualRight == 0) { //don't trim right seq = seq.substr(header.clipQualLeft-1); }else { seq = "NNNN"; } } else if((header.clipQualRight != 0) && ((header.clipQualRight-header.clipQualLeft) >= 0)){ seq = seq.substr((header.clipQualLeft-1), (header.clipQualRight-header.clipQualLeft+1)); } else { seq = seq.substr(header.clipQualLeft-1); } }else{ //if you wanted the sfftxt then you already converted the bases to the right case if (!sfftxt) { int endValue = header.clipQualRight; //make the bases you want to clip lowercase and the bases you want to keep upper case if(endValue == 0){ endValue = seq.length(); } for (int i = 0; i < (header.clipQualLeft-1); i++) { seq[i] = tolower(seq[i]); } for (int i = (header.clipQualLeft-1); i < (endValue-1); i++) { seq[i] = toupper(seq[i]); } for (int i = (endValue-1); i < seq.length(); i++) { seq[i] = tolower(seq[i]); } } } out << ">" << header.name << " xy=" << header.xy << endl; out << 
seq << endl; return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printFastaSeqData"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printQualSeqData(ofstream& out, seqRead& read, Header& header) { try { if (trim) { if(header.clipQualRight < header.clipQualLeft){ if (header.clipQualRight == 0) { //don't trim right out << ">" << header.name << " xy=" << header.xy << " length=" << (read.qualScores.size()-header.clipQualLeft) << endl; for (int i = (header.clipQualLeft-1); i < read.qualScores.size(); i++) { out << read.qualScores[i] << '\t'; } }else { out << ">" << header.name << " xy=" << header.xy << endl; out << "0\t0\t0\t0"; } } else if((header.clipQualRight != 0) && ((header.clipQualRight-header.clipQualLeft) >= 0)){ out << ">" << header.name << " xy=" << header.xy << " length=" << (header.clipQualRight-header.clipQualLeft+1) << endl; for (int i = (header.clipQualLeft-1); i < (header.clipQualRight); i++) { out << read.qualScores[i] << '\t'; } } else{ out << ">" << header.name << " xy=" << header.xy << " length=" << (header.clipQualRight-header.clipQualLeft) << endl; for (int i = (header.clipQualLeft-1); i < read.qualScores.size(); i++) { out << read.qualScores[i] << '\t'; } } }else{ out << ">" << header.name << " xy=" << header.xy << " length=" << read.qualScores.size() << endl; for (int i = 0; i < read.qualScores.size(); i++) { out << read.qualScores[i] << '\t'; } } out << endl; return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printQualSeqData"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::printFlowSeqData(ofstream& out, seqRead& read, Header& header) { try { int endValue = header.clipQualRight; if (header.clipQualRight == 0) { endValue = read.flowIndex.size(); if (m->debug) { m->mothurOut("[DEBUG]: " + header.name 
+ " has clipQualRight=0.\n"); } } if(endValue > header.clipQualLeft){ int rightIndex = 0; for (int i = 0; i < endValue; i++) { rightIndex += read.flowIndex[i]; } out << header.name << ' ' << rightIndex; for (int i = 0; i < read.flowgram.size(); i++) { out << setprecision(2) << ' ' << (read.flowgram[i]/(float)100); } out << endl; } return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "printFlowSeqData"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::readAccnosFile(string filename) { try { //remove old names seqNames.clear(); ifstream in; m->openInputFile(filename, in); string name; while(!in.eof()){ in >> name; m->gobble(in); seqNames.insert(name); if (m->control_pressed) { seqNames.clear(); break; } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "readAccnosFile"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::parseSffTxt() { try { ifstream inSFF; m->openInputFile(sfftxtFilename, inSFF); if (outputDir == "") { outputDir += m->hasPath(sfftxtFilename); } //output file names ofstream outFasta, outQual, outFlow; string outFastaFileName, outQualFileName; string fileRoot = m->getRootName(m->getSimpleName(sfftxtFilename)); if (fileRoot.length() > 0) { //rip off last . 
            fileRoot = fileRoot.substr(0, fileRoot.length()-1);
            fileRoot = m->getRootName(fileRoot);
        }

        map<string, string> variables;
        variables["[filename]"] = fileRoot;
        string sfftxtFileName = getOutputFileName("sfftxt",variables);
        string outFlowFileName = getOutputFileName("flow",variables);
        if (!trim) { variables["[tag]"] = "raw"; }
        outFastaFileName = getOutputFileName("fasta",variables);
        outQualFileName = getOutputFileName("qfile",variables);

        if (fasta) { m->openOutputFile(outFastaFileName, outFasta); outputNames.push_back(outFastaFileName); outputTypes["fasta"].push_back(outFastaFileName); }
        if (qual) { m->openOutputFile(outQualFileName, outQual); outputNames.push_back(outQualFileName); outputTypes["qfile"].push_back(outQualFileName); }
        if (flow) { m->openOutputFile(outFlowFileName, outFlow); outputNames.push_back(outFlowFileName); outFlow.setf(ios::fixed, ios::floatfield); outFlow.setf(ios::showpoint); outputTypes["flow"].push_back(outFlowFileName); }

        //read common header
        string commonHeader = m->getline(inSFF);
        string magicNumber = m->getline(inSFF);
        string version = m->getline(inSFF);
        string indexOffset = m->getline(inSFF);
        string indexLength = m->getline(inSFF);
        int numReads = parseHeaderLineToInt(inSFF);
        string headerLength = m->getline(inSFF);
        string keyLength = m->getline(inSFF);
        int numFlows = parseHeaderLineToInt(inSFF);
        string flowgramCode = m->getline(inSFF);
        string flowChars = m->getline(inSFF);
        string keySequence = m->getline(inSFF);
        m->gobble(inSFF);

        string seqName;

        if (flow) { outFlow << numFlows << endl; }

        for(int i=0;i<numReads;i++){

            if (inSFF.eof()) { m->mothurOut("[ERROR]: Expected " + toString(numReads) + " but reached end of file at " + toString(i+1) + "."); m->mothurOutEndLine(); break; }

            Header header;

            //parse read header
            inSFF >> seqName; seqName = seqName.substr(1); m->gobble(inSFF);
            header.name = seqName;

            string runPrefix = parseHeaderLineToString(inSFF); header.timestamp = runPrefix;
            string regionNumber = parseHeaderLineToString(inSFF); header.region = regionNumber;
            string xyLocation =
parseHeaderLineToString(inSFF); header.xy = xyLocation; m->gobble(inSFF); string runName = parseHeaderLineToString(inSFF); string analysisName = parseHeaderLineToString(inSFF); string fullPath = parseHeaderLineToString(inSFF); m->gobble(inSFF); string readHeaderLen = parseHeaderLineToString(inSFF); convert(readHeaderLen, header.headerLength); string nameLength = parseHeaderLineToString(inSFF); convert(nameLength, header.nameLength); int numBases = parseHeaderLineToInt(inSFF); header.numBases = numBases; string clipQualLeft = parseHeaderLineToString(inSFF); convert(clipQualLeft, header.clipQualLeft); int clipQualRight = parseHeaderLineToInt(inSFF); header.clipQualRight = clipQualRight; string clipAdapLeft = parseHeaderLineToString(inSFF); convert(clipAdapLeft, header.clipAdapterLeft); string clipAdapRight = parseHeaderLineToString(inSFF); convert(clipAdapRight, header.clipAdapterRight); m->gobble(inSFF); seqRead read; //parse read vector flowVector = parseHeaderLineToFloatVector(inSFF, numFlows); read.flowgram = flowVector; vector flowIndices = parseHeaderLineToIntVector(inSFF, numBases); //adjust for print vector flowIndicesAdjusted; flowIndicesAdjusted.push_back(flowIndices[0]); for (int j = 1; j < flowIndices.size(); j++) { flowIndicesAdjusted.push_back(flowIndices[j] - flowIndices[j-1]); } read.flowIndex = flowIndicesAdjusted; string bases = parseHeaderLineToString(inSFF); read.bases = bases; vector qualityScores = parseHeaderLineToIntVector(inSFF, numBases); read.qualScores = qualityScores; m->gobble(inSFF); //if you have provided an accosfile and this seq is not in it, then dont print bool print = true; if (seqNames.size() != 0) { if (seqNames.count(header.name) == 0) { print = false; } } //print if (print) { if (fasta) { printFastaSeqData(outFasta, read, header); } if (qual) { printQualSeqData(outQual, read, header); } if (flow) { printFlowSeqData(outFlow, read, header); } } //report progress if((i+1) % 10000 == 0){ m->mothurOut(toString(i+1)); 
m->mothurOutEndLine(); } if (m->control_pressed) { break; } } //report progress if (!m->control_pressed) { if((numReads) % 10000 != 0){ m->mothurOut(toString(numReads)); m->mothurOutEndLine(); } } inSFF.close(); if (fasta) { outFasta.close(); } if (qual) { outQual.close(); } if (flow) { outFlow.close(); } return 0; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "parseSffTxt"); exit(1); } } //********************************************************************************************************************** int SffInfoCommand::parseHeaderLineToInt(ifstream& file){ try { int number; while (!file.eof()) { char c = file.get(); if (c == ':'){ file >> number; break; } } m->gobble(file); return number; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "parseHeaderLineToInt"); exit(1); } } //********************************************************************************************************************** string SffInfoCommand::parseHeaderLineToString(ifstream& file){ try { string text; while (!file.eof()) { char c = file.get(); if (c == ':'){ //m->gobble(file); //text = m->getline(file); file >> text; break; } } m->gobble(file); return text; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "parseHeaderLineToString"); exit(1); } } //********************************************************************************************************************** vector SffInfoCommand::parseHeaderLineToFloatVector(ifstream& file, int length){ try { vector floatVector(length); while (!file.eof()) { char c = file.get(); if (c == ':'){ float temp; for(int i=0;i> temp; floatVector[i] = temp * 100; } break; } } m->gobble(file); return floatVector; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "parseHeaderLineToFloatVector"); exit(1); } } //********************************************************************************************************************** vector SffInfoCommand::parseHeaderLineToIntVector(ifstream& file, int length){ try { vector 
intVector(length);

        while (!file.eof()) {

            char c = file.get();
            if (c == ':'){
                for(int i=0;i<length;i++){
                    file >> intVector[i];
                }
                break;
            }
        }
        m->gobble(file);
        return intVector;
    }
    catch(exception& e) {
        m->errorOut(e, "SffInfoCommand", "parseHeaderLineToIntVector");
        exit(1);
    }
}
//***************************************************************************************************************
bool SffInfoCommand::readOligos(string oligoFile){
    try {
        filehandles.clear();
        numSplitReads.clear();
        filehandlesHeaders.clear();

        bool allBlank = false;
        oligosObject->read(oligoFile);

        if (m->control_pressed) { return false; } //error in reading oligos

        if (oligosObject->hasPairedPrimers() || oligosObject->hasPairedBarcodes()) {
            pairedOligos = true;
            m->mothurOut("[ERROR]: sffinfo does not support paired barcodes and primers, aborting.\n");
            m->control_pressed = true;
            return true;
        }else {
            pairedOligos = false;
            numFPrimers = oligosObject->getPrimers().size();
            numBarcodes = oligosObject->getBarcodes().size();
        }

        numLinkers = oligosObject->getLinkers().size();
        numSpacers = oligosObject->getSpacers().size();
        numRPrimers = oligosObject->getReversePrimers().size();

        vector<string> groupNames = oligosObject->getGroupNames();
        if (groupNames.size() == 0) { allBlank = true; }

        filehandles.resize(oligosObject->getBarcodeNames().size());
        for(int i=0;i<filehandles.size();i++){
            for(int j=0;j<oligosObject->getPrimerNames().size();j++){
                filehandles[i].push_back("");
            }
        }

        if(split > 1){
            set<string> uniqueNames; //used to cleanup outputFileNames
            map<string, int> barcodes = oligosObject->getBarcodes();
            map<string, int> primers = oligosObject->getPrimers();
            for(map<string, int>::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){
                for(map<string, int>::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){

                    string primerName = oligosObject->getPrimerName(itPrimer->second);
                    string barcodeName = oligosObject->getBarcodeName(itBar->second);

                    if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing
                    else if ((primerName == "") && (barcodeName == "")) { } //do nothing
                    else {
                        string comboGroupName = "";
string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } if(itPrimer->first == ""){ comboName = itBar->first; }else{ if(itBar->first == ""){ comboName = itPrimer->first; } else{ comboName = itBar->first + "." + itPrimer->first; } } if (comboName != "") { comboGroupName += "_" + comboName; } ofstream temp; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(currentFileName)); variables["[group]"] = comboGroupName; string thisFilename = getOutputFileName("sff",variables); if (uniqueNames.count(thisFilename) == 0) { outputNames.push_back(thisFilename); uniqueNames.insert(thisFilename); } filehandles[itBar->second][itPrimer->second] = thisFilename; m->openOutputFileBinary(thisFilename, temp); temp.close(); } } } } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(currentFileName)); variables["[group]"] = "scrap"; noMatchFile = getOutputFileName("sff",variables); m->mothurRemove(noMatchFile); numNoMatch = 0; filehandlesHeaders.resize(filehandles.size()); numSplitReads.resize(filehandles.size()); for (int i = 0; i < filehandles.size(); i++) { numSplitReads[i].resize(filehandles[i].size(), 0); for (int j = 0; j < filehandles[i].size(); j++) { filehandlesHeaders[i].push_back(filehandles[i][j]+"headers"); ofstream temp; m->openOutputFileBinary(filehandles[i][j]+"headers", temp); temp.close(); } } if (allBlank) { m->mothurOut("[WARNING]: your oligos file does not contain any group names. 
mothur will not split the sff file."); m->mothurOutEndLine(); split = 1; return false; }

        return true;

    }
    catch(exception& e) {
        m->errorOut(e, "SffInfoCommand", "readOligos");
        exit(1);
    }
}
//***************************************************************************************************************
bool SffInfoCommand::readGroup(string oligoFile){
    try {
        filehandles.clear();
        numSplitReads.clear();
        filehandlesHeaders.clear();

        groupMap = new GroupMap();
        groupMap->readMap(oligoFile);

        //like barcodeNameVector - no primer names
        vector<string> groups = groupMap->getNamesOfGroups();

        filehandles.resize(groups.size());
        for (int i = 0; i < filehandles.size(); i++) {
            map<string, string> variables;
            variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(currentFileName));
            variables["[group]"] = groups[i];
            string thisFilename = getOutputFileName("sff",variables);
            outputNames.push_back(thisFilename);

            ofstream temp;
            m->openOutputFileBinary(thisFilename, temp); temp.close();
            filehandles[i].push_back(thisFilename);
            GroupToFile[groups[i]] = i;
        }

        map<string, string> variables;
        variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(currentFileName));
        variables["[group]"] = "scrap";
        noMatchFile = getOutputFileName("sff",variables);
        m->mothurRemove(noMatchFile);
        numNoMatch = 0;

        filehandlesHeaders.resize(groups.size());
        numSplitReads.resize(filehandles.size());
        for (int i = 0; i < filehandles.size(); i++) {
            numSplitReads[i].resize(filehandles[i].size(), 0);
            for (int j = 0; j < filehandles[i].size(); j++) {
                string thisHeader = filehandles[i][j]+"headers";
                filehandlesHeaders[i].push_back(thisHeader);
                ofstream temp;
                m->openOutputFileBinary(thisHeader, temp); temp.close();
            }
        }

        return true;

    }
    catch(exception& e) {
        m->errorOut(e, "SffInfoCommand", "readGroup");
        exit(1);
    }
}
//********************************************************************/
string SffInfoCommand::reverseOligo(string oligo){
    try {
        string reverse = "";

        for(int i=oligo.length()-1;i>=0;i--){

            if(oligo[i] == 'A') { reverse += 'T';
} else if(oligo[i] == 'T'){ reverse += 'A'; } else if(oligo[i] == 'U'){ reverse += 'A'; } else if(oligo[i] == 'G'){ reverse += 'C'; } else if(oligo[i] == 'C'){ reverse += 'G'; } else if(oligo[i] == 'R'){ reverse += 'Y'; } else if(oligo[i] == 'Y'){ reverse += 'R'; } else if(oligo[i] == 'M'){ reverse += 'K'; } else if(oligo[i] == 'K'){ reverse += 'M'; } else if(oligo[i] == 'W'){ reverse += 'W'; } else if(oligo[i] == 'S'){ reverse += 'S'; } else if(oligo[i] == 'B'){ reverse += 'V'; } else if(oligo[i] == 'V'){ reverse += 'B'; } else if(oligo[i] == 'D'){ reverse += 'H'; } else if(oligo[i] == 'H'){ reverse += 'D'; } else { reverse += 'N'; } } return reverse; } catch(exception& e) { m->errorOut(e, "SffInfoCommand", "reverseOligo"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/sffinfocommand.h000066400000000000000000000057031255543666200214060ustar00rootroot00000000000000#ifndef SFFINFOCOMMAND_H #define SFFINFOCOMMAND_H /* * sffinfocommand.h * Mothur * * Created by westcott on 7/7/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "command.hpp"
#include "groupmap.h"
#include "oligos.h"
#include "trimoligos.h"

/**********************************************************/
class SffInfoCommand : public Command {

public:
    SffInfoCommand(string);
    SffInfoCommand();
    ~SffInfoCommand(){}

    vector<string> setParameters();
    string getCommandName() { return "sffinfo"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Sffinfo"; }
    string getDescription() { return "extract sequence reads from a .sff file"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    string sffFilename, sfftxtFilename, outputDir, accnosName, currentFileName, oligosfile, noMatchFile, groupfile;
    vector<string> filenames, outputNames, accnosFileNames, oligosFileNames, groupFileNames;
    bool abort, fasta, qual, trim, flow, sfftxt, hasAccnos, hasOligos, hasGroup, reorient, pairedOligos;
    int mycount, split, numBarcodes, numFPrimers, numLinkers, numSpacers, numRPrimers, pdiffs, bdiffs, ldiffs, sdiffs, tdiffs, numNoMatch;
    set<string> seqNames;
    GroupMap* groupMap;
    map<string, int> GroupToFile;

    vector<vector<int> > numSplitReads;
    vector<vector<string> > filehandles;
    vector<vector<string> > filehandlesHeaders;
    Oligos* oligosObject;

    //extract sff file functions
    int extractSffInfo(string, string, string);
    int readCommonHeader(ifstream&, CommonHeader&);
    int readHeader(ifstream&, Header&);
    bool readSeqData(ifstream&, seqRead&, int, Header&, TrimOligos*&, TrimOligos*&);
    int decodeName(string&, string&, string&, string);
    bool readOligos(string oligosFile);
    bool readGroup(string oligosFile);

    int printCommonHeader(ofstream&, CommonHeader&);
    int printCommonHeaderForDebug(CommonHeader&, ofstream&, int);
    int printHeader(ofstream&, Header&);
    int printSffTxtSeqData(ofstream&, seqRead&, Header&);
    int printFlowSeqData(ofstream&, seqRead&, Header&);
    int printFastaSeqData(ofstream&, seqRead&, Header&);
    int printQualSeqData(ofstream&, seqRead&, Header&);
    int readAccnosFile(string);
int parseSffTxt(); bool sanityCheck(Header&, seqRead&); int adjustCommonHeader(CommonHeader); int findGroup(Header header, seqRead read, int& barcode, int& primer, TrimOligos*&, TrimOligos*&); int findGroup(Header header, seqRead read, int& barcode, int& primer, string); string reverseOligo(string oligo); //parsesfftxt file functions int parseHeaderLineToInt(ifstream&); vector parseHeaderLineToFloatVector(ifstream&, int); vector parseHeaderLineToIntVector(ifstream&, int); string parseHeaderLineToString(ifstream&); }; /**********************************************************/ #endif mothur-1.36.1/source/commands/sffmultiplecommand.cpp000066400000000000000000001460061255543666200226430ustar00rootroot00000000000000// // sffmultiplecommand.cpp // Mothur // // Created by Sarah Westcott on 8/14/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "sffmultiplecommand.h" //********************************************************************************************************************** vector SffMultipleCommand::setParameters(){ try { CommandParameter pfile("file", "InputTypes", "", "", "none", "none", "none","fasta-name",false,true,true); parameters.push_back(pfile); //sffinfo CommandParameter ptrim("trim", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(ptrim); //trim.flows CommandParameter pmaxhomop("maxhomop", "Number", "", "9", "", "", "","",false,false); parameters.push_back(pmaxhomop); CommandParameter pmaxflows("maxflows", "Number", "", "450", "", "", "","",false,false); parameters.push_back(pmaxflows); CommandParameter pminflows("minflows", "Number", "", "450", "", "", "","",false,false); parameters.push_back(pminflows); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(pbdiffs); CommandParameter pldiffs("ldiffs", "Number", "", "0", "", "", 
"","",false,false); parameters.push_back(pldiffs); CommandParameter psdiffs("sdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psdiffs); CommandParameter ptdiffs("tdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ptdiffs); CommandParameter psignal("signal", "Number", "", "0.50", "", "", "","",false,false); parameters.push_back(psignal); CommandParameter pnoise("noise", "Number", "", "0.70", "", "", "","",false,false); parameters.push_back(pnoise); CommandParameter porder("order", "Multiple", "A-B-I", "A", "", "", "","",false,false, true); parameters.push_back(porder); //shhh.flows CommandParameter plookup("lookup", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(plookup); CommandParameter pcutoff("cutoff", "Number", "", "0.01", "", "", "","",false,false); parameters.push_back(pcutoff); CommandParameter pmaxiter("maxiter", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(pmaxiter); CommandParameter plarge("large", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(plarge); CommandParameter psigma("sigma", "Number", "", "60", "", "", "","",false,false); parameters.push_back(psigma); CommandParameter pmindelta("mindelta", "Number", "", "0.000001", "", "", "","",false,false); parameters.push_back(pmindelta); //trim.seqs parameters CommandParameter pallfiles("allfiles", "Boolean", "", "t", "", "", "","",false,false); parameters.push_back(pallfiles); CommandParameter pflip("flip", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(pflip); CommandParameter pmaxambig("maxambig", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxambig); CommandParameter pminlength("minlength", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pminlength); CommandParameter pmaxlength("maxlength", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pmaxlength); CommandParameter 
pkeepforward("keepforward", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pkeepforward);
        CommandParameter pkeepfirst("keepfirst", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pkeepfirst);
        CommandParameter premovelast("removelast", "Number", "", "0", "", "", "","",false,false); parameters.push_back(premovelast);
        CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "SffMultipleCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string SffMultipleCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The sff.multiple command reads a file containing sff filenames and optional oligos filenames. It runs the files through sffinfo, trim.flows, shhh.flows and trim.seqs combining the results.\n";
        helpString += "The sff.multiple command parameters are: ";
        vector<string> parameters = setParameters();
        for (int i = 0; i < parameters.size()-1; i++) { helpString += parameters[i] + ", "; }
        helpString += parameters[parameters.size()-1] + ".\n";
        helpString += "The file parameter allows you to enter a file containing the list of sff files and optional oligos files.\n";
        helpString += "The trim parameter allows you to indicate whether you would like the sequences and quality scores generated by sffinfo trimmed to the clipQualLeft and clipQualRight values. 
Default=True. \n";
        helpString += "The maxambig parameter allows you to set the maximum number of ambiguous bases allowed. The default is -1.\n";
        helpString += "The maxhomop parameter allows you to set a maximum homopolymer length. The default is 9.\n";
        helpString += "The minlength parameter allows you to set a minimum sequence length. The default is 0.\n";
        helpString += "The maxlength parameter allows you to set a maximum sequence length. The default is 0.\n";
        helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the sequence. The default is pdiffs + bdiffs + sdiffs + ldiffs.\n";
        helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. The default is 0.\n";
        helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n";
        helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. The default is 0.\n";
        helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n";
        helpString += "The allfiles parameter creates a separate group and fasta file for each grouping. The default is F.\n";
        helpString += "The keepforward parameter allows you to indicate whether you want the forward primer removed or not. The default is F, meaning remove the forward primer.\n";
        helpString += "The keepfirst parameter trims the sequence to the first keepfirst number of bases after the barcode or primers are removed, before the sequence is checked to see if it meets the other requirements. \n";
        helpString += "The removelast parameter removes the last removelast number of bases after the barcode or primers are removed, before the sequence is checked to see if it meets the other requirements.\n";
        helpString += "The order parameter options are A, B or I. Default=A. 
A = TACG and B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"; helpString += "Example sff.multiple(file=mySffOligosFile.txt, trim=F).\n"; helpString += "Note: No spaces between parameter labels (i.e. file), '=' and parameters (i.e.mySffOligosFile.txt).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SffMultipleCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SffMultipleCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],fasta"; } else if (type == "name") { pattern = "[filename],names"; } else if (type == "group") { pattern = "[filename],groups"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SffMultipleCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SffMultipleCommand::SffMultipleCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SffMultipleCommand", "SffMultipleCommand"); 
exit(1); } } //********************************************************************************************************************** SffMultipleCommand::SffMultipleCommand(string option) { try { abort = false; calledHelp = false; append=false; makeGroup=false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["file"] = inputDir + it->second; } } it = parameters.find("lookup"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["lookup"] = inputDir + it->second; } } } filename = validParameter.validFile(parameters, "file", true); if (filename == "not open") { filename = ""; abort = true; } else if (filename == "not found") { filename = ""; } string temp; temp = validParameter.validFile(parameters, "trim", false); if (temp == "not found"){ temp = "T"; } trim = m->isTrue(temp); temp = validParameter.validFile(parameters, "minflows", false); if (temp == "not found") { temp = "450"; } m->mothurConvert(temp, minFlows); temp = validParameter.validFile(parameters, "maxflows", false); if (temp == "not found") { temp = "450"; } m->mothurConvert(temp, maxFlows); temp = validParameter.validFile(parameters, "maxhomop", false); if (temp == "not found"){ temp = "9"; } m->mothurConvert(temp, maxHomoP); temp = validParameter.validFile(parameters, "signal", false); if (temp == "not found"){ temp = "0.50"; } m->mothurConvert(temp, signal); temp = validParameter.validFile(parameters, "noise", false); if (temp == "not found"){ temp = "0.70"; } m->mothurConvert(temp, noise); temp = validParameter.validFile(parameters, "bdiffs", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, bdiffs); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, pdiffs); temp = validParameter.validFile(parameters, "ldiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, ldiffs); temp = validParameter.validFile(parameters, "sdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, sdiffs); temp = validParameter.validFile(parameters, "tdiffs", false); if (temp == "not found") { int tempTotal = pdiffs + bdiffs + ldiffs + sdiffs; temp = toString(tempTotal); } m->mothurConvert(temp, tdiffs); if(tdiffs == 0){ tdiffs = bdiffs + pdiffs + ldiffs + sdiffs; } temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } 
m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "order", false); if (temp == "not found"){ temp = "A"; } if (temp.length() > 1) { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGAT
CGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } else { if (toupper(temp[0]) == 'A') { flowOrder = "A"; } else if(toupper(temp[0]) == 'B'){ flowOrder = "B"; } else if(toupper(temp[0]) == 'I'){ flowOrder = "I"; } else { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. 
A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } } temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found"){ temp = "0.01"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "mindelta", false); if (temp == "not found"){ temp = "0.000001"; } minDelta = temp; temp = validParameter.validFile(parameters, "maxiter", false); if (temp == "not found"){ temp = "1000"; } m->mothurConvert(temp, maxIters); temp = validParameter.validFile(parameters, "large", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, largeSize); if (largeSize != 0) { large = true; } else { large = false; } if (largeSize < 0) { m->mothurOut("The value of the large cannot be negative.\n"); } temp = validParameter.validFile(parameters, "sigma", false);if (temp == "not found") { temp = "60"; } m->mothurConvert(temp, sigma); temp = validParameter.validFile(parameters, "flip", false); if (temp == "not found") { flip = 0; } else { flip = m->isTrue(temp); } temp = validParameter.validFile(parameters, "maxambig", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxAmbig); temp = validParameter.validFile(parameters, "minlength", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, minLength); temp = validParameter.validFile(parameters, "maxlength", false); if (temp == "not found") { temp = 
"0"; } m->mothurConvert(temp, maxLength); temp = validParameter.validFile(parameters, "keepfirst", false); if (temp == "not found") { temp = "0"; } convert(temp, keepFirst); temp = validParameter.validFile(parameters, "removelast", false); if (temp == "not found") { temp = "0"; } convert(temp, removeLast); temp = validParameter.validFile(parameters, "allfiles", false); if (temp == "not found") { temp = "F"; } allFiles = m->isTrue(temp); temp = validParameter.validFile(parameters, "keepforward", false); if (temp == "not found") { temp = "F"; } keepforward = m->isTrue(temp); temp = validParameter.validFile(parameters, "lookup", true); if (temp == "not found") { string path = m->argv; string tempPath = path; for (int i = 0; i < path.length(); i++) { tempPath[i] = tolower(path[i]); } path = path.substr(0, (tempPath.find_last_of('m'))); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) path += "lookupFiles/"; #else path += "lookupFiles\\"; #endif lookupFileName = m->getFullPathName(path) + "LookUp_Titanium.pat"; bool ableToOpen = m->checkLocations(lookupFileName, inputDir); if (!ableToOpen) { abort=true; } }else if(temp == "not open") { lookupFileName = validParameter.validFile(parameters, "lookup", false); //if you can't open it its not inputDir, try mothur excutable location string exepath = m->argv; string tempPath = exepath; for (int i = 0; i < exepath.length(); i++) { tempPath[i] = tolower(exepath[i]); } exepath = exepath.substr(0, (tempPath.find_last_of('m'))); string tryPath = m->getFullPathName(exepath) + m->getSimpleName(lookupFileName); m->mothurOut("Unable to open " + lookupFileName + ". 
Trying mothur's executable location " + tryPath); m->mothurOutEndLine(); ifstream in2; int ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); lookupFileName = tryPath; if (ableToOpen == 1) { m->mothurOut("Unable to open " + lookupFileName + "."); m->mothurOutEndLine(); abort=true; } }else { lookupFileName = temp; } } } catch(exception& e) { m->errorOut(e, "SffMultipleCommand", "SffMultipleCommand"); exit(1); } } //********************************************************************************************************************** int SffMultipleCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } vector sffFiles, oligosFiles; readFile(sffFiles, oligosFiles); string thisOutputDir = outputDir; if (thisOutputDir == "") { thisOutputDir = m->hasPath(filename); } string fileroot = thisOutputDir + m->getRootName(m->getSimpleName(filename)); map variables; variables["[filename]"] = fileroot; string fasta = getOutputFileName("fasta",variables); string name = getOutputFileName("name",variables); string group = getOutputFileName("group",variables); if (m->control_pressed) { return 0; } if (sffFiles.size() < processors) { processors = sffFiles.size(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else //trim.flows, shhh.flows cannot handle multiple processors for windows. 
processors = 1; m->mothurOut("This command can only use 1 processor on Windows platforms, using 1 processors.\n\n"); #endif if (processors == 1) { driver(sffFiles, oligosFiles, 0, sffFiles.size(), fasta, name, group); } else { createProcesses(sffFiles, oligosFiles, fasta, name, group); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (append) { outputNames.push_back(fasta); outputTypes["fasta"].push_back(fasta); m->setFastaFile(fasta); outputNames.push_back(name); outputTypes["name"].push_back(name); m->setNameFile(name); if (makeGroup) { outputNames.push_back(group); outputTypes["group"].push_back(group); m->setGroupFile(group); } } m->setProcessors(toString(processors)); //report output filenames m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SffMultipleCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SffMultipleCommand::readFile(vector& sffFiles, vector& oligosFiles){ try { ifstream in; m->openInputFile(filename, in); bool allBlank = true; bool allFull = true; string oligos, sff; while (!in.eof()) { if (m->control_pressed) { break; } in >> sff; //ignore file pairing if(sff[0] == '#'){ while (!in.eof()) { char c = in.get(); if (c == 10 || c == 13){ break; } } m->gobble(in); } else { //check for oligos file bool ableToOpenSff = m->checkLocations(sff, inputDir); oligos = ""; // get rest of line in case there is a oligos filename while (!in.eof()) { char c = in.get(); if (c == 10 || c == 13 || c == -1){ break; } else if (c == 32 || c == 9){;} //space or tab else { oligos += c; } } if (ableToOpenSff) { sffFiles.push_back(sff); if (oligos != "") { bool ableToOpenOligos = 
m->checkLocations(oligos, inputDir); if (ableToOpenOligos) { allBlank = false; } else { m->mothurOut("Can not find " + oligos + ". Ignoring.\n"); oligos = ""; } } if (oligos == "") { allFull = false; } oligosFiles.push_back(oligos); //will push a blank if there is not an oligos for this sff file }else { m->mothurOut("Can not find " + sff + ". Ignoring.\n"); } } m->gobble(in); } in.close(); if (allBlank || allFull) { append = true; } if (allFull) { makeGroup = true; } return 0; } catch(exception& e) { m->errorOut(e, "SffMultipleCommand", "readFile"); exit(1); } } //********************************************************************************************************************** //runs sffinfo, summary.seqs, trim.flows, shhh.flows, trim.seqs, summary.seqs for each sff file. int SffMultipleCommand::driver(vector sffFiles, vector oligosFiles, int start, int end, string fasta, string name, string group){ try { m->mothurRemove(fasta); m->mothurRemove(name); m->mothurRemove(group); int count = 0; for (int s = start; s < end; s++) { string sff = sffFiles[s]; string oligos = oligosFiles[s]; m->mothurOut("\n>>>>>\tProcessing " + sff + " (file " + toString(s+1) + " of " + toString(sffFiles.size()) + ")\t<<<<<\n"); //run sff.info string redirects = ""; if (inputDir != "") { redirects += ", inputdir=" + inputDir; } if (outputDir != "") { redirects += ", outputdir=" + outputDir; } string inputString = "sff=" + sff + ", flow=T"; if (trim) { inputString += ", trim=T"; } if (redirects != "") { inputString += redirects; } m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: sffinfo(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* sffCommand = new SffInfoCommand(inputString); sffCommand->execute(); if (m->control_pressed){ break; } map > filenames = sffCommand->getOutputFiles(); delete sffCommand; m->mothurCalling = false; m->mothurOutEndLine(); redirects = ""; if (outputDir != "") { 
redirects += ", outputdir=" + outputDir; } //run summary.seqs on the fasta file string fastaFile = ""; map >::iterator it = filenames.find("fasta"); if (it != filenames.end()) { if ((it->second).size() != 0) { fastaFile = (it->second)[0]; } } else { m->mothurOut("[ERROR]: sffinfo did not create a fasta file, quitting.\n"); m->control_pressed = true; break; } inputString = "fasta=" + fastaFile + ", processors=1"; if (redirects != "") { inputString += redirects; } m->mothurOutEndLine(); m->mothurOut("Running command: summary.seqs(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* summarySeqsCommand = new SeqSummaryCommand(inputString); summarySeqsCommand->execute(); if (m->control_pressed){ break; } map > temp = summarySeqsCommand->getOutputFiles(); mergeOutputFileList(filenames, temp); delete summarySeqsCommand; m->mothurCalling = false; m->mothurOutEndLine(); //run trim.flows on the fasta file string flowFile = ""; it = filenames.find("flow"); if (it != filenames.end()) { if ((it->second).size() != 0) { flowFile = (it->second)[0]; } } else { m->mothurOut("[ERROR]: sffinfo did not create a flow file, quitting.\n"); m->control_pressed = true; break; } inputString = "flow=" + flowFile; if (oligos != "") { inputString += ", oligos=" + oligos; } inputString += ", maxhomop=" + toString(maxHomoP) + ", maxflows=" + toString(maxFlows) + ", minflows=" + toString(minFlows); inputString += ", pdiffs=" + toString(pdiffs) + ", bdiffs=" + toString(bdiffs) + ", ldiffs=" + toString(ldiffs) + ", sdiffs=" + toString(sdiffs); inputString += ", tdiffs=" + toString(tdiffs) + ", signal=" + toString(signal) + ", noise=" + toString(noise) + ", order=" + flowOrder + ", processors=1"; if (redirects != "") { inputString += redirects; } m->mothurOutEndLine(); m->mothurOut("Running command: trim.flows(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* trimFlowCommand = new TrimFlowsCommand(inputString); trimFlowCommand->execute(); if 
(m->control_pressed){ break; }
            temp = trimFlowCommand->getOutputFiles();
            mergeOutputFileList(filenames, temp);
            delete trimFlowCommand;
            m->mothurCalling = false;

            string fileFileName = "";
            flowFile = "";
            if (oligos != "") {
                it = temp.find("file");
                if (it != temp.end()) { if ((it->second).size() != 0) { fileFileName = (it->second)[0]; } }
                else { m->mothurOut("[ERROR]: trim.flows did not create a file file, quitting.\n"); m->control_pressed = true; break; }
            }else {
                vector<string> flowFiles;
                it = temp.find("flow");
                if (it != temp.end()) { if ((it->second).size() != 0) { flowFiles = (it->second); } }
                else { m->mothurOut("[ERROR]: trim.flows did not create a flow file, quitting.\n"); m->control_pressed = true; break; }

                for (int i = 0; i < flowFiles.size(); i++) {
                    string end = flowFiles[i].substr(flowFiles[i].length()-9);
                    if (end == "trim.flow") {
                        flowFile = flowFiles[i]; i+=flowFiles.size(); //if we found the trim.flow file stop looking
                    }
                }
            }

            if ((fileFileName == "") && (flowFile == "")) { m->mothurOut("[ERROR]: trim.flows did not create a file file or a trim.flow file, quitting.\n"); m->control_pressed = true; break; }

            if (fileFileName != "") { inputString = "file=" + fileFileName; }
            else { inputString = "flow=" + flowFile; }

            //a stray ';' after toString(cutoff) used to discard the iteration limit; shhh.flows names the parameter maxiter
            inputString += ", lookup=" + lookupFileName + ", cutoff=" + toString(cutoff) + ", maxiter=" + toString(maxIters);
            if (large) { inputString += ", large=" + toString(largeSize); }
            inputString += ", sigma=" +toString(sigma);
            inputString += ", mindelta=" + toString(minDelta);
            inputString += ", order=" + flowOrder + ", processors=1";
            if (redirects != "") { inputString += redirects; }

            //run shhh.flows
            m->mothurOutEndLine();
            m->mothurOut("Running command: shhh.flows(" + inputString + ")"); m->mothurOutEndLine();
            m->mothurCalling = true;

            Command* shhhFlowCommand = new ShhherCommand(inputString);
            shhhFlowCommand->execute();

            if (m->control_pressed){ break; }

            temp = shhhFlowCommand->getOutputFiles();
            mergeOutputFileList(filenames, temp);
            delete shhhFlowCommand;
m->mothurCalling = false; vector fastaFiles; vector nameFiles; it = temp.find("fasta"); if (it != temp.end()) { if ((it->second).size() != 0) { fastaFiles = (it->second); } } else { m->mothurOut("[ERROR]: shhh.flows did not create a fasta file, quitting.\n"); m->control_pressed = true; break; } it = temp.find("name"); if (it != temp.end()) { if ((it->second).size() != 0) { nameFiles = (it->second); } } else { m->mothurOut("[ERROR]: shhh.flows did not create a name file, quitting.\n"); m->control_pressed = true; break; } //find fasta and name files with the shortest name. This is because if there is a composite name it will be the shortest. fastaFile = fastaFiles[0]; for (int i = 1; i < fastaFiles.size(); i++) { if (fastaFiles[i].length() < fastaFile.length()) { fastaFile = fastaFiles[i]; } } string nameFile = nameFiles[0]; for (int i = 1; i < nameFiles.size(); i++) { if (nameFiles[i].length() < nameFile.length()) { nameFile = nameFiles[i]; } } inputString = "fasta=" + fastaFile + ", name=" + nameFile; if (oligos != "") { inputString += ", oligos=" + oligos; } if (allFiles) { inputString += ", allfiles=t"; } else { inputString += ", allfiles=f"; } if (flip) { inputString += ", flip=t"; } else { inputString += ", flip=f"; } if (keepforward) { inputString += ", keepforward=t"; } else { inputString += ", keepforward=f"; } inputString += ", pdiffs=" + toString(pdiffs) + ", bdiffs=" + toString(bdiffs) + ", ldiffs=" + toString(ldiffs) + ", sdiffs=" + toString(sdiffs); inputString += ", tdiffs=" + toString(tdiffs) + ", maxambig=" + toString(maxAmbig) + ", minlength=" + toString(minLength) + ", maxlength=" + toString(maxLength); if (keepFirst != 0) { inputString += ", keepfirst=" + toString(keepFirst); } if (removeLast != 0) { inputString += ", removelast=" + toString(removeLast); } inputString += ", processors=1"; if (redirects != "") { inputString += redirects; } //run trim.seqs m->mothurOutEndLine(); m->mothurOut("Running command: trim.seqs(" + inputString + ")"); 
m->mothurOutEndLine(); m->mothurCalling = true; Command* trimseqsCommand = new TrimSeqsCommand(inputString); trimseqsCommand->execute(); if (m->control_pressed){ break; } temp = trimseqsCommand->getOutputFiles(); mergeOutputFileList(filenames, temp); delete trimseqsCommand; m->mothurCalling = false; it = temp.find("fasta"); if (it != temp.end()) { if ((it->second).size() != 0) { fastaFiles = (it->second); } } else { m->mothurOut("[ERROR]: trim.seqs did not create a fasta file, quitting.\n"); m->control_pressed = true; break; } for (int i = 0; i < fastaFiles.size(); i++) { string end = fastaFiles[i].substr(fastaFiles[i].length()-10); if (end == "trim.fasta") { fastaFile = fastaFiles[i]; i+=fastaFiles.size(); //if we found the trim.fasta file stop looking } } it = temp.find("name"); if (it != temp.end()) { if ((it->second).size() != 0) { nameFiles = (it->second); } } else { m->mothurOut("[ERROR]: trim.seqs did not create a name file, quitting.\n"); m->control_pressed = true; break; } for (int i = 0; i < nameFiles.size(); i++) { string end = nameFiles[i].substr(nameFiles[i].length()-10); if (end == "trim.names") { nameFile = nameFiles[i]; i+=nameFiles.size(); //if we found the trim.names file stop looking } } vector groupFiles; string groupFile = ""; if (makeGroup) { it = temp.find("group"); if (it != temp.end()) { if ((it->second).size() != 0) { groupFiles = (it->second); } } //find group file with the shortest name. This is because if there is a composite group file it will be the shortest. 
				groupFile = groupFiles[0];
				for (int i = 1; i < groupFiles.size(); i++) {
					if (groupFiles[i].length() < groupFile.length()) { groupFile = groupFiles[i]; }
				}
			}

			inputString = "fasta=" + fastaFile + ", processors=1, name=" + nameFile;
			if (redirects != "") { inputString += redirects; }
			m->mothurOutEndLine();
			m->mothurOut("Running command: summary.seqs(" + inputString + ")"); m->mothurOutEndLine();
			m->mothurCalling = true;

			summarySeqsCommand = new SeqSummaryCommand(inputString);
			summarySeqsCommand->execute();

			if (m->control_pressed){ break; }

			temp = summarySeqsCommand->getOutputFiles();
			mergeOutputFileList(filenames, temp);

			delete summarySeqsCommand;
			m->mothurCalling = false;

			m->mothurOutEndLine(); m->mothurOut("/******************************************/"); m->mothurOutEndLine();

			if (append) {
				m->appendFiles(fastaFile, fasta);
				m->appendFiles(nameFile, name);
				if (makeGroup) { m->appendFiles(groupFile, group); }
			}

			for (it = filenames.begin(); it != filenames.end(); it++) {
				for (int i = 0; i < (it->second).size(); i++) {
					outputNames.push_back((it->second)[i]); outputTypes[it->first].push_back((it->second)[i]);
				}
			}
			count++;
		}

		return count;
	}
	catch(exception& e) {
		m->errorOut(e, "SffMultipleCommand", "driver");
		exit(1);
	}
}
//**********************************************************************************************************************
int SffMultipleCommand::mergeOutputFileList(map<string, vector<string> >& files, map<string, vector<string> >& temp){
	try {
		map<string, vector<string> >::iterator it;
		for (it = temp.begin(); it != temp.end(); it++) {
			map<string, vector<string> >::iterator it2 = files.find(it->first);
			if (it2 == files.end()) { //we do not already have this type so just add it
				files[it->first] = it->second;
			}else { //merge them
				for (int i = 0; i < (it->second).size(); i++) {
					files[it->first].push_back((it->second)[i]);
				}
			}
		}

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SffMultipleCommand", "mergeOutputFileList");
		exit(1);
	}
}
//********************************************************************************************************************** int SffMultipleCommand::createProcesses(vector sffFiles, vector oligosFiles, string fasta, string name, string group){ try { vector processIDS; int process = 1; int num = 0; bool recalc = false; //divide the groups between the processors vector lines; vector numFilesToComplete; int numFilesPerProcessor = sffFiles.size() / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numFilesPerProcessor; int endIndex = (i+1) * numFilesPerProcessor; if(i == (processors - 1)){ endIndex = sffFiles.size(); } lines.push_back(linePair(startIndex, endIndex)); numFilesToComplete.push_back((endIndex-startIndex)); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(sffFiles, oligosFiles, lines[process].start, lines[process].end, fasta + m->mothurGetpid(process) + ".temp", name + m->mothurGetpid(process) + ".temp", group + m->mothurGetpid(process) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << '\t' << outputNames.size() << endl; for (int i = 0; i < outputNames.size(); i++) { out << outputNames[i] << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(fasta + (toString(processIDS[i]) + ".temp")); m->mothurRemove(name + 
(toString(processIDS[i]) + ".temp")); m->mothurRemove(group + (toString(processIDS[i]) + ".temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(fasta + (toString(processIDS[i]) + ".temp"));m->mothurRemove(group + (toString(processIDS[i]) + ".temp"));m->mothurRemove(name + (toString(processIDS[i]) + ".temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide lines.clear(); numFilesToComplete.clear(); int numFilesPerProcessor = sffFiles.size() / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numFilesPerProcessor; int endIndex = (i+1) * numFilesPerProcessor; if(i == (processors - 1)){ endIndex = sffFiles.size(); } lines.push_back(linePair(startIndex, endIndex)); numFilesToComplete.push_back((endIndex-startIndex)); } num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(sffFiles, oligosFiles, lines[process].start, lines[process].end, fasta + m->mothurGetpid(process) + ".temp", name + m->mothurGetpid(process) + ".temp", group + m->mothurGetpid(process) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << '\t' << outputNames.size() << endl; for (int i = 0; i < outputNames.size(); i++) { out << outputNames[i] << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill 
(processIDS[i], SIGINT); }
					exit(0);
				}
			}
		}

		//do my part
		num = driver(sffFiles, oligosFiles, lines[0].start, lines[0].end, fasta, name, group);

		//force parent to wait until all the processes are done
		for (int i=0;i<processIDS.size();i++) {
			int temp = processIDS[i];
			wait(&temp);
		}

		for (int i=0;i<processIDS.size();i++) {
			ifstream in;
			string tempFile = toString(processIDS[i]) + ".num.temp";
			m->openInputFile(tempFile, in);
			if (!in.eof()) {
				int tempNum = 0; int outputNamesSize = 0;
				in >> tempNum >> outputNamesSize; m->gobble(in);
				for (int j = 0; j < outputNamesSize; j++) {
					string tempName;
					in >> tempName; m->gobble(in);
					outputNames.push_back(tempName);
				}
				if (tempNum != numFilesToComplete[i+1]) {
					m->mothurOut("[ERROR]: main process expected " + toString(processIDS[i]) + " to complete " + toString(numFilesToComplete[i+1]) + " files, and it only reported completing " + toString(tempNum) + ". This will cause file mismatches. The flow files may be too large to process with multiple processors. \n");
				}
			}
			in.close(); m->mothurRemove(tempFile);

			if (append) {
				m->appendFiles(fasta+toString(processIDS[i])+".temp", fasta); m->mothurRemove(fasta+toString(processIDS[i])+".temp");
				m->appendFiles(name+toString(processIDS[i])+".temp", name); m->mothurRemove(name+toString(processIDS[i])+".temp");
				if (makeGroup) { m->appendFiles(group+toString(processIDS[i])+".temp", group); m->mothurRemove(group+toString(processIDS[i])+".temp"); }
			}
		}
#endif
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SffMultipleCommand", "createProcesses");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/sffmultiplecommand.h000066400000000000000000000034751255543666200223120ustar00rootroot00000000000000
#ifndef Mothur_sffmultiplecommand_h
#define Mothur_sffmultiplecommand_h

//
//  sffmultiplecommand.h
//  Mothur
//
//  Created by Sarah Westcott on 8/14/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "command.hpp"
#include "sffinfocommand.h"
#include "seqsummarycommand.h"
#include "trimflowscommand.h"
#include "shhhercommand.h"
#include "trimseqscommand.h"

class SffMultipleCommand : public Command {

public:
	SffMultipleCommand(string);
	SffMultipleCommand();
	~SffMultipleCommand(){}

	vector<string> setParameters();
	string getCommandName()			{ return "sff.multiple"; }
	string getCommandCategory()		{ return "Sequence Processing"; }

	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "http://www.mothur.org/wiki/Sff.multiple"; }
	string getDescription()		{ return "run multiple sff files through sffinfo, trim.flows, shhh.flows and trim.seqs, combining the results"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:

	string inputDir;
	string filename, outputDir, flowOrder, lookupFileName, minDelta;
	vector<string> outputNames;
	bool abort, trim, large, flip, allFiles, keepforward, append, makeGroup;
	int maxFlows, minFlows, minLength, maxLength, maxHomoP, tdiffs, bdiffs, pdiffs, sdiffs, ldiffs;
	int processors, maxIters, largeSize;
	float signal, noise, cutoff, sigma;
	int keepFirst, removeLast, maxAmbig;

	int readFile(vector<string>& sffFiles, vector<string>& oligosFiles);
	int createProcesses(vector<string> sffFiles, vector<string> oligosFiles, string, string, string);
	int driver(vector<string> sffFiles, vector<string> oligosFiles, int start, int end, string, string, string);
	int mergeOutputFileList(map<string, vector<string> >& files, map<string, vector<string> >& temp);
};

#endif
mothur-1.36.1/source/commands/sharedcommand.cpp000066400000000000000000001416741255543666200215650ustar00rootroot00000000000000
/*
 *  sharedcommand.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "sharedcommand.h" #include "sharedutilities.h" #include "counttable.h" //******************************************************************************************************************** //sorts lowest to highest inline bool compareSharedRabunds(SharedRAbundVector* left, SharedRAbundVector* right){ return (left->getGroup() < right->getGroup()); } //********************************************************************************************************************** vector SharedCommand::setParameters(){ try { CommandParameter pbiom("biom", "InputTypes", "", "", "BiomListGroup", "BiomListGroup", "none","shared",false,false); parameters.push_back(pbiom); CommandParameter plist("list", "InputTypes", "", "", "BiomListGroup", "BiomListGroup", "ListGroup","shared",false,false,true); parameters.push_back(plist); CommandParameter pcount("count", "InputTypes", "", "", "none", "GroupCount", "none","",false,false); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "none", "GroupCount", "ListGroup","",false,false,true); parameters.push_back(pgroup); //CommandParameter pordergroup("ordergroup", "InputTypes", "", "", "none", "none", "none",false,false); parameters.push_back(pordergroup); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","group",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SharedCommand", 
"setParameters"); exit(1); } } //********************************************************************************************************************** string SharedCommand::getHelpString(){ try { string helpString = ""; helpString += "The make.shared command reads a list and group file or a biom file and creates a shared file. If a list and group are provided a rabund file is created for each group.\n"; helpString += "The make.shared command parameters are list, group, biom, groups, count and label. list and group or count are required unless a current file is available or you provide a biom file.\n"; helpString += "The count parameter allows you to provide a count file containing the group info for the list file.\n"; helpString += "The groups parameter allows you to indicate which groups you want to include, group names should be separated by dashes. ex. groups=A-B-C. Default is all groups in your groupfile.\n"; helpString += "The label parameter is only valid with the list and group option and allows you to indicate which labels you want to include, label names should be separated by dashes. 
Default is all labels in your list file.\n"; //helpString += "The ordergroup parameter allows you to indicate the order of the groups in the sharedfile, by default the groups are listed alphabetically.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "shared") { pattern = "[filename],shared-[filename],[distance],shared"; } // else if (type == "rabund") { pattern = "[filename],[group],rabund"; } else if (type == "group") { pattern = "[filename],[group],groups"; } else if (type == "map") { pattern = "[filename],map"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SharedCommand::SharedCommand(){ try { abort = true; calledHelp = true; setParameters(); //initialize outputTypes vector tempOutNames; // outputTypes["rabund"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["map"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SharedCommand", "SharedCommand"); exit(1); } } //********************************************************************************************************************** SharedCommand::SharedCommand(string option) { try { abort = false; calledHelp = false; pickedGroups=false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map 
parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("biom"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["biom"] = inputDir + it->second; } } } vector tempOutNames; // outputTypes["rabund"] = tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["map"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } biomfile = validParameter.validFile(parameters, "biom", true); if (biomfile == "not open") { biomfile = ""; abort = true; } else if (biomfile == "not found") { biomfile = ""; } else { m->setBiomFile(biomfile); } ordergroupfile = validParameter.validFile(parameters, "ordergroup", true); if (ordergroupfile == "not open") { abort = true; } else if (ordergroupfile == "not found") { ordergroupfile = ""; } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); CountTable temp; if (!temp.testGroups(countfile)) { m->mothurOut("[ERROR]: Your count file does not have group info, aborting."); m->mothurOutEndLine(); abort=true; } } if ((biomfile == "") && (listfile == "")) { //is there are current file available for either of these? 
//give priority to list, then biom listfile = m->getListFile(); if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { biomfile = m->getBiomFile(); if (biomfile != "") { m->mothurOut("Using " + biomfile + " as input file for the biom parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a list or biom file before you can use the make.shared command."); m->mothurOutEndLine(); abort = true; } } } else if ((biomfile != "") && (listfile != "")) { m->mothurOut("When executing a make.shared command you must enter ONLY ONE of the following: list or biom."); m->mothurOutEndLine(); abort = true; } if (listfile != "") { if ((groupfile == "") && (countfile == "")) { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a groupfile or countfile if you are going to use the list format."); m->mothurOutEndLine(); abort = true; } } } } string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { pickedGroups=true; m->splitAtDash(groups, Groups); m->setGroups(Groups); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
			string label = validParameter.validFile(parameters, "label", false);
			if (label == "not found") { label = ""; }
			else {
				if(label != "all") { m->splitAtDash(label, labels); allLines = 0; }
				else { allLines = 1; }
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "SharedCommand", "SharedCommand");
		exit(1);
	}
}
//**********************************************************************************************************************

int SharedCommand::execute(){
	try {

		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		if (listfile != "") { createSharedFromListGroup(); }
		else { createSharedFromBiom(); }

		if (m->control_pressed) {
			for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
		}

		string current = "";
		itTypes = outputTypes.find("shared");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); }
		}

		itTypes = outputTypes.find("group");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); }
		}

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
int SharedCommand::createSharedFromBiom() {
	try {
		//getting output filename
		string filename = biomfile;
		if (outputDir == "") { outputDir += m->hasPath(filename); }

		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(filename));
		filename = getOutputFileName("shared",variables);
		outputNames.push_back(filename); outputTypes["shared"].push_back(filename);

		ofstream out;
		m->openOutputFile(filename, out);

		/*{
		 "id":"/Users/SarahsWork/Desktop/release/temp.job2.shared-unique",
		 "format": "Biological Observation Matrix 0.9.1",
		 "format_url": "http://biom-format.org",
		 "type": "OTU table",
		 "generated_by": "mothur1.24.0",
		 "date": "Tue Apr 17 13:12:07 2012", */

		ifstream in;
		m->openInputFile(biomfile, in);

		string matrixFormat = "";
		int numRows = 0;
		int numCols = 0;
		int shapeNumRows = 0;
		int shapeNumCols = 0;
		vector<string> otuNames;
		vector<string> groupNames;
		map<string, string> fileLines;
		vector<string> names;
		int countOpenBrace = 0;
		int countClosedBrace = 0;
		int openParen = -1; //account for opening brace
		int closeParen = 0;
		bool ignoreCommas = false;
		bool atComma = false;
		string line = "";
		string matrixElementType = "";

		while (!in.eof()) { //split file by tags, so each "line" will have something like "id":"/Users/SarahsWork/Desktop/release/final.tx.1.subsample.1.pick.shared-1"
			if (m->control_pressed) { break; }

			char c = in.get(); m->gobble(in);

			if (c == '[') { countOpenBrace++; }
			else if (c == ']') { countClosedBrace++; }
			else if (c == '{') { openParen++; }
			else if (c == '}') { closeParen++; }
			else if ((!ignoreCommas) && (c == ',')) { atComma = true; }

			//commas inside an unclosed [ ... ] belong to the current tag, so ignore them
			if (countOpenBrace != countClosedBrace) { ignoreCommas = true; }
			else { ignoreCommas = false; }

			if (atComma && !ignoreCommas) {
				if (fileLines.size() == 0) { //clip first {
					line = line.substr(1);
				}
				string tag = getTag(line);
				fileLines[tag] = line;
				line = "";
				atComma = false;
				ignoreCommas = false;

			}else { line += c; }
		}
		if (line != "") {
			line = line.substr(0, line.length()-1);
			string tag = getTag(line);
			fileLines[tag] = line;
		}
		in.close();

		string biomType;
		map<string, string>::iterator it;
		it = fileLines.find("type");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have a type provided.\n"); }
		else {
			string thisLine = it->second;
			biomType = getTag(thisLine);
			//if ((biomType != "OTU table") && (biomType != "OTUtable") && (biomType != "Taxon table") && (biomType != "Taxontable")) { m->mothurOut("[ERROR]: " + biomType + " is not a valid biom type for mothur. Only types allowed are OTU table and Taxon table.\n"); m->control_pressed = true; }
		}

		if (m->control_pressed) { out.close(); m->mothurRemove(filename); return 0; }

		it = fileLines.find("matrix_type");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have a matrix_type provided.\n"); }
		else {
			string thisLine = it->second;
			matrixFormat = getTag(thisLine);
			if ((matrixFormat != "sparse") && (matrixFormat != "dense")) { m->mothurOut("[ERROR]: " + matrixFormat + " is not a valid biom matrix_type for mothur. Types allowed are sparse and dense.\n"); m->control_pressed = true; }
		}

		if (m->control_pressed) { out.close(); m->mothurRemove(filename); return 0; }

		it = fileLines.find("matrix_element_type");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have a matrix_element_type provided.\n"); }
		else {
			string thisLine = it->second;
			matrixElementType = getTag(thisLine);
			if ((matrixElementType != "int") && (matrixElementType != "float")) { m->mothurOut("[ERROR]: " + matrixElementType + " is not a valid biom matrix_element_type for mothur. Types allowed are int and float.\n"); m->control_pressed = true; }
			if (matrixElementType == "float") { m->mothurOut("[WARNING]: the shared file only uses integers; any float values will be rounded down to the nearest integer.\n"); }
		}

		if (m->control_pressed) { out.close(); m->mothurRemove(filename); return 0; }

		it = fileLines.find("rows");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have rows provided.\n"); }
		else {
			string thisLine = it->second;
			if ((biomType == "Taxon table") || (biomType == "Taxontable")) {
				string mapFilename = getOutputFileName("map",variables);
				outputNames.push_back(mapFilename); outputTypes["map"].push_back(mapFilename);
				ofstream outMap;
				m->openOutputFile(mapFilename, outMap);

				vector<string> taxonomies = readRows(thisLine, numRows);

				string snumBins = toString(numRows);
				for (int i = 0; i < numRows; i++) {

					//taxon tables carry taxonomies instead of OTU ids, so build a zero-padded label for each row
					string binLabel = "OTU";
					string sbinNumber = toString(i+1);
					if (sbinNumber.length() < snumBins.length()) {
						int diff = snumBins.length() - sbinNumber.length();
						for (int h = 0; h < diff; h++) { binLabel += "0"; }
					}
					binLabel += sbinNumber;

					otuNames.push_back(binLabel);
					outMap << otuNames[i] << '\t' << taxonomies[i] << endl;
				}
				outMap.close();
			}else{
				otuNames = readRows(thisLine, numRows);
			}
		}

		if (m->control_pressed) { out.close(); m->mothurRemove(filename); return 0; }

		it = fileLines.find("columns");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have columns provided.\n"); }
		else {
			string thisLine = it->second;

			//read sample names
			groupNames = readRows(thisLine, numCols);

			//if users selected groups, then remove the groups not wanted.
			SharedUtil util;
			vector<string> Groups = m->getGroups();
			vector<string> allGroups = groupNames;
			util.setGroups(Groups, allGroups);
			m->setGroups(Groups);

			//set fileroot
			fileroot = outputDir + m->getRootName(m->getSimpleName(biomfile));
		}

		if (m->control_pressed) { out.close(); m->mothurRemove(filename); return 0; }

		it = fileLines.find("shape");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have a shape provided.\n"); }
		else {
			string thisLine = it->second;
			getDims(thisLine, shapeNumRows, shapeNumCols);

			//check shape
			if (shapeNumCols != numCols) { m->mothurOut("[ERROR]: shape indicates " + toString(shapeNumCols) + " columns, but I read " + toString(numCols) + " columns.\n"); m->control_pressed = true; }

			if (shapeNumRows != numRows) { m->mothurOut("[ERROR]: shape indicates " + toString(shapeNumRows) + " rows, but I read " + toString(numRows) + " rows.\n"); m->control_pressed = true; }
		}

		if (m->control_pressed) { out.close(); m->mothurRemove(filename); return 0; }

		it = fileLines.find("data");
		if (it == fileLines.end()) { m->mothurOut("[ERROR]: your file does not have data provided.\n"); }
		else {
			string thisLine = it->second;
			m->currentSharedBinLabels = otuNames;

			//read data
			vector<SharedRAbundVector*> lookup = readData(matrixFormat, thisLine, matrixElementType, groupNames, otuNames.size());

			m->mothurOutEndLine(); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
			lookup[0]->printHeaders(out);
			printSharedData(lookup, out);
		}

		//for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { delete it3->second; }
		//out.close();

		if (m->control_pressed) { m->mothurRemove(filename); return 0; }

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedCommand", "createSharedFromBiom");
		exit(1);
	}
}
//**********************************************************************************************************************
vector<SharedRAbundVector*> SharedCommand::readData(string matrixFormat, string line, string matrixElementType, vector<string>& groupNames, int numOTUs) {
	try {

		vector<SharedRAbundVector*> lookup;
//creates new sharedRAbunds for (int i = 0; i < groupNames.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(numOTUs); //sets all abunds to 0 temp->setLabel("userLabel"); temp->setGroup(groupNames[i]); lookup.push_back(temp); } bool dataStart = false; bool inBrackets = false; string num = ""; vector nums; int otuCount = 0; for (int i = 0; i < line.length(); i++) { if (m->control_pressed) { return lookup; } //look for opening [ to indicate data is starting if ((line[i] == '[') && (!dataStart)) { dataStart = true; i++; if (!(i < line.length())) { break; } } else if ((line[i] == ']') && dataStart && (!inBrackets)) { break; } //we are done reading data if (dataStart) { if ((line[i] == '[') && (!inBrackets)) { inBrackets = true; i++; if (!(i < line.length())) { break; } } else if ((line[i] == ']') && (inBrackets)) { inBrackets = false; int temp; float temp2; if (matrixElementType == "float") { m->mothurConvert(num, temp2); temp = (int)temp2; } else { m->mothurConvert(num, temp); } nums.push_back(temp); num = ""; //save info to vectors if (matrixFormat == "dense") { //sanity check if (nums.size() != lookup.size()) { m->mothurOut("[ERROR]: trouble parsing OTU data. OTU " + toString(otuCount) + " causing errors.\n"); m->control_pressed = true; } //set abundances for this otu //nums contains [abundSample0, abundSample1, abundSample2, ...] 
for current OTU for (int j = 0; j < lookup.size(); j++) { lookup[j]->set(otuCount, nums[j], groupNames[j]); } otuCount++; }else { //sanity check if (nums.size() != 3) { m->mothurOut("[ERROR]: trouble parsing OTU data.\n"); m->control_pressed = true; } //nums contains [otuNum, sampleNum, abundance] lookup[nums[1]]->set(nums[0], nums[2], groupNames[nums[1]]); } nums.clear(); } if (inBrackets) { if (line[i] == ',') { int temp; m->mothurConvert(num, temp); nums.push_back(temp); num = ""; }else { if (!isspace(line[i])) { num += line[i]; } } } } } SharedUtil util; bool remove = false; if (pickedGroups) { for (int i = 0; i < lookup.size(); i++) { //if this sharedrabund is not from a group the user wants then delete it. if (util.isValidGroup(lookup[i]->getGroup(), m->getGroups()) == false) { remove = true; delete lookup[i]; lookup[i] = NULL; lookup.erase(lookup.begin()+i); i--; } } } if (remove) { eliminateZeroOTUS(lookup); } return lookup; } catch(exception& e) { m->errorOut(e, "SharedCommand", "readData"); exit(1); } } //********************************************************************************************************************** int SharedCommand::eliminateZeroOTUS(vector& thislookup) { try { vector newLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector newBinLabels; string snumBins = toString(thislookup[0]->getNumBins()); for (int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not all zero add this bin if (!allZero) { for (int j = 0; j < thislookup.size(); 
j++) { newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); }

				//if there is a bin label use it otherwise make one
				string binLabel = "Otu";
				string sbinNumber = toString(i+1);
				if (sbinNumber.length() < snumBins.length()) {
					int diff = snumBins.length() - sbinNumber.length();
					for (int h = 0; h < diff; h++) { binLabel += "0"; }
				}
				binLabel += sbinNumber;
				if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; }

				newBinLabels.push_back(binLabel);
			}
		}

		for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; }

		thislookup = newLookup;
		m->currentSharedBinLabels = newBinLabels;

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedCommand", "eliminateZeroOTUS");
		exit(1);
	}
}
//**********************************************************************************************************************
int SharedCommand::getDims(string line, int& shapeNumRows, int& shapeNumCols) {
	try {
		//get shape
		bool inBar = false;
		string num = "";

		for (int i = 0; i < line.length(); i++) {

			//read the two numbers between [ and ]: the first is the row count, the second the column count
			if ((line[i] == '[') && (!inBar)) { inBar = true; i++; if (!(i < line.length())) { break; } }
			else if ((line[i] == ']') && (inBar)) { inBar= false; m->mothurConvert(num, shapeNumCols); break; }

			if (inBar) {
				if (line[i] == ',') { m->mothurConvert(num, shapeNumRows); num = ""; }
				else { if (!isspace(line[i])) { num += line[i]; } }
			}
		}

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedCommand", "getDims");
		exit(1);
	}
}
//**********************************************************************************************************************
vector<string> SharedCommand::readRows(string line, int& numRows) {
	try {
		/*"rows":[
		 {"id":"Otu01", "metadata":{"taxonomy":["Bacteria", "Bacteroidetes", "Bacteroidia", "Bacteroidales", "Porphyromonadaceae", "unclassified"], "bootstrap":[100, 100, 100, 100, 100, 100]}},
		 {"id":"Otu02", "metadata":{"taxonomy":["Bacteria", "Bacteroidetes", "Bacteroidia",
"Bacteroidales", "Rikenellaceae", "Alistipes"], "bootstrap":[100, 100, 100, 100, 100, 100]}}, ... "rows":[{"id": "k__Archaea;p__Euryarchaeota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae", "metadata": null}, {"id": "k__Bacteria;p__Actinobacteria;c__Actinobacteria;o__Actinomycetales;f__Actinomycetaceae", "metadata": null} .... make look like above ],*/ vector names; int countOpenBrace = 0; int countClosedBrace = 0; int openParen = 0; int closeParen = 0; string nextRow = ""; bool end = false; for (int i = 0; i < line.length(); i++) { if (m->control_pressed) { return names; } if (line[i] == '[') { countOpenBrace++; } else if (line[i] == ']') { countClosedBrace++; } else if (line[i] == '{') { openParen++; } else if (line[i] == '}') { closeParen++; } else if (openParen != 0) { nextRow += line[i]; } //you are reading the row info //you have reached the end of the rows info if ((countOpenBrace == countClosedBrace) && (countClosedBrace != 0)) { end = true; break; } if ((openParen == closeParen) && (closeParen != 0)) { //process row numRows++; vector items; m->splitAtChar(nextRow, items, ','); //parse by comma, will return junk for metadata but we aren't using that anyway string part = items[0]; items.clear(); m->splitAtChar(part, items, ':'); //split part we want containing the ids string name = items[1]; //remove "" if needed int pos = name.find("\""); if (pos != string::npos) { string newName = ""; for (int k = 0; k < name.length(); k++) { if (name[k] != '\"') { newName += name[k]; } } name = newName; } names.push_back(name); nextRow = ""; openParen = 0; closeParen = 0; } } return names; } catch(exception& e) { m->errorOut(e, "SharedCommand", "readRows"); exit(1); } } //********************************************************************************************************************** //designed for things like "type": "OTU table", returns type string SharedCommand::getTag(string& line) { try { bool inQuotes = false; string tag = ""; char c = '\"'; 
        for (int i = 0; i < line.length(); i++) {

            //collect the characters between the first pair of double quotes
            if ((line[i] == c) && (!inQuotes)) { inQuotes = true; }
            else if ((line[i] == c) && (inQuotes)) {
                inQuotes = false;
                line = line.substr(i+1); //strip the consumed tag from the caller's line
                return tag;
            }

            if (inQuotes) { if (line[i] != c) { tag += line[i]; } }
        }

        return tag;
    }
    catch(exception& e) {
        m->errorOut(e, "SharedCommand", "getTag");
        exit(1);
    }
}
//**********************************************************************************************************************
int SharedCommand::createSharedFromListGroup() {
    try {

        GroupMap* groupMap = NULL;
        CountTable* countTable = NULL;
        if (groupfile != "") {
            groupMap = new GroupMap(groupfile);

            int groupError = groupMap->readMap();
            if (groupError == 1) { delete groupMap; return 0; }
            vector<string> allGroups = groupMap->getNamesOfGroups();
            m->setAllGroups(allGroups);
        }else{
            countTable = new CountTable();
            countTable->readTable(countfile, true, false);
        }

        if (m->control_pressed) { return 0; }

        pickedGroups = false;

        //if the user has not specified any groups then use them all
        if (Groups.size() == 0) {
            if (groupfile != "") { Groups = groupMap->getNamesOfGroups(); }
            else { Groups = countTable->getNamesOfGroups(); }
            m->setGroups(Groups);
        }else { pickedGroups = true; }

        ofstream out;
        string filename = "";
        if (!pickedGroups) {
            //assign to the outer filename (do not redeclare it) so the cleanup paths below remove the file created here
            filename = listfile;
            if (outputDir == "") { outputDir += m->hasPath(filename); }

            map<string, string> variables;
            variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(filename));
            filename = getOutputFileName("shared",variables);
            outputNames.push_back(filename); outputTypes["shared"].push_back(filename);
            m->openOutputFile(filename, out);
        }

        //set fileroot
        fileroot = outputDir + m->getRootName(m->getSimpleName(listfile));
        map<string, string> variables;
        variables["[filename]"] = fileroot;
        string errorOff = "no error";

        InputData input(listfile, "shared");
        SharedListVector* SharedList = input.getSharedListVector();
        string lastLabel = SharedList->getLabel();
        vector<SharedRAbundVector*> lookup;

        if 
(m->control_pressed) { delete SharedList; if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } out.close(); if (!pickedGroups) { m->mothurRemove(filename); } return 0; } //sanity check vector namesSeqs; int numGroupNames = 0; if (m->groupMode == "group") { namesSeqs = groupMap->getNamesSeqs(); numGroupNames = groupMap->getNumSeqs(); } else { namesSeqs = countTable->getNamesOfSeqs(); numGroupNames = countTable->getNumUniqueSeqs(); } int error = ListGroupSameSeqs(namesSeqs, SharedList); if ((!pickedGroups) && (SharedList->getNumSeqs() != numGroupNames)) { //if the user has not specified any groups and their files don't match exit with error m->mothurOut("Your group file contains " + toString(numGroupNames) + " sequences and list file contains " + toString(SharedList->getNumSeqs()) + " sequences. Please correct."); m->mothurOutEndLine(); m->control_pressed = true; out.close(); if (!pickedGroups) { m->mothurRemove(filename); } //remove blank shared file you made //delete memory delete SharedList; if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } return 0; } if (error == 1) { m->control_pressed = true; } //if user has specified groups make new groupfile for them if ((pickedGroups) && (m->groupMode == "group")) { //make new group file string groups = ""; if (m->getNumGroups() < 4) { for (int i = 0; i < m->getNumGroups()-1; i++) { groups += (m->getGroups())[i] + "."; } groups+=(m->getGroups())[m->getNumGroups()-1]; }else { groups = "merge"; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[group]"] = groups; string newGroupFile = getOutputFileName("group",variables); outputTypes["group"].push_back(newGroupFile); outputNames.push_back(newGroupFile); ofstream outGroups; m->openOutputFile(newGroupFile, outGroups); vector names = groupMap->getNamesSeqs(); string groupName; for (int i = 0; i < names.size(); i++) { groupName = 
groupMap->getGroup(names[i]); if (isValidGroup(groupName, m->getGroups())) { outGroups << names[i] << '\t' << groupName << endl; } } outGroups.close(); } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; while((SharedList != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete SharedList; if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } if (!pickedGroups) { out.close(); m->mothurRemove(filename); } return 0; } if(allLines == 1 || labels.count(SharedList->getLabel()) == 1){ lookup = SharedList->getSharedRAbundVector(); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete SharedList; if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } if (!pickedGroups) { out.close(); m->mothurRemove(filename); } return 0; } //if picked groups must split the shared file by label if (pickedGroups) { string filename = listfile; if (outputDir == "") { outputDir += m->hasPath(filename); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(filename)); variables["[distance]"] = lookup[0]->getLabel(); filename = getOutputFileName("shared",variables); outputNames.push_back(filename); outputTypes["shared"].push_back(filename); ofstream out2; m->openOutputFile(filename, out2); vector savedLabels = m->currentSharedBinLabels; eliminateZeroOTUS(lookup); lookup[0]->printHeaders(out2); printSharedData(lookup, out2); out2.close(); m->currentSharedBinLabels = savedLabels; //restore old labels }else { if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } printSharedData(lookup, out); //prints info to the .shared file } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } processedLabels.insert(SharedList->getLabel()); 
userLabels.erase(SharedList->getLabel()); } if ((m->anyLabelsToProcess(SharedList->getLabel(), userLabels, errorOff) == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = SharedList->getLabel(); delete SharedList; SharedList = input.getSharedListVector(lastLabel); //get new list vector to process lookup = SharedList->getSharedRAbundVector(); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { delete SharedList; if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } if (!pickedGroups) { out.close(); m->mothurRemove(filename); } return 0; } //if picked groups must split the shared file by label if (pickedGroups) { string filename = listfile; if (outputDir == "") { outputDir += m->hasPath(filename); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(filename)); variables["[distance]"] = lookup[0]->getLabel(); filename = getOutputFileName("shared",variables); outputNames.push_back(filename); outputTypes["shared"].push_back(filename); ofstream out2; m->openOutputFile(filename, out2); vector savedLabels = m->currentSharedBinLabels; eliminateZeroOTUS(lookup); lookup[0]->printHeaders(out2); printSharedData(lookup, out2); out2.close(); m->currentSharedBinLabels = savedLabels; //restore old labels }else { if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } printSharedData(lookup, out); //prints info to the .shared file } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } processedLabels.insert(SharedList->getLabel()); userLabels.erase(SharedList->getLabel()); //restore real lastlabel to save below SharedList->setLabel(saveLabel); } lastLabel = SharedList->getLabel(); delete SharedList; SharedList = input.getSharedListVector(); //get new list vector to process } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = 
userLabels.begin(); it != userLabels.end(); it++) { if (processedLabels.count(lastLabel) != 1) { needToRun = true; } } //run last label if you need to if (needToRun == true) { if (SharedList != NULL) { delete SharedList; } SharedList = input.getSharedListVector(lastLabel); //get new list vector to process lookup = SharedList->getSharedRAbundVector(); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); if (m->control_pressed) { if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } if (!pickedGroups) { out.close(); m->mothurRemove(filename); } return 0; } //if picked groups must split the shared file by label if (pickedGroups) { string filename = listfile; if (outputDir == "") { outputDir += m->hasPath(filename); } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(filename)); variables["[distance]"] = lookup[0]->getLabel(); filename = getOutputFileName("shared",variables); outputNames.push_back(filename); outputTypes["shared"].push_back(filename); ofstream out2; m->openOutputFile(filename, out2); vector savedLabels = m->currentSharedBinLabels; eliminateZeroOTUS(lookup); lookup[0]->printHeaders(out2); printSharedData(lookup, out2); out2.close(); m->currentSharedBinLabels = savedLabels; //restore old labels }else { if (!m->printedSharedHeaders) { lookup[0]->printHeaders(out); } printSharedData(lookup, out); //prints info to the .shared file } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } delete SharedList; } if (!pickedGroups) { out.close(); } if (groupMap != NULL) { delete groupMap; } if (countTable != NULL) { delete countTable; } if (m->control_pressed) { if (!pickedGroups) { m->mothurRemove(filename); } return 0; } return 0; } catch(exception& e) { m->errorOut(e, "SharedCommand", "createSharedFromListGroup"); exit(1); } } //********************************************************************************************************************** void 
SharedCommand::printSharedData(vector<SharedRAbundVector*> thislookup, ofstream& out) {
    try {

        if (order.size() == 0) { //user has not specified an order so sort alphabetically
            sort(thislookup.begin(), thislookup.end(), compareSharedRabunds);

            m->clearGroups();
            vector<string> Groups;

            //print the label, group name and abundances for each group
            for (int i = 0; i < thislookup.size(); i++) {
                out << thislookup[i]->getLabel() << '\t' << thislookup[i]->getGroup() << '\t';
                thislookup[i]->print(out);

                Groups.push_back(thislookup[i]->getGroup());
            }
            m->setGroups(Groups);
        }else{
            //create a map from groupName to each sharedrabund
            map<string, SharedRAbundVector*> myMap;
            map<string, SharedRAbundVector*>::iterator myIt;

            for (int i = 0; i < thislookup.size(); i++) {
                myMap[thislookup[i]->getGroup()] = thislookup[i];
            }

            m->clearGroups();
            vector<string> Groups;

            //loop through ordered list and print the rabund
            for (int i = 0; i < order.size(); i++) {
                myIt = myMap.find(order[i]);

                if(myIt != myMap.end()) { //we found it
                    out << (myIt->second)->getLabel() << '\t' << (myIt->second)->getGroup() << '\t';
                    (myIt->second)->print(out);

                    Groups.push_back((myIt->second)->getGroup());
                }else{
                    m->mothurOut("Can't find shared info for " + order[i] + ", skipping."); m->mothurOutEndLine();
                }
            }

            m->setGroups(Groups);
        }

    }
    catch(exception& e) {
        m->errorOut(e, "SharedCommand", "printSharedData");
        exit(1);
    }
}
//**********************************************************************************************************************
int SharedCommand::ListGroupSameSeqs(vector<string>& groupMapsSeqs, SharedListVector* SharedList) {
    try {
        int error = 0;

        set<string> groupNamesSeqs;
        for(int i = 0; i < groupMapsSeqs.size(); i++) {
            groupNamesSeqs.insert(groupMapsSeqs[i]);
        }

        //go through list and if group returns "not found" output it
        for (int i = 0; i < SharedList->getNumBins(); i++) {
            if (m->control_pressed) { return 0; }

            string names = SharedList->get(i);

            vector<string> listNames;
            m->splitAtComma(names, listNames);

            for (int j = 0; j < listNames.size(); j++) {
                int num = groupNamesSeqs.count(listNames[j]);

                if (num == 0) {
                    error = 1;
                    if (groupfile != "") {
                        m->mothurOut("[ERROR]: " + listNames[j] + " is in your listfile and not in your groupfile. Please correct."); m->mothurOutEndLine();
                    }
                    else{
                        m->mothurOut("[ERROR]: " + listNames[j] + " is in your listfile and not in your count file. Please correct."); m->mothurOutEndLine();
                    }
                }else { groupNamesSeqs.erase(listNames[j]); }
            }
        }

        //anything left over is in the group or count file but missing from the list file
        for (set<string>::iterator itGroupSet = groupNamesSeqs.begin(); itGroupSet != groupNamesSeqs.end(); itGroupSet++) {
            error = 1;
            if (groupfile != "") {
                m->mothurOut("[ERROR]: " + (*itGroupSet) + " is in your groupfile and not your listfile. Please correct."); m->mothurOutEndLine();
            }
            else{
                m->mothurOut("[ERROR]: " + (*itGroupSet) + " is in your count file and not your listfile. Please correct."); m->mothurOutEndLine();
            }
        }

        return error;
    }
    catch(exception& e) {
        m->errorOut(e, "SharedCommand", "ListGroupSameSeqs");
        exit(1);
    }
}
//**********************************************************************************************************************

SharedCommand::~SharedCommand(){
    //delete list;
}
//**********************************************************************************************************************
int SharedCommand::readOrderFile() {
    try {
        //remove old names
        order.clear();

        ifstream in;
        m->openInputFile(ordergroupfile, in);
        string thisGroup;

        while(!in.eof()){
            in >> thisGroup; m->gobble(in);

            order.push_back(thisGroup);

            if (m->control_pressed) { order.clear(); break; }
        }
        in.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SharedCommand", "readOrderFile");
        exit(1);
    }
}
//**********************************************************************************************************************

bool SharedCommand::isValidGroup(string groupname, vector<string> groups) {
    try {
        for (int i = 0; i < groups.size(); i++) {
            if (groupname == groups[i]) { return true; }
        }

        return false;
    }
    catch(exception& e) {
        m->errorOut(e, "SharedCommand", "isValidGroup");
        exit(1);
    }
}
/************************************************************/
mothur-1.36.1/source/commands/sharedcommand.h000066400000000000000000000036221255543666200212200ustar00rootroot00000000000000
#ifndef SHAREDCOMMAND_H
#define SHAREDCOMMAND_H
/*
 *  sharedcommand.h
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"
#include "sharedlistvector.h"
#include "inputdata.h"

/* The shared() command:
    The shared command can only be executed after a successful read.shared command.
    The shared command parses a .list file and separates it into groups.
    It outputs a .shared file containing the OTU information for each group.
    There are no shared command parameters. The shared command should be in the following format: shared(). */

class SharedCommand : public Command {

public:
    SharedCommand(string);
    SharedCommand();
    ~SharedCommand();

    vector<string> setParameters();
    string getCommandName()         { return "make.shared"; }
    string getCommandCategory()     { return "OTU-Based Approaches"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Make.shared"; }
    string getDescription()     { return "make a shared file from a list and group file"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    void printSharedData(vector<SharedRAbundVector*>, ofstream&);
    int readOrderFile();
    bool isValidGroup(string, vector<string>);
    int eliminateZeroOTUS(vector<SharedRAbundVector*>&);
    int ListGroupSameSeqs(vector<string>&, SharedListVector*);
    int createSharedFromListGroup();
    int createSharedFromBiom();
    string getTag(string&);
    vector<string> readRows(string, int&);
    int getDims(string, int&, int&);
    vector<SharedRAbundVector*> readData(string, string, string, vector<string>&, int);

    vector<string> Groups, outputNames, order;
    set<string> labels;
    string fileroot, outputDir, listfile, groupfile, biomfile, ordergroupfile, countfile;
    bool firsttime, pickedGroups, abort, allLines;

};

#endif
mothur-1.36.1/source/commands/shhhercommand.cpp000066400000000000000000004214611255543666200215730ustar00rootroot00000000000000
/*
 *  shhher.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 12/27/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "shhhercommand.h"

//**********************************************************************************************************************
vector<string> ShhherCommand::setParameters(){
    try {
        CommandParameter pflow("flow", "InputTypes", "", "", "none", "fileflow", "none","fasta-name-group-counts-qfile",false,false,true); parameters.push_back(pflow);
        CommandParameter pfile("file", "InputTypes", "", "", "none", "fileflow", "none","fasta-name-group-counts-qfile",false,false,true); parameters.push_back(pfile);
        CommandParameter plookup("lookup", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(plookup);
        CommandParameter pcutoff("cutoff", "Number", "", "0.01", "", "", "","",false,false); parameters.push_back(pcutoff);
        CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
        CommandParameter pmaxiter("maxiter", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(pmaxiter);
        CommandParameter plarge("large", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(plarge);
        CommandParameter psigma("sigma", "Number", "", "60", "", "", "","",false,false); parameters.push_back(psigma);
        CommandParameter pmindelta("mindelta", "Number", "", "0.000001", "", "", "","",false,false); parameters.push_back(pmindelta);
        CommandParameter porder("order", "Multiple", "A-B-I", "A", "", "", "","",false,false, true); parameters.push_back(porder);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, 
"ShhherCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string ShhherCommand::getHelpString(){ try { string helpString = ""; helpString += "The shhh.flows command reads a file containing flowgrams and creates a file of corrected sequences.\n"; helpString += "The shhh.flows command parameters are flow, file, lookup, cutoff, processors, large, maxiter, sigma, mindelta and order.\n"; helpString += "The flow parameter is used to input your flow file.\n"; helpString += "The file parameter is used to input the *flow.files file created by trim.flows.\n"; helpString += "The lookup parameter is used specify the lookup file you would like to use. http://www.mothur.org/wiki/Lookup_files.\n"; helpString += "The order parameter options are A, B or I. Default=A. A = TACG and B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGA
GCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC and I = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"; return helpString; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string ShhherCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],shhh.fasta"; } else if (type == "name") { pattern = "[filename],shhh.names"; } else if (type == "group") { pattern = "[filename],shhh.groups"; } else if (type == "counts") { pattern = "[filename],shhh.counts"; } else if (type == "qfile") { pattern = "[filename],shhh.qual"; } else { m->mothurOut("[ERROR]: No definition for 
type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** ShhherCommand::ShhherCommand(){ try { abort = true; calledHelp = true; setParameters(); //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["counts"] = tempOutNames; outputTypes["qfile"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "ShhherCommand"); exit(1); } } //********************************************************************************************************************** ShhherCommand::ShhherCommand(string option) { try { #ifdef USE_MPI MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &ncpus); if(pid == 0){ #endif abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["counts"] = tempOutNames; outputTypes["qfile"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", 
false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("flow"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["flow"] = inputDir + it->second; } } it = parameters.find("lookup"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["lookup"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["file"] = inputDir + it->second; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //check for required parameters flowFileName = validParameter.validFile(parameters, "flow", true); flowFilesFileName = validParameter.validFile(parameters, "file", true); if (flowFileName == "not found" && flowFilesFileName == "not found") { m->mothurOut("values for either flow or file must be provided for the shhh.flows command."); m->mothurOutEndLine(); abort = true; } else if (flowFileName == "not open" || flowFilesFileName == "not open") { abort = true; } if(flowFileName != "not found"){ compositeFASTAFileName = ""; compositeNamesFileName = ""; } else{ ofstream temp; string thisoutputDir = outputDir; if (outputDir == "") { thisoutputDir = m->hasPath(flowFilesFileName); } //if user entered a file with a path then preserve it //we want to rip off .files, and also .flow if its there string fileroot = m->getRootName(m->getSimpleName(flowFilesFileName)); if 
(fileroot[fileroot.length()-1] == '.') { fileroot = fileroot.substr(0, fileroot.length()-1); } //rip off dot string extension = m->getExtension(fileroot); if (extension == ".flow") { fileroot = m->getRootName(fileroot); } else { fileroot += "."; } //add back if needed compositeFASTAFileName = thisoutputDir + fileroot + "shhh.fasta"; m->openOutputFile(compositeFASTAFileName, temp); temp.close(); compositeNamesFileName = thisoutputDir + fileroot + "shhh.names"; m->openOutputFile(compositeNamesFileName, temp); temp.close(); } if(flowFilesFileName != "not found"){ string fName; ifstream flowFilesFile; m->openInputFile(flowFilesFileName, flowFilesFile); while(flowFilesFile){ fName = m->getline(flowFilesFile); //test if file is valid ifstream in; int ableToOpen = m->openInputFile(fName, in, "noerror"); in.close(); if (ableToOpen == 1) { if (inputDir != "") { //default path is set string tryPath = inputDir + fName; m->mothurOut("Unable to open " + fName + ". Trying input directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fName = tryPath; } } if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(fName); m->mothurOut("Unable to open " + fName + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fName = tryPath; } } //if you can't open it its not in current working directory or inputDir, try mothur excutable location if (ableToOpen == 1) { string exepath = m->argv; string tempPath = exepath; for (int i = 0; i < exepath.length(); i++) { tempPath[i] = tolower(exepath[i]); } exepath = exepath.substr(0, (tempPath.find_last_of('m'))); string tryPath = m->getFullPathName(exepath) + m->getSimpleName(fName); m->mothurOut("Unable to open " + fName + ". 
Trying mothur's executable location " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); fName = tryPath; } if (ableToOpen == 1) { m->mothurOut("Unable to open " + fName + ". Disregarding. "); m->mothurOutEndLine(); } else { flowFileVector.push_back(fName); } m->gobble(flowFilesFile); } flowFilesFile.close(); if (flowFileVector.size() == 0) { m->mothurOut("[ERROR]: no valid files."); m->mothurOutEndLine(); abort = true; } } else{ if (outputDir == "") { outputDir = m->hasPath(flowFileName); } flowFileVector.push_back(flowFileName); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... string temp; temp = validParameter.validFile(parameters, "lookup", true); if (temp == "not found") { string path = m->argv; string tempPath = path; for (int i = 0; i < path.length(); i++) { tempPath[i] = tolower(path[i]); } path = path.substr(0, (tempPath.find_last_of('m'))); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) path += "lookupFiles/"; #else path += "lookupFiles\\"; #endif lookupFileName = m->getFullPathName(path) + "LookUp_Titanium.pat"; int ableToOpen; ifstream in; ableToOpen = m->openInputFile(lookupFileName, in, "noerror"); in.close(); //if you can't open it, try input location if (ableToOpen == 1) { if (inputDir != "") { //default path is set string tryPath = inputDir + m->getSimpleName(lookupFileName); m->mothurOut("Unable to open " + lookupFileName + ". Trying input directory " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); lookupFileName = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(lookupFileName); m->mothurOut("Unable to open " + lookupFileName + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); lookupFileName = tryPath; } } //if you can't open it its not in current working directory or inputDir, try mothur excutable location if (ableToOpen == 1) { string exepath = m->argv; string tempPath = exepath; for (int i = 0; i < exepath.length(); i++) { tempPath[i] = tolower(exepath[i]); } exepath = exepath.substr(0, (tempPath.find_last_of('m'))); string tryPath = m->getFullPathName(exepath) + m->getSimpleName(lookupFileName); m->mothurOut("Unable to open " + lookupFileName + ". Trying mothur's executable location " + tryPath); m->mothurOutEndLine(); ifstream in2; ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); lookupFileName = tryPath; } if (ableToOpen == 1) { m->mothurOut("Unable to open " + lookupFileName + "."); m->mothurOutEndLine(); abort=true; } } else if(temp == "not open") { lookupFileName = validParameter.validFile(parameters, "lookup", false); //if you can't open it its not inputDir, try mothur excutable location string exepath = m->argv; string tempPath = exepath; for (int i = 0; i < exepath.length(); i++) { tempPath[i] = tolower(exepath[i]); } exepath = exepath.substr(0, (tempPath.find_last_of('m'))); string tryPath = m->getFullPathName(exepath) + m->getSimpleName(lookupFileName); m->mothurOut("Unable to open " + lookupFileName + ". 
Trying mothur's executable location " + tryPath); m->mothurOutEndLine(); ifstream in2; int ableToOpen = m->openInputFile(tryPath, in2, "noerror"); in2.close(); lookupFileName = tryPath; if (ableToOpen == 1) { m->mothurOut("Unable to open " + lookupFileName + "."); m->mothurOutEndLine(); abort=true; } }else { lookupFileName = temp; } temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found"){ temp = "0.01"; } m->mothurConvert(temp, cutoff); temp = validParameter.validFile(parameters, "mindelta", false); if (temp == "not found"){ temp = "0.000001"; } m->mothurConvert(temp, minDelta); temp = validParameter.validFile(parameters, "maxiter", false); if (temp == "not found"){ temp = "1000"; } m->mothurConvert(temp, maxIters); temp = validParameter.validFile(parameters, "large", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, largeSize); if (largeSize != 0) { large = true; } else { large = false; } if (largeSize < 0) { m->mothurOut("The value of the large cannot be negative.\n"); } #ifdef USE_MPI if (large) { m->mothurOut("The large parameter is not available with the MPI-Enabled version.\n"); large=false; } #endif temp = validParameter.validFile(parameters, "sigma", false);if (temp == "not found") { temp = "60"; } m->mothurConvert(temp, sigma); temp = validParameter.validFile(parameters, "order", false); if (temp == "not found"){ temp = "A"; } if (temp.length() > 1) { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. 
A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } else { if (toupper(temp[0]) == 'A') { flowOrder = "TACG"; } else if(toupper(temp[0]) == 'B'){ flowOrder = "TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATG
CAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC"; } else if(toupper(temp[0]) == 'I'){ flowOrder = "TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC"; } else { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. 
A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } } } #ifdef USE_MPI } #endif } catch(exception& e) { m->errorOut(e, "ShhherCommand", "ShhherCommand"); exit(1); } } //********************************************************************************************************************** #ifdef USE_MPI int ShhherCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int tag = 1976; MPI_Status status; if(pid == 0){ for(int i=1;imothurOut("\nGetting preliminary data...\n"); getSingleLookUp(); if (m->control_pressed) { return 0; } getJointLookUp(); if (m->control_pressed) { return 0; } vector flowFileVector; if(flowFilesFileName != "not found"){ string fName; ifstream flowFilesFile; m->openInputFile(flowFilesFileName, flowFilesFile); while(flowFilesFile){ fName = m->getline(flowFilesFile); flowFileVector.push_back(fName); m->gobble(flowFilesFile); } } else{ flowFileVector.push_back(flowFileName); } int numFiles = flowFileVector.size(); for(int i=1;icontrol_pressed) { break; } double begClock = clock(); unsigned long long begTime = time(NULL); flowFileName = flowFileVector[i]; m->mothurOut("\n>>>>>\tProcessing " + flowFileName + " (file " + toString(i+1) + " of " + toString(numFiles) + ")\t<<<<<\n"); m->mothurOut("Reading flowgrams...\n"); getFlowData(); if (m->control_pressed) { break; } m->mothurOut("Identifying unique flowgrams...\n"); 
getUniques(); if (m->control_pressed) { break; } m->mothurOut("Calculating distances between flowgrams...\n"); char fileName[1024]; strcpy(fileName, flowFileName.c_str()); for(int i=1;icontrol_pressed) { break; } int done; for(int i=1;iappendFiles((distFileName + ".temp." + toString(i)), distFileName); m->mothurRemove((distFileName + ".temp." + toString(i))); } string namesFileName = createNamesFile(); if (m->control_pressed) { break; } m->mothurOut("\nClustering flowgrams...\n"); string listFileName = cluster(distFileName, namesFileName); if (m->control_pressed) { break; } getOTUData(listFileName); m->mothurRemove(distFileName); m->mothurRemove(namesFileName); m->mothurRemove(listFileName); if (m->control_pressed) { break; } initPyroCluster(); if (m->control_pressed) { break; } for(int i=1;icontrol_pressed) { break; } double maxDelta = 0; int iter = 0; int numOTUsOnCPU = numOTUs / ncpus; int numSeqsOnCPU = numSeqs / ncpus; m->mothurOut("\nDenoising flowgrams...\n"); m->mothurOut("iter\tmaxDelta\tnLL\t\tcycletime\n"); while((maxIters == 0 && maxDelta > minDelta) || iter < MIN_ITER || (maxDelta > minDelta && iter < maxIters)){ double cycClock = clock(); unsigned long long cycTime = time(NULL); fill(); if (m->control_pressed) { break; } int total = singleTau.size(); for(int i=1;i tempCentroids(numOTUs, 0); vector tempChange(numOTUs, 0); MPI_Recv(&tempCentroids[0], numOTUs, MPI_INT, i, tag, MPI_COMM_WORLD, &status); MPI_Recv(&tempChange[0], numOTUs, MPI_SHORT, i, tag, MPI_COMM_WORLD, &status); for(int j=otuStart;jcontrol_pressed) { break; } double nLL = getLikelihood(); if (m->control_pressed) { break; } checkCentroids(); if (m->control_pressed) { break; } for(int i=1;i childSeqIndex(childTotal, 0); vector childSingleTau(childTotal, 0); vector childDist(numSeqs * numOTUs, 0); vector otuIndex(childTotal, 0); MPI_Recv(&childSeqIndex[0], childTotal, MPI_INT, i, tag, MPI_COMM_WORLD, &status); MPI_Recv(&childSingleTau[0], childTotal, MPI_DOUBLE, i, tag, MPI_COMM_WORLD, 
&status); MPI_Recv(&childDist[0], numOTUs * numSeqs, MPI_DOUBLE, i, tag, MPI_COMM_WORLD, &status); MPI_Recv(&otuIndex[0], childTotal, MPI_INT, i, tag, MPI_COMM_WORLD, &status); int oldTotal = total; total += childTotal; singleTau.resize(total, 0); seqIndex.resize(total, 0); seqNumber.resize(total, 0); int childIndex = 0; for(int j=oldTotal;jmothurOut(toString(iter) + '\t' + toString(maxDelta) + '\t' + toString(nLL) + '\t' + toString(time(NULL) - cycTime) + '\t' + toString((clock() - cycClock)/(double)CLOCKS_PER_SEC) + '\n'); if((maxIters == 0 && maxDelta > minDelta) || iter < MIN_ITER || (maxDelta > minDelta && iter < maxIters)){ int live = 1; for(int i=1;icontrol_pressed) { break; } m->mothurOut("\nFinalizing...\n"); fill(); if (m->control_pressed) { break; } setOTUs(); vector otuCounts(numOTUs, 0); for(int i=0;icontrol_pressed) { break; } writeQualities(otuCounts); if (m->control_pressed) { break; } writeSequences(otuCounts); if (m->control_pressed) { break; } writeNames(otuCounts); if (m->control_pressed) { break; } writeClusters(otuCounts); if (m->control_pressed) { break; } writeGroups(); if (m->control_pressed) { break; } m->mothurOut("Total time to process " + toString(flowFileName) + ":\t" + toString(time(NULL) - begTime) + '\t' + toString((clock() - begClock)/(double)CLOCKS_PER_SEC) + '\n'); } } else{ int abort = 1; MPI_Recv(&abort, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); if(abort){ return 0; } int numFiles; MPI_Recv(&numFiles, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); for(int i=0;icontrol_pressed) { break; } //Now into the pyrodist part bool live = 1; char fileName[1024]; MPI_Recv(&fileName, 1024, MPI_CHAR, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&numUniques, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&numFlowCells, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); flowDataIntI.resize(numSeqs * numFlowCells); flowDataPrI.resize(numSeqs * numFlowCells); 
mapUniqueToSeq.resize(numSeqs); mapSeqToUnique.resize(numSeqs); lengths.resize(numSeqs); jointLookUp.resize(NUMBINS * NUMBINS); MPI_Recv(&flowDataIntI[0], numSeqs * numFlowCells, MPI_SHORT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&flowDataPrI[0], numSeqs * numFlowCells, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&mapUniqueToSeq[0], numSeqs, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&mapSeqToUnique[0], numSeqs, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&lengths[0], numSeqs, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&jointLookUp[0], NUMBINS * NUMBINS, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&cutoff, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); flowFileName = string(fileName); int flowDistStart = int(sqrt(float(pid)/float(ncpus)) * numUniques); int flowDistEnd = int(sqrt(float(pid+1)/float(ncpus)) * numUniques); string distanceStringChild = flowDistMPI(flowDistStart, flowDistEnd); if (m->control_pressed) { break; } int done = 1; MPI_Send(&done, 1, MPI_INT, 0, tag, MPI_COMM_WORLD); //Now into the pyronoise part MPI_Recv(&numOTUs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); singleLookUp.resize(HOMOPS * NUMBINS); uniqueFlowgrams.resize(numUniques * numFlowCells); weight.resize(numOTUs); centroids.resize(numOTUs); change.resize(numOTUs); dist.assign(numOTUs * numSeqs, 0); nSeqsPerOTU.resize(numOTUs); cumNumSeqs.resize(numOTUs); MPI_Recv(&singleLookUp[0], singleLookUp.size(), MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&uniqueFlowgrams[0], uniqueFlowgrams.size(), MPI_SHORT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&sigma, 1, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); int startOTU = pid * numOTUs / ncpus; int endOTU = (pid + 1) * numOTUs / ncpus; int startSeq = pid * numSeqs / ncpus; int endSeq = (pid + 1) * numSeqs /ncpus; int total; while(live){ if (m->control_pressed) { break; } MPI_Recv(&total, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); singleTau.assign(total, 0.0000); 
seqNumber.assign(total, 0); seqIndex.assign(total, 0); MPI_Recv(&change[0], numOTUs, MPI_SHORT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&centroids[0], numOTUs, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&singleTau[0], total, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&seqNumber[0], total, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&seqIndex[0], total, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&nSeqsPerOTU[0], total, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&cumNumSeqs[0], numOTUs, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); calcCentroidsDriver(startOTU, endOTU); MPI_Send(&centroids[0], numOTUs, MPI_INT, 0, tag, MPI_COMM_WORLD); MPI_Send(&change[0], numOTUs, MPI_SHORT, 0, tag, MPI_COMM_WORLD); MPI_Recv(&centroids[0], numOTUs, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&weight[0], numOTUs, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status); MPI_Recv(&change[0], numOTUs, MPI_SHORT, 0, tag, MPI_COMM_WORLD, &status); vector<int> otuIndex(total, 0); calcNewDistancesChildMPI(startSeq, endSeq, otuIndex); total = otuIndex.size(); MPI_Send(&total, 1, MPI_INT, 0, tag, MPI_COMM_WORLD); MPI_Send(&seqIndex[0], total, MPI_INT, 0, tag, MPI_COMM_WORLD); MPI_Send(&singleTau[0], total, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD); MPI_Send(&dist[0], numOTUs * numSeqs, MPI_DOUBLE, 0, tag, MPI_COMM_WORLD); MPI_Send(&otuIndex[0], total, MPI_INT, 0, tag, MPI_COMM_WORLD); MPI_Recv(&live, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); } } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } MPI_Barrier(MPI_COMM_WORLD); if(compositeFASTAFileName != ""){ outputNames.push_back(compositeFASTAFileName); outputTypes["fasta"].push_back(compositeFASTAFileName); outputNames.push_back(compositeNamesFileName); outputTypes["name"].push_back(compositeNamesFileName); } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { 
m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "execute"); exit(1); } } /**************************************************************************************************/ string ShhherCommand::createNamesFile(){ try{ vector duplicateNames(numUniques, ""); for(int i=0;i variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(flowFileName)); string nameFileName = getOutputFileName("name",variables); ofstream nameFile; m->openOutputFile(nameFileName, nameFile); for(int i=0;icontrol_pressed) { break; } // nameFile << seqNameVector[mapUniqueToSeq[i]] << '\t' << duplicateNames[i].substr(0, duplicateNames[i].find_last_of(',')) << endl; nameFile << mapUniqueToSeq[i] << '\t' << duplicateNames[i].substr(0, duplicateNames[i].find_last_of(',')) << endl; } nameFile.close(); return nameFileName; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "createNamesFile"); exit(1); } } /**************************************************************************************************/ string ShhherCommand::flowDistMPI(int startSeq, int stopSeq){ try{ ostringstream outStream; outStream.setf(ios::fixed, ios::floatfield); outStream.setf(ios::dec, ios::basefield); outStream.setf(ios::showpoint); outStream.precision(6); int begTime = time(NULL); double begClock = clock(); for(int i=startSeq;icontrol_pressed) { break; } for(int j=0;jmothurOutJustToScreen(toString(i) + '\t' + toString(time(NULL) - begTime) + '\t' + toString((clock()-begClock)/CLOCKS_PER_SEC) + '\n'); } } string fDistFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.dist"; if(pid != 0){ fDistFileName += ".temp." 
+ toString(pid); } if (m->control_pressed) { return fDistFileName; } m->mothurOutJustToScreen(toString(stopSeq) + '\t' + toString(time(NULL) - begTime) + '\t' + toString((clock()-begClock)/CLOCKS_PER_SEC) + '\n'); ofstream distFile(fDistFileName.c_str()); distFile << outStream.str(); distFile.close(); return fDistFileName; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "flowDistMPI"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::getOTUData(string listFileName){ try { ifstream listFile; m->openInputFile(listFileName, listFile); string label; listFile >> label >> numOTUs; otuData.assign(numSeqs, 0); cumNumSeqs.assign(numOTUs, 0); nSeqsPerOTU.assign(numOTUs, 0); aaP.clear();aaP.resize(numOTUs); seqNumber.clear(); aaI.clear(); seqIndex.clear(); string singleOTU = ""; for(int i=0;icontrol_pressed) { break; } listFile >> singleOTU; istringstream otuString(singleOTU); while(otuString){ string seqName = ""; for(int j=0;j::iterator nmIt = nameMap.find(seqName); int index = nmIt->second; nameMap.erase(nmIt); otuData[index] = i; nSeqsPerOTU[i]++; aaP[i].push_back(index); seqName = ""; } } map::iterator nmIt = nameMap.find(seqName); int index = nmIt->second; nameMap.erase(nmIt); otuData[index] = i; nSeqsPerOTU[i]++; aaP[i].push_back(index); otuString.get(); } sort(aaP[i].begin(), aaP[i].end()); for(int j=0;jerrorOut(e, "ShhherCommand", "getOTUData"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::initPyroCluster(){ try{ if (numOTUs < processors) { processors = 1; } if (m->debug) { m->mothurOut("[DEBUG]: numSeqs = " + toString(numSeqs) + " numOTUS = " + toString(numOTUs) + " about to alloc a dist vector with size = " + toString((numSeqs * numOTUs)) + ".\n"); } dist.assign(numSeqs * numOTUs, 0); change.assign(numOTUs, 1); centroids.assign(numOTUs, -1); weight.assign(numOTUs, 0); 
singleTau.assign(numSeqs, 1.0); nSeqsBreaks.assign(processors+1, 0); nOTUsBreaks.assign(processors+1, 0); if (m->debug) { m->mothurOut("[DEBUG]: made it through the memory allocation.\n"); } nSeqsBreaks[0] = 0; for(int i=0;ierrorOut(e, "ShhherCommand", "initPyroCluster"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::fill(){ try { int index = 0; for(int i=0;icontrol_pressed) { break; } cumNumSeqs[i] = index; for(int j=0;jerrorOut(e, "ShhherCommand", "fill"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::getFlowData(){ try{ ifstream flowFile; m->openInputFile(flowFileName, flowFile); string seqName; seqNameVector.clear(); lengths.clear(); flowDataIntI.clear(); nameMap.clear(); int currentNumFlowCells; float intensity; string numFlowTest; flowFile >> numFlowTest; if (!m->isContainingOnlyDigits(numFlowTest)) { m->mothurOut("[ERROR]: expected a number and got " + numFlowTest + ", quitting. 
Did you use the flow parameter instead of the file parameter?"); m->mothurOutEndLine(); exit(1); } else { convert(numFlowTest, numFlowCells); } int index = 0;//pcluster while(!flowFile.eof()){ if (m->control_pressed) { break; } flowFile >> seqName >> currentNumFlowCells; lengths.push_back(currentNumFlowCells); seqNameVector.push_back(seqName); nameMap[seqName] = index++;//pcluster for(int i=0;i> intensity; if(intensity > 9.99) { intensity = 9.99; } int intI = int(100 * intensity + 0.0001); flowDataIntI.push_back(intI); } m->gobble(flowFile); } flowFile.close(); numSeqs = seqNameVector.size(); for(int i=0;icontrol_pressed) { break; } int iNumFlowCells = i * numFlowCells; for(int j=lengths[i];jerrorOut(e, "ShhherCommand", "getFlowData"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::calcNewDistancesChildMPI(int startSeq, int stopSeq, vector& otuIndex){ try{ vector newTau(numOTUs,0); vector norms(numSeqs, 0); otuIndex.clear(); seqIndex.clear(); singleTau.clear(); for(int i=startSeq;icontrol_pressed) { break; } double offset = 1e8; int indexOffset = i * numOTUs; for(int j=0;j MIN_WEIGHT && change[j] == 1){ dist[indexOffset + j] = getDistToCentroid(centroids[j], i, lengths[i]); } if(weight[j] > MIN_WEIGHT && dist[indexOffset + j] < offset){ offset = dist[indexOffset + j]; } } for(int j=0;j MIN_WEIGHT){ newTau[j] = exp(sigma * (-dist[indexOffset + j] + offset)) * weight[j]; norms[i] += newTau[j]; } else{ newTau[j] = 0.0; } } for(int j=0;j MIN_TAU){ otuIndex.push_back(j); seqIndex.push_back(i); singleTau.push_back(newTau[j]); } } } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "calcNewDistancesChildMPI"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::calcNewDistancesParent(int startSeq, int stopSeq){ try{ int total = 0; vector newTau(numOTUs,0); vector norms(numSeqs, 0); 
nSeqsPerOTU.assign(numOTUs, 0); for(int i=startSeq;icontrol_pressed) { break; } int indexOffset = i * numOTUs; double offset = 1e8; for(int j=0;j MIN_WEIGHT && change[j] == 1){ dist[indexOffset + j] = getDistToCentroid(centroids[j], i, lengths[i]); } if(weight[j] > MIN_WEIGHT && dist[indexOffset + j] < offset){ offset = dist[indexOffset + j]; } } for(int j=0;j MIN_WEIGHT){ newTau[j] = exp(sigma * (-dist[indexOffset + j] + offset)) * weight[j]; norms[i] += newTau[j]; } else{ newTau[j] = 0.0; } } for(int j=0;j MIN_TAU){ int oldTotal = total; total++; singleTau.resize(total, 0); seqNumber.resize(total, 0); seqIndex.resize(total, 0); singleTau[oldTotal] = newTau[j]; aaP[j][nSeqsPerOTU[j]] = oldTotal; aaI[j][nSeqsPerOTU[j]] = i; nSeqsPerOTU[j]++; } } } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "calcNewDistancesParent"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::setOTUs(){ try { vector bigTauMatrix(numOTUs * numSeqs, 0.0000); for(int i=0;icontrol_pressed) { break; } for(int j=0;j maxTau){ maxTau = bigTauMatrix[i * numOTUs + j]; maxOTU = j; } } otuData[i] = maxOTU; } nSeqsPerOTU.assign(numOTUs, 0); for(int i=0;ierrorOut(e, "ShhherCommand", "setOTUs"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::getUniques(){ try{ numUniques = 0; uniqueFlowgrams.assign(numFlowCells * numSeqs, -1); uniqueCount.assign(numSeqs, 0); // anWeights uniqueLengths.assign(numSeqs, 0); mapSeqToUnique.assign(numSeqs, -1); mapUniqueToSeq.assign(numSeqs, -1); vector uniqueFlowDataIntI(numFlowCells * numSeqs, -1); for(int i=0;icontrol_pressed) { break; } int index = 0; vector current(numFlowCells); for(int j=0;j uniqueLengths[j]) { uniqueLengths[j] = lengths[i]; } break; } index++; } if(index == numUniques){ uniqueLengths[numUniques] = lengths[i]; uniqueCount[numUniques] = 1; mapSeqToUnique[i] = 
numUniques;//anMap mapUniqueToSeq[numUniques] = i;//anF for(int k=0;kcontrol_pressed) { break; } flowDataPrI[i] = getProbIntensity(flowDataIntI[i]); } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "getUniques"); exit(1); } } /**************************************************************************************************/ float ShhherCommand::calcPairwiseDist(int seqA, int seqB){ try{ int minLength = lengths[mapSeqToUnique[seqA]]; if(lengths[seqB] < minLength){ minLength = lengths[mapSeqToUnique[seqB]]; } int ANumFlowCells = seqA * numFlowCells; int BNumFlowCells = seqB * numFlowCells; float dist = 0; for(int i=0;icontrol_pressed) { break; } int flowAIntI = flowDataIntI[ANumFlowCells + i]; float flowAPrI = flowDataPrI[ANumFlowCells + i]; int flowBIntI = flowDataIntI[BNumFlowCells + i]; float flowBPrI = flowDataPrI[BNumFlowCells + i]; dist += jointLookUp[flowAIntI * NUMBINS + flowBIntI] - flowAPrI - flowBPrI; } dist /= (float) minLength; return dist; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "calcPairwiseDist"); exit(1); } } //**********************************************************************************************************************/ string ShhherCommand::cluster(string distFileName, string namesFileName){ try { ReadMatrix* read = new ReadColumnMatrix(distFileName); read->setCutoff(cutoff); NameAssignment* clusterNameMap = new NameAssignment(namesFileName); clusterNameMap->readMap(); read->read(clusterNameMap); ListVector* list = read->getListVector(); SparseDistanceMatrix* matrix = read->getDMatrix(); delete read; delete clusterNameMap; RAbundVector* rabund = new RAbundVector(list->getRAbundVector()); float adjust = -1.0; Cluster* cluster = new CompleteLinkage(rabund, list, matrix, cutoff, "furthest", adjust); string tag = cluster->getTag(); double clusterCutoff = cutoff; while (matrix->getSmallDist() <= clusterCutoff && matrix->getNNodes() > 0){ if (m->control_pressed) { break; } cluster->update(clusterCutoff); } 
list->setLabel(toString(cutoff)); string listFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.list"; ofstream listFile; m->openOutputFile(listFileName, listFile); list->print(listFile); listFile.close(); delete matrix; delete cluster; delete rabund; delete list; return listFileName; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "cluster"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::calcCentroidsDriver(int start, int finish){ //this function gets the most likely homopolymer length at a flow position for a group of sequences //within an otu try{ for(int i=start;icontrol_pressed) { break; } double count = 0; int position = 0; int minFlowGram = 100000000; double minFlowValue = 1e8; change[i] = 0; //FALSE for(int j=0;j 0 && count > MIN_COUNT){ vector adF(nSeqsPerOTU[i]); vector anL(nSeqsPerOTU[i]); for(int j=0;jerrorOut(e, "ShhherCommand", "calcCentroidsDriver"); exit(1); } } /**************************************************************************************************/ double ShhherCommand::getDistToCentroid(int cent, int flow, int length){ try{ int flowAValue = cent * numFlowCells; int flowBValue = flow * numFlowCells; double dist = 0; for(int i=0;ierrorOut(e, "ShhherCommand", "getDistToCentroid"); exit(1); } } /**************************************************************************************************/ double ShhherCommand::getNewWeights(){ try{ double maxChange = 0; for(int i=0;icontrol_pressed) { break; } double difference = weight[i]; weight[i] = 0; for(int j=0;j maxChange){ maxChange = difference; } } return maxChange; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "getNewWeights"); exit(1); } } /**************************************************************************************************/ double ShhherCommand::getLikelihood(){ try{ vector P(numSeqs, 0); int effNumOTUs = 0; for(int i=0;i MIN_WEIGHT){ effNumOTUs++; 
} } string hold; for(int i=0;icontrol_pressed) { break; } for(int j=0;jerrorOut(e, "ShhherCommand", "getLikelihood"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::checkCentroids(){ try{ vector unique(numOTUs, 1); for(int i=0;icontrol_pressed) { break; } if(unique[i] == 1){ for(int j=i+1;jerrorOut(e, "ShhherCommand", "checkCentroids"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeQualities(vector otuCounts){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(flowFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(flowFileName)); string qualityFileName = getOutputFileName("qfile",variables); ofstream qualityFile; m->openOutputFile(qualityFileName, qualityFile); qualityFile.setf(ios::fixed, ios::floatfield); qualityFile.setf(ios::showpoint); qualityFile << setprecision(6); vector > qualities(numOTUs); vector pr(HOMOPS, 0); for(int i=0;icontrol_pressed) { break; } int index = 0; if(nSeqsPerOTU[i] > 0){ while(index < numFlowCells){ double maxPrValue = 1e8; short maxPrIndex = -1; double count = 0.0000; pr.assign(HOMOPS, 0); for(int j=0;j MIN_COUNT){ double U = 0.0000; double norm = 0.0000; for(int s=0;s0.00){ temp = log10(U); } else{ temp = -10.1; } temp = floor(-10 * temp); value = (int)floor(temp); if(value > 100){ value = 100; } qualities[i].push_back((int)value); } } index++; } } if(otuCounts[i] > 0){ qualityFile << '>' << seqNameVector[mapUniqueToSeq[i]] << endl; for (int j = 4; j < qualities[i].size(); j++) { qualityFile << qualities[i][j] << ' '; } qualityFile << endl; } } qualityFile.close(); outputNames.push_back(qualityFileName); outputTypes["qfile"].push_back(qualityFileName); } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeQualities"); exit(1); } } 
/**************************************************************************************************/ void ShhherCommand::writeSequences(vector otuCounts){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(flowFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(flowFileName)); string fastaFileName = getOutputFileName("fasta",variables); ofstream fastaFile; m->openOutputFile(fastaFileName, fastaFile); vector names(numOTUs, ""); for(int i=0;icontrol_pressed) { break; } int index = centroids[i]; if(otuCounts[i] > 0){ fastaFile << '>' << seqNameVector[aaI[i][0]] << endl; string newSeq = ""; for(int j=0;jappendFiles(fastaFileName, compositeFASTAFileName); } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeSequences"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeNames(vector otuCounts){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(flowFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(flowFileName)); string nameFileName = getOutputFileName("name",variables); ofstream nameFile; m->openOutputFile(nameFileName, nameFile); for(int i=0;icontrol_pressed) { break; } if(otuCounts[i] > 0){ nameFile << seqNameVector[aaI[i][0]] << '\t' << seqNameVector[aaI[i][0]]; for(int j=1;jappendFiles(nameFileName, compositeNamesFileName); } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeNames"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeGroups(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(flowFileName); } string fileRoot = m->getRootName(m->getSimpleName(flowFileName)); int pos = fileRoot.find_first_of('.'); string fileGroup = fileRoot; if (pos != 
string::npos) { fileGroup = fileRoot.substr(pos+1, (fileRoot.length()-1-(pos+1))); } map variables; variables["[filename]"] = thisOutputDir + fileRoot; string groupFileName = getOutputFileName("group",variables); ofstream groupFile; m->openOutputFile(groupFileName, groupFile); for(int i=0;icontrol_pressed) { break; } groupFile << seqNameVector[i] << '\t' << fileGroup << endl; } groupFile.close(); outputNames.push_back(groupFileName); outputTypes["group"].push_back(groupFileName); } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeGroups"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeClusters(vector otuCounts){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(flowFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(flowFileName)); string otuCountsFileName = getOutputFileName("counts",variables); ofstream otuCountsFile; m->openOutputFile(otuCountsFileName, otuCountsFile); string bases = flowOrder; for(int i=0;icontrol_pressed) { break; } //output the translated version of the centroid sequence for the otu if(otuCounts[i] > 0){ int index = centroids[i]; otuCountsFile << "ideal\t"; for(int j=8;jerrorOut(e, "ShhherCommand", "writeClusters"); exit(1); } } #else //********************************************************************************************************************** int ShhherCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } getSingleLookUp(); if (m->control_pressed) { return 0; } getJointLookUp(); if (m->control_pressed) { return 0; } int numFiles = flowFileVector.size(); if (numFiles < processors) { processors = numFiles; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (processors == 1) { driver(flowFileVector, compositeFASTAFileName, compositeNamesFileName); } 
else { createProcesses(flowFileVector); } //each processor processes one file #else driver(flowFileVector, compositeFASTAFileName, compositeNamesFileName); #endif if(compositeFASTAFileName != ""){ outputNames.push_back(compositeFASTAFileName); outputTypes["fasta"].push_back(compositeFASTAFileName); outputNames.push_back(compositeNamesFileName); outputTypes["name"].push_back(compositeNamesFileName); } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "execute"); exit(1); } } #endif //******************************************************************************************************************** //sorts biggest to smallest inline bool compareFileSizes(string left, string right){ FILE * pFile; long leftsize = 0; //get num bytes in file string filename = left; pFile = fopen (filename.c_str(),"rb"); string error = "Error opening " + filename; if (pFile==NULL) perror (error.c_str()); else{ fseek (pFile, 0, SEEK_END); leftsize=ftell (pFile); fclose (pFile); } FILE * pFile2; long rightsize = 0; //get num bytes in file filename = right; pFile2 = fopen (filename.c_str(),"rb"); error = "Error opening " + filename; if (pFile2==NULL) perror (error.c_str()); else{ fseek (pFile2, 0, SEEK_END); rightsize=ftell (pFile2); fclose (pFile2); } return (leftsize > rightsize); } /**************************************************************************************************/ int ShhherCommand::createProcesses(vector filenames){ try { vector processIDS; int process = 1; int num = 0; bool recalc = false; //sanity check if (filenames.size() < processors) { processors = filenames.size(); } //sort file names by size to divide load better sort(filenames.begin(), filenames.end(), compareFileSizes); vector < vector > dividedFiles; //dividedFiles[1] = vector of filenames 
for process 1... dividedFiles.resize(processors); //for each file, figure out which process will complete it //want to divide the load intelligently so the big files are spread between processes for (int i = 0; i < filenames.size(); i++) { int processToAssign = (i+1) % processors; if (processToAssign == 0) { processToAssign = processors; } dividedFiles[(processToAssign-1)].push_back(filenames[i]); } //now let's reverse the order of every other process, so we balance big files running with little ones for (int i = 0; i < processors; i++) { int remainder = ((i+1) % processors); if (remainder) { reverse(dividedFiles[i].begin(), dividedFiles[i].end()); } } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(dividedFiles[process], compositeFASTAFileName + m->mothurGetpid(process) + ".temp", compositeNamesFileName + m->mothurGetpid(process) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = compositeFASTAFileName + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(compositeNamesFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(compositeFASTAFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(compositeFASTAFileName + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to 
true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(compositeNamesFileName + (toString(processIDS[i]) + ".temp"));m->mothurRemove(compositeFASTAFileName + (toString(processIDS[i]) + ".temp"));m->mothurRemove(compositeFASTAFileName + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); dividedFiles.clear(); //dividedFiles[1] = vector of filenames for process 1... dividedFiles.resize(processors); //for each file, figure out which process will complete it //want to divide the load intelligently so the big files are spread between processes for (int i = 0; i < filenames.size(); i++) { int processToAssign = (i+1) % processors; if (processToAssign == 0) { processToAssign = processors; } dividedFiles[(processToAssign-1)].push_back(filenames[i]); } //now let's reverse the order of every other process, so we balance big files running with little ones for (int i = 0; i < processors; i++) { int remainder = ((i+1) % processors); if (remainder) { reverse(dividedFiles[i].begin(), dividedFiles[i].end()); } } num = 0; processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ num = driver(dividedFiles[process], compositeFASTAFileName + m->mothurGetpid(process) + ".temp", compositeNamesFileName + m->mothurGetpid(process) + ".temp"); //pass numSeqs to parent ofstream out; string tempFile = compositeFASTAFileName + m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << num << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); 
m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part driver(dividedFiles[0], compositeFASTAFileName, compositeNamesFileName); //force parent to wait until all the processes are done for (int i=0;i pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; ioutputNames.size(); j++){ outputNames.push_back(pDataArray[i]->outputNames[j]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } */ #endif for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; if (tempNum != dividedFiles[i+1].size()) { m->mothurOut("[ERROR]: main process expected " + toString(processIDS[i]) + " to complete " + toString(dividedFiles[i+1].size()) + " files, and it only reported completing " + toString(tempNum) + ". This will cause file mismatches. The flow files may be too large to process with multiple processors. \n"); } } in.close(); m->mothurRemove(tempFile); if (compositeFASTAFileName != "") { m->appendFiles((compositeFASTAFileName + toString(processIDS[i]) + ".temp"), compositeFASTAFileName); m->appendFiles((compositeNamesFileName + toString(processIDS[i]) + ".temp"), compositeNamesFileName); m->mothurRemove((compositeFASTAFileName + toString(processIDS[i]) + ".temp")); m->mothurRemove((compositeNamesFileName + toString(processIDS[i]) + ".temp")); } } return 0; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "createProcesses"); exit(1); } } /**************************************************************************************************/ vector ShhherCommand::parseFlowFiles(string filename){ try { vector files; int count = 0; ifstream in; m->openInputFile(filename, in); int thisNumFLows = 0; in >> thisNumFLows; m->gobble(in); while (!in.eof()) { if (m->control_pressed) { break; } ofstream out; string outputFileName = filename + toString(count) + ".temp"; m->openOutputFile(outputFileName, 
out); out << thisNumFLows << endl; files.push_back(outputFileName); int numLinesWrote = 0; for (int i = 0; i < largeSize; i++) { if (in.eof()) { break; } string line = m->getline(in); m->gobble(in); out << line << endl; numLinesWrote++; } out.close(); if (numLinesWrote == 0) { m->mothurRemove(outputFileName); files.pop_back(); } count++; } in.close(); if (m->control_pressed) { for (int i = 0; i < files.size(); i++) { m->mothurRemove(files[i]); } files.clear(); } m->mothurOut("\nDivided " + filename + " into " + toString(files.size()) + " files.\n\n"); return files; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "parseFlowFiles"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::driver(vector filenames, string thisCompositeFASTAFileName, string thisCompositeNamesFileName){ try { int numCompleted = 0; for(int i=0;icontrol_pressed) { break; } vector theseFlowFileNames; theseFlowFileNames.push_back(filenames[i]); if (large) { theseFlowFileNames = parseFlowFiles(filenames[i]); } if (m->control_pressed) { break; } double begClock = clock(); unsigned long long begTime; string fileNameForOutput = filenames[i]; for (int g = 0; g < theseFlowFileNames.size(); g++) { string flowFileName = theseFlowFileNames[g]; m->mothurOut("\n>>>>>\tProcessing " + flowFileName + " (file " + toString(i+1) + " of " + toString(filenames.size()) + ")\t<<<<<\n"); m->mothurOut("Reading flowgrams...\n"); vector seqNameVector; vector lengths; vector flowDataIntI; vector flowDataPrI; map nameMap; vector uniqueFlowgrams; vector uniqueCount; vector mapSeqToUnique; vector mapUniqueToSeq; vector uniqueLengths; int numFlowCells; if (m->debug) { m->mothurOut("[DEBUG]: About to read flowgrams.\n"); } int numSeqs = getFlowData(flowFileName, seqNameVector, lengths, flowDataIntI, nameMap, numFlowCells); if (m->control_pressed) { break; } m->mothurOut("Identifying unique flowgrams...\n"); int numUniques = 
getUniques(numSeqs, numFlowCells, uniqueFlowgrams, uniqueCount, uniqueLengths, mapSeqToUnique, mapUniqueToSeq, lengths, flowDataPrI, flowDataIntI); if (m->control_pressed) { break; } m->mothurOut("Calculating distances between flowgrams...\n"); string distFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.dist"; begTime = time(NULL); flowDistParentFork(numFlowCells, distFileName, numUniques, mapUniqueToSeq, mapSeqToUnique, lengths, flowDataPrI, flowDataIntI); m->mothurOutEndLine(); m->mothurOut("Total time: " + toString(time(NULL) - begTime) + '\t' + toString((clock() - begClock)/CLOCKS_PER_SEC) + '\n'); string namesFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.names"; createNamesFile(numSeqs, numUniques, namesFileName, seqNameVector, mapSeqToUnique, mapUniqueToSeq); if (m->control_pressed) { break; } m->mothurOut("\nClustering flowgrams...\n"); string listFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.list"; cluster(listFileName, distFileName, namesFileName); if (m->control_pressed) { break; } vector otuData; vector cumNumSeqs; vector nSeqsPerOTU; vector > aaP; //tMaster->aanP: each row is a different otu / each col contains the sequence indices vector > aaI; //tMaster->aanI: that are in each otu - can't differentiate between aaP and aaI vector seqNumber; //tMaster->anP: the sequence id number sorted by OTU vector seqIndex; //tMaster->anI; the index that corresponds to seqNumber int numOTUs = getOTUData(numSeqs, listFileName, otuData, cumNumSeqs, nSeqsPerOTU, aaP, aaI, seqNumber, seqIndex, nameMap); if (m->control_pressed) { break; } m->mothurRemove(distFileName); m->mothurRemove(namesFileName); m->mothurRemove(listFileName); vector dist; //adDist - distance of sequences to centroids vector change; //did the centroid sequence change? 
0 = no; 1 = yes vector centroids; //the representative flowgram for each cluster m vector weight; vector singleTau; //tMaster->adTau: 1-D Tau vector (1xnumSeqs) vector nSeqsBreaks; vector nOTUsBreaks; if (m->debug) { m->mothurOut("[DEBUG]: numSeqs = " + toString(numSeqs) + " numOTUS = " + toString(numOTUs) + " about to alloc a dist vector with size = " + toString((numSeqs * numOTUs)) + ".\n"); } dist.assign(numSeqs * numOTUs, 0); change.assign(numOTUs, 1); centroids.assign(numOTUs, -1); weight.assign(numOTUs, 0); singleTau.assign(numSeqs, 1.0); nSeqsBreaks.assign(2, 0); nOTUsBreaks.assign(2, 0); nSeqsBreaks[0] = 0; nSeqsBreaks[1] = numSeqs; nOTUsBreaks[1] = numOTUs; if (m->debug) { m->mothurOut("[DEBUG]: done allocating memory, about to denoise.\n"); } if (m->control_pressed) { break; } double maxDelta = 0; int iter = 0; begClock = clock(); begTime = time(NULL); m->mothurOut("\nDenoising flowgrams...\n"); m->mothurOut("iter\tmaxDelta\tnLL\t\tcycletime\n"); while((maxIters == 0 && maxDelta > minDelta) || iter < MIN_ITER || (maxDelta > minDelta && iter < maxIters)){ if (m->control_pressed) { break; } double cycClock = clock(); unsigned long long cycTime = time(NULL); fill(numOTUs, seqNumber, seqIndex, cumNumSeqs, nSeqsPerOTU, aaP, aaI); if (m->control_pressed) { break; } calcCentroidsDriver(numOTUs, cumNumSeqs, nSeqsPerOTU, seqIndex, change, centroids, singleTau, mapSeqToUnique, uniqueFlowgrams, flowDataIntI, lengths, numFlowCells, seqNumber); if (m->control_pressed) { break; } maxDelta = getNewWeights(numOTUs, cumNumSeqs, nSeqsPerOTU, singleTau, seqNumber, weight); if (m->control_pressed) { break; } double nLL = getLikelihood(numSeqs, numOTUs, nSeqsPerOTU, seqNumber, cumNumSeqs, seqIndex, dist, weight); if (m->control_pressed) { break; } checkCentroids(numOTUs, centroids, weight); if (m->control_pressed) { break; } calcNewDistances(numSeqs, numOTUs, nSeqsPerOTU, dist, weight, change, centroids, aaP, singleTau, aaI, seqNumber, seqIndex, uniqueFlowgrams, flowDataIntI, 
numFlowCells, lengths); if (m->control_pressed) { break; } iter++; m->mothurOut(toString(iter) + '\t' + toString(maxDelta) + '\t' + toString(nLL) + '\t' + toString(time(NULL) - cycTime) + '\t' + toString((clock() - cycClock)/(double)CLOCKS_PER_SEC) + '\n'); } if (m->control_pressed) { break; } m->mothurOut("\nFinalizing...\n"); fill(numOTUs, seqNumber, seqIndex, cumNumSeqs, nSeqsPerOTU, aaP, aaI); if (m->debug) { m->mothurOut("[DEBUG]: done fill().\n"); } if (m->control_pressed) { break; } setOTUs(numOTUs, numSeqs, seqNumber, seqIndex, cumNumSeqs, nSeqsPerOTU, otuData, singleTau, dist, aaP, aaI); if (m->debug) { m->mothurOut("[DEBUG]: done setOTUs().\n"); } if (m->control_pressed) { break; } vector otuCounts(numOTUs, 0); for(int j=0;jdebug) { m->mothurOut("[DEBUG]: done calcCentroidsDriver().\n"); } if (m->control_pressed) { break; } if ((large) && (g == 0)) { flowFileName = filenames[i]; theseFlowFileNames[0] = filenames[i]; } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir = m->hasPath(flowFileName); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(flowFileName)); string qualityFileName = getOutputFileName("qfile",variables); string fastaFileName = getOutputFileName("fasta",variables); string nameFileName = getOutputFileName("name",variables); string otuCountsFileName = getOutputFileName("counts",variables); string fileRoot = m->getRootName(m->getSimpleName(flowFileName)); int pos = fileRoot.find_first_of('.'); string fileGroup = fileRoot; if (pos != string::npos) { fileGroup = fileRoot.substr(pos+1, (fileRoot.length()-1-(pos+1))); } string groupFileName = getOutputFileName("group",variables); writeQualities(numOTUs, numFlowCells, qualityFileName, otuCounts, nSeqsPerOTU, seqNumber, singleTau, flowDataIntI, uniqueFlowgrams, cumNumSeqs, mapUniqueToSeq, seqNameVector, centroids, aaI); if (m->control_pressed) { break; } writeSequences(thisCompositeFASTAFileName, numOTUs, numFlowCells, fastaFileName, 
otuCounts, uniqueFlowgrams, seqNameVector, aaI, centroids);if (m->control_pressed) { break; } writeNames(thisCompositeNamesFileName, numOTUs, nameFileName, otuCounts, seqNameVector, aaI, nSeqsPerOTU); if (m->control_pressed) { break; } writeClusters(otuCountsFileName, numOTUs, numFlowCells,otuCounts, centroids, uniqueFlowgrams, seqNameVector, aaI, nSeqsPerOTU, lengths, flowDataIntI); if (m->control_pressed) { break; } writeGroups(groupFileName, fileGroup, numSeqs, seqNameVector); if (m->control_pressed) { break; } if (large) { if (g > 0) { variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(theseFlowFileNames[0])); m->appendFiles(qualityFileName, getOutputFileName("qfile",variables)); m->mothurRemove(qualityFileName); m->appendFiles(fastaFileName, getOutputFileName("fasta",variables)); m->mothurRemove(fastaFileName); m->appendFiles(nameFileName, getOutputFileName("name",variables)); m->mothurRemove(nameFileName); m->appendFiles(otuCountsFileName, getOutputFileName("counts",variables)); m->mothurRemove(otuCountsFileName); m->appendFiles(groupFileName, getOutputFileName("group",variables)); m->mothurRemove(groupFileName); } m->mothurRemove(theseFlowFileNames[g]); } } numCompleted++; m->mothurOut("Total time to process " + fileNameForOutput + ":\t" + toString(time(NULL) - begTime) + '\t' + toString((clock() - begClock)/(double)CLOCKS_PER_SEC) + '\n'); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } return numCompleted; }catch(exception& e) { m->errorOut(e, "ShhherCommand", "driver"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::getFlowData(string filename, vector& thisSeqNameVector, vector& thisLengths, vector& thisFlowDataIntI, map& thisNameMap, int& numFlowCells){ try{ ifstream flowFile; m->openInputFile(filename, flowFile); string seqName; int currentNumFlowCells; float intensity; 
thisSeqNameVector.clear(); thisLengths.clear(); thisFlowDataIntI.clear(); thisNameMap.clear(); string numFlowTest; flowFile >> numFlowTest; if (!m->isContainingOnlyDigits(numFlowTest)) { m->mothurOut("[ERROR]: expected a number and got " + numFlowTest + ", quitting. Did you use the flow parameter instead of the file parameter?"); m->mothurOutEndLine(); exit(1); } else { convert(numFlowTest, numFlowCells); } if (m->debug) { m->mothurOut("[DEBUG]: numFlowCells = " + toString(numFlowCells) + ".\n"); } int index = 0;//pcluster while(!flowFile.eof()){ if (m->control_pressed) { break; } flowFile >> seqName >> currentNumFlowCells; thisLengths.push_back(currentNumFlowCells); thisSeqNameVector.push_back(seqName); thisNameMap[seqName] = index++;//pcluster if (m->debug) { m->mothurOut("[DEBUG]: seqName = " + seqName + " length = " + toString(currentNumFlowCells) + " index = " + toString(index) + "\n"); } for(int i=0;i> intensity; if(intensity > 9.99) { intensity = 9.99; } int intI = int(100 * intensity + 0.0001); thisFlowDataIntI.push_back(intI); } m->gobble(flowFile); } flowFile.close(); int numSeqs = thisSeqNameVector.size(); for(int i=0;icontrol_pressed) { break; } int iNumFlowCells = i * numFlowCells; for(int j=thisLengths[i];jerrorOut(e, "ShhherCommand", "getFlowData"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::flowDistParentFork(int numFlowCells, string distFileName, int stopSeq, vector& mapUniqueToSeq, vector& mapSeqToUnique, vector& lengths, vector& flowDataPrI, vector& flowDataIntI){ try{ ostringstream outStream; outStream.setf(ios::fixed, ios::floatfield); outStream.setf(ios::dec, ios::basefield); outStream.setf(ios::showpoint); outStream.precision(6); int begTime = time(NULL); double begClock = clock(); for(int i=0;icontrol_pressed) { break; } for(int j=0;jmothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - begTime)); m->mothurOutJustToScreen("\t" + 
toString((clock()-begClock)/CLOCKS_PER_SEC)+"\n"); } } ofstream distFile(distFileName.c_str()); distFile << outStream.str(); distFile.close(); if (m->control_pressed) {} else { m->mothurOutJustToScreen(toString(stopSeq-1) + "\t" + toString(time(NULL) - begTime)); m->mothurOutJustToScreen("\t" + toString((clock()-begClock)/CLOCKS_PER_SEC)+"\n"); } return 0; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "flowDistParentFork"); exit(1); } } /**************************************************************************************************/ float ShhherCommand::calcPairwiseDist(int numFlowCells, int seqA, int seqB, vector& mapSeqToUnique, vector& lengths, vector& flowDataPrI, vector& flowDataIntI){ try{ int minLength = lengths[mapSeqToUnique[seqA]]; if(lengths[seqB] < minLength){ minLength = lengths[mapSeqToUnique[seqB]]; } int ANumFlowCells = seqA * numFlowCells; int BNumFlowCells = seqB * numFlowCells; float dist = 0; for(int i=0;icontrol_pressed) { break; } int flowAIntI = flowDataIntI[ANumFlowCells + i]; float flowAPrI = flowDataPrI[ANumFlowCells + i]; int flowBIntI = flowDataIntI[BNumFlowCells + i]; float flowBPrI = flowDataPrI[BNumFlowCells + i]; dist += jointLookUp[flowAIntI * NUMBINS + flowBIntI] - flowAPrI - flowBPrI; } dist /= (float) minLength; return dist; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "calcPairwiseDist"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::getUniques(int numSeqs, int numFlowCells, vector& uniqueFlowgrams, vector& uniqueCount, vector& uniqueLengths, vector& mapSeqToUnique, vector& mapUniqueToSeq, vector& lengths, vector& flowDataPrI, vector& flowDataIntI){ try{ int numUniques = 0; uniqueFlowgrams.assign(numFlowCells * numSeqs, -1); uniqueCount.assign(numSeqs, 0); // anWeights uniqueLengths.assign(numSeqs, 0); mapSeqToUnique.assign(numSeqs, -1); mapUniqueToSeq.assign(numSeqs, -1); vector uniqueFlowDataIntI(numFlowCells * 
numSeqs, -1); for(int i=0;icontrol_pressed) { break; } int index = 0; vector current(numFlowCells); for(int j=0;j uniqueLengths[j]) { uniqueLengths[j] = lengths[i]; } break; } index++; } if(index == numUniques){ uniqueLengths[numUniques] = lengths[i]; uniqueCount[numUniques] = 1; mapSeqToUnique[i] = numUniques;//anMap mapUniqueToSeq[numUniques] = i;//anF for(int k=0;kcontrol_pressed) { break; } flowDataPrI[i] = getProbIntensity(flowDataIntI[i]); } return numUniques; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "getUniques"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::createNamesFile(int numSeqs, int numUniques, string filename, vector& seqNameVector, vector& mapSeqToUnique, vector& mapUniqueToSeq){ try{ vector duplicateNames(numUniques, ""); for(int i=0;iopenOutputFile(filename, nameFile); for(int i=0;icontrol_pressed) { break; } // nameFile << seqNameVector[mapUniqueToSeq[i]] << '\t' << duplicateNames[i].substr(0, duplicateNames[i].find_last_of(',')) << endl; nameFile << mapUniqueToSeq[i] << '\t' << duplicateNames[i].substr(0, duplicateNames[i].find_last_of(',')) << endl; } nameFile.close(); return 0; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "createNamesFile"); exit(1); } } //********************************************************************************************************************** int ShhherCommand::cluster(string filename, string distFileName, string namesFileName){ try { ReadMatrix* read = new ReadColumnMatrix(distFileName); read->setCutoff(cutoff); NameAssignment* clusterNameMap = new NameAssignment(namesFileName); clusterNameMap->readMap(); read->read(clusterNameMap); ListVector* list = read->getListVector(); SparseDistanceMatrix* matrix = read->getDMatrix(); delete read; delete clusterNameMap; RAbundVector* rabund = new RAbundVector(list->getRAbundVector()); float adjust = -1.0; Cluster* cluster = new CompleteLinkage(rabund, list, 
matrix, cutoff, "furthest", adjust); string tag = cluster->getTag(); double clusterCutoff = cutoff; while (matrix->getSmallDist() <= clusterCutoff && matrix->getNNodes() > 0){ if (m->control_pressed) { break; } cluster->update(clusterCutoff); } list->setLabel(toString(cutoff)); ofstream listFile; m->openOutputFile(filename, listFile); list->print(listFile); listFile.close(); delete matrix; delete cluster; delete rabund; delete list; return 0; } catch(exception& e) { m->errorOut(e, "ShhherCommand", "cluster"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::getOTUData(int numSeqs, string fileName, vector& otuData, vector& cumNumSeqs, vector& nSeqsPerOTU, vector >& aaP, //tMaster->aanP: each row is a different otu / each col contains the sequence indices vector >& aaI, //tMaster->aanI: that are in each otu - can't differentiate between aaP and aaI vector& seqNumber, //tMaster->anP: the sequence id number sorted by OTU vector& seqIndex, map& nameMap){ try { ifstream listFile; m->openInputFile(fileName, listFile); string label; int numOTUs; listFile >> label >> numOTUs; if (m->debug) { m->mothurOut("[DEBUG]: Getting OTU Data...\n"); } otuData.assign(numSeqs, 0); cumNumSeqs.assign(numOTUs, 0); nSeqsPerOTU.assign(numOTUs, 0); aaP.clear();aaP.resize(numOTUs); seqNumber.clear(); aaI.clear(); seqIndex.clear(); string singleOTU = ""; for(int i=0;icontrol_pressed) { break; } if (m->debug) { m->mothurOut("[DEBUG]: processing OTU " + toString(i) + ".\n"); } listFile >> singleOTU; istringstream otuString(singleOTU); while(otuString){ string seqName = ""; for(int j=0;j::iterator nmIt = nameMap.find(seqName); int index = nmIt->second; nameMap.erase(nmIt); otuData[index] = i; nSeqsPerOTU[i]++; aaP[i].push_back(index); seqName = ""; } } map::iterator nmIt = nameMap.find(seqName); int index = nmIt->second; nameMap.erase(nmIt); otuData[index] = i; nSeqsPerOTU[i]++; aaP[i].push_back(index); 
otuString.get(); } sort(aaP[i].begin(), aaP[i].end()); for(int j=0;jerrorOut(e, "ShhherCommand", "getOTUData"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::calcCentroidsDriver(int numOTUs, vector& cumNumSeqs, vector& nSeqsPerOTU, vector& seqIndex, vector& change, //did the centroid sequence change? 0 = no; 1 = yes vector& centroids, //the representative flowgram for each cluster m vector& singleTau, //tMaster->adTau: 1-D Tau vector (1xnumSeqs) vector& mapSeqToUnique, vector& uniqueFlowgrams, vector& flowDataIntI, vector& lengths, int numFlowCells, vector& seqNumber){ //this function gets the most likely homopolymer length at a flow position for a group of sequences //within an otu try{ for(int i=0;icontrol_pressed) { break; } double count = 0; int position = 0; int minFlowGram = 100000000; double minFlowValue = 1e8; change[i] = 0; //FALSE for(int j=0;j 0 && count > MIN_COUNT){ vector adF(nSeqsPerOTU[i]); vector anL(nSeqsPerOTU[i]); for(int j=0;jerrorOut(e, "ShhherCommand", "calcCentroidsDriver"); exit(1); } } /**************************************************************************************************/ double ShhherCommand::getDistToCentroid(int cent, int flow, int length, vector& uniqueFlowgrams, vector& flowDataIntI, int numFlowCells){ try{ int flowAValue = cent * numFlowCells; int flowBValue = flow * numFlowCells; double dist = 0; for(int i=0;ierrorOut(e, "ShhherCommand", "getDistToCentroid"); exit(1); } } /**************************************************************************************************/ double ShhherCommand::getNewWeights(int numOTUs, vector& cumNumSeqs, vector& nSeqsPerOTU, vector& singleTau, vector& seqNumber, vector& weight){ try{ double maxChange = 0; for(int i=0;icontrol_pressed) { break; } double difference = weight[i]; weight[i] = 0; for(int j=0;j maxChange){ maxChange = difference; } } return maxChange; } catch(exception& e) { m->errorOut(e, 
"ShhherCommand", "getNewWeights"); exit(1); } } /**************************************************************************************************/ double ShhherCommand::getLikelihood(int numSeqs, int numOTUs, vector& nSeqsPerOTU, vector& seqNumber, vector& cumNumSeqs, vector& seqIndex, vector& dist, vector& weight){ try{ vector P(numSeqs, 0); int effNumOTUs = 0; for(int i=0;i MIN_WEIGHT){ effNumOTUs++; } } string hold; for(int i=0;icontrol_pressed) { break; } for(int j=0;jerrorOut(e, "ShhherCommand", "getLikelihood"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::checkCentroids(int numOTUs, vector& centroids, vector& weight){ try{ vector unique(numOTUs, 1); for(int i=0;icontrol_pressed) { break; } if(unique[i] == 1){ for(int j=i+1;jerrorOut(e, "ShhherCommand", "checkCentroids"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::calcNewDistances(int numSeqs, int numOTUs, vector& nSeqsPerOTU, vector& dist, vector& weight, vector& change, vector& centroids, vector >& aaP, vector& singleTau, vector >& aaI, vector& seqNumber, vector& seqIndex, vector& uniqueFlowgrams, vector& flowDataIntI, int numFlowCells, vector& lengths){ try{ int total = 0; vector newTau(numOTUs,0); vector norms(numSeqs, 0); nSeqsPerOTU.assign(numOTUs, 0); for(int i=0;icontrol_pressed) { break; } int indexOffset = i * numOTUs; double offset = 1e8; for(int j=0;j MIN_WEIGHT && change[j] == 1){ dist[indexOffset + j] = getDistToCentroid(centroids[j], i, lengths[i], uniqueFlowgrams, flowDataIntI, numFlowCells); } if(weight[j] > MIN_WEIGHT && dist[indexOffset + j] < offset){ offset = dist[indexOffset + j]; } } for(int j=0;j MIN_WEIGHT){ newTau[j] = exp(sigma * (-dist[indexOffset + j] + offset)) * weight[j]; norms[i] += newTau[j]; } else{ newTau[j] = 0.0; } } for(int j=0;j MIN_TAU){ int oldTotal = total; total++; singleTau.resize(total, 
0); seqNumber.resize(total, 0); seqIndex.resize(total, 0); singleTau[oldTotal] = newTau[j]; aaP[j][nSeqsPerOTU[j]] = oldTotal; aaI[j][nSeqsPerOTU[j]] = i; nSeqsPerOTU[j]++; } } } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "calcNewDistances"); exit(1); } } /**************************************************************************************************/ int ShhherCommand::fill(int numOTUs, vector& seqNumber, vector& seqIndex, vector& cumNumSeqs, vector& nSeqsPerOTU, vector >& aaP, vector >& aaI){ try { int index = 0; for(int i=0;icontrol_pressed) { return 0; } cumNumSeqs[i] = index; for(int j=0;jerrorOut(e, "ShhherCommand", "fill"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::setOTUs(int numOTUs, int numSeqs, vector& seqNumber, vector& seqIndex, vector& cumNumSeqs, vector& nSeqsPerOTU, vector& otuData, vector& singleTau, vector& dist, vector >& aaP, vector >& aaI){ try { vector bigTauMatrix(numOTUs * numSeqs, 0.0000); for(int i=0;icontrol_pressed) { break; } for(int j=0;j maxTau){ maxTau = bigTauMatrix[i * numOTUs + j]; maxOTU = j; } } otuData[i] = maxOTU; } nSeqsPerOTU.assign(numOTUs, 0); for(int i=0;ierrorOut(e, "ShhherCommand", "setOTUs"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeQualities(int numOTUs, int numFlowCells, string qualityFileName, vector otuCounts, vector& nSeqsPerOTU, vector& seqNumber, vector& singleTau, vector& flowDataIntI, vector& uniqueFlowgrams, vector& cumNumSeqs, vector& mapUniqueToSeq, vector& seqNameVector, vector& centroids, vector >& aaI){ try { ofstream qualityFile; m->openOutputFile(qualityFileName, qualityFile); qualityFile.setf(ios::fixed, ios::floatfield); qualityFile.setf(ios::showpoint); qualityFile << setprecision(6); vector > qualities(numOTUs); vector pr(HOMOPS, 0); for(int i=0;icontrol_pressed) { break; } int index = 0; 
if(nSeqsPerOTU[i] > 0){ while(index < numFlowCells){ double maxPrValue = 1e8; short maxPrIndex = -1; double count = 0.0000; pr.assign(HOMOPS, 0); for(int j=0;j MIN_COUNT){ double U = 0.0000; double norm = 0.0000; for(int s=0;s0.00){ temp = log10(U); } else{ temp = -10.1; } temp = floor(-10 * temp); value = (int)floor(temp); if(value > 100){ value = 100; } qualities[i].push_back((int)value); } }//end if index++; }//end while }//end if if(otuCounts[i] > 0){ qualityFile << '>' << seqNameVector[mapUniqueToSeq[i]] << endl; //need to get past the first four bases for (int j = 4; j < qualities[i].size(); j++) { qualityFile << qualities[i][j] << ' '; } qualityFile << endl; } }//end for qualityFile.close(); outputNames.push_back(qualityFileName); outputTypes["qfile"].push_back(qualityFileName); } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeQualities"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeSequences(string thisCompositeFASTAFileName, int numOTUs, int numFlowCells, string fastaFileName, vector otuCounts, vector& uniqueFlowgrams, vector& seqNameVector, vector >& aaI, vector& centroids){ try { ofstream fastaFile; m->openOutputFile(fastaFileName, fastaFile); vector names(numOTUs, ""); for(int i=0;icontrol_pressed) { break; } int index = centroids[i]; if(otuCounts[i] > 0){ fastaFile << '>' << seqNameVector[aaI[i][0]] << endl; string newSeq = ""; for(int j=0;j= 4) { fastaFile << newSeq.substr(4) << endl; } else { fastaFile << "NNNN" << endl; } } } fastaFile.close(); outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); if(thisCompositeFASTAFileName != ""){ m->appendFiles(fastaFileName, thisCompositeFASTAFileName); } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeSequences"); exit(1); } } /**************************************************************************************************/ void 
ShhherCommand::writeNames(string thisCompositeNamesFileName, int numOTUs, string nameFileName, vector otuCounts, vector& seqNameVector, vector >& aaI, vector& nSeqsPerOTU){ try { ofstream nameFile; m->openOutputFile(nameFileName, nameFile); for(int i=0;icontrol_pressed) { break; } if(otuCounts[i] > 0){ nameFile << seqNameVector[aaI[i][0]] << '\t' << seqNameVector[aaI[i][0]]; for(int j=1;jappendFiles(nameFileName, thisCompositeNamesFileName); } } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeNames"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeGroups(string groupFileName, string fileRoot, int numSeqs, vector& seqNameVector){ try { ofstream groupFile; m->openOutputFile(groupFileName, groupFile); for(int i=0;icontrol_pressed) { break; } groupFile << seqNameVector[i] << '\t' << fileRoot << endl; } groupFile.close(); outputNames.push_back(groupFileName); outputTypes["group"].push_back(groupFileName); } catch(exception& e) { m->errorOut(e, "ShhherCommand", "writeGroups"); exit(1); } } /**************************************************************************************************/ void ShhherCommand::writeClusters(string otuCountsFileName, int numOTUs, int numFlowCells, vector otuCounts, vector& centroids, vector& uniqueFlowgrams, vector& seqNameVector, vector >& aaI, vector& nSeqsPerOTU, vector& lengths, vector& flowDataIntI){ try { ofstream otuCountsFile; m->openOutputFile(otuCountsFileName, otuCountsFile); string bases = flowOrder; for(int i=0;icontrol_pressed) { break; } //output the translated version of the centroid sequence for the otu if(otuCounts[i] > 0){ int index = centroids[i]; otuCountsFile << "ideal\t"; for(int j=8;j= 4) { otuCountsFile << newSeq.substr(4) << endl; } else { otuCountsFile << "NNNN" << endl; } } otuCountsFile << endl; } } otuCountsFile.close(); outputNames.push_back(otuCountsFileName); 
		outputTypes["counts"].push_back(otuCountsFileName);
	}
	catch(exception& e) {
		m->errorOut(e, "ShhherCommand", "writeClusters");
		exit(1);
	}
}

/**************************************************************************************************/

void ShhherCommand::getSingleLookUp(){
	try{
		// these are the -log probabilities that a signal corresponds to a particular homopolymer length
		singleLookUp.assign(HOMOPS * NUMBINS, 0);

		int index = 0;
		ifstream lookUpFile;
		m->openInputFile(lookupFileName, lookUpFile);

		for(int i=0;i<HOMOPS;i++){

			if (m->control_pressed) { break; }

			float logFracFreq;
			lookUpFile >> logFracFreq;

			for(int j=0;j<NUMBINS;j++) {
				lookUpFile >> singleLookUp[index];
				index++;
			}
		}
		lookUpFile.close();
	}
	catch(exception& e) {
		m->errorOut(e, "ShhherCommand", "getSingleLookUp");
		exit(1);
	}
}

/**************************************************************************************************/

void ShhherCommand::getJointLookUp(){
	try{
		// the most likely joint probability (-log) that two intensities have the same polymer length
		jointLookUp.resize(NUMBINS * NUMBINS, 0);

		for(int i=0;i<NUMBINS;i++){

			if (m->control_pressed) { break; }

			for(int j=0;j<NUMBINS;j++){

				double minSum = 100000000;

				for(int k=0;k<HOMOPS;k++){
					double sum = singleLookUp[k * NUMBINS + i] + singleLookUp[k * NUMBINS + j];

					if(sum < minSum) { minSum = sum; }
				}
				jointLookUp[i * NUMBINS + j] = minSum;
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "ShhherCommand", "getJointLookUp");
		exit(1);
	}
}

/**************************************************************************************************/

double ShhherCommand::getProbIntensity(int intIntensity){
	try{

		double minNegLogProb = 100000000;

		for(int i=0;i<HOMOPS;i++){ // loop signal strength

			if (m->control_pressed) { break; }

			float negLogProb = singleLookUp[i * NUMBINS + intIntensity];
			if(negLogProb < minNegLogProb) { minNegLogProb = negLogProb; }
		}

		return minNegLogProb;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhherCommand", "getProbIntensity");
		exit(1);
	}
}

mothur-1.36.1/source/commands/shhhercommand.h000066400000000000000000001706561255543666200212470ustar00rootroot00000000000000
#ifndef SHHHER_H
#define SHHHER_H

/*
 *  shhher.h
 *  Mothur
 *
 *  Created by Pat Schloss on 12/27/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "command.hpp"
#include "readcolumn.h"
#include "readmatrix.hpp"
#include "rabundvector.hpp"
#include "sabundvector.hpp"
#include "listvector.hpp"
#include "cluster.hpp"
#include

//**********************************************************************************************************************

#define NUMBINS 1000
#define HOMOPS 10
#define MIN_COUNT 0.1
#define MIN_WEIGHT 0.1
#define MIN_TAU 0.0001
#define MIN_ITER 10
//**********************************************************************************************************************

class ShhherCommand : public Command {

public:
	ShhherCommand(string);
	ShhherCommand();
	~ShhherCommand() {}

	vector setParameters();
	string getCommandName() { return "shhh.flows"; }
	string getCommandCategory() { return "Sequence Processing"; }

	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "Schloss PD, Gevers D, Westcott SL (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 6:e27310.\nQuince C, Lanzen A, Davenport RJ, Turnbaugh PJ (2011). Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12:38.\nQuince C, Lanzén A, Curtis TP, Davenport RJ, Hall N, Head IM, Read LF, Sloan WT (2009). Accurate determination of microbial diversity from 454 pyrosequencing data. Nat. 
Methods 6:639.\nhttp://www.mothur.org/wiki/Shhh.flows"; } string getDescription() { return "shhh.flows"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, large; string outputDir, flowFileName, flowFilesFileName, lookupFileName, compositeFASTAFileName, compositeNamesFileName; int processors, maxIters, largeSize; float cutoff, sigma, minDelta; string flowOrder; vector outputNames; vector singleLookUp; vector jointLookUp; vector flowFileVector; vector parseFlowFiles(string); int driver(vector, string, string); int createProcesses(vector); int getFlowData(string, vector&, vector&, vector&, map&, int&); int getUniques(int, int, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector&); int flowDistParentFork(int, string, int, vector&, vector&, vector&, vector&, vector&); float calcPairwiseDist(int, int, int, vector&, vector&, vector&, vector&); int createNamesFile(int, int, string, vector&, vector&, vector&); int cluster(string, string, string); int getOTUData(int numSeqs, string, vector&, vector&, vector&, vector >&, vector >&, vector&, vector&,map&); int calcCentroidsDriver(int numOTUs, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector&, int, vector&); double getDistToCentroid(int, int, int, vector&, vector&, int); double getNewWeights(int, vector&, vector&, vector&, vector&, vector&); double getLikelihood(int, int, vector&, vector&, vector&, vector&, vector&, vector&); int checkCentroids(int, vector&, vector&); void calcNewDistances(int, int, vector& , vector&,vector& , vector& change, vector&,vector >&, vector&, vector >&, vector&, vector&, vector&, vector&, int, vector&); int fill(int, vector&, vector&, vector&, vector&, vector >&, vector >&); void setOTUs(int, int, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector >&, vector >&); void writeQualities(int, int, string, vector, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector&, vector&, 
vector >&); void writeSequences(string, int, int, string, vector, vector&, vector&, vector >&, vector&); void writeNames(string, int, string, vector, vector&, vector >&, vector&); void writeGroups(string, string, int, vector&); void writeClusters(string, int, int, vector, vector&, vector&, vector&, vector >&, vector&, vector&, vector&); void getSingleLookUp(); void getJointLookUp(); double getProbIntensity(int); #ifdef USE_MPI string flowDistMPI(int, int); void calcNewDistancesChildMPI(int, int, vector&); int pid, ncpus; void getFlowData(); void getUniques(); float calcPairwiseDist(int, int); void flowDistParentFork(string, int, int); string createDistFile(int); string createNamesFile(); string cluster(string, string); void getOTUData(string); void initPyroCluster(); void fill(); void calcCentroids(); void calcCentroidsDriver(int, int); double getDistToCentroid(int, int, int); double getNewWeights(); double getLikelihood(); void checkCentroids(); void calcNewDistances(); void calcNewDistancesParent(int, int); void calcNewDistancesChild(int, int, vector&, vector&, vector&); void setOTUs(); void writeQualities(vector); void writeSequences(vector); void writeNames(vector); void writeGroups(); void writeClusters(vector); vector seqNameVector; vector lengths; vector flowDataIntI; vector flowDataPrI; map nameMap; vector otuData; vector cumNumSeqs; vector nSeqsPerOTU; vector > aaP; //tMaster->aanP: each row is a different otu / each col contains the sequence indices vector > aaI; //tMaster->aanI: that are in each otu - can't differentiate between aaP and aaI vector seqNumber; //tMaster->anP: the sequence id number sorted by OTU vector seqIndex; //tMaster->anI; the index that corresponds to seqNumber vector dist; //adDist - distance of sequences to centroids vector change; //did the centroid sequence change? 
0 = no; 1 = yes vector centroids; //the representative flowgram for each cluster m vector weight; vector singleTau; //tMaster->adTau: 1-D Tau vector (1xnumSeqs) vector uniqueFlowgrams; vector uniqueCount; vector mapSeqToUnique; vector mapUniqueToSeq; vector uniqueLengths; int numSeqs, numUniques, numOTUs, numFlowCells; vector nSeqsBreaks; vector nOTUsBreaks; #endif }; /************************************************************************************************** //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). struct shhhFlowsData { int threadID, maxIters; float cutoff, sigma, minDelta; string flowOrder; vector singleLookUp; vector jointLookUp; vector filenames; vector outputNames; string thisCompositeFASTAFileName, thisCompositeNameFileName, outputDir; int start, stop; MothurOut* m; shhhFlowsData(){} shhhFlowsData(vector f, string cf, string cn, string ou, string flor, vector jl, vector sl, MothurOut* mout, int st, int sp, float cut, float si, float mD, int mx, int tid) { filenames = f; thisCompositeFASTAFileName = cf; thisCompositeNameFileName = cn; outputDir = ou; flowOrder = flor; m = mout; start = st; stop = sp; cutoff= cut; sigma = si; minDelta = mD; maxIters = mx; jointLookUp = jl; singleLookUp = sl; threadID = tid; } }; /************************************************************************************************** #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI ShhhFlowsThreadFunction(LPVOID lpParam){ shhhFlowsData* pDataArray; pDataArray = (shhhFlowsData*)lpParam; try { for(int l=pDataArray->start;lstop;l++){ if (pDataArray->m->control_pressed) { break; } string flowFileName = pDataArray->filenames[l]; pDataArray->m->mothurOut("\n>>>>>\tProcessing " + flowFileName + " (file " + toString(l+1) + " of " + toString(pDataArray->filenames.size()) + 
")\t<<<<<\n"); pDataArray->m->mothurOut("Reading flowgrams...\n"); vector seqNameVector; vector lengths; vector flowDataIntI; vector flowDataPrI; map nameMap; vector uniqueFlowgrams; vector uniqueCount; vector mapSeqToUnique; vector mapUniqueToSeq; vector uniqueLengths; int numFlowCells; //int numSeqs = getFlowData(flowFileName, seqNameVector, lengths, flowDataIntI, nameMap, numFlowCells); /***************************************************************************************************** ifstream flowFile; // cout << "herethread " << flowFileName << '\t' << &flowFile << endl; pDataArray->m->openInputFile(flowFileName, flowFile); // cout << "herethread " << flowFileName << endl; string seqName; int currentNumFlowCells; float intensity; flowFile >> numFlowCells; int index = 0;//pcluster while(!flowFile.eof()){ if (pDataArray->m->control_pressed) { flowFile.close(); return 0; } flowFile >> seqName >> currentNumFlowCells; lengths.push_back(currentNumFlowCells); // cout << "herethread " << seqName << endl; seqNameVector.push_back(seqName); nameMap[seqName] = index++;//pcluster for(int i=0;i> intensity; if(intensity > 9.99) { intensity = 9.99; } int intI = int(100 * intensity + 0.0001); flowDataIntI.push_back(intI); } pDataArray->m->gobble(flowFile); } flowFile.close(); int numSeqs = seqNameVector.size(); // cout << numSeqs << endl; for(int i=0;im->control_pressed) { return 0; } int iNumFlowCells = i * numFlowCells; for(int j=lengths[i];jm->control_pressed) { return 0; } pDataArray->m->mothurOut("Identifying unique flowgrams...\n"); //int numUniques = getUniques(numSeqs, numFlowCells, uniqueFlowgrams, uniqueCount, uniqueLengths, mapSeqToUnique, mapUniqueToSeq, lengths, flowDataPrI, flowDataIntI); /***************************************************************************************************** int numUniques = 0; uniqueFlowgrams.assign(numFlowCells * numSeqs, -1); uniqueCount.assign(numSeqs, 0); // anWeights uniqueLengths.assign(numSeqs, 0); 
mapSeqToUnique.assign(numSeqs, -1); mapUniqueToSeq.assign(numSeqs, -1); vector uniqueFlowDataIntI(numFlowCells * numSeqs, -1); for(int i=0;im->control_pressed) { return 0; } int index = 0; vector current(numFlowCells); for(int j=0;j uniqueLengths[j]) { uniqueLengths[j] = lengths[i]; } break; } index++; } if(index == numUniques){ uniqueLengths[numUniques] = lengths[i]; uniqueCount[numUniques] = 1; mapSeqToUnique[i] = numUniques;//anMap mapUniqueToSeq[numUniques] = i;//anF for(int k=0;km->control_pressed) { return 0; } //flowDataPrI[i] = getProbIntensity(flowDataIntI[i]); flowDataPrI[i] = 100000000; for(int j=0;jm->control_pressed) { return 0; } float negLogProb = pDataArray->singleLookUp[j * NUMBINS + flowDataIntI[i]]; if(negLogProb < flowDataPrI[i]) { flowDataPrI[i] = negLogProb; } } } /***************************************************************************************************** if (pDataArray->m->control_pressed) { return 0; } pDataArray->m->mothurOut("Calculating distances between flowgrams...\n"); string distFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.dist"; unsigned long long begTime = time(NULL); double begClock = clock(); //flowDistParentFork(numFlowCells, distFileName, numUniques, mapUniqueToSeq, mapSeqToUnique, lengths, flowDataPrI, flowDataIntI); /***************************************************************************************************** ostringstream outStream; outStream.setf(ios::fixed, ios::floatfield); outStream.setf(ios::dec, ios::basefield); outStream.setf(ios::showpoint); outStream.precision(6); int thisbegTime = time(NULL); double thisbegClock = clock(); for(int i=0;im->control_pressed) { break; } for(int j=0;jm->control_pressed) { break; } int flowAIntI = flowDataIntI[ANumFlowCells + k]; float flowAPrI = flowDataPrI[ANumFlowCells + k]; int flowBIntI = flowDataIntI[BNumFlowCells + k]; float flowBPrI = flowDataPrI[BNumFlowCells + k]; flowDistance += pDataArray->jointLookUp[flowAIntI * NUMBINS + 
flowBIntI] - flowAPrI - flowBPrI; } flowDistance /= (float) minLength; /***************************************************************************************************** if(flowDistance < 1e-6){ outStream << mapUniqueToSeq[i] << '\t' << mapUniqueToSeq[j] << '\t' << 0.000000 << endl; } else if(flowDistance <= pDataArray->cutoff){ outStream << mapUniqueToSeq[i] << '\t' << mapUniqueToSeq[j] << '\t' << flowDistance << endl; } } if(i % 100 == 0){ pDataArray->m->mothurOut(toString(i) + "\t" + toString(time(NULL) - thisbegTime)); pDataArray->m->mothurOut("\t" + toString((clock()-thisbegClock)/CLOCKS_PER_SEC)); pDataArray->m->mothurOutEndLine(); } } ofstream distFile(distFileName.c_str()); distFile << outStream.str(); distFile.close(); if (pDataArray->m->control_pressed) {} else { pDataArray->m->mothurOut(toString(numUniques-1) + "\t" + toString(time(NULL) - thisbegTime)); pDataArray->m->mothurOut("\t" + toString((clock()-thisbegClock)/CLOCKS_PER_SEC)); pDataArray->m->mothurOutEndLine(); } /***************************************************************************************************** pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Total time: " + toString(time(NULL) - begTime) + '\t' + toString((clock() - begClock)/CLOCKS_PER_SEC) + '\n'); string namesFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.names"; //createNamesFile(numSeqs, numUniques, namesFileName, seqNameVector, mapSeqToUnique, mapUniqueToSeq); /***************************************************************************************************** vector duplicateNames(numUniques, ""); for(int i=0;im->openOutputFile(namesFileName, nameFile); for(int i=0;im->control_pressed) { nameFile.close(); return 0; } nameFile << mapUniqueToSeq[i] << '\t' << duplicateNames[i].substr(0, duplicateNames[i].find_last_of(',')) << endl; } nameFile.close(); /***************************************************************************************************** if 
(pDataArray->m->control_pressed) { return 0; } pDataArray->m->mothurOut("\nClustering flowgrams...\n"); string listFileName = flowFileName.substr(0,flowFileName.find_last_of('.')) + ".shhh.list"; //cluster(listFileName, distFileName, namesFileName); /***************************************************************************************************** ReadMatrix* read = new ReadColumnMatrix(distFileName); read->setCutoff(pDataArray->cutoff); NameAssignment* clusterNameMap = new NameAssignment(namesFileName); clusterNameMap->readMap(); read->read(clusterNameMap); ListVector* list = read->getListVector(); SparseMatrix* matrix = read->getMatrix(); delete read; delete clusterNameMap; RAbundVector* rabund = new RAbundVector(list->getRAbundVector()); Cluster* cluster = new CompleteLinkage(rabund, list, matrix, pDataArray->cutoff, "furthest"); string tag = cluster->getTag(); double clusterCutoff = pDataArray->cutoff; while (matrix->getSmallDist() <= clusterCutoff && matrix->getNNodes() > 0){ if (pDataArray->m->control_pressed) { break; } cluster->update(clusterCutoff); } list->setLabel(toString(pDataArray->cutoff)); ofstream listFileOut; pDataArray->m->openOutputFile(listFileName, listFileOut); list->print(listFileOut); listFileOut.close(); delete matrix; delete cluster; delete rabund; delete list; /***************************************************************************************************** if (pDataArray->m->control_pressed) { return 0; } vector otuData; vector cumNumSeqs; vector nSeqsPerOTU; vector > aaP; //tMaster->aanP: each row is a different otu / each col contains the sequence indices vector > aaI; //tMaster->aanI: that are in each otu - can't differentiate between aaP and aaI vector seqNumber; //tMaster->anP: the sequence id number sorted by OTU vector seqIndex; //tMaster->anI; the index that corresponds to seqNumber //int numOTUs = getOTUData(numSeqs, listFileName, otuData, cumNumSeqs, nSeqsPerOTU, aaP, aaI, seqNumber, seqIndex, nameMap); 
/***************************************************************************************************** ifstream listFile; pDataArray->m->openInputFile(listFileName, listFile); string label; int numOTUs; listFile >> label >> numOTUs; otuData.assign(numSeqs, 0); cumNumSeqs.assign(numOTUs, 0); nSeqsPerOTU.assign(numOTUs, 0); aaP.clear();aaP.resize(numOTUs); seqNumber.clear(); aaI.clear(); seqIndex.clear(); string singleOTU = ""; for(int i=0;im->control_pressed) { break; } listFile >> singleOTU; istringstream otuString(singleOTU); while(otuString){ string seqName = ""; for(int j=0;j::iterator nmIt = nameMap.find(seqName); int index = nmIt->second; nameMap.erase(nmIt); otuData[index] = i; nSeqsPerOTU[i]++; aaP[i].push_back(index); seqName = ""; } } map::iterator nmIt = nameMap.find(seqName); int index = nmIt->second; nameMap.erase(nmIt); otuData[index] = i; nSeqsPerOTU[i]++; aaP[i].push_back(index); otuString.get(); } sort(aaP[i].begin(), aaP[i].end()); for(int j=0;jm->control_pressed) { return 0; } pDataArray->m->mothurRemove(distFileName); pDataArray->m->mothurRemove(namesFileName); pDataArray->m->mothurRemove(listFileName); vector dist; //adDist - distance of sequences to centroids vector change; //did the centroid sequence change? 
0 = no; 1 = yes vector centroids; //the representative flowgram for each cluster m vector weight; vector singleTau; //tMaster->adTau: 1-D Tau vector (1xnumSeqs) vector nSeqsBreaks; vector nOTUsBreaks; dist.assign(numSeqs * numOTUs, 0); change.assign(numOTUs, 1); centroids.assign(numOTUs, -1); weight.assign(numOTUs, 0); singleTau.assign(numSeqs, 1.0); nSeqsBreaks.assign(2, 0); nOTUsBreaks.assign(2, 0); nSeqsBreaks[0] = 0; nSeqsBreaks[1] = numSeqs; nOTUsBreaks[1] = numOTUs; if (pDataArray->m->control_pressed) { break; } double maxDelta = 0; int iter = 0; begClock = clock(); begTime = time(NULL); pDataArray->m->mothurOut("\nDenoising flowgrams...\n"); pDataArray->m->mothurOut("iter\tmaxDelta\tnLL\t\tcycletime\n"); while((pDataArray->maxIters == 0 && maxDelta > pDataArray->minDelta) || iter < MIN_ITER || (maxDelta > pDataArray->minDelta && iter < pDataArray->maxIters)){ if (pDataArray->m->control_pressed) { break; } double cycClock = clock(); unsigned long long cycTime = time(NULL); //fill(numOTUs, seqNumber, seqIndex, cumNumSeqs, nSeqsPerOTU, aaP, aaI); /***************************************************************************************************** int indexFill = 0; for(int i=0;im->control_pressed) { return 0; } cumNumSeqs[i] = indexFill; for(int j=0;jm->control_pressed) { break; } //calcCentroidsDriver(numOTUs, cumNumSeqs, nSeqsPerOTU, seqIndex, change, centroids, singleTau, mapSeqToUnique, uniqueFlowgrams, flowDataIntI, lengths, numFlowCells, seqNumber); /***************************************************************************************************** for(int i=0;im->control_pressed) { break; } double count = 0; int position = 0; int minFlowGram = 100000000; double minFlowValue = 1e8; change[i] = 0; //FALSE for(int j=0;j 0 && count > MIN_COUNT){ vector adF(nSeqsPerOTU[i]); vector anL(nSeqsPerOTU[i]); for(int j=0;jsingleLookUp[uniqueFlowgrams[flowAValue] * NUMBINS + flowDataIntI[flowBValue]]; flowAValue++; flowBValue++; } dist = dist / (double)lengths[nI]; 
/***************************************************************************************************** adF[k] += dist * tauValue; } } for(int j=0;jm->control_pressed) { break; } //maxDelta = getNewWeights(numOTUs, cumNumSeqs, nSeqsPerOTU, singleTau, seqNumber, weight); /***************************************************************************************************** double maxChange = 0; for(int i=0;im->control_pressed) { break; } double difference = weight[i]; weight[i] = 0; for(int j=0;j maxChange){ maxChange = difference; } } maxDelta = maxChange; /***************************************************************************************************** if (pDataArray->m->control_pressed) { break; } //double nLL = getLikelihood(numSeqs, numOTUs, nSeqsPerOTU, seqNumber, cumNumSeqs, seqIndex, dist, weight); /***************************************************************************************************** vector P(numSeqs, 0); int effNumOTUs = 0; for(int i=0;i MIN_WEIGHT){ effNumOTUs++; } } string hold; for(int i=0;im->control_pressed) { break; } for(int j=0;jsigma); } } double nLL = 0.00; for(int i=0;isigma); /***************************************************************************************************** if (pDataArray->m->control_pressed) { break; } //checkCentroids(numOTUs, centroids, weight); /***************************************************************************************************** vector unique(numOTUs, 1); for(int i=0;im->control_pressed) { break; } if(unique[i] == 1){ for(int j=i+1;jm->control_pressed) { break; } //calcNewDistances(numSeqs, numOTUs, nSeqsPerOTU, dist, weight, change, centroids, aaP, singleTau, aaI, seqNumber, seqIndex, uniqueFlowgrams, flowDataIntI, numFlowCells, lengths); /***************************************************************************************************** int total = 0; vector newTau(numOTUs,0); vector norms(numSeqs, 0); nSeqsPerOTU.assign(numOTUs, 0); for(int i=0;im->control_pressed) { break; } int 
indexOffset = i * numOTUs; double offset = 1e8; for(int j=0;j MIN_WEIGHT && change[j] == 1){ //dist[indexOffset + j] = getDistToCentroid(centroids[j], i, lengths[i], uniqueFlowgrams, flowDataIntI, numFlowCells); /***************************************************************************************************** int flowAValue = centroids[j] * numFlowCells; int flowBValue = i * numFlowCells; double distTemp = 0; for(int l=0;lsingleLookUp[uniqueFlowgrams[flowAValue] * NUMBINS + flowDataIntI[flowBValue]]; flowAValue++; flowBValue++; } dist[indexOffset + j] = distTemp / (double)lengths[i]; /***************************************************************************************************** } if(weight[j] > MIN_WEIGHT && dist[indexOffset + j] < offset){ offset = dist[indexOffset + j]; } } for(int j=0;j MIN_WEIGHT){ newTau[j] = exp(pDataArray->sigma * (-dist[indexOffset + j] + offset)) * weight[j]; norms[i] += newTau[j]; } else{ newTau[j] = 0.0; } } for(int j=0;j MIN_TAU){ int oldTotal = total; total++; singleTau.resize(total, 0); seqNumber.resize(total, 0); seqIndex.resize(total, 0); singleTau[oldTotal] = newTau[j]; aaP[j][nSeqsPerOTU[j]] = oldTotal; aaI[j][nSeqsPerOTU[j]] = i; nSeqsPerOTU[j]++; } } } /***************************************************************************************************** if (pDataArray->m->control_pressed) { break; } iter++; pDataArray->m->mothurOut(toString(iter) + '\t' + toString(maxDelta) + '\t' + toString(nLL) + '\t' + toString(time(NULL) - cycTime) + '\t' + toString((clock() - cycClock)/(double)CLOCKS_PER_SEC) + '\n'); } if (pDataArray->m->control_pressed) { break; } pDataArray->m->mothurOut("\nFinalizing...\n"); //fill(numOTUs, seqNumber, seqIndex, cumNumSeqs, nSeqsPerOTU, aaP, aaI); /***************************************************************************************************** int indexFill = 0; for(int i=0;im->control_pressed) { return 0; } cumNumSeqs[i] = indexFill; for(int j=0;jm->control_pressed) { break; } 
//setOTUs(numOTUs, numSeqs, seqNumber, seqIndex, cumNumSeqs, nSeqsPerOTU, otuData, singleTau, dist, aaP, aaI); /***************************************************************************************************** vector bigTauMatrix(numOTUs * numSeqs, 0.0000); for(int i=0;im->control_pressed) { break; } for(int j=0;j maxTau){ maxTau = bigTauMatrix[i * numOTUs + j]; maxOTU = j; } } otuData[i] = maxOTU; } nSeqsPerOTU.assign(numOTUs, 0); for(int i=0;im->control_pressed) { return 0; } cumNumSeqs[i] = indexFill; for(int j=0;jm->control_pressed) { break; } vector otuCounts(numOTUs, 0); for(int i=0;im->control_pressed) { break; } double count = 0; int position = 0; int minFlowGram = 100000000; double minFlowValue = 1e8; change[i] = 0; //FALSE for(int j=0;j 0 && count > MIN_COUNT){ vector adF(nSeqsPerOTU[i]); vector anL(nSeqsPerOTU[i]); for(int j=0;jsingleLookUp[uniqueFlowgrams[flowAValue] * NUMBINS + flowDataIntI[flowBValue]]; flowAValue++; flowBValue++; } dist = dist / (double)lengths[nI]; /***************************************************************************************************** adF[k] += dist * tauValue; } } for(int j=0;jm->control_pressed) { break; } //writeQualities(numOTUs, numFlowCells, flowFileName, otuCounts, nSeqsPerOTU, seqNumber, singleTau, flowDataIntI, uniqueFlowgrams, cumNumSeqs, mapUniqueToSeq, seqNameVector, centroids, aaI); if (pDataArray->m->control_pressed) { break; } /***************************************************************************************************** string thisOutputDir = pDataArray->outputDir; if (pDataArray->outputDir == "") { thisOutputDir += pDataArray->m->hasPath(flowFileName); } string qualityFileName = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(flowFileName)) + "shhh.qual"; ofstream qualityFile; pDataArray->m->openOutputFile(qualityFileName, qualityFile); qualityFile.setf(ios::fixed, ios::floatfield); qualityFile.setf(ios::showpoint); qualityFile << setprecision(6); vector > 
qualities(numOTUs); vector pr(HOMOPS, 0); for(int i=0;im->control_pressed) { break; } int index = 0; int base = 0; if(nSeqsPerOTU[i] > 0){ qualities[i].assign(1024, -1); while(index < numFlowCells){ double maxPrValue = 1e8; short maxPrIndex = -1; double count = 0.0000; pr.assign(HOMOPS, 0); for(int j=0;jsingleLookUp[s * NUMBINS + intensity]; } } maxPrIndex = uniqueFlowgrams[centroids[i] * numFlowCells + index]; maxPrValue = pr[maxPrIndex]; if(count > MIN_COUNT){ double U = 0.0000; double norm = 0.0000; for(int s=0;s0.00){ temp = log10(U); } else{ temp = -10.1; } temp = floor(-10 * temp); value = (int)floor(temp); if(value > 100){ value = 100; } qualities[i][base] = (int)value; base++; } } index++; } } if(otuCounts[i] > 0){ qualityFile << '>' << seqNameVector[mapUniqueToSeq[i]] << endl; int j=4; //need to get past the first four bases while(qualities[i][j] != -1){ qualityFile << qualities[i][j] << ' '; j++; } qualityFile << endl; } } qualityFile.close(); pDataArray->outputNames.push_back(qualityFileName); /***************************************************************************************************** // writeSequences(thisCompositeFASTAFileName, numOTUs, numFlowCells, flowFileName, otuCounts, uniqueFlowgrams, seqNameVector, aaI, centroids); if (pDataArray->m->control_pressed) { break; } /***************************************************************************************************** thisOutputDir = pDataArray->outputDir; if (pDataArray->outputDir == "") { thisOutputDir += pDataArray->m->hasPath(flowFileName); } string fastaFileName = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(flowFileName)) + "shhh.fasta"; ofstream fastaFile; pDataArray->m->openOutputFile(fastaFileName, fastaFile); vector names(numOTUs, ""); for(int i=0;im->control_pressed) { break; } int index = centroids[i]; if(otuCounts[i] > 0){ fastaFile << '>' << seqNameVector[aaI[i][0]] << endl; string newSeq = ""; for(int j=0;jflowOrder[j % 4]; for(int 
k=0;koutputNames.push_back(fastaFileName); if(pDataArray->thisCompositeFASTAFileName != ""){ pDataArray->m->appendFiles(fastaFileName, pDataArray->thisCompositeFASTAFileName); } /***************************************************************************************************** //writeNames(thisCompositeNamesFileName, numOTUs, flowFileName, otuCounts, seqNameVector, aaI, nSeqsPerOTU); if (pDataArray->m->control_pressed) { break; } /***************************************************************************************************** thisOutputDir = pDataArray->outputDir; if (pDataArray->outputDir == "") { thisOutputDir += pDataArray->m->hasPath(flowFileName); } string nameFileName = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(flowFileName)) + "shhh.names"; ofstream nameFileOut; pDataArray->m->openOutputFile(nameFileName, nameFileOut); for(int i=0;im->control_pressed) { break; } if(otuCounts[i] > 0){ nameFileOut << seqNameVector[aaI[i][0]] << '\t' << seqNameVector[aaI[i][0]]; for(int j=1;joutputNames.push_back(nameFileName); if(pDataArray->thisCompositeNameFileName != ""){ pDataArray->m->appendFiles(nameFileName, pDataArray->thisCompositeNameFileName); } /***************************************************************************************************** //writeClusters(flowFileName, numOTUs, numFlowCells,otuCounts, centroids, uniqueFlowgrams, seqNameVector, aaI, nSeqsPerOTU, lengths, flowDataIntI); if (pDataArray->m->control_pressed) { break; } /***************************************************************************************************** thisOutputDir = pDataArray->outputDir; if (pDataArray->outputDir == "") { thisOutputDir += pDataArray->m->hasPath(flowFileName); } string otuCountsFileName = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(flowFileName)) + "shhh.counts"; ofstream otuCountsFile; pDataArray->m->openOutputFile(otuCountsFileName, otuCountsFile); string bases = pDataArray->flowOrder; for(int 
i=0;im->control_pressed) { break; } //output the translated version of the centroid sequence for the otu if(otuCounts[i] > 0){ int index = centroids[i]; otuCountsFile << "ideal\t"; for(int j=8;joutputNames.push_back(otuCountsFileName) /***************************************************************************************************** //writeGroups(flowFileName, numSeqs, seqNameVector); if (pDataArray->m->control_pressed) { break; } /***************************************************************************************************** thisOutputDir = pDataArray->outputDir; if (pDataArray->outputDir == "") { thisOutputDir += pDataArray->m->hasPath(flowFileName); } string fileRoot = thisOutputDir + pDataArray->m->getRootName(pDataArray->m->getSimpleName(flowFileName)); string groupFileName = fileRoot + "shhh.groups"; ofstream groupFile; pDataArray->m->openOutputFile(groupFileName, groupFile); for(int i=0;im->control_pressed) { break; } groupFile << seqNameVector[i] << '\t' << fileRoot << endl; } groupFile.close(); pDataArray->outputNames.push_back(groupFileName); /***************************************************************************************************** pDataArray->m->mothurOut("Total time to process " + flowFileName + ":\t" + toString(time(NULL) - begTime) + '\t' + toString((clock() - begClock)/(double)CLOCKS_PER_SEC) + '\n'); } if (pDataArray->m->control_pressed) { for (int i = 0; i < pDataArray->outputNames.size(); i++) { pDataArray->m->mothurRemove(pDataArray->outputNames[i]); } return 0; } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ShhherCommand", "ShhhFlowsThreadFunction"); exit(1); } } #endif */ #endif mothur-1.36.1/source/commands/shhhseqscommand.cpp000066400000000000000000001000001255543666200221170ustar00rootroot00000000000000/* * shhhseqscommand.cpp * Mothur * * Created by westcott on 11/8/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 *
 */

#include "shhhseqscommand.h"

//**********************************************************************************************************************
vector<string> ShhhSeqsCommand::setParameters(){
	try {
		CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta-map",false,true,true); parameters.push_back(pfasta);
		CommandParameter pname("name", "InputTypes", "", "", "none", "none", "none","name",false,true,true); parameters.push_back(pname);
		CommandParameter pgroup("group", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(pgroup);
		CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);
		CommandParameter psigma("sigma", "Number", "", "0.01", "", "", "","",false,false); parameters.push_back(psigma);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string ShhhSeqsCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The shhh.seqs command reads a fasta and name file and ....\n";
		helpString += "The shhh.seqs command parameters are fasta, name, group, sigma and processors.\n";
		helpString += "The fasta parameter allows you to enter the fasta file containing your sequences, and is required, unless you have a valid current fasta file. \n";
		helpString += "The name parameter allows you to provide a name file associated with your fasta file. It is required. \n";
		helpString += "The group parameter allows you to provide a group file. When checking sequences, only sequences from the same group as the query sequence will be used as the reference. \n";
		helpString += "The processors parameter allows you to specify how many processors you would like to use. The default is 1. \n";
		helpString += "The sigma parameter .... The default is 0.01. \n";
		helpString += "The shhh.seqs command should be in the following format: \n";
		helpString += "shhh.seqs(fasta=yourFastaFile, name=yourNameFile) \n";
		helpString += "Example: shhh.seqs(fasta=AD.align, name=AD.names) \n";
		helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFastaFile).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
string ShhhSeqsCommand::getOutputPattern(string type) {
	try {
		string pattern = "";

		if (type == "fasta") { pattern = "[filename],shhh_seqs.fasta"; }
		else if (type == "name") { pattern = "[filename],shhh_seqs.names"; }
		else if (type == "map") { pattern = "[filename],shhh_seqs.map"; }
		else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

		return pattern;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "getOutputPattern");
		exit(1);
	}
}
//**********************************************************************************************************************
ShhhSeqsCommand::ShhhSeqsCommand(){
	try {
		abort = true; calledHelp = true;
		setParameters();
		vector<string> tempOutNames;
		outputTypes["fasta"] = tempOutNames;
		outputTypes["name"] = tempOutNames;
		outputTypes["map"] = tempOutNames;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "ShhhSeqsCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
ShhhSeqsCommand::ShhhSeqsCommand(string option) {
	try {
		abort = false; calledHelp = false;

		//allow user to run help
		if(option == "help") { help(); abort = true; calledHelp = true; }
		else if(option == "citation") { citation(); abort = true; calledHelp = true;}

		else {
			vector<string> myArray = setParameters();

			OptionParser parser(option);
			map<string, string> parameters = parser.getParameters();

			ValidParameters validParameter;
			map<string, string>::iterator it;

			//check to make sure all parameters are valid for command
			for (map<string, string>::iterator it2 = parameters.begin(); it2 != parameters.end(); it2++) {
				if (validParameter.isValidParameter(it2->first, myArray, it2->second) != true) { abort = true; }
			}

			//initialize outputTypes
			vector<string> tempOutNames;
			outputTypes["fasta"] = tempOutNames;
			outputTypes["name"] = tempOutNames;
			outputTypes["map"] = tempOutNames;

			//if the user changes the input directory, the command factory sends it to us in the inputdir parameter
			string inputDir = validParameter.validFile(parameters, "inputdir", false);
			if (inputDir == "not found"){ inputDir = ""; }
			else {
				string path;
				it = parameters.find("fasta");
				//user has given a fasta file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path, prepend inputdir; otherwise leave the path alone
					if (path == "") { parameters["fasta"] = inputDir + it->second; }
				}

				it = parameters.find("name");
				//user has given a name file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path, prepend inputdir; otherwise leave the path alone
					if (path == "") { parameters["name"] = inputDir + it->second; }
				}

				it = parameters.find("group");
				//user has given a group file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path, prepend inputdir; otherwise leave the path alone
					if (path == "") { parameters["group"] = inputDir + it->second; }
				}
			}

			//check for required parameters
			fastafile = validParameter.validFile(parameters, "fasta", true);
			if (fastafile == "not found") {
				fastafile = m->getFastaFile();
				if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); }
				else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; }
			}
			else if (fastafile == "not open") { abort = true; }
			else { m->setFastaFile(fastafile); }

			//if the user changes the output directory, the command factory sends it to us in the outputdir parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false);
			if (outputDir == "not found"){ outputDir = ""; }

			//check for optional parameter and set defaults
			// ...at some point we should add some additional type checking...
			namefile = validParameter.validFile(parameters, "name", true);
			if (namefile == "not found") {
				namefile = m->getNameFile();
				if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); }
				else { m->mothurOut("You have no current namefile and the name parameter is required."); m->mothurOutEndLine(); abort = true; }
			}
			else if (namefile == "not open") { namefile = ""; abort = true; }
			else { m->setNameFile(namefile); }

			groupfile = validParameter.validFile(parameters, "group", true);
			if (groupfile == "not found") { groupfile = ""; }
			else if (groupfile == "not open") { abort = true; groupfile = ""; }
			else { m->setGroupFile(groupfile); }

			string temp = validParameter.validFile(parameters, "sigma", false);
			if(temp == "not found"){ temp = "0.01"; }
			m->mothurConvert(temp, sigma);
			sigma = 1/sigma;

			temp = validParameter.validFile(parameters, "processors", false);
			if (temp == "not found"){ temp = m->getProcessors(); }
			m->setProcessors(temp);
			m->mothurConvert(temp, processors);

			if (namefile == "") {
				vector<string> files;
				files.push_back(fastafile);
				parser.getNameFile(files);
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "ShhhSeqsCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
int ShhhSeqsCommand::execute() {
	try {
		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		if (outputDir == "") { outputDir = m->hasPath(fastafile); } //if user entered a file with a path then preserve it

		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile));
		string outputFileName = getOutputFileName("fasta",variables);
		string nameFileName = getOutputFileName("name",variables);
		string mapFileName = getOutputFileName("map",variables);

		if (groupfile != "") {
			//Parse sequences by group
			SequenceParser parser(groupfile, fastafile, namefile);
			vector<string> groups = parser.getNamesOfGroups();

			if (m->control_pressed) { return 0; }

			//clears files
			ofstream out, out1, out2;
			m->openOutputFile(outputFileName, out); out.close();
			m->openOutputFile(nameFileName, out1); out1.close();
			mapFileName = outputDir + m->getRootName(m->getSimpleName(fastafile)) + "shhh.";

			vector<string> mapFileNames;
			if(processors == 1) { mapFileNames = driverGroups(parser, outputFileName, nameFileName, mapFileName, 0, groups.size(), groups); }
			else { mapFileNames = createProcessesGroups(parser, outputFileName, nameFileName, mapFileName, groups); }

			if (m->control_pressed) { return 0; }

			for (int j = 0; j < mapFileNames.size(); j++) { outputNames.push_back(mapFileNames[j]); outputTypes["map"].push_back(mapFileNames[j]); }

			//deconvolute results by running unique.seqs
			deconvoluteResults(outputFileName, nameFileName);

			if (m->control_pressed) { return 0; }

		}else{
			vector<string> sequences;
			vector<string> uniqueNames;
			vector<string> redundantNames;
			vector<int> seqFreq;

			seqNoise noise;
			correctDist* correct = new correctDist(processors);

			//reads fasta and name file and loads them in order
			readData(correct, noise, sequences, uniqueNames,
			redundantNames, seqFreq);
			if (m->control_pressed) { return 0; }

			//calc distances for cluster
			string distFileName = outputDir + m->getRootName(m->getSimpleName(fastafile)) + "shhh.dist";
			correct->execute(distFileName);
			delete correct;

			if (m->control_pressed) { m->mothurRemove(distFileName); return 0; }

			driver(noise, sequences, uniqueNames, redundantNames, seqFreq, distFileName, outputFileName, nameFileName, mapFileName);
			outputNames.push_back(mapFileName); outputTypes["map"].push_back(mapFileName);
		}

		if (m->control_pressed) { for (int j = 0; j < outputNames.size(); j++) { m->mothurRemove(outputNames[j]); } return 0; }

		outputNames.push_back(outputFileName); outputTypes["fasta"].push_back(outputFileName);
		outputNames.push_back(nameFileName); outputTypes["name"].push_back(nameFileName);

		m->mothurOutEndLine();
		m->mothurOut("Output File Names: "); m->mothurOutEndLine();
		for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
		m->mothurOutEndLine();

		//set the new fasta and name files as the current fasta and name files
		string current = "";
		itTypes = outputTypes.find("fasta");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); }
		}

		itTypes = outputTypes.find("name");
		if (itTypes != outputTypes.end()) {
			if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); }
		}

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
int ShhhSeqsCommand::readData(correctDist* correct, seqNoise& noise, vector<string>& seqs, vector<string>& uNames, vector<string>& rNames, vector<int>& freq) {
	try {
		map<string, string> nameMap;
		map<string, string>::iterator it;
		m->readNames(namefile, nameMap);
		bool error = false;

		ifstream in;
		m->openInputFile(fastafile, in);

		while (!in.eof()) {

			if (m->control_pressed) { in.close(); return 0; }

			Sequence seq(in); m->gobble(in);

			if
(seq.getName() != "") { correct->addSeq(seq.getName(), seq.getAligned()); it = nameMap.find(seq.getName()); if (it != nameMap.end()) { noise.addSeq(seq.getAligned(), seqs); noise.addRedundantName(it->first, it->second, uNames, rNames, freq); }else { m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your namefile, please correct."); error = true; } } } in.close(); if (error) { m->control_pressed = true; } return seqs.size(); }catch(exception& e) { m->errorOut(e, "ShhhSeqsCommand", "readData"); exit(1); } } //********************************************************************************************************************** int ShhhSeqsCommand::loadData(correctDist* correct, seqNoise& noise, vector& seqs, vector& uNames, vector& rNames, vector& freq, map& nameMap, vector& sequences) { try { bool error = false; map::iterator it; for (int i = 0; i < sequences.size(); i++) { if (m->control_pressed) { return 0; } if (sequences[i].getName() != "") { correct->addSeq(sequences[i].getName(), sequences[i].getAligned()); it = nameMap.find(sequences[i].getName()); if (it != nameMap.end()) { noise.addSeq(sequences[i].getAligned(), seqs); noise.addRedundantName(it->first, it->second, uNames, rNames, freq); }else { m->mothurOut("[ERROR]: " + sequences[i].getName() + " is in your fasta file and not in your namefile, please correct."); error = true; } } } if (error) { m->control_pressed = true; } return seqs.size(); }catch(exception& e) { m->errorOut(e, "ShhhSeqsCommand", "loadData"); exit(1); } } /**************************************************************************************************/ vector ShhhSeqsCommand::createProcessesGroups(SequenceParser& parser, string newFName, string newNName, string newMName, vector groups) { try { vector processIDS; int process = 1; vector mapfileNames; bool recalc = false; //sanity check if (groups.size() < processors) { processors = groups.size(); } //divide the groups between the processors vector lines; int 
remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ mapfileNames = driverGroups(parser, newFName + m->mothurGetpid(process) + ".temp", newNName + m->mothurGetpid(process) + ".temp", newMName, lines[process].start, lines[process].end, groups); //pass filenames to parent ofstream out; string tempFile = newMName + m->mothurGetpid(process) + ".temp"; m->openOutputFile(tempFile, out); out << mapfileNames.size() << endl; for (int i = 0; i < mapfileNames.size(); i++) { out << mapfileNames[i] << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(newFName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(newNName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(newMName + (toString(processIDS[i]) + ".temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); int remainingPairs = groups.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, (startIndex+numPairs))); //startIndex, endIndex startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } mapfileNames.clear(); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ mapfileNames = driverGroups(parser, newFName + m->mothurGetpid(process) + ".temp", newNName + m->mothurGetpid(process) + ".temp", newMName, lines[process].start, lines[process].end, groups); //pass filenames to parent ofstream out; string tempFile = newMName + m->mothurGetpid(process) + ".temp"; m->openOutputFile(tempFile, out); out << mapfileNames.size() << endl; for (int i = 0; i < mapfileNames.size(); i++) { out << mapfileNames[i] << endl; } out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part mapfileNames = driverGroups(parser, newFName, newNName, newMName, lines[0].start, lines[0].end, groups); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); if (!in.eof()) { int tempNum = 0; in >> tempNum; 
m->gobble(in); for (int j = 0; j < tempNum; j++) { string filename; in >> filename; m->gobble(in); mapfileNames.push_back(filename); } } in.close(); m->mothurRemove(tempFile); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the shhhseqsData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; icount != (pDataArray[i]->end-pDataArray[i]->start)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end-pDataArray[i]->start) + " groups assigned to it, quitting. \n"); m->control_pressed = true; } for (int j = 0; j < pDataArray[i]->mapfileNames.size(); j++) { mapfileNames.push_back(pDataArray[i]->mapfileNames[j]); } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append output files for(int i=0;iappendFiles((newFName + toString(processIDS[i]) + ".temp"), newFName); m->mothurRemove((newFName + toString(processIDS[i]) + ".temp")); m->appendFiles((newNName + toString(processIDS[i]) + ".temp"), newNName); m->mothurRemove((newNName + toString(processIDS[i]) + ".temp")); } return mapfileNames; } catch(exception& e) { m->errorOut(e, "ShhhSeqsCommand", "createProcessesGroups"); exit(1); } } /**************************************************************************************************/ vector ShhhSeqsCommand::driverGroups(SequenceParser& parser, string newFFile, string newNFile, string newMFile, int start, int end, vector groups){ try { vector mapFileNames; for (int i = start; i < end; i++) { start = time(NULL); if (m->control_pressed) { return 
mapFileNames; } m->mothurOutEndLine(); m->mothurOut("Processing group " + groups[i] + ":"); m->mothurOutEndLine(); map thisNameMap; thisNameMap = parser.getNameMap(groups[i]); vector thisSeqs = parser.getSeqs(groups[i]); vector sequences; vector uniqueNames; vector redundantNames; vector seqFreq; seqNoise noise; correctDist* correct = new correctDist(1); //we use one processor since we already split up the work load. //load this groups info in order loadData(correct, noise, sequences, uniqueNames, redundantNames, seqFreq, thisNameMap, thisSeqs); if (m->control_pressed) { return mapFileNames; } //calc distances for cluster string distFileName = outputDir + m->getRootName(m->getSimpleName(fastafile)) + groups[i] + ".shhh.dist"; correct->execute(distFileName); delete correct; if (m->control_pressed) { m->mothurRemove(distFileName); return mapFileNames; } driver(noise, sequences, uniqueNames, redundantNames, seqFreq, distFileName, newFFile+groups[i], newNFile+groups[i], newMFile+groups[i]+".map"); if (m->control_pressed) { return mapFileNames; } m->appendFiles(newFFile+groups[i], newFFile); m->mothurRemove(newFFile+groups[i]); m->appendFiles(newNFile+groups[i], newNFile); m->mothurRemove(newNFile+groups[i]); mapFileNames.push_back(newMFile+groups[i]+".map"); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to process group " + groups[i] + "."); m->mothurOutEndLine(); } return mapFileNames; } catch(exception& e) { m->errorOut(e, "ShhhSeqsCommand", "driverGroups"); exit(1); } } //********************************************************************************************************************** int ShhhSeqsCommand::driver(seqNoise& noise, vector& sequences, vector& uniqueNames, vector& redundantNames, vector& seqFreq, string distFileName, string outputFileName, string nameFileName, string mapFileName) { try { double cutOff = 0.08; int minIter = 10; int maxIter = 1000; double minDelta = 1e-6; int numIters = 0; double maxDelta = 1e6; int numSeqs = 
sequences.size();

		//run cluster command
		string inputString = "phylip=" + distFileName + ", method=furthest, cutoff=0.08";
		m->mothurOut("/******************************************/"); m->mothurOutEndLine();
		m->mothurOut("Running command: cluster(" + inputString + ")"); m->mothurOutEndLine();

		Command* clusterCommand = new ClusterCommand(inputString);
		clusterCommand->execute();

		map<string, vector<string> > filenames = clusterCommand->getOutputFiles();
		string listFileName = filenames["list"][0];
		string rabundFileName = filenames["rabund"][0]; m->mothurRemove(rabundFileName);
		string sabundFileName = filenames["sabund"][0]; m->mothurRemove(sabundFileName);

		delete clusterCommand;
		m->mothurOut("/******************************************/"); m->mothurOutEndLine();

		if (m->control_pressed) { m->mothurRemove(distFileName); m->mothurRemove(listFileName); return 0; }

		vector<double> distances(numSeqs * numSeqs);
		noise.getDistanceData(distFileName, distances);
		m->mothurRemove(distFileName);
		if (m->control_pressed) { m->mothurRemove(listFileName); return 0; }

		vector<int> otuData(numSeqs);
		vector<int> otuFreq;
		vector<vector<int> > otuBySeqLookUp;
		noise.getListData(listFileName, cutOff, otuData, otuFreq, otuBySeqLookUp);
		m->mothurRemove(listFileName);
		if (m->control_pressed) { return 0; }

		int numOTUs = otuFreq.size();

		vector<double> weights(numOTUs, 0);
		vector<int> change(numOTUs, 1);
		vector<int> centroids(numOTUs, -1);
		vector<int> cumCount(numOTUs, 0);

		vector<double> tau(numSeqs, 1);
		vector<int> anP(numSeqs, 0);
		vector<int> anI(numSeqs, 0);
		vector<int> anN(numSeqs, 0);
		vector<vector<int> > aanI = otuBySeqLookUp;

		while(numIters < minIter || ((maxDelta > minDelta) && (numIters < maxIter))){

			if (m->control_pressed) { return 0; }

			noise.updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); if (m->control_pressed) { return 0; }
			maxDelta = noise.calcNewWeights(weights, seqFreq, anI, cumCount, anP, otuFreq, tau); if (m->control_pressed) { return 0; }

			noise.calcCentroids(anI, anP, change, centroids, cumCount, distances, seqFreq, otuFreq, tau); if (m->control_pressed) { return 0; }
			noise.checkCentroids(weights, centroids); if (m->control_pressed) { return 0; }

			otuFreq.assign(numOTUs, 0);

			int total = 0;

			for(int i=0;i<numSeqs;i++){

				if (m->control_pressed) { return 0; }

				double offset = 1e6;
				double norm = 0.0000;
				double minWeight = 0.1;
				vector<double> currentTau(numOTUs);

				for(int j=0;j<numOTUs;j++){
					if (m->control_pressed) { return 0; }
					if(weights[j] > minWeight && distances[i * numSeqs+centroids[j]] < offset){
						offset = distances[i * numSeqs+centroids[j]];
					}
				}

				for(int j=0;j<numOTUs;j++){
					if (m->control_pressed) { return 0; }
					if(weights[j] > minWeight){
						currentTau[j] = exp(sigma * (-distances[(i * numSeqs + centroids[j])] + offset)) * weights[j];
						norm += currentTau[j];
					}
					else{
						currentTau[j] = 0.0000;
					}
				}

				for(int j=0;j<numOTUs;j++){
					if (m->control_pressed) { return 0; }
					currentTau[j] /= norm;
				}

				for(int j=0;j<numOTUs;j++){
					if (m->control_pressed) { return 0; }

					if(currentTau[j] > 1.0e-4){
						int oldTotal = total;
						total++;

						tau.resize(oldTotal+1);
						tau[oldTotal] = currentTau[j];
						otuBySeqLookUp[j][otuFreq[j]] = oldTotal;
						aanI[j][otuFreq[j]] = i;
						otuFreq[j]++;
					}
				}

				anP.resize(total);
				anI.resize(total);
			}

			numIters++;
		}

		noise.updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); if (m->control_pressed) { return 0; }

		vector<double> percentage(numSeqs);
		noise.setUpOTUData(otuData, percentage, cumCount, tau, otuFreq, anP, anI); if (m->control_pressed) { return 0; }
		noise.finishOTUData(otuData, otuFreq, anP, anI, cumCount, otuBySeqLookUp, aanI, tau); if (m->control_pressed) { return 0; }

		change.assign(numOTUs, 1);
		noise.calcCentroids(anI, anP, change, centroids, cumCount, distances, seqFreq, otuFreq, tau); if (m->control_pressed) { return 0; }

		vector<double> finalTau(numOTUs, 0);
		for(int i=0;i<numSeqs;i++){
			if (m->control_pressed) { return 0; }
			finalTau[otuData[i]] += int(seqFreq[i]);
		}

		noise.writeOutput(outputFileName, nameFileName, mapFileName, finalTau, centroids, otuData, sequences, uniqueNames, redundantNames, seqFreq, distances);

		return 0;

	}catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "driver");
		exit(1);
	}
}
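/*
 The per-sequence update inside driver() above can be read as a weighted soft
 assignment with a numerical-stability offset. What follows is a minimal
 standalone sketch of that single step, not mothur's implementation: the name
 softAssign and all of its parameters are hypothetical, invSigma is assumed to
 play the role of the member sigma after the constructor has replaced it with
 1/sigma, and the guard against a zero norm is an addition that driver() does
 not have.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: soft-assigns one sequence to the OTU centroids.
//   dist[j]    distance from the sequence to the centroid of OTU j
//   weight[j]  current mixture weight of OTU j
//   invSigma   assumed to be 1/sigma, as the constructor stores it
//   minWeight  OTUs at or below this weight are skipped (driver() uses 0.1)
// Returns tau[j], the normalized share of the sequence given to OTU j.
std::vector<double> softAssign(const std::vector<double>& dist,
                               const std::vector<double>& weight,
                               double invSigma, double minWeight) {
    assert(dist.size() == weight.size());
    const std::size_t n = dist.size();

    // Smallest distance among the active OTUs. Subtracting it below makes the
    // largest exponent exactly zero, so exp() cannot underflow for every OTU.
    double offset = 1e6;
    for (std::size_t j = 0; j < n; j++) {
        if (weight[j] > minWeight && dist[j] < offset) { offset = dist[j]; }
    }

    // Unnormalized shares: weight times exp(-(distance - offset) / sigma).
    std::vector<double> tau(n, 0.0);
    double norm = 0.0;
    for (std::size_t j = 0; j < n; j++) {
        if (weight[j] > minWeight) {
            tau[j] = std::exp(invSigma * (offset - dist[j])) * weight[j];
            norm += tau[j];
        }
    }

    // Normalize so the shares sum to 1. If no OTU is active, leave all zeros.
    if (norm > 0.0) {
        for (std::size_t j = 0; j < n; j++) { tau[j] /= norm; }
    }
    return tau;
}
```

 With distances {0.0, 0.01}, equal weights and invSigma = 100, the nearer
 centroid receives about 0.73 of the sequence and the farther one about 0.27.
 driver() then keeps only the shares above 1.0e-4.
*/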
//**********************************************************************************************************************
int ShhhSeqsCommand::deconvoluteResults(string fastaFile, string nameFile){
	try {
		m->mothurOutEndLine(); m->mothurOut("Deconvoluting results:"); m->mothurOutEndLine(); m->mothurOutEndLine();

		//use unique.seqs to create new name and fastafile
		string inputString = "fasta=" + fastaFile + ", name=" + nameFile;
		m->mothurOut("/******************************************/"); m->mothurOutEndLine();
		m->mothurOut("Running command: unique.seqs(" + inputString + ")"); m->mothurOutEndLine();
		m->mothurCalling = true;

		Command* uniqueCommand = new DeconvoluteCommand(inputString);
		uniqueCommand->execute();

		map<string, vector<string> > filenames = uniqueCommand->getOutputFiles();

		delete uniqueCommand;
		m->mothurCalling = false;
		m->mothurOut("/******************************************/"); m->mothurOutEndLine();

		string newnameFile = filenames["name"][0];
		string newfastaFile = filenames["fasta"][0];

		m->mothurRemove(fastaFile); rename(newfastaFile.c_str(), fastaFile.c_str());
		if (nameFile != newnameFile) { m->mothurRemove(nameFile); rename(newnameFile.c_str(), nameFile.c_str()); }

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ShhhSeqsCommand", "deconvoluteResults");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/shhhseqscommand.h000066400000000000000000000301211255543666200215720ustar00rootroot00000000000000
#ifndef SHHHSEQSCOMMAND_H
#define SHHHSEQSCOMMAND_H

/*
 *  shhhseqscommand.h
 *  Mothur
 *
 *  Created by westcott on 11/8/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "command.hpp" #include "myseqdist.h" #include "seqnoise.h" #include "sequenceparser.h" #include "deconvolutecommand.h" #include "clustercommand.h" //********************************************************************************************************************** class ShhhSeqsCommand : public Command { public: ShhhSeqsCommand(string); ShhhSeqsCommand(); ~ShhhSeqsCommand() {} vector setParameters(); string getCommandName() { return "shhh.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Schloss PD, Gevers D, Westcott SL (2011). Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies. PLoS ONE. 6:e27310.\nQuince C, Lanzen A, Davenport RJ, Turnbaugh PJ (2011). Removing noise from pyrosequenced amplicons. BMC Bioinformatics 12:38.\nhttp://www.mothur.org/wiki/Shhh.seqs"; } string getDescription() { return "shhh.seqs"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; string outputDir, fastafile, namefile, groupfile; int processors; double sigma; vector outputNames; int readData(correctDist*, seqNoise&, vector&, vector&, vector&, vector&); int loadData(correctDist*, seqNoise&, vector&, vector&, vector&, vector&, map&, vector&); int driver(seqNoise&, vector&, vector&, vector&, vector&, string, string, string, string); vector driverGroups(SequenceParser&, string, string, string, int, int, vector); vector createProcessesGroups(SequenceParser&, string, string, string, vector); int deconvoluteResults(string, string); }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct shhhseqsData {
    string fastafile;
    string namefile;
    string groupfile;
    string newFName, newNName, newMName;
    MothurOut* m;
    int start;
    int end;
    double sigma; //must be double: the command stores sigma as a double, so an int member would truncate it
    int threadID, count;
    vector<string> groups;
    vector<string> mapfileNames;

    shhhseqsData(){}
    shhhseqsData(string f, string n, string g, string nff, string nnf, string nmf, vector<string> gr, MothurOut* mout, int st, int en, double s, int tid) {
        fastafile = f;
        namefile = n;
        groupfile = g;
        newFName = nff;
        newNName = nnf;
        newMName = nmf;
        m = mout;
        start = st;
        end = en;
        sigma = s;
        threadID = tid;
        groups = gr;
        count=0;
    }
};

/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MyShhhSeqsThreadFunction(LPVOID lpParam){
    shhhseqsData* pDataArray;
    pDataArray = (shhhseqsData*)lpParam;

    try {
        //parse fasta and name file by group
        SequenceParser parser(pDataArray->groupfile, pDataArray->fastafile, pDataArray->namefile);

        //precluster each group
        for (int k = pDataArray->start; k < pDataArray->end; k++) {

            pDataArray->count++;

            int start = time(NULL);

            if (pDataArray->m->control_pressed) { return 0; }

            pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Processing group " + pDataArray->groups[k] + ":"); pDataArray->m->mothurOutEndLine();

            map<string, string> thisNameMap;
            thisNameMap = parser.getNameMap(pDataArray->groups[k]);
            vector<Sequence> thisSeqs = parser.getSeqs(pDataArray->groups[k]);

            if (pDataArray->m->control_pressed) { return 0; }

            vector<string> sequences;
            vector<string> uniqueNames;
            vector<string> redundantNames;
            vector<int> seqFreq;

            seqNoise noise;
            correctDist* correct = new correctDist(1); //we use one processor since we already split up the work load.
//load this groups info in order //loadData(correct, noise, sequences, uniqueNames, redundantNames, seqFreq, thisNameMap, thisSeqs); ////////////////////////////////////////////////////////////////////////////////////////////////// bool error = false; map::iterator it; for (int i = 0; i < thisSeqs.size(); i++) { if (pDataArray->m->control_pressed) { return 0; } if (thisSeqs[i].getName() != "") { correct->addSeq(thisSeqs[i].getName(), thisSeqs[i].getAligned()); it = thisNameMap.find(thisSeqs[i].getName()); if (it != thisNameMap.end()) { noise.addSeq(thisSeqs[i].getAligned(), sequences); noise.addRedundantName(it->first, it->second, uniqueNames, redundantNames, seqFreq); }else { pDataArray->m->mothurOut("[ERROR]: " + thisSeqs[i].getName() + " is in your fasta file and not in your namefile, please correct."); error = true; } } } if (error) { return 0; } ////////////////////////////////////////////////////////////////////////////////////////////////// if (pDataArray->m->control_pressed) { return 0; } //calc distances for cluster string distFileName = pDataArray->m->getRootName(pDataArray->m->getSimpleName(pDataArray->fastafile)) + pDataArray->groups[k] + ".shhh.dist"; correct->execute(distFileName); delete correct; if (pDataArray->m->control_pressed) { pDataArray->m->mothurRemove(distFileName); return 0; } //driver(noise, sequences, uniqueNames, redundantNames, seqFreq, distFileName, newFFile+groups[i], newNFile+groups[i], newMFile+groups[i]+".map"); /////////////////////////////////////////////////////////////////////////////////////////////////// double cutOff = 0.08; int minIter = 10; int maxIter = 1000; double minDelta = 1e-6; int numIters = 0; double maxDelta = 1e6; int numSeqs = sequences.size(); //run cluster command string inputString = "phylip=" + distFileName + ", method=furthest, cutoff=0.08"; pDataArray->m->mothurOut("/******************************************/"); pDataArray->m->mothurOutEndLine(); pDataArray->m->mothurOut("Running command: cluster(" + 
inputString + ")"); pDataArray->m->mothurOutEndLine(); Command* clusterCommand = new ClusterCommand(inputString); clusterCommand->execute(); map > filenames = clusterCommand->getOutputFiles(); string listFileName = filenames["list"][0]; string rabundFileName = filenames["rabund"][0]; pDataArray->m->mothurRemove(rabundFileName); string sabundFileName = filenames["sabund"][0]; pDataArray->m->mothurRemove(sabundFileName); delete clusterCommand; pDataArray->m->mothurOut("/******************************************/"); pDataArray->m->mothurOutEndLine(); if (pDataArray->m->control_pressed) { pDataArray->m->mothurRemove(distFileName); pDataArray->m->mothurRemove(listFileName); return 0; } vector distances(numSeqs * numSeqs); noise.getDistanceData(distFileName, distances); pDataArray->m->mothurRemove(distFileName); if (pDataArray->m->control_pressed) { pDataArray->m->mothurRemove(listFileName); return 0; } vector otuData(numSeqs); vector otuFreq; vector > otuBySeqLookUp; noise.getListData(listFileName, cutOff, otuData, otuFreq, otuBySeqLookUp); pDataArray->m->mothurRemove(listFileName); if (pDataArray->m->control_pressed) { return 0; } int numOTUs = otuFreq.size(); vector weights(numOTUs, 0); vector change(numOTUs, 1); vector centroids(numOTUs, -1); vector cumCount(numOTUs, 0); vector tau(numSeqs, 1); vector anP(numSeqs, 0); vector anI(numSeqs, 0); vector anN(numSeqs, 0); vector > aanI = otuBySeqLookUp; while(numIters < minIter || ((maxDelta > minDelta) && (numIters < maxIter))){ if (pDataArray->m->control_pressed) { return 0; } noise.updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); if (pDataArray->m->control_pressed) { return 0; } maxDelta = noise.calcNewWeights(weights, seqFreq, anI, cumCount, anP, otuFreq, tau); if (pDataArray->m->control_pressed) { return 0; } noise.calcCentroids(anI, anP, change, centroids, cumCount, distances, seqFreq, otuFreq, tau); if (pDataArray->m->control_pressed) { return 0; } noise.checkCentroids(weights, centroids); if 
                (pDataArray->m->control_pressed) { return 0; }

                otuFreq.assign(numOTUs, 0);

                int total = 0;

                for(int i=0;i<numSeqs;i++){
                    if (pDataArray->m->control_pressed) { return 0; }

                    double offset = 1e6;
                    double norm = 0.0000;
                    double minWeight = 0.1;
                    vector<double> currentTau(numOTUs);

                    for(int j=0;j<numOTUs;j++){
                        if (pDataArray->m->control_pressed) { return 0; }
                        if(weights[j] > minWeight && distances[i * numSeqs+centroids[j]] < offset){
                            offset = distances[i * numSeqs+centroids[j]];
                        }
                    }

                    for(int j=0;j<numOTUs;j++){
                        if (pDataArray->m->control_pressed) { return 0; }
                        if(weights[j] > minWeight){
                            currentTau[j] = exp(pDataArray->sigma * (-distances[(i * numSeqs + centroids[j])] + offset)) * weights[j];
                            norm += currentTau[j];
                        }
                        else{
                            currentTau[j] = 0.0000;
                        }
                    }

                    for(int j=0;j<numOTUs;j++){
                        if (pDataArray->m->control_pressed) { return 0; }
                        currentTau[j] /= norm;
                    }

                    for(int j=0;j<numOTUs;j++){
                        if (pDataArray->m->control_pressed) { return 0; }

                        if(currentTau[j] > 1.0e-4){
                            int oldTotal = total;
                            total++;

                            tau.resize(oldTotal+1);
                            tau[oldTotal] = currentTau[j];
                            otuBySeqLookUp[j][otuFreq[j]] = oldTotal;
                            aanI[j][otuFreq[j]] = i;
                            otuFreq[j]++;
                        }
                    }

                    anP.resize(total);
                    anI.resize(total);
                }

                numIters++;
            }

            noise.updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); if (pDataArray->m->control_pressed) { return 0; }

            vector<double> percentage(numSeqs);
            noise.setUpOTUData(otuData, percentage, cumCount, tau, otuFreq, anP, anI); if (pDataArray->m->control_pressed) { return 0; }

            noise.finishOTUData(otuData, otuFreq, anP, anI, cumCount, otuBySeqLookUp, aanI, tau); if (pDataArray->m->control_pressed) { return 0; }

            change.assign(numOTUs, 1);
            noise.calcCentroids(anI, anP, change, centroids, cumCount, distances, seqFreq, otuFreq, tau); if (pDataArray->m->control_pressed) { return 0; }

            vector<int> finalTau(numOTUs, 0);
            for(int i=0;i<numSeqs;i++){
                if (pDataArray->m->control_pressed) { return 0; }
                finalTau[otuData[i]] += int(seqFreq[i]);
            }

            noise.writeOutput(pDataArray->newFName+pDataArray->groups[k], pDataArray->newNName+pDataArray->groups[k], pDataArray->newMName+pDataArray->groups[k]+".map", finalTau, centroids, otuData, sequences, uniqueNames, redundantNames, seqFreq, distances);
/////////////////////////////////////////////////////////////////////////////////////////////////// if (pDataArray->m->control_pressed) { return 0; } pDataArray->m->appendFiles(pDataArray->newFName+pDataArray->groups[k], pDataArray->newFName); pDataArray->m->mothurRemove(pDataArray->newFName+pDataArray->groups[k]); pDataArray->m->appendFiles(pDataArray->newNName+pDataArray->groups[k], pDataArray->newNName); pDataArray->m->mothurRemove(pDataArray->newNName+pDataArray->groups[k]); pDataArray->mapfileNames.push_back(pDataArray->newMName+pDataArray->groups[k]+".map"); pDataArray->m->mothurOut("It took " + toString(time(NULL) - start) + " secs to process group " + pDataArray->groups[k] + "."); pDataArray->m->mothurOutEndLine(); } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "ShhhSeqsCommand", "MyShhhSeqsThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/sortseqscommand.cpp000066400000000000000000001503511255543666200221720ustar00rootroot00000000000000// // sortseqscommand.cpp // Mothur // // Created by Sarah Westcott on 2/3/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "sortseqscommand.h" #include "sequence.hpp" #include "qualityscores.h" //********************************************************************************************************************** vector SortSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "FNGLT", "none","fasta",false,false); parameters.push_back(pfasta); CommandParameter pflow("flow", "InputTypes", "", "", "none", "FNGLT", "none","flow",false,false); parameters.push_back(pflow); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "FNGLT", "none","name",false,false); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "FNGLT", "none","count",false,false); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "FNGLT", "none","group",false,false); parameters.push_back(pgroup); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "FNGLT", "none","taxonomy",false,false); parameters.push_back(ptaxonomy); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "FNGLT", "none","qfile",false,false); parameters.push_back(pqfile); CommandParameter plarge("large", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(plarge); CommandParameter paccnos("accnos", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(paccnos); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SortSeqsCommand", "setParameters"); exit(1); } } 
//**********************************************************************************************************************
string SortSeqsCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The sort.seqs command puts the sequences in the same order for the following file types: accnos, fasta, name, group, count, taxonomy, flow or quality file.\n";
        helpString += "The sort.seqs command parameters are accnos, fasta, name, group, count, taxonomy, flow, qfile and large.\n";
        helpString += "The accnos file allows you to specify the order you want the files in. If none is provided, mothur will use the order of the first file it reads.\n";
        helpString += "The large parameter is used to indicate that your files are too large to fit in RAM.\n";
        helpString += "The sort.seqs command should be in the following format: sort.seqs(fasta=yourFasta).\n";
        helpString += "Example: sort.seqs(fasta=amazon.fasta).\n";
        helpString += "Note: No spaces between parameter labels (i.e. fasta), '=' and parameters (i.e. yourFasta).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "SortSeqsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string SortSeqsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fasta") { pattern = "[filename],sorted,[extension]"; }
        else if (type == "taxonomy") { pattern = "[filename],sorted,[extension]"; }
        else if (type == "name") { pattern = "[filename],sorted,[extension]"; }
        else if (type == "group") { pattern = "[filename],sorted,[extension]"; }
        else if (type == "count") { pattern = "[filename],sorted,[extension]"; }
        else if (type == "flow") { pattern = "[filename],sorted,[extension]"; }
        else if (type == "qfile") { pattern = "[filename],sorted,[extension]"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e)
{ m->errorOut(e, "SortSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SortSeqsCommand::SortSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["flow"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SortSeqsCommand", "SortSeqsCommand"); exit(1); } } //********************************************************************************************************************** SortSeqsCommand::SortSeqsCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["flow"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command 
factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("accnos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["accnos"] = inputDir + it->second; } } it = parameters.find("flow"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. 
else leave path alone. if (path == "") { parameters["flow"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for parameters accnosfile = validParameter.validFile(parameters, "accnos", true); if (accnosfile == "not open") { accnosfile = ""; abort = true; } else if (accnosfile == "not found") { accnosfile = ""; } else { m->setAccnosFile(accnosfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } flowfile = validParameter.validFile(parameters, "flow", true); if (flowfile == "not open") { flowfile = ""; abort = true; } else if (flowfile == "not found") { flowfile = ""; } else { m->setFlowFile(flowfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { abort = true; } else if (taxfile == "not found") { taxfile = ""; } else { m->setTaxonomyFile(taxfile); } qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { abort = true; } else if (qualfile == "not found") { qualfile = ""; } else { m->setQualFile(qualfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if 
(countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } string temp = validParameter.validFile(parameters, "large", false); if (temp == "not found") { temp = "f"; } large = m->isTrue(temp); if ((fastafile == "") && (namefile == "") && (countfile == "") && (groupfile == "") && (taxfile == "") && (flowfile == "") && (qualfile == "")) { m->mothurOut("You must provide at least one of the following: fasta, name, group, count, taxonomy, flow or quality."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((fastafile != "") && (namefile == "")) { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "SortSeqsCommand", "SortSeqsCommand"); exit(1); } } //********************************************************************************************************************** int SortSeqsCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } //read through the correct file and output lines you want to keep if (accnosfile != "") { vector temp; m->readAccnos(accnosfile, temp); for (int i = 0; i < temp.size(); i++) { names[temp[i]] = i; } m->mothurOut("\nUsing " + accnosfile + " to determine the order. 
It contains " + toString(temp.size()) + " representative sequences.\n"); } if (fastafile != "") { readFasta(); } if (flowfile != "") { readFlow(); } if (qualfile != "") { readQual(); } if (namefile != "") { readName(); } if (groupfile != "") { readGroup(); } if (countfile != "") { readCount(); } if (taxfile != "") { readTax(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (outputNames.size() != 0) { m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } itTypes = outputTypes.find("flow"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFlowFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } } return 0; } catch(exception& e) { 
m->errorOut(e, "SortSeqsCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SortSeqsCommand::readFasta(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); outputTypes["fasta"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(fastafile, in); string name; if (names.size() != 0) {//this is not the first file we are reading so we need to use the order we already have if (large) { //if the file is too large to fit in memory we can still process it, but the io will be very time consuming. //read through the file looking for 1000 seqs at a time. Once we find them output them and start looking for the next 1000. //this way we only store 1000 seqs in memory at a time. 
                int numNames = names.size();
                int numNamesInFile = 0;

                //to make sure we dont miss any seqs, add any seqs that are not in names but in the file to the end of names
                while(!in.eof()){
                    if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                    Sequence currSeq(in);
                    name = currSeq.getName();

                    if (name != "") {
                        numNamesInFile++;
                        map<string, int>::iterator it = names.find(name);
                        if (it == names.end()) {
                            names[name] = numNames; numNames++;
                            m->mothurOut(name + " was not contained in the file which determined the order, adding it to the end.\n");
                        }
                    }
                    m->gobble(in);
                }
                in.close();
                out.close();

                int numLeft = names.size();
                if (numNamesInFile < numLeft) { numLeft = numNamesInFile; }

                int size = 1000; //assume that user can hold 1000 seqs in memory
                if (numLeft < size) { size = numLeft; }
                int times = 0;

                vector<Sequence> seqs; seqs.resize(size);
                for (int i = 0; i < seqs.size(); i++) { seqs[i].setName(""); } //this is so if some of the seqs are missing we dont print out garbage

                while (numLeft > 0) {

                    ifstream in2;
                    m->openInputFile(fastafile, in2);

                    if (m->control_pressed) { in2.close(); m->mothurRemove(outputFileName); return 0; }

                    int found = 0;
                    int needToFind = size;
                    if (numLeft < size) { needToFind = numLeft; }

                    while(!in2.eof()){
                        if (m->control_pressed) { in2.close(); m->mothurRemove(outputFileName); return 0; }

                        //stop reading if we already found the seqs we are looking for
                        if (found >= needToFind) { break; }

                        Sequence currSeq(in2);
                        name = currSeq.getName();

                        if (name != "") {
                            map<string, int>::iterator it = names.find(name);
                            if (it != names.end()) { //we found it, so put it in the vector in the right place.
                                //is it in the set of seqs we are looking for this time around
                                int thisSeqsPlace = it->second;
                                thisSeqsPlace -= (times * size);
                                if ((thisSeqsPlace < size) && (thisSeqsPlace >= 0)) {
                                    seqs[thisSeqsPlace] = currSeq;
                                    found++;
                                }
                            }else { m->mothurOut("[ERROR]: in logic of readFasta function.\n"); m->control_pressed = true; }
                        }
                        m->gobble(in2);
                    }
                    in2.close();

                    ofstream out2;
                    m->openOutputFileAppend(outputFileName, out2);

                    int output = seqs.size();
                    if (numLeft < seqs.size()) { output = numLeft; }

                    for (int i = 0; i < output; i++) {
                        if (seqs[i].getName() != "") { seqs[i].printSequence(out2); }
                    }
                    out2.close();

                    times++;
                    numLeft -= output;
                }

                m->mothurOut("Ordered " + toString(numNamesInFile) + " sequences from " + fastafile + ".\n");
            }else {
                vector<Sequence> seqs; seqs.resize(names.size());
                for (int i = 0; i < seqs.size(); i++) { seqs[i].setName(""); } //this is so if some of the seqs are missing we dont print out garbage

                while(!in.eof()){
                    if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                    Sequence currSeq(in);
                    name = currSeq.getName();

                    if (name != "") {
                        map<string, int>::iterator it = names.find(name);
                        if (it != names.end()) { //we found it, so put it in the vector in the right place.
                            seqs[it->second] = currSeq;
                        }else { //if we cant find it then add it to the end
                            names[name] = seqs.size();
                            seqs.push_back(currSeq);
                            m->mothurOut(name + " was not contained in the file which determined the order, adding it to the end.\n");
                        }
                    }
                    m->gobble(in);
                }
                in.close();

                int count = 0;
                for (int i = 0; i < seqs.size(); i++) {
                    if (seqs[i].getName() != "") {
                        seqs[i].printSequence(out); count++;
                    }
                }
                out.close();

                m->mothurOut("Ordered " + toString(count) + " sequences from " + fastafile + ".\n");
            }

        }else { //read in file to fill names
            int count = 0;

            while(!in.eof()){
                if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                Sequence currSeq(in);
                name = currSeq.getName();

                if (name != "") {
                    //if this name is in the accnos file
                    names[name] = count;
                    count++;
                    currSeq.printSequence(out);
                }
                m->gobble(in);
            }
            in.close();
            out.close();

            m->mothurOut("\nUsing " + fastafile + " to determine the order. It contains " + toString(count) + " sequences.\n");
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SortSeqsCommand", "readFasta");
        exit(1);
    }
}
//**********************************************************************************************************************
int SortSeqsCommand::readFlow(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(flowfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(flowfile));
        variables["[extension]"] = m->getExtension(flowfile);
        string outputFileName = getOutputFileName("flow", variables);
        outputTypes["flow"].push_back(outputFileName); outputNames.push_back(outputFileName);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(flowfile, in);
        int numFlows;
        string name;

        in >> numFlows; m->gobble(in);

        if (names.size() != 0) {//this is not the first file we are reading so we need to use the order we already have

            if (large) { //if the file is too large to fit in memory we can still process it,
                //but the io will be very time consuming.
                //read through the file looking for 1000 seqs at a time. Once we find them output them and start looking for the next 1000.
                //this way we only store 1000 seqs in memory at a time.

                int numNames = names.size();
                int numNamesInFile = 0;

                //to make sure we dont miss any seqs, add any seqs that are not in names but in the file to the end of names
                while(!in.eof()){
                    if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                    in >> name;
                    string rest = m->getline(in);

                    if (name != "") {
                        numNamesInFile++;
                        map<string, int>::iterator it = names.find(name);
                        if (it == names.end()) {
                            names[name] = numNames; numNames++;
                            m->mothurOut(name + " was not contained in the file which determined the order, adding it to the end.\n");
                        }
                    }
                    m->gobble(in);
                }
                in.close();
                out.close();

                int numLeft = names.size();
                if (numNamesInFile < numLeft) { numLeft = numNamesInFile; }

                int size = 1000; //assume that user can hold 1000 seqs in memory
                if (numLeft < size) { size = numLeft; }
                int times = 0;

                vector<string> seqs; seqs.resize(size, "");

                while (numLeft > 0) {

                    ifstream in2;
                    m->openInputFile(flowfile, in2); in2 >> numFlows; m->gobble(in2);

                    if (m->control_pressed) { in2.close(); m->mothurRemove(outputFileName); return 0; }

                    int found = 0;
                    int needToFind = size;
                    if (numLeft < size) { needToFind = numLeft; }

                    while(!in2.eof()){
                        if (m->control_pressed) { in2.close(); m->mothurRemove(outputFileName); return 0; }

                        //stop reading if we already found the seqs we are looking for
                        if (found >= needToFind) { break; }

                        in2 >> name;
                        string rest = m->getline(in2);

                        if (name != "") {
                            map<string, int>::iterator it = names.find(name);
                            if (it != names.end()) { //we found it, so put it in the vector in the right place.
//is it in the set of seqs we are looking for this time around int thisSeqsPlace = it->second; thisSeqsPlace -= (times * size); if ((thisSeqsPlace < size) && (thisSeqsPlace >= 0)) { seqs[thisSeqsPlace] = (name +'\t' + rest); found++; } }else { m->mothurOut("[ERROR]: in logic of readFlow function.\n"); m->control_pressed = true; } } m->gobble(in2); } in2.close(); ofstream out2; m->openOutputFileAppend(outputFileName, out2); int output = seqs.size(); if (numLeft < seqs.size()) { output = numLeft; } for (int i = 0; i < output; i++) { if (seqs[i] != "") { out2 << seqs[i] << endl; } } out2.close(); times++; numLeft -= output; } m->mothurOut("Ordered " + toString(numNamesInFile) + " flows from " + flowfile + ".\n"); }else { vector seqs; seqs.resize(names.size(), ""); while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; string rest = m->getline(in); if (name != "") { map::iterator it = names.find(name); if (it != names.end()) { //we found it, so put it in the vector in the right place. 
seqs[it->second] = (name + '\t' + rest); }else { //if we cant find it then add it to the end names[name] = seqs.size(); seqs.push_back((name + '\t' + rest)); m->mothurOut(name + " was not in the contained the file which determined the order, adding it to the end.\n"); } } m->gobble(in); } in.close(); int count = 0; for (int i = 0; i < seqs.size(); i++) { if (seqs[i] != "") { out << seqs[i] << endl; count++; } } out.close(); m->mothurOut("Ordered " + toString(count) + " flows from " + flowfile + ".\n"); } }else { //read in file to fill names int count = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> name; string rest = m->getline(in); if (name != "") { //if this name is in the accnos file names[name] = count; count++; out << name << '\t' << rest << endl; } m->gobble(in); } in.close(); out.close(); m->mothurOut("\nUsing " + flowfile + " to determine the order. It contains " + toString(count) + " flows.\n"); } return 0; } catch(exception& e) { m->errorOut(e, "SortSeqsCommand", "readFlow"); exit(1); } } //********************************************************************************************************************** int SortSeqsCommand::readQual(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(qualfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(qualfile)); variables["[extension]"] = m->getExtension(qualfile); string outputFileName = getOutputFileName("qfile", variables); outputTypes["qfile"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(qualfile, in); string name; if (names.size() != 0) {//this is not the first file we are reading so we need to use the order we already have if (large) { //if the file is too large to fit in memory we can still process it, but the io will be very time consuming. 
//read through the file looking for 1000 seqs at a time. Once we find them output them and start looking for the next 1000. //this way we only store 1000 seqs in memory at a time. int numNames = names.size(); int numNamesInFile = 0; //to make sure we dont miss any seqs, add any seqs that are not in names but in the file to the end of names while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } QualityScores currQual; currQual = QualityScores(in); name = currQual.getName(); if (name != "") { numNamesInFile++; map::iterator it = names.find(name); if (it == names.end()) { names[name] = numNames; numNames++; m->mothurOut(name + " was not in the contained the file which determined the order, adding it to the end.\n"); } } m->gobble(in); } in.close(); out.close(); int numLeft = names.size(); if (numNamesInFile < numLeft) { numLeft = numNamesInFile; } int size = 1000; //assume that user can hold 1000 seqs in memory if (numLeft < size) { size = numLeft; } int times = 0; vector seqs; seqs.resize(size); for (int i = 0; i < seqs.size(); i++) { seqs[i].setName(""); } //this is so if some of the seqs are missing we dont print out garbage while (numLeft > 0) { ifstream in2; m->openInputFile(qualfile, in2); if (m->control_pressed) { in2.close(); m->mothurRemove(outputFileName); return 0; } int found = 0; int needToFind = size; if (numLeft < size) { needToFind = numLeft; } while(!in2.eof()){ if (m->control_pressed) { in2.close(); m->mothurRemove(outputFileName); return 0; } //stop reading if we already found the seqs we are looking for if (found >= needToFind) { break; } QualityScores currQual; currQual = QualityScores(in2); name = currQual.getName(); if (name != "") { map::iterator it = names.find(name); if (it != names.end()) { //we found it, so put it in the vector in the right place. 
//is it in the set of seqs we are looking for this time around int thisSeqsPlace = it->second; thisSeqsPlace -= (times * size); if ((thisSeqsPlace < size) && (thisSeqsPlace >= 0)) { seqs[thisSeqsPlace] = currQual; found++; } }else { m->mothurOut("[ERROR]: in logic of readQual function.\n"); m->control_pressed = true; } } m->gobble(in2); } in2.close(); ofstream out2; m->openOutputFileAppend(outputFileName, out2); int output = seqs.size(); if (numLeft < seqs.size()) { output = numLeft; } for (int i = 0; i < output; i++) { if (seqs[i].getName() != "") { seqs[i].printQScores(out2); } } out2.close(); times++; numLeft -= output; } m->mothurOut("Ordered " + toString(numNamesInFile) + " sequences from " + qualfile + ".\n"); }else { vector seqs; seqs.resize(names.size()); for (int i = 0; i < seqs.size(); i++) { seqs[i].setName(""); } //this is so if some of the seqs are missing we dont print out garbage while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } QualityScores currQual; currQual = QualityScores(in); name = currQual.getName(); if (name != "") { map::iterator it = names.find(name); if (it != names.end()) { //we found it, so put it in the vector in the right place. 
seqs[it->second] = currQual; }else { //if we cant find it then add it to the end names[name] = seqs.size(); seqs.push_back(currQual); m->mothurOut(name + " was not in the contained the file which determined the order, adding it to the end.\n"); } } m->gobble(in); } in.close(); int count = 0; for (int i = 0; i < seqs.size(); i++) { if (seqs[i].getName() != "") { seqs[i].printQScores(out); count++; } } out.close(); m->mothurOut("Ordered " + toString(count) + " sequences from " + qualfile + ".\n"); } }else { //read in file to fill names int count = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } QualityScores currQual; currQual = QualityScores(in); m->gobble(in); if (currQual.getName() != "") { //if this name is in the accnos file names[currQual.getName()] = count; count++; currQual.printQScores(out); } m->gobble(in); } in.close(); out.close(); m->mothurOut("\nUsing " + qualfile + " to determine the order. It contains " + toString(count) + " sequences.\n"); } return 0; } catch(exception& e) { m->errorOut(e, "SortSeqsCommand", "readQual"); exit(1); } } //********************************************************************************************************************** int SortSeqsCommand::readName(){ try { string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputFileName = getOutputFileName("name", variables); outputTypes["name"].push_back(outputFileName); outputNames.push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); ifstream in; m->openInputFile(namefile, in); string name, firstCol, secondCol; if (names.size() != 0) {//this is not the first file we are reading so we need to use the order we already have vector seqs; seqs.resize(names.size(), ""); 
while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; m->gobble(in); if (firstCol != "") { map::iterator it = names.find(firstCol); if (it != names.end()) { //we found it, so put it in the vector in the right place. seqs[it->second] = firstCol + '\t' + secondCol; }else { //if we cant find it then add it to the end names[firstCol] = seqs.size(); seqs.push_back((firstCol + '\t' + secondCol)); m->mothurOut(firstCol + " was not in the contained the file which determined the order, adding it to the end.\n"); } } } in.close(); int count = 0; for (int i = 0; i < seqs.size(); i++) { if (seqs[i] != "") { out << seqs[i] << endl; count++; } } out.close(); m->mothurOut("Ordered " + toString(count) + " sequences from " + namefile + ".\n"); }else { //read in file to fill names int count = 0; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; } in >> firstCol; m->gobble(in); in >> secondCol; m->gobble(in); if (firstCol != "") { //if this name is in the accnos file names[firstCol] = count; count++; out << firstCol << '\t' << secondCol << endl; } m->gobble(in); } in.close(); out.close(); m->mothurOut("\nUsing " + namefile + " to determine the order. 
It contains " + toString(count) + " representative sequences.\n");
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SortSeqsCommand", "readName");
        exit(1);
    }
}
//**********************************************************************************************************************
int SortSeqsCommand::readCount(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(countfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(countfile));
        variables["[extension]"] = m->getExtension(countfile);
        string outputFileName = getOutputFileName("count", variables);
        outputTypes["count"].push_back(outputFileName);
        outputNames.push_back(outputFileName);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(countfile, in);
        string firstCol, rest;

        if (names.size() != 0) { //this is not the first file we are reading so we need to use the order we already have

            vector<string> seqs;
            seqs.resize(names.size(), "");

            string headers = m->getline(in); m->gobble(in);

            while(!in.eof()){
                if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                in >> firstCol; m->gobble(in);
                rest = m->getline(in); m->gobble(in);

                if (firstCol != "") {
                    map<string, int>::iterator it = names.find(firstCol);
                    if (it != names.end()) { //we found it, so put it in the vector in the right place.
                        seqs[it->second] = firstCol + '\t' + rest;
                    }else { //if we can't find it then add it to the end
                        names[firstCol] = seqs.size();
                        seqs.push_back((firstCol + '\t' + rest));
                        m->mothurOut(firstCol + " was not contained in the file which determined the order, adding it to the end.\n");
                    }
                }
            }
            in.close();

            int count = 0;
            out << headers << endl;
            for (int i = 0; i < seqs.size(); i++) {
                if (seqs[i] != "") { out << seqs[i] << endl; count++; }
            }
            out.close();

            m->mothurOut("Ordered " + toString(count) + " sequences from " + countfile + ".\n");

        }else { //read in file to fill names
            int count = 0;

            string headers = m->getline(in); m->gobble(in);
            out << headers << endl;

            while(!in.eof()){
                if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                in >> firstCol; m->gobble(in);
                rest = m->getline(in); m->gobble(in);

                if (firstCol != "") {
                    //record the position of this name; it defines the order for later files
                    names[firstCol] = count;
                    count++;
                    out << firstCol << '\t' << rest << endl;
                }
                m->gobble(in);
            }
            in.close();
            out.close();

            m->mothurOut("\nUsing " + countfile + " to determine the order. 
It contains " + toString(count) + " representative sequences.\n");
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SortSeqsCommand", "readCount");
        exit(1);
    }
}
//**********************************************************************************************************************
int SortSeqsCommand::readGroup(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(groupfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(groupfile));
        variables["[extension]"] = m->getExtension(groupfile);
        string outputFileName = getOutputFileName("group", variables);
        outputTypes["group"].push_back(outputFileName);
        outputNames.push_back(outputFileName);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(groupfile, in);
        string name, group;

        if (names.size() != 0) { //this is not the first file we are reading so we need to use the order we already have

            vector<string> seqs;
            seqs.resize(names.size(), "");

            while(!in.eof()){
                if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                in >> name; m->gobble(in);
                in >> group; m->gobble(in);

                if (name != "") {
                    map<string, int>::iterator it = names.find(name);
                    if (it != names.end()) { //we found it, so put it in the vector in the right place.
                        seqs[it->second] = name + '\t' + group;
                    }else { //if we can't find it then add it to the end
                        names[name] = seqs.size();
                        seqs.push_back((name + '\t' + group));
                        m->mothurOut(name + " was not contained in the file which determined the order, adding it to the end.\n");
                    }
                }
            }
            in.close();

            int count = 0;
            for (int i = 0; i < seqs.size(); i++) {
                if (seqs[i] != "") { out << seqs[i] << endl; count++; }
            }
            out.close();

            m->mothurOut("Ordered " + toString(count) + " sequences from " + groupfile + ".\n");

        }else { //read in file to fill names
            int count = 0;

            while(!in.eof()){
                if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                in >> name; m->gobble(in);
                in >> group; m->gobble(in);

                if (name != "") {
                    //record the position of this name; it defines the order for later files
                    names[name] = count;
                    count++;
                    out << name << '\t' << group << endl;
                }
                m->gobble(in);
            }
            in.close();
            out.close();

            m->mothurOut("\nUsing " + groupfile + " to determine the order. It contains " + toString(count) + " sequences.\n");
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SortSeqsCommand", "readGroup");
        exit(1);
    }
}
//**********************************************************************************************************************
int SortSeqsCommand::readTax(){
    try {
        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(taxfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(taxfile));
        variables["[extension]"] = m->getExtension(taxfile);
        string outputFileName = getOutputFileName("taxonomy", variables);
        outputTypes["taxonomy"].push_back(outputFileName);
        outputNames.push_back(outputFileName);

        ofstream out;
        m->openOutputFile(outputFileName, out);

        ifstream in;
        m->openInputFile(taxfile, in);
        string name, tax;

        if (names.size() != 0) { //this is not the first file we are reading so we need to use the order we already have

            vector<string> seqs;
            seqs.resize(names.size(), "");

            while(!in.eof()){
                if (m->control_pressed) { in.close();
                    out.close(); m->mothurRemove(outputFileName); return 0; }

                in >> name; m->gobble(in);
                in >> tax; m->gobble(in);

                if (name != "") {
                    map<string, int>::iterator it = names.find(name);
                    if (it != names.end()) { //we found it, so put it in the vector in the right place.
                        seqs[it->second] = name + '\t' + tax;
                    }else { //if we can't find it then add it to the end
                        names[name] = seqs.size();
                        seqs.push_back((name + '\t' + tax));
                        m->mothurOut(name + " was not contained in the file which determined the order, adding it to the end.\n");
                    }
                }
            }
            in.close();

            int count = 0;
            for (int i = 0; i < seqs.size(); i++) {
                if (seqs[i] != "") { out << seqs[i] << endl; count++; }
            }
            out.close();

            m->mothurOut("Ordered " + toString(count) + " sequences from " + taxfile + ".\n");

        }else { //read in file to fill names
            int count = 0;

            while(!in.eof()){
                if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outputFileName); return 0; }

                in >> name; m->gobble(in);
                in >> tax; m->gobble(in);

                if (name != "") {
                    //record the position of this name; it defines the order for later files
                    names[name] = count;
                    count++;
                    out << name << '\t' << tax << endl;
                }
                m->gobble(in);
            }
            in.close();
            out.close();

            m->mothurOut("\nUsing " + taxfile + " to determine the order. It contains " + toString(count) + " sequences.\n");
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SortSeqsCommand", "readTax");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/sortseqscommand.h000066400000000000000000000023441255543666200216350ustar00rootroot00000000000000
#ifndef Mothur_sortseqscommand_h
#define Mothur_sortseqscommand_h

//
//  sortseqscommand.h
//  Mothur
//
//  Created by Sarah Westcott on 2/3/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "command.hpp"

class SortSeqsCommand : public Command {

public:

    SortSeqsCommand(string);
    SortSeqsCommand();
    ~SortSeqsCommand(){}

    vector<string> setParameters();
    string getCommandName()         { return "sort.seqs"; }
    string getCommandCategory()     { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Sort.seqs"; }
    string getDescription() { return "puts sequences from a fasta, name, group, quality, flow or taxonomy file in the same order"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    map<string, int> names;  //sequence name -> position in the output order
    string accnosfile, fastafile, namefile, groupfile, countfile, taxfile, qualfile, flowfile, outputDir;
    bool abort, large;
    vector<string> outputNames;

    int readFasta();
    int readFlow();
    int readName();
    int readGroup();
    int readTax();
    int readCount();
    int readQual();

};

#endif
mothur-1.36.1/source/commands/sparcccommand.cpp000066400000000000000000000672021255543666200215640ustar00rootroot00000000000000
//
//  sparcccommand.cpp
//  Mothur
//
//  Created by SarahsWork on 5/10/13.
//  Copyright (c) 2013 Schloss Lab. All rights reserved.
// #include "sparcccommand.h" //********************************************************************************************************************** vector SparccCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","outputType",false,true); parameters.push_back(pshared); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter psamplings("samplings", "Number", "", "20", "", "", "","",false,false,false); parameters.push_back(psamplings); CommandParameter piterations("iterations", "Number", "", "10", "", "", "","",false,false,false); parameters.push_back(piterations); CommandParameter ppermutations("permutations", "Number", "", "1000", "", "", "","",false,false,false); parameters.push_back(ppermutations); CommandParameter pmethod("method", "Multiple", "relabund-dirichlet", "dirichlet", "", "", "","",false,false); parameters.push_back(pmethod); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); //every command must have inputdir and outputdir. This allows mothur users to redirect input and output files. 
CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SparccCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SparccCommand::getHelpString(){ try { string helpString = ""; helpString += "The sparcc command allows you to ....\n"; helpString += "The sparcc command parameters are: shared, groups, label, samplings, iterations, permutations, processors and method.\n"; helpString += "The samplings parameter is used to .... Default=20.\n"; helpString += "The iterations parameter is used to ....Default=10.\n"; helpString += "The permutations parameter is used to ....Default=1000.\n"; helpString += "The method parameter is used to ....Options are relabund and dirichlet. 
Default=dirichlet.\n";
        helpString += "The default value for groups is all the groups in your sharedfile.\n";
        helpString += "The label parameter is used to analyze specific labels in your shared file.\n";
        helpString += "The sparcc command should be in the following format: sparcc(shared=yourSharedFile)\n";
        helpString += "sparcc(shared=final.an.shared)\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "SparccCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string SparccCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "corr")                 { pattern = "[filename],[distance],sparcc_correlation"; }
        else if (type == "pvalue")          { pattern = "[filename],[distance],sparcc_pvalue"; }
        else if (type == "sparccrelabund")  { pattern = "[filename],[distance],sparcc_relabund"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "SparccCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
SparccCommand::SparccCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["corr"] = tempOutNames;
        outputTypes["pvalue"] = tempOutNames;
        outputTypes["sparccrelabund"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "SparccCommand", "SparccCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
SparccCommand::SparccCommand(string option) {
    try {
        abort = false; calledHelp = false;
        allLines = 1;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            //valid parameters for this 
command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["corr"] = tempOutNames; //filetypes should be things like: shared, fasta, accnos... outputTypes["pvalue"] = tempOutNames; outputTypes["sparccrelabund"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["shared"] = inputDir + it->second; } } } //check for parameters //get shared file, it is required sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); //if user entered a file with a path then preserve it } normalizeMethod = validParameter.validFile(parameters, "method", false); if (normalizeMethod == "not found") { normalizeMethod = "dirichlet"; } if ((normalizeMethod == "dirichlet") || (normalizeMethod == "relabund")) { } else { m->mothurOut(normalizeMethod + " is not a valid method. 
Valid methods are dirichlet and relabund."); m->mothurOutEndLine(); abort = true; } string temp = validParameter.validFile(parameters, "samplings", false); if (temp == "not found"){ temp = "20"; } m->mothurConvert(temp, numSamplings); if(normalizeMethod == "relabund"){ numSamplings = 1; } temp = validParameter.validFile(parameters, "iterations", false); if (temp == "not found"){ temp = "10"; } m->mothurConvert(temp, maxIterations); temp = validParameter.validFile(parameters, "permutations", false); if (temp == "not found"){ temp = "1000"; } m->mothurConvert(temp, numPermutations); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); string groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } m->setGroups(Groups); string label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } } } catch(exception& e) { m->errorOut(e, "SparccCommand", "SparccCommand"); exit(1); } } //********************************************************************************************************************** int SparccCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); InputData input(sharedfile, "sharedfile"); vector lookup = input.getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } if (m->control_pressed) { return 0; } //get next line to process lookup = input.getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input.getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " seconds to process."); m->mothurOutEndLine(); m->mothurOutEndLine(); //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SparccCommand", "execute"); exit(1); } } /**************************************************************************************************/ vector > SparccCommand::shuffleSharedVector(vector >& sharedVector){ try { int numGroups = (int)sharedVector.size(); int numOTUs = (int)sharedVector[0].size(); vector > shuffledVector = sharedVector; for(int i=0;ierrorOut(e, "SparccCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SparccCommand::process(vector& lookup){ try { cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); vector > sharedVector; vector otuNames = m->currentSharedBinLabels; //fill sharedVector to pass to CalcSparcc for (int i = 0; i < lookup.size(); i++) { vector abunds = lookup[i]->getAbundances(); vector temp; for (int j = 0; j < abunds.size(); j++) { temp.push_back((float) abunds[j]); } sharedVector.push_back(temp); } int numOTUs = (int)sharedVector[0].size(); int numGroups = lookup.size(); map variables; 
variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[distance]"] = lookup[0]->getLabel(); string relAbundFileName = getOutputFileName("sparccrelabund", variables); ofstream relAbundFile; m->openOutputFile(relAbundFileName, relAbundFile); outputNames.push_back(relAbundFileName); outputTypes["sparccrelabund"].push_back(relAbundFileName); relAbundFile << "OTU\taveRelAbund\n"; for(int i=0;icontrol_pressed) { relAbundFile.close(); return 0; } double relAbund = 0.0000; for(int j=0;jgetNumSeqs(); } relAbundFile << otuNames[i] <<'\t' << relAbund / (double) numGroups << endl; } relAbundFile.close(); CalcSparcc originalData(sharedVector, maxIterations, numSamplings, normalizeMethod); vector > origCorrMatrix = originalData.getRho(); string correlationFileName = getOutputFileName("corr", variables); ofstream correlationFile; m->openOutputFile(correlationFileName, correlationFile); outputNames.push_back(correlationFileName); outputTypes["corr"].push_back(correlationFileName); correlationFile.setf(ios::fixed, ios::floatfield); correlationFile.setf(ios::showpoint); for(int i=0;i > pValues = createProcesses(sharedVector, origCorrMatrix); if (m->control_pressed) { return 0; } string pValueFileName = getOutputFileName("pvalue", variables); ofstream pValueFile; m->openOutputFile(pValueFileName, pValueFile); outputNames.push_back(pValueFileName); outputTypes["pvalue"].push_back(pValueFileName); pValueFile.setf(ios::fixed, ios::floatfield); pValueFile.setf(ios::showpoint); for(int i=0;ierrorOut(e, "SparccCommand", "process"); exit(1); } } //********************************************************************************************************************** vector > SparccCommand::createProcesses(vector >& sharedVector, vector >& origCorrMatrix){ try { int numOTUs = sharedVector[0].size(); vector > pValues; bool recalc = false; if(processors == 1){ pValues = driver(sharedVector, origCorrMatrix, numPermutations); }else{ //divide iters between 
processors vector procIters; int numItersPerProcessor = numPermutations / processors; //divide iters between processes for (int h = 0; h < processors; h++) { if(h == processors - 1){ numItersPerProcessor = numPermutations - h * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } vector processIDS; int process = 1; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ pValues = driver(sharedVector, origCorrMatrix, procIters[process]); //pass pvalues to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".pvalues.temp"; m->openOutputFile(tempFile, out); //pass values for (int i = 0; i < pValues.size(); i++) { for (int j = 0; j < pValues[i].size(); j++) { out << pValues[i][j] << '\t'; } out << endl; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".pvalues.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove((toString(processIDS[i]) + ".pvalues.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); procIters.clear(); int numItersPerProcessor = numPermutations / processors; //divide iters between processes for (int h = 0; h < processors; h++) { if(h == processors - 1){ numItersPerProcessor = numPermutations - h * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } pValues.clear(); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ pValues = driver(sharedVector, origCorrMatrix, procIters[process]); //pass pvalues to parent ofstream out; string tempFile = m->mothurGetpid(process) + ".pvalues.temp"; m->openOutputFile(tempFile, out); //pass values for (int i = 0; i < pValues.size(); i++) { for (int j = 0; j < pValues[i].size(); j++) { out << pValues[i][j] << '\t'; } out << endl; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do my part pValues = driver(sharedVector, origCorrMatrix, procIters[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFile, in); ////// to do /////////// int numTemp; numTemp = 0; for (int j = 0; j < pValues.size(); j++) { for (int k = 0; k < pValues.size(); k++) { in >> numTemp; m->gobble(in); pValues[j][k] += numTemp; } m->gobble(in); } in.close(); m->mothurRemove(tempFile); } #else //fill in functions vector pDataArray; DWORD 
dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; i > copySharedVector = sharedVector; vector< vector > copyOrig = origCorrMatrix; sparccData* temp = new sparccData(m, procIters[i], copySharedVector, copyOrig, numSamplings, maxIterations, numPermutations, normalizeMethod); pDataArray.push_back(temp); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MySparccThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //do my part pValues = driver(sharedVector, origCorrMatrix, procIters[0]); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ for (int j = 0; j < pDataArray[i]->pValues.size(); j++) { for (int k = 0; k < pDataArray[i]->pValues[j].size(); k++) { pValues[j][k] += pDataArray[i]->pValues[j][k]; } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif } for(int i=0;ierrorOut(e, "SparccCommand", "createProcesses"); exit(1); } } //********************************************************************************************************************** vector > SparccCommand::driver(vector >& sharedVector, vector >& origCorrMatrix, int numPerms){ try { int numOTUs = sharedVector[0].size(); vector > sharedShuffled = sharedVector; vector > pValues(numOTUs); for(int i=0;icontrol_pressed) { return pValues; } sharedShuffled = shuffleSharedVector(sharedVector); CalcSparcc permutedData(sharedShuffled, maxIterations, numSamplings, normalizeMethod); vector > permuteCorrMatrix = permutedData.getRho(); for(int j=0;j= 0 && randValue > observedValue) { pValues[j][k]++; }//this method seems to deflate the else if(observedValue < 0 && randValue < observedValue){ pValues[j][k]++; }//pvalues of small rho values } } if((i+1) % (int)(numPermutations * 0.05) == 0){ cout << i+1 << endl; } } return pValues; } catch(exception& e) { 
m->errorOut(e, "SparccCommand", "driver"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/sparcccommand.h000066400000000000000000000116421255543666200212260ustar00rootroot00000000000000// // sparcccommand.h // Mothur // // Created by SarahsWork on 5/10/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #ifndef Mothur_sparcccommand_h #define Mothur_sparcccommand_h #include "command.hpp" #include "inputdata.h" #include "calcsparcc.h" /**************************************************************************************************/ class SparccCommand : public Command { public: SparccCommand(string); SparccCommand(); ~SparccCommand(){} vector setParameters(); string getCommandName() { return "sparcc"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getOutputPattern(string); //commmand category choices: Sequence Processing, OTU-Based Approaches, Hypothesis Testing, Phylotype Analysis, General, Clustering and Hidden string getHelpString(); string getCitation() { return "Friedman J, Alm EJ (2012) Inferring Correlation Networks from Genomic Survey Data. PLoS Comput Biol 8(9): e1002687. 
doi:10.1371/journal.pcbi.1002687 http://www.mothur.org/wiki/Sparcc"; } string getDescription() { return "Calculates correlations between OTUs using a method that is insensitive to the use of relative abundance data"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, allLines; string outputDir, sharedfile, normalizeMethod; int numSamplings, maxIterations, numPermutations, processors; set labels; vector Groups; vector outputNames; int process(vector&); vector > createProcesses(vector >&, vector >&); vector > driver(vector >&, vector >&, int); vector > shuffleSharedVector(vector >&); }; /**************************************************************************************************/ struct sparccData { MothurOut* m; int numPerms; vector< vector > sharedVector; vector< vector > origCorrMatrix; vector > pValues; int numSamplings, maxIterations, numPermutations; string normalizeMethod; sparccData(){} sparccData(MothurOut* mout, int it, vector< vector > cs, vector< vector > co, int ns, int mi, int np, string nm) { m = mout; numPerms = it; sharedVector = cs; origCorrMatrix = co; numSamplings = ns; maxIterations = mi; numPermutations = np; normalizeMethod = nm; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MySparccThreadFunction(LPVOID lpParam){ sparccData* pDataArray; pDataArray = (sparccData*)lpParam; try { int numOTUs = pDataArray->sharedVector[0].size(); vector > sharedShuffled = pDataArray->sharedVector; pDataArray->pValues.resize(numOTUs); for(int i=0;ipValues[i].assign(numOTUs, 0); } for(int i=0;inumPerms;i++){ if (pDataArray->m->control_pressed) { return 0; } //sharedShuffled = shuffleSharedVector(sharedVector); ////////////////////////////////////////////////////////// int numGroups = (int)pDataArray->sharedVector.size(); sharedShuffled = 
pDataArray->sharedVector; for(int k=0;ksharedVector[rand()%numGroups][j]; } } ///////////////////////////////////////////////////////// CalcSparcc permutedData(sharedShuffled, pDataArray->maxIterations, pDataArray->numSamplings, pDataArray->normalizeMethod); vector > permuteCorrMatrix = permutedData.getRho(); for(int j=0;jorigCorrMatrix[j][k]; if(observedValue >= 0 && randValue > observedValue) { pDataArray->pValues[j][k]++; }//this method seems to deflate the else if(observedValue < 0 && randValue < observedValue){ pDataArray->pValues[j][k]++; }//pvalues of small rho values } } if((i+1) % (int)(pDataArray->numPermutations * 0.05) == 0){ cout << i+1 << endl; } } return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "SparccCommand", "MySparccThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/splitabundcommand.cpp000066400000000000000000001475611255543666200224650ustar00rootroot00000000000000/* * splitabundcommand.cpp * Mothur * * Created by westcott on 5/17/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "splitabundcommand.h"
#include "sharedutilities.h"

//**********************************************************************************************************************
vector<string> SplitAbundCommand::setParameters(){
    try {
        CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta);
        CommandParameter pname("name", "InputTypes", "", "", "NameCount", "FNGLT", "none","name",false,false,true); parameters.push_back(pname);
        CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false); parameters.push_back(pcount);
        CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","group",false,false); parameters.push_back(pgroup);
        CommandParameter plist("list", "InputTypes", "", "", "none", "FNGLT", "none","list",false,false,true); parameters.push_back(plist);
        CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel);
        CommandParameter pcutoff("cutoff", "Number", "", "0", "", "", "","",false,true); parameters.push_back(pcutoff);
        CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups);
        CommandParameter paccnos("accnos", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(paccnos);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitAbundCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string SplitAbundCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The split.abund command reads a fasta file and a list or a names file, and splits the sequences into rare and abundant groups. \n";
        helpString += "The split.abund command parameters are fasta, list, name, count, cutoff, group, label, groups and accnos.\n";
        helpString += "The fasta parameter and a list, name or count parameter are required, and you must provide a cutoff value.\n";
        helpString += "The cutoff parameter is used to qualify what is abundant and rare.\n";
        helpString += "The group parameter allows you to parse a group file into rare and abundant groups.\n";
        helpString += "The label parameter is used to read specific labels in your listfile you want to use.\n";
        helpString += "The accnos parameter allows you to output .rare.accnos and .abund.accnos files to use with the get.seqs and remove.seqs commands.\n";
        helpString += "The groups parameter allows you to parse the files into rare and abundant files by group. \n";
        helpString += "For example if you set groups=A-B-C, you will get .A.abund, .A.rare, .B.abund, .B.rare, .C.abund, .C.rare files. \n";
        helpString += "If you want .abund and .rare files for all groups, set groups=all. \n";
        helpString += "The split.abund command should be used in the following format: split.abund(fasta=yourFasta, list=yourListFile, group=yourGroupFile, label=yourLabels, cutoff=yourCutoff).\n";
        helpString += "Example: split.abund(fasta=abrecovery.fasta, list=abrecovery.fn.list, group=abrecovery.groups, label=0.03, cutoff=2).\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
list), '=' and parameters (i.e. yourListfile).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitAbundCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string SplitAbundCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fasta") { pattern = "[filename],[tag],[tag2],fasta-[filename],[tag],[group],[tag2],fasta"; }
        else if (type == "list") { pattern = "[filename],[tag],[tag2],list-[filename],[group],[tag],[tag2],list"; }
        else if (type == "name") { pattern = "[filename],[tag],names-[filename],[group],[tag],names"; }
        else if (type == "count") { pattern = "[filename],[tag],[tag2],count_table-[filename],[tag],count_table"; }
        else if (type == "group") { pattern = "[filename],[tag],[tag2],groups-[filename],[tag],[group],[tag2],groups"; }
        else if (type == "accnos") { pattern = "[filename],[tag],[tag2],accnos-[filename],[tag],[group],[tag2],accnos"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitAbundCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
SplitAbundCommand::SplitAbundCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["list"] = tempOutNames;
        outputTypes["name"] = tempOutNames;
        outputTypes["count"] = tempOutNames;
        outputTypes["accnos"] = tempOutNames;
        outputTypes["group"] = tempOutNames;
        outputTypes["fasta"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitAbundCommand", "SplitAbundCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
SplitAbundCommand::SplitAbundCommand(string option) {
    try {
        abort = false; calledHelp = 
false;
        allLines = 1;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}

        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;

            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //initialize outputTypes
            vector<string> tempOutNames;
            outputTypes["list"] = tempOutNames;
            outputTypes["name"] = tempOutNames;
            outputTypes["accnos"] = tempOutNames;
            outputTypes["group"] = tempOutNames;
            outputTypes["fasta"] = tempOutNames;
            outputTypes["count"] = tempOutNames;

            //if the user changes the input directory command factory will send this info to us in the output parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {
                string path;
                it = parameters.find("list");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["list"] = inputDir + it->second; }
                }

                it = parameters.find("group");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["group"] = inputDir + it->second; }
                }

                it = parameters.find("fasta");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["fasta"] = inputDir + it->second; }
                }

                it = parameters.find("name");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["name"] = inputDir + it->second; }
                }

                it = parameters.find("count");
                //user has given a template file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path then, add inputdir. else leave path alone.
                    if (path == "") { parameters["count"] = inputDir + it->second; }
                }
            }

            //if the user changes the output directory command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){ outputDir = ""; }

            //check for required parameters
            listfile = validParameter.validFile(parameters, "list", true);
            if (listfile == "not open") { abort = true; }
            else if (listfile == "not found") { listfile = ""; }
            else{ inputFile = listfile; m->setListFile(listfile); }

            namefile = validParameter.validFile(parameters, "name", true);
            if (namefile == "not open") { abort = true; }
            else if (namefile == "not found") { namefile = ""; }
            else{ inputFile = namefile; m->setNameFile(namefile); }

            fastafile = validParameter.validFile(parameters, "fasta", true);
            if (fastafile == "not open") { abort = true; }
            else if (fastafile == "not found") {
                fastafile = m->getFastaFile();
                if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; }
            }else { m->setFastaFile(fastafile); }

            groupfile = validParameter.validFile(parameters, "group", true);
            if (groupfile == "not open") { groupfile = ""; abort = true; }
            else if (groupfile == "not found") { groupfile = ""; }
            else {
                int error = 
groupMap.readMap(groupfile);
                if (error == 1) { abort = true; }
                m->setGroupFile(groupfile);
            }

            countfile = validParameter.validFile(parameters, "count", true);
            if (countfile == "not open") { countfile = ""; abort = true; }
            else if (countfile == "not found") { countfile = ""; }
            else {
                m->setCountTableFile(countfile);
                ct.readTable(countfile, true, false);
            }

            if ((namefile != "") && (countfile != "")) {
                m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true;
            }

            if ((groupfile != "") && (countfile != "")) {
                m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true;
            }

            groups = validParameter.validFile(parameters, "groups", false);
            if (groups == "not found") { groups = ""; }
            else { m->splitAtDash(groups, Groups); }

            if (((groupfile == "") && (countfile == ""))&& (groups != "")) { m->mothurOut("You cannot select groups without a valid group or count file, I will disregard your groups selection. 
"); m->mothurOutEndLine(); groups = ""; Groups.clear(); }

            if (countfile != "") {
                if (!ct.hasGroupInfo()) { m->mothurOut("You cannot pick groups without group info in your count file; I will disregard your groups selection."); m->mothurOutEndLine(); groups = ""; Groups.clear(); }
            }

            //do you have all files needed
            if ((listfile == "") && (namefile == "") && (countfile == "")) {
                namefile = m->getNameFile();
                if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); }
                else {
                    listfile = m->getListFile();
                    if (listfile != "") { m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); }
                    else {
                        countfile = m->getCountTableFile();
                        if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); }
                        else { m->mothurOut("You have no current list, count or namefile and one is required."); m->mothurOutEndLine(); abort = true; }
                    }
                }
            }

            //check for optional parameter and set defaults
            // ...at some point we should add some additional type checking...
            label = validParameter.validFile(parameters, "label", false);
            if (label == "not found") { label = ""; allLines = 1; }
            else {
                if(label != "all") { m->splitAtDash(label, labels); allLines = 0; }
                else { allLines = 1; }
            }

            string temp = validParameter.validFile(parameters, "accnos", false);
            if (temp == "not found") { temp = "F"; }
            accnos = m->isTrue(temp);

            temp = validParameter.validFile(parameters, "cutoff", false);
            if (temp == "not found") { temp = "0"; }
            m->mothurConvert(temp, cutoff);

            if (cutoff == 0) { m->mothurOut("You must provide a cutoff to qualify what is abundant for the split.abund command. 
"); m->mothurOutEndLine(); abort = true; }
        }
    }
    catch(exception& e) {
        m->errorOut(e, "SplitAbundCommand", "SplitAbundCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
SplitAbundCommand::~SplitAbundCommand(){}
//**********************************************************************************************************************
int SplitAbundCommand::execute(){
    try {
        if (abort == true) { if (calledHelp) { return 0; } return 2; }

        if (Groups.size() != 0) {
            vector<string> allGroups;
            if (countfile != "") { allGroups = ct.getNamesOfGroups(); }
            else { allGroups = groupMap.getNamesOfGroups(); }
            SharedUtil util;
            util.setGroups(Groups, allGroups);
        }

        if (listfile != "") { //you are using a listfile to determine abundance
            if (outputDir == "") { outputDir = m->hasPath(listfile); }

            //if the user enters label "0.06" and there is no "0.06" in their file, use the next lowest label.
            set<string> processedLabels;
            set<string> userLabels = labels;

            InputData input(listfile, "list");
            ListVector* list = input.getListVector();
            string lastLabel = list->getLabel();

            //do you have a namefile or do we need to simulate one?
if (namefile != "") { readNamesFile(); } else { createNameMap(list); } if (m->control_pressed) { delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete list; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); splitList(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input.getListVector(lastLabel); //get new list vector to process m->mothurOut(list->getLabel()); m->mothurOutEndLine(); splitList(list); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = input.getListVector(); //get new list vector to process } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input.getListVector(lastLabel); //get new list vector to process m->mothurOut(list->getLabel()); m->mothurOutEndLine(); splitList(list); delete list; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } }else if (namefile != "") { //you are using the namefile to determine abundance if (outputDir == "") { outputDir = m->hasPath(namefile); } splitNames(); writeNames(); string tag = ""; if (groupfile != "") { parseGroup(tag); } if (accnos) { writeAccnos(tag); } if (fastafile != "") { parseFasta(tag); } }else { //split by countfile string tag = ""; splitCount(); if (accnos) { writeAccnos(tag); } if (fastafile != "") { parseFasta(tag); } } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("accnos"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setAccnosFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if 
((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "execute"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::splitList(ListVector* thisList) { try { rareNames.clear(); abundNames.clear(); //get rareNames and abundNames int numRareBins = 0; for (int i = 0; i < thisList->getNumBins(); i++) { if (m->control_pressed) { return 0; } string bin = thisList->get(i); vector names; m->splitAtComma(bin, names); //parses bin into individual sequence names int size = names.size(); //if countfile is not blank we assume the list file is unique, otherwise we assume it includes all seqs if (countfile != "") { size = 0; for (int j = 0; j < names.size(); j++) { size += ct.getNumSeqs(names[j]); } } if (size <= cutoff) { numRareBins++; for (int j = 0; j < names.size(); j++) { rareNames.insert(names[j]); } }else{ for (int j = 0; j < names.size(); j++) { abundNames.insert(names[j]); } } }//end for string tag = thisList->getLabel(); writeList(thisList, tag, numRareBins); if (groupfile != "") { parseGroup(tag); } if (accnos) { writeAccnos(tag); } if (fastafile != "") { parseFasta(tag); } if (countfile != "") { parseCount(tag); } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "splitList"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::writeList(ListVector* thisList, string tag, int numRareBins) { try { map filehandles; if (Groups.size() == 0) { int numAbundBins = thisList->getNumBins() - numRareBins; ofstream aout; ofstream rout; 
map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(listfile)); variables["[tag]"] = tag; variables["[tag2]"] = "rare"; string rare = getOutputFileName("list",variables); m->openOutputFile(rare+".temp", rout); outputNames.push_back(rare); outputTypes["list"].push_back(rare); variables["[tag2]"] = "abund"; string abund = getOutputFileName("list",variables); m->openOutputFile(abund+".temp", aout); outputNames.push_back(abund); outputTypes["list"].push_back(abund); if (rareNames.size() != 0) { rout << thisList->getLabel() << '\t' << numRareBins; } if (abundNames.size() != 0) { aout << thisList->getLabel() << '\t' << numAbundBins; } vector binLabels = thisList->getLabels(); string rareHeader = "label\tnumOtus"; string abundHeader = "label\tnumOtus"; for (int i = 0; i < thisList->getNumBins(); i++) { if (m->control_pressed) { break; } string bin = thisList->get(i); vector names; m->splitAtComma(bin, names); int size = names.size(); if (countfile != "") { size = 0; for (int j = 0; j < names.size(); j++) { size += ct.getNumSeqs(names[j]); } } if (size <= cutoff) { rout << '\t' << bin; rareHeader += '\t' + binLabels[i]; } else { aout << '\t' << bin; abundHeader += '\t' + binLabels[i]; } } if (rareNames.size() != 0) { rout << endl; } if (abundNames.size() != 0) { aout << endl; } rout.close(); aout.close(); //add headers ofstream r; m->openOutputFile(rare, r); r << rareHeader << endl; r.close(); m->appendFiles(rare+".temp", rare); m->mothurRemove(rare+".temp"); ofstream a; m->openOutputFile(abund, a); a << abundHeader << endl; a.close(); m->appendFiles(abund+".temp", abund); m->mothurRemove(abund+".temp"); }else{ //parse names by abundance and group string fileroot = outputDir + m->getRootName(m->getSimpleName(listfile)); ofstream* temp; ofstream* temp2; //map wroteFile; map filehandles; map::iterator it3; for (int i=0; i variables; variables["[filename]"] = fileroot; variables["[tag]"] = tag; variables["[tag2]"] = "rare"; 
variables["[group]"] = Groups[i]; string rareGroupFileName = getOutputFileName("list",variables); variables["[tag2]"] = "abund"; string abundGroupFileName = getOutputFileName("list",variables); m->openOutputFile(rareGroupFileName, *(filehandles[Groups[i]+".rare"])); m->openOutputFile(abundGroupFileName, *(filehandles[Groups[i]+".abund"])); outputNames.push_back(rareGroupFileName); outputTypes["list"].push_back(rareGroupFileName); outputNames.push_back(abundGroupFileName); outputTypes["list"].push_back(abundGroupFileName); } map groupVector; map groupLabels; map::iterator itGroup; map groupNumBins; for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { groupNumBins[it3->first] = 0; groupVector[it3->first] = ""; groupLabels[it3->first] = "label\tnumOtus"; } vector binLabels = thisList->getLabels(); for (int i = 0; i < thisList->getNumBins(); i++) { if (m->control_pressed) { break; } map groupBins; string bin = thisList->get(i); vector names; m->splitAtComma(bin, names); //parses bin into individual sequence names //parse bin into list of sequences in each group for (int j = 0; j < names.size(); j++) { string rareAbund; if (rareNames.count(names[j]) != 0) { //you are a rare name rareAbund = ".rare"; }else{ //you are a abund name rareAbund = ".abund"; } if (countfile == "") { string group = groupMap.getGroup(names[j]); if (m->inUsersGroups(group, Groups)) { //only add if this is in a group we want itGroup = groupBins.find(group+rareAbund); if(itGroup == groupBins.end()) { groupBins[group+rareAbund] = names[j]; //add first name groupNumBins[group+rareAbund]++; }else{ //add another name groupBins[group+rareAbund] += "," + names[j]; } }else if(group == "not found") { m->mothurOut(names[j] + " is not in your groupfile. 
Ignoring."); m->mothurOutEndLine(); } }else { vector thisSeqsGroups = ct.getGroups(names[j]); for (int k = 0; k < thisSeqsGroups.size(); k++) { if (m->inUsersGroups(thisSeqsGroups[k], Groups)) { //only add if this is in a group we want itGroup = groupBins.find(thisSeqsGroups[k]+rareAbund); if(itGroup == groupBins.end()) { groupBins[thisSeqsGroups[k]+rareAbund] = names[j]; //add first name groupNumBins[thisSeqsGroups[k]+rareAbund]++; }else{ //add another name groupBins[thisSeqsGroups[k]+rareAbund] += "," + names[j]; } } } } } for (itGroup = groupBins.begin(); itGroup != groupBins.end(); itGroup++) { groupVector[itGroup->first] += '\t' + itGroup->second; groupLabels[itGroup->first] += '\t' + binLabels[i]; } } //end list vector for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])) << groupLabels[it3->first] << endl; (*(filehandles[it3->first])) << thisList->getLabel() << '\t' << groupNumBins[it3->first] << groupVector[it3->first] << endl; // label numBins listvector for that group (*(filehandles[it3->first])).close(); delete it3->second; } } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "writeList"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::splitCount() { //countfile try { rareNames.clear(); abundNames.clear(); vector allNames = ct.getNamesOfSeqs(); for (int i = 0; i < allNames.size(); i++) { if (m->control_pressed) { return 0; } int size = ct.getNumSeqs(allNames[i]); nameMap[allNames[i]] = allNames[i]; if (size <= cutoff) { rareNames.insert(allNames[i]); }else{ abundNames.insert(allNames[i]); } } //write out split count files parseCount(""); return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "splitCount"); exit(1); } } /**********************************************************************************************************************/ int 
SplitAbundCommand::splitNames() { //namefile try { rareNames.clear(); abundNames.clear(); //open input file ifstream in; m->openInputFile(namefile, in); while (!in.eof()) { if (m->control_pressed) { break; } string firstCol, secondCol; in >> firstCol >> secondCol; m->gobble(in); nameMap[firstCol] = secondCol; int size = m->getNumNames(secondCol); if (size <= cutoff) { rareNames.insert(firstCol); }else{ abundNames.insert(firstCol); } } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "splitNames"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::readNamesFile() { try { //open input file ifstream in; m->openInputFile(namefile, in); while (!in.eof()) { if (m->control_pressed) { break; } string firstCol, secondCol; in >> firstCol >> secondCol; m->gobble(in); nameMap[firstCol] = secondCol; } in.close(); return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "readNamesFile"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::createNameMap(ListVector* thisList) { try { if (thisList != NULL) { for (int i = 0; i < thisList->getNumBins(); i++) { if (m->control_pressed) { return 0; } string bin = thisList->get(i); vector names; m->splitAtComma(bin, names); //parses bin into individual sequence names for (int j = 0; j < names.size(); j++) { nameMap[names[j]] = names[j]; } }//end for } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "createNameMap"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::parseCount(string tag) { //namefile try { map filehandles; if (Groups.size() == 0) { map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(countfile)); 
variables["[tag]"] = tag; variables["[tag2]"] = "rare"; string rare = getOutputFileName("count",variables); outputNames.push_back(rare); outputTypes["count"].push_back(rare); variables["[tag2]"] = "abund"; string abund = getOutputFileName("count",variables); outputNames.push_back(abund); outputTypes["count"].push_back(abund); CountTable rareTable; CountTable abundTable; if (ct.hasGroupInfo()) { vector ctGroups = ct.getNamesOfGroups(); for (int i = 0; i < ctGroups.size(); i++) { rareTable.addGroup(ctGroups[i]); abundTable.addGroup(ctGroups[i]); } } if (rareNames.size() != 0) { for (set::iterator itRare = rareNames.begin(); itRare != rareNames.end(); itRare++) { if (ct.hasGroupInfo()) { vector groupCounts = ct.getGroupCounts(*itRare); rareTable.push_back(*itRare, groupCounts); }else { int groupCounts = ct.getNumSeqs(*itRare); rareTable.push_back(*itRare, groupCounts); } } if (rareTable.hasGroupInfo()) { vector ctGroups = rareTable.getNamesOfGroups(); for (int i = 0; i < ctGroups.size(); i++) { if (rareTable.getGroupCount(ctGroups[i]) == 0) { rareTable.removeGroup(ctGroups[i]); } } } rareTable.printTable(rare); } if (abundNames.size() != 0) { for (set::iterator itAbund = abundNames.begin(); itAbund != abundNames.end(); itAbund++) { if (ct.hasGroupInfo()) { vector groupCounts = ct.getGroupCounts(*itAbund); abundTable.push_back(*itAbund, groupCounts); }else { int groupCounts = ct.getNumSeqs(*itAbund); abundTable.push_back(*itAbund, groupCounts); } } if (abundTable.hasGroupInfo()) { vector ctGroups = abundTable.getNamesOfGroups(); for (int i = 0; i < ctGroups.size(); i++) { if (abundTable.getGroupCount(ctGroups[i]) == 0) { abundTable.removeGroup(ctGroups[i]); } } } abundTable.printTable(abund); } }else{ //parse names by abundance and group map countTableMap; map::iterator it3; for (int i=0; iaddGroup(Groups[i]); countTableMap[Groups[i]+".rare"] = rareCt; CountTable* abundCt = new CountTable(); abundCt->addGroup(Groups[i]); countTableMap[Groups[i]+".abund"] = abundCt; } 
vector allNames = ct.getNamesOfSeqs(); for (int i = 0; i < allNames.size(); i++) { string rareAbund; if (rareNames.count(allNames[i]) != 0) { //you are a rare name rareAbund = ".rare"; }else{ //you are a abund name rareAbund = ".abund"; } vector thisSeqsGroups = ct.getGroups(allNames[i]); for (int j = 0; j < thisSeqsGroups.size(); j++) { if (m->inUsersGroups(thisSeqsGroups[j], Groups)) { //only add if this is in a group we want int num = ct.getGroupCount(allNames[i], thisSeqsGroups[j]); vector nums; nums.push_back(num); countTableMap[thisSeqsGroups[j]+rareAbund]->push_back(allNames[i], nums); } } } for (it3 = countTableMap.begin(); it3 != countTableMap.end(); it3++) { string fileroot = outputDir + m->getRootName(m->getSimpleName(countfile)); map variables; variables["[filename]"] = fileroot; variables["[tag]"] = it3->first; string filename = getOutputFileName("count",variables); outputNames.push_back(filename); outputTypes["count"].push_back(filename); (it3->second)->printTable(filename); delete it3->second; } } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "parseCount"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::writeNames() { //namefile try { map filehandles; if (Groups.size() == 0) { ofstream aout; ofstream rout; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(namefile)); variables["[tag]"] = "rare"; string rare = getOutputFileName("name", variables); m->openOutputFile(rare, rout); outputNames.push_back(rare); outputTypes["name"].push_back(rare); variables["[tag]"] = "abund"; string abund = getOutputFileName("name", variables); m->openOutputFile(abund, aout); outputNames.push_back(abund); outputTypes["name"].push_back(abund); if (rareNames.size() != 0) { for (set::iterator itRare = rareNames.begin(); itRare != rareNames.end(); itRare++) { rout << (*itRare) << '\t' << nameMap[(*itRare)] << 
endl; } } rout.close(); if (abundNames.size() != 0) { for (set::iterator itAbund = abundNames.begin(); itAbund != abundNames.end(); itAbund++) { aout << (*itAbund) << '\t' << nameMap[(*itAbund)] << endl; } } aout.close(); }else{ //parse names by abundance and group string fileroot = outputDir + m->getRootName(m->getSimpleName(namefile)); ofstream* temp; ofstream* temp2; map filehandles; map::iterator it3; for (int i=0; i variables; variables["[filename]"] = fileroot; variables["[tag]"] = "rare"; variables["[group]"] = Groups[i]; string rareGroupFileName = getOutputFileName("name",variables); variables["[tag]"] = "abund"; string abundGroupFileName = getOutputFileName("name",variables); m->openOutputFile(rareGroupFileName, *(filehandles[Groups[i]+".rare"])); m->openOutputFile(abundGroupFileName, *(filehandles[Groups[i]+".abund"])); } for (map::iterator itName = nameMap.begin(); itName != nameMap.end(); itName++) { vector names; m->splitAtComma(itName->second, names); //parses bin into individual sequence names string rareAbund; if (rareNames.count(itName->first) != 0) { //you are a rare name rareAbund = ".rare"; }else{ //you are a abund name rareAbund = ".abund"; } map outputStrings; map::iterator itout; for (int i = 0; i < names.size(); i++) { string group = groupMap.getGroup(names[i]); if (m->inUsersGroups(group, Groups)) { //only add if this is in a group we want itout = outputStrings.find(group+rareAbund); if (itout == outputStrings.end()) { outputStrings[group+rareAbund] = names[i] + '\t' + names[i]; }else { outputStrings[group+rareAbund] += "," + names[i]; } }else if(group == "not found") { m->mothurOut(names[i] + " is not in your groupfile. 
Ignoring."); m->mothurOutEndLine(); } } for (itout = outputStrings.begin(); itout != outputStrings.end(); itout++) { *(filehandles[itout->first]) << itout->second << endl; } } for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])).close(); map variables; variables["[filename]"] = fileroot; variables["[tag]"] = it3->first; outputNames.push_back(getOutputFileName("name",variables)); outputTypes["name"].push_back(getOutputFileName("name",variables)); delete it3->second; } } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "writeNames"); exit(1); } } /**********************************************************************************************************************/ //just write the unique names - if a namesfile is given int SplitAbundCommand::writeAccnos(string tag) { try { map filehandles; if (Groups.size() == 0) { ofstream aout; ofstream rout; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFile)); variables["[tag]"] = tag; variables["[tag2]"] = "rare"; string rare = getOutputFileName("accnos",variables); m->openOutputFile(rare, rout); outputNames.push_back(rare); outputTypes["accnos"].push_back(rare); for (set::iterator itRare = rareNames.begin(); itRare != rareNames.end(); itRare++) { rout << (*itRare) << endl; } rout.close(); variables["[tag2]"] = "abund"; string abund = getOutputFileName("accnos",variables); m->openOutputFile(abund, aout); outputNames.push_back(abund); outputTypes["accnos"].push_back(abund); for (set::iterator itAbund = abundNames.begin(); itAbund != abundNames.end(); itAbund++) { aout << (*itAbund) << endl; } aout.close(); }else{ //parse names by abundance and group string fileroot = outputDir + m->getRootName(m->getSimpleName(inputFile)); ofstream* temp; ofstream* temp2; map filehandles; map::iterator it3; for (int i=0; i variables; variables["[filename]"] = fileroot; variables["[tag]"] = tag; variables["[tag2]"] = "rare"; 
variables["[group]"] = Groups[i]; m->openOutputFile(getOutputFileName("accnos",variables), *(filehandles[Groups[i]+".rare"])); variables["[tag2]"] = "abund"; m->openOutputFile(getOutputFileName("accnos",variables), *(filehandles[Groups[i]+".abund"])); } //write rare for (set::iterator itRare = rareNames.begin(); itRare != rareNames.end(); itRare++) { string group = groupMap.getGroup(*itRare); if (m->inUsersGroups(group, Groups)) { //only add if this is in a group we want *(filehandles[group+".rare"]) << *itRare << endl; } } //write abund for (set::iterator itAbund = abundNames.begin(); itAbund != abundNames.end(); itAbund++) { string group = groupMap.getGroup(*itAbund); if (m->inUsersGroups(group, Groups)) { //only add if this is in a group we want *(filehandles[group+".abund"]) << *itAbund << endl; } } //close files for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])).close(); map variables; variables["[filename]"] = fileroot; variables["[tag]"] = tag; variables["[tag2]"] = it3->first; outputNames.push_back(getOutputFileName("accnos",variables)); outputTypes["accnos"].push_back(getOutputFileName("accnos",variables)); delete it3->second; } } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "writeAccnos"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::parseGroup(string tag) { //namefile try { map filehandles; if (Groups.size() == 0) { ofstream aout; ofstream rout; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[tag]"] = tag; variables["[tag2]"] = "rare"; string rare = getOutputFileName("group",variables); m->openOutputFile(rare, rout); outputNames.push_back(rare); outputTypes["group"].push_back(rare); variables["[tag2]"] = "abund"; string abund = getOutputFileName("group",variables); ; m->openOutputFile(abund, aout); 
outputNames.push_back(abund); outputTypes["group"].push_back(abund); for (map::iterator itName = nameMap.begin(); itName != nameMap.end(); itName++) { vector names; m->splitAtComma(itName->second, names); //parses bin into individual sequence names for (int i = 0; i < names.size(); i++) { string group = groupMap.getGroup(names[i]); if (group == "not found") { m->mothurOut(names[i] + " is not in your groupfile, ignoring, please correct."); m->mothurOutEndLine(); }else { if (rareNames.count(itName->first) != 0) { //you are a rare name rout << names[i] << '\t' << group << endl; }else{ //you are a abund name aout << names[i] << '\t' << group << endl; } } } } rout.close(); aout.close(); }else{ //parse names by abundance and group string fileroot = outputDir + m->getRootName(m->getSimpleName(groupfile)); ofstream* temp; ofstream* temp2; map filehandles; map::iterator it3; for (int i=0; i variables; variables["[filename]"] = fileroot; variables["[tag]"] = tag; variables["[tag2]"] = "rare"; variables["[group]"] = Groups[i]; m->openOutputFile(getOutputFileName("group",variables), *(filehandles[Groups[i]+".rare"])); variables["[tag2]"] = "abund"; m->openOutputFile(getOutputFileName("group",variables), *(filehandles[Groups[i]+".abund"])); } for (map::iterator itName = nameMap.begin(); itName != nameMap.end(); itName++) { vector names; m->splitAtComma(itName->second, names); //parses bin into individual sequence names string rareAbund; if (rareNames.count(itName->first) != 0) { //you are a rare name rareAbund = ".rare"; }else{ //you are a abund name rareAbund = ".abund"; } for (int i = 0; i < names.size(); i++) { string group = groupMap.getGroup(names[i]); if (m->inUsersGroups(group, Groups)) { //only add if this is in a group we want *(filehandles[group+rareAbund]) << names[i] << '\t' << group << endl; } } } for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])).close(); map variables; variables["[filename]"] = fileroot; 
variables["[tag]"] = tag; variables["[tag2]"] = it3->first; outputNames.push_back(getOutputFileName("group",variables)); outputTypes["group"].push_back(getOutputFileName("group",variables)); delete it3->second; } } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "parseGroup"); exit(1); } } /**********************************************************************************************************************/ int SplitAbundCommand::parseFasta(string tag) { //namefile try { map<string, ofstream*> filehandles; if (Groups.size() == 0) { ofstream aout; ofstream rout; map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[tag]"] = tag; variables["[tag2]"] = "rare"; string rare = getOutputFileName("fasta",variables); m->openOutputFile(rare, rout); outputNames.push_back(rare); outputTypes["fasta"].push_back(rare); variables["[tag2]"] = "abund"; string abund = getOutputFileName("fasta",variables); m->openOutputFile(abund, aout); outputNames.push_back(abund); outputTypes["fasta"].push_back(abund); //open input file ifstream in; m->openInputFile(fastafile, in); while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { map<string, string>::iterator itNames; itNames = nameMap.find(seq.getName()); if (itNames == nameMap.end()) { m->mothurOut(seq.getName() + " is not in your names or list file, ignoring."); m->mothurOutEndLine(); }else{ if (rareNames.count(seq.getName()) != 0) { //you are a rare name seq.printSequence(rout); }else{ //you are an abund name seq.printSequence(aout); } } } } in.close(); rout.close(); aout.close(); }else{ //parse names by abundance and group string fileroot = outputDir + m->getRootName(m->getSimpleName(fastafile)); ofstream* temp; ofstream* temp2; map<string, ofstream*> filehandles; map<string, ofstream*>::iterator it3; for (int i=0; i<Groups.size(); i++) { temp = new ofstream; filehandles[Groups[i]+".rare"] = temp; temp2 = new ofstream; filehandles[Groups[i]+".abund"] = temp2; map<string, string> variables; variables["[filename]"] = fileroot; variables["[tag]"] = tag; variables["[tag2]"] = "rare"; variables["[group]"] = Groups[i]; 
m->openOutputFile(getOutputFileName("fasta",variables), *(filehandles[Groups[i]+".rare"])); variables["[tag2]"] = "abund"; m->openOutputFile(getOutputFileName("fasta",variables), *(filehandles[Groups[i]+".abund"])); } //open input file ifstream in; m->openInputFile(fastafile, in); while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { map::iterator itNames = nameMap.find(seq.getName()); if (itNames == nameMap.end()) { m->mothurOut(seq.getName() + " is not in your names or list file, ignoring."); m->mothurOutEndLine(); }else{ vector names; m->splitAtComma(itNames->second, names); //parses bin into individual sequence names string rareAbund; if (rareNames.count(itNames->first) != 0) { //you are a rare name rareAbund = ".rare"; }else{ //you are a abund name rareAbund = ".abund"; } if (countfile == "") { for (int i = 0; i < names.size(); i++) { string group = groupMap.getGroup(seq.getName()); if (m->inUsersGroups(group, Groups)) { //only add if this is in a group we want seq.printSequence(*(filehandles[group+rareAbund])); }else if(group == "not found") { m->mothurOut(seq.getName() + " is not in your groupfile. 
Ignoring."); m->mothurOutEndLine(); } } }else { vector<string> thisSeqsGroups = ct.getGroups(names[0]); //we only need names[0], because there is no namefile for (int i = 0; i < thisSeqsGroups.size(); i++) { if (m->inUsersGroups(thisSeqsGroups[i], Groups)) { //only add if this is in a group we want seq.printSequence(*(filehandles[thisSeqsGroups[i]+rareAbund])); } } } } } } in.close(); for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { (*(filehandles[it3->first])).close(); map<string, string> variables; variables["[filename]"] = fileroot; variables["[tag]"] = tag; variables["[tag2]"] = it3->first; outputNames.push_back(getOutputFileName("fasta",variables)); outputTypes["fasta"].push_back(getOutputFileName("fasta",variables)); delete it3->second; } } return 0; } catch(exception& e) { m->errorOut(e, "SplitAbundCommand", "parseFasta"); exit(1); } } /**********************************************************************************************************************/ mothur-1.36.1/source/commands/splitabundcommand.h000066400000000000000000000041651255543666200221220ustar00rootroot00000000000000#ifndef SPLITABUNDCOMMAND_H #define SPLITABUNDCOMMAND_H /* * splitabundcommand.h * Mothur * * Created by westcott on 5/17/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ /* split.abund - given a list or name file and a number (cutoff), make two files - *rare* and *abund* - where rare has data for otus that have cutoff or fewer sequences (size <= cutoff) and abund has data for otus that have more sequences than the cutoff. Also allows an option where a user can give a group file with the list or names file and split the group file into rare and abund. 
*/ #include "command.hpp" #include "groupmap.h" #include "inputdata.h" #include "listvector.hpp" #include "sequence.hpp" #include "counttable.h" /***************************************************************************************/ class SplitAbundCommand : public Command { public: SplitAbundCommand(string); SplitAbundCommand(); ~SplitAbundCommand(); vector<string> setParameters(); string getCommandName() { return "split.abund"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Split.abund"; } string getDescription() { return "split a list, name, group or fasta file based on abundance"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: int splitList(ListVector*); int splitNames(); //namefile int writeNames(); int writeList(ListVector*, string, int); int writeAccnos(string); int parseGroup(string); int parseFasta(string); int parseCount(string); int splitCount(); int readNamesFile(); //namefile int createNameMap(ListVector*); vector<string> outputNames; GroupMap groupMap; CountTable ct; string outputDir, listfile, namefile, groupfile, countfile, label, groups, fastafile, inputFile; set<string> labels, rareNames, abundNames; vector<string> Groups; bool abort, allLines, accnos; int cutoff; map<string, string> nameMap; }; /***************************************************************************************/ #endif mothur-1.36.1/source/commands/splitgroupscommand.cpp000066400000000000000000000407671255543666200227130ustar00rootroot00000000000000/* * splitgroupscommand.cpp * Mothur * * Created by westcott on 9/20/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "splitgroupscommand.h" #include "sharedutilities.h" #include "sequenceparser.h" #include "counttable.h" //********************************************************************************************************************** vector SplitGroupCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "CountGroup", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "CountGroup", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SplitGroupCommand::getHelpString(){ try { string helpString = ""; helpString += "The split.groups command reads a group or count file, and parses your fasta and names or count files by groups. 
\n"; helpString += "The split.groups command parameters are fasta, name, group, count and groups.\n"; helpString += "The fasta and group or count parameters are required.\n"; helpString += "The groups parameter allows you to select groups to create files for. \n"; helpString += "For example if you set groups=A-B-C, you will get .A.fasta, .A.names, .B.fasta, .B.names, .C.fasta and .C.names files. \n"; helpString += "If you want .fasta and .names files for all groups, set groups=all. \n"; helpString += "The split.groups command should be used in the following format: split.groups(fasta=yourFasta, group=yourGroupFile).\n"; helpString += "Example: split.groups(fasta=abrecovery.fasta, group=abrecovery.groups).\n"; helpString += "Note: No spaces between parameter labels (e.g. fasta), '=' and parameters (e.g. yourFasta).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SplitGroupCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "fasta") { pattern = "[filename],[group],fasta"; } else if (type == "name") { pattern = "[filename],[group],names"; } else if (type == "count") { pattern = "[filename],[group],count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SplitGroupCommand::SplitGroupCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", 
"SplitGroupCommand"); exit(1); } } //********************************************************************************************************************** SplitGroupCommand::SplitGroupCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["count"] = inputDir + it->second; } } } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { abort = true; } else if (fastafile == "not found") { fastafile = m->getFastaFile(); if (fastafile != "") { m->mothurOut("Using " + fastafile + " as input file for the fasta parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setFastaFile(fastafile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; }else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } if ((countfile != "") && (groupfile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or group."); m->mothurOutEndLine(); abort = true; } if ((countfile == "") && (groupfile == "")) { if (namefile == "") { //check for count then group countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for 
the count parameter."); m->mothurOutEndLine(); } else { groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a count or group file."); m->mothurOutEndLine(); abort = true; } } }else { //check for group groupfile = m->getGroupFile(); if (groupfile != "") { m->mothurOut("Using " + groupfile + " as input file for the group parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a count or group file."); m->mothurOutEndLine(); abort = true; } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ if (groupfile != "") { outputDir = m->hasPath(groupfile); } else { outputDir = m->hasPath(countfile); } } if (countfile == "") { if (namefile == "") { vector<string> files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", "SplitGroupCommand"); exit(1); } } //********************************************************************************************************************** int SplitGroupCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (countfile == "" ) { runNameGroup(); } else { runCount(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() 
!= 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SplitGroupCommand::runNameGroup(){ try { SequenceParser* parser; if (namefile == "") { parser = new SequenceParser(groupfile, fastafile); } else { parser = new SequenceParser(groupfile, fastafile, namefile); } if (m->control_pressed) { delete parser; return 0; } vector namesGroups = parser->getNamesOfGroups(); SharedUtil util; util.setGroups(Groups, namesGroups); string fastafileRoot = outputDir + m->getRootName(m->getSimpleName(fastafile)); string namefileRoot = outputDir + m->getRootName(m->getSimpleName(namefile)); m->mothurOutEndLine(); for (int i = 0; i < Groups.size(); i++) { m->mothurOut("Processing group: " + Groups[i]); m->mothurOutEndLine(); map variables; variables["[filename]"] = fastafileRoot; variables["[group]"] = Groups[i]; string newFasta = getOutputFileName("fasta",variables); variables["[filename]"] = namefileRoot; string newName = getOutputFileName("name",variables); parser->getSeqs(Groups[i], newFasta, false); outputNames.push_back(newFasta); outputTypes["fasta"].push_back(newFasta); if (m->control_pressed) { delete parser; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (namefile != "") { parser->getNameMap(Groups[i], newName); outputNames.push_back(newName); outputTypes["name"].push_back(newName); } if (m->control_pressed) { delete 
parser; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } } delete parser; return 0; } catch(exception& e) { m->errorOut(e, "SplitGroupCommand", "runNameGroup"); exit(1); } } //********************************************************************************************************************** int SplitGroupCommand::runCount(){ try { CountTable ct; ct.readTable(countfile, true, false); if (!ct.hasGroupInfo()) { m->mothurOut("[ERROR]: your count file does not contain group info, cannot split by group.\n"); m->control_pressed = true; } if (m->control_pressed) { return 0; } vector namesGroups = ct.getNamesOfGroups(); SharedUtil util; util.setGroups(Groups, namesGroups); //fill filehandles with neccessary ofstreams map ffiles; map cfiles; ofstream* temp; for (int i=0; i variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[group]"] = Groups[i]; string newFasta = getOutputFileName("fasta",variables); outputNames.push_back(newFasta); outputTypes["fasta"].push_back(newFasta); m->openOutputFile(newFasta, (*temp)); temp = new ofstream; cfiles[Groups[i]] = temp; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(countfile)); string newCount = getOutputFileName("count",variables); m->openOutputFile(newCount, (*temp)); outputNames.push_back(newCount); outputTypes["count"].push_back(newCount); (*temp) << "Representative_Sequence\ttotal\t" << Groups[i] << endl; } ifstream in; m->openInputFile(fastafile, in); while (!in.eof()) { Sequence seq(in); m->gobble(in); if (m->control_pressed) { break; } if (seq.getName() != "") { vector thisSeqsGroups = ct.getGroups(seq.getName()); for (int i = 0; i < thisSeqsGroups.size(); i++) { if (m->inUsersGroups(thisSeqsGroups[i], Groups)) { //if this sequence belongs to a group we want them print seq.printSequence(*(ffiles[thisSeqsGroups[i]])); int numSeqs = ct.getGroupCount(seq.getName(), thisSeqsGroups[i]); 
(*(cfiles[thisSeqsGroups[i]])) << seq.getName() << '\t' << numSeqs << '\t' << numSeqs << endl; } } } } in.close(); //close and delete ofstreams for (int i=0; ierrorOut(e, "SplitGroupCommand", "runCount"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/splitgroupscommand.h000066400000000000000000000025261255543666200223470ustar00rootroot00000000000000#ifndef SPLITGROUPSCOMMAND_H #define SPLITGROUPSCOMMAND_H /* * splitgroupscommand.h * Mothur * * Created by westcott on 9/20/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ /* split.groups - given a group file, split sequences and names files in to separate files *.group1.fasta and .group1.names. */ #include "command.hpp" #include "groupmap.h" #include "sequence.hpp" /***************************************************************************************/ class SplitGroupCommand : public Command { public: SplitGroupCommand(string); SplitGroupCommand(); ~SplitGroupCommand() {} vector setParameters(); string getCommandName() { return "split.groups"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Split.group"; } string getDescription() { return "split a name or fasta file by group"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector outputNames; string outputDir, namefile, groupfile, countfile, groups, fastafile; vector Groups; bool abort; int runNameGroup(); int runCount(); }; /***************************************************************************************/ #endif mothur-1.36.1/source/commands/sracommand.cpp000066400000000000000000003717641255543666200211110ustar00rootroot00000000000000// // sracommand.cpp // Mothur // // Created by SarahsWork on 10/28/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "sracommand.h" #include "sffinfocommand.h" #include "parsefastaqcommand.h" //********************************************************************************************************************** vector SRACommand::setParameters(){ try { CommandParameter psff("sff", "InputTypes", "", "", "sffFastQFile", "sffFastQFile", "none","xml",false,false); parameters.push_back(psff); CommandParameter poligos("oligos", "InputTypes", "", "", "oligos", "none", "none","",false,false,true); parameters.push_back(poligos); CommandParameter pfile("file", "InputTypes", "", "", "sffFastQFile-oligos", "sffFastQFile", "none","xml",false,false); parameters.push_back(pfile); CommandParameter pfastq("fastq", "InputTypes", "", "", "sffFastQFile", "sffFastQFile", "none","xml",false,false); parameters.push_back(pfastq); CommandParameter pcontact("project", "InputTypes", "", "", "none", "none", "none","xml",false,true,true); parameters.push_back(pcontact); CommandParameter preorient("checkorient", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(preorient); CommandParameter pmimark("mimark", "InputTypes", "", "", "none", "none", "none","xml",false,true,true); parameters.push_back(pmimark); //choose only one multiple options CommandParameter pplatform("platform", "Multiple", "_LS454-ILLUMINA-ION_TORRENT-PACBIO_SMRT", "_LS454", "", "", "","",false,false); parameters.push_back(pplatform); CommandParameter pinstrument("instrument", "Multiple", "454_GS-454_GS_20-454_GS_FLX-454_GS_FLX_Titanium-454_GS_Junior-Illumina_Genome_Analyzer-Illumina_Genome_Analyzer_II-Illumina_Genome_Analyzer_IIx-Illumina_HiSeq_2000-Illumina_HiSeq_1000-Illumina_MiSeq-PacBio_RS-Ion_Torrent_PGM-unspecified", "454_GS", "", "", "","",false,false); parameters.push_back(pinstrument); CommandParameter plibstrategy("libstrategy", "String", "AMPLICON", "", "", "", "","",false,false); parameters.push_back(plibstrategy); CommandParameter pdatatype("datatype", "String", "METAGENOME", "", "", "", 
"","",false,false); parameters.push_back(pdatatype); CommandParameter plibsource("libsource", "String", "METAGENOMIC", "", "", "", "","",false,false); parameters.push_back(plibsource); CommandParameter plibselection("libselection", "String", "PCR", "", "", "", "","",false,false); parameters.push_back(plibselection); CommandParameter porientation("orientation", "Multiple", "forward-reverse", "forward", "", "", "","",false,false); parameters.push_back(porientation); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pbdiffs); CommandParameter pldiffs("ldiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pldiffs); CommandParameter psdiffs("sdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psdiffs); CommandParameter ptdiffs("tdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ptdiffs); //every command must have inputdir and outputdir. This allows mothur users to redirect input and output files. CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SRACommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SRACommand::getHelpString(){ try { string helpString = ""; helpString += "The make.sra command creates the necessary files for a NCBI submission. 
These are the xml file and the individual sff or fastq files parsed from the original sff or fastq file.\n"; helpString += "The make.sra command parameters are: sff, fastq, file, oligos, project, mimark, pdiffs, bdiffs, ldiffs, sdiffs, tdiffs, checkorient, platform, orientation, libstrategy, datatype, libsource, libselection and instrument.\n"; helpString += "The sff parameter is used to provide the original sff file.\n"; helpString += "The fastq parameter is used to provide the original fastq file.\n"; helpString += "The project parameter is used to provide your project file.\n"; helpString += "The oligos parameter is used to provide an oligos file to parse your sff or fastq file by. It is required and must contain barcodes and primers, or you must provide a file option. \n"; helpString += "The mimark parameter is used to provide your mimarks file. You can create the template for this file using the get.mimarkspackage command.\n"; helpString += "The file parameter is used to provide a file containing a list of individual fastq or sff files or paired fastq files with a group assignment. File lines can be 2, 3 or 4 columns. The 2 column files are sff file then oligos or fastqfile then oligos or ffastq and rfastq. You may have multiple lines in the file. The 3 column files are for paired read libraries. The format is groupName, forwardFastqFile reverseFastqFile. Four column files are for inputting file pairs with index files. Example: My.forward.fastq My.reverse.fastq NONE My.rindex.fastq. The keyword NONE can be used when there is no index file for either the forward or reverse file. \n"; helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the sequence. The default is pdiffs + bdiffs + sdiffs + ldiffs.\n"; helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. 
The default is 0.\n"; helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n"; helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. The default is 0.\n"; helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n"; helpString += "The checkorient parameter will look for the reverse complement of the barcode or primer in the sequence. The default is false.\n"; helpString += "The platform parameter is used to specify the platform you are using. Choices are: _LS454,ILLUMINA,ION_TORRENT,PACBIO_SMRT. Default=_LS454. This is a controlled vocabulary section in the XML file that will be generated.\n"; helpString += "The orientation parameter is used to specify sequence orientation. Choices are: forward and reverse. Default=forward. This is a controlled vocabulary section in the XML file that will be generated.\n"; helpString += "The instrument parameter is used to specify instrument. Choices are 454_GS-454_GS_20-454_GS_FLX-454_GS_FLX_Titanium-454_GS_Junior-Illumina_Genome_Analyzer-Illumina_Genome_Analyzer_II-Illumina_Genome_Analyzer_IIx-Illumina_HiSeq_2000-Illumina_HiSeq_1000-Illumina_MiSeq-PacBio_RS-Ion_Torrent_PGM-unspecified. Default=454_GS. This is a controlled vocabulary section in the XML file that will be generated. \n"; helpString += "The libstrategy parameter is used to specify library strategy. Default=AMPLICON. Choices are AMPLICON,WGA,WGS,WGX,RNA-Seq,miRNA-Seq,WCS,CLONE,POOLCLONE,CLONEEND,FINISHING,ChIP-Seq,MNase-Seq,DNase-Hypersensitivity,Bisulfite-Seq,Tn-Seq,EST,FL-cDNA,CTS,MRE-Seq,MeDIP-Seq,MBD-Seq,OTHER. This is a controlled vocabulary section in the XML file that will be generated. \n"; helpString += "The libsource parameter is used to specify library source. Default=METAGENOMIC. Choices are METAGENOMIC,GENOMIC,TRANSCRIPTOMIC,METATRANSCRIPTOMIC,SYNTHETIC,VIRAL_RNA,OTHER. 
This is a controlled vocabulary section in the XML file that will be generated. \n"; helpString += "The libselection parameter is used to specify library selection. Default=PCR. Choices are PCR,RANDOM,RANDOM_PCR,RT-PCR,HMPR,MF,CF-S,CF-H,CF-T,CF-M,MDA,MSLL,cDNA,ChIP,MNase,DNAse,Hybrid_Selection,Reduced_Representation,Restriction_Digest,5-methylcytidine_antibody,MBD2_protein_methyl-CpG_binding_domain,CAGE,RACE,size_fractionation,Padlock_probes_capture_method,other,unspecified. This is a controlled vocabulary section in the XML file that will be generated. \n"; helpString += "The datatype parameter is used to specify datatype. Default=METAGENOME. Choices are METAGENOME,GENOME_SEQUENCING,METAGENOMIC_ASSEMBLY,ASSEMBLY,TRANSCRIPTOME,PROTEOMIC,MAP,CLONE_ENDS,TARGETED_LOCI,RANDOM_SURVEY,EXOME,VARIATION,EPIGENOMICS,PHENOTYPE,GENOTYPE,OTHER. This is a controlled vocabulary section in the XML file that will be generated. \n"; helpString += "Example: make.sra(sff=GHL4YHV01.sff, oligos=GHL4YHV01.oligos, project=test.project, mimark=MIMarksData.txt)\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SRACommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SRACommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "xml") { pattern = "[filename],xml"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SRACommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SRACommand::SRACommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["xml"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SRACommand", "SRACommand"); exit(1); } } 
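/*
 * Illustrative sketch, not part of mothur. The make.sra help text states that
 * tdiffs defaults to pdiffs + bdiffs + sdiffs + ldiffs. The standalone helper
 * below shows one way that rule could be expressed. DiffLimits, TDIFFS_NOT_SET
 * and resolveTdiffs are hypothetical names introduced for this example, and
 * using -1 to mean "tdiffs was not supplied" is an assumption of the sketch.
 *

```cpp
#include <cassert>

// Hypothetical container for the per-element mismatch limits.
struct DiffLimits {
    int pdiffs;  // differences allowed in the primer
    int bdiffs;  // differences allowed in the barcode
    int ldiffs;  // differences allowed in the linker
    int sdiffs;  // differences allowed in the spacer
};

// Sentinel meaning "the user did not supply tdiffs" (assumption of this sketch).
const int TDIFFS_NOT_SET = -1;

// Returns the user's tdiffs when one was supplied; otherwise falls back to the
// sum of the four per-element limits, as the help text describes.
inline int resolveTdiffs(const DiffLimits& d, int userTdiffs) {
    assert(d.pdiffs >= 0 && d.bdiffs >= 0 && d.ldiffs >= 0 && d.sdiffs >= 0);
    if (userTdiffs == TDIFFS_NOT_SET) {
        return d.pdiffs + d.bdiffs + d.ldiffs + d.sdiffs;
    }
    return userTdiffs;
}
```

 * Usage: with DiffLimits d = {1, 2, 0, 1}, resolveTdiffs(d, TDIFFS_NOT_SET)
 * yields 4 and resolveTdiffs(d, 2) yields 2. One design difference to note:
 * the SRACommand constructor in this file also replaces a tdiffs of 0 with the
 * sum, whereas this sketch honours an explicit 0.
 */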
//********************************************************************************************************************** SRACommand::SRACommand(string option) { try { abort = false; calledHelp = false; fileOption = 0; libLayout = "single"; //controlled vocab //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { //valid paramters for this command vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } vector tempOutNames; outputTypes["xml"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("sff"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sff"] = inputDir + it->second; } } it = parameters.find("fastq"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fastq"] = inputDir + it->second; } } it = parameters.find("file"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["file"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("project"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["project"] = inputDir + it->second; } } it = parameters.find("mimark"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["mimark"] = inputDir + it->second; } } } //check for parameters fastqfile = validParameter.validFile(parameters, "fastq", true); if (fastqfile == "not open") { fastqfile = ""; abort = true; } else if (fastqfile == "not found") { fastqfile = ""; } sfffile = validParameter.validFile(parameters, "sff", true); if (sfffile == "not open") { sfffile = ""; abort = true; } else if (sfffile == "not found") { sfffile = ""; } setOligosParameter = false; oligosfile = validParameter.validFile(parameters, "oligos", true); if (oligosfile == "not found") { oligosfile = ""; } else if(oligosfile == "not open") { abort = true; } else { m->setOligosFile(oligosfile); setOligosParameter = true; } contactfile = validParameter.validFile(parameters, "project", true); if (contactfile == "not found") { contactfile = ""; m->mothurOut("[ERROR]: You must provide a project file before you can use the sra command."); m->mothurOutEndLine(); abort = true; } else if(contactfile == "not open") { abort = true; } mimarksfile = validParameter.validFile(parameters, "mimark", true); if (mimarksfile == "not found") { mimarksfile = ""; m->mothurOut("[ERROR]: You must provide a mimark 
file before you can use the sra command. You can create a template for this file using the get.mimarkspackage command."); m->mothurOutEndLine(); abort = true; } else if(mimarksfile == "not open") { abort = true; } file = validParameter.validFile(parameters, "file", true); if (file == "not open") { file = ""; abort = true; } else if (file == "not found") { file = ""; } else { fileOption = findFileOption(); } if ((file == "") && (oligosfile == "")) { m->mothurOut("[ERROR]: You must provide an oligos file or file with oligos files in them before you can use the sra command."); m->mothurOutEndLine(); abort = true; } if ((fastqfile == "") && (file == "") && (sfffile == "")) { m->mothurOut("[ERROR]: You must provide a file, sff file or fastq file before you can use the sra command."); m->mothurOutEndLine(); abort = true; } //use only one Mutliple type _LS454-ILLUMINA-ION_TORRENT-PACBIO_SMRT platform = validParameter.validFile(parameters, "platform", false); if (platform == "not found") { platform = "_LS454"; } if (!checkCasesPlatforms(platform)) { abort = true; } //error message in function if (!abort) { //don't check instrument model is platform is bad //454_GS-454_GS_20-454_GS_FLX-454_GS_FLX_Titanium-454_GS_Junior-Illumina_Genome_Analyzer-Illumina_Genome_Analyzer_II-Illumina_Genome_Analyzer_IIx-Illumina_HiSeq_2000-Illumina_HiSeq_1000-Illumina_MiSeq-PacBio_RS-Ion_Torrent_PGM-unspecified instrumentModel = validParameter.validFile(parameters, "instrument", false); if (instrumentModel == "not found") { instrumentModel = "454_GS"; } if (!checkCasesInstrumentModels(instrumentModel)) { abort = true; } //error message in function } //turn _ to spaces mothur's work around for (int i = 0; i < instrumentModel.length(); i++) { if (instrumentModel[i] == '_') { instrumentModel[i] = ' '; } } libStrategy = validParameter.validFile(parameters, "libstrategy", false); if (libStrategy == "not found") { libStrategy = "AMPLICON"; } if (!checkCasesLibStrategy(libStrategy)) { abort = true; } 
//error message in function //turn _ to spaces mothur's work around for (int i = 0; i < libStrategy.length(); i++) { if (libStrategy[i] == '_') { libStrategy[i] = ' '; } } libSource = validParameter.validFile(parameters, "libsource", false); if (libSource == "not found") { libSource = "METAGENOMIC"; } if (!checkCasesLibSource(libSource)) { abort = true; } //error message in function //turn _ to spaces mothur's work around for (int i = 0; i < libSource.length(); i++) { if (libSource[i] == '_') { libSource[i] = ' '; } } libSelection = validParameter.validFile(parameters, "libselection", false); if (libSelection == "not found") { libSelection = "PCR"; } if (!checkCasesLibSelection(libSelection)) { abort = true; } //error message in function //turn _ to spaces mothur's work around for (int i = 0; i < libSelection.length(); i++) { if (libSelection[i] == '_') { libSelection[i] = ' '; } } dataType = validParameter.validFile(parameters, "datatype", false); if (dataType == "not found") { dataType = "metagenome"; } if (!checkCasesDataType(dataType)) { abort = true; } //error message in function //turn _ to spaces mothur's work around for (int i = 0; i < dataType.length(); i++) { if (dataType[i] == '_') { dataType[i] = ' '; } } orientation = validParameter.validFile(parameters, "orientation", false); if (orientation == "not found") { orientation = "forward"; } if ((orientation == "forward") || (orientation == "reverse")) { } else { m->mothurOut("[ERROR]: " + orientation + " is not a valid orientation option. 
Choices are: forward and reverse.\n"); m->mothurOutEndLine(); abort = true; } string temp = validParameter.validFile(parameters, "bdiffs", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, bdiffs); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, pdiffs); temp = validParameter.validFile(parameters, "ldiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, ldiffs); temp = validParameter.validFile(parameters, "sdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, sdiffs); temp = validParameter.validFile(parameters, "tdiffs", false); if (temp == "not found") { int tempTotal = pdiffs + bdiffs + ldiffs + sdiffs; temp = toString(tempTotal); } m->mothurConvert(temp, tdiffs); if(tdiffs == 0){ tdiffs = bdiffs + pdiffs + ldiffs + sdiffs; } checkorient = validParameter.validFile(parameters, "checkorient", false); if (checkorient == "not found") { checkorient = "F"; } } } catch(exception& e) { m->errorOut(e, "SRACommand", "SRACommand"); exit(1); } } //********************************************************************************************************************** int SRACommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } uniqueNames.insert("scrap"); readContactFile(); if (m->debug) { m->mothurOut("[DEBUG]: read contact file.\n"); } readMIMarksFile(); if (m->debug) { m->mothurOut("[DEBUG]: read mimarks file.\n"); } if (oligosfile != "") { readOligos(); } if (m->debug) { m->mothurOut("[DEBUG]: read oligos file.\n"); } if (m->control_pressed) { return 0; } //parse files map<string, vector<string> > filesBySample; isSFF = false; if (file != "") { readFile(filesBySample); } else if (sfffile != "") { parseSffFile(filesBySample); } else if (fastqfile != "") { parseFastqFile(filesBySample); } //cout << "files by sample size = " << filesBySample.size() << endl; //checks groups and files returned from parse - removes any groups that did 
not get reads assigned to them, orders files. checkGroups(filesBySample); sanityCheckMiMarksGroups(); if (m->debug) { m->mothurOut("[DEBUG]: finished sanity check.\n"); } //create xml file string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(inputfile); } map variables; variables["[filename]"] = thisOutputDir + "submission."; string outputFileName = getOutputFileName("xml", variables); outputNames.push_back(outputFileName); outputTypes["xml"].push_back(outputFileName); ofstream out; m->openOutputFile(outputFileName, out); string blankFile = thisOutputDir + "submit.ready"; ofstream outT; m->openOutputFile(blankFile, outT); outT.close(); //contacts portion //////////////////////////////////////////////////////// out << "\n"; out << "\t\n"; out << "\t\t New Submission. Generated by mothur version " + m->getVersion() + " \n"; out << "\t\t\n"; out << "\t\t\n"; out << "\t\t" + centerName + "\n"; out << "\t\t\n"; out << "\t\t\t\n"; out << "\t\t\t\t" + firstName + "\n"; out << "\t\t\t\t" + lastName + "\n"; out << "\t\t\t\n"; out << "\t\t\n"; out << "\t\t\n"; out << "\t\n"; //////////////////////////////////////////////////////// //bioproject //////////////////////////////////////////////////////// out << "\t\n"; out << "\t\t\n"; out << "\t\t\t\n"; out << "\t\t\t\t\n"; out << "\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t" + projectName + " \n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t" + projectTitle + " \n"; out << "\t\t\t\t\t\t\t

" + description + "

\n"; if (website != "") { out << "\t\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t\t" + website + "\n"; out << "\t\t\t\t\t\t\t\n"; } if (Grants.size() != 0) { for (int i = 0; i < Grants.size(); i++) { out << "\t\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t\t" + Grants[i].grantAgency + "\n"; if (Grants[i].grantTitle != "") { out << "\t\t\t\t\t\t\t\t" + Grants[i].grantTitle + "\n"; } out << "\t\t\t\t\t\t\t\n"; } } out << "\t\t\t\t\t\t
\n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t\t\t" + dataType + " \n"; out << "\t\t\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t
\n"; out << "\t\t\t\t
\n"; out << "\t\t\t
\n"; out << "\t\t\t\n"; out << "\t\t\t\t\t\t" + projectName + " \n"; out << "\t\t\t\n"; out << "\t\t
\n"; out << "\t
\n"; //////////////////////////////////////////////////////// //bioSample //////////////////////////////////////////////////////// for (int i = 0; i < Groups.size(); i++) { if (m->control_pressed) { break; } out << "\t\n"; out << "\t\t\n"; out << "\t\t\t\n"; out << "\t\t\t\t\n"; out << "\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t" + Groups[i] + " \n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t\t" + mimarks[Groups[i]]["sample_title"] + " \n"; out << "\t\t\t\t\t\t\t

" + mimarks[Groups[i]]["description"] + "

\n"; out << "\t\t\t\t\t\t
\n"; out << "\t\t\t\t\t\t\n"; string organismName = "metagenome"; map::iterator itOrganism = Group2Organism.find(Groups[i]); if (itOrganism != Group2Organism.end()) { organismName = itOrganism->second; } //user supplied acceptable organism, so use it. out << "\t\t\t\t\t\t\t" + organismName + " \n"; out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t\t" + packageType + "\n"; out << "\t\t\t\t\t\t\n"; //add biosample required attributes map >:: iterator it = mimarks.find(Groups[i]); if (it != mimarks.end()) { map categories = it->second; for (map:: iterator it2 = categories.begin(); it2 != categories.end(); it2++) { if (m->control_pressed) { break; } out << "\t\t\t\t\t\t\tfirst + "\">" + it2->second + "\n"; } } out << "\t\t\t\t\t\t\n"; out << "\t\t\t\t\t
\n"; out << "\t\t\t\t
\n"; out << "\t\t\t
\n"; out << "\t\t\t\n"; out << "\t\t\t\t" + Groups[i] + "\n"; out << "\t\t\t\n"; out << "\t\t
\n"; out << "\t
\n"; } map::iterator itGroup; //File objects //////////////////////////////////////////////////////// for (int i = 0; i < Groups.size(); i++) { vector thisGroupsFiles = filesBySample[Groups[i]]; string thisGroupsBarcode, thisGroupsPrimer; if (libLayout == "paired") { thisGroupsBarcode = "."; thisGroupsPrimer = "."; } else { thisGroupsBarcode = ""; thisGroupsPrimer = ""; } itGroup = Group2Barcode.find(Groups[i]); if (itGroup != Group2Barcode.end()) { if (fileOption != 5) { thisGroupsBarcode = itGroup->second; } //don't include barcodes if using index files. } itGroup = Group2Primer.find(Groups[i]); if (itGroup != Group2Primer.end()) { thisGroupsPrimer = itGroup->second; } //cout << Groups[i] << '\t' << thisGroupsFiles.size() << endl; for (int j = 0; j < thisGroupsFiles.size(); j++) { string libId = m->getSimpleName(thisGroupsFiles[j]) + "." + Groups[i]; if (m->control_pressed) { break; } out << "\t\n"; out << "\t\t\n"; if (libLayout == "paired") { //adjust the libID because the thisGroupsFiles[j] contains two filenames vector pieces = m->splitWhiteSpace(thisGroupsFiles[j]); libId = m->getSimpleName(pieces[0]) + "." 
+ Groups[i]; out << "\t\t\tgetSimpleName(pieces[0]) + "\">\n"; out << "\t\t\t\tgeneric-data \n"; out << "\t\t\t\n"; out << "\t\t\tgetSimpleName(pieces[1]) + "\">\n"; out << "\t\t\t\tgeneric-data \n"; out << "\t\t\t\n"; //attributes if (linkers.size() != 0) { string linkerString = ""; //linker size forced to 1 for (int k = 0; k < linkers.size(); k++) { linkerString += linkers[k] + ";"; } linkerString = linkerString.substr(0, linkerString.length()-1); out << "\t\t\t" + linkerString + "\n"; out << "\t\t\t" + toString(ldiffs) + "\n"; } if (thisGroupsBarcode != ".") { string barcodeString = ""; vector thisBarcodes; m->splitAtChar(thisGroupsBarcode, thisBarcodes, '.'); if (thisBarcodes[0] != "NONE") { barcodeString += thisBarcodes[0] + ";"; } if (thisBarcodes[1] != "NONE") { barcodeString += thisBarcodes[1] + ";"; }//forward barcode + reverse barcode barcodeString = barcodeString.substr(0, barcodeString.length()-1); out << "\t\t\t" + barcodeString + "\n"; out << "\t\t\t" + toString(bdiffs) + "\n"; } if (spacers.size() != 0) { string spacerString = ""; //spacer size forced to 1 for (int k = 0; k < spacers.size(); k++) { spacerString += spacers[k] + ";"; } spacerString = spacerString.substr(0, spacerString.length()-1); out << "\t\t\t" + spacerString + "\n"; out << "\t\t\t" + toString(sdiffs) + "\n"; } if (thisGroupsPrimer != ".") { string primerString = ""; vector thisPrimers; m->splitAtChar(thisGroupsPrimer, thisPrimers, '.'); if (thisPrimers[0] != "") { primerString += thisPrimers[0] + ";"; } if (thisPrimers[1] != "") { primerString += thisPrimers[1] + ";"; } if (primerString != "") { primerString = primerString.substr(0, primerString.length()-1); out << "\t\t\t" + primerString + "\n"; out << "\t\t\t" + toString(pdiffs) + "\n"; } } out << "\t\t\t" + libId + "\n"; out << "\t\t\t" + libStrategy + "\n"; out << "\t\t\t" + libSource + "\n"; out << "\t\t\t" + libSelection + "\n"; out << "\t\t\t" + libLayout + "\n"; out << "\t\t\t" + instrumentModel + "\n"; out << "\t\t\t" + 
mimarks[Groups[i]]["seq_methods"] + "\n"; }else { //single out << "\t\t\tgetSimpleName(thisGroupsFiles[j]) + "\">\n"; out << "\t\t\t\tgeneric-data \n"; out << "\t\t\t\n"; //attributes //linkers -> barcodes -> spacers -> primers if (linkers.size() != 0) { string linkerString = ""; for (int k = 0; k < linkers.size(); k++) { linkerString += linkers[k] + ";"; } linkerString = linkerString.substr(0, linkerString.length()-1); out << "\t\t\t" + linkerString + "\n"; out << "\t\t\t" + toString(ldiffs) + "\n"; } if (thisGroupsBarcode != "") { out << "\t\t\t" + thisGroupsBarcode + "\n"; out << "\t\t\t" + toString(bdiffs) + "\n"; } if (spacers.size() != 0) { string spacerString = ""; for (int k = 0; k < spacers.size(); k++) { spacerString += spacers[k] + ";"; } spacerString = spacerString.substr(0, spacerString.length()-1); out << "\t\t\t" + spacerString + "\n"; out << "\t\t\t" + toString(sdiffs) + "\n"; } if (thisGroupsPrimer != "") { out << "\t\t\t" + thisGroupsPrimer + "\n"; out << "\t\t\t" + toString(pdiffs) + "\n"; } //out << "\t\t\t" + orientation + "\n"; out << "\t\t\t" + libId + "\n"; out << "\t\t\t" + libStrategy + "\n"; out << "\t\t\t" + libSource + "\n"; out << "\t\t\t" + libSelection + "\n"; out << "\t\t\t" + libLayout + "\n"; out << "\t\t\t" + instrumentModel + "\n"; out << "\t\t\t" + mimarks[Groups[i]]["seq_methods"] + "\n"; } ///////////////////bioProject info out << "\t\t\t\n"; out << "\t\t\t\t\n"; out << "\t\t\t\t\t" + projectName + " \n"; out << "\t\t\t\t\n"; out << "\t\t\t\n"; //////////////////bioSample info out << "\t\t\t\n"; out << "\t\t\t\t\n"; out << "\t\t\t\t\t" + Groups[i] + "\n"; out << "\t\t\t\t\n"; out << "\t\t\t\n"; //libID out << "\t\t\t\n"; if (libLayout == "paired") { //adjust the libID because the thisGroupsFiles[j] contains two filenames vector pieces = m->splitWhiteSpace(thisGroupsFiles[j]); libId = m->getSimpleName(pieces[0]) + "." 
+ Groups[i]; } out << "\t\t\t\t" + libId + "\n"; out << "\t\t\t\n"; out << "\t\t\n"; out << "\t\n"; } } out << "
\n"; out.close(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output files created by command m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "execute"); exit(1); } } //********************************************************************************************************************** int SRACommand::readContactFile(){ try { lastName = ""; firstName = ""; submissionName = ""; email = ""; centerName = ""; centerType = ""; description = ""; website = ""; projectName = ""; projectTitle = ""; ownership = "owner"; ifstream in; m->openInputFile(contactfile, in); while(!in.eof()) { if (m->control_pressed) { break; } string key, value; in >> key; m->gobble(in); value = m->getline(in); m->gobble(in); for (int i = 0; i < key.length(); i++) { key[i] = toupper(key[i]); } if (key == "USERNAME") { submissionName = value; } else if (key == "LAST") { lastName = value; } else if (key == "FIRST") { firstName = value; } else if (key == "EMAIL") { email = value; } else if (key == "CENTER") { centerName = value; } else if (key == "TYPE") { centerType = value; for (int i = 0; i < centerType.length(); i++) { centerType[i] = tolower(centerType[i]); } if ((centerType == "consortium") || (centerType == "center") || (centerType == "institute") || (centerType == "lab")) {} else { m->mothurOut("[ERROR]: " + centerType + " is not a center type option. Valid center type options are consortium, center, institute and lab. 
This is a controlled vocabulary section in the XML file that will be generated."); m->mothurOutEndLine(); m->control_pressed = true; } }else if (key == "OWNERSHIP") { ownership = value; for (int i = 0; i < ownership.length(); i++) { ownership[i] = tolower(ownership[i]); } if ((ownership == "owner") || (ownership == "participant")) {} else { m->mothurOut("[ERROR]: " + ownership + " is not an ownership option. Valid ownership options are owner or participant. This is a controlled vocabulary section in the XML file that will be generated."); m->mothurOutEndLine(); m->control_pressed = true; } }else if (key == "DESCRIPTION") { description = value; } else if (key == "WEBSITE") { website = value; } else if (key == "PROJECTNAME") { projectName = value; } else if (key == "PROJECTTITLE") { projectTitle = value; } else if (key == "GRANT") { string temp = value; vector<string> values; m->splitAtComma(temp, values); Grant thisGrant; for (int i = 0; i < values.size(); i++) { vector<string> items; m->splitAtChar(values[i], items, '='); if (items.size() != 2) { m->mothurOut("[ERROR]: error parsing grant info for line \"" + value + "\", skipping it.\n"); break; } else { //remove any leading spaces in tag int start; for (start = 0; start < items[0].length(); start++) { if (isspace(items[0][start])) {}else {break;} } items[0] = items[0].substr(start); if (items[0] == "id") { thisGrant.grantId = items[1]; } else if (items[0] == "title") { thisGrant.grantTitle = items[1]; } else if (items[0] == "agency") { thisGrant.grantAgency = items[1]; } else { m->mothurOut("[ERROR]: unknown identifier '" + items[0] + "', skipping it.\n"); } } } if ((thisGrant.grantId == "") || (thisGrant.grantAgency == "")) { m->mothurOut("[ERROR]: Missing info for line \"" + value + "\", skipping it. Note: the id and agency fields are required. 
Example: Grant id=yourID, agency=yourAgency.\n"); } else { Grants.push_back(thisGrant); } } } in.close(); if (lastName == "") { m->mothurOut("[ERROR]: missing last name from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (firstName == "") { m->mothurOut("[ERROR]: missing first name from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (submissionName == "") { m->mothurOut("[ERROR]: missing submission name from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (email == "") { m->mothurOut("[ERROR]: missing email from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (centerName == "") { m->mothurOut("[ERROR]: missing center name from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (centerType == "") { m->mothurOut("[ERROR]: missing center type from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (description == "") { m->mothurOut("[ERROR]: missing description from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (projectTitle == "") { m->mothurOut("[ERROR]: missing project title from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } if (projectName == "") { m->mothurOut("[ERROR]: missing project name from project file, quitting."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "readContactFile"); exit(1); } } //********************************************************************************************************************** //air, host_associated, human_associated, human_gut, human_oral, human_skin, human_vaginal, microbial, miscellaneous, plant_associated, sediment, soil, wastewater or water //all packages require: *sample_name *organism *collection_date *biome *feature *material *geo_loc_name *lat_lon //air: *altitude 
//host_associated, human_associated, human_gut, human_oral, human_skin, human_vaginal, plant_associated: *host
//microbial, sediment, soil: *depth *elev
//water: *depth
int SRACommand::readMIMarksFile(){ try { //acceptable organisms vector<string> acceptableOrganisms; bool organismError = false; //ecological acceptableOrganisms.push_back("activated carbon metagenome"); acceptableOrganisms.push_back("activated sludge metagenome"); acceptableOrganisms.push_back("air metagenome"); acceptableOrganisms.push_back("anaerobic digester metagenome"); acceptableOrganisms.push_back("ant fungus garden metagenome"); acceptableOrganisms.push_back("aquatic metagenome"); acceptableOrganisms.push_back("beach sand metagenome"); acceptableOrganisms.push_back("biofilm metagenome"); acceptableOrganisms.push_back("biofilter metagenome"); acceptableOrganisms.push_back("biogas fermenter metagenome"); acceptableOrganisms.push_back("bioreactor metagenome"); acceptableOrganisms.push_back("bioreactor sludge metagenome"); acceptableOrganisms.push_back("clinical metagenome"); acceptableOrganisms.push_back("coal metagenome"); acceptableOrganisms.push_back("compost metagenome"); acceptableOrganisms.push_back("dust metagenome"); acceptableOrganisms.push_back("fermentation metagenome"); acceptableOrganisms.push_back("food fermentation metagenome"); acceptableOrganisms.push_back("food metagenome"); acceptableOrganisms.push_back("freshwater metagenome"); acceptableOrganisms.push_back("freshwater sediment metagenome"); acceptableOrganisms.push_back("groundwater metagenome"); acceptableOrganisms.push_back("halite metagenome"); acceptableOrganisms.push_back("hot springs metagenome"); acceptableOrganisms.push_back("hydrocarbon metagenome"); acceptableOrganisms.push_back("hydrothermal vent metagenome"); acceptableOrganisms.push_back("hypersaline lake metagenome"); 
acceptableOrganisms.push_back("ice metagenome"); acceptableOrganisms.push_back("indoor metagenome"); acceptableOrganisms.push_back("industrial waste metagenome"); acceptableOrganisms.push_back("mangrove metagenome"); acceptableOrganisms.push_back("marine metagenome"); acceptableOrganisms.push_back("marine sediment metagenome"); acceptableOrganisms.push_back("microbial mat metagenome"); acceptableOrganisms.push_back("mine drainage metagenome"); acceptableOrganisms.push_back("mixed culture metagenome"); acceptableOrganisms.push_back("oil production facility metagenome"); acceptableOrganisms.push_back("paper pulp metagenome"); acceptableOrganisms.push_back("permafrost metagenome"); acceptableOrganisms.push_back("plastisphere metagenome"); acceptableOrganisms.push_back("power plant metagenome"); acceptableOrganisms.push_back("retting rhizosphere metagenome"); acceptableOrganisms.push_back("rock metagenome"); acceptableOrganisms.push_back("salt lake metagenome"); acceptableOrganisms.push_back("saltern metagenome"); acceptableOrganisms.push_back("sediment metagenome"); acceptableOrganisms.push_back("snow metagenome"); acceptableOrganisms.push_back("soil metagenome"); acceptableOrganisms.push_back("stromatolite metagenome"); acceptableOrganisms.push_back("terrestrial metagenome"); acceptableOrganisms.push_back("tomb wall metagenome"); acceptableOrganisms.push_back("wastewater metagenome"); acceptableOrganisms.push_back("wetland metagenome"); acceptableOrganisms.push_back("whale fall metagenome"); //organismal acceptableOrganisms.push_back("algae metagenome"); acceptableOrganisms.push_back("ant metagenome"); acceptableOrganisms.push_back("bat metagenome"); acceptableOrganisms.push_back("beetle metagenome"); acceptableOrganisms.push_back("bovine gut metagenome"); acceptableOrganisms.push_back("bovine metagenome"); acceptableOrganisms.push_back("chicken gut metagenome"); acceptableOrganisms.push_back("coral metagenome"); acceptableOrganisms.push_back("echinoderm metagenome"); 
acceptableOrganisms.push_back("endophyte metagenome"); acceptableOrganisms.push_back("epibiont metagenome"); acceptableOrganisms.push_back("fish metagenome"); acceptableOrganisms.push_back("fossil metagenome"); acceptableOrganisms.push_back("gill metagenome"); acceptableOrganisms.push_back("gut metagenome"); acceptableOrganisms.push_back("honeybee metagenome"); acceptableOrganisms.push_back("human gut metagenome"); acceptableOrganisms.push_back("human lung metagenome"); acceptableOrganisms.push_back("human metagenome"); acceptableOrganisms.push_back("human nasal/pharyngeal metagenome"); acceptableOrganisms.push_back("human oral metagenome"); acceptableOrganisms.push_back("human skin metagenome"); acceptableOrganisms.push_back("insect gut metagenome"); acceptableOrganisms.push_back("insect metagenome"); acceptableOrganisms.push_back("mollusc metagenome"); acceptableOrganisms.push_back("mosquito metagenome"); acceptableOrganisms.push_back("mouse gut metagenome"); acceptableOrganisms.push_back("mouse metagenome"); acceptableOrganisms.push_back("mouse skin metagenome"); acceptableOrganisms.push_back("nematode metagenome"); acceptableOrganisms.push_back("oral metagenome"); acceptableOrganisms.push_back("phyllosphere metagenome"); acceptableOrganisms.push_back("pig metagenome"); acceptableOrganisms.push_back("plant metagenome"); acceptableOrganisms.push_back("primate metagenome"); acceptableOrganisms.push_back("rat metagenome"); acceptableOrganisms.push_back("root metagenome"); acceptableOrganisms.push_back("sea squirt metagenome"); acceptableOrganisms.push_back("seed metagenome"); acceptableOrganisms.push_back("shoot metagenome"); acceptableOrganisms.push_back("skin metagenome"); acceptableOrganisms.push_back("snake metagenome"); acceptableOrganisms.push_back("sponge metagenome"); acceptableOrganisms.push_back("stomach metagenome"); acceptableOrganisms.push_back("symbiont metagenome"); acceptableOrganisms.push_back("termite gut metagenome"); 
acceptableOrganisms.push_back("termite metagenome"); acceptableOrganisms.push_back("upper respiratory tract metagenome"); acceptableOrganisms.push_back("urine metagenome"); acceptableOrganisms.push_back("viral metagenome"); acceptableOrganisms.push_back("wallaby gut metagenome"); acceptableOrganisms.push_back("wasp metagenome"); acceptableOrganisms.push_back("synthetic metagenome"); acceptableOrganisms.push_back("metagenome"); vector requiredFieldsForPackage; requiredFieldsForPackage.push_back("sample_name"); requiredFieldsForPackage.push_back("description"); requiredFieldsForPackage.push_back("sample_title"); requiredFieldsForPackage.push_back("collection_date"); requiredFieldsForPackage.push_back("env_biome"); requiredFieldsForPackage.push_back("env_feature"); requiredFieldsForPackage.push_back("env_material"); requiredFieldsForPackage.push_back("geo_loc_name"); requiredFieldsForPackage.push_back("lat_lon"); requiredFieldsForPackage.push_back("seq_methods"); requiredFieldsForPackage.push_back("organism"); ifstream in; m->openInputFile(mimarksfile, in); //read comments string temp; packageType = ""; while(!in.eof()) { if (m->control_pressed) { break; } temp = m->getline(in); m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: " + temp + "\n"); } if (temp[0] == '#') { int pos = temp.find("MIMARKS.survey"); if (pos != string::npos) { packageType = temp.substr(1); } } else{ break; } //hit headers line } //in future may want to add parsing of format header.... 
vector headers; m->splitAtChar(temp, headers, '\t'); m->removeBlanks(headers); //remove * from required's for (int i = 0; i < headers.size(); i++) { if (headers[i][0] == '*') { headers[i] = headers[i].substr(1); } if (m->debug) { m->mothurOut("[DEBUG]: " + headers[i] + "\n"); } } if (m->debug) { m->mothurOut("[DEBUG]: packageType = '" + packageType + "'\n"); } if (packageType == "MIMARKS.survey.air.4.0") { requiredFieldsForPackage.push_back("altitude"); } if (packageType == "MIMARKS.survey.host-associated.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.human-associated.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.human-gut.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.human-oral.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.human-skin.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.human-vaginal.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.microbial.4.0") { requiredFieldsForPackage.push_back("depth"); requiredFieldsForPackage.push_back("elev"); } if (packageType == "MIMARKS.survey.miscellaneous.4.0") {} if (packageType == "MIMARKS.survey.plant-associated.4.0") { requiredFieldsForPackage.push_back("host"); } if (packageType == "MIMARKS.survey.sediment.4.0") { requiredFieldsForPackage.push_back("depth"); requiredFieldsForPackage.push_back("elev"); } if (packageType == "MIMARKS.survey.soil.4.0") { requiredFieldsForPackage.push_back("depth"); requiredFieldsForPackage.push_back("elev"); } if (packageType == "MIMARKS.survey.wastewater.4.0") {} if (packageType == "MIMARKS.survey.water.4.0") { requiredFieldsForPackage.push_back("depth"); } if (!m->isSubset(headers, requiredFieldsForPackage)){ string requiredFields = ""; for (int i = 0; i < requiredFieldsForPackage.size()-1; i++) { requiredFields += 
requiredFieldsForPackage[i] + ", "; } requiredFields += requiredFieldsForPackage[requiredFieldsForPackage.size()-1]; m->mothurOut("[ERROR]: missing required fields for package, please correct. Required fields are " + requiredFields + ".\n"); m->control_pressed = true; in.close(); return 0; } //if (m->debug) { m->mothurOut("[DEBUG]: chooseAtLeastOneForPackage.size() = " + toString(chooseAtLeastOneForPackage.size()) + "\n"); } //if (!m->inUsersGroups(chooseAtLeastOneForPackage, headers)){ //returns true if any of the choose at least ones are in headers //string requiredFields = ""; //for (int i = 0; i < chooseAtLeastOneForPackage.size()-1; i++) { requiredFields += chooseAtLeastOneForPackage[i] + ", "; cout << chooseAtLeastOneForPackage[i] << endl; } //if (chooseAtLeastOneForPackage.size() < 1) { requiredFields += chooseAtLeastOneForPackage[chooseAtLeastOneForPackage.size()-1]; } //m->mothurOut("[ERROR]: missing a choose at least one fields for the package, please correct. These are marked with '**'. 
Required fields are " + requiredFields + ".\n"); m->control_pressed = true; in.close(); return 0; // } map allNA; for (int i = 1; i < headers.size(); i++) { allNA[headers[i]] = true; } while(!in.eof()) { if (m->control_pressed) { break; } temp = m->getline(in); m->gobble(in); //cout << temp << endl; if (m->debug) { m->mothurOut("[DEBUG]: " + temp + "\n"); } string original = temp; vector linePieces; m->splitAtChar(temp, linePieces, '\t'); m->removeBlanks(linePieces); if (linePieces.size() != headers.size()) { m->mothurOut("[ERROR]: line: " + original + " contains " + toString(linePieces.size()) + " columns, but you have " + toString(headers.size()) + " column headers, please correct.\n"); m->control_pressed = true; } else { map >:: iterator it = mimarks.find(linePieces[0]); if (it == mimarks.end()) { map categories; //start after *sample_name for (int i = 1; i < headers.size(); i++) { //check the users inputs for appropriate organisms if (headers[i] == "organism") { if (!m->inUsersGroups(linePieces[i], acceptableOrganisms)) { //not an acceptable organism organismError = true; m->mothurOut("[WARNING]: " + linePieces[i]+ " is not an acceptable organism, changing to acceptable 'metagenome'. NCBI will allow you to modify the organism after submission.\n"); linePieces[i] = "metagenome"; categories[headers[i]] = linePieces[i]; }else { if (linePieces[i] == "metagenome") { m->mothurOut("[WARNING]: metagenome is an acceptable organism, but NCBI would prefer a more specific choice if possible. Here is a link to the organism choices and descriptions, http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Undef&id=408169&lvl=3&keep=1&srchmode=1&unlock. 
To request the addition of a taxonomy to the list, please contact Anjanette Johnston at johnston@ncbi.nlm.nih.gov.\n"); } } Group2Organism[linePieces[0]] = linePieces[i]; } //check date format // BioSample has several accepted date formats like "DD-Mmm-YYYY" (eg., 30-Oct-2010) or standard "YYYY-mm-dd" or "YYYY-mm" (eg 2010-10-30, 2010-10). if (headers[i] == "collection_date") { //will autocorrect if possible bool okay = checkDateFormat(linePieces[i]); if (!okay) { m->control_pressed = true; } } if (linePieces[i] != "missing") { allNA[headers[i]] = false; } categories[headers[i]] = linePieces[i]; } //does this sample already match an existing sample? bool isOkaySample = true; for (map >:: iterator it2 = mimarks.begin(); it2 != mimarks.end(); it2++) { if (m->control_pressed) { break; } bool allSame = true; for (int i = 1; i < headers.size(); i++) { if ((it2->second)[headers[i]] != categories[headers[i]]) { allSame = false; } } if (allSame) { m->mothurOut("[ERROR]: " + linePieces[0]+ " is a duplicate sample to " + it2->first + ". It has all the same attributes in the MIMarks file. Samples must have distinguishing features to be uploaded to the NCBI library, please correct.\n"); m->control_pressed = true; isOkaySample = false; } } if (isOkaySample) { mimarks[linePieces[0]] = categories; } }else { m->mothurOut("[ERROR]: " + linePieces[0]+ " is a duplicate sampleName. 
Sample names must be unique, please correct.\n"); m->control_pressed = true; } } } in.close(); //add in values for "scrap" group map<string, string> categories; //start after *sample_name for (int i = 1; i < headers.size(); i++) { categories[headers[i]] = "missing"; if (headers[i] == "organism") { categories[headers[i]] = "metagenome"; } if (headers[i] == "description") { categories[headers[i]] = "these sequences were scrapped"; } if (headers[i] == "sample_title") { categories[headers[i]] = "these sequences were scrapped"; } } mimarks["scrap"] = categories; Group2Organism["scrap"] = "metagenome"; if (organismError) { string organismTypes = ""; for (int i = 0; i < acceptableOrganisms.size()-1; i++) { organismTypes += acceptableOrganisms[i] + ", "; } organismTypes += acceptableOrganisms[acceptableOrganisms.size()-1]; m->mothurOut("\n[WARNING]: The acceptable organism choices are: " + organismTypes + ".\n\n\n"); } return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "readMIMarksFile"); exit(1); } } //**********************************************************************************************************************
/*
 file option 1

 sfffile1 oligosfile1
 sfffile2 oligosfile2
 ...

 file option 2

 fastqfile1 oligosfile1
 fastqfile2 oligosfile2
 ...

 file option 3

 ffastqfile1 rfastqfile1
 ffastqfile2 rfastqfile2
 ...

 file option 4

 group fastqfile fastqfile
 group fastqfile fastqfile
 group fastqfile fastqfile
 ...

 file option 5

 My.forward.fastq My.reverse.fastq none My.rindex.fastq //use none when there is no forward or reverse index file
 ...
*/ int SRACommand::readFile(map >& files){ try { bool runParseFastqFile = false; bool using3NONE = false; inputfile = file; files.clear(); ifstream in; m->openInputFile(file, in); fileOption = 0; while(!in.eof()) { if (m->control_pressed) { return 0; } string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); string group = ""; string thisFileName1, thisFileName2, findex, rindex; thisFileName1 = ""; thisFileName2 = ""; findex = ""; rindex = ""; if (pieces.size() == 2) { thisFileName1 = pieces[0]; thisFileName2 = pieces[1]; }else if (pieces.size() == 3) { thisFileName1 = pieces[1]; thisFileName2 = pieces[2]; group = pieces[0]; if (setOligosParameter) { m->mothurOut("[ERROR]: You cannot have an oligosfile and 3 column file option at the same time. Aborting. \n"); m->control_pressed = true; } if ((thisFileName2 != "none") && (thisFileName2 != "NONE" )) { if (!using3NONE) { libLayout = "paired"; } else { m->mothurOut("[ERROR]: You cannot have a 3 column file with paired and unpaired files at the same time. Aborting. \n"); m->control_pressed = true; } } else { thisFileName2 = ""; libLayout = "single"; using3NONE = true; } }else if (pieces.size() == 4) { if (!setOligosParameter) { m->mothurOut("[ERROR]: You must have an oligosfile with the index file option. Aborting. \n"); m->control_pressed = true; } thisFileName1 = pieces[0]; thisFileName2 = pieces[1]; findex = pieces[2]; rindex = pieces[3]; if ((findex == "none") || (findex == "NONE")){ findex = ""; } if ((rindex == "none") || (rindex == "NONE")){ rindex = ""; } }else { m->mothurOut("[ERROR]: file lines can be 2, 3 or 4 columns. The 2 column files are sff file then oligos or fastqfile then oligos or ffastq and rfastq. You may have multiple lines in the file. The 3 column files are for paired read libraries. The format is groupName, forwardFastqFile reverseFastqFile. Four column files are for inputting file pairs with index files. 
Example: My.forward.fastq My.reverse.fastq NONE My.rindex.fastq. The keyword NONE can be used when there is no index file for either the forward or reverse file.\n"); m->control_pressed = true; } if (m->debug) { m->mothurOut("[DEBUG]: group = " + group + ", thisFileName1 = " + thisFileName1 + ", thisFileName2 = " + thisFileName2 + ".\n"); } if (inputDir != "") { string path = m->hasPath(thisFileName1); if (path == "") { thisFileName1 = inputDir + thisFileName1; } if (thisFileName2 != "") { path = m->hasPath(thisFileName2); if (path == "") { thisFileName2 = inputDir + thisFileName2; } } if (findex != "") { path = m->hasPath(findex); if (path == "") { findex = inputDir + findex; } } if (rindex != "") { path = m->hasPath(rindex); if (path == "") { rindex = inputDir + rindex; } } } //check to make sure both are able to be opened ifstream in2; int openForward = m->openInputFile(thisFileName1, in2, "noerror"); //if you can't open it, try default location if (openForward == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(thisFileName1); m->mothurOut("Unable to open " + thisFileName1 + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openForward = m->openInputFile(tryPath, in3, "noerror"); in3.close(); thisFileName1 = tryPath; } } //if you can't open it, try output location if (openForward == 1) { if (m->getOutputDir() != "") { //output directory is set string tryPath = m->getOutputDir() + m->getSimpleName(thisFileName1); m->mothurOut("Unable to open " + thisFileName1 + ". 
Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openForward = m->openInputFile(tryPath, in4, "noerror"); thisFileName1 = tryPath; in4.close(); } } if (openForward == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + thisFileName1 + ", ignoring.\n"); }else{ in2.close(); } int openReverse = 0; if (thisFileName2 != "") { ifstream in3; openReverse = m->openInputFile(thisFileName2, in3, "noerror"); //if you can't open it, try default location if (openReverse == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(thisFileName2); m->mothurOut("Unable to open " + thisFileName2 + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in3; openReverse = m->openInputFile(tryPath, in3, "noerror"); in3.close(); thisFileName2 = tryPath; } } //if you can't open it, try output location if (openReverse == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(thisFileName2); m->mothurOut("Unable to open " + thisFileName2 + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in4; openReverse = m->openInputFile(tryPath, in4, "noerror"); thisFileName2 = tryPath; in4.close(); } } if (openReverse == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + thisFileName2 + ", ignoring pair.\n"); }else{ in3.close(); } } int openFindex = 0; if (findex != "") { ifstream in4; openFindex = m->openInputFile(findex, in4, "noerror"); in4.close(); //if you can't open it, try default location if (openFindex == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(findex); m->mothurOut("Unable to open " + findex + ". 
Trying default " + tryPath); m->mothurOutEndLine(); ifstream in5; openFindex = m->openInputFile(tryPath, in5, "noerror"); in5.close(); findex = tryPath; } } //if you can't open it, try output location if (openFindex == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(findex); m->mothurOut("Unable to open " + findex + ". Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in6; openFindex = m->openInputFile(tryPath, in6, "noerror"); findex = tryPath; in6.close(); } } if (openFindex == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + findex + ", ignoring pair.\n"); } } int openRindex = 0; if (rindex != "") { ifstream in7; openRindex = m->openInputFile(rindex, in7, "noerror"); in7.close(); //if you can't open it, try default location if (openRindex == 1) { if (m->getDefaultPath() != "") { //default path is set string tryPath = m->getDefaultPath() + m->getSimpleName(rindex); m->mothurOut("Unable to open " + rindex + ". Trying default " + tryPath); m->mothurOutEndLine(); ifstream in8; openRindex = m->openInputFile(tryPath, in8, "noerror"); in8.close(); rindex = tryPath; } } //if you can't open it, try output location if (openRindex == 1) { if (m->getOutputDir() != "") { //default path is set string tryPath = m->getOutputDir() + m->getSimpleName(rindex); m->mothurOut("Unable to open " + rindex + ". 
Trying output directory " + tryPath); m->mothurOutEndLine(); ifstream in9; openRindex = m->openInputFile(tryPath, in9, "noerror"); rindex = tryPath; in9.close(); } } if (openRindex == 1) { //can't find it m->mothurOut("[WARNING]: can't find " + rindex + ", ignoring pair.\n"); } } if ((pieces.size() == 2) && (openForward != 1) && (openReverse != 1)) { //good pair and sff or fastq and oligos libLayout = "single"; if (!setOligosParameter) { //process pair int pos = thisFileName1.find(".sff"); if (pos != string::npos) {//these files are sff files fileOption = 1; isSFF = true; sfffile = thisFileName1; oligosfile = thisFileName2; if (m->debug) { m->mothurOut("[DEBUG]: about to read oligos\n"); } readOligos(); if (m->debug) { m->mothurOut("[DEBUG]: about to parse\n"); } parseSffFile(files); if (m->debug) { m->mothurOut("[DEBUG]: done parsing " + sfffile + "\n"); } }else{ fileOption = 2; isSFF = false; fastqfile = thisFileName1; oligosfile = thisFileName2; if (m->debug) { m->mothurOut("[DEBUG]: about to read oligos\n"); } readOligos(); if (m->debug) { m->mothurOut("[DEBUG]: about to parse\n"); } parseFastqFile(files); if (m->debug) { m->mothurOut("[DEBUG]: done parsing " + fastqfile + "\n"); } } }else { runParseFastqFile = true; libLayout = "paired"; fileOption = 3; } }else if((pieces.size() == 3) && (openForward != 1) && (openReverse != 1)) { //good pair and paired read string thisname = thisFileName1 + " " + thisFileName2; if (using3NONE) { thisname = thisFileName1; } map >::iterator it = files.find(group); if (it == files.end()) { Groups.push_back(group); vector temp; temp.push_back(thisname); files[group] = temp; }else { files[group].push_back(thisname); } fileOption = 4; }else if ((pieces.size() == 4) && (openForward != 1) && (openReverse != 1) && (openFindex != 1) && (openRindex != 1)) { libLayout = "paired"; runParseFastqFile = true; fileOption = 5; } } in.close(); if (runParseFastqFile) { vector theseFiles; string commandString = "fasta=f, qfile=f, file=" + file; 
commandString += ", oligos=" + oligosfile; //add in pdiffs, bdiffs, ldiffs, sdiffs, tdiffs if (pdiffs != 0) { commandString += ", pdiffs=" + toString(pdiffs); } if (bdiffs != 0) { commandString += ", bdiffs=" + toString(bdiffs); } if (ldiffs != 0) { commandString += ", ldiffs=" + toString(ldiffs); } if (sdiffs != 0) { commandString += ", sdiffs=" + toString(sdiffs); } if (tdiffs != 0) { commandString += ", tdiffs=" + toString(tdiffs); } if (m->isTrue(checkorient)) { commandString += ", checkorient=" + checkorient; } m->mothurOutEndLine(); m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: fastq.info(" + commandString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* fastqinfoCommand = new ParseFastaQCommand(commandString); fastqinfoCommand->execute(); map<string, vector<string> > filenames = fastqinfoCommand->getOutputFiles(); map<string, vector<string> >::iterator it = filenames.find("fastq"); if (it != filenames.end()) { theseFiles = it->second; } else { m->control_pressed = true; } // error in fastq.info delete fastqinfoCommand; m->mothurCalling = false; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); for (int i = 0; i < theseFiles.size(); i++) { outputNames.push_back(theseFiles[i]); } mapGroupToFile(files, theseFiles); fixMap(files); } inputfile = file; return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "readFile"); exit(1); } } //********************************************************************************************************************** int SRACommand::parseSffFile(map<string, vector<string> >& files){ try { vector<string> theseFiles; inputfile = sfffile; libLayout = "single"; //controlled vocab isSFF = true; //run sffinfo to parse sff file into individual sample sff files string commandString = "sff=" + sfffile; commandString += ", oligos=" + oligosfile; //add in pdiffs, bdiffs, ldiffs, sdiffs, tdiffs if (pdiffs != 0) { commandString += ", pdiffs=" + toString(pdiffs); } if (bdiffs != 0) { 
commandString += ", bdiffs=" + toString(bdiffs); } if (ldiffs != 0) { commandString += ", ldiffs=" + toString(ldiffs); } if (sdiffs != 0) { commandString += ", sdiffs=" + toString(sdiffs); } if (tdiffs != 0) { commandString += ", tdiffs=" + toString(tdiffs); } if (m->isTrue(checkorient)) { commandString += ", checkorient=" + checkorient; } m->mothurOutEndLine(); m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: sffinfo(" + commandString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* sffinfoCommand = new SffInfoCommand(commandString); sffinfoCommand->execute(); map<string, vector<string> > filenames = sffinfoCommand->getOutputFiles(); map<string, vector<string> >::iterator it = filenames.find("sff"); if (it != filenames.end()) { theseFiles = it->second; } else { m->control_pressed = true; } // error in sffinfo delete sffinfoCommand; m->mothurCalling = false; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); for (int i = 0; i < theseFiles.size(); i++) { outputNames.push_back(theseFiles[i]); } mapGroupToFile(files, theseFiles); return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "parseSffFile"); exit(1); } } //********************************************************************************************************************** int SRACommand::parseFastqFile(map<string, vector<string> >& files){ try { vector<string> theseFiles; inputfile = fastqfile; //run fastq.info to parse fastq file into individual sample fastq files string commandString = "fasta=f, qfile=f, fastq=" + fastqfile; commandString += ", oligos=" + oligosfile; //add in pdiffs, bdiffs, ldiffs, sdiffs, tdiffs if (pdiffs != 0) { commandString += ", pdiffs=" + toString(pdiffs); } if (bdiffs != 0) { commandString += ", bdiffs=" + toString(bdiffs); } if (ldiffs != 0) { commandString += ", ldiffs=" + toString(ldiffs); } if (sdiffs != 0) { commandString += ", sdiffs=" + toString(sdiffs); } if (tdiffs != 0) { commandString += ", tdiffs=" + toString(tdiffs); } if 
(m->isTrue(checkorient)) { commandString += ", checkorient=" + checkorient; } m->mothurOutEndLine(); m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: fastq.info(" + commandString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* fastqinfoCommand = new ParseFastaQCommand(commandString); fastqinfoCommand->execute(); map<string, vector<string> > filenames = fastqinfoCommand->getOutputFiles(); map<string, vector<string> >::iterator it = filenames.find("fastq"); if (it != filenames.end()) { theseFiles = it->second; } else { m->control_pressed = true; } // error in fastq.info delete fastqinfoCommand; m->mothurCalling = false; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); for (int i = 0; i < theseFiles.size(); i++) { outputNames.push_back(theseFiles[i]); } mapGroupToFile(files, theseFiles); return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "parseFastqFile"); exit(1); } } //*************************************************************************************************************** //maps group to file int SRACommand::mapGroupToFile(map<string, vector<string> >& files, vector<string> theseFiles){ try { for (int i = 0; i < Groups.size(); i++) { for (int j = 0; j < theseFiles.size(); j++) { string tempName = m->getSimpleName(theseFiles[j]); //if ((tempName == "GZGO5KL01.F006D146.sff") || (tempName == "G3BMWHG01.F008D021.sff") || (tempName == "GO2JXTW01.M002D125.sff") || (tempName == "GO5715J01.M003D125.sff")) { cout << Groups[i] << '\t' << theseFiles[j] << endl; } //cout << i << '\t' << j << '\t' << Groups[i] << '\t' << theseFiles[j] << endl; int pos = theseFiles[j].find(Groups[i]); if (pos != string::npos) { //you have a potential match, make sure you don't have a case of partial name if (theseFiles[j][pos+Groups[i].length()] == '.') { //final.soil.sff vs final.soil2.sff both would match soil. 
map<string, vector<string> >::iterator it = files.find(Groups[i]); if (it == files.end()) { vector<string> temp; temp.push_back(theseFiles[j]); files[Groups[i]] = temp; }else { files[Groups[i]].push_back(theseFiles[j]); } } } } } return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "mapGroupToFile"); exit(1); } } //*************************************************************************************************************** //fixes map to files for index files parse int SRACommand::fixMap(map<string, vector<string> >& files){ try { for (map<string, vector<string> >::iterator it = files.begin(); it != files.end(); it++) { vector<string> theseFiles = it->second; if (theseFiles.size() != 2) { m->mothurOut("[ERROR]: unexpected number of files, quitting.\n"); m->control_pressed = true; } if (m->control_pressed) { return 0; } vector<string> temp; temp.resize(1, ""); for (int j = 0; j < theseFiles.size(); j++) { string tempName = m->getSimpleName(theseFiles[j]); int pos = theseFiles[j].find("forward.fastq"); if (pos != string::npos) { //you have a potential match for the forward file if (temp[0] == "") { temp[0] = theseFiles[j]; }else { string reverse = temp[0]; temp[0] = theseFiles[j] + " " + reverse; } }else { pos = theseFiles[j].find("reverse.fastq"); if (pos != string::npos) { //you have a potential match for the reverse file if (temp[0] == "") { temp[0] = theseFiles[j]; }else { temp[0] += " " + theseFiles[j]; } }else { m->mothurOut("[ERROR]: unexpected parsing results, quitting.\n"); m->control_pressed = true; //shouldn't get here unless the fastq.info changes the format of the output filenames??? } } } //cout << it->first << '\t' << temp[0] << endl; it->second = temp; } return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "fixMap"); exit(1); } } //*************************************************************************************************************** //checks groups and files returned from parse - removes any groups that did not get reads assigned to them, orders files. 
int SRACommand::checkGroups(map >& files){ try { vector newGroups; for (int i = 0; i < Groups.size(); i++) { if (m->debug) { m->mothurOut("[DEBUG]: group " + toString(i) + " = " + Groups[i] + "\n"); } map >::iterator it = files.find(Groups[i]); //no files for this group, remove it if (it == files.end()) { } else { newGroups.push_back(Groups[i]); } } Groups = newGroups; return 0; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkGroups"); exit(1); } } //*************************************************************************************************************** int SRACommand::readOligos(){ try { Oligos oligos; if ((fileOption == 3) || (fileOption == 5)) { oligos.read(oligosfile, false); } //like make.contigs else { oligos.read(oligosfile); } if (m->control_pressed) { return false; } //error in reading oligos if (oligos.hasPairedPrimers() || oligos.hasPairedBarcodes()) { pairedOligos = true; libLayout = "paired"; } else { pairedOligos = false; libLayout = "single"; } vector thisFilesLinkers = oligos.getLinkers(); for (int i = 0; i < thisFilesLinkers.size(); i++) { linkers.push_back(thisFilesLinkers[i]); break; } if (thisFilesLinkers.size() > 1) { m->mothurOut("[WARNING]: the make.sra command only allows for the use of one linker at a time, disregarding all but first one.\n"); } vector thisFilesSpacers = oligos.getSpacers(); for (int i = 0; i < thisFilesSpacers.size(); i++) { spacers.push_back(thisFilesSpacers[i]); break; } if (thisFilesSpacers.size() > 1) { m->mothurOut("[WARNING]: the make.sra command only allows for the use of one spacer at a time, disregarding all but first one.\n"); } if (pairedOligos) { map barcodes = oligos.getPairedBarcodes(); map primers = oligos.getPairedPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->first); string barcodeName = 
oligos.getBarcodeName(itBar->first); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } if(((itPrimer->second).forward+(itPrimer->second).reverse) == ""){ if ((itBar->second).forward != "NONE") { comboName += (itBar->second).forward; } if ((itBar->second).reverse != "NONE") { if (comboName == "") { comboName += (itBar->second).reverse; } else { comboName += ("."+(itBar->second).reverse); } } }else{ if(((itBar->second).forward+(itBar->second).reverse) == ""){ if ((itPrimer->second).forward != "NONE") { comboName += (itPrimer->second).forward; } if ((itPrimer->second).reverse != "NONE") { if (comboName == "") { comboName += (itPrimer->second).reverse; } else { comboName += ("."+(itPrimer->second).reverse); } } } else{ if ((itBar->second).forward != "NONE") { comboName += (itBar->second).forward; } if ((itBar->second).reverse != "NONE") { if (comboName == "") { comboName += (itBar->second).reverse; } else { comboName += ("."+(itBar->second).reverse); } } if ((itPrimer->second).forward != "NONE") { if (comboName == "") { comboName += (itPrimer->second).forward; } else { comboName += ("."+(itPrimer->second).forward); } } if ((itPrimer->second).reverse != "NONE") { if (comboName == "") { comboName += (itPrimer->second).reverse; } else { comboName += ("."+(itPrimer->second).reverse); } } } } if (comboName != "") { comboGroupName += "_" + comboName; } uniqueNames.insert(comboGroupName); map::iterator itGroup2Barcode = Group2Barcode.find(comboGroupName); if (itGroup2Barcode == Group2Barcode.end()) { string temp = (itBar->second).forward+"."+(itBar->second).reverse; Group2Barcode[comboGroupName] = temp; }else { string temp = 
(itBar->second).forward+"."+(itBar->second).reverse; if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: group and barcodes/primers not unique. Should never get here.\n"); } } itGroup2Barcode = Group2Primer.find(comboGroupName); if (itGroup2Barcode == Group2Primer.end()) { string temp = ((itPrimer->second).forward+"."+(itPrimer->second).reverse); Group2Primer[comboGroupName] = temp; }else { string temp = ((itPrimer->second).forward+"."+(itPrimer->second).reverse); if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: group and barcodes/primers not unique. Should never get here.\n"); } } } } } }else { map barcodes = oligos.getBarcodes() ; map primers = oligos.getPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->second); string barcodeName = oligos.getBarcodeName(itBar->second); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string comboName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } if(itPrimer->first == ""){ comboName = itBar->first; }else{ if(itBar->first == ""){ comboName = itPrimer->first; } else{ comboName = itBar->first + "." 
+ itPrimer->first; } } //cout << comboGroupName << '\t' << comboName << endl; if (comboName != "") { comboGroupName += "_" + comboName; } uniqueNames.insert(comboGroupName); map::iterator itGroup2Barcode = Group2Barcode.find(comboGroupName); if (itGroup2Barcode == Group2Barcode.end()) { string temp = (itBar->first); Group2Barcode[comboGroupName] = temp; }else { string temp = (itBar->first); if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: group and barcodes/primers not unique. Should never get here.\n"); } } itGroup2Barcode = Group2Primer.find(comboGroupName); if (itGroup2Barcode == Group2Primer.end()) { string temp = (itPrimer->first); Group2Primer[comboGroupName] = temp; }else { string temp = (itPrimer->first); if ((temp != ".") && (temp != itGroup2Barcode->second)) { m->mothurOut("[ERROR]: group and barcodes/primers not unique. Should never get here.\n"); } } } } } } if (m->debug) { int count = 0; for (set::iterator it = uniqueNames.begin(); it != uniqueNames.end(); it++) { m->mothurOut("[DEBUG]: " + toString(count) + " groupName = " + *it + "\n"); count++; } } Groups.clear(); for (set::iterator it = uniqueNames.begin(); it != uniqueNames.end(); it++) { Groups.push_back(*it); } return true; } catch(exception& e) { m->errorOut(e, "SRACommand", "readOligos"); exit(1); } } //********************************************************************/ //_LS454-ILLUMINA-ION_TORRENT-PACBIO_SMRT bool SRACommand::checkCasesPlatforms(string& platform){ try { string original = platform; bool isOkay = true; //remove users possible case errors for (int i = 0; i < platform.size(); i++) { platform[i] = toupper(platform[i]); } //_LS454-ILLUMINA-ION_TORRENT-PACBIO_SMRT if ((platform == "_LS454") || (platform == "ILLUMINA") || (platform == "ION_TORRENT") || (platform == "PACBIO_SMRT") || (platform == "454")) { } else { isOkay = false; } if (isOkay) { if (platform == "454") { platform = "_LS454"; } }else { m->mothurOut("[ERROR]: " + original + " is not a 
valid platform option. Valid platform options are _LS454, ILLUMINA, ION_TORRENT or PACBIO_SMRT."); m->mothurOutEndLine(); abort = true; } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkCasesPlatforms"); exit(1); } } //********************************************************************/ //454_GS-454_GS_20-454_GS_FLX-454_GS_FLX_Titanium-454_GS_Junior-Illumina_Genome_Analyzer-Illumina_Genome_Analyzer_II-Illumina_Genome_Analyzer_IIx-Illumina_HiSeq_2000-Illumina_HiSeq_1000-Illumina_MiSeq-PacBio_RS-Ion_Torrent_PGM-unspecified bool SRACommand::checkCasesInstrumentModels(string& instrumentModel){ try { string original = instrumentModel; bool isOkay = true; //remove users possible case errors for (int i = 0; i < instrumentModel.size(); i++) { instrumentModel[i] = toupper(instrumentModel[i]); } //_LS454-ILLUMINA-ION_TORRENT-PACBIO_SMRT if (platform == "_LS454") { //instrument model options are 454_GS-454_GS_20-454_GS_FLX-454_GS_FLX_Titanium-454_GS_Junior-unspecified if ((instrumentModel == "454_GS") || (instrumentModel == "454_GS_20") || (instrumentModel == "454_GS_FLX") || (instrumentModel == "454_GS_FLX_TITANIUM") || (instrumentModel == "454_GS_JUNIOR") || (instrumentModel == "UNSPECIFIED")) { } else { isOkay = false; } if (isOkay) { if (instrumentModel == "454_GS_FLX_TITANIUM") { instrumentModel = "454_GS_FLX_Titanium"; } if (instrumentModel == "454_GS_JUNIOR") { instrumentModel = "454_GS_Junior"; } if (instrumentModel == "UNSPECIFIED") { instrumentModel = "unspecified"; } }else { m->mothurOut("[ERROR]: " + original + " is not a valid instrument option for the " + platform + " platform. 
Valid instrument options are 454_GS, 454_GS_20, 454_GS_FLX, 454_GS_FLX_Titanium, 454_GS_Junior or unspecified."); m->mothurOutEndLine(); abort = true; } }else if (platform == "ILLUMINA") { //instrument model options are Illumina_Genome_Analyzer-Illumina_Genome_Analyzer_II-Illumina_Genome_Analyzer_IIx-Illumina_HiSeq_2000-Illumina_HiSeq_1000-Illumina_MiSeq-unspecified if ((instrumentModel == "ILLUMINA_GENOME_ANALYZER") || (instrumentModel == "ILLUMINA_GENOME_ANALYZER_II") || (instrumentModel == "ILLUMINA_GENOME_ANALYZER_IIX") || (instrumentModel == "ILLUMINA_HISEQ_2000") || (instrumentModel == "ILLUMINA_HISEQ_1000") || (instrumentModel == "ILLUMINA_MISEQ") || (instrumentModel == "UNSPECIFIED")) { } else { isOkay = false; } if (isOkay) { if (instrumentModel == "ILLUMINA_GENOME_ANALYZER") { instrumentModel = "Illumina_Genome_Analyzer"; } if (instrumentModel == "ILLUMINA_GENOME_ANALYZER_II") { instrumentModel = "Illumina_Genome_Analyzer_II"; } if (instrumentModel == "ILLUMINA_GENOME_ANALYZER_IIX") { instrumentModel = "Illumina_Genome_Analyzer_IIx"; } if (instrumentModel == "ILLUMINA_HISEQ_2000") { instrumentModel = "Illumina_HiSeq_2000"; } if (instrumentModel == "ILLUMINA_HISEQ_1000") { instrumentModel = "Illumina_HiSeq_1000"; } if (instrumentModel == "ILLUMINA_MISEQ") { instrumentModel = "Illumina_MiSeq"; } if (instrumentModel == "UNSPECIFIED") { instrumentModel = "unspecified"; } }else { m->mothurOut("[ERROR]: " + original + " is not a valid instrument option for the " + platform + " platform. 
Valid instrument options are Illumina_Genome_Analyzer, Illumina_Genome_Analyzer_II, Illumina_Genome_Analyzer_IIx, Illumina_HiSeq_2000, Illumina_HiSeq_1000, Illumina_MiSeq or unspecified."); m->mothurOutEndLine(); abort = true; } }else if (platform == "ION_TORRENT") { //instrument model options are Ion_Torrent_PGM-unspecified if ((instrumentModel == "ION_TORRENT_PGM") || (instrumentModel == "UNSPECIFIED")) { } else { isOkay = false; } if (isOkay) { if (instrumentModel == "ION_TORRENT_PGM") { instrumentModel = "Ion_Torrent_PGM"; } if (instrumentModel == "UNSPECIFIED") { instrumentModel = "unspecified"; } }else { m->mothurOut("[ERROR]: " + original + " is not a valid instrument option for the " + platform + " platform. Valid instrument options are Ion_Torrent_PGM or unspecified."); m->mothurOutEndLine(); abort = true; } }else if (platform == "PACBIO_SMRT") { //instrument model options are PacBio_RS-unspecified if ((instrumentModel == "PACBIO_RS") || (instrumentModel == "UNSPECIFIED")) { } else { isOkay = false; } if (isOkay) { if (instrumentModel == "PACBIO_RS") { instrumentModel = "PacBio_RS"; } if (instrumentModel == "UNSPECIFIED") { instrumentModel = "unspecified"; } }else { m->mothurOut("[ERROR]: " + original + " is not a valid instrument option for the " + platform + " platform. 
Valid instrument options are PacBio_RS or unspecified."); m->mothurOutEndLine(); abort = true; } } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkCasesInstrumentModels"); exit(1); } } //********************************************************************************************************************** //AMPLICON,WGA,WGS,WGX,RNA-Seq,miRNA-Seq,WCS,CLONE,POOLCLONE,CLONEEND,FINISHING,ChIP-Seq,MNase-Seq,DNase-Hypersensitivity,Bisulfite-Seq,Tn-Seq,EST,FL-cDNA,CTS,MRE-Seq,MeDIP-Seq,MBD-Seq,OTHER bool SRACommand::checkCasesLibStrategy(string& libStrategy){ try { string original = libStrategy; bool isOkay = true; //remove users possible case errors for (int i = 0; i < libStrategy.size(); i++) { libStrategy[i] = toupper(libStrategy[i]); } if ((libStrategy == "AMPLICON") || (libStrategy == "WGA") || (libStrategy == "WGS") || (libStrategy == "WGX") || (libStrategy == "RNA-SEQ") || (libStrategy == "MIRNA-SEQ") || (libStrategy == "WCS") || (libStrategy == "CLONE") || (libStrategy == "POOLCLONE") || (libStrategy == "CLONEEND") || (libStrategy == "FINISHING") || (libStrategy == "CHIP-SEQ") || (libStrategy == "MNASE-SEQ") || (libStrategy == "DNASE-HYPERSENSITIVITY") || (libStrategy == "BISULFITE-SEQ") || (libStrategy == "TN-SEQ") || (libStrategy == "EST") || (libStrategy == "FL-CDNA") || (libStrategy == "CTS") || (libStrategy == "MRE-SEQ")|| (libStrategy == "MEDIP-SEQ") || (libStrategy == "MBD-SEQ") || (libStrategy == "OTHER")) { } else { isOkay = false; } if (isOkay) { if (libStrategy == "RNA-SEQ") { libStrategy = "RNA-Seq"; } if (libStrategy == "MIRNA-SEQ") { libStrategy = "miRNA-Seq"; } if (libStrategy == "CHIP-SEQ") { libStrategy = "ChIP-Seq"; } if (libStrategy == "MNASE-SEQ") { libStrategy = "MNase-Seq"; } if (libStrategy == "DNASE-HYPERSENSITIVITY") { libStrategy = "DNase-Hypersensitivity"; } if (libStrategy == "BISULFITE-SEQ") { libStrategy = "Bisulfite-Seq"; } if (libStrategy == "TN-SEQ") { libStrategy = "Tn-Seq"; } if (libStrategy == "FL-CDNA") 
{ libStrategy = "FL-cDNA"; } if (libStrategy == "MRE-SEQ") { libStrategy = "MRE-Seq"; } if (libStrategy == "MEDIP-SEQ") { libStrategy = "MeDIP-Seq"; } }else { m->mothurOut("[ERROR]: " + original + " is not a valid libstrategy option. Valid libstrategy options are AMPLICON,WGA,WGS,WGX,RNA-Seq,miRNA-Seq,WCS,CLONE,POOLCLONE,CLONEEND,FINISHING,ChIP-Seq,MNase-Seq,DNase-Hypersensitivity,Bisulfite-Seq,Tn-Seq,EST,FL-cDNA,CTS,MRE-Seq,MeDIP-Seq,MBD-Seq or OTHER."); m->mothurOutEndLine(); abort = true; } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkCasesLibStrategy"); exit(1); } } //********************************************************************************************************************** //METAGENOMIC,GENOMIC,TRANSCRIPTOMIC,METATRANSCRIPTOMIC,SYNTHETIC,VIRAL_RNA,OTHER bool SRACommand::checkCasesLibSource(string& libSource){ try { string original = libSource; bool isOkay = true; //remove users possible case errors for (int i = 0; i < libSource.size(); i++) { libSource[i] = toupper(libSource[i]); } if ((libSource == "METAGENOMIC") || (libSource == "GENOMIC") || (libSource == "TRANSCRIPTOMIC") || (libSource == "METATRANSCRIPTOMIC") || (libSource == "SYNTHETIC") || (libSource == "VIRAL_RNA") || (libSource == "OTHER")) { } else { isOkay = false; } if (isOkay) { }else { m->mothurOut("[ERROR]: " + original + " is not a valid libsource option. 
Valid libsource options are METAGENOMIC,GENOMIC,TRANSCRIPTOMIC,METATRANSCRIPTOMIC,SYNTHETIC,VIRAL_RNA or OTHER."); m->mothurOutEndLine(); abort = true; } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkCasesLibSource"); exit(1); } } //********************************************************************************************************************** //PCR,RANDOM,RANDOM_PCR,RT-PCR,HMPR,MF,CF-S,CF-H,CF-T,CF-M,MDA,MSLL,cDNA,ChIP,MNase,DNAse,Hybrid_Selection,Reduced_Representation,Restriction_Digest,5-methylcytidine_antibody,MBD2_protein_methyl-CpG_binding_domain,CAGE,RACE,size_fractionation,Padlock_probes_capture_method,other,unspecified bool SRACommand::checkCasesLibSelection(string& libSelection){ try { string original = libSelection; bool isOkay = true; //remove users possible case errors for (int i = 0; i < libSelection.size(); i++) { libSelection[i] = toupper(libSelection[i]); } if ((libSelection == "PCR") || (libSelection == "RANDOM") || (libSelection == "RANDOM_PCR") || (libSelection == "RT-PCR") || (libSelection == "HMPR") || (libSelection == "MF") || (libSelection == "CF-S") || (libSelection == "CF-H") || (libSelection == "CF-T") || (libSelection == "CF-M") || (libSelection == "MDA") || (libSelection == "MSLL") || (libSelection == "CDNA") || (libSelection == "CHIP") || (libSelection == "MNASE") || (libSelection == "DNASE") || (libSelection == "HYBRID_SELECTION") || (libSelection == "REDUCED_REPRESENTATION") || (libSelection == "RESTRICTION_DIGEST") || (libSelection == "5-METHYLCYTIDINE_ANTIBODY") || (libSelection == "MBD2_PROTEIN_METHYL-CPG_BINDING_DOMAIN") || (libSelection == "CAGE") || (libSelection == "RACE") || (libSelection == "SIZE_FRACTIONATION") || (libSelection == "PADLOCK_PROBES_CAPTURE_METHOD") || (libSelection == "OTHER") || (libSelection == "UNSPECIFIED")) { } else { isOkay = false; } if (isOkay) { if (libSelection == "CDNA") { libSelection = "cDNA"; } if (libSelection == "CHIP") { libSelection = "ChIP"; } if 
(libSelection == "MNASE") { libSelection = "MNase"; } if (libSelection == "DNASE") { libSelection = "DNAse"; } if (libSelection == "HYBRID_SELECTION") { libSelection = "Hybrid_Selection"; } if (libSelection == "REDUCED_REPRESENTATION") { libSelection = "Reduced_Representation"; } if (libSelection == "RESTRICTION_DIGEST") { libSelection = "Restriction_Digest"; } if (libSelection == "5-METHYLCYTIDINE_ANTIBODY") { libSelection = "5-methylcytidine_antibody"; } if (libSelection == "MBD2_PROTEIN_METHYL-CPG_BINDING_DOMAIN") { libSelection = "MBD2_protein_methyl-CpG_binding_domain"; } if (libSelection == "SIZE_FRACTIONATION") { libSelection = "size_fractionation"; } if (libSelection == "PADLOCK_PROBES_CAPTURE_METHOD") { libSelection = "Padlock_probes_capture_method"; } if (libSelection == "OTHER") { libSelection = "other"; } if (libSelection == "UNSPECIFIED") { libSelection = "unspecified"; } }else { m->mothurOut("[ERROR]: " + original + " is not a valid libselection option. Valid libselection options are PCR,RANDOM,RANDOM_PCR,RT-PCR,HMPR,MF,CF-S,CF-H,CF-T,CF-M,MDA,MSLL,cDNA,ChIP,MNase,DNAse,Hybrid_Selection,Reduced_Representation,Restriction_Digest,5-methylcytidine_antibody,MBD2_protein_methyl-CpG_binding_domain,CAGE,RACE,size_fractionation,Padlock_probes_capture_method,other or unspecified."); m->mothurOutEndLine(); abort = true; } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkCasesLibSelection"); exit(1); } } //********************************************************************************************************************** //METAGENOME,GENOME_SEQUENCING,METAGENOMIC_ASSEMBLY,ASSEMBLY,TRANSCRIPTOME,PROTEOMIC,MAP,CLONE_ENDS,TARGETED_LOCI,RANDOM_SURVEY,EXOME,VARIATION,EPIGENOMICS,PHENOTYPE,GENOTYPE,OTHER bool SRACommand::checkCasesDataType(string& dataType){ try { string original = dataType; bool isOkay = true; //remove users possible case errors for (int i = 0; i < dataType.size(); i++) { dataType[i] = toupper(dataType[i]); } if ((dataType 
== "METAGENOME") || (dataType == "GENOME_SEQUENCING") || (dataType == "METAGENOMIC_ASSEMBLY") || (dataType == "ASSEMBLY") || (dataType == "TRANSCRIPTOME") || (dataType == "PROTEOMIC") || (dataType == "MAP") || (dataType == "CLONE_ENDS") || (dataType == "TARGETED_LOCI") || (dataType == "RANDOM_SURVEY") || (dataType == "EXOME") || (dataType == "VARIATION") || (dataType == "EPIGENOMICS") || (dataType == "PHENOTYPE") || (dataType == "GENOTYPE") || (dataType == "OTHER")) { dataType = original; } else { isOkay = false; } if (isOkay) { }else { m->mothurOut("[ERROR]: " + original + " is not a valid datatype option. Valid datatype options are METAGENOME,GENOME_SEQUENCING,METAGENOMIC_ASSEMBLY,ASSEMBLY,TRANSCRIPTOME,PROTEOMIC,MAP,CLONE_ENDS,TARGETED_LOCI,RANDOM_SURVEY,EXOME,VARIATION,EPIGENOMICS,PHENOTYPE,GENOTYPE,OTHER."); m->mothurOutEndLine(); abort = true; } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkCasesDataType"); exit(1); } } //********************************************************************************************************************** bool SRACommand::sanityCheckMiMarksGroups(){ try { bool isOkay = true; for (int i = 0; i < Groups.size(); i++) { if (m->control_pressed) { break; } map >::iterator it = mimarks.find(Groups[i]); if (it == mimarks.end()) { isOkay = false; m->mothurOut("[ERROR]: MIMarks file is missing group " + Groups[i] + ", please correct.\n"); } } if (!isOkay) { m->control_pressed = true; } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "sanityCheckMiMarksGroups"); exit(1); } } //********************************************************************************************************************** //BioSample has several accepted date formats like "DD-Mmm-YYYY" (eg., 30-Oct-2010) or standard "YYYY-mm-dd" or "YYYY-mm" (eg 2010-10-30, 2010-10). 
bool SRACommand::checkDateFormat(string& date){ try { for (int i = 0; i < date.length(); i++) { if (date[i] == '/') { date[i] = '-'; } } if (m->debug) { m->mothurOut("[DEBUG]: date = " + date + "\n"); } map<string, int> months; months["Jan"] = 31; months["Feb"] = 29; months["Mar"] = 31; months["Apr"] = 30; months["Jun"] = 30; months["May"] = 31; months["Jul"] = 31; months["Aug"] = 31; months["Sep"] = 30;months["Oct"] = 31; months["Nov"] = 30; months["Dec"] = 31; map<string, int> monthsN; monthsN["01"] = 31; monthsN["02"] = 29; monthsN["03"] = 31; monthsN["04"] = 30; monthsN["06"] = 30; monthsN["05"] = 31; monthsN["07"] = 31; monthsN["08"] = 31; monthsN["09"] = 30;monthsN["10"] = 31; monthsN["11"] = 30; monthsN["12"] = 31; bool isOkay = true; if (m->containsAlphas(date)) { // then format == "DD-Mmm-YYYY", "Mmm-YYYY" vector<string> pieces; if (date.find_first_of('-') != string::npos) { m->splitAtDash(date, pieces); } else { pieces = m->splitWhiteSpace(date); } if (m->debug) { m->mothurOut("[DEBUG]: in alpha\n"); } //check "Mmm-YYYY" if (pieces.size() == 2) { //"Mmm-YYYY" if (m->debug) { m->mothurOut("[DEBUG]: pieces = 2 -> " + pieces[0] + '\t' + pieces[1] + "\n"); } map<string, int>::iterator it; it = months.find(pieces[0]); //is this a valid month if (it != months.end()) { if (pieces[1].size() != 4) { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } }else { //see if we can correct the capitalization pieces[0][0] = toupper(pieces[0][0]); for (int i = 1; i < pieces[0].size(); i++) { pieces[0][i] = tolower(pieces[0][i]); } //look again it = months.find(pieces[0]); //is this a valid month if (it == months.end()) { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid month. Looking for \"Mmm-YYYY\" format.\n"); isOkay = false; } else { if (pieces[1].size() != 4) { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid format for the year. Must be YYYY. 
\n"); isOkay = false; } } } if (isOkay) { date = pieces[0] + "-" + pieces[1]; } }else if (pieces.size() == 3) { //"DD-Mmm-YYYY" if (m->debug) { m->mothurOut("[DEBUG]: pieces = 3 -> " + pieces[0] + '\t' + pieces[1] + '\t' + pieces[2] + "\n"); } map<string, int>::iterator it; it = months.find(pieces[1]); //is this a valid month if (it != months.end()) { if (pieces[2].size() != 4) { m->mothurOut("[ERROR]: " + pieces[2] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } }else { //see if we can correct the capitalization pieces[1][0] = toupper(pieces[1][0]); for (int i = 1; i < pieces[1].size(); i++) { pieces[1][i] = tolower(pieces[1][i]); } //look again it = months.find(pieces[1]); //is this a valid month if (it == months.end()) { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid month. Looking for \"DD-Mmm-YYYY\" format.\n"); isOkay = false; } else { if (pieces[2].size() != 4) { m->mothurOut("[ERROR]: " + pieces[2] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } } } if (isOkay) { //check to make sure day is correct for month chosen int dayNumber; m->mothurConvert(pieces[0], dayNumber); if (dayNumber <= it->second) { if (dayNumber < 10) { //add leading 0. 
if (pieces[0].length() == 1) { pieces[0] = '0'+ pieces[0]; } } }else { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid day for the month " + pieces[1]+ ". \n"); isOkay = false; } } if (isOkay) { date = pieces[0] + "-" + pieces[1] + "-" + pieces[2]; } } }else { // no alpha months "YYYY" or "YYYY-mm-dd" or "YYYY-mm" if (m->debug) { m->mothurOut("[DEBUG]: in nonAlpha\n"); } vector<string> pieces; if (date.find_first_of('-') != string::npos) { m->splitAtDash(date, pieces); } else { pieces = m->splitWhiteSpace(date); } string format = "yearFirst"; if (pieces[0].length() == 4) { format = "yearFirst"; } else if (pieces[pieces.size()-1].length() == 4) { format = "yearLast"; } if (format == "yearFirst" ) { //only print all three pieces when there are three, "YYYY" and "YYYY-mm" have fewer if (m->debug && (pieces.size() == 3)) { m->mothurOut("[DEBUG]: yearFirst pieces = 3 -> " + pieces[0] + '\t' + pieces[1] + '\t' + pieces[2] + "\n"); } //just year if (pieces.size() == 1) { if (pieces[0].size() != 4) { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } else { date= pieces[0]; } }else if (pieces.size() == 2) { //"YYYY-mm" if (pieces[0].size() != 4) { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } //perhaps needs leading 0 if (pieces[1].length() < 2) { pieces[1] = "0" + pieces[1]; } map<string, int>::iterator it = monthsN.find(pieces[1]); if (it == monthsN.end()) { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid format for the month. Must be mm. \n"); isOkay = false; } if (isOkay) { date = pieces[0] + "-" + pieces[1]; } }else if (pieces.size() == 3) { //"YYYY-mm-dd" if (pieces[0].size() != 4) { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } //perhaps needs leading 0 if (pieces[1].length() < 2) { pieces[1] = "0" + pieces[1]; } map<string, int>::iterator it = monthsN.find(pieces[1]); if (it == monthsN.end()) { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid format for the month. Must be mm. 
\n"); isOkay = false; }else { //is the day in range int maxDays = it->second; //perhaps needs leading 0 if (pieces[2].length() < 2) { pieces[2] = "0" + pieces[2]; } int day; m->mothurConvert(pieces[2], day); if (day <= maxDays) {} else { m->mothurOut("[ERROR]: " + pieces[2] + " is not a valid day for the month " + pieces[1]+ ". \n"); isOkay = false; } } if (isOkay) { date = pieces[0] + "-" + pieces[1] + "-" + pieces[2]; } } }else if (pieces.size() != 3) { //year last needs exactly three pieces, the code below reads pieces[0], pieces[1] and pieces[2] m->mothurOut("[ERROR]: " + date + " is not in a recognized date format. \n"); isOkay = false; }else { // year last, try to fix format //if year last, then it could be dd-mm-yyyy or mm-dd-yyyy -> yyyy-mm-dd if (m->debug) { m->mothurOut("[DEBUG]: yearLast pieces = 3 -> " + pieces[0] + '\t' + pieces[1] + '\t' + pieces[2] + "\n"); } if (pieces[2].size() != 4) { m->mothurOut("[ERROR]: " + pieces[2] + " is not a valid format for the year. Must be YYYY. \n"); isOkay = false; } int first, second; m->mothurConvert(pieces[0], first); m->mothurConvert(pieces[1], second); if ((first <= 12) && (second <= 12)) { //we can't figure out which is the day and which is the month m->mothurOut("[ERROR]: " + pieces[0] + " and " + pieces[1] + " are both <= 12. Cannot determine which is the day and which is the month. \n"); isOkay = false; } else if ((first <= 12) && (second >= 12)) { //first=month and second = day, check valid date //perhaps needs leading 0 if (pieces[0].length() < 2) { pieces[0] = "0" + pieces[0]; } map<string, int>::iterator it = monthsN.find(pieces[0]); if (it == monthsN.end()) { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid format for the month. Must be mm. \n"); isOkay = false; }else { //is the day in range int maxDays = it->second; if (second <= maxDays) { //reformat to acceptable format //perhaps needs leading 0 if (pieces[1].length() < 2) { pieces[1] = "0" + pieces[1]; } date = pieces[2] + "-" + pieces[0] + "-" + pieces[1]; } else { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid day for the month " + pieces[0]+ ". 
\n"); isOkay = false; } } }else if ((second <= 12) && (first >= 12)) { //second=month and first = day, check valid date if (pieces[1].length() < 2) { pieces[1] = "0" + pieces[1]; } map<string, int>::iterator it = monthsN.find(pieces[1]); if (it == monthsN.end()) { m->mothurOut("[ERROR]: " + pieces[1] + " is not a valid format for the month. Must be mm. \n"); isOkay = false; }else { //is the day in range int maxDays = it->second; if (first <= maxDays) { //reformat to acceptable format //perhaps needs leading 0 if (pieces[0].length() < 2) { pieces[0] = "0" + pieces[0]; } date = pieces[2] + "-" + pieces[1] + "-" + pieces[0]; } else { m->mothurOut("[ERROR]: " + pieces[0] + " is not a valid day for the month " + pieces[1]+ ". \n"); isOkay = false; } } }else { m->mothurOut("[ERROR]: " + pieces[0] + " and " + pieces[1] + " are both > 12. No valid date. \n"); isOkay = false; } } } if (!isOkay) { m->mothurOut("[ERROR]: The date of sampling must be in one of the following formats: \"DD-Mmm-YYYY\", \"Mmm-YYYY\" or \"YYYY\" (e.g., 30-Oct-1990, Oct-1990 or 1990), or ISO 8601 standard \"YYYY-mm-dd\" or \"YYYY-mm\" (e.g., 1990-10-30 or 1990-10).\n"); } if (m->debug) { m->mothurOut("[DEBUG]: date = " + date + "\n"); } return isOkay; } catch(exception& e) { m->errorOut(e, "SRACommand", "checkDateFormat"); exit(1); } } //********************************************************************************************************************** /* file option 1 sfffile1 oligosfile1 sfffile2 oligosfile2 ... file option 2 fastqfile1 oligosfile1 fastqfile2 oligosfile2 ... file option 3 ffastqfile1 rfastqfile1 ffastqfile2 rfastqfile2 ... file option 4 group fastqfile fastqfile group fastqfile fastqfile group fastqfile fastqfile ... file option 5 My.forward.fastq My.reverse.fastq none My.rindex.fastq //none is an option if there is no forward or reverse index file ... 
*/ int SRACommand::findFileOption(){ try { ifstream in; m->openInputFile(file, in); fileOption = 0; while(!in.eof()) { if (m->control_pressed) { return 0; } string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); if (pieces.size() == 2) { //good pair and sff or fastq and oligos if (!setOligosParameter) { fileOption = 12; //1 or 2 }else { fileOption = 3; } }else if(pieces.size() == 3) { //good pair and paired read fileOption = 4; }else if (pieces.size() == 4) { fileOption = 5; } break; } in.close(); return fileOption; } catch(exception& e) { m->errorOut(e, "SRACommand", "findFileOption"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/sracommand.h000066400000000000000000000057651255543666200205510ustar00rootroot00000000000000// // sracommand.h // Mothur // // Created by SarahsWork on 10/28/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_sracommand_h #define Mothur_sracommand_h #include "command.hpp" #include "trimoligos.h" #include "oligos.h" /**************************************************************************************************/ class SRACommand : public Command { public: SRACommand(string); SRACommand(); ~SRACommand(){} vector setParameters(); string getCommandName() { return "make.sra"; } string getCommandCategory() { return "Sequence Processing"; } string getOutputPattern(string); string getHelpString(); string getCitation() { return "http://www.mothur.org/wiki/Make.sra"; } string getDescription() { return "create a Sequence Read Archive / SRA"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: struct Grant { string grantId, grantTitle, grantAgency; Grant(string i, string a) : grantId(i), grantAgency(a), grantTitle("") {} Grant(string i, string a, string t) : grantId(i), grantAgency(a), grantTitle(t) {} Grant() : grantId(""), grantAgency(""), grantTitle("") {} }; bool abort, isSFF, pairedOligos, setOligosParameter; int tdiffs, bdiffs, pdiffs, sdiffs, ldiffs, fileOption; string sfffile, fastqfile, outputDir, file, oligosfile, contactfile, inputfile, mimarksfile; string libStrategy, libSource, libSelection, libLayout, platform, instrumentModel, fileType, dataType, checkorient; string submissionName, lastName, firstName, email, centerName, centerType, ownership, description, website, orientation, packageType; string projectName, projectTitle, inputDir; vector outputNames, Groups; vector Grants; map Group2Barcode; map Group2Primer; vector linkers; vector spacers; map Group2Organism; map > mimarks; //group -> valueForGroup> ex. 
F003D001 -> 42.282026 -83.733850> set uniqueNames; bool checkCasesInstrumentModels(string&); bool checkCasesPlatforms(string&); bool checkCasesLibStrategy(string&); bool checkCasesLibSource(string&); bool checkCasesLibSelection(string&); bool checkCasesDataType(string&); bool sanityCheckMiMarksGroups(); bool checkDateFormat(string& date); int readFile(map >&); int readContactFile(); int readMIMarksFile(); int readOligos(); int parseSffFile(map >&); int parseFastqFile(map >&); int checkGroups(map >&); int mapGroupToFile(map >&, vector); int fixMap(map >&); int findFileOption(); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/subsamplecommand.cpp000066400000000000000000002147061255543666200223070ustar00rootroot00000000000000/* * subsamplecommand.cpp * Mothur * * Created by westcott on 10/27/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "subsamplecommand.h" #include "sharedutilities.h" #include "deconvolutecommand.h" #include "getseqscommand.h" #include "subsample.h" //********************************************************************************************************************** vector SubSampleCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "FLSSR", "none","fasta",false,false,true); parameters.push_back(pfasta); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "none","taxonomy",false,false,true); parameters.push_back(ptaxonomy); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","group",false,false,true); parameters.push_back(pgroup); CommandParameter 
plist("list", "InputTypes", "", "", "none", "FLSSR", "none","list",false,false,true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "none", "FLSSR", "none","shared",false,false,true); parameters.push_back(pshared); CommandParameter prabund("rabund", "InputTypes", "", "", "none", "FLSSR", "none","rabund",false,false); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "none", "FLSSR", "none","sabund",false,false); parameters.push_back(psabund); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter psize("size", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(psize); CommandParameter ppersample("persample", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(ppersample); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SubSampleCommand::getHelpString(){ try { string helpString = ""; helpString += "The sub.sample command is designed to be used as a way to normalize your data, or create a smaller set from your original set.\n"; helpString += "The sub.sample command parameters are fasta, name, list, group, count, rabund, sabund, shared, taxonomy, 
groups, size, persample and label. You must provide a fasta, list, sabund, rabund or shared file as an input file.\n";
        helpString += "The namefile is only used with the fasta file, not with the listfile, because the list file should contain all sequences.\n";
        helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included. The group names are separated by dashes.\n";
        helpString += "The label parameter allows you to select what distance levels you would like; labels are also separated by dashes.\n";
        helpString += "The size parameter allows you to indicate the size of your subsample.\n";
        helpString += "The persample parameter allows you to indicate that you want to select a subsample of the same size from each of your groups, default=false. It is only used with the list and fasta files if a groupfile is given.\n";
        helpString += "persample=false will select a random set of sequences of the size you select, but the number of seqs from each group may differ.\n";
        helpString += "If the size parameter is not set: with a shared file, size=number of seqs in the smallest sample; with all other files, if a groupfile is given and persample=true, then size=number of seqs in the smallest sample, otherwise size=10% of the number of seqs.\n";
        helpString += "The sub.sample command should be in the following format: sub.sample(list=yourListFile, group=yourGroupFile, groups=yourGroups, label=yourLabels).\n";
        helpString += "Example sub.sample(list=abrecovery.fn.list, group=abrecovery.groups, groups=B-C, size=20).\n";
        helpString += "The default value for groups is all the groups in your groupfile, and all labels in your inputfile will be used.\n";
        helpString += "The sub.sample command outputs a .subsample file.\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e. yourGroups).\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string SubSampleCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "fasta")            { pattern = "[filename],subsample,[extension]"; }
        else if (type == "sabund")      { pattern = "[filename],subsample,[extension]"; }
        else if (type == "name")        { pattern = "[filename],subsample,[extension]"; }
        else if (type == "group")       { pattern = "[filename],subsample,[extension]"; }
        else if (type == "count")       { pattern = "[filename],subsample,[extension]"; }
        else if (type == "list")        { pattern = "[filename],[distance],subsample,[extension]"; }
        else if (type == "taxonomy")    { pattern = "[filename],subsample,[extension]"; }
        else if (type == "shared")      { pattern = "[filename],[distance],subsample,[extension]"; }
        else if (type == "rabund")      { pattern = "[filename],subsample,[extension]"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
SubSampleCommand::SubSampleCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["shared"] = tempOutNames;
        outputTypes["list"] = tempOutNames;
        outputTypes["rabund"] = tempOutNames;
        outputTypes["sabund"] = tempOutNames;
        outputTypes["fasta"] = tempOutNames;
        outputTypes["name"] = tempOutNames;
        outputTypes["group"] = tempOutNames;
        outputTypes["count"] = tempOutNames;
        outputTypes["taxonomy"] = tempOutNames;
    }
    catch(exception& e) {
        //report the function that actually failed (was mislabeled "GetRelAbundCommand")
        m->errorOut(e, "SubSampleCommand", "SubSampleCommand");
        exit(1);
    }
}
//********************************************************************************************************************** SubSampleCommand::SubSampleCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; //check to make sure all parameters are valid for command map::iterator it; for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["shared"] = tempOutNames; outputTypes["list"] = tempOutNames; outputTypes["rabund"] = tempOutNames; outputTypes["sabund"] = tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["count"] = tempOutNames; outputTypes["taxonomy"] = tempOutNames; //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { rabundfile = ""; abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { m->setRabundFile(rabundfile); } fastafile = validParameter.validFile(parameters, "fasta", true); if (fastafile == "not open") { fastafile = ""; abort = true; } else if (fastafile == "not found") { fastafile = ""; } else { m->setFastaFile(fastafile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { m->setSharedFile(sharedfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } taxonomyfile = validParameter.validFile(parameters, "taxonomy", true); if 
(taxonomyfile == "not open") { taxonomyfile = ""; abort = true; } else if (taxonomyfile == "not found") { taxonomyfile = ""; } else { m->setTaxonomyFile(taxonomyfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); ct.readTable(countfile, true, false); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } //check for optional parameter and set defaults // ...at some point should added some additional type checking... label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; pickedGroups = false; } else { pickedGroups = true; m->splitAtDash(groups, Groups); m->setGroups(Groups); } string temp = validParameter.validFile(parameters, "size", false); if (temp == "not found"){ temp = "0"; } m->mothurConvert(temp, size); temp = validParameter.validFile(parameters, "persample", false); if (temp == "not found"){ temp = "f"; } persample = m->isTrue(temp); if ((groupfile == "") && (countfile == "")) { persample = false; } if (countfile != "") { if (!ct.hasGroupInfo()) { persample = false; if (pickedGroups) { m->mothurOut("You cannot pick groups without group info in your count file."); m->mothurOutEndLine(); abort = true; } } } if ((namefile != "") && ((fastafile == "") && (taxonomyfile == ""))) { m->mothurOut("You may only use a name file with a fasta file or taxonomy file."); 
m->mothurOutEndLine(); abort = true; } if ((taxonomyfile != "") && ((fastafile == "") && (listfile == ""))) { m->mothurOut("You may only use a taxonomyfile with a fastafile or listfile."); m->mothurOutEndLine(); abort = true; } if ((fastafile == "") && (listfile == "") && (sabundfile == "") && (rabundfile == "") && (sharedfile == "")) { m->mothurOut("You must provide a fasta, list, sabund, rabund or shared file as an input file."); m->mothurOutEndLine(); abort = true; } if (pickedGroups && ((groupfile == "") && (sharedfile == "") && (countfile == ""))) { m->mothurOut("You cannot pick groups without a valid group, count or shared file."); m->mothurOutEndLine(); abort = true; } if (((groupfile != "") || (countfile != "")) && ((fastafile == "") && (listfile == ""))) { m->mothurOut("Group or count files are only valid with listfile or fastafile."); m->mothurOutEndLine(); abort = true; } if (((groupfile != "") || (countfile != "")) && ((fastafile != "") && (listfile != ""))) { m->mothurOut("A new group or count file can only be made from the subsample of a listfile or fastafile, not both. 
Please correct."); m->mothurOutEndLine(); abort = true; } if (countfile == "") { if ((fastafile != "") && (namefile == "")) { vector files; files.push_back(fastafile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "SubSampleCommand"); exit(1); } } //********************************************************************************************************************** int SubSampleCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (sharedfile != "") { getSubSampleShared(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0;} if (listfile != "") { getSubSampleList(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (rabundfile != "") { getSubSampleRabund(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (sabundfile != "") { getSubSampleSabund(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (fastafile != "") { getSubSampleFasta(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("list"); if (itTypes != outputTypes.end()) { 
if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setListFile(current); } } itTypes = outputTypes.find("shared"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSharedFile(current); } } itTypes = outputTypes.find("rabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setRabundFile(current); } } itTypes = outputTypes.find("sabund"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setSabundFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } itTypes = outputTypes.find("taxonomy"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTaxonomyFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SubSampleCommand::getSubSampleFasta() { try { if (namefile != "") { readNames(); } //fills names with all names in namefile. 
else { getNames(); }//no name file, so get list of names to pick from GroupMap groupMap; if (groupfile != "") { groupMap.readMap(groupfile); //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil util; vector namesGroups = groupMap.getNamesOfGroups(); util.setGroups(Groups, namesGroups); //file mismatch quit if (names.size() != groupMap.getNumSeqs()) { m->mothurOut("[ERROR]: your fasta file contains " + toString(names.size()) + " sequences, and your groupfile contains " + toString(groupMap.getNumSeqs()) + ", please correct."); m->mothurOutEndLine(); return 0; } }else if (countfile != "") { if (ct.hasGroupInfo()) { SharedUtil util; vector namesGroups = ct.getNamesOfGroups(); util.setGroups(Groups, namesGroups); } //file mismatch quit if (names.size() != ct.getNumUniqueSeqs()) { m->mothurOut("[ERROR]: your fasta file contains " + toString(names.size()) + " sequences, and your count file contains " + toString(ct.getNumUniqueSeqs()) + " unique sequences, please correct."); m->mothurOutEndLine(); return 0; } } if (m->control_pressed) { return 0; } //make sure that if your picked groups size is not too big int thisSize = 0; if (countfile == "") { thisSize = names.size(); } else { thisSize = ct. 
getNumSeqs(); } //all seqs not just unique if (persample) { if (size == 0) { //user has not set size, set size = smallest samples size if (countfile == "") { size = groupMap.getNumSeqs(Groups[0]); } else { size = ct.getGroupCount(Groups[0]); } for (int i = 1; i < Groups.size(); i++) { int thisSize = 0; if (countfile == "") { thisSize = groupMap.getNumSeqs(Groups[i]); } else { thisSize = ct.getGroupCount(Groups[i]); } if (thisSize < size) { size = thisSize; } } }else { //make sure size is not too large vector newGroups; for (int i = 0; i < Groups.size(); i++) { int thisSize = 0; if (countfile == "") { thisSize = groupMap.getNumSeqs(Groups[i]); } else { thisSize = ct.getGroupCount(Groups[i]); } if (thisSize >= size) { newGroups.push_back(Groups[i]); } else { m->mothurOut("You have selected a size that is larger than " + Groups[i] + " number of sequences, removing " + Groups[i] + "."); m->mothurOutEndLine(); } } Groups = newGroups; if (newGroups.size() == 0) { m->mothurOut("[ERROR]: all groups removed."); m->mothurOutEndLine(); m->control_pressed = true; } } m->mothurOut("Sampling " + toString(size) + " from each group."); m->mothurOutEndLine(); }else { if (pickedGroups) { int total = 0; for(int i = 0; i < Groups.size(); i++) { if (countfile == "") { total += groupMap.getNumSeqs(Groups[i]); } else { total += ct.getGroupCount(Groups[i]); } } if (size == 0) { //user has not set size, set size = 10% samples size size = int (total * 0.10); } if (total < size) { if (size != 0) { m->mothurOut("Your size is too large for the number of groups you selected. 
Adjusting to " + toString(int (total * 0.10)) + "."); m->mothurOutEndLine(); } size = int (total * 0.10); } m->mothurOut("Sampling " + toString(size) + " from " + toString(total) + "."); m->mothurOutEndLine(); } if (size == 0) { //user has not set size, set size = 10% samples size if (countfile == "") { size = int (names.size() * 0.10); } else { size = int (ct.getNumSeqs() * 0.10); } } if (size > thisSize) { m->mothurOut("Your fasta file only contains " + toString(thisSize) + " sequences. Setting size to " + toString(thisSize) + "."); m->mothurOutEndLine(); size = thisSize; } if (!pickedGroups) { m->mothurOut("Sampling " + toString(size) + " from " + toString(thisSize) + "."); m->mothurOutEndLine(); } } random_shuffle(names.begin(), names.end()); set subset; //dont want repeat sequence names added if (persample) { if (countfile == "") { //initialize counts map groupCounts; map::iterator itGroupCounts; for (int i = 0; i < Groups.size(); i++) { groupCounts[Groups[i]] = 0; } for (int j = 0; j < names.size(); j++) { if (m->control_pressed) { return 0; } string group = groupMap.getGroup(names[j]); if (group == "not found") { m->mothurOut("[ERROR]: " + names[j] + " is not in your groupfile. 
please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } else{ itGroupCounts = groupCounts.find(group); if (itGroupCounts != groupCounts.end()) { if (itGroupCounts->second < size) { subset.insert(names[j]); (itGroupCounts->second)++; } } } } }else { SubSample sample; CountTable sampledCt = sample.getSample(ct, size, Groups); vector sampledSeqs = sampledCt.getNamesOfSeqs(); for (int i = 0; i < sampledSeqs.size(); i++) { subset.insert(sampledSeqs[i]); } string countOutputDir = outputDir; if (outputDir == "") { countOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = countOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string countOutputFileName = getOutputFileName("count", variables); outputTypes["count"].push_back(countOutputFileName); outputNames.push_back(countOutputFileName); sampledCt.printTable(countOutputFileName); } }else { if (countfile == "") { //randomly select a subset of those names to include in the subsample //since names was randomly shuffled just grab the next one for (int j = 0; j < names.size(); j++) { if (m->control_pressed) { return 0; } if (groupfile != "") { //if there is a groupfile given fill in group info string group = groupMap.getGroup(names[j]); if (group == "not found") { m->mothurOut("[ERROR]: " + names[j] + " is not in your groupfile. please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } if (pickedGroups) { //if hte user picked groups, we only want to keep the names of sequences from those groups if (m->inUsersGroups(group, Groups)) { subset.insert(names[j]); } }else{ subset.insert(names[j]); } }else{ //save everyone, group subset.insert(names[j]); } //do we have enough?? 
if (subset.size() == size) { break; } } }else { SubSample sample; CountTable sampledCt = sample.getSample(ct, size, Groups, pickedGroups); vector sampledSeqs = sampledCt.getNamesOfSeqs(); for (int i = 0; i < sampledSeqs.size(); i++) { subset.insert(sampledSeqs[i]); } string countOutputDir = outputDir; if (outputDir == "") { countOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = countOutputDir + m->getRootName(m->getSimpleName(countfile)); variables["[extension]"] = m->getExtension(countfile); string countOutputFileName = getOutputFileName("count", variables); outputTypes["count"].push_back(countOutputFileName); outputNames.push_back(countOutputFileName); sampledCt.printTable(countOutputFileName); } } if (subset.size() == 0) { m->mothurOut("The size you selected is too large, skipping fasta file."); m->mothurOutEndLine(); return 0; } string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(fastafile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(fastafile)); variables["[extension]"] = m->getExtension(fastafile); string outputFileName = getOutputFileName("fasta", variables); ofstream out; m->openOutputFile(outputFileName, out); //read through fasta file outputting only the names on the subsample list ifstream in; m->openInputFile(fastafile, in); string thisname; int count = 0; map >::iterator itNameMap; while(!in.eof()){ if (m->control_pressed) { in.close(); out.close(); return 0; } Sequence currSeq(in); thisname = currSeq.getName(); if (thisname != "") { //does the subset contain a sequence that this sequence represents itNameMap = nameMap.find(thisname); if (itNameMap != nameMap.end()) { vector nameRepresents = itNameMap->second; for (int i = 0; i < nameRepresents.size(); i++){ if (subset.count(nameRepresents[i]) != 0) { out << ">" << nameRepresents[i] << endl << currSeq.getAligned() << endl; count++; } } }else{ m->mothurOut("[ERROR]: " + thisname + " is not in 
your namefile, please correct."); m->mothurOutEndLine(); } } m->gobble(in); } in.close(); out.close(); if (count != subset.size()) { m->mothurOut("[ERROR]: The subset selected contained " + toString(subset.size()) + " sequences, but I only found " + toString(count) + " of those in the fastafile."); m->mothurOutEndLine(); } if (namefile != "") { m->mothurOut("Deconvoluting subsampled fasta file... "); m->mothurOutEndLine(); map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputNameFileName = getOutputFileName("name", variables); //use unique.seqs to create new name and fastafile string inputString = "fasta=" + outputFileName; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: unique.seqs(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* uniqueCommand = new DeconvoluteCommand(inputString); uniqueCommand->execute(); map > filenames = uniqueCommand->getOutputFiles(); delete uniqueCommand; m->mothurCalling = false; m->renameFile(filenames["name"][0], outputNameFileName); m->renameFile(filenames["fasta"][0], outputFileName); outputTypes["name"].push_back(outputNameFileName); outputNames.push_back(outputNameFileName); m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Done."); m->mothurOutEndLine(); if (taxonomyfile != "") { set tempSubset; //get new unique names from fasta file //read through fasta file outputting only the names on the subsample list after deconvolute ifstream in2; m->openInputFile(outputFileName, in2); while (!in2.eof()) { Sequence seq(in2); m->gobble(in2); if (seq.getName() != "") { tempSubset.insert(seq.getName()); } } in2.close(); //send that list to getTax int tcount = getTax(tempSubset); if (tcount != tempSubset.size()) { m->mothurOut("[ERROR]: subsampled fasta file contains " + 
toString(tempSubset.size()) + " sequences, but I only found " + toString(tcount) + " in your taxonomy file, please correct."); m->mothurOutEndLine(); } } }else { if (taxonomyfile != "") { int tcount = getTax(subset); if (tcount != subset.size()) { m->mothurOut("[ERROR]: subsampled fasta file contains " + toString(subset.size()) + " sequences, but I only found " + toString(tcount) + " in your taxonomy file, please correct."); m->mothurOutEndLine(); } } //should only contain uniques. } outputTypes["fasta"].push_back(outputFileName); outputNames.push_back(outputFileName); //if a groupfile is provided read through the group file only outputting the names on the subsample list if (groupfile != "") { string groupOutputDir = outputDir; if (outputDir == "") { groupOutputDir += m->hasPath(groupfile); } map variables; variables["[filename]"] = groupOutputDir + m->getRootName(m->getSimpleName(groupfile)); variables["[extension]"] = m->getExtension(groupfile); string groupOutputFileName = getOutputFileName("group", variables); ofstream outGroup; m->openOutputFile(groupOutputFileName, outGroup); outputTypes["group"].push_back(groupOutputFileName); outputNames.push_back(groupOutputFileName); ifstream inGroup; m->openInputFile(groupfile, inGroup); string name, group; while(!inGroup.eof()){ if (m->control_pressed) { inGroup.close(); outGroup.close(); return 0; } inGroup >> name; m->gobble(inGroup); //read from first column inGroup >> group; //read from second column //if this name is in the accnos file if (subset.count(name) != 0) { outGroup << name << '\t' << group << endl; subset.erase(name); } m->gobble(inGroup); } inGroup.close(); outGroup.close(); //sanity check if (subset.size() != 0) { m->mothurOut("Your groupfile does not match your fasta file."); m->mothurOutEndLine(); for (set::iterator it = subset.begin(); it != subset.end(); it++) { m->mothurOut("[ERROR]: " + *it + " is missing from your groupfile."); m->mothurOutEndLine(); } } } return 0; } catch(exception& e) { 
        m->errorOut(e, "SubSampleCommand", "getSubSampleFasta");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::getNames() {
    try {
        ifstream in;
        m->openInputFile(fastafile, in);

        string thisname;
        while(!in.eof()){
            if (m->control_pressed) { in.close(); return 0; }

            Sequence currSeq(in);
            thisname = currSeq.getName();

            if (thisname != "") {
                vector<string> temp; temp.push_back(thisname);
                nameMap[thisname] = temp;
                names.push_back(thisname);
            }
            m->gobble(in);
        }
        in.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getNames");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::readNames() {
    try {
        nameMap.clear();
        m->readNames(namefile, nameMap);

        //save names of all sequences
        map<string, vector<string> >::iterator it;
        for (it = nameMap.begin(); it != nameMap.end(); it++) {
            for (int i = 0; i < (it->second).size(); i++) { names.push_back((it->second)[i]); }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "readNames");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::getSubSampleShared() {
    try {
        InputData* input = new InputData(sharedfile, "sharedfile");
        vector<SharedRAbundVector*> lookup = input->getSharedRAbundVectors();
        string lastLabel = lookup[0]->getLabel();

        //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label.
set processedLabels; set userLabels = labels; if (size == 0) { //user has not set size, set size = smallest samples size size = lookup[0]->getNumSeqs(); for (int i = 1; i < lookup.size(); i++) { int thisSize = lookup[i]->getNumSeqs(); if (thisSize < size) { size = thisSize; } } }else { m->clearGroups(); Groups.clear(); vector temp; for (int i = 0; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < size) { m->mothurOut(lookup[i]->getGroup() + " contains " + toString(lookup[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine(); delete lookup[i]; }else { Groups.push_back(lookup[i]->getGroup()); temp.push_back(lookup[i]); } } lookup = temp; m->setGroups(Groups); } if (lookup.size() == 0) { m->mothurOut("The size you selected is too large, skipping shared file."); m->mothurOutEndLine(); delete input; return 0; } m->mothurOut("Sampling " + toString(size) + " from each group."); m->mothurOutEndLine(); //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processShared(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processShared(lookup); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); 
            //prevent memory leak
            for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; }

            //get next line to process
            lookup = input->getSharedRAbundVectors();
        }

        if (m->control_pressed) { return 0; }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } }
            lookup = input->getSharedRAbundVectors(lastLabel);

            m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();

            processShared(lookup);

            for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
        }

        delete input;

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getSubSampleShared");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::processShared(vector<SharedRAbundVector*>& thislookup) {
    try {
        //save mothurOut's binLabels to restore for next label
        vector<string> saveBinLabels = m->currentSharedBinLabels;

        string thisOutputDir = outputDir;
        if (outputDir == "") { thisOutputDir += m->hasPath(sharedfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sharedfile));
        variables["[extension]"] = m->getExtension(sharedfile);
        variables["[distance]"] = thislookup[0]->getLabel();
        string outputFileName = getOutputFileName("shared", variables);

        SubSample sample;
        vector<string> subsampledLabels = sample.getSample(thislookup, size);

        if (m->control_pressed) { return 0; }

        ofstream out;
        m->openOutputFile(outputFileName, out);
outputTypes["shared"].push_back(outputFileName); outputNames.push_back(outputFileName); m->currentSharedBinLabels = subsampledLabels; thislookup[0]->printHeaders(out); for (int i = 0; i < thislookup.size(); i++) { out << thislookup[i]->getLabel() << '\t' << thislookup[i]->getGroup() << '\t'; thislookup[i]->print(out); } out.close(); //save mothurOut's binLabels to restore for next label m->currentSharedBinLabels = saveBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "processShared"); exit(1); } } //********************************************************************************************************************** int SubSampleCommand::getSubSampleList() { try { if (namefile != "") { m->readNames(namefile, nameMap); } InputData* input = new InputData(listfile, "list"); ListVector* list = input->getListVector(); string lastLabel = list->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; ofstream outGroup; GroupMap groupMap; if (groupfile != "") { groupMap.readMap(groupfile); //takes care of user setting groupNames that are invalid or setting groups=all SharedUtil util; vector namesGroups = groupMap.getNamesOfGroups(); util.setGroups(Groups, namesGroups); //create outputfiles string groupOutputDir = outputDir; if (outputDir == "") { groupOutputDir += m->hasPath(groupfile); } string groupOutputFileName = groupOutputDir + m->getRootName(m->getSimpleName(groupfile)) + "subsample" + m->getExtension(groupfile); m->openOutputFile(groupOutputFileName, outGroup); outputTypes["group"].push_back(groupOutputFileName); outputNames.push_back(groupOutputFileName); //file mismatch quit if (list->getNumSeqs() != groupMap.getNumSeqs()) { m->mothurOut("[ERROR]: your list file contains " + toString(list->getNumSeqs()) + " sequences, and your groupfile contains " + toString(groupMap.getNumSeqs()) + ", please correct."); m->mothurOutEndLine(); delete 
list; delete input; outGroup.close(); return 0; } }else if (countfile != "") { if (ct.hasGroupInfo()) { SharedUtil util; vector namesGroups = ct.getNamesOfGroups(); util.setGroups(Groups, namesGroups); } //file mismatch quit if (list->getNumSeqs() != ct.getNumUniqueSeqs()) { m->mothurOut("[ERROR]: your list file contains " + toString(list->getNumSeqs()) + " sequences, and your count file contains " + toString(ct.getNumUniqueSeqs()) + " unique sequences, please correct."); m->mothurOutEndLine(); return 0; } } //make sure that if your picked groups size is not too big if (persample) { if (size == 0) { //user has not set size, set size = smallest samples size if (countfile == "") { size = groupMap.getNumSeqs(Groups[0]); } else { size = ct.getGroupCount(Groups[0]); } for (int i = 1; i < Groups.size(); i++) { int thisSize = 0; if (countfile == "") { thisSize = groupMap.getNumSeqs(Groups[i]); } else { thisSize = ct.getGroupCount(Groups[i]); } if (thisSize < size) { size = thisSize; } } }else { //make sure size is not too large vector newGroups; for (int i = 0; i < Groups.size(); i++) { int thisSize = 0; if (countfile == "") { thisSize = groupMap.getNumSeqs(Groups[i]); } else { thisSize = ct.getGroupCount(Groups[i]); } if (thisSize >= size) { newGroups.push_back(Groups[i]); } else { m->mothurOut("You have selected a size that is larger than " + Groups[i] + " number of sequences, removing " + Groups[i] + "."); m->mothurOutEndLine(); } } Groups = newGroups; if (newGroups.size() == 0) { m->mothurOut("[ERROR]: all groups removed."); m->mothurOutEndLine(); m->control_pressed = true; } } m->mothurOut("Sampling " + toString(size) + " from each group."); m->mothurOutEndLine(); }else{ if (pickedGroups) { int total = 0; for(int i = 0; i < Groups.size(); i++) { if (countfile == "") { total += groupMap.getNumSeqs(Groups[i]); } else { total += ct.getGroupCount(Groups[i]); } } if (size == 0) { //user has not set size, set size = 10% samples size size = int (total * 0.10); } if (total < 
size) {
                    if (size != 0) {
                        m->mothurOut("Your size is too large for the number of groups you selected. Adjusting to " + toString(int (total * 0.10)) + "."); m->mothurOutEndLine();
                    }
                    size = int (total * 0.10);
                }

                m->mothurOut("Sampling " + toString(size) + " from " + toString(total) + "."); m->mothurOutEndLine();
            }else {
                if (size == 0) { //user has not set size, set size = 10% samples size
                    if (countfile == "") { size = int (list->getNumSeqs() * 0.10); }
                    else { size = int (ct.getNumSeqs() * 0.10); }
                }

                int thisSize = 0;
                if (countfile == "") { thisSize = list->getNumSeqs(); }
                else { thisSize = ct.getNumSeqs(); }

                if (size > thisSize) {
                    m->mothurOut("Your list file only contains " + toString(thisSize) + " sequences. Setting size to " + toString(thisSize) + "."); m->mothurOutEndLine();
                    size = thisSize;
                }

                m->mothurOut("Sampling " + toString(size) + " from " + toString(thisSize) + "."); m->mothurOutEndLine();
            }
        }

        set<string> subset; //dont want repeat sequence names added
        if (countfile == "") {
            //fill names
            for (int i = 0; i < list->getNumBins(); i++) {
                string binnames = list->get(i);
                vector<string> thisBin;
                m->splitAtComma(binnames, thisBin);

                for(int j=0;j<thisBin.size();j++){
                    if (groupfile != "") {
                        string group = groupMap.getGroup(thisBin[j]);
                        if (group == "not found") {
                            m->mothurOut("[ERROR]: " + thisBin[j] + " is not in your groupfile. 
please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } //if hte user picked groups, we only want to keep the names of sequences from those groups if (pickedGroups) { if (m->inUsersGroups(group, Groups)) { names.push_back(thisBin[j]); } } else{ names.push_back(thisBin[j]); } }//save everyone, group else{ names.push_back(thisBin[j]); } } } random_shuffle(names.begin(), names.end()); //randomly select a subset of those names to include in the subsample if (persample) { //initialize counts map groupCounts; map::iterator itGroupCounts; for (int i = 0; i < Groups.size(); i++) { groupCounts[Groups[i]] = 0; } for (int j = 0; j < names.size(); j++) { if (m->control_pressed) { delete list; delete input; return 0; } string group = groupMap.getGroup(names[j]); if (group == "not found") { m->mothurOut("[ERROR]: " + names[j] + " is not in your groupfile. please correct."); m->mothurOutEndLine(); group = "NOTFOUND"; } else{ itGroupCounts = groupCounts.find(group); if (itGroupCounts != groupCounts.end()) { if (groupCounts[group] < size) { subset.insert(names[j]); groupCounts[group]++; } } } } }else{ for (int j = 0; j < size; j++) { if (m->control_pressed) { break; } subset.insert(names[j]); } } if (groupfile != "") { //write out new groupfile for (set::iterator it = subset.begin(); it != subset.end(); it++) { string group = groupMap.getGroup(*it); if (group == "not found") { group = "NOTFOUND"; } outGroup << *it << '\t' << group << endl; } outGroup.close(); } }else { SubSample sample; CountTable sampledCt; if (persample) { sampledCt = sample.getSample(ct, size, Groups); } else { sampledCt = sample.getSample(ct, size, Groups, pickedGroups); } vector sampledSeqs = sampledCt.getNamesOfSeqs(); for (int i = 0; i < sampledSeqs.size(); i++) { subset.insert(sampledSeqs[i]); } string countOutputDir = outputDir; if (outputDir == "") { countOutputDir += m->hasPath(countfile); } map variables; variables["[filename]"] = countOutputDir + m->getRootName(m->getSimpleName(countfile)); 
variables["[extension]"] = m->getExtension(countfile); string countOutputFileName = getOutputFileName("count", variables); outputTypes["count"].push_back(countOutputFileName); outputNames.push_back(countOutputFileName); sampledCt.printTable(countOutputFileName); } //as long as you are not at the end of the file or done wih the lines you want while((list != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete list; delete input; return 0; } if(allLines == 1 || labels.count(list->getLabel()) == 1){ m->mothurOut(list->getLabel()); m->mothurOutEndLine(); processList(list, subset); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); } if ((m->anyLabelsToProcess(list->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = list->getLabel(); delete list; list = input->getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); processList(list, subset); processedLabels.insert(list->getLabel()); userLabels.erase(list->getLabel()); //restore real lastlabel to save below list->setLabel(saveLabel); } lastLabel = list->getLabel(); delete list; list = NULL; //get next line to process list = input->getListVector(); } if (m->control_pressed) { if (list != NULL) { delete list; } delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (list != NULL) { delete list; } list = input->getListVector(lastLabel); m->mothurOut(list->getLabel()); m->mothurOutEndLine(); processList(list, subset); delete list; list = NULL; } if (list != NULL) { delete list; } delete input; if (taxonomyfile != "") { if (namefile == "") { InputData input(listfile, "list"); ListVector* list = input.getListVector(); string lastLabel = list->getLabel(); for (int i = 0; i < list->getNumBins(); i++) { vector temp; string bin = list->get(i); m->splitAtComma(bin, temp); for (int j = 0; j < temp.size(); j++) { vector tempFakeOut; tempFakeOut.push_back(temp[j]); nameMap[temp[j]] = tempFakeOut; } } delete list; int tcount = getTax(subset); if (tcount != subset.size()) { m->mothurOut("[ERROR]: subsampled list file contains " + toString(subset.size()) + " sequences, but I only found " + toString(tcount) + " in your taxonomy file, did you forget a name file? Please correct."); m->mothurOutEndLine(); } }else { string tempAccnos = "temp.accnos"; ofstream outAccnos; m->openOutputFile(tempAccnos, outAccnos); for (set::iterator it = subset.begin(); it != subset.end(); it++) { outAccnos << *it << endl; } outAccnos.close(); m->mothurOut("Sampling taxonomy and name file... 
"); m->mothurOutEndLine(); string thisNameOutputDir = outputDir; if (outputDir == "") { thisNameOutputDir += m->hasPath(namefile); } map variables; variables["[filename]"] = thisNameOutputDir + m->getRootName(m->getSimpleName(namefile)); variables["[extension]"] = m->getExtension(namefile); string outputNameFileName = getOutputFileName("name", variables); string thisTaxOutputDir = outputDir; if (outputDir == "") { thisTaxOutputDir += m->hasPath(taxonomyfile); } variables["[filename]"] = thisTaxOutputDir + m->getRootName(m->getSimpleName(taxonomyfile)); variables["[extension]"] = m->getExtension(taxonomyfile); string outputTaxFileName = getOutputFileName("taxonomy", variables); //use unique.seqs to create new name and fastafile string inputString = "dups=f, name=" + namefile + ", taxonomy=" + taxonomyfile + ", accnos=" + tempAccnos; m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: get.seqs(" + inputString + ")"); m->mothurOutEndLine(); m->mothurCalling = true; Command* getCommand = new GetSeqsCommand(inputString); getCommand->execute(); map > filenames = getCommand->getOutputFiles(); delete getCommand; m->mothurCalling = false; m->renameFile(filenames["name"][0], outputNameFileName); m->renameFile(filenames["taxonomy"][0], outputTaxFileName); outputTypes["name"].push_back(outputNameFileName); outputNames.push_back(outputNameFileName); outputNames.push_back(outputTaxFileName); outputTypes["taxonomy"].push_back(outputTaxFileName); m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Done."); m->mothurOutEndLine(); } } return 0; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "getSubSampleList"); exit(1); } } //********************************************************************************************************************** int SubSampleCommand::processList(ListVector*& list, set& subset) { try { string thisOutputDir = outputDir; if (outputDir 
== "") { thisOutputDir += m->hasPath(listfile); }
        map<string, string> variables;
        variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(listfile));
        variables["[extension]"] = m->getExtension(listfile);
        variables["[distance]"] = list->getLabel();
        string outputFileName = getOutputFileName("list", variables);
        ofstream out;
        m->openOutputFile(outputFileName, out);
        outputTypes["list"].push_back(outputFileName); outputNames.push_back(outputFileName);

        int numBins = list->getNumBins();

        ListVector* temp = new ListVector();
        temp->setLabel(list->getLabel());

        vector<string> binLabels = list->getLabels();
        vector<string> newLabels;
        for (int i = 0; i < numBins; i++) {
            if (m->control_pressed) { break; }

            string bin = list->get(i);
            vector<string> binnames;
            m->splitAtComma(bin, binnames);

            //keep only the names in this bin that are part of the subsample
            string newNames = "";
            for(int j=0;j<binnames.size();j++){
                if (subset.count(binnames[j]) != 0) { newNames += binnames[j] + ","; }
            }

            //if any names from this bin were kept, add the bin to the new list
            if (newNames != "") {
                newNames = newNames.substr(0, newNames.length()-1); //remove trailing comma
                temp->push_back(newNames);
                newLabels.push_back(binLabels[i]);
            }
        }

        temp->setLabels(newLabels);
        delete list;
        list = temp;

        if (m->control_pressed) { out.close(); return 0; }

        list->printHeaders(out);
        list->print(out);
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "processList");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::getSubSampleRabund() {
    try {
        InputData* input = new InputData(rabundfile, "rabund");
        RAbundVector* rabund = input->getRAbundVector();
        string lastLabel = rabund->getLabel();

        //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label.
set processedLabels; set userLabels = labels; if (size == 0) { //user has not set size, set size = 10% size = int((rabund->getNumSeqs()) * 0.10); }else if (size > rabund->getNumSeqs()) { m->mothurOut("The size you selected is too large, skipping rabund file."); m->mothurOutEndLine(); delete input; delete rabund; return 0; } m->mothurOut("Sampling " + toString(size) + " from " + toString(rabund->getNumSeqs()) + "."); m->mothurOutEndLine(); string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(rabundfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(rabundfile)); variables["[extension]"] = m->getExtension(rabundfile); string outputFileName = getOutputFileName("rabund", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["rabund"].push_back(outputFileName); outputNames.push_back(outputFileName); //as long as you are not at the end of the file or done wih the lines you want while((rabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete input; delete rabund; out.close(); return 0; } if(allLines == 1 || labels.count(rabund->getLabel()) == 1){ m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); processRabund(rabund, out); processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); } if ((m->anyLabelsToProcess(rabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = rabund->getLabel(); delete rabund; rabund = input->getRAbundVector(lastLabel); m->mothurOut(rabund->getLabel()); m->mothurOutEndLine(); processRabund(rabund, out); processedLabels.insert(rabund->getLabel()); userLabels.erase(rabund->getLabel()); //restore real lastlabel to save below rabund->setLabel(saveLabel); } lastLabel = rabund->getLabel(); //prevent memory leak delete rabund; rabund = NULL; //get next line to process rabund = input->getRAbundVector(); } if (m->control_pressed) { 
            out.close(); return 0; }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (rabund != NULL) { delete rabund; }
            rabund = input->getRAbundVector(lastLabel);

            m->mothurOut(rabund->getLabel()); m->mothurOutEndLine();

            processRabund(rabund, out);

            delete rabund;
        }

        delete input;
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getSubSampleRabund");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::processRabund(RAbundVector*& rabund, ofstream& out) {
    try {
        int numBins = rabund->getNumBins();
        int thisSize = rabund->getNumSeqs();

        if (thisSize != size) {
            //one entry per sequence, holding the index of the bin it belongs to
            OrderVector* order = new OrderVector();
            for(int p=0;p<numBins;p++){
                for(int j=0;j<rabund->get(p);j++){
                    order->push_back(p);
                }
            }
            random_shuffle(order->begin(), order->end());

            RAbundVector* temp = new RAbundVector(numBins);
            temp->setLabel(rabund->getLabel());

            delete rabund;
            rabund = temp;

            for (int j = 0; j < size; j++) {
                if (m->control_pressed) { delete order; return 0; }

                int bin = order->get(j);

                int abund = rabund->get(bin);
                rabund->set(bin, (abund+1));
            }

            delete order;
        }

        if (m->control_pressed) { return 0; }

        rabund->print(out);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "processRabund");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::getSubSampleSabund() {
    try {
        InputData* input = new InputData(sabundfile, "sabund");
        SAbundVector* sabund = 
input->getSAbundVector(); string lastLabel = sabund->getLabel(); //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; if (size == 0) { //user has not set size, set size = 10% size = int((sabund->getNumSeqs()) * 0.10); }else if (size > sabund->getNumSeqs()) { m->mothurOut("The size you selected is too large, skipping sabund file."); m->mothurOutEndLine(); delete input; delete sabund; return 0; } m->mothurOut("Sampling " + toString(size) + " from " + toString(sabund->getNumSeqs()) + "."); m->mothurOutEndLine(); string thisOutputDir = outputDir; if (outputDir == "") { thisOutputDir += m->hasPath(sabundfile); } map variables; variables["[filename]"] = thisOutputDir + m->getRootName(m->getSimpleName(sabundfile)); variables["[extension]"] = m->getExtension(sabundfile); string outputFileName = getOutputFileName("sabund", variables); ofstream out; m->openOutputFile(outputFileName, out); outputTypes["sabund"].push_back(outputFileName); outputNames.push_back(outputFileName); //as long as you are not at the end of the file or done wih the lines you want while((sabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { delete input; delete sabund; out.close(); return 0; } if(allLines == 1 || labels.count(sabund->getLabel()) == 1){ m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); processSabund(sabund, out); processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); } if ((m->anyLabelsToProcess(sabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = sabund->getLabel(); delete sabund; sabund = input->getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); processSabund(sabund, out); processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); //restore real lastlabel to save below sabund->setLabel(saveLabel); } lastLabel = 
sabund->getLabel();

            //prevent memory leak
            delete sabund; sabund = NULL;

            //get next line to process
            sabund = input->getSAbundVector();
        }

        if (m->control_pressed) { out.close(); return 0; }

        //output error messages about any remaining user labels
        set<string>::iterator it;
        bool needToRun = false;
        for (it = userLabels.begin(); it != userLabels.end(); it++) {
            m->mothurOut("Your file does not include the label " + *it);
            if (processedLabels.count(lastLabel) != 1) {
                m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine();
                needToRun = true;
            }else {
                m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine();
            }
        }

        //run last label if you need to
        if (needToRun == true) {
            if (sabund != NULL) { delete sabund; }
            sabund = input->getSAbundVector(lastLabel);

            m->mothurOut(sabund->getLabel()); m->mothurOutEndLine();

            processSabund(sabund, out);

            delete sabund;
        }

        delete input;
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getSubSampleSabund");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::processSabund(SAbundVector*& sabund, ofstream& out) {
    try {
        RAbundVector* rabund = new RAbundVector();
        *rabund = sabund->getRAbundVector();

        int numBins = rabund->getNumBins();
        int thisSize = rabund->getNumSeqs();

        if (thisSize != size) {
            //one entry per sequence, holding the index of the bin it belongs to
            OrderVector* order = new OrderVector();
            for(int p=0;p<numBins;p++){
                for(int j=0;j<rabund->get(p);j++){
                    order->push_back(p);
                }
            }
            random_shuffle(order->begin(), order->end());

            RAbundVector* temp = new RAbundVector(numBins);
            temp->setLabel(rabund->getLabel());

            delete rabund;
            rabund = temp;

            for (int j = 0; j < size; j++) {
                if (m->control_pressed) { delete order; return 0; }

                int bin = order->get(j);

                int abund = rabund->get(bin);
                rabund->set(bin, (abund+1));
            }

            delete order;
        }

        if (m->control_pressed) { return 0; }

        delete sabund;
        sabund = new SAbundVector();
        *sabund = rabund->getSAbundVector();
        delete rabund;

        sabund->print(out);

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "processSabund");
        exit(1);
    }
}
//**********************************************************************************************************************
int SubSampleCommand::getTax(set<string>& subset) {
    try {
        string thisTaxOutputDir = outputDir;
        if (outputDir == "") { thisTaxOutputDir += m->hasPath(taxonomyfile); }
        map<string, string> variables;
        variables["[filename]"] = thisTaxOutputDir + m->getRootName(m->getSimpleName(taxonomyfile));
        variables["[extension]"] = m->getExtension(taxonomyfile);
        string outputTaxFileName = getOutputFileName("taxonomy", variables);
        ofstream outTax;
        m->openOutputFile(outputTaxFileName, outTax);
        outputNames.push_back(outputTaxFileName); outputTypes["taxonomy"].push_back(outputTaxFileName);

        //read through taxonomy file outputting only the names on the subsample list
        ifstream inTax;
        m->openInputFile(taxonomyfile, inTax);

        string tname, tax;
        int tcount = 0;
        map<string, vector<string> >::iterator itNameMap;

        while(!inTax.eof()){
            if (m->control_pressed) { inTax.close(); outTax.close(); return 0; }

            inTax >> tname; m->gobble(inTax); //read from first column
            inTax >> tax; m->gobble(inTax); //read from second column

            //does the subset contain a sequence that this sequence represents
            itNameMap = nameMap.find(tname);
            if (itNameMap != nameMap.end()) {
                vector<string> nameRepresents = itNameMap->second;

                for (int i = 0; i < nameRepresents.size(); i++){
                    if (subset.count(nameRepresents[i]) != 0) {
                        outTax << nameRepresents[i] << '\t' << tax << endl;
                        tcount++;
                    }
                }
            }else{ m->mothurOut("[ERROR]: " + tname + " is missing, please correct."); m->mothurOutEndLine(); }
        }
        inTax.close();
        outTax.close();

        return tcount;
    }
    catch(exception& e) {
        m->errorOut(e, "SubSampleCommand", "getTax");
        exit(1);
    }
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/subsamplecommand.h000066400000000000000000000033401255543666200217420ustar00rootroot00000000000000
#ifndef 
SUBSAMPLECOMMAND_H
#define SUBSAMPLECOMMAND_H

/*
 *  subsamplecommand.h
 *  Mothur
 *
 *  Created by westcott on 10/27/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "sharedrabundvector.h"
#include "listvector.hpp"
#include "rabundvector.hpp"
#include "inputdata.h"
#include "sequence.hpp"
#include "counttable.h"

class SubSampleCommand : public Command {

public:
    SubSampleCommand(string);
    SubSampleCommand();
    ~SubSampleCommand() {}

    vector<string> setParameters();
    string getCommandName()     { return "sub.sample"; }
    string getCommandCategory() { return "Sequence Processing"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation()    { return "http://www.mothur.org/wiki/Sub.sample"; }
    string getDescription() { return "get a sampling of sequences from a list, shared, rabund, sabund or fasta file"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    bool abort, pickedGroups, allLines, persample;
    string listfile, groupfile, countfile, sharedfile, rabundfile, sabundfile, fastafile, namefile, taxonomyfile;
    set<string> labels; //holds labels to be used
    string groups, label, outputDir;
    vector<string> Groups, outputNames;
    int size;
    vector<string> names;
    map<string, vector<string> > nameMap;
    CountTable ct;

    int getSubSampleShared();
    int getSubSampleList();
    int getSubSampleRabund();
    int getSubSampleSabund();
    int getSubSampleFasta();
    int processShared(vector<SharedRAbundVector*>&);
    int processRabund(RAbundVector*&, ofstream&);
    int processSabund(SAbundVector*&, ofstream&);
    int processList(ListVector*&, set<string>&);
    int getNames();
    int readNames();
    int getTax(set<string>&);
};

#endif
mothur-1.36.1/source/commands/summarycommand.cpp000066400000000000000000001107501255543666200220030ustar00rootroot00000000000000
/*
 *  summarycommand.cpp
 *  Dotur
 *
 *  Created by Sarah Westcott on 1/2/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "summarycommand.h" #include "ace.h" #include "sobs.h" #include "nseqs.h" #include "chao1.h" #include "bootstrap.h" #include "simpson.h" #include "simpsoneven.h" #include "invsimpson.h" #include "npshannon.h" #include "shannon.h" #include "heip.h" #include "smithwilson.h" #include "shannoneven.h" #include "jackknife.h" #include "geom.h" #include "logsd.h" #include "qstat.h" #include "bergerparker.h" #include "bstick.h" #include "goodscoverage.h" #include "coverage.h" #include "efron.h" #include "boneh.h" #include "solow.h" #include "shen.h" #include "subsample.h" #include "shannonrange.h" //********************************************************************************************************************** vector SummaryCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","summary",false,false,true); parameters.push_back(plist); CommandParameter prabund("rabund", "InputTypes", "", "", "LRSS", "LRSS", "none","summary",false,false); parameters.push_back(prabund); CommandParameter psabund("sabund", "InputTypes", "", "", "LRSS", "LRSS", "none","summary",false,false); parameters.push_back(psabund); CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","summary",false,false,true); parameters.push_back(pshared); CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pcalc("calc", "Multiple", "sobs-chao-nseqs-coverage-ace-jack-shannon-shannoneven-npshannon-heip-smithwilson-simpson-simpsoneven-invsimpson-bootstrap-geometric-qstat-logseries-bergerparker-bstick-goodscoverage-efron-boneh-solow-shen", "sobs-chao-ace-jack-shannon-npshannon-simpson-shannonrange", "", "", 
"","",true,false,true); parameters.push_back(pcalc); CommandParameter palpha("alpha", "Multiple", "0-1-2", "1", "", "", "","",false,false,true); parameters.push_back(palpha); CommandParameter pabund("abund", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pabund); CommandParameter psize("size", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psize); CommandParameter pgroupmode("groupmode", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pgroupmode); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SummaryCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The summary.single command parameters are list, sabund, rabund, shared, subsample, iters, label, calc, abund and groupmode. 
list, sabund, rabund or shared is required unless you have a valid current file.\n";
		helpString += "The summary.single command should be in the following format: \n";
		helpString += "summary.single(label=yourLabel, calc=yourEstimators).\n";
		helpString += "Example summary.single(label=unique-.01-.03, calc=sobs-chao-ace-jack-bootstrap-shannon-npshannon-simpson).\n";
		helpString += validCalculator.printCalc("summary");
		helpString += "The subsample parameter allows you to enter the size of the sample or you can set subsample=T and mothur will use the size of your smallest group in the case of a shared file. With a list, sabund or rabund file you must provide a subsample size.\n";
		helpString += "The iters parameter allows you to choose the number of times you would like to run the subsample.\n";
		helpString += "The default value for calc is sobs-chao-ace-jack-shannon-npshannon-simpson.\n";
		helpString += "If you are running summary.single with a shared file and would like your summary results collated in one file, set groupmode=t. (Default=true).\n";
		helpString += "The alpha parameter is used to set the alpha value for the shannonrange calculator.\n";
		helpString += "The label parameter is used to analyze specific labels in your input.\n";
		helpString += "Note: No spaces between parameter labels (i.e. 
label), '=' and parameters (i.e.yourLabels).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SummaryCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],summary-[filename],[tag],summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SummaryCommand::SummaryCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "SummaryCommand"); exit(1); } } //********************************************************************************************************************** SummaryCommand::SummaryCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["summary"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output 
parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("rabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["rabund"] = inputDir + it->second; } } it = parameters.find("sabund"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["sabund"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["list"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sabundfile = validParameter.validFile(parameters, "sabund", true); if (sabundfile == "not open") { sabundfile = ""; abort = true; } else if (sabundfile == "not found") { sabundfile = ""; } else { format = "sabund"; inputfile = sabundfile; m->setSabundFile(sabundfile); } rabundfile = validParameter.validFile(parameters, "rabund", true); if (rabundfile == "not open") { rabundfile = ""; abort = true; } else if (rabundfile == "not found") { rabundfile = ""; } else { format = "rabund"; inputfile = rabundfile; m->setRabundFile(rabundfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "sharedfile"; inputfile = sharedfile; m->setSharedFile(sharedfile); } if ((sharedfile == "") && (listfile == "") && (rabundfile == "") && (sabundfile == "")) { //is there are current file available for any of these? 
				//give priority to shared, then list, then rabund, then sabund
				//if there is a current shared file, use it
				sharedfile = m->getSharedFile();
				if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); }
				else {
					listfile = m->getListFile();
					if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); }
					else {
						rabundfile = m->getRabundFile();
						if (rabundfile != "") { inputfile = rabundfile; format = "rabund"; m->mothurOut("Using " + rabundfile + " as input file for the rabund parameter."); m->mothurOutEndLine(); }
						else {
							sabundfile = m->getSabundFile();
							if (sabundfile != "") { inputfile = sabundfile; format = "sabund"; m->mothurOut("Using " + sabundfile + " as input file for the sabund parameter."); m->mothurOutEndLine(); }
							else {
								m->mothurOut("No valid current files. You must provide a list, sabund, rabund or shared file before you can use the summary.single command."); m->mothurOutEndLine();
								abort = true;
							}
						}
					}
				}
			}

			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false);
			if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); }

			//check for optional parameter and set defaults
			// ...at some point we should add some additional type checking...
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "sobs-chao-ace-jack-shannon-npshannon-simpson"; } else { if (calc == "default") { calc = "sobs-chao-ace-jack-shannon-npshannon-simpson"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } string temp; temp = validParameter.validFile(parameters, "abund", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, abund); temp = validParameter.validFile(parameters, "size", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, size); temp = validParameter.validFile(parameters, "groupmode", false); if (temp == "not found") { temp = "T"; } groupMode = m->isTrue(temp); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; subsampleSize = -1; } } temp = validParameter.validFile(parameters, "alpha", false); if (temp == "not found") { temp = "1"; } m->mothurConvert(temp, alpha); if ((alpha != 0) && (alpha != 1) && (alpha != 2)) { m->mothurOut("[ERROR]: Not a valid alpha value. 
Valid values are 0, 1 and 2."); m->mothurOutEndLine(); abort=true; } if (subsample == false) { iters = 0; } else { //if you did not set a samplesize and are not using a sharedfile if ((subsampleSize == -1) && (format != "sharedfile")) { m->mothurOut("[ERROR]: If you want to subsample with a list, rabund or sabund file, you must provide the sample size. You can do this by setting subsample=yourSampleSize.\n"); abort=true; } } } } catch(exception& e) { m->errorOut(e, "SummaryCommand", "SummaryCommand"); exit(1); } } //********************************************************************************************************************** int SummaryCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if ((format != "sharedfile")) { inputFileNames.push_back(inputfile); } else { inputFileNames = parseSharedFile(sharedfile); format = "rabund"; } if (m->control_pressed) { return 0; } int numLines = 0; int numCols = 0; map groupIndex; for (int p = 0; p < inputFileNames.size(); p++) { numLines = 0; numCols = 0; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputFileNames[p])); string fileNameRoot = getOutputFileName("summary",variables); variables["[tag]"] = "ave-std"; string fileNameAve = getOutputFileName("summary",variables); outputNames.push_back(fileNameRoot); outputTypes["summary"].push_back(fileNameRoot); if (inputFileNames.size() > 1) { m->mothurOutEndLine(); m->mothurOut("Processing group " + groups[p]); m->mothurOutEndLine(); m->mothurOutEndLine(); groupIndex[fileNameRoot] = groups[p]; } sumCalculators.clear(); ValidCalculators validCalculator; for (int i=0; imothurRemove(outputNames[i]); } return 0; } ofstream outputFileHandle; m->openOutputFile(fileNameRoot, outputFileHandle); outputFileHandle << "label"; ofstream outAve; if (subsample) { m->openOutputFile(fileNameAve, outAve); outputNames.push_back(fileNameAve); outputTypes["summary"].push_back(fileNameAve); outAve << "label\tmethod"; 
outAve.setf(ios::fixed, ios::floatfield); outAve.setf(ios::showpoint); if (inputFileNames.size() > 1) { groupIndex[fileNameAve] = groups[p]; } } input = new InputData(inputFileNames[p], format); sabund = input->getSAbundVector(); string lastLabel = sabund->getLabel(); for(int i=0;igetCols() == 1){ outputFileHandle << '\t' << sumCalculators[i]->getName(); if (subsample) { outAve << '\t' << sumCalculators[i]->getName(); } numCols++; } else{ outputFileHandle << '\t' << sumCalculators[i]->getName() << "\t" << sumCalculators[i]->getName() << "_lci\t" << sumCalculators[i]->getName() << "_hci"; if (subsample) { outAve << '\t' << sumCalculators[i]->getName() << "\t" << sumCalculators[i]->getName() << "_lci\t" << sumCalculators[i]->getName() << "_hci"; } numCols += 3; } } outputFileHandle << endl; if (subsample) { outAve << endl; } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. set processedLabels; set userLabels = labels; if (m->control_pressed) { outputFileHandle.close(); outAve.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;icontrol_pressed) { outputFileHandle.close(); outAve.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;igetLabel()) == 1){ m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); process(sabund, outputFileHandle, outAve); if (m->control_pressed) { outputFileHandle.close(); outAve.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;ianyLabelsToProcess(sabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = sabund->getLabel(); delete sabund; sabund = input->getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); processedLabels.insert(sabund->getLabel()); 
userLabels.erase(sabund->getLabel()); process(sabund, outputFileHandle, outAve); if (m->control_pressed) { outputFileHandle.close(); outAve.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;isetLabel(saveLabel); } lastLabel = sabund->getLabel(); delete sabund; sabund = input->getSAbundVector(); } if (m->control_pressed) { outputFileHandle.close(); outAve.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;i::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (sabund != NULL) { delete sabund; } sabund = input->getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); process(sabund, outputFileHandle, outAve); if (m->control_pressed) { outputFileHandle.close(); outAve.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;icontrol_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } for(int i=0;icontrol_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //create summary file containing all the groups data for each label - this function just combines the info from the files already created. 
if ((sharedfile != "") && (groupMode)) { vector comboNames = createGroupSummaryFile(numLines, numCols, outputNames, groupIndex); for (int i = 0; i < comboNames.size(); i++) { outputNames.push_back(comboNames[i]); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "execute"); exit(1); } } //********************************************************************************************************************** int SummaryCommand::process(SAbundVector*& sabund, ofstream& outputFileHandle, ofstream& outAve) { try { //calculator -> data -> values vector< vector< vector > > results; results.resize(sumCalculators.size()); outputFileHandle << sabund->getLabel(); SubSample sample; for (int thisIter = 0; thisIter < iters+1; thisIter++) { SAbundVector* thisIterSabund = sabund; //we want the summary results for the whole dataset, then the subsampling if ((thisIter > 0) && subsample) { //subsample sabund and run it //copy sabund since getSample destroys it RAbundVector rabund = sabund->getRAbundVector(); SAbundVector* newSabund = new SAbundVector(); *newSabund = rabund.getSAbundVector(); sample.getSample(newSabund, subsampleSize); thisIterSabund = newSabund; } for(int i=0;i data = sumCalculators[i]->getValues(thisIterSabund); if (m->control_pressed) { return 0; } if (thisIter == 0) { outputFileHandle << '\t'; sumCalculators[i]->print(outputFileHandle); }else { //some of the calc have hci and lci need to make room for that if (results[i].size() == 0) { results[i].resize(data.size()); } //save results for ave and std. 
						for (int j = 0; j < data.size(); j++) {
							if (m->control_pressed) { return 0; }
							results[i][j].push_back(data[j]);
						}
					}
				}

				//cleanup memory
				if ((thisIter > 0) && subsample) { delete thisIterSabund; }
			}
			outputFileHandle << endl;

			if (subsample) {
				outAve << sabund->getLabel() << '\t' << "ave";
				//find ave and std for this label and output
				//will need to modify the createGroupSummary to combine results and not mess with the .summary file.

				//calcs -> values
				vector< vector > calcAverages; calcAverages.resize(sumCalculators.size());
				for (int i = 0; i < calcAverages.size(); i++) { calcAverages[i].resize(results[i].size(), 0); }

				for (int thisIter = 0; thisIter < iters; thisIter++) { //sum each calculator's values across the subsampling iterations
					for (int i = 0; i < calcAverages.size(); i++) {
						for (int j = 0; j < calcAverages[i].size(); j++) {
							calcAverages[i][j] += results[i][j][thisIter];
						}
					}
				}

				for (int i = 0; i < calcAverages.size(); i++) { //divide each sum by the number of iterations to get the average
					for (int j = 0; j < calcAverages[i].size(); j++) {
						calcAverages[i][j] /= (float) iters;
						outAve << '\t' << calcAverages[i][j];
					}
				}

				//find standard deviation
				vector< vector > stdDev; stdDev.resize(sumCalculators.size());
				for (int i = 0; i < stdDev.size(); i++) { stdDev[i].resize(results[i].size(), 0); }

				for (int thisIter = 0; thisIter < iters; thisIter++) { //accumulate the squared difference of each value from its average
					for (int i = 0; i < stdDev.size(); i++) {
						for (int j = 0; j < stdDev[i].size(); j++) {
							stdDev[i][j] += ((results[i][j][thisIter] - calcAverages[i][j]) * (results[i][j][thisIter] - calcAverages[i][j]));
						}
					}
				}

				outAve << endl << sabund->getLabel() << '\t' << "std";
				for (int i = 0; i < stdDev.size(); i++) { //divide by the number of iterations and take the square root to get the standard deviation
for (int j = 0; j < stdDev[i].size(); j++) { stdDev[i][j] /= (float) iters; stdDev[i][j] = sqrt(stdDev[i][j]); outAve << '\t' << stdDev[i][j]; } } outAve << endl; } return 0; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "process"); exit(1); } } //********************************************************************************************************************** vector SummaryCommand::parseSharedFile(string filename) { try { vector filenames; map filehandles; map::iterator it3; input = new InputData(filename, "sharedfile"); vector lookup = input->getSharedRAbundVectors(); string sharedFileRoot = m->getRootName(filename); /******************************************************/ if (subsample) { if (subsampleSize == -1) { //user has not set size, set size = smallest samples size subsampleSize = lookup[0]->getNumSeqs(); for (int i = 1; i < lookup.size(); i++) { int thisSize = lookup[i]->getNumSeqs(); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } }else { m->clearGroups(); vector Groups; vector temp; for (int i = 0; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < subsampleSize) { m->mothurOut(lookup[i]->getGroup() + " contains " + toString(lookup[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine(); delete lookup[i]; }else { Groups.push_back(lookup[i]->getGroup()); temp.push_back(lookup[i]); } } lookup = temp; m->setGroups(Groups); } if (lookup.size() < 1) { m->mothurOut("You have not provided enough valid groups. 
I cannot run the command."); m->mothurOutEndLine(); m->control_pressed = true; delete input; return filenames; } } /******************************************************/ //clears file before we start to write to it below for (int i=0; imothurRemove((sharedFileRoot + lookup[i]->getGroup() + ".rabund")); filenames.push_back((sharedFileRoot + lookup[i]->getGroup() + ".rabund")); } ofstream* temp; for (int i=0; igetGroup()] = temp; groups.push_back(lookup[i]->getGroup()); } while(lookup[0] != NULL) { for (int i = 0; i < lookup.size(); i++) { RAbundVector rav = lookup[i]->getRAbundVector(); m->openOutputFileAppend(sharedFileRoot + lookup[i]->getGroup() + ".rabund", *(filehandles[lookup[i]->getGroup()])); rav.print(*(filehandles[lookup[i]->getGroup()])); (*(filehandles[lookup[i]->getGroup()])).close(); } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } //free memory for (it3 = filehandles.begin(); it3 != filehandles.end(); it3++) { delete it3->second; } delete input; return filenames; } catch(exception& e) { m->errorOut(e, "SummaryCommand", "parseSharedFile"); exit(1); } } //********************************************************************************************************************** vector SummaryCommand::createGroupSummaryFile(int numLines, int numCols, vector& outputNames, map groupIndex) { try { //open each groups summary file vector newComboNames; map > > files; map filesTypesLabels; map filesTypesNumLines; for (int i=0; i thisFilesLines; ifstream temp; m->openInputFile(outputNames[i], temp); //read through first line - labels string labelsLine = m->getline(temp); vector theseLabels = m->splitWhiteSpace(labelsLine); string newLabel = ""; for (int j = 0; j < theseLabels.size(); j++) { if (j == 1) { newLabel += "group\t" + theseLabels[j]; } else if (j == 0) { newLabel += theseLabels[j] + "\t"; } else{ newLabel += '\t' + theseLabels[j]; } } m->gobble(temp); int stop = numLines; if (theseLabels.size() != 
numCols+1) { stop = numLines*2; } //for each label for (int k = 0; k < stop; k++) { string thisLine = ""; string tempLabel; for (int j = 0; j < theseLabels.size(); j++) { temp >> tempLabel; //save for later if (j == 1) { thisLine += groupIndex[outputNames[i]] + "\t" + tempLabel; } else if (j == 0) { thisLine += tempLabel + "\t"; } else{ thisLine += "\t" + tempLabel; } } thisLine += "\n"; thisFilesLines.push_back(thisLine); m->gobble(temp); } string extension = m->getExtension(outputNames[i]); if (theseLabels.size() != numCols+1) { extension = ".ave-std" + extension; } string combineFileName = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "groups" + extension; m->mothurRemove(combineFileName); //remove old file filesTypesLabels[extension] = newLabel; filesTypesNumLines[extension] = stop; map > >::iterator itFiles = files.find(extension); if (itFiles != files.end()) { //add new files info to existing type files[extension][outputNames[i]] = thisFilesLines; }else { map > thisFile; thisFile[outputNames[i]] = thisFilesLines; files[extension] = thisFile; } temp.close(); m->mothurRemove(outputNames[i]); } for (map > >::iterator itFiles = files.begin(); itFiles != files.end(); itFiles++) { if (m->control_pressed) { break; } string extension = itFiles->first; map > thisType = itFiles->second; string combineFileName = outputDir + m->getRootName(m->getSimpleName(sharedfile)) + "groups" + extension; newComboNames.push_back(combineFileName); //open combined file ofstream out; m->openOutputFile(combineFileName, out); //output label line to new file out << filesTypesLabels[extension] << endl; //for each label for (int k = 0; k < filesTypesNumLines[extension]; k++) { //grab summary data for each group for (map >::iterator itType = thisType.begin(); itType != thisType.end(); itType++) { out << (itType->second)[k]; } } outputNames.clear(); out.close(); } //return combine file name return newComboNames; } catch(exception& e) { m->errorOut(e, "SummaryCommand", 
"createGroupSummaryFile"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/summarycommand.h000066400000000000000000000030321255543666200214420ustar00rootroot00000000000000#ifndef SUMMARYCOMMAND_H #define SUMMARYCOMMAND_H /* * summarycommand.h * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "sabundvector.hpp" #include "inputdata.h" #include "calculator.h" #include "validcalculator.h" class SummaryCommand : public Command { public: SummaryCommand(string); SummaryCommand(); ~SummaryCommand(){} vector setParameters(); string getCommandName() { return "summary.single"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Summary.single"; } string getDescription() { return "generate summary file that has the calculator value for each line in the OTU data"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector sumCalculators; InputData* input; SAbundVector* sabund; int abund, size, iters, subsampleSize, alpha; bool abort, allLines, groupMode, subsample; set labels; //holds labels to be used string label, calc, outputDir, sharedfile, listfile, rabundfile, sabundfile, format, inputfile; vector Estimators; vector inputFileNames, outputNames; vector groups; vector parseSharedFile(string); vector createGroupSummaryFile(int, int, vector&, map); int process(SAbundVector*&, ofstream&, ofstream&); }; #endif mothur-1.36.1/source/commands/summaryqualcommand.cpp000066400000000000000000000560151255543666200226710ustar00rootroot00000000000000/* * summaryqualcommand.cpp * Mothur * * Created by westcott on 11/28/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 *
 */

#include "summaryqualcommand.h"
#include "counttable.h"

//**********************************************************************************************************************
vector<string> SummaryQualCommand::setParameters(){
	try {
		CommandParameter pqual("qfile", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pqual);
		CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pname);
		CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","",false,false,true); parameters.push_back(pcount);
		CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors);
		CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
		CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
		CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

		vector<string> myArray;
		for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
		return myArray;
	}
	catch(exception& e) {
		m->errorOut(e, "SummaryQualCommand", "setParameters");
		exit(1);
	}
}
//**********************************************************************************************************************
string SummaryQualCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The summary.qual command reads a quality file and an optional name or count file, and summarizes the quality information.\n";
		helpString += "The summary.qual command parameters are qfile, name, count and processors. qfile is required, unless you have a valid current quality file.\n";
		helpString += "The name parameter allows you to enter a name file associated with your quality file. 
\n"; helpString += "The count parameter allows you to enter a count file associated with your quality file. \n"; helpString += "The summary.qual command should be in the following format: \n"; helpString += "summary.qual(qfile=yourQualityFile) \n"; helpString += "Note: No spaces between parameter labels (i.e. qfile), '=' and parameters (i.e.yourQualityFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SummaryQualCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],qual.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SummaryQualCommand::SummaryQualCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "SummaryQualCommand"); exit(1); } } //*************************************************************************************************************** SummaryQualCommand::SummaryQualCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != 
parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //initialize outputTypes vector tempOutNames; outputTypes["summary"] = tempOutNames; //check for required parameters qualfile = validParameter.validFile(parameters, "qfile", true); if (qualfile == "not open") { qualfile = ""; abort = true; } else if (qualfile == "not found") { qualfile = m->getQualFile(); if (qualfile != "") { m->mothurOut("Using " + qualfile + " as input file for the qfile parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current quality file and the qfile parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setQualFile(qualfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (namefile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(qualfile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if (countfile == "") { if (namefile == "") { vector files; files.push_back(qualfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "SummaryQualCommand"); exit(1); } } 
//*************************************************************************************************************** int SummaryQualCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); int numSeqs = 0; vector position; vector averageQ; vector< vector > scores; if (m->control_pressed) { return 0; } if (namefile != "") { nameMap = m->readNames(namefile); } else if (countfile != "") { CountTable ct; ct.readTable(countfile, false, false); nameMap = ct.getNameMap(); } vector positions; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) positions = m->divideFile(qualfile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } #else if (processors == 1) { lines.push_back(linePair(0, 1000)); }else { positions = m->setFilePosFasta(qualfile, numSeqs); if (positions.size() < processors) { processors = positions.size(); } //figure out how many sequences you have to process int numSeqsPerProcessor = numSeqs / processors; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numSeqs - i * numSeqsPerProcessor; } lines.push_back(linePair(positions[startIndex], numSeqsPerProcessor)); } } #endif if(processors == 1){ numSeqs = driverCreateSummary(position, averageQ, scores, qualfile, lines[0]); } else{ numSeqs = createProcessesCreateSummary(position, averageQ, scores, qualfile); } if (m->control_pressed) { return 0; } //print summary file map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(qualfile)); string summaryFile = getOutputFileName("summary",variables); printQual(summaryFile, position, averageQ, scores); if (m->control_pressed) { m->mothurRemove(summaryFile); return 0; } //output results to screen cout.setf(ios::fixed, ios::floatfield); cout.setf(ios::showpoint); m->mothurOutEndLine(); 
m->mothurOut("Position\tNumSeqs\tAverageQ"); m->mothurOutEndLine(); for (int i = 0; i < position.size(); i+=100) { float average = averageQ[i] / (float) position[i]; cout << i << '\t' << position[i] << '\t' << average; m->mothurOutJustToLog(toString(i) + "\t" + toString(position[i]) + "\t" + toString(average)); m->mothurOutEndLine(); } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to create the summary file for " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(summaryFile); m->mothurOutEndLine(); outputNames.push_back(summaryFile); outputTypes["summary"].push_back(summaryFile); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "execute"); exit(1); } } /**************************************************************************************/ int SummaryQualCommand::driverCreateSummary(vector& position, vector& averageQ, vector< vector >& scores, string filename, linePair filePos) { try { ifstream in; m->openInputFile(filename, in); in.seekg(filePos.start); //adjust start if null strings if (filePos.start == 0) { m->zapGremlins(in); m->gobble(in); } bool done = false; int count = 0; while (!done) { if (m->control_pressed) { in.close(); return 1; } QualityScores current(in); m->gobble(in); if (current.getName() != "") { int num = 1; if ((namefile != "") || (countfile != "")) { //make sure this sequence is in the namefile, else error map::iterator it = nameMap.find(current.getName()); if (it == nameMap.end()) { m->mothurOut("[ERROR]: " + current.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); m->control_pressed = true; } else { num = it->second; } } vector thisScores = current.getQualityScores(); //resize to num of positions setting number of seqs with that size to 1 if (position.size() < thisScores.size()) { 
position.resize(thisScores.size(), 0); } if (averageQ.size() < thisScores.size()) { averageQ.resize(thisScores.size(), 0); } if (scores.size() < thisScores.size()) { scores.resize(thisScores.size()); for (int i = 0; i < scores.size(); i++) { scores[i].resize(41, 0); } } //increase counts of number of seqs with this position //average is really the total, we will average in execute for (int i = 0; i < thisScores.size(); i++) { position[i] += num; averageQ[i] += (thisScores[i] * num); //weighting for namesfile if (thisScores[i] > 40) { m->mothurOut("[ERROR]: " + current.getName() + " has a quality scores of " + toString(thisScores[i]) + ", expecting values to be less than 40."); m->mothurOutEndLine(); m->control_pressed = true; } else { scores[i][thisScores[i]] += num; } } count += num; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = in.tellg(); if ((pos == -1) || (pos >= filePos.end)) { break; } #else if (in.eof()) { break; } #endif } in.close(); return count; } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "driverCreateSummary"); exit(1); } } /**************************************************************************************************/ int SummaryQualCommand::createProcessesCreateSummary(vector& position, vector& averageQ, vector< vector >& scores, string filename) { try { int process = 1; int numSeqs = 0; processIDS.clear(); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ numSeqs = driverCreateSummary(position, averageQ, scores, qualfile, lines[process]); //pass numSeqs to parent ofstream out; string tempFile = qualfile + 
m->mothurGetpid(process) + ".num.temp"; m->openOutputFile(tempFile, out); out << numSeqs << endl; out << position.size() << endl; for (int k = 0; k < position.size(); k++) { out << position[k] << '\t'; } out << endl; for (int k = 0; k < averageQ.size(); k++) { out << averageQ[k] << '\t'; } out << endl; for (int k = 0; k < scores.size(); k++) { for (int j = 0; j < 41; j++) { out << scores[k][j] << '\t'; } out << endl; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(qualfile + (toString(processIDS[i]) + ".num.temp")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(qualfile + (toString(processIDS[i]) + ".num.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide lines.clear(); vector positions = m->divideFile(qualfile, processors); for (int i = 0; i < (positions.size()-1); i++) { lines.push_back(linePair(positions[i], positions[(i+1)])); } numSeqs = 0; processIDS.resize(0); process = 1; position.clear(); averageQ.clear(); scores.clear(); //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ numSeqs = driverCreateSummary(position, averageQ, scores, qualfile, lines[process]); //pass numSeqs to parent ofstream out; string tempFile = qualfile + m->mothurGetpid(process) + 
".num.temp"; m->openOutputFile(tempFile, out); out << numSeqs << endl; out << position.size() << endl; for (int k = 0; k < position.size(); k++) { out << position[k] << '\t'; } out << endl; for (int k = 0; k < averageQ.size(); k++) { out << averageQ[k] << '\t'; } out << endl; for (int k = 0; k < scores.size(); k++) { for (int j = 0; j < 41; j++) { out << scores[k][j] << '\t'; } out << endl; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //do your part numSeqs = driverCreateSummary(position, averageQ, scores, qualfile, lines[0]); //force parent to wait until all the processes are done for (int i=0;iopenInputFile(tempFilename, in); int temp, tempNum; in >> tempNum; m->gobble(in); numSeqs += tempNum; in >> tempNum; m->gobble(in); if (position.size() < tempNum) { position.resize(tempNum, 0); } if (averageQ.size() < tempNum) { averageQ.resize(tempNum, 0); } if (scores.size() < tempNum) { scores.resize(tempNum); for (int i = 0; i < scores.size(); i++) { scores[i].resize(41, 0); } } for (int k = 0; k < tempNum; k++) { in >> temp; position[k] += temp; } m->gobble(in); for (int k = 0; k < tempNum; k++) { in >> temp; averageQ[k] += temp; } m->gobble(in); for (int k = 0; k < tempNum; k++) { for (int j = 0; j < 41; j++) { in >> temp; scores[k][j] += temp; m->gobble(in); } } in.close(); m->mothurRemove(tempFilename); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the seqSumQualData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to pass results vectors. 
////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors]; HANDLE hThreadArray[processors]; bool hasNameMap = false; if ((namefile !="") || (countfile != "")) { hasNameMap = true; } //Create processor worker threads. for( int i=0; inumSeqs; if (pDataArray[i]->count != pDataArray[i]->end) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end) + " sequences assigned to it, quitting. \n"); m->control_pressed = true; } int tempNum = pDataArray[i]->position.size(); if (position.size() < tempNum) { position.resize(tempNum, 0); } if (averageQ.size() < tempNum) { averageQ.resize(tempNum, 0); } if (scores.size() < tempNum) { scores.resize(tempNum); for (int i = 0; i < scores.size(); i++) { scores[i].resize(41, 0); } } for (int k = 0; k < tempNum; k++) { position[k] += pDataArray[i]->position[k]; } for (int k = 0; k < tempNum; k++) { averageQ[k] += pDataArray[i]->averageQ[k]; } for (int k = 0; k < tempNum; k++) { for (int j = 0; j < 41; j++) { scores[k][j] += pDataArray[i]->scores[k][j]; } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif return numSeqs; } catch(exception& e) { m->errorOut(e, "SummaryQualCommand", "createProcessesCreateSummary"); exit(1); } } /**************************************************************************************************/ int SummaryQualCommand::printQual(string sumFile, vector& position, vector& averageQ, vector< vector >& scores) { try { ofstream out; m->openOutputFile(sumFile, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); outputNames.push_back(sumFile); outputTypes["summary"].push_back(sumFile); //print headings out << "Position\tnumSeqs\tAverageQ"; for (int i = 0; i < 41; i++) { out << '\t' << "q" << i; } out << endl; for (int i = 0; i < position.size(); i++) { if (m->control_pressed) { out.close(); return 0; } 
			double average = averageQ[i] / (float) position[i];
			out << i << '\t' << position[i] << '\t' << average;
			for (int j = 0; j < 41; j++) { out << '\t' << scores[i][j]; }
			out << endl;
		}

		out.close();
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SummaryQualCommand", "printQual");
		exit(1);
	}
}
/**************************************************************************************/
mothur-1.36.1/source/commands/summaryqualcommand.h000066400000000000000000000122551255543666200223340ustar00rootroot00000000000000
#ifndef SUMMARYQUALCOMMAND_H
#define SUMMARYQUALCOMMAND_H

/*
 * summaryqualcommand.h
 * Mothur
 *
 * Created by westcott on 11/28/11.
 * Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "command.hpp"
#include "qualityscores.h"

/**************************************************************************************************/

class SummaryQualCommand : public Command {
public:
	SummaryQualCommand(string);
	SummaryQualCommand();
	~SummaryQualCommand(){}

	vector<string> setParameters();
	string getCommandName() { return "summary.qual"; }
	string getCommandCategory() { return "Sequence Processing"; }

	string getHelpString();
	string getOutputPattern(string);
	string getCitation() { return "http://www.mothur.org/wiki/Summary.qual"; }
	string getDescription() { return "summarize the quality of a set of sequences"; }

	int execute();
	void help() { m->mothurOut(getHelpString()); }

private:
	bool abort;
	string qualfile, outputDir, namefile, countfile;
	vector<string> outputNames;
	map<string, int> nameMap;
	int processors;

	vector<linePair> lines;
	vector<int> processIDS;

	int createProcessesCreateSummary(vector<int>&, vector<int>&, vector< vector<int> >&, string);
	int driverCreateSummary(vector<int>&, vector<int>&, vector< vector<int> >&, string, linePair);
	int printQual(string, vector<int>&, vector<int>&, vector< vector<int> >&);
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct seqSumQualData {
	vector<int> position;
	vector<int> averageQ;
	vector< vector<int> > scores;
	string filename;
	unsigned long long start;
	unsigned long long end;
	int count, numSeqs;
	MothurOut* m;
	bool hasNameMap;
	map<string, int> nameMap;

	~seqSumQualData(){}
	seqSumQualData(string f, MothurOut* mout, unsigned long long st, unsigned long long en, bool n, map<string, int> nam) {
		filename = f;
		m = mout;
		start = st;
		end = en;
		hasNameMap = n;
		nameMap = nam;
		count = 0;
	}
};

/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MySeqSumQualThreadFunction(LPVOID lpParam){
	seqSumQualData* pDataArray;
	pDataArray = (seqSumQualData*)lpParam;

	try {
		ifstream in;
		pDataArray->m->openInputFile(pDataArray->filename, in);

		//print header if you are process 0
		if ((pDataArray->start == 0) || (pDataArray->start == 1)) {
			in.seekg(0);
			pDataArray->m->zapGremlins(in);
		}else { //this accounts for the difference in line endings.
			in.seekg(pDataArray->start-1); pDataArray->m->gobble(in);
		}

		pDataArray->count = 0;
		pDataArray->numSeqs = 0;
		for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process

			if (pDataArray->m->control_pressed) { in.close(); pDataArray->count = 1; return 1; }

			QualityScores current(in); pDataArray->m->gobble(in);

			if (current.getName() != "") {

				int num = 1;
				if (pDataArray->hasNameMap) {
					//make sure this sequence is in the namefile, else error
					map<string, int>::iterator it = pDataArray->nameMap.find(current.getName());

					if (it == pDataArray->nameMap.end()) { pDataArray->m->mothurOut("[ERROR]: " + current.getName() + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; }
					else { num = it->second; }
				}

				vector<int> thisScores = current.getQualityScores();

				//resize to num of positions setting number of seqs with that size to 1
				if (pDataArray->position.size() < thisScores.size()) { pDataArray->position.resize(thisScores.size(), 0); }
				if (pDataArray->averageQ.size() < thisScores.size()) { pDataArray->averageQ.resize(thisScores.size(), 0); }
				if (pDataArray->scores.size() < thisScores.size()) {
					pDataArray->scores.resize(thisScores.size());
					for (int i = 0; i < pDataArray->scores.size(); i++) { pDataArray->scores.at(i).resize(41, 0); }
				}

				//increase counts of number of seqs with this position
				//average is really the total, we will average in execute
				for (int i = 0; i < thisScores.size(); i++) {
					pDataArray->position.at(i) += num;
					pDataArray->averageQ.at(i) += (thisScores[i] * num); //weighting for namesfile
					//scores holds 41 bins per position, so valid quality values are 0 through 40
					if (thisScores[i] > 40) { pDataArray->m->mothurOut("[ERROR]: " + current.getName() + " has a quality score of " + toString(thisScores[i]) + ", expecting values no greater than 40."); pDataArray->m->mothurOutEndLine(); pDataArray->m->control_pressed = true; }
					else { pDataArray->scores.at(i)[thisScores[i]] += num; }
				}

				pDataArray->numSeqs += num;
				pDataArray->count++;
			}
		}

		in.close();

		return 0;
	}
	catch(exception& e) 
{ pDataArray->m->errorOut(e, "SummaryQualCommand", "MySeqSumQualThreadFunction"); exit(1); } } #endif /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/summarysharedcommand.cpp000066400000000000000000001414441255543666200231760ustar00rootroot00000000000000/* * summarysharedcommand.cpp * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "summarysharedcommand.h" #include "subsample.h" //********************************************************************************************************************** vector SummarySharedCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(pshared); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter psubsample("subsample", "String", "", "", "", "", "","phylip",false,false); parameters.push_back(psubsample); CommandParameter pdistance("distance", "Boolean", "", "F", "", "", "","phylip",false,false); parameters.push_back(pdistance); CommandParameter pcalc("calc", "Multiple", "sharedchao-sharedsobs-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-kstest-whittaker-sharednseqs-ochiai-anderberg-kulczynski-kulczynskicody-lennon-morisitahorn-braycurtis-odum-canberra-structeuclidean-structchord-hellinger-manhattan-structpearson-soergel-spearman-structkulczynski-speciesprofile-structchi2-hamming-gower-memchi2-memchord-memeuclidean-mempearson-jsd-rjsd", "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter poutput("output", "Multiple", "lt-square", "lt", "", "", "","",false,false); parameters.push_back(poutput); CommandParameter pall("all", "Boolean", "", "F", "", 
"", "","",false,false); parameters.push_back(pall); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SummarySharedCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The summary.shared command parameters are shared, label, calc, distance, processors, subsample, iters and all. 
shared is required if there is no current sharedfile.\n";
		helpString += "The summary.shared command should be in the following format: \n";
		helpString += "summary.shared(label=yourLabel, calc=yourEstimators, groups=yourGroups).\n";
		helpString += "Example summary.shared(label=unique-.01-.03, groups=B-C, calc=sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan).\n";
		helpString += validCalculator.printCalc("sharedsummary");
		helpString += "The iters parameter allows you to choose the number of times you would like to run the subsample.\n";
		helpString += "The subsample parameter allows you to enter the size per group of the sample or you can set subsample=T and mothur will use the size of your smallest group.\n";
		helpString += "The output parameter allows you to specify the format of your distance matrix. Options are lt and square. The default is lt.\n";
		helpString += "The default value for calc is sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan\n";
		helpString += "The default value for groups is all the groups in your groupfile.\n";
		helpString += "The distance parameter allows you to indicate you would like a distance file created for each calculator for each label, default=f.\n";
		helpString += "The label parameter is used to analyze specific labels in your input.\n";
		helpString += "The all parameter is used to specify if you want the estimate of all your groups together. This estimate can only be made for sharedsobs and sharedchao calculators. The default is false.\n";
		helpString += "If you use sharedchao and run into memory issues, set all to false. \n";
		helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 2 valid groups.\n";
		helpString += "Note: No spaces between parameter labels (i.e. 
label), '=' and parameters (i.e.yourLabel).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SummarySharedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],summary-[filename],[tag],summary"; } else if (type == "phylip") { pattern = "[filename],[calc],[distance],[outputtag],[tag2],dist"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SummarySharedCommand::SummarySharedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["phylip"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "SummarySharedCommand"); exit(1); } } //********************************************************************************************************************** SummarySharedCommand::SummarySharedCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } 
//initialize outputTypes vector tempOutNames; outputTypes["summary"] = tempOutNames; outputTypes["phylip"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } } //get shared file sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current sharedfile and the shared parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setSharedFile(sharedfile); } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(sharedfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan"; } else { if (calc == "default") { calc = "sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } string temp = validParameter.validFile(parameters, "all", false); if (temp == "not found") { temp = "false"; } all = m->isTrue(temp); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); output = validParameter.validFile(parameters, "output", false); if(output == "not found"){ output = "lt"; } else { createPhylip = true; } if ((output != "lt") && (output != "square")) { m->mothurOut(output + " is not a valid output form. Options are lt and square. 
I will use lt."); m->mothurOutEndLine(); output = "lt"; } temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (subsample == false) { iters = 0; } temp = validParameter.validFile(parameters, "distance", false); if (temp == "not found") { temp = "false"; } createPhylip = m->isTrue(temp); if (subsample) { createPhylip = true; } temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); if (abort == false) { ValidCalculators validCalculator; int i; for (i=0; ierrorOut(e, "SummarySharedCommand", "SummarySharedCommand"); exit(1); } } //********************************************************************************************************************** int SummarySharedCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } ofstream outputFileHandle, outAll; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); string outputFileName = getOutputFileName("summary",variables); //if the users entered no valid calculators don't execute command if (sumCalculators.size() == 0) { return 0; } //check if any calcs can do multiples else{ if (all){ for (int i = 0; i < sumCalculators.size(); i++) { if (sumCalculators[i]->getMultiple() == true) { mult = true; } } } } input = new InputData(sharedfile, "sharedfile"); lookup = input->getSharedRAbundVectors(); string lastLabel = lookup[0]->getLabel(); /******************************************************/ //output headings for files /******************************************************/ //output estimator names as column headers 
m->openOutputFile(outputFileName, outputFileHandle); outputFileHandle << "label" <<'\t' << "comparison" << '\t'; for(int i=0;i<sumCalculators.size();i++){ outputFileHandle << '\t' << sumCalculators[i]->getName(); if (sumCalculators[i]->getCols() == 3) { outputFileHandle << "\t" << sumCalculators[i]->getName() << "_lci\t" << sumCalculators[i]->getName() << "_hci"; } } outputFileHandle << endl; outputFileHandle.close(); //create file and put column headers for multiple groups file variables["[tag]"]= "multiple"; string outAllFileName = getOutputFileName("summary",variables); if (mult == true) { m->openOutputFile(outAllFileName, outAll); outputNames.push_back(outAllFileName); outAll << "label" <<'\t' << "comparison" << '\t'; for(int i=0;i<sumCalculators.size();i++){ if (sumCalculators[i]->getMultiple() == true) { outAll << '\t' << sumCalculators[i]->getName(); } } outAll << endl; outAll.close(); } if (lookup.size() < 2) { m->mothurOut("I cannot run the command without at least 2 valid groups."); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } //close files and clean up m->mothurRemove(outputFileName); if (mult == true) { m->mothurRemove(outAllFileName); } return 0; //if you only have 2 groups you don't need a .sharedmultiple file }else if ((lookup.size() == 2) && (mult == true)) { mult = false; m->mothurRemove(outAllFileName); outputNames.pop_back(); } if (m->control_pressed) { if (mult) { m->mothurRemove(outAllFileName); } m->mothurRemove(outputFileName); delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for(int i=0;i<sumCalculators.size();i++){ delete sumCalculators[i]; } m->clearGroups(); return 0; } /******************************************************/ if (subsample) { if (subsampleSize == -1) { //user has not set size, set size = smallest samples size subsampleSize = lookup[0]->getNumSeqs(); for (int i = 1; i < lookup.size(); i++) { int thisSize = lookup[i]->getNumSeqs(); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } }else { m->clearGroups(); Groups.clear(); vector<SharedRAbundVector*> temp; for (int i = 0; i < lookup.size(); i++) { if (lookup[i]->getNumSeqs() < subsampleSize) { 
m->mothurOut(lookup[i]->getGroup() + " contains " + toString(lookup[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine(); delete lookup[i]; }else { Groups.push_back(lookup[i]->getGroup()); temp.push_back(lookup[i]); } } lookup = temp; m->setGroups(Groups); } if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. I cannot run the command."); m->mothurOutEndLine(); m->control_pressed = true; delete input; return 0; } } /******************************************************/ //comparison breakup to be used by different processes later numGroups = lookup.size(); lines.resize(processors); for (int i = 0; i < processors; i++) { lines[i].start = int (sqrt(float(i)/float(processors)) * numGroups); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numGroups); } /******************************************************/ //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label. set<string> processedLabels; set<string> userLabels = labels; //as long as you are not at the end of the file or done with the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { if (mult) { m->mothurRemove(outAllFileName); } m->mothurRemove(outputFileName); delete input; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for(int i=0;i<sumCalculators.size();i++){ delete sumCalculators[i]; } m->clearGroups(); return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, outputFileName, outAllFileName); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); 
process(lookup, outputFileName, outAllFileName); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //get next line to process //prevent memory leak for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { if (mult) { m->mothurRemove(outAllFileName); } m->mothurRemove(outputFileName); delete input; for(int i=0;iclearGroups(); return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup, outputFileName, outAllFileName); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //reset groups parameter m->clearGroups(); for(int i=0;icontrol_pressed) { m->mothurRemove(outAllFileName); m->mothurRemove(outputFileName); return 0; } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(outputFileName); m->mothurOutEndLine(); if (mult) { m->mothurOut(outAllFileName); m->mothurOutEndLine(); outputTypes["summary"].push_back(outAllFileName); } for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } outputTypes["summary"].push_back(outputFileName); m->mothurOutEndLine(); return 0; } catch(exception& e) { 
m->errorOut(e, "SummarySharedCommand", "execute"); exit(1); } } /***********************************************************/ int SummarySharedCommand::printSims(ostream& out, vector< vector<double> >& simMatrix) { try { out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); //output num seqs out << simMatrix.size() << endl; if (output == "lt") { for (int b = 0; b < simMatrix.size(); b++) { out << lookup[b]->getGroup(); for (int n = 0; n < b; n++) { if (m->control_pressed) { return 0; } out << '\t' << simMatrix[b][n]; } out << endl; } }else{ for (int b = 0; b < simMatrix.size(); b++) { out << lookup[b]->getGroup(); for (int n = 0; n < simMatrix[b].size(); n++) { if (m->control_pressed) { return 0; } out << '\t' << simMatrix[b][n]; } out << endl; } } return 0; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "printSims"); exit(1); } } /***********************************************************/ int SummarySharedCommand::process(vector<SharedRAbundVector*> thisLookup, string sumFileName, string sumAllFileName) { try { vector< vector< vector<seqDist> > > calcDistsTotals; //each iter, one for each calc, then each groupCombos dists. 
this will be used to make .dist files vector< vector > calcDists; calcDists.resize(sumCalculators.size()); for (int thisIter = 0; thisIter < iters+1; thisIter++) { vector thisItersLookup = thisLookup; if (subsample && (thisIter != 0)) { //we want the summary results for the whole dataset, then the subsampling SubSample sample; vector tempLabels; //dont need since we arent printing the sampled sharedRabunds //make copy of lookup so we don't get access violations vector newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } tempLabels = sample.getSample(newLookup, subsampleSize); thisItersLookup = newLookup; } if(processors == 1){ driver(thisItersLookup, 0, numGroups, sumFileName+".temp", sumAllFileName+".temp", calcDists); m->appendFiles((sumFileName + ".temp"), sumFileName); m->mothurRemove((sumFileName + ".temp")); if (mult) { m->appendFiles((sumAllFileName + ".temp"), sumAllFileName); m->mothurRemove((sumAllFileName + ".temp")); } }else{ int process = 1; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ driver(thisItersLookup, lines[process].start, lines[process].end, sumFileName + m->mothurGetpid(process) + ".temp", sumAllFileName + m->mothurGetpid(process) + ".temp", calcDists); //only do 
this if you want a distance file if (createPhylip) { string tempdistFileName = m->getRootName(m->getSimpleName(sumFileName)) + m->mothurGetpid(process) + ".dist"; ofstream outtemp; m->openOutputFile(tempdistFileName, outtemp); for (int i = 0; i < calcDists.size(); i++) { outtemp << calcDists[i].size() << endl; for (int j = 0; j < calcDists[i].size(); j++) { outtemp << calcDists[i][j].seq1 << '\t' << calcDists[i][j].seq2 << '\t' << calcDists[i][j].dist << endl; } } outtemp.close(); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(sumFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(sumAllFileName + (toString(processIDS[i]) + ".temp")); if (createPhylip) { m->mothurRemove(m->getRootName(m->getSimpleName(sumFileName)) + (toString(processIDS[i]) + ".dist")); } } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. 
//for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(sumFileName + (toString(processIDS[i]) + ".temp"));m->mothurRemove(sumAllFileName + (toString(processIDS[i]) + ".temp"));if (createPhylip) { m->mothurRemove(m->getRootName(m->getSimpleName(sumFileName)) + (toString(processIDS[i]) + ".dist")); }}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); /******************************************************/ //comparison breakup to be used by different processes later lines.clear(); numGroups = thisLookup.size(); lines.resize(processors); for (int i = 0; i < processors; i++) { lines[i].start = int (sqrt(float(i)/float(processors)) * numGroups); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numGroups); } /******************************************************/ calcDists.clear(); calcDists.resize(sumCalculators.size()); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ driver(thisItersLookup, lines[process].start, lines[process].end, sumFileName + m->mothurGetpid(process) + ".temp", sumAllFileName + m->mothurGetpid(process) + ".temp", calcDists); //only do this if you want a distance file if (createPhylip) { string tempdistFileName = m->getRootName(m->getSimpleName(sumFileName)) + m->mothurGetpid(process) + ".dist"; ofstream outtemp; m->openOutputFile(tempdistFileName, outtemp); for (int i = 0; i < calcDists.size(); i++) { outtemp << calcDists[i].size() << endl; for (int j = 0; j < calcDists[i].size(); j++) { outtemp << calcDists[i][j].seq1 << '\t' << calcDists[i][j].seq2 << '\t' << calcDists[i][j].dist << endl; } } outtemp.close(); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); 
m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent do your part driver(thisItersLookup, lines[0].start, lines[0].end, sumFileName + m->mothurGetpid(process) + ".temp", sumAllFileName + m->mothurGetpid(process) + ".temp", calcDists); m->appendFiles((sumFileName + m->mothurGetpid(process) + ".temp"), sumFileName); m->mothurRemove((sumFileName + m->mothurGetpid(process) + ".temp")); if (mult) { m->appendFiles((sumAllFileName + m->mothurGetpid(process) + ".temp"), sumAllFileName); } //force parent to wait until all the processes are done for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { m->appendFiles((sumFileName + toString(processIDS[i]) + ".temp"), sumFileName); m->mothurRemove((sumFileName + toString(processIDS[i]) + ".temp")); if (mult) { m->mothurRemove((sumAllFileName + toString(processIDS[i]) + ".temp")); } if (createPhylip) { string tempdistFileName = m->getRootName(m->getSimpleName(sumFileName)) + toString(processIDS[i]) + ".dist"; ifstream intemp; m->openInputFile(tempdistFileName, intemp); for (int k = 0; k < calcDists.size(); k++) { int size = 0; intemp >> size; m->gobble(intemp); for (int j = 0; j < size; j++) { int seq1 = 0; int seq2 = 0; float dist = 1.0; intemp >> seq1 >> seq2 >> dist; m->gobble(intemp); seqDist tempDist(seq1, seq2, dist); calcDists[k].push_back(tempDist); } } intemp.close(); m->mothurRemove(tempdistFileName); } } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the summarySharedData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to pass results vectors. 
////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=1; i newLookup; for (int k = 0; k < thisLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisLookup[k]->getLabel()); temp->setGroup(thisLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } // Allocate memory for thread data. summarySharedData* tempSum = new summarySharedData((sumFileName+toString(i)+".temp"), m, lines[i].start, lines[i].end, Estimators, newLookup); pDataArray.push_back(tempSum); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MySummarySharedThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //parent do your part driver(thisItersLookup, lines[0].start, lines[0].end, sumFileName +"0.temp", sumAllFileName + "0.temp", calcDists); m->appendFiles((sumFileName + "0.temp"), sumFileName); m->mothurRemove((sumFileName + "0.temp")); if (mult) { m->appendFiles((sumAllFileName + "0.temp"), sumAllFileName); } //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ if (pDataArray[i]->count != (pDataArray[i]->end-pDataArray[i]->start)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end-pDataArray[i]->start) + " groups assigned to it, quitting. 
\n"); m->control_pressed = true; } m->appendFiles((sumFileName + toString(processIDS[i]) + ".temp"), sumFileName); m->mothurRemove((sumFileName + toString(processIDS[i]) + ".temp")); for (int j = 0; j < pDataArray[i]->thisLookup.size(); j++) { delete pDataArray[i]->thisLookup[j]; } if (createPhylip) { for (int k = 0; k < calcDists.size(); k++) { int size = pDataArray[i]->calcDists[k].size(); for (int j = 0; j < size; j++) { calcDists[k].push_back(pDataArray[i]->calcDists[k][j]); } } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif } if (subsample && (thisIter != 0)) { //we want the summary results for the whole dataset, then the subsampling calcDistsTotals.push_back(calcDists); //clean up memory for (int i = 0; i < thisItersLookup.size(); i++) { delete thisItersLookup[i]; } thisItersLookup.clear(); }else { if (createPhylip) { for (int i = 0; i < calcDists.size(); i++) { if (m->control_pressed) { break; } //initialize matrix vector< vector > matrix; //square matrix to represent the distance matrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcDists[i].size(); j++) { int row = calcDists[i][j].seq1; int column = calcDists[i][j].seq2; double dist = calcDists[i][j].dist; matrix[row][column] = dist; matrix[column][row] = dist; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[calc]"] = sumCalculators[i]->getName(); variables["[distance]"] = thisLookup[0]->getLabel(); variables["[outputtag]"] = output; variables["[tag2]"] = ""; string distFileName = getOutputFileName("phylip",variables); outputNames.push_back(distFileName); outputTypes["phylip"].push_back(distFileName); ofstream outDist; m->openOutputFile(distFileName, outDist); outDist.setf(ios::fixed, ios::floatfield); outDist.setf(ios::showpoint); printSims(outDist, matrix); outDist.close(); } } } for (int i = 0; i < calcDists.size(); i++) { 
calcDists[i].clear(); } } if (iters != 0) { //we need to find the average distance and standard deviation for each group's distance vector< vector<seqDist> > calcAverages = m->getAverages(calcDistsTotals); //find standard deviation vector< vector<seqDist> > stdDev = m->getStandardDeviation(calcDistsTotals, calcAverages); //print results for (int i = 0; i < calcDists.size(); i++) { vector< vector<double> > matrix; //square matrix to represent the distance matrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); } vector< vector<double> > stdmatrix; //square matrix to represent the stdDev stdmatrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { stdmatrix[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcAverages[i].size(); j++) { int row = calcAverages[i][j].seq1; int column = calcAverages[i][j].seq2; float dist = calcAverages[i][j].dist; float stdDist = stdDev[i][j].dist; matrix[row][column] = dist; matrix[column][row] = dist; stdmatrix[row][column] = stdDist; stdmatrix[column][row] = stdDist; } map<string, string> variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(sharedfile)); variables["[calc]"] = sumCalculators[i]->getName(); variables["[distance]"] = thisLookup[0]->getLabel(); variables["[outputtag]"] = output; variables["[tag2]"] = "ave"; string distFileName = getOutputFileName("phylip",variables); outputNames.push_back(distFileName); outputTypes["phylip"].push_back(distFileName); ofstream outAve; m->openOutputFile(distFileName, outAve); outAve.setf(ios::fixed, ios::floatfield); outAve.setf(ios::showpoint); printSims(outAve, matrix); outAve.close(); variables["[tag2]"] = "std"; distFileName = getOutputFileName("phylip",variables); outputNames.push_back(distFileName); outputTypes["phylip"].push_back(distFileName); ofstream outSTD; m->openOutputFile(distFileName, outSTD); outSTD.setf(ios::fixed, ios::floatfield); outSTD.setf(ios::showpoint); printSims(outSTD, stdmatrix); 
outSTD.close(); } } return 0; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "process"); exit(1); } } /**************************************************************************************************/ int SummarySharedCommand::driver(vector<SharedRAbundVector*> thisLookup, int start, int end, string sumFile, string sumAllFile, vector< vector<seqDist> >& calcDists) { try { //loop through calculators and add to the all-groups file for every calc that can handle multiple groups if (mult == true) { ofstream outAll; m->openOutputFile(sumAllFile, outAll); //output label outAll << thisLookup[0]->getLabel() << '\t'; //output groups names string outNames = ""; for (int j = 0; j < thisLookup.size(); j++) { outNames += thisLookup[j]->getGroup() + "-"; } outNames = outNames.substr(0, outNames.length()-1); //rip off extra '-'; outAll << outNames << '\t'; for(int i=0;i<sumCalculators.size();i++){ if (sumCalculators[i]->getMultiple() == true) { sumCalculators[i]->getValues(thisLookup); if (m->control_pressed) { outAll.close(); return 1; } outAll << '\t'; sumCalculators[i]->print(outAll); } } outAll << endl; outAll.close(); } ofstream outputFileHandle; m->openOutputFile(sumFile, outputFileHandle); vector<SharedRAbundVector*> subset; for (int k = start; k < end; k++) { // pass cdd each set of groups to compare for (int l = 0; l < k; l++) { outputFileHandle << thisLookup[0]->getLabel() << '\t'; subset.clear(); //clear out old pair of sharedrabunds //add new pair of sharedrabunds subset.push_back(thisLookup[k]); subset.push_back(thisLookup[l]); //sort groups to be alphanumeric if (thisLookup[k]->getGroup() > thisLookup[l]->getGroup()) { outputFileHandle << (thisLookup[l]->getGroup() +'\t' + thisLookup[k]->getGroup()) << '\t'; //print out groups }else{ outputFileHandle << (thisLookup[k]->getGroup() +'\t' + thisLookup[l]->getGroup()) << '\t'; //print out groups } for(int i=0;i<sumCalculators.size();i++) { if (sumCalculators[i]->getNeedsAll()) { //load subset with rest of lookup for those calcs that need everyone to calc for a pair for (int w = 0; w < thisLookup.size(); w++) { if ((w != k) && (w != l)) { subset.push_back(thisLookup[w]); } 
} } vector tempdata = sumCalculators[i]->getValues(subset); //saves the calculator outputs if (m->control_pressed) { outputFileHandle.close(); return 1; } outputFileHandle << '\t'; sumCalculators[i]->print(outputFileHandle); seqDist temp(l, k, tempdata[0]); calcDists[i].push_back(temp); } outputFileHandle << endl; } } outputFileHandle.close(); return 0; } catch(exception& e) { m->errorOut(e, "SummarySharedCommand", "driver"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/commands/summarysharedcommand.h000066400000000000000000000274131255543666200226420ustar00rootroot00000000000000#ifndef SUMMARYSHAREDCOMMAND_H #define SUMMARYSHAREDCOMMAND_H /* * summarysharedcommand.h * Dotur * * Created by Sarah Westcott on 1/2/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "sharedrabundvector.h" #include "inputdata.h" #include "calculator.h" #include "validcalculator.h" #include "sharedsobscollectsummary.h" #include "sharedchao1.h" #include "sharedace.h" #include "sharednseqs.h" #include "sharedjabund.h" #include "sharedsorabund.h" #include "sharedjclass.h" #include "sharedsorclass.h" #include "sharedjest.h" #include "sharedsorest.h" #include "sharedthetayc.h" #include "sharedthetan.h" #include "sharedkstest.h" #include "whittaker.h" #include "sharedochiai.h" #include "sharedanderbergs.h" #include "sharedkulczynski.h" #include "sharedkulczynskicody.h" #include "sharedlennon.h" #include "sharedmorisitahorn.h" #include "sharedbraycurtis.h" #include "sharedjackknife.h" #include "whittaker.h" #include "odum.h" #include "canberra.h" #include "structeuclidean.h" #include "structchord.h" #include "hellinger.h" #include "manhattan.h" #include "structpearson.h" #include "soergel.h" #include "spearman.h" #include "structkulczynski.h" #include "structchi2.h" #include "speciesprofile.h" #include "hamming.h" #include "gower.h" #include "memchi2.h" 
#include "memchord.h" #include "memeuclidean.h" #include "mempearson.h" #include "sharedjsd.h" #include "sharedrjsd.h" class SummarySharedCommand : public Command { public: SummarySharedCommand(string); SummarySharedCommand(); ~SummarySharedCommand() {} vector<string> setParameters(); string getCommandName() { return "summary.shared"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Summary.shared"; } string getDescription() { return "generate a summary file containing calculator values for each line in the OTU data and for all possible comparisons between groups"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector<linePair> lines; vector<Calculator*> sumCalculators; InputData* input; bool abort, allLines, mult, all, createPhylip, subsample; set<string> labels; //holds labels to be used string label, calc, groups, sharedfile, output; vector<string> Estimators, Groups, outputNames; vector<SharedRAbundVector*> lookup; string format, outputDir; int numGroups, processors, subsampleSize, iters; int process(vector<SharedRAbundVector*>, string, string); int driver(vector<SharedRAbundVector*>, int, int, string, string, vector< vector<seqDist> >&); int printSims(ostream&, vector< vector<double> >&); }; /**************************************************************************************************/ //custom data structure for threads to use. //main process handling the calcs that can do more than 2 groups // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct summarySharedData { vector<SharedRAbundVector*> thisLookup; vector< vector<seqDist> > calcDists; vector<string> Estimators; unsigned long long start; unsigned long long end; MothurOut* m; string sumFile; int count; summarySharedData(){} summarySharedData(string sf, MothurOut* mout, unsigned long long st, unsigned long long en, vector<string> est, vector<SharedRAbundVector*> lu) { sumFile = sf; m = mout; start = st; end = en; Estimators = est; thisLookup = lu; count=0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MySummarySharedThreadFunction(LPVOID lpParam){ summarySharedData* pDataArray; pDataArray = (summarySharedData*)lpParam; try { vector<Calculator*> sumCalculators; ValidCalculators validCalculator; for (int i=0; i<pDataArray->Estimators.size(); i++) { if (validCalculator.isValidCalculator("sharedsummary", pDataArray->Estimators[i]) == true) { if (pDataArray->Estimators[i] == "sharedsobs") { sumCalculators.push_back(new SharedSobsCS()); }else if (pDataArray->Estimators[i] == "sharedchao") { sumCalculators.push_back(new SharedChao1()); }else if (pDataArray->Estimators[i] == "sharedace") { sumCalculators.push_back(new SharedAce()); }else if (pDataArray->Estimators[i] == "jabund") { sumCalculators.push_back(new JAbund()); }else if (pDataArray->Estimators[i] == "sorabund") { sumCalculators.push_back(new SorAbund()); }else if (pDataArray->Estimators[i] == "jclass") { sumCalculators.push_back(new Jclass()); }else if (pDataArray->Estimators[i] == "sorclass") { sumCalculators.push_back(new SorClass()); }else if (pDataArray->Estimators[i] == "jest") { sumCalculators.push_back(new Jest()); }else if (pDataArray->Estimators[i] == "sorest") { sumCalculators.push_back(new SorEst()); }else if (pDataArray->Estimators[i] == "thetayc") { sumCalculators.push_back(new ThetaYC()); }else if (pDataArray->Estimators[i] == "thetan") { sumCalculators.push_back(new ThetaN()); }else if 
(pDataArray->Estimators[i] == "kstest") { sumCalculators.push_back(new KSTest()); }else if (pDataArray->Estimators[i] == "sharednseqs") { sumCalculators.push_back(new SharedNSeqs()); }else if (pDataArray->Estimators[i] == "ochiai") { sumCalculators.push_back(new Ochiai()); }else if (pDataArray->Estimators[i] == "anderberg") { sumCalculators.push_back(new Anderberg()); }else if (pDataArray->Estimators[i] == "kulczynski") { sumCalculators.push_back(new Kulczynski()); }else if (pDataArray->Estimators[i] == "kulczynskicody") { sumCalculators.push_back(new KulczynskiCody()); }else if (pDataArray->Estimators[i] == "lennon") { sumCalculators.push_back(new Lennon()); }else if (pDataArray->Estimators[i] == "morisitahorn") { sumCalculators.push_back(new MorHorn()); }else if (pDataArray->Estimators[i] == "braycurtis") { sumCalculators.push_back(new BrayCurtis()); }else if (pDataArray->Estimators[i] == "whittaker") { sumCalculators.push_back(new Whittaker()); }else if (pDataArray->Estimators[i] == "odum") { sumCalculators.push_back(new Odum()); }else if (pDataArray->Estimators[i] == "canberra") { sumCalculators.push_back(new Canberra()); }else if (pDataArray->Estimators[i] == "structeuclidean") { sumCalculators.push_back(new StructEuclidean()); }else if (pDataArray->Estimators[i] == "structchord") { sumCalculators.push_back(new StructChord()); }else if (pDataArray->Estimators[i] == "hellinger") { sumCalculators.push_back(new Hellinger()); }else if (pDataArray->Estimators[i] == "manhattan") { sumCalculators.push_back(new Manhattan()); }else if (pDataArray->Estimators[i] == "structpearson") { sumCalculators.push_back(new StructPearson()); }else if (pDataArray->Estimators[i] == "soergel") { sumCalculators.push_back(new Soergel()); }else if (pDataArray->Estimators[i] == "spearman") { sumCalculators.push_back(new Spearman()); }else if (pDataArray->Estimators[i] == "structkulczynski") { sumCalculators.push_back(new StructKulczynski()); }else if (pDataArray->Estimators[i] == 
"speciesprofile") { sumCalculators.push_back(new SpeciesProfile()); }else if (pDataArray->Estimators[i] == "hamming") { sumCalculators.push_back(new Hamming()); }else if (pDataArray->Estimators[i] == "structchi2") { sumCalculators.push_back(new StructChi2()); }else if (pDataArray->Estimators[i] == "gower") { sumCalculators.push_back(new Gower()); }else if (pDataArray->Estimators[i] == "memchi2") { sumCalculators.push_back(new MemChi2()); }else if (pDataArray->Estimators[i] == "memchord") { sumCalculators.push_back(new MemChord()); }else if (pDataArray->Estimators[i] == "memeuclidean") { sumCalculators.push_back(new MemEuclidean()); }else if (pDataArray->Estimators[i] == "mempearson") { sumCalculators.push_back(new MemPearson()); }else if (pDataArray->Estimators[i] == "jsd") { sumCalculators.push_back(new JSD()); }else if (pDataArray->Estimators[i] == "rjsd") { sumCalculators.push_back(new RJSD()); } } } pDataArray->calcDists.resize(sumCalculators.size()); ofstream outputFileHandle; pDataArray->m->openOutputFile(pDataArray->sumFile, outputFileHandle); vector subset; for (int k = pDataArray->start; k < pDataArray->end; k++) { // pass cdd each set of groups to compare pDataArray->count++; for (int l = 0; l < k; l++) { outputFileHandle << pDataArray->thisLookup[0]->getLabel() << '\t'; subset.clear(); //clear out old pair of sharedrabunds //add new pair of sharedrabunds subset.push_back(pDataArray->thisLookup[k]); subset.push_back(pDataArray->thisLookup[l]); //sort groups to be alphanumeric if (pDataArray->thisLookup[k]->getGroup() > pDataArray->thisLookup[l]->getGroup()) { outputFileHandle << (pDataArray->thisLookup[l]->getGroup() +'\t' + pDataArray->thisLookup[k]->getGroup()) << '\t'; //print out groups }else{ outputFileHandle << (pDataArray->thisLookup[k]->getGroup() +'\t' + pDataArray->thisLookup[l]->getGroup()) << '\t'; //print out groups } for(int i=0;igetNeedsAll()) { //load subset with rest of lookup for those calcs that need everyone to calc for a pair for (int 
w = 0; w < pDataArray->thisLookup.size(); w++) { if ((w != k) && (w != l)) { subset.push_back(pDataArray->thisLookup[w]); } } } vector tempdata = sumCalculators[i]->getValues(subset); //saves the calculator outputs if (pDataArray->m->control_pressed) { for(int i=0;iprint(outputFileHandle); seqDist temp(l, k, tempdata[0]); pDataArray->calcDists[i].push_back(temp); } outputFileHandle << endl; } } outputFileHandle.close(); for(int i=0;im->errorOut(e, "SummarySharedCommand", "MySummarySharedThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/summarytaxcommand.cpp000066400000000000000000000327511255543666200225240ustar00rootroot00000000000000/* * summarytaxcommand.cpp * Mothur * * Created by westcott on 9/23/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "summarytaxcommand.h" #include "phylosummary.h" //********************************************************************************************************************** vector SummaryTaxCommand::setParameters(){ try { CommandParameter ptaxonomy("taxonomy", "InputTypes", "", "", "none", "none", "none","summary",false,true,true); parameters.push_back(ptaxonomy); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter preftaxonomy("reftaxonomy", "InputTypes", "", "", "none", "none", "none","",false,false); parameters.push_back(preftaxonomy); CommandParameter prelabund("relabund", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(prelabund); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", 
"String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SummaryTaxCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string SummaryTaxCommand::getHelpString(){ try { string helpString = ""; helpString += "The summary.tax command reads a taxonomy file and an optional name file, and summarizes the taxonomy information.\n"; helpString += "The summary.tax command parameters are taxonomy, count, group, name and relabund. taxonomy is required, unless you have a valid current taxonomy file.\n"; helpString += "The name parameter allows you to enter a name file associated with your taxonomy file. \n"; helpString += "The group parameter allows you add a group file so you can have the summary totals broken up by group.\n"; helpString += "The count parameter allows you add a count file so you can have the summary totals broken up by group.\n"; helpString += "The reftaxonomy parameter allows you give the name of the reference taxonomy file used when you classified your sequences. It is not required, but providing it will keep the rankIDs in the summary file static.\n"; helpString += "The relabund parameter allows you to indicate you want the summary file values to be relative abundances rather than raw abundances. Default=F. \n"; helpString += "The summary.tax command should be in the following format: \n"; helpString += "summary.tax(taxonomy=yourTaxonomyFile) \n"; helpString += "Note: No spaces between parameter labels (i.e. 
taxonomy), '=' and parameters (i.e.yourTaxonomyFile).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "SummaryTaxCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string SummaryTaxCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "summary") { pattern = "[filename],tax.summary"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "SummaryTaxCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** SummaryTaxCommand::SummaryTaxCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["summary"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "SummaryTaxCommand", "SummaryTaxCommand"); exit(1); } } //*************************************************************************************************************** SummaryTaxCommand::SummaryTaxCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not 
found"){ inputDir = ""; } else { string path; it = parameters.find("taxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["taxonomy"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("reftaxonomy"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["reftaxonomy"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //initialize outputTypes vector tempOutNames; outputTypes["summary"] = tempOutNames; //check for required parameters taxfile = validParameter.validFile(parameters, "taxonomy", true); if (taxfile == "not open") { abort = true; } else if (taxfile == "not found") { taxfile = m->getTaxonomyFile(); if (taxfile != "") { m->mothurOut("Using " + taxfile + " as input file for the taxonomy parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current taxonomy file and the taxonomy parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setTaxonomyFile(taxfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { groupfile = ""; abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } refTaxonomy = validParameter.validFile(parameters, "reftaxonomy", true); if (refTaxonomy == "not found") { refTaxonomy = ""; m->mothurOut("reftaxonomy is not required, but if given will keep the rankIDs in the summary file static."); m->mothurOutEndLine(); } else if (refTaxonomy == "not open") { refTaxonomy = ""; abort = true; } //if the user 
changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(taxfile); //if user entered a file with a path then preserve it } string temp = validParameter.validFile(parameters, "relabund", false); if (temp == "not found"){ temp = "false"; } relabund = m->isTrue(temp); if (countfile == "") { if (namefile == "") { vector files; files.push_back(taxfile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "SummaryTaxCommand", "SummaryTaxCommand"); exit(1); } } //*************************************************************************************************************** int SummaryTaxCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } int start = time(NULL); GroupMap* groupMap = NULL; CountTable* ct = NULL; if (groupfile != "") { groupMap = new GroupMap(groupfile); groupMap->readMap(); }else if (countfile != "") { ct = new CountTable(); ct->readTable(countfile, true, false); } PhyloSummary* taxaSum; if (countfile != "") { if (refTaxonomy != "") { taxaSum = new PhyloSummary(refTaxonomy, ct, relabund); } else { taxaSum = new PhyloSummary(ct, relabund); } }else { if (refTaxonomy != "") { taxaSum = new PhyloSummary(refTaxonomy, groupMap, relabund); } else { taxaSum = new PhyloSummary(groupMap, relabund); } } if (m->control_pressed) { if (groupMap != NULL) { delete groupMap; } if (ct != NULL) { delete ct; } delete taxaSum; return 0; } int numSeqs = 0; if ((namefile == "") || (countfile != "")) { numSeqs = taxaSum->summarize(taxfile); } else if (namefile != "") { map > nameMap; map >::iterator itNames; m->readNames(namefile, nameMap); if (m->control_pressed) { if (groupMap != NULL) { delete groupMap; } if (ct != NULL) { delete ct; } delete taxaSum; return 0; } ifstream in; m->openInputFile(taxfile, in); //read in users taxonomy file and add 
sequences to tree string name, taxon; while(!in.eof()){ if (m->control_pressed) { break; } in >> name >> taxon; m->gobble(in); itNames = nameMap.find(name); if (itNames == nameMap.end()) { m->mothurOut("[ERROR]: " + name + " is not in your name file please correct."); m->mothurOutEndLine(); exit(1); }else{ for (int i = 0; i < itNames->second.size(); i++) { numSeqs++; taxaSum->addSeqToTree(itNames->second[i], taxon); //add it as many times as there are identical seqs } itNames->second.clear(); nameMap.erase(itNames->first); } } in.close(); }else { numSeqs = taxaSum->summarize(taxfile); } if (m->control_pressed) { if (groupMap != NULL) { delete groupMap; } if (ct != NULL) { delete ct; } delete taxaSum; return 0; } //print summary file ofstream outTaxTree; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(taxfile)); string summaryFile = getOutputFileName("summary",variables); m->openOutputFile(summaryFile, outTaxTree); taxaSum->print(outTaxTree); outTaxTree.close(); delete taxaSum; if (groupMap != NULL) { delete groupMap; } if (ct != NULL) { delete ct; } if (m->control_pressed) { m->mothurRemove(summaryFile); return 0; } m->mothurOutEndLine(); m->mothurOut("It took " + toString(time(NULL) - start) + " secs to create the summary file for " + toString(numSeqs) + " sequences."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); m->mothurOut(summaryFile); m->mothurOutEndLine(); outputNames.push_back(summaryFile); outputTypes["summary"].push_back(summaryFile); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "SummaryTaxCommand", "execute"); exit(1); } } /**************************************************************************************/ mothur-1.36.1/source/commands/summarytaxcommand.h000066400000000000000000000023011255543666200221550ustar00rootroot00000000000000#ifndef SUMMARYTAXCOMMAND_H #define SUMMARYTAXCOMMAND_H /* * 
summarytaxcommand.h * Mothur * * Created by westcott on 9/23/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "command.hpp" #include "counttable.h" /**************************************************************************************************/ class SummaryTaxCommand : public Command { public: SummaryTaxCommand(string); SummaryTaxCommand(); ~SummaryTaxCommand(){} vector setParameters(); string getCommandName() { return "summary.tax"; } string getCommandCategory() { return "Phylotype Analysis"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Summary.tax"; } string getDescription() { return "summarize the taxonomies of a set of sequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort, relabund; string taxfile, outputDir, namefile, groupfile, refTaxonomy, countfile; vector outputNames; map nameMap; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/commands/systemcommand.cpp000066400000000000000000000073731255543666200216400ustar00rootroot00000000000000/* * systemcommand.cpp * Mothur * * Created by Sarah Westcott on 7/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "systemcommand.h" //********************************************************************************************************************** vector SystemCommand::setParameters(){ try { CommandParameter pcommand("command", "String", "", "", "", "", "","",false,false); parameters.push_back(pcommand); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "SystemCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** SystemCommand::SystemCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
			string commandOption = validParameter.validFile(parameters, "command", false);
			if (commandOption == "not found") { commandOption = ""; }
			else { command = commandOption; }

			if ((option == "") && (commandOption == "")) { m->mothurOut("You must enter a command to run."); m->mothurOutEndLine(); abort = true; }
			else if (commandOption == "") {
				//check for outputdir and inputdir parameters
				string::size_type commaPos = option.find_first_of(',');

				//if there is a comma then grab string up to that pos
				if (commaPos != string::npos) {
					option = option.substr(0, commaPos);
				}

				command = option;
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "SystemCommand", "SystemCommand");
		exit(1);
	}
}
//**********************************************************************************************************************
string SystemCommand::getHelpString(){
	try {
		string helpString = "";
		helpString += "The system command allows you to execute a system command from within mothur.\n";
		helpString += "The system command has no parameters.\n";
		helpString += "The system command should be in the following format: system(yourCommand).\n";
		helpString += "Example system(clear).\n";
		return helpString;
	}
	catch(exception& e) {
		m->errorOut(e, "SystemCommand", "getHelpString");
		exit(1);
	}
}
//**********************************************************************************************************************
int SystemCommand::execute(){
	try {

		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		//if the command does not already contain a redirect, send its output to a temporary file
		//so that it can be echoed through mothurOut (and therefore into the logfile)
		bool usedRedirect = false;
		if ((command.find('>')) == string::npos) {
			command += " > ./commandScreen.output 2>&1";
			usedRedirect = true;
		}

		system(command.c_str());

		if (usedRedirect) {
			ifstream in;
			string filename = "./commandScreen.output";
			m->openInputFile(filename, in, "no error");

			//read the captured output one character at a time; get(c) fails cleanly at end of file
			//and, unlike testing the character itself, does not stop early on a '\0' byte
			string output = "";
			char c;
			while (in.get(c)) { output += c; }
			in.close();

			m->mothurOut(output); m->mothurOutEndLine();
			m->mothurRemove(filename);
		}

		return 0;
	}
	catch(exception&
e) {
		m->errorOut(e, "SystemCommand", "execute");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/commands/systemcommand.h000066400000000000000000000016561255543666200213030ustar00rootroot00000000000000
#ifndef SYSTEMCOMMAND_H
#define SYSTEMCOMMAND_H

/*
 * systemcommand.h
 * Mothur
 *
 * Created by Sarah Westcott on 7/8/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"

class SystemCommand : public Command {

	public:
		SystemCommand(string);
		SystemCommand() { setParameters(); abort = true; calledHelp = true; }
		~SystemCommand(){}

		vector<string> setParameters();
		string getCommandName()			{ return "system";		}
		string getCommandCategory()		{ return "General";		}
		string getHelpString();
		string getOutputPattern(string){ return ""; }
		string getCitation() { return "http://www.mothur.org/wiki/System"; }
		string getDescription()		{ return "execute system commands from within mothur"; }

		int execute();
		void help() { m->mothurOut(getHelpString()); }

	private:
		string command;
		bool abort;
		vector<string> outputNames;
};

#endif
mothur-1.36.1/source/commands/treegroupscommand.cpp000066400000000000000000001521651255543666200225130ustar00rootroot00000000000000
/*
 * treegroupscommand.cpp
 * Mothur
 *
 * Created by Sarah Westcott on 4/8/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "treegroupscommand.h" #include "subsample.h" #include "consensus.h" //********************************************************************************************************************** vector TreeGroupCommand::setParameters(){ try { CommandParameter pshared("shared", "InputTypes", "", "", "PhylipColumnShared", "PhylipColumnShared", "none","tree",false,false,true); parameters.push_back(pshared); CommandParameter pphylip("phylip", "InputTypes", "", "", "PhylipColumnShared", "PhylipColumnShared", "none","tree",false,false); parameters.push_back(pphylip); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "ColumnName","",false,false); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount", "none", "countcolumn","",false,false); parameters.push_back(pcount); CommandParameter pcolumn("column", "InputTypes", "", "", "PhylipColumnShared", "PhylipColumnShared", "ColumnName-countcolumn","tree",false,false); parameters.push_back(pcolumn); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter pcutoff("cutoff", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pcutoff); CommandParameter pprecision("precision", "Number", "", "100", "", "", "","",false,false); parameters.push_back(pprecision); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter pcalc("calc", "Multiple", 
"sharedsobs-sharedchao-sharedace-jabund-sorabund-jclass-sorclass-jest-sorest-thetayc-thetan-kstest-sharednseqs-ochiai-anderberg-kulczynski-kulczynskicody-lennon-morisitahorn-braycurtis-whittaker-odum-canberra-structeuclidean-structchord-hellinger-manhattan-structpearson-soergel-spearman-structkulczynski-speciesprofile-hamming-structchi2-gower-memchi2-memchord-memeuclidean-mempearson-jsd-rjsd", "jclass-thetayc", "", "", "","",true,false,true); parameters.push_back(pcalc); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); //CommandParameter poutput("output", "Multiple", "lt-square", "lt", "", "", "",false,false); parameters.push_back(poutput); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "TreeGroupCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string TreeGroupCommand::getHelpString(){ try { string helpString = ""; ValidCalculators validCalculator; helpString += "The tree.shared command creates a .tre to represent the similiarity between groups or sequences.\n"; helpString += "The tree.shared command parameters are shared, groups, calc, phylip, column, name, cutoff, precision, processors, subsample, iters and label.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included used.\n"; helpString += "The group names are separated by dashes. 
The label parameter allows you to select the distance levels you would like trees created for; labels are also separated by dashes.\n";
		helpString += "The phylip or column parameter is required if you do not provide a shared file, and only one of them may be used. If you use a column file, a name or count file is required.\n";
		helpString += "If you do not provide a cutoff value, 10.00 is assumed. If you do not provide a precision value, 100 is assumed.\n";
		helpString += "The tree.shared command should be in the following format: tree.shared(groups=yourGroups, calc=yourCalcs, label=yourLabels).\n";
		helpString += "The iters parameter allows you to choose the number of times the subsampling is repeated.\n";
		helpString += "The subsample parameter allows you to enter the size per group of the sample, or you can set subsample=T and mothur will use the size of your smallest group. The subsample parameter may only be used with a shared file.\n";
		helpString += "Example tree.shared(groups=A-B-C, calc=jabund-sorabund).\n";
		helpString += "The default value for groups is all the groups in your groupfile.\n";
		helpString += "The default value for calc is jclass-thetayc.\n";
		helpString += "The tree.shared command outputs a .tre file for each calculator you specify at each distance you choose.\n";
		helpString += validCalculator.printCalc("treegroup");
		helpString += "Or the tree.shared command can be in the following format: tree.shared(phylip=yourPhylipFile).\n";
		helpString += "Example tree.shared(phylip=abrecovery.dist).\n";
		helpString += "Note: No spaces between parameter labels (i.e.
groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "TreeGroupCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string TreeGroupCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "tree") { pattern = "[filename],[calc],[distance],[tag],tre-[filename],tre"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "TreeGroupCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** TreeGroupCommand::TreeGroupCommand(){ try { abort = true; calledHelp = true; setParameters(); //initialize outputTypes vector tempOutNames; outputTypes["tree"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "TreeGroupCommand", "TreeGroupCommand"); exit(1); } } //********************************************************************************************************************** TreeGroupCommand::TreeGroupCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser. 
getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["tree"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("phylip"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["phylip"] = inputDir + it->second; } } it = parameters.find("column"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["column"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters phylipfile = validParameter.validFile(parameters, "phylip", true); if (phylipfile == "not open") { phylipfile = ""; abort = true; } else if (phylipfile == "not found") { phylipfile = ""; } else { inputfile = phylipfile; format = "phylip"; m->setPhylipFile(phylipfile); } columnfile = validParameter.validFile(parameters, "column", true); if (columnfile == "not open") { columnfile = ""; abort = true; } else if (columnfile == "not found") { columnfile = ""; } else { inputfile = columnfile; format = "column"; m->setColumnFile(columnfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { inputfile = sharedfile; format = "sharedfile"; m->setSharedFile(sharedfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((phylipfile == "") && (columnfile == "") && (sharedfile == "")) { //is there are current file available for either of these? 
//give priority to shared, then column, then phylip sharedfile = m->getSharedFile(); if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { columnfile = m->getColumnFile(); if (columnfile != "") { inputfile = columnfile; format = "column"; m->mothurOut("Using " + columnfile + " as input file for the column parameter."); m->mothurOutEndLine(); } else { phylipfile = m->getPhylipFile(); if (phylipfile != "") { inputfile = phylipfile; format = "phylip"; m->mothurOut("Using " + phylipfile + " as input file for the phylip parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a shared, phylip or column file."); m->mothurOutEndLine(); abort = true; } } } } else if ((phylipfile != "") && (columnfile != "")) { m->mothurOut("When running the tree.shared command with a distance file you may not use both the column and the phylip parameters."); m->mothurOutEndLine(); abort = true; } if (columnfile != "") { if ((namefile == "") && (countfile == "")){ namefile = m->getNameFile(); if (namefile != "") { m->mothurOut("Using " + namefile + " as input file for the name parameter."); m->mothurOutEndLine(); } else { countfile = m->getCountTableFile(); if (countfile != "") { m->mothurOut("Using " + countfile + " as input file for the count parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You need to provide a namefile or countfile if you are going to use the column format."); m->mothurOutEndLine(); abort = true; } } } } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { calc = "jclass-thetayc"; } else { if (calc == "default") { calc = "jclass-thetayc"; } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } string temp; temp = validParameter.validFile(parameters, "precision", false); if (temp == "not found") { temp = "100"; } m->mothurConvert(temp, precision); temp = validParameter.validFile(parameters, "cutoff", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, cutoff); cutoff += (5 / (precision * 10.0)); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "iters", false); if (temp == "not found") { temp = "1000"; } m->mothurConvert(temp, iters); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (subsample == false) { iters = 1; } if (subsample && (format != "sharedfile")) { m->mothurOut("[ERROR]: the subsample parameter can only be used with 
a shared file.\n"); abort=true; } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = ""; outputDir += m->hasPath(inputfile); //if user entered a file with a path then preserve it } } } catch(exception& e) { m->errorOut(e, "TreeGroupCommand", "TreeGroupCommand"); exit(1); } } //********************************************************************************************************************** TreeGroupCommand::~TreeGroupCommand(){ if (abort == false) { if (format == "sharedfile") { delete input; } else { delete list; } delete ct; } } //********************************************************************************************************************** int TreeGroupCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } if (format == "sharedfile") { ValidCalculators validCalculator; for (int i=0; imothurOut("You have given no valid calculators."); m->mothurOutEndLine(); return 0; } input = new InputData(sharedfile, "sharedfile"); lookup = input->getSharedRAbundVectors(); lastLabel = lookup[0]->getLabel(); if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. 
I cannot run the command."); m->mothurOutEndLine(); return 0; } //used in tree constructor m->runParse = false; //create treemap class from groupmap for tree class to use ct = new CountTable(); set nameMap; map groupMap; set gps; for (int i = 0; i < m->getAllGroups().size(); i++) { nameMap.insert(m->getAllGroups()[i]); gps.insert(m->getAllGroups()[i]); groupMap[m->getAllGroups()[i]] = m->getAllGroups()[i]; } ct->createTable(nameMap, groupMap, gps); //clear globaldatas old tree names if any m->Treenames.clear(); //fills globaldatas tree names //m->Treenames = m->getGroups(); for (int k = 0; k < lookup.size(); k++) { m->Treenames.push_back(lookup[k]->getGroup()); } if (m->control_pressed) { return 0; } //create tree file makeSimsShared(); if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } }else{ //read in dist file filename = inputfile; ReadMatrix* readMatrix; if (format == "column") { readMatrix = new ReadColumnMatrix(filename); } else if (format == "phylip") { readMatrix = new ReadPhylipMatrix(filename); } readMatrix->setCutoff(cutoff); ct = NULL; nameMap = NULL; if(namefile != ""){ nameMap = new NameAssignment(namefile); nameMap->readMap(); readMatrix->read(nameMap); }else if (countfile != "") { ct = new CountTable(); ct->readTable(countfile, true, false); readMatrix->read(ct); }else { readMatrix->read(nameMap); } list = readMatrix->getListVector(); SparseDistanceMatrix* dMatrix = readMatrix->getDMatrix(); //clear globaldatas old tree names if any m->Treenames.clear(); //make treemap if (ct != NULL) { delete ct; } ct = new CountTable(); set nameMap; map groupMap; set gps; for (int i = 0; i < list->getNumBins(); i++) { string bin = list->get(i); nameMap.insert(bin); gps.insert(bin); groupMap[bin] = bin; m->Treenames.push_back(bin); } ct->createTable(nameMap, groupMap, gps); vector namesGroups = ct->getNamesOfGroups(); m->setGroups(namesGroups); //used in tree constructor m->runParse = false; if 
(m->control_pressed) { return 0; }

            vector< vector<double> > matrix = makeSimsDist(dMatrix);
            delete readMatrix;
            delete dMatrix;

            if (m->control_pressed) { return 0; }

            //create a new filename
            map<string, string> variables;
            variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile));
            string outputFile = getOutputFileName("tree",variables);
            outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile);

            Tree* newTree = createTree(matrix);

            if (newTree != NULL) { writeTree(outputFile, newTree); delete newTree; }

            if (m->control_pressed) { return 0; }

            m->mothurOut("Tree complete. "); m->mothurOutEndLine();
        }

        //reset groups parameter
        m->clearGroups();

        //set tree file as new current treefile
        string current = "";
        itTypes = outputTypes.find("tree");
        if (itTypes != outputTypes.end()) {
            if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setTreeFile(current); }
        }

        m->mothurOutEndLine();
        m->mothurOut("Output File Names: "); m->mothurOutEndLine();
        for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); }
        m->mothurOutEndLine();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "execute");
        exit(1);
    }
}
//**********************************************************************************************************************
Tree* TreeGroupCommand::createTree(vector< vector<double> >& simMatrix){
    try {
        //create tree
        t = new Tree(ct, simMatrix);

        if (m->control_pressed) { delete t; t = NULL; return t; }

        //assemble tree
        t->assembleTree();

        return t;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "createTree");
        exit(1);
    }
}
/***********************************************************/
int TreeGroupCommand::writeTree(string out, Tree* T) {
    try {
        //print newick file for the tree that was passed in
        T->createNewickFile(out);

        if (m->control_pressed) { m->mothurRemove(out); outputNames.pop_back(); return 1; }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "writeTree");
        exit(1);
    }
}
/***********************************************************/
void TreeGroupCommand::printSims(ostream& out, vector< vector<double> >& simMatrix) {
    try {
        //loop indices are named row/col so they do not shadow the MothurOut member m
        for (int row = 0; row < simMatrix.size(); row++) {
            //out << lookup[row]->getGroup() << '\t';
            for (int col = 0; col < simMatrix.size(); col++) {
                out << simMatrix[row][col] << '\t';
            }
            out << endl;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "printSims");
        exit(1);
    }
}
/***********************************************************/
vector< vector<double> > TreeGroupCommand::makeSimsDist(SparseDistanceMatrix* matrix) {
    try {
        numGroups = list->size();

        //initialize simMatrix
        vector< vector<double> > simMatrix;
        simMatrix.resize(numGroups);
        for (int k = 0; k < simMatrix.size(); k++) {
            for (int j = 0; j < simMatrix.size(); j++) {
                simMatrix[k].push_back(0.0);
            }
        }

        //go through each cell in the sparse matrix and fill sims
        for (int i = 0; i < matrix->seqVec.size(); i++) {
            for (int j = 0; j < matrix->seqVec[i].size(); j++) {
                //already checked everyone else in row
                if (i < matrix->seqVec[i][j].index) {
                    //similarity = 1 - distance
                    simMatrix[i][matrix->seqVec[i][j].index] = -(matrix->seqVec[i][j].dist -1.0);
                    simMatrix[matrix->seqVec[i][j].index][i] = -(matrix->seqVec[i][j].dist -1.0);

                    if (m->control_pressed) { return simMatrix; }
                }
            }
        }

        return simMatrix;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "makeSimsDist");
        exit(1);
    }
}
/***********************************************************/
int TreeGroupCommand::makeSimsShared() {
    try {
        if (subsample) {
            if (subsampleSize == -1) { //user has not set size, set size = smallest samples size
                subsampleSize = lookup[0]->getNumSeqs();
                for (int i = 1; i < lookup.size(); i++) {
                    int thisSize = lookup[i]->getNumSeqs();
                    if (thisSize < subsampleSize) { subsampleSize = thisSize; }
                }
            }else {
                m->clearGroups();
                Groups.clear();
                m->Treenames.clear();
                vector<SharedRAbundVector*> temp;
                for (int i = 0; i < lookup.size(); i++) {
                    if (lookup[i]->getNumSeqs() < subsampleSize) {
                        m->mothurOut(lookup[i]->getGroup() + " contains " +
toString(lookup[i]->getNumSeqs()) + ". Eliminating."); m->mothurOutEndLine();
                        delete lookup[i];
                    }else {
                        Groups.push_back(lookup[i]->getGroup());
                        temp.push_back(lookup[i]);
                        m->Treenames.push_back(lookup[i]->getGroup());
                    }
                }
                lookup = temp;
                m->setGroups(Groups);
            }

            if (lookup.size() < 2) { m->mothurOut("You have not provided enough valid groups. I cannot run the command."); m->mothurOutEndLine(); m->control_pressed = true; return 0; }
        }

        numGroups = lookup.size();

        //sanity check to make sure processors < numComparisons
        int numDists = 0;
        for(int i=0;i<numGroups;i++){
            for(int j=0;j<i;j++){
                numDists++;
                if (numDists > processors) { break; }
            }
        }
        if (numDists < processors) { processors = numDists; }

        lines.resize(processors);
        for (int i = 0; i < processors; i++) {
            lines[i].start = int (sqrt(float(i)/float(processors)) * numGroups);
            lines[i].end = int (sqrt(float(i+1)/float(processors)) * numGroups);
        }

        set<string> processedLabels;
        set<string> userLabels = labels;

        //as long as you are not at the end of the file or done with the lines you want
        while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) {
            if (m->control_pressed) {
                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
                for(int i = 0 ; i < treeCalculators.size(); i++) { delete treeCalculators[i]; }
                return 1;
            }

            if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){
                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                process(lookup);

                processedLabels.insert(lookup[0]->getLabel());
                userLabels.erase(lookup[0]->getLabel());
            }

            if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) {
                string saveLabel = lookup[0]->getLabel();

                for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; }
                lookup = input->getSharedRAbundVectors(lastLabel);

                m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine();
                process(lookup);

                processedLabels.insert(lookup[0]->getLabel());
                userLabels.erase(lookup[0]->getLabel());

                //restore real lastlabel to save below
                lookup[0]->setLabel(saveLabel);
            }

            lastLabel =
lookup[0]->getLabel(); //get next line to process for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } for(int i = 0 ; i < treeCalculators.size(); i++) { delete treeCalculators[i]; } return 1; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); process(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } for(int i = 0 ; i < treeCalculators.size(); i++) { delete treeCalculators[i]; } return 0; } catch(exception& e) { m->errorOut(e, "TreeGroupCommand", "makeSimsShared"); exit(1); } } /***********************************************************/ int TreeGroupCommand::process(vector thisLookup) { try{ vector< vector< vector > > calcDistsTotals; //each iter, one for each calc, then each groupCombos dists. 
this will be used to make .dist files vector< vector > calcDists; calcDists.resize(treeCalculators.size()); for (int thisIter = 0; thisIter < iters; thisIter++) { vector thisItersLookup = thisLookup; if (subsample) { SubSample sample; vector tempLabels; //dont need since we arent printing the sampled sharedRabunds //make copy of lookup so we don't get access violations vector newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } tempLabels = sample.getSample(newLookup, subsampleSize); thisItersLookup = newLookup; } if(processors == 1){ driver(thisItersLookup, 0, numGroups, calcDists); }else{ int process = 1; vector processIDS; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ driver(thisItersLookup, lines[process].start, lines[process].end, calcDists); string tempdistFileName = m->getRootName(m->getSimpleName(sharedfile)) + m->mothurGetpid(process) + ".dist"; ofstream outtemp; m->openOutputFile(tempdistFileName, outtemp); for (int i = 0; i < calcDists.size(); i++) { outtemp << calcDists[i].size() << endl; for (int j = 0; j < calcDists[i].size(); j++) { outtemp << calcDists[i][j].seq1 << '\t' << calcDists[i][j].seq2 << '\t' << calcDists[i][j].dist << endl; } } outtemp.close(); exit(0); }else { 
m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(m->getRootName(m->getSimpleName(sharedfile)) + (toString(processIDS[i]) + ".dist")); } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(m->getRootName(m->getSimpleName(sharedfile)) + (toString(processIDS[i]) + ".dist"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); /******************************************************/ //comparison breakup to be used by different processes later lines.clear(); numGroups = thisLookup.size(); lines.resize(processors); for (int i = 0; i < processors; i++) { lines[i].start = int (sqrt(float(i)/float(processors)) * numGroups); lines[i].end = int (sqrt(float(i+1)/float(processors)) * numGroups); } /******************************************************/ calcDists.clear(); calcDists.resize(treeCalculators.size()); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); process++; }else if (pid == 0){ driver(thisItersLookup, lines[process].start, lines[process].end, calcDists); string tempdistFileName = m->getRootName(m->getSimpleName(sharedfile)) + m->mothurGetpid(process) + ".dist"; ofstream outtemp; m->openOutputFile(tempdistFileName, outtemp); for (int i = 0; i < calcDists.size(); i++) { outtemp << calcDists[i].size() << endl; for (int j = 0; j < calcDists[i].size(); j++) { outtemp << calcDists[i][j].seq1 << '\t' << calcDists[i][j].seq2 
<< '\t' << calcDists[i][j].dist << endl; } } outtemp.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent do your part driver(thisItersLookup, lines[0].start, lines[0].end, calcDists); //force parent to wait until all the processes are done for (int i = 0; i < processIDS.size(); i++) { int temp = processIDS[i]; wait(&temp); } for (int i = 0; i < processIDS.size(); i++) { string tempdistFileName = m->getRootName(m->getSimpleName(sharedfile)) + toString(processIDS[i]) + ".dist"; ifstream intemp; m->openInputFile(tempdistFileName, intemp); for (int k = 0; k < calcDists.size(); k++) { int size = 0; intemp >> size; m->gobble(intemp); for (int j = 0; j < size; j++) { int seq1 = 0; int seq2 = 0; float dist = 1.0; intemp >> seq1 >> seq2 >> dist; m->gobble(intemp); seqDist tempDist(seq1, seq2, dist); calcDists[k].push_back(tempDist); } } intemp.close(); m->mothurRemove(tempdistFileName); } #else ////////////////////////////////////////////////////////////////////////////////////////////////////// //Windows version shared memory, so be careful when passing variables through the treeSharedData struct. //Above fork() will clone, so memory is separate, but that's not the case with windows, //Taking advantage of shared memory to pass results vectors. ////////////////////////////////////////////////////////////////////////////////////////////////////// vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. 
for( int i=1; i newLookup; for (int k = 0; k < thisItersLookup.size(); k++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thisItersLookup[k]->getLabel()); temp->setGroup(thisItersLookup[k]->getGroup()); newLookup.push_back(temp); } //for each bin for (int k = 0; k < thisItersLookup[0]->getNumBins(); k++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } for (int j = 0; j < thisItersLookup.size(); j++) { newLookup[j]->push_back(thisItersLookup[j]->getAbundance(k), thisItersLookup[j]->getGroup()); } } // Allocate memory for thread data. treeSharedData* tempSum = new treeSharedData(m, lines[i].start, lines[i].end, Estimators, newLookup); pDataArray.push_back(tempSum); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyTreeSharedThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } //parent do your part driver(thisItersLookup, lines[0].start, lines[0].end, calcDists); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ if (pDataArray[i]->count != (pDataArray[i]->end-pDataArray[i]->start)) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->end-pDataArray[i]->start) + " groups assigned to it, quitting. 
\n"); m->control_pressed = true; } for (int j = 0; j < pDataArray[i]->thisLookup.size(); j++) { delete pDataArray[i]->thisLookup[j]; } for (int k = 0; k < calcDists.size(); k++) { int size = pDataArray[i]->calcDists[k].size(); for (int j = 0; j < size; j++) { calcDists[k].push_back(pDataArray[i]->calcDists[k][j]); } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif } calcDistsTotals.push_back(calcDists); if (subsample) { //clean up memory for (int i = 0; i < thisItersLookup.size(); i++) { delete thisItersLookup[i]; } thisItersLookup.clear(); for (int i = 0; i < calcDists.size(); i++) { calcDists[i].clear(); } } if (m->debug) { m->mothurOut("[DEBUG]: iter = " + toString(thisIter) + ".\n"); } } if (m->debug) { m->mothurOut("[DEBUG]: done with iters.\n"); } if (iters != 1) { //we need to find the average distance and standard deviation for each groups distance vector< vector > calcAverages = m->getAverages(calcDistsTotals); if (m->debug) { m->mothurOut("[DEBUG]: found averages.\n"); } //create average tree for each calc for (int i = 0; i < calcDists.size(); i++) { vector< vector > matrix; //square matrix to represent the distance matrix.resize(thisLookup.size()); for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); } for (int j = 0; j < calcAverages[i].size(); j++) { int row = calcAverages[i][j].seq1; int column = calcAverages[i][j].seq2; float dist = calcAverages[i][j].dist; matrix[row][column] = dist; matrix[column][row] = dist; } //create a new filename map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile)); variables["[calc]"] = treeCalculators[i]->getName(); variables["[distance]"] = thisLookup[0]->getLabel(); variables["[tag]"] = "ave"; string outputFile = getOutputFileName("tree",variables); outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile); //creates tree from similarity matrix and write out file Tree* newTree = createTree(matrix); if (newTree 
!= NULL) { writeTree(outputFile, newTree); delete newTree; } //free the averaged tree once it is written
            }

            if (m->debug) { m->mothurOut("[DEBUG]: done averages trees.\n"); }

            //create all trees for each calc and find their consensus tree
            for (int i = 0; i < calcDists.size(); i++) {
                if (m->control_pressed) { break; }

                //create a new filename
                map<string, string> variables;
                variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile));
                variables["[calc]"] = treeCalculators[i]->getName();
                variables["[distance]"] = thisLookup[0]->getLabel();
                variables["[tag]"] = "all";
                string outputFile = getOutputFileName("tree",variables);
                outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile);

                ofstream outAll;
                m->openOutputFile(outputFile, outAll);

                vector<Tree*> trees;
                for (int myIter = 0; myIter < iters; myIter++) {
                    if(m->control_pressed) { break; }

                    //initialize matrix
                    vector< vector<double> > matrix; //square matrix to represent the distance
                    matrix.resize(thisLookup.size());
                    for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); }

                    for (int j = 0; j < calcDistsTotals[myIter][i].size(); j++) {
                        int row = calcDistsTotals[myIter][i][j].seq1;
                        int column = calcDistsTotals[myIter][i][j].seq2;
                        double dist = calcDistsTotals[myIter][i][j].dist;

                        matrix[row][column] = dist;
                        matrix[column][row] = dist;
                    }

                    //creates tree from similarity matrix and write out file
                    Tree* newTree = createTree(matrix);
                    if (newTree != NULL) {
                        newTree->print(outAll);
                        trees.push_back(newTree);
                    }
                }
                outAll.close();

                if (m->control_pressed) {
                    for (int k = 0; k < trees.size(); k++) { delete trees[k]; }
                    trees.clear();
                    break; //the trees were just freed, so do not build a consensus from them
                }

                if (m->debug) { m->mothurOut("[DEBUG]: done all trees.\n"); }

                Consensus consensus;
                //clear old tree names if any
                m->Treenames.clear(); m->Treenames = m->getGroups(); //may have changed if subsample eliminated groups
                Tree* conTree = consensus.getTree(trees);

                if (m->debug) { m->mothurOut("[DEBUG]: done cons tree.\n"); }

                //create a new filename
                variables["[tag]"] = "cons";
                string conFile =
getOutputFileName("tree",variables);
                outputNames.push_back(conFile); outputTypes["tree"].push_back(conFile);
                ofstream outTree;
                m->openOutputFile(conFile, outTree);

                if (conTree != NULL) { conTree->print(outTree, "boot"); delete conTree; }
            }

        }else {

            for (int i = 0; i < calcDists.size(); i++) {
                if (m->control_pressed) { break; }

                //initialize matrix
                vector< vector<double> > matrix; //square matrix to represent the distance
                matrix.resize(thisLookup.size());
                for (int k = 0; k < thisLookup.size(); k++) { matrix[k].resize(thisLookup.size(), 0.0); }

                for (int j = 0; j < calcDists[i].size(); j++) {
                    int row = calcDists[i][j].seq1;
                    int column = calcDists[i][j].seq2;
                    double dist = calcDists[i][j].dist;

                    matrix[row][column] = dist;
                    matrix[column][row] = dist;
                }

                //create a new filename
                map<string, string> variables;
                variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(inputfile));
                variables["[calc]"] = treeCalculators[i]->getName();
                variables["[distance]"] = thisLookup[0]->getLabel();
                variables["[tag]"] = "";
                string outputFile = getOutputFileName("tree",variables);
                outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile);

                //creates tree from similarity matrix and write out file
                Tree* newTree = createTree(matrix);
                if (newTree != NULL) { writeTree(outputFile, newTree); delete newTree; }
            }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "process");
        exit(1);
    }
}
/**************************************************************************************************/
int TreeGroupCommand::driver(vector<SharedRAbundVector*> thisLookup, int start, int end, vector< vector<seqDist> >& calcDists) {
    try {
        vector<SharedRAbundVector*> subset;
        for (int k = start; k < end; k++) { // pass cdd each set of groups to compare

            for (int l = 0; l < k; l++) {

                if (k != l) { //we don't need the similarity of a group to itself
                    subset.clear(); //clear out old pair of sharedrabunds
                    //add new pair of sharedrabunds
                    subset.push_back(thisLookup[k]); subset.push_back(thisLookup[l]);

                    for(int i=0;i<treeCalculators.size();i++) {

                        if (treeCalculators[i]->getNeedsAll()) {
                            //load subset with rest of lookup for those calcs that need everyone to calc for a pair
                            for (int w = 0; w < thisLookup.size(); w++) {
                                if ((w != k) && (w != l)) { subset.push_back(thisLookup[w]); }
                            }
                        }

                        vector<double> tempdata = treeCalculators[i]->getValues(subset); //saves the calculator outputs

                        if (m->control_pressed) { return 1; }

                        //store the pair as a distance: 1 - similarity
                        seqDist temp(l, k, -(tempdata[0]-1.0));
                        calcDists[i].push_back(temp);
                    }
                }
            }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeGroupCommand", "driver");
        exit(1);
    }
}
/***********************************************************/
mothur-1.36.1/source/commands/treegroupscommand.h000066400000000000000000000273461255543666200221620ustar00rootroot00000000000000
#ifndef TREEGROUPCOMMAND_H
#define TREEGROUPCOMMAND_H

/*
 *  treegroupscommand.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 4/8/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "command.hpp"
#include "inputdata.h"
#include "groupmap.h"
#include "validcalculator.h"
#include "tree.h"
#include "counttable.h"
#include "readmatrix.hpp"
#include "readcolumn.h"
#include "readphylip.h"
#include "sharedsobscollectsummary.h"
#include "sharedchao1.h"
#include "sharedace.h"
#include "sharednseqs.h"
#include "sharedjabund.h"
#include "sharedsorabund.h"
#include "sharedjclass.h"
#include "sharedsorclass.h"
#include "sharedjest.h"
#include "sharedsorest.h"
#include "sharedthetayc.h"
#include "sharedthetan.h"
#include "sharedkstest.h"
#include "whittaker.h"
#include "sharedochiai.h"
#include "sharedanderbergs.h"
#include "sharedkulczynski.h"
#include "sharedkulczynskicody.h"
#include "sharedlennon.h"
#include "sharedmorisitahorn.h"
#include "sharedbraycurtis.h"
#include "sharedjackknife.h"
#include "odum.h"
#include "canberra.h"
#include "structeuclidean.h"
#include "structchord.h"
#include "hellinger.h"
#include "manhattan.h"
#include "structpearson.h"
#include "soergel.h"
#include "spearman.h"
#include "structkulczynski.h"
#include "structchi2.h"
#include "speciesprofile.h"
#include "hamming.h"
#include "gower.h"
#include "memchi2.h"
#include "memchord.h"
#include "memeuclidean.h"
#include "mempearson.h"
#include "sharedrjsd.h"
#include "sharedjsd.h"

/* This command creates a tree file for each similarity calculator at each distance level, using various calculators to find the similarity between groups.
   The user can select the lines or labels they wish to use as well as the groups they would like included.
   They can also use as many or as few calculators as they wish. */

class TreeGroupCommand : public Command {

public:
    TreeGroupCommand(string);
    TreeGroupCommand();
    ~TreeGroupCommand();

    vector<string> setParameters();
    string getCommandName() { return "tree.shared"; }
    string getCommandCategory() { return "OTU-Based Approaches"; }

    string getHelpString();
    string getOutputPattern(string);
    string getCitation() { return "http://www.mothur.org/wiki/Tree.shared"; }
    string getDescription() { return "generate a tree file that describes the dissimilarity among groups"; }

    int execute();
    void help() { m->mothurOut(getHelpString()); }

private:
    vector<linePair> lines;

    Tree* createTree(vector< vector<double> >&);
    void printSims(ostream&, vector< vector<double> >&);
    int makeSimsShared();
    vector< vector<double> > makeSimsDist(SparseDistanceMatrix*);
    int writeTree(string, Tree*);
    int driver(vector<SharedRAbundVector*>, int, int, vector< vector<seqDist> >&);

    NameAssignment* nameMap;
    ListVector* list;
    CountTable* ct;
    Tree* t;
    InputData* input;
    vector<Calculator*> treeCalculators;
    vector<SharedRAbundVector*> lookup;
    string lastLabel;
    string format, groupNames, filename, sharedfile, countfile, inputfile;
    int numGroups, subsampleSize, iters, processors;
    ofstream out;
    float precision, cutoff;

    bool abort, allLines, subsample;
    set<string> labels; //holds labels to be used
    string phylipfile, columnfile, namefile, calc, groups, label, outputDir;
    vector<string> Estimators, Groups, outputNames; //holds estimators to be used

    //if the user enters label "0.06" and there is no "0.06" in their file use the next lowest label.
    int process(vector<SharedRAbundVector*>);
};

/**************************************************************************************************/
//custom data structure for threads to use.
// This is passed by void pointer so it can be any data type
// that can be passed using a single void pointer (LPVOID).
struct treeSharedData {
    vector<SharedRAbundVector*> thisLookup;
    vector< vector<seqDist> > calcDists;
    vector<string> Estimators;
    unsigned long long start;
    unsigned long long end;
    MothurOut* m;
    int count;

    treeSharedData(){}
    treeSharedData(MothurOut* mout, unsigned long long st, unsigned long long en, vector<string> est, vector<SharedRAbundVector*> lu) {
        m = mout;
        start = st;
        end = en;
        Estimators = est;
        thisLookup = lu;
        count=0;
    }
};
/**************************************************************************************************/
#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
#else
static DWORD WINAPI MyTreeSharedThreadFunction(LPVOID lpParam){
    treeSharedData* pDataArray;
    pDataArray = (treeSharedData*)lpParam;
    try {

        vector<Calculator*> treeCalculators;
        ValidCalculators validCalculator;
        for (int i=0; i<pDataArray->Estimators.size(); i++) {
            if (validCalculator.isValidCalculator("matrix", pDataArray->Estimators[i]) == true) {
                if (pDataArray->Estimators[i] == "sharedsobs") {
                    treeCalculators.push_back(new SharedSobsCS());
                }else if (pDataArray->Estimators[i] == "sharedchao") {
                    treeCalculators.push_back(new SharedChao1());
                }else if (pDataArray->Estimators[i] == "sharedace") {
                    treeCalculators.push_back(new SharedAce());
                }else if (pDataArray->Estimators[i] == "jabund") {
                    treeCalculators.push_back(new JAbund());
                }else if (pDataArray->Estimators[i] == "sorabund") {
                    treeCalculators.push_back(new SorAbund());
                }else if (pDataArray->Estimators[i] == "jclass") {
                    treeCalculators.push_back(new Jclass());
                }else if (pDataArray->Estimators[i] == "sorclass") {
                    treeCalculators.push_back(new SorClass());
                }else if (pDataArray->Estimators[i] == "jest") {
                    treeCalculators.push_back(new Jest());
                }else if (pDataArray->Estimators[i] == "sorest") {
treeCalculators.push_back(new SorEst()); }else if (pDataArray->Estimators[i] == "thetayc") { treeCalculators.push_back(new ThetaYC()); }else if (pDataArray->Estimators[i] == "thetan") { treeCalculators.push_back(new ThetaN()); }else if (pDataArray->Estimators[i] == "kstest") { treeCalculators.push_back(new KSTest()); }else if (pDataArray->Estimators[i] == "sharednseqs") { treeCalculators.push_back(new SharedNSeqs()); }else if (pDataArray->Estimators[i] == "ochiai") { treeCalculators.push_back(new Ochiai()); }else if (pDataArray->Estimators[i] == "anderberg") { treeCalculators.push_back(new Anderberg()); }else if (pDataArray->Estimators[i] == "kulczynski") { treeCalculators.push_back(new Kulczynski()); }else if (pDataArray->Estimators[i] == "kulczynskicody") { treeCalculators.push_back(new KulczynskiCody()); }else if (pDataArray->Estimators[i] == "lennon") { treeCalculators.push_back(new Lennon()); }else if (pDataArray->Estimators[i] == "morisitahorn") { treeCalculators.push_back(new MorHorn()); }else if (pDataArray->Estimators[i] == "braycurtis") { treeCalculators.push_back(new BrayCurtis()); }else if (pDataArray->Estimators[i] == "whittaker") { treeCalculators.push_back(new Whittaker()); }else if (pDataArray->Estimators[i] == "odum") { treeCalculators.push_back(new Odum()); }else if (pDataArray->Estimators[i] == "canberra") { treeCalculators.push_back(new Canberra()); }else if (pDataArray->Estimators[i] == "structeuclidean") { treeCalculators.push_back(new StructEuclidean()); }else if (pDataArray->Estimators[i] == "structchord") { treeCalculators.push_back(new StructChord()); }else if (pDataArray->Estimators[i] == "hellinger") { treeCalculators.push_back(new Hellinger()); }else if (pDataArray->Estimators[i] == "manhattan") { treeCalculators.push_back(new Manhattan()); }else if (pDataArray->Estimators[i] == "structpearson") { treeCalculators.push_back(new StructPearson()); }else if (pDataArray->Estimators[i] == "soergel") { treeCalculators.push_back(new Soergel()); 
}else if (pDataArray->Estimators[i] == "spearman") { treeCalculators.push_back(new Spearman()); }else if (pDataArray->Estimators[i] == "structkulczynski") { treeCalculators.push_back(new StructKulczynski()); }else if (pDataArray->Estimators[i] == "speciesprofile") { treeCalculators.push_back(new SpeciesProfile()); }else if (pDataArray->Estimators[i] == "hamming") { treeCalculators.push_back(new Hamming()); }else if (pDataArray->Estimators[i] == "structchi2") { treeCalculators.push_back(new StructChi2()); }else if (pDataArray->Estimators[i] == "gower") { treeCalculators.push_back(new Gower()); }else if (pDataArray->Estimators[i] == "memchi2") { treeCalculators.push_back(new MemChi2()); }else if (pDataArray->Estimators[i] == "memchord") { treeCalculators.push_back(new MemChord()); }else if (pDataArray->Estimators[i] == "memeuclidean") { treeCalculators.push_back(new MemEuclidean()); }else if (pDataArray->Estimators[i] == "mempearson") { treeCalculators.push_back(new MemPearson()); }else if (pDataArray->Estimators[i] == "jsd") { treeCalculators.push_back(new JSD()); }else if (pDataArray->Estimators[i] == "rjsd") { treeCalculators.push_back(new RJSD()); } } } pDataArray->calcDists.resize(treeCalculators.size()); vector subset; for (int k = pDataArray->start; k < pDataArray->end; k++) { // pass cdd each set of groups to compare pDataArray->count++; for (int l = 0; l < k; l++) { if (k != l) { //we dont need to similiarity of a groups to itself subset.clear(); //clear out old pair of sharedrabunds //add new pair of sharedrabunds subset.push_back(pDataArray->thisLookup[k]); subset.push_back(pDataArray->thisLookup[l]); for(int i=0;igetNeedsAll()) { //load subset with rest of lookup for those calcs that need everyone to calc for a pair for (int w = 0; w < pDataArray->thisLookup.size(); w++) { if ((w != k) && (w != l)) { subset.push_back(pDataArray->thisLookup[w]); } } } vector tempdata = treeCalculators[i]->getValues(subset); //saves the calculator outputs if 
(pDataArray->m->control_pressed) { return 1; } seqDist temp(l, k, -(tempdata[0]-1.0)); pDataArray->calcDists[i].push_back(temp); } } } } for(int i=0;im->errorOut(e, "TreeGroupsCommand", "MyTreeSharedThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/trimflowscommand.cpp000066400000000000000000001631701255543666200223400ustar00rootroot00000000000000/* * trimflowscommand.cpp * Mothur * * Created by Pat Schloss on 12/22/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "trimflowscommand.h" #include "needlemanoverlap.hpp" //********************************************************************************************************************** vector TrimFlowsCommand::setParameters(){ try { CommandParameter pflow("flow", "InputTypes", "", "", "none", "none", "none","flow-file",false,true,true); parameters.push_back(pflow); CommandParameter poligos("oligos", "InputTypes", "", "", "none", "none", "none","",false,false,true); parameters.push_back(poligos); CommandParameter preorient("checkorient", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(preorient); CommandParameter pmaxhomop("maxhomop", "Number", "", "9", "", "", "","",false,false); parameters.push_back(pmaxhomop); CommandParameter pmaxflows("maxflows", "Number", "", "450", "", "", "","",false,false); parameters.push_back(pmaxflows); CommandParameter pminflows("minflows", "Number", "", "450", "", "", "","",false,false); parameters.push_back(pminflows); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(pbdiffs); CommandParameter pldiffs("ldiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pldiffs); CommandParameter psdiffs("sdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psdiffs); CommandParameter ptdiffs("tdiffs", "Number", 
"", "0", "", "", "","",false,false); parameters.push_back(ptdiffs); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter psignal("signal", "Number", "", "0.50", "", "", "","",false,false); parameters.push_back(psignal); CommandParameter pnoise("noise", "Number", "", "0.70", "", "", "","",false,false); parameters.push_back(pnoise); CommandParameter pallfiles("allfiles", "Boolean", "", "t", "", "", "","",false,false); parameters.push_back(pallfiles); CommandParameter porder("order", "Multiple", "A-B-I", "A", "", "", "","",false,false, true); parameters.push_back(porder); CommandParameter pfasta("fasta", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pfasta); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "TrimFlowsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string TrimFlowsCommand::getHelpString(){ try { string helpString = ""; helpString += "The trim.flows command reads a flowgram file and creates .....\n"; helpString += "The oligos parameter allows you to provide an oligos file.\n"; helpString += "The maxhomop parameter allows you to set a maximum homopolymer length. \n"; helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the sequence. 
The default is pdiffs + bdiffs + sdiffs + ldiffs.\n"; helpString += "The checkorient parameter will check look for the reverse compliment of the barcode or primer in the sequence. The default is false.\n"; helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. The default is 0.\n"; helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n"; helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. The default is 0.\n"; helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n"; helpString += "The order parameter options are A, B or I. Default=A. A = TACG and B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATG
ATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC and I = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n";
        helpString += "For more details please check out the wiki http://www.mothur.org/wiki/Trim.flows.\n";
        return helpString;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimFlowsCommand", "getHelpString");
        exit(1);
    }
}
//**********************************************************************************************************************
string TrimFlowsCommand::getOutputPattern(string type) {
    try {
        string pattern = "";

        if (type == "flow") { pattern = "[filename],[tag],flow"; }
        else if (type == "fasta") { pattern = "[filename],flow.fasta"; }
        else if (type == "file") { pattern = "[filename],flow.files"; }
        else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; }

        return pattern;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimFlowsCommand", "getOutputPattern");
        exit(1);
    }
}
//**********************************************************************************************************************
TrimFlowsCommand::TrimFlowsCommand(){
    try {
        abort = true; calledHelp = true;
        setParameters();
        vector<string> tempOutNames;
        outputTypes["flow"] = tempOutNames;
        outputTypes["fasta"] = tempOutNames;
        outputTypes["file"] = tempOutNames;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimFlowsCommand", "TrimFlowsCommand");
        exit(1);
    }
}
//**********************************************************************************************************************
TrimFlowsCommand::TrimFlowsCommand(string option) {
    try {
        abort = false; calledHelp = false;
        comboStarts = 0;

        //allow user to run help
        if(option == "help") { help(); abort = true; calledHelp = true; }
        else if(option == "citation") { citation(); abort = true; calledHelp = true;}
        else {
            vector<string> myArray = setParameters();

            OptionParser parser(option);
            map<string, string> parameters = parser.getParameters();

            ValidParameters validParameter;
            map<string, string>::iterator it;

            //check to make sure all parameters are valid for command
            for (it = parameters.begin(); it != parameters.end(); it++) {
                if
(validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; }
            }

            //initialize outputTypes
            vector<string> tempOutNames;
            outputTypes["flow"] = tempOutNames;
            outputTypes["fasta"] = tempOutNames;
            outputTypes["file"] = tempOutNames;

            //if the user changes the input directory, the command factory will send this info to us in the output parameter
            string inputDir = validParameter.validFile(parameters, "inputdir", false);
            if (inputDir == "not found"){ inputDir = ""; }
            else {
                string path;
                it = parameters.find("flow");
                //user has given a flow file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path, then add inputdir; otherwise leave the path alone
                    if (path == "") { parameters["flow"] = inputDir + it->second; }
                }

                it = parameters.find("oligos");
                //user has given an oligos file
                if(it != parameters.end()){
                    path = m->hasPath(it->second);
                    //if the user has not given a path, then add inputdir; otherwise leave the path alone
                    if (path == "") { parameters["oligos"] = inputDir + it->second; }
                }
            }

            //check for required parameters
            flowFileName = validParameter.validFile(parameters, "flow", true);
            if (flowFileName == "not found") {
                flowFileName = m->getFlowFile();
                if (flowFileName != "") { m->mothurOut("Using " + flowFileName + " as input file for the flow parameter."); m->mothurOutEndLine(); }
                else { m->mothurOut("No valid current flow file. You must provide a flow file."); m->mothurOutEndLine(); abort = true; }
            }else if (flowFileName == "not open") { flowFileName = ""; abort = true; }

            //if the user changes the output directory, the command factory will send this info to us in the output parameter
            outputDir = validParameter.validFile(parameters, "outputdir", false);
            if (outputDir == "not found"){
                outputDir = "";
                outputDir += m->hasPath(flowFileName); //if the user entered a file with a path, then preserve it
            }

            //check for optional parameters and set defaults
            // ...at some point we should add some additional type checking...
            string temp;
            temp = validParameter.validFile(parameters, "minflows", false);
            if (temp == "not found") { temp = "450"; }
            m->mothurConvert(temp, minFlows);

            temp = validParameter.validFile(parameters, "maxflows", false);
            if (temp == "not found") { temp = "450"; }
            m->mothurConvert(temp, maxFlows);

            temp = validParameter.validFile(parameters, "oligos", true);
            if (temp == "not found") { oligoFileName = ""; }
            else if(temp == "not open") { abort = true; }
            else { oligoFileName = temp; m->setOligosFile(oligoFileName); }

            //set fasta on every path, so that an explicit fasta=F does not leave it unassigned
            temp = validParameter.validFile(parameters, "fasta", false);
            if (temp == "not found"){ temp = "F"; }
            fasta = m->isTrue(temp);

            temp = validParameter.validFile(parameters, "maxhomop", false);
            if (temp == "not found"){ temp = "9"; }
            m->mothurConvert(temp, maxHomoP);

            temp = validParameter.validFile(parameters, "signal", false);
            if (temp == "not found"){ temp = "0.50"; }
            m->mothurConvert(temp, signal);

            temp = validParameter.validFile(parameters, "noise", false);
            if (temp == "not found"){ temp = "0.70"; }
            m->mothurConvert(temp, noise);

            temp = validParameter.validFile(parameters, "bdiffs", false);
            if (temp == "not found"){ temp = "0"; }
            m->mothurConvert(temp, bdiffs);

            temp = validParameter.validFile(parameters, "pdiffs", false);
            if (temp == "not found"){ temp = "0"; }
            m->mothurConvert(temp, pdiffs);

            temp = validParameter.validFile(parameters, "ldiffs", false);
            if (temp == "not found") { temp = "0"; }
            m->mothurConvert(temp, ldiffs);

            temp = validParameter.validFile(parameters, "sdiffs", false);
            if (temp == "not found") { temp = "0"; }
            m->mothurConvert(temp, sdiffs);

            //tdiffs defaults to the sum of the per-oligo limits
            temp = validParameter.validFile(parameters, "tdiffs", false);
            if (temp == "not found") { int tempTotal = pdiffs + bdiffs + ldiffs + sdiffs; temp = toString(tempTotal); }
            m->mothurConvert(temp, tdiffs);
            if(tdiffs == 0){ tdiffs = bdiffs + pdiffs + ldiffs + sdiffs; }

            temp = validParameter.validFile(parameters, "processors", false);
            if (temp == "not found"){ temp = m->getProcessors(); }
m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "order", false); if (temp == "not found"){ temp = "A"; } if (temp.length() > 1) { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGAT
CGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } else { if (toupper(temp[0]) == 'A') { flowOrder = "TACG"; } else if(toupper(temp[0]) == 'B'){ flowOrder = "TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAG
TCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC"; } else if(toupper(temp[0]) == 'I'){ flowOrder = "TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC"; } else { m->mothurOut("[ERROR]: " + temp + " is not a valid option for order. order options are A, B, or I. 
A = TACG, B = TACGTACGTACGATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATAGATCGCATGACGATCGCATATCGTCAGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGTAGTCGAGCATCATCTGACGCAGTACGTGCATGATCTCAGTCAGCAGCTATGTCAGTGCATGCATAGATCGCATGACGATCGCATATCGTCAGTGCAGTGACTGATCGTCATCAGCTAGCATCGACTGCATGATCTCAGTCAGCAGC, and I = 
TACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGCTACGTACGTCTGAGCATCGATCGATGTACAGC.\n"); abort=true; } } if(oligoFileName == "") { allFiles = 0; } else { allFiles = 1; } temp = validParameter.validFile(parameters, "checkorient", false); if (temp == "not found") { temp = "F"; } reorient = m->isTrue(temp); numBarcodes = 0; numFPrimers = 0; numRPrimers = 0; numLinkers = 0; numSpacers = 0; } } catch(exception& e) { m->errorOut(e, "TrimFlowsCommand", "TrimFlowsCommand"); exit(1); } } //*************************************************************************************************************** int TrimFlowsCommand::execute(){ try{ if (abort == true) { if (calledHelp) { return 0; } return 2; } map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(flowFileName)); string fastaFileName = getOutputFileName("fasta",variables); if(fasta){ outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); } variables["[tag]"] = "trim"; string trimFlowFileName = getOutputFileName("flow",variables); outputNames.push_back(trimFlowFileName); outputTypes["flow"].push_back(trimFlowFileName); variables["[tag]"] = "scrap"; string scrapFlowFileName = getOutputFileName("flow",variables); outputNames.push_back(scrapFlowFileName); outputTypes["flow"].push_back(scrapFlowFileName); vector flowFilePos; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || 
(__unix) flowFilePos = getFlowFileBreaks(); for (int i = 0; i < (flowFilePos.size()-1); i++) { lines.push_back(new linePair(flowFilePos[i], flowFilePos[(i+1)])); } #else ifstream in; m->openInputFile(flowFileName, in); in >> numFlows; in.close(); ///////////////////////////////////////// until I fix multiple processors for windows ////////////////// processors = 1; ///////////////////////////////////////// until I fix multiple processors for windows ////////////////// if (processors == 1) { lines.push_back(new linePair(0, 1000)); }else { int numFlowLines; flowFilePos = m->setFilePosEachLine(flowFileName, numFlowLines); flowFilePos.erase(flowFilePos.begin() + 1); numFlowLines--; //figure out how many sequences you have to process int numSeqsPerProcessor = numFlowLines / processors; cout << numSeqsPerProcessor << '\t' << numFlowLines << endl; for (int i = 0; i < processors; i++) { int startIndex = i * numSeqsPerProcessor; if(i == (processors - 1)){ numSeqsPerProcessor = numFlowLines - i * numSeqsPerProcessor; } lines.push_back(new linePair(flowFilePos[startIndex], numSeqsPerProcessor)); cout << flowFilePos[startIndex] << '\t' << numSeqsPerProcessor << endl; } } #endif vector > barcodePrimerComboFileNames; if(oligoFileName != ""){ getOligos(barcodePrimerComboFileNames); } if(processors == 1){ driverCreateTrim(flowFileName, trimFlowFileName, scrapFlowFileName, fastaFileName, barcodePrimerComboFileNames, lines[0]); }else{ createProcessesCreateTrim(flowFileName, trimFlowFileName, scrapFlowFileName, fastaFileName, barcodePrimerComboFileNames); } if (m->control_pressed) { return 0; } string flowFilesFileName; ofstream output; if(allFiles){ set namesAlreadyProcessed; set namesToRemove; flowFilesFileName = getOutputFileName("file",variables); m->openOutputFile(flowFilesFileName, output); for(int i=0;imothurRemove(barcodePrimerComboFileNames[i][j]); namesToRemove.insert(barcodePrimerComboFileNames[i][j]); } else{ output << m->getFullPathName(barcodePrimerComboFileNames[i][j]) 
<< endl; } namesAlreadyProcessed.insert(barcodePrimerComboFileNames[i][j]); } } } } } output.close(); //remove names for outputFileNames, just cleans up the output vector outputNames2; for(int i = 0; i < outputNames.size(); i++) { if (namesToRemove.count(outputNames[i]) == 0) { outputNames2.push_back(outputNames[i]); } } outputNames = outputNames2; } else{ flowFilesFileName = getOutputFileName("file",variables); m->openOutputFile(flowFilesFileName, output); output << m->getFullPathName(trimFlowFileName) << endl; output.close(); } outputTypes["file"].push_back(flowFilesFileName); outputNames.push_back(flowFilesFileName); m->setFileFile(flowFilesFileName); m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", "execute"); exit(1); } } //*************************************************************************************************************** int TrimFlowsCommand::driverCreateTrim(string flowFileName, string trimFlowFileName, string scrapFlowFileName, string fastaFileName, vector > thisBarcodePrimerComboFileNames, linePair* line){ try { ofstream trimFlowFile; m->openOutputFile(trimFlowFileName, trimFlowFile); trimFlowFile.setf(ios::fixed, ios::floatfield); trimFlowFile.setf(ios::showpoint); ofstream scrapFlowFile; m->openOutputFile(scrapFlowFileName, scrapFlowFile); scrapFlowFile.setf(ios::fixed, ios::floatfield); scrapFlowFile.setf(ios::showpoint); ofstream fastaFile; if(fasta){ m->openOutputFile(fastaFileName, fastaFile); } ifstream flowFile; m->openInputFile(flowFileName, flowFile); flowFile.seekg(line->start); if(line->start == 0){ flowFile >> numFlows; m->gobble(flowFile); scrapFlowFile << numFlows << endl; trimFlowFile << maxFlows << endl; if(allFiles){ for(int i=0;iopenOutputFile(thisBarcodePrimerComboFileNames[i][j], temp); temp << 
maxFlows << endl;
                            temp.close();
                        }
                    }
                }
            }
        }

        FlowData flowData(numFlows, signal, noise, maxHomoP, flowOrder);
        //cout << " driver flowdata address " << &flowData << &flowFile << endl;
        int count = 0;
        bool moreSeqs = 1;

        TrimOligos* trimOligos = NULL;
        if (pairedOligos) { trimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, oligos.getPairedPrimers(), oligos.getPairedBarcodes(), false); }
        else { trimOligos = new TrimOligos(pdiffs, bdiffs, ldiffs, sdiffs, oligos.getPrimers(), oligos.getBarcodes(), oligos.getReversePrimers(), oligos.getLinkers(), oligos.getSpacers()); }

        TrimOligos* rtrimOligos = NULL;
        if (reorient) {
            rtrimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, oligos.getReorientedPairedPrimers(), oligos.getReorientedPairedBarcodes(), false);
            numBarcodes = oligos.getReorientedPairedBarcodes().size();
        }

        while(moreSeqs) {
            if (m->control_pressed) { break; }

            int success = 1;
            int currentSeqDiffs = 0;
            string trashCode = "";
            string commentString = "";

            flowData.getNext(flowFile);
            flowData.capFlows(maxFlows);

            Sequence currSeq = flowData.getSequence();
            //for reorient
            Sequence savedSeq(currSeq.getName(), currSeq.getAligned());

            if(!flowData.hasMinFlows(minFlows)){ //screen to see if sequence is of a minimum number of flows
                success = 0;
                trashCode += 'l';
            }
            if(!flowData.hasGoodHomoP()){ //screen to see if sequence meets the maximum homopolymer limit
                success = 0;
                trashCode += 'h';
            }

            int primerIndex = 0;
            int barcodeIndex = 0;

            if(numLinkers != 0){
                success = trimOligos->stripLinker(currSeq);
                if(success > ldiffs) { trashCode += 'k'; }
                else{ currentSeqDiffs += success; }
            }

            if (m->debug) { m->mothurOut("[DEBUG]: " + currSeq.getName() + " " + currSeq.getUnaligned() + "\n"); }

            if(numBarcodes != 0){
                vector<int> results = trimOligos->stripBarcode(currSeq, barcodeIndex);
                if (pairedOligos) {
                    success = results[0] + results[2];
                    commentString += "fbdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" +
trimOligos->getCodeValue(results[3], bdiffs) + ") ";
                }
                else {
                    success = results[0];
                    commentString += "bdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], bdiffs) + ") ";
                }
                if(success > bdiffs) { trashCode += 'b'; }
                else{ currentSeqDiffs += success; }
            }

            if(numSpacers != 0){
                success = trimOligos->stripSpacer(currSeq);
                if(success > sdiffs) { trashCode += 's'; }
                else{ currentSeqDiffs += success; }
            }

            if(numFPrimers != 0){
                vector<int> results = trimOligos->stripForward(currSeq, primerIndex);
                if (pairedOligos) {
                    success = results[0] + results[2];
                    commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + trimOligos->getCodeValue(results[3], pdiffs) + ") ";
                }
                else {
                    success = results[0];
                    commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pdiffs) + ") ";
                }
                if(success > pdiffs) { trashCode += 'f'; }
                else{ currentSeqDiffs += success; }
            }

            if(numRPrimers != 0){
                vector<int> results = trimOligos->stripReverse(currSeq);
                success = results[0];
                commentString += "rpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pdiffs) + ") ";
                if(success > pdiffs) { trashCode += 'r'; }
                else{ currentSeqDiffs += success; }
            }

            if (currentSeqDiffs > tdiffs) { trashCode += 't'; }

            if (reorient && (trashCode != "")) { //if you failed and want to check the reverse
                int thisSuccess = 0;
                string thisTrashCode = "";
                int thisCurrentSeqsDiffs = 0;
                string thiscommentString = "";
                int thisBarcodeIndex = 0;
                int thisPrimerIndex = 0;
                //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl;
                if(numBarcodes != 0){
                    vector<int> results = rtrimOligos->stripBarcode(savedSeq, thisBarcodeIndex);
                    if (pairedOligos) {
                        thisSuccess = results[0] + results[2];
                        thiscommentString += "fbdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3],
bdiffs) + ") ";
                    }
                    else {
                        thisSuccess = results[0];
                        thiscommentString += "bdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], bdiffs) + ") ";
                    }
                    if(thisSuccess > bdiffs) { thisTrashCode += "b"; }
                    else{ thisCurrentSeqsDiffs += thisSuccess; }
                }
                //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl;
                if(numFPrimers != 0){
                    vector<int> results = rtrimOligos->stripForward(savedSeq, thisPrimerIndex);
                    if (pairedOligos) {
                        thisSuccess = results[0] + results[2];
                        thiscommentString += "fpdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pdiffs) + ") ";
                    }
                    else {
                        thisSuccess = results[0];
                        thiscommentString += "pdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pdiffs) + ") ";
                    }
                    if(thisSuccess > pdiffs) { thisTrashCode += "f"; }
                    else{ thisCurrentSeqsDiffs += thisSuccess; }
                }

                if (thisCurrentSeqsDiffs > tdiffs) { thisTrashCode += 't'; }

                if (thisTrashCode == "") { //the reverse orientation passed, so keep its results
                    trashCode = thisTrashCode;
                    success = thisSuccess;
                    currentSeqDiffs = thisCurrentSeqsDiffs;
                    commentString = thiscommentString;
                    barcodeIndex = thisBarcodeIndex;
                    primerIndex = thisPrimerIndex;
                    savedSeq.reverseComplement();
                    currSeq.setAligned(savedSeq.getAligned());
                }else { trashCode += "(" + thisTrashCode + ")"; }
            }

            currSeq.setComment(commentString);

            if(trashCode.length() == 0){
                string thisGroup = oligos.getGroupName(barcodeIndex, primerIndex);

                int pos = thisGroup.find("ignore");
                if (pos == string::npos) {
                    flowData.printFlows(trimFlowFile);

                    if(fasta) { currSeq.printSequence(fastaFile); }

                    if(allFiles){
                        ofstream output;
                        m->openOutputFileAppend(thisBarcodePrimerComboFileNames[barcodeIndex][primerIndex], output);
                        //format the per-group stream itself; trimFlowFile was already configured when it was opened
                        output.setf(ios::fixed, ios::floatfield); output.setf(ios::showpoint);

                        flowData.printFlows(output);
                        output.close();
                    }
                }
            }
            else{
                flowData.printFlows(scrapFlowFile, trashCode);
            }

            count++;
            //cout << "driver" << '\t' << currSeq.getName() <<
endl; //report progress if((count) % 10000 == 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) unsigned long long pos = flowFile.tellg(); if ((pos == -1) || (pos >= line->end)) { break; } #else if (flowFile.eof()) { break; } #endif } //report progress if((count) % 10000 != 0){ m->mothurOut(toString(count)); m->mothurOutEndLine(); } trimFlowFile.close(); scrapFlowFile.close(); flowFile.close(); if(fasta){ fastaFile.close(); } delete trimOligos; if (reorient) { delete rtrimOligos; } return count; } catch(exception& e) { m->errorOut(e, "TrimFlowsCommand", "driverCreateTrim"); exit(1); } } //*************************************************************************************************************** int TrimFlowsCommand::getOligos(vector >& outFlowFileNames){ try { bool allBlank = false; oligos.read(oligoFileName); if (m->control_pressed) { return 0; } //error in reading oligos if (oligos.hasPairedBarcodes()) { pairedOligos = true; numFPrimers = oligos.getPairedPrimers().size(); numBarcodes = oligos.getPairedBarcodes().size(); }else { pairedOligos = false; numFPrimers = oligos.getPrimers().size(); numBarcodes = oligos.getBarcodes().size(); } numLinkers = oligos.getLinkers().size(); numSpacers = oligos.getSpacers().size(); numRPrimers = oligos.getReversePrimers().size(); vector groupNames = oligos.getGroupNames(); if (groupNames.size() == 0) { allFiles = 0; allBlank = true; } outFlowFileNames.resize(oligos.getBarcodeNames().size()); for(int i=0;i uniqueNames; //used to cleanup outputFileNames if (pairedOligos) { map barcodes = oligos.getPairedBarcodes(); map primers = oligos.getPairedPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->first); string barcodeName = 
oligos.getBarcodeName(itBar->first); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." + primerName; } } ofstream temp; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(flowFileName)); variables["[tag]"] = comboGroupName; string fileName = getOutputFileName("flow", variables); if (uniqueNames.count(fileName) == 0) { outputNames.push_back(fileName); outputTypes["flow"].push_back(fileName); uniqueNames.insert(fileName); } outFlowFileNames[itBar->first][itPrimer->first] = fileName; m->openOutputFile(fileName, temp); temp.close(); } } } }else { map barcodes = oligos.getBarcodes() ; map primers = oligos.getPrimers(); for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = oligos.getPrimerName(itPrimer->second); string barcodeName = oligos.getBarcodeName(itBar->second); if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; if(primerName == ""){ comboGroupName = barcodeName; }else{ if(barcodeName == ""){ comboGroupName = primerName; } else{ comboGroupName = barcodeName + "." 
+ primerName; } } ofstream temp; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(flowFileName)); variables["[tag]"] = comboGroupName; string fileName = getOutputFileName("flow", variables); if (uniqueNames.count(fileName) == 0) { outputNames.push_back(fileName); outputTypes["flow"].push_back(fileName); uniqueNames.insert(fileName); } outFlowFileNames[itBar->second][itPrimer->second] = fileName; m->openOutputFile(fileName, temp); temp.close(); } } } } } return 0; } catch(exception& e) { m->errorOut(e, "TrimFlowsCommand", "getOligos"); exit(1); } } /**************************************************************************************************/ vector TrimFlowsCommand::getFlowFileBreaks() { try{ vector filePos; filePos.push_back(0); FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (flowFileName.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } //estimate file breaks unsigned long long chunkSize = 0; chunkSize = size / processors; //file too small to divide by processors if (chunkSize == 0) { processors = 1; filePos.push_back(size); return filePos; } //for each process seekg to closest file break and search for next '>' char. 
make that the filebreak for (int i = 0; i < processors; i++) { unsigned long long spot = (i+1) * chunkSize; ifstream in; m->openInputFile(flowFileName, in); in.seekg(spot); string dummy = m->getline(in); //there was not another sequence before the end of the file unsigned long long sanityPos = in.tellg(); // if (sanityPos == -1) { break; } // else { filePos.push_back(newSpot); } if (sanityPos == -1) { break; } else { filePos.push_back(sanityPos); } in.close(); } //save end pos filePos.push_back(size); //sanity check filePos for (int i = 0; i < (filePos.size()-1); i++) { if (filePos[(i+1)] <= filePos[i]) { filePos.erase(filePos.begin()+(i+1)); i--; } } ifstream in; m->openInputFile(flowFileName, in); in >> numFlows; m->gobble(in); //unsigned long long spot = in.tellg(); //filePos[0] = spot; in.close(); processors = (filePos.size() - 1); return filePos; } catch(exception& e) { m->errorOut(e, "TrimFlowsCommand", "getFlowFileBreaks"); exit(1); } } /**************************************************************************************************/ int TrimFlowsCommand::createProcessesCreateTrim(string flowFileName, string trimFlowFileName, string scrapFlowFileName, string fastaFileName, vector > barcodePrimerComboFileNames){ try { processIDS.clear(); int exitCommand = 1; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ vector > tempBarcodePrimerComboFileNames = barcodePrimerComboFileNames; if(allFiles){ for(int i=0;imothurGetpid(process) + ".temp"; ofstream temp; m->openOutputFile(tempBarcodePrimerComboFileNames[i][j], temp); temp.close(); } } } } driverCreateTrim(flowFileName, (trimFlowFileName + m->mothurGetpid(process) 
+ ".temp"), (scrapFlowFileName + m->mothurGetpid(process) + ".temp"), (fastaFileName + m->mothurGetpid(process) + ".temp"), tempBarcodePrimerComboFileNames, lines[process]); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(trimFlowFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(scrapFlowFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(fastaFileName + (toString(processIDS[i]) + ".temp")); if(allFiles){ for(int i=0;imothurRemove(tempFile); } } } } } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(fastaFileName + (toString(processIDS[i]) + ".temp"));m->mothurRemove(trimFlowFileName + (toString(processIDS[i]) + ".temp"));m->mothurRemove(scrapFlowFileName + (toString(processIDS[i]) + ".temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide for (int i = 0; i < lines.size(); i++) { delete lines[i]; } lines.clear(); vector flowFilePos = getFlowFileBreaks(); for (int i = 0; i < (flowFilePos.size()-1); i++) { lines.push_back(new linePair(flowFilePos[i], flowFilePos[(i+1)])); } processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ vector > tempBarcodePrimerComboFileNames = barcodePrimerComboFileNames; if(allFiles){ for(int 
i=0;imothurGetpid(process) + ".temp"; ofstream temp; m->openOutputFile(tempBarcodePrimerComboFileNames[i][j], temp); temp.close(); } } } } driverCreateTrim(flowFileName, (trimFlowFileName + m->mothurGetpid(process) + ".temp"), (scrapFlowFileName + m->mothurGetpid(process) + ".temp"), (fastaFileName + m->mothurGetpid(process) + ".temp"), tempBarcodePrimerComboFileNames, lines[process]); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent do my part ofstream temp; m->openOutputFile(trimFlowFileName, temp); temp.close(); m->openOutputFile(scrapFlowFileName, temp); temp.close(); if(fasta){ m->openOutputFile(fastaFileName, temp); temp.close(); } driverCreateTrim(flowFileName, trimFlowFileName, scrapFlowFileName, fastaFileName, barcodePrimerComboFileNames, lines[0]); //force parent to wait until all the processes are done for (int i=0;i pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int i=0; i > tempBarcodePrimerComboFileNames = barcodePrimerComboFileNames; if(allFiles){ for(int i=0;iopenOutputFile(tempBarcodePrimerComboFileNames[i][j], temp); temp.close(); } } } } trimFlowData* tempflow = new trimFlowData(flowFileName, (trimFlowFileName + extension), (scrapFlowFileName + extension), fastaFileName, flowOrder, tempBarcodePrimerComboFileNames, barcodes, primers, revPrimer, fasta, allFiles, lines[i]->start, lines[i]->end, m, signal, noise, numFlows, maxFlows, minFlows, maxHomoP, tdiffs, bdiffs, pdiffs, i); pDataArray.push_back(tempflow); //MyTrimFlowThreadFunction is in header. It must be global or static to work with the threads. 
//default security attributes, thread function name, argument to thread function, use default creation flags, returns the thread identifier hThreadArray[i] = CreateThread(NULL, 0, MyTrimFlowThreadFunction, pDataArray[i], 0, &dwThreadIdArray[i]); } //using the main process as a worker saves time and memory ofstream temp; m->openOutputFile(trimFlowFileName, temp); temp.close(); m->openOutputFile(scrapFlowFileName, temp); temp.close(); if(fasta){ m->openOutputFile(fastaFileName, temp); temp.close(); } vector > tempBarcodePrimerComboFileNames = barcodePrimerComboFileNames; if(allFiles){ for(int i=0;iopenOutputFile(tempBarcodePrimerComboFileNames[i][j], temp); temp.close(); } } } } //do my part - do last piece because windows is looking for eof int num = driverCreateTrim(flowFileName, (trimFlowFileName + toString(processors-1) + ".temp"), (scrapFlowFileName + toString(processors-1) + ".temp"), (fastaFileName + toString(processors-1) + ".temp"), tempBarcodePrimerComboFileNames, lines[processors-1]); processIDS.push_back((processors-1)); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. 
for(int i=0; i < pDataArray.size(); i++){ num += pDataArray[i]->count; CloseHandle(hThreadArray[i]); delete pDataArray[i]; } */ #endif //append files m->mothurOutEndLine(); for(int i=0;imothurOut("Appending files from process " + toString(processIDS[i])); m->mothurOutEndLine(); m->appendFiles((trimFlowFileName + toString(processIDS[i]) + ".temp"), trimFlowFileName); m->mothurRemove((trimFlowFileName + toString(processIDS[i]) + ".temp")); // m->mothurOut("\tDone with trim.flow file"); m->mothurOutEndLine(); m->appendFiles((scrapFlowFileName + toString(processIDS[i]) + ".temp"), scrapFlowFileName); m->mothurRemove((scrapFlowFileName + toString(processIDS[i]) + ".temp")); // m->mothurOut("\tDone with scrap.flow file"); m->mothurOutEndLine(); if(fasta){ m->appendFiles((fastaFileName + toString(processIDS[i]) + ".temp"), fastaFileName); m->mothurRemove((fastaFileName + toString(processIDS[i]) + ".temp")); // m->mothurOut("\tDone with flow.fasta file"); m->mothurOutEndLine(); } if(allFiles){ for (int j = 0; j < barcodePrimerComboFileNames.size(); j++) { for (int k = 0; k < barcodePrimerComboFileNames[0].size(); k++) { if (barcodePrimerComboFileNames[j][k] != "") { m->appendFiles((barcodePrimerComboFileNames[j][k] + toString(processIDS[i]) + ".temp"), barcodePrimerComboFileNames[j][k]); m->mothurRemove((barcodePrimerComboFileNames[j][k] + toString(processIDS[i]) + ".temp")); } } } } } return exitCommand; } catch(exception& e) { m->errorOut(e, "TrimFlowsCommand", "createProcessesCreateTrim"); exit(1); } } //*************************************************************************************************************** mothur-1.36.1/source/commands/trimflowscommand.h000066400000000000000000000173331255543666200220040ustar00rootroot00000000000000#ifndef TRIMFLOWSCOMMAND_H #define TRIMFLOWSCOMMAND_H /* * trimflowscommand.h * Mothur * * Created by Pat Schloss on 12/22/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "mothur.h" #include "command.hpp" #include "sequence.hpp" #include "flowdata.h" #include "groupmap.h" #include "trimoligos.h" #include "oligos.h" class TrimFlowsCommand : public Command { public: TrimFlowsCommand(string); TrimFlowsCommand(); ~TrimFlowsCommand() {} vector setParameters(); string getCommandName() { return "trim.flows"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Trim.flows"; } string getDescription() { return "trim.flows"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool abort; int comboStarts; vector processIDS; //processid vector lines; vector outputNames; set filesToRemove; bool allFiles; int processors; int numFPrimers, numRPrimers, numBarcodes; int maxFlows, minFlows, minLength, maxLength, maxHomoP, tdiffs, bdiffs, pdiffs, sdiffs, ldiffs, numLinkers, numSpacers; int numFlows; float signal, noise; bool fasta, pairedOligos, reorient; string flowOrder, flowFileName, oligoFileName, outputDir; Oligos oligos; vector getFlowFileBreaks(); int createProcessesCreateTrim(string, string, string, string, vector >); int driverCreateTrim(string, string, string, string, vector >, linePair*); int getOligos(vector >&); //a rewrite of what is in trimseqscommand.h }; /************************************************************************************************** //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). 
struct trimFlowData { string flowFileName; string trimFlowFileName; string scrapFlowFileName; string fastaFileName; string flowOrder; vector > barcodePrimerComboFileNames; map barcodes; map primers; vector revPrimer; bool fasta, allFiles; unsigned long long start; unsigned long long end; MothurOut* m; float signal, noise; int numFlows, maxFlows, minFlows, maxHomoP, tdiffs, bdiffs, pdiffs, threadID, count; trimFlowData(){} trimFlowData(string ff, string tf, string sf, string f, string fo, vector > bfn, map bar, map pri, vector rev, bool fa, bool al, unsigned long long st, unsigned long long en, MothurOut* mout, float sig, float n, int numF, int maxF, int minF, int maxH, int td, int bd, int pd, int tid) { flowFileName = ff; trimFlowFileName = tf; scrapFlowFileName = sf; fastaFileName = f; flowOrder = fo; barcodePrimerComboFileNames = bfn; barcodes = bar; primers = pri; revPrimer = rev; fasta = fa; allFiles = al; start = st; end = en; m = mout; signal = sig; noise = n; numFlows = numF; maxFlows = maxF; minFlows = minF; maxHomoP = maxH; tdiffs = td; bdiffs = bd; pdiffs = pd; threadID = tid; } }; /************************************************************************************************** #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyTrimFlowThreadFunction(LPVOID lpParam){ trimFlowData* pDataArray; pDataArray = (trimFlowData*)lpParam; try { ofstream trimFlowFile; pDataArray->m->openOutputFile(pDataArray->trimFlowFileName, trimFlowFile); trimFlowFile.setf(ios::fixed, ios::floatfield); trimFlowFile.setf(ios::showpoint); ofstream scrapFlowFile; pDataArray->m->openOutputFile(pDataArray->scrapFlowFileName, scrapFlowFile); scrapFlowFile.setf(ios::fixed, ios::floatfield); scrapFlowFile.setf(ios::showpoint); ofstream fastaFile; if(pDataArray->fasta){ pDataArray->m->openOutputFile(pDataArray->fastaFileName, fastaFile); } ifstream flowFile; 
pDataArray->m->openInputFile(pDataArray->flowFileName, flowFile); flowFile.seekg(pDataArray->start); if(pDataArray->start == 0){ flowFile >> pDataArray->numFlows; pDataArray->m->gobble(flowFile); scrapFlowFile << pDataArray->maxFlows << endl; trimFlowFile << pDataArray->maxFlows << endl; if(pDataArray->allFiles){ for(int i=0;ibarcodePrimerComboFileNames.size();i++){ for(int j=0;jbarcodePrimerComboFileNames[0].size();j++){ ofstream temp; pDataArray->m->openOutputFile(pDataArray->barcodePrimerComboFileNames[i][j], temp); temp << pDataArray->maxFlows << endl; temp.close(); } } } } FlowData flowData(pDataArray->numFlows, pDataArray->signal, pDataArray->noise, pDataArray->maxHomoP, pDataArray->flowOrder); cout << " thread flowdata address " << &flowData << '\t' << &flowFile << endl; TrimOligos trimOligos(pDataArray->pdiffs, pDataArray->bdiffs, pDataArray->primers, pDataArray->barcodes, pDataArray->revPrimer); pDataArray->count = pDataArray->end; cout << pDataArray->threadID << '\t' << pDataArray->count << endl; int count = 0; for(int i = 0; i < pDataArray->end; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { break; } cout << pDataArray->threadID << '\t' << count << endl; int success = 1; int currentSeqDiffs = 0; string trashCode = ""; flowData.getNext(flowFile); cout << "thread good bit " << flowFile.good() << endl; flowData.capFlows(pDataArray->maxFlows); Sequence currSeq = flowData.getSequence(); if(!flowData.hasMinFlows(pDataArray->minFlows)){ //screen to see if sequence is of a minimum number of flows success = 0; trashCode += 'l'; } int primerIndex = 0; int barcodeIndex = 0; if(pDataArray->barcodes.size() != 0){ success = trimOligos.stripBarcode(currSeq, barcodeIndex); if(success > pDataArray->bdiffs) { trashCode += 'b'; } else{ currentSeqDiffs += success; } } if(pDataArray->primers.size() != 0){ success = trimOligos.stripForward(currSeq, primerIndex); if(success > pDataArray->pdiffs) { trashCode += 'f'; } else{ 
currentSeqDiffs += success; } } if (currentSeqDiffs > pDataArray->tdiffs) { trashCode += 't'; } if(pDataArray->revPrimer.size() != 0){ success = trimOligos.stripReverse(currSeq); if(success > pdiffs) { trashCode += 'r'; } else{ currentSeqsDiffs += success; } } if(trashCode.length() == 0){ flowData.printFlows(trimFlowFile); if(pDataArray->fasta) { currSeq.setAligned(currSeq.getUnaligned()); currSeq.printSequence(fastaFile); } if(pDataArray->allFiles){ ofstream output; pDataArray->m->openOutputFileAppend(pDataArray->barcodePrimerComboFileNames[barcodeIndex][primerIndex], output); output.setf(ios::fixed, ios::floatfield); trimFlowFile.setf(ios::showpoint); flowData.printFlows(output); output.close(); } } else{ flowData.printFlows(scrapFlowFile, trashCode); } count++; cout << pDataArray->threadID << '\t' << currSeq.getName() << endl; //report progress if((count) % 10000 == 0){ pDataArray->m->mothurOut(toString(count)); pDataArray->m->mothurOutEndLine(); } } //report progress if((count) % 10000 != 0){ pDataArray->m->mothurOut(toString(count)); pDataArray->m->mothurOutEndLine(); } trimFlowFile.close(); scrapFlowFile.close(); flowFile.close(); if(pDataArray->fasta){ fastaFile.close(); } } catch(exception& e) { pDataArray->m->errorOut(e, "TrimFlowsCommand", "MyTrimFlowsThreadFunction"); exit(1); } } #endif */ #endif mothur-1.36.1/source/commands/trimseqscommand.cpp000066400000000000000000003176471255543666200221730ustar00rootroot00000000000000/* * trimseqscommand.cpp * Mothur * * Created by Pat Schloss on 6/6/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "trimseqscommand.h" #include "needlemanoverlap.hpp" #include "trimoligos.h" //********************************************************************************************************************** vector TrimSeqsCommand::setParameters(){ try { CommandParameter pfasta("fasta", "InputTypes", "", "", "none", "none", "none","fasta",false,true,true); parameters.push_back(pfasta); CommandParameter poligos("oligos", "InputTypes", "", "", "none", "none", "none","group",false,false,true); parameters.push_back(poligos); CommandParameter pqfile("qfile", "InputTypes", "", "", "none", "none", "none","qfile",false,false,true); parameters.push_back(pqfile); CommandParameter pname("name", "InputTypes", "", "", "namecount", "none", "none","name",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "namecount", "none", "none","count",false,false,true); parameters.push_back(pcount); CommandParameter pflip("flip", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(pflip); CommandParameter preorient("checkorient", "Boolean", "", "F", "", "", "","",false,false,true); parameters.push_back(preorient); CommandParameter pmaxambig("maxambig", "Number", "", "-1", "", "", "","",false,false); parameters.push_back(pmaxambig); CommandParameter pmaxhomop("maxhomop", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pmaxhomop); CommandParameter pminlength("minlength", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pminlength); CommandParameter pmaxlength("maxlength", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pmaxlength); CommandParameter ppdiffs("pdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(ppdiffs); CommandParameter pbdiffs("bdiffs", "Number", "", "0", "", "", "","",false,false,true); parameters.push_back(pbdiffs); CommandParameter pldiffs("ldiffs", "Number", "", "0", "", "", "","",false,false); 
parameters.push_back(pldiffs); CommandParameter psdiffs("sdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(psdiffs); CommandParameter ptdiffs("tdiffs", "Number", "", "0", "", "", "","",false,false); parameters.push_back(ptdiffs); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter pallfiles("allfiles", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pallfiles); CommandParameter pkeepforward("keepforward", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pkeepforward); CommandParameter plogtransform("logtransform", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(plogtransform); CommandParameter pqtrim("qtrim", "Boolean", "", "T", "", "", "","",false,false); parameters.push_back(pqtrim); CommandParameter pqthreshold("qthreshold", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pqthreshold); CommandParameter pqaverage("qaverage", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pqaverage); CommandParameter prollaverage("rollaverage", "Number", "", "0", "", "", "","",false,false); parameters.push_back(prollaverage); CommandParameter pqwindowaverage("qwindowaverage", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pqwindowaverage); CommandParameter pqstepsize("qstepsize", "Number", "", "1", "", "", "","",false,false); parameters.push_back(pqstepsize); CommandParameter pqwindowsize("qwindowsize", "Number", "", "50", "", "", "","",false,false); parameters.push_back(pqwindowsize); CommandParameter pkeepfirst("keepfirst", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pkeepfirst); CommandParameter premovelast("removelast", "Number", "", "0", "", "", "","",false,false); parameters.push_back(premovelast); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); 
CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string TrimSeqsCommand::getHelpString(){ try { string helpString = ""; helpString += "The trim.seqs command reads a fastaFile and creates 2 new fasta files, .trim.fasta and .scrap.fasta, as well as group files if you provide an oligos file.\n"; helpString += "The .trim.fasta contains sequences that meet your requirements, and the .scrap.fasta contains those which don't.\n"; helpString += "The trim.seqs command parameters are fasta, name, count, flip, checkorient, oligos, maxambig, maxhomop, minlength, maxlength, qfile, qthreshold, qaverage, diffs, qtrim, keepfirst, removelast, logtransform and allfiles.\n"; helpString += "The fasta parameter is required.\n"; helpString += "The flip parameter will output the reverse complement of your trimmed sequence. The default is false.\n"; helpString += "The checkorient parameter will check the reverse complement of the sequence if the barcodes and primers cannot be found in the forward. The default is false.\n"; helpString += "The oligos parameter allows you to provide an oligos file.\n"; helpString += "The name parameter allows you to provide a names file with your fasta file.\n"; helpString += "The count parameter allows you to provide a count file with your fasta file.\n"; helpString += "The maxambig parameter allows you to set the maximum number of ambiguous bases allowed. 
The default is -1.\n"; helpString += "The maxhomop parameter allows you to set a maximum homopolymer length. \n"; helpString += "The minlength parameter allows you to set a minimum sequence length. \n"; helpString += "The maxlength parameter allows you to set a maximum sequence length. \n"; helpString += "The tdiffs parameter is used to specify the total number of differences allowed in the sequence. The default is pdiffs + bdiffs + sdiffs + ldiffs.\n"; helpString += "The bdiffs parameter is used to specify the number of differences allowed in the barcode. The default is 0.\n"; helpString += "The pdiffs parameter is used to specify the number of differences allowed in the primer. The default is 0.\n"; helpString += "The ldiffs parameter is used to specify the number of differences allowed in the linker. The default is 0.\n"; helpString += "The sdiffs parameter is used to specify the number of differences allowed in the spacer. The default is 0.\n"; helpString += "The qfile parameter allows you to provide a quality file.\n"; helpString += "The qthreshold parameter allows you to set a minimum quality score allowed. \n"; helpString += "The qaverage parameter allows you to set a minimum average quality score allowed. \n"; helpString += "The qwindowsize parameter allows you to set a number of bases in a window. Default=50.\n"; helpString += "The qwindowaverage parameter allows you to set a minimum average quality score allowed over a window. \n"; helpString += "The rollaverage parameter allows you to set a minimum rolling average quality score allowed over a window. \n"; helpString += "The qstepsize parameter allows you to set a number of bases to move the window over. Default=1.\n"; helpString += "The logtransform parameter allows you to indicate you want the averages for the qwindowaverage, rollaverage and qaverage to be calculated using a logtransform. 
Default=F.\n"; helpString += "The allfiles parameter will create separate group and fasta files for each grouping. The default is F.\n"; helpString += "The keepforward parameter allows you to indicate whether you want the forward primer removed or not. The default is F, meaning remove the forward primer.\n"; helpString += "The qtrim parameter will trim sequences from the point at which they fall below the qthreshold and put them in the .trim file if set to true. The default is T.\n"; helpString += "The keepfirst parameter trims the sequence to the first keepfirst number of bases after the barcode or primers are removed, before the sequence is checked to see if it meets the other requirements. \n"; helpString += "The removelast parameter removes the last removelast number of bases after the barcode or primers are removed, before the sequence is checked to see if it meets the other requirements.\n"; helpString += "The trim.seqs command should be in the following format: \n"; helpString += "trim.seqs(fasta=yourFastaFile, flip=yourFlip, oligos=yourOligos, maxambig=yourMaxambig, \n"; helpString += "maxhomop=yourMaxhomop, minlength=yourMinlength, maxlength=yourMaxlength) \n"; helpString += "Example trim.seqs(fasta=abrecovery.fasta, flip=..., oligos=..., maxambig=..., maxhomop=..., minlength=..., maxlength=...).\n"; helpString += "Note: No spaces between parameter labels (i.e. 
fasta), '=' and parameters (i.e.yourFasta).\n"; helpString += "For more details please check out the wiki http://www.mothur.org/wiki/Trim.seqs .\n"; return helpString; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string TrimSeqsCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "qfile") { pattern = "[filename],[tag],qual"; } else if (type == "fasta") { pattern = "[filename],[tag],fasta"; } else if (type == "group") { pattern = "[filename],groups"; } else if (type == "name") { pattern = "[filename],[tag],names"; } else if (type == "count") { pattern = "[filename],[tag],count_table-[filename],count_table"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** TrimSeqsCommand::TrimSeqsCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", "TrimSeqsCommand"); exit(1); } } //*************************************************************************************************************** TrimSeqsCommand::TrimSeqsCommand(string option) { try { abort = false; calledHelp = false; comboStarts = 0; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser 
parser(option); map parameters = parser.getParameters(); ValidParameters validParameter; map::iterator it; //check to make sure all parameters are valid for command for (it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["fasta"] = tempOutNames; outputTypes["qfile"] = tempOutNames; outputTypes["group"] = tempOutNames; outputTypes["name"] = tempOutNames; outputTypes["count"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("fasta"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["fasta"] = inputDir + it->second; } } it = parameters.find("oligos"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["oligos"] = inputDir + it->second; } } it = parameters.find("qfile"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["qfile"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["name"] = inputDir + it->second; }
				}

				it = parameters.find("count");
				//user has given a count file
				if(it != parameters.end()){
					path = m->hasPath(it->second);
					//if the user has not given a path, then add inputdir; otherwise leave the path alone.
					if (path == "") { parameters["count"] = inputDir + it->second; }
				}
			}

			//check for required parameters
			fastaFile = validParameter.validFile(parameters, "fasta", true);
			if (fastaFile == "not found") {
				fastaFile = m->getFastaFile();
				if (fastaFile != "") { m->mothurOut("Using " + fastaFile + " as input file for the fasta parameter."); m->mothurOutEndLine(); }
				else { m->mothurOut("You have no current fastafile and the fasta parameter is required."); m->mothurOutEndLine(); abort = true; }
			}else if (fastaFile == "not open") { abort = true; }
			else { m->setFastaFile(fastaFile); }

			//if the user changes the output directory command factory will send this info to us in the output parameter
			outputDir = validParameter.validFile(parameters, "outputdir", false);
			if (outputDir == "not found"){
				outputDir = "";
				outputDir += m->hasPath(fastaFile); //if user entered a file with a path then preserve it
			}

			//check for optional parameters and set defaults
			// ...at some point we should add some additional type checking...
string temp; temp = validParameter.validFile(parameters, "flip", false); if (temp == "not found") { flip = 0; } else { flip = m->isTrue(temp); } temp = validParameter.validFile(parameters, "oligos", true); if (temp == "not found"){ oligoFile = ""; } else if(temp == "not open"){ abort = true; } else { oligoFile = temp; m->setOligosFile(oligoFile); } temp = validParameter.validFile(parameters, "maxambig", false); if (temp == "not found") { temp = "-1"; } m->mothurConvert(temp, maxAmbig); temp = validParameter.validFile(parameters, "maxhomop", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, maxHomoP); temp = validParameter.validFile(parameters, "minlength", false); if (temp == "not found") { temp = "1"; } m->mothurConvert(temp, minLength); temp = validParameter.validFile(parameters, "maxlength", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, maxLength); temp = validParameter.validFile(parameters, "bdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, bdiffs); temp = validParameter.validFile(parameters, "pdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, pdiffs); temp = validParameter.validFile(parameters, "ldiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, ldiffs); temp = validParameter.validFile(parameters, "sdiffs", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, sdiffs); temp = validParameter.validFile(parameters, "tdiffs", false); if (temp == "not found") { int tempTotal = pdiffs + bdiffs + ldiffs + sdiffs; temp = toString(tempTotal); } m->mothurConvert(temp, tdiffs); if(tdiffs == 0){ tdiffs = bdiffs + pdiffs + ldiffs + sdiffs; } temp = validParameter.validFile(parameters, "qfile", true); if (temp == "not found") { qFileName = ""; } else if(temp == "not open") { abort = true; } else { qFileName = temp; m->setQualFile(qFileName); } temp = validParameter.validFile(parameters, "name", true); if (temp == "not 
found") { nameFile = ""; } else if(temp == "not open") { nameFile = ""; abort = true; } else { nameFile = temp; m->setNameFile(nameFile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { abort = true; countfile = ""; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((countfile != "") && (nameFile != "")) { m->mothurOut("You must enter ONLY ONE of the following: count or name."); m->mothurOutEndLine(); abort = true; } temp = validParameter.validFile(parameters, "qthreshold", false); if (temp == "not found") { temp = "0"; } m->mothurConvert(temp, qThreshold); temp = validParameter.validFile(parameters, "qtrim", false); if (temp == "not found") { temp = "t"; } qtrim = m->isTrue(temp); temp = validParameter.validFile(parameters, "rollaverage", false); if (temp == "not found") { temp = "0"; } convert(temp, qRollAverage); temp = validParameter.validFile(parameters, "qwindowaverage", false);if (temp == "not found") { temp = "0"; } convert(temp, qWindowAverage); temp = validParameter.validFile(parameters, "qwindowsize", false); if (temp == "not found") { temp = "50"; } convert(temp, qWindowSize); temp = validParameter.validFile(parameters, "qstepsize", false); if (temp == "not found") { temp = "1"; } convert(temp, qWindowStep); temp = validParameter.validFile(parameters, "qaverage", false); if (temp == "not found") { temp = "0"; } convert(temp, qAverage); temp = validParameter.validFile(parameters, "keepfirst", false); if (temp == "not found") { temp = "0"; } convert(temp, keepFirst); temp = validParameter.validFile(parameters, "removelast", false); if (temp == "not found") { temp = "0"; } convert(temp, removeLast); temp = validParameter.validFile(parameters, "allfiles", false); if (temp == "not found") { temp = "F"; } allFiles = m->isTrue(temp); temp = validParameter.validFile(parameters, "keepforward", false); if (temp == "not found") { temp = "F"; } keepforward = 
m->isTrue(temp);

			temp = validParameter.validFile(parameters, "logtransform", false);
			if (temp == "not found") { temp = "F"; }
			logtransform = m->isTrue(temp);

			temp = validParameter.validFile(parameters, "checkorient", false);
			if (temp == "not found") { temp = "F"; }
			reorient = m->isTrue(temp);

			temp = validParameter.validFile(parameters, "processors", false);
			if (temp == "not found"){ temp = m->getProcessors(); }
			m->setProcessors(temp);
			m->mothurConvert(temp, processors);

			if(allFiles && (oligoFile == "")){
				m->mothurOut("You selected allfiles, but didn't enter an oligos file. Ignoring the allfiles request."); m->mothurOutEndLine();
			}
			if((qAverage != 0 && qThreshold != 0) && qFileName == ""){
				m->mothurOut("You didn't provide a quality file name, quality criteria will be ignored."); m->mothurOutEndLine();
				qAverage=0;
				qThreshold=0;
			}
			if(!flip && oligoFile=="" && !maxLength && !minLength && (maxAmbig==-1) && !maxHomoP && qFileName == ""){
				m->mothurOut("You didn't set any options... quitting command."); m->mothurOutEndLine();
				abort = true;
			}

			if (countfile == "") {
				if (nameFile == "") {
					vector<string> files; files.push_back(fastaFile);
					parser.getNameFile(files);
				}
			}
		}
	}
	catch(exception& e) {
		m->errorOut(e, "TrimSeqsCommand", "TrimSeqsCommand");
		exit(1);
	}
}
//***************************************************************************************************************
int TrimSeqsCommand::execute(){
	try{
		if (abort == true) { if (calledHelp) { return 0; } return 2; }

		pairedOligos = false;
		numFPrimers = 0; //this needs to be initialized
		numRPrimers = 0;
		numSpacers = 0;
		numLinkers = 0;
		createGroup = false;
		vector<vector<string> > fastaFileNames;
		vector<vector<string> > qualFileNames;
		vector<vector<string> > nameFileNames;

		map<string, string> variables;
		variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFile));
		variables["[tag]"] = "trim";
		string trimSeqFile = getOutputFileName("fasta",variables);
		string trimQualFile = getOutputFileName("qfile",variables);
		outputNames.push_back(trimSeqFile);
outputTypes["fasta"].push_back(trimSeqFile); variables["[tag]"] = "scrap"; string scrapSeqFile = getOutputFileName("fasta",variables); string scrapQualFile = getOutputFileName("qfile",variables); outputNames.push_back(scrapSeqFile); outputTypes["fasta"].push_back(scrapSeqFile); if (qFileName != "") { outputNames.push_back(trimQualFile); outputNames.push_back(scrapQualFile); outputTypes["qfile"].push_back(trimQualFile); outputTypes["qfile"].push_back(scrapQualFile); } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(nameFile)); variables["[tag]"] = "trim"; string trimNameFile = getOutputFileName("name",variables); variables["[tag]"] = "scrap"; string scrapNameFile = getOutputFileName("name",variables); if (nameFile != "") { m->readNames(nameFile, nameMap); outputNames.push_back(trimNameFile); outputNames.push_back(scrapNameFile); outputTypes["name"].push_back(trimNameFile); outputTypes["name"].push_back(scrapNameFile); } variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(countfile)); variables["[tag]"] = "trim"; string trimCountFile = getOutputFileName("count",variables); variables["[tag]"] = "scrap"; string scrapCountFile = getOutputFileName("count",variables); if (countfile != "") { CountTable ct; ct.readTable(countfile, true, false); nameCount = ct.getNameMap(); outputNames.push_back(trimCountFile); outputNames.push_back(scrapCountFile); outputTypes["count"].push_back(trimCountFile); outputTypes["count"].push_back(scrapCountFile); } if (m->control_pressed) { return 0; } string outputGroupFileName; if(oligoFile != ""){ createGroup = getOligos(fastaFileNames, qualFileNames, nameFileNames); if ((createGroup) && (countfile == "")){ map myvariables; myvariables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFile)); outputGroupFileName = getOutputFileName("group",myvariables); outputNames.push_back(outputGroupFileName); outputTypes["group"].push_back(outputGroupFileName); } } if (m->control_pressed) { return 0; 
} //fills lines and qlines setLines(fastaFile, qFileName); if(processors == 1){ driverCreateTrim(fastaFile, qFileName, trimSeqFile, scrapSeqFile, trimQualFile, scrapQualFile, trimNameFile, scrapNameFile, trimCountFile, scrapCountFile, outputGroupFileName, fastaFileNames, qualFileNames, nameFileNames, lines[0], qLines[0]); }else{ createProcessesCreateTrim(fastaFile, qFileName, trimSeqFile, scrapSeqFile, trimQualFile, scrapQualFile, trimNameFile, scrapNameFile, trimCountFile, scrapCountFile, outputGroupFileName, fastaFileNames, qualFileNames, nameFileNames); } if (m->control_pressed) { return 0; } if(allFiles){ map uniqueFastaNames;// so we don't add the same groupfile multiple times map::iterator it; set namesToRemove; for(int i=0;iisBlank(fastaFileNames[i][j])){ m->mothurRemove(fastaFileNames[i][j]); namesToRemove.insert(fastaFileNames[i][j]); if(qFileName != ""){ m->mothurRemove(qualFileNames[i][j]); namesToRemove.insert(qualFileNames[i][j]); } if(nameFile != ""){ m->mothurRemove(nameFileNames[i][j]); namesToRemove.insert(nameFileNames[i][j]); } }else{ it = uniqueFastaNames.find(fastaFileNames[i][j]); if (it == uniqueFastaNames.end()) { uniqueFastaNames[fastaFileNames[i][j]] = barcodeNameVector[i]; } } } } } } //remove names for outputFileNames, just cleans up the output vector outputNames2; for(int i = 0; i < outputNames.size(); i++) { if (namesToRemove.count(outputNames[i]) == 0) { outputNames2.push_back(outputNames[i]); } } outputNames = outputNames2; for (it = uniqueFastaNames.begin(); it != uniqueFastaNames.end(); it++) { ifstream in; m->openInputFile(it->first, in); ofstream out; map myvariables; myvariables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(it->first)); string thisGroupName = ""; if (countfile == "") { thisGroupName = getOutputFileName("group",myvariables); outputNames.push_back(thisGroupName); outputTypes["group"].push_back(thisGroupName); } else { thisGroupName = getOutputFileName("count",myvariables); 
outputNames.push_back(thisGroupName); outputTypes["count"].push_back(thisGroupName); } m->openOutputFile(thisGroupName, out); if (countfile != "") { out << "Representative_Sequence\ttotal\t" << it->second << endl; } while (!in.eof()){ if (m->control_pressed) { break; } Sequence currSeq(in); m->gobble(in); if (countfile == "") { out << currSeq.getName() << '\t' << it->second << endl; if (nameFile != "") { map::iterator itName = nameMap.find(currSeq.getName()); if (itName != nameMap.end()) { vector thisSeqsNames; m->splitAtChar(itName->second, thisSeqsNames, ','); for (int k = 1; k < thisSeqsNames.size(); k++) { //start at 1 to skip self out << thisSeqsNames[k] << '\t' << it->second << endl; } }else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); } } }else { map::iterator itTotalReps = nameCount.find(currSeq.getName()); if (itTotalReps != nameCount.end()) { out << currSeq.getName() << '\t' << itTotalReps->second << '\t' << itTotalReps->second << endl; } else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your count file, please correct."); m->mothurOutEndLine(); } } } in.close(); out.close(); } if (countfile != "") { //create countfile with group info included CountTable* ct = new CountTable(); ct->readTable(trimCountFile, true, false); map justTrimmedNames = ct->getNameMap(); delete ct; CountTable newCt; for (map::iterator itCount = groupCounts.begin(); itCount != groupCounts.end(); itCount++) { newCt.addGroup(itCount->first); } vector tempCounts; tempCounts.resize(groupCounts.size(), 0); for (map::iterator itNames = justTrimmedNames.begin(); itNames != justTrimmedNames.end(); itNames++) { newCt.push_back(itNames->first, tempCounts); //add it to the table with no abundance so we can set the groups abundance map::iterator it2 = groupMap.find(itNames->first); if (it2 != groupMap.end()) { newCt.setAbund(itNames->first, it2->second, itNames->second); } else { m->mothurOut("[ERROR]: 
missing group info for " + itNames->first + "."); m->mothurOutEndLine(); m->control_pressed = true; } } newCt.printTable(trimCountFile); } } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output group counts m->mothurOutEndLine(); int total = 0; if (groupCounts.size() != 0) { m->mothurOut("Group count: \n"); } for (map::iterator it = groupCounts.begin(); it != groupCounts.end(); it++) { total += it->second; m->mothurOut(it->first + "\t" + toString(it->second)); m->mothurOutEndLine(); } if (total != 0) { m->mothurOut("Total of all groups is " + toString(total)); m->mothurOutEndLine(); } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //set fasta file as new current fastafile string current = ""; itTypes = outputTypes.find("fasta"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setFastaFile(current); } } itTypes = outputTypes.find("name"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setNameFile(current); } } itTypes = outputTypes.find("qfile"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setQualFile(current); } } itTypes = outputTypes.find("group"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setGroupFile(current); } } itTypes = outputTypes.find("count"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setCountTableFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", 
"execute"); exit(1); } } /**************************************************************************************/ int TrimSeqsCommand::driverCreateTrim(string filename, string qFileName, string trimFileName, string scrapFileName, string trimQFileName, string scrapQFileName, string trimNFileName, string scrapNFileName, string trimCFileName, string scrapCFileName, string groupFileName, vector > fastaFileNames, vector > qualFileNames, vector > nameFileNames, linePair line, linePair qline) { try { ofstream trimFASTAFile; m->openOutputFile(trimFileName, trimFASTAFile); ofstream scrapFASTAFile; m->openOutputFile(scrapFileName, scrapFASTAFile); ofstream trimQualFile; ofstream scrapQualFile; if(qFileName != ""){ m->openOutputFile(trimQFileName, trimQualFile); m->openOutputFile(scrapQFileName, scrapQualFile); } ofstream trimNameFile; ofstream scrapNameFile; if(nameFile != ""){ m->openOutputFile(trimNFileName, trimNameFile); m->openOutputFile(scrapNFileName, scrapNameFile); } ofstream trimCountFile; ofstream scrapCountFile; if(countfile != ""){ m->openOutputFile(trimCFileName, trimCountFile); m->openOutputFile(scrapCFileName, scrapCountFile); if (line.start == 0) { trimCountFile << "Representative_Sequence\ttotal" << endl; scrapCountFile << "Representative_Sequence\ttotal" << endl; } } ofstream outGroupsFile; if ((createGroup) && (countfile == "")){ m->openOutputFile(groupFileName, outGroupsFile); } if(allFiles){ for (int i = 0; i < fastaFileNames.size(); i++) { //clears old file for (int j = 0; j < fastaFileNames[i].size(); j++) { //clears old file if (fastaFileNames[i][j] != "") { ofstream temp; m->openOutputFile(fastaFileNames[i][j], temp); temp.close(); if(qFileName != ""){ m->openOutputFile(qualFileNames[i][j], temp); temp.close(); } if(nameFile != ""){ m->openOutputFile(nameFileNames[i][j], temp); temp.close(); } } } } } ifstream inFASTA; m->openInputFile(filename, inFASTA); inFASTA.seekg(line.start); ifstream qFile; if(qFileName != "") { m->openInputFile(qFileName, 
qFile); qFile.seekg(qline.start); } int count = 0; bool moreSeqs = 1; int numBarcodes = barcodes.size(); TrimOligos* trimOligos = NULL; if (pairedOligos) { trimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, pairedPrimers, pairedBarcodes, false); numBarcodes = pairedBarcodes.size(); } else { trimOligos = new TrimOligos(pdiffs, bdiffs, ldiffs, sdiffs, primers, barcodes, revPrimer, linker, spacer); } TrimOligos* rtrimOligos = NULL; if (reorient) { //create reoriented primer and barcode pairs map rpairedPrimers, rpairedBarcodes; for (map::iterator it = pairedPrimers.begin(); it != pairedPrimers.end(); it++) { oligosPair tempPair(reverseOligo((it->second).reverse), (reverseOligo((it->second).forward))); //reversePrimer, rc ForwardPrimer rpairedPrimers[it->first] = tempPair; //cout << reverseOligo((it->second).reverse) << '\t' << (reverseOligo((it->second).forward)) << '\t' << primerNameVector[it->first] << endl; } for (map::iterator it = pairedBarcodes.begin(); it != pairedBarcodes.end(); it++) { oligosPair tempPair(reverseOligo((it->second).reverse), (reverseOligo((it->second).forward))); //reverseBarcode, rc ForwardBarcode rpairedBarcodes[it->first] = tempPair; //cout << reverseOligo((it->second).reverse) << '\t' << (reverseOligo((it->second).forward)) << '\t' << barcodeNameVector[it->first] << endl; } int index = rpairedBarcodes.size(); for (map::iterator it = barcodes.begin(); it != barcodes.end(); it++) { oligosPair tempPair("", reverseOligo((it->first))); //reverseBarcode, rc ForwardBarcode rpairedBarcodes[index] = tempPair; index++; //cout << reverseOligo((it->second).reverse) << '\t' << (reverseOligo((it->second).forward)) << '\t' << barcodeNameVector[it->first] << endl; } index = rpairedPrimers.size(); for (map::iterator it = primers.begin(); it != primers.end(); it++) { oligosPair tempPair("", reverseOligo((it->first))); //reverseBarcode, rc ForwardBarcode rpairedPrimers[index] = tempPair; index++; //cout << reverseOligo((it->second).reverse) << '\t' << 
(reverseOligo((it->second).forward)) << '\t' << primerNameVector[it->first] << endl; } rtrimOligos = new TrimOligos(pdiffs, bdiffs, 0, 0, rpairedPrimers, rpairedBarcodes, false); numBarcodes = rpairedBarcodes.size(); } while (moreSeqs) { int obsBDiffs = 0; int obsPDiffs = 0; if (m->control_pressed) { delete trimOligos; if (reorient) { delete rtrimOligos; } inFASTA.close(); trimFASTAFile.close(); scrapFASTAFile.close(); if ((createGroup) && (countfile == "")) { outGroupsFile.close(); } if(qFileName != "") { qFile.close(); scrapQualFile.close(); trimQualFile.close(); } if(nameFile != "") { scrapNameFile.close(); trimNameFile.close(); } if(countfile != "") { scrapCountFile.close(); trimCountFile.close(); } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } int success = 1; string trashCode = ""; string commentString = ""; int currentSeqsDiffs = 0; Sequence currSeq(inFASTA); m->gobble(inFASTA); //cout << currSeq.getName() << '\t' << currSeq.getUnaligned() << endl; Sequence savedSeq(currSeq.getName(), currSeq.getAligned()); QualityScores currQual; QualityScores savedQual; if(qFileName != ""){ currQual = QualityScores(qFile); m->gobble(qFile); savedQual.setName(currQual.getName()); savedQual.setScores(currQual.getScores()); //cout << currQual.getName() << endl; } string origSeq = currSeq.getUnaligned(); if (origSeq != "") { int barcodeIndex = 0; int primerIndex = 0; // cout << currSeq.getName() << '\t'; cout.flush(); if(numLinkers != 0){ success = trimOligos->stripLinker(currSeq, currQual); if(success > ldiffs) { trashCode += 'k'; } else{ currentSeqsDiffs += success; } } if(numBarcodes != 0){ vector results = trimOligos->stripBarcode(currSeq, currQual, barcodeIndex); if (pairedOligos) { success = results[0] + results[2]; commentString += "fbdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + trimOligos->getCodeValue(results[3], bdiffs) + ") "; } else 
{ success = results[0]; commentString += "bdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], bdiffs) + ") "; } if(success > bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } obsBDiffs = success; // cout << success << '\t'; cout.flush(); //cout << currSeq.getName() << '\t' << currSeq.getUnaligned() << endl; if(numSpacers != 0){ success = trimOligos->stripSpacer(currSeq, currQual); if(success > sdiffs) { trashCode += 's'; } else{ currentSeqsDiffs += success; } } if(numFPrimers != 0){ vector results = trimOligos->stripForward(currSeq, currQual, primerIndex, keepforward); if (pairedOligos) { success = results[0] + results[2]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + trimOligos->getCodeValue(results[3], pdiffs) + ") "; } else { success = results[0]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pdiffs) + ") "; } if(success > pdiffs) { trashCode += 'f'; } else{ currentSeqsDiffs += success; } } obsPDiffs = success; // cout << success << '\t'; cout.flush(); // cout << currentSeqsDiffs << endl; if(numRPrimers != 0){ vector results = trimOligos->stripReverse(currSeq, currQual); success = results[0]; commentString += "rpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pdiffs) + ") "; if(success > pdiffs) { trashCode += 'r'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > tdiffs) { trashCode += 't'; } if (reorient && (trashCode != "")) { //if you failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; string thiscommentString = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl; if(numBarcodes != 0){ vector results = rtrimOligos->stripBarcode(savedSeq, savedQual, thisBarcodeIndex); if (pairedOligos) { 
thisSuccess = results[0] + results[2]; thiscommentString += "fbdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], bdiffs) + ") "; } else { thisSuccess = results[0]; thiscommentString += "bdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], bdiffs) + ") "; } if(thisSuccess > bdiffs) { thisTrashCode += "b"; } else{ thisCurrentSeqsDiffs += thisSuccess; } } int revBDiffs = thisSuccess; // cout << thisSuccess << '\t'; cout.flush(); //cout << currSeq.getName() << '\t' << savedSeq.getUnaligned() << endl; if(numFPrimers != 0){ vector results = rtrimOligos->stripForward(savedSeq, savedQual, thisPrimerIndex, keepforward); if (pairedOligos) { thisSuccess = results[0] + results[2]; thiscommentString += "fpdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pdiffs) + ") "; } else { thisSuccess = results[0]; thiscommentString += "pdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pdiffs) + ") "; } if(thisSuccess > pdiffs) { thisTrashCode += "f"; } else{ thisCurrentSeqsDiffs += thisSuccess; } } int revPDiffs = thisSuccess; // cout << thisSuccess << '\t'; cout.flush(); if (thisCurrentSeqsDiffs > tdiffs) { thisTrashCode += 't'; } // cout << thisCurrentSeqsDiffs << endl; if (thisTrashCode == "") { obsPDiffs = revPDiffs; obsBDiffs = revBDiffs; trashCode = thisTrashCode; success = thisSuccess; currentSeqsDiffs = thisCurrentSeqsDiffs; barcodeIndex = thisBarcodeIndex; commentString = thiscommentString; primerIndex = thisPrimerIndex; savedSeq.reverseComplement(); currSeq.setAligned(savedSeq.getAligned()); if(qFileName != ""){ savedQual.flipQScores(); currQual.setScores(savedQual.getScores()); } }else { trashCode += "(" + thisTrashCode + ")"; } } if(keepFirst != 0){ success = keepFirstTrim(currSeq, 
currQual); } if(removeLast != 0){ success = removeLastTrim(currSeq, currQual); if(!success) { trashCode += 'l'; } } if(qFileName != ""){ int origLength = currSeq.getNumBases(); if(qThreshold != 0) { success = currQual.stripQualThreshold(currSeq, qThreshold); } else if(qAverage != 0) { success = currQual.cullQualAverage(currSeq, qAverage, logtransform); } else if(qRollAverage != 0) { success = currQual.stripQualRollingAverage(currSeq, qRollAverage, logtransform); } else if(qWindowAverage != 0){ success = currQual.stripQualWindowAverage(currSeq, qWindowStep, qWindowSize, qWindowAverage, logtransform); } else { success = 1; } //you don't want to trim, if it fails above then scrap it if ((!qtrim) && (origLength != currSeq.getNumBases())) { success = 0; } if(!success) { trashCode += 'q'; } } if(minLength > 0 || maxLength > 0){ success = cullLength(currSeq); if(!success) { trashCode += 'l'; } } if(maxHomoP > 0){ success = cullHomoP(currSeq); if(!success) { trashCode += 'h'; } } if(maxAmbig != -1){ success = cullAmbigs(currSeq); if(!success) { trashCode += 'n'; } } if(flip){ // should go last currSeq.reverseComplement(); if(qFileName != ""){ currQual.flipQScores(); } } if (m->debug) { m->mothurOut("[DEBUG]: " + currSeq.getName() + ", trashcode= " + trashCode); if (trashCode.length() != 0) { m->mothurOutEndLine(); } } string seqComment = currSeq.getComment(); currSeq.setComment("\t" + commentString + "\t" + seqComment); if(trashCode.length() == 0){ string thisGroup = ""; if (createGroup) { if(numBarcodes != 0){ thisGroup = barcodeNameVector[barcodeIndex]; if (numFPrimers != 0) { if (primerNameVector[primerIndex] != "") { if(thisGroup != "") { thisGroup += "." 
+ primerNameVector[primerIndex]; }else { thisGroup = primerNameVector[primerIndex]; } } } } } int pos = thisGroup.find("ignore"); if (pos == string::npos) { currSeq.setAligned(currSeq.getUnaligned()); currSeq.printSequence(trimFASTAFile); if(qFileName != ""){ currQual.printQScores(trimQualFile); } if(nameFile != ""){ map::iterator itName = nameMap.find(currSeq.getName()); if (itName != nameMap.end()) { trimNameFile << itName->first << '\t' << itName->second << endl; } else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); } } int numRedundants = 0; if (countfile != "") { map::iterator itCount = nameCount.find(currSeq.getName()); if (itCount != nameCount.end()) { trimCountFile << itCount->first << '\t' << itCount->second << endl; numRedundants = itCount->second-1; }else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your count file, please correct."); m->mothurOutEndLine(); } } if (createGroup) { if(numBarcodes != 0){ if (m->debug) { m->mothurOut(", group= " + thisGroup + "\n"); } if (countfile == "") { outGroupsFile << currSeq.getName() << '\t' << thisGroup << endl; } else { groupMap[currSeq.getName()] = thisGroup; } if (nameFile != "") { map::iterator itName = nameMap.find(currSeq.getName()); if (itName != nameMap.end()) { vector thisSeqsNames; m->splitAtChar(itName->second, thisSeqsNames, ','); numRedundants = thisSeqsNames.size()-1; //we already include ourselves below for (int k = 1; k < thisSeqsNames.size(); k++) { //start at 1 to skip self outGroupsFile << thisSeqsNames[k] << '\t' << thisGroup << endl; } }else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); } } map::iterator it = groupCounts.find(thisGroup); if (it == groupCounts.end()) { groupCounts[thisGroup] = 1 + numRedundants; } else { groupCounts[it->first] += (1 + numRedundants); } } } if(allFiles){ ofstream output; 
m->openOutputFileAppend(fastaFileNames[barcodeIndex][primerIndex], output);
							currSeq.printSequence(output);
							output.close();

							if(qFileName != ""){
								m->openOutputFileAppend(qualFileNames[barcodeIndex][primerIndex], output);
								currQual.printQScores(output);
								output.close();
							}

							if(nameFile != ""){
								map<string, string>::iterator itName = nameMap.find(currSeq.getName());
								if (itName != nameMap.end()) {
									m->openOutputFileAppend(nameFileNames[barcodeIndex][primerIndex], output);
									output << itName->first << '\t' << itName->second << endl;
									output.close();
								}else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); }
							}
						}
					}
				}
				else{
					if(nameFile != ""){ //needs to be before the currSeq name is changed
						map<string, string>::iterator itName = nameMap.find(currSeq.getName());
						if (itName != nameMap.end()) { scrapNameFile << itName->first << '\t' << itName->second << endl; }
						else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); }
					}
					if (countfile != "") {
						//scrapped sequences belong in the scrap count file, not the trim count file
						map<string, int>::iterator itCount = nameCount.find(currSeq.getName());
						if (itCount != nameCount.end()) {
							scrapCountFile << itCount->first << '\t' << itCount->second << endl;
						}else { m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your count file, please correct."); m->mothurOutEndLine(); }
					}

					currSeq.setName(currSeq.getName() + '|' + trashCode);
					currSeq.setUnaligned(origSeq);
					currSeq.setAligned(origSeq);
					currSeq.printSequence(scrapFASTAFile);
					if(qFileName != ""){
						currQual.printQScores(scrapQualFile);
					}
				}
				count++;
			}

			#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
				unsigned long long pos = inFASTA.tellg();
				if ((pos == -1) || (pos >= line.end)) { break; }
			#else
				if (inFASTA.eof()) { break; }
			#endif

			//report progress
			if((count) % 1000 == 0){ m->mothurOutJustToScreen(toString(count)+"\n"); }
		}
		//report progress
		if((count) % 1000 != 0){ m->mothurOutJustToScreen(toString(count)+"\n"); }

		delete 
trimOligos; if (reorient) { delete rtrimOligos; } inFASTA.close(); trimFASTAFile.close(); scrapFASTAFile.close(); if (createGroup) { outGroupsFile.close(); } if(qFileName != "") { qFile.close(); scrapQualFile.close(); trimQualFile.close(); } if(nameFile != "") { scrapNameFile.close(); trimNameFile.close(); } if(countfile != "") { scrapCountFile.close(); trimCountFile.close(); } return count; } catch(exception& e) { m->errorOut(e, "TrimSeqsCommand", "driverCreateTrim"); exit(1); } } /**************************************************************************************************/ int TrimSeqsCommand::createProcessesCreateTrim(string filename, string qFileName, string trimFASTAFileName, string scrapFASTAFileName, string trimQualFileName, string scrapQualFileName, string trimNameFileName, string scrapNameFileName, string trimCountFileName, string scrapCountFileName, string groupFile, vector > fastaFileNames, vector > qualFileNames, vector > nameFileNames) { try { int process = 1; int exitCommand = 1; processIDS.clear(); bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { int pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ vector > tempFASTAFileNames = fastaFileNames; vector > tempPrimerQualFileNames = qualFileNames; vector > tempNameFileNames = nameFileNames; if(allFiles){ ofstream temp; for(int i=0;iopenOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); if(qFileName != ""){ tempPrimerQualFileNames[i][j] += toString(getpid()) + ".temp"; m->openOutputFile(tempPrimerQualFileNames[i][j], temp); temp.close(); } if(nameFile != ""){ tempNameFileNames[i][j] += toString(getpid()) + ".temp"; m->openOutputFile(tempNameFileNames[i][j], temp); temp.close(); } } } } } driverCreateTrim(filename, 
qFileName, (trimFASTAFileName + toString(getpid()) + ".temp"), (scrapFASTAFileName + toString(getpid()) + ".temp"), (trimQualFileName + toString(getpid()) + ".temp"), (scrapQualFileName + toString(getpid()) + ".temp"), (trimNameFileName + toString(getpid()) + ".temp"), (scrapNameFileName + toString(getpid()) + ".temp"), (trimCountFileName + toString(getpid()) + ".temp"), (scrapCountFileName + toString(getpid()) + ".temp"), (groupFile + toString(getpid()) + ".temp"), tempFASTAFileNames, tempPrimerQualFileNames, tempNameFileNames, lines[process], qLines[process]); if (m->debug) { m->mothurOut("[DEBUG]: " + toString(lines[process].start) + '\t' + toString(qLines[process].start) + '\t' + toString(getpid()) + '\n'); } //pass groupCounts to parent if(createGroup){ ofstream out; string tempFile = filename + toString(getpid()) + ".num.temp"; m->openOutputFile(tempFile, out); out << groupCounts.size() << endl; for (map::iterator it = groupCounts.begin(); it != groupCounts.end(); it++) { out << it->first << '\t' << it->second << endl; } out << groupMap.size() << endl; for (map::iterator it = groupMap.begin(); it != groupMap.end(); it++) { out << it->first << '\t' << it->second << endl; } out.close(); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(trimFASTAFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(scrapFASTAFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(trimQualFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(scrapQualFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(trimNameFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(scrapNameFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(trimCountFileName + 
(toString(processIDS[i]) + ".temp")); m->mothurRemove(scrapCountFileName + (toString(processIDS[i]) + ".temp")); m->mothurRemove(groupFile + (toString(processIDS[i]) + ".temp")); if (createGroup) { string tempFile = filename + (toString(processIDS[i])) + ".num.temp"; m->mothurRemove(tempFile); } if(allFiles){ for(int i=0;imothurRemove(tempFile); if(qFileName != ""){ string tempFile = qualFileNames[i][j] +(toString(processIDS[i])) + ".temp"; m->mothurRemove(tempFile); } if(nameFile != ""){ string tempFile = nameFileNames[i][j] +(toString(processIDS[i])) + ".temp"; m->mothurRemove(tempFile); } } } } } } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); //redo file divide lines.clear(); setLines(fastaFile, qFileName); exitCommand = 1; processIDS.resize(0); process = 1; while (process != processors) { int pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ vector > tempFASTAFileNames = fastaFileNames; vector > tempPrimerQualFileNames = qualFileNames; vector > tempNameFileNames = nameFileNames; if(allFiles){ ofstream temp; for(int i=0;iopenOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); if(qFileName != ""){ tempPrimerQualFileNames[i][j] += toString(getpid()) + ".temp"; m->openOutputFile(tempPrimerQualFileNames[i][j], temp); temp.close(); } if(nameFile != ""){ tempNameFileNames[i][j] += toString(getpid()) + ".temp"; m->openOutputFile(tempNameFileNames[i][j], temp); temp.close(); } } } } } driverCreateTrim(filename, qFileName, (trimFASTAFileName + toString(getpid()) + ".temp"), (scrapFASTAFileName + toString(getpid()) + ".temp"), (trimQualFileName + 
toString(getpid()) + ".temp"), (scrapQualFileName + toString(getpid()) + ".temp"), (trimNameFileName + toString(getpid()) + ".temp"), (scrapNameFileName + toString(getpid()) + ".temp"), (trimCountFileName + toString(getpid()) + ".temp"), (scrapCountFileName + toString(getpid()) + ".temp"), (groupFile + toString(getpid()) + ".temp"), tempFASTAFileNames, tempPrimerQualFileNames, tempNameFileNames, lines[process], qLines[process]); if (m->debug) { m->mothurOut("[DEBUG]: " + toString(lines[process].start) + '\t' + toString(qLines[process].start) + '\t' + toString(getpid()) + '\n'); } //pass groupCounts to parent if(createGroup){ ofstream out; string tempFile = filename + toString(getpid()) + ".num.temp"; m->openOutputFile(tempFile, out); out << groupCounts.size() << endl; for (map::iterator it = groupCounts.begin(); it != groupCounts.end(); it++) { out << it->first << '\t' << it->second << endl; } out << groupMap.size() << endl; for (map::iterator it = groupMap.begin(); it != groupMap.end(); it++) { out << it->first << '\t' << it->second << endl; } out.close(); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } //parent do my part ofstream temp; m->openOutputFile(trimFASTAFileName, temp); temp.close(); m->openOutputFile(scrapFASTAFileName, temp); temp.close(); if(qFileName != ""){ m->openOutputFile(trimQualFileName, temp); temp.close(); m->openOutputFile(scrapQualFileName, temp); temp.close(); } if (nameFile != "") { m->openOutputFile(trimNameFileName, temp); temp.close(); m->openOutputFile(scrapNameFileName, temp); temp.close(); } if (countfile != "") { m->openOutputFile(trimCountFileName, temp); temp.close(); m->openOutputFile(scrapCountFileName, temp); temp.close(); } driverCreateTrim(filename, qFileName, trimFASTAFileName, scrapFASTAFileName, trimQualFileName, scrapQualFileName, trimNameFileName, scrapNameFileName, 
trimCountFileName, scrapCountFileName, groupFile, fastaFileNames, qualFileNames, nameFileNames, lines[0], qLines[0]); //force parent to wait until all the processes are done for (int i=0;i pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; //Create processor worker threads. for( int h=0; h > tempFASTAFileNames = fastaFileNames; vector > tempPrimerQualFileNames = qualFileNames; vector > tempNameFileNames = nameFileNames; if(allFiles){ ofstream temp; for(int i=0;iopenOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); if(qFileName != ""){ tempPrimerQualFileNames[i][j] += extension; m->openOutputFile(tempPrimerQualFileNames[i][j], temp); temp.close(); } if(nameFile != ""){ tempNameFileNames[i][j] += extension; m->openOutputFile(tempNameFileNames[i][j], temp); temp.close(); } } } } } trimData* tempTrim = new trimData(filename, qFileName, nameFile, countfile, (trimFASTAFileName+extension), (scrapFASTAFileName+extension), (trimQualFileName+extension), (scrapQualFileName+extension), (trimNameFileName+extension), (scrapNameFileName+extension), (trimCountFileName+extension), (scrapCountFileName+extension), (groupFile+extension), tempFASTAFileNames, tempPrimerQualFileNames, tempNameFileNames, lines[h].start, lines[h].end, qLines[h].start, qLines[h].end, m, pdiffs, bdiffs, ldiffs, sdiffs, tdiffs, primers, barcodes, revPrimer, linker, spacer, pairedBarcodes, pairedPrimers, pairedOligos, primerNameVector, barcodeNameVector, createGroup, allFiles, keepforward, keepFirst, removeLast, qWindowStep, qWindowSize, qWindowAverage, qtrim, qThreshold, qAverage, qRollAverage, logtransform, minLength, maxAmbig, maxHomoP, maxLength, flip, reorient, nameMap, nameCount); pDataArray.push_back(tempTrim); hThreadArray[h] = CreateThread(NULL, 0, MyTrimThreadFunction, pDataArray[h], 0, &dwThreadIdArray[h]); } //parent do my part ofstream temp; m->openOutputFile(trimFASTAFileName, temp); temp.close(); m->openOutputFile(scrapFASTAFileName, temp); temp.close(); 
if(qFileName != ""){ m->openOutputFile(trimQualFileName, temp); temp.close(); m->openOutputFile(scrapQualFileName, temp); temp.close(); } if (nameFile != "") { m->openOutputFile(trimNameFileName, temp); temp.close(); m->openOutputFile(scrapNameFileName, temp); temp.close(); } vector > tempFASTAFileNames = fastaFileNames; vector > tempPrimerQualFileNames = qualFileNames; vector > tempNameFileNames = nameFileNames; if(allFiles){ ofstream temp; string extension = toString(processors-1) + ".temp"; for(int i=0;iopenOutputFile(tempFASTAFileNames[i][j], temp); temp.close(); if(qFileName != ""){ tempPrimerQualFileNames[i][j] += extension; m->openOutputFile(tempPrimerQualFileNames[i][j], temp); temp.close(); } if(nameFile != ""){ tempNameFileNames[i][j] += extension; m->openOutputFile(tempNameFileNames[i][j], temp); temp.close(); } } } } } driverCreateTrim(filename, qFileName, (trimFASTAFileName + toString(processors-1) + ".temp"), (scrapFASTAFileName + toString(processors-1) + ".temp"), (trimQualFileName + toString(processors-1) + ".temp"), (scrapQualFileName + toString(processors-1) + ".temp"), (trimNameFileName + toString(processors-1) + ".temp"), (scrapNameFileName + toString(processors-1) + ".temp"), (trimCountFileName + toString(processors-1) + ".temp"), (scrapCountFileName + toString(processors-1) + ".temp"), (groupFile + toString(processors-1) + ".temp"), tempFASTAFileNames, tempPrimerQualFileNames, tempNameFileNames, lines[processors-1], qLines[processors-1]); processIDS.push_back(processors-1); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. for(int i=0; i < pDataArray.size(); i++){ if (pDataArray[i]->count != pDataArray[i]->lineEnd) { m->mothurOut("[ERROR]: process " + toString(i) + " only processed " + toString(pDataArray[i]->count) + " of " + toString(pDataArray[i]->lineEnd) + " sequences assigned to it, quitting. 
\n"); m->control_pressed = true; } for (map::iterator it = pDataArray[i]->groupCounts.begin(); it != pDataArray[i]->groupCounts.end(); it++) { map::iterator it2 = groupCounts.find(it->first); if (it2 == groupCounts.end()) { groupCounts[it->first] = it->second; } else { groupCounts[it->first] += it->second; } } for (map::iterator it = pDataArray[i]->groupMap.begin(); it != pDataArray[i]->groupMap.end(); it++) { map::iterator it2 = groupMap.find(it->first); if (it2 == groupMap.end()) { groupMap[it->first] = it->second; } else { m->mothurOut("[ERROR]: " + it->first + " is in your fasta file more than once. Sequence names must be unique. please correct.\n"); } } CloseHandle(hThreadArray[i]); delete pDataArray[i]; } #endif //append files for(int i=0;imothurOut("Appending files from process " + toString(processIDS[i])); m->mothurOutEndLine(); m->appendFiles((trimFASTAFileName + toString(processIDS[i]) + ".temp"), trimFASTAFileName); m->mothurRemove((trimFASTAFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((scrapFASTAFileName + toString(processIDS[i]) + ".temp"), scrapFASTAFileName); m->mothurRemove((scrapFASTAFileName + toString(processIDS[i]) + ".temp")); if(qFileName != ""){ m->appendFiles((trimQualFileName + toString(processIDS[i]) + ".temp"), trimQualFileName); m->mothurRemove((trimQualFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((scrapQualFileName + toString(processIDS[i]) + ".temp"), scrapQualFileName); m->mothurRemove((scrapQualFileName + toString(processIDS[i]) + ".temp")); } if(nameFile != ""){ m->appendFiles((trimNameFileName + toString(processIDS[i]) + ".temp"), trimNameFileName); m->mothurRemove((trimNameFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((scrapNameFileName + toString(processIDS[i]) + ".temp"), scrapNameFileName); m->mothurRemove((scrapNameFileName + toString(processIDS[i]) + ".temp")); } if(countfile != ""){ m->appendFiles((trimCountFileName + toString(processIDS[i]) + ".temp"), trimCountFileName); 
m->mothurRemove((trimCountFileName + toString(processIDS[i]) + ".temp")); m->appendFiles((scrapCountFileName + toString(processIDS[i]) + ".temp"), scrapCountFileName); m->mothurRemove((scrapCountFileName + toString(processIDS[i]) + ".temp")); } if((createGroup)&&(countfile == "")){ m->appendFiles((groupFile + toString(processIDS[i]) + ".temp"), groupFile); m->mothurRemove((groupFile + toString(processIDS[i]) + ".temp")); } if(allFiles){ for(int j=0;jappendFiles((fastaFileNames[j][k] + toString(processIDS[i]) + ".temp"), fastaFileNames[j][k]); m->mothurRemove((fastaFileNames[j][k] + toString(processIDS[i]) + ".temp")); if(qFileName != ""){ m->appendFiles((qualFileNames[j][k] + toString(processIDS[i]) + ".temp"), qualFileNames[j][k]); m->mothurRemove((qualFileNames[j][k] + toString(processIDS[i]) + ".temp")); } if(nameFile != ""){ m->appendFiles((nameFileNames[j][k] + toString(processIDS[i]) + ".temp"), nameFileNames[j][k]); m->mothurRemove((nameFileNames[j][k] + toString(processIDS[i]) + ".temp")); } } } } } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if(createGroup){ ifstream in; string tempFile = filename + toString(processIDS[i]) + ".num.temp"; m->openInputFile(tempFile, in); int tempNum; string group; in >> tempNum; m->gobble(in); if (tempNum != 0) { for (int i = 0; i < tempNum; i++) { int groupNum; in >> group >> groupNum; m->gobble(in); map::iterator it = groupCounts.find(group); if (it == groupCounts.end()) { groupCounts[group] = groupNum; } else { groupCounts[it->first] += groupNum; } } } in >> tempNum; m->gobble(in); if (tempNum != 0) { for (int i = 0; i < tempNum; i++) { string group, seqName; in >> seqName >> group; m->gobble(in); map::iterator it = groupMap.find(seqName); if (it == groupMap.end()) { groupMap[seqName] = group; } else { m->mothurOut("[ERROR]: " + seqName + " is in your fasta file more than once. Sequence names must be unique. 
please correct.\n");
                            }
                        }
                    }
                    in.close(); m->mothurRemove(tempFile);
                }
#endif
        }

        return exitCommand;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "createProcessesCreateTrim");
        exit(1);
    }
}
/**************************************************************************************************/
int TrimSeqsCommand::setLines(string filename, string qfilename) {
    try {
        vector<unsigned long long> fastaFilePos;
        vector<unsigned long long> qfileFilePos;

#if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix)
        //set file positions for fasta file
        fastaFilePos = m->divideFile(filename, processors);

        //get name of first sequence in each chunk
        map<string, int> firstSeqNames;
        for (int i = 0; i < (fastaFilePos.size()-1); i++) {
            ifstream in;
            m->openInputFile(filename, in);
            in.seekg(fastaFilePos[i]);

            //adjust start if null strings
            if (i == 0) { m->zapGremlins(in); m->gobble(in); }

            Sequence temp(in);
            firstSeqNames[temp.getName()] = i;

            in.close();
        }

        if(qfilename != "") {
            //search for filePos of each first name in the qfile and save in qfileFilePos
            ifstream inQual;
            m->openInputFile(qfilename, inQual);

            string input;
            while(!inQual.eof()){
                input = m->getline(inQual);

                if (input.length() != 0) {
                    if(input[0] == '>'){ //this is a sequence name line
                        istringstream nameStream(input);

                        string sname = "";
                        nameStream >> sname;
                        sname = sname.substr(1);

                        m->checkName(sname);

                        map<string, int>::iterator it = firstSeqNames.find(sname);

                        if(it != firstSeqNames.end()) { //this is the start of a new chunk
                            unsigned long long pos = inQual.tellg();
                            qfileFilePos.push_back(pos - input.length() - 1);
                            firstSeqNames.erase(it);
                        }
                    }
                }

                if (firstSeqNames.size() == 0) { break; }
            }
            inQual.close();

            if (firstSeqNames.size() != 0) {
                for (map<string, int>::iterator it = firstSeqNames.begin(); it != firstSeqNames.end(); it++) {
                    m->mothurOut(it->first + " is in your fasta file and not in your quality file, not using quality file."); m->mothurOutEndLine();
                }
                qFileName = "";
                return processors;
            }

            //get last file position of qfile
            FILE * pFile;
            unsigned long long size = 0; //stays 0 if the file cannot be opened

            //get num bytes in file
            pFile = fopen (qfilename.c_str(),"rb");
            if (pFile==NULL) perror ("Error opening file");
            else{
                fseek (pFile, 0, SEEK_END);
                size=ftell (pFile);
                fclose (pFile);
            }

            qfileFilePos.push_back(size);
        }

        for (int i = 0; i < (fastaFilePos.size()-1); i++) {
            if (m->debug) { m->mothurOut("[DEBUG]: " + toString(i) +'\t' + toString(fastaFilePos[i]) + '\t' + toString(fastaFilePos[i+1]) + '\n'); }
            lines.push_back(linePair(fastaFilePos[i], fastaFilePos[(i+1)]));
            if (qfilename != "") { qLines.push_back(linePair(qfileFilePos[i], qfileFilePos[(i+1)])); }
        }
        if(qfilename == "") { qLines = lines; } //files with duds

        return processors;

#else

        if (processors == 1) { //save time
            //fastaFilePos.push_back(0); qfileFilePos.push_back(0);
            //fastaFilePos.push_back(1000); qfileFilePos.push_back(1000);
            lines.push_back(linePair(0, 1000));
            if (qfilename != "") { qLines.push_back(linePair(0, 1000)); }
        }else{
            int numFastaSeqs = 0;
            fastaFilePos = m->setFilePosFasta(filename, numFastaSeqs);
            if (fastaFilePos.size() < processors) { processors = fastaFilePos.size(); }

            if (qfilename != "") {
                int numQualSeqs = 0;
                qfileFilePos = m->setFilePosFasta(qfilename, numQualSeqs);

                if (numFastaSeqs != numQualSeqs) {
                    m->mothurOut("[ERROR]: You have " + toString(numFastaSeqs) + " sequences in your fasta file, but " + toString(numQualSeqs) + " sequences in your quality file."); m->mothurOutEndLine(); m->control_pressed = true;
                }
            }

            //figure out how many sequences you have to process
            int numSeqsPerProcessor = numFastaSeqs / processors;
            for (int i = 0; i < processors; i++) {
                int startIndex = i * numSeqsPerProcessor;
                if(i == (processors - 1)){ numSeqsPerProcessor = numFastaSeqs - i * numSeqsPerProcessor; }
                lines.push_back(linePair(fastaFilePos[startIndex], numSeqsPerProcessor));
                if (qfilename != "") { qLines.push_back(linePair(qfileFilePos[startIndex], numSeqsPerProcessor)); }
            }
        }
        if(qfilename == "") { qLines = lines; } //files with duds
        return 1;

#endif
    }
    catch(exception& e) {
        m->errorOut(e,
"TrimSeqsCommand", "setLines"); exit(1); } } //*************************************************************************************************************** bool TrimSeqsCommand::getOligos(vector >& fastaFileNames, vector >& qualFileNames, vector >& nameFileNames){ try { ifstream inOligos; m->openInputFile(oligoFile, inOligos); ofstream test; string type, oligo, roligo, group; bool hasPrimer = false; bool hasPairedBarcodes = false; int indexPrimer = 0; int indexBarcode = 0; int indexPairedPrimer = 0; int indexPairedBarcode = 0; set uniquePrimers; set uniqueBarcodes; while(!inOligos.eof()){ inOligos >> type; if (m->debug) { m->mothurOut("[DEBUG]: reading type - " + type + ".\n"); } if(type[0] == '#'){ while (!inOligos.eof()) { char c = inOligos.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there m->gobble(inOligos); } else{ m->gobble(inOligos); //make type case insensitive for(int i=0;i> oligo; if (m->debug) { m->mothurOut("[DEBUG]: reading - " + oligo + ".\n"); } for(int i=0;i::iterator itPrime = primers.find(oligo); if (itPrime != primers.end()) { m->mothurOut("primer " + oligo + " is in your oligos file already."); m->mothurOutEndLine(); } if (m->debug) { if (group != "") { m->mothurOut("[DEBUG]: reading group " + group + ".\n"); }else{ m->mothurOut("[DEBUG]: no group for primer " + oligo + ".\n"); } } primers[oligo]=indexPrimer; indexPrimer++; primerNameVector.push_back(group); } else if (type == "PRIMER"){ m->gobble(inOligos); inOligos >> roligo; for(int i=0;idebug) { m->mothurOut("[DEBUG]: primer pair " + newPrimer.forward + " " + newPrimer.reverse + ", and group = " + group + ".\n"); } //check for repeat barcodes string tempPair = oligo+roligo; if (uniquePrimers.count(tempPair) != 0) { m->mothurOut("primer pair " + newPrimer.forward + " " + newPrimer.reverse + " is in your oligos file already."); m->mothurOutEndLine(); } else { uniquePrimers.insert(tempPair); } if (m->debug) { if (group != "") { m->mothurOut("[DEBUG]: 
reading group " + group + ".\n"); }else{ m->mothurOut("[DEBUG]: no group for primer pair " + newPrimer.forward + " " + newPrimer.reverse + ".\n"); } } pairedPrimers[indexPairedPrimer]=newPrimer; indexPairedPrimer++; primerNameVector.push_back(group); hasPrimer = true; } else if(type == "REVERSE"){ //Sequence oligoRC("reverse", oligo); //oligoRC.reverseComplement(); string oligoRC = reverseOligo(oligo); revPrimer.push_back(oligoRC); } else if(type == "BARCODE"){ inOligos >> group; //barcode lines can look like BARCODE atgcatgc groupName - for 454 seqs //or BARCODE atgcatgc atgcatgc groupName - for illumina data that has forward and reverse info string temp = ""; while (!inOligos.eof()) { char c = inOligos.get(); if (c == 10 || c == 13 || c == -1){ break; } else if (c == 32 || c == 9){;} //space or tab else { temp += c; } } //then this is illumina data with 4 columns if (temp != "") { hasPairedBarcodes = true; string reverseBarcode = group; //reverseOligo(group); //reverse barcode group = temp; for(int i=0;idebug) { m->mothurOut("[DEBUG]: barcode pair " + newPair.forward + " " + newPair.reverse + ", and group = " + group + ".\n"); } //check for repeat barcodes string tempPair = oligo+reverseBarcode; if (uniqueBarcodes.count(tempPair) != 0) { m->mothurOut("barcode pair " + newPair.forward + " " + newPair.reverse + " is in your oligos file already, disregarding."); m->mothurOutEndLine(); } else { uniqueBarcodes.insert(tempPair); } pairedBarcodes[indexPairedBarcode]=newPair; indexPairedBarcode++; barcodeNameVector.push_back(group); }else { //check for repeat barcodes map::iterator itBar = barcodes.find(oligo); if (itBar != barcodes.end()) { m->mothurOut("barcode " + oligo + " is in your oligos file already."); m->mothurOutEndLine(); } barcodes[oligo]=indexBarcode; indexBarcode++; barcodeNameVector.push_back(group); } }else if(type == "LINKER"){ linker.push_back(oligo); }else if(type == "SPACER"){ spacer.push_back(oligo); } else{ m->mothurOut("[WARNING]: " + type + " is 
not recognized as a valid type. Choices are forward, reverse, and barcode. Ignoring " + oligo + "."); m->mothurOutEndLine(); } } m->gobble(inOligos); } inOligos.close(); if (hasPairedBarcodes || hasPrimer) { pairedOligos = true; if ((primers.size() != 0) || (barcodes.size() != 0) || (linker.size() != 0) || (spacer.size() != 0) || (revPrimer.size() != 0)) { m->control_pressed = true; m->mothurOut("[ERROR]: cannot mix paired primers and barcodes with non paired or linkers and spacers, quitting."); m->mothurOutEndLine(); return 0; } } if(barcodeNameVector.size() == 0 && primerNameVector[0] == ""){ allFiles = 0; } //add in potential combos if(barcodeNameVector.size() == 0){ barcodes[""] = 0; barcodeNameVector.push_back(""); } if(primerNameVector.size() == 0){ primers[""] = 0; primerNameVector.push_back(""); } fastaFileNames.resize(barcodeNameVector.size()); for(int i=0;i uniqueNames; //used to cleanup outputFileNames if (pairedOligos) { for(map::iterator itBar = pairedBarcodes.begin();itBar != pairedBarcodes.end();itBar++){ for(map::iterator itPrimer = pairedPrimers.begin();itPrimer != pairedPrimers.end(); itPrimer++){ string primerName = primerNameVector[itPrimer->first]; string barcodeName = barcodeNameVector[itBar->first]; if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else { string comboGroupName = ""; string fastaFileName = ""; string qualFileName = ""; string nameFileName = ""; string countFileName = ""; if(primerName == ""){ comboGroupName = barcodeNameVector[itBar->first]; } else{ if(barcodeName == ""){ comboGroupName = primerNameVector[itPrimer->first]; } else{ comboGroupName = barcodeNameVector[itBar->first] + "." 
+ primerNameVector[itPrimer->first]; } } ofstream temp; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFile)); variables["[tag]"] = comboGroupName; fastaFileName = getOutputFileName("fasta", variables); if (uniqueNames.count(fastaFileName) == 0) { outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); uniqueNames.insert(fastaFileName); } fastaFileNames[itBar->first][itPrimer->first] = fastaFileName; m->openOutputFile(fastaFileName, temp); temp.close(); if(qFileName != ""){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(qFileName)); qualFileName = getOutputFileName("qfile", variables); if (uniqueNames.count(qualFileName) == 0) { outputNames.push_back(qualFileName); outputTypes["qfile"].push_back(qualFileName); } qualFileNames[itBar->first][itPrimer->first] = qualFileName; m->openOutputFile(qualFileName, temp); temp.close(); } if(nameFile != ""){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(nameFile)); nameFileName = getOutputFileName("name", variables); if (uniqueNames.count(nameFileName) == 0) { outputNames.push_back(nameFileName); outputTypes["name"].push_back(nameFileName); } nameFileNames[itBar->first][itPrimer->first] = nameFileName; m->openOutputFile(nameFileName, temp); temp.close(); } } } } }else { for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = primerNameVector[itPrimer->second]; string barcodeName = barcodeNameVector[itBar->second]; if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else { string comboGroupName = ""; string fastaFileName = ""; string qualFileName = ""; string nameFileName = ""; string countFileName = ""; if(primerName == ""){ comboGroupName = barcodeNameVector[itBar->second]; } else{ if(barcodeName == ""){ comboGroupName = primerNameVector[itPrimer->second]; } 
else{ comboGroupName = barcodeNameVector[itBar->second] + "." + primerNameVector[itPrimer->second]; } } ofstream temp; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(fastaFile)); variables["[tag]"] = comboGroupName; fastaFileName = getOutputFileName("fasta", variables); if (uniqueNames.count(fastaFileName) == 0) { outputNames.push_back(fastaFileName); outputTypes["fasta"].push_back(fastaFileName); uniqueNames.insert(fastaFileName); } fastaFileNames[itBar->second][itPrimer->second] = fastaFileName; m->openOutputFile(fastaFileName, temp); temp.close(); if(qFileName != ""){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(qFileName)); qualFileName = getOutputFileName("qfile", variables); if (uniqueNames.count(qualFileName) == 0) { outputNames.push_back(qualFileName); outputTypes["qfile"].push_back(qualFileName); } qualFileNames[itBar->second][itPrimer->second] = qualFileName; m->openOutputFile(qualFileName, temp); temp.close(); } if(nameFile != ""){ variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(nameFile)); nameFileName = getOutputFileName("name", variables); if (uniqueNames.count(nameFileName) == 0) { outputNames.push_back(nameFileName); outputTypes["name"].push_back(nameFileName); } nameFileNames[itBar->second][itPrimer->second] = nameFileName; m->openOutputFile(nameFileName, temp); temp.close(); } } } } } } numFPrimers = primers.size(); if (pairedOligos) { numFPrimers = pairedPrimers.size(); } numRPrimers = revPrimer.size(); numLinkers = linker.size(); numSpacers = spacer.size(); bool allBlank = true; for (int i = 0; i < barcodeNameVector.size(); i++) { if (barcodeNameVector[i] != "") { allBlank = false; break; } } for (int i = 0; i < primerNameVector.size(); i++) { if (primerNameVector[i] != "") { allBlank = false; break; } } if (allBlank) { m->mothurOut("[WARNING]: your oligos file does not contain any group names. 
mothur will not create a groupfile.");
            m->mothurOutEndLine();
            allFiles = false;
            return false;
        }

        return true;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "getOligos");
        exit(1);
    }
}
//***************************************************************************************************************
bool TrimSeqsCommand::keepFirstTrim(Sequence& sequence, QualityScores& qscores){
    try {
        bool success = 1;
        if(qscores.getName() != ""){
            qscores.trimQScores(-1, keepFirst);
        }

        // sequence.printSequence(cout);cout << endl;
        sequence.trim(keepFirst);
        // sequence.printSequence(cout);cout << endl << endl;;

        return success;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "keepFirstTrim");
        exit(1);
    }
}
//***************************************************************************************************************
bool TrimSeqsCommand::removeLastTrim(Sequence& sequence, QualityScores& qscores){
    try {
        bool success = 0;

        int length = sequence.getNumBases() - removeLast;

        if(length > 0){
            if(qscores.getName() != ""){
                qscores.trimQScores(-1, length);
            }
            sequence.trim(length);
            success = 1;
        }
        else{ success = 0; }

        return success;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "removeLastTrim");
        exit(1);
    }
}
//***************************************************************************************************************
bool TrimSeqsCommand::cullLength(Sequence& seq){
    try {
        int length = seq.getNumBases();
        bool success = 0; //guilty until proven innocent

        //maxLength == 0 means no upper limit on length
        if(length >= minLength && maxLength == 0) { success = 1; }
        else if(length >= minLength && length <= maxLength) { success = 1; }
        else { success = 0; }

        return success;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "cullLength");
        exit(1);
    }
}
//***************************************************************************************************************
bool TrimSeqsCommand::cullHomoP(Sequence& seq){
    try {
        int longHomoP = seq.getLongHomoPolymer();
        bool success = 0; //guilty until proven innocent

        if(longHomoP <= maxHomoP){ success = 1; }
        else { success = 0; }

        return success;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "cullHomoP");
        exit(1);
    }
}
//********************************************************************/
string TrimSeqsCommand::reverseOligo(string oligo){
    try {
        string reverse = "";

        //walk the oligo from its 3' end, appending the IUPAC complement of each base
        for(int i=oligo.length()-1;i>=0;i--){
            if(oligo[i] == 'A') { reverse += 'T'; }
            else if(oligo[i] == 'T'){ reverse += 'A'; }
            else if(oligo[i] == 'U'){ reverse += 'A'; }

            else if(oligo[i] == 'G'){ reverse += 'C'; }
            else if(oligo[i] == 'C'){ reverse += 'G'; }

            else if(oligo[i] == 'R'){ reverse += 'Y'; }
            else if(oligo[i] == 'Y'){ reverse += 'R'; }

            else if(oligo[i] == 'M'){ reverse += 'K'; }
            else if(oligo[i] == 'K'){ reverse += 'M'; }

            else if(oligo[i] == 'W'){ reverse += 'W'; }
            else if(oligo[i] == 'S'){ reverse += 'S'; }

            else if(oligo[i] == 'B'){ reverse += 'V'; }
            else if(oligo[i] == 'V'){ reverse += 'B'; }

            else if(oligo[i] == 'D'){ reverse += 'H'; }
            else if(oligo[i] == 'H'){ reverse += 'D'; }

            else { reverse += 'N'; }
        }

        return reverse;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "reverseOligo");
        exit(1);
    }
}
//***************************************************************************************************************
bool TrimSeqsCommand::cullAmbigs(Sequence& seq){
    try {
        int numNs = seq.getAmbigBases();
        bool success = 0; //guilty until proven innocent

        if(numNs <= maxAmbig) { success = 1; }
        else { success = 0; }

        return success;
    }
    catch(exception& e) {
        m->errorOut(e, "TrimSeqsCommand", "cullAmbigs");
        exit(1);
    }
}
//***************************************************************************************************************
mothur-1.36.1/source/commands/trimseqscommand.h000066400000000000000000000765741255543666200216410ustar00rootroot00000000000000
#ifndef TRIMSEQSCOMMAND_H
#define TRIMSEQSCOMMAND_H

/*
 *  trimseqscommand.h
 *  Mothur
 *
 *  Created by Pat Schloss on 6/6/09.
 *  Copyright 2009 Patrick D. Schloss. All rights reserved.
* */ #include "mothur.h" #include "command.hpp" #include "sequence.hpp" #include "qualityscores.h" #include "trimoligos.h" #include "counttable.h" class TrimSeqsCommand : public Command { public: TrimSeqsCommand(string); TrimSeqsCommand(); ~TrimSeqsCommand(){} vector setParameters(); string getCommandName() { return "trim.seqs"; } string getCommandCategory() { return "Sequence Processing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Trim.seqs"; } string getDescription() { return "provides the preprocessing features needed to screen and sort pyrosequences"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: bool getOligos(vector >&, vector >&, vector >&); bool keepFirstTrim(Sequence&, QualityScores&); bool removeLastTrim(Sequence&, QualityScores&); bool cullLength(Sequence&); bool cullHomoP(Sequence&); bool cullAmbigs(Sequence&); string reverseOligo(string); bool abort, createGroup; string fastaFile, oligoFile, qFileName, groupfile, nameFile, countfile, outputDir; bool flip, allFiles, qtrim, keepforward, pairedOligos, reorient, logtransform; int numFPrimers, numRPrimers, numLinkers, numSpacers, maxAmbig, maxHomoP, minLength, maxLength, processors, tdiffs, bdiffs, pdiffs, ldiffs, sdiffs, comboStarts; int qWindowSize, qWindowStep, keepFirst, removeLast; double qRollAverage, qThreshold, qWindowAverage, qAverage; vector revPrimer, outputNames; set filesToRemove; map pairedBarcodes; map pairedPrimers; map barcodes; vector groupVector; map primers; vector linker; vector spacer; map combos; map groupToIndex; vector primerNameVector; //needed here? vector barcodeNameVector; //needed here? 
map groupCounts; map nameMap; map nameCount; //for countfile name -> repCount map groupMap; //for countfile name -> group vector processIDS; //processid vector lines; vector qLines; int driverCreateTrim(string, string, string, string, string, string, string, string, string, string, string, vector >, vector >, vector >, linePair, linePair); int createProcessesCreateTrim(string, string, string, string, string, string, string, string, string, string, string, vector >, vector >, vector >); int setLines(string, string); }; /**************************************************************************************************/ //custom data structure for threads to use. // This is passed by void pointer so it can be any data type // that can be passed using a single void pointer (LPVOID). struct trimData { unsigned long long start, end; MothurOut* m; string filename, qFileName, trimFileName, scrapFileName, trimQFileName, scrapQFileName, trimNFileName, scrapNFileName, trimCFileName, scrapCFileName, groupFileName, nameFile, countfile; vector > fastaFileNames; vector > qualFileNames; vector > nameFileNames; unsigned long long lineStart, lineEnd, qlineStart, qlineEnd; bool flip, allFiles, qtrim, keepforward, createGroup, pairedOligos, reorient, logtransform; int numFPrimers, numRPrimers, numLinkers, numSpacers, maxAmbig, maxHomoP, minLength, maxLength, tdiffs, bdiffs, pdiffs, ldiffs, sdiffs; int qWindowSize, qWindowStep, keepFirst, removeLast, count; double qRollAverage, qThreshold, qWindowAverage, qAverage; vector revPrimer; map barcodes; map primers; map nameCount; vector linker; vector spacer; map combos; vector primerNameVector; vector barcodeNameVector; map groupCounts; map nameMap; map groupMap; map pairedBarcodes; map pairedPrimers; trimData(){} trimData(string fn, string qn, string nf, string cf, string tn, string sn, string tqn, string sqn, string tnn, string snn, string tcn, string scn,string gn, vector > ffn, vector > qfn, vector > nfn, unsigned long long lstart, 
unsigned long long lend, unsigned long long qstart, unsigned long long qend, MothurOut* mout, int pd, int bd, int ld, int sd, int td, map pri, map bar, vector revP, vector li, vector spa, map pbr, map ppr, bool po, vector priNameVector, vector barNameVector, bool cGroup, bool aFiles, bool keepF, int keepfi, int removeL, int WindowStep, int WindowSize, int WindowAverage, bool trim, double Threshold, double Average, double RollAverage, bool lt, int minL, int maxA, int maxH, int maxL, bool fli, bool reo, map nm, map ncount) { filename = fn; qFileName = qn; nameFile = nf; countfile = cf; trimFileName = tn; scrapFileName = sn; trimQFileName = tqn; scrapQFileName = sqn; trimNFileName = tnn; scrapNFileName = snn; trimCFileName = tcn; scrapCFileName = scn; groupFileName = gn; fastaFileNames = ffn; qualFileNames = qfn; nameFileNames = nfn; lineStart = lstart; lineEnd = lend; qlineStart = qstart; qlineEnd = qend; m = mout; nameCount = ncount; pdiffs = pd; bdiffs = bd; ldiffs = ld; sdiffs = sd; tdiffs = td; barcodes = bar; pairedPrimers = ppr; pairedBarcodes = pbr; pairedOligos = po; primers = pri; numFPrimers = primers.size(); revPrimer = revP; numRPrimers = revPrimer.size(); linker = li; numLinkers = linker.size(); spacer = spa; numSpacers = spacer.size(); primerNameVector = priNameVector; barcodeNameVector = barNameVector; createGroup = cGroup; allFiles = aFiles; keepforward = keepF; keepFirst = keepfi; removeLast = removeL; qWindowStep = WindowStep; qWindowSize = WindowSize; qWindowAverage = WindowAverage; qtrim = trim; qThreshold = Threshold; qAverage = Average; qRollAverage = RollAverage; logtransform = lt; minLength = minL; maxAmbig = maxA; maxHomoP = maxH; maxLength = maxL; flip = fli; reorient = reo; nameMap = nm; count = 0; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI 
MyTrimThreadFunction(LPVOID lpParam){ trimData* pDataArray; pDataArray = (trimData*)lpParam; try { ofstream trimFASTAFile; pDataArray->m->openOutputFile(pDataArray->trimFileName, trimFASTAFile); ofstream scrapFASTAFile; pDataArray->m->openOutputFile(pDataArray->scrapFileName, scrapFASTAFile); ofstream trimQualFile; ofstream scrapQualFile; if(pDataArray->qFileName != ""){ pDataArray->m->openOutputFile(pDataArray->trimQFileName, trimQualFile); pDataArray->m->openOutputFile(pDataArray->scrapQFileName, scrapQualFile); } ofstream trimNameFile; ofstream scrapNameFile; if(pDataArray->nameFile != ""){ pDataArray->m->openOutputFile(pDataArray->trimNFileName, trimNameFile); pDataArray->m->openOutputFile(pDataArray->scrapNFileName, scrapNameFile); } ofstream outGroupsFile; if ((pDataArray->createGroup) && (pDataArray->countfile == "")){ pDataArray->m->openOutputFile(pDataArray->groupFileName, outGroupsFile); } if(pDataArray->allFiles){ for (int i = 0; i < pDataArray->fastaFileNames.size(); i++) { //clears old file for (int j = 0; j < pDataArray->fastaFileNames[i].size(); j++) { //clears old file if (pDataArray->fastaFileNames[i][j] != "") { ofstream temp; pDataArray->m->openOutputFile(pDataArray->fastaFileNames[i][j], temp); temp.close(); if(pDataArray->qFileName != ""){ pDataArray->m->openOutputFile(pDataArray->qualFileNames[i][j], temp); temp.close(); } if(pDataArray->nameFile != ""){ pDataArray->m->openOutputFile(pDataArray->nameFileNames[i][j], temp); temp.close(); } } } } } ofstream trimCountFile; ofstream scrapCountFile; if(pDataArray->countfile != ""){ pDataArray->m->openOutputFile(pDataArray->trimCFileName, trimCountFile); pDataArray->m->openOutputFile(pDataArray->scrapCFileName, scrapCountFile); if ((pDataArray->lineStart == 0) || (pDataArray->lineStart == 1)) { trimCountFile << "Representative_Sequence\ttotal" << endl; scrapCountFile << "Representative_Sequence\ttotal" << endl; } } ifstream inFASTA; pDataArray->m->openInputFile(pDataArray->filename, inFASTA); if 
((pDataArray->lineStart == 0) || (pDataArray->lineStart == 1)) { inFASTA.seekg(0); }else { //this accounts for the difference in line endings. inFASTA.seekg(pDataArray->lineStart-1); pDataArray->m->gobble(inFASTA); } ifstream qFile; if(pDataArray->qFileName != "") { pDataArray->m->openInputFile(pDataArray->qFileName, qFile); if ((pDataArray->qlineStart == 0) || (pDataArray->qlineStart == 1)) { qFile.seekg(0); }else { //this accounts for the difference in line endings. qFile.seekg(pDataArray->qlineStart-1); pDataArray->m->gobble(qFile); } } TrimOligos* trimOligos = NULL; int numBarcodes = pDataArray->barcodes.size(); if (pDataArray->pairedOligos) { trimOligos = new TrimOligos(pDataArray->pdiffs, pDataArray->bdiffs, 0, 0, pDataArray->pairedPrimers, pDataArray->pairedBarcodes, false); numBarcodes = pDataArray->pairedBarcodes.size(); pDataArray->numFPrimers = pDataArray->pairedPrimers.size(); } else { trimOligos = new TrimOligos(pDataArray->pdiffs, pDataArray->bdiffs, pDataArray->ldiffs, pDataArray->sdiffs, pDataArray->primers, pDataArray->barcodes, pDataArray->revPrimer, pDataArray->linker, pDataArray->spacer); } TrimOligos* rtrimOligos = NULL; if (pDataArray->reorient) { //create reoriented primer and barcode pairs map rpairedPrimers, rpairedBarcodes; for (map::iterator it = pDataArray->pairedPrimers.begin(); it != pDataArray->pairedPrimers.end(); it++) { oligosPair tempPair(trimOligos->reverseOligo((it->second).reverse), (trimOligos->reverseOligo((it->second).forward))); //reversePrimer, rc ForwardPrimer rpairedPrimers[it->first] = tempPair; } for (map::iterator it = pDataArray->pairedBarcodes.begin(); it != pDataArray->pairedBarcodes.end(); it++) { oligosPair tempPair(trimOligos->reverseOligo((it->second).reverse), (trimOligos->reverseOligo((it->second).forward))); //reverseBarcode, rc ForwardBarcode rpairedBarcodes[it->first] = tempPair; } int index = rpairedBarcodes.size(); for (map::iterator it = pDataArray->barcodes.begin(); it != pDataArray->barcodes.end(); 
it++) { oligosPair tempPair("", trimOligos->reverseOligo((it->first))); //reverseBarcode, rc ForwardBarcode rpairedBarcodes[index] = tempPair; index++; } index = rpairedPrimers.size(); for (map::iterator it = pDataArray->primers.begin(); it != pDataArray->primers.end(); it++) { oligosPair tempPair("", trimOligos->reverseOligo((it->first))); //reverseBarcode, rc ForwardBarcode rpairedPrimers[index] = tempPair; index++; } rtrimOligos = new TrimOligos(pDataArray->pdiffs, pDataArray->bdiffs, 0, 0, rpairedPrimers, rpairedBarcodes, false); numBarcodes = rpairedBarcodes.size(); } pDataArray->count = 0; for(int i = 0; i < pDataArray->lineEnd; i++){ //end is the number of sequences to process if (pDataArray->m->control_pressed) { delete trimOligos; if (pDataArray->reorient) { delete rtrimOligos; } inFASTA.close(); trimFASTAFile.close(); scrapFASTAFile.close(); if ((pDataArray->createGroup) && (pDataArray->countfile == "")) { outGroupsFile.close(); } if(pDataArray->qFileName != "") { qFile.close(); scrapQualFile.close(); trimQualFile.close(); } if(pDataArray->nameFile != "") { scrapNameFile.close(); trimNameFile.close(); } if(pDataArray->countfile != "") { scrapCountFile.close(); trimCountFile.close(); } if(pDataArray->qFileName != ""){ qFile.close(); } return 0; } int success = 1; string trashCode = ""; string commentString = ""; int currentSeqsDiffs = 0; Sequence currSeq(inFASTA); pDataArray->m->gobble(inFASTA); Sequence savedSeq(currSeq.getName(), currSeq.getAligned()); QualityScores currQual; QualityScores savedQual; if(pDataArray->qFileName != ""){ currQual = QualityScores(qFile); pDataArray->m->gobble(qFile); savedQual.setName(currQual.getName()); savedQual.setScores(currQual.getScores()); } string origSeq = currSeq.getUnaligned(); if (origSeq != "") { pDataArray->count++; int barcodeIndex = 0; int primerIndex = 0; if(pDataArray->numLinkers != 0){ success = trimOligos->stripLinker(currSeq, currQual); if(success > pDataArray->ldiffs) { trashCode += 'k'; } else{ 
currentSeqsDiffs += success; } } if(numBarcodes != 0){ vector results = trimOligos->stripBarcode(currSeq, currQual, barcodeIndex); if (pDataArray->pairedOligos) { success = results[0] + results[2]; commentString += "fbdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pDataArray->bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + trimOligos->getCodeValue(results[3], pDataArray->bdiffs) + ") "; } else { success = results[0]; commentString += "bdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pDataArray->bdiffs) + ") "; } if(success > pDataArray->bdiffs) { trashCode += 'b'; } else{ currentSeqsDiffs += success; } } if(pDataArray->numSpacers != 0){ success = trimOligos->stripSpacer(currSeq, currQual); if(success > pDataArray->sdiffs) { trashCode += 's'; } else{ currentSeqsDiffs += success; } } if(pDataArray->numFPrimers != 0){ vector results = trimOligos->stripForward(currSeq, currQual, primerIndex, pDataArray->keepforward); if (pDataArray->pairedOligos) { success = results[0] + results[2]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pDataArray->pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + trimOligos->getCodeValue(results[3], pDataArray->pdiffs) + ") "; } else { success = results[0]; commentString += "fpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pDataArray->pdiffs) + ") "; } if(success > pDataArray->pdiffs) { trashCode += 'f'; } else{ currentSeqsDiffs += success; } } if (currentSeqsDiffs > pDataArray->tdiffs) { trashCode += 't'; } if(pDataArray->numRPrimers != 0){ vector results = trimOligos->stripReverse(currSeq, currQual); success = results[0]; commentString += "rpdiffs=" + toString(results[0]) + "(" + trimOligos->getCodeValue(results[1], pDataArray->pdiffs) + ") "; if(success > pDataArray->pdiffs) { trashCode += 'r'; } else{ currentSeqsDiffs += success; } } if (pDataArray->reorient && (trashCode != "")) { //if you 
failed and want to check the reverse int thisSuccess = 0; string thisTrashCode = ""; string thiscommentString = ""; int thisCurrentSeqsDiffs = 0; int thisBarcodeIndex = 0; int thisPrimerIndex = 0; if(numBarcodes != 0){ vector results = rtrimOligos->stripBarcode(savedSeq, savedQual, thisBarcodeIndex); if (pDataArray->pairedOligos) { thisSuccess = results[0] + results[2]; thiscommentString += "fbdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->bdiffs) + "), rbdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pDataArray->bdiffs) + ") "; } else { thisSuccess = results[0]; thiscommentString += "bdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->bdiffs) + ") "; } if(thisSuccess > pDataArray->bdiffs) { thisTrashCode += 'b'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if(pDataArray->numFPrimers != 0){ vector results = rtrimOligos->stripForward(savedSeq, savedQual, thisPrimerIndex, pDataArray->keepforward); if (pDataArray->pairedOligos) { thisSuccess = results[0] + results[2]; thiscommentString += "fpdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->pdiffs) + "), rpdiffs=" + toString(results[2]) + "(" + rtrimOligos->getCodeValue(results[3], pDataArray->pdiffs) + ") "; } else { thisSuccess = results[0]; thiscommentString += "pdiffs=" + toString(results[0]) + "(" + rtrimOligos->getCodeValue(results[1], pDataArray->pdiffs) + ") "; } if(thisSuccess > pDataArray->pdiffs) { thisTrashCode += 'f'; } else{ thisCurrentSeqsDiffs += thisSuccess; } } if (thisCurrentSeqsDiffs > pDataArray->tdiffs) { thisTrashCode += 't'; } if (thisTrashCode == "") { trashCode = thisTrashCode; success = thisSuccess; commentString = thiscommentString; currentSeqsDiffs = thisCurrentSeqsDiffs; barcodeIndex = thisBarcodeIndex; primerIndex = thisPrimerIndex; savedSeq.reverseComplement(); currSeq.setAligned(savedSeq.getAligned()); if(pDataArray->qFileName != ""){ 
savedQual.flipQScores(); currQual.setScores(savedQual.getScores()); } }else { trashCode += "(" + thisTrashCode + ")"; } } if(pDataArray->keepFirst != 0){ //success = keepFirstTrim(currSeq, currQual); success = 1; if(currQual.getName() != ""){ currQual.trimQScores(-1, pDataArray->keepFirst); } currSeq.trim(pDataArray->keepFirst); } if(pDataArray->removeLast != 0){ //success = removeLastTrim(currSeq, currQual); success = 0; int length = currSeq.getNumBases() - pDataArray->removeLast; if(length > 0){ if(currQual.getName() != ""){ currQual.trimQScores(-1, length); } currSeq.trim(length); success = 1; } else{ success = 0; } if(!success) { trashCode += 'l'; } } if(pDataArray->qFileName != ""){ int origLength = currSeq.getNumBases(); if(pDataArray->qThreshold != 0) { success = currQual.stripQualThreshold(currSeq, pDataArray->qThreshold); } else if(pDataArray->qAverage != 0) { success = currQual.cullQualAverage(currSeq, pDataArray->qAverage, pDataArray->logtransform); } else if(pDataArray->qRollAverage != 0) { success = currQual.stripQualRollingAverage(currSeq, pDataArray->qRollAverage, pDataArray->logtransform); } else if(pDataArray->qWindowAverage != 0){ success = currQual.stripQualWindowAverage(currSeq, pDataArray->qWindowStep, pDataArray->qWindowSize, pDataArray->qWindowAverage, pDataArray->logtransform); } else { success = 1; } //you don't want to trim, if it fails above then scrap it if ((!pDataArray->qtrim) && (origLength != currSeq.getNumBases())) { success = 0; } if(!success) { trashCode += 'q'; } } if(pDataArray->minLength > 0 || pDataArray->maxLength > 0){ //success = cullLength(currSeq); int length = currSeq.getNumBases(); success = 0; //guilty until proven innocent if(length >= pDataArray->minLength && pDataArray->maxLength == 0) { success = 1; } else if(length >= pDataArray->minLength && length <= pDataArray->maxLength) { success = 1; } else { success = 0; } if(!success) { trashCode += 'l'; } } if(pDataArray->maxHomoP > 0){ //success = cullHomoP(currSeq); int 
longHomoP = currSeq.getLongHomoPolymer(); success = 0; //guilty until proven innocent if(longHomoP <= pDataArray->maxHomoP){ success = 1; } else { success = 0; } if(!success) { trashCode += 'h'; } } if(pDataArray->maxAmbig != -1){ //success = cullAmbigs(currSeq); int numNs = currSeq.getAmbigBases(); success = 0; //guilty until proven innocent if(numNs <= pDataArray->maxAmbig) { success = 1; } else { success = 0; } if(!success) { trashCode += 'n'; } } if(pDataArray->flip){ // should go last currSeq.reverseComplement(); if(pDataArray->qFileName != ""){ currQual.flipQScores(); } } string seqComment = currSeq.getComment(); currSeq.setComment("\t" + commentString + "\t" + seqComment); if(trashCode.length() == 0){ string thisGroup = ""; if (pDataArray->createGroup) { if(numBarcodes != 0){ thisGroup = pDataArray->barcodeNameVector[barcodeIndex]; if (pDataArray->numFPrimers != 0) { if (pDataArray->primerNameVector[primerIndex] != "") { if(thisGroup != "") { thisGroup += "." + pDataArray->primerNameVector[primerIndex]; }else { thisGroup = pDataArray->primerNameVector[primerIndex]; } } } } } int pos = thisGroup.find("ignore"); if (pos == string::npos) { currSeq.setAligned(currSeq.getUnaligned()); currSeq.printSequence(trimFASTAFile); if(pDataArray->qFileName != ""){ currQual.printQScores(trimQualFile); } if(pDataArray->nameFile != ""){ map::iterator itName = pDataArray->nameMap.find(currSeq.getName()); if (itName != pDataArray->nameMap.end()) { trimNameFile << itName->first << '\t' << itName->second << endl; } else { pDataArray->m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); } } int numRedundants = 0; if (pDataArray->countfile != "") { map::iterator itCount = pDataArray->nameCount.find(currSeq.getName()); if (itCount != pDataArray->nameCount.end()) { trimCountFile << itCount->first << '\t' << itCount->second << endl; numRedundants = itCount->second-1; }else { pDataArray->m->mothurOut("[ERROR]: " 
+ currSeq.getName() + " is not in your count file, please correct."); pDataArray->m->mothurOutEndLine(); } } if (pDataArray->createGroup) { if(numBarcodes != 0){ if (pDataArray->countfile == "") { outGroupsFile << currSeq.getName() << '\t' << thisGroup << endl; } else { pDataArray->groupMap[currSeq.getName()] = thisGroup; } if (pDataArray->nameFile != "") { map::iterator itName = pDataArray->nameMap.find(currSeq.getName()); if (itName != pDataArray->nameMap.end()) { vector thisSeqsNames; pDataArray->m->splitAtChar(itName->second, thisSeqsNames, ','); numRedundants = thisSeqsNames.size()-1; //we already include ourselves below for (int k = 1; k < thisSeqsNames.size(); k++) { //start at 1 to skip self outGroupsFile << thisSeqsNames[k] << '\t' << thisGroup << endl; } }else { pDataArray->m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); } } map::iterator it = pDataArray->groupCounts.find(thisGroup); if (it == pDataArray->groupCounts.end()) { pDataArray->groupCounts[thisGroup] = 1 + numRedundants; } else { pDataArray->groupCounts[it->first] += (1 + numRedundants); } } } if(pDataArray->allFiles){ ofstream output; pDataArray->m->openOutputFileAppend(pDataArray->fastaFileNames[barcodeIndex][primerIndex], output); currSeq.printSequence(output); output.close(); if(pDataArray->qFileName != ""){ pDataArray->m->openOutputFileAppend(pDataArray->qualFileNames[barcodeIndex][primerIndex], output); currQual.printQScores(output); output.close(); } if(pDataArray->nameFile != ""){ map::iterator itName = pDataArray->nameMap.find(currSeq.getName()); if (itName != pDataArray->nameMap.end()) { pDataArray->m->openOutputFileAppend(pDataArray->nameFileNames[barcodeIndex][primerIndex], output); output << itName->first << '\t' << itName->second << endl; output.close(); }else { pDataArray->m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); } } 
                        }
                    }
                }
                else{
                    if(pDataArray->nameFile != ""){ //needs to be before the currSeq name is changed
                        map<string, string>::iterator itName = pDataArray->nameMap.find(currSeq.getName());
                        if (itName != pDataArray->nameMap.end()) { scrapNameFile << itName->first << '\t' << itName->second << endl; }
                        else { pDataArray->m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your namefile, please correct."); pDataArray->m->mothurOutEndLine(); }
                    }
                    if (pDataArray->countfile != "") {
                        map<string, int>::iterator itCount = pDataArray->nameCount.find(currSeq.getName());
                        if (itCount != pDataArray->nameCount.end()) {
                            //scrapped sequences belong in the scrap count file, mirroring scrapNameFile above
                            scrapCountFile << itCount->first << '\t' << itCount->second << endl;
                        }else { pDataArray->m->mothurOut("[ERROR]: " + currSeq.getName() + " is not in your count file, please correct."); pDataArray->m->mothurOutEndLine(); }
                    }

                    currSeq.setName(currSeq.getName() + '|' + trashCode);
                    currSeq.setUnaligned(origSeq);
                    currSeq.setAligned(origSeq);
                    currSeq.printSequence(scrapFASTAFile);
                    if(pDataArray->qFileName != ""){
                        currQual.printQScores(scrapQualFile);
                    }
                }
            }

            //report progress
            if((pDataArray->count) % 1000 == 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); }
        }
        //report progress
        if((pDataArray->count) % 1000 != 0){ pDataArray->m->mothurOutJustToScreen(toString(pDataArray->count)+"\n"); }

        if (pDataArray->reorient) { delete rtrimOligos; }
        delete trimOligos;
        inFASTA.close();
        trimFASTAFile.close();
        scrapFASTAFile.close();
        if (pDataArray->createGroup) { outGroupsFile.close(); }
        if(pDataArray->qFileName != "") { qFile.close(); scrapQualFile.close(); trimQualFile.close(); }
        if(pDataArray->nameFile != "") { scrapNameFile.close(); trimNameFile.close(); }

        return 0;
    }
    catch(exception& e) {
        pDataArray->m->errorOut(e, "TrimSeqsCommand", "MyTrimThreadFunction");
        exit(1);
    }
}
#endif

/**************************************************************************************************/

#endif
mothur-1.36.1/source/commands/unifracunweightedcommand.cpp000066400000000000000000001155571255543666200240330ustar00rootroot00000000000000/* * unifracunweightedcommand.cpp * Mothur * * Created by Sarah Westcott on 2/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "unifracunweightedcommand.h" #include "treereader.h" #include "subsample.h" #include "consensus.h" //********************************************************************************************************************** vector UnifracUnweightedCommand::setParameters(){ try { CommandParameter ptree("tree", "InputTypes", "", "", "none", "none", "none","unweighted-uwsummary",false,true,true); parameters.push_back(ptree); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); parameters.push_back(pprocessors); CommandParameter prandom("random", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(prandom); CommandParameter pdistance("distance", "Multiple", "column-lt-square-phylip", "column", "", "", "","phylip-column",false,false); parameters.push_back(pdistance); CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter pconsensus("consensus", "Boolean", "", "F", "", "", "","tree",false,false); 
        parameters.push_back(pconsensus);
        CommandParameter proot("root", "Boolean", "F", "", "", "", "","",false,false); parameters.push_back(proot);
        CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed);
        CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir);
        CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir);

        vector<string> myArray;
        for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); }
        return myArray;
    }
    catch(exception& e) {
        m->errorOut(e, "UnifracUnweightedCommand", "setParameters");
        exit(1);
    }
}
//**********************************************************************************************************************
string UnifracUnweightedCommand::getHelpString(){
    try {
        string helpString = "";
        helpString += "The unifrac.unweighted command parameters are tree, group, name, count, groups, iters, distance, processors, root, random, subsample and consensus. The tree parameter is required unless you have a valid current tree file.\n";
        helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 1 valid group.\n";
        helpString += "The group names are separated by dashes. The iters parameter allows you to specify how many random trees you would like compared to your tree.\n";
        helpString += "The distance parameter allows you to create a distance file from the results. The default is false. You may set distance to lt, square or column.\n";
        helpString += "The random parameter allows you to shut off the comparison to random trees. The default is false, meaning don't compare your trees with randomly generated trees.\n";
        helpString += "The root parameter allows you to include the entire root in your calculations. 
The default is false, meaning stop at the root for this comparison instead of the root of the entire tree.\n";
        helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n";
        helpString += "The unifrac.unweighted command should be in the following format: unifrac.unweighted(groups=yourGroups, iters=yourIters).\n";
        helpString += "The subsample parameter allows you to enter the size per group of the sample or you can set subsample=T and mothur will use the size of your smallest group. The subsample parameter may only be used with a group file.\n";
        helpString += "The consensus parameter allows you to indicate you would like trees built from distance matrices created with the results of the subsampling, as well as a consensus tree built from these trees. Default=F.\n";
        helpString += "Example unifrac.unweighted(groups=A-B-C, iters=500).\n";
        helpString += "The default value for groups is all the groups in your groupfile, and iters is 1000.\n";
        helpString += "The unifrac.unweighted command outputs two files: .unweighted and .uwsummary. Their descriptions are in the manual.\n";
        helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string UnifracUnweightedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "unweighted") { pattern = "[filename],unweighted-[filename],[tag],unweighted"; } else if (type == "uwsummary") { pattern = "[filename],uwsummary"; } else if (type == "phylip") { pattern = "[filename],[tag],[tag2],dist"; } else if (type == "column") { pattern = "[filename],[tag],[tag2],dist"; } else if (type == "tree") { pattern = "[filename],[tag],[tag2],tre"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** UnifracUnweightedCommand::UnifracUnweightedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["unweighted"] = tempOutNames; outputTypes["uwsummary"] = tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; outputTypes["tree"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "UnifracUnweightedCommand"); exit(1); } } /***********************************************************/ UnifracUnweightedCommand::UnifracUnweightedCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters = parser.getParameters(); 
map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["unweighted"] = tempOutNames; outputTypes["uwsummary"] = tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; outputTypes["tree"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { abort = true; } else if (treefile == "not found") { //if there is a current design file, use it treefile = m->getTreeFile(); if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current tree file and the tree parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setTreeFile(treefile); } //check for required parameters groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(treefile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } itersString = validParameter.validFile(parameters, "iters", false); if (itersString == "not found") { itersString = "1000"; } m->mothurConvert(itersString, iters); string temp = validParameter.validFile(parameters, "distance", false); if (temp == "not found") { phylip = false; outputForm = ""; } else{ if (temp=="phylip") { temp = "lt"; } if ((temp == "lt") || (temp == "column") || (temp == "square")) { phylip = true; outputForm = temp; } else { m->mothurOut("Options for distance are: lt, square, or column. Using lt."); m->mothurOutEndLine(); phylip = true; outputForm = "lt"; } } temp = validParameter.validFile(parameters, "random", false); if (temp == "not found") { temp = "f"; } random = m->isTrue(temp); temp = validParameter.validFile(parameters, "root", false); if (temp == "not found") { temp = "F"; } includeRoot = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (!subsample) { subsampleIters = 0; } else { subsampleIters = iters; } temp = validParameter.validFile(parameters, "consensus", false); if (temp == "not found") { temp = "F"; } consensus = m->isTrue(temp); if (subsample && random) { m->mothurOut("[ERROR]: random must be false, if subsample=t.\n"); abort=true; } if (countfile == "") { if (subsample && (groupfile == "")) { m->mothurOut("[ERROR]: if subsample=t, a group file must be provided.\n"); abort=true; } } 
else { CountTable testCt; if ((!testCt.testGroups(countfile)) && (subsample)) { m->mothurOut("[ERROR]: if subsample=t, a count file with group info must be provided.\n"); abort=true; } } if (subsample && (!phylip)) { phylip=true; outputForm = "lt"; } if (consensus && (!subsample)) { m->mothurOut("[ERROR]: you cannot use consensus without subsample.\n"); abort=true; } if (!random) { iters = 0; } //turn off random calcs //if user selects distance = true and no groups it won't calc the pairwise if ((phylip) && (Groups.size() == 0)) { groups = "all"; m->splitAtDash(groups, Groups); m->setGroups(Groups); } if (countfile=="") { if (namefile == "") { vector files; files.push_back(treefile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "UnifracUnweightedCommand"); exit(1); } } /***********************************************************/ int UnifracUnweightedCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } m->setTreeFile(treefile); TreeReader* reader; if (countfile == "") { reader = new TreeReader(treefile, groupfile, namefile); } else { reader = new TreeReader(treefile, countfile); } T = reader->getTrees(); ct = T[0]->getCountTable(); delete reader; map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); sumFile = getOutputFileName("uwsummary",variables); outputNames.push_back(sumFile); outputTypes["uwsummary"].push_back(sumFile); m->openOutputFile(sumFile, outSum); SharedUtil util; Groups = m->getGroups(); vector namesGroups = ct->getNamesOfGroups(); util.setGroups(Groups, namesGroups, allGroups, numGroups, "unweighted"); //sets the groups the user wants to analyze Unweighted unweighted(includeRoot); int start = time(NULL); //set or check size if (subsample) { //user has not set size, set size = smallest samples size if (subsampleSize == -1) { vector temp; temp.push_back(Groups[0]); subsampleSize = ct->getGroupCount(Groups[0]); //num in 
first group for (int i = 1; i < Groups.size(); i++) { int thisSize = ct->getGroupCount(Groups[i]); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } m->mothurOut("\nSetting subsample size to " + toString(subsampleSize) + ".\n\n"); }else { //eliminate any too small groups vector newGroups = Groups; Groups.clear(); for (int i = 0; i < newGroups.size(); i++) { int thisSize = ct->getGroupCount(newGroups[i]); if (thisSize >= subsampleSize) { Groups.push_back(newGroups[i]); } else { m->mothurOut("You have selected a size that is larger than "+newGroups[i]+" number of sequences, removing "+newGroups[i]+".\n"); } } m->setGroups(Groups); } } util.getCombos(groupComb, Groups, numComp); m->setGroups(Groups); if (numGroups == 1) { numComp++; groupComb.push_back(allGroups); } if (numComp < processors) { processors = numComp; } if (consensus && (numComp < 2)) { m->mothurOut("consensus can only be used with numComparisions greater than 1, setting consensus=f.\n"); consensus=false; } outSum << "Tree#" << '\t' << "Groups" << '\t' << "UWScore" <<'\t'; m->mothurOut("Tree#\tGroups\tUWScore\t"); if (random) { outSum << "UWSig"; m->mothurOut("UWSig"); } outSum << endl; m->mothurOutEndLine(); //get pscores for users trees for (int i = 0; i < T.size(); i++) { if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; }outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } counter = 0; if (random) { variables["[filename]"] = outputDir + m->getSimpleName(treefile); variables["[tag]"] = toString(i+1); string unFileName = getOutputFileName("unweighted", variables); output = new ColumnFile(unFileName, itersString); outputNames.push_back(unFileName); outputTypes["unweighted"].push_back(unFileName); } //get unweighted for users tree rscoreFreq.resize(numComp); rCumul.resize(numComp); utreeScores.resize(numComp); UWScoreSig.resize(numComp); vector userData; userData.resize(numComp,0); //weighted score 
info for user tree. data[0] = weightedscore AB, data[1] = weightedscore AC... userData = unweighted.getValues(T[i], processors, outputDir); //userData[0] = unweightedscore if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; }if (random) { delete output; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }return 0; } //output scores for each combination for(int k = 0; k < numComp; k++) { //saves users score utreeScores[k].push_back(userData[k]); //add users score to validscores validScores[userData[k]] = userData[k]; if (!random) { UWScoreSig[k].push_back(0.0); } } if (random) { runRandomCalcs(T[i], userData); } if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; }if (random) { delete output; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } int startSubsample = time(NULL); //subsample loop vector< vector > calcDistsTotals; //each iter, each groupCombos dists. this will be used to make .dist files for (int thisIter = 0; thisIter < subsampleIters; thisIter++) { //subsampleIters=0, if subsample=f. if (m->control_pressed) { break; } //copy to preserve old one - would do this in subsample but memory cleanup becomes messy. 
CountTable* newCt = new CountTable(); //uses method of setting groups to doNotIncludeMe int sampleTime = 0; if (m->debug) { sampleTime = time(NULL); } SubSample sample; Tree* subSampleTree = sample.getSample(T[i], ct, newCt, subsampleSize); if (m->debug) { m->mothurOut("[DEBUG]: iter " + toString(thisIter) + " took " + toString(time(NULL) - sampleTime) + " seconds to sample tree.\n"); } //call new weighted function vector iterData; iterData.resize(numComp,0); Unweighted thisUnweighted(includeRoot); iterData = thisUnweighted.getValues(subSampleTree, processors, outputDir); //userData[0] = weightedscore //save data to make ave dist, std dist calcDistsTotals.push_back(iterData); delete newCt; delete subSampleTree; if((thisIter+1) % 100 == 0){ m->mothurOutJustToScreen(toString(thisIter+1)+"\n"); } } if (subsample) { m->mothurOut("It took " + toString(time(NULL) - startSubsample) + " secs to run the subsampling."); m->mothurOutEndLine(); } if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; }if (random) { delete output; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (subsample) { getAverageSTDMatrices(calcDistsTotals, i); } if (consensus) { getConsensusTrees(calcDistsTotals, i); } //print output files printUWSummaryFile(i); if (random) { printUnweightedFile(); delete output; } if (phylip) { createPhylipFile(i); } rscoreFreq.clear(); rCumul.clear(); validScores.clear(); utreeScores.clear(); UWScoreSig.clear(); } outSum.close(); delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to run unifrac.unweighted."); m->mothurOutEndLine(); //set phylip file as new current phylipfile string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if 
((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setPhylipFile(current); } } //set column file as new current columnfile itTypes = outputTypes.find("column"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setColumnFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "execute"); exit(1); } } /**************************************************************************************************/ int UnifracUnweightedCommand::getAverageSTDMatrices(vector< vector<double> >& dists, int treeNum) { try { //we need to find the average distance and standard deviation for each groups distance //finds sum vector<double> averages = m->getAverages(dists); //find standard deviation vector<double> stdDev = m->getStandardDeviation(dists, averages); //make matrix with scores in it vector< vector<double> > avedists; //avedists.resize(m->getNumGroups()); for (int i = 0; i < m->getNumGroups(); i++) { vector<double> temp; for (int j = 0; j < m->getNumGroups(); j++) { temp.push_back(0.0); } avedists.push_back(temp); } //make matrix with scores in it vector< vector<double> > stddists; //stddists.resize(m->getNumGroups()); for (int i = 0; i < m->getNumGroups(); i++) { vector<double> temp; for (int j = 0; j < m->getNumGroups(); j++) { temp.push_back(0.0); } //stddists[i].resize(m->getNumGroups(), 0.0); stddists.push_back(temp); } if (m->debug) { m->mothurOut("[DEBUG]: about to fill matrix.\n"); } //flip it so you can print it int count = 0; for (int r=0; r<m->getNumGroups(); r++) { for (int l = 0; l < r; l++) { avedists[r][l] = averages[count]; avedists[l][r] = averages[count]; stddists[r][l] = stdDev[count]; stddists[l][r] = stdDev[count]; count++; } } if (m->debug) { m->mothurOut("[DEBUG]: done filling matrix.\n"); } map<string, string> 
variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); variables["[tag]"] = toString(treeNum+1); variables["[tag2]"] = "unweighted.ave"; string aveFileName = getOutputFileName("phylip",variables); if (outputForm != "column") { outputNames.push_back(aveFileName); outputTypes["phylip"].push_back(aveFileName); } else { outputNames.push_back(aveFileName); outputTypes["column"].push_back(aveFileName); } ofstream out; m->openOutputFile(aveFileName, out); variables["[tag2]"] = "unweighted.std"; string stdFileName = getOutputFileName("phylip",variables); if (outputForm != "column") { outputNames.push_back(stdFileName); outputTypes["phylip"].push_back(stdFileName); } else { outputNames.push_back(stdFileName); outputTypes["column"].push_back(stdFileName); } ofstream outStd; m->openOutputFile(stdFileName, outStd); if ((outputForm == "lt") || (outputForm == "square")) { //output numSeqs out << m->getNumGroups() << endl; outStd << m->getNumGroups() << endl; } //output to file for (int r=0; r<m->getNumGroups(); r++) { //output name string name = (m->getGroups())[r]; if (name.length() < 10) { //pad with spaces to make compatible while (name.length() < 10) { name += " "; } } if (outputForm == "lt") { out << name; outStd << name; //output distances for (int l = 0; l < r; l++) { out << '\t' << avedists[r][l]; outStd << '\t' << stddists[r][l];} out << endl; outStd << endl; }else if (outputForm == "square") { out << name; outStd << name; //output distances for (int l = 0; l < m->getNumGroups(); l++) { out << '\t' << avedists[r][l]; outStd << '\t' << stddists[r][l]; } out << endl; outStd << endl; }else{ //output distances for (int l = 0; l < r; l++) { string otherName = (m->getGroups())[l]; if (otherName.length() < 10) { //pad with spaces to make compatible while (otherName.length() < 10) { otherName += " "; } } out << name << '\t' << otherName << '\t' << avedists[r][l] << endl; outStd << name << '\t' << otherName << '\t' << stddists[r][l] << endl; } } 
} out.close(); outStd.close(); return 0; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "getAverageSTDMatrices"); exit(1); } } /**************************************************************************************************/ int UnifracUnweightedCommand::getConsensusTrees(vector< vector >& dists, int treeNum) { try { //used in tree constructor m->runParse = false; //create treemap class from groupmap for tree class to use CountTable newCt; set nameMap; map groupMap; set gps; for (int i = 0; i < m->getGroups().size(); i++) { nameMap.insert(m->getGroups()[i]); gps.insert(m->getGroups()[i]); groupMap[m->getGroups()[i]] = m->getGroups()[i]; } newCt.createTable(nameMap, groupMap, gps); //clear old tree names if any m->Treenames.clear(); //fills globaldatas tree names m->Treenames = m->getGroups(); vector newTrees = buildTrees(dists, treeNum, newCt); //also creates .all.tre file containing the trees created if (m->control_pressed) { return 0; } Consensus con; Tree* conTree = con.getTree(newTrees); //create a new filename map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); variables["[tag]"] = toString(treeNum+1); variables["[tag2]"] = "unweighted.cons"; string conFile = getOutputFileName("tree",variables); outputNames.push_back(conFile); outputTypes["tree"].push_back(conFile); ofstream outTree; m->openOutputFile(conFile, outTree); if (conTree != NULL) { conTree->print(outTree, "boot"); delete conTree; } outTree.close(); return 0; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "getConsensusTrees"); exit(1); } } /**************************************************************************************************/ vector UnifracUnweightedCommand::buildTrees(vector< vector >& dists, int treeNum, CountTable& myct) { try { vector trees; //create a new filename map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); variables["[tag]"] = 
toString(treeNum+1); variables["[tag2]"] = "unweighted.all"; string outputFile = getOutputFileName("tree",variables); outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile); ofstream outAll; m->openOutputFile(outputFile, outAll); for (int i = 0; i < dists.size(); i++) { //dists[0] are the dists for the first subsampled tree. if (m->control_pressed) { break; } //make matrix with scores in it vector< vector<double> > sims; sims.resize(m->getNumGroups()); for (int j = 0; j < m->getNumGroups(); j++) { sims[j].resize(m->getNumGroups(), 0.0); } int count = 0; for (int r=0; r<m->getNumGroups(); r++) { for (int l = 0; l < r; l++) { double sim = -(dists[i][count]-1.0); sims[r][l] = sim; sims[l][r] = sim; count++; } } //create tree Tree* tempTree = new Tree(&myct, sims); tempTree->assembleTree(); trees.push_back(tempTree); //print tree tempTree->print(outAll); } outAll.close(); if (m->control_pressed) { for (int i = 0; i < trees.size(); i++) { delete trees[i]; trees[i] = NULL; } m->mothurRemove(outputFile); } return trees; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "buildTrees"); exit(1); } } /**************************************************************************************************/ int UnifracUnweightedCommand::runRandomCalcs(Tree* thisTree, vector<double> usersScores) { try { vector<double> randomData; randomData.resize(numComp,0); //unweighted score info for random trees. data[0] = unweightedscore AB, data[1] = unweightedscore AC... 
Unweighted unweighted(includeRoot); //get unweighted scores for random trees - if random is false iters = 0 for (int j = 0; j < iters; j++) { //we need a different getValues because when we swap the labels we only want to swap those in each pairwise comparison randomData = unweighted.getValues(thisTree, "", "", processors, outputDir); if (m->control_pressed) { return 0; } for(int k = 0; k < numComp; k++) { //add trees unweighted score to map of scores map::iterator it = rscoreFreq[k].find(randomData[k]); if (it != rscoreFreq[k].end()) {//already have that score rscoreFreq[k][randomData[k]]++; }else{//first time we have seen this score rscoreFreq[k][randomData[k]] = 1; } //add randoms score to validscores validScores[randomData[k]] = randomData[k]; } } for(int a = 0; a < numComp; a++) { float rcumul = 1.0000; //this loop fills the cumulative maps and put 0.0000 in the score freq map to make it easier to print. for (map::iterator it = validScores.begin(); it != validScores.end(); it++) { //make rscoreFreq map and rCumul map::iterator it2 = rscoreFreq[a].find(it->first); rCumul[a][it->first] = rcumul; //get percentage of random trees with that info if (it2 != rscoreFreq[a].end()) { rscoreFreq[a][it->first] /= iters; rcumul-= it2->second; } else { rscoreFreq[a][it->first] = 0.0000; } //no random trees with that score } UWScoreSig[a].push_back(rCumul[a][usersScores[a]]); } return 0; } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "runRandomCalcs"); exit(1); } } /***********************************************************/ void UnifracUnweightedCommand::printUnweightedFile() { try { vector data; vector tags; tags.push_back("Score"); tags.push_back("RandFreq"); tags.push_back("RandCumul"); for(int a = 0; a < numComp; a++) { output->initFile(groupComb[a], tags); //print each line for (map::iterator it = validScores.begin(); it != validScores.end(); it++) { data.push_back(it->first); data.push_back(rscoreFreq[a][it->first]); 
data.push_back(rCumul[a][it->first]); output->output(data); data.clear(); } output->resetFile(); } } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "printUnweightedFile"); exit(1); } } /***********************************************************/ void UnifracUnweightedCommand::printUWSummaryFile(int i) { try { //format output outSum.setf(ios::fixed, ios::floatfield); outSum.setf(ios::showpoint); //print each line for(int a = 0; a < numComp; a++) { outSum << i+1 << '\t'; m->mothurOut(toString(i+1) + "\t"); if (random) { if (UWScoreSig[a][0] > (1/(float)iters)) { outSum << setprecision(6) << groupComb[a] << '\t' << utreeScores[a][0] << '\t' << setprecision(itersString.length()) << UWScoreSig[a][0] << endl; cout << setprecision(6) << groupComb[a] << '\t' << utreeScores[a][0] << '\t' << setprecision(itersString.length()) << UWScoreSig[a][0] << endl; m->mothurOutJustToLog(groupComb[a] + "\t" + toString(utreeScores[a][0]) + "\t" + toString(UWScoreSig[a][0])+ "\n"); }else { outSum << setprecision(6) << groupComb[a] << '\t' << utreeScores[a][0] << '\t' << setprecision(itersString.length()) << "<" << (1/float(iters)) << endl; cout << setprecision(6) << groupComb[a] << '\t' << utreeScores[a][0] << '\t' << setprecision(itersString.length()) << "<" << (1/float(iters)) << endl; m->mothurOutJustToLog(groupComb[a] + "\t" + toString(utreeScores[a][0]) + "\t<" + toString((1/float(iters))) + "\n"); } }else{ outSum << setprecision(6) << groupComb[a] << '\t' << utreeScores[a][0] << endl; cout << setprecision(6) << groupComb[a] << '\t' << utreeScores[a][0] << endl; m->mothurOutJustToLog(groupComb[a] + "\t" + toString(utreeScores[a][0]) + "\n"); } } } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "printUWSummaryFile"); exit(1); } } /***********************************************************/ void UnifracUnweightedCommand::createPhylipFile(int i) { try { string phylipFileName; map variables; variables["[filename]"] = outputDir + 
m->getSimpleName(treefile); variables["[tag]"] = toString(i+1); if ((outputForm == "lt") || (outputForm == "square")) { variables["[tag2]"] = "unweighted.phylip"; phylipFileName = getOutputFileName("phylip",variables); outputNames.push_back(phylipFileName); outputTypes["phylip"].push_back(phylipFileName); }else { //column variables["[tag2]"] = "unweighted.column"; phylipFileName = getOutputFileName("column",variables); outputNames.push_back(phylipFileName); outputTypes["column"].push_back(phylipFileName); } ofstream out; m->openOutputFile(phylipFileName, out); if ((outputForm == "lt") || (outputForm == "square")) { //output numSeqs out << m->getNumGroups() << endl; } //make matrix with scores in it vector< vector<float> > dists; dists.resize(m->getNumGroups()); for (int i = 0; i < m->getNumGroups(); i++) { dists[i].resize(m->getNumGroups(), 0.0); } //flip it so you can print it int count = 0; for (int r=0; r<m->getNumGroups(); r++) { for (int l = 0; l < r; l++) { dists[r][l] = utreeScores[count][0]; dists[l][r] = utreeScores[count][0]; count++; } } //output to file for (int r=0; r<m->getNumGroups(); r++) { //output name string name = (m->getGroups())[r]; if (name.length() < 10) { //pad with spaces to make compatible while (name.length() < 10) { name += " "; } } if (outputForm == "lt") { out << name; //output distances for (int l = 0; l < r; l++) { out << '\t' << dists[r][l]; } out << endl; }else if (outputForm == "square") { out << name; //output distances for (int l = 0; l < m->getNumGroups(); l++) { out << '\t' << dists[r][l]; } out << endl; }else{ //output distances for (int l = 0; l < r; l++) { string otherName = (m->getGroups())[l]; if (otherName.length() < 10) { //pad with spaces to make compatible while (otherName.length() < 10) { otherName += " "; } } out << name << '\t' << otherName << '\t' << dists[r][l] << endl; } } } out.close(); } catch(exception& e) { m->errorOut(e, "UnifracUnweightedCommand", "createPhylipFile"); exit(1); } } 
/***********************************************************/ mothur-1.36.1/source/commands/unifracunweightedcommand.h000066400000000000000000000052101255543666200234600ustar00rootroot00000000000000#ifndef UNIFRACUNWEIGHTEDCOMMAND_H #define UNIFRACUNWEIGHTEDCOMMAND_H /* * unifracunweightedcommand.h * Mothur * * Created by Sarah Westcott on 2/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "unweighted.h" #include "counttable.h" #include "sharedutilities.h" #include "fileoutput.h" #include "readtree.h" class UnifracUnweightedCommand : public Command { public: UnifracUnweightedCommand(string); UnifracUnweightedCommand(); ~UnifracUnweightedCommand() {} vector<string> setParameters(); string getCommandName() { return "unifrac.unweighted"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Lozupone C, Knight R (2005). UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71: 8228-35. \nhttp://www.mothur.org/wiki/Unifrac.unweighted"; } string getDescription() { return "generic test that describes whether two or more communities have the same structure"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: FileOutput* output; vector<Tree*> T; //user trees CountTable* ct; string sumFile, allGroups; vector<string> groupComb; // AB, AC, BC... int iters, numGroups, numComp, counter, processors, subsampleSize, subsampleIters; vector< vector<float> > utreeScores; //scores for users trees for each comb. vector< vector<float> > UWScoreSig; //tree score significance when compared to random trees - fraction of random trees with that score or higher. map<float, float> validScores; //map contains scores from user and random trees vector< map<float, float> > rscoreFreq; //map of score to fraction of random trees with that score -vector entry for each combination. vector< map<float, float> > rCumul; //map of score to fraction of random trees with that score or higher -vector entry for each combination. 
bool abort, phylip, random, includeRoot, consensus, subsample; string groups, itersString, outputDir, outputForm, treefile, groupfile, namefile, countfile; vector Groups, outputNames; //holds groups to be used ofstream outSum, out; ifstream inFile; int runRandomCalcs(Tree*, vector); void printUWSummaryFile(int); void printUnweightedFile(); void createPhylipFile(int); vector buildTrees(vector< vector >&, int, CountTable&); int getConsensusTrees(vector< vector >&, int); int getAverageSTDMatrices(vector< vector >&, int); }; #endif mothur-1.36.1/source/commands/unifracweightedcommand.cpp000066400000000000000000001426511255543666200234630ustar00rootroot00000000000000/* * unifracweightedcommand.cpp * Mothur * * Created by Sarah Westcott on 2/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "unifracweightedcommand.h" #include "consensus.h" #include "subsample.h" #include "treereader.h" //********************************************************************************************************************** vector UnifracWeightedCommand::setParameters(){ try { CommandParameter ptree("tree", "InputTypes", "", "", "none", "none", "none","weighted-wsummary",false,true,true); parameters.push_back(ptree); CommandParameter pname("name", "InputTypes", "", "", "NameCount", "none", "none","",false,false,true); parameters.push_back(pname); CommandParameter pcount("count", "InputTypes", "", "", "NameCount-CountGroup", "none", "none","",false,false,true); parameters.push_back(pcount); CommandParameter pgroup("group", "InputTypes", "", "", "CountGroup", "none", "none","",false,false,true); parameters.push_back(pgroup); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter piters("iters", "Number", "", "1000", "", "", "","",false,false); parameters.push_back(piters); CommandParameter pprocessors("processors", "Number", "", "1", "", "", "","",false,false,true); 
parameters.push_back(pprocessors); CommandParameter psubsample("subsample", "String", "", "", "", "", "","",false,false); parameters.push_back(psubsample); CommandParameter pconsensus("consensus", "Boolean", "", "F", "", "", "","tree",false,false); parameters.push_back(pconsensus); CommandParameter prandom("random", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(prandom); CommandParameter pdistance("distance", "Multiple", "column-lt-square-phylip", "column", "", "", "","phylip-column",false,false); parameters.push_back(pdistance); CommandParameter proot("root", "Boolean", "F", "", "", "", "","",false,false); parameters.push_back(proot); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string UnifracWeightedCommand::getHelpString(){ try { string helpString = ""; helpString += "The unifrac.weighted command parameters are tree, group, name, count, groups, iters, distance, processors, root, subsample, consensus and random. tree parameter is required unless you have valid current tree file.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like analyzed. You must enter at least 2 valid groups.\n"; helpString += "The group names are separated by dashes. 
The iters parameter allows you to specify how many random trees you would like compared to your tree.\n"; helpString += "The distance parameter allows you to create a distance file from the results. The default is false.\n"; helpString += "The random parameter allows you to shut off the comparison to random trees. The default is false, meaning don't compare your trees with randomly generated trees.\n"; helpString += "The root parameter allows you to include the entire root in your calculations. The default is false, meaning stop at the root for this comparision instead of the root of the entire tree.\n"; helpString += "The processors parameter allows you to specify the number of processors to use. The default is 1.\n"; helpString += "The subsample parameter allows you to enter the size pergroup of the sample or you can set subsample=T and mothur will use the size of your smallest group. The subsample parameter may only be used with a group file.\n"; helpString += "The consensus parameter allows you to indicate you would like trees built from distance matrices created with the results, as well as a consensus tree built from these trees. Default=F.\n"; helpString += "The unifrac.weighted command should be in the following format: unifrac.weighted(groups=yourGroups, iters=yourIters).\n"; helpString += "Example unifrac.weighted(groups=A-B-C, iters=500).\n"; helpString += "The default value for groups is all the groups in your groupfile, and iters is 1000.\n"; helpString += "The unifrac.weighted command output two files: .weighted and .wsummary their descriptions are in the manual.\n"; helpString += "Note: No spaces between parameter labels (i.e. 
groups), '=' and parameters (i.e.yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string UnifracWeightedCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "weighted") { pattern = "[filename],weighted-[filename],[tag],weighted"; } else if (type == "wsummary") { pattern = "[filename],wsummary"; } else if (type == "phylip") { pattern = "[filename],[tag],[tag2],dist"; } else if (type == "column") { pattern = "[filename],[tag],[tag2],dist"; } else if (type == "tree") { pattern = "[filename],[tag],[tag2],tre"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** UnifracWeightedCommand::UnifracWeightedCommand(){ try { abort = true; calledHelp = true; setParameters(); vector tempOutNames; outputTypes["weighted"] = tempOutNames; outputTypes["wsummary"] = tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; outputTypes["tree"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "UnifracWeightedCommand"); exit(1); } } /***********************************************************/ UnifracWeightedCommand::UnifracWeightedCommand(string option) { try { abort = false; calledHelp = false; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector myArray = setParameters(); OptionParser parser(option); map parameters=parser.getParameters(); map::iterator it; ValidParameters 
validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //initialize outputTypes vector tempOutNames; outputTypes["weighted"] = tempOutNames; outputTypes["wsummary"] = tempOutNames; outputTypes["phylip"] = tempOutNames; outputTypes["column"] = tempOutNames; outputTypes["tree"] = tempOutNames; //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("tree"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["tree"] = inputDir + it->second; } } it = parameters.find("group"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["group"] = inputDir + it->second; } } it = parameters.find("name"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["name"] = inputDir + it->second; } } it = parameters.find("count"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. 
if (path == "") { parameters["count"] = inputDir + it->second; } } } //check for required parameters treefile = validParameter.validFile(parameters, "tree", true); if (treefile == "not open") { treefile = ""; abort = true; } else if (treefile == "not found") { //if there is a current design file, use it treefile = m->getTreeFile(); if (treefile != "") { m->mothurOut("Using " + treefile + " as input file for the tree parameter."); m->mothurOutEndLine(); } else { m->mothurOut("You have no current tree file and the tree parameter is required."); m->mothurOutEndLine(); abort = true; } }else { m->setTreeFile(treefile); } //check for required parameters groupfile = validParameter.validFile(parameters, "group", true); if (groupfile == "not open") { abort = true; } else if (groupfile == "not found") { groupfile = ""; } else { m->setGroupFile(groupfile); } namefile = validParameter.validFile(parameters, "name", true); if (namefile == "not open") { namefile = ""; abort = true; } else if (namefile == "not found") { namefile = ""; } else { m->setNameFile(namefile); } countfile = validParameter.validFile(parameters, "count", true); if (countfile == "not open") { countfile = ""; abort = true; } else if (countfile == "not found") { countfile = ""; } else { m->setCountTableFile(countfile); } if ((namefile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: name or count."); m->mothurOutEndLine(); abort = true; } if ((groupfile != "") && (countfile != "")) { m->mothurOut("[ERROR]: you may only use one of the following: group or count."); m->mothurOutEndLine(); abort=true; } outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(treefile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } itersString = validParameter.validFile(parameters, "iters", false); if (itersString == "not found") { itersString = "1000"; } m->mothurConvert(itersString, iters); string temp = validParameter.validFile(parameters, "distance", false); if (temp == "not found") { phylip = false; outputForm = ""; } else{ if (temp=="phylip") { temp = "lt"; } if ((temp == "lt") || (temp == "column") || (temp == "square")) { phylip = true; outputForm = temp; } else { m->mothurOut("Options for distance are: lt, square, or column. Using lt."); m->mothurOutEndLine(); phylip = true; outputForm = "lt"; } } temp = validParameter.validFile(parameters, "random", false); if (temp == "not found") { temp = "F"; } random = m->isTrue(temp); temp = validParameter.validFile(parameters, "root", false); if (temp == "not found") { temp = "F"; } includeRoot = m->isTrue(temp); temp = validParameter.validFile(parameters, "processors", false); if (temp == "not found"){ temp = m->getProcessors(); } m->setProcessors(temp); m->mothurConvert(temp, processors); temp = validParameter.validFile(parameters, "subsample", false); if (temp == "not found") { temp = "F"; } if (m->isNumeric1(temp)) { m->mothurConvert(temp, subsampleSize); subsample = true; } else { if (m->isTrue(temp)) { subsample = true; subsampleSize = -1; } //we will set it to smallest group later else { subsample = false; } } if (!subsample) { subsampleIters = 0; } else { subsampleIters = iters; } temp = validParameter.validFile(parameters, "consensus", false); if (temp == "not found") { temp = "F"; } consensus = m->isTrue(temp); if (subsample && random) { m->mothurOut("[ERROR]: random must be false, if subsample=t.\n"); abort=true; } if (countfile == "") { if (subsample && (groupfile == "")) { m->mothurOut("[ERROR]: if subsample=t, a group file must be provided.\n"); abort=true; } } 
else { CountTable testCt; if ((!testCt.testGroups(countfile)) && (subsample)) { m->mothurOut("[ERROR]: if subsample=t, a count file with group info must be provided.\n"); abort=true; } } if (subsample && (!phylip)) { phylip=true; outputForm = "lt"; } if (consensus && (!subsample)) { m->mothurOut("[ERROR]: you cannot use consensus without subsample.\n"); abort=true; } if (countfile=="") { if (namefile == "") { vector files; files.push_back(treefile); parser.getNameFile(files); } } } } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "UnifracWeightedCommand"); exit(1); } } /***********************************************************/ int UnifracWeightedCommand::execute() { try { if (abort == true) { if (calledHelp) { return 0; } return 2; } m->setTreeFile(treefile); TreeReader* reader; if (countfile == "") { reader = new TreeReader(treefile, groupfile, namefile); } else { reader = new TreeReader(treefile, countfile); } T = reader->getTrees(); ct = T[0]->getCountTable(); delete reader; if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } return 0; } map variables; variables["[filename]"] = outputDir + m->getSimpleName(treefile); sumFile = getOutputFileName("wsummary",variables); m->openOutputFile(sumFile, outSum); outputNames.push_back(sumFile); outputTypes["wsummary"].push_back(sumFile); SharedUtil util; string s; //to make work with setgroups Groups = m->getGroups(); vector nameGroups = ct->getNamesOfGroups(); if (nameGroups.size() < 2) { m->mothurOut("[ERROR]: You cannot run unifrac.weighted with less than 2 groups, aborting.\n"); delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } return 0; } util.setGroups(Groups, nameGroups, s, numGroups, "weighted"); //sets the groups the user wants to analyze m->setGroups(Groups); if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } return 0; } Weighted weighted(includeRoot); int start = time(NULL); //set or check size if 
(subsample) { //user has not set size, set size = smallest samples size if (subsampleSize == -1) { vector temp; temp.push_back(Groups[0]); subsampleSize = ct->getGroupCount(Groups[0]); //num in first group for (int i = 1; i < Groups.size(); i++) { int thisSize = ct->getGroupCount(Groups[i]); if (thisSize < subsampleSize) { subsampleSize = thisSize; } } m->mothurOut("\nSetting subsample size to " + toString(subsampleSize) + ".\n\n"); }else { //eliminate any too small groups vector newGroups = Groups; Groups.clear(); for (int i = 0; i < newGroups.size(); i++) { int thisSize = ct->getGroupCount(newGroups[i]); if (thisSize >= subsampleSize) { Groups.push_back(newGroups[i]); } else { m->mothurOut("You have selected a size that is larger than "+newGroups[i]+" number of sequences, removing "+newGroups[i]+".\n"); } } m->setGroups(Groups); } } //here in case some groups are removed by subsample util.getCombos(groupComb, Groups, numComp); if (numComp < processors) { processors = numComp; } if (consensus && (numComp < 2)) { m->mothurOut("consensus can only be used with numComparisions greater than 1, setting consensus=f.\n"); consensus=false; } //get weighted scores for users trees for (int i = 0; i < T.size(); i++) { if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } counter = 0; rScores.resize(numComp); //data[0] = weightedscore AB, data[1] = weightedscore AC... uScores.resize(numComp); //data[0] = weightedscore AB, data[1] = weightedscore AC... vector userData; userData.resize(numComp,0); //weighted score info for user tree. data[0] = weightedscore AB, data[1] = weightedscore AC... vector randomData; randomData.resize(numComp,0); //weighted score info for random trees. data[0] = weightedscore AB, data[1] = weightedscore AC... 
if (random) { variables["[filename]"] = outputDir + m->getSimpleName(treefile); variables["[tag]"] = toString(i+1); string wFileName = getOutputFileName("weighted", variables); output = new ColumnFile(wFileName, itersString); outputNames.push_back(wFileName); outputTypes["weighted"].push_back(wFileName); } userData = weighted.getValues(T[i], processors, outputDir); //userData[0] = weightedscore if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (random) { delete output; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //save users score for (int s=0; s > calcDistsTotals; //each iter, each groupCombos dists. this will be used to make .dist files for (int thisIter = 0; thisIter < subsampleIters; thisIter++) { //subsampleIters=0, if subsample=f. if (m->control_pressed) { break; } //copy to preserve old one - would do this in subsample but memory cleanup becomes messy. CountTable* newCt = new CountTable(); int sampleTime = 0; if (m->debug) { sampleTime = time(NULL); } //uses method of setting groups to doNotIncludeMe SubSample sample; Tree* subSampleTree = sample.getSample(T[i], ct, newCt, subsampleSize); if (m->debug) { m->mothurOut("[DEBUG]: iter " + toString(thisIter) + " took " + toString(time(NULL) - sampleTime) + " seconds to sample tree.\n"); } //call new weighted function vector iterData; iterData.resize(numComp,0); Weighted thisWeighted(includeRoot); iterData = thisWeighted.getValues(subSampleTree, processors, outputDir); //userData[0] = weightedscore //save data to make ave dist, std dist calcDistsTotals.push_back(iterData); delete newCt; delete subSampleTree; if((thisIter+1) % 100 == 0){ m->mothurOutJustToScreen(toString(thisIter+1)+"\n"); } } if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (random) { delete output; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); 
} return 0; } if (subsample) { getAverageSTDMatrices(calcDistsTotals, i); } if (consensus) { getConsensusTrees(calcDistsTotals, i); } } if (m->control_pressed) { delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } outSum.close(); for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if (phylip) { createPhylipFile(); } printWSummaryFile(); //clear out users groups m->clearGroups(); delete ct; for (int i = 0; i < T.size(); i++) { delete T[i]; } if (m->control_pressed) { for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } m->mothurOut("It took " + toString(time(NULL) - start) + " secs to run unifrac.weighted."); m->mothurOutEndLine(); //set phylip file as new current phylipfile string current = ""; itTypes = outputTypes.find("phylip"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setPhylipFile(current); } } //set column file as new current columnfile itTypes = outputTypes.find("column"); if (itTypes != outputTypes.end()) { if ((itTypes->second).size() != 0) { current = (itTypes->second)[0]; m->setColumnFile(current); } } m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "execute"); exit(1); } } /**************************************************************************************************/ int UnifracWeightedCommand::getAverageSTDMatrices(vector< vector >& dists, int treeNum) { try { //we need to find the average distance and standard deviation for each groups distance //finds sum vector averages = m->getAverages(dists); //find standard deviation vector stdDev = m->getStandardDeviation(dists, averages); //make matrix with scores in it vector< vector > avedists; 
//avedists.resize(m->getNumGroups()); for (int i = 0; i < m->getNumGroups(); i++) { vector temp; for (int j = 0; j < m->getNumGroups(); j++) { temp.push_back(0.0); } avedists.push_back(temp); } //make matrix with scores in it vector< vector > stddists; //stddists.resize(m->getNumGroups()); for (int i = 0; i < m->getNumGroups(); i++) { vector temp; for (int j = 0; j < m->getNumGroups(); j++) { temp.push_back(0.0); } //stddists[i].resize(m->getNumGroups(), 0.0); stddists.push_back(temp); } //flip it so you can print it int count = 0; for (int r=0; rgetNumGroups(); r++) { for (int l = 0; l < r; l++) { avedists[r][l] = averages[count]; avedists[l][r] = averages[count]; stddists[r][l] = stdDev[count]; stddists[l][r] = stdDev[count]; count++; } } map variables; variables["[filename]"] = outputDir + m->getSimpleName(treefile); variables["[tag]"] = toString(treeNum+1); variables["[tag2]"] = "weighted.ave"; string aveFileName = getOutputFileName("phylip",variables); if (outputForm != "column") { outputNames.push_back(aveFileName); outputTypes["phylip"].push_back(aveFileName); } else { outputNames.push_back(aveFileName); outputTypes["column"].push_back(aveFileName); } ofstream out; m->openOutputFile(aveFileName, out); variables["[tag2]"] = "weighted.std"; string stdFileName = getOutputFileName("phylip",variables); if (outputForm != "column") { outputNames.push_back(stdFileName); outputTypes["phylip"].push_back(stdFileName); } else { outputNames.push_back(stdFileName); outputTypes["column"].push_back(stdFileName); } ofstream outStd; m->openOutputFile(stdFileName, outStd); if ((outputForm == "lt") || (outputForm == "square")) { //output numSeqs out << m->getNumGroups() << endl; outStd << m->getNumGroups() << endl; } //output to file for (int r=0; rgetNumGroups(); r++) { //output name string name = (m->getGroups())[r]; if (name.length() < 10) { //pad with spaces to make compatible while (name.length() < 10) { name += " "; } } if (outputForm == "lt") { out << name; outStd << 
name; //output distances for (int l = 0; l < r; l++) { out << '\t' << avedists[r][l]; outStd << '\t' << stddists[r][l];} out << endl; outStd << endl; }else if (outputForm == "square") { out << name; outStd << name; //output distances for (int l = 0; l < m->getNumGroups(); l++) { out << '\t' << avedists[r][l]; outStd << '\t' << stddists[r][l]; } out << endl; outStd << endl; }else{ //output distances for (int l = 0; l < r; l++) { string otherName = (m->getGroups())[l]; if (otherName.length() < 10) { //pad with spaces to make compatible while (otherName.length() < 10) { otherName += " "; } } out << name << '\t' << otherName << '\t' << avedists[r][l] << endl; outStd << name << '\t' << otherName << '\t' << stddists[r][l] << endl; } } } out.close(); outStd.close(); return 0; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "getAverageSTDMatrices"); exit(1); } } /**************************************************************************************************/ int UnifracWeightedCommand::getConsensusTrees(vector< vector >& dists, int treeNum) { try { //used in tree constructor m->runParse = false; ///create treemap class from groupmap for tree class to use CountTable newCt; set nameMap; map groupMap; set gps; for (int i = 0; i < m->getGroups().size(); i++) { nameMap.insert(m->getGroups()[i]); gps.insert(m->getGroups()[i]); groupMap[m->getGroups()[i]] = m->getGroups()[i]; } newCt.createTable(nameMap, groupMap, gps); //clear old tree names if any m->Treenames.clear(); //fills globaldatas tree names m->Treenames = m->getGroups(); vector newTrees = buildTrees(dists, treeNum, newCt); //also creates .all.tre file containing the trees created if (m->control_pressed) { return 0; } Consensus con; Tree* conTree = con.getTree(newTrees); //create a new filename map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); variables["[tag]"] = toString(treeNum+1); variables["[tag2]"] = "weighted.cons"; string conFile = 
getOutputFileName("tree",variables); outputNames.push_back(conFile); outputTypes["tree"].push_back(conFile); ofstream outTree; m->openOutputFile(conFile, outTree); if (conTree != NULL) { conTree->print(outTree, "boot"); delete conTree; } outTree.close(); return 0; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "getConsensusTrees"); exit(1); } } /**************************************************************************************************/ vector UnifracWeightedCommand::buildTrees(vector< vector >& dists, int treeNum, CountTable& myct) { try { vector trees; //create a new filename map variables; variables["[filename]"] = outputDir + m->getRootName(m->getSimpleName(treefile)); variables["[tag]"] = toString(treeNum+1); variables["[tag2]"] = "weighted.all"; string outputFile = getOutputFileName("tree",variables); outputNames.push_back(outputFile); outputTypes["tree"].push_back(outputFile); ofstream outAll; m->openOutputFile(outputFile, outAll); for (int i = 0; i < dists.size(); i++) { //dists[0] are the dists for the first subsampled tree. 
			if (m->control_pressed) { break; }

			//make matrix with scores in it
			vector< vector<double> > sims;
			sims.resize(m->getNumGroups());
			for (int j = 0; j < m->getNumGroups(); j++) {
				sims[j].resize(m->getNumGroups(), 0.0);
			}

			//copy the lower-triangle distances into a symmetric similarity matrix
			int count = 0;
			for (int r = 0; r < m->getNumGroups(); r++) {
				for (int l = 0; l < r; l++) {
					double sim = -(dists[i][count]-1.0);
					sims[r][l] = sim;
					sims[l][r] = sim;
					count++;
				}
			}

			//create tree
			Tree* tempTree = new Tree(&myct, sims);
			tempTree->assembleTree();

			trees.push_back(tempTree);

			//print tree
			tempTree->print(outAll);
		}

		outAll.close();

		if (m->control_pressed) {
			for (int i = 0; i < trees.size(); i++) { delete trees[i]; trees[i] = NULL; }
			m->mothurRemove(outputFile);
		}

		return trees;
	}
	catch(exception& e) {
		m->errorOut(e, "UnifracWeightedCommand", "buildTrees");
		exit(1);
	}
}
/**************************************************************************************************/
int UnifracWeightedCommand::runRandomCalcs(Tree* thisTree, vector<double> usersScores) {
	try {
		//calculate number of comparisons i.e.
		//with groups A,B,C = AB, AC, BC = 3;
		vector< vector<string> > namesOfGroupCombos;
		for (int a=0; a<numGroups; a++) {
			for (int l = 0; l < a; l++) {
				vector<string> groups;
				groups.push_back((m->getGroups())[a]);
				groups.push_back((m->getGroups())[l]);
				namesOfGroupCombos.push_back(groups);
			}
		}

		lines.clear();

		//breakdown work between processors
		int remainingPairs = namesOfGroupCombos.size();
		int startIndex = 0;
		for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) {
			int numPairs = remainingPairs; //case for last processor
			if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); }
			lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs
			startIndex = startIndex + numPairs;
			remainingPairs = remainingPairs - numPairs;
		}

		//get scores for random trees
		for (int j = 0; j < iters; j++) {
			createProcesses(thisTree, namesOfGroupCombos, rScores);

			if (m->control_pressed) {
				delete ct;
				for (int i = 0; i < T.size(); i++) { delete T[i]; }
				delete output;
				outSum.close();
				for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); }
				return 0;
			}
		}
		lines.clear();

		//find the significance of the score for summary file
		for (int f = 0; f < numComp; f++) {
			//sort random scores
			sort(rScores[f].begin(), rScores[f].end());

			//the index of the score higher than yours is returned
			//so if you have 1000 random trees the index returned is 100
			//then there are 900 trees with a score greater than yours.
//giving you a signifigance of 0.900 int index = findIndex(usersScores[f], f); if (index == -1) { m->mothurOut("error in UnifracWeightedCommand"); m->mothurOutEndLine(); exit(1); } //error code //the signifigance is the number of trees with the users score or higher WScoreSig.push_back((iters-index)/(float)iters); } //out << "Tree# " << i << endl; calculateFreqsCumuls(); printWeightedFile(); delete output; return 0; } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "runRandomCalcs"); exit(1); } } /**************************************************************************************************/ int UnifracWeightedCommand::createProcesses(Tree* t, vector< vector > namesOfGroupCombos, vector< vector >& scores) { try { int process = 1; vector processIDS; EstOutput results; bool recalc = false; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ driver(t, namesOfGroupCombos, lines[process].start, lines[process].end, scores); //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".weightedcommand.results.temp"; m->openOutputFile(tempFile, out); for (int i = lines[process].start; i < (lines[process].start + lines[process].end); i++) { out << scores[i][(scores[i].size()-1)] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".weightedcommand.results.temp")); } 
recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(outputDir + (toString(processIDS[i]) + ".weightedcommand.results.temp"));}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); lines.clear(); //breakdown work between processors int remainingPairs = namesOfGroupCombos.size(); int startIndex = 0; for (int remainingProcessors = processors; remainingProcessors > 0; remainingProcessors--) { int numPairs = remainingPairs; //case for last processor if (remainingProcessors != 1) { numPairs = ceil(remainingPairs / remainingProcessors); } lines.push_back(linePair(startIndex, numPairs)); //startIndex, numPairs startIndex = startIndex + numPairs; remainingPairs = remainingPairs - numPairs; } results.clear(); processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ driver(t, namesOfGroupCombos, lines[process].start, lines[process].end, scores); //pass numSeqs to parent ofstream out; string tempFile = outputDir + m->mothurGetpid(process) + ".weightedcommand.results.temp"; m->openOutputFile(tempFile, out); for (int i = lines[process].start; i < (lines[process].start + lines[process].end); i++) { out << scores[i][(scores[i].size()-1)] << '\t'; } out << endl; out.close(); exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } driver(t, namesOfGroupCombos, lines[0].start, lines[0].end, scores); //force parent to wait until all the processes 
are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } //get data created by processes for (int i=0;i<(processors-1);i++) { ifstream in; string s = outputDir + toString(processIDS[i]) + ".weightedcommand.results.temp"; m->openInputFile(s, in); double tempScore; for (int j = lines[(i+1)].start; j < (lines[(i+1)].start + lines[(i+1)].end); j++) { in >> tempScore; scores[j].push_back(tempScore); } in.close(); m->mothurRemove(s); } #else //fill in functions vector pDataArray; DWORD dwThreadIdArray[processors-1]; HANDLE hThreadArray[processors-1]; vector cts; vector trees; //Create processor worker threads. for( int i=1; icopy(ct); Tree* copyTree = new Tree(copyCount); copyTree->getCopy(t); cts.push_back(copyCount); trees.push_back(copyTree); vector< vector > copyScores = rScores; weightedRandomData* tempweighted = new weightedRandomData(m, lines[i].start, lines[i].end, namesOfGroupCombos, copyTree, copyCount, includeRoot, copyScores); pDataArray.push_back(tempweighted); processIDS.push_back(i); hThreadArray[i-1] = CreateThread(NULL, 0, MyWeightedRandomThreadFunction, pDataArray[i-1], 0, &dwThreadIdArray[i-1]); } driver(t, namesOfGroupCombos, lines[0].start, lines[0].end, scores); //Wait until all threads have terminated. WaitForMultipleObjects(processors-1, hThreadArray, TRUE, INFINITE); //Close all thread handles and free memory allocations. 
		for(int i=0; i < pDataArray.size(); i++){
			for (int j = pDataArray[i]->start; j < (pDataArray[i]->start+pDataArray[i]->num); j++) {
				scores[j].push_back(pDataArray[i]->scores[j][pDataArray[i]->scores[j].size()-1]);
			}
			delete cts[i];
			delete trees[i];
			CloseHandle(hThreadArray[i]);
			delete pDataArray[i];
		}
#endif

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "UnifracWeightedCommand", "createProcesses");
		exit(1);
	}
}
/**************************************************************************************************/
int UnifracWeightedCommand::driver(Tree* t, vector< vector<string> > namesOfGroupCombos, int start, int num, vector< vector<double> >& scores) {
	try {
		Tree* randT = new Tree(ct);

		Weighted weighted(includeRoot);

		for (int h = start; h < (start+num); h++) {

			//free the scratch tree before bailing out, as the later checks in this loop do
			if (m->control_pressed) { delete randT; return 0; }

			//initialize weighted score
			string groupA = namesOfGroupCombos[h][0];
			string groupB = namesOfGroupCombos[h][1];

			//copy T[i]'s info.
			randT->getCopy(t);

			//create a random tree with same topology as T[i], but different labels
			randT->assembleRandomUnifracTree(groupA, groupB);

			if (m->control_pressed) { delete randT; return 0; }

			//get wscore of random tree
			EstOutput randomData = weighted.getValues(randT, groupA, groupB);

			if (m->control_pressed) { delete randT; return 0; }

			//save scores
			scores[h].push_back(randomData[0]);
		}

		delete randT;

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "UnifracWeightedCommand", "driver");
		exit(1);
	}
}
/***********************************************************/
void UnifracWeightedCommand::printWeightedFile() {
	try {
		vector<double> data;
		vector<string> tags;
		tags.push_back("Score");
		tags.push_back("RandFreq");
		tags.push_back("RandCumul");

		for(int a = 0; a < numComp; a++) {
			output->initFile(groupComb[a], tags);
			//print each line
			for (map<float,float>::iterator it = validScores.begin(); it != validScores.end(); it++) {
				data.push_back(it->first);
				data.push_back(rScoreFreq[a][it->first]);
				data.push_back(rCumul[a][it->first]);
				output->output(data);
				data.clear();
			}
			output->resetFile();
		}
	}
	catch(exception& e) {
		m->errorOut(e, "UnifracWeightedCommand", "printWeightedFile");
		exit(1);
	}
}
/***********************************************************/
void UnifracWeightedCommand::printWSummaryFile() {
	try {
		//column headers
		outSum << "Tree#" << '\t' << "Groups" << '\t' << "WScore" << '\t';
		m->mothurOut("Tree#\tGroups\tWScore\t");
		if (random) { outSum << "WSig"; m->mothurOut("WSig"); }
		outSum << endl; m->mothurOutEndLine();

		//format output
		outSum.setf(ios::fixed, ios::floatfield); outSum.setf(ios::showpoint);

		//print each line
		int count = 0;
		for (int i = 0; i < T.size(); i++) {
			for (int j = 0; j < numComp; j++) {
				if (random) {
					if (WScoreSig[count] > (1/(float)iters)) {
						outSum << setprecision(6) << i+1 << '\t' << groupComb[j] << '\t' << utreeScores[count] << '\t' << setprecision(itersString.length()) << WScoreSig[count] << endl;
						cout << setprecision(6) << i+1 << '\t' << groupComb[j] << '\t' << utreeScores[count] << '\t' << setprecision(itersString.length()) << WScoreSig[count] << endl;
						m->mothurOutJustToLog(toString(i+1) +"\t" + groupComb[j] +"\t" + toString(utreeScores[count]) +"\t" + toString(WScoreSig[count]) + "\n");
					}else{
						outSum << setprecision(6) << i+1 << '\t' << groupComb[j] << '\t' << utreeScores[count] << '\t' << setprecision(itersString.length()) << "<" << (1/float(iters)) << endl;
						cout << setprecision(6) << i+1 << '\t' << groupComb[j] << '\t' << utreeScores[count] << '\t' << setprecision(itersString.length()) << "<" << (1/float(iters)) << endl;
						m->mothurOutJustToLog(toString(i+1) +"\t" + groupComb[j] +"\t" + toString(utreeScores[count]) +"\t<" + toString((1/float(iters))) + "\n");
					}
				}else{
					outSum << setprecision(6) << i+1 << '\t' << groupComb[j] << '\t' << utreeScores[count] << endl;
					cout << setprecision(6) << i+1 << '\t' << groupComb[j] << '\t' << utreeScores[count] << endl;
					m->mothurOutJustToLog(toString(i+1) +"\t" + groupComb[j] +"\t" + toString(utreeScores[count]) +"\n");
				}
				count++;
			}
		}
		outSum.close();
	}
	catch(exception& e) {
		m->errorOut(e,
"UnifracWeightedCommand", "printWSummaryFile"); exit(1); } } /***********************************************************/ void UnifracWeightedCommand::createPhylipFile() { try { int count = 0; //for each tree for (int i = 0; i < T.size(); i++) { string phylipFileName; map variables; variables["[filename]"] = outputDir + m->getSimpleName(treefile); variables["[tag]"] = toString(i+1); if ((outputForm == "lt") || (outputForm == "square")) { variables["[tag2]"] = "weighted.phylip"; phylipFileName = getOutputFileName("phylip",variables); outputNames.push_back(phylipFileName); outputTypes["phylip"].push_back(phylipFileName); }else { //column variables["[tag2]"] = "weighted.column"; phylipFileName = getOutputFileName("column",variables); outputNames.push_back(phylipFileName); outputTypes["column"].push_back(phylipFileName); } ofstream out; m->openOutputFile(phylipFileName, out); if ((outputForm == "lt") || (outputForm == "square")) { //output numSeqs out << m->getNumGroups() << endl; } //make matrix with scores in it vector< vector > dists; dists.resize(m->getNumGroups()); for (int i = 0; i < m->getNumGroups(); i++) { dists[i].resize(m->getNumGroups(), 0.0); } //flip it so you can print it for (int r=0; rgetNumGroups(); r++) { for (int l = 0; l < r; l++) { dists[r][l] = utreeScores[count]; dists[l][r] = utreeScores[count]; count++; } } //output to file for (int r=0; rgetNumGroups(); r++) { //output name string name = (m->getGroups())[r]; if (name.length() < 10) { //pad with spaces to make compatible while (name.length() < 10) { name += " "; } } if (outputForm == "lt") { out << name; //output distances for (int l = 0; l < r; l++) { out << '\t' << dists[r][l]; } out << endl; }else if (outputForm == "square") { out << name; //output distances for (int l = 0; l < m->getNumGroups(); l++) { out << '\t' << dists[r][l]; } out << endl; }else{ //output distances for (int l = 0; l < r; l++) { string otherName = (m->getGroups())[l]; if (otherName.length() < 10) { //pad with spaces 
to make compatible while (otherName.length() < 10) { otherName += " "; } } out << name << '\t' << otherName << '\t' << dists[r][l] << endl; } } } out.close(); } } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "createPhylipFile"); exit(1); } } /***********************************************************/ int UnifracWeightedCommand::findIndex(float score, int index) { try{ for (int i = 0; i < rScores[index].size(); i++) { if (rScores[index][i] >= score) { return i; } } return rScores[index].size(); } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "findIndex"); exit(1); } } /***********************************************************/ void UnifracWeightedCommand::calculateFreqsCumuls() { try { //clear out old tree values rScoreFreq.clear(); rScoreFreq.resize(numComp); rCumul.clear(); rCumul.resize(numComp); validScores.clear(); //calculate frequency for (int f = 0; f < numComp; f++) { for (int i = 0; i < rScores[f].size(); i++) { //looks like 0,0,1,1,1,2,4,7... you want to make a map that says rScoreFreq[0] = 2, rScoreFreq[1] = 3... validScores[rScores[f][i]] = rScores[f][i]; map<float,float>::iterator it = rScoreFreq[f].find(rScores[f][i]); if (it != rScoreFreq[f].end()) { rScoreFreq[f][rScores[f][i]]++; }else{ rScoreFreq[f][rScores[f][i]] = 1; } } } //calculate rcumul for(int a = 0; a < numComp; a++) { float rcumul = 1.0000; //this loop fills the cumulative maps and puts 0.0000 in the score freq map to make it easier to print. 
for (map<float,float>::iterator it = validScores.begin(); it != validScores.end(); it++) { //make rscoreFreq map and rCumul map<float,float>::iterator it2 = rScoreFreq[a].find(it->first); rCumul[a][it->first] = rcumul; //get percentage of random trees with that info if (it2 != rScoreFreq[a].end()) { rScoreFreq[a][it->first] /= iters; rcumul-= it2->second; } else { rScoreFreq[a][it->first] = 0.0000; } //no random trees with that score } } } catch(exception& e) { m->errorOut(e, "UnifracWeightedCommand", "calculateFreqsCumuls"); exit(1); } } /***********************************************************/ mothur-1.36.1/source/commands/unifracweightedcommand.h000066400000000000000000000124741255543666200231270ustar00rootroot00000000000000#ifndef UNIFRACWEIGHTEDCOMMAND_H #define UNIFRACWEIGHTEDCOMMAND_H /* * unifracweightedcommand.h * Mothur * * Created by Sarah Westcott on 2/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "weighted.h" #include "counttable.h" #include "progress.hpp" #include "sharedutilities.h" #include "fileoutput.h" #include "readtree.h" class UnifracWeightedCommand : public Command { public: UnifracWeightedCommand(string); UnifracWeightedCommand(); ~UnifracWeightedCommand() {} vector<string> setParameters(); string getCommandName() { return "unifrac.weighted"; } string getCommandCategory() { return "Hypothesis Testing"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "Lozupone CA, Hamady M, Kelley ST, Knight R (2007). Quantitative and qualitative beta diversity measures lead to different insights into factors that structure microbial communities. Appl Environ Microbiol 73: 1576-85. 
\nhttp://www.mothur.org/wiki/Unifrac.weighted"; } string getDescription() { return "generic tests that describes whether two or more communities have the same structure"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: vector lines; CountTable* ct; FileOutput* output; vector T; //user trees vector utreeScores; //user tree unweighted scores vector WScoreSig; //tree weighted score signifigance when compared to random trees - percentage of random trees with that score or lower. vector groupComb; // AB. AC, BC... string sumFile, outputDir; int iters, numGroups, numComp, counter; vector< vector > rScores; //vector each group comb has an entry vector< vector > uScores; //vector each group comb has an entry vector< map > rScoreFreq; //map -vector entry for each combination. vector< map > rCumul; //map -vector entry for each c map validScores; //map contains scores from random bool abort, phylip, random, includeRoot, subsample, consensus; string groups, itersString, outputForm, treefile, groupfile, namefile, countfile; vector Groups, outputNames; //holds groups to be used int processors, subsampleSize, subsampleIters; ofstream outSum; map nameMap; void printWSummaryFile(); void printWeightedFile(); void createPhylipFile(); //void removeValidScoresDuplicates(); int findIndex(float, int); void calculateFreqsCumuls(); int createProcesses(Tree*, vector< vector >, vector< vector >&); int driver(Tree*, vector< vector >, int, int, vector< vector >&); int runRandomCalcs(Tree*, vector); vector buildTrees(vector< vector >&, int, CountTable&); int getConsensusTrees(vector< vector >&, int); int getAverageSTDMatrices(vector< vector >&, int); }; /***********************************************************************/ struct weightedRandomData { int start; int num; MothurOut* m; vector< vector > scores; vector< vector > namesOfGroupCombos; Tree* t; CountTable* ct; bool includeRoot; weightedRandomData(){} weightedRandomData(MothurOut* mout, int st, int en, 
vector< vector > ngc, Tree* tree, CountTable* count, bool ir, vector< vector > sc) { m = mout; start = st; num = en; namesOfGroupCombos = ngc; t = tree; ct = count; includeRoot = ir; scores = sc; } }; /**************************************************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else static DWORD WINAPI MyWeightedRandomThreadFunction(LPVOID lpParam){ weightedRandomData* pDataArray; pDataArray = (weightedRandomData*)lpParam; try { Tree* randT = new Tree(pDataArray->ct); Weighted weighted(pDataArray->includeRoot); for (int h = pDataArray->start; h < (pDataArray->start+pDataArray->num); h++) { if (pDataArray->m->control_pressed) { return 0; } //initialize weighted score string groupA = pDataArray->namesOfGroupCombos[h][0]; string groupB = pDataArray->namesOfGroupCombos[h][1]; //copy T[i]'s info. randT->getCopy(pDataArray->t); //create a random tree with same topology as T[i], but different labels randT->assembleRandomUnifracTree(groupA, groupB); if (pDataArray->m->control_pressed) { delete randT; return 0; } //get wscore of random tree EstOutput randomData = weighted.getValues(randT, groupA, groupB); if (pDataArray->m->control_pressed) { delete randT; return 0; } //save scores pDataArray->scores[h].push_back(randomData[0]); } delete randT; return 0; } catch(exception& e) { pDataArray->m->errorOut(e, "Weighted", "MyWeightedRandomThreadFunction"); exit(1); } } #endif #endif mothur-1.36.1/source/commands/venncommand.cpp000066400000000000000000000610021255543666200212470ustar00rootroot00000000000000/* * venncommand.cpp * Mothur * * Created by Sarah Westcott on 3/30/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "venncommand.h" #include "ace.h" #include "sobs.h" #include "chao1.h" //#include "jackknife.h" #include "sharedsobscollectsummary.h" #include "sharedchao1.h" #include "sharedace.h" #include "nseqs.h" //********************************************************************************************************************** vector VennCommand::setParameters(){ try { CommandParameter plist("list", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false,true); parameters.push_back(plist); CommandParameter pshared("shared", "InputTypes", "", "", "LRSS", "LRSS", "none","svg",false,false,true); parameters.push_back(pshared); CommandParameter pgroups("groups", "String", "", "", "", "", "","",false,false); parameters.push_back(pgroups); CommandParameter plabel("label", "String", "", "", "", "", "","",false,false); parameters.push_back(plabel); CommandParameter pcalc("calc", "String", "", "", "", "", "","",false,false); parameters.push_back(pcalc); CommandParameter pabund("abund", "Number", "", "10", "", "", "","",false,false); parameters.push_back(pabund); CommandParameter pnseqs("nseqs", "Boolean", "", "F", "", "", "","",false,false); parameters.push_back(pnseqs); CommandParameter psharedotus("sharedotus", "Boolean", "", "t", "", "", "","",false,false); parameters.push_back(psharedotus); CommandParameter pfontsize("fontsize", "Number", "", "24", "", "", "","",false,false); parameters.push_back(pfontsize); CommandParameter ppermute("permute", "Multiple", "1-2-3-4", "4", "", "", "","",false,false); parameters.push_back(ppermute); CommandParameter pseed("seed", "Number", "", "0", "", "", "","",false,false); parameters.push_back(pseed); CommandParameter pinputdir("inputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(pinputdir); CommandParameter poutputdir("outputdir", "String", "", "", "", "", "","",false,false); parameters.push_back(poutputdir); vector myArray; for (int i = 0; i < parameters.size(); i++) { 
myArray.push_back(parameters[i].name); } return myArray; } catch(exception& e) { m->errorOut(e, "VennCommand", "setParameters"); exit(1); } } //********************************************************************************************************************** string VennCommand::getHelpString(){ try { string helpString = ""; helpString += "The venn command parameters are list, shared, groups, calc, abund, nseqs, permute, sharedotus, fontsize and label. shared, relabund, list, rabund or sabund is required unless you have a valid current file.\n"; helpString += "The groups parameter allows you to specify which of the groups in your groupfile you would like included in your venn diagram, you may only use a maximum of 4 groups.\n"; helpString += "The group names are separated by dashes. The label allows you to select what distance levels you would like a venn diagram created for, and are also separated by dashes.\n"; helpString += "The fontsize parameter allows you to adjust the font size of the picture created, default=24.\n"; helpString += "The venn command should be in the following format: venn(groups=yourGroups, calc=yourCalcs, label=yourLabels, abund=yourAbund).\n"; helpString += "Example venn(groups=A-B-C, calc=sharedsobs-sharedchao, abund=20).\n"; helpString += "The default value for groups is all the groups in your groupfile up to 4, and all labels in your inputfile will be used.\n"; helpString += "The default value for calc is sobs if you have only read a list file or if you have selected only one group, and sharedsobs if you have multiple groups.\n"; helpString += "The default available estimators for calc are sobs, chao and ace if you have only read a list file, and sharedsobs, sharedchao and sharedace if you have read a shared file.\n"; helpString += "The nseqs parameter will output the number of sequences represented by the otus in the picture, default=F.\n"; helpString += "If you have more than 4 groups, you can use the permute parameter to set the 
number of groups you would like mothur to divide the samples into to draw the venn diagrams for all possible combos. Default=4.\n"; helpString += "The only estimators available for 4 groups are sharedsobs and sharedchao.\n"; helpString += "The sharedotus parameter can be used with the sharedsobs calculator to get the names of the OTUs in each section of the venn diagram. Default=t.\n"; helpString += "The venn command outputs a .svg file for each calculator you specify at each distance you choose.\n"; helpString += "Note: No spaces between parameter labels (i.e. groups), '=' and parameters (i.e. yourGroups).\n"; return helpString; } catch(exception& e) { m->errorOut(e, "VennCommand", "getHelpString"); exit(1); } } //********************************************************************************************************************** string VennCommand::getOutputPattern(string type) { try { string pattern = ""; if (type == "svg") { pattern = "[filename],svg"; } else { m->mothurOut("[ERROR]: No definition for type " + type + " output pattern.\n"); m->control_pressed = true; } return pattern; } catch(exception& e) { m->errorOut(e, "VennCommand", "getOutputPattern"); exit(1); } } //********************************************************************************************************************** VennCommand::VennCommand(){ try { abort = true; calledHelp = true; setParameters(); vector<string> tempOutNames; outputTypes["svg"] = tempOutNames; } catch(exception& e) { m->errorOut(e, "VennCommand", "VennCommand"); exit(1); } } //********************************************************************************************************************** VennCommand::VennCommand(string option) { try { abort = false; calledHelp = false; allLines = 1; //allow user to run help if(option == "help") { help(); abort = true; calledHelp = true; } else if(option == "citation") { citation(); abort = true; calledHelp = true;} else { vector<string> myArray = setParameters(); OptionParser parser(option); map<string,string> 
parameters = parser.getParameters(); map::iterator it; ValidParameters validParameter; //check to make sure all parameters are valid for command for (map::iterator it = parameters.begin(); it != parameters.end(); it++) { if (validParameter.isValidParameter(it->first, myArray, it->second) != true) { abort = true; } } //if the user changes the input directory command factory will send this info to us in the output parameter string inputDir = validParameter.validFile(parameters, "inputdir", false); if (inputDir == "not found"){ inputDir = ""; } else { string path; it = parameters.find("shared"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["shared"] = inputDir + it->second; } } it = parameters.find("list"); //user has given a template file if(it != parameters.end()){ path = m->hasPath(it->second); //if the user has not given a path then, add inputdir. else leave path alone. if (path == "") { parameters["list"] = inputDir + it->second; } } } //check for required parameters listfile = validParameter.validFile(parameters, "list", true); if (listfile == "not open") { listfile = ""; abort = true; } else if (listfile == "not found") { listfile = ""; } else { format = "list"; inputfile = listfile; m->setListFile(listfile); } sharedfile = validParameter.validFile(parameters, "shared", true); if (sharedfile == "not open") { sharedfile = ""; abort = true; } else if (sharedfile == "not found") { sharedfile = ""; } else { format = "sharedfile"; inputfile = sharedfile; m->setSharedFile(sharedfile); } if ((sharedfile == "") && (listfile == "")) { //is there are current file available for any of these? 
//give priority to shared, then list, then rabund, then sabund //if there is a current shared file, use it sharedfile = m->getSharedFile(); if (sharedfile != "") { inputfile = sharedfile; format = "sharedfile"; m->mothurOut("Using " + sharedfile + " as input file for the shared parameter."); m->mothurOutEndLine(); } else { listfile = m->getListFile(); if (listfile != "") { inputfile = listfile; format = "list"; m->mothurOut("Using " + listfile + " as input file for the list parameter."); m->mothurOutEndLine(); } else { m->mothurOut("No valid current files. You must provide a list or shared file."); m->mothurOutEndLine(); abort = true; } } } //if the user changes the output directory command factory will send this info to us in the output parameter outputDir = validParameter.validFile(parameters, "outputdir", false); if (outputDir == "not found"){ outputDir = m->hasPath(inputfile); } //check for optional parameter and set defaults // ...at some point should added some additional type checking... 
label = validParameter.validFile(parameters, "label", false); if (label == "not found") { label = ""; } else { if(label != "all") { m->splitAtDash(label, labels); allLines = 0; } else { allLines = 1; } } groups = validParameter.validFile(parameters, "groups", false); if (groups == "not found") { groups = ""; } else { m->splitAtDash(groups, Groups); m->setGroups(Groups); } calc = validParameter.validFile(parameters, "calc", false); if (calc == "not found") { if(format == "list") { calc = "sobs"; } else { calc = "sharedsobs"; } } else { if (calc == "default") { if(format == "list") { calc = "sobs"; } else { calc = "sharedsobs"; } } } m->splitAtDash(calc, Estimators); if (m->inUsersGroups("citation", Estimators)) { ValidCalculators validCalc; validCalc.printCitations(Estimators); //remove citation from list of calcs for (int i = 0; i < Estimators.size(); i++) { if (Estimators[i] == "citation") { Estimators.erase(Estimators.begin()+i); break; } } } string temp; temp = validParameter.validFile(parameters, "abund", false); if (temp == "not found") { temp = "10"; } m->mothurConvert(temp, abund); temp = validParameter.validFile(parameters, "nseqs", false); if (temp == "not found"){ temp = "f"; } nseqs = m->isTrue(temp); temp = validParameter.validFile(parameters, "permute", false); if (temp == "not found"){ temp = "4"; } else { if ((temp == "1") || (temp == "2") || (temp == "3") || (temp == "4")) {} else { bool permTrue = m->isTrue(temp); if (permTrue) { temp = "4"; } else { } } } m->mothurConvert(temp, perm); if ((perm == 1) || (perm == 2) || (perm == 3) || (perm == 4)) { } else { m->mothurOut("[ERROR]: Not a valid permute value. 
Valid values are 1, 2, 3, 4 and true."); m->mothurOutEndLine(); abort = true; } temp = validParameter.validFile(parameters, "sharedotus", false); if (temp == "not found"){ temp = "t"; } sharedOtus = m->isTrue(temp); temp = validParameter.validFile(parameters, "fontsize", false); if (temp == "not found") { temp = "24"; } m->mothurConvert(temp, fontsize); } } catch(exception& e) { m->errorOut(e, "VennCommand", "VennCommand"); exit(1); } } //********************************************************************************************************************** int VennCommand::execute(){ try { if (abort == true) { if (calledHelp) { return 0; } return 2; } ValidCalculators validCalculator; if (format == "list") { for (int i=0; imothurOut("No valid calculators given, please correct."); m->mothurOutEndLine(); return 0; } venn = new Venn(outputDir, nseqs, inputfile, fontsize, sharedOtus); input = new InputData(inputfile, format); string lastLabel; if (format == "sharedfile") { lookup = input->getSharedRAbundVectors(); lastLabel = lookup[0]->getLabel(); if ((lookup.size() > 4)) { combos = findCombinations(lookup.size()); } }else if (format == "list") { sabund = input->getSAbundVector(); lastLabel = sabund->getLabel(); } //if the users enters label "0.06" and there is no "0.06" in their file use the next lowest label. 
set processedLabels; set userLabels = labels; if (format != "list") { //as long as you are not at the end of the file or done wih the lines you want while((lookup[0] != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } m->clearGroups(); delete venn; delete input; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(lookup[0]->getLabel()) == 1){ m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); if (lookup.size() > 4) { set< set >::iterator it3; set::iterator it2; for (it3 = combos.begin(); it3 != combos.end(); it3++) { set poss = *it3; vector subset; for (it2 = poss.begin(); it2 != poss.end(); it2++) { subset.push_back(lookup[*it2]); } vector outfilenames = venn->getPic(subset, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } } }else { vector outfilenames = venn->getPic(lookup, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } } } if ((m->anyLabelsToProcess(lookup[0]->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = lookup[0]->getLabel(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); if (lookup.size() > 4) { set< set >::iterator it3; set::iterator it2; for (it3 = combos.begin(); 
it3 != combos.end(); it3++) { set poss = *it3; vector subset; for (it2 = poss.begin(); it2 != poss.end(); it2++) { subset.push_back(lookup[*it2]); } vector outfilenames = venn->getPic(subset, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } } }else { vector outfilenames = venn->getPic(lookup, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } } //restore real lastlabel to save below lookup[0]->setLabel(saveLabel); } lastLabel = lookup[0]->getLabel(); //get next line to process for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup = input->getSharedRAbundVectors(); } if (m->control_pressed) { for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } m->clearGroups(); delete venn; delete input; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { for (int i = 0; i < lookup.size(); i++) { if (lookup[i] != NULL) { delete lookup[i]; } } lookup = input->getSharedRAbundVectors(lastLabel); m->mothurOut(lookup[0]->getLabel()); m->mothurOutEndLine(); processedLabels.insert(lookup[0]->getLabel()); userLabels.erase(lookup[0]->getLabel()); if (lookup.size() > 4) { set< set >::iterator it3; set::iterator it2; for (it3 = combos.begin(); it3 != combos.end(); it3++) { set poss = *it3; vector subset; for (it2 = poss.begin(); it2 != poss.end(); it2++) { subset.push_back(lookup[*it2]); } vector outfilenames = venn->getPic(subset, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } } }else { vector outfilenames = venn->getPic(lookup, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } } for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } //reset groups parameter m->clearGroups(); if (m->control_pressed) { m->clearGroups(); delete venn; delete input; for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } }else{ while((sabund != NULL) && ((allLines == 1) || (userLabels.size() != 0))) { if (m->control_pressed) { for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } delete sabund; delete venn; delete input; for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } if(allLines == 1 || labels.count(sabund->getLabel()) == 1){ m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); vector outfilenames = venn->getPic(sabund, vennCalculators); 
for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); } if ((m->anyLabelsToProcess(sabund->getLabel(), userLabels, "") == true) && (processedLabels.count(lastLabel) != 1)) { string saveLabel = sabund->getLabel(); delete sabund; sabund = input->getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); vector outfilenames = venn->getPic(sabund, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } processedLabels.insert(sabund->getLabel()); userLabels.erase(sabund->getLabel()); //restore real lastlabel to save below sabund->setLabel(saveLabel); } lastLabel = sabund->getLabel(); delete sabund; sabund = input->getSAbundVector(); } if (m->control_pressed) { for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } delete venn; delete input; return 0; } //output error messages about any remaining user labels set::iterator it; bool needToRun = false; for (it = userLabels.begin(); it != userLabels.end(); it++) { m->mothurOut("Your file does not include the label " + *it); if (processedLabels.count(lastLabel) != 1) { m->mothurOut(". I will use " + lastLabel + "."); m->mothurOutEndLine(); needToRun = true; }else { m->mothurOut(". 
Please refer to " + lastLabel + "."); m->mothurOutEndLine(); } } //run last label if you need to if (needToRun == true) { if (sabund != NULL) { delete sabund; } sabund = input->getSAbundVector(lastLabel); m->mothurOut(sabund->getLabel()); m->mothurOutEndLine(); vector outfilenames = venn->getPic(sabund, vennCalculators); for(int i = 0; i < outfilenames.size(); i++) { if (outfilenames[i] != "control" ) { outputNames.push_back(outfilenames[i]); outputTypes["svg"].push_back(outfilenames[i]); } } delete sabund; } if (m->control_pressed) { delete venn; delete input; for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } for (int i = 0; i < outputNames.size(); i++) { m->mothurRemove(outputNames[i]); } return 0; } } for (int i = 0; i < vennCalculators.size(); i++) { delete vennCalculators[i]; } delete venn; delete input; m->mothurOutEndLine(); m->mothurOut("Output File Names: "); m->mothurOutEndLine(); for (int i = 0; i < outputNames.size(); i++) { m->mothurOut(outputNames[i]); m->mothurOutEndLine(); } m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "VennCommand", "execute"); exit(1); } } //********************************************************************************************************************** //returns a vector of sets containing the group combinations set< set > VennCommand::findCombinations(int lookupSize){ try { set< set > combos; set possibles; for (int i = 0; i < lookupSize; i++) { possibles.insert(i); } getCombos(possibles, combos); return combos; } catch(exception& e) { m->errorOut(e, "VennCommand", "findCombinations"); exit(1); } } //********************************************************************************************************************** //recusively finds combos of length perm int VennCommand::getCombos(set possibles, set< set >& combos){ try { if (possibles.size() == perm) { //done if (combos.count(possibles) == 0) { //no dups combos.insert(possibles); } }else { //we still have work to 
do set<int>::iterator it; set<int>::iterator it2; for (it = possibles.begin(); it != possibles.end(); it++) { set<int> newPossibles; for (it2 = possibles.begin(); it2 != possibles.end(); it2++) { //all possible combos of one length smaller if (*it != *it2) { newPossibles.insert(*it2); } } getCombos(newPossibles, combos); } } return 0; } catch(exception& e) { m->errorOut(e, "VennCommand", "getCombos"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/commands/venncommand.h000066400000000000000000000026201255543666200207150ustar00rootroot00000000000000#ifndef VENNCOMMAND_H #define VENNCOMMAND_H /* * venncommand.h * Mothur * * Created by Sarah Westcott on 3/30/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "command.hpp" #include "inputdata.h" #include "sharedlistvector.h" #include "venn.h" #include "validcalculator.h" class VennCommand : public Command { public: VennCommand(string); VennCommand(); ~VennCommand() {} vector<string> setParameters(); string getCommandName() { return "venn"; } string getCommandCategory() { return "OTU-Based Approaches"; } string getHelpString(); string getOutputPattern(string); string getCitation() { return "http://www.mothur.org/wiki/Venn"; } string getDescription() { return "generates a Venn diagram from data provided in a shared file"; } int execute(); void help() { m->mothurOut(getHelpString()); } private: InputData* input; SharedListVector* SharedList; Venn* venn; vector<Calculator*> vennCalculators; vector<SharedRAbundVector*> lookup; set< set<int> > combos; SAbundVector* sabund; int abund, fontsize, perm; bool abort, allLines, nseqs, sharedOtus; set<string> labels; //holds labels to be used string format, groups, calc, label, outputDir, sharedfile, listfile, inputfile; vector<string> Estimators, Groups, outputNames; set< set<int> > findCombinations(int); int getCombos(set<int>, set< set<int> >&); }; #endif 
mothur-1.36.1/source/communitytype/000077500000000000000000000000001255543666200173645ustar00rootroot00000000000000mothur-1.36.1/source/communitytype/communitytype.cpp000066400000000000000000000716171255543666200230320ustar00rootroot00000000000000// // communitytype.cpp // Mothur // // Created by SarahsWork on 12/3/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "communitytype.h" /**************************************************************************************************/ //can we get these psi/psi1 calculations into their own math class? //psi calcualtions swiped from gsl library... static const double psi_cs[23] = { -.038057080835217922, .491415393029387130, -.056815747821244730, .008357821225914313, -.001333232857994342, .000220313287069308, -.000037040238178456, .000006283793654854, -.000001071263908506, .000000183128394654, -.000000031353509361, .000000005372808776, -.000000000921168141, .000000000157981265, -.000000000027098646, .000000000004648722, -.000000000000797527, .000000000000136827, -.000000000000023475, .000000000000004027, -.000000000000000691, .000000000000000118, -.000000000000000020 }; static double apsi_cs[16] = { -.0204749044678185, -.0101801271534859, .0000559718725387, -.0000012917176570, .0000000572858606, -.0000000038213539, .0000000003397434, -.0000000000374838, .0000000000048990, -.0000000000007344, .0000000000001233, -.0000000000000228, .0000000000000045, -.0000000000000009, .0000000000000002, -.0000000000000000 }; /**************************************************************************************************/ /* coefficients for Maclaurin summation in hzeta() * B_{2j}/(2j)! 
*/ static double hzeta_c[15] = { 1.00000000000000000000000000000, 0.083333333333333333333333333333, -0.00138888888888888888888888888889, 0.000033068783068783068783068783069, -8.2671957671957671957671957672e-07, 2.0876756987868098979210090321e-08, -5.2841901386874931848476822022e-10, 1.3382536530684678832826980975e-11, -3.3896802963225828668301953912e-13, 8.5860620562778445641359054504e-15, -2.1748686985580618730415164239e-16, 5.5090028283602295152026526089e-18, -1.3954464685812523340707686264e-19, 3.5347070396294674716932299778e-21, -8.9535174270375468504026113181e-23 }; /**************************************************************************************************/ void CommunityTypeFinder::printSilData(ofstream& out, double chi, vector sils){ try { out << setprecision (6) << numPartitions << '\t' << chi; for (int i = 0; i < sils.size(); i++) { out << '\t' << sils[i]; } out << endl; return; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "printSilData"); exit(1); } } /**************************************************************************************************/ void CommunityTypeFinder::printSilData(ostream& out, double chi, vector sils){ try { out << setprecision (6) << numPartitions << '\t' << chi; m->mothurOutJustToLog(toString(numPartitions) + '\t' + toString(chi)); for (int i = 0; i < sils.size(); i++) { out << '\t' << sils[i]; m->mothurOutJustToLog("\t" + toString(sils[i])); } out << endl; m->mothurOutJustToLog("\n"); return; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "printSilData"); exit(1); } } /**************************************************************************************************/ void CommunityTypeFinder::printZMatrix(string fileName, vector sampleName){ try { ofstream printMatrix; m->openOutputFile(fileName, printMatrix); //(fileName.c_str()); printMatrix.setf(ios::fixed, ios::floatfield); printMatrix.setf(ios::showpoint); for(int i=0;ierrorOut(e, "CommunityTypeFinder", "printZMatrix"); exit(1); } } 
/**************************************************************************************************/ void CommunityTypeFinder::printRelAbund(string fileName, vector otuNames){ try { ofstream printRA; m->openOutputFile(fileName, printRA); //(fileName.c_str()); printRA.setf(ios::fixed, ios::floatfield); printRA.setf(ios::showpoint); vector totals(numPartitions, 0.0000); for(int i=0;icontrol_pressed) { break; } printRA << otuNames[i]; for(int j=0;j= 0.0000){ double std = sqrt(error[j][i]); printRA << '\t' << 100 * exp(lambdaMatrix[j][i]) / totals[j]; printRA << '\t' << 100 * exp(lambdaMatrix[j][i] - 2.0 * std) / totals[j]; printRA << '\t' << 100 * exp(lambdaMatrix[j][i] + 2.0 * std) / totals[j]; } else{ printRA << '\t' << 100 * exp(lambdaMatrix[j][i]) / totals[j]; printRA << '\t' << "NA"; printRA << '\t' << "NA"; } } printRA << endl; } printRA.close(); } catch(exception& e) { m->errorOut(e, "CommunityTypeFinder", "printRelAbund"); exit(1); } } /**************************************************************************************************/ vector > CommunityTypeFinder::getHessian(){ try { vector alpha(numOTUs, 0.0000); double alphaSum = 0.0000; vector pi = zMatrix[currentPartition]; vector psi_ajk(numOTUs, 0.0000); vector psi_cjk(numOTUs, 0.0000); vector psi1_ajk(numOTUs, 0.0000); vector psi1_cjk(numOTUs, 0.0000); for(int j=0;jcontrol_pressed) { break; } alpha[j] = exp(lambdaMatrix[currentPartition][j]); alphaSum += alpha[j]; for(int i=0;icontrol_pressed) { break; } weight += pi[i]; double sum = 0.0000; for(int j=0;j > hessian(numOTUs); for(int i=0;icontrol_pressed) { break; } double term1 = -alpha[i] * (- psi_ajk[i] + psi_Ak + psi_cjk[i] - psi_Ck); double term2 = -alpha[i] * alpha[i] * (-psi1_ajk[i] + psi1_Ak + psi1_cjk[i] - psi1_Ck); double term3 = 0.1 * alpha[i]; hessian[i][i] = term1 + term2 + term3; for(int j=0;jerrorOut(e, "CommunityTypeFinder", "getHessian"); exit(1); } } 
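// Illustrative sketch only, not part of mothur: printRelAbund() above reports each OTU as a
// percentage, 100 * exp(lambda) / total, and, when the variance estimate is usable, a band of
// two standard deviations applied on the log scale. The struct and function names below are
// hypothetical, and the ">= 0" guard on the variance is an assumption about what the original
// condition checks.

```cpp
#include <cmath>

// Hypothetical result type (not a mothur class).
struct RelAbundEstimate {
    double mean;     // 100 * exp(lambda) / total
    double lower;    // 100 * exp(lambda - 2*sd) / total
    double upper;    // 100 * exp(lambda + 2*sd) / total
    bool hasBounds;  // false when the variance is negative (the original prints "NA")
};

// Hypothetical helper: lambda is the log-scale parameter for one OTU in one partition,
// variance is the matching entry of the error matrix, total is the partition's sum of exp(lambda).
RelAbundEstimate relAbundPercent(double lambda, double variance, double total) {
    RelAbundEstimate r;
    r.mean = 100.0 * std::exp(lambda) / total;
    r.hasBounds = (variance >= 0.0);
    if (r.hasBounds) {
        const double sd = std::sqrt(variance);
        r.lower = 100.0 * std::exp(lambda - 2.0 * sd) / total;
        r.upper = 100.0 * std::exp(lambda + 2.0 * sd) / total;
    } else {
        // No usable variance: fall back to the point estimate for both bounds.
        r.lower = r.mean;
        r.upper = r.mean;
    }
    return r;
}
```

// Because the band is built on the log scale, it is asymmetric around the mean after exponentiation.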
/**************************************************************************************************/ double CommunityTypeFinder::psi1(double xx){ try { /* Euler-Maclaurin summation formula * [Moshier, p. 400, with several typo corrections] */ double s = 2.0000; const int jmax = 12; const int kmax = 10; int j, k; const double pmax = pow(kmax + xx, -s); double scp = s; double pcp = pmax / (kmax + xx); double value = pmax*((kmax+xx)/(s-1.0) + 0.5); for(k=0; kcontrol_pressed) { return 0; } value += pow(k + xx, -s); } for(j=0; j<=jmax; j++) { if (m->control_pressed) { return 0; } double delta = hzeta_c[j+1] * scp * pcp; value += delta; if(fabs(delta/value) < 0.5*EPSILON) break; scp *= (s+2*j+1)*(s+2*j+2); pcp /= (kmax + xx)*(kmax + xx); } return value; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "psi1"); exit(1); } } /**************************************************************************************************/ double CommunityTypeFinder::psi(double xx){ try { double psiX = 0.0000; if(xx < 1.0000){ double t1 = 1.0 / xx; psiX = cheb_eval(psi_cs, 22, 2.0*xx-1.0); psiX = -t1 + psiX; } else if(xx < 2.0000){ const double v = xx - 1.0; psiX = cheb_eval(psi_cs, 22, 2.0*v-1.0); } else{ const double t = 8.0/(xx*xx)-1.0; psiX = cheb_eval(apsi_cs, 15, t); psiX += log(xx) - 0.5/xx; } return psiX; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "psi"); exit(1); } } /**************************************************************************************************/ double CommunityTypeFinder::cheb_eval(const double seriesData[], int order, double xx){ try { double d = 0.0000; double dd = 0.0000; double x2 = xx * 2.0000; for(int j=order;j>=1;j--){ if (m->control_pressed) { return 0; } double temp = d; d = x2 * d - dd + seriesData[j]; dd = temp; } d = xx * d - dd + 0.5 * seriesData[0]; return d; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "cheb_eval"); exit(1); } } 
/**************************************************************************************************/ int CommunityTypeFinder::findkMeans(){ try { error.resize(numPartitions); for (int i = 0; i < numPartitions; i++) { error[i].resize(numOTUs, 0.0); } vector > relativeAbundance(numSamples); vector > alphaMatrix; alphaMatrix.resize(numPartitions); lambdaMatrix.resize(numPartitions); for(int i=0;icontrol_pressed) { return 0; } int groupTotal = 0; relativeAbundance[i].assign(numOTUs, 0.0); for(int j=0;j temp; for (int i = 0; i < numSamples; i++) { temp.push_back(i); } random_shuffle(temp.begin(), temp.end()); //assign each partition at least one random sample int numAssignedSamples = 0; for (int i = 0; i < numPartitions; i++) { zMatrix[i][temp[numAssignedSamples]] = 1; numAssignedSamples++; } //assign rest of samples to partitions int count = 0; for(int i=numAssignedSamples;i 1e-6 && iteration < maxIters){ if (m->control_pressed) { return 0; } //calcualte average relative abundance maxChange = 0.0000; for(int i=0;i averageRelativeAbundance(numOTUs, 0); for(int j=0;j maxChange){ maxChange = normChange; } } //calcualte distance between each sample in partition and the average relative abundance for(int i=0;icontrol_pressed) { return 0; } double normalizationFactor = 0; vector totalDistToPartition(numPartitions, 0); for(int j=0;jcontrol_pressed) { return 0; } for(int j=0;j 0){ lambdaMatrix[j][i] = log(alphaMatrix[j][i]); } else{ lambdaMatrix[j][i] = -10.0; } } } return 0; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "kMeans"); exit(1); } } /**************************************************************************************************/ //based on r function .medoid //results is length numOTUs and holds the distances from x of the sample in d with the min sum of distances to all other samples. //Basically the "best" medoid. 
//returns the sum of the distances squared
double CommunityTypeFinder::rMedoid(vector< vector<double> > x, vector< vector<double> > d){
    try {
        vector<double> results; results.resize(numOTUs, 0.0);
        double minSumDist = 1e6;
        int minGroup = -1;

        for (int i = 0; i < d.size(); i++) {
            if (m->control_pressed) { break; }

            double thisSum = 0.0;
            for (int j = 0; j < d[i].size(); j++) { thisSum += d[i][j]; }
            if (thisSum < minSumDist) {
                minSumDist = thisSum;
                minGroup = i;
            }
        }

        if (minGroup != -1) {
            for (int i = 0; i < numOTUs; i++) { results[i] = x[minGroup][i]; } //save minGroup's relativeAbundance for each OTU
        }else { m->mothurOut("[ERROR]: unable to find rMedoid group.\n"); m->control_pressed = true; }

        double allMeanDist = 0.0;
        for (int i = 0; i < x.size(); i++) { //numSamples
            for (int j = 0; j < x[i].size(); j++) { //numOTUs
                if (m->control_pressed) { break; }
                allMeanDist += ((x[i][j]-results[j])*(x[i][j]-results[j])); //(otuX sampleY - otuX bestMedoid)^2
            }
        }
        return allMeanDist;
    }
    catch(exception& e){
        m->errorOut(e, "CommunityTypeFinder", "rMedoid");
        exit(1);
    }
}
/**************************************************************************************************/
/*To assess the optimal number of clusters our dataset was most robustly partitioned into, we used the Calinski-Harabasz (CH) Index that has shown good performance in recovering the number of clusters. It is defined as:

 CHk = (Bk/(k-1)) / (Wk/(n-k))

 where Bk is the between-cluster sum of squares (i.e. the squared distances between all points i and j, for which i and j are not in the same cluster) and Wk is the within-clusters sum of squares (i.e. the squared distances between all points i and j, for which i and j are in the same cluster). This measure implements the idea that the clustering is more robust when between-cluster distances are substantially larger than within-cluster distances.
Consequently, we chose the number of clusters k such that CHk was maximal.*/ double CommunityTypeFinder::calcCHIndex(vector< vector< double> > dists){ try { double CH = 0.0; if (numPartitions < 2) { return CH; } map clusterMap; //map sample to partition for (int j = 0; j < numSamples; j++) { double maxValue = -1e6; for (int i = 0; i < numPartitions; i++) { if (m->control_pressed) { return 0.0; } if (zMatrix[i][j] > maxValue) { //for kmeans zmatrix contains values for each sample in each partition. partition with highest value for that sample is the partition where the sample should be clusterMap[j] = i; maxValue = zMatrix[i][j]; } } } //make countMatrix a relabund vector > relativeAbundance(numSamples); //[numSamples][numOTUs] //get relative abundance for(int i=0;icontrol_pressed) { return 0; } int groupTotal = 0; relativeAbundance[i].assign(numOTUs, 0.0); for(int j=0;j > centers = calcCenters(dists, clusterMap, relativeAbundance); if (m->control_pressed) { return 0.0; } double allMeanDist = rMedoid(relativeAbundance, dists); if (m->debug) { m->mothurOut("[DEBUG]: allMeandDist = " + toString(allMeanDist) + "\n"); } for (int i = 0; i < relativeAbundance.size(); i++) {//numSamples for (int j = 0; j < relativeAbundance[i].size(); j++) { //numOtus if (m->control_pressed) { return 0; } //x <- (x - centers[cl, ])^2 relativeAbundance[i][j] = ((relativeAbundance[i][j] - centers[clusterMap[i]][j])*(relativeAbundance[i][j] - centers[clusterMap[i]][j])); } } double wgss = 0.0; for (int j = 0; j < numOTUs; j++) { for(int i=0;icontrol_pressed) { return 0.0; } wgss += relativeAbundance[i][j]; } } double bgss = allMeanDist - wgss; CH = (bgss / (double)(numPartitions - 1)) / (wgss / (double) (numSamples - numPartitions)); return CH; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "calcCHIndex"); exit(1); } } /**************************************************************************************************/ vector > CommunityTypeFinder::calcCenters(vector >& dists, 
map clusterMap, vector >& relativeAbundance) { //[numsamples][numsamples] try { //for each partition //choose sample with smallest sum of squared dists // cout << "Here" << clusterMap.size() << endl; // for(map::iterator it = clusterMap.begin(); it != clusterMap.end(); it++) { cout << it->first << '\t' << it->second < > centers; centers.resize(numPartitions); vector sums; sums.resize(numSamples, 0.0); map > partition2Samples; //maps partitions to samples in the partition map >::iterator it; for (int i = 0; i < numSamples; i++) { int partitionI = clusterMap[i]; //add this sample to list of samples in this partition for access later it = partition2Samples.find(partitionI); if (it == partition2Samples.end()) { vector temp; temp.push_back(i); partition2Samples[partitionI] = temp; }else { partition2Samples[partitionI].push_back(i); } for (int j = 0; j < numSamples; j++) { int partitionJ = clusterMap[j]; if (partitionI == partitionJ) { //if you are a distance between samples in the same cluster sums[i] += dists[i][j]; sums[j] += dists[i][j]; }else{}//we dont' care about distance between clusters } } vector medoidsVector; medoidsVector.resize(numPartitions, -1); for (it = partition2Samples.begin(); it != partition2Samples.end(); it++) { //for each partition look for sample with smallest squared //sum dist to all other samples in cluster vector members = it->second; double minSumDist = 1e6; for (int i = 0; i < members.size(); i++) { if (m->control_pressed) { return centers; } if (sums[members[i]] < minSumDist) { minSumDist = sums[members[i]]; medoidsVector[it->first] = members[i]; } } } set medoids; for (int i = 0; i < medoidsVector.size(); i++) { medoids.insert(medoidsVector[i]); } int countPartitions = 0; for (set::iterator it = medoids.begin(); it != medoids.end(); it++) { for (int j = 0; j < numOTUs; j++) { centers[countPartitions].push_back(relativeAbundance[*it][j]); //save the relative abundance of the medoid for this partition for this OTU } countPartitions++; } 
        return centers;
    }
    catch(exception& e){
        m->errorOut(e, "CommunityTypeFinder", "calcCenters");
        exit(1);
    }
}
/**************************************************************************************************/
//The silhouette width S(i) of an individual data point i is calculated using the following formula:
/*
    S(i) = (b(i) - a(i)) / max(a(i), b(i))

 where a(i) is the average dissimilarity (or distance) of sample i to all other samples in the same cluster, while b(i) is the average dissimilarity (or distance) to all objects in the closest other cluster.

 The formula implies -1 <= S(i) <= 1. A sample which is much closer to its own cluster than to any other cluster has a high S(i) value, while S(i) close to 0 implies that the given sample lies somewhere between two clusters. Large negative S(i) values indicate that the sample was assigned to the wrong cluster.
 */
//based on silhouette.r which calls sildist.c written by Francois Romain
vector<double> CommunityTypeFinder::calcSilhouettes(vector<vector<double> > dists) {
    try {
        vector<double> silhouettes; silhouettes.resize(numSamples, 0.0);
        if (numPartitions < 2) { return silhouettes; }

        map<int, int> clusterMap; //map sample to partition
        for (int j = 0; j < numSamples; j++) {
            double maxValue = 0.0;
            for (int i = 0; i < numPartitions; i++) {
                if (m->control_pressed) { return silhouettes; }
                if (zMatrix[i][j] > maxValue) { //for kmeans zmatrix contains values for each sample in each partition.
partition with highest value for that sample is the partition where the sample should be clusterMap[j] = i; maxValue = zMatrix[i][j]; } } } //count number of samples in each partition vector counts; counts.resize(numPartitions, 0); vector DiC; DiC.resize((numPartitions*numSamples), 0.0); bool computeSi = true; for (int i = 0; i < numSamples; i++) { int partitionI = clusterMap[i]; counts[partitionI]++; for (int j = i+1; j < numSamples; j++) { if (m->control_pressed) { return silhouettes; } int partitionJ = clusterMap[j]; DiC[numPartitions*i+partitionJ] += dists[i][j]; DiC[numPartitions*j+partitionI] += dists[i][j]; } } vector neighbor; neighbor.resize(numSamples, -1); for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { return silhouettes; } int ki = numPartitions*i; int partitionI = clusterMap[i]; computeSi = true; for (int j = 0; j < numPartitions; j++) { if (j == partitionI) { if (counts[j] == 1) { //only one sample in cluster computeSi = false; }else { DiC[ki+j] /= (counts[j]-1); } }else{ DiC[ki+j] /= counts[j]; } } double ai = DiC[ki+partitionI]; double bi = 0.0; if (partitionI == 0) { bi = DiC[ki+1]; neighbor[i] = 2; } else { bi = DiC[ki]; neighbor[i] = 1; } for (int j = 1; j < numPartitions; j++) { if (j != partitionI) { if (bi > DiC[ki+j]) { bi = DiC[ki + j]; neighbor[i] = j+1; } } } silhouettes[i] = 0.0; if (computeSi && bi != ai) { silhouettes[i] = (bi-ai) / (max(ai, bi)); } } return silhouettes; } catch(exception& e) { m->errorOut(e, "CommunityTypeFinder", "calcSilhouettes"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/communitytype/communitytype.h000066400000000000000000000042151255543666200224650ustar00rootroot00000000000000// // communitytype.h // Mothur // // Created by SarahsWork on 12/3/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #ifndef Mothur_communitytype_h #define Mothur_communitytype_h #define EPSILON numeric_limits::epsilon() #include "mothurout.h" #include "linearalgebra.h" /**************************************************************************************************/ class CommunityTypeFinder { public: CommunityTypeFinder(){ m = MothurOut::getInstance(); } virtual ~CommunityTypeFinder(){}; virtual void printZMatrix(string, vector); virtual void printRelAbund(string, vector); virtual void printFitData(ofstream&) {} virtual void printFitData(ostream&, double) {} virtual void printSilData(ofstream&, double, vector); virtual void printSilData(ostream&, double, vector); virtual double getNLL() { return currNLL; } virtual double getAIC() { return aic; } virtual double getBIC() { return bic; } virtual double getLogDet() { return logDeterminant; } virtual double getLaplace() { return laplace; } virtual double calcCHIndex(vector< vector< double> >); //Calinski-Harabasz virtual vector calcSilhouettes(vector< vector< double> >); protected: int findkMeans(); vector > getHessian(); double psi1(double); double psi(double); double cheb_eval(const double[], int, double); double rMedoid(vector< vector > x, vector< vector > d); vector > calcCenters(vector >&, map, vector >&); MothurOut* m; vector > zMatrix; vector > lambdaMatrix; vector > error; vector > countMatrix; vector weights; int numPartitions; int numSamples; int numOTUs; int currentPartition; double currNLL, aic, bic, logDeterminant, laplace; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/communitytype/kmeans.cpp000066400000000000000000000012751255543666200213530ustar00rootroot00000000000000// // kmeans.cpp // Mothur // // Created by SarahsWork on 12/4/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "kmeans.h" /**************************************************************************************************/ KMeans::KMeans(vector > cm, int p) : CommunityTypeFinder() { try { countMatrix = cm; numSamples = (int)countMatrix.size(); numOTUs = (int)countMatrix[0].size(); numPartitions = p; findkMeans(); } catch(exception& e) { m->errorOut(e, "KMeans", "KMeans"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/communitytype/kmeans.h000066400000000000000000000010251255543666200210110ustar00rootroot00000000000000// // kmeans.h // Mothur // // Created by SarahsWork on 12/4/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #ifndef Mothur_kmeans_h #define Mothur_kmeans_h #include "communitytype.h" /**************************************************************************************************/ class KMeans : public CommunityTypeFinder { public: KMeans(vector >, int); private: }; /**************************************************************************************************/ #endif mothur-1.36.1/source/communitytype/pam.cpp000066400000000000000000000310351255543666200206470ustar00rootroot00000000000000// // pam.cpp // Mothur // // Created by SarahsWork on 12/10/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. 
// #include "pam.h" #define DBL_EPSILON 1e-9 /**************************************************************************************************/ Pam::Pam(vector > c, vector > d, int p) : CommunityTypeFinder() { try { countMatrix = c; numSamples = (int)d.size(); numOTUs = (int)c[0].size(); numPartitions = p; dists = d; largestDist = 0; for (int i = 0; i < dists.size(); i++) { for (int j = i; j < dists.size(); j++) { if (m->control_pressed) { break; } if (dists[i][j] > largestDist) { largestDist = dists[i][j]; } } } buildPhase(); //choosing the medoids swapPhase(); //optimize clusters } catch(exception& e) { m->errorOut(e, "Pam", "Pam"); exit(1); } } /**************************************************************************************************/ //build and swap functions based on pam.c by maechler from R cluster package //sets Dp[0] does not set Dp[1]. chooses intial medoids. int Pam::buildPhase() { try { if (m->debug) { m->mothurOut("[DEBUG]: building medoids\n"); } vector gains; gains.resize(numSamples); largestDist *= 1.1 + 1; //make this distance larger than any distance in the matrix Dp.resize(numSamples); for (int i = 0; i < numSamples; i++) { Dp[i].push_back(largestDist); Dp[i].push_back(largestDist); } //2 smallest dists for this sample in this partition zMatrix.resize(numPartitions); for(int i=0;icontrol_pressed) { break; } if (medoids.count(i) == 0) { //is this sample is NOT a medoid? 
gains[i] = 0.0; for (int j = 0; j < numSamples; j++) { totalGain = Dp[j][0] - dists[i][j]; if (totalGain > 0.0) { gains[i] += totalGain; } } if (m->debug) { m->mothurOut("[DEBUG]: " + toString(i) + " totalGain = " + toString(totalGain) + "\n"); } if (clusterGain <= gains[i]) { clusterGain = gains[i]; medoid = i; } } } //save medoid value medoids.insert(medoid); if (m->debug) { m->mothurOut("[DEBUG]: new medoid " + toString(medoid) + "\n"); } //update dp values for (int i = 0; i < numSamples; i++) { if (Dp[i][0] > dists[i][medoid]) { Dp[i][0] = dists[i][medoid]; } } } if (m->debug) { m->mothurOut("[DEBUG]: done building medoids\n"); } return 0; } catch(exception& e) { m->errorOut(e, "Pam", "buildPhase"); exit(1); } } /**************************************************************************************************/ //goal to swap medoids with non-medoids to see if we can reduce the overall cost int Pam::swapPhase() { try { if (m->debug) { m->mothurOut("[DEBUG]: swapping medoids\n"); } //calculate cost of initial choice - average distance of samples to their closest medoid double sky = 0.0; double dzsky = 1.0; for (int i = 0; i < numSamples; i++) { sky += Dp[i][0]; } //sky /= (double) numSamples; bool done = false; int hbest, nbest; hbest = -1; nbest = -1; while (!done) { if (m->control_pressed) { break; } updateDp(); dzsky = 1; for (int h = 0; h < numSamples; h++) { if (m->control_pressed) { break; } if (medoids.count(h) == 0) { //this is NOT a medoid for (int i = 0; i < numSamples; i++) { if (medoids.count(i) != 0) { //this is a medoid double dz = 0.0; //Tih sum of distances between objects and closest medoid caused by swapping i and h. Basically the change in cost. If this < 0 its a "good" swap. When all Tih are > 0, then we stop the algo, because we have the optimal medoids. 
for (int j = 0; j < numSamples; j++) { if (m->control_pressed) { break; } if (dists[i][j] == Dp[j][0]) { double smallValue; smallValue = 0.0; if (Dp[j][1] > dists[h][j]) { smallValue = dists[h][j]; } else { smallValue = Dp[j][1]; } dz += (- Dp[j][0]+ smallValue); }else if (dists[h][j] < Dp[j][0]) { dz += (- Dp[j][0] + dists[h][j]); } } if (dzsky > dz) { dzsky = dz; hbest = h; nbest = i; } }//end if medoid }//end for i }//end if NOT medoid }//end if h if (dzsky < -16 *DBL_EPSILON * fabs(sky)) { medoids.insert(hbest); medoids.erase(nbest); if (m->debug) { m->mothurOut("[DEBUG]: swapping " + toString(hbest) + " " + toString(nbest) + "\n"); } sky += dzsky; }else { done = true; } //stop algo. } //fill zmatrix int count = 0; vector tempMedoids; for (set::iterator it = medoids.begin(); it != medoids.end(); it++) { medoid2Partition[*it] = count; zMatrix[count][*it] = 1; count++; //set medoid in this partition. tempMedoids.push_back(*it); } //which partition do you belong to? laplace = 0; for (int i = 0; i < numSamples; i++) { int partition = 0; double dist = dists[i][tempMedoids[0]]; //assign to first medoid for (int j = 1; j < tempMedoids.size(); j++) { if (dists[i][tempMedoids[j]] < dist) { //is this medoid closer? 
                    dist = dists[i][tempMedoids[j]];
                    partition = j;
                }
            }
            zMatrix[partition][i] = 1;
            laplace += dist;
        }
        laplace /= (double) numSamples;

        if (m->debug) {
            for(int i=0;i<numPartitions;i++){
                m->mothurOut("[DEBUG]: partition 1: ");
                for (int j = 0; j < numSamples; j++) {
                    m->mothurOut(toString(zMatrix[i][j]) + " ");
                }
                m->mothurOut("\n");
            }
            m->mothurOut("[DEBUG]: medoids : ");
            for (set<int>::iterator it = medoids.begin(); it != medoids.end(); it++) { m->mothurOut(toString(*it) + " "); }
            m->mothurOut("\n");

            m->mothurOut("[DEBUG]: laplace : " + toString(laplace)); m->mothurOut("\n");
        }

        if (m->debug) { m->mothurOut("[DEBUG]: done swapping medoids\n"); }
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Pam", "swapPhase");
        exit(1);
    }
}
/**************************************************************************************************/
int Pam::updateDp() {
    try {
        for (int j = 0; j < numSamples; j++) {
            if (m->control_pressed) { break; }

            //initialize dp and ep
            Dp[j][0] = largestDist; Dp[j][1] = largestDist;

            for (int i = 0; i < numSamples; i++) {
                if (medoids.count(i) != 0) { //is this a medoid?
                    if (Dp[j][0] > dists[j][i]) {
                        Dp[j][1] = Dp[j][0]; //the old closest distance becomes the second closest
                        Dp[j][0] = dists[j][i];
                    }else if (Dp[j][1] > dists[j][i]) {
                        Dp[j][1] = dists[j][i];
                    }
                }
            }
        }
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Pam", "updateDp");
        exit(1);
    }
}
/**************************************************************************************************/
/*To assess the optimal number of clusters our dataset was most robustly partitioned into, we used the Calinski-Harabasz (CH) Index that has shown good performance in recovering the number of clusters. It is defined as:

 CHk = (Bk/(k-1)) / (Wk/(n-k))

 where Bk is the between-cluster sum of squares (i.e. the squared distances between all points i and j, for which i and j are not in the same cluster) and Wk is the within-clusters sum of squares (i.e. the squared distances between all points i and j, for which i and j are in the same cluster).
This measure implements the idea that the clustering is more robust when between-cluster distances are substantially larger than within-cluster distances. Consequently, we chose the number of clusters k such that CHk was maximal.*/ //based on R index.G1.r function double Pam::calcCHIndex(vector< vector > dists){ //countMatrix = [numSamples][numOtus] try { double CH = 0.0; if (numPartitions < 2) { return CH; } map clusterMap; //map sample to partition for (int i = 0; i < numPartitions; i++) { for (int j = 0; j < numSamples; j++) { if (m->control_pressed) { return 0.0; } if (zMatrix[i][j] != 0) { clusterMap[j] = i; } } } //make countMatrix a relabund vector > relativeAbundance(numSamples); //[numSamples][numOTUs] //get relative abundance for(int i=0;icontrol_pressed) { return 0; } int groupTotal = 0; relativeAbundance[i].assign(numOTUs, 0.0); for(int j=0;j > centers; centers.resize(numPartitions); int countPartitions = 0; for (set::iterator it = medoids.begin(); it != medoids.end(); it++) { for (int j = 0; j < numOTUs; j++) { centers[countPartitions].push_back(relativeAbundance[*it][j]); //save the relative abundance of the medoid for this partition for this OTU } countPartitions++; } //centers.clear(); //centers = calcCenters(dists, clusterMap, relativeAbundance); double allMeanDist = rMedoid(relativeAbundance, dists); if (m->debug) { m->mothurOut("[DEBUG]: allMeandDist = " + toString(allMeanDist) + "\n"); } for (int i = 0; i < relativeAbundance.size(); i++) {//numSamples for (int j = 0; j < relativeAbundance[i].size(); j++) { //numOtus if (m->control_pressed) { return 0; } //x <- (x - centers[cl, ])^2 relativeAbundance[i][j] = ((relativeAbundance[i][j] - centers[clusterMap[i]][j])*(relativeAbundance[i][j] - centers[clusterMap[i]][j])); } } double wgss = 0.0; for (int j = 0; j < numOTUs; j++) { for(int i=0;icontrol_pressed) { return 0.0; } wgss += relativeAbundance[i][j]; } } double bgss = allMeanDist - wgss; CH = (bgss / (double)(numPartitions - 1)) / (wgss / 
(double) (numSamples - numPartitions)); return CH; } catch(exception& e){ m->errorOut(e, "Pam", "calcCHIndex"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/communitytype/pam.h000066400000000000000000000021451255543666200203140ustar00rootroot00000000000000// // pam.h // Mothur // // Created by SarahsWork on 12/10/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #ifndef Mothur_pam_h #define Mothur_pam_h #include "communitytype.h" //Partitioning Around Medoids /**************************************************************************************************/ class Pam : public CommunityTypeFinder { public: Pam(vector >, vector >, int); double calcCHIndex(vector< vector< double> >); private: set medoids; map medoid2Partition; double largestDist; vector > dists; vector > Dp; // [numSamples][2] - It contains Dp and Ep. Dp is in [numSamples][0] and Ep is in [numSamples][1]. Dp is the distance between p and the closest sample in S and Ep is the distance between p and the second closest object in S. Both are used in the build and swap phases. int buildPhase(); int swapPhase(); int updateDp(); /**************************************************************************************************/ }; #endif mothur-1.36.1/source/communitytype/qFinderDMM.cpp000066400000000000000000000660321255543666200220250ustar00rootroot00000000000000// // qFinderDMM.cpp // pds_dmm // // Created by Patrick Schloss on 11/8/12. // Copyright (c) 2012 University of Michigan. All rights reserved. 
// #include "qFinderDMM.h" /**************************************************************************************************/ qFinderDMM::qFinderDMM(vector > cm, int p) : CommunityTypeFinder() { try { //cout << "here" << endl; numPartitions = p; countMatrix = cm; numSamples = (int)countMatrix.size(); numOTUs = (int)countMatrix[0].size(); // if (m->debug) { m->mothurOut("before kmeans\n"); } findkMeans(); //if (m->debug) { m->mothurOut("done kMeans\n"); } optimizeLambda(); //if (m->debug) { m->mothurOut("done optimizeLambda\n"); } double change = 1.0000; currNLL = 0.0000; int iter = 0; while(change > 1.0e-6 && iter < 100){ // if (m->debug) { m->mothurOut("Calc_Z: \n"); } calculatePiK(); optimizeLambda(); // if (m->debug) { m->mothurOut("Iter: " + toString(iter) + "\n"); } for(int i=0;idebug) { m->mothurOut("done while loop\n"); } error.resize(numPartitions); logDeterminant = 0.0000; LinearAlgebra l; for(currentPartition=0;currentPartitiondebug) { m->mothurOut("current partition = " + toString(currentPartition) + "\n"); } if(currentPartition > 0){ logDeterminant += (2.0 * log(numSamples) - log(weights[currentPartition])); } //if (m->debug) { m->mothurOut("before hession\n"); } vector > hessian = getHessian(); //if (m->debug) { m->mothurOut("after hession\n"); } vector > invHessian = l.getInverse(hessian); //if (m->debug) { m->mothurOut("after inverse\n"); } for(int i=0;ierrorOut(e, "qFinderDMM", "qFinderDMM"); exit(1); } } /**************************************************************************************************/ void qFinderDMM::printFitData(ofstream& out){ try { out << setprecision (2) << numPartitions << '\t' << getNLL() << '\t' << getLogDet() << '\t' << getBIC() << '\t' << getAIC() << '\t' << laplace << endl; return; } catch(exception& e){ m->errorOut(e, "CommunityTypeFinder", "printFitData"); exit(1); } } /**************************************************************************************************/ void qFinderDMM::printFitData(ostream& out, 
double minLaplace){
    try {
        if(laplace < minLaplace){
            out << setprecision (2) << numPartitions << '\t' << getNLL() << '\t' << getLogDet() << '\t' << getBIC() << '\t' << getAIC() << '\t' << laplace << "***" << endl;
        }else {
            out << setprecision (2) << numPartitions << '\t' << getNLL() << '\t' << getLogDet() << '\t' << getBIC() << '\t' << getAIC() << '\t' << laplace << endl;
        }

        m->mothurOutJustToLog(toString(numPartitions) + '\t' + toString(getNLL()) + '\t' + toString(getLogDet()) + '\t');
        m->mothurOutJustToLog(toString(getBIC()) + '\t' + toString(getAIC()) + '\t' + toString(laplace));

        return;
    }
    catch(exception& e){
        m->errorOut(e, "qFinderDMM", "printFitData");
        exit(1);
    }
}
/**************************************************************************************************/
// these functions for the bfgs2 solver were lifted from the GNU GSL source code...

/* Find a minimum in x=[0,1] of the interpolating quadratic through
 * (0,f0) (1,f1) with derivative fp0 at x=0. The interpolating
 * polynomial is q(x) = f0 + fp0 * z + (f1-f0-fp0) * z^2
 */
static double interp_quad (double f0, double fp0, double f1, double zl, double zh)
{
    double fl = f0 + zl*(fp0 + zl*(f1 - f0 -fp0));
    double fh = f0 + zh*(fp0 + zh*(f1 - f0 -fp0));
    double c = 2 * (f1 - f0 - fp0); /* curvature */

    double zmin = zl, fmin = fl;

    if (fh < fmin) { zmin = zh; fmin = fh; }

    if (c > 0) /* positive curvature required for a minimum */
    {
        double z = -fp0 / c; /* location of minimum */
        if (z > zl && z < zh) {
            double f = f0 + z*(fp0 + z*(f1 - f0 -fp0));
            if (f < fmin) { zmin = z; fmin = f; };
        }
    }

    return zmin;
}
/**************************************************************************************************/
/* Find a minimum in x=[0,1] of the interpolating cubic through
 * (0,f0) (1,f1) with derivatives fp0 at x=0 and fp1 at x=1.
 *
 * The interpolating polynomial is:
 *
 * c(x) = f0 + fp0 * z + eta * z^2 + xi * z^3
 *
 * where eta=3*(f1-f0)-2*fp0-fp1, xi=fp0+fp1-2*(f1-f0).
*/ double cubic (double c0, double c1, double c2, double c3, double z){ return c0 + z * (c1 + z * (c2 + z * c3)); } /**************************************************************************************************/ void check_extremum (double c0, double c1, double c2, double c3, double z, double *zmin, double *fmin){ /* could make an early return by testing curvature >0 for minimum */ double y = cubic (c0, c1, c2, c3, z); if (y < *fmin) { *zmin = z; /* accepted new point*/ *fmin = y; } } /**************************************************************************************************/ int gsl_poly_solve_quadratic (double a, double b, double c, double *x0, double *x1) { double disc = b * b - 4 * a * c; if (a == 0) /* Handle linear case */ { if (b == 0) { return 0; } else { *x0 = -c / b; return 1; }; } if (disc > 0) { if (b == 0) { double r = fabs (0.5 * sqrt (disc) / a); *x0 = -r; *x1 = r; } else { double sgnb = (b > 0 ? 1 : -1); double temp = -0.5 * (b + sgnb * sqrt (disc)); double r1 = temp / a ; double r2 = c / temp ; if (r1 < r2) { *x0 = r1 ; *x1 = r2 ; } else { *x0 = r2 ; *x1 = r1 ; } } return 2; } else if (disc == 0) { *x0 = -0.5 * b / a ; *x1 = -0.5 * b / a ; return 2 ; } else { return 0; } } /**************************************************************************************************/ double interp_cubic (double f0, double fp0, double f1, double fp1, double zl, double zh){ double eta = 3 * (f1 - f0) - 2 * fp0 - fp1; double xi = fp0 + fp1 - 2 * (f1 - f0); double c0 = f0, c1 = fp0, c2 = eta, c3 = xi; double zmin, fmin; double z0, z1; zmin = zl; fmin = cubic(c0, c1, c2, c3, zl); check_extremum (c0, c1, c2, c3, zh, &zmin, &fmin); { int n = gsl_poly_solve_quadratic (3 * c3, 2 * c2, c1, &z0, &z1); if (n == 2) /* found 2 roots */ { if (z0 > zl && z0 < zh) check_extremum (c0, c1, c2, c3, z0, &zmin, &fmin); if (z1 > zl && z1 < zh) check_extremum (c0, c1, c2, c3, z1, &zmin, &fmin); } else if (n == 1) /* found 1 root */ { if (z0 > zl && z0 < zh) 
check_extremum (c0, c1, c2, c3, z0, &zmin, &fmin); } } return zmin; } /**************************************************************************************************/ double interpolate (double a, double fa, double fpa, double b, double fb, double fpb, double xmin, double xmax){ /* Map [a,b] to [0,1] */ double z, alpha, zmin, zmax; zmin = (xmin - a) / (b - a); zmax = (xmax - a) / (b - a); if (zmin > zmax) { double tmp = zmin; zmin = zmax; zmax = tmp; }; if(!isnan(fpb) ){ z = interp_cubic (fa, fpa * (b - a), fb, fpb * (b - a), zmin, zmax); } else{ z = interp_quad(fa, fpa * (b - a), fb, zmin, zmax); } alpha = a + z * (b - a); return alpha; } /**************************************************************************************************/ int qFinderDMM::lineMinimizeFletcher(vector& x, vector& p, double f0, double df0, double alpha1, double& alphaNew, double& fAlpha, vector& xalpha, vector& gradient ){ try { double rho = 0.01; double sigma = 0.10; double tau1 = 9.00; double tau2 = 0.05; double tau3 = 0.50; double alpha = alpha1; double alpha_prev = 0.0000; xalpha.resize(numOTUs, 0.0000); double falpha_prev = f0; double dfalpha_prev = df0; double a = 0.0000; double b = alpha; double fa = f0; double fb = 0.0000; double dfa = df0; double dfb = 0.0/0.0; int iter = 0; int maxIters = 100; while(iter++ < maxIters){ if (m->control_pressed) { break; } for(int i=0;i f0 + alpha * rho * df0 || fAlpha >= falpha_prev){ a = alpha_prev; b = alpha; fa = falpha_prev; fb = fAlpha; dfa = dfalpha_prev; dfb = 0.0/0.0; break; } negativeLogDerivEvidenceLambdaPi(xalpha, gradient); double dfalpha = 0.0000; for(int i=0;i= 0){ a = alpha; b = alpha_prev; fa = fAlpha; fb = falpha_prev; dfa = dfalpha; dfb = dfalpha_prev; break; } double delta = alpha - alpha_prev; double lower = alpha + delta; double upper = alpha + tau1 * delta; double alphaNext = interpolate(alpha_prev, falpha_prev, dfalpha_prev, alpha, fAlpha, dfalpha, lower, upper); alpha_prev = alpha; falpha_prev = fAlpha; dfalpha_prev 
= dfalpha; alpha = alphaNext; } iter = 0; while(iter++ < maxIters){ if (m->control_pressed) { break; } double delta = b - a; double lower = a + tau2 * delta; double upper = b - tau3 * delta; alpha = interpolate(a, fa, dfa, b, fb, dfb, lower, upper); for(int i=0;i f0 + rho * alpha * df0 || fAlpha >= fa){ b = alpha; fb = fAlpha; dfb = 0.0/0.0; } else{ double dfalpha = 0.0000; negativeLogDerivEvidenceLambdaPi(xalpha, gradient); dfalpha = 0.0000; for(int i=0;i= 0 && dfalpha >= 0) || ((b-a) <= 0.000 && dfalpha <= 0))){ b = a; fb = fa; dfb = dfa; a = alpha; fa = fAlpha; dfa = dfalpha; } else{ a = alpha; fa = fAlpha; dfa = dfalpha; } } } return 1; } catch(exception& e) { m->errorOut(e, "qFinderDMM", "lineMinimizeFletcher"); exit(1); } } /**************************************************************************************************/ int qFinderDMM::bfgs2_Solver(vector& x){ try{ // cout << "bfgs2_Solver" << endl; int bfgsIter = 0; double step = 1.0e-6; double delta_f = 0.0000;//f-f0; vector gradient; double f = negativeLogEvidenceLambdaPi(x); // cout << "after negLE" << endl; negativeLogDerivEvidenceLambdaPi(x, gradient); // cout << "after negLDE" << endl; vector x0 = x; vector g0 = gradient; double g0norm = 0; for(int i=0;i p = gradient; double pNorm = 0; for(int i=0;i 0.001 && bfgsIter++ < maxIter){ if (m->control_pressed) { return 0; } double f0 = f; vector dx(numOTUs, 0.0000); double alphaOld, alphaNew; if(pNorm == 0 || g0norm == 0 || df0 == 0){ dx.assign(numOTUs, 0.0000); break; } if(delta_f < 0){ double delta = max(-delta_f, 10 * EPSILON * abs(f0)); alphaOld = min(1.0, 2.0 * delta / (-df0)); } else{ alphaOld = step; } int success = lineMinimizeFletcher(x0, p, f0, df0, alphaOld, alphaNew, f, x, gradient); if(!success){ x = x0; break; } delta_f = f - f0; vector dx0(numOTUs); vector dg0(numOTUs); for(int i=0;i= 0.0) ? 
-1.0 : +1.0; for(int i=0;ierrorOut(e, "qFinderDMM", "bfgs2_Solver"); exit(1); } } /**************************************************************************************************/ double qFinderDMM::negativeLogEvidenceLambdaPi(vector& x){ try{ vector sumAlphaX(numSamples, 0.0000); double logEAlpha = 0.0000; double sumLambda = 0.0000; double sumAlpha = 0.0000; double logE = 0.0000; double nu = 0.10000; double eta = 0.10000; double weight = 0.00000; for(int i=0;icontrol_pressed) { return 0; } double lambda = x[i]; double alpha = exp(x[i]); logEAlpha += lgamma(alpha); sumLambda += lambda; sumAlpha += alpha; for(int j=0;jerrorOut(e, "qFinderDMM", "negativeLogEvidenceLambdaPi"); exit(1); } } /**************************************************************************************************/ void qFinderDMM::negativeLogDerivEvidenceLambdaPi(vector& x, vector& df){ try{ // cout << "\tstart negativeLogDerivEvidenceLambdaPi" << endl; vector storeVector(numSamples, 0.0000); vector derivative(numOTUs, 0.0000); vector alpha(numOTUs, 0.0000); double store = 0.0000; double nu = 0.1000; double eta = 0.1000; double weight = 0.0000; for(int i=0;icontrol_pressed) { return; } // cout << "start i loop" << endl; // // cout << i << '\t' << alpha[i] << '\t' << x[i] << '\t' << exp(x[i]) << '\t' << store << endl; alpha[i] = exp(x[i]); store += alpha[i]; // cout << "before derivative" << endl; derivative[i] = weight * psi(alpha[i]); // cout << "after derivative" << endl; // cout << i << '\t' << alpha[i] << '\t' << psi(alpha[i]) << '\t' << derivative[i] << endl; for(int j=0;jerrorOut(e, "qFinderDMM", "negativeLogDerivEvidenceLambdaPi"); exit(1); } } /**************************************************************************************************/ double qFinderDMM::getNegativeLogEvidence(vector& lambda, int group){ try { double sumAlpha = 0.0000; double sumAlphaX = 0.0000; double sumLnGamAlpha = 0.0000; double logEvidence = 0.0000; for(int i=0;icontrol_pressed) { return 0; } double 
alpha = exp(lambda[i]); double X = countMatrix[group][i]; double alphaX = alpha + X; sumLnGamAlpha += lgamma(alpha); sumAlpha += alpha; sumAlphaX += alphaX; logEvidence -= lgamma(alphaX); } sumLnGamAlpha -= lgamma(sumAlpha); logEvidence += lgamma(sumAlphaX); return logEvidence + sumLnGamAlpha; } catch(exception& e){ m->errorOut(e, "qFinderDMM", "getNegativeLogEvidence"); exit(1); } } /**************************************************************************************************/ void qFinderDMM::optimizeLambda(){ try { for(currentPartition=0;currentPartitioncontrol_pressed) { return; } bfgs2_Solver(lambdaMatrix[currentPartition]); } } catch(exception& e){ m->errorOut(e, "qFinderDMM", "optimizeLambda"); exit(1); } } /**************************************************************************************************/ void qFinderDMM::calculatePiK(){ try { vector store(numPartitions); for(int i=0;icontrol_pressed) { return; } double sum = 0.0000; double minNegLogEvidence =numeric_limits::max(); for(int j=0;jcontrol_pressed) { return; } zMatrix[j][i] = weights[j] * exp(-(store[j] - minNegLogEvidence)); sum += zMatrix[j][i]; } for(int j=0;jerrorOut(e, "qFinderDMM", "calculatePiK"); exit(1); } } /**************************************************************************************************/ double qFinderDMM::getNegativeLogLikelihood(){ try { double eta = 0.10000; double nu = 0.10000; vector pi(numPartitions, 0.0000); vector logBAlpha(numPartitions, 0.0000); double doubleSum = 0.0000; for(int i=0;icontrol_pressed) { return 0; } double sumAlphaK = 0.0000; pi[i] = weights[i] / (double)numSamples; for(int j=0;jcontrol_pressed) { return 0; } double probability = 0.0000; double factor = 0.0000; double sum = 0.0000; vector logStore(numPartitions, 0.0000); double offset = -numeric_limits::max(); for(int j=0;j offset){ offset = logStore[k]; } } for(int k=0;kcontrol_pressed) { return 0; } alphaSum += exp(lambdaMatrix[i][j]); lambdaSum += lambdaMatrix[i][j]; } } alphaSum *= 
-nu; lambdaSum *= eta; return (-doubleSum - L5 - L6 - alphaSum - lambdaSum); } catch(exception& e){ m->errorOut(e, "qFinderDMM", "getNegativeLogLikelihood"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/communitytype/qFinderDMM.h000066400000000000000000000022061255543666200214630ustar00rootroot00000000000000// // qFinderDMM.h // pds_dmm // // Created by Patrick Schloss on 11/8/12. // Copyright (c) 2012 University of Michigan. All rights reserved. // #ifndef pds_dmm_qFinderDMM_h #define pds_dmm_qFinderDMM_h #include "communitytype.h" /**************************************************************************************************/ class qFinderDMM : public CommunityTypeFinder { public: qFinderDMM(vector >, int); void printFitData(ofstream&); void printFitData(ostream&, double); private: void optimizeLambda(); void calculatePiK(); double negativeLogEvidenceLambdaPi(vector&); void negativeLogDerivEvidenceLambdaPi(vector&, vector&); double getNegativeLogEvidence(vector&, int); double getNegativeLogLikelihood(); int lineMinimizeFletcher(vector&, vector&, double, double, double, double&, double&, vector&, vector&); int bfgs2_Solver(vector&);//, double, double); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/completelinkage.cpp000066400000000000000000000017221255543666200203070ustar00rootroot00000000000000 #include "cluster.hpp" /***********************************************************************/ CompleteLinkage::CompleteLinkage(RAbundVector* rav, ListVector* lv, SparseDistanceMatrix* dm, float c, string s, float a) : Cluster(rav, lv, dm, c, s, a) {} /***********************************************************************/ //This function returns the tag of the method. 
string CompleteLinkage::getTag() { return("fn"); } /***********************************************************************/ //This function updates the distance based on the furthest neighbor method. bool CompleteLinkage::updateDistance(PDistCell& colCell, PDistCell& rowCell) { try { bool changed = false; if (colCell.dist < rowCell.dist) { colCell.dist = rowCell.dist; changed = true; } return(changed); } catch(exception& e) { m->errorOut(e, "CompleteLinkage", "updateDistance"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/consensus.cpp000066400000000000000000000276571255543666200172030ustar00rootroot00000000000000/* * consensuscommand.cpp * Mothur * * Created by Sarah Westcott on 4/29/09. * Copyright 2009 Schloss Lab UMASS AMherst. All rights reserved. * */ #include "consensus.h" //********************************************************************************************************************** Tree* Consensus::getTree(vector& t){ try { numNodes = t[0]->getNumNodes(); numLeaves = t[0]->getNumLeaves(); numTrees = t.size(); //get the possible pairings getSets(t); if (m->control_pressed) { return 0; } consensusTree = new Tree(t[0]->getCountTable()); it2 = nodePairs.find(treeSet); nodePairsInTree[treeSet] = it2->second; //erase treeset because you are adding it nodePairs.erase(treeSet); //set count to numLeaves; count = numLeaves; buildConsensusTree(treeSet); if (m->control_pressed) { delete consensusTree; return 0; } consensusTree->assembleTree(); if (m->control_pressed) { delete consensusTree; return 0; } return consensusTree; return 0; } catch(exception& e) { m->errorOut(e, "Consensus", "execute"); exit(1); } } //********************************************************************************************************************** int Consensus::printSetsInfo() { try { //open file for pairing not included in the tree string notIncluded = "cons.pairs"; ofstream out2; m->openOutputFile(notIncluded, 
out2); //output species in order out2 << "Species in Order: " << endl << endl; for (int i = 0; i < treeSet.size(); i++) { out2 << i+1 << ". " << treeSet[i] << endl; } //output sets included out2 << endl << "Sets included in the consensus tree:" << endl << endl; if (m->control_pressed) { return 0; } vector temp; for (it2 = nodePairsInTree.begin(); it2 != nodePairsInTree.end(); it2++) { if (m->control_pressed) { return 0; } //only output pairs not leaves if (it2->first.size() > 1) { temp.clear(); //initialize temp to all "." temp.resize(treeSet.size(), "."); //set the spot in temp that represents it2->first[i] to a "*" for (int i = 0; i < it2->first.size(); i++) { //find spot int index = findSpot(it2->first[i]); temp[index] = "*"; //temp[index] = it2->first[i] + " "; } //output temp for (int j = 0; j < temp.size(); j++) { out2 << temp[j]; } out2 << '\t' << it2->second << endl; } } //output sets not included out2 << endl << "Sets NOT included in the consensus tree:" << endl << endl; for (it2 = nodePairs.begin(); it2 != nodePairs.end(); it2++) { if (m->control_pressed) { return 0; } temp.clear(); //initialize temp to all "." 
temp.resize(treeSet.size(), "."); //set the spot in temp that represents it2->first[i] to a "*" for (int i = 0; i < it2->first.size(); i++) { //find spot int index = findSpot(it2->first[i]); temp[index] = "*"; } //output temp for (int j = 0; j < temp.size(); j++) { out2 << temp[j]; } out2 << '\t' << it2->second << endl; } return 0; } catch(exception& e) { m->errorOut(e, "Consensus", "printSetsInfo"); exit(1); } } //********************************************************************************************************************** int Consensus::buildConsensusTree(vector nodeSet) { try { vector leftChildSet; vector rightChildSet; if (m->control_pressed) { return 1; } //if you are at a leaf if (nodeSet.size() == 1) { //return the vector index of the leaf you are at return consensusTree->getIndex(nodeSet[0]); //terminate recursion }else if (count == numNodes) { return 0; } else { //finds best child pair leftChildSet = getNextAvailableSet(nodeSet, rightChildSet); int left = buildConsensusTree(leftChildSet); int right = buildConsensusTree(rightChildSet); consensusTree->tree[count].setChildren(left, right); consensusTree->tree[count].setLabel(toString(nodePairsInTree[nodeSet]/(float)numTrees)); consensusTree->tree[left].setParent(count); consensusTree->tree[right].setParent(count); count++; return (count-1); } } catch(exception& e) { m->errorOut(e, "Consensus", "buildConcensusTree"); exit(1); } } //********************************************************************************************************************** int Consensus::getSets(vector& t) { try { vector temp; treeSet.clear(); //for each tree add the possible pairs you find for (int i = 0; i < t.size(); i++) { //for each non-leaf node get descendant info. 
            for (int j = numLeaves; j < numNodes; j++) {
                if (m->control_pressed) { return 1; }
                temp.clear();
                //go through pcounts and pull out descendants
                for (it = t[i]->tree[j].pcount.begin(); it != t[i]->tree[j].pcount.end(); it++) {
                    temp.push_back(it->first);
                }

                //sort temp so the same set of leaves always gives the same map key
                sort(temp.begin(), temp.end());

                it2 = nodePairs.find(temp);
                if (it2 != nodePairs.end()) {
                    nodePairs[temp]++;
                }else{
                    nodePairs[temp] = 1;
                }
            }
        }

        //add each leaf to terminate recursion in consensus
        //you want the leaves in there but with an insignificant sightings value so they are added last
        //for each leaf node get descendant info.
        for (int j = 0; j < numLeaves; j++) {

            if (m->control_pressed) { return 1; }

            //only need the first one since leaves have no descendants but themselves
            it = t[0]->tree[j].pcount.begin();
            temp.clear(); temp.push_back(it->first);

            //fill treeSet
            treeSet.push_back(it->first);

            //add leaf to list but with a sightings value less than all non-leaf pairs
            nodePairs[temp] = 0;
        }

        sort(treeSet.begin(), treeSet.end());

        map< vector<string>, int> nodePairsCopy = nodePairs;

        //set initial rating on pairs to sightings + subgroup sightings
        //(smallest sets first, because getSubgroupRating depends on the ratings of smaller sets)
        while (nodePairsCopy.size() != 0) {
            if (m->control_pressed) { return 1; }

            vector<string> smallOne = getSmallest(nodePairsCopy);

            int subgrouprate = getSubgroupRating(smallOne);

            nodePairsInitialRate[smallOne] = nodePairs[smallOne] + subgrouprate;

            nodePairsCopy.erase(smallOne);
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "Consensus", "getSets");
        exit(1);
    }
}
//**********************************************************************************************************************
vector<string> Consensus::getSmallest(map< vector<string>, int> nodes) {
    try{
        vector<string> smallest = nodes.begin()->first;
        int smallsize = smallest.size();

        for(it2 = nodes.begin(); it2 != nodes.end(); it2++) {
            if(it2->first.size() < smallsize) { smallsize = it2->first.size(); smallest = it2->first; }
        }

        return smallest;
    }
    catch(exception& e) {
        m->errorOut(e, "Consensus", "getSmallest");
        exit(1);
    }
}
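/*
 * Illustrative sketch, not part of mothur. getSets() and getSmallest() above depend on
 * one idea: a sorted vector of leaf names serves as a map key, so the same clade seen in
 * different trees lands on the same entry and its sightings can be counted. The snippet
 * below isolates that idea. LeafSet, countSightings and smallestSet are hypothetical
 * names chosen for this example only, and the code assumes the leaf names are plain
 * strings, which is how the surrounding code appears to use them.
 */

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical alias for one clade: the names of the leaves below a node.
typedef std::vector<std::string> LeafSet;

// Count how often each clade is seen. Every clade is sorted first, so that
// {"B","A"} and {"A","B"} map to the same key.
std::map<LeafSet, int> countSightings(std::vector<LeafSet> clades) {
    std::map<LeafSet, int> sightings;
    for (std::size_t i = 0; i < clades.size(); i++) {
        std::sort(clades[i].begin(), clades[i].end());
        sightings[clades[i]]++;   // operator[] creates a missing entry with value 0
    }
    return sightings;
}

// Return the key with the fewest members. Ties go to the key that comes first
// in map order. The map must not be empty.
LeafSet smallestSet(const std::map<LeafSet, int>& sets) {
    LeafSet smallest = sets.begin()->first;
    for (std::map<LeafSet, int>::const_iterator it = sets.begin(); it != sets.end(); ++it) {
        if (it->first.size() < smallest.size()) { smallest = it->first; }
    }
    return smallest;
}
```

/*
 * Passing the clades {"B","A"}, {"A","B"} and {"A","B","C"} to countSightings should give
 * two entries, with {"A","B"} counted twice; smallestSet then picks {"A","B"}. Processing
 * sets from smallest to largest mirrors the loop at the end of getSets().
 */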
//********************************************************************************************************************** vector Consensus::getNextAvailableSet(vector bigset, vector& rest) { try { //cout << "new call " << endl << endl << endl; vector largest; largest.clear(); rest.clear(); //if you are just 2 groups if (bigset.size() == 2) { rest.push_back(bigset[0]); largest.push_back(bigset[1]); }else{ rest = bestSplit[bigset][0]; largest = bestSplit[bigset][1]; } //save for printing out later and for branch lengths nodePairsInTree[rest] = nodePairs[rest]; //delete whatever set you return because it is no longer available nodePairs.erase(rest); //save for printing out later and for branch lengths nodePairsInTree[largest] = nodePairs[largest]; //delete whatever set you return because it is no longer available nodePairs.erase(largest); return largest; } catch(exception& e) { m->errorOut(e, "Consensus", "getNextAvailableSet"); exit(1); } } /**********************************************************************************************************************/ int Consensus::getSubgroupRating(vector group) { try { map< vector, int>::iterator ittemp; map< vector< vector > , int >::iterator it3; int rate = 0; // ***********************************************************************************// //1. this function must be called passing it littlest sets to biggest // since it the rating is made from your sighting plus you best splits rating //2. 
it saves the top pair to use later // ***********************************************************************************// if (group.size() < 3) { return rate; } map< vector, int> possiblePairing; //this is all the subsets of group //go through the sets for (it2 = nodePairs.begin(); it2 != nodePairs.end(); it2++) { //are you a subset of bigset, then save in possiblePairings if (isSubset(group, it2->first) == true) { possiblePairing[it2->first] = it2->second; } } map< vector< vector > , int > rating; while (possiblePairing.size() != 0) { it2 = possiblePairing.begin(); vector temprest = getRestSet(group, it2->first); //is the rest a set available in possiblePairings ittemp = possiblePairing.find(temprest); if (ittemp != possiblePairing.end()) { //if the rest is in the possible pairings then add this pair to rating map vector< vector > temprate; temprate.push_back(it2->first); temprate.push_back(temprest); rating[temprate] = (nodePairsInitialRate[it2->first] + nodePairsInitialRate[temprest]); //erase so you dont add 1,2 and 2,1. 
                possiblePairing.erase(temprest);
            }

            possiblePairing.erase(it2);
        }

        it3 = rating.begin();
        rate = it3->second;
        vector< vector<string> > topPair = it3->first;

        //choose the split with the best rating
        for (it3 = rating.begin(); it3 != rating.end(); it3++) {

            if (it3->second > rate) {
                rate = it3->second;
                topPair = it3->first;
            }
        }

        bestSplit[group] = topPair;

        return rate;
    }
    catch(exception& e) {
        m->errorOut(e, "Consensus", "getSubgroupRating");
        exit(1);
    }
}

//**********************************************************************************************************************
vector<string> Consensus::getRestSet(vector<string> bigset, vector<string> subset) {
    try {
        vector<string> rest;

        for (int i = 0; i < bigset.size(); i++) {
            bool inSubset = false;
            for (int j = 0; j < subset.size(); j++) {
                if (bigset[i] == subset[j]) { inSubset = true; break; }
            }

            //it's not in the subset so put it in the rest
            if (inSubset == false) { rest.push_back(bigset[i]); }
        }

        return rest;
    }
    catch(exception& e) {
        m->errorOut(e, "Consensus", "getRestSet");
        exit(1);
    }
}

//**********************************************************************************************************************
bool Consensus::isSubset(vector<string> bigset, vector<string> subset) {
    try {

        if (subset.size() > bigset.size()) { return false; }

        //check if each member of subset is also in bigset
        for (int i = 0; i < subset.size(); i++) {
            bool match = false;
            for (int j = 0; j < bigset.size(); j++) {
                if (subset[i] == bigset[j]) { match = true; break; }
            }

            //you have a member of subset that had no match in bigset
            if (match == false) { return false; }
        }

        return true;
    }
    catch(exception& e) {
        m->errorOut(e, "Consensus", "isSubset");
        exit(1);
    }
}
//**********************************************************************************************************************
int Consensus::findSpot(string node) {
    try {
        int spot = 0;

        //find the position of node in treeSet (stays 0 if node is not found)
        for (int i = 0; i < treeSet.size(); i++) {
            if (treeSet[i] == node) { spot = i; break; }
        }

        return spot;
    }
    catch(exception& e) {
m->errorOut(e, "Consensus", "findSpot"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/consensus.h000066400000000000000000000043111255543666200166260ustar00rootroot00000000000000#ifndef CONCENSUS_H #define CONCENSUS_H /* * consensus.h * Mothur * * Created by Sarah Westcott on 4/29/09. * Copyright 2009 Schloss Lab UMASS AMherst. All rights reserved. * */ #include "tree.h" #include "treemap.h" //NOTE: This class assumes all leaf nodes have 1 member. // Mothur does allow for names files with trees which would make a tree with multiple members at one leaf. // This class is currently only called internally by commands that have leaf node containing only 1 member. // But if in the future, this changes things will need to be reworked in getSets and buildConsensus. class Consensus { public: Consensus() { m = MothurOut::getInstance(); } ~Consensus() {} Tree* getTree(vector&); private: MothurOut* m; Tree* consensusTree; vector treeSet; //set containing all members of the tree to start recursion. filled in getSets(). 
map< vector, int > nodePairs; //, vector< vector > > bestSplit; //maps a group to its best split map< vector, int > nodePairsInitialRate; map< vector, int > nodePairsInTree; map::iterator it; map< vector, int>::iterator it2; string outputFile, notIncluded, filename; int numNodes, numLeaves, count, numTrees; //count is the next available spot in the tree vector vector outputNames; int getSets(vector&); int getSubgroupRating(vector); vector getSmallest(map< vector, int>); vector getNextAvailableSet(vector, vector&); vector getRestSet(vector, vector); bool isSubset(vector, vector); int findSpot(string); int buildConsensusTree(vector); int printSetsInfo(); }; #endif mothur-1.36.1/source/currentfile.h000066400000000000000000000065021255543666200171340ustar00rootroot00000000000000#ifndef CURRENTFILE_H #define CURRENTFILE_H /* * currentfile.h * Mothur * * Created by westcott on 3/15/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "mothurout.h" #include "mothur.h" /***********************************************/ class CurrentFile { public: static CurrentFile* getInstance() { if(instance == 0) { instance = new CurrentFile(); } return instance; } string getPhylipFile() { return phylipfile; } string getColumnFile() { return columnfile; } string getListFile() { return listfile; } string getRabundFile() { return rabundfile; } string getSabundFile() { return sabundfile; } string getNameFile() { return namefile; } string getGroupFile() { return groupfile; } string getOrderFile() { return orderfile; } string getOrderGroupFile() { return ordergroupfile; } string getTreeFile() { return treefile; } string getSharedFile() { return sharedfile; } string getRelAbundFile() { return relabundfile; } string getDesignFile() { return designfile; } string getFastaFile() { return fastafile; } string getSFFFile() { return sfffile; } string getQualFile() { return qualfile; } string getOligosFile() { return oligosfile; } void setListFile(string f) { listfile = 
m->getFullPathName(f); }
    void setTreeFile(string f) { treefile = m->getFullPathName(f); }
    void setGroupFile(string f) { groupfile = m->getFullPathName(f); }
    void setPhylipFile(string f) { phylipfile = m->getFullPathName(f); }
    void setColumnFile(string f) { columnfile = m->getFullPathName(f); }
    void setNameFile(string f) { namefile = m->getFullPathName(f); }
    void setRabundFile(string f) { rabundfile = m->getFullPathName(f); }
    void setSabundFile(string f) { sabundfile = m->getFullPathName(f); }
    void setSharedFile(string f) { sharedfile = m->getFullPathName(f); }
    void setRelAbundFile(string f) { relabundfile = m->getFullPathName(f); }
    void setOrderFile(string f) { orderfile = m->getFullPathName(f); }
    void setOrderGroupFile(string f) { ordergroupfile = m->getFullPathName(f); }
    void setDesignFile(string f) { designfile = m->getFullPathName(f); }
    void setFastaFile(string f) { fastafile = m->getFullPathName(f); }
    void setSFFFile(string f) { sfffile = m->getFullPathName(f); }
    void setQualFile(string f) { qualfile = m->getFullPathName(f); }
    void setOligosFile(string f) { oligosfile = m->getFullPathName(f); }

private:
    MothurOut* m;
    string phylipfile, columnfile, listfile, rabundfile, sabundfile, namefile, groupfile, designfile;
    string orderfile, treefile, sharedfile, ordergroupfile, relabundfile, fastafile, qualfile, sfffile, oligosfile;

    static CurrentFile* instance;
    CurrentFile( const CurrentFile& );    // Disable copy constructor
    void operator=( const CurrentFile& ); // Disable assignment operator

    CurrentFile() {
        // m must be set before any setter is called, because every setter dereferences it
        m = MothurOut::getInstance();
        phylipfile = ""; columnfile = ""; listfile = ""; rabundfile = ""; sabundfile = "";
        namefile = ""; groupfile = ""; designfile = ""; orderfile = ""; treefile = "";
        sharedfile = ""; ordergroupfile = ""; relabundfile = ""; fastafile = "";
        qualfile = ""; sfffile = ""; oligosfile = "";
    }
    ~CurrentFile() { instance = 0; }
};
/***********************************************/

#endif
mothur-1.36.1/source/datastructures/000077500000000000000000000000001255543666200175135ustar00rootroot00000000000000mothur-1.36.1/source/datastructures/alignment.cpp000066400000000000000000000202131255543666200221730ustar00rootroot00000000000000/* * alignment.cpp * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This is a class for an abstract datatype for classes that implement various types of alignment algorithms. * As of 12/18/08 these included alignments based on blastn, needleman-wunsch, and the Gotoh algorithms * */ #include "alignmentcell.hpp" #include "alignment.hpp" /**************************************************************************************************/ Alignment::Alignment() { m = MothurOut::getInstance(); /* do nothing */ } /**************************************************************************************************/ Alignment::Alignment(int A) : nCols(A), nRows(A) { try { m = MothurOut::getInstance(); alignment.resize(nRows); // For the Gotoh and Needleman-Wunsch we initialize the dynamic programming for(int i=0;ierrorOut(e, "Alignment", "Alignment"); exit(1); } } /**************************************************************************************************/ Alignment::Alignment(int A, int nk) : nCols(A), nRows(A) { try { m = MothurOut::getInstance(); alignment.resize(nRows); // For the Gotoh and Needleman-Wunsch we initialize the dynamic programming for(int i=0;ierrorOut(e, "Alignment", "Alignment"); exit(1); } } /**************************************************************************************************/ void Alignment::resize(int A) { try { nCols = A; nRows = A; alignment.resize(nRows); for(int i=0;ierrorOut(e, "Alignment", "resize"); exit(1); } } /**************************************************************************************************/ void Alignment::traceBack(){ // This traceback routine is used by the dynamic programming algorithms try { BBaseMap.clear(); 
ABaseMap.clear(); // to fill the values of seqAaln and seqBaln seqAaln = ""; seqBaln = ""; int row = lB-1; int column = lA-1; // seqAstart = 1; // seqAend = column; AlignmentCell currentCell = alignment[row][column]; // Start the traceback from the bottom-right corner of the // matrix if(currentCell.prevCell == 'x'){ seqAaln = seqBaln = "NOALIGNMENT"; }//If there's an 'x' in the bottom- else{ // right corner bail out because it means nothing got aligned int count = 0; while(currentCell.prevCell != 'x'){ // while the previous cell isn't an 'x', keep going... if(currentCell.prevCell == 'u'){ // if the pointer to the previous cell is 'u', go up in the seqAaln = '-' + seqAaln; // matrix. this indicates that we need to insert a gap in seqBaln = seqB[row] + seqBaln; // seqA and a base in seqB BBaseMap[row] = count; currentCell = alignment[--row][column]; } else if(currentCell.prevCell == 'l'){ // if the pointer to the previous cell is 'l', go to the left seqBaln = '-' + seqBaln; // in the matrix. 
this indicates that we need to insert a gap seqAaln = seqA[column] + seqAaln; // in seqB and a base in seqA ABaseMap[column] = count; currentCell = alignment[row][--column]; } else{ seqAaln = seqA[column] + seqAaln; // otherwise we need to go diagonally up and to the left, seqBaln = seqB[row] + seqBaln; // here we add a base to both alignments BBaseMap[row] = count; ABaseMap[column] = count; currentCell = alignment[--row][--column]; } count++; } } pairwiseLength = seqAaln.length(); seqAstart = 1; seqAend = 0; seqBstart = 1; seqBend = 0; //flip maps since we now know the total length map newAMap; for (map::iterator it = ABaseMap.begin(); it != ABaseMap.end(); it++) { int spot = it->second; newAMap[pairwiseLength-spot-1] = it->first-1; } ABaseMap = newAMap; map newBMap; for (map::iterator it = BBaseMap.begin(); it != BBaseMap.end(); it++) { int spot = it->second; newBMap[pairwiseLength-spot-1] = it->first-1; } BBaseMap = newBMap; for(int i=0;i=0;i--){ if(seqAaln[i] != '-' && seqBaln[i] == '-') { seqAend++; } else if(seqAaln[i] == '-' && seqBaln[i] != '-') { seqBend++; } else { break; } } pairwiseLength -= (seqAend + seqBend); seqAend = seqA.length() - seqAend - 1; seqBend = seqB.length() - seqBend - 1; } catch(exception& e) { m->errorOut(e, "Alignment", "traceBack"); exit(1); } } /**************************************************************************************************/ Alignment::~Alignment(){ try { for (int i = 0; i < alignment.size(); i++) { for (int j = (alignment[i].size()-1); j >= 0; j--) { alignment[i].pop_back(); } } } catch(exception& e) { m->errorOut(e, "Alignment", "~Alignment"); exit(1); } } /**************************************************************************************************/ string Alignment::getSeqAAln(){ return seqAaln; // this is called to get the alignment of seqA } /**************************************************************************************************/ string Alignment::getSeqBAln(){ return seqBaln; // this is 
called to get the alignment of seqB } /**************************************************************************************************/ int Alignment::getCandidateStartPos(){ return seqAstart; // this is called to report the quality of the alignment } /**************************************************************************************************/ int Alignment::getCandidateEndPos(){ return seqAend; // this is called to report the quality of the alignment } /**************************************************************************************************/ int Alignment::getTemplateStartPos(){ return seqBstart; // this is called to report the quality of the alignment } /**************************************************************************************************/ map Alignment::getSeqAAlnBaseMap(){ return ABaseMap; } /**************************************************************************************************/ map Alignment::getSeqBAlnBaseMap(){ return BBaseMap; } /**************************************************************************************************/ int Alignment::getTemplateEndPos(){ return seqBend; // this is called to report the quality of the alignment } /**************************************************************************************************/ int Alignment::getPairwiseLength(){ return pairwiseLength; // this is the pairwise alignment length } /**************************************************************************************************/ //int Alignment::getLongestTemplateGap(){ // // int length = seqBaln.length(); // int longGap = 0; // int gapLength = 0; // // int start = seqAstart; // if(seqAstart < seqBstart){ start = seqBstart; } // for(int i=seqAstart;i 0){ // if(gapLength > longGap){ longGap = gapLength; } // } // gapLength = 0; // } // } // return longGap; //} /**************************************************************************************************/ 
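/*
 * Illustrative sketch, not part of mothur. Alignment::traceBack() above walks back from
 * the bottom-right cell of the dynamic programming matrix, following each cell's prevCell
 * pointer ('u' for up, 'l' for left, anything else for diagonal) until it reaches an 'x'.
 * The standalone function below shows the same walk on a small global alignment. The
 * names Cell and alignPair are hypothetical, and the scoring (match +1, mismatch -1,
 * gap -1) is an assumption made for this example; mothur's subclasses fill the matrix
 * with their own scoring, and traceBack() also records base maps and end positions that
 * are left out here.
 */

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, reduced stand-in for AlignmentCell: a back pointer and a score.
struct Cell {
    char prev;    // 'd' diagonal, 'u' up, 'l' left, 'x' stop
    int score;
    Cell() : prev('x'), score(0) {}
};

// Globally align seqA and seqB and return the two gapped strings.
// Rows follow seqB and columns follow seqA, as in the traceback above.
std::pair<std::string, std::string> alignPair(const std::string& seqA, const std::string& seqB) {
    const std::size_t nRows = seqB.size() + 1;
    const std::size_t nCols = seqA.size() + 1;
    std::vector<std::vector<Cell> > dp(nRows, std::vector<Cell>(nCols));

    // Border cells can only be reached by gaps; dp[0][0] keeps prev == 'x'.
    for (std::size_t i = 1; i < nRows; i++) { dp[i][0].score = -static_cast<int>(i); dp[i][0].prev = 'u'; }
    for (std::size_t j = 1; j < nCols; j++) { dp[0][j].score = -static_cast<int>(j); dp[0][j].prev = 'l'; }

    for (std::size_t i = 1; i < nRows; i++) {
        for (std::size_t j = 1; j < nCols; j++) {
            int diag = dp[i-1][j-1].score + (seqB[i-1] == seqA[j-1] ? 1 : -1);
            int up   = dp[i-1][j].score - 1;    // gap in seqA
            int left = dp[i][j-1].score - 1;    // gap in seqB
            Cell& c = dp[i][j];
            c.score = diag; c.prev = 'd';       // ties favor the diagonal
            if (up > c.score)   { c.score = up;   c.prev = 'u'; }
            if (left > c.score) { c.score = left; c.prev = 'l'; }
        }
    }

    // Traceback: start at the bottom-right corner and prepend one column per step.
    std::string alnA, alnB;
    std::size_t row = nRows - 1, col = nCols - 1;
    while (dp[row][col].prev != 'x') {
        char p = dp[row][col].prev;
        if (p == 'u')      { alnA = '-' + alnA;         alnB = seqB[row-1] + alnB; row--; }
        else if (p == 'l') { alnA = seqA[col-1] + alnA; alnB = '-' + alnB;         col--; }
        else               { alnA = seqA[col-1] + alnA; alnB = seqB[row-1] + alnB; row--; col--; }
    }
    return std::make_pair(alnA, alnB);
}
```

/*
 * With this scoring, alignPair("ACGT", "AGT") yields "ACGT" and "A-GT": the walk takes two
 * diagonal steps, one step left (a gap in the second sequence), and a final diagonal step
 * before it stops at the 'x' in the top-left corner.
 */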
mothur-1.36.1/source/datastructures/alignment.hpp000066400000000000000000000027721255543666200222120ustar00rootroot00000000000000#ifndef DPALIGNMENT_H #define DPALIGNMENT_H /* * dpalignment.h * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This is a class for an abstract datatype for classes that implement various types of alignment algorithms. * As of 12/18/08 these included alignments based on blastn, needleman-wunsch, and the Gotoh algorithms * */ #include "mothur.h" #include "alignmentcell.hpp" /**************************************************************************************************/ class Alignment { public: Alignment(int); Alignment(int, int); Alignment(); virtual ~Alignment(); virtual void align(string, string) = 0; virtual void alignPrimer(string, string) {} // float getAlignmentScore(); string getSeqAAln(); string getSeqBAln(); map<int, int> getSeqAAlnBaseMap(); map<int, int> getSeqBAlnBaseMap(); int getCandidateStartPos(); int getCandidateEndPos(); int getTemplateStartPos(); int getTemplateEndPos(); int getPairwiseLength(); void resize(int); int getnRows() { return nRows; } // int getLongestTemplateGap(); protected: void traceBack(); string seqA, seqAaln; string seqB, seqBaln; int seqAstart, seqAend; int seqBstart, seqBend; int pairwiseLength; int nRows, nCols, lA, lB; vector<vector<AlignmentCell> > alignment; map<int, int> ABaseMap; map<int, int> BBaseMap; MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/alignmentcell.cpp000066400000000000000000000013351255543666200230370ustar00rootroot00000000000000/* * alignmentcell.cpp * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This class is pretty basic. Each AlignmentCell object contains a pointer to the previous cell and different values * used to calculate the alignment.
Initially everything is set to zero and all pointers are set to 'x' * */ #include "alignmentcell.hpp" //******************************************************************************************************************** AlignmentCell::AlignmentCell() : prevCell('x'), cValue(0.0000), dValue(0.0000), iValue(0.0000) {} //******************************************************************************************************************** mothur-1.36.1/source/datastructures/alignmentcell.hpp000066400000000000000000000014731255543666200230470ustar00rootroot00000000000000#ifndef ALIGNMENTCELL_H #define ALIGNMENTCELL_H /* * alignmentcell.hpp * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This class is pretty basic. Each AlignmentCell object contains a pointer to the previous cell and different values * used to calculate the alignment. Initially everything is set to zero and all pointers are set to 'x' * */ #include "mothurout.h" //******************************************************************************************************************** class AlignmentCell { public: AlignmentCell(); ~AlignmentCell() {} char prevCell; float cValue; float dValue; float iValue; }; //******************************************************************************************************************** #endif mothur-1.36.1/source/datastructures/alignmentdb.cpp000066400000000000000000000205211255543666200225030ustar00rootroot00000000000000/* * alignmentdb.cpp * Mothur * * Created by westcott on 11/4/09. * Copyright 2009 Schloss Lab. All rights reserved.
* */ #include "alignmentdb.h" #include "kmerdb.hpp" #include "suffixdb.hpp" #include "blastdb.hpp" #include "referencedb.h" /**************************************************************************************************/ AlignmentDB::AlignmentDB(string fastaFileName, string s, int kmerSize, float gapOpen, float gapExtend, float match, float misMatch, int tid){ // This assumes that the template database is in fasta format, may try { // need to alter this in the future? m = MothurOut::getInstance(); longest = 0; method = s; bool needToGenerate = true; ReferenceDB* rdb = ReferenceDB::getInstance(); bool silent = false; threadID = tid; if (fastaFileName == "saved-silent") { fastaFileName = "saved"; silent = true; } if (fastaFileName == "saved") { int start = time(NULL); if (!silent) { m->mothurOutEndLine(); m->mothurOut("Using sequences from " + rdb->getSavedReference() + " that are saved in memory."); m->mothurOutEndLine(); } for (int i = 0; i < rdb->referenceSeqs.size(); i++) { templateSequences.push_back(rdb->referenceSeqs[i]); //save longest base if (rdb->referenceSeqs[i].getUnaligned().length() >= longest) { longest = (rdb->referenceSeqs[i].getUnaligned().length()+1); } } fastaFileName = rdb->getSavedReference(); numSeqs = templateSequences.size(); if (!silent) { m->mothurOut("It took " + toString(time(NULL) - start) + " to load " + toString(rdb->referenceSeqs.size()) + " sequences.");m->mothurOutEndLine(); } }else { int start = time(NULL); m->mothurOutEndLine(); m->mothurOut("Reading in the " + fastaFileName + " template sequences...\t"); cout.flush(); //bool aligned = false; int tempLength = 0; #ifdef USE_MPI int pid, processors; vector positions; MPI_Status status; MPI_File inMPI; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); int tag = 2001; char inFileName[1024]; strcpy(inFileName, fastaFileName.c_str()); MPI_File_open(MPI_COMM_WORLD, inFileName, MPI_MODE_RDONLY, MPI_INFO_NULL, &inMPI); //comm, 
filename, mode, info, filepointer if (pid == 0) { positions = m->setFilePosFasta(fastaFileName, numSeqs); //fills MPIPos, returns numSeqs //send file positions to all processes for(int i = 1; i < processors; i++) { MPI_Send(&numSeqs, 1, MPI_INT, i, tag, MPI_COMM_WORLD); MPI_Send(&positions[0], (numSeqs+1), MPI_LONG, i, tag, MPI_COMM_WORLD); } }else{ MPI_Recv(&numSeqs, 1, MPI_INT, 0, tag, MPI_COMM_WORLD, &status); positions.resize(numSeqs+1); MPI_Recv(&positions[0], (numSeqs+1), MPI_LONG, 0, tag, MPI_COMM_WORLD, &status); } //read file for(int i=0;icontrol_pressed) { templateSequences.clear(); break; } //read next sequence int length = positions[i+1] - positions[i]; char* buf4 = new char[length]; MPI_File_read_at(inMPI, positions[i], buf4, length, MPI_CHAR, &status); string tempBuf = buf4; if (tempBuf.length() > length) { tempBuf = tempBuf.substr(0, length); } delete buf4; istringstream iss (tempBuf,istringstream::in); Sequence temp(iss); if (temp.getName() != "") { templateSequences.push_back(temp); if (rdb->save) { rdb->referenceSeqs.push_back(temp); } //save longest base if (temp.getUnaligned().length() >= longest) { longest = temp.getUnaligned().length()+1; } if (tempLength != 0) { if (tempLength != temp.getAligned().length()) { m->mothurOut("[ERROR]: template is not aligned, aborting.\n"); m->control_pressed=true; } }else { tempLength = temp.getAligned().length(); } } } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case MPI_File_close(&inMPI); #else ifstream fastaFile; m->openInputFile(fastaFileName, fastaFile); while (!fastaFile.eof()) { Sequence temp(fastaFile); m->gobble(fastaFile); if (m->control_pressed) { templateSequences.clear(); break; } if (temp.getName() != "") { templateSequences.push_back(temp); if (rdb->save) { rdb->referenceSeqs.push_back(temp); } //save longest base if (temp.getUnaligned().length() >= longest) { longest = (temp.getUnaligned().length()+1); } if (tempLength != 0) { if (tempLength != temp.getAligned().length()) { 
m->mothurOut("[ERROR]: template is not aligned, aborting.\n"); m->control_pressed=true; } }else { tempLength = temp.getAligned().length(); } } } fastaFile.close(); #endif numSeqs = templateSequences.size(); //all of this is elsewhere already! m->mothurOut("DONE."); m->mothurOutEndLine(); cout.flush(); m->mothurOut("It took " + toString(time(NULL) - start) + " to read " + toString(templateSequences.size()) + " sequences."); m->mothurOutEndLine(); } //in case you delete the seqs and then ask for them emptySequence = Sequence(); emptySequence.setName("no_match"); emptySequence.setUnaligned("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"); emptySequence.setAligned("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"); string kmerDBName; if(method == "kmer") { search = new KmerDB(fastaFileName, kmerSize); #ifdef USE_MPI #else kmerDBName = fastaFileName.substr(0,fastaFileName.find_last_of(".")+1) + char('0'+ kmerSize) + "mer"; ifstream kmerFileTest(kmerDBName.c_str()); if(kmerFileTest){ bool GoodFile = m->checkReleaseVersion(kmerFileTest, m->getVersion()); if (GoodFile) { needToGenerate = false; } } #endif } else if(method == "suffix") { search = new SuffixDB(numSeqs); } else if(method == "blast") { search = new BlastDB(fastaFileName.substr(0,fastaFileName.find_last_of(".")+1), gapOpen, gapExtend, match, misMatch, "", threadID); } else {
m->mothurOut(method + " is not a valid search option. I will run the command using kmer, ksize=8."); m->mothurOutEndLine(); method = "kmer"; search = new KmerDB(fastaFileName, 8); } if (!(m->control_pressed)) { if (needToGenerate) { //add sequences to search for (int i = 0; i < templateSequences.size(); i++) { search->addSequence(templateSequences[i]); if (m->control_pressed) { templateSequences.clear(); break; } } if (m->control_pressed) { templateSequences.clear(); } search->generateDB(); }else if ((method == "kmer") && (!needToGenerate)) { ifstream kmerFileTest(kmerDBName.c_str()); search->readKmerDB(kmerFileTest); } search->setNumSeqs(numSeqs); } } catch(exception& e) { m->errorOut(e, "AlignmentDB", "AlignmentDB"); exit(1); } } /**************************************************************************************************/ AlignmentDB::AlignmentDB(string s){ try { m = MothurOut::getInstance(); method = s; if(method == "suffix") { search = new SuffixDB(); } else if(method == "blast") { search = new BlastDB("", 0); } else { search = new KmerDB(); } //in case you delete the seqs and then ask for them emptySequence = Sequence(); emptySequence.setName("no_match"); emptySequence.setUnaligned("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"); emptySequence.setAligned("XXXXXXXXXXXXXXXXXXXXXXXXXXXXX"); } catch(exception& e) { m->errorOut(e, "AlignmentDB", "AlignmentDB"); exit(1); } } /**************************************************************************************************/ AlignmentDB::~AlignmentDB() { delete search; } /**************************************************************************************************/ Sequence AlignmentDB::findClosestSequence(Sequence* seq) { try{ vector<int> spot = search->findClosestSequences(seq, 1); if (spot.size() != 0) { return templateSequences[spot[0]]; } else { return emptySequence; } } catch(exception& e) { m->errorOut(e, "AlignmentDB", "findClosestSequence"); exit(1); } } /**************************************************************************************************/
mothur-1.36.1/source/datastructures/alignmentdb.h000066400000000000000000000016771255543666200221630ustar00rootroot00000000000000#ifndef ALIGNMENTDB_H #define ALIGNMENTDB_H /* * alignmentdb.h * Mothur * * Created by westcott on 11/4/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "sequence.hpp" #include "database.hpp" /**************************************************************************************************/ class AlignmentDB { public: AlignmentDB(string, string, int, float, float, float, float, int); //reads fastafile passed in and stores sequences AlignmentDB(string); ~AlignmentDB(); Sequence findClosestSequence(Sequence*); float getSearchScore() { return search->getSearchScore(); } int getLongestBase() { return longest; } private: int numSeqs, longest, threadID; string method; Database* search; vector<Sequence> templateSequences; Sequence emptySequence; MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/blastalign.cpp000066400000000000000000000133431255543666200223430ustar00rootroot00000000000000/* * blastalign.cpp * * * Created by Pat Schloss on 12/16/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This is a basic alignment method that gets the blast program to do the heavy lifting. In the future, we should * probably incorporate NCBI's library so that we don't have to call on a user-supplied executable. This is a child * of the Alignment class, which requires a constructor and align method.
* */ #include "alignment.hpp" #include "blastalign.hpp" //**************************************************************************************************/ BlastAlignment::BlastAlignment(float go, float ge, float ma, float mm) : match(ma), // This is the score to award for two nucleotides matching (match >= 0) mismatch(mm) // This is the penalty to assess for a mismatch (mismatch <= 0) { path = m->argv; path = path.substr(0, (path.find_last_of('m'))); gapOpen = abs(go); // This is the penalty to assess for opening a gap (gapOpen >= 0) gapExtend = abs(ge); // This is the penalty to assess for extending a gap (gapExtend >= 0) int randNumber = rand(); candidateFileName = toString(randNumber) + ".candidate"; templateFileName = toString(randNumber) + ".template"; blastFileName = toString(randNumber) + ".pairwise"; } //**************************************************************************************************/ BlastAlignment::~BlastAlignment(){ // The desctructor should clean up by removing the temporary m->mothurRemove(candidateFileName); // files used to run bl2seq m->mothurRemove(templateFileName); m->mothurRemove(blastFileName); } //**************************************************************************************************/ void BlastAlignment::align(string seqA, string seqB){ //Use blastn to align the two sequences ofstream candidateFile(candidateFileName.c_str()); // Write the sequence to be aligned to a temporary candidate seq file candidateFile << ">candidate" << endl << seqA << endl; candidateFile.close(); ofstream templateFile(templateFileName.c_str()); // Write the unaligned template sequence to a temporary candidate seq file templateFile << ">template" << endl << seqB << endl; templateFile.close(); // The blastCommand assumes that we have DNA sequences (blastn) and that they are fairly similar (-e 0.001) and // that we don't want to apply any kind of complexity filtering (-F F) string blastCommand = path + "blast/bin/bl2seq -p blastn -i " + 
candidateFileName + " -j " + templateFileName + " -e 0.0001 -F F -o " + blastFileName + " -W 11"; blastCommand += " -r " + toString(match) + " -q " + toString(mismatch); blastCommand += " -G " + toString(gapOpen) + " -E " + toString(gapExtend); system(blastCommand.c_str()); // Here we assume that "bl2seq" is in the users path or in the same folder as // this executable setPairwiseSeqs(); } /**************************************************************************************************/ void BlastAlignment::setPairwiseSeqs(){ // This method call assigns the blast generated alignment // to the pairwise entry in the Sequence class for the // candidate and template Sequence objects ifstream blastFile; m->openInputFile(blastFileName, blastFile); seqAaln = ""; seqBaln = ""; int candidateLength, templateLength; char d; string candidateName, templateName; while((d=blastFile.get()) != '='){} blastFile >> candidateName; // Get the candidate sequence name from flatfile while((d=blastFile.get()) != '('){} blastFile >> candidateLength; // Get the candidate sequence length from flatfile while((d=blastFile.get())){ if(d == '>'){ blastFile >> templateName; // Get the template sequence name from flatfile break; } else if(d == '*'){ // We go here if there is no significant match seqAstart = 0; seqBstart = 0; seqAend = 0; seqBend = 0; pairwiseLength = 0; // string dummy; // while(dummy != "query:"){ m->mothurOut(dummy, ""); m->mothurOutEndLine(); blastFile >> dummy; } // blastFile >> seqBend; // m->mothurOut(toString(seqBend), ""); m->mothurOutEndLine(); // for(int i=0;i> templateLength; // Get the template sequence length from flatfile while((d=blastFile.get()) != 'Q'){} // Suck up everything else until we get to the start of the alignment int queryStart, sbjctStart, queryEnd, sbjctEnd; string queryLabel, sbjctLabel, query, sbjct; blastFile >> queryLabel; queryLabel = 'Q' + queryLabel; while(queryLabel == "Query:"){ blastFile >> queryStart >> query >> queryEnd; 
while((d=blastFile.get()) != 'S'){}; blastFile >> sbjctLabel >> sbjctStart >> sbjct >> sbjctEnd; if(seqAaln == ""){ seqAstart = queryStart; seqBstart = sbjctStart; } seqAaln += query; // concatenate each line of the sequence to what we already have seqBaln += sbjct; // for the query and template (subject) sequence blastFile >> queryLabel; } seqAend = queryEnd; seqBend = sbjctEnd; pairwiseLength = seqAaln.length(); for(int i=1;imothurGetpid(threadID); if (m->debug) { m->mothurOut("[DEBUG]: tag = " + tag + "\t pid = " + pid + "\n"); } dbFileName = tag + pid + toString(randNumber) + ".template.unaligned.fasta"; queryFileName = tag + pid + toString(randNumber) + ".candidate.unaligned.fasta"; blastFileName = tag + pid + toString(randNumber) + ".blast"; //make sure blast exists in the write place if (path == "") { path = m->argv; string tempPath = path; for (int i = 0; i < path.length(); i++) { tempPath[i] = tolower(path[i]); } path = path.substr(0, (tempPath.find_last_of('m'))); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) path += "blast/bin/"; #else path += "blast\\bin\\"; #endif } string formatdbCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) formatdbCommand = path + "formatdb"; // format the database, -o option gives us the ability #else formatdbCommand = path + "formatdb.exe"; #endif //test to make sure formatdb exists ifstream in; formatdbCommand = m->getFullPathName(formatdbCommand); int ableToOpen = m->openInputFile(formatdbCommand, in, "no error"); in.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + formatdbCommand + " file does not exist. 
mothur requires formatdb.exe."); m->mothurOutEndLine(); m->control_pressed = true; } string blastCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) blastCommand = path + "blastall"; // format the database, -o option gives us the ability #else blastCommand = path + "blastall.exe"; //wrap entire string in "" //blastCommand = "\"" + blastCommand + "\""; #endif //test to make sure formatdb exists ifstream in2; blastCommand = m->getFullPathName(blastCommand); ableToOpen = m->openInputFile(blastCommand, in2, "no error"); in2.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + blastCommand + " file does not exist. mothur requires blastall.exe."); m->mothurOutEndLine(); m->control_pressed = true; } string megablastCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) megablastCommand = path + "megablast"; // format the database, -o option gives us the ability #else megablastCommand = path + "megablast.exe"; #endif //test to make sure formatdb exists ifstream in3; megablastCommand = m->getFullPathName(megablastCommand); ableToOpen = m->openInputFile(megablastCommand, in3, "no error"); in3.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + megablastCommand + " file does not exist. 
mothur requires megablast.exe."); m->mothurOutEndLine(); m->control_pressed = true; } } catch(exception& e) { m->errorOut(e, "BlastDB", "BlastDB"); exit(1); } } /**************************************************************************************************/ BlastDB::BlastDB(string b, int tid) : Database() { try { count = 0; path = b; threadID = tid; //make sure blast exists in the write place if (path == "") { path = m->argv; string tempPath = path; for (int i = 0; i < path.length(); i++) { tempPath[i] = tolower(path[i]); } path = path.substr(0, (tempPath.find_last_of('m'))); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) path += "blast/bin/"; #else path += "blast\\bin\\"; #endif } int randNumber = rand(); string pid = m->mothurGetpid(threadID); dbFileName = pid + toString(randNumber) + ".template.unaligned.fasta"; queryFileName = pid + toString(randNumber) + ".candidate.unaligned.fasta"; blastFileName = pid + toString(randNumber) + ".blast"; string formatdbCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) formatdbCommand = path + "formatdb"; // format the database, -o option gives us the ability #else formatdbCommand = path + "formatdb.exe"; //wrap entire string in "" //formatdbCommand = "\"" + formatdbCommand + "\""; #endif //test to make sure formatdb exists ifstream in; formatdbCommand = m->getFullPathName(formatdbCommand); int ableToOpen = m->openInputFile(formatdbCommand, in, "no error"); in.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + formatdbCommand + " file does not exist. 
mothur requires formatdb.exe."); m->mothurOutEndLine(); m->control_pressed = true; } string blastCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) blastCommand = path + "blastall"; // format the database, -o option gives us the ability #else blastCommand = path + "blastall.exe"; //wrap entire string in "" //blastCommand = "\"" + blastCommand + "\""; #endif //test to make sure formatdb exists ifstream in2; blastCommand = m->getFullPathName(blastCommand); ableToOpen = m->openInputFile(blastCommand, in2, "no error"); in2.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + blastCommand + " file does not exist. mothur requires blastall.exe."); m->mothurOutEndLine(); m->control_pressed = true; } string megablastCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) megablastCommand = path + "megablast"; // format the database, -o option gives us the ability #else megablastCommand = path + "megablast.exe"; //wrap entire string in "" //megablastCommand = "\"" + megablastCommand + "\""; #endif //test to make sure formatdb exists ifstream in3; megablastCommand = m->getFullPathName(megablastCommand); ableToOpen = m->openInputFile(megablastCommand, in3, "no error"); in3.close(); if(ableToOpen == 1) { m->mothurOut("[ERROR]: " + megablastCommand + " file does not exist. 
mothur requires megablast.exe."); m->mothurOutEndLine(); m->control_pressed = true; } } catch(exception& e) { m->errorOut(e, "BlastDB", "BlastDB"); exit(1); } } /**************************************************************************************************/ BlastDB::~BlastDB(){ try{ m->mothurRemove(queryFileName); // let's clean stuff up and remove the temp files m->mothurRemove(dbFileName); // let's clean stuff up and remove the temp files m->mothurRemove((dbFileName+".nsq")); // let's clean stuff up and remove the temp files m->mothurRemove((dbFileName+".nsi")); // let's clean stuff up and remove the temp files m->mothurRemove((dbFileName+".nsd")); // let's clean stuff up and remove the temp files m->mothurRemove((dbFileName+".nin")); // let's clean stuff up and remove the temp files m->mothurRemove((dbFileName+".nhr")); // let's clean stuff up and remove the temp files m->mothurRemove(blastFileName.c_str()); // let's clean stuff up and remove the temp files } catch(exception& e) { m->errorOut(e, "BlastDB", "~BlastDB"); exit(1); } } /**************************************************************************************************/ //assumes you have added all the template sequences using the addSequence function and run generateDB. vector BlastDB::findClosestSequences(Sequence* seq, int n) { try{ vector topMatches; ofstream queryFile; int randNumber = rand(); string pid = scrubName(seq->getName()); m->openOutputFile((queryFileName+pid+toString(randNumber)), queryFile); queryFile << '>' << seq->getName() << endl; queryFile << seq->getUnaligned() << endl; queryFile.close(); // the goal here is to quickly survey the database to find the closest match. To do this we are using the default // wordsize used in megablast. I'm sure we're sacrificing accuracy for speed, but anyother way would take way too // long. With this setting, it seems comparable in speed to the suffix tree approach. 
string blastCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) blastCommand = path + "blastall -p blastn -d " + dbFileName + " -m 8 -W 28 -v " + toString(n) + " -b " + toString(n); blastCommand += (" -i " + (queryFileName+pid+toString(randNumber)) + " -o " + blastFileName+pid+toString(randNumber)); #else blastCommand = "\"" + path + "blastall\" -p blastn -d " + "\"" + dbFileName + "\"" + " -m 8 -W 28 -v " + toString(n) + " -b " + toString(n); blastCommand += (" -i " + (queryFileName+pid+toString(randNumber)) + " -o " + blastFileName+pid+toString(randNumber)); //wrap entire string in "" blastCommand = "\"" + blastCommand + "\""; #endif system(blastCommand.c_str()); ifstream m8FileHandle; m->openInputFile(blastFileName+pid+toString(randNumber), m8FileHandle, "no error"); string dummy; int templateAccession; m->gobble(m8FileHandle); while(!m8FileHandle.eof()){ m8FileHandle >> dummy >> templateAccession >> searchScore; //get rest of junk in line while (!m8FileHandle.eof()) { char c = m8FileHandle.get(); if (c == 10 || c == 13){ break; } } m->gobble(m8FileHandle); topMatches.push_back(templateAccession); } m8FileHandle.close(); m->mothurRemove((queryFileName+pid+toString(randNumber))); m->mothurRemove((blastFileName+pid+toString(randNumber))); return topMatches; } catch(exception& e) { m->errorOut(e, "BlastDB", "findClosestSequences"); exit(1); } } /**************************************************************************************************/ //assumes you have added all the template sequences using the addSequence function and run generateDB. 
vector BlastDB::findClosestMegaBlast(Sequence* seq, int n, int minPerID) { try{ vector topMatches; float numBases, mismatch, gap, startQuery, endQuery, startRef, endRef, score; Scores.clear(); ofstream queryFile; int randNumber = rand(); string pid = scrubName(seq->getName()); m->openOutputFile((queryFileName+pid+toString(randNumber)), queryFile); queryFile << '>' << seq->getName() << endl; queryFile << seq->getUnaligned() << endl; queryFile.close(); // cout << seq->getUnaligned() << endl; // the goal here is to quickly survey the database to find the closest match. To do this we are using the default // wordsize used in megablast. I'm sure we're sacrificing accuracy for speed, but anyother way would take way too // long. With this setting, it seems comparable in speed to the suffix tree approach. //7000004128189528left 0 100 66 0 0 1 66 61 126 1e-31 131 string blastCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) blastCommand = path + "megablast -e 1e-10 -d " + dbFileName + " -m 8 -b " + toString(n) + " -v " + toString(n); //-W 28 -p blastn blastCommand += (" -i " + (queryFileName+pid+toString(randNumber)) + " -o " + blastFileName+pid+toString(randNumber)); #else //blastCommand = path + "blast\\bin\\megablast -e 1e-10 -d " + dbFileName + " -m 8 -b " + toString(n) + " -v " + toString(n); //-W 28 -p blastn //blastCommand += (" -i " + (queryFileName+toString(randNumber)) + " -o " + blastFileName+toString(randNumber)); blastCommand = "\"" + path + "megablast\" -e 1e-10 -d " + "\"" + dbFileName + "\"" + " -m 8 -b " + toString(n) + " -v " + toString(n); //-W 28 -p blastn blastCommand += (" -i " + (queryFileName+pid+toString(randNumber)) + " -o " + blastFileName+pid+toString(randNumber)); //wrap entire string in "" blastCommand = "\"" + blastCommand + "\""; #endif system(blastCommand.c_str()); ifstream m8FileHandle; m->openInputFile(blastFileName+pid+toString(randNumber), m8FileHandle, "no error"); string 
dummy, eScore; int templateAccession; m->gobble(m8FileHandle); while(!m8FileHandle.eof()){ m8FileHandle >> dummy >> templateAccession >> searchScore >> numBases >> mismatch >> gap >> startQuery >> endQuery >> startRef >> endRef >> eScore >> score; // cout << dummy << '\t' << templateAccession << '\t' << searchScore << '\t' << numBases << '\t' << mismatch << '\t' << gap << '\t' << startQuery << '\t' << endQuery << '\t' << startRef << '\t' << endRef << '\t' << eScore << '\t' << score << endl; //get rest of junk in line //while (!m8FileHandle.eof()) { char c = m8FileHandle.get(); if (c == 10 || c == 13){ break; }else{ cout << c; } } // //cout << endl; m->gobble(m8FileHandle); if (searchScore >= minPerID) { topMatches.push_back(templateAccession); Scores.push_back(searchScore); } //cout << templateAccession << endl; } m8FileHandle.close(); m->mothurRemove((queryFileName+pid+toString(randNumber))); m->mothurRemove((blastFileName+pid+toString(randNumber))); //cout << "\n" ; return topMatches; } catch(exception& e) { m->errorOut(e, "BlastDB", "findClosestMegaBlast"); exit(1); } } /**************************************************************************************************/ void BlastDB::addSequence(Sequence seq) { try { ofstream unalignedFastaFile; m->openOutputFileAppend(dbFileName, unalignedFastaFile); // generating a fasta file with unaligned template unalignedFastaFile << '>' << count << endl; // sequences, which will be input to formatdb unalignedFastaFile << seq.getUnaligned() << endl; unalignedFastaFile.close(); count++; } catch(exception& e) { m->errorOut(e, "BlastDB", "addSequence"); exit(1); } } /**************************************************************************************************/ void BlastDB::generateDB() { try { //m->mothurOut("Generating the temporary BLAST database...\t"); cout.flush(); string formatdbCommand; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) formatdbCommand = path + 
"formatdb -p F -o T -i " + dbFileName; // format the database, -o option gives us the ability #else //formatdbCommand = path + "blast\\bin\\formatdb -p F -o T -i " + dbFileName; // format the database, -o option gives us the ability formatdbCommand = "\"" + path + "formatdb\" -p F -o T -i " + "\"" + dbFileName + "\""; //wrap entire string in "" formatdbCommand = "\"" + formatdbCommand + "\""; #endif //cout << formatdbCommand << endl; system(formatdbCommand.c_str()); // to get the right sequence names, i think. -p F // option tells formatdb that seqs are DNA, not prot //m->mothurOut("DONE."); m->mothurOutEndLine(); m->mothurOutEndLine(); cout.flush(); } catch(exception& e) { m->errorOut(e, "BlastDB", "generateDB"); exit(1); } } /**************************************************************************************************/ string BlastDB::scrubName(string seqName) { try { string cleanName = ""; for (int i = 0; i < seqName.length(); i++) { if (isalnum(seqName[i])) { cleanName += seqName[i]; } else { cleanName += "_"; } } return cleanName; } catch(exception& e) { m->errorOut(e, "BlastDB", "scrubName"); exit(1); } } /**************************************************************************************************/ /**************************************************************************************************/ mothur-1.36.1/source/datastructures/blastdb.hpp000066400000000000000000000013161255543666200216400ustar00rootroot00000000000000#ifndef BLASTDB_HPP #define BLASTDB_HPP /* * blastdb.hpp * * * Created by Pat Schloss on 12/22/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "mothur.h" class BlastDB : public Database { public: BlastDB(string, float, float, float, float, string, int); BlastDB(string, int); ~BlastDB(); void generateDB(); void addSequence(Sequence); vector<int> findClosestSequences(Sequence*, int); vector<int> findClosestMegaBlast(Sequence*, int, int); private: string scrubName(string); string dbFileName; string queryFileName; string blastFileName; string path; int count, threadID; float gapOpen; float gapExtend; float match; float misMatch; }; #endif mothur-1.36.1/source/datastructures/compare.h000066400000000000000000000017211255543666200213130ustar00rootroot00000000000000// // compare.h // Mothur // // Created by Sarah Westcott on 3/4/15. // Copyright (c) 2015 Schloss Lab. All rights reserved. // #ifndef __Mothur__compare__ #define __Mothur__compare__ #include "mothurout.h" class Compare { public: Compare(){ AA=0; AT=0; AG=0; AC=0; TA=0; TT=0; TG=0; TC=0; GA=0; GT=0; GG=0; GC=0; CA=0; CT=0; CG=0; CC=0; NA=0; NT=0; NG=0; NC=0; Ai=0; Ti=0; Gi=0; Ci=0; Ni=0; dA=0; dT=0; dG=0; dC=0; refName = ""; queryName = ""; weight = 1; matches = 0; mismatches = 0; total = 0; errorRate = 1.0000; sequence = ""; } ~Compare(){} int AA, AT, AG, AC, TA, TT, TG, TC, GA, GT, GG, GC, CA, CT, CG, CC, NA, NT, NG, NC, Ai, Ti, Gi, Ci, Ni, dA, dT, dG, dC; string refName, queryName, sequence; double errorRate; int weight, matches, mismatches, total; }; #endif /* defined(__Mothur__compare__) */ mothur-1.36.1/source/datastructures/counttable.cpp000066400000000000000000001142331255543666200223630ustar00rootroot00000000000000// // counttable.cpp // Mothur // // Created by Sarah Westcott on 6/26/12. // Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "counttable.h"

/************************************************************/
int CountTable::createTable(set<string>& n, map<string, string>& g, set<string>& gs) {
    try {
        int numGroups = 0;
        groups.clear();
        totalGroups.clear();
        indexGroupMap.clear();
        indexNameMap.clear();
        counts.clear();
        for (set<string>::iterator it = gs.begin(); it != gs.end(); it++) { groups.push_back(*it); hasGroups = true; }
        numGroups = groups.size();
        totalGroups.resize(numGroups, 0);

        //sort groups to keep consistent with how we store the groups in groupmap
        sort(groups.begin(), groups.end());
        for (int i = 0; i < groups.size(); i++) { indexGroupMap[groups[i]] = i; }
        m->setAllGroups(groups);

        uniques = 0;
        total = 0;
        for (set<string>::iterator it = n.begin(); it != n.end(); it++) {

            if (m->control_pressed) { break; }

            string seqName = *it;

            vector<int> groupCounts; groupCounts.resize(numGroups, 0);
            map<string, string>::iterator itGroup = g.find(seqName);

            if (itGroup != g.end()) {
                groupCounts[indexGroupMap[itGroup->second]] = 1;
                totalGroups[indexGroupMap[itGroup->second]]++;
            }else {
                //look for it in names of groups to see if the user accidentally used the wrong file
                if (m->inUsersGroups(seqName, groups)) {
                    m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group.");
                    m->mothurOutEndLine();
                }
                m->mothurOut("[ERROR]: Your group file does not contain " + seqName + ". 
Please correct."); m->mothurOutEndLine();
            }

            map<string, int>::iterator it2 = indexNameMap.find(seqName);
            if (it2 == indexNameMap.end()) {
                if (hasGroups) { counts.push_back(groupCounts); }
                indexNameMap[seqName] = uniques;
                totals.push_back(1);
                total++;
                uniques++;
            }
        }

        if (hasGroups) {
            for (int i = 0; i < totalGroups.size(); i++) {
                if (totalGroups[i] == 0) { m->mothurOut("\nRemoving group: " + groups[i] + " because all sequences have been removed.\n"); removeGroup(groups[i]); i--; }
            }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "CountTable", "createTable");
        exit(1);
    }
}
/************************************************************/
bool CountTable::testGroups(string file) {
    try {
        m = MothurOut::getInstance(); hasGroups = false; total = 0;
        ifstream in;
        m->openInputFile(file, in);

        string headers = m->getline(in); m->gobble(in);
        vector<string> columnHeaders = m->splitWhiteSpace(headers);
        if (columnHeaders.size() > 2) { hasGroups = true; }
        return hasGroups;
    }
    catch(exception& e) {
        m->errorOut(e, "CountTable", "testGroups");
        exit(1);
    }
}
/************************************************************/
int CountTable::createTable(string namefile, string groupfile, bool createGroup) {
    try {

        if (namefile == "") { m->mothurOut("[ERROR]: namefile cannot be blank when creating a count table.\n"); m->control_pressed = true; }

        GroupMap* groupMap;
        int numGroups = 0;
        groups.clear();
        totalGroups.clear();
        indexGroupMap.clear();
        indexNameMap.clear();
        counts.clear();
        map<int, string> originalGroupIndexes;

        if (groupfile != "") {
            hasGroups = true;
            groupMap = new GroupMap(groupfile); groupMap->readMap();
            numGroups = groupMap->getNumGroups();
            groups = groupMap->getNamesOfGroups();
            totalGroups.resize(numGroups, 0);
        }else if(createGroup) {
            hasGroups = true;
            numGroups = 1;
            groups.push_back("Group1");
            totalGroups.resize(numGroups, 0);
        }

        //sort groups to keep consistent with how we store the groups in groupmap
        sort(groups.begin(), groups.end());
        for (int i = 0; i < groups.size(); i++) { indexGroupMap[groups[i]] = i; }
        m->setAllGroups(groups);

        bool error = false;
        string name;
        uniques = 0;
        total = 0; //member total; it is accumulated below, so no local variable may shadow it

        //open input file
        ifstream in;
        m->openInputFile(namefile, in);

        while (!in.eof()) {
            if (m->control_pressed) { break; }

            string firstCol, secondCol;
            in >> firstCol; m->gobble(in);
            in >> secondCol; m->gobble(in);

            m->checkName(firstCol);
            m->checkName(secondCol);

            vector<string> names;
            m->splitAtChar(secondCol, names, ',');

            map<string, int> groupCounts;
            int thisTotal = 0;
            if (groupfile != "") {
                //set to 0
                for (int i = 0; i < groups.size(); i++) { groupCounts[groups[i]] = 0; }

                //get counts for each of the users groups
                for (int i = 0; i < names.size(); i++) {
                    string group = groupMap->getGroup(names[i]);

                    if (group == "not found") { m->mothurOut("[ERROR]: " + names[i] + " is not in your groupfile, please correct."); m->mothurOutEndLine(); error=true; }
                    else {
                        map<string, int>::iterator it = groupCounts.find(group);

                        //if not found, then this sequence is not from a group we care about
                        if (it != groupCounts.end()) {
                            it->second++;
                            thisTotal++;
                        }
                    }
                }
            }else if (createGroup) {
                groupCounts["Group1"]=0;
                for (int i = 0; i < names.size(); i++) {
                    string group = "Group1";
                    groupCounts["Group1"]++; thisTotal++;
                }
            }else { thisTotal = names.size(); }

            //if group info, then read it
            vector<int> thisGroupsCount; thisGroupsCount.resize(numGroups, 0);
            for (int i = 0; i < numGroups; i++) {
                thisGroupsCount[i] = groupCounts[groups[i]];
                totalGroups[i] += thisGroupsCount[i];
            }

            map<string, int>::iterator it = indexNameMap.find(firstCol);
            if (it == indexNameMap.end()) {
                if (hasGroups) { counts.push_back(thisGroupsCount); }
                indexNameMap[firstCol] = uniques;
                totals.push_back(thisTotal);
                total += thisTotal;
                uniques++;
            }else {
                error = true;
                m->mothurOut("[ERROR]: Your count table contains more than 1 sequence named " + firstCol + ", sequence names must be unique. 
Please correct."); m->mothurOutEndLine(); } } in.close(); if (error) { m->control_pressed = true; } else { //check for zero groups if (hasGroups) { for (int i = 0; i < totalGroups.size(); i++) { if (totalGroups[i] == 0) { m->mothurOut("\nRemoving group: " + groups[i] + " because all sequences have been removed.\n"); removeGroup(groups[i]); i--; } } } } if (groupfile != "") { delete groupMap; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "createTable"); exit(1); } } /************************************************************/ int CountTable::readTable(string file, bool readGroups, bool mothurRunning) { try { filename = file; ifstream in; m->openInputFile(filename, in); string headers = m->getline(in); m->gobble(in); vector columnHeaders = m->splitWhiteSpace(headers); int numGroups = 0; groups.clear(); totalGroups.clear(); indexGroupMap.clear(); indexNameMap.clear(); counts.clear(); map originalGroupIndexes; if ((columnHeaders.size() > 2) && readGroups) { hasGroups = true; numGroups = columnHeaders.size() - 2; } for (int i = 2; i < columnHeaders.size(); i++) { groups.push_back(columnHeaders[i]); originalGroupIndexes[i-2] = columnHeaders[i]; totalGroups.push_back(0); } //sort groups to keep consistent with how we store the groups in groupmap sort(groups.begin(), groups.end()); for (int i = 0; i < groups.size(); i++) { indexGroupMap[groups[i]] = i; } m->setAllGroups(groups); bool error = false; string name; int thisTotal; uniques = 0; total = 0; while (!in.eof()) { if (m->control_pressed) { break; } in >> name; m->gobble(in); in >> thisTotal; m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: " + name + '\t' + toString(thisTotal) + "\n"); } if ((thisTotal == 0) && !mothurRunning) { error=true; m->mothurOut("[ERROR]: Your count table contains a sequence named " + name + " with a total=0. 
Please correct."); m->mothurOutEndLine(); } //if group info, then read it vector groupCounts; groupCounts.resize(numGroups, 0); if (columnHeaders.size() > 2) { //file contains groups if (readGroups) { //user wants to save them for (int i = 0; i < numGroups; i++) { int thisIndex = indexGroupMap[originalGroupIndexes[i]]; in >> groupCounts[thisIndex]; m->gobble(in); totalGroups[thisIndex] += groupCounts[thisIndex]; } }else { //read and discard m->getline(in); m->gobble(in); } } map::iterator it = indexNameMap.find(name); if (it == indexNameMap.end()) { if (hasGroups) { counts.push_back(groupCounts); } indexNameMap[name] = uniques; totals.push_back(thisTotal); total += thisTotal; uniques++; }else { error = true; m->mothurOut("[ERROR]: Your count table contains more than 1 sequence named " + name + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } } in.close(); if (error) { m->control_pressed = true; } else { //check for zero groups if (hasGroups) { for (int i = 0; i < totalGroups.size(); i++) { if (totalGroups[i] == 0) { m->mothurOut("\nRemoving group: " + groups[i] + " because all sequences have been removed.\n"); removeGroup(groups[i]); i--; } } } } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "readTable"); exit(1); } } /************************************************************/ int CountTable::printTable(string file) { try { ofstream out; m->openOutputFile(file, out); out << "Representative_Sequence\ttotal"; for (int i = 0; i < groups.size(); i++) { out << '\t' << groups[i]; } out << endl; map reverse; //use this to preserve order for (map::iterator it = indexNameMap.begin(); it !=indexNameMap.end(); it++) { reverse[it->second] = it->first; } for (int i = 0; i < totals.size(); i++) { map::iterator itR = reverse.find(i); if (itR != reverse.end()) { //will equal end if seqs were removed because remove just removes from indexNameMap out << itR->second << '\t' << totals[i]; if (hasGroups) { for (int j = 0; j < 
groups.size(); j++) { out << '\t' << counts[i][j]; } } out << endl; } } /*for (map::iterator itNames = indexNameMap.begin(); itNames != indexNameMap.end(); itNames++) { out << itNames->first << '\t' << totals[itNames->second]; if (hasGroups) { for (int i = 0; i < groups.size(); i++) { out << '\t' << counts[itNames->second][i]; } } out << endl; }*/ out.close(); return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "printTable"); exit(1); } } /************************************************************/ int CountTable::printHeaders(ofstream& out) { try { out << "Representative_Sequence\ttotal"; for (int i = 0; i < groups.size(); i++) { out << '\t' << groups[i]; } out << endl; return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "printHeaders"); exit(1); } } /************************************************************/ int CountTable::printSeq(ofstream& out, string seqName) { try { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { m->mothurOut("[ERROR]: " + seqName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { out << it->first << '\t' << totals[it->second]; if (hasGroups) { for (int i = 0; i < groups.size(); i++) { out << '\t' << counts[it->second][i]; } } out << endl; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "printSeq"); exit(1); } } /************************************************************/ //group counts for a seq vector CountTable::getGroupCounts(string seqName) { try { vector temp; if (hasGroups) { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you are used a group file instead of a design file? 
A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } m->mothurOut("[ERROR]: " + seqName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { temp = counts[it->second]; } }else{ m->mothurOut("[ERROR]: Your count table does not have group info. Please correct.\n"); m->control_pressed = true; } return temp; } catch(exception& e) { m->errorOut(e, "CountTable", "getGroupCounts"); exit(1); } } /************************************************************/ //total number of sequences for the group int CountTable::getGroupCount(string groupName) { try { if (hasGroups) { map::iterator it = indexGroupMap.find(groupName); if (it == indexGroupMap.end()) { m->mothurOut("[ERROR]: group " + groupName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { return totalGroups[it->second]; } }else{ m->mothurOut("[ERROR]: Your count table does not have group info. Please correct.\n"); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "getGroupCount"); exit(1); } } /************************************************************/ //total number of sequences for the seq for the group int CountTable::getGroupCount(string seqName, string groupName) { try { if (hasGroups) { map::iterator it = indexGroupMap.find(groupName); if (it == indexGroupMap.end()) { m->mothurOut("[ERROR]: group " + groupName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { map::iterator it2 = indexNameMap.find(seqName); if (it2 == indexNameMap.end()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you are used a group file instead of a design file? 
A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } m->mothurOut("[ERROR]: seq " + seqName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { return counts[it2->second][it->second]; } } }else{ m->mothurOut("[ERROR]: Your count table does not have group info. Please correct.\n"); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "getGroupCount"); exit(1); } } /************************************************************/ //set the number of sequences for the seq for the group int CountTable::setAbund(string seqName, string groupName, int num) { try { if (hasGroups) { map::iterator it = indexGroupMap.find(groupName); if (it == indexGroupMap.end()) { m->mothurOut("[ERROR]: " + groupName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { map::iterator it2 = indexNameMap.find(seqName); if (it2 == indexNameMap.end()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } m->mothurOut("[ERROR]: " + seqName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { int oldCount = counts[it2->second][it->second]; counts[it2->second][it->second] = num; totalGroups[it->second] += (num - oldCount); total += (num - oldCount); totals[it2->second] += (num - oldCount); } } }else{ m->mothurOut("[ERROR]: Your count table does not have group info. 
Please correct.\n"); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "set"); exit(1); } } /************************************************************/ //add group int CountTable::addGroup(string groupName) { try { bool sanity = m->inUsersGroups(groupName, groups); if (sanity) { m->mothurOut("[ERROR]: " + groupName + " is already in the count table, cannot add again.\n"); m->control_pressed = true; return 0; } groups.push_back(groupName); if (!hasGroups) { counts.resize(uniques); } for (int i = 0; i < counts.size(); i++) { counts[i].push_back(0); } totalGroups.push_back(0); indexGroupMap[groupName] = groups.size()-1; map originalGroupMap = indexGroupMap; //important to play well with others, :) sort(groups.begin(), groups.end()); //fix indexGroupMap && totalGroups vector newTotals; newTotals.resize(groups.size(), 0); for (int i = 0; i < groups.size(); i++) { indexGroupMap[groups[i]] = i; //find original spot of group[i] int index = originalGroupMap[groups[i]]; newTotals[i] = totalGroups[index]; } totalGroups = newTotals; //fix counts vectors for (int i = 0; i < counts.size(); i++) { vector newCounts; newCounts.resize(groups.size(), 0); for (int j = 0; j < groups.size(); j++) { //find original spot of group[i] int index = originalGroupMap[groups[j]]; newCounts[j] = counts[i][index]; } counts[i] = newCounts; } hasGroups = true; m->setAllGroups(groups); return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "addGroup"); exit(1); } } /************************************************************/ //remove group int CountTable::removeGroup(string groupName) { try { if (hasGroups) { //save for later in case removing a group means we need to remove a seq. 
map reverse; for (map::iterator it = indexNameMap.begin(); it !=indexNameMap.end(); it++) { reverse[it->second] = it->first; } map::iterator it = indexGroupMap.find(groupName); if (it == indexGroupMap.end()) { m->mothurOut("[ERROR]: " + groupName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { int indexOfGroupToRemove = it->second; map currentGroupIndex = indexGroupMap; vector newGroups; for (int i = 0; i < groups.size(); i++) { if (groups[i] != groupName) { newGroups.push_back(groups[i]); indexGroupMap[groups[i]] = newGroups.size()-1; } } indexGroupMap.erase(groupName); groups = newGroups; totalGroups.erase(totalGroups.begin()+indexOfGroupToRemove); int thisIndex = 0; map newIndexNameMap; for (int i = 0; i < counts.size(); i++) { int num = counts[i][indexOfGroupToRemove]; counts[i].erase(counts[i].begin()+indexOfGroupToRemove); totals[i] -= num; total -= num; if (totals[i] == 0) { //your sequences are only from the group we want to remove, then remove you. counts.erase(counts.begin()+i); totals.erase(totals.begin()+i); uniques--; i--; } newIndexNameMap[reverse[thisIndex]] = i; thisIndex++; } indexNameMap = newIndexNameMap; if (groups.size() == 0) { hasGroups = false; } } }else { m->mothurOut("[ERROR]: your count table does not contain group information, can not remove group " + groupName + ".\n"); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "removeGroup"); exit(1); } } /************************************************************/ //vector of groups for the seq vector CountTable::getGroups(string seqName) { try { vector thisGroups; if (hasGroups) { vector thisCounts = getGroupCounts(seqName); for (int i = 0; i < thisCounts.size(); i++) { if (thisCounts[i] != 0) { thisGroups.push_back(groups[i]); } } }else{ m->mothurOut("[ERROR]: Your count table does not have group info. 
Please correct.\n"); m->control_pressed = true; } return thisGroups; } catch(exception& e) { m->errorOut(e, "CountTable", "getGroups"); exit(1); } } /************************************************************/ //total number of seqs represented by seq int CountTable::renameSeq(string oldSeqName, string newSeqName) { try { map::iterator it = indexNameMap.find(oldSeqName); if (it == indexNameMap.end()) { if (hasGroupInfo()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(oldSeqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + oldSeqName + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } } m->mothurOut("[ERROR]: " + oldSeqName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { int index = it->second; indexNameMap.erase(it); indexNameMap[newSeqName] = index; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "renameSeq"); exit(1); } } /************************************************************/ //total number of seqs represented by seq int CountTable::getNumSeqs(string seqName) { try { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { if (hasGroupInfo()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } } m->mothurOut("[ERROR]: " + seqName + " is not in your count table. 
Please correct.\n"); m->control_pressed = true; }else { return totals[it->second]; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "getNumSeqs"); exit(1); } } /************************************************************/ //returns unique index for sequence like get in NameAssignment int CountTable::get(string seqName) { try { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { if (hasGroupInfo()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } } m->mothurOut("[ERROR]: " + seqName + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { return it->second; } return -1; } catch(exception& e) { m->errorOut(e, "CountTable", "get"); exit(1); } } /************************************************************/ //add seqeunce without group info int CountTable::push_back(string seqName) { try { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { if (hasGroups) { m->mothurOut("[ERROR]: Your count table has groups and I have no group information for " + seqName + "."); m->mothurOutEndLine(); m->control_pressed = true; } indexNameMap[seqName] = uniques; totals.push_back(1); total++; uniques++; }else { m->mothurOut("[ERROR]: Your count table contains more than 1 sequence named " + seqName + ", sequence names must be unique. 
Please correct."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "push_back"); exit(1); } } /************************************************************/ //remove sequence int CountTable::remove(string seqName) { try { map::iterator it = indexNameMap.find(seqName); if (it != indexNameMap.end()) { uniques--; if (hasGroups){ //remove this sequences counts from group totals for (int i = 0; i < totalGroups.size(); i++) { totalGroups[i] -= counts[it->second][i]; counts[it->second][i] = 0; } } int thisTotal = totals[it->second]; totals[it->second] = 0; total -= thisTotal; indexNameMap.erase(it); }else { if (hasGroupInfo()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seqName, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seqName + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } } m->mothurOut("[ERROR]: Your count table contains does not include " + seqName + ", cannot remove."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "push_back"); exit(1); } } /************************************************************/ //add seqeunce without group info int CountTable::push_back(string seqName, int thisTotal) { try { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { if (hasGroups) { m->mothurOut("[ERROR]: Your count table has groups and I have no group information for " + seqName + "."); m->mothurOutEndLine(); m->control_pressed = true; } indexNameMap[seqName] = uniques; totals.push_back(thisTotal); total+=thisTotal; uniques++; }else { m->mothurOut("[ERROR]: Your count table contains more than 1 sequence named " + 
seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "push_back"); exit(1); } } /************************************************************/ //add sequence with group info int CountTable::push_back(string seqName, vector groupCounts) { try { map::iterator it = indexNameMap.find(seqName); if (it == indexNameMap.end()) { if ((hasGroups) && (groupCounts.size() != getNumGroups())) { m->mothurOut("[ERROR]: Your count table has a " + toString(getNumGroups()) + " groups and " + seqName + " has " + toString(groupCounts.size()) + ", please correct."); m->mothurOutEndLine(); m->control_pressed = true; } int thisTotal = 0; for (int i = 0; i < getNumGroups(); i++) { totalGroups[i] += groupCounts[i]; thisTotal += groupCounts[i]; } if (hasGroups) { counts.push_back(groupCounts); } indexNameMap[seqName] = uniques; totals.push_back(thisTotal); total+= thisTotal; uniques++; }else { m->mothurOut("[ERROR]: Your count table contains more than 1 sequence named " + seqName + ", sequence names must be unique. 
Please correct."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "push_back"); exit(1); } } /************************************************************/ //create ListVector from uniques ListVector CountTable::getListVector() { try { ListVector list(indexNameMap.size()); for (map::iterator it = indexNameMap.begin(); it != indexNameMap.end(); it++) { if (m->control_pressed) { break; } list.set(it->second, it->first); } return list; } catch(exception& e) { m->errorOut(e, "CountTable", "getListVector"); exit(1); } } /************************************************************/ //returns the names of all unique sequences in file vector CountTable::getNamesOfSeqs() { try { vector names; for (map::iterator it = indexNameMap.begin(); it != indexNameMap.end(); it++) { names.push_back(it->first); } return names; } catch(exception& e) { m->errorOut(e, "CountTable", "getNamesOfSeqs"); exit(1); } } /************************************************************/ //returns the names of all unique sequences in file mapped to their seqCounts map CountTable::getNameMap() { try { map names; for (map::iterator it = indexNameMap.begin(); it != indexNameMap.end(); it++) { names[it->first] = totals[it->second]; } return names; } catch(exception& e) { m->errorOut(e, "CountTable", "getNameMap"); exit(1); } } /************************************************************/ //returns the names of all unique sequences in file vector CountTable::getNamesOfSeqs(string group) { try { vector names; if (hasGroups) { map::iterator it = indexGroupMap.find(group); if (it == indexGroupMap.end()) { m->mothurOut("[ERROR]: " + group + " is not in your count table. 
Please correct.\n"); m->control_pressed = true; }else { for (map::iterator it2 = indexNameMap.begin(); it2 != indexNameMap.end(); it2++) { if (counts[it2->second][it->second] != 0) { names.push_back(it2->first); } } } }else{ m->mothurOut("[ERROR]: Your count table does not have group info. Please correct.\n"); m->control_pressed = true; } return names; } catch(exception& e) { m->errorOut(e, "CountTable", "getNamesOfSeqs"); exit(1); } } /************************************************************/ //merges counts of seq1 and seq2, saving in seq1 int CountTable::mergeCounts(string seq1, string seq2) { try { map::iterator it = indexNameMap.find(seq1); if (it == indexNameMap.end()) { if (hasGroupInfo()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seq1, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seq1 + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } } m->mothurOut("[ERROR]: " + seq1 + " is not in your count table. Please correct.\n"); m->control_pressed = true; }else { map::iterator it2 = indexNameMap.find(seq2); if (it2 == indexNameMap.end()) { if (hasGroupInfo()) { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(seq2, groups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + seq2 + ". Perhaps you are used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group."); m->mothurOutEndLine(); } } m->mothurOut("[ERROR]: " + seq2 + " is not in your count table. 
Please correct.\n"); m->control_pressed = true; }else { //merge data for (int i = 0; i < groups.size(); i++) { counts[it->second][i] += counts[it2->second][i]; } totals[it->second] += totals[it2->second]; uniques--; indexNameMap.erase(it2); } } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "getNamesOfSeqs"); exit(1); } } /************************************************************/ int CountTable::copy(CountTable* ct) { try { vector thisGroups = ct->getNamesOfGroups(); for (int i = 0; i < thisGroups.size(); i++) { addGroup(thisGroups[i]); } vector names = ct->getNamesOfSeqs(); for (int i = 0; i < names.size(); i++) { vector thisCounts = ct->getGroupCounts(names[i]); push_back(names[i], thisCounts); } return 0; } catch(exception& e) { m->errorOut(e, "CountTable", "copy"); exit(1); } } /************************************************************/ mothur-1.36.1/source/datastructures/counttable.h000066400000000000000000000077431255543666200220370ustar00rootroot00000000000000#ifndef Mothur_counttable_h #define Mothur_counttable_h // // counttable.h // Mothur // // Created by Sarah Westcott on 6/26/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // //This class is designed to read a count table file and store its data. 
//count table files look like:

/*
 Representative_Sequence	total	F003D000	F003D002	F003D004	F003D006	F003D008	F003D142	F003D144	F003D146	F003D148	F003D150	MOCK.GQY1XT001
 GQY1XT001C296C	6051	409	985	923	937	342	707	458	439	387	464	0
 GQY1XT001A3TJI	4801	396	170	413	442	306	769	581	576	497	651	0
 GQY1XT001CS2B8	3018	263	226	328	460	361	336	248	290	187	319	0
 GQY1XT001CD9IB	2736	239	177	256	405	306	286	263	248	164	392	0

 or if no group info was used to create it

 Representative_Sequence	total
 GQY1XT001C296C	6051
 GQY1XT001A3TJI	4801
 GQY1XT001CS2B8	3018
 GQY1XT001CD9IB	2736
 GQY1XT001ARCB1	2183
 GQY1XT001CNF2P	2796
 GQY1XT001CJMDA	1667
 GQY1XT001CBVJB	3758
 */

#include "mothurout.h"
#include "listvector.hpp"
#include "groupmap.h"

class CountTable {

    public:

        CountTable() { m = MothurOut::getInstance(); hasGroups = false; total = 0; uniques = 0; }
        ~CountTable() {}

        //reads and creates smart enough to eliminate groups with zero counts
        int createTable(set<string>&, map<string, string>&, set<string>&); //seqNames, seqName->group, groupNames
        int createTable(string, string, bool); //namefile, groupfile, createGroup
        int readTable(string, bool, bool);

        int printTable(string);
        int printHeaders(ofstream&);
        int printSeq(ofstream&, string);
        bool testGroups(string file); //used to check if file has group data without reading it.
        int copy(CountTable*);

        bool hasGroupInfo() { return hasGroups; }
        int getNumGroups() { return groups.size(); }
        vector<string> getNamesOfGroups() { return groups; } //returns group names, if no group info vector is blank.
        int addGroup(string);
        int removeGroup(string);

        int renameSeq(string, string); //used to change name of sequence for use with trees
        int setAbund(string, string, int); //set abundance number of seqs for that group for that seq
        int push_back(string); //add a sequence
        int push_back(string, int); //add a sequence with a total count
        int push_back(string, vector<int>); //add a sequence with group info
        int remove(string); //remove seq
        int get(string); //returns unique sequence index for reading distance matrices like NameAssignment
        int size() { return indexNameMap.size(); }

        vector<string> getGroups(string); //returns vector of groups represented by this sequence
        vector<int> getGroupCounts(string); //returns group counts for a seq passed in, if no group info is in file vector is blank. Order is the same as the groups returned by the getNamesOfGroups function.
        int getGroupCount(string, string); //returns number of seqs for that group for that seq
        int getGroupCount(string); // returns total seqs for that group
        int getNumSeqs(string); //returns total seqs for that seq, 0 if not found
        int getNumSeqs() { return total; } //return total number of seqs
        int getNumUniqueSeqs() { return uniques; } //return number of unique/representative seqs
        int getGroupIndex(string); //returns index in getGroupCounts vector of specific group

        vector<string> getNamesOfSeqs();
        vector<string> getNamesOfSeqs(string);
        int mergeCounts(string, string); //combines counts for 2 seqs, saving under the first name passed in.
        ListVector getListVector();
        map<string, int> getNameMap();

    private:
        string filename;
        MothurOut* m;
        bool hasGroups;
        int total, uniques;
        vector<string> groups;
        vector< vector<int> > counts;
        vector<int> totals;
        vector<int> totalGroups;
        map<string, int> indexNameMap;
        map<string, int> indexGroupMap;

};

#endif
mothur-1.36.1/source/datastructures/database.cpp000066400000000000000000000017361255543666200217720ustar00rootroot00000000000000
/*
 *  database.cpp
 *
 *
 *  Created by Pat Schloss on 12/16/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
* */ #include "mothur.h" #include "sequence.hpp" #include "database.hpp" /**************************************************************************************************/ Database::Database(){ longest = 0; numSeqs = 0; m = MothurOut::getInstance(); } /**************************************************************************************************/ Database::~Database(){} /**************************************************************************************************/ float Database::getSearchScore() { return searchScore; } // we're assuming that the search is already done /**************************************************************************************************/ int Database::getLongestBase() { return longest+1; } /**************************************************************************************************/ mothur-1.36.1/source/datastructures/database.hpp000066400000000000000000000043401255543666200217710ustar00rootroot00000000000000#ifndef DATABASE_HPP #define DATABASE_HPP /* * database.hpp * * * Created by Pat Schloss on 12/16/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ /* This class is a parent to blastdb, kmerdb, suffixdb. 
 */

#include "mothur.h"
#include "sequence.hpp"

/**************************************************************************************************/
struct seqMatch {  //used to select top n matches
    int seq;
    int match;
    seqMatch() {}
    seqMatch(int s, int m) : seq(s), match(m) {}
};
/**************************************************************************************************/
inline bool compareSeqMatches (seqMatch member, seqMatch member2){ //sorts largest to smallest
    if(member.match > member2.match){
        return true; }
    else{
        return false;
    }
}
/**************************************************************************************************/
inline bool compareSeqMatchesReverse (seqMatch member, seqMatch member2){ //sorts smallest to largest
    if(member.match < member2.match){
        return true; }
    else{
        return false;
    }
}
/**************************************************************************************************/
class Database {

public:
    Database();
    virtual ~Database();
    virtual void generateDB() = 0;
    virtual void addSequence(Sequence) = 0;  //add sequence to search engine
    virtual string getName(int) { return ""; }
    virtual vector<int> findClosestSequences(Sequence*, int) = 0;  // returns indexes of n closest sequences to query
    virtual vector<int> findClosestMegaBlast(Sequence*, int, int){return results;}
    virtual float getSearchScore();
    virtual vector<float> getSearchScores() { return Scores; } //assumes you already called findClosestMegaBlast
    virtual int getLongestBase();
    virtual void readKmerDB(ifstream&){};
    virtual void setNumSeqs(int i) { numSeqs = i; }
    virtual vector<int> getSequencesWithKmer(int){ vector<int> filler; return filler; };
    virtual int getReversed(int) { return 0; }
    virtual int getMaxKmer(){ return 1; }

protected:
    MothurOut* m;
    int numSeqs, longest;
    float searchScore;
    vector<int> results;
    vector<float> Scores;
};
/**************************************************************************************************/
#endif
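// Usage sketch (not part of mothur): the seqMatch comment above says the struct is
// "used to select top n matches". The block below shows one way that selection could
// look with compareSeqMatches. The helper name topMatchIndexes is hypothetical, and
// seqMatch is redeclared locally only so the sketch compiles without mothur.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Local copy of the struct from database.hpp (members given defaults here).
struct seqMatch {
    int seq;    // index of the template sequence
    int match;  // number of matches against the query
    seqMatch() : seq(0), match(0) {}
    seqMatch(int s, int m) : seq(s), match(m) {}
};

// Same ordering as compareSeqMatches in database.hpp: largest match count first.
inline bool compareSeqMatches(seqMatch member, seqMatch member2) {
    return member.match > member2.match;
}

// Hypothetical helper: return the sequence indexes of the n best matches.
// Takes the vector by value so the caller's copy keeps its order.
inline std::vector<int> topMatchIndexes(std::vector<seqMatch> matches, std::size_t n) {
    if (n > matches.size()) { n = matches.size(); }

    // Only the first n entries need to be in sorted order.
    std::partial_sort(matches.begin(), matches.begin() + n, matches.end(), compareSeqMatches);

    std::vector<int> indexes;
    for (std::size_t i = 0; i < n; i++) { indexes.push_back(matches[i].seq); }
    return indexes;
}
```

// Swapping in compareSeqMatchesReverse would return the n weakest matches instead.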
mothur-1.36.1/source/datastructures/datavector.hpp000066400000000000000000000033701255543666200223630ustar00rootroot00000000000000#ifndef datavector_h #define datavector_h #include "mothur.h" #include "mothurout.h" /* This class is parent to listvector, ordervector, rabundvector, sabundvector, sharedordervector, sharedrabundvector, sharedsabundvector. The child classes all contain OTU information in different forms. */ class RAbundVector; class SAbundVector; class OrderVector; class SharedListVector; class SharedOrderVector; class SharedSAbundVector; class SharedRAbundVector; class SharedRAbundFloatVector; class GroupMap; class DataVector { public: DataVector(){ m = MothurOut::getInstance(); }// : maxRank(0), numBins(0), numSeqs(0){}; DataVector(string l) : label(l) { m = MothurOut::getInstance();}; DataVector(const DataVector& dv) : label(dv.label){ m = MothurOut::getInstance();};//, maxRank(dv.maxRank), numBins(dv.numBins), numSeqs(dv.numSeqs) {}; DataVector(ifstream&) {m = MothurOut::getInstance();} DataVector(ifstream&, GroupMap*){m = MothurOut::getInstance();} virtual ~DataVector(){}; // virtual int getNumBins() { return numBins; } // virtual int getNumSeqs() { return numSeqs; } // virtual int getMaxRank() { return maxRank; } virtual void resize(int) = 0; virtual int size() = 0; virtual void print(ostream&) = 0; virtual void print(ostream&, map&) {} virtual void printHeaders(ostream&) {}; virtual void clear() = 0; void setLabel(string l) { label = l; } string getLabel() { return label; } virtual RAbundVector getRAbundVector() = 0; virtual SAbundVector getSAbundVector() = 0; virtual OrderVector getOrderVector(map* hold = NULL) = 0; protected: string label; MothurOut* m; // int maxRank; // int numBins; // int numSeqs; }; /***********************************************************************/ #endif mothur-1.36.1/source/datastructures/designmap.cpp000066400000000000000000000650611255543666200221760ustar00rootroot00000000000000// // designmap.cpp // Mothur // // 
Created by SarahsWork on 6/17/13. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include "designmap.h" /************************************************************/ DesignMap::DesignMap(string file) { try { m = MothurOut::getInstance(); defaultClass = "not found"; read(file); } catch(exception& e) { m->errorOut(e, "DesignMap", "DesignMap"); exit(1); } } /************************************************************/ int DesignMap::read(string file) { try { ifstream in; m->openInputFile(file, in); string temp = ""; in >> temp; m->gobble(in); vector columnHeaders; vector tempColumnHeaders; if (temp == "group") { string headers = m->getline(in); m->gobble(in); columnHeaders = m->splitWhiteSpace(headers); columnHeaders.insert(columnHeaders.begin(), "group"); }else { string headers = m->getline(in); m->gobble(in); tempColumnHeaders = m->splitWhiteSpace(headers); int num = tempColumnHeaders.size(); columnHeaders.push_back("group"); for (int i = 0; i < num; i++) { columnHeaders.push_back("value" + toString(i)); } } namesOfCategories.clear(); indexCategoryMap.clear(); indexGroupNameMap.clear(); designMap.clear(); map originalGroupIndexes; for (int i = 1; i < columnHeaders.size(); i++) { namesOfCategories.push_back(columnHeaders[i]); originalGroupIndexes[i-1] = columnHeaders[i]; } if (columnHeaders.size() > 1) { defaultClass = columnHeaders[1]; } else { m->mothurOut("[ERROR]: Your design file contains only one column. 
Please correct."); m->mothurOutEndLine(); m->control_pressed = true; } //sort groups to keep consistent with how we store the groups in groupmap sort(namesOfCategories.begin(), namesOfCategories.end()); for (int i = 0; i < namesOfCategories.size(); i++) { indexCategoryMap[namesOfCategories[i]] = i; } int numCategories = namesOfCategories.size(); bool error = false; string group; totalCategories.resize(numCategories); int count = 0; //file without headers, fix it if (temp != "group") { group = temp; if (m->debug) { m->mothurOut("[DEBUG]: group = " + group + "\n"); } //if group info, then read it vector categoryValues; categoryValues.resize(numCategories, "not found"); for (int i = 0; i < numCategories; i++) { int thisIndex = indexCategoryMap[originalGroupIndexes[i]]; //find index of this category because we sort the values. string temp = tempColumnHeaders[i]; categoryValues[thisIndex] = temp; if (m->debug) { m->mothurOut("[DEBUG]: value = " + temp + "\n"); } //do we have this value for this category already map::iterator it = totalCategories[thisIndex].find(temp); if (it == totalCategories[thisIndex].end()) { totalCategories[thisIndex][temp] = 1; } else { totalCategories[thisIndex][temp]++; } } map::iterator it = indexGroupNameMap.find(group); if (it == indexGroupNameMap.end()) { groups.push_back(group); indexGroupNameMap[group] = count; designMap.push_back(categoryValues); count++; }else { error = true; m->mothurOut("[ERROR]: Your design file contains more than 1 group named " + group + ", group names must be unique. 
Please correct."); m->mothurOutEndLine(); } } while (!in.eof()) { if (m->control_pressed) { break; } in >> group; m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: group = " + group + "\n"); } //if group info, then read it vector categoryValues; categoryValues.resize(numCategories, "not found"); for (int i = 0; i < numCategories; i++) { int thisIndex = indexCategoryMap[originalGroupIndexes[i]]; //find index of this category because we sort the values. string temp = "not found"; in >> temp; categoryValues[thisIndex] = temp; m->gobble(in); if (m->debug) { m->mothurOut("[DEBUG]: value = " + temp + "\n"); } //do we have this value for this category already map::iterator it = totalCategories[thisIndex].find(temp); if (it == totalCategories[thisIndex].end()) { totalCategories[thisIndex][temp] = 1; } else { totalCategories[thisIndex][temp]++; } } map::iterator it = indexGroupNameMap.find(group); if (it == indexGroupNameMap.end()) { groups.push_back(group); indexGroupNameMap[group] = count; designMap.push_back(categoryValues); count++; }else { error = true; m->mothurOut("[ERROR]: Your design file contains more than 1 group named " + group + ", group names must be unique. Please correct."); m->mothurOutEndLine(); } } in.close(); if (error) { m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "DesignMap", "read"); exit(1); } } /************************************************************/ ////groupName, returns default categories value. string DesignMap::get(string groupName) { try { string value = "not found"; map::iterator it2 = indexGroupNameMap.find(groupName); if (it2 == indexGroupNameMap.end()) { m->mothurOut("[ERROR]: group " + groupName + " is not in your design file. 
Please correct.\n"); m->control_pressed = true; }else { return designMap[it2->second][indexCategoryMap[defaultClass]]; } return value; } catch(exception& e) { m->errorOut(e, "DesignMap", "get"); exit(1); } } /************************************************************/ ////groupName, returns default categories value. vector DesignMap::getCategory() { try { //oldStyle design file group -> treatment. returns treatments set uniqueNames; for (int i = 0; i < groups.size(); i++) { uniqueNames.insert(get(groups[i])); } vector values; for (set::iterator it = uniqueNames.begin(); it != uniqueNames.end(); it++) { values.push_back(*it); } return values; } catch(exception& e) { m->errorOut(e, "DesignMap", "getCategory"); exit(1); } } /************************************************************/ ////categoryName, returns category values. vector DesignMap::getCategory(string catName) { try { vector values; map::iterator it2 = indexCategoryMap.find(catName); if (it2 == indexCategoryMap.end()) { m->mothurOut("[ERROR]: category " + catName + " is not in your design file. Please correct.\n"); m->control_pressed = true; }else { for (map::iterator it = totalCategories[it2->second].begin(); it != totalCategories[it2->second].end(); it++) { values.push_back(it->first); } } return values; } catch(exception& e) { m->errorOut(e, "DesignMap", "getCategory"); exit(1); } } /************************************************************/ ////groupName, category returns value. example F000132, sex -> male string DesignMap::get(string groupName, string categoryName) { try { string value = "not found"; map::iterator it = indexCategoryMap.find(categoryName); if (it == indexCategoryMap.end()) { m->mothurOut("[ERROR]: category " + categoryName + " is not in your design file. Please correct.\n"); m->control_pressed = true; }else { map::iterator it2 = indexGroupNameMap.find(groupName); if (it2 == indexGroupNameMap.end()) { m->mothurOut("[ERROR]: group " + groupName + " is not in your design file. 
Please correct.\n"); m->control_pressed = true; }else { return designMap[it2->second][it->second]; } } return value; } catch(exception& e) { m->errorOut(e, "DesignMap", "get"); exit(1); } } /************************************************************/ //add group, assumes order is correct int DesignMap::push_back(string group, vector values) { try { map::iterator it = indexGroupNameMap.find(group); if (it == indexGroupNameMap.end()) { if (values.size() != getNumCategories()) { m->mothurOut("[ERROR]: Your design file has a " + toString(getNumCategories()) + " categories and " + group + " has " + toString(values.size()) + ", please correct."); m->mothurOutEndLine(); m->control_pressed = true; return 0; } for (int i = 0; i < values.size(); i++) { //do we have this value for this category already map::iterator it = totalCategories[i].find(values[i]); if (it == totalCategories[i].end()) { totalCategories[i][values[i]] = 1; } else { totalCategories[i][values[i]]++; } } int count = indexGroupNameMap.size(); indexGroupNameMap[group] = count; designMap.push_back(values); }else { m->mothurOut("[ERROR]: Your design file contains more than 1 group named " + group + ", group names must be unique. Please correct."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "DesignMap", "push_back"); exit(1); } } /************************************************************/ //set values for group, does not need to set all values. assumes group is in table already int DesignMap::setValues(string group, map values) { try { map::iterator it = indexGroupNameMap.find(group); if (it != indexGroupNameMap.end()) { for (map::iterator it2 = values.begin(); it2 != values.end(); it2++) { map::iterator it3 = indexCategoryMap.find(it2->first); //do we have this category if (it3 == indexCategoryMap.end()) { m->mothurOut("[ERROR]: Your design file does not contain a category called " + it2->first + ". 
Please correct."); m->mothurOutEndLine(); m->control_pressed = true; }else { string oldCategory = designMap[it->second][it3->second]; //adjust totals for old category int oldCount = totalCategories[it3->second][oldCategory]; if (oldCount == 1) { totalCategories[it3->second].erase(oldCategory); } else { totalCategories[it3->second][oldCategory]--; } designMap[it->second][it3->second] = it2->second; //reset value //adjust totals for new category map::iterator it4 = totalCategories[it3->second].find(it2->second); if (it4 == totalCategories[it3->second].end()) { totalCategories[it3->second][it2->second] = 1; } else { totalCategories[it3->second][it2->second]++; } } } }else { m->mothurOut("[ERROR]: Your design file does not contain a group named " + group + ". Please correct."); m->mothurOutEndLine(); m->control_pressed = true; } return 0; } catch(exception& e) { m->errorOut(e, "DesignMap", "set"); exit(1); } } /************************************************************/ //set defaultclass void DesignMap::setDefaultClass(string dClass) { try { if (m->inUsersGroups(dClass, namesOfCategories)) { defaultClass = dClass; }else{ m->mothurOut("[WARNING]: Your design file does not contain a category named " + dClass + ". Using default class " + defaultClass + " .\n\n"); } } catch(exception& e) { m->errorOut(e, "DesignMap", "setDefaultClass"); exit(1); } } /************************************************************/ //get number of groups belonging to a category or set of categories, with value or a set of values. Must have all categories and values. Example: // map early, late>, male> would return 1. Only one group is male and from early or late. 
int DesignMap::getNumUnique(map<string, vector<string> > selected) {
    try {
        int num = 0;

        //get indexes of categories
        vector<int> indexes;
        for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) {
            map<string, int>::iterator it2 = indexCategoryMap.find(it->first);
            if (it2 == indexCategoryMap.end()) {
                m->mothurOut("[ERROR]: Your design file does not contain a category named " + it->first + ". Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return 0;
            }else { indexes.push_back(it2->second); }
        }

        for (int i = 0; i < designMap.size(); i++) {
            bool hasAll = true; //innocent til proven guilty
            int count = 0;
            for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) { //loop through each
                //check category to see if this group meets the requirements
                if (!m->inUsersGroups(designMap[i][indexes[count]], it->second)) { hasAll = false; break; } //no need to check the remaining categories
                count++;
            }
            if (hasAll) { num++; }
        }

        return num;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "getNumUnique");
        exit(1);
    }
}
/************************************************************/
//get number of groups belonging to a category or set of categories, with value or a set of values. Must match at least one category and value. Example:
// map<treatment -> early, late>, <sex -> male> would return 3. All three groups are either male or from early or late.
int DesignMap::getNumShared(map<string, vector<string> > selected) {
    try {
        int num = 0;

        //get indexes of categories
        vector<int> indexes;
        for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) {
            map<string, int>::iterator it2 = indexCategoryMap.find(it->first);
            if (it2 == indexCategoryMap.end()) {
                m->mothurOut("[ERROR]: Your design file does not contain a category named " + it->first + ". Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return 0;
            }else { indexes.push_back(it2->second); }
        }

        for (int i = 0; i < designMap.size(); i++) {
            bool hasAny = false; //guilty til proven innocent
            int count = 0;
            for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) { //loop through each
                //check category to see if this group meets the requirements
                if (m->inUsersGroups(designMap[i][indexes[count]], it->second)) { hasAny = true; break; } //one match is enough
                count++;
            }
            if (hasAny) { num++; }
        }

        return num;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "getNumShared");
        exit(1);
    }
}
/************************************************************/
//get names of groups belonging to a category or set of categories, with value or a set of values. Must have all categories and values. Example:
// map<treatment -> early, late>, <sex -> male> would return F000132. F000132 is the only group which is male and from early or late.
vector<string> DesignMap::getNamesUnique(map<string, vector<string> > selected) {
    try {
        vector<string> names;

        //get indexes of categories
        vector<int> indexes;
        for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) {
            map<string, int>::iterator it2 = indexCategoryMap.find(it->first);
            if (it2 == indexCategoryMap.end()) {
                m->mothurOut("[ERROR]: Your design file does not contain a category named " + it->first + ". Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return names;
            }else { indexes.push_back(it2->second); }
        }

        //map int to name
        map<int, string> reverse;
        for (map<string, int>::iterator it = indexGroupNameMap.begin(); it != indexGroupNameMap.end(); it++) {
            reverse[it->second] = it->first;
        }

        for (int i = 0; i < designMap.size(); i++) {
            bool hasAll = true; //innocent til proven guilty
            int count = 0;
            for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) { //loop through each
                //check category to see if this group meets the requirements
                if (!m->inUsersGroups(designMap[i][indexes[count]], it->second)) { hasAll = false; break; } //no need to check the remaining categories
                count++;
            }
            if (hasAll) {
                map<int, string>::iterator it = reverse.find(i);
                if (it == reverse.end()) {
                    m->mothurOut("[ERROR]: should never get here, oops. Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return names;
                }else { names.push_back(it->second); }
            }
        }

        return names;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "getNamesUnique");
        exit(1);
    }
}
/************************************************************/
//get names of groups belonging to a category or set of categories, with value or a set of values. Must match at least one category and value. Example:
// map<treatment -> early, late>, <sex -> male> would return F000132, F000142, F000138. All three groups are either male or from early or late.
vector<string> DesignMap::getNamesShared(map<string, vector<string> > selected) {
    try {
        vector<string> names;

        //get indexes of categories
        vector<int> indexes;
        for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) {
            map<string, int>::iterator it2 = indexCategoryMap.find(it->first);
            if (it2 == indexCategoryMap.end()) {
                m->mothurOut("[ERROR]: Your design file does not contain a category named " + it->first + ". Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return names;
            }else { indexes.push_back(it2->second); }
        }

        //map int to name
        map<int, string> reverse;
        for (map<string, int>::iterator it = indexGroupNameMap.begin(); it != indexGroupNameMap.end(); it++) {
            reverse[it->second] = it->first;
        }

        for (int i = 0; i < designMap.size(); i++) {
            bool hasAny = false;
            int count = 0;
            for (map<string, vector<string> >::iterator it = selected.begin(); it != selected.end(); it++) { //loop through each
                //check category to see if this group meets the requirements
                if (m->inUsersGroups(designMap[i][indexes[count]], it->second)) { hasAny = true; break; }
                count++;
            }
            if (hasAny) {
                map<int, string>::iterator it = reverse.find(i);
                if (it == reverse.end()) {
                    m->mothurOut("[ERROR]: should never get here, oops. Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return names;
                }else { names.push_back(it->second); }
            }
        }

        return names;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "getNamesShared");
        exit(1);
    }
}
/************************************************************/
//get names of groups whose value for the given category equals the value passed in.
vector<string> DesignMap::getNamesGroups(string category, string value) {
    try {
        vector<string> names;

        map<string, int>::iterator it = indexCategoryMap.find(category);
        if (it == indexCategoryMap.end()) {
            m->mothurOut("[ERROR]: category " + category + " is not in your design file. Please correct.\n"); m->control_pressed = true;
        }else {
            int column = it->second;

            //map int to name
            map<int, string> reverse;
            for (map<string, int>::iterator it2 = indexGroupNameMap.begin(); it2 != indexGroupNameMap.end(); it2++) {
                reverse[it2->second] = it2->first;
            }

            for (int i = 0; i < designMap.size(); i++) {
                if (designMap[i][column] == value) {
                    map<int, string>::iterator it2 = reverse.find(i);
                    if (it2 == reverse.end()) {
                        m->mothurOut("[ERROR]: should never get here, oops. Please correct."); m->mothurOutEndLine(); m->control_pressed = true; return names;
                    }else { names.push_back(it2->second); }
                }
            }
        }

        return names;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "getNamesGroups");
        exit(1);
    }
}
/************************************************************/
//assume default category and get names of groups that match any values in vector passed in. Example: <early, late> = F000142, F000132.
vector<string> DesignMap::getNamesGroups(vector<string> sets) {
    try {
        vector<string> names;
        if (sets.size() == 0) { return names; }

        map<string, vector<string> > temp;
        temp[defaultClass] = sets;
        names = getNamesShared(temp);

        return names;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "getNamesGroups");
        exit(1);
    }
}
/************************************************************/
int DesignMap::print(ofstream& out) {
    try {
        out << "group";
        for (int i = 0; i < namesOfCategories.size(); i++) { out << '\t' << namesOfCategories[i]; }
        out << endl;

        map<int, string> reverse; //use this to preserve order
        for (map<string, int>::iterator it = indexGroupNameMap.begin(); it != indexGroupNameMap.end(); it++) { reverse[it->second] = it->first; }

        for (int i = 0; i < designMap.size(); i++) {
            map<int, string>::iterator itR = reverse.find(i);

            if (itR != reverse.end()) { //will equal end if groups were removed because remove just removes from indexGroupNameMap
                out << itR->second;
                for (int j = 0; j < namesOfCategories.size(); j++) { out << '\t' << designMap[i][j]; }
                out << endl;
            }
        }
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "print");
        exit(1);
    }
}
/************************************************************/
//print specific categories
int DesignMap::printCategories(ofstream& out, vector<string> cats) {
    try {
        out << "group";
        for (int i = 0; i < namesOfCategories.size(); i++) {
            if (m->inUsersGroups(namesOfCategories[i], cats)) { out << '\t' << namesOfCategories[i]; }
        }
        out << endl;

        map<int, string> reverse; //use this to preserve order
        for (map<string, int>::iterator it = indexGroupNameMap.begin(); it != indexGroupNameMap.end(); it++) { reverse[it->second] = it->first; }

        for (int i = 0; i < designMap.size(); i++) {
            map<int, string>::iterator itR = reverse.find(i);

            if (itR != reverse.end()) { //will equal end if groups were removed because remove just removes from indexGroupNameMap
                out << itR->second;
                for (int j = 0; j < namesOfCategories.size(); j++) {
                    //test column j (the category), not row i (the group)
                    if (m->inUsersGroups(namesOfCategories[j], cats)) { out << '\t' << designMap[i][j]; }
                }
                out << endl;
            }
        }
        out.close();

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "printCategories");
        exit(1);
    }
}
/************************************************************/
//print specific groups
int DesignMap::printGroups(ofstream& out, vector<string> groups) {
    try {
        int numSelected = 0;

        out << "group";
        for (int i = 0; i < namesOfCategories.size(); i++) { out << '\t' << namesOfCategories[i]; }
        out << endl;

        map<int, string> reverse; //use this to preserve order
        for (map<string, int>::iterator it = indexGroupNameMap.begin(); it != indexGroupNameMap.end(); it++) { reverse[it->second] = it->first; }

        for (int i = 0; i < designMap.size(); i++) {
            map<int, string>::iterator itR = reverse.find(i);

            if (itR != reverse.end()) { //will equal end if groups were removed because remove just removes from indexGroupNameMap
                if (m->inUsersGroups(itR->second, groups)) {
                    out << itR->second;
                    for (int j = 0; j < namesOfCategories.size(); j++) { out << '\t' << designMap[i][j]; }
                    out << endl;
                    numSelected++;
                }
            }
        }
        out.close();

        return numSelected;
    }
    catch(exception& e) {
        m->errorOut(e, "DesignMap", "printGroups");
        exit(1);
    }
}
/************************************************************/
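// Standalone sketch (not part of mothur) of the two selection rules used by
// getNumUnique / getNamesUnique ("all categories must match") and
// getNumShared / getNamesShared ("any category may match"). The names DesignRow,
// Selection, matchesAll, matchesAny and countMatching are hypothetical, and the rows
// are plain maps rather than DesignMap's index-based tables. Unlike DesignMap, an
// unknown category here simply fails to match instead of raising an error.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One design-file row: category name -> value for a single group.
typedef std::map<std::string, std::string> DesignRow;
// A selection: category name -> list of accepted values.
typedef std::map<std::string, std::vector<std::string> > Selection;

// True if value is one of the accepted values (exact, case-sensitive comparison).
inline bool isAccepted(const std::vector<std::string>& accepted, const std::string& value) {
    return std::find(accepted.begin(), accepted.end(), value) != accepted.end();
}

// "Unique" rule: every selected category must hold an accepted value.
inline bool matchesAll(const DesignRow& row, const Selection& selected) {
    for (Selection::const_iterator it = selected.begin(); it != selected.end(); ++it) {
        DesignRow::const_iterator cell = row.find(it->first);
        if (cell == row.end() || !isAccepted(it->second, cell->second)) { return false; }
    }
    return true;
}

// "Shared" rule: at least one selected category holds an accepted value.
inline bool matchesAny(const DesignRow& row, const Selection& selected) {
    for (Selection::const_iterator it = selected.begin(); it != selected.end(); ++it) {
        DesignRow::const_iterator cell = row.find(it->first);
        if (cell != row.end() && isAccepted(it->second, cell->second)) { return true; }
    }
    return false;
}

// Count the rows that satisfy the chosen rule.
inline int countMatching(const std::vector<DesignRow>& rows, const Selection& selected, bool requireAll) {
    int num = 0;
    for (std::size_t i = 0; i < rows.size(); i++) {
        bool hit = requireAll ? matchesAll(rows[i], selected) : matchesAny(rows[i], selected);
        if (hit) { num++; }
    }
    return num;
}
```

// Both loops return as soon as the answer is known, which is the same early exit the
// DesignMap functions get from break.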
mothur-1.36.1/source/datastructures/designmap.h000066400000000000000000000104221255543666200216320ustar00rootroot00000000000000
//
//  designmap.h
//  Mothur
//
//  Created by SarahsWork on 6/17/13.
//  Copyright (c) 2013 Schloss Lab. All rights reserved.
//

#ifndef __Mothur__designmap__
#define __Mothur__designmap__

#include "mothurout.h"

/* This class is a representation of the design file.

 group      treatment   sex     age
 F000142    Early       female  young
 F000132    Late        male    old
 F000138    Mid         male    old

 */

class DesignMap {

public:

    DesignMap() { m = MothurOut::getInstance(); defaultClass = "not found"; }
    DesignMap(string); //reads file as well
    ~DesignMap() {}

    //read designfile name
    int read(string);

    //like groupMap getGroup
    string get(string, string); //groupName, category returns value. example F000132, sex -> male
    string get(string); //groupName, returns default category's value. example F000132, -> late

    //like groupMap getNamesOfGroups
    vector<string> getCategory(string); //categoryName, returns values. example treatment, -> early,late,mid
    vector<string> getCategory(); //returns default category's values. example treatment, -> early,late,mid

    int setValues(string, map<string, string>); //groupName, map<category, value>
    int push_back(string, vector<string>); //groupName, vector of values - assumes you put values in order of getNamesOfCategories

    //refers to header labels
    vector<string> getNamesOfCategories() {
        sort(namesOfCategories.begin(), namesOfCategories.end());
        return namesOfCategories;
    }

    //set default treatment, mothur sets this to column 2.
    void setDefaultClass(string);
    string getDefaultClass() { return defaultClass; }

    //number of treatments / columns in file
    int getNumCategories() { return namesOfCategories.size(); }
    //number of groups / rows in file
    int getNumGroups() { return designMap.size(); }

    //options to select groups based on values
    vector<string> getNamesGroups() { return groups; }
    vector<string> getNamesGroups(string, string); //get names of groups with category and value.
    vector<string> getNamesGroups(vector<string>); //assume default category and get names of groups that match any values in vector passed in. Example: <early, late> = F000142, F000132.

    //options to select - may want to expand on these
    int getNumUnique(map<string, vector<string> >); //get number of groups belonging to a category or set of categories, with value or a set of values. Must have all categories and values. Example:
    // map<treatment -> early, late>, <sex -> male> would return 1. Only one group is male and from early or late.
    int getNumShared(map<string, vector<string> >); //get number of groups belonging to a category or set of categories, with value or a set of values. Must match at least one category and value. Example:
    // map<treatment -> early, late>, <sex -> male> would return 3. All three groups are either male or from early or late.

    vector<string> getNamesUnique(map<string, vector<string> >); //get names of groups belonging to a category or set of categories, with value or a set of values. Must have all categories and values. Example:
    // map<treatment -> early, late>, <sex -> male> would return F000132. F000132 is the only group which is male and from early or late.
    vector<string> getNamesShared(map<string, vector<string> >); //get names of groups belonging to a category or set of categories, with value or a set of values. Must match at least one category and value. Example:
    // map<treatment -> early, late>, <sex -> male> would return F000132, F000142, F000138. All three groups are either male or from early or late.

    int print(ofstream&);
    int printCategories(ofstream&, vector<string>); //print certain categories
    int printGroups(ofstream&, vector<string>); //print certain groups

private:
    string defaultClass;
    MothurOut* m;
    vector< map<string, int> > totalCategories; //for each category, total groups assigned to it.
vector[0] early -> 1, vector[1] male -> 2 vector groups; vector namesOfCategories; vector< vector > designMap; map indexGroupNameMap; //maps groupName to row in values map indexCategoryMap; //maps category to column in values }; #endif /* defined(__Mothur__designmap__) */ mothur-1.36.1/source/datastructures/distancedb.cpp000066400000000000000000000075501255543666200223260ustar00rootroot00000000000000/* * distancedb.cpp * * * Created by Pat Schloss on 12/29/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "database.hpp" #include "sequence.hpp" #include "distancedb.hpp" #include "onegapignore.h" /**************************************************************************************************/ DistanceDB::DistanceDB() : Database() { try { templateAligned = true; templateSeqsLength = 0; distCalculator = new oneGapIgnoreTermGapDist(); } catch(exception& e) { m->errorOut(e, "DistanceDB", "DistanceDB"); exit(1); } } /**************************************************************************************************/ void DistanceDB::addSequence(Sequence seq) { try { //are the template sequences aligned if (!isAligned(seq.getAligned())) { templateAligned = false; m->mothurOut(seq.getName() + " is not aligned. 
Sequences must be aligned to use the distance method."); m->mothurOutEndLine(); } if (templateSeqsLength == 0) { templateSeqsLength = seq.getAligned().length(); } data.push_back(seq); } catch(exception& e) { m->errorOut(e, "DistanceDB", "addSequence"); exit(1); } } /**************************************************************************************************/ //returns indexes to top matches vector DistanceDB::findClosestSequences(Sequence* query, int numWanted){ try { vector topMatches; Scores.clear(); bool templateSameLength = true; string sequence = query->getAligned(); vector dists; searchScore = -1.0; if (numWanted > data.size()){ m->mothurOut("numwanted is larger than the number of template sequences, using "+ toString(data.size()) + "."); m->mothurOutEndLine(); numWanted = data.size(); } if (sequence.length() != templateSeqsLength) { templateSameLength = false; } if (templateSameLength && templateAligned) { if (numWanted != 1) { dists.resize(data.size()); //calc distance from this sequence to every sequence in the template for (int i = 0; i < data.size(); i++) { distCalculator->calcDist(*query, data[i]); float dist = distCalculator->getDist(); //save distance to each template sequence dists[i].seq1 = -1; dists[i].seq2 = i; dists[i].dist = dist; } sort(dists.begin(), dists.end(), compareSequenceDistance); //sorts by distance lowest to highest //save distance of best match searchScore = dists[0].dist; //fill topmatches with numwanted closest sequences indexes for (int i = 0; i < numWanted; i++) { topMatches.push_back(dists[i].seq2); Scores.push_back(dists[i].dist); } }else { int bestIndex = 0; float smallDist = 100000; for (int i = 0; i < data.size(); i++) { distCalculator->calcDist(*query, data[i]); float dist = distCalculator->getDist(); //are you smaller? 
                    if (dist < smallDist) {
                        bestIndex = i;
                        smallDist = dist;
                    }
                }
                searchScore = smallDist;
                topMatches.push_back(bestIndex);
                Scores.push_back(smallDist);
            }
        }else{
            m->mothurOut("cannot find closest matches using distance method for " + query->getName() + " without aligned template sequences of the same length."); m->mothurOutEndLine();
            exit(1);
        }

        return topMatches;
    }
    catch(exception& e) {
        m->errorOut(e, "DistanceDB", "findClosestSequences");
        exit(1);
    }
}
/**************************************************************************************************/
bool DistanceDB::isAligned(string seq){
    try {
        bool aligned;

        //find_first_of returns an unsigned position, so compare it to npos without narrowing to int
        string::size_type pos = seq.find_first_of(".-");

        if (pos != string::npos) {
            aligned = true;
        }else {
            aligned = false;
        }

        return aligned;
    }
    catch(exception& e) {
        m->errorOut(e, "DistanceDB", "isAligned");
        exit(1);
    }
}
/**************************************************************************************************/
mothur-1.36.1/source/datastructures/distancedb.hpp000066400000000000000000000014361255543666200223300ustar00rootroot00000000000000
#ifndef DISTANCEDB_HPP
#define DISTANCEDB_HPP

/*
 *  distancedb.hpp
 *
 *
 *  Created by westcott on 1/27/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "dist.h"

class DistanceDB : public Database {

public:
    DistanceDB();
    ~DistanceDB() { delete distCalculator; }

    void generateDB() {} //doesn't generate a search db
    void addSequence(Sequence);
    string getName(int i) { return data[i].getName(); }
    vector<int> findClosestSequences(Sequence*, int); // returns indexes of n closest sequences to query

    #ifdef USE_MPI
    int MPISend(int) {return 0;}
    int MPIRecv(int) {return 0;}
    #endif

private:
    vector<Sequence> data;
    Dist* distCalculator;

    int templateSeqsLength;
    bool templateAligned;

    bool isAligned(string);
};

#endif
mothur-1.36.1/source/datastructures/fastamap.cpp000066400000000000000000000132241255543666200220150ustar00rootroot00000000000000
/*
 *  fastamap.cpp
 *  mothur
 *
 *  Created by Sarah Westcott on 1/16/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "fastamap.h"
#include "sequence.hpp"

/*******************************************************************************/
void FastaMap::readFastaFile(string inFileName) {
    try {
        ifstream in;
        m->openInputFile(inFileName, in);
        string name, sequence, line;
        sequence = "";
        string temp;
        map<string, string>::iterator itName;

        while(!in.eof()){
            if (m->control_pressed) { break; }

            Sequence currSeq(in);
            name = currSeq.getName();

            if (name != "") {
                if(currSeq.getIsAligned()) { sequence = currSeq.getAligned(); }
                else { sequence = currSeq.getUnaligned(); }

                itName = seqmap.find(name);
                if (itName == seqmap.end()) { seqmap[name] = sequence; }
                else { m->mothurOut("You already have a sequence named " + name + ", sequence names must be unique, please correct."); m->mothurOutEndLine(); }

                map<string, group>::iterator it = data.find(sequence);
                if (it == data.end()) { //it's unique.
                    data[sequence].groupname = name; //group name will be the name of the first duplicate sequence found.
                    // data[sequence].groupnumber = 1;
                    data[sequence].names = name;
                }else { // it's a duplicate.
                    data[sequence].names += "," + name;
                    // data[sequence].groupnumber++;
                }
            }
            m->gobble(in);
        }
        in.close();
    }
    catch(exception& e) {
        m->errorOut(e, "FastaMap", "readFastaFile");
        exit(1);
    }
}
/*******************************************************************************/
void FastaMap::readFastaFile(string inFastaFile, string oldNameFileName){ //reads a fasta file together with an existing names file

    ifstream oldNameFile;
    m->openInputFile(oldNameFileName, oldNameFile);

    map<string, string> oldNameMap;
    map<string, string>::iterator itName;
    string name, list;
    while(!oldNameFile.eof()){
        if (m->control_pressed) { break; }

        oldNameFile >> name; m->gobble(oldNameFile);
        oldNameFile >> list;
        oldNameMap[name] = list;
        m->gobble(oldNameFile);
    }
    oldNameFile.close();

    ifstream inFASTA;
    m->openInputFile(inFastaFile, inFASTA);
    string sequence;
    while(!inFASTA.eof()){
        if (m->control_pressed) { break; }

        Sequence currSeq(inFASTA);
        name = currSeq.getName();

        if (name != "") {
            if(currSeq.getIsAligned()) { sequence = currSeq.getAligned(); }
            else { sequence = currSeq.getUnaligned(); }

            itName = seqmap.find(name);
            if (itName == seqmap.end()) { seqmap[name] = sequence; }
            else { m->mothurOut("You already have a sequence named " + name + ", sequence names must be unique, please correct."); m->mothurOutEndLine(); }

            seqmap[name] = sequence;
            map<string, group>::iterator it = data.find(sequence);
            if (it == data.end()) { //it's unique.
                data[sequence].groupname = name; //group name will be the name of the first duplicate sequence found.
                // data[sequence].groupnumber = 1;
                data[sequence].names = oldNameMap[name];
            }else { // it's a duplicate.
                data[sequence].names += "," + oldNameMap[name];
                // data[sequence].groupnumber++;
            }
        }
        m->gobble(inFASTA);
    }

    inFASTA.close();
}
/*******************************************************************************/
string FastaMap::getGroupName(string seq) { //pass a sequence, get its group name
    return data[seq].groupname;
}
/*******************************************************************************/
string FastaMap::getNames(string seq) { //pass a sequence, get the string of names in the group separated by ','s.
    return data[seq].names;
}
/*******************************************************************************/
string FastaMap::getSequence(string name) {

    map<string, string>::iterator it = seqmap.find(name);
    if (it == seqmap.end()) { return "not found"; }
    else { return it->second; }

}
/*******************************************************************************/
void FastaMap::push_back(string name, string seq) {

    map<string, group>::iterator it = data.find(seq);
    if (it == data.end()) { //it's unique.
        data[seq].groupname = name; //group name will be the name of the first duplicate sequence found.
        data[seq].names = name;
    }else { // it's a duplicate.
        data[seq].names += "," + name;
    }
    seqmap[name] = seq;
}
/*******************************************************************************/
int FastaMap::sizeUnique(){ //returns the size of data, which is the number of unique sequences
    return data.size();
}
/*******************************************************************************/
void FastaMap::printNamesFile(string outFileName){ //prints data
    try {
        ofstream outFile;
        m->openOutputFile(outFileName, outFile);

        // two column file created with the groupname and then the list of identical sequence names
        for (map<string, group>::iterator it = data.begin(); it != data.end(); it++) {
            if (m->control_pressed) { break; }
            outFile << it->second.groupname << '\t' << it->second.names << endl;
        }
        outFile.close();
    }
    catch(exception& e) {
        m->errorOut(e, "FastaMap", "printNamesFile");
        exit(1);
    }
}
/*******************************************************************************/
void FastaMap::printCondensedFasta(string outFileName){ //prints data
    try {
        ofstream out;
        m->openOutputFile(outFileName, out);
        //creates a fasta file
        for (map<string, group>::iterator it = data.begin(); it != data.end(); it++) {
            if (m->control_pressed) { break; }
            out << ">" << it->second.groupname << endl;
            out << it->first << endl;
        }
        out.close();
    }
    catch(exception& e) {
        m->errorOut(e, "FastaMap", "printCondensedFasta");
        exit(1);
    }
}
/*******************************************************************************/
mothur-1.36.1/source/datastructures/fastamap.h000066400000000000000000000032551255543666200214650ustar00rootroot00000000000000
#ifndef FASTAMAP_H
#define FASTAMAP_H

/*
 *  fastamap.h
 *  mothur
 *
 *  Created by Sarah Westcott on 1/16/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "mothurout.h"

/* This class represents the fasta file. It reads a fasta file and populates the internal data structure "data".
Data is a map where the key is the sequence and the value is a struct containing the sequence's group name
and a comma-separated list of the names of all sequences that share that sequence. */

class FastaMap {

public:
    FastaMap() { m = MothurOut::getInstance(); }
    ~FastaMap() {};

    string getGroupName(string); //pass a sequence, get its group name
    string getNames(string); //pass a sequence, get the string of names in the group separated by ','s.
    void push_back(string, string); //sequencename, sequence
    int sizeUnique(); //returns number of unique sequences
    void printNamesFile(string); //produces a 2 column file with the groupname in the first column and the names in the second column - a names file.
    void printCondensedFasta(string); //produces a fasta file.
    void readFastaFile(string);
    void readFastaFile(string, string);
    string getSequence(string); //pass it a name of a sequence, it returns the sequence.

private:
    struct group {
        string groupname; //the group name for identical sequences, will be set to the first sequence found.
        string names; //the names of the sequences separated by ','.
    };

    map<string, group> data; //sequence, groupinfo - condensed representation of file
    map<string, string> seqmap; //name, sequence - uncondensed representation of file
    MothurOut* m;
};

#endif
mothur-1.36.1/source/datastructures/fastqread.cpp000066400000000000000000000234661255543666200222040ustar00rootroot00000000000000
//
//  fastqread.cpp
//  Mothur
//
//  Created by Sarah Westcott on 1/26/15.
//  Copyright (c) 2015 Schloss Lab. All rights reserved.
//

#include "fastqread.h"

/*******************************************************************************/
FastqRead::FastqRead() {
    try {
        m = MothurOut::getInstance();
        format = "illumina1.8+";
        name = "";
        sequence = "";
        scores.clear();

        //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference.
for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } } catch(exception& e) { m->errorOut(e, "FastqRead", "FastqRead"); exit(1); } } /*******************************************************************************/ FastqRead::FastqRead(string f) { try { m = MothurOut::getInstance(); format = f; name = ""; sequence = ""; scores.clear(); //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference. for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } } catch(exception& e) { m->errorOut(e, "FastqRead", "FastqRead"); exit(1); } } /*******************************************************************************/ FastqRead::FastqRead(string f, string n, string s, vector sc) { try { m = MothurOut::getInstance(); format = f; name = n; sequence = s; scores = sc; //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference. for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } } catch(exception& e) { m->errorOut(e, "FastqRead", "FastqRead"); exit(1); } } /*******************************************************************************/ FastqRead::FastqRead(ifstream& in, bool& ignore, string f) { try { m = MothurOut::getInstance(); ignore = false; format = f; //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference. 
for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } //read sequence name string line = m->getline(in); m->gobble(in); vector pieces = m->splitWhiteSpace(line); name = ""; if (pieces.size() != 0) { name = pieces[0]; } if (name == "") { m->mothurOut("[WARNING]: Blank fasta name, ignoring read."); m->mothurOutEndLine(); ignore=true; } else if (name[0] != '@') { m->mothurOut("[WARNING]: reading " + name + " expected a name with @ as a leading character, ignoring read."); m->mothurOutEndLine(); ignore=true; } else { name = name.substr(1); } //read sequence sequence = m->getline(in); m->gobble(in); if (sequence == "") { m->mothurOut("[WARNING]: missing sequence for " + name + ", ignoring."); ignore=true; } //read sequence name line = m->getline(in); m->gobble(in); pieces = m->splitWhiteSpace(line); string name2 = ""; if (pieces.size() != 0) { name2 = pieces[0]; } if (name2 == "") { m->mothurOut("[WARNING]: expected a name with + as a leading character, ignoring."); ignore=true; } else if (name2[0] != '+') { m->mothurOut("[WARNING]: reading " + name2 + " expected a name with + as a leading character, ignoring."); ignore=true; } else { name2 = name2.substr(1); if (name2 == "") { name2 = name; } } //read quality scores string quality = m->getline(in); m->gobble(in); if (quality == "") { m->mothurOut("[WARNING]: missing quality for " + name2 + ", ignoring."); ignore=true; } //sanity check sequence length and number of quality scores match if (name2 != "") { if (name != name2) { m->mothurOut("[WARNING]: names do not match. read " + name + " for fasta and " + name2 + " for quality, ignoring."); ignore=true; } } if (quality.length() != sequence.length()) { m->mothurOut("[WARNING]: Lengths do not match for sequence " + name + ". 
Read " + toString(sequence.length()) + " characters for fasta and " + toString(quality.length()) + " characters for quality scores, ignoring read."); ignore=true; } scores = convertQual(quality); m->checkName(name); if (m->debug) { m->mothurOut("[DEBUG]: " + name + " " + sequence + " " + quality + "\n"); } } catch(exception& e) { m->errorOut(e, "FastqRead", "FastqRead"); exit(1); } } //********************************************************************************************************************** #ifdef USE_BOOST FastqRead::FastqRead(boost::iostreams::filtering_istream& in, bool& ignore, string f) { try { m = MothurOut::getInstance(); ignore = false; format = f; //fill convert table - goes from solexa to sanger. Used fq_all2std.pl as a reference. for (int i = -64; i < 65; i++) { char temp = (char) ((int)(33 + 10*log(1+pow(10,(i/10.0)))/log(10)+0.499)); convertTable.push_back(temp); } //read sequence name string line = ""; std::getline(in, line); m->gobble(in); vector pieces = m->splitWhiteSpace(line); name = ""; if (pieces.size() != 0) { name = pieces[0]; } if (name == "") { m->mothurOut("[WARNING]: Blank fasta name, ignoring read."); m->mothurOutEndLine(); ignore=true; } else if (name[0] != '@') { m->mothurOut("[WARNING]: reading " + name + " expected a name with @ as a leading character, ignoring read."); m->mothurOutEndLine(); ignore=true; } else { name = name.substr(1); } //read sequence std::getline(in, sequence); m->gobble(in); if (sequence == "") { m->mothurOut("[WARNING]: missing sequence for " + name + ", ignoring."); ignore=true; } //read sequence name line = ""; std::getline(in, line); m->gobble(in); pieces = m->splitWhiteSpace(line); string name2 = ""; if (pieces.size() != 0) { name2 = pieces[0]; } if (name2 == "") { m->mothurOut("[WARNING]: expected a name with + as a leading character, ignoring."); ignore=true; } else if (name2[0] != '+') { m->mothurOut("[WARNING]: reading " + name2 + " expected a name with + as a leading character, ignoring."); 
ignore=true; } else { name2 = name2.substr(1); if (name2 == "") { name2 = name; } } //read quality scores string quality = ""; std::getline(in, quality); m->gobble(in); if (quality == "") { m->mothurOut("[WARNING]: missing quality for " + name2 + ", ignoring."); ignore=true; } //sanity check sequence length and number of quality scores match if (name2 != "") { if (name != name2) { m->mothurOut("[WARNING]: names do not match. read " + name + " for fasta and " + name2 + " for quality, ignoring."); ignore=true; } } if (quality.length() != sequence.length()) { m->mothurOut("[WARNING]: Lengths do not match for sequence " + name + ". Read " + toString(sequence.length()) + " characters for fasta and " + toString(quality.length()) + " characters for quality scores, ignoring read."); ignore=true; } scores = convertQual(quality); m->checkName(name); if (m->debug) { m->mothurOut("[DEBUG]: " + name + " " + sequence + " " + quality + "\n"); } } catch(exception& e) { m->errorOut(e, "FastqRead", "FastqRead"); exit(1); } } #endif //********************************************************************************************************************** vector FastqRead::convertQual(string qual) { try { vector qualScores; bool negativeScores = false; for (int i = 0; i < qual.length(); i++) { int temp = 0; temp = int(qual[i]); if (format == "illumina") { temp -= 64; //char '@' }else if (format == "illumina1.8+") { temp -= int('!'); //char '!' //33 }else if (format == "solexa") { temp = int(convertTable[temp]); //convert to sanger temp -= int('!'); //char '!' //33 }else { temp -= int('!'); //char '!' //33 } if (temp < 0) { negativeScores = true; temp = 0; } qualScores.push_back(temp); } if (negativeScores) { m->mothurOut("[ERROR]: finding negative quality scores, do you have the right format selected? 
http://en.wikipedia.org/wiki/FASTQ_format#Encoding \n");
            m->control_pressed = true;
        }

        return qualScores;
    }
    catch(exception& e) {
        m->errorOut(e, "FastqRead", "convertQual");
        exit(1);
    }
}
//**********************************************************************************************************************
Sequence FastqRead::getSequence() {
    try {
        Sequence temp(name, sequence);
        return temp;
    }
    catch(exception& e) {
        m->errorOut(e, "FastqRead", "getSequence");
        exit(1);
    }
}
//**********************************************************************************************************************
QualityScores FastqRead::getQuality() {
    try {
        QualityScores temp(name, scores);
        return temp;
    }
    catch(exception& e) {
        m->errorOut(e, "FastqRead", "getQuality");
        exit(1);
    }
}
/*******************************************************************************/
mothur-1.36.1/source/datastructures/fastqread.h000066400000000000000000000032521255543666200216400ustar00rootroot00000000000000
//
//  fastqread.h
//  Mothur
//
//  Created by Sarah Westcott on 1/26/15.
//  Copyright (c) 2015 Schloss Lab. All rights reserved.
//

#ifndef Mothur_fastqread_h
#define Mothur_fastqread_h

#include "mothur.h"
#include "mothurout.h"
#include "sequence.hpp"
#include "qualityscores.h"

/* This class is a representation of a fastqread. If no format is given, defaults to illumina1.8+.
@M00704:50:000000000-A3G0K:1:1101:15777:1541 2:N:0:0 NCTCTACCAGGCCAAGCATAATGGGCGGGATCGTATCGAAGTAGCCTTGATGGGTAAGGTTGCCTGAGTTTCACAAGACAGATTACAGAGGTCGTCTATGCCCTGTCTCTTATACACATCTGACGCTGCCGACGAATAGAGAGGTGTAGATCTCGGTGGTCGCCGTATCATTAAAAAAAATATCGTCTAGGCCATGTGTGACGCTCGGTCTGGGCTTCACGAACAGGGGGTCCGCCATGTACCGCGCGCTC + #>>3AAFFFBAAFAGGFFFFGFHHHGGGG0EFGFHHFGHBFFGFDGHFGEGFFEBEGFCBFGFGFF2F4B3EGFHHHEHEHGHHH3FGHFG3BEEFHHHGGEGHFFHHEFGHHFHFHHF1B?FFD/AD/FC/<@D-.FGBF1<<<<< sc); FastqRead(ifstream&, bool&, string f); #ifdef USE_BOOST FastqRead(boost::iostreams::filtering_istream&, bool&, string f); #endif ~FastqRead() {} string getName() { return name; } void setName(string n) { name = n; } string getSeq() { return sequence; } vector getScores() { return scores; } Sequence getSequence(); QualityScores getQuality(); private: MothurOut* m; vector scores; string name; string sequence; string format; vector convertTable; vector convertQual(string qual); }; #endif mothur-1.36.1/source/datastructures/flowdata.cpp000066400000000000000000000171531255543666200220270ustar00rootroot00000000000000/* * flowdata.cpp * Mothur * * Created by Pat Schloss on 12/22/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "flowdata.h" //********************************************************************************************************************** FlowData::FlowData(){} //********************************************************************************************************************** FlowData::~FlowData(){ /* do nothing */ } //********************************************************************************************************************** FlowData::FlowData(int numFlows, float signal, float noise, int maxHomoP, string baseFlow) : numFlows(numFlows), signalIntensity(signal), noiseIntensity(noise), maxHomoP(maxHomoP), baseFlow(baseFlow){ try { m = MothurOut::getInstance(); flowData.assign(numFlows, 0); // baseFlow = "TACG"; seqName = ""; locationString = ""; } catch(exception& e) { m->errorOut(e, "FlowData", "FlowData"); exit(1); } } //********************************************************************************************************************** bool FlowData::getNext(ifstream& flowFile){ try { seqName = getSequenceName(flowFile); if (m->debug) { m->mothurOut("[DEBUG]: flow = " + seqName + " "); } flowFile >> endFlow; if (m->debug) { m->mothurOut(toString(endFlow) + " "); } if (!m->control_pressed) { if (m->debug) { m->mothurOut(" "); } for(int i=0;i> flowData[i]; if (m->debug) { m->mothurOut(toString(flowData[i]) + " "); } } if (m->debug) { m->mothurOut("\n"); } updateEndFlow(); translateFlow(); m->gobble(flowFile); } if(flowFile){ return 1; } else { return 0; } } catch(exception& e) { m->errorOut(e, "FlowData", "getNext"); exit(1); } } //******************************************************************************************************************** string FlowData::getSequenceName(ifstream& flowFile) { try { string name = ""; flowFile >> name; if (name.length() != 0) { m->checkName(name); }else{ m->mothurOut("Error in reading your flowfile, at position " + toString(flowFile.tellg()) + ". 
Blank name."); m->mothurOutEndLine(); m->control_pressed = true; } return name; } catch(exception& e) { m->errorOut(e, "FlowData", "getSequenceName"); exit(1); } } //********************************************************************************************************************** void FlowData::updateEndFlow(){ try{ if (baseFlow.length() > 4) { return; } //int currLength = 0; float maxIntensity = (float) maxHomoP + 0.49; int deadSpot = 0; while(deadSpot < endFlow){ int signal = 0; int noise = 0; for(int i=0;i signalIntensity){ signal++; if(intensity < noiseIntensity || intensity > maxIntensity){ noise++; } } } if(noise > 0 || signal == 0){ break; } deadSpot += baseFlow.length(); } endFlow = deadSpot; } catch(exception& e) { m->errorOut(e, "FlowData", "findDeadSpot"); exit(1); } } //********************************************************************************************************************** //TATGCT //1 0 0 0 0 1 //then the second positive flow is for a T, but you saw a T between the last and previous flow adn it wasn't positive, so something is missing //Becomes TNT void FlowData::translateFlow(){ try{ sequence = ""; set charInMiddle; int oldspot = -1; bool updateOld = false; for(int i=0;i= 1) { if (oldspot == -1) { updateOld = true; } else { //check for bases inbetween two 1's if (charInMiddle.count(base) != 0) { //we want to covert to an N sequence = sequence.substr(0, oldspot+1); sequence += 'N'; } updateOld = true; charInMiddle.clear(); } } for(int j=0;j 4){ sequence = sequence.substr(4); } else{ sequence = "NNNN"; } } catch(exception& e) { m->errorOut(e, "FlowData", "translateFlow"); exit(1); } } //********************************************************************************************************************** void FlowData::capFlows(int mF){ try{ maxFlows = mF; if(endFlow > maxFlows){ endFlow = maxFlows; } translateFlow(); } catch(exception& e) { m->errorOut(e, "FlowData", "capFlows"); exit(1); } } 
//********************************************************************************************************************** bool FlowData::hasGoodHomoP(){ try{ float maxIntensity = (float) maxHomoP + 0.49; for(int i=0;i maxIntensity){ return 0; } } return 1; } catch(exception& e) { m->errorOut(e, "FlowData", "hasMinFlows"); exit(1); } } //********************************************************************************************************************** bool FlowData::hasMinFlows(int minFlows){ try{ bool pastMin = 0; if(endFlow >= minFlows){ pastMin = 1; } return pastMin; } catch(exception& e) { m->errorOut(e, "FlowData", "hasMinFlows"); exit(1); } } //********************************************************************************************************************** Sequence FlowData::getSequence(){ try{ return Sequence(seqName, sequence); } catch(exception& e) { m->errorOut(e, "FlowData", "getSequence"); exit(1); } } //********************************************************************************************************************** void FlowData::printFlows(ofstream& outFlowFile){ try{ // outFlowFile << '>' << seqName << locationString << " length=" << seqLength << " numflows=" << maxFlows << endl; outFlowFile << seqName << ' ' << endFlow << ' ' << setprecision(2); for(int i=0;ierrorOut(e, "FlowData", "printFlows"); exit(1); } } //********************************************************************************************************************** void FlowData::printFlows(ofstream& outFlowFile, string scrapCode){ try{ outFlowFile << seqName << '|' << scrapCode << ' ' << endFlow << ' ' << setprecision(2); for(int i=0;ierrorOut(e, "FlowData", "printFlows"); exit(1); } } //********************************************************************************************************************** string FlowData::getName(){ try{ return seqName; } catch(exception& e) { m->errorOut(e, "FlowData", "getName"); exit(1); } } 
//********************************************************************************************************************** mothur-1.36.1/source/datastructures/flowdata.h000066400000000000000000000015011255543666200214620ustar00rootroot00000000000000#ifndef FLOWDATA_H #define FLOWDATA_H /* * flowdata.h * Mothur * * Created by Pat Schloss on 12/22/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "mothurout.h" #include "sequence.hpp" class FlowData { public: FlowData(); FlowData(int, float, float, int, string); ~FlowData(); bool getNext(ifstream&); string getName(); void capFlows(int); bool hasMinFlows(int); bool hasGoodHomoP(); Sequence getSequence(); void printFlows(ofstream&); void printFlows(ofstream&, string); private: MothurOut* m; void updateEndFlow(); void translateFlow(); float signalIntensity, noiseIntensity; int maxHomoP; string seqName, locationString, sequence, baseFlow; int numFlows, maxFlows, endFlow; vector flowData; string getSequenceName(ifstream&); }; #endif mothur-1.36.1/source/datastructures/fullmatrix.cpp000066400000000000000000000154021255543666200224100ustar00rootroot00000000000000/* * fullmatrix.cpp * Mothur * * Created by Sarah Westcott on 3/6/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "fullmatrix.h" /**************************************************************************/ //This constructor reads a distance matrix file and stores the data in the matrix. 
FullMatrix::FullMatrix(ifstream& filehandle, GroupMap* g, bool s) : groupmap(g), sim(s) { try{ m = MothurOut::getInstance(); string name, group; filehandle >> numSeqs >> name; //make the matrix filled with zeros matrix.resize(numSeqs); for(int i = 0; i < numSeqs; i++) { matrix[i].resize(numSeqs, 0.0); } group = groupmap->getGroup(name); if(group == "not found") { m->mothurOut("Error: Sequence '" + name + "' was not found in the group file, please correct."); m->mothurOutEndLine(); exit(1); } index.resize(numSeqs); index[0].seqName = name; index[0].groupName = group; //determine if matrix is square or lower triangle //if it is square read the distances for the first sequence char d; bool square; while((d=filehandle.get()) != EOF){ //is d a number meaning its square if(isalnum(d)){ square = true; filehandle.putback(d); for(int i=0;i> matrix[0][i]; if (sim) { matrix[0][i] = 1.0 - matrix[0][i]; } } break; } //is d a line return meaning its lower triangle if(d == '\n'){ square = false; break; } } //read rest of matrix if (square == true) { readSquareMatrix(filehandle); } else { readLTMatrix(filehandle); } filehandle.close(); if (!m->control_pressed) { sortGroups(0, numSeqs-1); } } catch(exception& e) { m->errorOut(e, "FullMatrix", "FullMatrix"); exit(1); } } /**************************************************************************/ int FullMatrix::readSquareMatrix(ifstream& filehandle) { try { Progress* reading; reading = new Progress("Reading matrix: ", numSeqs * numSeqs); int count = 0; string group, name; for(int i=1;i> name; group = groupmap->getGroup(name); index[i].seqName = name; index[i].groupName = group; if(group == "not found") { m->mothurOut("Error: Sequence '" + name + "' was not found in the group file, please correct."); m->mothurOutEndLine(); exit(1); } for(int j=0;jcontrol_pressed) { delete reading; return 0; } filehandle >> matrix[i][j]; if (sim) { matrix[i][j] = 1.0 - matrix[i][j]; } count++; reading->update(count); } } if (m->control_pressed) { 
delete reading; return 0; } reading->finish(); delete reading; return 0; } catch(exception& e) { m->errorOut(e, "FullMatrix", "readSquareMatrix"); exit(1); } } /**************************************************************************/ int FullMatrix::readLTMatrix(ifstream& filehandle) { try { Progress* reading; reading = new Progress("Reading matrix: ", numSeqs * (numSeqs - 1) / 2); int count = 0; float distance; string group, name; for(int i=1;i> name; group = groupmap->getGroup(name); index[i].seqName = name; index[i].groupName = group; if(group == "not found") { m->mothurOut("Error: Sequence '" + name + "' was not found in the group file, please correct."); m->mothurOutEndLine(); exit(1); } for(int j=0;jcontrol_pressed) { delete reading; return 0; } filehandle >> distance; if (sim) { distance = 1.0 - distance; } matrix[i][j] = distance; matrix[j][i] = distance; count++; reading->update(count); } } if (m->control_pressed) { delete reading; return 0; } reading->finish(); delete reading; return 0; } catch(exception& e) { m->errorOut(e, "FullMatrix", "readLTMatrix"); exit(1); } } /**************************************************************************/ void FullMatrix::sortGroups(int low, int high){ try{ if (low < high) { int i = low+1; int j = high; int pivot = (low+high) / 2; swapRows(low, pivot); //puts pivot in final spot /* compare value */ //what group does this row belong to string key = index[low].groupName; /* partition */ while(i <= j) { /* find member above ... */ while((i <= high) && (index[i].groupName <= key)) { i++; } /* find element below ... 
*/ while((j >= low) && (index[j].groupName > key)) { j--; } if(i < j) { swapRows(i, j); } } swapRows(low, j); /* recurse */ sortGroups(low, j-1); sortGroups(j+1, high); } } catch(exception& e) { m->errorOut(e, "FullMatrix", "sortGroups"); exit(1); } } /**************************************************************************/ void FullMatrix::swapRows(int i, int j) { try { float y; string z, name; /* swap rows*/ for (int h = 0; h < numSeqs; h++) { y = matrix[i][h]; matrix[i][h] = matrix[j][h]; matrix[j][h] = y; } /* swap columns*/ for (int b = 0; b < numSeqs; b++) { y = matrix[b][i]; matrix[b][i] = matrix[b][j]; matrix[b][j] = y; } //swap map elements z = index[i].groupName; index[i].groupName = index[j].groupName; index[j].groupName = z; name = index[i].seqName; index[i].seqName = index[j].seqName; index[j].seqName = name; } catch(exception& e) { m->errorOut(e, "FullMatrix", "swapRows"); exit(1); } } /**************************************************************************/ float FullMatrix::get(int i, int j){ return matrix[i][j]; } /**************************************************************************/ vector FullMatrix::getGroups(){ return groups; } /**************************************************************************/ vector FullMatrix::getSizes(){ return sizes; } /**************************************************************************/ int FullMatrix::getNumGroups(){ return groups.size(); } /**************************************************************************/ int FullMatrix::getNumSeqs(){ return numSeqs; } /**************************************************************************/ void FullMatrix::printMatrix(ostream& out) { try{ for (int i = 0; i < numSeqs; i++) { out << "row " << i << " group = " << index[i].groupName << " name = " << index[i].seqName << endl; for (int j = 0; j < numSeqs; j++) { out << i << '\t' << j << '\t' << matrix[i][j] << endl; } out << endl; } for (int i = 0; i < numSeqs; i++) { out << i << '\t' << 
            index[i].seqName << endl;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "FullMatrix", "printMatrix");
        exit(1);
    }
}
/**************************************************************************/
mothur-1.36.1/source/datastructures/fullmatrix.h000066400000000000000000000025321255543666200220550ustar00rootroot00000000000000
#ifndef FULLMATRIX_H
#define FULLMATRIX_H

/*
 * fullmatrix.h
 * Mothur
 *
 * Created by Sarah Westcott on 3/6/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "groupmap.h"
#include "progress.hpp"

struct Names {
    string seqName;
    string groupName;
};

class FullMatrix {

public:
    //FullMatrix(){ m = MothurOut::getInstance(); }
    FullMatrix(ifstream&, GroupMap*, bool);
    ~FullMatrix(){};

    int getNumSeqs();
    vector<int> getSizes();
    vector<string> getGroups();
    void setGroups(vector<string> names) { groups = names; }
    void setSizes(vector<int> s) { sizes = s; }
    int getNumGroups();
    void printMatrix(ostream&);
    float get(int, int);
    Names getRowInfo(int row) { return index[row]; }

private:
    vector< vector<float> > matrix;  //a 2D distance matrix of all the sequences and their distances to each other.
    int readSquareMatrix(ifstream&);
    int readLTMatrix(ifstream&);
    vector<Names> index;  // row in vector, sequence group. need to know this so when we sort it can be updated.
    vector<int> sizes;
    vector<string> groups;

    void sortGroups(int, int);  //this function sorts the sequences within the matrix.
    void swapRows(int, int);

    GroupMap* groupmap;  //maps sequences to groups they belong to.
    int numSeqs;
    int numGroups;
    int numUserGroups;
    bool sim;
    MothurOut* m;
};

#endif
mothur-1.36.1/source/datastructures/groupmap.cpp000066400000000000000000000433651255543666200220640ustar00rootroot00000000000000
/*
 * groupmap.cpp
 * Dotur
 *
 * Created by Sarah Westcott on 12/1/08.
 * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "groupmap.h" /************************************************************/ GroupMap::GroupMap(string filename) { m = MothurOut::getInstance(); groupFileName = filename; m->openInputFile(filename, fileHandle); index = 0; } /************************************************************/ GroupMap::~GroupMap(){} /************************************************************/ int GroupMap::readMap() { try { string seqName, seqGroup; int error = 0; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; while (!fileHandle.eof()) { if (m->control_pressed) { fileHandle.close(); return 1; } fileHandle.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, fileHandle.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your groupfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. 
Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } fileHandle.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your groupfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } m->setAllGroups(namesOfGroups); return error; } catch(exception& e) { m->errorOut(e, "GroupMap", "readMap"); exit(1); } } /************************************************************/ int GroupMap::readDesignMap() { try { string seqName, seqGroup; int error = 0; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; while (!fileHandle.eof()) { if (m->control_pressed) { fileHandle.close(); return 1; } fileHandle.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, fileHandle.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your designfile contains more than 1 sequence named " + seqName + ", 
sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } fileHandle.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your designfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } m->setAllGroups(namesOfGroups); return error; } catch(exception& e) { m->errorOut(e, "GroupMap", "readDesignMap"); exit(1); } } /************************************************************/ int GroupMap::readMap(string filename) { try { groupFileName = filename; m->openInputFile(filename, fileHandle); index = 0; string seqName, seqGroup; int error = 0; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; while (!fileHandle.eof()) { if (m->control_pressed) { fileHandle.close(); return 1; } fileHandle.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, fileHandle.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if 
(it != groupmap.end()) { error = 1; m->mothurOut("Your group file contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } fileHandle.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your group file contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } m->setAllGroups(namesOfGroups); return error; } catch(exception& e) { m->errorOut(e, "GroupMap", "readMap"); exit(1); } } /************************************************************/ int GroupMap::readDesignMap(string filename) { try { groupFileName = filename; m->openInputFile(filename, fileHandle); index = 0; string seqName, seqGroup; int error = 0; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; while (!fileHandle.eof()) { if (m->control_pressed) { fileHandle.close(); return 1; } fileHandle.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, fileHandle.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { 
m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your designfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } fileHandle.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "', group = '" + seqGroup + "'\n"); } m->checkName(seqName); it = groupmap.find(seqName); if (it != groupmap.end()) { error = 1; m->mothurOut("Your designfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { groupmap[seqName] = seqGroup; //store data in map seqsPerGroup[seqGroup]++; //increment number of seqs in that group } pairDone = false; } } } m->setAllGroups(namesOfGroups); return error; } catch(exception& e) { m->errorOut(e, "GroupMap", "readDesignMap"); exit(1); } } /************************************************************/ int GroupMap::getNumGroups() { return namesOfGroups.size(); } /************************************************************/ string GroupMap::getGroup(string sequenceName) { it = groupmap.find(sequenceName); if (it != groupmap.end()) { //sequence name was in group file return it->second; }else { //look for it in names of groups to see if the user accidently used the wrong file if (m->inUsersGroups(sequenceName, namesOfGroups)) { m->mothurOut("[WARNING]: Your group or design file contains a group named " + sequenceName + ". 
Perhaps you used a group file instead of a design file? A common cause of this is using a tree file that relates your groups (created by the tree.shared command) with a group file that assigns sequences to a group.");
            m->mothurOutEndLine();
        }
        return "not found";
    }
}
/************************************************************/
void GroupMap::setGroup(string sequenceName, string groupN) {
    setNamesOfGroups(groupN);
    m->checkName(sequenceName);
    it = groupmap.find(sequenceName);

    if (it != groupmap.end()) {
        m->mothurOut("Your groupfile contains more than 1 sequence named " + sequenceName + ", sequence names must be unique. Please correct.");
        m->mothurOutEndLine();
    } else {
        groupmap[sequenceName] = groupN;  //store data in map
        seqsPerGroup[groupN]++;  //increment number of seqs in that group
    }
}
/************************************************************/
void GroupMap::setNamesOfGroups(string seqGroup) {
    int i, count;
    count = 0;
    for (i=0; ierrorOut(e, "GroupMap", "isValidGroup"); exit(1); } }
/************************************************************/
int GroupMap::getCopy(GroupMap* g) {
    try {
        vector<string> names = g->getNamesSeqs();
        for (int i = 0; i < names.size(); i++) {
            if (m->control_pressed) { break; }
            string group = g->getGroup(names[i]);
            setGroup(names[i], group);
        }
        return names.size();
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "getCopy");
        exit(1);
    }
}
/************************************************************/
int GroupMap::getNumSeqs(string group) {
    try {
        map<string, int>::iterator itNum;

        itNum = seqsPerGroup.find(group);

        if (itNum == seqsPerGroup.end()) { return 0; }

        return seqsPerGroup[group];
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "getNumSeqs");
        exit(1);
    }
}
/************************************************************/
int GroupMap::renameSeq(string oldName, string newName) {
    try {
        map<string, string>::iterator itName;

        itName = groupmap.find(oldName);

        if (itName == groupmap.end()) {
            m->mothurOut("[ERROR]: cannot find " + toString(oldName) + " in group file");
            m->control_pressed = true;
            return 0;
        }else {
            string group = itName->second;
            groupmap.erase(itName);
            groupmap[newName] = group;
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "renameSeq");
        exit(1);
    }
}
/************************************************************/
int GroupMap::print(ofstream& out) {
    try {
        for (map<string, string>::iterator itName = groupmap.begin(); itName != groupmap.end(); itName++) {
            out << itName->first << '\t' << itName->second << endl;
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "print");
        exit(1);
    }
}
/************************************************************/
int GroupMap::print(ofstream& out, vector<string> userGroups) {
    try {
        for (map<string, string>::iterator itName = groupmap.begin(); itName != groupmap.end(); itName++) {
            if (m->inUsersGroups(itName->second, userGroups)) {
                out << itName->first << '\t' << itName->second << endl;
            }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "print");
        exit(1);
    }
}
/************************************************************/
vector<string> GroupMap::getNamesSeqs(){
    try {
        vector<string> names;

        for (it = groupmap.begin(); it != groupmap.end(); it++) {
            names.push_back(it->first);
        }

        return names;
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "getNamesSeqs");
        exit(1);
    }
}
/************************************************************/
vector<string> GroupMap::getNamesSeqs(vector<string> picked){
    try {
        vector<string> names;

        for (it = groupmap.begin(); it != groupmap.end(); it++) {
            //if you belong to one of the groups in the picked vector, add you
            if (m->inUsersGroups(it->second, picked)) {
                names.push_back(it->first);
            }
        }

        return names;
    }
    catch(exception& e) {
        m->errorOut(e, "GroupMap", "getNamesSeqs");
        exit(1);
    }
}
/************************************************************/
mothur-1.36.1/source/datastructures/groupmap.h000066400000000000000000000035671255543666200215310ustar00rootroot00000000000000
#ifndef GROUPMAP_H
#define GROUPMAP_H

/*
 * groupmap.h
 * Mothur
 *
 * Created by Sarah Westcott on 12/1/08.
 * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "mothurout.h"

/* This class is a representation of the groupfile. It is used by all the shared commands to determine what group a
   certain sequence belongs to. */

class GroupMap {
public:
    GroupMap() { m = MothurOut::getInstance(); }
    GroupMap(string);
    ~GroupMap();

    int readMap();
    int readMap(string);
    int readDesignMap();
    int readDesignMap(string);
    int getNumGroups();
    bool isValidGroup(string);  //return true if string is a valid group
    string getGroup(string);
    void setGroup(string, string);
    vector<string> getNamesOfGroups() {
        sort(namesOfGroups.begin(), namesOfGroups.end());
        groupIndex.clear();
        for (int i = 0; i < namesOfGroups.size(); i++) { groupIndex[namesOfGroups[i]] = i; }
        return namesOfGroups;
    }
    vector<string> getNamesSeqs();
    void setNamesOfGroups(vector<string> sn) { namesOfGroups = sn; }
    int getNumSeqs() { return groupmap.size(); }
    vector<string> getNamesSeqs(vector<string>);  //get names of seqs belonging to a group or set of groups
    int getNumSeqs(string);  //return the number of seqs in a given group
    int getCopy(GroupMap*);
    int renameSeq(string, string);
    int print(ofstream&);
    int print(ofstream&, vector<string>);  //print certain groups

    map<string, int> groupIndex;  //groupname, vectorIndex in namesOfGroups. - used by collectdisplays and libshuff commands.

private:
    vector<string> namesOfGroups;
    MothurOut* m;
    ifstream fileHandle;
    string groupFileName;
    int index;
    map<string, string>::iterator it;
    void setNamesOfGroups(string);
    map<string, string> groupmap;  //sequence name and groupname
    map<string, int> seqsPerGroup;  //maps groupname to number of seqs in that group
};

#endif
mothur-1.36.1/source/datastructures/kmer.cpp000066400000000000000000000151571255543666200211660ustar00rootroot00000000000000
/*
 * kmer.cpp
 *
 *
 * Created by Pat Schloss on 12/15/08.
 * Copyright 2008 Patrick D. Schloss. All rights reserved.
* */ #include "kmer.hpp" /**************************************************************************************************/ Kmer::Kmer(int size) : kmerSize(size) { // The constructor sets the size of the kmer int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; // No reason to waste the time of recalculating maxKmer = power4s[kmerSize]+1;// (int)pow(4.,k)+1; // powers of 4 everytime through. We need an // extra kmer if we get a non-ATGCU base } /**************************************************************************************************/ string Kmer::getKmerString(string sequence){ // Calculate kmer for each position in the sequence, count the freq int length = sequence.length(); // of each kmer, and convert it to an ascii character with base '!'. int nKmers = length - kmerSize + 1; // Export the string of characters as a string vector counts(maxKmer, 0); for(int i=0;i > Kmer::getKmerCounts(string sequence){ // Calculate kmer for each position in the sequence, save info in a map int length = sequence.length(); // so you know at each spot in the sequence what kmers were found int nKmers = length - kmerSize + 1; // vector< map > counts; counts.resize(nKmers); // a map kmer counts for each spot map::iterator it; for(int i=0;i T [T] // Base5 = (915 / 4^1) % 4 = 228 % 4 = 0 => A [AT] // Base4 = (915 / 4^2) % 4 = 57 % 4 = 1 => C [CAT] // Base3 = (915 / 4^3) % 4 = 14 % 4 = 2 => G [GCAT] // Base2 = (915 / 4^4) % 4 = 3 % 4 = 3 => T [TGCAT] // Base1 = (915 / 4^5) % 4 = 0 % 4 = 0 => A [ATGCAT] -> this checks out with the previous method int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; string kmer = ""; if(kmerNumber == power4s[kmerSize]){//pow(4.,7)){ // if the kmer number is the same as the maxKmer then it must for(int i=0;i=0;i--){ if(kmerString[i] == 'A') { reverse += 'T'; } else if(kmerString[i] == 'T'){ reverse += 'A'; } else 
if(kmerString[i] == 'G'){ reverse += 'C'; } else if(kmerString[i] == 'C'){ reverse += 'G'; } else { reverse += 'N'; } } int reverseNumber = getKmerNumber(reverse, 0); return reverseNumber; } /**************************************************************************************************/ char Kmer::getASCII(int number) { return (char)(33+number); } // '!' is the first printable char and // has the int value of 33 /**************************************************************************************************/ int Kmer::getNumber(char character) { return ((int)(character-'!')); } // '!' has the value of 33 /**************************************************************************************************/ mothur-1.36.1/source/datastructures/kmer.hpp000066400000000000000000000013651255543666200211670ustar00rootroot00000000000000#ifndef KMER_HPP #define KMER_HPP /* * kmer.hpp * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "mothur.h" /**************************************************************************************************/ class Kmer { public: Kmer(int); ~Kmer() {} string getKmerString(string); int getKmerNumber(string, int); string getKmerBases(int); int getReverseKmerNumber(int); vector< map > getKmerCounts(string sequence); //for use in chimeraCheck private: char getASCII(int); int getNumber(char); int kmerSize; int maxKmer; int nKmers; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/kmeralign.cpp000066400000000000000000000131131255543666200221670ustar00rootroot00000000000000// // kmeralign.cpp // Mothur // // Created by Pat Schloss on 4/6/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. 
// #include "kmeralign.h" #include "kmer.hpp" #include "alignment.hpp" /**************************************************************************************************/ KmerAlign::KmerAlign(int k) : kmerSize(k), kmerLibrary(k), Alignment() { try { int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; //maxKmer = kmerLibrary.getMaxKmer(); maxKmer = power4s[kmerSize]+1;// (int)pow(4.,k)+1; } catch(exception& e) { m->errorOut(e, "KmerAlign", "KmerAlign"); exit(1); } } /**************************************************************************************************/ KmerAlign::~KmerAlign(){ /* do nothing */ } /**************************************************************************************************/ //modelled after pandaseqs kmer align, assemble.c void KmerAlign::align(string A, string B){ try { int aLength = A.length(); int bLength = B.length(); int maxOverlap = aLength; if (bLength < aLength) { maxOverlap = bLength; } maxOverlap -= 2; int nKmersA = A.length() - kmerSize + 1; vector< vector > kmerseen; //set all kmers to unseen kmerseen.resize(maxKmer); //for (int i = 0; i < maxKmer; i++) { kmerseen[i].resize(numKmers, 0); } int kmer; /* Scan forward sequence building k-mers and appending the position to kmerseen[k] */ for(int i=0;i overlaps; for(int i=0;i::iterator it = overlaps.begin(); it != overlaps.end(); it++) { int index = *it; int overlap = index + 2; //2 = minoverlap double probability = calcProb(A, B, overlap); //printf("overlap prob: %i, %f\n", overlap, probability); if (probability > bestProb && overlap >= 2) { bestProb = probability; bestOverlap = overlap; } } //printf("best overlap prob: %i, %f\n", bestOverlap, bestProb); if(bestOverlap != -1){ if((aLength-bestOverlap) > 0){ //add gaps to the start of B int numGaps = (aLength-bestOverlap); B = string(numGaps, '-') + B; for (int i = 0; i < bLength; i++) { BBaseMap[i+numGaps] = i; } for (int i = 0; i < aLength; i++) { ABaseMap[i] = 
i; } }else { for (int i = 0; i < bLength; i++) { BBaseMap[i] = i; } for (int i = 0; i < aLength; i++) { ABaseMap[i] = i; } } } int diff = B.length() - A.length(); if(diff > 0){ A = A + string(diff, '-'); } seqAaln = A; seqBaln = B; pairwiseLength = seqAaln.length(); } catch(exception& e) { m->errorOut(e, "KmerAlign", "align"); exit(1); } } /**************************************************************************************************/ //modelled after pandaseqs kmer align, assemble.c double KmerAlign::calcProb(string A, string B, int overlap){ try { double prob = 0; int aLength = A.length(); int bLength = B.length(); int unknown, match, mismatch; unknown = 0; match = 0; mismatch = 0; for (int i = 0; i < overlap; i++) { int findex = aLength + i - overlap; int rindex = i; if (findex < 0 || rindex < 0 || findex >= aLength || rindex >= bLength) continue; char f = A[findex]; char r = B[rindex]; if ((f == 'N') || (r == 'N')) { unknown++; } else if (r == f) { match++; } else { mismatch++; } } //ln(0.25 * (1 - 2 * 0.36 + 0.36 * 0.36)) double pmatch = -2.278869; //ln((3 * 0.36 - 2 * 0.36 * 0.36) / 18.0) double pmismatch = -3.087848; if (overlap >= aLength && overlap >= bLength) { prob = (-1.38629 * unknown + match * pmatch + mismatch * pmismatch); } else { prob = (-1.38629 * (aLength + bLength - 2 * overlap + unknown) + match * pmatch + mismatch * pmismatch); } return prob; } catch(exception& e) { m->errorOut(e, "KmerAlign", "calcProb"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/datastructures/kmeralign.h000066400000000000000000000016451255543666200216430ustar00rootroot00000000000000#ifndef KMERALIGN_N #define KMERALIGN_N /* * kmeralign.h * * * Created by Pat Schloss on 4/6/14. * Copyright 2014 Patrick D. Schloss. All rights reserved. 
* * This class is an Alignment child class that implements a kmer-based pairwise alignment algorithm * for making contigs of reads without insertions * * */ #include "alignment.hpp" #include "kmer.hpp" # define PHREDMAX 46 # define PHREDCLAMP(x) ((x) > PHREDMAX ? PHREDMAX : ((x) < 0 ? 0 : (x))) /**************************************************************************************************/ class KmerAlign : public Alignment { public: KmerAlign(int); ~KmerAlign(); void align(string, string); private: int kmerSize; int maxKmer; Kmer kmerLibrary; double calcProb(string A, string B, int overlap); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/kmerdb.cpp000066400000000000000000000207501255543666200214670ustar00rootroot00000000000000/* * kmerdb.cpp * * * Created by Pat Schloss on 12/16/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This class is a child class of the Database class, which stores the template sequences as a kmer table and provides * a method of searching the kmer table for the sequence with the most kmers in common with a query sequence. * kmerLocations is the primary storage variable that is a two-dimensional vector where each row represents the * different number of kmers and each column contains the index to sequences that use that kmer. * * Construction of an object of this type will first look for an appropriately named database file and if it is found * then will read in the database file (readKmerDB), otherwise it will generate one and store the data in memory * (generateKmerDB) * * The search method used here is roughly the same as that used in the SimRank program that is found at the * greengenes website. The default kmer size is 7. The speed complexity is between O(L) and O(LN). When I use 7mers * on average a kmer is found in ~100 other sequences with a database of ~5000 sequences. 
If this is the case then the * time would be on the order of O(0.1LN) -> fast. * */ #include "sequence.hpp" #include "kmer.hpp" #include "database.hpp" #include "kmerdb.hpp" /**************************************************************************************************/ KmerDB::KmerDB(string fastaFileName, int kSize) : Database(), kmerSize(kSize) { try { kmerDBName = fastaFileName.substr(0,fastaFileName.find_last_of(".")+1) + char('0'+ kmerSize) + "mer"; int power4s[14] = { 1, 4, 16, 64, 256, 1024, 4096, 16384, 65536, 262144, 1048576, 4194304, 16777216, 67108864 }; count = 0; maxKmer = power4s[kmerSize]; kmerLocations.resize(maxKmer+1); } catch(exception& e) { m->errorOut(e, "KmerDB", "KmerDB"); exit(1); } } /**************************************************************************************************/ KmerDB::KmerDB() : Database() {} /**************************************************************************************************/ KmerDB::~KmerDB(){} /**************************************************************************************************/ vector KmerDB::findClosestSequences(Sequence* candidateSeq, int num){ try { if (num > numSeqs) { m->mothurOut("[WARNING]: you requested " + toString(num) + " closest sequences, but the template only contains " + toString(numSeqs) + ", adjusting."); m->mothurOutEndLine(); num = numSeqs; } vector topMatches; Kmer kmer(kmerSize); searchScore = 0; Scores.clear(); vector matches(numSeqs, 0); // a record of the sequences with shared kmers vector timesKmerFound(kmerLocations.size()+1, 0); // a record of the kmers that we have already found int numKmers = candidateSeq->getNumBases() - kmerSize + 1; for(int i=0;igetUnaligned(), i); // go through the query sequence and get a kmer number if(timesKmerFound[kmerNumber] == 0){ // if we haven't seen it before... 
for(int j=0;j seqMatches; seqMatches.resize(numSeqs); for(int i=0;i bestMatch) { bestIndex = i; bestMatch = matches[i]; } } searchScore = bestMatch; searchScore = 100 * searchScore / (float) numKmers; // return the Sequence object corresponding to the db topMatches.push_back(bestIndex); Scores.push_back(searchScore); } return topMatches; } catch(exception& e) { m->errorOut(e, "KmerDB", "findClosestSequences"); exit(1); } } /**************************************************************************************************/ void KmerDB::generateDB(){ try { ofstream kmerFile; // once we have the kmerLocations folder print it out m->openOutputFile(kmerDBName, kmerFile); // to a file //output version kmerFile << "#" << m->getVersion() << endl; for(int i=0;ierrorOut(e, "KmerDB", "generateDB"); exit(1); } } /**************************************************************************************************/ void KmerDB::addSequence(Sequence seq) { try { Kmer kmer(kmerSize); string unaligned = seq.getUnaligned(); // ...take the unaligned sequence... int numKmers = unaligned.length() - kmerSize + 1; vector seenBefore(maxKmer+1,0); for(int j=0;jerrorOut(e, "KmerDB", "addSequence"); exit(1); } } /**************************************************************************************************/ void KmerDB::readKmerDB(ifstream& kmerDBFile){ try { kmerDBFile.seekg(0); // start at the beginning of the file //read version string line = m->getline(kmerDBFile); m->gobble(kmerDBFile); string seqName; int seqNumber; for(int i=0;i> seqName >> numValues; for(int j=0;j> seqNumber; // 1. number of sequences with the kmer number kmerLocations[i].push_back(seqNumber); // 2. 
sequence indices } } kmerDBFile.close(); } catch(exception& e) { m->errorOut(e, "KmerDB", "readKmerDB"); exit(1); } } /**************************************************************************************************/ int KmerDB::getCount(int kmer) { try { if (kmer < 0) { return 0; } //if user gives negative number else if (kmer > maxKmer) { return 0; } //or a kmer that is bigger than maxkmer else { return kmerLocations[kmer].size(); } // kmer is in vector range } catch(exception& e) { m->errorOut(e, "KmerDB", "getCount"); exit(1); } } /**************************************************************************************************/ int KmerDB::getReversed(int kmerNumber) { try { Kmer kmer(kmerSize); if (kmerNumber < 0) { return 0; } //if user gives negative number else if (kmerNumber > maxKmer) { return 0; } //or a kmer that is bigger than maxkmer else { return kmer.getReverseKmerNumber(kmerNumber); } // kmer is in vector range } catch(exception& e) { m->errorOut(e, "KmerDB", "getReversed"); exit(1); } } /**************************************************************************************************/ vector KmerDB::getSequencesWithKmer(int kmer) { try { vector seqs; if (kmer < 0) { } //if user gives negative number else if (kmer > maxKmer) { } //or a kmer that is bigger than maxkmer else { seqs = kmerLocations[kmer]; } return seqs; } catch(exception& e) { m->errorOut(e, "KmerDB", "getSequencesWithKmer"); exit(1); } } /**************************************************************************************************/ /**************************************************************************************************/ mothur-1.36.1/source/datastructures/kmerdb.hpp000066400000000000000000000027601255543666200214750ustar00rootroot00000000000000#ifndef KMERDB_HPP #define KMERDB_HPP /* * kmerdb.h * * * Created by Pat Schloss on 12/16/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* * This class is a child class of the Database class, which stores the template sequences as a kmer table and provides * a method of searching the kmer table for the sequence with the most kmers in common with a query sequence. * kmerLocations is the primary storage variable that is a two-dimensional vector where each row represents the * different number of kmers and each column contains the index to sequences that use that kmer. * * Construction of an object of this type will first look for an appropriately named database file and if it is found * then will read in the database file (readKmerDB), otherwise it will generate one and store the data in memory * (generateKmerDB) */ #include "mothur.h" #include "database.hpp" class KmerDB : public Database { public: KmerDB(string, int); KmerDB(); ~KmerDB(); void generateDB(); void addSequence(Sequence); vector findClosestSequences(Sequence*, int); void readKmerDB(ifstream&); int getCount(int); //returns number of sequences with that kmer number vector getSequencesWithKmer(int); //returns vector of sequences that contain kmer passed in int getReversed(int); //returns reverse compliment kmerNumber int getMaxKmer() { return maxKmer; } private: int kmerSize; int maxKmer, count; string kmerDBName; vector > kmerLocations; }; #endif mothur-1.36.1/source/datastructures/listvector.cpp000066400000000000000000000306631255543666200224250ustar00rootroot00000000000000/* * list.cpp * * * Created by Pat Schloss on 8/8/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "sabundvector.hpp" #include "rabundvector.hpp" #include "ordervector.hpp" #include "listvector.hpp" //sorts highest to lowest /***********************************************************************/ inline bool abundNamesSort(string left, string right){ int countLeft = 0; if(left != ""){ countLeft = 1; for(int i=0;i countRight) { return true; } return false; } //sorts highest to lowest /***********************************************************************/ inline bool abundNamesSort2(listCt left, listCt right){ if (left.bin == "") { return false; } if (right.bin == "") { return true; } if (left.binSize > right.binSize) { return true; } return false; } /***********************************************************************/ ListVector::ListVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0){} /***********************************************************************/ ListVector::ListVector(int n): DataVector(), data(n, "") , maxRank(0), numBins(0), numSeqs(0){} /***********************************************************************/ ListVector::ListVector(string id, vector lv) : DataVector(id), data(lv){ try { for(int i=0;igetNumNames(data[i]); numBins = i+1; if(binSize > maxRank) { maxRank = binSize; } numSeqs += binSize; } } } catch(exception& e) { m->errorOut(e, "ListVector", "ListVector"); exit(1); } } /**********************************************************************/ ListVector::ListVector(ifstream& f) : DataVector(), maxRank(0), numBins(0), numSeqs(0) { try { int hold; //are we at the beginning of the file?? 
if (m->saveNextLabel == "") { f >> label; //is this a shared file that has headers if (label == "label") { //gets "numOtus" f >> label; m->gobble(f); //eat rest of line label = m->getline(f); m->gobble(f); //parse labels to save istringstream iStringStream(label); m->listBinLabelsInFile.clear(); while(!iStringStream.eof()){ if (m->control_pressed) { break; } string temp; iStringStream >> temp; m->gobble(iStringStream); m->listBinLabelsInFile.push_back(temp); } f >> label >> hold; }else { //read in first row f >> hold; //make binlabels because we don't have any string snumBins = toString(hold); m->listBinLabelsInFile.clear(); for (int i = 0; i < hold; i++) { //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; m->listBinLabelsInFile.push_back(binLabel); } } m->saveNextLabel = label; }else { f >> label >> hold; m->saveNextLabel = label; } binLabels.assign(m->listBinLabelsInFile.begin(), m->listBinLabelsInFile.begin()+hold); data.assign(hold, ""); string inputData = ""; for(int i=0;i> inputData; set(i, inputData); } m->gobble(f); if (f.eof()) { m->saveNextLabel = ""; } } catch(exception& e) { m->errorOut(e, "ListVector", "ListVector"); exit(1); } } /***********************************************************************/ void ListVector::set(int binNumber, string seqNames){ try { int nNames_old = m->getNumNames(data[binNumber]); data[binNumber] = seqNames; int nNames_new = m->getNumNames(seqNames); if(nNames_old == 0) { numBins++; } if(nNames_new == 0) { numBins--; } if(nNames_new > maxRank) { maxRank = nNames_new; } numSeqs += (nNames_new - nNames_old); } catch(exception& e) { m->errorOut(e, "ListVector", "set"); exit(1); } } /***********************************************************************/ string ListVector::get(int index){ 
	return data[index];
}
/***********************************************************************/

void ListVector::setLabels(vector<string> labels){
	try {
		binLabels = labels;
	}
	catch(exception& e) {
		m->errorOut(e, "ListVector", "setLabels");
		exit(1);
	}
}

/***********************************************************************/
//could potentially end up with duplicate binlabel names with code below.
//we don't currently use them in a way that would do that.
//if you had a listfile that had been subsampled and then added to it, dup names would be possible.
vector<string> ListVector::getLabels(){
	try {
		string tagHeader = "Otu";
		if (m->sharedHeaderMode == "tax") { tagHeader = "PhyloType"; }

		if (binLabels.size() < data.size()) {
			string snumBins = toString(numBins);

			for (int i = 0; i < numBins; i++) {
				string binLabel = tagHeader;

				if (i < binLabels.size()) { //label exists, check leading zeros length
					string sbinNumber = m->getSimpleLabel(binLabels[i]);
					if (sbinNumber.length() < snumBins.length()) {
						int diff = snumBins.length() - sbinNumber.length();
						for (int h = 0; h < diff; h++) { binLabel += "0"; }
					}
					binLabel += sbinNumber;
					binLabels[i] = binLabel;
				}else{
					string sbinNumber = toString(i+1);
					if (sbinNumber.length() < snumBins.length()) {
						int diff = snumBins.length() - sbinNumber.length();
						for (int h = 0; h < diff; h++) { binLabel += "0"; }
					}
					binLabel += sbinNumber;
					binLabels.push_back(binLabel);
				}
			}
		}
		return binLabels;
	}
	catch(exception& e) {
		m->errorOut(e, "ListVector", "getLabels");
		exit(1);
	}
}
/***********************************************************************/

void ListVector::push_back(string seqNames){
	try {
		data.push_back(seqNames);
		int nNames = m->getNumNames(seqNames);

		numBins++;

		if(nNames > maxRank) { maxRank = nNames; }

		numSeqs += nNames;
	}
	catch(exception& e) {
		m->errorOut(e, "ListVector", "push_back");
		exit(1);
	}
}

/***********************************************************************/

void ListVector::resize(int size){
	data.resize(size);
}
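// The zero-padding rule that getLabels() applies above can be sketched in isolation.
// This is only a minimal illustration under stated assumptions: makeBinLabel and
// makeBinLabels are hypothetical helpers that do not exist in mothur, and
// std::to_string stands in for mothur's toString().

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of mothur): builds one label such as "Otu007".
// The tag is followed by enough zeros that the number has as many digits as
// numBins, then by the 1-based bin number itself.
std::string makeBinLabel(const std::string& tagHeader, int binNumber, int numBins) {
    std::string sbinNumber = std::to_string(binNumber);
    std::string snumBins = std::to_string(numBins);

    std::string binLabel = tagHeader;
    if (sbinNumber.length() < snumBins.length()) {
        // pad with the difference in digit counts, as the loops in getLabels() do
        binLabel.append(snumBins.length() - sbinNumber.length(), '0');
    }
    return binLabel + sbinNumber;
}

// Hypothetical helper (not part of mothur): labels for bins 1..numBins.
std::vector<std::string> makeBinLabels(const std::string& tagHeader, int numBins) {
    std::vector<std::string> labels;
    for (int i = 0; i < numBins; i++) {
        labels.push_back(makeBinLabel(tagHeader, i + 1, numBins));
    }
    return labels;
}
```

// Padding every label to the same width keeps the labels in numeric order when
// they are sorted as plain strings, which is presumably why the source pads them.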
/***********************************************************************/

int ListVector::size(){
	return data.size();
}
/***********************************************************************/

void ListVector::clear(){
	numBins = 0;
	maxRank = 0;
	numSeqs = 0;
	return data.clear();
}

/***********************************************************************/
void ListVector::printHeaders(ostream& output){
	try {
		string snumBins = toString(numBins);
		string tagHeader = "Otu";
		if (m->sharedHeaderMode == "tax") { tagHeader = "PhyloType"; }
		output << "label\tnum" + tagHeader + "s";

		vector<string> theseLabels = getLabels();

		for(int i = 0; i < theseLabels.size(); i++) { //print original label for sorted by abundance otu
			output << '\t' << theseLabels[i];
		}

		output << endl;

		m->printedListHeaders = true;
	}
	catch(exception& e) {
		m->errorOut(e, "ListVector", "printHeaders");
		exit(1);
	}
}

/***********************************************************************/

void ListVector::print(ostream& output, map<string, int>& ct){
	try {
		output << label << '\t' << numBins;

		vector<listCt> hold;
		for (int i = 0; i < data.size(); i++) {
			if (data[i] != "") {
				vector<string> binNames;
				string bin = data[i];
				m->splitAtComma(bin, binNames);
				int total = 0;
				for (int j = 0; j < binNames.size(); j++) {
					map<string, int>::iterator it = ct.find(binNames[j]);
					if (it == ct.end()) {
						m->mothurOut("[ERROR]: " + binNames[j] + " is not in your count table.
Please correct.\n"); m->control_pressed = true; }else { total += ct[binNames[j]]; } } listCt temp(data[i], total); hold.push_back(temp); } } sort(hold.begin(), hold.end(), abundNamesSort2); for(int i=0;ierrorOut(e, "ListVector", "print"); exit(1); } } /***********************************************************************/ void ListVector::print(ostream& output){ try { output << label << '\t' << numBins; vector hold = data; sort(hold.begin(), hold.end(), abundNamesSort); //find first non blank otu int start = 0; for(int i=0;ierrorOut(e, "ListVector", "print"); exit(1); } } /***********************************************************************/ RAbundVector ListVector::getRAbundVector(){ try { RAbundVector rav; for(int i=0;igetNumNames(data[i]); rav.push_back(binSize); } // This was here before to output data in a nice format, but it screws up the name mapping steps // sort(rav.rbegin(), rav.rend()); // // for(int i=data.size()-1;i>=0;i--){ // if(rav.get(i) == 0){ rav.pop_back(); } // else{ // break; // } // } rav.setLabel(label); return rav; } catch(exception& e) { m->errorOut(e, "ListVector", "getRAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector ListVector::getSAbundVector(){ try { SAbundVector sav(maxRank+1); for(int i=0;igetNumNames(data[i]); sav.set(binSize, sav.get(binSize) + 1); } sav.set(0, 0); sav.setLabel(label); return sav; } catch(exception& e) { m->errorOut(e, "ListVector", "getSAbundVector"); exit(1); } } /***********************************************************************/ OrderVector ListVector::getOrderVector(map* orderMap = NULL){ try { if(orderMap == NULL){ OrderVector ov; for(int i=0;igetNumNames(data[i]); for(int j=0;jcount(seqName) == 0){ m->mothurOut(seqName + " not found, check *.names file\n"); exit(1); } ov.set((*orderMap)[seqName], i); seqName = ""; } } if(orderMap->count(seqName) == 0){ m->mothurOut(seqName + " not found, check *.names file\n"); exit(1); } 
				ov.set((*orderMap)[seqName], i);
			}

			ov.setLabel(label);
			ov.getNumBins();

			return ov;
		}
	}
	catch(exception& e) {
		m->errorOut(e, "ListVector", "getOrderVector");
		exit(1);
	}
}

/***********************************************************************/
mothur-1.36.1/source/datastructures/listvector.hpp000066400000000000000000000031261255543666200224240ustar00rootroot00000000000000
#ifndef LIST_H
#define LIST_H

#include "datavector.hpp"

/*	DataStructure for a list file.
	This class is a child to datavector. It represents OTU information at a certain distance.
	A list vector can be converted into an ordervector, rabundvector or sabundvector.
	Each member of the internal container "data" represents an individual OTU.
	So data[0] = "a,b,c,d,e,f".
	example: listvector   = a,b,c,d,e,f  g,h,i  j,k  l  m
	         rabundvector = 6  3  2  1  1
	         sabundvector = 2  1  1  0  0  1
	         ordervector  = 1  1  1  1  1  1  2  2  2  3  3  4  5	*/

class ListVector : public DataVector {

public:
	ListVector();
	ListVector(int);
//	ListVector(const ListVector&);
	ListVector(string, vector<string>);
	ListVector(const ListVector& lv) : DataVector(lv.label), data(lv.data), maxRank(lv.maxRank), numBins(lv.numBins), numSeqs(lv.numSeqs), binLabels(lv.binLabels) {};
	ListVector(ifstream&);
	~ListVector(){};

	int getNumBins() { return numBins; }
	int getNumSeqs() { return numSeqs; }
	int getMaxRank() { return maxRank; }

	void set(int, string);
	string get(int);
	vector<string> getLabels();
	void setLabels(vector<string>);
	void push_back(string);
	void resize(int);
	void clear();
	int size();
	void print(ostream&);
	void print(ostream&, map<string, int>&);
	void printHeaders(ostream&);

	RAbundVector getRAbundVector();
	SAbundVector getSAbundVector();
	OrderVector getOrderVector(map<string,int>*);

private:
	vector<string> data;  //data[i] is a list of names of sequences in the ith OTU.
	int maxRank;
	int numBins;
	int numSeqs;
	vector<string> binLabels;

};

#endif
mothur-1.36.1/source/datastructures/nameassignment.cpp000066400000000000000000000067241255543666200232410ustar00rootroot00000000000000

#include "nameassignment.hpp"

//**********************************************************************************************************************

NameAssignment::NameAssignment(string nameMapFile){
	m = MothurOut::getInstance();
	m->openInputFile(nameMapFile, fileHandle);
}
//**********************************************************************************************************************

NameAssignment::NameAssignment(){ m = MothurOut::getInstance(); }
//**********************************************************************************************************************

void NameAssignment::readMap(){
	try{
		string firstCol, secondCol, skip;
	//	int index = 0;

		map<string, int>::iterator itData;
		int rowIndex = 0;

		while(fileHandle){
			fileHandle >> firstCol;	m->gobble(fileHandle);	//read from first column
			fileHandle >> secondCol;	//read from second column

			if (m->debug) { m->mothurOut("[DEBUG]: firstCol = " + firstCol + ", secondCol= " + secondCol + "\n"); }

			itData = (*this).find(firstCol);
			if (itData == (*this).end()) {
				(*this)[firstCol] = rowIndex;
				list.push_back(secondCol);		//adds data's value to list
				reverse[rowIndex] = firstCol;	//keyed by the same index as the forward map, as in push_back()
				rowIndex++;
			}else{	m->mothurOut(firstCol + " is already in namesfile.
I will use first definition."); m->mothurOutEndLine(); }

			m->gobble(fileHandle);
		}
		fileHandle.close();
	}
	catch(exception& e) {
		m->errorOut(e, "NameAssignment", "readMap");
		exit(1);
	}
}
//**********************************************************************************************************************
void NameAssignment::push_back(string name) {
	try{
		int num = (*this).size();
		(*this)[name] = num;
		reverse[num] = name;

		list.push_back(name);
	}
	catch(exception& e) {
		m->errorOut(e, "NameAssignment", "push_back");
		exit(1);
	}
}

//**********************************************************************************************************************

ListVector NameAssignment::getListVector(void){

	return list;

}

//**********************************************************************************************************************
void NameAssignment::print(ostream& out){
	try {
		map<string,int>::iterator it;
		//cout << (*this).size() << endl;
		for(it = (*this).begin(); it!=(*this).end(); it++){
			out << it->first << '\t' << it->second << endl;  //prints out keys and values of the map this.
			//out << it->first << '\t' << it->first << endl;
		}
	}
	catch(exception& e) {
		m->errorOut(e, "NameAssignment", "print");
		exit(1);
	}
}

//**********************************************************************************************************************

int NameAssignment::get(string key){
	try {
		map<string, int>::iterator itGet = (*this).find(key);

		//if you can't find it
		if (itGet == (*this).end()) { return -1; }

		return (*this)[key];
	}
	catch(exception& e) {
		m->errorOut(e, "NameAssignment", "get");
		exit(1);
	}
}
//**********************************************************************************************************************

string NameAssignment::get(int key){
	try {
		map<int, string>::iterator itGet = reverse.find(key);

		if (itGet == reverse.end()) { return "not found"; }

		return reverse[key];
	}
	catch(exception& e) {
		m->errorOut(e, "NameAssignment", "get");
		exit(1);
	}
}
//**********************************************************************************************************************
mothur-1.36.1/source/datastructures/nameassignment.hpp000066400000000000000000000007151255543666200232400ustar00rootroot00000000000000
#ifndef NAMEASSIGNMENT_HPP
#define NAMEASSIGNMENT_HPP

#include "mothur.h"
#include "listvector.hpp"

class NameAssignment : public map<string,int> {
public:
	NameAssignment(string);
	NameAssignment();
	~NameAssignment(){}
	void readMap();
	ListVector getListVector();
	int get(string);
	string get(int);
	void print(ostream&);
	void push_back(string);
private:
	ifstream fileHandle;
	ListVector list;
	map<int, string> reverse;
	MothurOut* m;
};

#endif
mothur-1.36.1/source/datastructures/oligos.cpp000066400000000000000000000556141255543666200215260ustar00rootroot00000000000000
//
//  oligos.cpp
//  Mothur
//
//  Created by Sarah Westcott on 4/4/14.
//  Copyright (c) 2014 Schloss Lab. All rights reserved.
//

#include "oligos.h"

/**************************************************************************************************/

Oligos::Oligos(string o){
	try {
		m = MothurOut::getInstance();
		hasPPrimers = false; hasPBarcodes = false; pairedOligos = false; reversePairs = true;
		indexBarcode = 0; indexPairedBarcode = 0; indexPrimer = 0; indexPairedPrimer = 0;
		oligosfile = o;
		readOligos();
		if (pairedOligos) {
			numBarcodes = pairedBarcodes.size();
			numFPrimers = pairedPrimers.size();
		}else {
			numBarcodes = barcodes.size();
			numFPrimers = primers.size();
		}
	}
	catch(exception& e) {
		m->errorOut(e, "Oligos", "Oligos");
		exit(1);
	}
}
/**************************************************************************************************/

Oligos::Oligos(){
	try {
		m = MothurOut::getInstance();
		hasPPrimers = false; hasPBarcodes = false; pairedOligos = false; reversePairs = true;
		indexBarcode = 0; indexPairedBarcode = 0; indexPrimer = 0; indexPairedPrimer = 0;
		numFPrimers = 0; numBarcodes = 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Oligos", "Oligos");
		exit(1);
	}
}
/**************************************************************************************************/
int Oligos::read(string o){
	try {
		oligosfile = o;
		readOligos();
		if (pairedOligos) {
			numBarcodes = pairedBarcodes.size();
			numFPrimers = pairedPrimers.size();
		}else {
			numBarcodes = barcodes.size();
			numFPrimers = primers.size();
		}
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Oligos", "read");
		exit(1);
	}
}
/**************************************************************************************************/
int Oligos::read(string o, bool reverse){
	try {
		oligosfile = o;
		reversePairs = reverse;
		readOligos();
		if (pairedOligos) {
			numBarcodes = pairedBarcodes.size();
			numFPrimers = pairedPrimers.size();
		}else {
			numBarcodes = barcodes.size();
			numFPrimers = primers.size();
		}
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Oligos", "read");
		exit(1);
	}
}
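// The reverse flag stored by read(string, bool) above decides whether paired oligos are
// reverse-complemented when they are reoriented later in this file. The sketch below shows
// that idea in isolation. It is an illustration under stated assumptions, not mothur code:
// reverseComplement and reorientPair are hypothetical free functions, and the base mapping
// simply mirrors the one in Oligos::reverseOligo.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not part of mothur): reverse complement with IUPAC codes.
// "NONE" is passed through unchanged and any unrecognised character becomes 'N'.
std::string reverseComplement(const std::string& oligo) {
    if (oligo == "NONE") { return "NONE"; }

    std::string result;
    result.reserve(oligo.size());
    // walk the oligo from its last base to its first
    for (std::string::const_reverse_iterator it = oligo.rbegin(); it != oligo.rend(); ++it) {
        switch (*it) {
            case 'A': result += 'T'; break;
            case 'T': case 'U': result += 'A'; break;
            case 'G': result += 'C'; break;
            case 'C': result += 'G'; break;
            case 'R': result += 'Y'; break;
            case 'Y': result += 'R'; break;
            case 'M': result += 'K'; break;
            case 'K': result += 'M'; break;
            case 'W': result += 'W'; break;
            case 'S': result += 'S'; break;
            case 'B': result += 'V'; break;
            case 'V': result += 'B'; break;
            case 'D': result += 'H'; break;
            case 'H': result += 'D'; break;
            default:  result += 'N'; break;
        }
    }
    return result;
}

// Hypothetical helper (not part of mothur): swap the two members of a pair and,
// only when reversePairs is true, reverse-complement each of them.
std::pair<std::string, std::string> reorientPair(const std::string& forward,
                                                 const std::string& reverse,
                                                 bool reversePairs) {
    std::string newForward = reversePairs ? reverseComplement(reverse) : reverse;
    std::string newReverse = reversePairs ? reverseComplement(forward) : forward;
    return std::make_pair(newForward, newReverse);
}
```

// With reversePairs set to false the pair is only swapped, which matches the header's note
// that read(string, bool) exists to read without reversing the paired barcodes.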
//*************************************************************************************************************** int Oligos::readOligos(){ try { ifstream inOligos; m->openInputFile(oligosfile, inOligos); string type, oligo, roligo, group; while(!inOligos.eof()){ inOligos >> type; if (m->debug) { m->mothurOut("[DEBUG]: reading type - " + type + ".\n"); } if(type[0] == '#'){ while (!inOligos.eof()) { char c = inOligos.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there m->gobble(inOligos); } else{ m->gobble(inOligos); //make type case insensitive for(int i=0;i> oligo; if (m->debug) { m->mothurOut("[DEBUG]: reading - " + oligo + ".\n"); } for(int i=0;i::iterator itPrime = primers.find(oligo); if (itPrime != primers.end()) { m->mothurOut("[WARNING]: primer " + oligo + " is in your oligos file already, disregarding."); m->mothurOutEndLine(); } else { if (m->debug) { if (group != "") { m->mothurOut("[DEBUG]: reading group " + group + ".\n"); }else{ m->mothurOut("[DEBUG]: no group for primer " + oligo + ".\n"); } } primers[oligo]=indexPrimer; indexPrimer++; primerNameVector.push_back(group); } } else if (type == "PRIMER"){ m->gobble(inOligos); inOligos >> roligo; for(int i=0;idebug) { m->mothurOut("[DEBUG]: primer pair " + newPrimer.forward + " " + newPrimer.reverse + ", and group = " + group + ".\n"); } //check for repeat barcodes string tempPair = oligo+roligo; if (uniquePrimers.count(tempPair) != 0) { m->mothurOut("primer pair " + newPrimer.forward + " " + newPrimer.reverse + " is in your oligos file already, disregarding."); m->mothurOutEndLine(); } else { uniquePrimers.insert(tempPair); if (m->debug) { if (group != "") { m->mothurOut("[DEBUG]: reading group " + group + ".\n"); }else{ m->mothurOut("[DEBUG]: no group for primer pair " + newPrimer.forward + " " + newPrimer.reverse + ".\n"); } } pairedPrimers[indexPairedPrimer]=newPrimer; indexPairedPrimer++; primerNameVector.push_back(group); hasPPrimers = true; } } else if(type == 
"REVERSE"){ string oligoRC = reverseOligo(oligo); revPrimer.push_back(oligoRC); } else if(type == "BARCODE"){ inOligos >> group; //barcode lines can look like BARCODE atgcatgc groupName - for 454 seqs //or BARCODE atgcatgc atgcatgc groupName - for illumina data that has forward and reverse info string temp = ""; while (!inOligos.eof()) { char c = inOligos.get(); if (c == 10 || c == 13 || c == -1){ break; } else if (c == 32 || c == 9){;} //space or tab else { temp += c; } } //then this is illumina data with 4 columns if (temp != "") { hasPBarcodes = true; string reverseBarcode = group; //reverseOligo(group); //reverse barcode group = temp; for(int i=0;idebug) { m->mothurOut("[DEBUG]: barcode pair " + newPair.forward + " " + newPair.reverse + ", and group = " + group + ".\n"); } //check for repeat barcodes string tempPair = oligo+reverseBarcode; if (uniqueBarcodes.count(tempPair) != 0) { m->mothurOut("barcode pair " + newPair.forward + " " + newPair.reverse + " is in your oligos file already, disregarding."); m->mothurOutEndLine(); } else { uniqueBarcodes.insert(tempPair); pairedBarcodes[indexPairedBarcode]=newPair; indexPairedBarcode++; barcodeNameVector.push_back(group); } }else { //check for repeat barcodes map::iterator itBar = barcodes.find(oligo); if (itBar != barcodes.end()) { m->mothurOut("[WARNING]: barcode " + oligo + " is in your oligos file already, disregarding."); m->mothurOutEndLine(); } else { barcodes[oligo]=indexBarcode; indexBarcode++; barcodeNameVector.push_back(group); } } }else if(type == "LINKER"){ linker.push_back(oligo); }else if(type == "SPACER"){ spacer.push_back(oligo); } else{ m->mothurOut("[WARNING]: " + type + " is not recognized as a valid type. Choices are forward, reverse, and barcode. 
Ignoring " + oligo + "."); m->mothurOutEndLine(); } } m->gobble(inOligos); } inOligos.close(); if ((linker.size() == 0) && (spacer.size() == 0) && (pairedBarcodes.size() == 0) && (barcodes.size() == 0) && (pairedPrimers.size() == 0) && (primers.size() == 0) && (revPrimer.size() == 0)) { m->mothurOut("[ERROR]: invalid oligos file, quitting.\n"); m->control_pressed = true; return 0; } if (hasPBarcodes || hasPPrimers) { pairedOligos = true; if ((primers.size() != 0) || (barcodes.size() != 0) || (linker.size() != 0) || (spacer.size() != 0) || (revPrimer.size() != 0)) { m->control_pressed = true; m->mothurOut("[ERROR]: cannot mix paired primers and barcodes with non paired or linkers and spacers, quitting."); m->mothurOutEndLine(); return 0; } } //add in potential combos if(barcodeNameVector.size() == 0){ if (pairedOligos) { oligosPair newPair("", ""); pairedBarcodes[0] = newPair; }else { barcodes[""] = 0; } barcodeNameVector.push_back(""); } if(primerNameVector.size() == 0){ if (pairedOligos) { oligosPair newPair("", ""); pairedPrimers[0] = newPair; }else { primers[""] = 0; } primerNameVector.push_back(""); } if (pairedOligos) { for(map::iterator itBar = pairedBarcodes.begin();itBar != pairedBarcodes.end();itBar++){ for(map::iterator itPrimer = pairedPrimers.begin();itPrimer != pairedPrimers.end(); itPrimer++){ string primerName = primerNameVector[itPrimer->first]; string barcodeName = barcodeNameVector[itBar->first]; if (m->debug) { m->mothurOut("[DEBUG]: primerName = " + primerName + " barcodeName = " + barcodeName + "\n"); } if ((primerName == "ignore") || (barcodeName == "ignore")) { if (m->debug) { m->mothurOut("[DEBUG]: in ignore. \n"); } } //do nothing else if ((primerName == "") && (barcodeName == "")) { if (m->debug) { m->mothurOut("[DEBUG]: in blank. 
\n"); } } //do nothing else { string comboGroupName = ""; string fastqFileName = ""; if(primerName == ""){ comboGroupName = barcodeNameVector[itBar->first]; } else{ if(barcodeName == ""){ comboGroupName = primerNameVector[itPrimer->first]; } else{ comboGroupName = barcodeNameVector[itBar->first] + "." + primerNameVector[itPrimer->first]; } } if (m->debug) { m->mothurOut("[DEBUG]: comboGroupName = " + comboGroupName + "\n"); } uniqueNames.insert(comboGroupName); map >::iterator itGroup2Barcode = Group2Barcode.find(comboGroupName); if (itGroup2Barcode == Group2Barcode.end()) { vector tempBarcodes; tempBarcodes.push_back((itBar->second).forward+"."+(itBar->second).reverse); Group2Barcode[comboGroupName] = tempBarcodes; }else { Group2Barcode[comboGroupName].push_back((itBar->second).forward+"."+(itBar->second).reverse); } itGroup2Barcode = Group2Primer.find(comboGroupName); if (itGroup2Barcode == Group2Primer.end()) { vector tempPrimers; tempPrimers.push_back((itPrimer->second).forward+"."+(itPrimer->second).reverse); Group2Primer[comboGroupName] = tempPrimers; }else { Group2Primer[comboGroupName].push_back((itPrimer->second).forward+"."+(itPrimer->second).reverse); } } } } }else { for(map::iterator itBar = barcodes.begin();itBar != barcodes.end();itBar++){ for(map::iterator itPrimer = primers.begin();itPrimer != primers.end(); itPrimer++){ string primerName = primerNameVector[itPrimer->second]; string barcodeName = barcodeNameVector[itBar->second]; if ((primerName == "ignore") || (barcodeName == "ignore")) { } //do nothing else if ((primerName == "") && (barcodeName == "")) { } //do nothing else { string comboGroupName = ""; string fastqFileName = ""; if(primerName == ""){ comboGroupName = barcodeNameVector[itBar->second]; } else{ if(barcodeName == ""){ comboGroupName = primerNameVector[itPrimer->second]; } else{ comboGroupName = barcodeNameVector[itBar->second] + "." 
+ primerNameVector[itPrimer->second]; } } uniqueNames.insert(comboGroupName); map >::iterator itGroup2Barcode = Group2Barcode.find(comboGroupName); if (itGroup2Barcode == Group2Barcode.end()) { vector tempBarcodes; tempBarcodes.push_back(itBar->first); Group2Barcode[comboGroupName] = tempBarcodes; }else { Group2Barcode[comboGroupName].push_back(itBar->first); } itGroup2Barcode = Group2Primer.find(comboGroupName); if (itGroup2Barcode == Group2Primer.end()) { vector tempPrimers; tempPrimers.push_back(itPrimer->first); Group2Primer[comboGroupName] = tempPrimers; }else { Group2Primer[comboGroupName].push_back(itPrimer->first); } } } } } if (m->debug) { int count = 0; for (set::iterator it = uniqueNames.begin(); it != uniqueNames.end(); it++) { m->mothurOut("[DEBUG]: " + toString(count) + " groupName = " + *it + "\n"); count++; } } Groups.clear(); for (set::iterator it = uniqueNames.begin(); it != uniqueNames.end(); it++) { Groups.push_back(*it); } return 0; } catch(exception& e) { m->errorOut(e, "Oligos", "readOligos"); exit(1); } } //********************************************************************/ vector Oligos::getBarcodes(string groupName){ try { vector thisGroupsBarcodes; map >::iterator it = Group2Barcode.find(groupName); if (it == Group2Barcode.end()) { m->mothurOut("[ERROR]: no barcodes found for group " + groupName + ".\n"); m->control_pressed=true; }else { thisGroupsBarcodes = it->second; } return thisGroupsBarcodes; } catch(exception& e) { m->errorOut(e, "Oligos", "getBarcodes"); exit(1); } } //********************************************************************/ vector Oligos::getPrimers(string groupName){ try { vector thisGroupsPrimers; map >::iterator it = Group2Primer.find(groupName); if (it == Group2Primer.end()) { m->mothurOut("[ERROR]: no primers found for group " + groupName + ".\n"); m->control_pressed=true; }else { thisGroupsPrimers = it->second; } return thisGroupsPrimers; } catch(exception& e) { m->errorOut(e, "Oligos", "getPrimers"); 
exit(1); } } //********************************************************************/ //can't have paired and unpaired so this function will either run the paired map or the unpaired map Oligos::getReorientedPairedPrimers(){ try { map rpairedPrimers; for (map::iterator it = pairedPrimers.begin(); it != pairedPrimers.end(); it++) { string forward = (it->second).reverse; if (reversePairs) { forward = reverseOligo(forward); } string reverse = (it->second).forward; if (reversePairs) { reverse = reverseOligo(reverse); } oligosPair tempPair(forward, reverse); //reversePrimer, rc ForwardPrimer rpairedPrimers[it->first] = tempPair; } for (map::iterator it = primers.begin(); it != primers.end(); it++) { oligosPair tempPair("", reverseOligo((it->first))); //reverseBarcode, rc ForwardBarcode rpairedPrimers[it->second] = tempPair; } return rpairedPrimers; } catch(exception& e) { m->errorOut(e, "Oligos", "getReorientedPairedPrimers"); exit(1); } } //********************************************************************/ //can't have paired and unpaired so this function will either run the paired map or the unpaired map Oligos::getReorientedPairedBarcodes(){ try { map rpairedBarcodes; for (map::iterator it = pairedBarcodes.begin(); it != pairedBarcodes.end(); it++) { string forward = (it->second).reverse; if (reversePairs) { forward = reverseOligo(forward); } string reverse = (it->second).forward; if (reversePairs) { reverse = reverseOligo(reverse); } oligosPair tempPair(forward, reverse); //reversePrimer, rc ForwardPrimer rpairedBarcodes[it->first] = tempPair; } for (map::iterator it = barcodes.begin(); it != barcodes.end(); it++) { oligosPair tempPair("", reverseOligo((it->first))); //reverseBarcode, rc ForwardBarcode rpairedBarcodes[it->second] = tempPair; } return rpairedBarcodes; } catch(exception& e) { m->errorOut(e, "Oligos", "getReorientedPairedBarcodes"); exit(1); } } //********************************************************************/ string Oligos::reverseOligo(string 
oligo){ try { if (oligo == "NONE") { return "NONE"; } string reverse = ""; for(int i=oligo.length()-1;i>=0;i--){ if(oligo[i] == 'A') { reverse += 'T'; } else if(oligo[i] == 'T'){ reverse += 'A'; } else if(oligo[i] == 'U'){ reverse += 'A'; } else if(oligo[i] == 'G'){ reverse += 'C'; } else if(oligo[i] == 'C'){ reverse += 'G'; } else if(oligo[i] == 'R'){ reverse += 'Y'; } else if(oligo[i] == 'Y'){ reverse += 'R'; } else if(oligo[i] == 'M'){ reverse += 'K'; } else if(oligo[i] == 'K'){ reverse += 'M'; } else if(oligo[i] == 'W'){ reverse += 'W'; } else if(oligo[i] == 'S'){ reverse += 'S'; } else if(oligo[i] == 'B'){ reverse += 'V'; } else if(oligo[i] == 'V'){ reverse += 'B'; } else if(oligo[i] == 'D'){ reverse += 'H'; } else if(oligo[i] == 'H'){ reverse += 'D'; } else { reverse += 'N'; } } return reverse; } catch(exception& e) { m->errorOut(e, "Oligos", "reverseOligo"); exit(1); } } //********************************************************************/ string Oligos::getBarcodeName(int index){ try { string name = ""; if ((index >= 0) && (index < barcodeNameVector.size())) { name = barcodeNameVector[index]; } return name; } catch(exception& e) { m->errorOut(e, "Oligos", "getBarcodeName"); exit(1); } } //********************************************************************/ string Oligos::getPrimerName(int index){ try { string name = ""; if ((index >= 0) && (index < primerNameVector.size())) { name = primerNameVector[index]; } return name; } catch(exception& e) { m->errorOut(e, "Oligos", "getPrimerName"); exit(1); } } //********************************************************************/ string Oligos::getGroupName(int barcodeIndex, int primerIndex){ try { string thisGroup = ""; if(numBarcodes != 0){ thisGroup = getBarcodeName(barcodeIndex); if (numFPrimers != 0) { if (getPrimerName(primerIndex) != "") { if(thisGroup != "") { thisGroup += "." 
+ getPrimerName(primerIndex); }else { thisGroup = getPrimerName(primerIndex); } } } } return thisGroup; } catch(exception& e) { m->errorOut(e, "Oligos", "getGroupName"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/datastructures/oligos.h000066400000000000000000000060361255543666200211650ustar00rootroot00000000000000// // oligos.h // Mothur // // Created by Sarah Westcott on 4/4/14. // Copyright (c) 2014 Schloss Lab. All rights reserved. // #ifndef Mothur_oligos_h #define Mothur_oligos_h #include "mothurout.h" /**************************************************************************************************/ class Oligos { public: Oligos(string); Oligos(); ~Oligos() {} int read(string); int read(string, bool); //read without reversing the paired barcodes, for make.contigs. bool hasPairedPrimers() { return hasPPrimers; } bool hasPairedBarcodes() { return hasPBarcodes; } //for processing with trimOligos class map getPairedPrimers() { return pairedPrimers; } map getPairedBarcodes() { return pairedBarcodes; } map getPrimers() { return primers; } map getBarcodes() { return barcodes; } map getReorientedPairedPrimers(); map getReorientedPairedBarcodes(); map getReorientedPrimers(); map getReorientedBarcodes(); vector getLinkers() { return linker; } vector getSpacers() { return spacer; } vector getReversePrimers() { return revPrimer; } vector getPrimerNames() { return primerNameVector; } vector getBarcodeNames() { return barcodeNameVector; } vector getGroupNames() { return Groups; } //for printing and other formatting uses vector getBarcodes(string); //get barcodes for a group. For paired barcodes will return forward.reverse vector getPrimers(string); //get primers for a group. 
For paired primers will return forward.reverse

	string getBarcodeName(int);
	string getPrimerName(int);
	string getGroupName(int, int);

private:

	set<string> uniqueNames;
	vector<string> Groups;
	vector<string> revPrimer;
	map<string, vector<string> > Group2Barcode;
	map<string, vector<string> > Group2Primer;
	map<int, oligosPair> pairedBarcodes;
	map<int, oligosPair> pairedPrimers;
	map<string, int> primers;
	map<string, int> barcodes;
	vector<string> linker;
	vector<string> spacer;
	vector<string> primerNameVector;
	vector<string> barcodeNameVector;
	bool hasPPrimers, hasPBarcodes, pairedOligos, reversePairs;
	string oligosfile;
	int numBarcodes, numFPrimers;
	MothurOut* m;
	int indexPrimer;
	int indexBarcode;
	int indexPairedPrimer;
	int indexPairedBarcode;
	set<string> uniquePrimers;
	set<string> uniqueBarcodes;

	int readOligos();
	string reverseOligo(string);

};

/**************************************************************************************************/

#endif
mothur-1.36.1/source/datastructures/ordervector.cpp000066400000000000000000000116301255543666200225560ustar00rootroot00000000000000
/*
 *  order.cpp
 *
 *
 *  Created by Pat Schloss on 8/8/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
* */ #include "ordervector.hpp" /***********************************************************************/ OrderVector::OrderVector() : DataVector() {} /***********************************************************************/ //OrderVector::OrderVector(int ns) : DataVector(), data(ns, -1) {} /***********************************************************************/ OrderVector::OrderVector(string id, vector ov) : DataVector(id), data(ov) { updateStats(); } /***********************************************************************/ OrderVector::OrderVector(ifstream& f) : DataVector() { try { int hold; f >> label; f >> hold; data.assign(hold, -1); int inputData; for(int i=0;i> inputData; set(i, inputData); } updateStats(); } catch(exception& e) { m->errorOut(e, "OrderVector", "OrderVector"); exit(1); } } /***********************************************************************/ int OrderVector::getNumBins(){ if(needToUpdate == 1){ updateStats(); } return numBins; } /***********************************************************************/ int OrderVector::getNumSeqs(){ if(needToUpdate == 1){ updateStats(); } return numSeqs; } /***********************************************************************/ int OrderVector::getMaxRank(){ if(needToUpdate == 1){ updateStats(); } return maxRank; } /***********************************************************************/ void OrderVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; data.clear(); } /***********************************************************************/ void OrderVector::set(int index, int binNumber){ data[index] = binNumber; needToUpdate = 1; } /***********************************************************************/ int OrderVector::get(int index){ return data[index]; } /***********************************************************************/ void OrderVector::push_back(int index){ data.push_back(index); needToUpdate = 1; } /***********************************************************************/ void 
OrderVector::print(ostream& output){ try { output << label << '\t' << numSeqs; for(int i=0;ierrorOut(e, "OrderVector", "print"); exit(1); } } /***********************************************************************/ void OrderVector::print(string prefix, ostream& output){ try { output << prefix << '\t' << numSeqs; for(int i=0;ierrorOut(e, "OrderVector", "print"); exit(1); } } /***********************************************************************/ void OrderVector::resize(int){ m->mothurOut("resize() did nothing in class OrderVector"); } /***********************************************************************/ int OrderVector::size(){ return data.size(); } /***********************************************************************/ vector::iterator OrderVector::begin(){ return data.begin(); } /***********************************************************************/ vector::iterator OrderVector::end(){ return data.end(); } /***********************************************************************/ RAbundVector OrderVector::getRAbundVector(){ try { RAbundVector rav(data.size()); for(int i=0;i=0;i--){ if(rav.get(i) == 0){ rav.pop_back(); } else{ break; } } rav.setLabel(label); return rav; } catch(exception& e) { m->errorOut(e, "OrderVector", "getRAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector OrderVector::getSAbundVector(){ RAbundVector rav(this->getRAbundVector()); return rav.getSAbundVector(); } /***********************************************************************/ OrderVector OrderVector::getOrderVector(map* hold = 0){ return *this; } /***********************************************************************/ void OrderVector::updateStats(){ try { needToUpdate = 0; // int maxBinVectorLength = 0; numSeqs = 0; numBins = 0; maxRank = 0; for(int i=0;i hold(numSeqs); for(int i=0;i 0) { numBins++; } if(hold[i] > maxRank) { maxRank = hold[i]; } } } catch(exception& e) { m->errorOut(e, "OrderVector", 
"updateStats"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/ordervector.hpp000066400000000000000000000033731255543666200225700ustar00rootroot00000000000000#ifndef ORDER_H #define ORDER_H #include "datavector.hpp" #include "sabundvector.hpp" #include "rabundvector.hpp" /* This class is a child to datavector. It represents OTU information at a certain distance. A order vector can be converted into and listvector, rabundvector or sabundvector. Each member of the internal container "data" represents the OTU from which it came. So in the example below since there are 6 sequences in OTU 1 there are six 1's in the ordervector. and since there are 2 sequences in OTU 3 there are two 3's in the ordervector. example: listvector = a,b,c,d,e,f g,h,i j,k l m rabundvector = 6 3 2 1 1 sabundvector = 2 1 1 0 0 1 ordervector = 1 1 1 1 1 1 2 2 2 3 3 4 5 */ class OrderVector : public DataVector { public: OrderVector(); // OrderVector(int); // OrderVector(const OrderVector& ov); OrderVector(int ns, int nb=0, int mr=0) : DataVector(), data(ns, -1), maxRank(0), numBins(0), numSeqs(0) {}; OrderVector(const OrderVector& ov) : DataVector(ov.label), data(ov.data), maxRank(ov.maxRank), numBins(ov.numBins), numSeqs(ov.numSeqs), needToUpdate(ov.needToUpdate) {if(needToUpdate == 1){ updateStats();}}; OrderVector(string, vector); OrderVector(ifstream&); ~OrderVector(){}; void set(int, int); int get(int); void push_back(int); void resize(int); int size(); void clear(); void print(string, ostream&); vector::iterator begin(); vector::iterator end(); void print(ostream&); int getNumBins(); int getNumSeqs(); int getMaxRank(); RAbundVector getRAbundVector(); SAbundVector getSAbundVector(); OrderVector getOrderVector(map*); private: vector data; int maxRank; int numBins; int numSeqs; bool needToUpdate; void updateStats(); }; #endif 
mothur-1.36.1/source/datastructures/qualityscores.cpp000066400000000000000000000440201255543666200231260ustar00rootroot00000000000000/* * qualityscores.cpp * Mothur * * Created by Pat Schloss on 7/12/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "qualityscores.h" /**************************************************************************************************/ QualityScores::QualityScores(){ try { m = MothurOut::getInstance(); seqName = ""; seqLength = -1; } catch(exception& e) { m->errorOut(e, "QualityScores", "QualityScores"); exit(1); } } /**************************************************************************************************/ QualityScores::QualityScores(string n, vector s){ try { m = MothurOut::getInstance(); setName(n); setScores(s); } catch(exception& e) { m->errorOut(e, "QualityScores", "QualityScores"); exit(1); } } /**************************************************************************************************/ QualityScores::QualityScores(ifstream& qFile){ try { m = MothurOut::getInstance(); int score; seqName = getSequenceName(qFile); m->gobble(qFile); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "'\n."); } if (!m->control_pressed) { string qScoreString = m->getline(qFile); m->gobble(qFile); if (m->debug) { m->mothurOut("[DEBUG]: scores = '" + qScoreString + "'\n."); } while(qFile.peek() != '>' && qFile.peek() != EOF){ if (m->control_pressed) { break; } string temp = m->getline(qFile); m->gobble(qFile); //if (m->debug) { m->mothurOut("[DEBUG]: scores = '" + temp + "'\n."); } qScoreString += ' ' + temp; } //cout << "done reading " << endl; istringstream qScoreStringStream(qScoreString); int count = 0; while(!qScoreStringStream.eof()){ if (m->control_pressed) { break; } string temp; qScoreStringStream >> temp; m->gobble(qScoreStringStream); //if (m->debug) { m->mothurOut("[DEBUG]: score " + toString(qScores.size()) + " = '" + temp + "'\n."); } //check temp to make sure its a number if 
(!m->isContainingOnlyDigits(temp)) { m->mothurOut("[ERROR]: In sequence " + seqName + "'s quality scores, expected a number and got " + temp + ", setting score to 0."); m->mothurOutEndLine(); temp = "0"; } convert(temp, score); //cout << count << '\t' << score << endl; qScores.push_back(score); count++; } } seqLength = qScores.size(); //cout << "seqlength = " << seqLength << endl; } catch(exception& e) { m->errorOut(e, "QualityScores", "QualityScores"); exit(1); } } /**************************************************************************************************/ #ifdef USE_BOOST QualityScores::QualityScores(boost::iostreams::filtering_istream& qFile){ try { m = MothurOut::getInstance(); int score; seqName = getSequenceName(qFile); m->gobble(qFile); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "'\n."); } if (!m->control_pressed) { string qScoreString = ""; std::getline(qFile, qScoreString); m->gobble(qFile); if (m->debug) { m->mothurOut("[DEBUG]: scores = '" + qScoreString + "'\n."); } while(qFile.peek() != '>' && qFile.peek() != EOF){ if (m->control_pressed) { break; } string temp = ""; std::getline(qFile, temp); m->gobble(qFile); //if (m->debug) { m->mothurOut("[DEBUG]: scores = '" + temp + "'\n."); } qScoreString += ' ' + temp; } //cout << "done reading " << endl; istringstream qScoreStringStream(qScoreString); int count = 0; while(!qScoreStringStream.eof()){ if (m->control_pressed) { break; } string temp; qScoreStringStream >> temp; m->gobble(qScoreStringStream); //if (m->debug) { m->mothurOut("[DEBUG]: score " + toString(qScores.size()) + " = '" + temp + "'\n."); } //check temp to make sure its a number if (!m->isContainingOnlyDigits(temp)) { m->mothurOut("[ERROR]: In sequence " + seqName + "'s quality scores, expected a number and got " + temp + ", setting score to 0."); m->mothurOutEndLine(); temp = "0"; } convert(temp, score); //cout << count << '\t' << score << endl; qScores.push_back(score); count++; } } seqLength = qScores.size(); 
//cout << "seqlength = " << seqLength << endl; } catch(exception& e) { m->errorOut(e, "QualityScores", "QualityScores"); exit(1); } } #endif /**************************************************************************************************/ int QualityScores::read(ifstream& qFile){ try { int score; seqName = getSequenceName(qFile); m->gobble(qFile); if (m->debug) { m->mothurOut("[DEBUG]: name = '" + seqName + "'\n."); } if (!m->control_pressed) { string qScoreString = m->getline(qFile); m->gobble(qFile); if (m->debug) { m->mothurOut("[DEBUG]: scores = '" + qScoreString + "'\n."); } while(qFile.peek() != '>' && qFile.peek() != EOF){ if (m->control_pressed) { break; } string temp = m->getline(qFile); m->gobble(qFile); //if (m->debug) { m->mothurOut("[DEBUG]: scores = '" + temp + "'\n."); } qScoreString += ' ' + temp; } //cout << "done reading " << endl; istringstream qScoreStringStream(qScoreString); int count = 0; while(!qScoreStringStream.eof()){ if (m->control_pressed) { break; } string temp; qScoreStringStream >> temp; m->gobble(qScoreStringStream); //if (m->debug) { m->mothurOut("[DEBUG]: score " + toString(qScores.size()) + " = '" + temp + "'\n."); } //check temp to make sure its a number if (!m->isContainingOnlyDigits(temp)) { m->mothurOut("[ERROR]: In sequence " + seqName + "'s quality scores, expected a number and got " + temp + ", setting score to 0."); m->mothurOutEndLine(); temp = "0"; } convert(temp, score); //cout << count << '\t' << score << endl; qScores.push_back(score); count++; } } seqLength = qScores.size(); //cout << "seqlength = " << seqLength << endl; return seqLength; } catch(exception& e) { m->errorOut(e, "QualityScores", "read"); exit(1); } } //******************************************************************************************************************** string QualityScores::getSequenceName(ifstream& qFile) { try { string name = ""; qFile >> name; m->getline(qFile); if (name.length() != 0) { name = name.substr(1); m->checkName(name); 
}else{ m->mothurOut("Error in reading your qfile, at position " + toString(qFile.tellg()) + ". Blank name."); m->mothurOutEndLine(); m->control_pressed = true; } return name; } catch(exception& e) { m->errorOut(e, "QualityScores", "getSequenceName"); exit(1); } } //******************************************************************************************************************** #ifdef USE_BOOST string QualityScores::getSequenceName(boost::iostreams::filtering_istream& qFile) { try { string name = ""; qFile >> name; string temp; std::getline(qFile, temp); if (name.length() != 0) { name = name.substr(1); m->checkName(name); }else{ m->mothurOut("Error in reading your qfile, at position " + toString(qFile.tellg()) + ". Blank name."); m->mothurOutEndLine(); m->control_pressed = true; } return name; } catch(exception& e) { m->errorOut(e, "QualityScores", "getSequenceName"); exit(1); } } #endif //******************************************************************************************************************** void QualityScores::setName(string name) { try { m->checkName(name); seqName = name; } catch(exception& e) { m->errorOut(e, "QualityScores", "setName"); exit(1); } } /**************************************************************************************************/ string QualityScores::getName(){ try { return seqName; } catch(exception& e) { m->errorOut(e, "QualityScores", "getName"); exit(1); } } /**************************************************************************************************/ void QualityScores::printQScores(ofstream& qFile){ try { double aveQScore = calculateAverage(false); qFile << '>' << seqName << '\t' << aveQScore << endl; for(int i=0;ierrorOut(e, "QualityScores", "printQScores"); exit(1); } } /**************************************************************************************************/ void QualityScores::printQScores(ostream& qFile){ try { double aveQScore = calculateAverage(false); qFile << '>' << seqName << '\t' << aveQScore 
<< endl; for(int i=0;ierrorOut(e, "QualityScores", "printQScores"); exit(1); } } /**************************************************************************************************/ void QualityScores::trimQScores(int start, int end){ try { vector hold; //cout << seqName << '\t' << start << '\t' << end << '\t' << qScores.size() << endl; //for (int i = 0; i < qScores.size(); i++) { cout << qScores[i] << end; } if(end == -1){ hold = vector(qScores.begin()+start, qScores.end()); qScores = hold; } if(start == -1){ if(qScores.size() > end){ hold = vector(qScores.begin(), qScores.begin()+end); qScores = hold; } } seqLength = qScores.size(); } catch(exception& e) { m->errorOut(e, "QualityScores", "trimQScores"); exit(1); } } /**************************************************************************************************/ void QualityScores::flipQScores(){ try { vector temp = qScores; for(int i=0;ierrorOut(e, "QualityScores", "flipQScores"); exit(1); } } /**************************************************************************************************/ bool QualityScores::stripQualThreshold(Sequence& sequence, double qThreshold){ try { string rawSequence = sequence.getUnaligned(); int seqLength = sequence.getNumBases(); if(seqName != sequence.getName()){ m->mothurOut("sequence name mismatch btwn fasta: " + sequence.getName() + " and qual file: " + seqName); m->mothurOutEndLine(); m->control_pressed = true; } int end; for(int i=0;ierrorOut(e, "QualityScores", "flipQScores"); exit(1); } } /**************************************************************************************************/ bool QualityScores::stripQualRollingAverage(Sequence& sequence, double qThreshold, bool logTransform){ try { string rawSequence = sequence.getUnaligned(); int seqLength = sequence.getNumBases(); if(seqName != sequence.getName()){ m->mothurOut("sequence name mismatch btwn fasta: " + sequence.getName() + " and qual file: " + seqName); m->mothurOutEndLine(); } int end = -1; double 
rollingSum = 0.0000; double value = 0.0; for(int i=0;ierrorOut(e, "QualityScores", "flipQScores"); exit(1); } } /**************************************************************************************************/ bool QualityScores::stripQualWindowAverage(Sequence& sequence, int stepSize, int windowSize, double qThreshold, bool logTransform){ try { string rawSequence = sequence.getUnaligned(); int seqLength = sequence.getNumBases(); if(seqName != sequence.getName()){ m->mothurOut("sequence name mismatch between fasta: " + sequence.getName() + " and qual file: " + seqName); m->mothurOutEndLine(); } int end = windowSize; int start = 0; if(seqLength < windowSize) { return 0; } while((start+windowSize) < seqLength){ double windowSum = 0.0000; for(int i=start;i= seqLength){ end = seqLength; } } if(end == -1){ end = seqLength; } //failed first window if (end < windowSize) { return 0; } sequence.setUnaligned(rawSequence.substr(0,end)); trimQScores(-1, end); return 1; } catch(exception& e) { m->errorOut(e, "QualityScores", "stripQualWindowAverage"); exit(1); } } /**************************************************************************************************/ double QualityScores::calculateAverage(bool logTransform){ double aveQScore = 0.0000; for(int i=0;imothurOut("sequence name mismatch btwn fasta: " + sequence.getName() + " and qual file: " + seqName); m->mothurOutEndLine(); } double aveQScore = calculateAverage(logTransform); if (m->debug) { m->mothurOut("[DEBUG]: " + sequence.getName() + " average = " + toString(aveQScore) + "\n"); } if(aveQScore >= qAverage) { success = 1; } else { success = 0; } return success; } catch(exception& e) { m->errorOut(e, "QualityScores", "cullQualAverage"); exit(1); } } /**************************************************************************************************/ void QualityScores::updateQScoreErrorMap(map >& qualErrorMap, string errorSeq, int start, int stop, int weight){ try { int seqLength = errorSeq.size(); int qIndex = 
start - 1; for(int i=0;i stop){ break; } } } catch(exception& e) { m->errorOut(e, "QualityScores", "updateQScoreErrorMap"); exit(1); } } /**************************************************************************************************/ void QualityScores::updateForwardMap(vector >& forwardMap, int start, int stop, int weight){ try { int index = 0; for(int i=start-1;ierrorOut(e, "QualityScores", "updateForwardMap"); exit(1); } } /**************************************************************************************************/ void QualityScores::updateReverseMap(vector >& reverseMap, int start, int stop, int weight){ try { int index = 0; for(int i=stop-1;i>=start-1;i--){ reverseMap[index++][qScores[i]] += weight; } } catch(exception& e) { m->errorOut(e, "QualityScores", "updateReverseMap"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/datastructures/qualityscores.h000066400000000000000000000034461255543666200226020ustar00rootroot00000000000000#ifndef QUALITYSCORES #define QUALITYSCORES /* * qualityscores.h * Mothur * * Created by Pat Schloss on 7/12/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ //DataStructure for a quality file. 
#include "mothur.h" #include "mothurout.h" #include "sequence.hpp" /**************************************************************************************************/ class QualityScores { public: QualityScores(); ~QualityScores() {} QualityScores(string n, vector qs); QualityScores(ifstream&); #ifdef USE_BOOST QualityScores(boost::iostreams::filtering_istream&); #endif int read(ifstream&); string getName(); int getLength(){ return (int)qScores.size(); } vector getQualityScores() { return qScores; } void printQScores(ofstream&); void printQScores(ostream&); void trimQScores(int, int); void flipQScores(); bool stripQualThreshold(Sequence&, double); bool stripQualRollingAverage(Sequence&, double, bool); bool stripQualWindowAverage(Sequence&, int, int, double, bool); bool cullQualAverage(Sequence&, double, bool); void updateQScoreErrorMap(map >&, string, int, int, int); void updateForwardMap(vector >&, int, int, int); void updateReverseMap(vector >&, int, int, int); void setName(string n); void setScores(vector qs) { qScores = qs; seqLength = qScores.size(); } vector getScores() { return qScores; } private: double calculateAverage(bool); MothurOut* m; vector qScores; string seqName; int seqLength; string getSequenceName(ifstream&); #ifdef USE_BOOST string getSequenceName(boost::iostreams::filtering_istream&); #endif }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/rabundvector.cpp000066400000000000000000000160151255543666200227200ustar00rootroot00000000000000/* * rabundvector.cpp * * * Created by Pat Schloss on 8/8/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "rabundvector.hpp" #include "sabundvector.hpp" #include "ordervector.hpp" #include "calculator.h" /***********************************************************************/ RAbundVector::RAbundVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0) {} /***********************************************************************/ RAbundVector::RAbundVector(int n) : DataVector(), data(n,0) , maxRank(0), numBins(0), numSeqs(0) {} /***********************************************************************/ //RAbundVector::RAbundVector(const RAbundVector& rav) : DataVector(rav), data(rav.data), (rav.label), (rav.maxRank), (rav.numBins), (rav.numSeqs){} /***********************************************************************/ RAbundVector::RAbundVector(string id, vector rav) : DataVector(id), data(rav) { try { numBins = 0; maxRank = 0; numSeqs = 0; for(int i=0;i maxRank) { maxRank = data[i]; } numSeqs += data[i]; } } catch(exception& e) { m->errorOut(e, "RAbundVector", "RAbundVector"); exit(1); } } /***********************************************************************/ RAbundVector::RAbundVector(vector rav, int mr, int nb, int ns) { try { numBins = nb; maxRank = mr; numSeqs = ns; data = rav; } catch(exception& e) { m->errorOut(e, "RAbundVector", "RAbundVector"); exit(1); } } /***********************************************************************/ RAbundVector::RAbundVector(ifstream& f) : DataVector(), maxRank(0), numBins(0), numSeqs(0) { try { int hold; f >> label >> hold; data.assign(hold, 0); int inputData; for(int i=0;i> inputData; set(i, inputData); } } catch(exception& e) { m->errorOut(e, "RAbundVector", "RAbundVector"); exit(1); } } /***********************************************************************/ RAbundVector::~RAbundVector() { } /***********************************************************************/ void RAbundVector::set(int binNumber, int newBinSize){ try { int oldBinSize = data[binNumber]; data[binNumber] = newBinSize; if(oldBinSize == 0) 
{ numBins++; } if(newBinSize == 0) { numBins--; } if(newBinSize > maxRank) { maxRank = newBinSize; } numSeqs += (newBinSize - oldBinSize); } catch(exception& e) { m->errorOut(e, "RAbundVector", "set"); exit(1); } } /***********************************************************************/ int RAbundVector::get(int index){ return data[index]; } /***********************************************************************/ void RAbundVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; data.clear(); } /***********************************************************************/ void RAbundVector::push_back(int binSize){ try { data.push_back(binSize); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "RAbundVector", "push_back"); exit(1); } } /***********************************************************************/ void RAbundVector::pop_back(){ return data.pop_back(); } /***********************************************************************/ void RAbundVector::resize(int size){ data.resize(size); } /***********************************************************************/ int RAbundVector::size(){ return data.size(); } /***********************************************************************/ void RAbundVector::quicksort(){ sort(data.rbegin(), data.rend()); } /***********************************************************************/ int RAbundVector::sum(){ VecCalc vecCalc; return vecCalc.sumElements(data); } /***********************************************************************/ int RAbundVector::sum(int index){ VecCalc vecCalc; return vecCalc.sumElements(data, index); } /***********************************************************************/ int RAbundVector::numNZ(){ VecCalc vecCalc; return vecCalc.numNZ(data); } /***********************************************************************/ vector::reverse_iterator RAbundVector::rbegin(){ return data.rbegin(); } 
/***********************************************************************/ vector::reverse_iterator RAbundVector::rend(){ return data.rend(); } /***********************************************************************/ void RAbundVector::nonSortedPrint(ostream& output){ try { output << label << '\t' << numBins; for(int i=0;ierrorOut(e, "RAbundVector", "nonSortedPrint"); exit(1); } } /***********************************************************************/ void RAbundVector::print(string prefix, ostream& output){ try { output << prefix << '\t' << numBins; vector hold = data; sort(hold.rbegin(), hold.rend()); for(int i=0;ierrorOut(e, "RAbundVector", "print"); exit(1); } } /***********************************************************************/ void RAbundVector::print(ostream& output){ try { output << label << '\t' << numBins; vector hold = data; sort(hold.rbegin(), hold.rend()); for(int i=0;ierrorOut(e, "RAbundVector", "print"); exit(1); } } /***********************************************************************/ int RAbundVector::getNumBins(){ return numBins; } /***********************************************************************/ int RAbundVector::getNumSeqs(){ return numSeqs; } /***********************************************************************/ int RAbundVector::getMaxRank(){ return maxRank; } /***********************************************************************/ RAbundVector RAbundVector::getRAbundVector(){ return *this; } /***********************************************************************/ SAbundVector RAbundVector::getSAbundVector() { try { SAbundVector sav(maxRank+1); for(int i=0;ierrorOut(e, "RAbundVector", "getSAbundVector"); exit(1); } } /***********************************************************************/ OrderVector RAbundVector::getOrderVector(map* nameMap = NULL) { try { OrderVector ov; for(int i=0;ierrorOut(e, "RAbundVector", "getOrderVector"); exit(1); } } /***********************************************************************/ 
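The conversion from a rank-abundance vector to a species-abundance vector can be sketched in isolation as follows. This is an illustrative standalone version under stated assumptions, not the body of `RAbundVector::getSAbundVector()`. The function names `rabundToSabund` and `countSeqs` are hypothetical. Index 0 of the result is left unused so that `sabund[k]` is the number of OTUs holding exactly `k` sequences, which matches the `maxRank+1` sizing and the `sabund * abundance` bookkeeping visible in the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mothur): rabund[i] is the number of
// sequences in OTU i; the result's entry k is the number of OTUs that
// contain exactly k sequences. Entry 0 stays at zero.
std::vector<int> rabundToSabund(const std::vector<int>& rabund) {
    int maxRank = 0;
    for (std::size_t i = 0; i < rabund.size(); ++i) {
        assert(rabund[i] >= 0);  // abundances cannot be negative
        maxRank = std::max(maxRank, rabund[i]);
    }
    std::vector<int> sabund(maxRank + 1, 0);
    for (std::size_t i = 0; i < rabund.size(); ++i) {
        if (rabund[i] > 0) { ++sabund[rabund[i]]; }  // empty OTUs are skipped
    }
    return sabund;
}

// Hypothetical helper: recover the total sequence count from a sabund
// vector as the sum over k of k * sabund[k].
int countSeqs(const std::vector<int>& sabund) {
    int total = 0;
    for (std::size_t k = 1; k < sabund.size(); ++k) {
        total += static_cast<int>(k) * sabund[k];
    }
    return total;
}
```

For the rabundvector 6 3 2 1 1 used in the header comments, ranks 1 through 6 of the result read 2 1 1 0 0 1, and `countSeqs` gives back the 13 sequences.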
mothur-1.36.1/source/datastructures/rabundvector.hpp000066400000000000000000000031561255543666200227270ustar00rootroot00000000000000
#ifndef RABUND_H
#define RABUND_H

#include "datavector.hpp"

/* Data structure for a rabund file.
	This class is a child of datavector. It represents OTU information at a certain distance.
	A rabundvector can be converted into an ordervector, listvector or sabundvector.
	Each member of the internal container "data" represents an individual OTU.
	So data[0] = 6, because there are six members in that OTU.

	example:
		listvector   = a,b,c,d,e,f  g,h,i  j,k  l  m
		rabundvector = 6  3  2  1  1
		sabundvector = 2  1  1  0  0  1
		ordervector  = 1  1  1  1  1  1  2  2  2  3  3  4  5
*/

//class SAbundVector;
//class OrderVector;

class RAbundVector : public DataVector {

public:
	RAbundVector();
	RAbundVector(int);
	RAbundVector(vector<int>, int, int, int);
//	RAbundVector(const RAbundVector&);
	RAbundVector(string, vector<int>);
	RAbundVector(const RAbundVector& bv) : DataVector(bv), data(bv.data), maxRank(bv.maxRank), numBins(bv.numBins), numSeqs(bv.numSeqs){};
	RAbundVector(ifstream&);
	~RAbundVector();

	int getNumBins();
	int getNumSeqs();
	int getMaxRank();

	void set(int, int);
	int get(int);
	void push_back(int);
	void pop_back();
	void resize(int);
	int size();
	void quicksort();
	int sum();
	int sum(int);
	int numNZ();
	void clear();
	vector<int>::reverse_iterator rbegin();
	vector<int>::reverse_iterator rend();

	void print(ostream&);
	void print(string, ostream&);
	void nonSortedPrint(ostream&);

	RAbundVector getRAbundVector();
	SAbundVector getSAbundVector();
	OrderVector getOrderVector(map<string,int>*);

private:
	vector<int> data;
	int maxRank;
	int numBins;
	int numSeqs;
};

#endif
mothur-1.36.1/source/datastructures/referencedb.cpp000066400000000000000000000015561255543666200224720ustar00rootroot00000000000000
/*
 *  referencedb.cpp
 *  Mothur
 *
 *  Created by westcott on 6/29/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "referencedb.h"

//needed for testing project
//ReferenceDB* ReferenceDB::myInstance;
/******************************************************/
ReferenceDB* ReferenceDB::getInstance() {
	if(myInstance == NULL) {
		myInstance = new ReferenceDB();
	}
	return myInstance;
}
/******************************************************/
void ReferenceDB::clearMemory() {
	referenceSeqs.clear();
	setSavedReference("");
	for(int i = 0; i < wordGenusProb.size(); i++) { wordGenusProb[i].clear(); }
	wordGenusProb.clear();
	WordPairDiffArr.clear();
	setSavedTaxonomy("");
}
/*******************************************************
ReferenceDB::~ReferenceDB() { myInstance = NULL; }
*******************************************************/
mothur-1.36.1/source/datastructures/referencedb.h000066400000000000000000000021431255543666200221300ustar00rootroot00000000000000
#ifndef MYREFERENCEDB_H
#define MYREFERENCEDB_H

/*
 *  referencedb.h
 *  Mothur
 *
 *  Created by westcott on 6/29/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "mothur.h" #include "sequence.hpp" /***********************************************/ class ReferenceDB { public: static ReferenceDB* getInstance(); void clearMemory(); bool save; vector referenceSeqs; vector< vector > wordGenusProb; vector WordPairDiffArr; string getSavedReference() { return referencefile; } void setSavedReference(string p) { referencefile = p; } string getSavedTaxonomy() { return taxonomyfile; } void setSavedTaxonomy(string p) { taxonomyfile = p; } private: static ReferenceDB* myInstance; ReferenceDB() { referencefile = ""; taxonomyfile = ""; save = false; } ReferenceDB(const ReferenceDB&){}// Disable copy constructor void operator=(const ReferenceDB&){} // Disable assignment operator ~ReferenceDB(){ myInstance = 0; } string referencefile, taxonomyfile; }; /***********************************************/ #endif mothur-1.36.1/source/datastructures/reportfile.cpp000066400000000000000000000034711255543666200223770ustar00rootroot00000000000000/* * reportfile.cpp * Mothur * * Created by Pat Schloss on 12/19/10. * Copyright 2010 Schloss Lab. All rights reserved. 
 *
 */

#include "mothur.h"
#include "mothurout.h"
#include "reportfile.h"

/**************************************************************************************************/

ReportFile::ReportFile(){
	try {
		m = MothurOut::getInstance();
	}
	catch(exception& e) {
		m->errorOut(e, "ReportFile", "ReportFile");
		exit(1);
	}
}

/**************************************************************************************************/

// Opens the report file and skips its header line.
int ReportFile::readHeaders(ifstream& repFile, string repFileName){
	try {
		m->openInputFile(repFileName, repFile);
		m->getline(repFile);
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ReportFile", "readHeaders");
		exit(1);
	}
}

/**************************************************************************************************/

// Reads one whitespace-delimited record of the report file.
int ReportFile::read(ifstream& repFile){
	try {
		m = MothurOut::getInstance();

		repFile >> queryName;
		repFile >> queryLength;
		repFile >> templateName;
		repFile >> templateLength;
		repFile >> searchMethod;
		repFile >> dummySearchScore;
		repFile >> alignmentMethod;
		repFile >> queryStart;
		repFile >> queryEnd;
		repFile >> templateStart;
		repFile >> templateEnd;
		repFile >> pairwiseAlignmentLength;
		repFile >> gapsInQuery;
		repFile >> gapsInTemplate;
		repFile >> longestInsert;
		repFile >> simBtwnQueryAndTemplate;

		// The search score column may hold "nan"; treat that as a score of 0.
		if(dummySearchScore != "nan"){
			istringstream stream(dummySearchScore);
			stream >> searchScore;
		}
		else{
			searchScore = 0;
		}

		m->gobble(repFile);
		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "ReportFile", "read");
		exit(1);
	}
}

/**************************************************************************************************/
mothur-1.36.1/source/datastructures/reportfile.h000066400000000000000000000033001255543666200220330ustar00rootroot00000000000000
#ifndef REPORTFILE
#define REPORTFILE

/*
 *  reportfile.h
 *  Mothur
 *
 *  Created by Pat Schloss on 7/12/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 * */ /**************************************************************************************************/ class ReportFile { public: ReportFile(); ~ReportFile() {} int read(ifstream&); int readHeaders(ifstream&, string); string getQueryName() { return queryName; } string getTemplateName() { return templateName; } string getSearchMethod() { return searchMethod; } string getAlignmentMethod() { return alignmentMethod; } int getQueryLength() { return queryLength; } int getTemplateLength() { return templateLength; } int getQueryStart() { return queryStart; } int getQueryEnd() { return queryEnd; } int getTemplateStart() { return templateStart; } int getTemplateEnd() { return templateEnd; } int getPairwiseAlignmentLength() { return pairwiseAlignmentLength; } int getGapsInQuery() { return gapsInQuery; } int getGapsInTemplate() { return gapsInTemplate; } int getLongestInsert() { return longestInsert; } float getSearchScore() { return searchScore; } float getSimBtwnQueryAndTemplate() { return simBtwnQueryAndTemplate; } private: MothurOut* m; string queryName, templateName, searchMethod, alignmentMethod, dummySearchScore; int queryLength, templateLength, queryStart, queryEnd, templateStart, templateEnd, pairwiseAlignmentLength, gapsInQuery, gapsInTemplate, longestInsert; float searchScore, simBtwnQueryAndTemplate; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/sabundvector.cpp000066400000000000000000000126421255543666200227230ustar00rootroot00000000000000/* * sabund.cpp * * * Created by Pat Schloss on 8/8/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "sabundvector.hpp" #include "calculator.h" /***********************************************************************/ SAbundVector::SAbundVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0){} /***********************************************************************/ SAbundVector::SAbundVector(int size) : DataVector(), data(size, 0), maxRank(0), numBins(0), numSeqs(0) {} /***********************************************************************/ SAbundVector::SAbundVector(string id, vector sav) : DataVector(id), data(sav) { try { for(int i=0;ierrorOut(e, "SAbundVector", "SAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector::SAbundVector(vector dataVec, int mr, int nb, int ns) { try { data = dataVec; maxRank = mr; numBins = nb; numSeqs = ns; } catch(exception& e) { m->errorOut(e, "SAbundVector", "SAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector::SAbundVector(ifstream& f): DataVector(), maxRank(0), numBins(0), numSeqs(0) { try { int hold; f >> label >> hold; data.assign(hold+1, 0); int inputData; for(int i=1;i<=hold;i++){ f >> inputData; set(i, inputData); } } catch(exception& e) { m->errorOut(e, "SAbundVector", "SAbundVector"); exit(1); } } /***********************************************************************/ void SAbundVector::set(int sabund, int abundance){ try { int initSize = data[sabund]; data[sabund] = abundance; if(sabund != 0){ numBins += (abundance - initSize); } numSeqs += sabund * (abundance - initSize); if(sabund > maxRank) { maxRank = sabund; } } catch(exception& e) { m->errorOut(e, "SAbundVector", "set"); exit(1); } } /***********************************************************************/ int SAbundVector::get(int index){ return data[index]; } /***********************************************************************/ void SAbundVector::push_back(int abundance){ try { data.push_back(abundance); 
maxRank++; numBins += abundance; numSeqs += (maxRank * abundance); } catch(exception& e) { m->errorOut(e, "SAbundVector", "push_back"); exit(1); } } /***********************************************************************/ void SAbundVector::quicksort(){ sort(data.rbegin(), data.rend()); } /***********************************************************************/ int SAbundVector::sum(){ VecCalc vecCalc; return vecCalc.sumElements(data); } /***********************************************************************/ void SAbundVector::resize(int size){ data.resize(size); } /***********************************************************************/ int SAbundVector::size(){ return data.size(); } /***********************************************************************/ void SAbundVector::print(string prefix, ostream& output){ output << prefix << '\t' << maxRank; for(int i=1;i<=maxRank;i++){ output << '\t' << data[i]; } output << endl; } /***********************************************************************/ void SAbundVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; data.clear(); } /***********************************************************************/ void SAbundVector::print(ostream& output){ try { output << label << '\t' << maxRank; for(int i=1;i<=maxRank;i++){ output << '\t' << data[i]; } output << endl; } catch(exception& e) { m->errorOut(e, "SAbundVector", "print"); exit(1); } } /**********************************************************************/ int SAbundVector::getNumBins(){ // if(needToUpdate == 1){ updateStats(); } return numBins; } /***********************************************************************/ int SAbundVector::getNumSeqs(){ // if(needToUpdate == 1){ updateStats(); } return numSeqs; } /***********************************************************************/ int SAbundVector::getMaxRank(){ // if(needToUpdate == 1){ updateStats(); } return maxRank; } /***********************************************************************/ RAbundVector 
SAbundVector::getRAbundVector(){ try { RAbundVector rav; for(int i=1;i < data.size();i++){ for(int j=0;jerrorOut(e, "SAbundVector", "getRAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector SAbundVector::getSAbundVector(){ return *this; } /***********************************************************************/ OrderVector SAbundVector::getOrderVector(map* hold = NULL){ try { OrderVector ov; int binIndex = 0; for(int i=1;ierrorOut(e, "SAbundVector", "getOrderVector"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sabundvector.hpp000066400000000000000000000031121255543666200227200ustar00rootroot00000000000000#ifndef SABUND_H #define SABUND_H #include "datavector.hpp" #include "rabundvector.hpp" #include "ordervector.hpp" #include "calculator.h" /* Data Structure for a sabund file. This class is a child of datavector. It represents OTU information at a certain distance. A sabundvector can be converted into an ordervector, listvector or rabundvector. Each member of the internal container "data" represents the number of OTUs with that many members, starting at index 1. So data[1] = 2, because there are two OTUs with 1 member. 
example: listvector = a,b,c,d,e,f g,h,i j,k l m rabundvector = 6 3 2 1 1 sabundvector = 2 1 1 0 0 1 ordervector = 1 1 1 1 1 1 2 2 2 3 3 4 5 */ class SAbundVector : public DataVector { public: SAbundVector(); SAbundVector(int); // SAbundVector(const SAbundVector&); SAbundVector(vector, int, int, int); SAbundVector(string, vector); SAbundVector(const SAbundVector& rv) : DataVector(rv.label), data(rv.data), maxRank(rv.maxRank), numBins(rv.numBins), numSeqs(rv.numSeqs){}; SAbundVector(ifstream&); ~SAbundVector(){}; int getNumBins(); int getNumSeqs(); int getMaxRank(); void set(int, int); int get(int); void push_back(int); void quicksort(); int sum(); void resize(int); int size(); void clear(); void print(ostream&); void print(string, ostream&); RAbundVector getRAbundVector(); SAbundVector getSAbundVector(); OrderVector getOrderVector(map*); private: vector data; // bool needToUpdate; // void updateStats(); int maxRank; int numBins; int numSeqs; }; #endif mothur-1.36.1/source/datastructures/sequence.cpp000066400000000000000000000653041255543666200220370ustar00rootroot00000000000000/* * sequence.cpp * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "sequence.hpp" /***********************************************************************/ Sequence::Sequence(){ m = MothurOut::getInstance(); initialize(); } /***********************************************************************/ Sequence::Sequence(string newName, string sequence) { try { m = MothurOut::getInstance(); initialize(); name = newName; m->checkName(name); //setUnaligned removes any gap characters for us setUnaligned(sequence); setAligned(sequence); } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } /***********************************************************************/ Sequence::Sequence(string newName, string sequence, string justUnAligned) { try { m = MothurOut::getInstance(); initialize(); name = newName; m->checkName(name); //setUnaligned removes any gap characters for us setUnaligned(sequence); } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } //******************************************************************************************************************** //this function will jump over commented out sequences, but if the last sequence in a file is commented out it makes a blank seq Sequence::Sequence(istringstream& fastaString){ try { m = MothurOut::getInstance(); initialize(); name = getSequenceName(fastaString); if (!m->control_pressed) { string sequence; //read comments while ((name[0] == '#') && fastaString) { while (!fastaString.eof()) { char c = fastaString.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there sequence = getCommentString(fastaString); if (fastaString) { fastaString >> name; name = name.substr(1); }else { name = ""; break; } } //while (!fastaString.eof()) { char c = fastaString.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there comment = getCommentString(fastaString); int numAmbig = 0; sequence = getSequenceString(fastaString, numAmbig); setAligned(sequence); //setUnaligned removes 
any gap characters for us setUnaligned(sequence); if ((numAmbig / (float) numBases) > 0.25) { m->mothurOut("[WARNING]: We found more than 25% of the bases in sequence " + name + " to be ambiguous. Mothur is not setup to process protein sequences."); m->mothurOutEndLine(); } } } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } //******************************************************************************************************************** //this function will jump over commented out sequences, but if the last sequence in a file is commented out it makes a blank seq Sequence::Sequence(istringstream& fastaString, string JustUnaligned){ try { m = MothurOut::getInstance(); initialize(); name = getSequenceName(fastaString); if (!m->control_pressed) { string sequence; //read comments while ((name[0] == '#') && fastaString) { while (!fastaString.eof()) { char c = fastaString.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there sequence = getCommentString(fastaString); if (fastaString) { fastaString >> name; name = name.substr(1); }else { name = ""; break; } } //while (!fastaString.eof()) { char c = fastaString.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there comment = getCommentString(fastaString); int numAmbig = 0; sequence = getSequenceString(fastaString, numAmbig); //setUnaligned removes any gap characters for us setUnaligned(sequence); if ((numAmbig / (float) numBases) > 0.25) { m->mothurOut("[WARNING]: We found more than 25% of the bases in sequence " + name + " to be ambiguous. 
Mothur is not setup to process protein sequences."); m->mothurOutEndLine(); } } } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } //******************************************************************************************************************** //this function will jump over commented out sequences, but if the last sequence in a file is commented out it makes a blank seq Sequence::Sequence(ifstream& fastaFile){ try { m = MothurOut::getInstance(); initialize(); name = getSequenceName(fastaFile); if (!m->control_pressed) { string sequence; //read comments while ((name[0] == '#') && fastaFile) { while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there sequence = getCommentString(fastaFile); if (fastaFile) { fastaFile >> name; name = name.substr(1); }else { name = ""; break; } } //while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there comment = getCommentString(fastaFile); int numAmbig = 0; sequence = getSequenceString(fastaFile, numAmbig); setAligned(sequence); //setUnaligned removes any gap characters for us setUnaligned(sequence); if ((numAmbig / (float) numBases) > 0.25) { m->mothurOut("[WARNING]: We found more than 25% of the bases in sequence " + name + " to be ambiguous. 
Mothur is not setup to process protein sequences."); m->mothurOutEndLine(); } } } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } //******************************************************************************************************************** //this function will jump over commented out sequences, but if the last sequence in a file is commented out it makes a blank seq #ifdef USE_BOOST Sequence::Sequence(boost::iostreams::filtering_istream& fastaFile){ try { m = MothurOut::getInstance(); initialize(); name = getSequenceName(fastaFile); if (!m->control_pressed) { string sequence; //read comments while ((name[0] == '#') && fastaFile) { while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there sequence = getCommentString(fastaFile); if (fastaFile) { fastaFile >> name; name = name.substr(1); }else { name = ""; break; } } //while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there comment = getCommentString(fastaFile); int numAmbig = 0; sequence = getSequenceString(fastaFile, numAmbig); setAligned(sequence); //setUnaligned removes any gap characters for us setUnaligned(sequence); if ((numAmbig / (float) numBases) > 0.25) { m->mothurOut("[WARNING]: We found more than 25% of the bases in sequence " + name + " to be ambiguous. 
Mothur is not setup to process protein sequences."); m->mothurOutEndLine(); } } } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } #endif //******************************************************************************************************************** //this function will jump over commented out sequences, but if the last sequence in a file is commented out it makes a blank seq Sequence::Sequence(ifstream& fastaFile, string& extraInfo, bool getInfo){ try { m = MothurOut::getInstance(); initialize(); extraInfo = ""; name = getSequenceName(fastaFile); if (!m->control_pressed) { string sequence; //read comments while ((name[0] == '#') && fastaFile) { while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there sequence = getCommentString(fastaFile); if (fastaFile) { fastaFile >> name; name = name.substr(1); }else { name = ""; break; } } //read info after sequence name while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13 || c == -1){ break; } extraInfo += c; } comment = extraInfo; int numAmbig = 0; sequence = getSequenceString(fastaFile, numAmbig); setAligned(sequence); //setUnaligned removes any gap characters for us setUnaligned(sequence); if ((numAmbig / (float) numBases) > 0.25) { m->mothurOut("[WARNING]: We found more than 25% of the bases in sequence " + name + " to be ambiguous. 
Mothur is not setup to process protein sequences."); m->mothurOutEndLine(); } } } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } //******************************************************************************************************************** //this function will jump over commented out sequences, but if the last sequence in a file is commented out it makes a blank seq Sequence::Sequence(ifstream& fastaFile, string JustUnaligned){ try { m = MothurOut::getInstance(); initialize(); name = getSequenceName(fastaFile); if (!m->control_pressed) { string sequence; //read comments while ((name[0] == '#') && fastaFile) { while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there sequence = getCommentString(fastaFile); if (fastaFile) { fastaFile >> name; name = name.substr(1); }else { name = ""; break; } } //while (!fastaFile.eof()) { char c = fastaFile.get(); if (c == 10 || c == 13){ break; } } // get rest of line if there's any crap there comment = getCommentString(fastaFile); int numAmbig = 0; sequence = getSequenceString(fastaFile, numAmbig); //setUnaligned removes any gap characters for us setUnaligned(sequence); if ((numAmbig / (float) numBases) > 0.25) { m->mothurOut("[WARNING]: We found more than 25% of the bases in sequence " + name + " to be ambiguous. Mothur is not setup to process protein sequences."); m->mothurOutEndLine(); } } } catch(exception& e) { m->errorOut(e, "Sequence", "Sequence"); exit(1); } } //******************************************************************************************************************** string Sequence::getSequenceName(ifstream& fastaFile) { try { string name = ""; fastaFile >> name; if (name.length() != 0) { name = name.substr(1); m->checkName(name); }else{ m->mothurOut("Error in reading your fastafile, at position " + toString(fastaFile.tellg()) + ". 
Blank name."); m->mothurOutEndLine(); m->control_pressed = true; } return name; } catch(exception& e) { m->errorOut(e, "Sequence", "getSequenceName"); exit(1); } } //******************************************************************************************************************** #ifdef USE_BOOST string Sequence::getSequenceName(boost::iostreams::filtering_istream& fastaFile) { try { string name = ""; fastaFile >> name; if (name.length() != 0) { name = name.substr(1); m->checkName(name); }else{ m->mothurOut("Error in reading your fastafile, at position " + toString(fastaFile.tellg()) + ". Blank name."); m->mothurOutEndLine(); m->control_pressed = true; } return name; } catch(exception& e) { m->errorOut(e, "Sequence", "getSequenceName"); exit(1); } } #endif //******************************************************************************************************************** string Sequence::getSequenceName(istringstream& fastaFile) { try { string name = ""; fastaFile >> name; if (name.length() != 0) { name = name.substr(1); m->checkName(name); }else{ m->mothurOut("Error in reading your fastafile, at position " + toString(fastaFile.tellg()) + ". Blank name."); m->mothurOutEndLine(); m->control_pressed = true; } return name; } catch(exception& e) { m->errorOut(e, "Sequence", "getSequenceName"); exit(1); } } //******************************************************************************************************************** string Sequence::getSequenceString(ifstream& fastaFile, int& numAmbig) { try { char letter; string sequence = ""; numAmbig = 0; while(fastaFile){ letter= fastaFile.get(); if(letter == '>'){ fastaFile.putback(letter); break; }else if (letter == ' ') {;} else if(isprint(letter)){ letter = toupper(letter); if(letter == 'U'){letter = 'T';} if(letter != '.' 
&& letter != '-' && letter != 'A' && letter != 'T' && letter != 'G' && letter != 'C' && letter != 'N'){ letter = 'N'; numAmbig++; } sequence += letter; } } return sequence; } catch(exception& e) { m->errorOut(e, "Sequence", "getSequenceString"); exit(1); } } //******************************************************************************************************************** #ifdef USE_BOOST string Sequence::getSequenceString(boost::iostreams::filtering_istream& fastaFile, int& numAmbig) { try { char letter; string sequence = ""; numAmbig = 0; while(fastaFile){ letter= fastaFile.get(); if(letter == '>'){ fastaFile.putback(letter); break; }else if (letter == ' ') {;} else if(isprint(letter)){ letter = toupper(letter); if(letter == 'U'){letter = 'T';} if(letter != '.' && letter != '-' && letter != 'A' && letter != 'T' && letter != 'G' && letter != 'C' && letter != 'N'){ letter = 'N'; numAmbig++; } sequence += letter; } } return sequence; } catch(exception& e) { m->errorOut(e, "Sequence", "getSequenceString"); exit(1); } } #endif //******************************************************************************************************************** //comment can contain '>' so we need to account for that string Sequence::getCommentString(ifstream& fastaFile) { try { char letter; string temp = ""; while(fastaFile){ letter=fastaFile.get(); if((letter == '\r') || (letter == '\n') || letter == -1){ m->gobble(fastaFile); //in case its a \r\n situation break; }else { temp += letter; } } return temp; } catch(exception& e) { m->errorOut(e, "Sequence", "getCommentString"); exit(1); } } //******************************************************************************************************************** #ifdef USE_BOOST //comment can contain '>' so we need to account for that string Sequence::getCommentString(boost::iostreams::filtering_istream& fastaFile) { try { char letter; string temp = ""; while(fastaFile){ letter=fastaFile.get(); if((letter == '\r') || (letter == '\n') || 
letter == -1){ m->gobble(fastaFile); //in case its a \r\n situation break; }else { temp += letter; } } return temp; } catch(exception& e) { m->errorOut(e, "Sequence", "getCommentString"); exit(1); } } #endif //******************************************************************************************************************** string Sequence::getSequenceString(istringstream& fastaFile, int& numAmbig) { try { char letter; string sequence = ""; numAmbig = 0; while(!fastaFile.eof()){ letter= fastaFile.get(); if(letter == '>'){ fastaFile.putback(letter); break; }else if (letter == ' ') {;} else if(isprint(letter)){ letter = toupper(letter); if(letter == 'U'){letter = 'T';} if(letter != '.' && letter != '-' && letter != 'A' && letter != 'T' && letter != 'G' && letter != 'C' && letter != 'N'){ letter = 'N'; numAmbig++; } sequence += letter; } } return sequence; } catch(exception& e) { m->errorOut(e, "Sequence", "getSequenceString"); exit(1); } } //******************************************************************************************************************** //comment can contain '>' so we need to account for that string Sequence::getCommentString(istringstream& fastaFile) { try { char letter; string temp = ""; while(fastaFile){ letter=fastaFile.get(); if((letter == '\r') || (letter == '\n') || letter == -1){ m->gobble(fastaFile); //in case its a \r\n situation break; }else { temp += letter; } } return temp; } catch(exception& e) { m->errorOut(e, "Sequence", "getCommentString"); exit(1); } } //******************************************************************************************************************** void Sequence::initialize(){ name = ""; unaligned = ""; aligned = ""; pairwise = ""; comment = ""; numBases = 0; alignmentLength = 0; isAligned = 0; startPos = -1; endPos = -1; longHomoPolymer = -1; ambigBases = -1; } //******************************************************************************************************************** void Sequence::setName(string 
seqName) { if(seqName[0] == '>') { name = seqName.substr(1); } else { name = seqName; } } //******************************************************************************************************************** void Sequence::setUnaligned(string sequence){ if(sequence.find_first_of('.') != string::npos || sequence.find_first_of('-') != string::npos) { string temp = ""; for(int j=0;j=0;i--){ if(aligned[i] == '-'){ aligned[i] = '.'; } else{ break; } } } isAligned = 1; } //******************************************************************************************************************** void Sequence::setPairwise(string sequence){ pairwise = sequence; } //******************************************************************************************************************** string Sequence::convert2ints() { if(unaligned == "") { /* need to throw an error */ } string processed; for(int i=0;i" << name << comment << endl; if(isAligned){ out << aligned << endl; } else{ out << unaligned << endl; } } //******************************************************************************************************************** int Sequence::getAlignLength(){ return alignmentLength; } //******************************************************************************************************************** int Sequence::getAmbigBases(){ if(ambigBases == -1){ ambigBases = 0; for(int j=0;j longHomoPolymer){ longHomoPolymer = homoPolymer; } homoPolymer = 1; } } if(homoPolymer > longHomoPolymer){ longHomoPolymer = homoPolymer; } } return longHomoPolymer; } //******************************************************************************************************************** int Sequence::getStartPos(){ if(startPos == -1){ for(int j = 0; j < alignmentLength; j++) { if((aligned[j] != '.')&&(aligned[j] != '-')){ startPos = j + 1; break; } } } if(isAligned == 0){ startPos = 1; } return startPos; } 
//******************************************************************************************************************** void Sequence::padToPos(int start){ for(int j = startPos-1; j < start-1; j++) { aligned[j] = '.'; } startPos = start; } //******************************************************************************************************************** int Sequence::filterToPos(int start){ if (start > aligned.length()) { start = aligned.length(); m->mothurOut("[ERROR]: start too large.\n"); } for(int j = 0; j < start; j++) { aligned[j] = '.'; } //things like ......----------AT become ................AT for(int j = start; j < aligned.length(); j++) { if (isalpha(aligned[j])) { break; } else { aligned[j] = '.'; } } setUnaligned(aligned); return 0; } //******************************************************************************************************************** int Sequence::filterFromPos(int end){ if (end > aligned.length()) { end = aligned.length(); m->mothurOut("[ERROR]: end too large.\n"); } for(int j = end; j < aligned.length(); j++) { aligned[j] = '.'; } for(int j = aligned.length()-1; j >= 0; j--) { if (isalpha(aligned[j])) { break; } else { aligned[j] = '.'; } } setUnaligned(aligned); return 0; } //******************************************************************************************************************** int Sequence::getEndPos(){ if(endPos == -1){ for(int j=alignmentLength-1;j>=0;j--){ if((aligned[j] != '.')&&(aligned[j] != '-')){ endPos = j + 1; break; } } } if(isAligned == 0){ endPos = numBases; } return endPos; } //******************************************************************************************************************** void Sequence::padFromPos(int end){ //cout << end << '\t' << endPos << endl; for(int j = end; j < endPos; j++) { aligned[j] = '.'; } endPos = end; } //******************************************************************************************************************** bool Sequence::getIsAligned(){ return isAligned; 
} //******************************************************************************************************************** void Sequence::setComment(string c){ comment = c; } //******************************************************************************************************************** void Sequence::reverseComplement(){ string temp; for(int i=numBases-1;i>=0;i--){ if(unaligned[i] == 'A') { temp += 'T'; } else if(unaligned[i] == 'T'){ temp += 'A'; } else if(unaligned[i] == 'G'){ temp += 'C'; } else if(unaligned[i] == 'C'){ temp += 'G'; } else { temp += 'N'; } } unaligned = temp; aligned = temp; } //******************************************************************************************************************** void Sequence::trim(int length){ if(numBases > length){ unaligned = unaligned.substr(0,length); numBases = length; aligned = ""; isAligned = 0; } } ///**************************************************************************************************/ mothur-1.36.1/source/datastructures/sequence.hpp000066400000000000000000000054021255543666200220350ustar00rootroot00000000000000#ifndef SEQUENCE_H #define SEQUENCE_H /* * sequence.h * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * A sequence object has four components: i) an accession number / name, ii) the unaligned primary sequence, iii) a * pairwise aligned sequence, and iv) a sequence that is aligned to a reference alignment. This class has methods * to set and get these values for the other classes where they are needed. * * */ //Data Structure for a fasta file. 
#include "mothur.h" #include "mothurout.h" /**************************************************************************************************/ class Sequence { public: Sequence(); Sequence(string, string); Sequence(ifstream&); Sequence(ifstream&, string&, bool); Sequence(istringstream&); //these constructors just set the unaligned string to save space Sequence(string, string, string); Sequence(ifstream&, string); Sequence(istringstream&, string); #ifdef USE_BOOST Sequence(boost::iostreams::filtering_istream&); #endif ~Sequence() {} void setName(string); void setUnaligned(string); void setPairwise(string); void setAligned(string); void setComment(string); void setLength(); void reverseComplement(); void trim(int); string convert2ints(); string getName(); string getAligned(); string getPairwise(); string getUnaligned(); string getInlineSeq(); string getComment(); int getNumNs(); int getNumBases(); int getStartPos(); int getEndPos(); void padToPos(int); void padFromPos(int); int filterToPos(int); //any character before the pos is changed to . and aligned and unaligned strings changed int filterFromPos(int); //any character after the pos is changed to . 
and aligned and unaligned strings changed int getAlignLength(); int getAmbigBases(); void removeAmbigBases(); int getLongHomoPolymer(); bool getIsAligned(); void printSequence(ostream&); private: MothurOut* m; void initialize(); string getSequenceString(ifstream&, int&); string getCommentString(ifstream&); string getSequenceString(istringstream&, int&); string getCommentString(istringstream&); string getSequenceName(ifstream&); #ifdef USE_BOOST string getCommentString(boost::iostreams::filtering_istream&); string getSequenceString(boost::iostreams::filtering_istream&, int&); string getSequenceName(boost::iostreams::filtering_istream&); #endif string getSequenceName(istringstream&); string name; string unaligned; string aligned; string pairwise; string comment; int numBases; int alignmentLength; bool isAligned; int longHomoPolymer; int ambigBases; int startPos, endPos; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/datastructures/sequencecountparser.cpp000066400000000000000000000225211255543666200243170ustar00rootroot00000000000000// // sequencecountparser.cpp // Mothur // // Created by Sarah Westcott on 8/7/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "sequencecountparser.h" /************************************************************/ SequenceCountParser::SequenceCountParser(string countfile, string fastafile) { try { m = MothurOut::getInstance(); //read count file CountTable countTable; countTable.readTable(countfile, true, false); //initialize maps namesOfGroups = countTable.getNamesOfGroups(); for (int i = 0; i < namesOfGroups.size(); i++) { vector<Sequence> temp; map<string, int> tempMap; seqs[namesOfGroups[i]] = temp; countTablePerGroup[namesOfGroups[i]] = tempMap; } //read fasta file making sure each sequence is in the group file ifstream in; m->openInputFile(fastafile, in); int fastaCount = 0; while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); fastaCount++; if (m->debug) { if((fastaCount) % 1000 == 0){ m->mothurOut("[DEBUG]: reading seq " + toString(fastaCount) + ".\n"); } } if (seq.getName() != "") { allSeqsMap[seq.getName()] = seq.getName(); vector<int> groupCounts = countTable.getGroupCounts(seq.getName()); for (int i = 0; i < namesOfGroups.size(); i++) { if (groupCounts[i] != 0) { seqs[namesOfGroups[i]].push_back(seq); countTablePerGroup[namesOfGroups[i]][seq.getName()] = groupCounts[i]; } } } } in.close(); } catch(exception& e) { m->errorOut(e, "SequenceCountParser", "SequenceCountParser"); exit(1); } } /************************************************************/ SequenceCountParser::SequenceCountParser(string fastafile, CountTable& countTable) { try { m = MothurOut::getInstance(); //initialize maps if (countTable.hasGroupInfo()) { namesOfGroups = countTable.getNamesOfGroups(); for (int i = 0; i < namesOfGroups.size(); i++) { vector<Sequence> temp; map<string, int> tempMap; seqs[namesOfGroups[i]] = temp; countTablePerGroup[namesOfGroups[i]] = tempMap; } //read fasta file making sure each sequence is in the group file ifstream in; m->openInputFile(fastafile, in); int fastaCount = 0; while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); fastaCount++; if (m->debug) { 
if((fastaCount) % 1000 == 0){ m->mothurOut("[DEBUG]: reading seq " + toString(fastaCount) + "\n."); } } if (seq.getName() != "") { allSeqsMap[seq.getName()] = seq.getName(); vector<int> groupCounts = countTable.getGroupCounts(seq.getName()); for (int i = 0; i < namesOfGroups.size(); i++) { if (groupCounts[i] != 0) { seqs[namesOfGroups[i]].push_back(seq); countTablePerGroup[namesOfGroups[i]][seq.getName()] = groupCounts[i]; } } } } in.close(); }else { m->control_pressed = true; m->mothurOut("[ERROR]: cannot parse fasta file by group with a count table that does not include group data, please correct.\n"); } } catch(exception& e) { m->errorOut(e, "SequenceCountParser", "SequenceCountParser"); exit(1); } } /************************************************************/ SequenceCountParser::~SequenceCountParser(){ } /************************************************************/ int SequenceCountParser::getNumGroups(){ return namesOfGroups.size(); } /************************************************************/ vector<string> SequenceCountParser::getNamesOfGroups(){ return namesOfGroups; } /************************************************************/ int SequenceCountParser::getNumSeqs(string g){ try { map<string, vector<Sequence> >::iterator it; int num = 0; it = seqs.find(g); if(it == seqs.end()) { m->mothurOut("[ERROR]: " + g + " is not a valid group, please correct."); m->mothurOutEndLine(); }else { num = (it->second).size(); } return num; } catch(exception& e) { m->errorOut(e, "SequenceCountParser", "getNumSeqs"); exit(1); } } /************************************************************/ vector<Sequence> SequenceCountParser::getSeqs(string g){ try { map<string, vector<Sequence> >::iterator it; vector<Sequence> seqForThisGroup; it = seqs.find(g); if(it == seqs.end()) { m->mothurOut("[ERROR]: No sequences available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { seqForThisGroup = it->second; if (m->debug) { m->mothurOut("[DEBUG]: group " + g + " fasta file has " + toString(seqForThisGroup.size()) + " sequences."); } } return 
seqForThisGroup; } catch(exception& e) { m->errorOut(e, "SequenceCountParser", "getSeqs"); exit(1); } } /************************************************************/ int SequenceCountParser::getSeqs(string g, string filename, bool uchimeFormat=false){ try { map<string, vector<Sequence> >::iterator it; vector<Sequence> seqForThisGroup; vector<seqPriorityNode> nameVector; it = seqs.find(g); if(it == seqs.end()) { m->mothurOut("[ERROR]: No sequences available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { ofstream out; m->openOutputFile(filename, out); seqForThisGroup = it->second; if (uchimeFormat) { // format should look like //>seqName /ab=numRedundantSeqs/ //sequence map<string, int> countForThisGroup = getCountTable(g); map<string, int>::iterator itCount; int error = 0; for (int i = 0; i < seqForThisGroup.size(); i++) { itCount = countForThisGroup.find(seqForThisGroup[i].getName()); if (itCount == countForThisGroup.end()){ error = 1; m->mothurOut("[ERROR]: " + seqForThisGroup[i].getName() + " is in your fastafile, but is not in your count file, please correct."); m->mothurOutEndLine(); }else { seqPriorityNode temp(itCount->second, seqForThisGroup[i].getAligned(), seqForThisGroup[i].getName()); nameVector.push_back(temp); } } if (error == 1) { out.close(); m->mothurRemove(filename); return 1; } //sort by num represented sort(nameVector.begin(), nameVector.end(), compareSeqPriorityNodes); //print new file in order of for (int i = 0; i < nameVector.size(); i++) { if(m->control_pressed) { out.close(); m->mothurRemove(filename); return 1; } out << ">" << nameVector[i].name << "/ab=" << nameVector[i].numIdentical << "/" << endl << nameVector[i].seq << endl; // } }else { //m->mothurOut("Group " + g + " contains " + toString(seqForThisGroup.size()) + " unique seqs.\n"); for (int i = 0; i < seqForThisGroup.size(); i++) { if(m->control_pressed) { out.close(); m->mothurRemove(filename); return 1; } seqForThisGroup[i].printSequence(out); } } out.close(); } return 0; } catch(exception& e) { m->errorOut(e, "SequenceCountParser", 
"getSeqs"); exit(1); } } /************************************************************/ map SequenceCountParser::getCountTable(string g){ try { map >::iterator it; map countForThisGroup; it = countTablePerGroup.find(g); if(it == countTablePerGroup.end()) { m->mothurOut("[ERROR]: No countTable available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { countForThisGroup = it->second; if (m->debug) { m->mothurOut("[DEBUG]: group " + g + " count file has " + toString(countForThisGroup.size()) + " unique sequences."); } } return countForThisGroup; } catch(exception& e) { m->errorOut(e, "SequenceCountParser", "getCountTable"); exit(1); } } /************************************************************/ int SequenceCountParser::getCountTable(string g, string filename){ try { map >::iterator it; map countForThisGroup; it = countTablePerGroup.find(g); if(it == countTablePerGroup.end()) { m->mothurOut("[ERROR]: No countTable available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { countForThisGroup = it->second; ofstream out; m->openOutputFile(filename, out); out << "Representative_Sequence\ttotal\t" << g << endl; for (map::iterator itFile = countForThisGroup.begin(); itFile != countForThisGroup.end(); itFile++) { if(m->control_pressed) { out.close(); m->mothurRemove(filename); return 1; } out << itFile->first << '\t' << itFile->second << '\t' << itFile->second << endl; } out.close(); } return 0; } catch(exception& e) { m->errorOut(e, "SequenceParser", "getCountTable"); exit(1); } } /************************************************************/ mothur-1.36.1/source/datastructures/sequencecountparser.h000066400000000000000000000042021255543666200237600ustar00rootroot00000000000000#ifndef Mothur_sequencecountparser_h #define Mothur_sequencecountparser_h // // sequencecountparser.h // Mothur // // Created by Sarah Westcott on 8/7/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. 
// #include "mothur.h" #include "mothurout.h" #include "sequence.hpp" #include "counttable.h" /* This class reads a fasta and count file and parses the data by group. The countfile must contain group information. Note: The sum of all the groups unique sequences will be larger than the original number of unique sequences. This is because when we parse the count file we make a unique for each group instead of 1 unique for all groups. */ class SequenceCountParser { public: SequenceCountParser(string, string); //count, fasta - file mismatches will set m->control_pressed = true SequenceCountParser(string, CountTable&); //fasta, counttable - file mismatches will set m->control_pressed = true ~SequenceCountParser(); //general operations int getNumGroups(); vector<string> getNamesOfGroups(); int getNumSeqs(string); //returns the number of unique sequences in a specific group vector<Sequence> getSeqs(string); //returns unique sequences in a specific group map<string, int> getCountTable(string); //returns seqName -> numberOfRedundantSeqs for a specific group - the count file format, but each line is parsed by group. int getSeqs(string, string, bool); //prints unique sequences in a specific group to a file - group, filename, uchimeFormat=false int getCountTable(string, string); //print seqName -> numberRedundantSeqs for a specific group - group, filename map<string, string> getAllSeqsMap(){ return allSeqsMap; } //returns map where the key=sequenceName and the value=representativeSequence - helps us remove duplicates after group by group processing private: CountTable countTable; MothurOut* m; int numSeqs; map<string, string> allSeqsMap; map<string, vector<Sequence> > seqs; //a vector for each group map<string, map<string, int> > countTablePerGroup; //countTable for each group vector<string> namesOfGroups; }; #endif mothur-1.36.1/source/datastructures/sequencedb.cpp000066400000000000000000000116441255543666200223430ustar00rootroot00000000000000/* * sequencedb.cpp * Mothur * * Created by Thomas Ryabin on 4/13/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sequencedb.h" #include "sequence.hpp" #include "mothur.h" #include "calculator.h" /***********************************************************************/ SequenceDB::SequenceDB() { m = MothurOut::getInstance(); length = 0; samelength = true; } /***********************************************************************/ //the clear function frees the memory SequenceDB::~SequenceDB() { clear(); } /***********************************************************************/ SequenceDB::SequenceDB(int newSize) { m = MothurOut::getInstance(); data.resize(newSize, Sequence()); length = 0; samelength = true; } /***********************************************************************/ SequenceDB::SequenceDB(ifstream& filehandle) { try{ m = MothurOut::getInstance(); length = 0; samelength = true; //read through file while (!filehandle.eof()) { //input sequence info into sequencedb Sequence newSequence(filehandle); if (newSequence.getName() != "") { if (length == 0) { length = newSequence.getAligned().length(); } if (length != newSequence.getAligned().length()) { samelength = false; } data.push_back(newSequence); } //takes care of white space m->gobble(filehandle); } filehandle.close(); } catch(exception& e) { m->errorOut(e, "SequenceDB", "SequenceDB"); exit(1); } } /*******************************************************************************/ string SequenceDB::readName(ifstream& in) { try{ string name = ""; int c; string temp; while ((c = in.get()) != EOF) { //if c is not a line return if (c != 10) { name += c; }else { break; } } return name; } catch(exception& e) { m->errorOut(e, "SequenceDB", "readName"); exit(1); } } /*******************************************************************************/ string SequenceDB::readSequence(ifstream& in) { try{ string sequence = ""; string line; int pos, c; while (!in.eof()) { //save position in file in case next line is a new name. 
pos = in.tellg(); line = ""; in >> line; //if you are at a new name if (line[0] == '>') { //put file pointer back since you are now at a new name in.seekg(pos, ios::beg); c = in.get(); //because you put it back to a newline char break; }else { sequence += line; } } if (length == 0) { length = sequence.length(); } if (length != sequence.length()) { samelength = false; } return sequence; } catch(exception& e) { m->errorOut(e, "SequenceDB", "readSequence"); exit(1); } } /***********************************************************************/ int SequenceDB::getNumSeqs() { return data.size(); } /***********************************************************************/ void SequenceDB::set(int index, string newUnaligned) { try { if (length == 0) { length = newUnaligned.length(); } if (length != newUnaligned.length()) { samelength = false; } data[index] = Sequence(data[index].getName(), newUnaligned); } catch(exception& e) { m->errorOut(e, "SequenceDB", "set"); exit(1); } } /***********************************************************************/ void SequenceDB::set(int index, Sequence newSeq) { try { if (length == 0) { length = newSeq.getAligned().length(); } if (length != newSeq.getAligned().length()) { samelength = false; } data[index] = newSeq; } catch(exception& e) { m->errorOut(e, "SequenceDB", "set"); exit(1); } } /***********************************************************************/ Sequence SequenceDB::get(int index) { return data[index]; } /***********************************************************************/ void SequenceDB::resize(int newSize) { try { data.resize(newSize); } catch(exception& e) { m->errorOut(e, "SequenceDB", "resize"); exit(1); } } /***********************************************************************/ void SequenceDB::clear() { try { data.clear(); } catch(exception& e) { m->errorOut(e, "SequenceDB", "clear"); exit(1); } } /***********************************************************************/ int SequenceDB::size() { return 
data.size(); } /***********************************************************************/ void SequenceDB::print(ostream& out) { try { for(int i = 0; i < data.size(); i++) { data[i].printSequence(out); } } catch(exception& e) { m->errorOut(e, "SequenceDB", "print"); exit(1); } } /***********************************************************************/ void SequenceDB::push_back(Sequence newSequence) { try { if (length == 0) { length = newSequence.getAligned().length(); } if (length != newSequence.getAligned().length()) { samelength = false; } data.push_back(newSequence); } catch(exception& e) { m->errorOut(e, "SequenceDB", "push_back"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sequencedb.h000066400000000000000000000025121255543666200220020ustar00rootroot00000000000000#ifndef SEQUENCEDB_H #define SEQUENCEDB_H /* * sequencedb.h * Mothur * * Created by Thomas Ryabin on 4/13/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class is a container to store the sequences. 
*/ #include "sequence.hpp" #include "calculator.h" class SequenceDB { public: SequenceDB(); SequenceDB(int); //makes data that size SequenceDB(ifstream&); //reads file to fill data SequenceDB(const SequenceDB& sdb) : data(sdb.data), m(sdb.m), samelength(sdb.samelength), length(sdb.length) {}; ~SequenceDB(); //calls clear() to release the stored sequences int getNumSeqs(); void set(int, string); //unaligned - should also set length void set(int, Sequence); //unaligned - should also set length Sequence get(int); //returns the sequence at that location void push_back(Sequence); //adds unaligned sequence void resize(int); //resizes data void clear(); //clears data - sequences are stored by value, so no per-sequence delete is needed int size(); //returns the size of data void print(ostream&); //loops through data using sequence class print bool sameLength() { return samelength; } private: vector<Sequence> data; string readName(ifstream&); string readSequence(ifstream&); MothurOut* m; bool samelength; int length; }; #endif mothur-1.36.1/source/datastructures/sequenceparser.cpp000066400000000000000000000435351255543666200232560ustar00rootroot00000000000000/* * sequenceParser.cpp * Mothur * * Created by westcott on 9/9/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ #include "sequenceparser.h" /************************************************************/ SequenceParser::SequenceParser(string groupFile, string fastaFile, string nameFile) { try { m = MothurOut::getInstance(); int error; //read group file groupMap = new GroupMap(groupFile); error = groupMap->readMap(); if (error == 1) { m->control_pressed = true; } //initialize maps vector<string> namesOfGroups = groupMap->getNamesOfGroups(); for (int i = 0; i < namesOfGroups.size(); i++) { vector<Sequence> temp; map<string, string> tempMap; seqs[namesOfGroups[i]] = temp; nameMapPerGroup[namesOfGroups[i]] = tempMap; } //read fasta file making sure each sequence is in the group file ifstream in; m->openInputFile(fastaFile, in); map<string, string> seqName; //stores name -> sequence string so we can make new "unique" sequences when we parse the name file int fastaCount = 0; while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); fastaCount++; if (m->debug) { if((fastaCount) % 1000 == 0){ m->mothurOut("[DEBUG]: reading seq " + toString(fastaCount) + "\n."); } } if (seq.getName() != "") { string group = groupMap->getGroup(seq.getName()); if (group == "not found") { error = 1; m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your groupfile, please correct."); m->mothurOutEndLine(); } else { seqs[group].push_back(seq); seqName[seq.getName()] = seq.getAligned(); } } } in.close(); if (error == 1) { m->control_pressed = true; } //read name file ifstream inName; m->openInputFile(nameFile, inName); //string first, second; int countName = 0; set<string> thisnames1; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!inName.eof()) { if (m->control_pressed) { break; } inName.read(buffer, 4096); vector<string> pieces = m->splitWhiteSpace(rest, buffer, inName.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; 
columnOne=true; } if (pairDone) { //save one line if (m->debug) { m->mothurOut("[DEBUG]: reading names: " + firstCol + '\t' + secondCol + ".\n"); } vector<string> names; m->splitAtChar(secondCol, names, ','); //get aligned string for these seqs from the fasta file string alignedString = ""; map<string, string>::iterator itAligned = seqName.find(names[0]); if (itAligned == seqName.end()) { error = 1; m->mothurOut("[ERROR]: " + names[0] + " is in your name file and not in your fasta file, please correct."); m->mothurOutEndLine(); }else { alignedString = itAligned->second; } //separate by group - parse one line in name file map<string, string> splitMap; //group -> name1,name2,... map<string, string>::iterator it; for (int i = 0; i < names.size(); i++) { string group = groupMap->getGroup(names[i]); if (group == "not found") { error = 1; m->mothurOut("[ERROR]: " + names[i] + " is in your name file and not in your groupfile, please correct."); m->mothurOutEndLine(); } else { it = splitMap.find(group); if (it != splitMap.end()) { //adding seqs to this group (it->second) += "," + names[i]; thisnames1.insert(names[i]); countName++; }else { //first sighting of this group splitMap[group] = names[i]; countName++; thisnames1.insert(names[i]); //is this seq in the fasta file? if (i != 0) { //if not then we need to add a duplicate sequence to the seqs for this group so the new "fasta" and "name" files will match Sequence tempSeq(names[i], alignedString); //get the first guys sequence string since he's in the fasta file. 
seqs[group].push_back(tempSeq); } } } allSeqsMap[names[i]] = names[0]; } //fill nameMapPerGroup - holds all lines in namefile separated by group for (it = splitMap.begin(); it != splitMap.end(); it++) { //grab first name string firstName = ""; for(int i = 0; i < (it->second).length(); i++) { if (((it->second)[i]) != ',') { firstName += ((it->second)[i]); }else { break; } } //group1 -> seq1 -> seq1,seq2,seq3 nameMapPerGroup[it->first][firstName] = it->second; } pairDone = false; } } } inName.close(); //in case file does not end in white space if (rest != "") { vector<string> pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { //save one line if (m->debug) { m->mothurOut("[DEBUG]: reading names: " + firstCol + '\t' + secondCol + ".\n"); } vector<string> names; m->splitAtChar(secondCol, names, ','); //get aligned string for these seqs from the fasta file string alignedString = ""; map<string, string>::iterator itAligned = seqName.find(names[0]); if (itAligned == seqName.end()) { error = 1; m->mothurOut("[ERROR]: " + names[0] + " is in your name file and not in your fasta file, please correct."); m->mothurOutEndLine(); }else { alignedString = itAligned->second; } //separate by group - parse one line in name file map<string, string> splitMap; //group -> name1,name2,... map<string, string>::iterator it; for (int i = 0; i < names.size(); i++) { string group = groupMap->getGroup(names[i]); if (group == "not found") { error = 1; m->mothurOut("[ERROR]: " + names[i] + " is in your name file and not in your groupfile, please correct."); m->mothurOutEndLine(); } else { it = splitMap.find(group); if (it != splitMap.end()) { //adding seqs to this group (it->second) += "," + names[i]; thisnames1.insert(names[i]); countName++; }else { //first sighting of this group splitMap[group] = names[i]; countName++; thisnames1.insert(names[i]); //is this seq in the fasta file? 
if (i != 0) { //if not then we need to add a duplicate sequence to the seqs for this group so the new "fasta" and "name" files will match Sequence tempSeq(names[i], alignedString); //get the first guys sequence string since he's in the fasta file. seqs[group].push_back(tempSeq); } } } allSeqsMap[names[i]] = names[0]; } //fill nameMapPerGroup - holds all lines in namefile separated by group for (it = splitMap.begin(); it != splitMap.end(); it++) { //grab first name string firstName = ""; for(int i = 0; i < (it->second).length(); i++) { if (((it->second)[i]) != ',') { firstName += ((it->second)[i]); }else { break; } } //group1 -> seq1 -> seq1,seq2,seq3 nameMapPerGroup[it->first][firstName] = it->second; } pairDone = false; } } } if (error == 1) { m->control_pressed = true; } if (countName != (groupMap->getNumSeqs())) { vector<string> groupseqsnames = groupMap->getNamesSeqs(); for (int i = 0; i < groupseqsnames.size(); i++) { set<string>::iterator itnamesfile = thisnames1.find(groupseqsnames[i]); if (itnamesfile == thisnames1.end()){ cout << "missing name " + groupseqsnames[i] << '\t' << allSeqsMap[groupseqsnames[i]] << endl; } } m->mothurOutEndLine(); m->mothurOut("[ERROR]: Your name file contains " + toString(countName) + " valid sequences, and your groupfile contains " + toString(groupMap->getNumSeqs()) + ", please correct."); m->mothurOutEndLine(); m->control_pressed = true; } } catch(exception& e) { m->errorOut(e, "SequenceParser", "SequenceParser"); exit(1); } } /************************************************************/ SequenceParser::SequenceParser(string groupFile, string fastaFile) { try { m = MothurOut::getInstance(); int error; //read group file groupMap = new GroupMap(groupFile); error = groupMap->readMap(); if (error == 1) { m->control_pressed = true; } //initialize maps vector<string> namesOfGroups = groupMap->getNamesOfGroups(); for (int i = 0; i < namesOfGroups.size(); i++) { vector<Sequence> temp; seqs[namesOfGroups[i]] = temp; } //read fasta file making sure each sequence is in 
the group file ifstream in; m->openInputFile(fastaFile, in); int count = 0; while (!in.eof()) { if (m->control_pressed) { break; } Sequence seq(in); m->gobble(in); if (seq.getName() != "") { string group = groupMap->getGroup(seq.getName()); if (group == "not found") { error = 1; m->mothurOut("[ERROR]: " + seq.getName() + " is in your fasta file and not in your groupfile, please correct."); m->mothurOutEndLine(); } else { seqs[group].push_back(seq); count++; } } } in.close(); if (error == 1) { m->control_pressed = true; } if (count != (groupMap->getNumSeqs())) { m->mothurOutEndLine(); m->mothurOut("[ERROR]: Your fasta file contains " + toString(count) + " valid sequences, and your groupfile contains " + toString(groupMap->getNumSeqs()) + ", please correct."); if (count < (groupMap->getNumSeqs())) { m->mothurOut(" Did you forget to include the name file?"); } m->mothurOutEndLine(); m->control_pressed = true; } } catch(exception& e) { m->errorOut(e, "SequenceParser", "SequenceParser"); exit(1); } } /************************************************************/ SequenceParser::~SequenceParser(){ delete groupMap; } /************************************************************/ int SequenceParser::getNumGroups(){ return groupMap->getNumGroups(); } /************************************************************/ vector<string> SequenceParser::getNamesOfGroups(){ return groupMap->getNamesOfGroups(); } /************************************************************/ bool SequenceParser::isValidGroup(string g){ return groupMap->isValidGroup(g); } /************************************************************/ int SequenceParser::getNumSeqs(string g){ try { map<string, vector<Sequence> >::iterator it; int num = 0; it = seqs.find(g); if(it == seqs.end()) { m->mothurOut("[ERROR]: " + g + " is not a valid group, please correct."); m->mothurOutEndLine(); }else { num = (it->second).size(); } return num; } catch(exception& e) { m->errorOut(e, "SequenceParser", "getNumSeqs"); exit(1); } } 
/************************************************************/ vector<Sequence> SequenceParser::getSeqs(string g){ try { map<string, vector<Sequence> >::iterator it; vector<Sequence> seqForThisGroup; it = seqs.find(g); if(it == seqs.end()) { m->mothurOut("[ERROR]: No sequences available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { seqForThisGroup = it->second; if (m->debug) { m->mothurOut("[DEBUG]: group " + g + " fasta file has " + toString(seqForThisGroup.size()) + " sequences."); } } return seqForThisGroup; } catch(exception& e) { m->errorOut(e, "SequenceParser", "getSeqs"); exit(1); } } /************************************************************/ int SequenceParser::getSeqs(string g, string filename, bool uchimeFormat=false){ try { map<string, vector<Sequence> >::iterator it; vector<Sequence> seqForThisGroup; vector<seqPriorityNode> nameVector; it = seqs.find(g); if(it == seqs.end()) { m->mothurOut("[ERROR]: No sequences available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { ofstream out; m->openOutputFile(filename, out); seqForThisGroup = it->second; if (uchimeFormat) { // format should look like //>seqName /ab=numRedundantSeqs/ //sequence map<string, string> nameMapForThisGroup = getNameMap(g); map<string, string>::iterator itNameMap; int error = 0; for (int i = 0; i < seqForThisGroup.size(); i++) { itNameMap = nameMapForThisGroup.find(seqForThisGroup[i].getName()); if (itNameMap == nameMapForThisGroup.end()){ error = 1; m->mothurOut("[ERROR]: " + seqForThisGroup[i].getName() + " is in your fastafile, but is not in your namesfile, please correct."); m->mothurOutEndLine(); }else { int num = m->getNumNames(itNameMap->second); seqPriorityNode temp(num, seqForThisGroup[i].getAligned(), seqForThisGroup[i].getName()); nameVector.push_back(temp); } } if (error == 1) { out.close(); m->mothurRemove(filename); return 1; } //sort by num represented sort(nameVector.begin(), nameVector.end(), compareSeqPriorityNodes); //print new file in order of for (int i = 0; i < nameVector.size(); i++) { if(m->control_pressed) { out.close(); 
m->mothurRemove(filename); return 1; } out << ">" << nameVector[i].name << "/ab=" << nameVector[i].numIdentical << "/" << endl << nameVector[i].seq << endl; // } }else { //m->mothurOut("Group " + g + " contains " + toString(seqForThisGroup.size()) + " unique seqs.\n"); for (int i = 0; i < seqForThisGroup.size(); i++) { if(m->control_pressed) { out.close(); m->mothurRemove(filename); return 1; } seqForThisGroup[i].printSequence(out); } } out.close(); } return 0; } catch(exception& e) { m->errorOut(e, "SequenceParser", "getSeqs"); exit(1); } } /************************************************************/ map<string, string> SequenceParser::getNameMap(string g){ try { map<string, map<string, string> >::iterator it; map<string, string> nameMapForThisGroup; it = nameMapPerGroup.find(g); if(it == nameMapPerGroup.end()) { m->mothurOut("[ERROR]: No nameMap available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { nameMapForThisGroup = it->second; if (m->debug) { m->mothurOut("[DEBUG]: group " + g + " name file has " + toString(nameMapForThisGroup.size()) + " unique sequences."); } } return nameMapForThisGroup; } catch(exception& e) { m->errorOut(e, "SequenceParser", "getNameMap"); exit(1); } } /************************************************************/ int SequenceParser::getNameMap(string g, string filename){ try { map<string, map<string, string> >::iterator it; map<string, string> nameMapForThisGroup; it = nameMapPerGroup.find(g); if(it == nameMapPerGroup.end()) { m->mothurOut("[ERROR]: No nameMap available for group " + g + ", please correct."); m->mothurOutEndLine(); }else { nameMapForThisGroup = it->second; ofstream out; m->openOutputFile(filename, out); for (map<string, string>::iterator itFile = nameMapForThisGroup.begin(); itFile != nameMapForThisGroup.end(); itFile++) { if(m->control_pressed) { out.close(); m->mothurRemove(filename); return 1; } out << itFile->first << '\t' << itFile->second << endl; } out.close(); } return 0; } catch(exception& e) { m->errorOut(e, "SequenceParser", "getNameMap"); exit(1); } } 
/************************************************************/ mothur-1.36.1/source/datastructures/sequenceparser.h000066400000000000000000000041251255543666200227130ustar00rootroot00000000000000#ifndef SEQUENCEPARSER_H #define SEQUENCEPARSER_H /* * sequenceParser.h * Mothur * * Created by westcott on 9/9/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "mothurout.h" #include "sequence.hpp" #include "groupmap.h" /* This class reads a fasta and group file with a namesfile as optional and parses the data by group. Note: The sum of all the groups unique sequences will be larger than the original number of unique sequences. This is because when we parse the name file we make a unique for each group instead of 1 unique for all groups. */ class SequenceParser { public: SequenceParser(string, string); //group, fasta - file mismatches will set m->control_pressed = true SequenceParser(string, string, string); //group, fasta, name - file mismatches will set m->control_pressed = true ~SequenceParser(); //general operations int getNumGroups(); vector<string> getNamesOfGroups(); bool isValidGroup(string); //return true if string is a valid group int getNumSeqs(string); //returns the number of unique sequences in a specific group vector<Sequence> getSeqs(string); //returns unique sequences in a specific group map<string, string> getNameMap(string); //returns seqName -> namesOfRedundantSeqs separated by commas for a specific group - the name file format, but each line is parsed by group. 
int getSeqs(string, string, bool); //prints unique sequences in a specific group to a file - group, filename, uchimeFormat=false int getNameMap(string, string); //print seqName -> namesOfRedundantSeqs separated by commas for a specific group - group, filename map<string, string> getAllSeqsMap(){ return allSeqsMap; } //returns map where the key=sequenceName and the value=representativeSequence - helps us remove duplicates after group by group processing private: GroupMap* groupMap; MothurOut* m; int numSeqs; map<string, string> allSeqsMap; map<string, vector<Sequence> > seqs; //a vector for each group map<string, map<string, string> > nameMapPerGroup; //nameMap for each group }; #endif mothur-1.36.1/source/datastructures/sharedlistvector.cpp000066400000000000000000000373771255543666200236250ustar00rootroot00000000000000/* * sharedSharedListVector.cpp * Mothur * * Created by Sarah Westcott on 1/22/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sabundvector.hpp" #include "rabundvector.hpp" #include "ordervector.hpp" #include "sharedlistvector.h" #include "sharedordervector.h" #include "sharedutilities.h" /***********************************************************************/ SharedListVector::SharedListVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0){ groupmap = NULL; countTable = NULL; } /***********************************************************************/ SharedListVector::SharedListVector(int n): DataVector(), data(n, "") , maxRank(0), numBins(0), numSeqs(0){ groupmap = NULL; countTable = NULL; } /***********************************************************************/ SharedListVector::SharedListVector(ifstream& f) : DataVector(), maxRank(0), numBins(0), numSeqs(0) { try { groupmap = NULL; countTable = NULL; //set up groupmap for later. if (m->groupMode == "group") { groupmap = new GroupMap(m->getGroupFile()); groupmap->readMap(); }else { countTable = new CountTable(); countTable->readTable(m->getCountTableFile(), true, false); } int hold; //are we at the beginning of the file?? 
if (m->saveNextLabel == "") { f >> label; //is this a shared file that has headers if (label == "label") { //gets "numOtus" f >> label; m->gobble(f); //eat rest of line label = m->getline(f); m->gobble(f); //parse labels to save istringstream iStringStream(label); m->listBinLabelsInFile.clear(); while(!iStringStream.eof()){ if (m->control_pressed) { break; } string temp; iStringStream >> temp; m->gobble(iStringStream); m->listBinLabelsInFile.push_back(temp); } f >> label >> hold; }else { //read in first row f >> hold; //make binlabels because we don't have any string snumBins = toString(hold); m->listBinLabelsInFile.clear(); for (int i = 0; i < hold; i++) { //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; m->listBinLabelsInFile.push_back(binLabel); } } m->saveNextLabel = label; }else { f >> label >> hold; m->saveNextLabel = label; } binLabels.assign(m->listBinLabelsInFile.begin(), m->listBinLabelsInFile.begin()+hold); data.assign(hold, ""); string inputData = ""; for(int i=0;i<hold;i++){ f >> inputData; set(i, inputData); } m->gobble(f); if (f.eof()) { m->saveNextLabel = ""; } } catch(exception& e) { m->errorOut(e, "SharedListVector", "SharedListVector"); exit(1); } } /***********************************************************************/ void SharedListVector::set(int binNumber, string seqNames){ try { int nNames_old = m->getNumNames(data[binNumber]); data[binNumber] = seqNames; int nNames_new = m->getNumNames(seqNames); if(nNames_old == 0) { numBins++; } if(nNames_new == 0) { numBins--; } if(nNames_new > maxRank) { maxRank = nNames_new; } numSeqs += (nNames_new - nNames_old); } catch(exception& e) { m->errorOut(e, "SharedListVector", "set"); exit(1); } } /***********************************************************************/ string 
SharedListVector::get(int index){ return data[index]; } /***********************************************************************/ void SharedListVector::setLabels(vector labels){ try { binLabels = labels; } catch(exception& e) { m->errorOut(e, "SharedListVector", "setLabels"); exit(1); } } /***********************************************************************/ //could potentially end up with duplicate binlabel names with code below. //we don't currently use them in a way that would do that. //if you had a listfile that had been subsampled and then added to it, dup names would be possible. vector SharedListVector::getLabels(){ try { string tagHeader = "Otu"; if (m->sharedHeaderMode == "tax") { tagHeader = "PhyloType"; } if (binLabels.size() < data.size()) { string snumBins = toString(numBins); for (int i = 0; i < numBins; i++) { string binLabel = tagHeader; if (i < binLabels.size()) { //label exists, check leading zeros length string sbinNumber = m->getSimpleLabel(binLabels[i]); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; binLabels[i] = binLabel; }else{ string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; binLabels.push_back(binLabel); } } } return binLabels; } catch(exception& e) { m->errorOut(e, "SharedListVector", "getLabels"); exit(1); } } /***********************************************************************/ void SharedListVector::push_back(string seqNames){ try { data.push_back(seqNames); int nNames = m->getNumNames(seqNames); numBins++; if(nNames > maxRank) { maxRank = nNames; } numSeqs += nNames; } catch(exception& e) { m->errorOut(e, "SharedListVector", "push_back"); exit(1); } } /***********************************************************************/ void 
SharedListVector::resize(int size){ data.resize(size); } /***********************************************************************/ int SharedListVector::size(){ return data.size(); } /***********************************************************************/ void SharedListVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; return data.clear(); } /***********************************************************************/ void SharedListVector::print(ostream& output){ try { output << label << '\t' << numBins; for(int i=0;ierrorOut(e, "SharedListVector", "print"); exit(1); } } /***********************************************************************/ RAbundVector SharedListVector::getRAbundVector(){ try { RAbundVector rav; for(int i=0;igetNumNames(data[i]); rav.push_back(binSize); } // This was here before to output data in a nice format, but it screws up the name mapping steps // sort(rav.rbegin(), rav.rend()); // // for(int i=data.size()-1;i>=0;i--){ // if(rav.get(i) == 0){ rav.pop_back(); } // else{ // break; // } // } rav.setLabel(label); return rav; } catch(exception& e) { m->errorOut(e, "SharedListVector", "getRAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector SharedListVector::getSAbundVector(){ try { SAbundVector sav(maxRank+1); for(int i=0;igetNumNames(data[i]); sav.set(binSize, sav.get(binSize) + 1); } sav.set(0, 0); sav.setLabel(label); return sav; } catch(exception& e) { m->errorOut(e, "SharedListVector", "getSAbundVector"); exit(1); } } /***********************************************************************/ SharedOrderVector* SharedListVector::getSharedOrderVector(){ try { SharedOrderVector* order = new SharedOrderVector(); order->setLabel(label); for(int i=0;igetNumNames(get(i)); //find number of individual in given bin string names = get(i); vector binNames; m->splitAtComma(names, binNames); if (m->groupMode != "group") { binSize = 0; for (int j = 0; j < binNames.size(); j++) { binSize 
+= countTable->getNumSeqs(binNames[j]); }
            }

            for (int j = 0; j < binNames.size(); j++) {
                if (m->control_pressed) { return order; }
                if (m->groupMode == "group") {
                    string groupName = groupmap->getGroup(binNames[j]);
                    if(groupName == "not found") { m->mothurOut("Error: Sequence '" + binNames[j] + "' was not found in the group file, please correct."); m->mothurOutEndLine(); exit(1); }

                    order->push_back(i, binSize, groupName); //i represents what bin you are in
                }else {
                    vector<int> groupAbundances = countTable->getGroupCounts(binNames[j]);
                    vector<string> groupNames = countTable->getNamesOfGroups();
                    for (int k = 0; k < groupAbundances.size(); k++) { //groupAbundances.size() == 0 if there is a file mismatch and m->control_pressed is true.
                        if (m->control_pressed) { return order; }
                        for (int l = 0; l < groupAbundances[k]; l++) { order->push_back(i, binSize, groupNames[k]); }
                    }
                }
            }
        }

        random_shuffle(order->begin(), order->end());
        order->updateStats();

        return order;
    }
    catch(exception& e) {
        m->errorOut(e, "SharedListVector", "getSharedOrderVector");
        exit(1);
    }
}
/***********************************************************************/
SharedRAbundVector SharedListVector::getSharedRAbundVector(string groupName) {
    try {
        m->currentSharedBinLabels = binLabels;

        SharedRAbundVector rav(data.size());

        for(int i=0;i<data.size();i++){
            string names = get(i);
            vector<string> binNames;
            m->splitAtComma(names, binNames);
            for (int j = 0; j < binNames.size(); j++) {
                if (m->control_pressed) { return rav; }
                if (m->groupMode == "group") {
                    string group = groupmap->getGroup(binNames[j]);
                    if(group == "not found") { m->mothurOut("Error: Sequence '" + binNames[j] + "' was not found in the group file, please correct."); m->mothurOutEndLine(); exit(1); }
                    if (group == groupName) { //this name is in the group you want the vector for.
rav.set(i, rav.getAbundance(i) + 1, group); //i represents what bin you are in } }else { int count = countTable->getGroupCount(binNames[j], groupName); rav.set(i, rav.getAbundance(i) + count, groupName); } } } rav.setLabel(label); rav.setGroup(groupName); return rav; } catch(exception& e) { m->errorOut(e, "SharedListVector", "getSharedRAbundVector"); exit(1); } } /***********************************************************************/ vector SharedListVector::getSharedRAbundVector() { try { m->currentSharedBinLabels = binLabels; SharedUtil* util; util = new SharedUtil(); vector lookup; //contains just the groups the user selected vector lookupDelete; map finder; //contains all groups in groupmap vector Groups = m->getGroups(); vector allGroups; if (m->groupMode == "group") { allGroups = groupmap->getNamesOfGroups(); } else { allGroups = countTable->getNamesOfGroups(); } util->setGroups(Groups, allGroups); m->setGroups(Groups); delete util; for (int i = 0; i < allGroups.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(data.size()); finder[allGroups[i]] = temp; finder[allGroups[i]]->setLabel(label); finder[allGroups[i]]->setGroup(allGroups[i]); if (m->inUsersGroups(allGroups[i], m->getGroups())) { //if this group is in user groups lookup.push_back(finder[allGroups[i]]); }else { lookupDelete.push_back(finder[allGroups[i]]); } } //fill vectors for(int i=0;i binNames; m->splitAtComma(names, binNames); for (int j = 0; j < binNames.size(); j++) { if (m->groupMode == "group") { string group = groupmap->getGroup(binNames[j]); if(group == "not found") { m->mothurOut("Error: Sequence '" + binNames[j] + "' was not found in the group file, please correct."); m->mothurOutEndLine(); exit(1); } finder[group]->set(i, finder[group]->getAbundance(i) + 1, group); //i represents what bin you are in }else{ vector counts = countTable->getGroupCounts(binNames[j]); for (int k = 0; k < allGroups.size(); k++) { finder[allGroups[k]]->set(i, 
finder[allGroups[k]]->getAbundance(i) + counts[k], allGroups[k]); } } } } for (int j = 0; j < lookupDelete.size(); j++) { delete lookupDelete[j]; } return lookup; } catch(exception& e) { m->errorOut(e, "SharedListVector", "getSharedRAbundVector"); exit(1); } } /***********************************************************************/ SharedSAbundVector SharedListVector::getSharedSAbundVector(string groupName) { try { SharedSAbundVector sav; SharedRAbundVector rav; rav = this->getSharedRAbundVector(groupName); sav = rav.getSharedSAbundVector(); return sav; } catch(exception& e) { m->errorOut(e, "SharedListVector", "getSharedSAbundVector"); exit(1); } } /***********************************************************************/ OrderVector SharedListVector::getOrderVector(map* orderMap = NULL){ try { if(orderMap == NULL){ OrderVector ov; for(int i=0;i binNames; m->splitAtComma(names, binNames); int binSize = binNames.size(); if (m->groupMode != "group") { binSize = 0; for (int j = 0; j < binNames.size(); j++) { binSize += countTable->getNumSeqs(binNames[i]); } } for(int j=0;j binNames; m->splitAtComma(listOTU, binNames); for (int j = 0; j < binNames.size(); j++) { if(orderMap->count(binNames[j]) == 0){ m->mothurOut(binNames[j] + " not found, check *.names file\n"); exit(1); } ov.set((*orderMap)[binNames[j]], i); } } ov.setLabel(label); ov.getNumBins(); return ov; } } catch(exception& e) { m->errorOut(e, "SharedListVector", "getOrderVector"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sharedlistvector.h000066400000000000000000000045611255543666200232570ustar00rootroot00000000000000#ifndef SHAREDLIST_H #define SHAREDLIST_H /* * sharedlistvector.h * Mothur * * Created by Sarah Westcott on 1/22/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 *
 */

#include "datavector.hpp"
#include "groupmap.h"
#include "counttable.h"
#include "sharedrabundvector.h"
#include "sharedsabundvector.h"

/* This class is a child of datavector. It represents OTU information at a certain distance.
    A sharedlistvector can be converted into a sharedordervector, sharedrabundvector or sharedsabundvector,
    as well as an ordervector, rabundvector or sabundvector.
    Each member of the internal container "data" represents an individual OTU.
    Each individual in the OTU belongs to a group.
    So data[0] = "a,b,c,d,e,f".
    example:    listvector    = a,b,c,d,e,f  g,h,i  j,k  l  m
                rabundvector  = 6 3 2 1 1
                sabundvector  = 2 1 1 0 0 1
                ordervector   = 1 1 1 1 1 1 2 2 2 3 3 4 5 */

class SharedListVector : public DataVector {

public:
    SharedListVector();
    SharedListVector(int);
    SharedListVector(ifstream&);
    SharedListVector(const SharedListVector& lv) : DataVector(lv.label), data(lv.data), maxRank(lv.maxRank), numBins(lv.numBins), numSeqs(lv.numSeqs), binLabels(lv.binLabels) { groupmap = NULL; countTable = NULL; };
    ~SharedListVector(){ if (groupmap != NULL) { delete groupmap; } if (countTable != NULL) { delete countTable; } };

    int getNumBins() { return numBins; }
    int getNumSeqs() { return numSeqs; }
    int getMaxRank() { return maxRank; }

    void set(int, string);
    string get(int);
    vector<string> getLabels();
    void setLabels(vector<string>);
    void push_back(string);
    void resize(int);
    void clear();
    int size();
    void print(ostream&);

    RAbundVector getRAbundVector();
    SAbundVector getSAbundVector();
    OrderVector getOrderVector(map<string,int>*);
    SharedOrderVector* getSharedOrderVector();
    SharedRAbundVector getSharedRAbundVector(string); //get sharedrabundvector for a certain group
    SharedSAbundVector getSharedSAbundVector(string); //get sharedsabundvector for a certain group
    vector<SharedRAbundVector*> getSharedRAbundVector(); //returns sharedRabundVectors for all the user's groups

private:
    vector<string> data; //data[i] is a list of names of sequences in the ith OTU.
GroupMap* groupmap; CountTable* countTable; int maxRank; int numBins; int numSeqs; vector binLabels; }; #endif mothur-1.36.1/source/datastructures/sharedordervector.cpp000066400000000000000000000222311255543666200237440ustar00rootroot00000000000000/* * sharedSharedOrderVector.cpp * Dotur * * Created by Sarah Westcott on 12/9/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedordervector.h" #include "sharedutilities.h" /***********************************************************************/ SharedOrderVector::SharedOrderVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0) {} /***********************************************************************/ SharedOrderVector::SharedOrderVector(string id, vector ov) : DataVector(id), data(ov) { updateStats(); } /***********************************************************************/ //This function is used to read a .shared file for the collect.shared, rarefaction.shared and summary.shared commands //if you don't use a list and groupfile. SharedOrderVector::SharedOrderVector(ifstream& f) : DataVector() { //reads in a shared file try { maxRank = 0; numBins = 0; numSeqs = 0; groupmap = new GroupMap(); int num, inputData, count; count = 0; numSeqs = 0; string holdLabel, nextLabel, groupN; individual newguy; //read in first row since you know there is at least 1 group. //are we at the beginning of the file?? 
if (m->saveNextLabel == "") { f >> label; //is this a shared file that has headers if (label == "label") { //gets "group" f >> label; m->gobble(f); //gets "numOtus" f >> label; m->gobble(f); //eat rest of line label = m->getline(f); m->gobble(f); //parse labels to save istringstream iStringStream(label); m->sharedBinLabelsInFile.clear(); while(!iStringStream.eof()){ if (m->control_pressed) { break; } string temp; iStringStream >> temp; m->gobble(iStringStream); m->sharedBinLabelsInFile.push_back(temp); } f >> label; } }else { label = m->saveNextLabel; } //reset labels, currentLabels may have gotten changed as otus were eliminated because of group choices or sampling m->currentSharedBinLabels = m->sharedBinLabelsInFile; //read in first row since you know there is at least 1 group. f >> groupN >> num; holdLabel = label; vector allGroups; //save group in groupmap allGroups.push_back(groupN); groupmap->groupIndex[groupN] = 0; for(int i=0;i> inputData; for (int j = 0; j < inputData; j++) { push_back(i, i, groupN); numSeqs++; } } m->gobble(f); if (!(f.eof())) { f >> nextLabel; } //read the rest of the groups info in while ((nextLabel == holdLabel) && (f.eof() != true)) { f >> groupN >> num; count++; //save group in groupmap allGroups.push_back(groupN); groupmap->groupIndex[groupN] = count; for(int i=0;i> inputData; for (int j = 0; j < inputData; j++) { push_back(i, i, groupN); numSeqs++; } } m->gobble(f); if (f.eof() != true) { f >> nextLabel; } } m->saveNextLabel = nextLabel; groupmap->setNamesOfGroups(allGroups); m->setAllGroups(allGroups); updateStats(); } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "SharedOrderVector"); exit(1); } } /***********************************************************************/ int SharedOrderVector::getNumBins(){ return numBins; } /***********************************************************************/ int SharedOrderVector::getNumSeqs(){ return numSeqs; } 
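/* Illustrative sketch only, not part of mothur. The SharedOrderVector(ifstream&)
   constructor above splits the remainder of a shared-file header line into bin
   labels with an eof()/gobble() loop. The helper below shows the same idea with the
   standard "while (in >> token)" idiom, which is a swapped-in technique rather than
   the original loop. The name splitBinLabels is hypothetical. */

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the rest of a header line into bin labels.
// Any run of spaces or tabs separates two labels; trailing whitespace is ignored.
std::vector<std::string> splitBinLabels(const std::string& headerRest) {
    std::vector<std::string> labels;
    std::istringstream in(headerRest);
    std::string token;
    while (in >> token) {          // stops cleanly at end of input
        labels.push_back(token);
    }
    return labels;
}
```

/* Extraction only succeeds when a token was actually read, so a trailing tab or
   space cannot append an empty label. */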
/***********************************************************************/ int SharedOrderVector::getMaxRank(){ return maxRank; } /***********************************************************************/ void SharedOrderVector::set(int index, int binNumber, int abund, string groupName){ data[index].group = groupName; data[index].bin = binNumber; data[index].abundance = abund; //if (abund > maxRank) { maxRank = abund; } updateStats(); } /***********************************************************************/ individual SharedOrderVector::get(int index){ return data[index]; } /***********************************************************************/ //commented updateStats out to improve speed, but whoever calls this must remember to update when they are done with all the pushbacks they are doing void SharedOrderVector::push_back(int binNumber, int abund, string groupName){ individual newGuy; newGuy.group = groupName; newGuy.abundance = abund; newGuy.bin = binNumber; data.push_back(newGuy); //numSeqs++; //numBins++; //if (abund > maxRank) { maxRank = abund; } //updateStats(); } /***********************************************************************/ void SharedOrderVector::print(ostream& output){ try { output << label << '\t' << numSeqs; for(int i=0;ierrorOut(e, "SharedOrderVector", "print"); exit(1); } } /***********************************************************************/ void SharedOrderVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; data.clear(); } /***********************************************************************/ void SharedOrderVector::resize(int){ m->mothurOut("resize() did nothing in class SharedOrderVector"); } /***********************************************************************/ vector::iterator SharedOrderVector::begin(){ return data.begin(); } /***********************************************************************/ vector::iterator SharedOrderVector::end(){ return data.end(); } 
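/* Illustrative sketch only, not part of mothur. push_back() above deliberately skips
   the statistics update, so callers are expected to call updateStats() once after a
   batch of insertions. The free function below mimics that deferred recomputation on
   a plain vector. It assumes the statistics are per-bin counts (numBins = bins with
   at least one individual, maxRank = largest bin) and that bin numbers are
   non-negative. Individual, OrderStats and computeOrderStats are hypothetical names. */

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for mothur's "individual" struct.
struct Individual { std::string group; int bin; int abundance; };

// Hypothetical container for the three cached statistics.
struct OrderStats { int numSeqs; int numBins; int maxRank; };

// Recompute all statistics in one pass after a batch of insertions.
OrderStats computeOrderStats(const std::vector<Individual>& data) {
    OrderStats stats = {0, 0, 0};
    stats.numSeqs = static_cast<int>(data.size());

    // Count individuals per bin, growing the tally as larger bin numbers appear.
    std::vector<int> perBin;
    for (std::size_t i = 0; i < data.size(); i++) {
        std::size_t bin = static_cast<std::size_t>(data[i].bin);
        if (bin >= perBin.size()) { perBin.resize(bin + 1, 0); }
        perBin[bin]++;
    }

    for (std::size_t b = 0; b < perBin.size(); b++) {
        if (perBin[b] > 0) { stats.numBins++; }
        if (perBin[b] > stats.maxRank) { stats.maxRank = perBin[b]; }
    }
    return stats;
}
```

/* Recomputing once per batch costs one linear pass, whereas updating inside every
   push_back would repeat that pass for each inserted individual. */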
/***********************************************************************/ int SharedOrderVector::size(){ return data.size(); } /***********************************************************************/ RAbundVector SharedOrderVector::getRAbundVector(){ try { RAbundVector rav(data.size()); for(int i=0;i=0;i--){ if(rav.get(i) == 0){ rav.pop_back(); } else{ break; } } rav.setLabel(label); return rav; } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "getRAbundVector"); exit(1); } } /***********************************************************************/ OrderVector SharedOrderVector::getOrderVector(map* nameMap = NULL) { try { OrderVector ov; for (int i = 0; i < data.size(); i++) { ov.push_back(data[i].bin); } random_shuffle(ov.begin(), ov.end()); ov.setLabel(label); return ov; } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "getOrderVector"); exit(1); } } /***********************************************************************/ SAbundVector SharedOrderVector::getSAbundVector(){ RAbundVector rav(this->getRAbundVector()); return rav.getSAbundVector(); } /***********************************************************************/ SharedRAbundVector SharedOrderVector::getSharedRAbundVector(string group) { try { SharedRAbundVector sharedRav(data.size()); sharedRav.setLabel(label); sharedRav.setGroup(group); for (int i = 0; i < data.size(); i++) { if (data[i].group == group) { sharedRav.set(data[i].abundance, sharedRav.getAbundance(data[i].abundance) + 1, data[i].group); } } return sharedRav; } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "getSharedRAbundVector"); exit(1); } } /***********************************************************************/ vector SharedOrderVector::getSharedRAbundVector() { try { SharedUtil* util; util = new SharedUtil(); vector lookup; vector Groups = m->getGroups(); vector allGroups = m->getAllGroups(); util->setGroups(Groups, allGroups); util->getSharedVectors(Groups, lookup, this); m->setGroups(Groups); 
m->setAllGroups(allGroups); return lookup; } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "getSharedRAbundVector"); exit(1); } } /***********************************************************************/ SharedSAbundVector SharedOrderVector::getSharedSAbundVector(string group) { try { SharedRAbundVector sharedRav(this->getSharedRAbundVector(group)); return sharedRav.getSharedSAbundVector(); } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "getSharedSAbundVector"); exit(1); } } /***********************************************************************/ SharedOrderVector SharedOrderVector::getSharedOrderVector(){ random_shuffle(data.begin(), data.end()); return *this; } /***********************************************************************/ void SharedOrderVector::updateStats(){ try { needToUpdate = 0; numSeqs = 0; numBins = 0; maxRank = 0; numSeqs = data.size(); vector hold(numSeqs, 0); for(int i=0;i 0) { numBins++; } if(hold[i] > maxRank) { maxRank = hold[i]; } } } catch(exception& e) { m->errorOut(e, "SharedOrderVector", "updateStats"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sharedordervector.h000066400000000000000000000054011255543666200234110ustar00rootroot00000000000000#ifndef SHAREDORDER_H #define SHAREDORDER_H /* * sharedorder.h * Mothur * * Created by Sarah Westcott on 12/9/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class is a child to datavector. It represents OTU information at a certain distance. It is similiar to an order vector except each member of data knows which group it belongs to. Each member of the internal container "data" represents is an individual which knows the OTU from which it came, the group it is in and the abundance is equal to the OTU number. 
*/ #include "datavector.hpp" struct individual { string group; int bin; int abundance; bool operator()(const individual& i1, const individual& i2) { return (i1.abundance > i2.abundance); } individual() { group = ""; bin = 0; abundance = 0; } }; struct individualFloat { string group; int bin; float abundance; bool operator()(const individual& i1, const individual& i2) { return (i1.abundance > i2.abundance); } individualFloat() { group = ""; bin = 0; abundance = 0.0; } }; #include "sabundvector.hpp" #include "rabundvector.hpp" #include "sharedrabundvector.h" #include "sharedsabundvector.h" #include "groupmap.h" class SharedOrderVector : public DataVector { public: SharedOrderVector(); // SharedOrderVector(int ns, int nb=0, int mr=0) : DataVector(), data(ns, -1), maxRank(0), numBins(0), numSeqs(0) {}; SharedOrderVector(const SharedOrderVector& ov) : DataVector(ov.label), data(ov.data), maxRank(ov.maxRank), numBins(ov.numBins), numSeqs(ov.numSeqs), needToUpdate(ov.needToUpdate) {if(needToUpdate == 1){ updateStats();}}; SharedOrderVector(string, vector); SharedOrderVector(ifstream&); ~SharedOrderVector(){}; individual get(int); void resize(int); int size(); void print(ostream&); vector::iterator begin(); vector::iterator end(); void push_back(int, int, string); //OTU, abundance, group MUST CALL UPDATE STATS AFTER PUSHBACK!!! 
void updateStats(); void clear(); int getNumBins(); int getNumSeqs(); int getMaxRank(); RAbundVector getRAbundVector(); SAbundVector getSAbundVector(); OrderVector getOrderVector(map*); SharedOrderVector getSharedOrderVector(); SharedRAbundVector getSharedRAbundVector(string); //get the sharedRabundvector for a sepecific group SharedSAbundVector getSharedSAbundVector(string); //get the sharedSabundvector for a sepecific group vector getSharedRAbundVector(); //returns sharedRabundVectors for all the users groups private: GroupMap* groupmap; vector data; map< int, vector >::iterator it; int maxRank; int numBins; int numSeqs; bool needToUpdate; void set(int, int, int, string); //index, OTU, abundance, group }; #endif mothur-1.36.1/source/datastructures/sharedrabundfloatvector.cpp000066400000000000000000000410261255543666200251350ustar00rootroot00000000000000/* * sharedrabundfloatvector.cpp * Mothur * * Created by westcott on 8/18/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "sharedrabundfloatvector.h" #include "sharedutilities.h" /***********************************************************************/ SharedRAbundFloatVector::SharedRAbundFloatVector() : DataVector(), maxRank(0.0), numBins(0), numSeqs(0.0) {} /***********************************************************************/ SharedRAbundFloatVector::~SharedRAbundFloatVector() {} /***********************************************************************/ SharedRAbundFloatVector::SharedRAbundFloatVector(int n) : DataVector(), maxRank(0.0), numBins(n), numSeqs(0.0) { individualFloat newGuy; //initialize data for (int i=0; i< n; i++) { newGuy.bin = i; newGuy.abundance = 0.0; data.push_back(newGuy); } } /***********************************************************************/ //reads a shared file SharedRAbundFloatVector::SharedRAbundFloatVector(ifstream& f) : DataVector(), maxRank(0.0), numBins(0), numSeqs(0.0) { try { m->clearAllGroups(); vector allGroups; int num, count; float inputData; 
count = 0; string holdLabel, nextLabel, groupN; individualFloat newguy; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } lookup.clear(); //are we at the beginning of the file?? if (m->saveNextLabel == "") { f >> label; //is this a shared file that has headers if (label == "label") { //gets "group" f >> label; m->gobble(f); //gets "numOtus" f >> label; m->gobble(f); //eat rest of line label = m->getline(f); m->gobble(f); //parse labels to save istringstream iStringStream(label); m->sharedBinLabelsInFile.clear(); while(!iStringStream.eof()){ if (m->control_pressed) { break; } string temp; iStringStream >> temp; m->gobble(iStringStream); m->sharedBinLabelsInFile.push_back(temp); } f >> label >> groupN >> num; }else { //read in first row since you know there is at least 1 group. f >> groupN >> num; //make binlabels because we don't have any string snumBins = toString(num); m->sharedBinLabelsInFile.clear(); for (int i = 0; i < num; i++) { //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; m->sharedBinLabelsInFile.push_back(binLabel); } } }else { label = m->saveNextLabel; //read in first row since you know there is at least 1 group. f >> groupN >> num; } //reset labels, currentLabels may have gotten changed as otus were eliminated because of group choices or sampling m->currentSharedBinLabels = m->sharedBinLabelsInFile; holdLabel = label; //add new vector to lookup SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(); lookup.push_back(temp); lookup[0]->setLabel(label); lookup[0]->setGroup(groupN); allGroups.push_back(groupN); //fill vector. 
// data = first sharedrabund in file
        for(int i=0;i<num;i++){
            f >> inputData;

            lookup[0]->push_back(inputData, groupN); //abundance, group
            push_back(inputData, groupN);

            if (inputData > maxRank) { maxRank = inputData; }
        }

        m->gobble(f);

        if (f.eof() != true) { f >> nextLabel; }

        //read the rest of the groups info in
        while ((nextLabel == holdLabel) && (f.eof() != true)) {
            f >> groupN >> num;
            count++;

            allGroups.push_back(groupN);

            //add new vector to lookup
            temp = new SharedRAbundFloatVector();
            lookup.push_back(temp);
            lookup[count]->setLabel(label);
            lookup[count]->setGroup(groupN);

            //fill vector.
            for(int i=0;i<num;i++){
                f >> inputData;
                lookup[count]->push_back(inputData, groupN); //abundance, group
            }

            m->gobble(f);

            if (f.eof() != true) { f >> nextLabel; }
        }

        m->saveNextLabel = nextLabel;
        m->setAllGroups(allGroups);
    }
    catch(exception& e) {
        m->errorOut(e, "SharedRAbundFloatVector", "SharedRAbundFloatVector");
        exit(1);
    }
}
/***********************************************************************/
void SharedRAbundFloatVector::set(int binNumber, float newBinSize, string groupname){
    try {
        float oldBinSize = data[binNumber].abundance;
        data[binNumber].abundance = newBinSize;
        data[binNumber].group = groupname;

        //keep the largest abundance seen so far
        if(newBinSize > maxRank) { maxRank = newBinSize; }

        numSeqs += (newBinSize - oldBinSize);
    }
    catch(exception& e) {
        m->errorOut(e, "SharedRAbundFloatVector", "set");
        exit(1);
    }
}
/***********************************************************************/
void SharedRAbundFloatVector::clear(){
    numBins = 0;
    maxRank = 0;
    numSeqs = 0;
    data.clear();
    for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; }
    lookup.clear();
}
/***********************************************************************/
float SharedRAbundFloatVector::getAbundance(int index){
    return data[index].abundance;
}
/***********************************************************************/
//returns vector of abundances
vector<float> SharedRAbundFloatVector::getAbundances(){
    vector<float> abunds;
    for (int i = 0; i < data.size();
i++) { abunds.push_back(data[i].abundance); } return abunds; } /***********************************************************************/ individualFloat SharedRAbundFloatVector::get(int index){ return data[index]; } /***********************************************************************/ void SharedRAbundFloatVector::push_back(float binSize, string groupName){ try { individualFloat newGuy; newGuy.abundance = binSize; newGuy.group = groupName; newGuy.bin = data.size(); data.push_back(newGuy); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "push_back"); exit(1); } } /***********************************************************************/ void SharedRAbundFloatVector::insert(float binSize, int otu, string groupName){ try { individualFloat newGuy; newGuy.abundance = binSize; newGuy.group = groupName; newGuy.bin = otu; data.insert(data.begin()+otu, newGuy); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "insert"); exit(1); } } /***********************************************************************/ void SharedRAbundFloatVector::push_front(float binSize, int otu, string groupName){ try { individualFloat newGuy; newGuy.abundance = binSize; newGuy.group = groupName; newGuy.bin = otu; data.insert(data.begin(), newGuy); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "push_front"); exit(1); } } /**********************************************************************/ void SharedRAbundFloatVector::pop_back(){ numSeqs -= data[data.size()-1].abundance; numBins--; data.pop_back(); } /***********************************************************************/ void SharedRAbundFloatVector::resize(int size){ data.resize(size); } /**********************************************************************/ int 
SharedRAbundFloatVector::size(){ return data.size(); } /***********************************************************************/ void SharedRAbundFloatVector::printHeaders(ostream& output){ try { string snumBins = toString(numBins); output << "label\tGroup\tnumOtus"; if (m->sharedHeaderMode == "tax") { for (int i = 0; i < numBins; i++) { //if there is a bin label use it otherwise make one string binLabel = "PhyloType"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } output << '\t' << binLabel; } output << endl; }else { for (int i = 0; i < numBins; i++) { //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } output << '\t' << binLabel; } output << endl; } m->printedSharedHeaders = true; } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "printHeaders"); exit(1); } } /***********************************************************************/ void SharedRAbundFloatVector::print(ostream& output){ try { output << numBins; for(int i=0;ierrorOut(e, "SharedRAbundFloatVector", "print"); exit(1); } } /***********************************************************************/ string SharedRAbundFloatVector::getGroup(){ return group; } /***********************************************************************/ void SharedRAbundFloatVector::setGroup(string groupName){ group = groupName; } /***********************************************************************/ int SharedRAbundFloatVector::getGroupIndex() { 
return index; } /***********************************************************************/ void SharedRAbundFloatVector::setGroupIndex(int vIndex) { index = vIndex; } /***********************************************************************/ int SharedRAbundFloatVector::getNumBins(){ return numBins; } /***********************************************************************/ float SharedRAbundFloatVector::getNumSeqs(){ return numSeqs; } /***********************************************************************/ float SharedRAbundFloatVector::getMaxRank(){ return maxRank; } /***********************************************************************/ SharedRAbundFloatVector SharedRAbundFloatVector::getSharedRAbundFloatVector(){ return *this; } /*********************************************************************** SharedRAbundVector SharedRAbundFloatVector::getSharedRAbundVector(){ try { SharedRAbundVector rav(numBins); rav.setLabel(label); rav.setGroup(group); for (int i = 0; i < data.size(); i++) { rav.push_back(data[i].abundance); } } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "getSharedRAbundVector"); exit(1); } } ***********************************************************************/ vector SharedRAbundFloatVector::getSharedRAbundFloatVectors(){ try { SharedUtil* util; util = new SharedUtil(); vector Groups = m->getGroups(); vector allGroups = m->getAllGroups(); util->setGroups(Groups, allGroups); m->setGroups(Groups); bool remove = false; for (int i = 0; i < lookup.size(); i++) { //if this sharedrabund is not from a group the user wants then delete it. 
if (util->isValidGroup(lookup[i]->getGroup(), m->getGroups()) == false) { delete lookup[i]; lookup[i] = NULL; lookup.erase(lookup.begin()+i); i--; remove = true; } } delete util; if (remove) { eliminateZeroOTUS(lookup); } return lookup; } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "getSharedRAbundFloatVectors"); exit(1); } } /***********************************************************************/ RAbundVector SharedRAbundFloatVector::getRAbundVector() { try { RAbundVector rav(numBins); //this is not functional, not sure how to handle it yet, but I need the stub because it is a pure function rav.setLabel(label); return rav; } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "getRAbundVector"); exit(1); } } /*********************************************************************** SharedSAbundVector SharedRAbundVector::getSharedSAbundVector(){ try { SharedSAbundVector sav(maxRank+1); for(int i=0;ierrorOut(e, "SharedRAbundVector", "getSharedSAbundVector"); exit(1); } } ***********************************************************************/ SAbundVector SharedRAbundFloatVector::getSAbundVector() { try { SAbundVector sav(ceil(maxRank)+1); //this is not functional, not sure how to handle it yet, but I need the stub because it is a pure function sav.set(0, 0); sav.setLabel(label); return sav; } catch(exception& e) { m->errorOut(e, "SharedRAbundFloatVector", "getSAbundVector"); exit(1); } } /*********************************************************************** SharedOrderVector SharedRAbundFloatVector::getSharedOrderVector() { try { SharedOrderVector ov; for(int i=0;ierrorOut(e, "SharedRAbundFloatVector", "getSharedOrderVector"); exit(1); } } ***********************************************************************/ //this is not functional, not sure how to handle it yet, but I need the stub because it is a pure function OrderVector SharedRAbundFloatVector::getOrderVector(map* nameMap = NULL) { try { OrderVector ov; for(int 
i=0;ierrorOut(e, "SharedRAbundFloatVector", "getOrderVector");
		exit(1);
	}
}
//**********************************************************************************************************************
int SharedRAbundFloatVector::eliminateZeroOTUS(vector<SharedRAbundFloatVector*>& thislookup) {
	try {

		vector<SharedRAbundFloatVector*> newLookup;
		for (int i = 0; i < thislookup.size(); i++) {
			SharedRAbundFloatVector* temp = new SharedRAbundFloatVector();
			temp->setLabel(thislookup[i]->getLabel());
			temp->setGroup(thislookup[i]->getGroup());
			newLookup.push_back(temp);
		}

		//for each bin
		vector<string> newBinLabels;
		string snumBins = toString(thislookup[0]->getNumBins());
		for (int i = 0; i < thislookup[0]->getNumBins(); i++) {
			if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; }

			//look at each sharedRabund and make sure they are not all zero
			bool allZero = true;
			for (int j = 0; j < thislookup.size(); j++) {
				if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; }
			}

			//if they are not all zero add this bin
			if (!allZero) {
				for (int j = 0; j < thislookup.size(); j++) {
					newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup());
				}

				//if there is a bin label use it otherwise make one
				string binLabel = "Otu";
				string sbinNumber = toString(i+1);
				if (sbinNumber.length() < snumBins.length()) {
					int diff = snumBins.length() - sbinNumber.length();
					for (int h = 0; h < diff; h++) { binLabel += "0"; }
				}
				binLabel += sbinNumber;
				if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; }

				newBinLabels.push_back(binLabel);
			}
		}

		for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; }

		thislookup = newLookup;
		m->currentSharedBinLabels = newBinLabels;

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedRAbundFloatVector", "eliminateZeroOTUS");
		exit(1);
	}
}
/***********************************************************************/
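// Illustrative sketch only, not part of mothur: printHeaders() and
// eliminateZeroOTUS() above both build default bin labels by left-padding the
// 1-based bin number with zeros to the width of the total bin count. The helper
// name sketchPaddedBinLabel and its parameters are hypothetical, and it uses
// std::to_string where mothur uses its own toString().
#include <string>

static std::string sketchPaddedBinLabel(const std::string& prefix, int binIndex, int totalBins) {
	std::string width = std::to_string(totalBins);      // digits in the largest bin number
	std::string number = std::to_string(binIndex + 1);  // bins are labeled starting from 1
	std::string label = prefix;
	if (number.length() < width.length()) {
		// pad with zeros so every label has the same number of digits
		label.append(width.length() - number.length(), '0');
	}
	return label + number;
}
// Under these assumptions, sketchPaddedBinLabel("Otu", 0, 120) yields "Otu001".
/***********************************************************************/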
mothur-1.36.1/source/datastructures/sharedrabundfloatvector.h000066400000000000000000000045761255543666200246130ustar00rootroot00000000000000
#ifndef SHAREDRABUNDFLOATVECTOR_H
#define SHAREDRABUNDFLOATVECTOR_H

/*
 *  sharedrabundfloatvector.h
 *  Mothur
 *
 *  Created by westcott on 8/18/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "datavector.hpp"
#include "sharedordervector.h"
#include "sharedsabundvector.h"
#include "rabundvector.hpp"
//#include "groupmap.h"

/* This class is a child of datavector. It represents OTU information at a certain distance.
	It is similar to an rabundvector except that each member of data knows which group it belongs to.
	Each member of the internal container "data" is a struct of type individualFloat:
	an individual that knows the OTU from which it came, the group it is in, and its abundance. */

class SharedRAbundFloatVector : public DataVector {

public:
	SharedRAbundFloatVector();
	SharedRAbundFloatVector(int);
	SharedRAbundFloatVector(const SharedRAbundFloatVector& bv) : DataVector(bv), data(bv.data), maxRank(bv.maxRank), numBins(bv.numBins), numSeqs(bv.numSeqs), group(bv.group), index(bv.index){};
	SharedRAbundFloatVector(ifstream&);
	~SharedRAbundFloatVector();

	int getNumBins();
	float getNumSeqs();
	float getMaxRank();
	string getGroup();
	void setGroup(string);
	int getGroupIndex();
	void setGroupIndex(int);

	void set(int, float, string);			//OTU, abundance, groupname
	individualFloat get(int);
	vector<individualFloat> getData();
	float getAbundance(int);
	vector<float> getAbundances();
	void push_front(float, int, string);	//abundance, otu, groupname
	void insert(float, int, string);		//abundance, otu, groupname
	void push_back(float, string);			//abundance, groupname
	void pop_back();
	void resize(int);
	void clear();
	int size();

	void print(ostream&);
	void printHeaders(ostream&);

	RAbundVector getRAbundVector();
	SAbundVector getSAbundVector();
	OrderVector getOrderVector(map<string,int>*);
	//SharedOrderVector getSharedOrderVector();
	//SharedSAbundVector getSharedSAbundVector();
//SharedRAbundVector getSharedRAbundVector(); SharedRAbundFloatVector getSharedRAbundFloatVector(); vector getSharedRAbundFloatVectors(); private: vector data; vector lookup; //GlobalData* globaldata; //GroupMap* groupmap; float maxRank; int numBins; float numSeqs; string group; int index; int eliminateZeroOTUS(vector&); }; #endif mothur-1.36.1/source/datastructures/sharedrabundvector.cpp000066400000000000000000000460651255543666200241170ustar00rootroot00000000000000/* * sharedvector.cpp * Dotur * * Created by Sarah Westcott on 12/5/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedrabundvector.h" #include "sabundvector.hpp" #include "ordervector.hpp" #include "sharedutilities.h" /***********************************************************************/ SharedRAbundVector::SharedRAbundVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0) {} /***********************************************************************/ SharedRAbundVector::~SharedRAbundVector() { //for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } } /***********************************************************************/ SharedRAbundVector::SharedRAbundVector(int n) : DataVector(), maxRank(0), numBins(n), numSeqs(0) { individual newGuy; //initialize data for (int i=0; i< n; i++) { newGuy.bin = i; newGuy.abundance = 0; data.push_back(newGuy); } } /*********************************************************************** SharedRAbundVector::SharedRAbundVector(string id, vector rav) : DataVector(id), data(rav) { try { numBins = 0; maxRank = 0; numSeqs = 0; for(int i=0;i maxRank) { maxRank = data[i].abundance; } numSeqs += data[i].abundance; } } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "SharedRAbundVector"); exit(1); } } ***********************************************************************/ //reads a shared file SharedRAbundVector::SharedRAbundVector(ifstream& f) : DataVector(), maxRank(0), numBins(0), numSeqs(0) { try { 
m->clearAllGroups(); vector allGroups; int num, inputData, count; count = 0; string holdLabel, nextLabel, groupN; individual newguy; for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } lookup.clear(); //are we at the beginning of the file?? if (m->saveNextLabel == "") { f >> label; //is this a shared file that has headers if (label == "label") { //gets "group" f >> label; m->gobble(f); //gets "numOtus" f >> label; m->gobble(f); //eat rest of line label = m->getline(f); m->gobble(f); //parse labels to save istringstream iStringStream(label); m->sharedBinLabelsInFile.clear(); while(!iStringStream.eof()){ if (m->control_pressed) { break; } string temp; iStringStream >> temp; m->gobble(iStringStream); m->sharedBinLabelsInFile.push_back(temp); } f >> label >> groupN >> num; }else { //read in first row since you know there is at least 1 group. f >> groupN >> num; //make binlabels because we don't have any string snumBins = toString(num); m->sharedBinLabelsInFile.clear(); for (int i = 0; i < num; i++) { //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; m->sharedBinLabelsInFile.push_back(binLabel); } } }else { label = m->saveNextLabel; //read in first row since you know there is at least 1 group. f >> groupN >> num; if (m->debug) { m->mothurOut("[DEBUG]: "+ groupN + '\t' + toString(num)); } } //reset labels, currentLabels may have gotten changed as otus were eliminated because of group choices or sampling m->currentSharedBinLabels = m->sharedBinLabelsInFile; holdLabel = label; //add new vector to lookup SharedRAbundVector* temp = new SharedRAbundVector(); lookup.push_back(temp); lookup[0]->setLabel(label); lookup[0]->setGroup(groupN); allGroups.push_back(groupN); //fill vector. 
data = first sharedrabund in file for(int i=0;i> inputData; if (m->debug) { m->mothurOut("[DEBUG]: OTU" + toString(i+1)+ '\t' +toString(inputData)); } lookup[0]->push_back(inputData, groupN); //abundance, bin, group push_back(inputData, groupN); if (inputData > maxRank) { maxRank = inputData; } } m->gobble(f); if (!(f.eof())) { f >> nextLabel; } //read the rest of the groups info in while ((nextLabel == holdLabel) && (f.eof() != true)) { f >> groupN >> num; if (m->debug) { m->mothurOut("[DEBUG]: "+ groupN + '\t' + toString(num)); } count++; allGroups.push_back(groupN); //add new vector to lookup temp = new SharedRAbundVector(); lookup.push_back(temp); lookup[count]->setLabel(label); lookup[count]->setGroup(groupN); //fill vector. for(int i=0;i> inputData; if (m->debug) { m->mothurOut("[DEBUG]: OTU" + toString(i+1)+ '\t' +toString(inputData)); } lookup[count]->push_back(inputData, groupN); //abundance, bin, group } m->gobble(f); if (f.eof() != true) { f >> nextLabel; } } m->saveNextLabel = nextLabel; m->setAllGroups(allGroups); } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "SharedRAbundVector"); exit(1); } } /***********************************************************************/ void SharedRAbundVector::set(int binNumber, int newBinSize, string groupname){ try { int oldBinSize = data[binNumber].abundance; data[binNumber].abundance = newBinSize; data[binNumber].group = groupname; if(newBinSize > maxRank) { maxRank = newBinSize; } numSeqs += (newBinSize - oldBinSize); } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "set"); exit(1); } } /***********************************************************************/ void SharedRAbundVector::setData(vector newData){ data = newData; } /***********************************************************************/ int SharedRAbundVector::getAbundance(int index){ return data[index].abundance; } /***********************************************************************/ //returns vector of abundances vector 
SharedRAbundVector::getAbundances(){ vector abunds; for (int i = 0; i < data.size(); i++) { abunds.push_back(data[i].abundance); } return abunds; } /***********************************************************************/ int SharedRAbundVector::numNZ(){ int sum = 0; for(int i = 1; i < numBins; i++) if(data[i].abundance > 0) sum++; return sum; } /***********************************************************************/ void SharedRAbundVector::sortD(){ struct individual indObj; sort(data.begin()+1, data.end(), indObj); } /***********************************************************************/ individual SharedRAbundVector::get(int index){ return data[index]; } /***********************************************************************/ vector SharedRAbundVector::getData(){ return data; } /***********************************************************************/ void SharedRAbundVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; data.clear(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; lookup[i] = NULL; } lookup.clear(); } /***********************************************************************/ void SharedRAbundVector::push_back(int binSize, string groupName){ try { individual newGuy; newGuy.abundance = binSize; newGuy.group = groupName; newGuy.bin = data.size(); data.push_back(newGuy); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "push_back"); exit(1); } } /***********************************************************************/ void SharedRAbundVector::insert(int binSize, int otu, string groupName){ try { individual newGuy; newGuy.abundance = binSize; newGuy.group = groupName; newGuy.bin = otu; data.insert(data.begin()+otu, newGuy); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "insert"); exit(1); } } 
/***********************************************************************/ void SharedRAbundVector::push_front(int binSize, int otu, string groupName){ try { individual newGuy; newGuy.abundance = binSize; newGuy.group = groupName; newGuy.bin = otu; data.insert(data.begin(), newGuy); numBins++; if(binSize > maxRank){ maxRank = binSize; } numSeqs += binSize; } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "push_front"); exit(1); } } /***********************************************************************/ void SharedRAbundVector::pop_back(){ numSeqs -= data[data.size()-1].abundance; numBins--; return data.pop_back(); } /***********************************************************************/ vector::reverse_iterator SharedRAbundVector::rbegin(){ return data.rbegin(); } /***********************************************************************/ vector::reverse_iterator SharedRAbundVector::rend(){ return data.rend(); } /***********************************************************************/ void SharedRAbundVector::resize(int size){ data.resize(size); } /***********************************************************************/ int SharedRAbundVector::size(){ return data.size(); } /***********************************************************************/ void SharedRAbundVector::printHeaders(ostream& output){ try { string snumBins = toString(numBins); output << "label\tGroup\tnumOtus"; if (m->sharedHeaderMode == "tax") { for (int i = 0; i < numBins; i++) { //if there is a bin label use it otherwise make one string binLabel = "PhyloType"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } output << '\t' << binLabel ; } output << endl; }else { for (int i = 0; i < numBins; i++) { //if there is a bin label use it 
otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } output << '\t' << binLabel; } output << endl; } m->printedSharedHeaders = true; } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "printHeaders"); exit(1); } } /***********************************************************************/ void SharedRAbundVector::print(ostream& output) { try { output << numBins; for(int i=0;ierrorOut(e, "SharedRAbundVector", "print"); exit(1); } } /***********************************************************************/ string SharedRAbundVector::getGroup(){ return group; } /***********************************************************************/ void SharedRAbundVector::setGroup(string groupName){ group = groupName; } /***********************************************************************/ int SharedRAbundVector::getGroupIndex() { return index; } /***********************************************************************/ void SharedRAbundVector::setGroupIndex(int vIndex) { index = vIndex; } /***********************************************************************/ int SharedRAbundVector::getNumBins(){ return numBins; } /***********************************************************************/ int SharedRAbundVector::getNumSeqs(){ return numSeqs; } /***********************************************************************/ int SharedRAbundVector::getMaxRank(){ return maxRank; } /***********************************************************************/ SharedRAbundVector SharedRAbundVector::getSharedRAbundVector(){ return *this; } /***********************************************************************/ vector SharedRAbundVector::getSharedRAbundVectors(){ try { SharedUtil* util; util = new 
SharedUtil(); vector Groups = m->getGroups(); vector allGroups = m->getAllGroups(); util->setGroups(Groups, allGroups); m->setGroups(Groups); bool remove = false; for (int i = 0; i < lookup.size(); i++) { //if this sharedrabund is not from a group the user wants then delete it. if (util->isValidGroup(lookup[i]->getGroup(), m->getGroups()) == false) { remove = true; delete lookup[i]; lookup[i] = NULL; lookup.erase(lookup.begin()+i); i--; } } delete util; if (remove) { eliminateZeroOTUS(lookup); } return lookup; } catch(exception& e) { m->errorOut(e, "SharedRAbundVector", "getSharedRAbundVectors"); exit(1); } } //********************************************************************************************************************** int SharedRAbundVector::eliminateZeroOTUS(vector& thislookup) { try { vector newLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector newBinLabels; string snumBins = toString(thislookup[0]->getNumBins()); for (int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not all zero add this bin if (!allZero) { for (int j = 0; j < thislookup.size(); j++) { newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); } //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < 
m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; }

				newBinLabels.push_back(binLabel);
			}
		}

		for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; }

		thislookup = newLookup;
		m->currentSharedBinLabels = newBinLabels;

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedRAbundVector", "eliminateZeroOTUS");
		exit(1);
	}
}
/***********************************************************************/
vector<SharedRAbundFloatVector*> SharedRAbundVector::getSharedRAbundFloatVectors(vector<SharedRAbundVector*> thislookup){
	try {
		vector<SharedRAbundFloatVector*> newLookupFloat;
		//size the result from thislookup (the argument) rather than the member lookup,
		//so that this loop and the one below index the same range
		for (int i = 0; i < thislookup.size(); i++) {
			SharedRAbundFloatVector* temp = new SharedRAbundFloatVector();
			temp->setLabel(thislookup[i]->getLabel());
			temp->setGroup(thislookup[i]->getGroup());
			newLookupFloat.push_back(temp);
		}

		for (int i = 0; i < thislookup.size(); i++) {
			for (int j = 0; j < thislookup[i]->getNumBins(); j++) {
				if (m->control_pressed) { return newLookupFloat; }

				//relative abundance = count in this bin / total sequences in this group
				int abund = thislookup[i]->getAbundance(j);
				float relabund = abund / (float) thislookup[i]->getNumSeqs();

				newLookupFloat[i]->push_back(relabund, thislookup[i]->getGroup());
			}
		}

		return newLookupFloat;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedRAbundVector", "getSharedRAbundFloatVectors");
		exit(1);
	}
}
/***********************************************************************/
RAbundVector SharedRAbundVector::getRAbundVector() {
	try {
		RAbundVector rav;

		for (int i = 0; i < data.size(); i++) {
			if(data[i].abundance != 0) {
				rav.push_back(data[i].abundance);
			}
		}

		rav.setLabel(label);
		return rav;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedRAbundVector", "getRAbundVector");
		exit(1);
	}
}
/***********************************************************************/
RAbundVector SharedRAbundVector::getRAbundVector2() {
	try {
		RAbundVector rav;
		for(int i = 0; i < numBins; i++) if(data[i].abundance != 0) rav.push_back(data[i].abundance-1);
		return rav;
	}
	catch(exception& e) {
		m->errorOut(e, "SharedRAbundVector", "getRAbundVector2");
		exit(1);
	}
}
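/***********************************************************************/
// Illustrative sketch only, not part of mothur: getSharedRAbundFloatVectors()
// above converts each group's counts to relative abundances by dividing every
// count by that group's getNumSeqs(). The function below shows the same
// arithmetic on a plain vector. The name sketchRelativeAbundances is
// hypothetical, and the guard for a group whose total is zero is an added
// assumption, since the original divides without checking.
#include <cstddef>
#include <vector>

static std::vector<float> sketchRelativeAbundances(const std::vector<int>& counts) {
	long total = 0;
	for (std::size_t i = 0; i < counts.size(); i++) { total += counts[i]; }

	std::vector<float> relabunds;
	relabunds.reserve(counts.size());
	for (std::size_t i = 0; i < counts.size(); i++) {
		// a group with no sequences gets zeros instead of a division by zero
		relabunds.push_back(total == 0 ? 0.0f : counts[i] / (float) total);
	}
	return relabunds;
}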
/***********************************************************************/ SharedSAbundVector SharedRAbundVector::getSharedSAbundVector(){ try { SharedSAbundVector sav(maxRank+1); for(int i=0;ierrorOut(e, "SharedRAbundVector", "getSharedSAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector SharedRAbundVector::getSAbundVector() { try { SAbundVector sav(maxRank+1); for(int i=0;ierrorOut(e, "SharedRAbundVector", "getSAbundVector"); exit(1); } } /***********************************************************************/ SharedOrderVector SharedRAbundVector::getSharedOrderVector() { try { SharedOrderVector ov; for(int i=0;ierrorOut(e, "SharedRAbundVector", "getSharedOrderVector"); exit(1); } } /***********************************************************************/ OrderVector SharedRAbundVector::getOrderVector(map* nameMap = NULL) { try { OrderVector ov; for(int i=0;ierrorOut(e, "SharedRAbundVector", "getOrderVector"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sharedrabundvector.h000066400000000000000000000053231255543666200235540ustar00rootroot00000000000000#ifndef SHAREDRABUNDVECTOR_H #define SHAREDRABUNDVECTOR_H /* * sharedrabundvector.h * Dotur * * Created by Sarah Westcott on 12/5/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "datavector.hpp" #include "sharedordervector.h" #include "sharedsabundvector.h" #include "sharedrabundfloatvector.h" #include "rabundvector.hpp" //#include "groupmap.h" /* DataStructure for a shared file. This class is a child to datavector. It represents OTU information at a certain distance. It is similiar to an rabundvector except each member of data knows which group it belongs to. Each member of the internal container "data" is a struct of type individual. An individual which knows the OTU from which it came, the group it is in and its abundance. 
*/ class SharedRAbundVector : public DataVector { public: SharedRAbundVector(); SharedRAbundVector(int); //SharedRAbundVector(string, vector); SharedRAbundVector(const SharedRAbundVector& bv) : DataVector(bv), data(bv.data), maxRank(bv.maxRank), numBins(bv.numBins), numSeqs(bv.numSeqs), group(bv.group), index(bv.index){}; SharedRAbundVector(ifstream&); ~SharedRAbundVector(); int getNumBins(); int getNumSeqs(); int getMaxRank(); string getGroup(); void setGroup(string); string getBinLabel(); void setBinLabel(string); int getGroupIndex(); void setGroupIndex(int); void set(int, int, string); //OTU, abundance, groupname void setData(vector ); individual get(int); vector getData(); int getAbundance(int); vector getAbundances(); int numNZ(); void sortD(); //Sorts the data in descending order. void push_front(int, int, string); //abundance, otu, groupname void insert(int, int, string); //abundance, otu, groupname void push_back(int, string); //abundance, groupname void pop_back(); void resize(int); int size(); void clear(); vector::reverse_iterator rbegin(); vector::reverse_iterator rend(); void print(ostream&); void printHeaders(ostream&); RAbundVector getRAbundVector(); RAbundVector getRAbundVector2(); SAbundVector getSAbundVector(); OrderVector getOrderVector(map*); SharedOrderVector getSharedOrderVector(); SharedSAbundVector getSharedSAbundVector(); SharedRAbundVector getSharedRAbundVector(); vector getSharedRAbundVectors(); vector getSharedRAbundFloatVectors(vector); private: vector data; vector lookup; //GlobalData* globaldata; //GroupMap* groupmap; int maxRank; int numBins; int numSeqs; string group; int index; int eliminateZeroOTUS(vector&); }; #endif mothur-1.36.1/source/datastructures/sharedsabundvector.cpp000066400000000000000000000141221255543666200241050ustar00rootroot00000000000000/* * sharedSharedSAbundVector.cpp * Dotur * * Created by Sarah Westcott on 12/10/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedsabundvector.h" #include "sabundvector.hpp" /***********************************************************************/ SharedSAbundVector::SharedSAbundVector() : DataVector(), maxRank(0), numBins(0), numSeqs(0){ } /***********************************************************************/ SharedSAbundVector::SharedSAbundVector(int size) : DataVector(), maxRank(0), numBins(0), numSeqs(0) { individual newGuy; //initialize data for (int i=0; i< size; i++) { newGuy.bin = i; newGuy.abundance = 0; data.push_back(newGuy); } } /***********************************************************************/ void SharedSAbundVector::set(int bin, int abundance, string groupName){ try { int initSize = data[bin].abundance; data[bin].abundance = abundance; data[bin].group = groupName; if(bin != 0){ numBins += (abundance - initSize); } numSeqs += bin * (abundance - initSize); if(bin > maxRank) { maxRank = bin; } } catch(exception& e) { m->errorOut(e, "SharedSAbundVector", "set"); exit(1); } } /***********************************************************************/ individual SharedSAbundVector::get(int index){ return data[index]; } /***********************************************************************/ int SharedSAbundVector::getAbundance(int index){ return data[index].abundance; } /***********************************************************************/ void SharedSAbundVector::push_back(int abundance, int bin, string groupName){ try { individual newGuy; newGuy.abundance = abundance; newGuy.bin = bin; newGuy.group = groupName; data.push_back(newGuy); maxRank++; numBins += abundance; numSeqs += (maxRank * abundance); } catch(exception& e) { m->errorOut(e, "SharedSAbundVector", "push_back"); exit(1); } } /***********************************************************************/ void SharedSAbundVector::resize(int size){ data.resize(size); } /***********************************************************************/ int SharedSAbundVector::size(){ return data.size(); } 
/***********************************************************************/ void SharedSAbundVector::print(ostream& output){ try { output << label << '\t' << maxRank; for(int i=1;i<=maxRank;i++){ output << '\t' << data[i].abundance; } output << endl; } catch(exception& e) { m->errorOut(e, "SharedSAbundVector", "print"); exit(1); } } /***********************************************************************/ string SharedSAbundVector::getGroup(){ return group; } /***********************************************************************/ void SharedSAbundVector::setGroup(string groupName){ group = groupName; } /**********************************************************************/ int SharedSAbundVector::getNumBins(){ return numBins; } /***********************************************************************/ int SharedSAbundVector::getNumSeqs(){ return numSeqs; } /***********************************************************************/ int SharedSAbundVector::getMaxRank(){ return maxRank; } /***********************************************************************/ RAbundVector SharedSAbundVector::getRAbundVector(){ try { RAbundVector rav; for(int i=1;ierrorOut(e, "SharedSAbundVector", "getRAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector SharedSAbundVector::getSAbundVector(){ try { RAbundVector rav; SAbundVector sav; rav = getRAbundVector(); sav = rav.getSAbundVector(); return sav; } catch(exception& e) { m->errorOut(e, "SharedSAbundVector", "getSAbundVector"); exit(1); } } /***********************************************************************/ bool compareMembers (individual member, individual member2){ if(member.abundance < member2.abundance){ return true; } else{ return false; } } /***********************************************************************/ SharedRAbundVector SharedSAbundVector::getSharedRAbundVector(){ try { SharedRAbundVector rav; for(int i=1;ierrorOut(e, "SharedSAbundVector", 
"getSharedRAbundVector"); exit(1); } } /***********************************************************************/ SharedSAbundVector SharedSAbundVector::getSharedSAbundVector(){ return *this; } /***********************************************************************/ SharedOrderVector SharedSAbundVector::getSharedOrderVector() { try { SharedRAbundVector rav; SharedOrderVector ov; rav = this->getSharedRAbundVector(); ov = rav.getSharedOrderVector(); ov.updateStats(); return ov; } catch(exception& e) { m->errorOut(e, "SharedSAbundVector", "getSharedOrderVector"); exit(1); } } /***********************************************************************/ void SharedSAbundVector::clear(){ numBins = 0; maxRank = 0; numSeqs = 0; data.clear(); } /***********************************************************************/ OrderVector SharedSAbundVector::getOrderVector(map* hold = NULL){ try { OrderVector ov; int binIndex = 0; for(int i=1;ierrorOut(e, "SharedSAbundVector", "getOrderVector"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sharedsabundvector.h000066400000000000000000000033001255543666200235460ustar00rootroot00000000000000#ifndef SHAREDSABUND_H #define SHAREDSABUND_H /* * sharedSharedSAbundVector.h * Dotur * * Created by Sarah Westcott on 12/10/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "datavector.hpp" #include "rabundvector.hpp" #include "ordervector.hpp" #include "sharedordervector.h" #include "sharedrabundvector.h" /* This class is a child to datavector. It represents OTU information at a certain distance. It is similiar to an sabundvector except each member of data knows which group it belongs to. Each member of the internal container "data" is a struct of type individual. An individual which knows the OTU from which it came, the group it is in and its abundance. 
*/

class SharedSAbundVector : public DataVector {

public:
	SharedSAbundVector();
	SharedSAbundVector(int);
	SharedSAbundVector(const SharedSAbundVector& rv) : DataVector(rv.label), data(rv.data), maxRank(rv.maxRank), numBins(rv.numBins), numSeqs(rv.numSeqs){};
	~SharedSAbundVector(){};

	int getNumBins();
	int getNumSeqs();
	int getMaxRank();
	string getGroup();
	void setGroup(string);

	void set(int, int, string);			//OTU, abundance, group
	individual get(int);
	int getAbundance(int);
	void push_back(int, int, string);	//abundance, OTU, group
	void pop_back();
	void resize(int);
	int size();
	void clear();

	void print(ostream&);

	RAbundVector getRAbundVector();
	SAbundVector getSAbundVector();
	OrderVector getOrderVector(map<string,int>*);
	SharedSAbundVector getSharedSAbundVector();
	SharedRAbundVector getSharedRAbundVector();
	SharedOrderVector getSharedOrderVector();

private:
	vector<individual> data;
	int maxRank;
	int numBins;
	int numSeqs;
	string group;
};

#endif
mothur-1.36.1/source/datastructures/sparsedistancematrix.cpp000066400000000000000000000133561255543666200244640ustar00rootroot00000000000000
//
//  sparsedistancematrix.cpp
//  Mothur
//
//  Created by Sarah Westcott on 7/16/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
// #include "sparsedistancematrix.h" /***********************************************************************/ SparseDistanceMatrix::SparseDistanceMatrix() : numNodes(0), smallDist(1e6){ m = MothurOut::getInstance(); sorted=false; aboveCutoff = 1e6; } /***********************************************************************/ int SparseDistanceMatrix::getNNodes(){ return numNodes; } /***********************************************************************/ void SparseDistanceMatrix::clear(){ for (int i = 0; i < seqVec.size(); i++) { seqVec[i].clear(); } seqVec.clear(); } /***********************************************************************/ float SparseDistanceMatrix::getSmallDist(){ return smallDist; } /***********************************************************************/ int SparseDistanceMatrix::updateCellCompliment(ull row, ull col){ try { ull vrow = seqVec[row][col].index; ull vcol = 0; //find the columns entry for this cell as well for (int i = 0; i < seqVec[vrow].size(); i++) { if (seqVec[vrow][i].index == row) { vcol = i; break; } } seqVec[vrow][vcol].dist = seqVec[row][col].dist; return 0; } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "updateCellCompliment"); exit(1); } } /***********************************************************************/ int SparseDistanceMatrix::rmCell(ull row, ull col){ try { numNodes-=2; ull vrow = seqVec[row][col].index; ull vcol = 0; //find the columns entry for this cell as well for (int i = 0; i < seqVec[vrow].size(); i++) { if (seqVec[vrow][i].index == row) { vcol = i; break; } } seqVec[vrow].erase(seqVec[vrow].begin()+vcol); seqVec[row].erase(seqVec[row].begin()+col); return(0); } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "rmCell"); exit(1); } } /***********************************************************************/ void SparseDistanceMatrix::addCell(ull row, PDistCell cell){ try { numNodes+=2; if(cell.dist < smallDist){ smallDist = cell.dist; } seqVec[row].push_back(cell); PDistCell 
temp(row, cell.dist); seqVec[cell.index].push_back(temp); } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "addCell"); exit(1); } } /***********************************************************************/ int SparseDistanceMatrix::addCellSorted(ull row, PDistCell cell){ try { numNodes+=2; if(cell.dist < smallDist){ smallDist = cell.dist; } seqVec[row].push_back(cell); PDistCell temp(row, cell.dist); seqVec[cell.index].push_back(temp); sortSeqVec(row); sortSeqVec(cell.index); int location = -1; //find location of new cell when sorted for (int i = 0; i < seqVec[row].size(); i++) { if (seqVec[row][i].index == cell.index) { location = i; break; } } return location; } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "addCellSorted"); exit(1); } } /***********************************************************************/ ull SparseDistanceMatrix::getSmallestCell(ull& row){ try { if (!sorted) { sortSeqVec(); sorted = true; } vector mins; smallDist = 1e6; for (int i = 0; i < seqVec.size(); i++) { for (int j = 0; j < seqVec[i].size(); j++) { if (m->control_pressed) { return smallDist; } //already checked everyone else in row if (i < seqVec[i][j].index) { float dist = seqVec[i][j].dist; if(dist < smallDist){ //found a new smallest distance mins.clear(); smallDist = dist; PDistCellMin temp(i, seqVec[i][j].index); mins.push_back(temp); } else if(dist == smallDist){ //if a subsequent distance is the same as mins distance add the new iterator to the mins vector PDistCellMin temp(i, seqVec[i][j].index); mins.push_back(temp); } }else { j+=seqVec[i].size(); } //stop looking } } //random_shuffle(mins.begin(), mins.end()); //randomize the order of the iterators in the mins vector row = mins[0].row; ull col = mins[0].col; return col; } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "getSmallestCell"); exit(1); } } /***********************************************************************/ int SparseDistanceMatrix::sortSeqVec(){ try { //saves time 
in getSmallestCell, by making it so you don't search the repeats for (int i = 0; i < seqVec.size(); i++) { sort(seqVec[i].begin(), seqVec[i].end(), compareIndexes); } return 0; } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "sortSeqVec"); exit(1); } } /***********************************************************************/ int SparseDistanceMatrix::sortSeqVec(int index){ try { //saves time in getSmallestCell, by making it so you don't search the repeats sort(seqVec[index].begin(), seqVec[index].end(), compareIndexes); return 0; } catch(exception& e) { m->errorOut(e, "SparseDistanceMatrix", "sortSeqVec"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sparsedistancematrix.h000066400000000000000000000031601255543666200241210ustar00rootroot00000000000000#ifndef Mothur_sparsedistancematrix_h #define Mothur_sparsedistancematrix_h // // sparsedistancematrix.h // Mothur // // Created by Sarah Westcott on 7/16/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "mothur.h" #include "mothurout.h" class ListVector; /* For each distance in a sparse matrix we have a row, column and distance. The PDistCell consists of the column and distance. We know the row from the position of the distance's row in the seqVec matrix. SeqVec is square and each row is sorted so the column values are ascending to save time in the search for the smallest distance.
*/ /***********************************************************************/ struct PDistCellMin{ ull row; ull col; //PDistCell* cell; PDistCellMin(ull r, ull c) : col(c), row(r) {} }; /***********************************************************************/ class SparseDistanceMatrix { public: SparseDistanceMatrix(); ~SparseDistanceMatrix(){ clear(); } int getNNodes(); ull getSmallestCell(ull& index); //Return the cell with the smallest distance float getSmallDist(); int rmCell(ull, ull); int updateCellCompliment(ull, ull); void resize(ull n) { seqVec.resize(n); } void clear(); void addCell(ull, PDistCell); int addCellSorted(ull, PDistCell); vector > seqVec; private: PDistCell smallCell; //The cell with the smallest distance int numNodes; bool sorted; int sortSeqVec(); int sortSeqVec(int); float smallDist, aboveCutoff; MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/datastructures/sparsematrix.cpp000066400000000000000000000112241255543666200227410ustar00rootroot00000000000000 #include "sparsematrix.hpp" #include "listvector.hpp" /***********************************************************************/ SparseMatrix::SparseMatrix() : numNodes(0), minsIndex(0), smallDist(1e6){ m = MothurOut::getInstance(); } /***********************************************************************/ int SparseMatrix::getNNodes(){ return numNodes; } /***********************************************************************/ float SparseMatrix::getSmallDist(){ return smallDist; } /***********************************************************************/ MatData SparseMatrix::rmCell(MatData data){ try { if(data->vectorMap != NULL ){ *(data->vectorMap) = NULL; data->vectorMap = NULL; } data = matrix.erase(data); numNodes--; return(data); // seems like i should be updating smallDist here, but the only time we remove cells is when // clustering and the clustering algorithm updates smallDist } catch(exception& e) { 
m->errorOut(e, "SparseMatrix", "rmCell"); exit(1); } } /***********************************************************************/ void SparseMatrix::addCell(PCell value){ try { matrix.push_back(value); numNodes++; if(value.dist < smallDist){ smallDist = value.dist; } } catch(exception& e) { m->errorOut(e, "SparseMatrix", "addCell"); exit(1); } } /***********************************************************************/ void SparseMatrix::clear(){ try { matrix.clear(); mins.clear(); numNodes = 0; minsIndex = 0; smallDist = 1e6; } catch(exception& e) { m->errorOut(e, "SparseMatrix", "clear"); exit(1); } } /***********************************************************************/ MatData SparseMatrix::begin(){ return matrix.begin(); } /***********************************************************************/ MatData SparseMatrix::end(){ return matrix.end(); } /***********************************************************************/ void SparseMatrix::print(){ try { int index = 0; cout << endl << "Index\tRow\tColumn\tDistance" << endl; for(MatData currentCell=matrix.begin();currentCell!=matrix.end();currentCell++){ cout << index << '\t' << currentCell->row << '\t' << currentCell->column << '\t' << currentCell->dist << endl; index++; } } catch(exception& e) { m->errorOut(e, "SparseMatrix", "print"); exit(1); } } /***********************************************************************/ void SparseMatrix::print(ListVector* list){ try { int index = 0; m->mothurOutEndLine(); m->mothurOut("Index\tRow\tColumn\tDistance"); m->mothurOutEndLine(); for(MatData currentCell=matrix.begin();currentCell!=matrix.end();currentCell++){ m->mothurOut(toString(index) + "\t" + toString(list->get(currentCell->row)) + "\t" + toString(list->get(currentCell->column)) + "\t" + toString(currentCell->dist)); m->mothurOutEndLine(); index++; } } catch(exception& e) { m->errorOut(e, "SparseMatrix", "print"); exit(1); } } /***********************************************************************/ PCell* 
SparseMatrix::getSmallestCell(){ try { // this is where I check to see if the next small distance has the correct distance // if it doesn't then I remove the offending Cell -> should also be able to check for // invalid iterator / pointer -- right??? while(!mins.empty() && mins.back() == NULL){ mins.pop_back(); } // if the mins vector is empty go here... if(mins.empty()){ mins.clear(); smallDist = begin()->dist; //set the first candidate small distance for(MatData currentCell=begin();currentCell!=end();currentCell++){ float dist = currentCell->dist; if(dist < smallDist){ //found a new smallest distance mins.clear(); smallDist = dist; mins.push_back(&*currentCell); //this is the address of the data in the list being pointed to by the MatData iterator } else if(dist == smallDist){ //if a subsequent distance is the same as mins distance add the new iterator to the mins vector mins.push_back(&*currentCell); //this is the address of the data in the list being pointed to by the MatData iterator } } random_shuffle(mins.begin(), mins.end()); //randomize the order of the iterators in the mins vector for(int i=0;i<mins.size();i++){ mins[i]->vectorMap = &mins[i]; //assign vectorMap to the address for the container } } smallCell = mins.back(); //make the smallestCell the last element of the vector mins.pop_back(); //remove the last element from the vector return smallCell; } catch(exception& e) { m->errorOut(e, "SparseMatrix", "getSmallestCell"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/datastructures/sparsematrix.hpp000066400000000000000000000023361255543666200227520ustar00rootroot00000000000000#ifndef SPARSEMATRIX_H #define SPARSEMATRIX_H #include "mothur.h" #include "mothurout.h" class ListVector; /***********************************************************************/ struct PCell{ ull row; ull column; float dist; PCell** vectorMap; PCell() : row(0), column(0), dist(0), vectorMap(NULL) {}; PCell(ull r, ull c, float d) : row(r),
column(c), dist(d), vectorMap(NULL) {}; }; /***********************************************************************/ typedef list::iterator MatData; class SparseMatrix { public: SparseMatrix(); ~SparseMatrix(){ while(!mins.empty() && mins.back() == NULL){ mins.pop_back(); } } int getNNodes(); void print(); //Print the contents of the matrix void print(ListVector*); //Print the contents of the matrix PCell* getSmallestCell(); //Return the cell with the smallest distance float getSmallDist(); MatData rmCell(MatData); void addCell(PCell); void clear(); MatData begin(); MatData end(); private: PCell* smallCell; //The cell with the smallest distance int numNodes; list matrix; vector mins; float smallDist; int minsIndex; MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/datastructures/suffixdb.cpp000066400000000000000000000061161255543666200220350ustar00rootroot00000000000000/* * suffixdb.cpp * * * Created by Pat Schloss on 12/16/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This is a child class of the Database abstract datatype. The class is basically a database of suffix trees and an * encapsulation of the method for finding the most similar tree to an inputted sequence. the suffixForest objecct * is a vector of SuffixTrees, with each template sequence being represented by a different SuffixTree. The class also * provides a method to take an unaligned sequence and find the closest sequence in the suffixForest. The search * method is inspired by the article and Perl source code provided at http://www.ddj.com/web-development/184416093. 
I * would estimate that the time complexity is O(LN) for each search, which is slower than the kmer searching, but * faster than blast * */ #include "database.hpp" #include "sequence.hpp" #include "suffixtree.hpp" #include "suffixdb.hpp" /**************************************************************************************************/ SuffixDB::SuffixDB(int numSeqs) : Database() { suffixForest.resize(numSeqs); count = 0; } /**************************************************************************************************/ SuffixDB::SuffixDB() : Database() { count = 0; } /**************************************************************************************************/ //assumes sequences have been added using addSequence vector SuffixDB::findClosestSequences(Sequence* candidateSeq, int num){ try { vector topMatches; string processedSeq = candidateSeq->convert2ints(); // the candidate sequence needs to be a string of ints vector seqMatches; for(int i=0;ierrorOut(e, "SuffixDB", "findClosestSequences"); exit(1); } } /**************************************************************************************************/ //adding the sequences generates the db void SuffixDB::addSequence(Sequence seq) { try { suffixForest[count].loadSequence(seq); count++; } catch(exception& e) { m->errorOut(e, "SuffixDB", "addSequence"); exit(1); } } /**************************************************************************************************/ SuffixDB::~SuffixDB(){ for (int i = (suffixForest.size()-1); i >= 0; i--) { suffixForest.pop_back(); } } /**************************************************************************************************/ mothur-1.36.1/source/datastructures/suffixdb.hpp000066400000000000000000000023701255543666200220400ustar00rootroot00000000000000#ifndef SUFFIXDB_HPP #define SUFFIXDB_HPP /* * suffixdb.hpp * * * Created by Pat Schloss on 12/16/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* * This is a child class of the Database abstract datatype. The class is basically a database of suffix trees and an * encapsulation of the method for finding the most similar tree to an input sequence. The suffixForest object * is a vector of SuffixTrees, with each template sequence being represented by a different SuffixTree. The class also * provides a method to take an unaligned sequence and find the closest sequence in the suffixForest. The search * method is inspired by the article and Perl source code provided at http://www.ddj.com/web-development/184416093. I * would estimate that the time complexity is O(LN) for each search, which is slower than the kmer searching, but * faster than BLAST * */ #include "mothur.h" #include "database.hpp" #include "suffixtree.hpp" class SuffixDB : public Database { public: SuffixDB(int); SuffixDB(); ~SuffixDB(); void generateDB() {}; //adding sequences generates the db void addSequence(Sequence); vector findClosestSequences(Sequence*, int); private: vector<SuffixTree> suffixForest; int count; }; #endif mothur-1.36.1/source/datastructures/suffixnodes.cpp000066400000000000000000000107701255543666200225610ustar00rootroot00000000000000/* * SuffixNodes.cpp * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * There are two types of nodes in a suffix tree as I have implemented it. First, there are the internal nodes that * have children; these are the SuffixBranch objects. There are also the terminal nodes, which are the SuffixLeaf objects. * I divided them into two groups to save on memory.
A SuffixTree object will be a vector of SuffixNodes; therefore, * the values of parentNode, children nodes, and suffix nodes are stored as ints that correspond to indices in the * vector * */ #include "suffixnodes.hpp" //******************************************************************************************************************** inline char deCodeSequence(char code){ if(code == '0') { return 'a'; } // this method allows us to go from the int string to a char string; else if(code == '1') { return 'c'; } // it's only really useful if we want to print out the tree else if(code == '2') { return 'g'; } else if(code == '3') { return 't'; } else if(code == '4') { return 'n'; } else { return '$'; } } //******************************************************************************************************************** SuffixNode::SuffixNode(int parent, int start, int end) : parentNode(parent), // we store the parent node as an int startCharPosition(start), // the suffix tree class will hold the sequence that the startCharPosition and endCharPosition(end) // endCharPosition indices correspond to { /* do nothing */ m = MothurOut::getInstance(); } void SuffixNode::setChildren(char, int) { /* do nothing */ } // there's no children in a leaf int SuffixNode::getNumChildren() { return 0; } // ditto void SuffixNode::eraseChild(char) { /* do nothing */ } // ditto int SuffixNode::getChild(char) { return -1; } // ditto void SuffixNode::setSuffixNode(int) { /* do nothing */ } // there's no suffix node in a leaf int SuffixNode::getSuffixNode() { return -1; } // ditto int SuffixNode::getParentNode() { return parentNode; } void SuffixNode::setParentNode(int number) { parentNode = number; } int SuffixNode::getStartCharPos() { return startCharPosition; } void SuffixNode::setStartCharPos(int start) { startCharPosition = start; } int SuffixNode::getEndCharPos() { return endCharPosition; } 
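//********************************************************************************************************************
// Illustrative sketch only; nothing in this block is part of the original mothur source. It assumes the base encoding
// implied by deCodeSequence above ('0'..'4' for a, c, g, t, n, with any other code decoding to '$') and shows how such
// a code can double as a child index while node links are kept as int indices, with -1 meaning "none". The namespace
// suffix_sketch and the names encodeBase, decodeBase, childSlot, SketchNode and encodeSequence are hypothetical, and
// the use of '5' as the code for the '$' terminator is an assumption chosen to fit six child slots.
#include <string>
#include <vector>

namespace suffix_sketch {

	// Hypothetical inverse of deCodeSequence: map a base to its one-character code.
	inline char encodeBase(char base) {
		switch (base) {
			case 'a': case 'A': return '0';
			case 'c': case 'C': return '1';
			case 'g': case 'G': return '2';
			case 't': case 'T': return '3';
			case 'n': case 'N': return '4';
			default:            return '5';	// assumed code for the '$' terminator
		}
	}

	// Same mapping as deCodeSequence, repeated here so the sketch stands alone.
	inline char decodeBase(char code) {
		static const char letters[] = { 'a', 'c', 'g', 't', 'n' };
		return (code >= '0' && code <= '4') ? letters[code - '0'] : '$';
	}

	// Subtracting '0' turns the character code into a small integer usable as a vector index.
	inline int childSlot(char code) { return code - '0'; }

	// A node that stores its parent and children as indices rather than pointers.
	struct SketchNode {
		int parentNode;
		std::vector<int> childNodes;
		explicit SketchNode(int parent) : parentNode(parent), childNodes(6, -1) {}
	};

	// Encode a whole sequence one base at a time.
	inline std::string encodeSequence(const std::string& bases) {
		std::string codes;
		for (std::string::size_type i = 0; i < bases.size(); i++) { codes += encodeBase(bases[i]); }
		return codes;
	}
}
// Storing indices instead of pointers keeps the links valid when the vector that owns the nodes reallocates, which is
// one plausible reason for the int-based design described in the header comment of this file.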
//******************************************************************************************************************** SuffixLeaf::SuffixLeaf(int parent, int start, int end) : SuffixNode(parent, start, end) { /* do nothing */ } void SuffixLeaf::print(string sequence, int nodeNumber){ m->mothurOut(toString(this) + "\t" + toString(parentNode) + "\t" + toString(nodeNumber) + "\t" + toString(-1) + "\t" + toString(startCharPosition) + "\t" + toString(endCharPosition) + "\t"); m->mothurOut("/"); for(int i=startCharPosition;i<=endCharPosition;i++){ m->mothurOut(toString(deCodeSequence(sequence[i]))); } m->mothurOut("/"); m->mothurOutEndLine(); } //******************************************************************************************************************** SuffixBranch::SuffixBranch(int parent, int start, int end) : SuffixNode(parent, start, end), suffixNode(-1){ childNodes.assign(6, -1); } void SuffixBranch::print(string sequence, int nodeNumber){ // this method is different than that m->mothurOut(toString(this) + "\t" + toString(parentNode) + "\t" + toString(nodeNumber) + "\t" + // of a leaf because it prints out a toString(suffixNode) + "\t" + toString(startCharPosition) + "\t" + toString(endCharPosition) + "\t"); // value for the suffix node m->mothurOut("/"); for(int i=startCharPosition;i<=endCharPosition;i++){ m->mothurOut(toString(deCodeSequence(sequence[i]))); } m->mothurOut("/"); m->mothurOutEndLine(); } // we can access the children by subtracting '0' from the char value in the string; the difference is an int // value, which is the index we need to access.
void SuffixBranch::eraseChild(char base) { childNodes[base - '0'] = -1; } //to erase, set the child index to -1 void SuffixBranch::setChildren(char base, int nodeIndex){ childNodes[base - '0'] = nodeIndex; } void SuffixBranch::setSuffixNode(int nodeIndex){ suffixNode = nodeIndex; } int SuffixBranch::getSuffixNode() { return suffixNode; } int SuffixBranch::getChild(char base) { return childNodes[base - '0']; } //******************************************************************************************************************** mothur-1.36.1/source/datastructures/suffixnodes.hpp000066400000000000000000000054761255543666200225750ustar00rootroot00000000000000#ifndef SUFFIXNODES_H #define SUFFIXNODES_H /* * SuffixNodes.h * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * There are two types of nodes in a suffix tree as I have implemented it. First, there are the internal nodes that * have children; these are the SuffixBranch objects. There are also the terminal nodes, which are the SuffixLeaf objects. * I divided them into two groups to save on memory.
A SuffixTree object will be a vector of SuffixNodes; therefore, * the values of parentNode, children nodes, and suffix nodes are stored as ints that correspond to indices in the * vector * */ #include "mothur.h" #include "mothurout.h" //******************************************************************************************************************** class SuffixNode { public: SuffixNode(int, int, int); virtual ~SuffixNode() {} virtual void print(string, int) = 0; virtual void setChildren(char, int); virtual int getNumChildren(); virtual void eraseChild(char); virtual void setSuffixNode(int); virtual int getSuffixNode(); virtual int getChild(char); int getParentNode(); void setParentNode(int); int getStartCharPos(); void setStartCharPos(int start); int getEndCharPos(); protected: int parentNode; int startCharPosition; int endCharPosition; MothurOut* m; }; //******************************************************************************************************************** class SuffixLeaf : public SuffixNode { // most of the methods are already set in the parent class public: SuffixLeaf(int, int, int); // we just need to define a constructor and ~SuffixLeaf() {} void print(string, int); // print method }; //******************************************************************************************************************** class SuffixBranch : public SuffixNode { public: SuffixBranch(int, int, int); ~SuffixBranch() {} void print(string, int); // need a special method for printing the node because there are children void eraseChild(char); // need a special method for erasing the children void setChildren(char, int); // need a special method for setting children void setSuffixNode(int); // need a special method for setting the suffix node int getSuffixNode(); // need a special method for returning the suffix node int getChild(char); // need a special method for return children private: vector childNodes; // a suffix branch is unique because it has children and a 
suffixNode. The int suffixNode; // are stored in a vector for super-fast lookup. If the alphabet were bigger, this }; // might not be practical. Since we only have 5 possible letters, it makes sense //******************************************************************************************************************** #endif mothur-1.36.1/source/datastructures/suffixtree.cpp000066400000000000000000000302731255543666200224100ustar00rootroot00000000000000/* * suffixtree.cpp * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This is my half-assed attempt to implement a suffix tree. This is a cobbled together algorithm using materials that * I found at http://marknelson.us/1996/08/01/suffix-trees/ and: * * Ukkonen E. (1995). On-line construction of suffix trees. Algorithmica 14 (3): 249--260 * Gusfield, Dan (1999). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. * USA: Cambridge University Press * * The Ukkonen paper is the seminal paper describing the on-line method of constructing a suffix tree. * * I have chosen to store the nodes of the tree as a vector of pointers to SuffixNode objects. The root is stored at * nodeVector[0]. Each tree also stores the sequence name and the string that corresponds to the actual sequence. * Finally, this class provides a way of counting the number of suffixes that are needed in one tree to generate a new * sequence (countSuffixes). This method is used to determine similarity between sequences and was inspired by the * article and Perl source code provided at http://www.ddj.com/web-development/184416093. 
* */ #include "sequence.hpp" #include "suffixnodes.hpp" #include "suffixtree.hpp" //******************************************************************************************************************** inline bool compareParents(SuffixNode* left, SuffixNode* right){// this is necessary to print the tree and to sort the return (left->getParentNode() < right->getParentNode()); // nodes in order of their parent } //******************************************************************************************************************** SuffixTree::SuffixTree(){ m = MothurOut::getInstance(); } //******************************************************************************************************************** SuffixTree::~SuffixTree(){ for(int i=0;i hold = nodeVector; sort(hold.begin(), hold.end(), compareParents); m->mothurOut("Address\t\tParent\tNode\tSuffix\tStartC\tEndC\tSuffix"); m->mothurOutEndLine(); for(int i=1;i<=nodeCounter;i++){ hold[i]->print(sequence, i); } } //******************************************************************************************************************** int SuffixTree::countSuffixes(string compareSequence, int& minValue){ // here we count the number of suffix parts // we need to rewrite a user supplied sequence. if the int numSuffixes = 0; // count exceeds the supplied minValue, bail out. The int seqLength = compareSequence.length(); // time complexity should be O(L) int position = 0; int presentNode = 0; while(position < seqLength){ // while the position in the query sequence isn't at the end... if(numSuffixes > minValue) { return 1000000; } // bail if the count gets too high int newNode = nodeVector[presentNode]->getChild(compareSequence[position]); // see if the current node has a // child that matches the next character in the query if(newNode == -1){ if(presentNode == 0){ position++; } // if not, go back to the root and increase the count numSuffixes++; // by one. 
presentNode = 0; } else{ // if there is, move to that node and see how far down presentNode = newNode; // it we can get for(int i=nodeVector[newNode]->getStartCharPos(); i<=nodeVector[newNode]->getEndCharPos(); i++){ if(compareSequence[position] == sequence[i]){ position++; // as long as the query and branch agree, keep going } else{ numSuffixes++; // if there is a mismatch, increase the number of presentNode = 0; // suffixes and go back to the root break; } } } // if we get all the way through the node we'll go to the top of the while loop and find the child node // that corresponds to what we are interested in } numSuffixes--; // the method puts an extra count on numSuffixes if(numSuffixes < minValue) { minValue = numSuffixes; } // if the count is less than the previous minValue, return numSuffixes; // change the value and return the number of suffixes } //******************************************************************************************************************** int SuffixTree::countSuffixes(string compareSequence){ // here we count the number of suffix parts // we need to rewrite a user supplied sequence. if the int numSuffixes = 0; // count exceeds the supplied minValue, bail out. The int seqLength = compareSequence.length(); // time complexity should be O(L) int position = 0; int presentNode = 0; while(position < seqLength){ // while the position in the query sequence isn't at the end... int newNode = nodeVector[presentNode]->getChild(compareSequence[position]); // see if the current node has a // child that matches the next character in the query if(newNode == -1){ if(presentNode == 0){ position++; } // if not, go back to the root and increase the count numSuffixes++; // by one. 
presentNode = 0; } else{ // if there is, move to that node and see how far down presentNode = newNode; // it we can get for(int i=nodeVector[newNode]->getStartCharPos(); i<=nodeVector[newNode]->getEndCharPos(); i++){ if(compareSequence[position] == sequence[i]){ position++; // as long as the query and branch agree, keep going } else{ numSuffixes++; // if there is a mismatch, increase the number of presentNode = 0; // suffixes and go back to the root break; } } } // if we get all the way through the node we'll go to the top of the while loop and find the child node // that corresponds to what we are interested in } numSuffixes--; // the method puts an extra count on numSuffixes return numSuffixes; // change the value and return the number of suffixes } //******************************************************************************************************************** void SuffixTree::canonize(){ // if you have to ask how this works, you don't really want to know and this really // isn't the place to ask. if ( isExplicit() == 0 ) { // if the node has no children... 
int tempNodeIndex = nodeVector[activeNode]->getChild(sequence[activeStartPosition]); SuffixNode* tempNode = nodeVector[tempNodeIndex]; int span = tempNode->getEndCharPos() - tempNode->getStartCharPos(); while ( span <= ( activeEndPosition - activeStartPosition ) ) { activeStartPosition = activeStartPosition + span + 1; activeNode = tempNodeIndex; if ( activeStartPosition <= activeEndPosition ) { tempNodeIndex = nodeVector[tempNodeIndex]->getChild(sequence[activeStartPosition]); tempNode = nodeVector[tempNodeIndex]; span = tempNode->getEndCharPos() - tempNode->getStartCharPos(); } } } } //******************************************************************************************************************** int SuffixTree::split(int nodeIndex, int position){ // leaves stay leaves, etc.; to split a leaf we make a new interior // node and reconnect everything SuffixNode* node = nodeVector[nodeIndex]; // get the node that needs to be split SuffixNode* parentNode = nodeVector[node->getParentNode()]; // get its parent node parentNode->eraseChild(sequence[node->getStartCharPos()]); // erase the present node from the registry of its parent nodeCounter++; SuffixNode* newNode = new SuffixBranch(node->getParentNode(), node->getStartCharPos(), node->getStartCharPos() + activeEndPosition - activeStartPosition); // create a new node that will link the parent with the old child parentNode->setChildren(sequence[newNode->getStartCharPos()], nodeCounter);// give the parent the new child nodeVector.push_back(newNode); node->setParentNode(nodeCounter); // give the original node the new node as its parent newNode->setChildren(sequence[node->getStartCharPos() + activeEndPosition - activeStartPosition + 1], nodeIndex); // put the original node in the registry of the new node's children newNode->setSuffixNode(activeNode);//link the new node with the old active node // recalculate the startCharPosition of the outermost node node->setStartCharPos(node->getStartCharPos() + activeEndPosition -
activeStartPosition + 1 ); return node->getParentNode(); } //******************************************************************************************************************** void SuffixTree::makeSuffixLink(int& previous, int present){ // here we link the nodes that are suffixes of one another to rapidly speed through the tree if ( previous > 0 ) { nodeVector[previous]->setSuffixNode(present); } else { /* do nothing */ } previous = present; } //******************************************************************************************************************** void SuffixTree::addPrefix(int prefixPosition){ int lastParentNode = -1; // we need to place a new prefix in the suffix tree int parentNode = 0; while(1){ parentNode = activeNode; if(isExplicit() == 1){ // if the node is explicit (has kids), try to follow it down the branch if its there... if(nodeVector[activeNode]->getChild(sequence[prefixPosition]) != -1){ // break out and get next prefix... break; } else{ // ...otherwise continue, we'll need to make a new node later on... } } else{ // if it's not explicit (no kids), read through and see if all of the chars agree... int tempNode = nodeVector[activeNode]->getChild(sequence[activeStartPosition]); int span = activeEndPosition - activeStartPosition; if(sequence[nodeVector[tempNode]->getStartCharPos() + span + 1] == sequence[prefixPosition] ){ break; // if the existing suffix agrees with the new one, grab a new prefix... } else{ parentNode = split(tempNode, prefixPosition); // ... 
				// otherwise we need to split the node
			}
		}
		nodeCounter++; // we need to generate a new node here if the kid didn't exist, or we split a node
		SuffixNode* newSuffixLeaf = new SuffixLeaf(parentNode, prefixPosition, sequence.length()-1);
		nodeVector[parentNode]->setChildren(sequence[prefixPosition], nodeCounter);
		nodeVector.push_back(newSuffixLeaf);
		makeSuffixLink( lastParentNode, parentNode ); // make a suffix link for the parent node
		if(nodeVector[activeNode]->getParentNode() == -1){ // move along the start position for the tree
			activeStartPosition++;
		}
		else {
			activeNode = nodeVector[activeNode]->getSuffixNode();
		}
		canonize(); // advance activeNode and activeStartPosition past every edge fully covered by the active span
	}
	makeSuffixLink( lastParentNode, parentNode );
	activeEndPosition++; // move along the end position for the tree
	canonize(); // advance activeNode and activeStartPosition past every edge fully covered by the active span
}
//********************************************************************************************************************
mothur-1.36.1/source/datastructures/suffixtree.hpp000066400000000000000000000041151255543666200224110ustar00rootroot00000000000000
#ifndef SUFFIXTREE_H
#define SUFFIXTREE_H
/*
 *  suffixtree.h
 *
 *
 *  Created by Pat Schloss on 12/15/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 *  This is my half-assed attempt to implement a suffix tree. This is a cobbled together algorithm using materials that
 *  I found at http://marknelson.us/1996/08/01/suffix-trees/ and:
 *
 *  Ukkonen E. (1995). On-line construction of suffix trees. Algorithmica 14 (3): 249--260
 *  Gusfield, Dan (1999). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology.
 *  USA: Cambridge University Press
 *
 *  The Ukkonen paper is the seminal paper describing the on-line method of constructing a suffix tree.
 *
 *  I have chosen to store the nodes of the tree as a vector of pointers to SuffixNode objects. The root is stored at
 *  nodeVector[0].
 *  Each tree also stores the sequence name and the string that corresponds to the actual sequence.
 *  Finally, this class provides a way of counting the number of suffixes that are needed in one tree to generate a new
 *  sequence (countSuffixes). This method is used to determine similarity between sequences and was inspired by the
 *  article and Perl source code provided at http://www.ddj.com/web-development/184416093.
 *
 */

#include "mothur.h"

class SuffixNode;

//********************************************************************************************************************

class SuffixTree {

public:
	SuffixTree();
	~SuffixTree();
	void loadSequence(Sequence);
	string getSeqName();
	void print();
	int countSuffixes(string, int&);
	int countSuffixes(string);

private:
	void addPrefix(int);
	void canonize();
	int split(int, int);
	void makeSuffixLink(int&, int);
	bool isExplicit(){ return activeStartPosition > activeEndPosition; }

	int activeStartPosition;
	int activeEndPosition;
	vector<SuffixNode*> nodeVector;
	int root;
	int activeNode;
	int nodeCounter;
	string seqName;
	string sequence;
	MothurOut* m;
};

//********************************************************************************************************************

#endif
mothur-1.36.1/source/datastructures/tree.cpp000066400000000000000000001270221255543666200211620ustar00rootroot00000000000000
/*
 *  tree.cpp
 *  Mothur
 *
 *  Created by Sarah Westcott on 1/22/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "tree.h"

/*****************************************************************/
Tree::Tree(int num, CountTable* t) : ct(t) {
	try {
		m = MothurOut::getInstance();

		numLeaves = num;
		numNodes = 2*numLeaves - 1;

		tree.resize(numNodes);
	}
	catch(exception& e) {
		m->errorOut(e, "Tree", "Tree - numNodes");
		exit(1);
	}
}
/*****************************************************************/
Tree::Tree(string g) { //do not use the tree generated by this constructor; it exists only to extract the tree names. It's a chicken-and-egg problem that needs to be revisited.
	try {
		m = MothurOut::getInstance();
		parseTreeFile(); m->runParse = false;
	}
	catch(exception& e) {
		m->errorOut(e, "Tree", "Tree - just parse");
		exit(1);
	}
}
/*****************************************************************/
Tree::Tree(CountTable* t) : ct(t) {
	try {
		m = MothurOut::getInstance();

		if (m->runParse == true) { parseTreeFile(); m->runParse = false; }

		numLeaves = m->Treenames.size();
		numNodes = 2*numLeaves - 1;

		tree.resize(numNodes);

		//initialize groupNodeInfo
		vector<string> namesOfGroups = ct->getNamesOfGroups();
		for (int i = 0; i < namesOfGroups.size(); i++) { groupNodeInfo[namesOfGroups[i]].resize(0); }

		//initialize tree with correct number of nodes, name and group info.
for (int i = 0; i < numNodes; i++) { //initialize leaf nodes if (i <= (numLeaves-1)) { tree[i].setName(m->Treenames[i]); //save group info int maxPars = 1; vector group; vector counts = ct->getGroupCounts(m->Treenames[i]); for (int j = 0; j < namesOfGroups.size(); j++) { if (counts[j] != 0) { //you have seqs from this group groupNodeInfo[namesOfGroups[j]].push_back(i); group.push_back(namesOfGroups[j]); tree[i].pGroups[namesOfGroups[j]] = counts[j]; tree[i].pcount[namesOfGroups[j]] = counts[j]; //keep highest group if(counts[j] > maxPars){ maxPars = counts[j]; } } } tree[i].setGroup(group); setIndex(m->Treenames[i], i); if (maxPars > 1) { //then we have some more dominant groups //erase all the groups that are less than maxPars because you found a more dominant group. for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();){ if(it->second < maxPars){ tree[i].pGroups.erase(it++); }else { it++; } } //set one remaining groups to 1 for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();it++){ tree[i].pGroups[it->first] = 1; } }//end if //intialize non leaf nodes }else if (i > (numLeaves-1)) { tree[i].setName(""); vector tempGroups; tree[i].setGroup(tempGroups); } } } catch(exception& e) { m->errorOut(e, "Tree", "Tree"); exit(1); } } /*****************************************************************/ Tree::Tree(CountTable* t, vector< vector >& sims) : ct(t) { try { m = MothurOut::getInstance(); if (m->runParse == true) { parseTreeFile(); m->runParse = false; } numLeaves = m->Treenames.size(); numNodes = 2*numLeaves - 1; tree.resize(numNodes); //initialize groupNodeInfo vector namesOfGroups = ct->getNamesOfGroups(); for (int i = 0; i < namesOfGroups.size(); i++) { groupNodeInfo[namesOfGroups[i]].resize(0); } //initialize tree with correct number of nodes, name and group info. 
for (int i = 0; i < numNodes; i++) { //initialize leaf nodes if (i <= (numLeaves-1)) { tree[i].setName(m->Treenames[i]); //save group info int maxPars = 1; vector group; vector counts = ct->getGroupCounts(m->Treenames[i]); for (int j = 0; j < namesOfGroups.size(); j++) { if (counts[j] != 0) { //you have seqs from this group groupNodeInfo[namesOfGroups[j]].push_back(i); group.push_back(namesOfGroups[j]); tree[i].pGroups[namesOfGroups[j]] = counts[j]; tree[i].pcount[namesOfGroups[j]] = counts[j]; //keep highest group if(counts[j] > maxPars){ maxPars = counts[j]; } } } tree[i].setGroup(group); setIndex(m->Treenames[i], i); if (maxPars > 1) { //then we have some more dominant groups //erase all the groups that are less than maxPars because you found a more dominant group. for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();){ if(it->second < maxPars){ tree[i].pGroups.erase(it++); }else { it++; } } //set one remaining groups to 1 for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();it++){ tree[i].pGroups[it->first] = 1; } }//end if //intialize non leaf nodes }else if (i > (numLeaves-1)) { tree[i].setName(""); vector tempGroups; tree[i].setGroup(tempGroups); } } //build tree from matrix //initialize indexes map thisIndexes; //maps row in simMatrix to vector index in the tree for (int g = 0; g < numLeaves; g++) { thisIndexes[g] = g; } //do merges and create tree structure by setting parents and children //there are numGroups - 1 merges to do for (int i = 0; i < (numLeaves - 1); i++) { float largest = -1000.0; if (m->control_pressed) { break; } int row, column; //find largest value in sims matrix by searching lower triangle for (int j = 1; j < sims.size(); j++) { for (int k = 0; k < j; k++) { if (sims[j][k] > largest) { largest = sims[j][k]; row = j; column = k; } } } //set non-leaf node info and update leaves to know their parents //non-leaf tree[numLeaves + i].setChildren(thisIndexes[row], thisIndexes[column]); //parents 
tree[thisIndexes[row]].setParent(numLeaves + i); tree[thisIndexes[column]].setParent(numLeaves + i); //blength = distance / 2; float blength = ((1.0 - largest) / 2); //branchlengths tree[thisIndexes[row]].setBranchLength(blength - tree[thisIndexes[row]].getLengthToLeaves()); tree[thisIndexes[column]].setBranchLength(blength - tree[thisIndexes[column]].getLengthToLeaves()); //set your length to leaves to your childs length plus branchlength tree[numLeaves + i].setLengthToLeaves(tree[thisIndexes[row]].getLengthToLeaves() + tree[thisIndexes[row]].getBranchLength()); //update index thisIndexes[row] = numLeaves+i; thisIndexes[column] = numLeaves+i; //remove highest value that caused the merge. sims[row][column] = -1000.0; sims[column][row] = -1000.0; //merge values in simsMatrix for (int n = 0; n < sims.size(); n++) { //row becomes merge of 2 groups sims[row][n] = (sims[row][n] + sims[column][n]) / 2; sims[n][row] = sims[row][n]; //delete column sims[column][n] = -1000.0; sims[n][column] = -1000.0; } } //adjust tree to make sure root to tip length is .5 int root = findRoot(); tree[root].setBranchLength((0.5 - tree[root].getLengthToLeaves())); } catch(exception& e) { m->errorOut(e, "Tree", "Tree"); exit(1); } } /*****************************************************************/ Tree::~Tree() { } /***************************************************************** void Tree::addNamesToCounts(map nameMap) { try { //ex. 
seq1 seq2,seq3,se4 // seq1 = pasture // seq2 = forest // seq4 = pasture // seq3 = ocean //before this function seq1.pcount = pasture -> 1 //after seq1.pcount = pasture -> 2, forest -> 1, ocean -> 1 //before this function seq1.pgroups = pasture -> 1 //after seq1.pgroups = pasture -> 1 since that is the dominant group //go through each leaf and update its pcounts and pgroups //float A = clock(); for (int i = 0; i < numLeaves; i++) { string name = tree[i].getName(); map::iterator itNames = nameMap.find(name); if (itNames == nameMap.end()) { m->mothurOut(name + " is not in your name file, please correct."); m->mothurOutEndLine(); exit(1); } else { vector dupNames; m->splitAtComma(nameMap[name], dupNames); map::iterator itCounts; int maxPars = 1; set groupsAddedForThisNode; for (int j = 0; j < dupNames.size(); j++) { string group = tmap->getGroup(dupNames[j]); if (dupNames[j] != name) {//you already added yourself in the constructor if (groupsAddedForThisNode.count(group) == 0) { groupNodeInfo[group].push_back(i); groupsAddedForThisNode.insert(group); } //if you have not already added this node for this group, then add it //update pcounts itCounts = tree[i].pcount.find(group); if (itCounts == tree[i].pcount.end()) { //new group, add it tree[i].pcount[group] = 1; }else { tree[i].pcount[group]++; } //update pgroups itCounts = tree[i].pGroups.find(group); if (itCounts == tree[i].pGroups.end()) { //new group, add it tree[i].pGroups[group] = 1; }else{ tree[i].pGroups[group]++; } //keep highest group if(tree[i].pGroups[group] > maxPars){ maxPars = tree[i].pGroups[group]; } }else { groupsAddedForThisNode.insert(group); } //add it so you don't add it to groupNodeInfo again }//end for if (maxPars > 1) { //then we have some more dominant groups //erase all the groups that are less than maxPars because you found a more dominant group. 
for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();){ if(it->second < maxPars){ tree[i].pGroups.erase(it++); }else { it++; } } //set one remaining groups to 1 for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();it++){ tree[i].pGroups[it->first] = 1; } }//end if //update groups to reflect all the groups this node represents vector nodeGroups; map::iterator itGroups; for (itGroups = tree[i].pcount.begin(); itGroups != tree[i].pcount.end(); itGroups++) { nodeGroups.push_back(itGroups->first); } tree[i].setGroup(nodeGroups); }//end else }//end for //float B = clock(); //cout << "addNamesToCounts\t" << (B - A) / CLOCKS_PER_SEC << endl; } catch(exception& e) { m->errorOut(e, "Tree", "addNamesToCounts"); exit(1); } }*/ /*****************************************************************/ int Tree::getIndex(string searchName) { try { map::iterator itIndex = indexes.find(searchName); if (itIndex != indexes.end()) { return itIndex->second; } return -1; } catch(exception& e) { m->errorOut(e, "Tree", "getIndex"); exit(1); } } /*****************************************************************/ void Tree::setIndex(string searchName, int index) { try { map::iterator itIndex = indexes.find(searchName); if (itIndex == indexes.end()) { indexes[searchName] = index; } } catch(exception& e) { m->errorOut(e, "Tree", "setIndex"); exit(1); } } /*****************************************************************/ int Tree::assembleTree() { try { //build the pGroups in non leaf nodes to be used in the parsimony calcs. 
for (int i = numLeaves; i < numNodes; i++) { if (m->control_pressed) { return 1; } tree[i].pGroups = (mergeGroups(i)); tree[i].pcount = (mergeGcounts(i)); } return 0; } catch(exception& e) { m->errorOut(e, "Tree", "assembleTree"); exit(1); } } /*****************************************************************/ //assumes leaf node names are in groups and no names file - used by indicator command void Tree::getSubTree(Tree* Ctree, vector Groups) { try { //copy Tree since we are going to destroy it Tree* copy = new Tree(ct); copy->getCopy(Ctree); copy->assembleTree(); //we want to select some of the leaf nodes to create the output tree //go through the input Tree starting at parents of leaves //initialize groupNodeInfo vector namesOfGroups = ct->getNamesOfGroups(); for (int i = 0; i < namesOfGroups.size(); i++) { groupNodeInfo[namesOfGroups[i]].resize(0); } //initialize tree with correct number of nodes, name and group info. for (int i = 0; i < numNodes; i++) { //initialize leaf nodes if (i <= (numLeaves-1)) { tree[i].setName(Groups[i]); //save group info int maxPars = 1; vector group; vector counts = ct->getGroupCounts(Groups[i]); for (int j = 0; j < namesOfGroups.size(); j++) { if (counts[j] != 0) { //you have seqs from this group groupNodeInfo[namesOfGroups[j]].push_back(i); group.push_back(namesOfGroups[j]); tree[i].pGroups[namesOfGroups[j]] = counts[j]; tree[i].pcount[namesOfGroups[j]] = counts[j]; //keep highest group if(counts[j] > maxPars){ maxPars = counts[j]; } } } tree[i].setGroup(group); setIndex(Groups[i], i); if (maxPars > 1) { //then we have some more dominant groups //erase all the groups that are less than maxPars because you found a more dominant group. 
for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();){ if(it->second < maxPars){ tree[i].pGroups.erase(it++); }else { it++; } } //set one remaining groups to 1 for(it=tree[i].pGroups.begin();it!=tree[i].pGroups.end();it++){ tree[i].pGroups[it->first] = 1; } }//end if //intialize non leaf nodes }else if (i > (numLeaves-1)) { tree[i].setName(""); vector tempGroups; tree[i].setGroup(tempGroups); } } set removedLeaves; for (int i = 0; i < copy->getNumLeaves(); i++) { if (removedLeaves.count(i) == 0) { //am I in the group int parent = copy->tree[i].getParent(); if (parent != -1) { if (m->inUsersGroups(copy->tree[i].getName(), Groups)) { //find my siblings name int parentRC = copy->tree[parent].getRChild(); int parentLC = copy->tree[parent].getLChild(); //if I am the right child, then my sib is the left child int sibIndex = parentRC; if (parentRC == i) { sibIndex = parentLC; } string sibsName = copy->tree[sibIndex].getName(); //if yes, is my sibling if ((m->inUsersGroups(sibsName, Groups)) || (sibsName == "")) { //we both are okay no trimming required }else{ //i am, my sib is not, so remove sib by setting my parent to my grandparent int grandparent = copy->tree[parent].getParent(); int grandparentLC = copy->tree[grandparent].getLChild(); int grandparentRC = copy->tree[grandparent].getRChild(); //whichever of my granparents children was my parent now equals me if (grandparentLC == parent) { grandparentLC = i; } else { grandparentRC = i; } copy->tree[i].setParent(grandparent); copy->tree[i].setBranchLength((copy->tree[i].getBranchLength()+copy->tree[parent].getBranchLength())); if (grandparent != -1) { copy->tree[grandparent].setChildren(grandparentLC, grandparentRC); } removedLeaves.insert(sibIndex); } }else{ //find my siblings name int parentRC = copy->tree[parent].getRChild(); int parentLC = copy->tree[parent].getLChild(); //if I am the right child, then my sib is the left child int sibIndex = parentRC; if (parentRC == i) { sibIndex = parentLC; } string sibsName = 
copy->tree[sibIndex].getName(); //if no is my sibling if ((m->inUsersGroups(sibsName, Groups)) || (sibsName == "")) { //i am not, but my sib is int grandparent = copy->tree[parent].getParent(); int grandparentLC = copy->tree[grandparent].getLChild(); int grandparentRC = copy->tree[grandparent].getRChild(); //whichever of my granparents children was my parent now equals my sib if (grandparentLC == parent) { grandparentLC = sibIndex; } else { grandparentRC = sibIndex; } copy->tree[sibIndex].setParent(grandparent); copy->tree[sibIndex].setBranchLength((copy->tree[sibIndex].getBranchLength()+copy->tree[parent].getBranchLength())); if (grandparent != -1) { copy->tree[grandparent].setChildren(grandparentLC, grandparentRC); } removedLeaves.insert(i); }else{ //neither of us are, so we want to eliminate ourselves and our parent //so set our parents sib to our great-grandparent int parent = copy->tree[i].getParent(); int grandparent = copy->tree[parent].getParent(); int parentsSibIndex; if (grandparent != -1) { int greatgrandparent = copy->tree[grandparent].getParent(); int greatgrandparentLC, greatgrandparentRC; if (greatgrandparent != -1) { greatgrandparentLC = copy->tree[greatgrandparent].getLChild(); greatgrandparentRC = copy->tree[greatgrandparent].getRChild(); } int grandparentLC = copy->tree[grandparent].getLChild(); int grandparentRC = copy->tree[grandparent].getRChild(); parentsSibIndex = grandparentLC; if (grandparentLC == parent) { parentsSibIndex = grandparentRC; } //whichever of my greatgrandparents children was my grandparent if (greatgrandparentLC == grandparent) { greatgrandparentLC = parentsSibIndex; } else { greatgrandparentRC = parentsSibIndex; } copy->tree[parentsSibIndex].setParent(greatgrandparent); copy->tree[parentsSibIndex].setBranchLength((copy->tree[parentsSibIndex].getBranchLength()+copy->tree[grandparent].getBranchLength())); if (greatgrandparent != -1) { copy->tree[greatgrandparent].setChildren(greatgrandparentLC, greatgrandparentRC); } }else{ 
copy->tree[parent].setParent(-1); //cout << "issues with making subtree" << endl; } removedLeaves.insert(sibIndex); removedLeaves.insert(i); } } } } } int root = 0; for (int i = 0; i < copy->getNumNodes(); i++) { //you found the root if (copy->tree[i].getParent() == -1) { root = i; break; } } int nextSpot = numLeaves; populateNewTree(copy->tree, root, nextSpot); delete copy; } catch(exception& e) { m->errorOut(e, "Tree", "getSubTree"); exit(1); } } /***************************************************************** //assumes nameMap contains unique names as key or is empty. //assumes numLeaves defined in tree constructor equals size of seqsToInclude and seqsToInclude only contains unique seqs. int Tree::getSubTree(Tree* copy, vector seqsToInclude, map nameMap) { try { if (numLeaves != seqsToInclude.size()) { m->mothurOut("[ERROR]: numLeaves does not equal numUniques, cannot create subtree.\n"); m->control_pressed = true; return 0; } getSubTree(copy, seqsToInclude); if (nameMap.size() != 0) { addNamesToCounts(nameMap); } //build the pGroups in non leaf nodes to be used in the parsimony calcs. 
		for (int i = numLeaves; i < numNodes; i++) {
			if (m->control_pressed) { return 1; }
			tree[i].pGroups = (mergeGroups(i));
			tree[i].pcount = (mergeGcounts(i));
		}

		return 0;
	}
	catch(exception& e) {
		m->errorOut(e, "Tree", "getSubTree");
		exit(1);
	}
}
/*****************************************************************/
int Tree::populateNewTree(vector<Node>& oldtree, int node, int& index) {
	try {

		if (oldtree[node].getLChild() != -1) {
			int rc = populateNewTree(oldtree, oldtree[node].getLChild(), index);
			int lc = populateNewTree(oldtree, oldtree[node].getRChild(), index);

			tree[index].setChildren(lc, rc);
			tree[rc].setParent(index);
			tree[lc].setParent(index);

			tree[index].setBranchLength(oldtree[node].getBranchLength());
			tree[rc].setBranchLength(oldtree[oldtree[node].getLChild()].getBranchLength());
			tree[lc].setBranchLength(oldtree[oldtree[node].getRChild()].getBranchLength());

			return (index++);
		}else { //you are a leaf
			int indexInNewTree = getIndex(oldtree[node].getName());
			return indexInNewTree;
		}
	}
	catch(exception& e) {
		m->errorOut(e, "Tree", "populateNewTree");
		exit(1);
	}
}
/*****************************************************************/
void Tree::getCopy(Tree* copy, bool subsample) {
	try {

		//for each node in the tree copy its info
		for (int i = 0; i < numNodes; i++) {
			//copy branch length
			tree[i].setBranchLength(copy->tree[i].getBranchLength());

			//copy parent
			tree[i].setParent(copy->tree[i].getParent());

			//copy children
			tree[i].setChildren(copy->tree[i].getLChild(), copy->tree[i].getRChild());
		}

		//build the pGroups in non leaf nodes to be used in the parsimony calcs.
for (int i = numLeaves; i < numNodes; i++) { if (m->control_pressed) { break; } tree[i].pGroups = (mergeGroups(i)); tree[i].pcount = (mergeGcounts(i)); } } catch(exception& e) { m->errorOut(e, "Tree", "getCopy"); exit(1); } } /*****************************************************************/ void Tree::getCopy(Tree* copy) { try { //for each node in the tree copy its info for (int i = 0; i < numNodes; i++) { //copy name tree[i].setName(copy->tree[i].getName()); //copy group tree[i].setGroup(copy->tree[i].getGroup()); //copy branch length tree[i].setBranchLength(copy->tree[i].getBranchLength()); //copy parent tree[i].setParent(copy->tree[i].getParent()); //copy children tree[i].setChildren(copy->tree[i].getLChild(), copy->tree[i].getRChild()); //copy index in node and tmap setIndex(copy->tree[i].getName(), getIndex(copy->tree[i].getName())); tree[i].setIndex(copy->tree[i].getIndex()); //copy pGroups tree[i].pGroups = copy->tree[i].pGroups; //copy pcount tree[i].pcount = copy->tree[i].pcount; } groupNodeInfo = copy->groupNodeInfo; } catch(exception& e) { m->errorOut(e, "Tree", "getCopy"); exit(1); } } /*****************************************************************/ //returns a map with a groupname and the number of times that group was seen in the children //for instance if your children are white and black then it would return a map with 2 entries // p[white] = 1 and p[black] = 1. Now go up a level and merge that with a node who has p[white] = 1 //and you get p[white] = 2, p[black] = 1, but you erase the p[black] because you have a p value higher than 1. map Tree::mergeGroups(int i) { try { int lc = tree[i].getLChild(); int rc = tree[i].getRChild(); //set parsimony groups to left child map parsimony = tree[lc].pGroups; int maxPars = 1; //look at right child groups and update maxPars if right child has something higher for that group. 
for(it=tree[rc].pGroups.begin();it!=tree[rc].pGroups.end();it++){ it2 = parsimony.find(it->first); if (it2 != parsimony.end()) { parsimony[it->first]++; }else { parsimony[it->first] = 1; } if(parsimony[it->first] > maxPars){ maxPars = parsimony[it->first]; } } // this is true if right child had a greater parsimony for a certain group if(maxPars > 1){ //erase all the groups that are only 1 because you found something with 2. for(it=parsimony.begin();it!=parsimony.end();){ if(it->second == 1){ parsimony.erase(it++); }else { it++; } } //set one remaining groups to 1 //so with our above example p[white] = 2 would be left and it would become p[white] = 1 for(it=parsimony.begin();it!=parsimony.end();it++){ parsimony[it->first] = 1; } } return parsimony; } catch(exception& e) { m->errorOut(e, "Tree", "mergeGroups"); exit(1); } } /*****************************************************************/ //returns a map with a groupname and the number of times that group was seen in the children //for instance if your children are white and black then it would return a map with 2 entries // p[white] = 1 and p[black] = 1. Now go up a level and merge that with a node who has p[white] = 1 //and you get p[white] = 2, p[black] = 1, but you erase the p[black] because you have a p value higher than 1. 
map Tree::mergeUserGroups(int i, vector g) { try { int lc = tree[i].getLChild(); int rc = tree[i].getRChild(); //loop through nodes groups removing the ones the user doesn't want for(it=tree[lc].pGroups.begin();it!=tree[lc].pGroups.end();){ if (m->inUsersGroups(it->first, g) != true) { tree[lc].pGroups.erase(it++); }else { it++; } } //loop through nodes groups removing the ones the user doesn't want for(it=tree[rc].pGroups.begin();it!=tree[rc].pGroups.end();){ if (m->inUsersGroups(it->first, g) != true) { tree[rc].pGroups.erase(it++); }else { it++; } } //set parsimony groups to left child map parsimony = tree[lc].pGroups; int maxPars = 1; //look at right child groups and update maxPars if right child has something higher for that group. for(it=tree[rc].pGroups.begin();it!=tree[rc].pGroups.end();it++){ it2 = parsimony.find(it->first); if (it2 != parsimony.end()) { parsimony[it->first]++; }else { parsimony[it->first] = 1; } if(parsimony[it->first] > maxPars){ maxPars = parsimony[it->first]; } } // this is true if right child had a greater parsimony for a certain group if(maxPars > 1){ //erase all the groups that are only 1 because you found something with 2. 
			for(it=parsimony.begin();it!=parsimony.end();){
				if(it->second == 1){
					parsimony.erase(it++);
				}else { it++; }
			}

			for(it=parsimony.begin();it!=parsimony.end();it++){
				parsimony[it->first] = 1;
			}
		}

		return parsimony;
	}
	catch(exception& e) {
		m->errorOut(e, "Tree", "mergeUserGroups");
		exit(1);
	}
}
/**************************************************************************************************/
map<string,int> Tree::mergeGcounts(int position) {
	try{
		map<string,int>::iterator pos;

		int lc = tree[position].getLChild();
		int rc = tree[position].getRChild();

		map<string,int> sum = tree[lc].pcount;

		for(it=tree[rc].pcount.begin();it!=tree[rc].pcount.end();it++){
			sum[it->first] += it->second;
		}
		return sum;
	}
	catch(exception& e) {
		m->errorOut(e, "Tree", "mergeGcounts");
		exit(1);
	}
}
/**************************************************************************************************/
void Tree::randomLabels(vector<string> g) {
	try {

		//initialize groupNodeInfo
		for (int i = 0; i < (ct->getNamesOfGroups()).size(); i++) {
			groupNodeInfo[(ct->getNamesOfGroups())[i]].resize(0);
		}

		for(int i = 0; i < numLeaves; i++){
			int z;
			//get random index to switch with
			z = int((float)(i+1) * (float)(rand()) / ((float)RAND_MAX+1.0));

			//you only want to randomize the nodes that are from a group the user wants analyzed, so
			//if either of the leaf nodes you are about to switch are not in the users groups then you don't want to switch them.
			bool treez, treei;

			treez = m->inUsersGroups(tree[z].getGroup(), g);
			treei = m->inUsersGroups(tree[i].getGroup(), g);

			if ((treez == true) && (treei == true)) {
				//switches node i and node z's info.
map lib_hold = tree[z].pGroups; tree[z].pGroups = (tree[i].pGroups); tree[i].pGroups = (lib_hold); vector zgroup = tree[z].getGroup(); tree[z].setGroup(tree[i].getGroup()); tree[i].setGroup(zgroup); string zname = tree[z].getName(); tree[z].setName(tree[i].getName()); tree[i].setName(zname); map gcount_hold = tree[z].pcount; tree[z].pcount = (tree[i].pcount); tree[i].pcount = (gcount_hold); } for (int k = 0; k < (tree[i].getGroup()).size(); k++) { groupNodeInfo[(tree[i].getGroup())[k]].push_back(i); } for (int k = 0; k < (tree[z].getGroup()).size(); k++) { groupNodeInfo[(tree[z].getGroup())[k]].push_back(z); } } } catch(exception& e) { m->errorOut(e, "Tree", "randomLabels"); exit(1); } } /**************************************************************************************************/ void Tree::randomBlengths() { try { for(int i=numNodes-1;i>=0;i--){ int z = int((float)(i+1) * (float)(rand()) / ((float)RAND_MAX+1.0)); float bl_hold = tree[z].getBranchLength(); tree[z].setBranchLength(tree[i].getBranchLength()); tree[i].setBranchLength(bl_hold); } } catch(exception& e) { m->errorOut(e, "Tree", "randomBlengths"); exit(1); } } /*************************************************************************************************/ void Tree::assembleRandomUnifracTree(vector g) { randomLabels(g); assembleTree(); } /*************************************************************************************************/ void Tree::assembleRandomUnifracTree(string groupA, string groupB) { vector temp; temp.push_back(groupA); temp.push_back(groupB); randomLabels(temp); assembleTree(); } /*************************************************************************************************/ //for now it's just random topology but may become random labels as well later that why this is such a simple function now... 
void Tree::assembleRandomTree() { randomTopology(); assembleTree(); } /**************************************************************************************************/ void Tree::randomTopology() { try { for(int i=0;ierrorOut(e, "Tree", "randomTopology"); exit(1); } } /*****************************************************************/ void Tree::print(ostream& out) { try { int root = findRoot(); printBranch(root, out, "branch"); out << ";" << endl; } catch(exception& e) { m->errorOut(e, "Tree", "print"); exit(1); } } /*****************************************************************/ void Tree::print(ostream& out, map nameMap) { try { int root = findRoot(); printBranch(root, out, nameMap); out << ";" << endl; } catch(exception& e) { m->errorOut(e, "Tree", "print"); exit(1); } } /*****************************************************************/ void Tree::print(ostream& out, string mode) { try { int root = findRoot(); printBranch(root, out, mode); out << ";" << endl; } catch(exception& e) { m->errorOut(e, "Tree", "print"); exit(1); } } /*****************************************************************/ // This prints out the tree in Newick form. void Tree::createNewickFile(string f) { try { int root = findRoot(); filename = f; m->openOutputFile(filename, out); printBranch(root, out, "branch"); // you are at the end of the tree out << ";" << endl; out.close(); } catch(exception& e) { m->errorOut(e, "Tree", "createNewickFile"); exit(1); } } /*****************************************************************/ //This function finds the index of the root node. 
int Tree::findRoot() { try { for (int i = 0; i < numNodes; i++) { //you found the root if (tree[i].getParent() == -1) { return i; } //cout << "i = " << i << endl; //cout << "i's parent = " << tree[i].getParent() << endl; } return -1; } catch(exception& e) { m->errorOut(e, "Tree", "findRoot"); exit(1); } } /*****************************************************************/ void Tree::printBranch(int node, ostream& out, map names) { try { // you are not a leaf if (tree[node].getLChild() != -1) { out << "("; printBranch(tree[node].getLChild(), out, names); out << ","; printBranch(tree[node].getRChild(), out, names); out << ")"; //if there is a branch length then print it if (tree[node].getBranchLength() != -1) { out << ":" << tree[node].getBranchLength(); } }else { //you are a leaf map::iterator itNames = names.find(tree[node].getName()); string outputString = ""; if (itNames != names.end()) { vector dupNames; m->splitAtComma((itNames->second), dupNames); if (dupNames.size() == 1) { outputString += tree[node].getName(); if (tree[node].getBranchLength() != -1) { outputString += ":" + toString(tree[node].getBranchLength()); } }else { outputString += "("; for (int u = 0; u < dupNames.size()-1; u++) { outputString += dupNames[u]; if (tree[node].getBranchLength() != -1) { outputString += ":" + toString(0.0); } outputString += ","; } outputString += dupNames[dupNames.size()-1]; if (tree[node].getBranchLength() != -1) { outputString += ":" + toString(0.0); } outputString += ")"; if (tree[node].getBranchLength() != -1) { outputString += ":" + toString(tree[node].getBranchLength()); } } }else { outputString = tree[node].getName(); //if there is a branch length then print it if (tree[node].getBranchLength() != -1) { outputString += ":" + toString(tree[node].getBranchLength()); } m->mothurOut("[ERROR]: " + tree[node].getName() + " is not in your namefile, please correct."); m->mothurOutEndLine(); } out << outputString; } } catch(exception& e) { m->errorOut(e, "Tree", 
"printBranch"); exit(1); } } /*****************************************************************/ void Tree::printBranch(int node, ostream& out, string mode) { try { // you are not a leaf if (tree[node].getLChild() != -1) { out << "("; printBranch(tree[node].getLChild(), out, mode); out << ","; printBranch(tree[node].getRChild(), out, mode); out << ")"; if (mode == "branch") { //if there is a branch length then print it if (tree[node].getBranchLength() != -1) { out << ":" << tree[node].getBranchLength(); } }else if (mode == "boot") { //if there is a label then print it if (tree[node].getLabel() != "") { out << tree[node].getLabel(); } }else if (mode == "both") { if (tree[node].getLabel() != "") { out << tree[node].getLabel(); } //if there is a branch length then print it if (tree[node].getBranchLength() != -1) { out << ":" << tree[node].getBranchLength(); } } }else { //you are a leaf vector leafGroup = ct->getGroups(tree[node].getName()); if (mode == "branch") { out << leafGroup[0]; //if there is a branch length then print it if (tree[node].getBranchLength() != -1) { out << ":" << tree[node].getBranchLength(); } }else if (mode == "boot") { out << leafGroup[0]; //if there is a label then print it if (tree[node].getLabel() != "") { out << tree[node].getLabel(); } }else if (mode == "both") { out << tree[node].getName(); if (tree[node].getLabel() != "") { out << tree[node].getLabel(); } //if there is a branch length then print it if (tree[node].getBranchLength() != -1) { out << ":" << tree[node].getBranchLength(); } } } } catch(exception& e) { m->errorOut(e, "Tree", "printBranch"); exit(1); } } /*****************************************************************/ void Tree::printBranch(int node, ostream& out, string mode, vector& theseNodes) { try { // you are not a leaf if (theseNodes[node].getLChild() != -1) { out << "("; printBranch(theseNodes[node].getLChild(), out, mode); out << ","; printBranch(theseNodes[node].getRChild(), out, mode); out << ")"; if (mode == 
"branch") { //if there is a branch length then print it if (theseNodes[node].getBranchLength() != -1) { out << ":" << theseNodes[node].getBranchLength(); } }else if (mode == "boot") { //if there is a label then print it if (theseNodes[node].getLabel() != "") { out << theseNodes[node].getLabel(); } }else if (mode == "both") { if (theseNodes[node].getLabel() != "") { out << theseNodes[node].getLabel(); } //if there is a branch length then print it if (theseNodes[node].getBranchLength() != -1) { out << ":" << theseNodes[node].getBranchLength(); } } }else { //you are a leaf vector leafGroup = ct->getGroups(theseNodes[node].getName()); if (mode == "branch") { out << leafGroup[0]; //if there is a branch length then print it if (theseNodes[node].getBranchLength() != -1) { out << ":" << theseNodes[node].getBranchLength(); } }else if (mode == "boot") { out << leafGroup[0]; //if there is a label then print it if (theseNodes[node].getLabel() != "") { out << theseNodes[node].getLabel(); } }else if (mode == "both") { out << theseNodes[node].getName(); if (theseNodes[node].getLabel() != "") { out << theseNodes[node].getLabel(); } //if there is a branch length then print it if (theseNodes[node].getBranchLength() != -1) { out << ":" << theseNodes[node].getBranchLength(); } } } } catch(exception& e) { m->errorOut(e, "Tree", "printBranch"); exit(1); } } /*****************************************************************/ void Tree::printTree() { for(int i=0;igetTreeFile(); ifstream filehandle; m->openInputFile(filename, filehandle); int c, comment; comment = 0; int done = 1; //ifyou are not a nexus file if((c = filehandle.peek()) != '#') { while((c = filehandle.peek()) != ';') { if (m->control_pressed) { filehandle.close(); return 0; } while ((c = filehandle.peek()) != ';') { if (m->control_pressed) { filehandle.close(); return 0; } // get past comments if(c == '[') { comment = 1; } if(c == ']'){ comment = 0; } if((c == '(') && (comment != 1)){ break; } filehandle.get(); } done = 
readTreeString(filehandle); if (done == 0) { break; } } //ifyou are a nexus file }else if((c = filehandle.peek()) == '#') { string holder = ""; // get past comments while(holder != "translate" && holder != "Translate"){ if (m->control_pressed) { filehandle.close(); return 0; } if(holder == "[" || holder == "[!"){ comment = 1; } if(holder == "]"){ comment = 0; } filehandle >> holder; //if there is no translate then you must read tree string otherwise use translate to get names if((holder == "tree") && (comment != 1)){ //pass over the "tree rep.6878900 = " while (((c = filehandle.get()) != '(') && ((c = filehandle.peek()) != EOF)) {;} if(c == EOF) { break; } filehandle.putback(c); //put back first ( of tree. done = readTreeString(filehandle); break; } if (done == 0) { break; } } //use nexus translation rather than parsing tree to save time if((holder == "translate") || (holder == "Translate")) { string number, name, h; h = ""; // so it enters the loop the first time while((h != ";") && (number != ";")) { if (m->control_pressed) { filehandle.close(); return 0; } filehandle >> number; filehandle >> name; //c = , until done with translation then c = ; h = name.substr(name.length()-1, name.length()); name.erase(name.end()-1); //erase the comma m->Treenames.push_back(number); } if(number == ";") { m->Treenames.pop_back(); } //in case ';' from translation is on next line instead of next to last name } } filehandle.close(); return 0; //for (int i = 0; i < globaldata->Treenames.size(); i++) { //cout << globaldata->Treenames[i] << endl; } //cout << globaldata->Treenames.size() << endl; } catch(exception& e) { m->errorOut(e, "Tree", "parseTreeFile"); exit(1); } } /*******************************************************/ /*******************************************************/ int Tree::readTreeString(ifstream& filehandle) { try { int c; string name; //, k while((c = filehandle.peek()) != ';') { if (m->control_pressed) { return 0; } //k = c; //cout << " at beginning of while " 
<< k << endl; if(c == ')') { //to pass over labels in trees c=filehandle.get(); while((c!=',') && (c != -1) && (c!= ':') && (c!=';')){ c=filehandle.get(); } filehandle.putback(c); } if(c == ';') { return 0; } if(c == -1) { return 0; } //if you are a name if((c != '(') && (c != ')') && (c != ',') && (c != ':') && (c != '\n') && (c != '\t') && (c != 32)) { //32 is space name = ""; c = filehandle.get(); //k = c; //cout << k << endl; while ((c != '(') && (c != ')') && (c != ',') && (c != ':') && (c != '\n') && (c != 32) && (c != '\t')) { name += c; c = filehandle.get(); //k = c; //cout << " in name while " << k << endl; } //cout << "name = " << name << endl; if (name != "\r" ) { m->Treenames.push_back(name); } //cout << m->Treenames.size() << '\t' << name << endl; filehandle.putback(c); //k = c; //cout << " after putback" << k << endl; } if(c == ':') { //read until you reach the end of the branch length while ((c != '(') && (c != ')') && (c != ',') && (c != ';') && (c != '\n') && (c != '\t') && (c != 32)) { c = filehandle.get(); //k = c; //cout << " in branch while " << k << endl; } filehandle.putback(c); } c = filehandle.get(); //k = c; //cout << " here after get " << k << endl; if(c == ';') { return 0; } if(c == ')') { filehandle.putback(c); } //k = c; //cout << k << endl; } return 0; } catch(exception& e) { m->errorOut(e, "Tree", "readTreeString"); exit(1); } } /*******************************************************/ /*******************************************************/ mothur-1.36.1/source/datastructures/tree.h000066400000000000000000000070521255543666200206270ustar00rootroot00000000000000#ifndef TREE_H #define TREE_H /* * tree.h * Mothur * * Created by Sarah Westcott on 1/22/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "treenode.h" #include "counttable.h" /* This class represents the treefile. 
*/ class Tree { public: Tree(string); //do not use tree generated by this constructor its just to extract the treenames, its a chicken before the egg thing that needs to be revisited. Tree(int, CountTable*); Tree(CountTable*); //to generate a tree from a file Tree(CountTable*, vector< vector >&); //create tree from sim matrix ~Tree(); CountTable* getCountTable() { return ct; } void getCopy(Tree*); //makes tree a copy of the one passed in. void getCopy(Tree* copy, bool); //makes a copy of the tree structure passed in, (just parents, children and br). Used with the Tree(TreeMap*) constructor. Assumes the tmap already has set seqs groups you want. Used by subsample to reassign seqs you don't want included to group "doNotIncludeMe". void getSubTree(Tree*, vector); //makes tree a that contains only the names passed in. //int getSubTree(Tree* originalToCopy, vector seqToInclude, map nameMap); //used with (int, TreeMap) constructor. SeqsToInclude contains subsample wanted - assumes these are unique seqs and size of vector=numLeaves passed into constructor. nameMap is unique -> redundantList can be empty if no namesfile was provided. void assembleRandomTree(); void assembleRandomUnifracTree(vector); void assembleRandomUnifracTree(string, string); void createNewickFile(string); int getIndex(string); void setIndex(string, int); int getNumNodes() { return numNodes; } int getNumLeaves(){ return numLeaves; } map mergeUserGroups(int, vector); //returns a map with a groupname and the number of times that group was seen in the children void printTree(); void print(ostream&); void print(ostream&, string); void print(ostream&, map); int findRoot(); //return index of root node //this function takes the leaf info and populates the non leaf nodes int assembleTree(); vector tree; //the first n nodes are the leaves, where n is the number of sequences. 
map< string, vector > groupNodeInfo; //maps group to indexes of leaf nodes with that group, different groups may contain same node because of names file. private: CountTable* ct; int numNodes, numLeaves; ofstream out; string filename; //map names; map::iterator it, it2; map mergeGroups(int); //returns a map with a groupname and the number of times that group was seen in the children map mergeGcounts(int); map indexes; //maps seqName -> index in tree vector void addNamesToCounts(map); void randomTopology(); void randomBlengths(); void randomLabels(vector); //void randomLabels(string, string); void printBranch(int, ostream&, map); //recursively print out tree void printBranch(int, ostream&, string); int parseTreeFile(); //parses through tree file to find names of nodes and number of them //this is required in case user has sequences in the names file that are //not included in the tree. //only takes names from the first tree in the tree file and assumes that all trees use the same names. int readTreeString(ifstream&); int populateNewTree(vector&, int, int&); void printBranch(int, ostream&, string, vector&); MothurOut* m; }; #endif mothur-1.36.1/source/datastructures/treemap.cpp000066400000000000000000000307751255543666200216700ustar00rootroot00000000000000/* * treemap.cpp * Mothur * * Created by Sarah Westcott on 1/26/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "treemap.h" /************************************************************/ TreeMap::TreeMap(string filename) { m = MothurOut::getInstance(); ofstream out2; m->openOutputFileAppend(filename, out2); out2 << endl; out2.close(); groupFileName = filename; m->openInputFile(filename, fileHandle); } /************************************************************/ TreeMap::~TreeMap(){} /************************************************************/ int TreeMap::readMap(string gf) { try { ofstream out2; m->openOutputFileAppend(gf, out2); out2 << endl; out2.close(); groupFileName = gf; m->openInputFile(gf, fileHandle); string seqName, seqGroup; int error = 0; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; while (!fileHandle.eof()) { if (m->control_pressed) { fileHandle.close(); return 1; } fileHandle.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, fileHandle.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); map::iterator itCheck = treemap.find(seqName); if (itCheck != treemap.end()) { error = 1; m->mothurOut("[WARNING]: Your groupfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. 
Please correct."); m->mothurOutEndLine(); } else { namesOfSeqs.push_back(seqName); treemap[seqName].groupname = seqGroup; //store data in map it2 = seqsPerGroup.find(seqGroup); if (it2 == seqsPerGroup.end()) { //if it's a new group seqsPerGroup[seqGroup] = 1; }else {//it's a group we already have seqsPerGroup[seqGroup]++; } } pairDone = false; } } } fileHandle.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); map::iterator itCheck = treemap.find(seqName); if (itCheck != treemap.end()) { error = 1; m->mothurOut("[WARNING]: Your groupfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { namesOfSeqs.push_back(seqName); treemap[seqName].groupname = seqGroup; //store data in map it2 = seqsPerGroup.find(seqGroup); if (it2 == seqsPerGroup.end()) { //if it's a new group seqsPerGroup[seqGroup] = 1; }else {//it's a group we already have seqsPerGroup[seqGroup]++; } } pairDone = false; } } } return error; } catch(exception& e) { m->errorOut(e, "TreeMap", "readMap"); exit(1); } } /************************************************************/ int TreeMap::readMap() { try { string seqName, seqGroup; int error = 0; string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; while (!fileHandle.eof()) { if (m->control_pressed) { fileHandle.close(); return 1; } fileHandle.read(buffer, 4096); vector pieces = m->splitWhiteSpace(rest, buffer, fileHandle.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); map::iterator itCheck = treemap.find(seqName); if (itCheck != treemap.end()) { error = 1; 
m->mothurOut("[WARNING]: Your groupfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { namesOfSeqs.push_back(seqName); treemap[seqName].groupname = seqGroup; //store data in map it2 = seqsPerGroup.find(seqGroup); if (it2 == seqsPerGroup.end()) { //if it's a new group seqsPerGroup[seqGroup] = 1; }else {//it's a group we already have seqsPerGroup[seqGroup]++; } } pairDone = false; } } } fileHandle.close(); if (rest != "") { vector pieces = m->splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { seqName = pieces[i]; columnOne=false; } else { seqGroup = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { setNamesOfGroups(seqGroup); map::iterator itCheck = treemap.find(seqName); if (itCheck != treemap.end()) { error = 1; m->mothurOut("[WARNING]: Your groupfile contains more than 1 sequence named " + seqName + ", sequence names must be unique. Please correct."); m->mothurOutEndLine(); } else { namesOfSeqs.push_back(seqName); treemap[seqName].groupname = seqGroup; //store data in map it2 = seqsPerGroup.find(seqGroup); if (it2 == seqsPerGroup.end()) { //if it's a new group seqsPerGroup[seqGroup] = 1; }else {//it's a group we already have seqsPerGroup[seqGroup]++; } } pairDone = false; } } } return error; } catch(exception& e) { m->errorOut(e, "TreeMap", "readMap"); exit(1); } } /************************************************************/ void TreeMap::addSeq(string seqName, string seqGroup) { namesOfSeqs.push_back(seqName); setNamesOfGroups(seqGroup); treemap[seqName].groupname = seqGroup; //store data in map it2 = seqsPerGroup.find(seqGroup); if (it2 == seqsPerGroup.end()) { //if it's a new group seqsPerGroup[seqGroup] = 1; }else {//it's a group we already have seqsPerGroup[seqGroup]++; } } /************************************************************/ void TreeMap::removeSeq(string seqName) { //erase name from namesOfSeqs for (int i = 0; i < 
namesOfSeqs.size(); i++) { if (namesOfSeqs[i] == seqName) { namesOfSeqs.erase(namesOfSeqs.begin()+i); break; } } //decrement sequences in this group string group = treemap[seqName].groupname; seqsPerGroup[group]--; //remove seq from treemap it = treemap.find(seqName); treemap.erase(it); } /************************************************************/ int TreeMap::getNumGroups() { return namesOfGroups.size(); } /************************************************************/ int TreeMap::getNumSeqs() { return namesOfSeqs.size(); } /************************************************************/ string TreeMap::getGroup(string sequenceName) { it = treemap.find(sequenceName); if (it != treemap.end()) { //sequence name was in group file return it->second.groupname; }else { return "not found"; } } /************************************************************/ void TreeMap::setNamesOfGroups(string seqGroup) { int i, count; count = 0; for (i=0; ierrorOut(e, "TreeMap", "isValidGroup"); exit(1); } } /***********************************************************************/ void TreeMap::print(ostream& output){ try { for(it = treemap.begin(); it != treemap.end(); it++){ output << it->first << '\t' << it->second.groupname << '\t' << it->second.vectorIndex << endl; } } catch(exception& e) { m->errorOut(e, "TreeMap", "print"); exit(1); } } /************************************************************/ void TreeMap::makeSim(vector ThisnamesOfGroups) { try { //set names of groups namesOfGroups = ThisnamesOfGroups; //set names of seqs to names of groups namesOfSeqs = ThisnamesOfGroups; // make map where key and value are both the group name since that what the tree.shared command wants for (int i = 0; i < namesOfGroups.size(); i++) { treemap[namesOfGroups[i]].groupname = namesOfGroups[i]; seqsPerGroup[namesOfGroups[i]] = 1; } numGroups = namesOfGroups.size(); } catch(exception& e) { m->errorOut(e, "TreeMap", "makeSim"); exit(1); } } 
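/************************************************************/
/* Usage sketch, not part of mothur. readMap() and addSeq() above keep namesOfSeqs, treemap and seqsPerGroup in step: a sequence name is stored once, and the count for its group goes up by one. The standalone example below mirrors only that bookkeeping and the duplicate-name check. GroupTally, add and loadPairs are hypothetical names chosen for illustration, and the token vector stands in for what splitWhiteSpace presumably returns for a group file.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for TreeMap's bookkeeping; this is not mothur's class.
struct GroupTally {
    std::map<std::string, std::string> groupOf;  // sequence name -> group name
    std::map<std::string, int> seqsPerGroup;     // group name -> number of sequences
    std::vector<std::string> namesOfSeqs;        // sequence names in insertion order

    // Returns false and stores nothing when the name was already seen,
    // which corresponds to the duplicate warning path in readMap.
    bool add(const std::string& seqName, const std::string& seqGroup) {
        if (groupOf.count(seqName) != 0) { return false; }
        namesOfSeqs.push_back(seqName);
        groupOf[seqName] = seqGroup;
        ++seqsPerGroup[seqGroup];  // operator[] starts a new group at 0
        return true;
    }
};

// Pairs up whitespace-separated tokens as (name, group), first column then second.
// Returns the number of duplicate names that were skipped.
int loadPairs(const std::vector<std::string>& tokens, GroupTally& tally) {
    int duplicates = 0;
    for (std::size_t i = 0; i + 1 < tokens.size(); i += 2) {
        if (!tally.add(tokens[i], tokens[i + 1])) { ++duplicates; }
    }
    return duplicates;
}
```

For the tokens seq1 A seq2 A seq1 B seq3 B, loadPairs reports one duplicate (the second seq1), group A ends with a count of 2 and group B with a count of 1. Using ++seqsPerGroup[seqGroup] replaces the find-then-branch pattern in the code above; the result is the same because a missing key is value-initialized to 0.
*/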
/************************************************************/ void TreeMap::makeSim(ListVector* list) { try { //set names of groups namesOfGroups.clear(); for(int i = 0; i < list->size(); i++) { namesOfGroups.push_back(list->get(i)); } //set names of seqs to names of groups namesOfSeqs = namesOfGroups; // make map where key and value are both the group name since that what the tree.shared command wants for (int i = 0; i < namesOfGroups.size(); i++) { treemap[namesOfGroups[i]].groupname = namesOfGroups[i]; seqsPerGroup[namesOfGroups[i]] = 1; } numGroups = namesOfGroups.size(); } catch(exception& e) { m->errorOut(e, "TreeMap", "makeSim"); exit(1); } } /************************************************************/ int TreeMap::getCopy(TreeMap& copy){ try { namesOfGroups = copy.getNamesOfGroups(); numGroups = copy.getNumGroups(); namesOfSeqs = copy.namesOfSeqs; seqsPerGroup = copy.seqsPerGroup; treemap = copy.treemap; return 0; } catch(exception& e) { m->errorOut(e, "TreeMap", "getCopy"); exit(1); } } /************************************************************/ vector TreeMap::getNamesSeqs(){ try { vector names; for(it = treemap.begin(); it != treemap.end(); it++){ names.push_back(it->first); } return names; } catch(exception& e) { m->errorOut(e, "TreeMap", "getNamesSeqs"); exit(1); } } /************************************************************/ vector TreeMap::getNamesSeqs(vector picked){ try { vector names; for(it = treemap.begin(); it != treemap.end(); it++){ //if you are belong to one the the groups in the picked vector add you if (m->inUsersGroups(it->second.groupname, picked)) { names.push_back(it->first); } } return names; } catch(exception& e) { m->errorOut(e, "TreeMap", "getNamesSeqs"); exit(1); } } /************************************************************/ mothur-1.36.1/source/datastructures/treemap.h000066400000000000000000000036161255543666200213270ustar00rootroot00000000000000#ifndef TREEMAP_H #define TREEMAP_H /* * treemap.h * Mothur * * Created 
by Sarah Westcott on 1/26/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "mothur.h" #include "listvector.hpp" /* This class is used by the read.tree command to build the tree container. */ struct GroupIndex { string groupname; int vectorIndex; }; class TreeMap { public: TreeMap() { m = MothurOut::getInstance(); } TreeMap(string); ~TreeMap(); int readMap(); int readMap(string); int getNumGroups(); int getNumSeqs(); //void setIndex(string, int); //sequencename, index //int getIndex(string); //returns vector index of sequence bool isValidGroup(string); //return true if string is a valid group void removeSeq(string); //removes a sequence, this is to accommodate trees that do not contain all the seqs in your groupfile string getGroup(string); void addSeq(string, string); void addGroup(string s) { setNamesOfGroups(s); } vector<string> getNamesOfGroups() { sort(namesOfGroups.begin(), namesOfGroups.end()); return namesOfGroups; } void print(ostream&); void makeSim(vector<string>); //takes groupmap info and fills treemap for use by tree.shared command. void makeSim(ListVector*); //takes listvector info and fills treemap for use by tree.shared command. vector<string> getNamesSeqs(); vector<string> getNamesSeqs(vector<string>); //get names of seqs belonging to a group or set of groups int getCopy(TreeMap&); vector<string> namesOfSeqs; map<string, int> seqsPerGroup; //groupname, number of seqs in that group. map<string, GroupIndex> treemap; //sequence name and its GroupIndex (groupname, vectorIndex) private: vector<string> namesOfGroups; ifstream fileHandle; string groupFileName; int numGroups; map<string, GroupIndex>::iterator it; map<string, int>::iterator it2; void setNamesOfGroups(string); MothurOut* m; }; #endif mothur-1.36.1/source/datastructures/treenode.cpp000066400000000000000000000064771255543666200220420ustar00rootroot00000000000000/* * treenode.cpp * Mothur * * Created by Sarah Westcott on 1/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "treenode.h" /****************************************************************/ Node::Node() { m = MothurOut::getInstance(); //initialize node name = ""; branchLength = -1; parent = -1; lchild = -1; rchild = -1; length2leaf = 0.0; label = ""; } /****************************************************************/ void Node::setName(string Name) { name = Name; } /****************************************************************/ void Node::setGroup(vector<string> groups) { group = groups; } /****************************************************************/ void Node::setBranchLength(float l) { branchLength = l; } /****************************************************************/ void Node::setLabel(string l) { label = l; } /****************************************************************/ void Node::setLengthToLeaves(float l) { length2leaf = l; } /****************************************************************/ void Node::setParent(int p) { parent = p; } /****************************************************************/ void Node::setIndex(int i) { vectorIndex = i; } /****************************************************************/ void Node::setChildren(int lc, int rc) { lchild = lc; rchild = rc; } //leftchild, rightchild /****************************************************************/ string Node::getName() { return name; } /****************************************************************/ vector<string> Node::getGroup() { return group; } /****************************************************************/ float Node::getBranchLength() { return branchLength; } /****************************************************************/ string Node::getLabel() { return label; } /****************************************************************/ float Node::getLengthToLeaves() { return length2leaf; } /****************************************************************/ int Node::getParent() { return parent; } /****************************************************************/ int 
Node::getLChild() { return lchild; } /****************************************************************/ int Node::getRChild() { return rchild; } /****************************************************************/ int Node::getIndex() { return vectorIndex; } /****************************************************************/ //to be used by printTree in the Tree class to print the leaf info void Node::printNode() { try{ m->mothurOut(name + " " + toString(parent) + " " + toString(lchild) + " " + toString(rchild) + " "); /*for (int i = 0; i < group.size(); i++) { m->mothurOut( group[i] + " "); } //there is a branch length if (branchLength != -1) { m->mothurOut(" " + toString(branchLength)); } m->mothurOut(" |"); map::iterator it; for(it=pGroups.begin();it!=pGroups.end();it++){ m->mothurOut(" " + it->first + ":" + toString(it->second)); } m->mothurOut(" |"); for(it=pcount.begin();it!=pcount.end();it++){ m->mothurOut(" " + it->first + ":" + toString(it->second)); }*/ m->mothurOutEndLine(); } catch(exception& e) { m->errorOut(e, "Node", "printNode"); exit(1); } } /****************************************************************/ mothur-1.36.1/source/datastructures/treenode.h000066400000000000000000000033141255543666200214720ustar00rootroot00000000000000#ifndef TREENODE_H #define TREENODE_H /* * treenode.h * Mothur * * Created by Sarah Westcott on 1/23/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "mothur.h" #include "mothurout.h" /* This class represents a node on a tree. 
*/ class Node { public: Node(); //pass it the sequence name ~Node() { pGroups.clear(); pcount.clear(); }; void setName(string); void setGroup(vector<string>); void setBranchLength(float); void setLabel(string); void setParent(int); void setChildren(int, int); //leftchild, rightchild void setIndex(int); void setLengthToLeaves(float); string getName(); vector<string> getGroup(); float getBranchLength(); float getLengthToLeaves(); string getLabel(); int getParent(); int getLChild(); int getRChild(); int getIndex(); void printNode(); //prints out the name and the parent, left child and right child indexes //pGroups is the parsimony group info, i.e. a leaf node contains one entry, pGroups["groupname"] = 1; //but a branch node may contain several entries, so if the node's children are from different groups it //would have at least two entries: pGroups["groupnameOfLeftChild"] = 1, pGroups["groupnameOfRightChild"] = 1. //pcount is the node's descendant group information, i.e. pcount["black"] = 20 would mean that 20 of the node's //descendants are from group black. map<string, int> pGroups; //leaf nodes will only have 1 group, but branch nodes may have multiple groups. 
map<string, int> pcount; private: string name, label; vector<string> group; float branchLength, length2leaf; int parent; int lchild; int rchild; int vectorIndex; MothurOut* m; }; #endif mothur-1.36.1/source/display.h000066400000000000000000000017351255543666200162620ustar00rootroot00000000000000#ifndef DISPLAY_H #define DISPLAY_H #include "sabundvector.hpp" #include "sharedsabundvector.h" #include "calculator.h" #include "fileoutput.h" /***********************************************************************/ class Display { public: virtual void update(SAbundVector* rank) = 0; virtual void update(vector shared, int numSeqs, int numGroupComb) = 0; virtual void init(string) = 0; virtual void reset() = 0; virtual void close() = 0; virtual void outputTempFiles(string) {} virtual void inputTempFiles(string) {} virtual bool isCalcMultiple() = 0; virtual void setAll(bool){} virtual bool hasLciHci(){ return false; } virtual bool getAll() { return false; } virtual bool calcNeedsAll() { return false; } virtual string getName() { return ""; }; virtual ~Display() {} Display() { m = MothurOut::getInstance(); } protected: MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/dlibshuff.cpp000066400000000000000000000037621255543666200171200ustar00rootroot00000000000000/* * DLibshuff.cpp * Mothur * * Created by Pat Schloss on 4/8/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "dlibshuff.h" /***********************************************************************/ DLibshuff::DLibshuff(FullMatrix* D, int it, float step, float co) : Libshuff(D, it, step, co){ numDXs = int(cutOff / stepSize); } /***********************************************************************/ float DLibshuff::evaluatePair(int i, int j){ return dCalculate(i,j); } /***********************************************************************/ vector > DLibshuff::evaluateAll(){ savedMins.resize(numGroups); vector > dCXYValues(numGroups); for(int i=0;icontrol_pressed) { return sum; } minXY = getMinXY(x, y); if (m->control_pressed) { return sum; } vector nx = calcN(minX); if (m->control_pressed) { return sum; } vector nxy = calcN(minXY); if (m->control_pressed) { return sum; } for(int i=0;i DLibshuff::calcN(vector minVector){ vector counts(numDXs,0); int precision = int(1 / stepSize); for(int i=0;i > evaluateAll(); float evaluatePair(int, int); private: int numDXs; double dCalculate(int, int); vector calcN(vector); }; #endif mothur-1.36.1/source/endiannessmacros.h000066400000000000000000000110541255543666200201440ustar00rootroot00000000000000#ifndef EDIANNESSMACROS_H #define EDIANNESSMACROS_H /* * endiannessmacros.h * Mothur * * Created by westcott on 7/9/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ /*********************************************************************/ /*********************************************************************/ // The following is copied from the staden io_lib-1.12.4 os.h - thanks! /*********************************************************************/ /*********************************************************************/ /* * Author: * MRC Laboratory of Molecular Biology * Hills Road * Cambridge CB2 2QH * United Kingdom * * Description: operating system specific type definitions * */ /* Mac FAT binaries or unknown. 
Auto detect based on CPU type */ #if !defined(SP_BIG_ENDIAN) && !defined(SP_LITTLE_ENDIAN) /* * x86 equivalents */ #if defined(__i386) || defined(__i386__) || defined(__ia64__) || defined(WIN32) || defined(__arm__) || (defined(__mips__) && defined(__MIPSEL__)) || defined(__SYMBIAN32__) || \ defined(__x86_64__) || defined(__x86_64) || defined(__i686__) || defined(__i686) || defined(__amd64__) || defined(__amd64) || defined(__LITTLE_ENDIAN__) #define SP_LITTLE_ENDIAN #else #define SP_BIG_ENDIAN #endif /* * SUN Sparc */ #if defined(__sparc__) || defined(__sparc) # if defined(SP_LITTLE_ENDIAN) # undef SP_LITTLE_ENDIAN # endif # define SP_BIG_ENDIAN #endif /* * PowerPC */ #if defined(__ppc__) || defined(__ppc) # if defined(SP_LITTLE_ENDIAN) # undef SP_LITTLE_ENDIAN # endif # define SP_BIG_ENDIAN #endif /* Some catch-alls */ #if defined(__LITTLE_ENDIAN__) || defined(__LITTLEENDIAN__) # define SP_LITTLE_ENDIAN #endif #if defined(__BIG_ENDIAN__) || defined(__BIGENDIAN__) # define SP_BIG_ENDIAN #endif #if defined(SP_BIG_ENDIAN) && defined(SP_LITTLE_ENDIAN) # error Both BIG and LITTLE endian defined. Fix os.h and/or Makefile #endif #if !defined(SP_BIG_ENDIAN) && !defined(SP_LITTLE_ENDIAN) # error Neither BIG nor LITTLE endian defined. Fix os.h and/or Makefile #endif #endif /*----------------------------------------------------------------------------- * Byte swapping macros */ /* * Our new swap runs at the same speed on Ultrix, but substantially faster * (300% for swap_int4, ~50% for swap_int2) on an Alpha (due to the lack of * decent 'char' support). * * They also have the ability to swap in situ (src == dst). Newer code now * relies on this so don't change back! 
*/ #define iswap_int8(x) \ (((x & 0x00000000000000ffLL) << 56) + \ ((x & 0x000000000000ff00LL) << 40) + \ ((x & 0x0000000000ff0000LL) << 24) + \ ((x & 0x00000000ff000000LL) << 8) + \ ((x & 0x000000ff00000000LL) >> 8) + \ ((x & 0x0000ff0000000000LL) >> 24) + \ ((x & 0x00ff000000000000LL) >> 40) + \ ((x & 0xff00000000000000LL) >> 56)) #define iswap_int4(x) \ (((x & 0x000000ff) << 24) + \ ((x & 0x0000ff00) << 8) + \ ((x & 0x00ff0000) >> 8) + \ ((x & 0xff000000) >> 24)) #define iswap_int2(x) \ (((x & 0x00ff) << 8) + \ ((x & 0xff00) >> 8)) #define swap_int8(src, dst) ((dst) = iswap_int8(src)) #define swap_int4(src, dst) ((dst) = iswap_int4(src)) #define swap_int2(src, dst) ((dst) = iswap_int2(src)) /* * Linux systems may use byteswap.h to get assembly versions of byte-swap * on intel systems. This can be as trivial as the bswap opcode, which works * out at over 2-times faster than iswap_int4 above. */ #if 0 #if defined(__linux__) # include <byteswap.h> # undef iswap_int8 # undef iswap_int4 # undef iswap_int2 # define iswap_int8 bswap_64 # define iswap_int4 bswap_32 # define iswap_int2 bswap_16 #endif #endif /* * Macros to specify that data read in is of a particular endianness. * The macros here swap to the appropriate order for the particular machine * running the macro and return the new answer. These may also be used when * writing to a file to specify that we wish to write in (eg) big endian * format. * * This leads to efficient code as most of the time these macros are * trivial. 
*/ #ifdef SP_BIG_ENDIAN #define be_int8(x) (x) #define be_int4(x) (x) #define be_int2(x) (x) #define be_int1(x) (x) #define le_int8(x) iswap_int8((x)) #define le_int4(x) iswap_int4((x)) #define le_int2(x) iswap_int2((x)) #define le_int1(x) (x) #endif #ifdef SP_LITTLE_ENDIAN #define be_int8(x) iswap_int8((x)) #define be_int4(x) iswap_int4((x)) #define be_int2(x) iswap_int2((x)) #define be_int1(x) (x) #define le_int8(x) (x) #define le_int4(x) (x) #define le_int2(x) (x) #define le_int1(x) (x) #endif #endif mothur-1.36.1/source/engine.cpp000066400000000000000000000440641255543666200164170ustar00rootroot00000000000000/* * engine.cpp * * * Created by Pat Schloss on 8/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * There's a TON of duplicated code between InteractEngine and BatchEngine * I couldn't figure out how to transition between ifstream (batch) and cin (interact) * Fix later, don't have time now. * */ #include "engine.hpp" /***********************************************************************/ Engine::Engine(){ try { cFactory = CommandFactory::getInstance(); mout = MothurOut::getInstance(); } catch(exception& e) { mout->errorOut(e, "Engine", "Engine"); exit(1); } } /***********************************************************************/ /***********************************************************************/ InteractEngine::InteractEngine(string path){ string temppath = path.substr(0, (path.find_last_of("othur")-5)); //this will happen if you set the path variable to contain mothur's exe location if (temppath == "") { path = mout->findProgramPath("mothur"); } mout->argv = path; //if you haven't set your own location #ifdef MOTHUR_FILES #else //set default location to search for files to mothur's executable location. This will resolve issue of double-clicking on the executable which opens mothur and sets pwd to your home directory instead of the mothur directory and leads to "unable to find file" errors. 
        string tempDefault = path.substr(0, (path.find_last_of('m')));
        if (tempDefault != "") { mout->setDefaultPath(tempDefault); }
    #endif
}
/***********************************************************************/
InteractEngine::~InteractEngine(){}
/***********************************************************************/
//This function allows the user to input commands one line at a time until they quit.
//If the command is garbage it does nothing.
bool InteractEngine::getInput(){
    try {
        string input = "";
        string commandName = "";
        string options = "";
        int quitCommandCalled = 0;

        while(quitCommandCalled != 1){

            #ifdef USE_MPI
                int pid, processors;
                MPI_Status status;
                MPI_Comm_rank(MPI_COMM_WORLD, &pid);
                MPI_Comm_size(MPI_COMM_WORLD, &processors);

                if (pid == 0) {
            #endif

            if (mout->changedSeqNames) { mout->mothurOut("[WARNING]: your sequence names contained ':'. I changed them to '_' to avoid problems in your downstream analysis.\n"); }

            input = getCommand();

            if (mout->control_pressed) { input = "quit()"; }

            //allow user to omit the () on the quit command
            if (input == "quit") { input = "quit()"; }

            #ifdef USE_MPI
                //send commandName
                for(int i = 1; i < processors; i++) {
                    int length = input.length();
                    MPI_Send(&length, 1, MPI_INT, i, 2001, MPI_COMM_WORLD);
                    MPI_Send(&input[0], length, MPI_CHAR, i, 2001, MPI_COMM_WORLD);
                }
            }else {
                int length;
                MPI_Recv(&length, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status);
                //receive container
                char* tempBuf = new char[length];
                MPI_Recv(&tempBuf[0], length, MPI_CHAR, 0, 2001, MPI_COMM_WORLD, &status);

                //tempBuf is not null-terminated, so copy exactly length characters
                input.assign(tempBuf, length);
                delete[] tempBuf; //allocated with new[], so it must be released with delete[]
            }
            #endif

            CommandOptionParser parser(input);
            commandName = parser.getCommandString();
            options = parser.getOptionString();

            if (commandName != "") {
                mout->executing = true;
                #ifdef USE_MPI
                    int pid;
                    MPI_Comm_rank(MPI_COMM_WORLD, &pid);

                    if ((cFactory->MPIEnabled(commandName)) || (pid == 0)) {
                        //cout << pid << " is in execute " << commandName << endl;
                #endif
//executes valid command mout->changedSeqNames = false; mout->runParse = true; mout->clearGroups(); mout->clearAllGroups(); mout->Treenames.clear(); mout->saveNextLabel = ""; mout->commandInputsConvertError = false; mout->printedSharedHeaders = false; mout->currentSharedBinLabels.clear(); mout->sharedBinLabelsInFile.clear(); mout->printedListHeaders = false; mout->listBinLabelsInFile.clear(); Command* command = cFactory->getCommand(commandName, options); if (mout->commandInputsConvertError) { quitCommandCalled = 2; } else { quitCommandCalled = command->execute(); } //if we aborted command if (quitCommandCalled == 2) { mout->mothurOut("[ERROR]: did not complete " + commandName + ".\n"); } mout->control_pressed = 0; mout->executing = false; #ifdef USE_MPI } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif }else { mout->mothurOut("Invalid.\n"); } } return 1; } catch(exception& e) { mout->errorOut(e, "InteractEngine", "getInput"); exit(1); } } /***********************************************************************/ string Engine::getCommand() { try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_READLINE char* nextCommand = NULL; nextCommand = readline("\nmothur > "); if(nextCommand != NULL) { add_history(nextCommand); } else{ //^D causes null string and we want it to quit mothur nextCommand = strdup("quit"); mout->mothurOut(nextCommand); mout->mothurOut("\n"); } mout->mothurOutJustToLog("\nmothur > " + toString(nextCommand) + "\n"); return nextCommand; #else string nextCommand = ""; mout->mothurOut("\nmothur > "); getline(cin, nextCommand); mout->mothurOut("\n"); mout->mothurOutJustToLog("\nmothur > " + toString(nextCommand) + "\n"); return nextCommand; #endif #else string nextCommand = ""; mout->mothurOut("\nmothur > "); getline(cin, nextCommand); mout->mothurOut("\n"); mout->mothurOutJustToLog(toString(nextCommand) + "\n"); return nextCommand; #endif } catch(exception& e) { 
mout->errorOut(e, "Engine", "getCommand"); exit(1); } } /***********************************************************************/ //This function opens the batchfile to be used by BatchEngine::getInput. BatchEngine::BatchEngine(string path, string batchFileName){ try { openedBatch = mout->openInputFile(batchFileName, inputBatchFile); string temppath = path.substr(0, (path.find_last_of("othur")-5)); //this will happen if you set the path variable to contain mothur's exe location if (temppath == "") { path = mout->findProgramPath("mothur"); } mout->argv = path; //if you haven't set your own location #ifdef MOTHUR_FILES #else //set default location to search for files to mothur's executable location. This will resolve issue of double-clicking on the executable which opens mothur and sets pwd to your home directory instead of the mothur directory and leads to "unable to find file" errors. string tempDefault = path.substr(0, (path.find_last_of('m'))); if (tempDefault != "") { mout->setDefaultPath(tempDefault); } #endif } catch(exception& e) { mout->errorOut(e, "BatchEngine", "BatchEngine"); exit(1); } } /***********************************************************************/ BatchEngine::~BatchEngine(){ } /***********************************************************************/ //This Function allows the user to run a batchfile containing several commands on Dotur bool BatchEngine::getInput(){ try { //check if this is a valid batchfile if (openedBatch == 1) { mout->mothurOut("unable to open batchfile\n"); return 1; } string input = ""; string commandName = ""; string options = ""; //CommandFactory cFactory; int quitCommandCalled = 0; int count = 0; while(quitCommandCalled != 1){ #ifdef USE_MPI int pid, processors; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { #endif input = getNextCommand(inputBatchFile); count++; #ifdef USE_MPI //send commandName for(int i = 1; i < processors; i++) { int length = 
input.length();
                    MPI_Send(&length, 1, MPI_INT, i, 2001, MPI_COMM_WORLD);
                    MPI_Send(&input[0], length, MPI_CHAR, i, 2001, MPI_COMM_WORLD);
                }
            }else {
                int length;
                MPI_Recv(&length, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status);
                //receive container
                char* tempBuf = new char[length];
                MPI_Recv(&tempBuf[0], length, MPI_CHAR, 0, 2001, MPI_COMM_WORLD, &status);

                //tempBuf is not null-terminated, so copy exactly length characters
                input.assign(tempBuf, length);
                delete[] tempBuf; //allocated with new[], so it must be released with delete[]
            }
            #endif

            if (input[0] != '#') {
                if (mout->changedSeqNames) { mout->mothurOut("[WARNING]: your sequence names contained ':'. I changed them to '_' to avoid problems in your downstream analysis.\n"); }

                mout->mothurOut("\nmothur > " + input + "\n");

                if (mout->control_pressed) { input = "quit()"; }

                //allow user to omit the () on the quit command
                if (input == "quit") { input = "quit()"; }

                CommandOptionParser parser(input);
                commandName = parser.getCommandString();
                options = parser.getOptionString();

                if (commandName != "") {
                    mout->executing = true;
                    #ifdef USE_MPI
                        int pid;
                        MPI_Comm_rank(MPI_COMM_WORLD, &pid);

                        //cout << pid << " is here " << commandName << '\t' << count << endl;
                        if ((cFactory->MPIEnabled(commandName)) || (pid == 0)) {
                    #endif
                    //executes valid command
                    mout->changedSeqNames = false;
                    mout->runParse = true;
                    mout->clearGroups();
                    mout->clearAllGroups();
                    mout->Treenames.clear();
                    mout->saveNextLabel = "";
                    mout->commandInputsConvertError = false;
                    mout->printedSharedHeaders = false;
                    mout->currentSharedBinLabels.clear();
                    mout->sharedBinLabelsInFile.clear();
                    mout->printedListHeaders = false;
                    mout->listBinLabelsInFile.clear();

                    Command* command = cFactory->getCommand(commandName, options);
                    if (mout->commandInputsConvertError) { quitCommandCalled = 2; }
                    else { quitCommandCalled = command->execute(); }

                    //if we aborted command
                    if (quitCommandCalled == 2) { mout->mothurOut("[ERROR]: did not complete " + commandName + ".\n"); }

                    mout->control_pressed = 0;
                    mout->executing = false;

                    #ifdef USE_MPI
                        }
                        MPI_Barrier(MPI_COMM_WORLD); //make
everyone wait - just in case #endif }else { mout->mothurOut("Invalid.\n"); } } mout->gobble(inputBatchFile); } inputBatchFile.close(); return 1; } catch(exception& e) { mout->errorOut(e, "BatchEngine", "getInput"); exit(1); } } /***********************************************************************/ string BatchEngine::getNextCommand(ifstream& inputBatchFile) { try { string nextcommand = ""; if (inputBatchFile.eof()) { nextcommand = "quit()"; } else { nextcommand = mout->getline(inputBatchFile); } return nextcommand; } catch(exception& e) { mout->errorOut(e, "BatchEngine", "getNextCommand"); exit(1); } } /***********************************************************************/ /***********************************************************************/ //This function opens the batchfile to be used by BatchEngine::getInput. ScriptEngine::ScriptEngine(string path, string commandString){ try { //remove quotes listOfCommands = commandString.substr(1, (commandString.length()-1)); string temppath = path.substr(0, (path.find_last_of("othur")-5)); //this will happen if you set the path variable to contain mothur's exe location if (temppath == "") { path = mout->findProgramPath("mothur"); } mout->argv = path; //if you haven't set your own location #ifdef MOTHUR_FILES #else //set default location to search for files to mothur's executable location. This will resolve issue of double-clicking on the executable which opens mothur and sets pwd to your home directory instead of the mothur directory and leads to "unable to find file" errors. 
string tempDefault = path.substr(0, (path.find_last_of('m'))); if (tempDefault != "") { mout->setDefaultPath(tempDefault); } #endif } catch(exception& e) { mout->errorOut(e, "ScriptEngine", "ScriptEngine"); exit(1); } } /***********************************************************************/ ScriptEngine::~ScriptEngine(){ } /***********************************************************************/ //This Function allows the user to run a batchfile containing several commands on mothur bool ScriptEngine::getInput(){ try { string input = ""; string commandName = ""; string options = ""; //CommandFactory cFactory; int quitCommandCalled = 0; while(quitCommandCalled != 1){ #ifdef USE_MPI int pid, processors; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { //cout << pid << " is here " << processors << endl; #endif input = getNextCommand(listOfCommands); if (input == "") { input = "quit()"; } if (mout->changedSeqNames) { mout->mothurOut("[WARNING]: your sequence names contained ':'. 
I changed them to '_' to avoid problems in your downstream analysis.\n"); }

            if (mout->gui) {
                if ((input.find("quit") != string::npos) || (input.find("set.logfile") != string::npos)) {}
                else if ((input.find("get.current") != string::npos) && (!mout->hasCurrentFiles())) {}
                else { mout->mothurOut("\nmothur > " + input + "\n"); }
            }else{
                mout->mothurOut("\nmothur > " + input + "\n");
            }

            #ifdef USE_MPI
                //send commandName
                for(int i = 1; i < processors; i++) {
                    //cout << pid << " is here " << input << endl;
                    int length = input.length();
                    MPI_Send(&length, 1, MPI_INT, i, 2001, MPI_COMM_WORLD);
                    //cout << pid << " is here " << length << '\t' << input << endl;
                    MPI_Send(&input[0], length, MPI_CHAR, i, 2001, MPI_COMM_WORLD);
                    //cout << pid << " is here " << length << '\t' << input << endl;
                }
            }else {
                int length;
                MPI_Recv(&length, 1, MPI_INT, 0, 2001, MPI_COMM_WORLD, &status);
                //cout << pid << " is here " << length << endl;
                //receive container
                char* tempBuf = new char[length];
                MPI_Recv(&tempBuf[0], length, MPI_CHAR, 0, 2001, MPI_COMM_WORLD, &status);
                //cout << pid << " is here " << length << '\t' << tempBuf << endl;

                //tempBuf is not null-terminated, so copy exactly length characters
                input.assign(tempBuf, length);
                delete[] tempBuf; //allocated with new[], so it must be released with delete[]
            }
            #endif

            if (mout->control_pressed) { input = "quit()"; }

            //allow user to omit the () on the quit command
            if (input == "quit") { input = "quit()"; }

            CommandOptionParser parser(input);
            commandName = parser.getCommandString();
            options = parser.getOptionString();

            if (commandName != "") {
                mout->executing = true;
                #ifdef USE_MPI
                    int pid, numProcesses;
                    MPI_Comm_rank(MPI_COMM_WORLD, &pid);
                    MPI_Comm_size(MPI_COMM_WORLD, &numProcesses);

                    //cout << pid << " is here " << commandName << endl;
                    if ((cFactory->MPIEnabled(commandName)) || (pid == 0)) {
                        //cout << pid << " is in execute" << endl;
                #endif
                //executes valid command
                mout->changedSeqNames = false;
                mout->runParse = true;
                mout->clearGroups();
                mout->clearAllGroups();
                mout->Treenames.clear();
                mout->saveNextLabel = "";
mout->commandInputsConvertError = false; mout->printedSharedHeaders = false; mout->currentSharedBinLabels.clear(); mout->sharedBinLabelsInFile.clear(); mout->printedListHeaders = false; mout->listBinLabelsInFile.clear(); Command* command = cFactory->getCommand(commandName, options); if (mout->commandInputsConvertError) { quitCommandCalled = 2; } else { quitCommandCalled = command->execute(); } //if we aborted command if (quitCommandCalled == 2) { mout->mothurOut("[ERROR]: did not complete " + commandName + ".\n"); } mout->control_pressed = 0; mout->executing = false; #ifdef USE_MPI //cout << pid << " is done in execute" << endl; } MPI_Barrier(MPI_COMM_WORLD); //make everyone wait - just in case #endif }else { mout->mothurOut("Invalid.\n"); } } return 1; } catch(exception& e) { mout->errorOut(e, "ScriptEngine", "getInput"); exit(1); } } /***********************************************************************/ string ScriptEngine::getNextCommand(string& commandString) { try { string nextcommand = ""; int count = 0; bool ignoreSemiColons = false; //go through string until you reach ; or end while (count < commandString.length()) { //you want to ignore any ; until you reach the next ' if ((commandString[count] == '\'') && (!ignoreSemiColons)) { ignoreSemiColons = true; } else if ((commandString[count] == '\'') && (ignoreSemiColons)) { ignoreSemiColons = false; } if ((commandString[count] == ';') && (!ignoreSemiColons)) { break; } else { nextcommand += commandString[count]; } count++; } //if you are not at the end if (count != commandString.length()) { commandString = commandString.substr(count+1, commandString.length()); } else { commandString = ""; } //get rid of spaces in between commands if any if (commandString.length() > 0) { while (commandString[0] == ' ') { commandString = commandString.substr(1,commandString.length()); if (commandString.length() == 0) { break; } } } return nextcommand; } catch(exception& e) { mout->errorOut(e, "ScriptEngine", "getNextCommand"); 
exit(1); } } /***********************************************************************/ mothur-1.36.1/source/engine.hpp000066400000000000000000000024661255543666200164240ustar00rootroot00000000000000#ifndef ENGINE_HPP #define ENGINE_HPP /* * engine.hpp * * * Created by Pat Schloss on 8/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "mothur.h" #include "commandoptionparser.hpp" #include "command.hpp" #include "commandfactory.hpp" #include "mothurout.h" class Engine { public: Engine(); virtual ~Engine(){} virtual bool getInput() = 0; virtual string getCommand(); virtual string getOutputDir() { return cFactory->getOutputDir(); } virtual string getLogFileName() { return cFactory->getLogfileName(); } virtual bool getAppend() { return cFactory->getAppend(); } vector getOptions() { return options; } protected: vector options; CommandFactory* cFactory; MothurOut* mout; }; class BatchEngine : public Engine { public: BatchEngine(string, string); ~BatchEngine(); virtual bool getInput(); int openedBatch; private: ifstream inputBatchFile; string getNextCommand(ifstream&); }; class InteractEngine : public Engine { public: InteractEngine(string); ~InteractEngine(); virtual bool getInput(); private: }; class ScriptEngine : public Engine { public: ScriptEngine(string, string); ~ScriptEngine(); virtual bool getInput(); int openedBatch; private: string listOfCommands; string getNextCommand(string&); }; #endif mothur-1.36.1/source/fileoutput.cpp000066400000000000000000000253051255543666200173470ustar00rootroot00000000000000/* * fileoutput.cpp * Dotur * * Created by Sarah Westcott on 11/18/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "fileoutput.h" /***********************************************************************/ ThreeColumnFile::~ThreeColumnFile(){ inFile.close(); outFile.close(); m->mothurRemove(outName); } /***********************************************************************/ void ThreeColumnFile::initFile(string label){ try { if(counter != 0){ m->openOutputFile(outName, outFile); m->openInputFile(inName, inFile); string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << '\t' << label << "\tlci\thci" << endl; } else{ m->openOutputFile(outName, outFile); outFile << "numsampled\t" << label << "\tlci\thci" << endl; } outFile.setf(ios::fixed, ios::floatfield); outFile.setf(ios::showpoint); } catch(exception& e) { m->errorOut(e, "ThreeColumnFile", "initFile"); exit(1); } } /***********************************************************************/ void ThreeColumnFile::output(int nSeqs, vector data){ try { if(counter != 0){ string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << setprecision(4) << '\t' << data[0] << '\t' << data[1] << '\t' << data[2] << endl; } else{ outFile << nSeqs << setprecision(4) << '\t' << data[0] << '\t' << data[1] << '\t' << data[2] << endl; } } catch(exception& e) { m->errorOut(e, "ThreeColumnFile", "output"); exit(1); } } /***********************************************************************/ void ThreeColumnFile::resetFile(){ try { if(counter != 0){ outFile.close(); inFile.close(); } else{ outFile.close(); } counter = 1; m->mothurRemove(inName); renameOk = rename(outName.c_str(), inName.c_str()); //renameFile(outName, inName); //checks to make sure user was able to rename and remove successfully if ((renameOk != 0)) { m->mothurOut("Unable to rename " + outName); m->mothurOutEndLine(); perror(" : "); } } catch(exception& e) { m->errorOut(e, "ThreeColumnFile", "resetFile"); exit(1); } } /***********************************************************************/ 
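/*
 * Illustrative sketch only, not part of mothur. The ThreeColumnFile methods above
 * grow a results table one column group at a time: the previous file is copied line
 * by line into "<name>.temp", the new values are appended to each line, and the temp
 * file is then renamed over the original. The stand-alone helper below shows that
 * pattern with plain standard-library streams. The name appendColumn and its
 * parameters are assumptions made for this example, and it writes a single column
 * rather than the value/lci/hci triple used above.
 */

```cpp
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not a mothur function): append one tab-separated column
// to the table stored in inName by streaming through a temporary file.
// Returns true when the temporary file was renamed over inName successfully.
bool appendColumn(const std::string& inName,
                  const std::string& header,
                  const std::vector<std::string>& values) {
    const std::string outName = inName + ".temp";
    {
        std::ifstream in(inName.c_str());
        std::ofstream out(outName.c_str());
        if (!out) { return false; }

        std::string line;
        const bool haveOld = in.is_open();   // false on the very first pass

        // header row: old header (if any), then the new column label
        if (haveOld && std::getline(in, line)) { out << line << '\t'; }
        out << header << '\n';

        // data rows: old row (if any), then the new value
        for (std::size_t i = 0; i < values.size(); i++) {
            if (haveOld && std::getline(in, line)) { out << line << '\t'; }
            out << values[i] << '\n';
        }
    }   // both streams are closed here, before the rename

    std::remove(inName.c_str());             // fails harmlessly if the file is absent
    return std::rename(outName.c_str(), inName.c_str()) == 0;
}
```

/*
 * Calling appendColumn("table.txt", "a", ...) and then appendColumn("table.txt", "b", ...)
 * leaves a two-column table in table.txt. Closing both streams before the rename mirrors
 * what resetFile() does above; renaming an open file is not portable.
 */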
/***********************************************************************/ ColumnFile::~ColumnFile(){ inFile.close(); outFile.close(); m->mothurRemove(outName); } /***********************************************************************/ void ColumnFile::initFile(string label, vector tags){ try { if(counter != 0){ m->openOutputFile(outName, outFile); m->openInputFile(inName, inFile); string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << '\t'; for(int i = 0; i < tags.size(); i++) { outFile << label + tags[i] << '\t'; } outFile << endl; } else{ m->openOutputFile(outName, outFile); for(int i = 0; i < tags.size(); i++) { outFile << label + tags[i] << '\t'; } outFile << endl; } outFile.setf(ios::fixed, ios::floatfield); outFile.setf(ios::showpoint); } catch(exception& e) { m->errorOut(e, "ColumnFile", "initFile"); exit(1); } } /***********************************************************************/ void ColumnFile::output(vector data){ try { if(counter != 0){ string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << '\t' << setprecision(6) << data[0] << setprecision(iters.length()); for (int i = 1; i< data.size(); i++) { outFile << '\t' << data[i]; } outFile << endl; } else{ outFile << setprecision(6) << data[0] << setprecision(iters.length()); for (int i = 1; i< data.size(); i++) { outFile << '\t' << data[i]; } outFile << endl; } } catch(exception& e) { m->errorOut(e, "ColumnFile", "output"); exit(1); } } /***********************************************************************/ void ColumnFile::resetFile(){ try { if(counter != 0){ outFile.close(); inFile.close(); } else{ outFile.close(); } counter = 1; m->mothurRemove(inName); renameOk = rename(outName.c_str(), inName.c_str()); //renameFile(outName, inName); //checks to make sure user was able to rename and remove successfully if ((renameOk != 0)) { m->mothurOut("Unable to rename " + outName); m->mothurOutEndLine(); perror(" : "); } } catch(exception& e) { m->errorOut(e, 
"ColumnFile", "resetFile"); exit(1); } } /***********************************************************************/ /***********************************************************************/ SharedThreeColumnFile::~SharedThreeColumnFile(){ inFile.close(); outFile.close(); m->mothurRemove(outName); } /***********************************************************************/ void SharedThreeColumnFile::initFile(string label){ try { if(counter != 0){ m->openOutputFile(outName, outFile); m->openInputFile(inName, inFile); string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << '\t' << label << "\tlci\thci" << endl; } else{ m->openOutputFile(outName, outFile); outFile << "numsampled\t" << groupLabel << '\t' << label << "\tlci\thci" << endl; } outFile.setf(ios::fixed, ios::floatfield); outFile.setf(ios::showpoint); } catch(exception& e) { m->errorOut(e, "SharedThreeColumnFile", "initFile"); exit(1); } } /***********************************************************************/ void SharedThreeColumnFile::output(int nSeqs, vector data){ try { if(counter != 0){ string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << setprecision(4) << '\t' << data[0] << '\t' << data[1] << '\t' << data[2] << endl; } else{ outFile << numGroup << setprecision(4) << '\t' << data[0] << '\t' << data[1] << '\t' << data[2] << endl; numGroup++; } } catch(exception& e) { m->errorOut(e, "SharedThreeColumnFile", "output"); exit(1); } } /***********************************************************************/ void SharedThreeColumnFile::resetFile(){ try { if(counter != 0){ outFile.close(); inFile.close(); } else{ outFile.close(); } counter = 1; m->mothurRemove(inName); renameOk = rename(outName.c_str(), inName.c_str()); //renameFile(outName, inName); //checks to make sure user was able to rename and remove successfully if ((renameOk != 0)) { m->mothurOut("Unable to rename " + outName); m->mothurOutEndLine(); perror(" : "); } } catch(exception& e) { 
m->errorOut(e, "SharedThreeColumnFile", "resetFile"); exit(1); } } /***********************************************************************/ /***********************************************************************/ OneColumnFile::~OneColumnFile(){ inFile.close(); outFile.close(); m->mothurRemove(outName); } /***********************************************************************/ void OneColumnFile::initFile(string label){ try { if(counter != 0){ m->openOutputFile(outName, outFile); m->openInputFile(inName, inFile); string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << '\t' << label << endl; } else{ m->openOutputFile(outName, outFile); outFile << "numsampled\t" << label << endl; } outFile.setf(ios::fixed, ios::floatfield); outFile.setf(ios::showpoint); } catch(exception& e) { m->errorOut(e, "OneColumnFile", "initFile"); exit(1); } } /***********************************************************************/ void OneColumnFile::output(int nSeqs, vector data){ try { if(counter != 0){ string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << setprecision(4) << '\t' << data[0] << endl; } else{ outFile << nSeqs << setprecision(4) << '\t' << data[0] << endl; } } catch(exception& e) { m->errorOut(e, "OneColumnFile", "output"); exit(1); } } /***********************************************************************/ void OneColumnFile::resetFile(){ try { if(counter != 0){ outFile.close(); inFile.close(); }else{ outFile.close(); } counter = 1; m->mothurRemove(inName); renameOk = rename(outName.c_str(), inName.c_str()); //renameFile(outName, inName); //checks to make sure user was able to rename and remove successfully if ((renameOk != 0)) { m->mothurOut("Unable to rename " + outName); m->mothurOutEndLine(); perror(" : "); } } catch(exception& e) { m->errorOut(e, "OneColumnFile", "resetFile"); exit(1); } } /***********************************************************************/ 
/***********************************************************************/ SharedOneColumnFile::~SharedOneColumnFile(){ inFile.close(); outFile.close(); m->mothurRemove(outName); } /***********************************************************************/ void SharedOneColumnFile::initFile(string label){ try { if(counter != 0){ m->openOutputFile(outName, outFile); m->openInputFile(inName, inFile); string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << '\t' << label << endl; } else{ m->openOutputFile(outName, outFile); outFile << "sampled\t" << label << endl; } outFile.setf(ios::fixed, ios::floatfield); outFile.setf(ios::showpoint); } catch(exception& e) { m->errorOut(e, "SharedOneColumnFile", "initFile"); exit(1); } } /***********************************************************************/ void SharedOneColumnFile::output(int nSeqs, vector data){ try { string dataOutput; float sam; sam = data[0]; dataOutput = ""; for (int i = 0; i < data.size(); i++) { dataOutput = dataOutput + "\t" + toString(data[i]); } if(counter != 0){ string inputBuffer; inputBuffer = m->getline(inFile); outFile << inputBuffer << setprecision(2) << '\t' << dataOutput << endl; } else{ outFile << nSeqs << setprecision(2) << '\t' << dataOutput << endl; } } catch(exception& e) { m->errorOut(e, "SharedOneColumnFile", "output"); exit(1); } } /***********************************************************************/ void SharedOneColumnFile::resetFile(){ try { if(counter != 0){ outFile.close(); inFile.close(); } else{ outFile.close(); } counter = 1; m->mothurRemove(inName); renameOk = rename(outName.c_str(), inName.c_str()); //renameFile(outName, inName); //checks to make sure user was able to rename and remove successfully if ((renameOk != 0)) { m->mothurOut("Unable to rename " + outName); m->mothurOutEndLine(); perror(" : "); } } catch(exception& e) { m->errorOut(e, "SharedOneColumnFile", "resetFile"); exit(1); } } 
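/*
 * Illustrative sketch only, not part of mothur. Each initFile() above sets the
 * ios::fixed and ios::showpoint flags on the output stream, and the output() methods
 * then apply setprecision(4), so every value is written with exactly four digits
 * after the decimal point. The helper below reproduces that combination on a string
 * stream. The name formatRow is an assumption made for this example.
 */

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a mothur function): build one tab-separated row
// using the same stream flags as initFile() and the same precision as output().
std::string formatRow(int nSeqs, const std::vector<double>& data) {
    std::ostringstream out;
    out.setf(std::ios::fixed, std::ios::floatfield);  // fixed notation, never scientific
    out.setf(std::ios::showpoint);                    // always print the decimal point
    out << nSeqs << std::setprecision(4);             // integers are unaffected by precision
    for (std::size_t i = 0; i < data.size(); i++) { out << '\t' << data[i]; }
    return out.str();
}
```

/*
 * With nSeqs = 10 and data = {1.5, 2.0, 3.25} the row reads "10", "1.5000", "2.0000",
 * "3.2500" separated by tabs. Note that values converted to text before they reach the
 * stream (as with toString() in SharedOneColumnFile::output) are not touched by setprecision.
 */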
/***********************************************************************/
mothur-1.36.1/source/fileoutput.h000066400000000000000000000065701255543666200170170ustar00rootroot00000000000000
#ifndef FILEOUTPUT_H
#define FILEOUTPUT_H

#include "mothur.h"
#include "mothurout.h"

/***********************************************************************/

class FileOutput {

public:
    FileOutput(){ m = MothurOut::getInstance(); }
    virtual ~FileOutput(){};

    virtual void initFile(string) = 0;
    virtual void initFile(string, vector<string>) = 0;
    virtual void output(int, vector<double>) = 0;
    virtual void output(vector<double>) = 0;
    virtual void resetFile() = 0;
    virtual string getFileName() = 0;

protected:
    int renameOk;
    MothurOut* m;
};

/***********************************************************************/

class ThreeColumnFile : public FileOutput {

public:
    ThreeColumnFile(string n) : FileOutput(), inName(n), counter(0), outName(n + ".temp") { };
    ~ThreeColumnFile();
    void initFile(string);
    void output(int, vector<double>);
    void resetFile();
    string getFileName() { return inName; };

    void initFile(string, vector<string>){};
    void output(vector<double>) {};

private:
    string inName;
    string outName;
    ifstream inFile;
    ofstream outFile;
    int counter;
};

/***********************************************************************/

class OneColumnFile : public FileOutput {

public:
    OneColumnFile(string n) : inName(n), counter(0), outName(n + ".temp") {};
    ~OneColumnFile();
    void output(int, vector<double>);
    void initFile(string);
    void resetFile();
    string getFileName() { return inName; };

    void initFile(string, vector<string>) {};
    void output(vector<double>) {};

private:
    string outName;
    ifstream inFile;
    string inName;
    ofstream outFile;
    int counter;
};

/***********************************************************************/

class SharedOneColumnFile : public FileOutput {

public:
    SharedOneColumnFile(string n) : inName(n), counter(0), outName(n + ".temp") {};
    ~SharedOneColumnFile();
    void output(int, vector<double>);
    void initFile(string);
    void resetFile();
    string getFileName() { return inName; };

    void initFile(string, vector<string>) {};
    void output(vector<double>) {};

private:
    string outName;
    ifstream inFile;
    string inName;
    ofstream outFile;
    int counter;
};

/***********************************************************************/

class SharedThreeColumnFile : public FileOutput {

public:
    SharedThreeColumnFile(string n, string groups) : FileOutput(), groupLabel(groups), inName(n), counter(0), numGroup(1), outName(n + ".temp") { };
    ~SharedThreeColumnFile();
    void initFile(string);
    void output(int, vector<double>);
    void resetFile();
    string getFileName() { return inName; };

    void initFile(string, vector<string>) {};
    void output(vector<double>) {};

private:
    string inName, groupLabel;
    string outName;
    ifstream inFile;
    ofstream outFile;
    int counter, numGroup;
};

/***********************************************************************/
//used by parsimony, unifrac.weighted and unifrac.unweighted
class ColumnFile : public FileOutput {

public:
    ColumnFile(string n, string i) : FileOutput(), iters(i), inName(n), counter(0), outName(n + ".temp") {};
    ~ColumnFile();

    //to make compatible with parent class
    void output(int, vector<double>){};
    void initFile(string){};

    void initFile(string, vector<string>);
    void output(vector<double>);
    void resetFile();
    string getFileName() { return inName; };

private:
    string inName;
    string outName;
    ifstream inFile;
    ofstream outFile;
    int counter;
    string iters;
};

#endif
mothur-1.36.1/source/gotohoverlap.cpp000066400000000000000000000072711255543666200176620ustar00rootroot00000000000000
/*
 *  gotohoverlap.cpp
 *
 *
 *  Created by Pat Schloss on 12/15/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 *  This class is an Alignment child class that implements the Gotoh pairwise alignment algorithm as described in:
 *
 *  Gotoh O. 1982. An improved algorithm for matching biological sequences. J. Mol. Biol. 162:705-8.
 *  Myers, EW & Miller, W. 1988. Optimal alignments in linear space. Comput Appl Biosci. 4:11-7.
* * This method is nice because it allows for an affine gap penalty to be assessed, which is analogous to what is used * in blast and is an alternative to Needleman-Wunsch, which only charges the same penalty for each gap position. * Because this method typically has problems at the ends when two sequences do not full overlap, we employ a separate * method to fix the ends (see Overlap class documentation) * */ #include "alignmentcell.hpp" #include "overlap.hpp" #include "alignment.hpp" #include "gotohoverlap.hpp" /**************************************************************************************************/ GotohOverlap::GotohOverlap(float gO, float gE, float f, float mm, int r) : gapOpen(gO), gapExtend(gE), match(f), mismatch(mm), Alignment(r) { try { for(int i=1;ierrorOut(e, "GotohOverlap", "GotohOverlap"); exit(1); } } /**************************************************************************************************/ void GotohOverlap::align(string A, string B){ try { seqA = ' ' + A; lA = seqA.length(); // the algorithm requires that the first character be a dummy value seqB = ' ' + B; lB = seqB.length(); // the algorithm requires that the first character be a dummy value for(int i=1;i alignment[i][j].dValue){ if(alignment[i][j].iValue > diagonal){ alignment[i][j].cValue = alignment[i][j].iValue; alignment[i][j].prevCell = 'l'; } else{ alignment[i][j].cValue = diagonal; alignment[i][j].prevCell = 'd'; } } else{ if(alignment[i][j].dValue > diagonal){ alignment[i][j].cValue = alignment[i][j].dValue; alignment[i][j].prevCell = 'u'; } else{ alignment[i][j].cValue = diagonal; alignment[i][j].prevCell = 'd'; } } } } Overlap over; over.setOverlap(alignment, lA, lB, 0); // Fix the gaps at the ends of the sequences traceBack(); // Construct the alignment and set seqAaln and seqBaln } catch(exception& e) { m->errorOut(e, "GotohOverlap", "align"); exit(1); } } /**************************************************************************************************/ 
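/*
 * Illustrative sketch only, not mothur's implementation. The header comment of this
 * file describes the Gotoh algorithm as a way to charge an affine gap penalty: opening
 * a gap costs more than extending one. The function below is a textbook, score-only
 * version of that recurrence for a plain global alignment. It does not reproduce the
 * end-gap handling done by the Overlap class, and it does not trace back an alignment.
 * The name gotohScore and the convention that penalties are passed as negative numbers
 * (first gap position costs gapOpen, each further position costs gapExtend) are
 * assumptions made for this example.
 */

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a mothur function): best global alignment score of
// a and b under an affine gap penalty.
//   C[i][j]  best score for a[0..i) vs b[0..j), any final column
//   D[i][j]  best score when the last column is a gap in b (move "up")
//   I[i][j]  best score when the last column is a gap in a (move "left")
float gotohScore(const std::string& a, const std::string& b,
                 float gapOpen, float gapExtend, float match, float mismatch) {
    const float NEG = -1.0e30f;                       // stands in for minus infinity
    const std::size_t n = a.size();
    const std::size_t m = b.size();

    std::vector<std::vector<float> > C(n + 1, std::vector<float>(m + 1, NEG));
    std::vector<std::vector<float> > D = C;
    std::vector<std::vector<float> > I = C;

    C[0][0] = 0.0f;
    for (std::size_t i = 1; i <= n; i++) {            // first column: one long gap in b
        D[i][0] = gapOpen + static_cast<float>(i - 1) * gapExtend;
        C[i][0] = D[i][0];
    }
    for (std::size_t j = 1; j <= m; j++) {            // first row: one long gap in a
        I[0][j] = gapOpen + static_cast<float>(j - 1) * gapExtend;
        C[0][j] = I[0][j];
    }

    for (std::size_t i = 1; i <= n; i++) {
        for (std::size_t j = 1; j <= m; j++) {
            // open a new gap from the best cell, or extend an existing gap
            D[i][j] = std::max(C[i - 1][j] + gapOpen, D[i - 1][j] + gapExtend);
            I[i][j] = std::max(C[i][j - 1] + gapOpen, I[i][j - 1] + gapExtend);

            const float diagonal = C[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? match : mismatch);
            C[i][j] = std::max(diagonal, std::max(D[i][j], I[i][j]));
        }
    }
    return C[n][m];
}
```

/*
 * Keeping separate "gap already open" scores (D and I) next to the overall score (C)
 * is what lets one long gap cost less than several short ones. The 'u', 'l' and 'd'
 * prevCell markers in align() above play the same role as choosing D, I or the
 * diagonal in the last step of the inner loop here.
 */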
mothur-1.36.1/source/gotohoverlap.hpp000066400000000000000000000026031255543666200176610ustar00rootroot00000000000000#ifndef GOTOHOVERLAP_H #define GOTOHOVERLAP_H /* * gotohoverlap.hpp * * * Created by Pat Schloss on 12/15/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This class is an Alignment child class that implements the Gotoh pairwise alignment algorithm as described in: * * Gotoh O. 1982. An improved algorithm for matching biological sequences. J. Mol. Biol. 162:705-8. * Myers, EW & Miller, W. 1988. Optimal alignments in linear space. Comput Appl Biosci. 4:11-7. * * This method is useful because it allows an affine gap penalty to be assessed, which is analogous to what is used * in BLAST and is an alternative to Needleman-Wunsch, which charges the same penalty for every gap position. * Because this method typically has problems at the ends when two sequences do not fully overlap, we employ a separate * method to fix the ends (see Overlap class documentation) * */ #include "mothur.h" #include "alignment.hpp" /**************************************************************************************************/ class GotohOverlap : public Alignment { public: GotohOverlap(float, float, float, float, int); void align(string, string); ~GotohOverlap() {} private: float gapOpen; float gapExtend; float match; float mismatch; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/hcluster.cpp000066400000000000000000000617111255543666200170010ustar00rootroot00000000000000/* * hcluster.cpp * Mothur * * Created by westcott on 10/13/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "hcluster.h" #include "rabundvector.hpp" #include "listvector.hpp" /***********************************************************************/ HCluster::HCluster(RAbundVector* rav, ListVector* lv, string ms, string d, NameAssignment* n, float c) : rabund(rav), list(lv), method(ms), distfile(d), nameMap(n), cutoff(c) { try { m = MothurOut::getInstance(); mapWanted = false; exitedBreak = false; numSeqs = list->getNumSeqs(); //initialize cluster array for (int i = 0; i < numSeqs; i++) { clusterNode temp(1, -1, i); clusterArray.push_back(temp); } if ((method == "furthest") || (method == "nearest")) { m->openInputFile(distfile, filehandle); }else{ processFile(); } } catch(exception& e) { m->errorOut(e, "HCluster", "HCluster"); exit(1); } } /***********************************************************************/ void HCluster::clusterBins(){ try { //cout << smallCol << '\t' << smallRow << '\t' << smallDist << '\t' << rabund->get(clusterArray[smallRow].smallChild) << '\t' << rabund->get(clusterArray[smallCol].smallChild); rabund->set(clusterArray[smallCol].smallChild, rabund->get(clusterArray[smallRow].smallChild)+rabund->get(clusterArray[smallCol].smallChild)); rabund->set(clusterArray[smallRow].smallChild, 0); rabund->setLabel(toString(smallDist)); //cout << '\t' << rabund->get(clusterArray[smallRow].smallChild) << '\t' << rabund->get(clusterArray[smallCol].smallChild) << endl; } catch(exception& e) { m->errorOut(e, "HCluster", "clusterBins"); exit(1); } } /***********************************************************************/ void HCluster::clusterNames(){ try { ///cout << smallCol << '\t' << smallRow << '\t' << smallDist << '\t' << list->get(clusterArray[smallRow].smallChild) << '\t' << list->get(clusterArray[smallCol].smallChild); if (mapWanted) { updateMap(); } list->set(clusterArray[smallCol].smallChild, list->get(clusterArray[smallRow].smallChild)+','+list->get(clusterArray[smallCol].smallChild)); list->set(clusterArray[smallRow].smallChild, ""); 
list->setLabel(toString(smallDist)); //cout << '\t' << list->get(clusterArray[smallRow].smallChild) << '\t' << list->get(clusterArray[smallCol].smallChild) << endl; } catch(exception& e) { m->errorOut(e, "HCluster", "clusterNames"); exit(1); } } /***********************************************************************/ int HCluster::getUpmostParent(int node){ try { while (clusterArray[node].parent != -1) { node = clusterArray[node].parent; } return node; } catch(exception& e) { m->errorOut(e, "HCluster", "getUpmostParent"); exit(1); } } /***********************************************************************/ void HCluster::printInfo(){ try { cout << "link table" << endl; for (itActive = activeLinks.begin(); itActive!= activeLinks.end(); itActive++) { cout << itActive->first << " = " << itActive->second << endl; } cout << endl; for (int i = 0; i < linkTable.size(); i++) { cout << i << '\t'; for (it = linkTable[i].begin(); it != linkTable[i].end(); it++) { cout << it->first << '-' << it->second << '\t' ; } cout << endl; } cout << endl << "clusterArray" << endl; for (int i = 0; i < clusterArray.size(); i++) { cout << i << '\t' << clusterArray[i].numSeq << '\t' << clusterArray[i].parent << '\t' << clusterArray[i].smallChild << endl; } cout << endl; } catch(exception& e) { m->errorOut(e, "HCluster", "printInfo"); exit(1); } } /***********************************************************************/ int HCluster::makeActive() { try { int linkValue = 1; itActive = activeLinks.find(smallRow); it2Active = activeLinks.find(smallCol); if ((itActive == activeLinks.end()) && (it2Active == activeLinks.end())) { //both are not active so add them int size = linkTable.size(); map<int, int> temp; map<int, int> temp2; //add link to each other temp[smallRow] = 1; // 1 2 temp2[smallCol] = 1; // 1 0 1 // 2 1 0 linkTable.push_back(temp); linkTable.push_back(temp2); //add to activeLinks activeLinks[smallRow] = size; activeLinks[smallCol] = size+1; }else if ((itActive != activeLinks.end()) && (it2Active 
== activeLinks.end())) { //smallRow is active, smallCol is not int size = linkTable.size(); int alreadyActiveRow = itActive->second; map temp; //add link to eachother temp[smallRow] = 1; // 6 2 3 5 linkTable.push_back(temp); // 6 0 1 2 0 linkTable[alreadyActiveRow][smallCol] = 1; // 2 1 0 1 1 // 3 2 1 0 0 // 5 0 1 0 0 //add to activeLinks activeLinks[smallCol] = size; }else if ((itActive == activeLinks.end()) && (it2Active != activeLinks.end())) { //smallCol is active, smallRow is not int size = linkTable.size(); int alreadyActiveCol = it2Active->second; map temp; //add link to eachother temp[smallCol] = 1; // 6 2 3 5 linkTable.push_back(temp); // 6 0 1 2 0 linkTable[alreadyActiveCol][smallRow] = 1; // 2 1 0 1 1 // 3 2 1 0 0 // 5 0 1 0 0 //add to activeLinks activeLinks[smallRow] = size; }else { //both are active so add one int row = itActive->second; int col = it2Active->second; linkTable[row][smallCol]++; linkTable[col][smallRow]++; linkValue = linkTable[row][smallCol]; } return linkValue; } catch(exception& e) { m->errorOut(e, "HCluster", "makeActive"); exit(1); } } /***********************************************************************/ void HCluster::updateArrayandLinkTable() { try { //if cluster was made update clusterArray and linkTable int size = clusterArray.size(); //add new node clusterNode temp(clusterArray[smallRow].numSeq + clusterArray[smallCol].numSeq, -1, clusterArray[smallCol].smallChild); clusterArray.push_back(temp); //update child nodes clusterArray[smallRow].parent = size; clusterArray[smallCol].parent = size; if (method == "furthest") { //update linkTable by merging clustered rows and columns int rowSpot = activeLinks[smallRow]; int colSpot = activeLinks[smallCol]; //fix old rows for (int i = 0; i < linkTable.size(); i++) { //check if they are in map it = linkTable[i].find(smallRow); it2 = linkTable[i].find(smallCol); if ((it!=linkTable[i].end()) && (it2!=linkTable[i].end())) { //they are both there linkTable[i][size] = 
linkTable[i][smallRow]+linkTable[i][smallCol]; linkTable[i].erase(smallCol); //delete col row linkTable[i].erase(smallRow); //delete col row }else if ((it==linkTable[i].end()) && (it2!=linkTable[i].end())) { //only col linkTable[i][size] = linkTable[i][smallCol]; linkTable[i].erase(smallCol); //delete col }else if ((it!=linkTable[i].end()) && (it2==linkTable[i].end())) { //only row linkTable[i][size] = linkTable[i][smallRow]; linkTable[i].erase(smallRow); //delete col } } //merge their values for (it = linkTable[rowSpot].begin(); it != linkTable[rowSpot].end(); it++) { it2 = linkTable[colSpot].find(it->first); //does the col also have this if (it2 == linkTable[colSpot].end()) { //not there so add it linkTable[colSpot][it->first] = it->second; }else { //merge them linkTable[colSpot][it->first] = it->second + it2->second; } } linkTable[colSpot].erase(size); linkTable.erase(linkTable.begin()+rowSpot); //delete row //update activerows activeLinks.erase(smallRow); activeLinks.erase(smallCol); activeLinks[size] = colSpot; //adjust everybody elses spot since you deleted - time vs. space for (itActive = activeLinks.begin(); itActive != activeLinks.end(); itActive++) { if (itActive->second > rowSpot) { activeLinks[itActive->first]--; } } } } catch(exception& e) { m->errorOut(e, "HCluster", "updateArrayandLinkTable"); exit(1); } } /***********************************************************************/ double HCluster::update(int row, int col, float distance){ try { bool cluster = false; smallRow = row; smallCol = col; smallDist = distance; //find upmost parent of row and col smallRow = getUpmostParent(smallRow); smallCol = getUpmostParent(smallCol); //you don't want to cluster with yourself if (smallRow != smallCol) { if ((method == "furthest") || (method == "nearest")) { //can we cluster??? 
if (method == "nearest") { cluster = true; } else{ //assume furthest //are they active in the link table int linkValue = makeActive(); //after this point this nodes info is active in linkTable if (linkValue == (clusterArray[smallRow].numSeq * clusterArray[smallCol].numSeq)) { cluster = true; } } if (cluster) { updateArrayandLinkTable(); clusterBins(); clusterNames(); } }else { cluster = true; updateArrayandLinkTable(); clusterBins(); clusterNames(); combineFile(); } } return cutoff; //printInfo(); } catch(exception& e) { m->errorOut(e, "HCluster", "update"); exit(1); } } /***********************************************************************/ void HCluster::setMapWanted(bool ms) { try { mapWanted = ms; //initialize map for (int i = 0; i < list->getNumBins(); i++) { //parse bin string names = list->get(i); while (names.find_first_of(',') != -1) { //get name from bin string name = names.substr(0,names.find_first_of(',')); //save name and bin number seq2Bin[name] = i; names = names.substr(names.find_first_of(',')+1, names.length()); } //get last name seq2Bin[names] = i; } } catch(exception& e) { m->errorOut(e, "HCluster", "setMapWanted"); exit(1); } } /***********************************************************************/ void HCluster::updateMap() { try { //update location of seqs in smallRow since they move to smallCol now string names = list->get(clusterArray[smallRow].smallChild); while (names.find_first_of(',') != -1) { //get name from bin string name = names.substr(0,names.find_first_of(',')); //save name and bin number seq2Bin[name] = clusterArray[smallCol].smallChild; names = names.substr(names.find_first_of(',')+1, names.length()); } //get last name seq2Bin[names] = clusterArray[smallCol].smallChild; } catch(exception& e) { m->errorOut(e, "HCluster", "updateMap"); exit(1); } } //********************************************************************************************************************** vector HCluster::getSeqs(){ try { vector sameSeqs; if 
((method == "furthest") || (method == "nearest")) { sameSeqs = getSeqsFNNN(); }else{ sameSeqs = getSeqsAN(); } return sameSeqs; } catch(exception& e) { m->errorOut(e, "HCluster", "getSeqs"); exit(1); } } //********************************************************************************************************************** vector HCluster::getSeqsFNNN(){ try { string firstName, secondName; float distance, prevDistance; vector sameSeqs; prevDistance = -1; //if you are not at the beginning of the file if (exitedBreak) { sameSeqs.push_back(next); prevDistance = next.dist; exitedBreak = false; } //get entry while (!filehandle.eof()) { filehandle >> firstName >> secondName >> distance; m->gobble(filehandle); //save first one if (prevDistance == -1) { prevDistance = distance; } map::iterator itA = nameMap->find(firstName); map::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } //using cutoff if (distance > cutoff) { break; } if (distance != -1) { //-1 means skip me //are the distances the same if (distance == prevDistance) { //save in vector seqDist temp(itA->second, itB->second, distance); sameSeqs.push_back(temp); exitedBreak = false; }else{ next.seq1 = itA->second; next.seq2 = itB->second; next.dist = distance; exitedBreak = true; break; } } } //rndomize matching dists random_shuffle(sameSeqs.begin(), sameSeqs.end()); return sameSeqs; } catch(exception& e) { m->errorOut(e, "HCluster", "getSeqsFNNN"); exit(1); } } //********************************************************************************************************************** vector HCluster::getSeqsAN(){ try { int firstName, secondName; float prevDistance; vector sameSeqs; prevDistance = -1; m->openInputFile(distfile, filehandle, "no 
error"); //is the smallest value in mergedMin or the distfile? float mergedMinDist = 10000; float distance = 10000; if (mergedMin.size() > 0) { mergedMinDist = mergedMin[0].dist; } if (!filehandle.eof()) { filehandle >> firstName >> secondName >> distance; m->gobble(filehandle); //save first one if (prevDistance == -1) { prevDistance = distance; } if (distance != -1) { //-1 means skip me seqDist temp(firstName, secondName, distance); sameSeqs.push_back(temp); }else{ distance = 10000; } } if (mergedMinDist < distance) { //get minimum distance from mergedMin //remove distance we saved from file sameSeqs.clear(); prevDistance = mergedMinDist; for (int i = 0; i < mergedMin.size(); i++) { if (mergedMin[i].dist == prevDistance) { sameSeqs.push_back(mergedMin[i]); }else { break; } } }else{ //get minimum from file //get entry while (!filehandle.eof()) { filehandle >> firstName >> secondName >> distance; m->gobble(filehandle); if (prevDistance == -1) { prevDistance = distance; } if (distance != -1) { //-1 means skip me //are the distances the same if (distance == prevDistance) { //save in vector seqDist temp(firstName, secondName, distance); sameSeqs.push_back(temp); }else{ break; } } } } filehandle.close(); //randomize matching dists random_shuffle(sameSeqs.begin(), sameSeqs.end()); //can only return one value since once these are merged the other distances in sameSeqs may have changed vector temp; if (sameSeqs.size() > 0) { temp.push_back(sameSeqs[0]); } return temp; } catch(exception& e) { m->errorOut(e, "HCluster", "getSeqsAN"); exit(1); } } /***********************************************************************/ int HCluster::combineFile() { try { //int bufferSize = 64000; //512k - this should be a variable that the user can set to optimize code to their hardware //char* inputBuffer; //inputBuffer = new char[bufferSize]; //size_t numRead; string tempDistFile = distfile + ".temp"; ofstream out; m->openOutputFile(tempDistFile, out); //FILE* in; //in = 
fopen(distfile.c_str(), "rb"); ifstream in; m->openInputFile(distfile, in, "no error"); int first, second; float dist; vector< map > smallRowColValues; smallRowColValues.resize(2); //0 = row, 1 = col int count = 0; //go through file pulling out distances related to rows merging //if mergedMin contains distances add those back into file //bool done = false; //partialDist = ""; //while ((numRead = fread(inputBuffer, 1, bufferSize, in)) != 0) { //cout << "number of char read = " << numRead << endl; //cout << inputBuffer << endl; //if (numRead < bufferSize) { done = true; } //parse input into individual distances //int spot = 0; //string outputString = ""; //while(spot < numRead) { //cout << "spot = " << spot << endl; // seqDist nextDist = getNextDist(inputBuffer, spot, bufferSize); //you read a partial distance // if (nextDist.seq1 == -1) { break; } while (!in.eof()) { //first = nextDist.seq1; second = nextDist.seq2; dist = nextDist.dist; //cout << "next distance = " << first << '\t' << second << '\t' << dist << endl; //since file is sorted and mergedMin is sorted //you can put the smallest distance from each through the code below and keep the file sorted in >> first >> second >> dist; m->gobble(in); if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempDistFile); return 0; } //while there are still values in mergedMin that are smaller than the distance read from file while (count < mergedMin.size()) { //is the distance in mergedMin smaller than from the file if (mergedMin[count].dist < dist) { //is this a distance related to the columns merging? 
//if yes, save in memory if ((mergedMin[count].seq1 == smallRow) && (mergedMin[count].seq2 == smallCol)) { //do nothing this is the smallest distance from last time }else if (mergedMin[count].seq1 == smallCol) { smallRowColValues[1][mergedMin[count].seq2] = mergedMin[count].dist; }else if (mergedMin[count].seq2 == smallCol) { smallRowColValues[1][mergedMin[count].seq1] = mergedMin[count].dist; }else if (mergedMin[count].seq1 == smallRow) { smallRowColValues[0][mergedMin[count].seq2] = mergedMin[count].dist; }else if (mergedMin[count].seq2 == smallRow) { smallRowColValues[0][mergedMin[count].seq1] = mergedMin[count].dist; }else { //if no, write to temp file //outputString += toString(mergedMin[count].seq1) + '\t' + toString(mergedMin[count].seq2) + '\t' + toString(mergedMin[count].dist) + '\n'; //if (mergedMin[count].dist < cutoff) { out << mergedMin[count].seq1 << '\t' << mergedMin[count].seq2 << '\t' << mergedMin[count].dist << endl; //} } count++; }else{ break; } } //is this a distance related to the columns merging? //if yes, save in memory if ((first == smallRow) && (second == smallCol)) { //do nothing this is the smallest distance from last time }else if (first == smallCol) { smallRowColValues[1][second] = dist; }else if (second == smallCol) { smallRowColValues[1][first] = dist; }else if (first == smallRow) { smallRowColValues[0][second] = dist; }else if (second == smallRow) { smallRowColValues[0][first] = dist; }else { //if no, write to temp file //outputString += toString(first) + '\t' + toString(second) + '\t' + toString(dist) + '\n'; //if (dist < cutoff) { out << first << '\t' << second << '\t' << dist << endl; //} } } //out << outputString; //if(done) { break; } //} //fclose(in); in.close(); //if values in mergedMin are larger than the the largest in file then while (count < mergedMin.size()) { //is this a distance related to the columns merging? 
//if yes, save in memory if ((mergedMin[count].seq1 == smallRow) && (mergedMin[count].seq2 == smallCol)) { //do nothing this is the smallest distance from last time }else if (mergedMin[count].seq1 == smallCol) { smallRowColValues[1][mergedMin[count].seq2] = mergedMin[count].dist; }else if (mergedMin[count].seq2 == smallCol) { smallRowColValues[1][mergedMin[count].seq1] = mergedMin[count].dist; }else if (mergedMin[count].seq1 == smallRow) { smallRowColValues[0][mergedMin[count].seq2] = mergedMin[count].dist; }else if (mergedMin[count].seq2 == smallRow) { smallRowColValues[0][mergedMin[count].seq1] = mergedMin[count].dist; }else { //if no, write to temp file //if (mergedMin[count].dist < cutoff) { out << mergedMin[count].seq1 << '\t' << mergedMin[count].seq2 << '\t' << mergedMin[count].dist << endl; //} } count++; } out.close(); mergedMin.clear(); //rename tempfile to distfile m->mothurRemove(distfile); rename(tempDistFile.c_str(), distfile.c_str()); //cout << "remove = "<< renameOK << " rename = " << ok << endl; //merge clustered rows averaging the distances map::iterator itMerge; map::iterator it2Merge; for(itMerge = smallRowColValues[0].begin(); itMerge != smallRowColValues[0].end(); itMerge++) { //does smallRowColValues[1] have a distance to this seq too? 
it2Merge = smallRowColValues[1].find(itMerge->first); float average; if (it2Merge != smallRowColValues[1].end()) { //if yes, then average //average if (method == "average") { int total = clusterArray[smallRow].numSeq + clusterArray[smallCol].numSeq; average = ((clusterArray[smallRow].numSeq * itMerge->second) + (clusterArray[smallCol].numSeq * it2Merge->second)) / (float) total; }else { //weighted average = ((itMerge->second * 1.0) + (it2Merge->second * 1.0)) / (float) 2.0; } smallRowColValues[1].erase(it2Merge); seqDist temp(clusterArray[smallRow].parent, itMerge->first, average); mergedMin.push_back(temp); }else { //can't find value so update cutoff if (cutoff > itMerge->second) { cutoff = itMerge->second; } } } //update cutoff for(itMerge = smallRowColValues[1].begin(); itMerge != smallRowColValues[1].end(); itMerge++) { if (cutoff > itMerge->second) { cutoff = itMerge->second; } } //sort merged values sort(mergedMin.begin(), mergedMin.end(), compareSequenceDistance); return 0; } catch(exception& e) { m->errorOut(e, "HCluster", "combineFile"); exit(1); } } /*********************************************************************** seqDist HCluster::getNextDist(char* buffer, int& index, int size){ try { seqDist next; int indexBefore = index; string first, second, distance; first = ""; second = ""; distance = ""; int tabCount = 0; //cout << "partial = " << partialDist << endl; if (partialDist != "") { //read what you can, you know it is less than a whole distance. 
for (int i = 0; i < partialDist.size(); i++) { if (tabCount == 0) { if (partialDist[i] == '\t') { tabCount++; } else { first += partialDist[i]; } }else if (tabCount == 1) { if (partialDist[i] == '\t') { tabCount++; } else { second += partialDist[i]; } }else if (tabCount == 2) { distance += partialDist[i]; } } partialDist = ""; } //try to get another distance bool gotDist = false; while (index < size) { if ((buffer[index] == 10) || (buffer[index] == 13)) { //newline in unix or windows gotDist = true; //m->gobble space while (index < size) { if (isspace(buffer[index])) { index++; } else { break; } } break; }else{ if (tabCount == 0) { if (buffer[index] == '\t') { tabCount++; } else { first += buffer[index]; } }else if (tabCount == 1) { if (buffer[index] == '\t') { tabCount++; } else { second += buffer[index]; } }else if (tabCount == 2) { distance += buffer[index]; } index++; } } //there was not a whole distance in the buffer, ie. buffer = "1 2 0.01 2 3 0." //then you want to save the partial distance. 
if (!gotDist) { for (int i = indexBefore; i < size; i++) { partialDist += buffer[i]; } index = size + 1; next.seq1 = -1; next.seq2 = -1; next.dist = 0.0; }else{ int firstname, secondname; float dist; convert(first, firstname); convert(second, secondname); convert(distance, dist); next.seq1 = firstname; next.seq2 = secondname; next.dist = dist; } return next; } catch(exception& e) { m->errorOut(e, "HCluster", "getNextDist"); exit(1); } } ***********************************************************************/ int HCluster::processFile() { try { string firstName, secondName; float distance; ifstream in; m->openInputFile(distfile, in, "no error"); ofstream out; string outTemp = distfile + ".temp"; m->openOutputFile(outTemp, out); //get entry while (!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(outTemp); return 0; } in >> firstName >> secondName >> distance; m->gobble(in); map::iterator itA = nameMap->find(firstName); map::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } //using cutoff if (distance > cutoff) { break; } if (distance != -1) { //-1 means skip me out << itA->second << '\t' << itB->second << '\t' << distance << endl; } } in.close(); out.close(); m->mothurRemove(distfile); rename(outTemp.c_str(), distfile.c_str()); return 0; } catch(exception& e) { m->errorOut(e, "HCluster", "processFile"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/hcluster.h000066400000000000000000000034251255543666200164440ustar00rootroot00000000000000#ifndef HCLUSTER_H #define HCLUSTER_H /* * hcluster.h * Mothur * * Created by westcott on 10/13/09. * Copyright 2009 Schloss Lab. All rights reserved. 
* */ #include "mothur.h" #include "nameassignment.hpp" class RAbundVector; class ListVector; /***********************************************************************/ class HCluster { public: HCluster(RAbundVector*, ListVector*, string, string, NameAssignment*, float); ~HCluster(){}; double update(int, int, float); void setMapWanted(bool m); map getSeqtoBin() { return seq2Bin; } vector getSeqs(); protected: void clusterBins(); void clusterNames(); int getUpmostParent(int); int makeActive(); void printInfo(); void updateArrayandLinkTable(); void updateMap(); vector getSeqsFNNN(); vector getSeqsAN(); int combineFile(); int processFile(); //seqDist getNextDist(char*, int&, int); RAbundVector* rabund; ListVector* list; NameAssignment* nameMap; vector clusterArray; //note: the nearest and average neighbor method do not use the link table or active links vector< map > linkTable; // vector of maps - linkTable[1][6] = 2 would mean sequence in spot 1 has 2 links with sequence in 6 map activeLinks; //maps sequence to index in linkTable map::iterator it; map::iterator itActive; map::iterator it2Active; map::iterator it2; int numSeqs; int smallRow; int smallCol; float smallDist, cutoff; map seq2Bin; bool mapWanted, exitedBreak; seqDist next; string method, distfile; ifstream filehandle; vector mergedMin; string partialDist; MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/heatmap.cpp000066400000000000000000000650461255543666200165740ustar00rootroot00000000000000/* * heatmap.cpp * Mothur * * Created by Sarah Westcott on 3/25/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "heatmap.h" //********************************************************************************************************************** HeatMap::HeatMap(string sort, string scale, int num, int fsize, string dir, string i){ try { m = MothurOut::getInstance(); // format = globaldata->getFormat(); sorted = sort; scaler = scale; outputDir = dir; numOTU = num; fontSize = fsize; inputfile = i; } catch(exception& e) { m->errorOut(e, "HeatMap", "HeatMap"); exit(1); } } //********************************************************************************************************************** string HeatMap::getPic(RAbundVector* rabund) { try { int numBinsToDisplay = rabund->getNumBins(); if (numOTU != 0) { //user want to display a portion of the otus if (numOTU < numBinsToDisplay) { numBinsToDisplay = numOTU; } } //sort lookup so shared bins are on top if (sorted != "none") { sortRabund(rabund); } float maxRelAbund = 0.0; for(int i=0;isize();i++){ float relAbund = rabund->get(i) / (float)rabund->getNumSeqs(); if(relAbund > maxRelAbund){ maxRelAbund = relAbund; } } vector scaleRelAbund(numBinsToDisplay, ""); for(int i=0;iget(i) / (float)rabund->getNumSeqs(); if (m->control_pressed) { return "control"; } if (rabund->get(i) != 0) { //don't want log value of 0. if (scaler == "log10") { scaleRelAbund[i] = toHex(int(255 * log10(relAbund) / log10(maxRelAbund))) + "0000"; }else if (scaler == "log2") { scaleRelAbund[i] = toHex(int(255 * log2(relAbund) / log2(maxRelAbund))) + "0000"; }else if (scaler == "linear") { scaleRelAbund[i] = toHex(int(255 * relAbund / maxRelAbund)) + "0000"; }else { //if user enters invalid scaler option. 
scaleRelAbund[i] = toHex(int(255 * log10(relAbund / log10(maxRelAbund)))) + "0000"; } } else { scaleRelAbund[i] = "FFFFFF"; } } string filenamesvg = outputDir + m->getRootName(m->getSimpleName(inputfile)) + rabund->getLabel() + ".heatmap.bin.svg"; m->openOutputFile(filenamesvg, outsvg); //svg image outsvg << "\n"; outsvg << "\n"; //white backround outsvg << ""; outsvg << "Heatmap at distance " + rabund->getLabel() + "\n"; //output legend and color labels string color; int x = 0; int y = 103 + (numBinsToDisplay*5); printLegend(y, maxRelAbund); y = 70; for (int i = 0; i < scaleRelAbund.size(); i++) { if (m->control_pressed) { outsvg.close(); return "control"; } outsvg << "\n"; y += 5; } outsvg << "\n\n"; outsvg.close(); return filenamesvg; } catch(exception& e) { m->errorOut(e, "HeatMap", "getPic"); exit(1); } } //********************************************************************************************************************** string HeatMap::getPic(vector lookup) { try { int numBinsToDisplay = lookup[0]->size(); if (numOTU != 0) { //user want to display a portion of the otus if (numOTU < numBinsToDisplay) { numBinsToDisplay = numOTU; } } //sort lookup so shared bins are on top vector sortedLabels = m->currentSharedBinLabels; if (sorted != "none") { sortedLabels = sortSharedVectors(lookup); } vector > scaleRelAbund; vector maxRelAbund(lookup[0]->size(), 0.0); float superMaxRelAbund = 0; for(int i = 0; i < lookup.size(); i++){ for(int j=0; jsize(); j++){ float relAbund = lookup[i]->getAbundance(j) / (float)lookup[i]->getNumSeqs(); if(relAbund > maxRelAbund[i]){ maxRelAbund[i] = relAbund; } } if(maxRelAbund[i] > superMaxRelAbund){ superMaxRelAbund = maxRelAbund[i]; } } scaleRelAbund.resize(lookup.size()); for(int i=0;icontrol_pressed) { return "control"; } float relAbund = lookup[i]->getAbundance(j) / (float)lookup[i]->getNumSeqs(); if (lookup[i]->getAbundance(j) != 0) { //don't want log value of 0. 
if (scaler == "log10") { if (maxRelAbund[i] == 1) { maxRelAbund[i] -= 0.001; } if (relAbund == 1) { relAbund -= 0.001; } scaleRelAbund[i][j] = toHex(int(255 * log10(relAbund) / log10(maxRelAbund[i]))) + "0000"; }else if (scaler == "log2") { if (maxRelAbund[i] == 1) { maxRelAbund[i] -= 0.001; } if (relAbund == 1) { relAbund -= 0.001; } scaleRelAbund[i][j] = toHex(int(255 * log2(relAbund) / log2(maxRelAbund[i]))) + "0000"; }else if (scaler == "linear") { scaleRelAbund[i][j] = toHex(int(255 * relAbund / maxRelAbund[i])) + "0000"; }else { //if user enters invalid scaler option. if (maxRelAbund[i] == 1) { maxRelAbund[i] += 0.001; } scaleRelAbund[i][j] = toHex(int(255 * log10(relAbund / log10(maxRelAbund[i])))) + "0000"; } }else { scaleRelAbund[i][j] = "FFFFFF"; } } } string filenamesvg = outputDir + m->getRootName(m->getSimpleName(inputfile)) + lookup[0]->getLabel() + ".heatmap.bin.svg"; m->openOutputFile(filenamesvg, outsvg); int binHeight = 20; int labelBump = 100; int binWidth = 300; //svg image outsvg << "\n"; outsvg << "\n"; //white backround outsvg << ""; outsvg << "Heatmap at distance " + lookup[0]->getLabel() + "\n"; //column labels for (int h = 0; h < lookup.size()+1; h++) { if (h == 0) { string tempLabel = "OTU"; outsvg << "" + tempLabel + "\n"; }else { outsvg << "getGroup().length() / 2)+labelBump/2) + "\" y=\"50\">" + lookup[h-1]->getGroup() + "\n"; } } //output legend and color labels string color; int x = 0; int y = 103 + (numBinsToDisplay*binHeight); printLegend(y, superMaxRelAbund); y = 70; for (int i = 0; i < numBinsToDisplay; i++) { outsvg << "" + sortedLabels[i] + "\n"; x += labelBump; for (int j = 0; j < scaleRelAbund.size(); j++) { if (m->control_pressed) { outsvg.close(); return "control"; } outsvg << "\n"; x += binWidth; } x = 0; y += binHeight; } outsvg << "\n\n"; outsvg.close(); return filenamesvg; } catch(exception& e) { m->errorOut(e, "HeatMap", "getPic"); exit(1); } } 
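// Illustration only, not part of mothur: a minimal sketch of the "linear"
// scaler branch used by the getPic functions above, where a relative
// abundance becomes the red channel of an "RR0000" colour and empty bins are
// drawn white. linearHeatColor is a hypothetical stand-in for the
// toHex(...) + "0000" expression; the clamp to 0..255 and the guard against
// a non-positive total or maximum are assumptions added here, not behaviour
// taken from the source.

```cpp
#include <cstdio>
#include <string>

// Maps abundance / totalSeqs, scaled by maxRelAbund, onto a six-digit hex colour.
std::string linearHeatColor(int abundance, int totalSeqs, float maxRelAbund) {
    // an empty bin (or unusable input) is drawn white, as in getPic
    if (abundance == 0 || totalSeqs <= 0 || maxRelAbund <= 0.0f) { return "FFFFFF"; }

    float relAbund = abundance / (float)totalSeqs;
    int red = int(255 * relAbund / maxRelAbund);

    // assumption: keep the channel inside one byte
    if (red < 0)   { red = 0; }
    if (red > 255) { red = 255; }

    char buffer[8];  // two hex digits + "0000" + terminating null
    std::snprintf(buffer, sizeof(buffer), "%02X0000", red);
    return std::string(buffer);
}
```

// The most abundant bin in a group maps to "FF0000", and a bin with half of
// that relative abundance maps to "7F0000".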
//********************************************************************************************************************** vector HeatMap::sortSharedVectors(vector& lookup){ try { vector looktemp; map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 inset into location 2. map::iterator it; vector sortedLabels = m->currentSharedBinLabels; /****************** find order of otus **********************/ if (sorted == "shared") { place = orderShared(lookup); }else if (sorted == "topotu") { place = orderTopOtu(lookup); }else if (sorted == "topgroup") { place = orderTopGroup(lookup); }else { m->mothurOut("Error: invalid sort option."); m->mothurOutEndLine(); return sortedLabels; } /******************* create copy of lookup *********************/ //create and initialize looktemp as a copy of lookup for (int i = 0; i < lookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(lookup[i]->getNumBins()); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); //copy lookup i's info for (int j = 0; j < lookup[i]->size(); j++) { temp->set(j, lookup[i]->getAbundance(j), lookup[i]->getGroup()); } looktemp.push_back(temp); } /************************ fill lookup in order given by place *********************/ //for each bin for (int i = 0; i < looktemp[0]->size(); i++) { //place //fill lookup // 2 -> 1 for (int j = 0; j < looktemp.size(); j++) { // 3 -> 2 int newAbund = looktemp[j]->getAbundance(i); // 1 -> 3 lookup[j]->set(place[i], newAbund, looktemp[j]->getGroup()); //binNumber, abundance, group } sortedLabels[place[i]] = m->currentSharedBinLabels[i]; } //delete looktemp -- Sarah look at - this is causing segmentation faults for (int j = 0; j < looktemp.size(); j++) { // delete looktemp[j]; } return sortedLabels; } catch(exception& e) { m->errorOut(e, "HeatMap", "sortSharedVectors"); exit(1); } } 
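// Illustration only, not part of mothur: a minimal sketch of how a "place"
// map like the one used in sortSharedVectors can be built and applied.
// place[i] is the new position of bin i. sharedFirstOrder follows the rule
// of the "shared" sort option (bins present in two or more groups come
// first), and applyOrder moves each label to its new position. Both function
// names and the abund[group][bin] layout are hypothetical, and the sketch
// assumes every group has the same number of bins.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Builds place[bin] = new position, with shared bins ahead of unique ones.
std::map<int, int> sharedFirstOrder(const std::vector<std::vector<int> >& abund) {
    std::map<int, int> place;
    if (abund.empty()) { return place; }

    std::vector<int> sharedBins;
    std::vector<int> uniqueBins;
    for (std::size_t bin = 0; bin < abund[0].size(); bin++) {
        int groupsWithBin = 0;  // number of groups in which this bin is non-empty
        for (std::size_t g = 0; g < abund.size(); g++) {
            if (abund[g][bin] != 0) { groupsWithBin++; }
        }
        if (groupsWithBin < 2) { uniqueBins.push_back((int)bin); }
        else                   { sharedBins.push_back((int)bin); }
    }

    for (std::size_t i = 0; i < sharedBins.size(); i++) { place[sharedBins[i]] = (int)i; }
    for (std::size_t i = 0; i < uniqueBins.size(); i++) {
        place[uniqueBins[i]] = (int)(sharedBins.size() + i);
    }
    return place;
}

// Moves labels[i] to position place[i]; place must cover every index.
std::vector<std::string> applyOrder(const std::vector<std::string>& labels,
                                    const std::map<int, int>& place) {
    std::vector<std::string> sorted(labels.size());
    for (std::size_t i = 0; i < labels.size(); i++) {
        sorted[place.at((int)i)] = labels[i];
    }
    return sorted;
}
```

// Working on a copy of the labels, as applyOrder does, avoids overwriting an
// entry before it has been moved, which is why sortSharedVectors copies
// lookup into looktemp first.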
//********************************************************************************************************************** map HeatMap::orderShared(vector& lookup){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 inset into location 2. map::iterator it; vector sharedBins; vector uniqueBins; //for each bin for (int i = 0; i < lookup[0]->size(); i++) { int count = 0; //is this bin shared for (int j = 0; j < lookup.size(); j++) { if (lookup[j]->getAbundance(i) != 0) { count++; } } if (count < 2) { uniqueBins.push_back(i); } else { sharedBins.push_back(i); } } //fill place for (int i = 0; i < sharedBins.size(); i++) { place[sharedBins[i]] = i; } for (int i = 0; i < uniqueBins.size(); i++) { place[uniqueBins[i]] = (sharedBins.size() + i); } return place; } catch(exception& e) { m->errorOut(e, "HeatMap", "orderShared"); exit(1); } } //********************************************************************************************************************** map HeatMap::orderTopOtu(vector& lookup){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 inset into location 2. map::iterator it; vector totals; //for each bin for (int i = 0; i < lookup[0]->size(); i++) { int total = 0; for (int j = 0; j < lookup.size(); j++) { total += lookup[j]->getAbundance(i); } binCount temp(i, total); totals.push_back(temp); } sort(totals.begin(), totals.end(), comparebinCounts); //fill place for (int i = 0; i < totals.size(); i++) { place[totals[i].bin] = i; } return place; } catch(exception& e) { m->errorOut(e, "HeatMap", "orderTopOtu"); exit(1); } } //********************************************************************************************************************** map HeatMap::orderTopGroup(vector& lookup){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 inset into location 2. 
map::iterator it; vector < vector > totals; //totals[0] = bin totals for group 0, totals[1] = bin totals for group 1, ... totals.resize(lookup.size()); //for each bin for (int i = 0; i < lookup[0]->size(); i++) { for (int j = 0; j < lookup.size(); j++) { binCount temp(i, (lookup[j]->getAbundance(i))); totals[j].push_back(temp); } } for (int i = 0; i < totals.size(); i++) { sort(totals[i].begin(), totals[i].end(), comparebinCounts); } //fill place //grab the top otu for each group adding it if its not already added int count = 0; for (int i = 0; i < totals[0].size(); i++) { for (int j = 0; j < totals.size(); j++) { it = place.find(totals[j][i].bin); if (it == place.end()) { //not added yet place[totals[j][i].bin] = count; count++; } } } return place; } catch(exception& e) { m->errorOut(e, "HeatMap", "orderTopGroup"); exit(1); } } //********************************************************************************************************************** void HeatMap::printLegend(int y, float maxbin) { try { //output legend and color labels //go through map and give each score a color value string color; int x = 10; //prints legend for (int i = 1; i < 255; i++) { color = toHex(int((float)(i))); outsvg << "\n"; x += 1; } //prints legend labels x = 10; for (int i = 1; i<=5; i++) { float label; if(scaler== "log10") { label = maxbin * log10(51*i) / log10(255); } else if(scaler== "log2") { label = maxbin * log2(51*i) / log2(255); } else if(scaler== "linear") { label = maxbin * 51 * i / 255; } else { label = maxbin * log10(51*i) / log10(255); } label = int(label * 1000 + 0.5); label /= 1000.0; string text = toString(label, 3); outsvg << "" + text + "\n"; x += 60; } } catch(exception& e) { m->errorOut(e, "HeatMap", "printLegend"); exit(1); } } //********************************************************************************************************************** string HeatMap::getPic(vector lookup) { try { int numBinsToDisplay = lookup[0]->size(); if (numOTU != 0) { //user want 
to display a portion of the otus if (numOTU < numBinsToDisplay) { numBinsToDisplay = numOTU; } } //sort lookup so shared bins are on top vector sortedLabels = m->currentSharedBinLabels; if (sorted != "none") { sortedLabels = sortSharedVectors(lookup); } vector > scaleRelAbund; vector maxRelAbund(lookup.size(), 0.0); float superMaxRelAbund = 0; for(int i = 0; i < lookup.size(); i++){ for(int j=0; jsize(); j++){ float relAbund = lookup[i]->getAbundance(j); if(relAbund > maxRelAbund[i]){ maxRelAbund[i] = relAbund; } } if(maxRelAbund[i] > superMaxRelAbund){ superMaxRelAbund = maxRelAbund[i]; } } scaleRelAbund.resize(lookup.size()); for(int i=0;icontrol_pressed) { return "control"; } float relAbund = lookup[i]->getAbundance(j); if (lookup[i]->getAbundance(j) != 0) { //don't want log value of 0. if (scaler == "log10") { if (maxRelAbund[i] == 1) { maxRelAbund[i] -= 0.001; } if (relAbund == 1) { relAbund -= 0.001; } scaleRelAbund[i][j] = toHex(int(255 * log10(relAbund) / log10(maxRelAbund[i]))) + "0000"; }else if (scaler == "log2") { if (maxRelAbund[i] == 1) { maxRelAbund[i] -= 0.001; } if (relAbund == 1) { relAbund -= 0.001; } scaleRelAbund[i][j] = toHex(int(255 * log2(relAbund) / log2(maxRelAbund[i]))) + "0000"; }else if (scaler == "linear") { scaleRelAbund[i][j] = toHex(int(255 * relAbund / maxRelAbund[i])) + "0000"; }else { //if user enters invalid scaler option. 
scaleRelAbund[i][j] = toHex(int(255 * log10(relAbund) / log10(maxRelAbund[i]))) + "0000"; } }else { scaleRelAbund[i][j] = "FFFFFF"; } } } string filenamesvg = outputDir + m->getRootName(m->getSimpleName(inputfile)) + lookup[0]->getLabel() + ".heatmap.bin.svg"; m->openOutputFile(filenamesvg, outsvg); int binHeight = 20; int labelBump = 100; int binWidth = 300; //svg image outsvg << "\n"; outsvg << "\n"; //white background outsvg << ""; outsvg << "Heatmap at distance " + lookup[0]->getLabel() + "\n"; //column labels for (int h = 0; h < lookup.size()+1; h++) { if (h == 0) { string tempLabel = "OTU"; outsvg << "" + tempLabel + "\n"; }else { outsvg << "getGroup().length() / 2)+labelBump/2) + "\" y=\"50\">" + lookup[h-1]->getGroup() + "\n"; } } //output legend and color labels string color; int x = 0; int y = 103 + (numBinsToDisplay*binHeight); printLegend(y, superMaxRelAbund); y = 70; for (int i = 0; i < numBinsToDisplay; i++) { outsvg << "" + sortedLabels[i] + "\n"; x += labelBump; for (int j = 0; j < scaleRelAbund.size(); j++) { if (m->control_pressed) { outsvg.close(); return "control"; } outsvg << "\n"; x += binWidth; } x = 0; y += binHeight; } outsvg << "\n\n"; outsvg.close(); return filenamesvg; } catch(exception& e) { m->errorOut(e, "HeatMap", "getPic"); exit(1); } } //********************************************************************************************************************** vector HeatMap::sortSharedVectors(vector& lookup){ try { vector looktemp; map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 insert into location 2. 
map::iterator it; vector sortedLabels = m->currentSharedBinLabels; /****************** find order of otus **********************/ if (sorted == "shared") { place = orderShared(lookup); }else if (sorted == "topotu") { place = orderTopOtu(lookup); }else if (sorted == "topgroup") { place = orderTopGroup(lookup); }else { m->mothurOut("Error: invalid sort option."); m->mothurOutEndLine(); return sortedLabels; } /******************* create copy of lookup *********************/ //create and initialize looktemp as a copy of lookup for (int i = 0; i < lookup.size(); i++) { SharedRAbundFloatVector* temp = new SharedRAbundFloatVector(lookup[i]->getNumBins()); temp->setLabel(lookup[i]->getLabel()); temp->setGroup(lookup[i]->getGroup()); //copy lookup i's info for (int j = 0; j < lookup[i]->size(); j++) { temp->set(j, lookup[i]->getAbundance(j), lookup[i]->getGroup()); } looktemp.push_back(temp); } /************************ fill lookup in order given by place *********************/ //for each bin for (int i = 0; i < looktemp[0]->size(); i++) { //place //fill lookup // 2 -> 1 for (int j = 0; j < looktemp.size(); j++) { // 3 -> 2 float newAbund = looktemp[j]->getAbundance(i); // 1 -> 3 lookup[j]->set(place[i], newAbund, looktemp[j]->getGroup()); //binNumber, abundance, group sortedLabels[place[i]] = m->currentSharedBinLabels[i]; } } //delete looktemp -- Sarah look at - this is causing segmentation faults for (int j = 0; j < looktemp.size(); j++) { // delete looktemp[j]; } return sortedLabels; } catch(exception& e) { m->errorOut(e, "HeatMap", "sortSharedVectors"); exit(1); } } //********************************************************************************************************************** int HeatMap::sortRabund(RAbundVector*& r){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 inset into location 2. 
map::iterator it; /****************** find order of otus **********************/ vector totals; //for each bin for (int i = 0; i < r->getNumBins(); i++) { binCount temp(i, r->get(i)); totals.push_back(temp); } sort(totals.begin(), totals.end(), comparebinCounts); //fill place for (int i = 0; i < totals.size(); i++) { place[totals[i].bin] = i; } /******************* create copy of lookup *********************/ //create and initialize rtemp as a copy of r RAbundVector* rtemp = new RAbundVector(r->getNumBins()); for (int i = 0; i < r->size(); i++) { rtemp->set(i, r->get(i)); } rtemp->setLabel(r->getLabel()); /************************ fill lookup in order given by place *********************/ //for each bin for (int i = 0; i < rtemp->size(); i++) { //place //fill lookup // 2 -> 1 // 3 -> 2 int newAbund = rtemp->get(i); // 1 -> 3 r->set(place[i], newAbund); //binNumber, abundance } return 0; } catch(exception& e) { m->errorOut(e, "HeatMap", "sortRabund"); exit(1); } } //********************************************************************************************************************** map HeatMap::orderShared(vector& lookup){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 inset into location 2. 
map::iterator it; vector sharedBins; vector uniqueBins; //for each bin for (int i = 0; i < lookup[0]->size(); i++) { int count = 0; //is this bin shared for (int j = 0; j < lookup.size(); j++) { if (lookup[j]->getAbundance(i) != 0) { count++; } } if (count < 2) { uniqueBins.push_back(i); } else { sharedBins.push_back(i); } } //fill place for (int i = 0; i < sharedBins.size(); i++) { place[sharedBins[i]] = i; } for (int i = 0; i < uniqueBins.size(); i++) { place[uniqueBins[i]] = (sharedBins.size() + i); } return place; } catch(exception& e) { m->errorOut(e, "HeatMap", "orderShared"); exit(1); } } //********************************************************************************************************************** map HeatMap::orderTopOtu(vector& lookup){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 insert into location 2. map::iterator it; vector totals; //for each bin for (int i = 0; i < lookup[0]->size(); i++) { float total = 0; for (int j = 0; j < lookup.size(); j++) { total += lookup[j]->getAbundance(i); } binCountFloat temp(i, total); totals.push_back(temp); } sort(totals.begin(), totals.end(), comparebinFloatCounts); //fill place for (int i = 0; i < totals.size(); i++) { place[totals[i].bin] = i; } return place; } catch(exception& e) { m->errorOut(e, "HeatMap", "orderTopOtu"); exit(1); } } //********************************************************************************************************************** map HeatMap::orderTopGroup(vector& lookup){ try { map place; //spot in lookup where you insert shared by, ie, 3 -> 2 if they are shared by 3 insert into location 2. map::iterator it; vector < vector > totals; //totals[0] = bin totals for group 0, totals[1] = bin totals for group 1, ... 
totals.resize(lookup.size()); //for each bin for (int i = 0; i < lookup[0]->size(); i++) { for (int j = 0; j < lookup.size(); j++) { binCountFloat temp(i, (lookup[j]->getAbundance(i))); totals[j].push_back(temp); } } for (int i = 0; i < totals.size(); i++) { sort(totals[i].begin(), totals[i].end(), comparebinFloatCounts); } //fill place //grab the top otu for each group adding it if its not already added int count = 0; for (int i = 0; i < totals[0].size(); i++) { for (int j = 0; j < totals.size(); j++) { it = place.find(totals[j][i].bin); if (it == place.end()) { //not added yet place[totals[j][i].bin] = count; count++; } } } return place; } catch(exception& e) { m->errorOut(e, "HeatMap", "orderTopGroup"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/heatmap.h000066400000000000000000000042261255543666200162320ustar00rootroot00000000000000#ifndef HEATMAP_H #define HEATMAP_H /* * heatmap.h * Mothur * * Created by Sarah Westcott on 3/25/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "rabundvector.hpp" #include "sharedrabundvector.h" #include "sharedrabundfloatvector.h" #include "datavector.hpp" /***********************************************************************/ struct binCount { int bin; int abund; binCount(int i, int j) : bin(i), abund(j) {} }; /***********************************************************************/ struct binCountFloat { int bin; float abund; binCountFloat(int i, float j) : bin(i), abund(j) {} }; /***********************************************************************/ //sorts highest abund to lowest inline bool comparebinCounts(binCount left, binCount right){ return (left.abund > right.abund); } /***********************************************************************/ //sorts highest abund to lowest inline bool comparebinFloatCounts(binCountFloat left, binCountFloat right){ return (left.abund > right.abund); } /***********************************************************************/ class HeatMap { public: HeatMap(string, string, int, int, string, string); ~HeatMap(){}; string getPic(RAbundVector*); string getPic(vector); string getPic(vector); private: vector sortSharedVectors(vector& ); vector sortSharedVectors(vector& ); int sortRabund(RAbundVector*&); void printLegend(int, float); string format, sorted, groupComb, scaler, outputDir, inputfile; ofstream outsvg; MothurOut* m; int numOTU, fontSize; map orderTopGroup(vector&); map orderTopOtu(vector&); map orderShared(vector&); map orderTopGroup(vector&); map orderTopOtu(vector&); map orderShared(vector&); }; /***********************************************************************/ #endif mothur-1.36.1/source/heatmapsim.cpp000066400000000000000000000176671255543666200173130ustar00rootroot00000000000000/* * heatmapsim.cpp * Mothur * * Created by Sarah Westcott on 6/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "heatmapsim.h" #include "sharedjabund.h" #include "sharedsorabund.h" #include "sharedjclass.h" #include "sharedsorclass.h" #include "sharedjest.h" #include "sharedsorest.h" #include "sharedthetayc.h" #include "sharedthetan.h" #include "sharedmorisitahorn.h" #include "sharedbraycurtis.h" //********************************************************************************************************************** HeatMapSim::HeatMapSim(string dir, string i, int f) : outputDir(dir), inputfile(i), fontSize(f) { m = MothurOut::getInstance(); } //********************************************************************************************************************** vector HeatMapSim::getPic(vector lookup, vector calcs) { try { EstOutput data; vector sims; vector outputNames; //make file for each calculator selected for (int k = 0; k < calcs.size(); k++) { if (m->control_pressed) { return outputNames; } string filenamesvg = outputDir + m->getRootName(m->getSimpleName(inputfile)) + lookup[0]->getLabel() + "." 
+ calcs[k]->getName() + ".heatmap.sim.svg"; m->openOutputFile(filenamesvg, outsvg); outputNames.push_back(filenamesvg); //svg image outsvg << "\n"; outsvg << "\n"; //white backround outsvg << ""; outsvg << "Heatmap at distance " + lookup[0]->getLabel() + "\n"; //column labels for (int h = 0; h < lookup.size(); h++) { outsvg << "getGroup().length() / 2)) + "\" y=\"50\">" + lookup[h]->getGroup() + "\n"; outsvg << "getGroup().length() / 2)) + "\" x=\"50\">" + lookup[h]->getGroup() + "\n"; } sims.clear(); // double biggest = -1; double biggest = 1; float scaler; //get sim for each comparison and save them so you can find the relative similairity for(int i = 0; i < (lookup.size()-1); i++){ for(int j = (i+1); j < lookup.size(); j++){ if (m->control_pressed) { outsvg.close(); return outputNames; } vector subset; subset.push_back(lookup[i]); subset.push_back(lookup[j]); //get similairity between groups data = calcs[k]->getValues(subset); sims.push_back(1.0 - data[0]); //save biggest similairity to set relative sim // if (data[0] > biggest) { biggest = data[0]; } } } //map biggest similairity found to red scaler = 255.0 / biggest; int count = 0; //output similairites to file for(int i = 0; i < (lookup.size()-1); i++){ for(int j = (i+1); j < lookup.size(); j++){ //find relative color int color = scaler * sims[count]; //draw box outsvg << "\n"; count++; } } int y = ((lookup.size() * 150) + 120); printLegend(y, biggest); outsvg << "\n\n"; outsvg.close(); } return outputNames; } catch(exception& e) { m->errorOut(e, "HeatMapSim", "getPic"); exit(1); } } //********************************************************************************************************************** string HeatMapSim::getPic(vector< vector > dists, vector groups) { try { vector sims; string filenamesvg = outputDir + m->getRootName(m->getSimpleName(inputfile)) + "heatmap.sim.svg"; m->openOutputFile(filenamesvg, outsvg); //svg image outsvg << "\n"; outsvg << "\n"; //white backround outsvg << ""; outsvg << 
"Heatmap for " + inputfile + "\n"; //column labels for (int h = 0; h < groups.size(); h++) { outsvg << "" + groups[h] + "\n"; outsvg << "" + groups[h] + "\n"; } double biggest = -1; float scaler; //get sim for each comparison and save them so you can find the relative similairity for(int i = 0; i < (dists.size()-1); i++){ for(int j = (i+1); j < dists.size(); j++){ if (m->control_pressed) { outsvg.close(); return filenamesvg; } float sim = 1.0 - dists[i][j]; sims.push_back(sim); //save biggest similairity to set relative sim if (sim > biggest) { biggest = sim; } } } //map biggest similairity found to red scaler = 255.0 / biggest; int count = 0; //output similairites to file for(int i = 0; i < (dists.size()-1); i++){ for(int j = (i+1); j < dists.size(); j++){ //find relative color int color = scaler * sims[count]; //draw box outsvg << "\n"; count++; } } int y = ((dists.size() * 150) + 120); printLegend(y, biggest); outsvg << "\n\n"; outsvg.close(); return filenamesvg; } catch(exception& e) { m->errorOut(e, "HeatMapSim", "getPic"); exit(1); } } //********************************************************************************************************************** void HeatMapSim::printLegend(int y, float maxSim) { try { maxSim = 1; //output legend and color labels //go through map and give each score a color value string color; int x = 10; //prints legend for (int i = 1; i < 255; i++) { color = toHex(int((float)(i))); outsvg << "\n"; x += 3; } float scaler = maxSim / 5.0; //prints legend labels x = 0; for (int i = 0; i<=5; i++) { float label = scaler*i; label = int(label * 1000 + 0.5); label /= 1000.0; string text = toString(label, 1); outsvg << "" + text + "\n"; x += 153; } } catch(exception& e) { m->errorOut(e, "HeatMapSim", "printLegend"); exit(1); } } //********************************************************************************************************************** 
mothur-1.36.1/source/heatmapsim.h000066400000000000000000000014471255543666200167450ustar00rootroot00000000000000#ifndef HEATMAPSIM_H #define HEATMAPSIM_H /* * heatmapsim.h * Mothur * * Created by Sarah Westcott on 6/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "sharedrabundvector.h" #include "datavector.hpp" #include "calculator.h" /***********************************************************************/ class HeatMapSim { public: HeatMapSim(string, string, int); ~HeatMapSim(){}; vector getPic(vector, vector); string getPic(vector< vector >, vector); private: void printLegend(int, float); string format, groupComb, outputDir, inputfile; int fontSize; ofstream outsvg; MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/inputdata.cpp000066400000000000000000000501701255543666200171360ustar00rootroot00000000000000/* * inputdata.cpp * Dotur * * Created by Sarah Westcott on 11/18/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "inputdata.h" #include "ordervector.hpp" #include "listvector.hpp" #include "rabundvector.hpp" /***********************************************************************/ InputData::InputData(string fName, string f) : format(f){ m = MothurOut::getInstance(); m->openInputFile(fName, fileHandle); filename = fName; m->saveNextLabel = ""; } /***********************************************************************/ InputData::~InputData(){ fileHandle.close(); m->saveNextLabel = ""; } /***********************************************************************/ InputData::InputData(string fName, string orderFileName, string f) : format(f){ try { m = MothurOut::getInstance(); ifstream ofHandle; m->openInputFile(orderFileName, ofHandle); string name; int count = 0; while(ofHandle){ ofHandle >> name; orderMap[name] = count; count++; m->gobble(ofHandle); } ofHandle.close(); m->openInputFile(fName, fileHandle); m->saveNextLabel = ""; } catch(exception& e) { m->errorOut(e, "InputData", "InputData"); exit(1); } } /***********************************************************************/ ListVector* InputData::getListVector(){ try { if(!fileHandle.eof()){ if(format == "list") { list = new ListVector(fileHandle); }else{ list = NULL; } m->gobble(fileHandle); return list; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getListVector"); exit(1); } } /***********************************************************************/ ListVector* InputData::getListVector(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); m->saveNextLabel = ""; if(in){ if (format == "list") { while (in.eof() != true) { list = new ListVector(in); thisLabel = list->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete list; } m->gobble(in); } }else{ list = NULL; } in.close(); return list; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getListVector"); 
exit(1); } } /***********************************************************************/ ListVector* InputData::getListVector(string label, bool resetFP){ try { string thisLabel; fileHandle.clear(); fileHandle.seekg(0); m->saveNextLabel = ""; if(fileHandle){ if (format == "list") { while (fileHandle.eof() != true) { list = new ListVector(fileHandle); m->gobble(fileHandle); thisLabel = list->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete list; } } }else{ list = NULL; } return list; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getListVector"); exit(1); } } /***********************************************************************/ SharedListVector* InputData::getSharedListVector(){ try { if(fileHandle){ if (format == "shared") { SharedList = new SharedListVector(fileHandle); }else{ SharedList = NULL; } m->gobble(fileHandle); return SharedList; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getSharedListVector"); exit(1); } } /***********************************************************************/ SharedListVector* InputData::getSharedListVector(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); if(in){ if (format == "shared") { while (in.eof() != true) { SharedList = new SharedListVector(in); thisLabel = SharedList->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete SharedList; } m->gobble(in); } }else{ SharedList = NULL; } in.close(); return SharedList; }else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getSharedListVector"); exit(1); } } /***********************************************************************/ SharedOrderVector* InputData::getSharedOrderVector(){ try { if(fileHandle){ if (format == "sharedfile") { SharedOrder = new SharedOrderVector(fileHandle); }else{ SharedOrder = NULL; } 
m->gobble(fileHandle); return SharedOrder; }else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getSharedOrderVector"); exit(1); } } /***********************************************************************/ SharedOrderVector* InputData::getSharedOrderVector(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); m->saveNextLabel = ""; if(in){ if (format == "sharedfile") { while (in.eof() != true) { SharedOrder = new SharedOrderVector(in); thisLabel = SharedOrder->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete SharedOrder; } m->gobble(in); } }else{ SharedOrder = NULL; } in.close(); return SharedOrder; }else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getSharedOrderVector"); exit(1); } } /***********************************************************************/ OrderVector* InputData::getOrderVector(){ try { if(fileHandle){ if((format == "list") || (format == "listorder")) { input = new ListVector(fileHandle); } else if (format == "shared") { input = new SharedListVector(fileHandle); } else if(format == "rabund"){ input = new RAbundVector(fileHandle); } else if(format == "order"){ input = new OrderVector(fileHandle); } else if(format == "sabund"){ input = new SAbundVector(fileHandle); } m->gobble(fileHandle); output = new OrderVector(); *output = (input->getOrderVector()); return output; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getOrderVector"); exit(1); } } /***********************************************************************/ OrderVector* InputData::getOrderVector(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); if(in){ if((format == "list") || (format == "listorder")) { m->saveNextLabel = ""; while (in.eof() != true) { input = new ListVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } 
//so you don't loose this memory else { delete input; } m->gobble(in); } } else if (format == "shared") { m->saveNextLabel = ""; while (in.eof() != true) { input = new SharedListVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } else if(format == "rabund"){ while (in.eof() != true) { input = new RAbundVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } else if(format == "order"){ while (in.eof() != true) { input = new OrderVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } else if(format == "sabund"){ while (in.eof() != true) { input = new SAbundVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } in.close(); output = new OrderVector(); *output = (input->getOrderVector()); return output; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getOrderVector"); exit(1); } } /***********************************************************************/ //this is used when you don't need the order vector vector InputData::getSharedRAbundVectors(){ try { if(fileHandle){ if (format == "sharedfile") { SharedRAbundVector* SharedRAbund = new SharedRAbundVector(fileHandle); if (SharedRAbund != NULL) { return SharedRAbund->getSharedRAbundVectors(); } }else if (format == "shared") { SharedList = new SharedListVector(fileHandle); if (SharedList != NULL) { return SharedList->getSharedRAbundVector(); } } m->gobble(fileHandle); } //this is created to signal to calling function that the input file is at eof vector null; null.push_back(NULL); 
return null; } catch(exception& e) { m->errorOut(e, "InputData", "getSharedRAbundVectors"); exit(1); } } /***********************************************************************/ vector InputData::getSharedRAbundVectors(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); m->saveNextLabel = ""; if(in){ if (format == "sharedfile") { while (in.eof() != true) { SharedRAbundVector* SharedRAbund = new SharedRAbundVector(in); if (SharedRAbund != NULL) { thisLabel = SharedRAbund->getLabel(); //if you are at the last label if (thisLabel == label) { in.close(); return SharedRAbund->getSharedRAbundVectors(); } else { //so you don't loose this memory vector lookup = SharedRAbund->getSharedRAbundVectors(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } delete SharedRAbund; } }else{ break; } m->gobble(in); } }else if (format == "shared") { while (in.eof() != true) { SharedList = new SharedListVector(in); if (SharedList != NULL) { thisLabel = SharedList->getLabel(); //if you are at the last label if (thisLabel == label) { in.close(); return SharedList->getSharedRAbundVector(); } else { //so you don't loose this memory delete SharedList; } }else{ break; } m->gobble(in); } } } //this is created to signal to calling function that the input file is at eof vector null; null.push_back(NULL); in.close(); return null; } catch(exception& e) { m->errorOut(e, "InputData", "getSharedRAbundVectors"); exit(1); } } /***********************************************************************/ //this is used when you don't need the order vector vector InputData::getSharedRAbundFloatVectors(){ try { if(fileHandle){ if (format == "relabund") { SharedRAbundFloatVector* SharedRelAbund = new SharedRAbundFloatVector(fileHandle); if (SharedRelAbund != NULL) { return SharedRelAbund->getSharedRAbundFloatVectors(); } }else if (format == "sharedfile") { SharedRAbundVector* SharedRAbund = new SharedRAbundVector(fileHandle); if (SharedRAbund != NULL) { vector lookup 
= SharedRAbund->getSharedRAbundVectors(); vector lookupFloat = SharedRAbund->getSharedRAbundFloatVectors(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup.clear(); return lookupFloat; } } m->gobble(fileHandle); } //this is created to signal to calling function that the input file is at eof vector null; null.push_back(NULL); return null; } catch(exception& e) { m->errorOut(e, "InputData", "getSharedRAbundFloatVectors"); exit(1); } } /***********************************************************************/ vector InputData::getSharedRAbundFloatVectors(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); m->saveNextLabel = ""; if(in){ if (format == "relabund") { while (in.eof() != true) { SharedRAbundFloatVector* SharedRelAbund = new SharedRAbundFloatVector(in); if (SharedRelAbund != NULL) { thisLabel = SharedRelAbund->getLabel(); //if you are at the last label if (thisLabel == label) { in.close(); return SharedRelAbund->getSharedRAbundFloatVectors(); } else { //so you don't loose this memory vector lookup = SharedRelAbund->getSharedRAbundFloatVectors(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } delete SharedRelAbund; } }else{ break; } m->gobble(in); } }else if (format == "sharedfile") { while (in.eof() != true) { SharedRAbundVector* SharedRAbund = new SharedRAbundVector(in); if (SharedRAbund != NULL) { thisLabel = SharedRAbund->getLabel(); //if you are at the last label if (thisLabel == label) { in.close(); vector lookup = SharedRAbund->getSharedRAbundVectors(); vector lookupFloat = SharedRAbund->getSharedRAbundFloatVectors(lookup); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup.clear(); return lookupFloat; }else { //so you don't loose this memory vector lookup = SharedRAbund->getSharedRAbundVectors(); for (int i = 0; i < lookup.size(); i++) { delete lookup[i]; } lookup.clear(); delete SharedRAbund; } }else{ break; } m->gobble(in); } } } //this is created to signal to 
calling function that the input file is at eof vector null; null.push_back(NULL); in.close(); return null; } catch(exception& e) { m->errorOut(e, "InputData", "getSharedRAbundFloatVectors"); exit(1); } } /***********************************************************************/ SAbundVector* InputData::getSAbundVector(){ try { if(fileHandle){ if (format == "list") { input = new ListVector(fileHandle); } else if (format == "shared") { input = new SharedListVector(fileHandle); } else if(format == "rabund"){ input = new RAbundVector(fileHandle); } else if(format == "order"){ input = new OrderVector(fileHandle); } else if(format == "sabund"){ input = new SAbundVector(fileHandle); } m->gobble(fileHandle); sabund = new SAbundVector(); *sabund = (input->getSAbundVector()); return sabund; } else{ return NULL; } } catch(exception& e) { m->errorOut(e, "InputData", "getSAbundVector"); exit(1); } } /***********************************************************************/ SAbundVector* InputData::getSAbundVector(string label){ try { ifstream in; string thisLabel; m->openInputFile(filename, in); if(in){ if (format == "list") { m->saveNextLabel = ""; while (in.eof() != true) { input = new ListVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } else if (format == "shared") { m->saveNextLabel = ""; while (in.eof() != true) { input = new SharedListVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } else if(format == "rabund"){ while (in.eof() != true) { input = new RAbundVector(in); thisLabel = input->getLabel(); //if you are at the last label if (thisLabel == label) { break; } //so you don't loose this memory else { delete input; } m->gobble(in); } } else if(format == "order"){ while (in.eof() != true) { input = new 
OrderVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }
            else if(format == "sabund"){
                while (in.eof() != true) {
                    input = new SAbundVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }

            in.close();

            sabund = new SAbundVector();
            *sabund = (input->getSAbundVector());

            return sabund;
        }
        else{
            return NULL;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "InputData", "getSAbundVector");
        exit(1);
    }
}
/***********************************************************************/
RAbundVector* InputData::getRAbundVector(){
    try {
        if(fileHandle){
            if (format == "list") {
                input = new ListVector(fileHandle);
            }
            else if (format == "shared") {
                input = new SharedListVector(fileHandle);
            }
            else if(format == "rabund"){
                input = new RAbundVector(fileHandle);
            }
            else if(format == "order"){
                input = new OrderVector(fileHandle);
            }
            else if(format == "sabund"){
                input = new SAbundVector(fileHandle);
            }
            m->gobble(fileHandle);

            rabund = new RAbundVector();
            *rabund = (input->getRAbundVector());

            return rabund;
        }
        else{
            return NULL;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "InputData", "getRAbundVector");
        exit(1);
    }
}
/***********************************************************************/
RAbundVector* InputData::getRAbundVector(string label){
    try {
        ifstream in;
        string thisLabel;
        m->openInputFile(filename, in);

        if(in){
            if (format == "list") {
                m->saveNextLabel = "";
                while (in.eof() != true) {
                    input = new ListVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }
            else if (format == "shared") {
                m->saveNextLabel = "";
                while (in.eof() != true) {
                    input = new SharedListVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last
label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }
            else if(format == "rabund"){
                while (in.eof() != true) {
                    input = new RAbundVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }
            else if(format == "order"){
                while (in.eof() != true) {
                    input = new OrderVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }
            else if(format == "sabund"){
                while (in.eof() != true) {
                    input = new SAbundVector(in);
                    thisLabel = input->getLabel();

                    //if you are at the last label
                    if (thisLabel == label) { break; }
                    //so you don't leak this memory
                    else { delete input; }
                    m->gobble(in);
                }
            }

            in.close();

            rabund = new RAbundVector();
            *rabund = (input->getRAbundVector());

            return rabund;
        }
        else{
            return NULL;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "InputData", "getRAbundVector");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/inputdata.h
#ifndef INPUTDATA_H
#define INPUTDATA_H

#include "mothur.h"
#include "ordervector.hpp"
#include "sharedlistvector.h"
#include "sharedordervector.h"
#include "listvector.hpp"
#include "sharedrabundfloatvector.h"

class InputData {

public:
    InputData(string, string);
    InputData(string, string, string);
    ~InputData();
    ListVector* getListVector();
    ListVector* getListVector(string); //pass the label you want
    ListVector* getListVector(string, bool); //pass the label you want, reset filepointer
    SharedListVector* getSharedListVector();
    SharedListVector* getSharedListVector(string); //pass the label you want
    OrderVector* getOrderVector();
    OrderVector* getOrderVector(string); //pass the label you want
    SharedOrderVector*
getSharedOrderVector(); SharedOrderVector* getSharedOrderVector(string); //pass the label you want SAbundVector* getSAbundVector(); SAbundVector* getSAbundVector(string); //pass the label you want RAbundVector* getRAbundVector(); RAbundVector* getRAbundVector(string); //pass the label you want vector getSharedRAbundVectors(); vector getSharedRAbundVectors(string); //pass the label you want vector getSharedRAbundFloatVectors(); vector getSharedRAbundFloatVectors(string); //pass the label you want private: string format; ifstream fileHandle; DataVector* input; ListVector* list; SharedListVector* SharedList; OrderVector* output; SharedOrderVector* SharedOrder; SAbundVector* sabund; RAbundVector* rabund; map orderMap; string filename; MothurOut* m; }; #endif mothur-1.36.1/source/libshuff.cpp000066400000000000000000000070321255543666200167460ustar00rootroot00000000000000/* * libshuffform.cpp * Mothur * * Created by Pat Schloss on 4/8/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. 
* */ #include "libshuff.h" /***********************************************************************/ void swap(int& i,int& j){ int t = i; i = j; j = t; } /***********************************************************************/ Libshuff::Libshuff(FullMatrix* D, int it, float step, float co) : matrix(D), iters(it), stepSize(step), cutOff(co){ try{ m = MothurOut::getInstance(); groupNames = matrix->getGroups(); groupSizes = matrix->getSizes(); numGroups = matrix->getNumGroups(); initializeGroups(matrix); } catch(exception& e) { m->errorOut(e, "Libshuff", "Libshuff"); exit(1); } } /***********************************************************************/ void Libshuff::initializeGroups(FullMatrix* matrix){ try{ groups.resize(numGroups); savedGroups.resize(numGroups); savedGroups.resize(numGroups); for(int i=0;ierrorOut(e, "Libshuff", "initializeGroups"); exit(1); } } /***********************************************************************/ vector > > Libshuff::getSavedMins(){ return savedMins; } /***********************************************************************/ vector Libshuff::getMinX(int x){ try{ vector minX(groupSizes[x], 0); for(int i=0;i 1 ? (i==0 ? 
matrix->get(groups[x][0], groups[x][1]) : matrix->get(groups[x][i], groups[x][0])) : 0.0); //get the first value in row i of this block //minX[i] = matrix->get(groups[x][i], groups[x][0]); for(int j=0;jget(groups[x][i], groups[x][j]); if(dx < minX[i]){ minX[i] = dx; } } } } return minX; } catch(exception& e) { m->errorOut(e, "Libshuff", "getMinX"); exit(1); } } /***********************************************************************/ vector Libshuff::getMinXY(int x, int y){ try{ vector minXY(groupSizes[x], 0); for(int i=0;iget(groups[x][i], groups[y][0]); for(int j=0;jget(groups[x][i], groups[y][j]); if(dxyerrorOut(e, "Libshuff", "getMinXY"); exit(1); } } /***********************************************************************/ void Libshuff::randomizeGroups(int x, int y){ try{ int nv = groupSizes[x]+groupSizes[y]; vector v(nv); int index=0; for(int k=0;k0;k--){ int z = (int)(rand() % k); swap(v[z],v[k]); } index=0; for(int k=0;kerrorOut(e, "Libshuff", "randomizeGroups"); exit(1); } } /***********************************************************************/ void Libshuff::resetGroup(int x){ for(int k=0;k > evaluateAll() = 0; virtual float evaluatePair(int,int) = 0; void randomizeGroups(int, int); void resetGroup(int); vector > > getSavedMins(); protected: void initializeGroups(FullMatrix*); vector getMinX(int); vector getMinXY(int, int); vector > > savedMins; FullMatrix* matrix; vector groupSizes; vector groupNames; vector > groups; vector > savedGroups; vector minX; vector minXY; float cutOff; int iters; float stepSize; int numGroups; MothurOut* m; }; #endif mothur-1.36.1/source/linearalgebra.cpp000066400000000000000000002447461255543666200177530ustar00rootroot00000000000000/* * linearalgebra.cpp * mothur * * Created by westcott on 1/7/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 *
 */

#include "linearalgebra.h"
#include "wilcox.h"

#define PI 3.1415926535897932384626433832795

// This class references functions used from "Numerical Recipes in C++" //

/*********************************************************************************************************************************/
inline double SQR(const double a)
{
    return a*a;
}
/*********************************************************************************************************************************/
inline double SIGN(const double a, const double b)
{
    return b>=0 ? (a>=0 ? a:-a) : (a>=0 ? -a:a);
}
/*********************************************************************************************************************************/
//Numerical Recipes pg. 245 - Returns the complementary error function erfc(x) with fractional error everywhere less than 1.2e-7.
double LinearAlgebra::erfcc(double x){
    try {
        double t,z,ans;

        z=fabs(x);
        t=1.0/(1.0+0.5*z);

        ans=t*exp(-z*z-1.26551223+t*(1.00002368+t*(0.37409196+t*(0.09678418+
            t*(-0.18628806+t*(0.27886807+t*(-1.13520398+t*(1.48851587+
            t*(-0.82215223+t*0.17087277)))))))));

        //cout << "in erfcc " << t << '\t' << ans<< endl;
        return (x >= 0.0 ? ans : 2.0 - ans);
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "erfcc");
        exit(1);
    }
}
/*********************************************************************************************************************************/
//Numerical Recipes pg.
232 double LinearAlgebra::betai(const double a, const double b, const double x) { try { double bt; double result = 0.0; if (x < 0.0 || x > 1.0) { m->mothurOut("[ERROR]: bad x in betai.\n"); m->control_pressed = true; return 0.0; } if (x == 0.0 || x == 1.0) { bt = 0.0; } else { bt = exp(gammln(a+b)-gammln(a)-gammln(b)+a*log(x)+b*log(1.0-x)); } if (x < (a+1.0) / (a+b+2.0)) { result = bt*betacf(a,b,x)/a; } else { result = 1.0-bt*betacf(b,a,1.0-x)/b; } return result; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "betai"); exit(1); } } /*********************************************************************************************************************************/ //Numerical Recipes pg. 219 double LinearAlgebra::gammln(const double xx) { try { int j; double x,y,tmp,ser; static const double cof[6]={76.18009172947146,-86.50532032941677,24.01409824083091, -1.231739572450155,0.120858003e-2,-0.536382e-5}; y=x=xx; tmp=x+5.5; tmp -= (x+0.5)*log(tmp); ser=1.0; for (j=0;j<6;j++) { ser += cof[j]/++y; } return -tmp+log(2.5066282746310005*ser/x); } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "gammln"); exit(1); } } /*********************************************************************************************************************************/ //Numerical Recipes pg. 223 double LinearAlgebra::gammp(const double a, const double x) { try { double gamser,gammcf,gln; if (x < 0.0 || a <= 0.0) { m->mothurOut("[ERROR]: Invalid arguments in routine GAMMP\n"); m->control_pressed = true; return 0.0;} if (x < (a+1.0)) { gser(gamser,a,x,gln); return gamser; } else { gcf(gammcf,a,x,gln); return 1.0-gammcf; } return 0; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "gammp"); exit(1); } } /*********************************************************************************************************************************/ //Numerical Recipes pg. 
223
/*double LinearAlgebra::gammq(const double a, const double x) {
    try {
        double gamser,gammcf,gln;

        if (x < 0.0 || a <= 0.0) { m->mothurOut("[ERROR]: Invalid arguments in routine GAMMQ\n"); m->control_pressed = true; return 0.0; }
        if (x < (a+1.0)) {
            gser(gamser,a,x,gln);
            return 1.0-gamser;
        } else {
            gcf(gammcf,a,x,gln);
            return gammcf;
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "gammq");
        exit(1);
    }
}
*********************************************************************************************************************************/
//Numerical Recipes pg. 224
double LinearAlgebra::gcf(double& gammcf, const double a, const double x, double& gln){
    try {
        const int ITMAX=100;
        const double EPS=numeric_limits<double>::epsilon();
        const double FPMIN=numeric_limits<double>::min()/EPS;
        int i;
        double an,b,c,d,del,h;

        gln=gammln(a);
        b=x+1.0-a;
        c=1.0/FPMIN;
        d=1.0/b;
        h=d;
        for (i=1;i<=ITMAX;i++) {
            an = -i*(i-a);
            b += 2.0;
            d=an*d+b;
            if (fabs(d) < FPMIN) { d=FPMIN; }
            c=b+an/c;
            if (fabs(c) < FPMIN) { c=FPMIN; }
            d=1.0/d;
            del=d*c;
            h *= del;
            if (fabs(del-1.0) <= EPS) break;
        }
        if (i > ITMAX) { m->mothurOut("[ERROR]: " + toString(a) + " too large, ITMAX=100 too small in gcf\n"); m->control_pressed = true; }
        gammcf=exp(-x+a*log(x)-gln)*h;

        return 0.0;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "gcf");
        exit(1);
    }
}
/*********************************************************************************************************************************/
//Numerical Recipes pg.
223
double LinearAlgebra::gser(double& gamser, const double a, const double x, double& gln) {
    try {
        int n;
        double sum,del,ap;
        const double EPS = numeric_limits<double>::epsilon();

        gln=gammln(a);
        if (x <= 0.0) {
            if (x < 0.0) { m->mothurOut("[ERROR]: x less than 0 in routine GSER\n"); m->control_pressed = true; }
            gamser=0.0;
            return 0.0;
        } else {
            ap=a;
            del=sum=1.0/a;
            for (n=0;n<100;n++) {
                ++ap;
                del *= x/ap;
                sum += del;
                if (fabs(del) < fabs(sum)*EPS) {
                    gamser=sum*exp(-x+a*log(x)-gln);
                    return 0.0;
                }
            }

            m->mothurOut("[ERROR]: a too large, ITMAX too small in routine GSER\n");
            return 0.0;
        }
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "gser");
        exit(1);
    }
}
/*********************************************************************************************************************************/
//Numerical Recipes pg. 233
double LinearAlgebra::betacf(const double a, const double b, const double x) {
    try {
        const int MAXIT = 100;
        const double EPS = numeric_limits<double>::epsilon();
        const double FPMIN = numeric_limits<double>::min() / EPS;
        int m1, m2;
        double aa, c, d, del, h, qab, qam, qap;

        qab=a+b;
        qap=a+1.0;
        qam=a-1.0;
        c=1.0;
        d=1.0-qab*x/qap;
        if (fabs(d) < FPMIN) d=FPMIN;
        d=1.0/d;
        h=d;
        for (m1=1;m1<=MAXIT;m1++) {
            m2=2*m1;
            aa=m1*(b-m1)*x/((qam+m2)*(a+m2));
            d=1.0+aa*d;
            if (fabs(d) < FPMIN) d=FPMIN;
            c=1.0+aa/c;
            if (fabs(c) < FPMIN) c=FPMIN;
            d=1.0/d;
            h *= d*c;
            aa = -(a+m1)*(qab+m1)*x/((a+m2)*(qap+m2));
            d=1.0+aa*d;
            if (fabs(d) < FPMIN) d=FPMIN;
            c=1.0+aa/c;
            if (fabs(c) < FPMIN) c=FPMIN;
            d=1.0/d;
            del=d*c;
            h *= del;
            if (fabs(del-1.0) < EPS) break;
        }

        if (m1 > MAXIT) { m->mothurOut("[ERROR]: a or b too big or MAXIT too small in betacf."); m->mothurOutEndLine(); m->control_pressed = true; }
        return h;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "betacf");
        exit(1);
    }
}
/*********************************************************************************************************************************/
//[3][4] * [4][5] - columns in first must match rows in second, returns matrix[3][5]
vector > LinearAlgebra::matrix_mult(vector > first, vector > second){ try { vector > product; int first_rows = first.size(); int first_cols = first[0].size(); int second_cols = second[0].size(); product.resize(first_rows); for(int i=0;icontrol_pressed) { return product; } product[i][j] = 0.0; for(int k=0;kerrorOut(e, "LinearAlgebra", "matrix_mult"); exit(1); } } /*********************************************************************************************************************************/ vector > LinearAlgebra::transpose(vector >matrix){ try { vector > trans; trans.resize(matrix[0].size()); for (int i = 0; i < trans.size(); i++) { for (int j = 0; j < matrix.size(); j++) { trans[i].push_back(matrix[j][i]); } } return trans; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "transpose"); exit(1); } } /*********************************************************************************************************************************/ void LinearAlgebra::recenter(double offset, vector > D, vector >& G){ try { int rank = D.size(); vector > A(rank); vector > C(rank); for(int i=0;ierrorOut(e, "LinearAlgebra", "recenter"); exit(1); } } /*********************************************************************************************************************************/ // This function is taken from Numerical Recipes in C++ by Press et al., 2nd edition, pg. 479 int LinearAlgebra::tred2(vector >& a, vector& d, vector& e){ try { double scale, hh, h, g, f; int n = a.size(); d.resize(n); e.resize(n); for(int i=n-1;i>0;i--){ int l=i-1; h = scale = 0.0000; if(l>0){ for(int k=0;k= 0.0 ? 
-sqrt(h) : sqrt(h)); e[i] = scale * g; h -= f * g; a[i][l] = f - g; f = 0.0; for(int j=0;jerrorOut(e, "LinearAlgebra", "tred2"); exit(1); } } /*********************************************************************************************************************************/ double LinearAlgebra::pythag(double a, double b) { return(pow(a*a+b*b,0.5)); } /*********************************************************************************************************************************/ // This function is taken from Numerical Recipes in C++ by Press et al., 2nd edition, pg. 479 int LinearAlgebra::qtli(vector& d, vector& e, vector >& z) { try { int myM, i, iter; double s, r, p, g, f, dd, c, b; int n = d.size(); for(int i=1;i<=n;i++){ e[i-1] = e[i]; } e[n-1] = 0.0000; for(int l=0;l=l;i--){ f = s * e[i]; b = c * e[i]; e[i+1] = (r=pythag(f,g)); if(r==0.0){ d[i+1] -= p; e[myM] = 0.0000; break; } s = f / r; c = g / r; g = d[i+1] - p; r = (d[i] - g) * s + 2.0 * c * b; d[i+1] = g + ( p = s * r); g = c * r - b; for(int k=0;k= l) continue; d[l] -= p; e[l] = g; e[myM] = 0.0; } } while (myM != l); } int k; for(int i=0;i= p){ p=d[k=j]; } } if(k!=i){ d[k]=d[i]; d[i]=p; for(int j=0;jerrorOut(e, "LinearAlgebra", "qtli"); exit(1); } } /*********************************************************************************************************************************/ //groups by dimension vector< vector > LinearAlgebra::calculateEuclidianDistance(vector< vector >& axes, int dimensions){ try { //make square matrix vector< vector > dists; dists.resize(axes.size()); for (int i = 0; i < dists.size(); i++) { dists[i].resize(axes.size(), 0.0); } if (dimensions == 1) { //one dimension calc = abs(x-y) for (int i = 0; i < dists.size(); i++) { if (m->control_pressed) { return dists; } for (int j = 0; j < i; j++) { dists[i][j] = abs(axes[i][0] - axes[j][0]); dists[j][i] = dists[i][j]; } } }else if (dimensions > 1) { //two dimension calc = sqrt ((x1 - y1)^2 + (x2 - y2)^2)... 
            for (int i = 0; i < dists.size(); i++) {

                if (m->control_pressed) { return dists; }

                for (int j = 0; j < i; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < dimensions; k++) {
                        sum += ((axes[i][k] - axes[j][k]) * (axes[i][k] - axes[j][k]));
                    }

                    dists[i][j] = sqrt(sum);
                    dists[j][i] = dists[i][j];
                }
            }
        }

        return dists;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "calculateEuclidianDistance");
        exit(1);
    }
}
/*********************************************************************************************************************************/
//returns groups by dimensions from dimensions by groups
vector< vector<double> > LinearAlgebra::calculateEuclidianDistance(vector< vector<double> >& axes){
    try {
        //make square matrix
        vector< vector<double> > dists; dists.resize(axes[0].size());
        for (int i = 0; i < dists.size(); i++) { dists[i].resize(axes[0].size(), 0.0); }

        if (axes.size() == 1) { //one dimension calc = abs(x-y)

            for (int i = 0; i < dists.size(); i++) {

                if (m->control_pressed) { return dists; }

                for (int j = 0; j < i; j++) {
                    dists[i][j] = abs(axes[0][i] - axes[0][j]);
                    dists[j][i] = dists[i][j];
                }
            }

        }else if (axes.size() > 1) { //two or more dimensions calc = sqrt ((x1 - y1)^2 + (x2 - y2)^2 + ...)
for (int i = 0; i < dists[0].size(); i++) { if (m->control_pressed) { return dists; } for (int j = 0; j < i; j++) { double sum = 0.0; for (int k = 0; k < axes.size(); k++) { sum += ((axes[k][i] - axes[k][j]) * (axes[k][i] - axes[k][j])); } dists[i][j] = sqrt(sum); dists[j][i] = dists[i][j]; } } } return dists; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calculateEuclidianDistance"); exit(1); } } /*********************************************************************************************************************************/ //assumes both matrices are square and the same size double LinearAlgebra::calcPearson(vector< vector >& euclidDists, vector< vector >& userDists){ try { //find average for - X int count = 0; float averageEuclid = 0.0; for (int i = 0; i < euclidDists.size(); i++) { for (int j = 0; j < i; j++) { averageEuclid += euclidDists[i][j]; count++; } } averageEuclid = averageEuclid / (float) count; //find average for - Y count = 0; float averageUser = 0.0; for (int i = 0; i < userDists.size(); i++) { for (int j = 0; j < i; j++) { averageUser += userDists[i][j]; count++; } } averageUser = averageUser / (float) count; double numerator = 0.0; double denomTerm1 = 0.0; double denomTerm2 = 0.0; for (int i = 0; i < euclidDists.size(); i++) { for (int k = 0; k < i; k++) { //just lt dists float Yi = userDists[i][k]; float Xi = euclidDists[i][k]; numerator += ((Xi - averageEuclid) * (Yi - averageUser)); denomTerm1 += ((Xi - averageEuclid) * (Xi - averageEuclid)); denomTerm2 += ((Yi - averageUser) * (Yi - averageUser)); } } double denom = (sqrt(denomTerm1) * sqrt(denomTerm2)); double r = numerator / denom; //divide by zero error if (isnan(r) || isinf(r)) { r = 0.0; } return r; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcPearson"); exit(1); } } /*********************************************************************************************************************************/ //assumes both matrices are square and the same size double 
LinearAlgebra::calcSpearman(vector< vector >& euclidDists, vector< vector >& userDists){ try { double r; //format data map tableX; map::iterator itTable; vector scores; for (int i = 0; i < euclidDists.size(); i++) { for (int j = 0; j < i; j++) { spearmanRank member(toString(scores.size()), euclidDists[i][j]); scores.push_back(member); //count number of repeats itTable = tableX.find(euclidDists[i][j]); if (itTable == tableX.end()) { tableX[euclidDists[i][j]] = 1; }else { tableX[euclidDists[i][j]]++; } } } //sort scores sort(scores.begin(), scores.end(), compareSpearman); //calc LX double Lx = 0.0; for (itTable = tableX.begin(); itTable != tableX.end(); itTable++) { double tx = (double) itTable->second; Lx += ((pow(tx, 3.0) - tx) / 12.0); } //find ranks of xi map rankEuclid; vector ties; int rankTotal = 0; for (int j = 0; j < scores.size(); j++) { rankTotal += (j+1); ties.push_back(scores[j]); if (j != (scores.size()-1)) { // you are not the last so you can look ahead if (scores[j].score != scores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankEuclid[ties[k].name] = thisrank; } ties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankEuclid[ties[k].name] = thisrank; } } } //format data map tableY; scores.clear(); for (int i = 0; i < userDists.size(); i++) { for (int j = 0; j < i; j++) { spearmanRank member(toString(scores.size()), userDists[i][j]); scores.push_back(member); //count number of repeats itTable = tableY.find(userDists[i][j]); if (itTable == tableY.end()) { tableY[userDists[i][j]] = 1; }else { tableY[userDists[i][j]]++; } } } //sort scores sort(scores.begin(), scores.end(), compareSpearman); //calc LX double Ly = 0.0; for (itTable = tableY.begin(); itTable != tableY.end(); itTable++) { double ty = (double) itTable->second; Ly += ((pow(ty, 3.0) - 
ty) / 12.0); } //find ranks of yi map rankUser; ties.clear(); rankTotal = 0; for (int j = 0; j < scores.size(); j++) { rankTotal += (j+1); ties.push_back(scores[j]); if (j != (scores.size()-1)) { // you are not the last so you can look ahead if (scores[j].score != scores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankUser[ties[k].name] = thisrank; } ties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankUser[ties[k].name] = thisrank; } } } double di = 0.0; int count = 0; for (int i = 0; i < userDists.size(); i++) { for (int j = 0; j < i; j++) { float xi = rankEuclid[toString(count)]; float yi = rankUser[toString(count)]; di += ((xi - yi) * (xi - yi)); count++; } } double n = (double) count; double SX2 = ((pow(n, 3.0) - n) / 12.0) - Lx; double SY2 = ((pow(n, 3.0) - n) / 12.0) - Ly; r = (SX2 + SY2 - di) / (2.0 * sqrt((SX2*SY2))); //divide by zero error if (isnan(r) || isinf(r)) { r = 0.0; } return r; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcSpearman"); exit(1); } } /*********************************************************************************************************************************/ //assumes both matrices are square and the same size double LinearAlgebra::calcKendall(vector< vector >& euclidDists, vector< vector >& userDists){ try { double r; //format data vector scores; for (int i = 0; i < euclidDists.size(); i++) { for (int j = 0; j < i; j++) { spearmanRank member(toString(scores.size()), euclidDists[i][j]); scores.push_back(member); } } //sort scores sort(scores.begin(), scores.end(), compareSpearman); //find ranks of xi map rankEuclid; vector ties; int rankTotal = 0; for (int j = 0; j < scores.size(); j++) { rankTotal += (j+1); ties.push_back(scores[j]); if (j != (scores.size()-1)) { // you are not the last so you can 
look ahead if (scores[j].score != scores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankEuclid[ties[k].name] = thisrank; } ties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankEuclid[ties[k].name] = thisrank; } } } vector scoresUser; for (int i = 0; i < userDists.size(); i++) { for (int j = 0; j < i; j++) { spearmanRank member(toString(scoresUser.size()), userDists[i][j]); scoresUser.push_back(member); } } //sort scores sort(scoresUser.begin(), scoresUser.end(), compareSpearman); //find ranks of yi map rankUser; ties.clear(); rankTotal = 0; for (int j = 0; j < scoresUser.size(); j++) { rankTotal += (j+1); ties.push_back(scoresUser[j]); if (j != (scoresUser.size()-1)) { // you are not the last so you can look ahead if (scoresUser[j].score != scoresUser[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankUser[ties[k].name] = thisrank; } ties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); rankUser[ties[k].name] = thisrank; } } } int numCoor = 0; int numDisCoor = 0; //order user ranks vector user; for (int l = 0; l < scores.size(); l++) { spearmanRank member(scores[l].name, rankUser[scores[l].name]); user.push_back(member); } int count = 0; for (int l = 0; l < scores.size(); l++) { int numWithHigherRank = 0; int numWithLowerRank = 0; float thisrank = user[l].score; for (int u = l+1; u < scores.size(); u++) { if (user[u].score > thisrank) { numWithHigherRank++; } else if (user[u].score < thisrank) { numWithLowerRank++; } count++; } numCoor += numWithHigherRank; numDisCoor += numWithLowerRank; } r = (numCoor - numDisCoor) / (float) count; //divide by zero 
error if (isnan(r) || isinf(r)) { r = 0.0; } return r; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcKendall"); exit(1); } } /*********************************************************************************************************************************/ double LinearAlgebra::calcKendall(vector& x, vector& y, double& sig){ try { if (x.size() != y.size()) { m->mothurOut("[ERROR]: vector size mismatch."); m->mothurOutEndLine(); return 0.0; } //format data vector xscores; for (int i = 0; i < x.size(); i++) { spearmanRank member(toString(i), x[i]); xscores.push_back(member); } //sort xscores sort(xscores.begin(), xscores.end(), compareSpearman); //convert scores to ranks of x vector ties; int rankTotal = 0; for (int j = 0; j < xscores.size(); j++) { rankTotal += (j+1); ties.push_back(&(xscores[j])); if (j != xscores.size()-1) { // you are not the last so you can look ahead if (xscores[j].score != xscores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); (*ties[k]).score = thisrank; } ties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < ties.size(); k++) { float thisrank = rankTotal / (float) ties.size(); (*ties[k]).score = thisrank; } } } //format data vector yscores; for (int j = 0; j < y.size(); j++) { spearmanRank member(toString(j), y[j]); yscores.push_back(member); } //sort yscores sort(yscores.begin(), yscores.end(), compareSpearman); //convert to ranks map rank; vector yties; rankTotal = 0; for (int j = 0; j < yscores.size(); j++) { rankTotal += (j+1); yties.push_back(yscores[j]); if (j != yscores.size()-1) { // you are not the last so you can look ahead if (yscores[j].score != yscores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < yties.size(); k++) { float thisrank = rankTotal / (float) yties.size(); rank[yties[k].name] = thisrank; } yties.clear(); rankTotal = 0; } }else { // 
you are the last one for (int k = 0; k < yties.size(); k++) { float thisrank = rankTotal / (float) yties.size(); rank[yties[k].name] = thisrank; } } } int numCoor = 0; int numDisCoor = 0; //associate x and y vector otus; for (int l = 0; l < xscores.size(); l++) { spearmanRank member(xscores[l].name, rank[xscores[l].name]); otus.push_back(member); } int count = 0; for (int l = 0; l < xscores.size(); l++) { int numWithHigherRank = 0; int numWithLowerRank = 0; float thisrank = otus[l].score; for (int u = l+1; u < xscores.size(); u++) { if (otus[u].score > thisrank) { numWithHigherRank++; } else if (otus[u].score < thisrank) { numWithLowerRank++; } count++; } numCoor += numWithHigherRank; numDisCoor += numWithLowerRank; } double p = (numCoor - numDisCoor) / (float) count; sig = calcKendallSig(x.size(), p); return p; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcKendall"); exit(1); } } double LinearAlgebra::ran0(int& idum) { const int IA=16807,IM=2147483647,IQ=127773; const int IR=2836,MASK=123459876; const double AM=1.0/double(IM); int k; double ans; idum ^= MASK; k=idum/IQ; idum=IA*(idum-k*IQ)-IR*k; if (idum < 0) idum += IM; ans=AM*idum; idum ^= MASK; return ans; } double LinearAlgebra::ran1(int &idum) { const int IA=16807,IM=2147483647,IQ=127773,IR=2836,NTAB=32; const int NDIV=(1+(IM-1)/NTAB); const double EPS=3.0e-16,AM=1.0/IM,RNMX=(1.0-EPS); static int iy=0; static vector iv(NTAB); int j,k; double temp; if (idum <= 0 || !iy) { if (-idum < 1) idum=1; else idum = -idum; for (j=NTAB+7;j>=0;j--) { k=idum/IQ; idum=IA*(idum-k*IQ)-IR*k; if (idum < 0) idum += IM; if (j < NTAB) iv[j] = idum; } iy=iv[0]; } k=idum/IQ; idum=IA*(idum-k*IQ)-IR*k; if (idum < 0) idum += IM; j=iy/NDIV; iy=iv[j]; iv[j] = idum; if ((temp=AM*iy) > RNMX) return RNMX; else return temp; } double LinearAlgebra::ran2(int &idum) { const int IM1=2147483563,IM2=2147483399; const int IA1=40014,IA2=40692,IQ1=53668,IQ2=52774; const int IR1=12211,IR2=3791,NTAB=32,IMM1=IM1-1; const int 
NDIV=1+IMM1/NTAB; const double EPS=3.0e-16,RNMX=1.0-EPS,AM=1.0/double(IM1); static int idum2=123456789,iy=0; static vector iv(NTAB); int j,k; double temp; if (idum <= 0) { idum=(idum==0 ? 1 : -idum); idum2=idum; for (j=NTAB+7;j>=0;j--) { k=idum/IQ1; idum=IA1*(idum-k*IQ1)-k*IR1; if (idum < 0) idum += IM1; if (j < NTAB) iv[j] = idum; } iy=iv[0]; } k=idum/IQ1; idum=IA1*(idum-k*IQ1)-k*IR1; if (idum < 0) idum += IM1; k=idum2/IQ2; idum2=IA2*(idum2-k*IQ2)-k*IR2; if (idum2 < 0) idum2 += IM2; j=iy/NDIV; iy=iv[j]-idum2; iv[j] = idum; if (iy < 1) iy += IMM1; if ((temp=AM*iy) > RNMX) return RNMX; else return temp; } double LinearAlgebra::ran3(int &idum) { static int inext,inextp; static int iff=0; const int MBIG=1000000000,MSEED=161803398,MZ=0; const double FAC=(1.0/MBIG); static vector ma(56); int i,ii,k,mj,mk; if (idum < 0 || iff == 0) { iff=1; mj=labs(MSEED-labs(idum)); mj %= MBIG; ma[55]=mj; mk=1; for (i=1;i<=54;i++) { ii=(21*i) % 55; ma[ii]=mk; mk=mj-mk; if (mk < int(MZ)) mk += MBIG; mj=ma[ii]; } for (k=0;k<4;k++) for (i=1;i<=55;i++) { ma[i] -= ma[1+(i+30) % 55]; if (ma[i] < int(MZ)) ma[i] += MBIG; } inext=0; inextp=31; idum=1; } if (++inext == 56) inext=1; if (++inextp == 56) inextp=1; mj=ma[inext]-ma[inextp]; if (mj < int(MZ)) mj += MBIG; ma[inext]=mj; return mj*FAC; } double LinearAlgebra::ran4(int &idum) { #if defined(vax) || defined(_vax_) || defined(__vax__) || defined(VAX) static const unsigned long jflone = 0x00004080; static const unsigned long jflmsk = 0xffff007f; #else static const unsigned long jflone = 0x3f800000; static const unsigned long jflmsk = 0x007fffff; #endif unsigned long irword,itemp,lword; static int idums = 0; if (idum < 0) { idums = -idum; idum=1; } irword=idum; lword=idums; psdes(lword,irword); itemp=jflone | (jflmsk & irword); ++idum; return (*(float *)&itemp)-1.0; } void LinearAlgebra::psdes(unsigned long &lword, unsigned long &irword) { const int NITER=4; static const unsigned long c1[NITER]={ 0xbaa96887L, 0x1e17d32cL, 0x03bcdc3cL, 
0x0f33d1b2L};
    static const unsigned long c2[NITER]={
        0x4b0f3b58L, 0xe874f0c3L, 0x6955c5a6L, 0x55a7ca46L};
    unsigned long i,ia,ib,iswap,itmph=0,itmpl=0;

    for (i=0;i<NITER;i++) {
        ia=(iswap=irword) ^ c1[i];
        itmpl = ia & 0xffff;
        itmph = ia >> 16;
        ib=itmpl*itmpl+ ~(itmph*itmph);
        irword=lword ^ (((ia = (ib >> 16) | ((ib & 0xffff) << 16)) ^ c2[i])+itmpl*itmph);
        lword=iswap;
    }
}
/*********************************************************************************************************************************/
double LinearAlgebra::calcKendallSig(double n, double r){
    try {
        double sig = 0.0;
        double svar=(4.0*n+10.0)/(9.0*n*(n-1.0));
        double z= r/sqrt(svar);
        sig=erfcc(fabs(z)/1.4142136);

        if (isnan(sig) || isinf(sig)) { sig = 0.0; }

        return sig;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "calcKendallSig");
        exit(1);
    }
}
/*********************************************************************************************************************************/
double LinearAlgebra::calcKruskalWallis(vector<spearmanRank>& values, double& pValue){
    try {
        double H;
        set<string> treatments;

        //rank values
        sort(values.begin(), values.end(), compareSpearman);
        vector<spearmanRank*> ties;
        int rankTotal = 0;
        vector<int> TIES;
        for (int j = 0; j < values.size(); j++) {
            treatments.insert(values[j].name);
            rankTotal += (j+1);
            ties.push_back(&(values[j]));

            if (j != values.size()-1) { // you are not the last so you can look ahead
                if (values[j].score != values[j+1].score) { // you are done with ties, rank them and continue
                    if (ties.size() > 1) { TIES.push_back(ties.size()); }
                    for (int k = 0; k < ties.size(); k++) {
                        double thisrank = rankTotal / (double) ties.size();
                        (*ties[k]).score = thisrank;
                    }
                    ties.clear();
                    rankTotal = 0;
                }
            }else { // you are the last one
                if (ties.size() > 1) { TIES.push_back(ties.size()); }
                for (int k = 0; k < ties.size(); k++) {
                    double thisrank = rankTotal / (double) ties.size();
                    (*ties[k]).score = thisrank;
                }
            }
        }

        // H = 12/(N*(N+1)) * (sum Ti^2/n) - 3(N+1)
        map<string, double> sums;
        map<string, double> counts;
        for (set<string>::iterator it = treatments.begin(); it != treatments.end(); it++) { sums[*it] = 0.0; counts[*it] = 0; }

        for
(int j = 0; j < values.size(); j++) {
            sums[values[j].name] += values[j].score;
            counts[values[j].name]+= 1.0;
        }

        double middleTerm = 0.0;
        for (set<string>::iterator it = treatments.begin(); it != treatments.end(); it++) {
            middleTerm += ((sums[*it]*sums[*it])/counts[*it]);
        }

        double firstTerm = 12 / (double) (values.size()*(values.size()+1));
        double lastTerm = 3 * (values.size()+1);

        H = firstTerm * middleTerm - lastTerm;

        //adjust for ties
        if (TIES.size() != 0) {
            double sum = 0.0;
            for (int j = 0; j < TIES.size(); j++) { sum += ((TIES[j]*TIES[j]*TIES[j])-TIES[j]); }
            double result = 1.0 - (sum / (double) ((values.size()*values.size()*values.size())-values.size()));
            H /= result;
        }

        if (isnan(H) || isinf(H)) { H = 0; }

        //Numerical Recipes pg221
        pValue = 1.0 - (gammp(((treatments.size()-1)/(double)2.0), H/2.0));

        return H;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "calcKruskalWallis");
        exit(1);
    }
}
/*********************************************************************************************************************************/
double LinearAlgebra::normalvariate(double mean, double standardDeviation) {
    try {
        double u1 = ((double)(rand()) + 1.0 )/( (double)(RAND_MAX) + 1.0);
        double u2 = ((double)(rand()) + 1.0 )/( (double)(RAND_MAX) + 1.0);
        //double r = sqrt( -2.0*log(u1) );
        //double theta = 2.0*PI*u2;
        //cout << cos(8.*atan(1.)*u2)*sqrt(-2.*log(u1)) << endl;

        //Box-Muller transform, with 8*atan(1) = 2*PI.
        //Note: mean and standardDeviation are not applied here, so the result is a standard normal deviate.
        return cos(8.*atan(1.)*u2)*sqrt(-2.*log(u1));
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "normalvariate");
        exit(1);
    }
}
/*********************************************************************************************************************************/
//thanks http://www.johndcook.com/cpp_phi.html
double LinearAlgebra::pnorm(double x){
    try {
        // constants
        double a1 =  0.254829592;
        double a2 = -0.284496736;
        double a3 =  1.421413741;
        double a4 = -1.453152027;
        double a5 =  1.061405429;
        double p  =  0.3275911;

        // Save the sign of x
        int sign = 1;
        if (x < 0)
            sign = -1;
        x = fabs(x)/sqrt(2.0);

        // A&S formula 7.1.26
        double t = 1.0/(1.0 + p*x);
        double y = 1.0 - (((((a5*t + a4)*t) + a3)*t + a2)*t + a1)*t*exp(-x*x);

        return 0.5*(1.0 + sign*y);
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "pnorm");
        exit(1);
    }
}
/*********************************************************************************************************************************/
double LinearAlgebra::calcWilcoxon(vector<double>& x, vector<double>& y, double& sig){
    try {
        double W = 0.0;
        sig = 0.0;

        vector<spearmanRank> ranks;
        for (int i = 0; i < x.size(); i++) {
            if (m->control_pressed) { return W; }
            spearmanRank member("x", x[i]);
            ranks.push_back(member);
        }

        for (int i = 0; i < y.size(); i++) {
            if (m->control_pressed) { return W; }
            spearmanRank member("y", y[i]);
            ranks.push_back(member);
        }

        //sort values
        sort(ranks.begin(), ranks.end(), compareSpearman);

        //convert scores to ranks of x
        vector<spearmanRank*> ties;
        int rankTotal = 0;
        vector<int> TIES;
        for (int j = 0; j < ranks.size(); j++) {
            if (m->control_pressed) { return W; }
            rankTotal += (j+1);
            ties.push_back(&(ranks[j]));

            if (j != ranks.size()-1) { // you are not the last so you can look ahead
                if (ranks[j].score != ranks[j+1].score) { // you are done with ties, rank them and continue
                    if (ties.size() > 1) { TIES.push_back(ties.size()); }
                    for (int k = 0; k < ties.size(); k++) {
                        float thisrank = rankTotal / (float) ties.size();
                        (*ties[k]).score = thisrank;
                    }
                    ties.clear();
                    rankTotal = 0;
                }
            }else { // you are the last one
                if (ties.size() > 1) { TIES.push_back(ties.size()); }
                for (int k = 0; k < ties.size(); k++) {
                    float thisrank = rankTotal / (float) ties.size();
                    (*ties[k]).score = thisrank;
                }
            }
        }

        //from R wilcox.test function
        //STATISTIC <- sum(r[seq_along(x)]) - n.x * (n.x + 1)/2
        double sumRanks = 0.0;
        for (int i = 0; i < ranks.size(); i++) {
            if (m->control_pressed) { return W; }
            if (ranks[i].name == "x") { sumRanks += ranks[i].score; }
        }

        W = sumRanks - x.size() * ((double)(x.size() + 1)) / 2.0;

        //exact <- (n.x < 50) && (n.y < 50)
        bool findExact = false;
        if ((x.size() < 50) && (y.size() < 50)) { findExact = true;
        }

        if (findExact && (TIES.size() == 0)) { //find exact and no ties
            //PVAL <- switch(alternative, two.sided = {
            //p <- if (STATISTIC > (n.x * n.y/2))
            PWilcox wilcox;
            double pval = 0.0;
            if (W > ((double)x.size()*y.size()/2.0)) {
                //pwilcox(STATISTIC-1, n.x, n.y, lower.tail = FALSE)
                pval = wilcox.pwilcox(W-1, x.size(), y.size(), false);
            }else {
                //pwilcox(STATISTIC,n.x, n.y)
                pval = wilcox.pwilcox(W, x.size(), y.size(), true);
            }
            sig = 2.0 * pval;
            if (1.0 < sig) { sig = 1.0; }
        }else {
            //z <- STATISTIC - n.x * n.y/2
            double z = W - (double)(x.size() * y.size()/2.0);

            //NTIES <- table(r)
            double sum = 0.0;
            for (int j = 0; j < TIES.size(); j++) { sum += ((TIES[j]*TIES[j]*TIES[j])-TIES[j]); }

            //SIGMA <- sqrt((n.x * n.y/12) * ((n.x + n.y + 1) -
            //sum(NTIES^3 - NTIES)/((n.x + n.y) * (n.x + n.y -
            //1))))
            double sigma = 0.0;
            double firstTerm = (double)(x.size() * y.size()/12.0);
            double secondTerm = (double)(x.size() + y.size() + 1) - sum / (double)((x.size() + y.size()) * (x.size() + y.size() - 1));
            sigma = sqrt(firstTerm * secondTerm);

            //CORRECTION <- switch(alternative, two.sided = sign(z) * 0.5, greater = 0.5, less = -0.5)
            double CORRECTION = 0.0;
            if (z < 0) { CORRECTION = -1.0; }
            else if (z > 0) { CORRECTION = 1.0; }
            CORRECTION *= 0.5;

            z = (z - CORRECTION)/sigma;

            //PVAL <- switch(alternative, two.sided = 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE)))
            sig = pnorm(z);
            if ((1.0-sig) < sig) { sig = 1.0 - sig; }
            sig *= 2;
        }

        return W;
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "calcWilcoxon");
        exit(1);
    }
}
/*********************************************************************************************************************************/
double LinearAlgebra::choose(double n, double k){
    try {
        n = floor(n + 0.5);
        k = floor(k + 0.5);

        double lchoose = gammln(n + 1.0) - gammln(k + 1.0) - gammln(n - k + 1.0);

        return (floor(exp(lchoose) + 0.5));
    }
    catch(exception& e) {
        m->errorOut(e, "LinearAlgebra", "choose");
        exit(1);
    }
}
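/* Worked example for choose() above (illustrative only, not part of the original source).
 * It assumes gammln(x) returns ln(Gamma(x)), as its name suggests, so that gammln(n + 1) = ln(n!).
 * With n = 5 and k = 2:
 *     lchoose = gammln(6) - gammln(3) - gammln(4)
 *             = ln(120) - ln(2) - ln(6)
 *             = ln(10)
 * so floor(exp(lchoose) + 0.5) = 10, the binomial coefficient C(5,2).
 * The floor(... + 0.5) step rounds away the floating point error introduced by exp(). */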
/*********************************************************************************************************************************/ double LinearAlgebra::calcSpearman(vector& x, vector& y, double& sig){ try { if (x.size() != y.size()) { m->mothurOut("[ERROR]: vector size mismatch."); m->mothurOutEndLine(); return 0.0; } //format data double sf = 0.0; //f^3 - f where f is the number of ties in x; double sg = 0.0; //f^3 - f where f is the number of ties in y; map tableX; map::iterator itTable; vector xscores; for (int i = 0; i < x.size(); i++) { spearmanRank member(toString(i), x[i]); xscores.push_back(member); //count number of repeats itTable = tableX.find(x[i]); if (itTable == tableX.end()) { tableX[x[i]] = 1; }else { tableX[x[i]]++; } } //calc LX double Lx = 0.0; for (itTable = tableX.begin(); itTable != tableX.end(); itTable++) { double tx = (double) itTable->second; Lx += ((pow(tx, 3.0) - tx) / 12.0); } //sort x sort(xscores.begin(), xscores.end(), compareSpearman); //convert scores to ranks of x //convert to ranks map rankx; vector xties; int rankTotal = 0; for (int j = 0; j < xscores.size(); j++) { rankTotal += (j+1); xties.push_back(xscores[j]); if (j != xscores.size()-1) { // you are not the last so you can look ahead if (xscores[j].score != xscores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < xties.size(); k++) { float thisrank = rankTotal / (float) xties.size(); rankx[xties[k].name] = thisrank; } int t = xties.size(); sf += (t*t*t-t); xties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < xties.size(); k++) { float thisrank = rankTotal / (float) xties.size(); rankx[xties[k].name] = thisrank; } } } //format x vector yscores; map tableY; for (int j = 0; j < y.size(); j++) { spearmanRank member(toString(j), y[j]); yscores.push_back(member); itTable = tableY.find(member.score); if (itTable == tableY.end()) { tableY[member.score] = 1; }else { tableY[member.score]++; } } //calc Ly double Ly = 
0.0; for (itTable = tableY.begin(); itTable != tableY.end(); itTable++) { double ty = (double) itTable->second; Ly += ((pow(ty, 3.0) - ty) / 12.0); } sort(yscores.begin(), yscores.end(), compareSpearman); //convert to ranks map rank; vector yties; rankTotal = 0; for (int j = 0; j < yscores.size(); j++) { rankTotal += (j+1); yties.push_back(yscores[j]); if (j != yscores.size()-1) { // you are not the last so you can look ahead if (yscores[j].score != yscores[j+1].score) { // you are done with ties, rank them and continue for (int k = 0; k < yties.size(); k++) { float thisrank = rankTotal / (float) yties.size(); rank[yties[k].name] = thisrank; } int t = yties.size(); sg += (t*t*t-t); yties.clear(); rankTotal = 0; } }else { // you are the last one for (int k = 0; k < yties.size(); k++) { float thisrank = rankTotal / (float) yties.size(); rank[yties[k].name] = thisrank; } } } double di = 0.0; for (int k = 0; k < x.size(); k++) { float xi = rankx[toString(k)]; float yi = rank[toString(k)]; di += ((xi - yi) * (xi - yi)); } double p = 0.0; double n = (double) x.size(); double SX2 = ((pow(n, 3.0) - n) / 12.0) - Lx; double SY2 = ((pow(n, 3.0) - n) / 12.0) - Ly; p = (SX2 + SY2 - di) / (2.0 * sqrt((SX2*SY2))); //Numerical Recipes 646 sig = calcSpearmanSig(n, sf, sg, di); return p; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcSpearman"); exit(1); } } /*********************************************************************************************************************************/ double LinearAlgebra::calcSpearmanSig(double n, double sf, double sg, double d){ try { double sig = 0.0; double probrs = 0.0; double en=n; double en3n=en*en*en-en; double aved=en3n/6.0-(sf+sg)/12.0; double fac=(1.0-sf/en3n)*(1.0-sg/en3n); double vard=((en-1.0)*en*en*SQR(en+1.0)/36.0)*fac; double zd=(d-aved)/sqrt(vard); double probd=erfcc(fabs(zd)/1.4142136); double rs=(1.0-(6.0/en3n)*(d+(sf+sg)/12.0))/sqrt(fac); fac=(rs+1.0)*(1.0-rs); if (fac > 0.0) { double t=rs*sqrt((en-2.0)/fac); 
double df=en-2.0; probrs=betai(0.5*df,0.5,df/(df+t*t)); }else { probrs = 0.0; } //smaller of probd and probrs is sig sig = probrs; if (probd < probrs) { sig = probd; } if (isnan(sig) || isinf(sig)) { sig = 0.0; } return sig; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcSpearmanSig"); exit(1); } } /*********************************************************************************************************************************/ double LinearAlgebra::calcPearson(vector& x, vector& y, double& sig){ try { if (x.size() != y.size()) { m->mothurOut("[ERROR]: vector size mismatch."); m->mothurOutEndLine(); return 0.0; } //find average X float averageX = 0.0; for (int i = 0; i < x.size(); i++) { averageX += x[i]; } averageX = averageX / (float) x.size(); //find average Y float sumY = 0.0; for (int j = 0; j < y.size(); j++) { sumY += y[j]; } float Ybar = sumY / (float) y.size(); double r = 0.0; double numerator = 0.0; double denomTerm1 = 0.0; double denomTerm2 = 0.0; for (int j = 0; j < x.size(); j++) { float Yi = y[j]; float Xi = x[j]; numerator += ((Xi - averageX) * (Yi - Ybar)); denomTerm1 += ((Xi - averageX) * (Xi - averageX)); denomTerm2 += ((Yi - Ybar) * (Yi - Ybar)); } double denom = (sqrt(denomTerm1) * sqrt(denomTerm2)); r = numerator / denom; //Numerical Recipes pg.644 sig = calcPearsonSig(x.size(), r); return r; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcPearson"); exit(1); } } /*********************************************************************************************************************************/ double LinearAlgebra::calcPearsonSig(double n, double r){ try { double sig = 0.0; const double TINY = 1.0e-20; double z = 0.5*log((1.0+r+TINY)/(1.0-r+TINY)); //Fisher's z transformation //code below was giving an error in betacf with sop files //int df = n-2; //double t = r*sqrt(df/((1.0-r+TINY)*(1.0+r+TINY))); //sig = betai(0.5+df, 0.5, df/(df+t*t)); //Numerical Recipes says code below gives approximately the same result sig = 
erfcc(fabs(z*sqrt(n-1.0))/1.4142136); if (isnan(sig) || isinf(sig)) { sig = 0.0; } return sig; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "calcPearsonSig"); exit(1); } } /*********************************************************************************************************************************/ vector > LinearAlgebra::getObservedEuclideanDistance(vector >& relAbundData){ try { int numSamples = relAbundData.size(); int numOTUs = relAbundData[0].size(); vector > dMatrix(numSamples); for(int i=0;icontrol_pressed) { return dMatrix; } double d = 0; for(int k=0;kerrorOut(e, "LinearAlgebra", "getObservedEuclideanDistance"); exit(1); } } /*********************************************************************************************************************************/ vector LinearAlgebra::solveEquations(vector > A, vector b){ try { int length = (int)b.size(); vector x(length, 0); vector index(length); for(int i=0;icontrol_pressed) { return b; } lubksb(A, index, b); return b; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "solveEquations"); exit(1); } } /*********************************************************************************************************************************/ vector LinearAlgebra::solveEquations(vector > A, vector b){ try { int length = (int)b.size(); vector x(length, 0); vector index(length); for(int i=0;icontrol_pressed) { return b; } lubksb(A, index, b); return b; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "solveEquations"); exit(1); } } /*********************************************************************************************************************************/ void LinearAlgebra::ludcmp(vector >& A, vector& index, double& d){ try { double tiny = 1e-20; int n = (int)A.size(); vector vv(n, 0.0); double temp; int imax; d = 1.0; for(int i=0;i big ) big=temp; } if(big==0.0){ m->mothurOut("Singular matrix in routine ludcmp\n"); } vv[i] = 1.0/big; } for(int j=0;jcontrol_pressed) { break; } for(int i=0;i= 
big){ big = dum; imax = i; } } if(j != imax){ for(int k=0;kerrorOut(e, "LinearAlgebra", "ludcmp"); exit(1); } } /*********************************************************************************************************************************/ void LinearAlgebra::lubksb(vector >& A, vector& index, vector& b){ try { //if(m->debug){ m->mothurOut("lubksb\n"); } double total; int n = (int)A.size(); int ii = 0; for(int i=0;idebug){ m->mothurOut("i loop " + toString(i) + "\n"); } if (m->control_pressed) { break; } int ip = index[i]; total = b[ip]; b[ip] = b[i]; if (ii != 0) { for(int j=ii-1;jdebug){ m->mothurOut("j loop " + toString(j) + "\n"); } total -= A[i][j] * b[j]; } } else if(total != 0){ ii = i+1; } b[i] = total; } for(int i=n-1;i>=0;i--){ //if(m->debug){ m->mothurOut("i loop " + toString(i) + "\n"); } total = b[i]; for(int j=i+1;jdebug){ m->mothurOut("end lubksb\n"); } } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "lubksb"); exit(1); } } /*********************************************************************************************************************************/ void LinearAlgebra::ludcmp(vector >& A, vector& index, float& d){ try { //if(m->debug){ m->mothurOut("ludcmp\n"); } double tiny = 1e-20; int n = (int)A.size(); vector vv(n, 0.0); double temp; int imax; d = 1.0; for(int i=0;idebug){ m->mothurOut("i loop " + toString(i) + "\n"); } float big = 0.0; for(int j=0;j big ) big=temp; } if(big==0.0){ m->mothurOut("Singular matrix in routine ludcmp\n"); } vv[i] = 1.0/big; } for(int j=0;jcontrol_pressed) { break; } //if(m->debug){ m->mothurOut("j loop " + toString(j) + "\n"); } for(int i=0;idebug){ m->mothurOut("i loop " + toString(i) + "\n"); } float sum = A[i][j]; for(int k=0;kdebug){ m->mothurOut("j loop " + toString(j) + "\n"); } float sum = A[i][j]; for(int k=0;k= big){ big = dum; imax = i; } } if(j != imax){ for(int k=0;kdebug){ m->mothurOut("end ludcmp\n"); } } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "ludcmp"); exit(1); } } 
/*********************************************************************************************************************************/ void LinearAlgebra::lubksb(vector >& A, vector& index, vector& b){ try { float total; int n = (int)A.size(); int ii = 0; for(int i=0;icontrol_pressed) { break; } int ip = index[i]; total = b[ip]; b[ip] = b[i]; if (ii != 0) { for(int j=ii-1;j=0;i--){ total = b[i]; for(int j=i+1;jerrorOut(e, "LinearAlgebra", "lubksb"); exit(1); } } /*********************************************************************************************************************************/ vector > LinearAlgebra::getInverse(vector > matrix){ try { int n = (int)matrix.size(); vector > inverse(n); for(int i=0;i column(n, 0.0000); vector index(n, 0); double dummy; ludcmp(matrix, index, dummy); for(int j=0;jcontrol_pressed) { break; } column.assign(n, 0); column[j] = 1.0000; lubksb(matrix, index, column); for(int i=0;ierrorOut(e, "LinearAlgebra", "getInverse"); exit(1); } } /*********************************************************************************************************************************/ //modelled R lda function - MASS:::lda.default vector< vector > LinearAlgebra::lda(vector< vector >& a, vector groups, vector< vector >& means, bool& ignore) { try { set uniqueGroups; for (int i = 0; i < groups.size(); i++) { uniqueGroups.insert(groups[i]); } int numGroups = uniqueGroups.size(); map quickIndex; //className to index. hoping to save counts, proportions and means in vectors to save time. This map will allow us to know index 0 in counts refers to group1. 
int count = 0; for (set::iterator it = uniqueGroups.begin(); it != uniqueGroups.end(); it++) { quickIndex[*it] = count; count++; } int numSampled = groups.size(); //number of sampled groups int numOtus = a.size(); //number of flagged bins //counts <- as.vector(table(g)) //number of samples from each class in random sampling vector counts; counts.resize(numGroups, 0); for (int i = 0; i < groups.size(); i++) { counts[quickIndex[groups[i]]]++; } vector proportions; proportions.resize(numGroups, 0.0); for (int i = 0; i < numGroups; i++) { proportions[i] = counts[i] / (double) numSampled; } means.clear(); //means[0] -> means[0][0] average for [group0][OTU0]. means.resize(numGroups); for (int i = 0; i < means.size(); i++) { means[i].resize(numOtus, 0.0); } for (int j = 0; j < numSampled; j++) { //total for each class for each OTU for (int i = 0; i < numOtus; i++) { means[quickIndex[groups[j]]][i] += a[i][j]; } } //average for each class for each OTU for (int j = 0; j < numGroups; j++) { for (int i = 0; i < numOtus; i++) { means[j][i] /= counts[j]; } } //randCov <- x - group.means[g, ] vector< vector > randCov; //randCov[0][0] -> (random sample value0 for OTU0 - average for samples group in OTU0). example OTU0, random sample 0.01 from class early. average of class early for OTU0 is 0.005. 
randCov[0][0] = (0.01-0.005) for (int i = 0; i < numOtus; i++) { //for each flagged OTU vector tempRand; for (int j = 0; j < numSampled; j++) { tempRand.push_back(a[i][j] - means[quickIndex[groups[j]]][i]); } randCov.push_back(tempRand); } //find variance and std for each OTU //f1 <- sqrt(diag(var(x - group.means[g, ]))) vector stdF1; vector ave; for (int i = 0; i < numOtus; i++) { stdF1.push_back(0.0); ave.push_back(m->getAverage(randCov[i])); } for (int i = 0; i < numOtus; i++) { for (int j = 0; j < numSampled; j++) { stdF1[i] += ((randCov[i][j] - ave[i]) * (randCov[i][j] - ave[i])); } } //fac <- 1/(n - ng) double fac = 1 / (double) (numSampled-numGroups); for (int i = 0; i < stdF1.size(); i++) { stdF1[i] /= (double) (numSampled-1); stdF1[i] = sqrt(stdF1[i]); } vector< vector > scaling; //[numOTUS][numOTUS] for (int i = 0; i < numOtus; i++) { vector temp; for (int j = 0; j < numOtus; j++) { if (i == j) { temp.push_back(1.0/stdF1[i]); } else { temp.push_back(0.0); } } scaling.push_back(temp); } /* cout << "scaling = " << endl; for (int i = 0; i < scaling.size(); i++) { for (int j = 0; j < scaling[i].size(); j++) { cout << scaling[i][j] << '\t'; } cout << endl; }*/ //X <- sqrt(fac) * ((x - group.means[g, ]) %*% scaling) vector< vector > X = randCov; //[numOTUS][numSampled] //((x - group.means[g, ]) %*% scaling) //matrix multiplication of randCov and scaling LinearAlgebra linear; X = linear.matrix_mult(scaling, randCov); //[numOTUS][numOTUS] * [numOTUS][numSampled] = [numOTUS][numSampled] fac = sqrt(fac); for (int i = 0; i < X.size(); i++) { for (int j = 0; j < X[i].size(); j++) { X[i][j] *= fac; } } vector d; vector< vector > v; vector< vector > Xcopy; //X = [numOTUS][numSampled] bool transpose = false; //svd requires rows < columns, so if they are not then I need to transpose and look for the results in v. 
if (X.size() < X[0].size()) { Xcopy = linear.transpose(X); transpose=true; } else { Xcopy = X; } linear.svd(Xcopy, d, v); //Xcopy gets the results we want for v below, because R's version is [numSampled][numOTUS] /*cout << "Xcopy = " << endl; for (int i = 0; i < Xcopy.size(); i++) { for (int j = 0; j < Xcopy[i].size(); j++) { cout << Xcopy[i][j] << '\t'; } cout << endl; } cout << "v = " << endl; for (int i = 0; i < v.size(); i++) { for (int j = 0; j < v[i].size(); j++) { cout << v[i][j] << '\t'; } cout << endl; } */ int rank = 0; set goodColumns; //cout << "d = " << endl; for (int i = 0; i < d.size(); i++) { if (d[i] > 0.0000000001) { rank++; goodColumns.insert(i); } } //cout << d[i] << endl; if (rank == 0) { ignore=true; //m->mothurOut("[ERROR]: rank = 0: variables are numerically const\n"); m->control_pressed = true; return scaling; } //scaling <- scaling %*% X.s$v[, 1L:rank] %*% diag(1/X.s$d[1L:rank], , rank) //X.s$v[, 1L:rank] = columns in Xcopy that correspond to "good" d values //diag(1/X.s$d[1L:rank], , rank) = matrix size rank * rank where the diagonal is 1/"good" dvalues /*example: d [1] 3.721545e+00 3.034607e+00 2.296649e+00 7.986927e-16 6.922408e-16 [6] 5.471102e-16 $v [,1] [,2] [,3] [,4] [,5] [,6] [1,] 0.31122175 0.10944725 0.20183340 -0.30136820 0.60786235 -0.13537095 [2,] -0.29563726 -0.20568893 0.11233366 -0.05073289 0.48234270 0.21965978 ... [1] "X.s$v[, 1L:rank]" [,1] [,2] [,3] [1,] 0.31122175 0.10944725 0.20183340 [2,] -0.29563726 -0.20568893 0.11233366 ... 
[1] "1/X.s$d[1L:rank]" [1] 0.2687056 0.3295320 0.4354170 [1] "diag(1/X.s$d[1L:rank], , rank)" [,1] [,2] [,3] [1,] 0.2687056 0.000000 0.000000 [2,] 0.0000000 0.329532 0.000000 [3,] 0.0000000 0.000000 0.435417 */ if (transpose) { Xcopy = linear.transpose(v); /* cout << "Xcopy = " << endl; for (int i = 0; i < Xcopy.size(); i++) { for (int j = 0; j < Xcopy[i].size(); j++) { cout << Xcopy[i][j] << '\t'; } cout << endl; }*/ } v.clear(); //store "good" columns - X.s$v[, 1L:rank] v.resize(Xcopy.size()); //[numOTUS]["good" columns] for (set::iterator it = goodColumns.begin(); it != goodColumns.end(); it++) { for (int i = 0; i < Xcopy.size(); i++) { v[i].push_back(Xcopy[i][*it]); } } vector< vector > diagRanks; diagRanks.resize(rank); for (int i = 0; i < rank; i++) { diagRanks[i].resize(rank, 0.0); } count = 0; for (set::iterator it = goodColumns.begin(); it != goodColumns.end(); it++) { diagRanks[count][count] = 1.0 / d[*it]; count++; } scaling = linear.matrix_mult(linear.matrix_mult(scaling, v), diagRanks); //([numOTUS][numOTUS]*[numOTUS]["good" columns]) = [numOTUS]["good" columns] then ([numOTUS]["good" columns] * ["good" columns]["good" columns] = scaling = [numOTUS]["good" columns] /*cout << "scaling = " << endl; for (int i = 0; i < scaling.size(); i++) { for (int j = 0; j < scaling[i].size(); j++) { cout << scaling[i][j] << '\t'; } cout << endl; }*/ //Note: linear.matrix_mult [1][numGroups] * [numGroups][numOTUs] - columns in first must match rows in second, returns matrix[1][numOTUs] vector< vector > prior; prior.push_back(proportions); vector< vector > xbar = linear.matrix_mult(prior, means); vector xBar = xbar[0]; //length numOTUs /*cout << "xbar" << endl; for (int j = 0; j < numOtus; j++) { cout << xBar[j] <<'\t'; }cout << endl;*/ //fac <- 1/(ng - 1) fac = 1 / (double) (numGroups-1); //scale(group.means, center = xbar, scale = FALSE) %*% scaling vector< vector > scaledMeans = means; //[numGroups][numOTUs] for (int i = 0; i < numGroups; i++) { for (int j = 0; j < 
numOtus; j++) { scaledMeans[i][j] -= xBar[j]; } } scaledMeans = linear.matrix_mult(scaledMeans, scaling); //[numGroups][numOTUS]*[numOTUS]["good"columns] = [numGroups]["good"columns] //sqrt((n * prior) * fac) vector temp = proportions; //[numGroups] for (int i = 0; i < temp.size(); i++) { temp[i] *= numSampled * fac; temp[i] = sqrt(temp[i]); } //X <- sqrt((n * prior) * fac) * (scale(group.means, center = xbar, scale = FALSE) %*% scaling) //X <- temp * scaledMeans X.clear(); X = scaledMeans; //[numGroups]["good"columns] for (int i = 0; i < X.size(); i++) { for (int j = 0; j < X[i].size(); j++) { X[i][j] *= temp[j]; } } /* cout << "X = " << endl; for (int i = 0; i < X.size(); i++) { for (int j = 0; j < X[i].size(); j++) { cout << X[i][j] << '\t'; } cout << endl; } */ d.clear(); v.clear(); //we want to transpose so results are in Xcopy, but if that makes rows > columns then we don't since svd requires rows < cols. transpose=false; if (X.size() > X[0].size()) { Xcopy = X; transpose=true; } else { Xcopy = linear.transpose(X); } linear.svd(Xcopy, d, v); //Xcopy gets the results we want for v below /*cout << "Xcopy = " << endl; for (int i = 0; i < Xcopy.size(); i++) { for (int j = 0; j < Xcopy[i].size(); j++) { cout << Xcopy[i][j] << '\t'; } cout << endl; } cout << "v = " << endl; for (int i = 0; i < v.size(); i++) { for (int j = 0; j < v[i].size(); j++) { cout << v[i][j] << '\t'; } cout << endl; } cout << "d = " << endl; for (int i = 0; i < d.size(); i++) { cout << d[i] << endl; }*/ //rank <- sum(X.s$d > tol * X.s$d[1L]) //X.s$d[1L] = larger value in d vector double largeD = m->max(d); rank = 0; goodColumns.clear(); for (int i = 0; i < d.size(); i++) { if (d[i] > (0.0000000001*largeD)) { rank++; goodColumns.insert(i); } } if (rank == 0) { ignore=true;//m->mothurOut("[ERROR]: rank = 0: class means are numerically identical.\n"); m->control_pressed = true; return scaling; } if (transpose) { Xcopy = linear.transpose(v); } //scaling <- scaling %*% X.s$v[, 1L:rank] - scaling 
* "good" columns v.clear(); //store "good" columns - X.s$v[, 1L:rank] v.resize(Xcopy.size()); //Xcopy = ["good"columns][numGroups] for (set::iterator it = goodColumns.begin(); it != goodColumns.end(); it++) { for (int i = 0; i < Xcopy.size(); i++) { v[i].push_back(Xcopy[i][*it]); } } scaling = linear.matrix_mult(scaling, v); //[numOTUS]["good" columns] * ["good"columns][new "good" columns] /*cout << "scaling = " << endl; for (int i = 0; i < scaling.size(); i++) { for (int j = 0; j < scaling[i].size(); j++) { cout << scaling[i][j] << '\t'; } cout << endl; }*/ ignore=false; return scaling; } catch(exception& e) { m->errorOut(e, "LinearAlgebra", "lda"); exit(1); } } /*********************************************************************************************************************************/ //Singular value decomposition (SVD) - adapted from http://svn.lirec.eu/libs/magicsquares/src/SVD.cpp /* * svdcomp - SVD decomposition routine. * Takes an mxn matrix a and decomposes it into udv, where u,v are * left and right orthogonal transformation matrices, and d is a * diagonal matrix of singular values. * * This routine is adapted from svdecomp.c in XLISP-STAT 2.1 which is * code from Numerical Recipes adapted by Luke Tierney and David Betz. 
* * Input to dsvd is as follows: * a = mxn matrix to be decomposed, gets overwritten with u * m = row dimension of a * n = column dimension of a * w = returns the vector of singular values of a * v = returns the right orthogonal transformation matrix */ int LinearAlgebra::svd(vector< vector >& a, vector& w, vector< vector >& v) { try { int flag, i, its, j, jj, k, l, nm; double c, f, h, s, x, y, z; double anorm = 0.0, g = 0.0, scale = 0.0; int numRows = a.size(); if (numRows == 0) { return 0; } int numCols = a[0].size(); w.resize(numCols, 0.0); v.resize(numCols); for (int i = 0; i < numCols; i++) { v[i].resize(numRows, 0.0); } vector rv1; rv1.resize(numCols, 0.0); if (numRows < numCols){ m->mothurOut("[ERROR]: numRows < numCols\n"); m->control_pressed = true; return 0; } /* Householder reduction to bidiagonal form */ for (i = 0; i < numCols; i++) { /* left-hand reduction */ l = i + 1; rv1[i] = scale * g; g = s = scale = 0.0; if (i < numRows) { for (k = i; k < numRows; k++) scale += fabs((double)a[k][i]); if (scale) { for (k = i; k < numRows; k++) { a[k][i] = (double)((double)a[k][i]/scale); s += ((double)a[k][i] * (double)a[k][i]); } f = (double)a[i][i]; g = -SIGN(sqrt(s), f); h = f * g - s; a[i][i] = (double)(f - g); if (i != numCols - 1) { for (j = l; j < numCols; j++) { for (s = 0.0, k = i; k < numRows; k++) s += ((double)a[k][i] * (double)a[k][j]); f = s / h; for (k = i; k < numRows; k++) a[k][j] += (double)(f * (double)a[k][i]); } } for (k = i; k < numRows; k++) a[k][i] = (double)((double)a[k][i]*scale); } } w[i] = (double)(scale * g); /* right-hand reduction */ g = s = scale = 0.0; if (i < numRows && i != numCols - 1) { for (k = l; k < numCols; k++) scale += fabs((double)a[i][k]); if (scale) { for (k = l; k < numCols; k++) { a[i][k] = (double)((double)a[i][k]/scale); s += ((double)a[i][k] * (double)a[i][k]); } f = (double)a[i][l]; g = -SIGN(sqrt(s), f); h = f * g - s; a[i][l] = (double)(f - g); for (k = l; k < numCols; k++) rv1[k] = (double)a[i][k] / h; if (i 
!= numRows - 1) { for (j = l; j < numRows; j++) { for (s = 0.0, k = l; k < numCols; k++) s += ((double)a[j][k] * (double)a[i][k]); for (k = l; k < numCols; k++) a[j][k] += (double)(s * rv1[k]); } } for (k = l; k < numCols; k++) a[i][k] = (double)((double)a[i][k]*scale); } } anorm = max(anorm, (fabs((double)w[i]) + fabs(rv1[i]))); } /* accumulate the right-hand transformation */ for (i = numCols - 1; i >= 0; i--) { if (i < numCols - 1) { if (g) { for (j = l; j < numCols; j++) v[j][i] = (double)(((double)a[i][j] / (double)a[i][l]) / g); /* double division to avoid underflow */ for (j = l; j < numCols; j++) { for (s = 0.0, k = l; k < numCols; k++) s += ((double)a[i][k] * (double)v[k][j]); for (k = l; k < numCols; k++) v[k][j] += (double)(s * (double)v[k][i]); } } for (j = l; j < numCols; j++) v[i][j] = v[j][i] = 0.0; } v[i][i] = 1.0; g = rv1[i]; l = i; } /* accumulate the left-hand transformation */ for (i = numCols - 1; i >= 0; i--) { l = i + 1; g = (double)w[i]; if (i < numCols - 1) for (j = l; j < numCols; j++) a[i][j] = 0.0; if (g) { g = 1.0 / g; if (i != numCols - 1) { for (j = l; j < numCols; j++) { for (s = 0.0, k = l; k < numRows; k++) s += ((double)a[k][i] * (double)a[k][j]); f = (s / (double)a[i][i]) * g; for (k = i; k < numRows; k++) a[k][j] += (double)(f * (double)a[k][i]); } } for (j = i; j < numRows; j++) a[j][i] = (double)((double)a[j][i]*g); } else { for (j = i; j < numRows; j++) a[j][i] = 0.0; } ++a[i][i]; } /* diagonalize the bidiagonal form */ for (k = numCols - 1; k >= 0; k--) { /* loop over singular values */ for (its = 0; its < 30; its++) { /* loop over allowed iterations */ flag = 1; for (l = k; l >= 0; l--) { /* test for splitting */ nm = l - 1; if (fabs(rv1[l]) + anorm == anorm) { flag = 0; break; } if (fabs((double)w[nm]) + anorm == anorm) break; } if (flag) { c = 0.0; s = 1.0; for (i = l; i <= k; i++) { f = s * rv1[i]; if (fabs(f) + anorm != anorm) { g = (double)w[i]; h = pythag(f, g); w[i] = (double)h; h = 1.0 / h; c = g * h; s = (- f * h); 
							for (j = 0; j < numRows; j++) {
								y = (double)a[j][nm];
								z = (double)a[j][i];
								a[j][nm] = (double)(y * c + z * s);
								a[j][i] = (double)(z * c - y * s);
							}
						}
					}
				}
				z = (double)w[k];
				if (l == k) { /* convergence */
					if (z < 0.0) { /* make singular value nonnegative */
						w[k] = (double)(-z);
						for (j = 0; j < numCols; j++)
							v[j][k] = (-v[j][k]);
					}
					break;
				}
				/* its runs from 0 to 29, so the last allowed iteration is its == 29 */
				if (its == 29) {
					m->mothurOut("[ERROR]: No convergence after 30 iterations.\n");
					m->control_pressed = true;
					return(0);
				}

				/* shift from bottom 2 x 2 minor */
				x = (double)w[l];
				nm = k - 1;
				y = (double)w[nm];
				g = rv1[nm];
				h = rv1[k];
				f = ((y - z) * (y + z) + (g - h) * (g + h)) / (2.0 * h * y);
				g = pythag(f, 1.0);
				f = ((x - z) * (x + z) + h * ((y / (f + SIGN(g, f))) - h)) / x;

				/* next QR transformation */
				c = s = 1.0;
				for (j = l; j <= nm; j++) {
					i = j + 1;
					g = rv1[i];
					y = (double)w[i];
					h = s * g;
					g = c * g;
					z = pythag(f, h);
					rv1[j] = z;
					c = f / z;
					s = h / z;
					f = x * c + g * s;
					g = g * c - x * s;
					h = y * s;
					y = y * c;
					for (jj = 0; jj < numCols; jj++) {
						x = (double)v[jj][j];
						z = (double)v[jj][i];
						/* keep full double precision, as everywhere else in this routine */
						v[jj][j] = (double)(x * c + z * s);
						v[jj][i] = (double)(z * c - x * s);
					}
					z = pythag(f, h);
					w[j] = (double)z;
					if (z) {
						z = 1.0 / z;
						c = f * z;
						s = h * z;
					}
					f = (c * g) + (s * y);
					x = (c * y) - (s * g);
					for (jj = 0; jj < numRows; jj++) {
						y = (double)a[jj][j];
						z = (double)a[jj][i];
						a[jj][j] = (double)(y * c + z * s);
						a[jj][i] = (double)(z * c - y * s);
					}
				}
				rv1[l] = 0.0;
				rv1[k] = f;
				w[k] = (double)x;
			}
		}
		return(0);
	}
	catch(exception& e) {
		m->errorOut(e, "LinearAlgebra", "svd");
		exit(1);
	}
}
mothur-1.36.1/source/linearalgebra.h000066400000000000000000000072071255543666200174050ustar00rootroot00000000000000
#ifndef LINEARALGEBRA
#define LINEARALGEBRA

/*
 *  linearalgebra.h
 *  mothur
 *
 *  Created by westcott on 1/7/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "mothurout.h" class LinearAlgebra { public: LinearAlgebra() { m = MothurOut::getInstance(); } ~LinearAlgebra() {} vector > matrix_mult(vector >, vector >); vector >transpose(vector >); void recenter(double, vector >, vector >&); //eigenvectors int tred2(vector >&, vector&, vector&); int qtli(vector&, vector&, vector >&); vector< vector > calculateEuclidianDistance(vector >&, int); //pass in axes and number of dimensions vector< vector > calculateEuclidianDistance(vector >&); //pass in axes vector > getObservedEuclideanDistance(vector >&); double calcPearson(vector >&, vector >&); double calcSpearman(vector >&, vector >&); double calcKendall(vector >&, vector >&); double calcKruskalWallis(vector&, double&); double calcWilcoxon(vector&, vector&, double&); double calcPearson(vector&, vector&, double&); double calcSpearman(vector&, vector&, double&); double calcKendall(vector&, vector&, double&); double calcSpearmanSig(double, double, double, double); //length, f^3 - f where f is the number of ties in x, f^3 - f where f is the number of ties in y, sum of squared diffs in ranks. - designed to find the sif of one score. double calcPearsonSig(double, double); //length, coeff. double calcKendallSig(double, double); //length, coeff. vector solveEquations(vector >, vector); vector solveEquations(vector >, vector); vector > getInverse(vector >); double choose(double, double); double normalvariate(double mu, double sigma); vector< vector > lda(vector< vector >& a, vector groups, vector< vector >& means, bool&); //Linear discriminant analysis - a is [features][valuesFromGroups] groups indicates which group each sampling comes from. For example if groups = early, late, mid, early, early. a[0][0] = value for feature0 from groupEarly. 
int svd(vector< vector >& a, vector& w, vector< vector >& v); //Singular value decomposition private: MothurOut* m; double pythag(double, double); double betacf(const double, const double, const double); double betai(const double, const double, const double); double gammln(const double); //double gammq(const double, const double); double gser(double&, const double, const double, double&); double gcf(double&, const double, const double, double&); double erfcc(double); double gammp(const double, const double); double pnorm(double x); double ran0(int&); //for testing double ran1(int&); //for testing double ran2(int&); //for testing double ran3(int&); //for testing double ran4(int&); //for testing void psdes(unsigned long &, unsigned long &); //for testing void ludcmp(vector >&, vector&, double&); void lubksb(vector >&, vector&, vector&); void ludcmp(vector >&, vector&, float&); void lubksb(vector >&, vector&, vector&); }; #endif mothur-1.36.1/source/makefile000066400000000000000000000112131255543666200161340ustar00rootroot00000000000000################################################### # # Makefile for mothur # ################################################### # # Macros # USEMPI ?= no 64BIT_VERSION ?= yes USEREADLINE ?= yes USECOMPRESSION ?= no MOTHUR_FILES="\"Enter_your_default_path_here\"" BOOST_INCLUDE_DIR="/usr/local/include/" BOOST_LIBRARY_DIR="/usr/local/lib/" RELEASE_DATE = "\"4/22/2015\"" VERSION = "\"1.35.1\"" FORTAN_COMPILER = gfortran FORTRAN_FLAGS = # Optimize to level 3: CXXFLAGS += -O3 ifeq ($(strip $(64BIT_VERSION)),yes) #if you are using centos uncomment the following lines #CXX = g++44 #if you are a mac user use the following line TARGET_ARCH += -arch x86_64 #if you using cygwin to build Windows the following line #CXX = x86_64-w64-mingw32-g++ #CC = x86_64-w64-mingw32-g++ #FORTAN_COMPILER = x86_64-w64-mingw32-gfortran #TARGET_ARCH += -m64 -static #if you are a linux user use the following line #CXXFLAGS += -mtune=native -march=native CXXFLAGS += 
-DBIT_VERSION FORTRAN_FLAGS = -m64 endif CXXFLAGS += -DRELEASE_DATE=${RELEASE_DATE} -DVERSION=${VERSION} ifeq ($(strip $(MOTHUR_FILES)),"\"Enter_your_default_path_here\"") else CXXFLAGS += -DMOTHUR_FILES=${MOTHUR_FILES} endif # if you do not want to use the readline library, set this to no. # make sure you have the library installed ifeq ($(strip $(USEREADLINE)),yes) CXXFLAGS += -DUSE_READLINE LIBS = \ -lreadline\ -lncurses endif #statically link boost libraries LDFLAGS += -L ${BOOST_LIBRARY_DIR} CXXFLAGS += -I ${BOOST_INCLUDE_DIR} LIBS += \ ${BOOST_LIBRARY_DIR}libboost_atomic.a\ ${BOOST_LIBRARY_DIR}libboost_iostreams.a\ ${BOOST_LIBRARY_DIR}libboost_test_exec_monitor.a\ ${BOOST_LIBRARY_DIR}libboost_chrono.a\ ${BOOST_LIBRARY_DIR}libboost_locale.a\ ${BOOST_LIBRARY_DIR}libboost_prg_exec_monitor.a\ ${BOOST_LIBRARY_DIR}libboost_thread.a\ ${BOOST_LIBRARY_DIR}libboost_container.a\ ${BOOST_LIBRARY_DIR}libboost_log.a\ ${BOOST_LIBRARY_DIR}libboost_program_options.a\ ${BOOST_LIBRARY_DIR}libboost_timer.a\ ${BOOST_LIBRARY_DIR}libboost_context.a\ ${BOOST_LIBRARY_DIR}libboost_log_setup.a\ ${BOOST_LIBRARY_DIR}libboost_python.a\ ${BOOST_LIBRARY_DIR}libboost_unit_test_framework.a\ ${BOOST_LIBRARY_DIR}libboost_coroutine.a\ ${BOOST_LIBRARY_DIR}libboost_math_c99.a\ ${BOOST_LIBRARY_DIR}libboost_random.a\ ${BOOST_LIBRARY_DIR}libboost_wave.a\ ${BOOST_LIBRARY_DIR}libboost_date_time.a\ ${BOOST_LIBRARY_DIR}libboost_math_c99f.a\ ${BOOST_LIBRARY_DIR}libboost_math_tr1l.a\ ${BOOST_LIBRARY_DIR}libboost_math_c99l.a\ ${BOOST_LIBRARY_DIR}libboost_math_tr1.a\ ${BOOST_LIBRARY_DIR}libboost_math_tr1f.a\ ${BOOST_LIBRARY_DIR}libboost_regex.a\ ${BOOST_LIBRARY_DIR}libboost_wserialization.a\ ${BOOST_LIBRARY_DIR}libboost_exception.a\ ${BOOST_LIBRARY_DIR}libboost_serialization.a\ ${BOOST_LIBRARY_DIR}libboost_filesystem.a\ ${BOOST_LIBRARY_DIR}libboost_signals.a\ ${BOOST_LIBRARY_DIR}libboost_graph.a\ ${BOOST_LIBRARY_DIR}libboost_system.a\ ${BOOST_LIBRARY_DIR}zlib.a ifeq ($(strip $(USEMPI)),yes) CXX = mpic++ 
CXXFLAGS += -DUSE_MPI endif # if you want to enable reading and writing of compressed files, set to yes. # The default is no. this may only work on unix-like systems, not for windows. ifeq ($(strip $(USECOMPRESSION)),yes) CXXFLAGS += -DUSE_COMPRESSION endif # # INCLUDE directories for mothur # # VPATH=calculators:chimera:classifier:clearcut:commands:communitytype:datastructures:metastats:randomforest:read:svm skipUchime := uchime_src/ subdirs := $(filter-out $(skipUchime), $(wildcard */)) subDirIncludes = $(patsubst %, -I %, $(subdirs)) subDirLinking = $(patsubst %, -L%, $(subdirs)) CXXFLAGS += -I. $(subDirIncludes) LDFLAGS += $(subDirLinking) # # Get the list of all .cpp files, rename to .o files # OBJECTS=$(patsubst %.cpp,%.o,$(wildcard $(addsuffix *.cpp,$(subdirs)))) OBJECTS+=$(patsubst %.c,%.o,$(wildcard $(addsuffix *.c,$(subdirs)))) OBJECTS+=$(patsubst %.cpp,%.o,$(wildcard *.cpp)) OBJECTS+=$(patsubst %.c,%.o,$(wildcard *.c)) OBJECTS+=$(patsubst %.f,%.o,$(wildcard *.f)) mothur : fortranSource $(OBJECTS) uchime $(CXX) $(LDFLAGS) $(TARGET_ARCH) -o $@ $(OBJECTS) $(LIBS) strip mothur mv mothur .. uchime: cd uchime_src && ./mk && mv uchime .. && cd .. fortranSource: ${FORTAN_COMPILER} -c $(FORTRAN_FLAGS) *.f install : mothur uchime mv mothur .. mv uchime .. %.o : %.c %.h $(COMPILE.c) $(OUTPUT_OPTION) $< %.o : %.cpp %.h $(COMPILE.cpp) $(OUTPUT_OPTION) $< %.o : %.cpp %.hpp $(COMPILE.cpp) $(OUTPUT_OPTION) $< clean : @rm -f $(OBJECTS) @rm -f uchime mothur-1.36.1/source/metastats/000077500000000000000000000000001255543666200164435ustar00rootroot00000000000000mothur-1.36.1/source/metastats/mothurfisher.cpp000066400000000000000000000117531255543666200216750ustar00rootroot00000000000000/* * mothurfisher.cpp * Mothur * * Created by westcott on 7/8/11. * Copyright 2011 Schloss Lab. All rights reserved. 
* */ //translated to c++ using source code http://www.langsrud.com/stat/fisher.htm as a reference #include "mothurfisher.h" /***********************************************************/ double MothurFisher::fexact(double n11_, double n12_, double n21_, double n22_, string o) { try { sleft = 0.0; sright = 0.0; sless = 0.0; slarg = 0.0; otuLabel = o; if(n11_<0) n11_ *= -1; if(n12_<0) n12_ *= -1; if(n21_<0) n21_ *= -1; if(n22_<0) n22_ *= -1; double n1_ = n11_+n12_; double n_1 = n11_+n21_; double n = n11_ +n12_ +n21_ +n22_; if (m->debug) { m->mothurOut("[DEBUG]: fisher:fexact n11_, n1_, n_1, n " + toString(n11_) + " " + toString(n1_) + " " + toString(n_1) + " " + toString(n) + " \n"); } exact(n11_,n1_,n_1,n); double twotail = sleft+sright; if(twotail>1) twotail=1; double result = twotail; return result; }catch(exception& e) { m->errorOut(e, "MothurFisher", "fexact"); exit(1); } } /***********************************************************/ double MothurFisher::lngamm(double z) { // Reference: "Lanczos, C. 'A precision approximation // of the gamma function', J. SIAM Numer. Anal., B, 1, 86-96, 1964." 
// Translation of Alan Miller's FORTRAN-implementation // See http://lib.stat.cmu.edu/apstat/245 try { double x = 0; x += 0.1659470187408462e-06/(z+7); x += 0.9934937113930748e-05/(z+6); x -= 0.1385710331296526 /(z+5); x += 12.50734324009056 /(z+4); x -= 176.6150291498386 /(z+3); x += 771.3234287757674 /(z+2); x -= 1259.139216722289 /(z+1); x += 676.5203681218835 /(z); x += 0.9999999999995183; return(log(x)-5.58106146679532777-z+(z-0.5)*log(z+6.5)); }catch(exception& e) { m->errorOut(e, "MothurFisher", "lngamm"); exit(1); } } /***********************************************************/ double MothurFisher::lnfact(double n){ try { if(n <= 1) return(0); return(lngamm(n+1)); }catch(exception& e) { m->errorOut(e, "MothurFisher", "lnfact"); exit(1); } } /***********************************************************/ double MothurFisher::lnbico(double n, double k){ try { return(lnfact(n)-lnfact(k)-lnfact(n-k)); }catch(exception& e) { m->errorOut(e, "MothurFisher", "lnbico"); exit(1); } } /***********************************************************/ double MothurFisher::hyper_323(double n11, double n1_, double n_1, double n){ try { return(exp(lnbico(n1_,n11)+lnbico(n-n1_,n_1-n11)-lnbico(n,n_1))); }catch(exception& e) { m->errorOut(e, "MothurFisher", "hyper_323"); exit(1); } } /***********************************************************/ double MothurFisher::myhyper(double n11){ try { double hyper0Result = hyper0(n11,0,0,0); return hyper0Result; }catch(exception& e) { m->errorOut(e, "MothurFisher", "myhyper"); exit(1); } } /***********************************************************/ double MothurFisher::hyper0(double n11i, double n1_i, double n_1i, double ni) { try { if (!((n1_i != 0)&&(n_1i != 0)&&(ni != 0))) { if(!(((int)n11i % 10) == 0)){ if(n11i==sn11+1) { sprob *= ((sn1_-sn11)/(n11i))*((sn_1-sn11)/(n11i+sn-sn1_-sn_1)); sn11 = n11i; return sprob; } if(n11i==sn11-1) { sprob *= ((sn11)/(sn1_-n11i))*((sn11+sn-sn1_-sn_1)/(sn_1-n11i)); sn11 = n11i; return sprob; } } sn11 = 
n11i;
		}else{
			sn11 = n11i;
			sn1_=n1_i;
			sn_1=n_1i;
			sn=ni;
		}
		sprob = hyper_323(sn11,sn1_,sn_1,sn);
		return sprob;
	}catch(exception& e) {
		m->errorOut(e, "MothurFisher", "hyper0");
		exit(1);
	}
}
/***********************************************************/
double MothurFisher::exact(double n11, double n1_, double n_1, double n){
	try {
		// the tail bookkeeping here follows the reference implementation at http://www.langsrud.com/stat/fisher.htm
		double p,i,j,prob;
		double max=n1_;
		if(n_1<max) max=n_1;
		double min = n1_+n_1-n;
		if(min<0) min=0;
		if(min==max) {
			sless = 1;
			sright= 1;
			sleft = 1;
			slarg = 1;
			return 1;
		}
		prob=hyper0(n11,n1_,n_1,n);

		//accumulate the left tail
		sleft=0;
		p=myhyper(min);
		for(i=min+1; p<0.99999999*prob; i++) {
			sleft += p;
			p=myhyper(i);
			if (i > max) { m->mothurOut("[WARNING]: i value too high. Take a closer look at the pvalue for " + otuLabel + ".\n"); break; }
		}
		i--;
		if(p<1.00000001*prob) sleft += p;
		else i--;

		//accumulate the right tail
		sright=0;
		p=myhyper(max);
		for(j=max-1; p<0.99999999*prob; j--) {
			sright += p;
			p=myhyper(j);
			if (j < 0) { m->mothurOut("[WARNING]: j value too low. Take a closer look at the pvalue for " + otuLabel + ".\n"); break; }
		}
		j++;
		if(p<1.00000001*prob) sright += p;
		else j++;

		if(abs(i-n11)<abs(j-n11)) {
			sless = sleft;
			slarg = 1 - sleft + prob;
		}else{
			sless = 1 - sright + prob;
			slarg = sright;
		}
		return prob;
	}catch(exception& e) {
		m->errorOut(e, "MothurFisher", "exact");
		exit(1);
	}
}
/***********************************************************/
mothur-1.36.1/source/metastats/mothurfisher.h000066400000000000000000000013621255543666200213350ustar00rootroot00000000000000
#ifndef MOTHUR_FISHER
#define MOTHUR_FISHER

/*
 *  mothurfisher.h
 *  Mothur
 *
 *  Created by westcott on 7/8/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
 *
 */

#include "mothurout.h"

class MothurFisher {
public:
	MothurFisher(){otuLabel = ""; m = MothurOut::getInstance(); }
	~MothurFisher(){}

	double fexact(double, double, double, double, string);

private:
	MothurOut* m;
	double sleft, sright, sless, slarg;
	double sn11,sn1_,sn_1,sn,sprob;
	double lngamm(double);
	double lnfact(double);
	double lnbico(double, double);
	double hyper_323(double, double, double, double);
	double myhyper(double);
	double hyper0(double, double, double, double);
	double exact(double, double, double, double);
	string otuLabel;
};

#endif
mothur-1.36.1/source/metastats/mothurmetastats.cpp000066400000000000000000000444761255543666200224320ustar00rootroot00000000000000
/*
 *  mothurmetastats.cpp
 *  Mothur
 *
 *  Created by westcott on 7/6/11.
* Copyright 2011 Schloss Lab. All rights reserved. * */ #include "mothurmetastats.h" #include "mothurfisher.h" /***********************************************************/ MothurMetastats::MothurMetastats(double t, int n) { try { m = MothurOut::getInstance(); threshold = t; numPermutations = n; }catch(exception& e) { m->errorOut(e, "MothurMetastats", "MothurMetastats"); exit(1); } } /***********************************************************/ MothurMetastats::~MothurMetastats() {} /***********************************************************/ //main metastats function int MothurMetastats::runMetastats(string outputFileName, vector< vector >& data, int secGroupingStart) { try { row = data.size(); //numBins column = data[0].size(); //numGroups in subset secondGroupingStart = secGroupingStart; //g number of samples in group 1 vector< vector > Pmatrix; Pmatrix.resize(row); for (int i = 0; i < row; i++) { Pmatrix[i].resize(column, 0.0); } // the relative proportion matrix vector< vector > C1; C1.resize(row); for (int i = 0; i < row; i++) { C1[i].resize(3, 0.0); } // statistic profiles for class1 and class 2 vector< vector > C2; C2.resize(row); // mean[1], variance[2], standard error[3] for (int i = 0; i < row; i++) { C2[i].resize(3, 0.0); } vector T_statistics; T_statistics.resize(row, 1); // a place to store the true t-statistics vector pvalues; pvalues.resize(row, 1); // place to store pvalues //************************************* // convert to proportions // generate Pmatrix //************************************* vector totals; totals.resize(column, 0); // sum of columns //total[i] = total abundance for group[i] for (int i = 0; i < column; i++) { for (int j = 0; j < row; j++) { totals[i] += data[j][i]; } } for (int i = 0; i < column; i++) { for (int j = 0; j < row; j++) { Pmatrix[j][i] = data[j][i]/totals[i]; } } //#******************************************************************************** //# ************************** STATISTICAL TESTING 
******************************** //#******************************************************************************** if (column == 2){ //# then we have a two sample comparison //#************************************************************ //# generate p values fisher's exact test //#************************************************************ double total1, total2; total1 = 0; total2 = 0; //total for first grouping for (int i = 0; i < secondGroupingStart; i++) { total1 += totals[i]; } //total for second grouping for (int i = secondGroupingStart; i < column; i++) { total2 += totals[i]; } vector fish; fish.resize(row, 0.0); vector fish2; fish2.resize(row, 0.0); for(int i = 0; i < row; i++){ for(int j = 0; j < secondGroupingStart; j++) { fish[i] += data[i][j]; } for(int j = secondGroupingStart; j < column; j++) { fish2[i] += data[i][j]; } double f11, f12, f21, f22; f11 = fish[i]; f12 = fish2[i]; f21 = total1 - fish[i]; f22 = total2 - fish2[i]; MothurFisher fisher; double pre = fisher.fexact(f11, f12, f21, f22, m->currentSharedBinLabels[i]); if (pre > 0.999999999) { pre = 1.0; } if (m->control_pressed) { return 1; } pvalues[i] = pre; } }else { //we have multiple subjects per population //#************************************* //# generate statistics mean, var, stderr //#************************************* for(int i = 0; i < row; i++){ // for each taxa //# find the mean of each group double g1Total = 0.0; double g2Total = 0.0; for (int j = 0; j < secondGroupingStart; j++) { g1Total += Pmatrix[i][j]; } C1[i][0] = g1Total/(double)(secondGroupingStart); for (int j = secondGroupingStart; j < column; j++) { g2Total += Pmatrix[i][j]; } C2[i][0] = g2Total/(double)(column-secondGroupingStart); //# find the variance of each group double g1Var = 0.0; double g2Var = 0.0; for (int j = 0; j < secondGroupingStart; j++) { g1Var += pow((Pmatrix[i][j]-C1[i][0]), 2); } C1[i][1] = g1Var/(double)(secondGroupingStart-1); for (int j = secondGroupingStart; j < column; j++) { g2Var += 
pow((Pmatrix[i][j]-C2[i][0]), 2); } C2[i][1] = g2Var/(double)(column-secondGroupingStart-1); //# find the std error of each group -std err^2 (will change to std err at end) C1[i][2] = C1[i][1]/(double)(secondGroupingStart); C2[i][2] = C2[i][1]/(double)(column-secondGroupingStart); } //#************************************* //# two sample t-statistics //#************************************* for(int i = 0; i < row; i++){ // # for each taxa double xbar_diff = C1[i][0] - C2[i][0]; double denom = sqrt(C1[i][2] + C2[i][2]); T_statistics[i] = xbar_diff/denom; // calculate two sample t-statistic } if (m->debug) { for (int i = 0; i < row; i++) { for (int j = 0; j < 3; j++) { cout << "C1[" << i+1 << "," << j+1 << "]=" << C1[i][j] << ";" << endl; cout << "C2[" << i+1 << "," << j+1 << "]=" << C2[i][j] << ";" << endl; } cout << "T_statistics[" << i+1 << "]=" << T_statistics[i] << ";" << endl; } for (int i = 0; i < row; i++) { for (int j = 0; j < column; j++) { cout << "Fmatrix[" << i+1 << "," << j+1 << "]=" << data[i][j] << ";" << endl; } } } //#************************************* //# generate initial permuted p-values //#************************************* pvalues = permuted_pvalues(Pmatrix, T_statistics, data); if (m->debug) { for (int i = 0; i < row; i++) { m->mothurOut("[DEBUG]: " + m->currentSharedBinLabels[i] + " pvalue = " + toString(pvalues[i]) + "\n"); } } //#************************************* //# generate p values for sparse data //# using fisher's exact test //#************************************* double total1, total2; total1 = 0; total2 = 0; //total for first grouping for (int i = 0; i < secondGroupingStart; i++) { total1 += totals[i]; } //total for second grouping for (int i = secondGroupingStart; i < column; i++) { total2 += totals[i]; } vector fish; fish.resize(row, 0.0); vector fish2; fish2.resize(row, 0.0); for(int i = 0; i < row; i++){ for(int j = 0; j < secondGroupingStart; j++) { fish[i] += data[i][j]; } for(int j = secondGroupingStart; j < column; 
j++) { fish2[i] += data[i][j]; } if ((fish[i] < secondGroupingStart) && (fish2[i] < (column-secondGroupingStart))) { double f11, f12, f21, f22; f11 = fish[i]; f12 = fish2[i]; f21 = total1 - fish[i]; f22 = total2 - fish2[i]; MothurFisher fisher; if (m->debug) { m->mothurOut("[DEBUG]: about to run fisher for Otu " + m->currentSharedBinLabels[i] + " F11, F12, F21, F22 = " + toString(f11) + " " + toString(f12) + " " + toString(f21) + " " + toString(f22) + " " + "\n"); } double pre = fisher.fexact(f11, f12, f21, f22, m->currentSharedBinLabels[i]); if (m->debug) { m->mothurOut("[DEBUG]: about to completed fisher for Otu " + m->currentSharedBinLabels[i] + " pre = " + toString(pre) + "\n"); } if (pre > 0.999999999) { pre = 1.0; } if (m->control_pressed) { return 1; } pvalues[i] = pre; } } //#************************************* //# convert stderr^2 to std error //#************************************* for(int i = 0; i < row; i++){ C1[i][2] = sqrt(C1[i][2]); C2[i][2] = sqrt(C2[i][2]); } } // And now we write the files to a text file. struct tm *local; time_t t; t = time(NULL); local = localtime(&t); ofstream out; m->openOutputFile(outputFileName, out); out.setf(ios::fixed, ios::floatfield); out.setf(ios::showpoint); out << "Local time and date of test: " << asctime(local) << endl; out << "# rows = " << row << ", # col = " << column << ", g = " << secondGroupingStart << endl << endl; out << numPermutations << " permutations" << endl << endl; //output column headings - not really sure... documentation labels 9 columns, there are 10 in the output file //storage 0 = meanGroup1 - line 529, 1 = varGroup1 - line 532, 2 = err rate1 - line 534, 3 = mean of counts group1?? - line 291, 4 = meanGroup2 - line 536, 5 = varGroup2 - line 539, 6 = err rate2 - line 541, 7 = mean of counts group2?? 
- line 292, 8 = pvalues - line 293 out << "OTU\tmean(group1)\tvariance(group1)\tstderr(group1)\tmean(group2)\tvariance(group2)\tstderr(group2)\tp-value\n"; for(int i = 0; i < row; i++){ if (m->control_pressed) { out.close(); return 0; } //if there are binlabels use them otherwise count. if (i < m->currentSharedBinLabels.size()) { out << m->currentSharedBinLabels[i] << '\t'; } else { out << (i+1) << '\t'; } out << C1[i][0] << '\t' << C1[i][1] << '\t' << C1[i][2] << '\t' << C2[i][0] << '\t' << C2[i][1] << '\t' << C2[i][2] << '\t' << pvalues[i] << endl; } out << endl << endl; out.close(); return 0; }catch(exception& e) { m->errorOut(e, "MothurMetastats", "runMetastats"); exit(1); } } /***********************************************************/ vector MothurMetastats::permuted_pvalues(vector< vector >& Imatrix, vector& tstats, vector< vector >& Fmatrix) { try { //# matrix stores tstats for each taxa(row) for each permuted trial(column) vector ps; ps.resize(row, 0.0); //# to store the pvalues vector< vector > permuted_ttests; permuted_ttests.resize(numPermutations); for (int i = 0; i < numPermutations; i++) { permuted_ttests[i].resize(row, 0.0); } //# calculate null version of tstats using B permutations. for (int i = 0; i < numPermutations; i++) { permuted_ttests[i] = permute_and_calc_ts(Imatrix); } //# calculate each pvalue using the null ts if ((secondGroupingStart) < 8 || (column-secondGroupingStart) < 8){ vector< vector > cleanedpermuted_ttests; cleanedpermuted_ttests.resize(numPermutations); //# the array pooling just the frequently observed ts //# then pool the t's together! 
			//# count how many high freq taxa there are
			int hfc = 1;
			for (int i = 0; i < row; i++) { // # for each taxa
				double group1Total = 0.0;
				double group2Total = 0.0;
				for(int j = 0; j < secondGroupingStart; j++) { group1Total += Fmatrix[i][j]; }
				for(int j = secondGroupingStart; j < column; j++) { group2Total += Fmatrix[i][j]; }

				if (group1Total >= secondGroupingStart || group2Total >= (column-secondGroupingStart)){
					hfc++;
					for (int j = 0; j < numPermutations; j++) { cleanedpermuted_ttests[j].push_back(permuted_ttests[j][i]); }
				}
			}

			//#now for each taxa
			for (int i = 0; i < row; i++) {
				//number of cleanedpermuted_ttests greater than tstat[i]
				int numGreater = 0;
				for (int j = 0; j < numPermutations; j++) {
					//hfc starts at 1, so each pooled row holds hfc-1 values; bound k by the row size to stay in range
					for (int k = 0; k < (int)cleanedpermuted_ttests[j].size(); k++) {
						if (cleanedpermuted_ttests[j][k] > abs(tstats[i])) { numGreater++; }
					}
				}
				ps[i] = (1/(double)(numPermutations*hfc))*numGreater;
			}
		}else{
			for (int i = 0; i < row; i++) {
				//number of permuted_ttests[i] greater than tstat[i]
				//(sum(permuted_ttests[i,] > abs(tstats[i]))+1)
				int numGreater = 1;
				for (int j = 0; j < numPermutations; j++) {
					if (permuted_ttests[j][i] > abs(tstats[i])) { numGreater++; }
				}
				ps[i] = (1/(double)(numPermutations+1))*numGreater;
			}
		}

		return ps;
	}catch(exception& e) {
		m->errorOut(e, "MothurMetastats", "permuted_pvalues");
		exit(1);
	}
}
/***********************************************************/
vector<double> MothurMetastats::permute_and_calc_ts(vector< vector<double> >& Imatrix) {
	try {
		vector< vector<double> > permutedMatrix = Imatrix;

		//randomize columns, i.e. group abundances.
map randomMap; vector randoms; for (int i = 0; i < column; i++) { randoms.push_back(i); } random_shuffle(randoms.begin(), randoms.end()); for (int i = 0; i < randoms.size(); i++) { randomMap[i] = randoms[i]; } //calc ts vector< vector > C1; C1.resize(row); for (int i = 0; i < row; i++) { C1[i].resize(3, 0.0); } // statistic profiles for class1 and class 2 vector< vector > C2; C2.resize(row); // mean[1], variance[2], standard error[3] for (int i = 0; i < row; i++) { C2[i].resize(3, 0.0); } vector Ts; Ts.resize(row, 0.0); // a place to store the true t-statistics //#************************************* //# generate statistics mean, var, stderr //#************************************* for(int i = 0; i < row; i++){ // for each taxa //# find the mean of each group double g1Total = 0.0; double g2Total = 0.0; for (int j = 0; j < secondGroupingStart; j++) { g1Total += permutedMatrix[i][randomMap[j]]; } C1[i][0] = g1Total/(double)(secondGroupingStart); for (int j = secondGroupingStart; j < column; j++) { g2Total += permutedMatrix[i][randomMap[j]]; } C2[i][0] = g2Total/(double)(column-secondGroupingStart); //# find the variance of each group double g1Var = 0.0; double g2Var = 0.0; for (int j = 0; j < secondGroupingStart; j++) { g1Var += pow((permutedMatrix[i][randomMap[j]]-C1[i][0]), 2); } C1[i][1] = g1Var/(double)(secondGroupingStart-1); for (int j = secondGroupingStart; j < column; j++) { g2Var += pow((permutedMatrix[i][randomMap[j]]-C2[i][0]), 2); } C2[i][1] = g2Var/(double)(column-secondGroupingStart-1); //# find the std error of each group -std err^2 (will change to std err at end) C1[i][2] = C1[i][1]/(double)(secondGroupingStart); C2[i][2] = C2[i][1]/(double)(column-secondGroupingStart); } //#************************************* //# two sample t-statistics //#************************************* for(int i = 0; i < row; i++){ // # for each taxa double xbar_diff = C1[i][0] - C2[i][0]; double denom = sqrt(C1[i][2] + C2[i][2]); Ts[i] = abs(xbar_diff/denom); // calculate 
			// two sample t-statistic
		}

		return Ts;
	}catch(exception& e) {
		m->errorOut(e, "MothurMetastats", "permute_and_calc_ts");
		exit(1);
	}
}
/***********************************************************/
int MothurMetastats::OrderPValues(int low, int high, vector<double>& p, vector<int>& order) {
	try {
		if (low < high) {
			int i = low+1;
			int j = high;
			int pivot = (low+high) / 2;

			swapElements(low, pivot, p, order); //moves the pivot to the front; it reaches its final spot after partitioning

			/* compare value */
			double key = p[low];

			/* partition */
			while(i <= j) {
				/* find member above ... */
				while((i <= high) && (p[i] <= key)) { i++; }

				/* find element below ... */
				while((j >= low) && (p[j] > key)) { j--; }

				if(i < j) {
					swapElements(i, j, p, order);
				}
			}

			swapElements(low, j, p, order);

			/* recurse */
			OrderPValues(low, j-1, p, order);
			OrderPValues(j+1, high, p, order);
		}
		return 0;
	}catch(exception& e) {
		m->errorOut(e, "MothurMetastats", "OrderPValues");
		exit(1);
	}
}
/***********************************************************/
int MothurMetastats::swapElements(int i, int j, vector<double>& p, vector<int>& order) {
	try {
		double z = p[i];
		p[i] = p[j];
		p[j] = z;

		int temp = order[i];
		order[i] = order[j];
		order[j] = temp;

		return 0;
	}catch(exception& e) {
		m->errorOut(e, "MothurMetastats", "swapElements");
		exit(1);
	}
}
/***********************************************************/
vector<int> MothurMetastats::getSequence(int start, int end, int length) {
	try {
		vector<int> sequence;
		double increment = (end-start) / (double) (length-1);

		sequence.push_back(start);
		//offset each step from start so the sequence is correct when start is not 0
		for (int i = 1; i < length-1; i++) {
			sequence.push_back(start + int(i*increment));
		}
		sequence.push_back(end);

		return sequence;
	}catch(exception& e) {
		m->errorOut(e, "MothurMetastats", "getSequence");
		exit(1);
	}
}
/***********************************************************/
mothur-1.36.1/source/metastats/mothurmetastats.h000066400000000000000000000026371255543666200220700ustar00rootroot00000000000000
#ifndef MOTHUR_METASTATS
#define MOTHUR_METASTATS

/*
 *  mothurmetastats.h
 *  Mothur
 *
 *  Created by westcott on 7/6/11.
* Copyright 2011 Schloss Lab. All rights reserved. * */ #include "mothurout.h" class MothurMetastats { public: MothurMetastats(double, int); //threshold, numPermutations ~MothurMetastats(); int runMetastats(string, vector< vector >&, int); //outputFileName, data, secondGroupingStart private: MothurOut* m; int row, column, numPermutations, secondGroupingStart; double threshold; vector permuted_pvalues(vector< vector >&, vector&, vector< vector >&); vector permute_and_calc_ts(vector< vector >&); int start(vector&, int, vector&, vector< vector >&); //Find the initial values for the matrix int meanvar(vector&, int, vector&); int testp(vector&, vector&, vector&, int, vector&, vector&); int permute_matrix(vector&, vector&, int, vector&, vector&, vector&); int permute_array(vector&); int calc_twosample_ts(vector&, int, vector&, vector&, vector&); int OrderPValues(int, int, vector&, vector&); int swapElements(int, int, vector&, vector&); vector getSequence(int, int, int); }; #endif mothur-1.36.1/source/mothur.cpp000066400000000000000000000232601255543666200164630ustar00rootroot00000000000000/* * interface.cpp * * * Created by Pat Schloss on 8/14/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "mothur.h" #include "engine.hpp" #include "mothurout.h" #include "referencedb.h" /**************************************************************************************************/ CommandFactory* CommandFactory::_uniqueInstance = 0; MothurOut* MothurOut::_uniqueInstance = 0; ReferenceDB* ReferenceDB::myInstance = 0; /***********************************************************************/ volatile int ctrlc_pressed = 0; void ctrlc_handler ( int sig ) { MothurOut* m = MothurOut::getInstance(); ctrlc_pressed = 1; m->control_pressed = ctrlc_pressed; if (m->executing) { //if mid command quit execution, else quit mothur m->mothurOutEndLine(); m->mothurOut("quitting command..."); m->mothurOutEndLine(); }else{ m->mothurOut("quitting mothur"); m->mothurOutEndLine(); exit(1); } } /***********************************************************************/ int main(int argc, char *argv[]){ MothurOut* m = MothurOut::getInstance(); try { bool createLogFile = true; signal(SIGINT, ctrlc_handler ); time_t ltime = time(NULL); /* calendar time */ string logFileName = "mothur." 
+ toString(ltime) + ".logfile"; #ifdef USE_MPI MPI_Init(&argc, &argv); #endif m->setFileName(logFileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) system("clear"); #else system("CLS"); #endif #ifdef MOTHUR_FILES string temp = MOTHUR_FILES; //add / to name if needed string lastChar = temp.substr(temp.length()-1); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (lastChar != "/") { temp += "/"; } #else if (lastChar != "\\") { temp += "\\"; } #endif temp = m->getFullPathName(temp); m->setDefaultPath(temp); #endif #ifdef USE_MPI int version, subversion; MPI_Get_version(&version, &subversion); #endif //get releaseDate from Make string releaseDate = RELEASE_DATE; string mothurVersion = VERSION; m->setReleaseDate(releaseDate); m->setVersion(mothurVersion); //will make the gui output "pretty" bool outputHeader = true; if (argc>1) { string guiInput = argv[1]; if (guiInput[0] == '+') { outputHeader = false; } if (guiInput[0] == '-') { outputHeader = false; } if (argc > 2) { //is one of these -q for quiet mode? if (argc > 3) { m->mothurOut("[ERROR]: mothur only allows command inputs and the -q command line options.\n i.e. 
./mothur \"#summary.seqs(fasta=final.fasta);\" -q\n or ./mothur -q \"#summary.seqs(fasta=final.fasta);\"\n"); return 0; } else { string argv1 = argv[1]; string argv2 = argv[2]; if ((argv1 == "--quiet") || (argv1 == "-q")) { m->quietMode = true; argv[1] = argv[2]; }else if ((argv2 == "--quiet") || (argv2 == "-q")) { m->quietMode = true; }else { m->mothurOut("[ERROR]: mothur only allows command inputs and the -q command line options.\n"); m->mothurOut("[ERROR]: Unrecognized options: " + argv1 + " " + argv2 + "\n"); return 0; } } } } if (outputHeader) { //version #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #if defined (__APPLE__) || (__MACH__) m->mothurOutJustToLog("Mac version"); m->mothurOutEndLine(); m->mothurOutEndLine(); #else m->mothurOutJustToLog("Linux version"); m->mothurOutEndLine(); m->mothurOutEndLine(); #endif #else m->mothurOutJustToLog("Windows version"); m->mothurOutEndLine(); m->mothurOutEndLine(); #endif #ifdef USE_READLINE m->mothurOutJustToLog("Using ReadLine"); m->mothurOutEndLine(); m->mothurOutEndLine(); #endif #ifdef MOTHUR_FILES m->mothurOutJustToLog("Using default file location " + temp); m->mothurOutEndLine(); m->mothurOutEndLine(); #endif #ifdef BIT_VERSION m->mothurOutJustToLog("Running 64Bit Version"); m->mothurOutEndLine(); m->mothurOutEndLine(); #else m->mothurOutJustToLog("Running 32Bit Version"); m->mothurOutEndLine(); m->mothurOutEndLine(); #endif //header m->mothurOut("mothur v." + mothurVersion); m->mothurOutEndLine(); m->mothurOut("Last updated: " + releaseDate); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("by"); m->mothurOutEndLine(); m->mothurOut("Patrick D. 
Schloss"); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Department of Microbiology & Immunology"); m->mothurOutEndLine(); m->mothurOut("University of Michigan"); m->mothurOutEndLine(); m->mothurOut("pschloss@umich.edu"); m->mothurOutEndLine(); m->mothurOut("http://www.mothur.org"); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("When using, please cite:"); m->mothurOutEndLine(); m->mothurOut("Schloss, P.D., et al., Introducing mothur: Open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 2009. 75(23):7537-41."); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Distributed under the GNU General Public License"); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Type 'help()' for information on the commands that are available"); m->mothurOutEndLine(); m->mothurOutEndLine(); m->mothurOut("Type 'quit()' to exit program"); m->mothurOutEndLine(); #ifdef USE_MPI m->mothurOutJustToLog("Using MPI\tversion "); m->mothurOutJustToLog(toString(version) + "." 
+ toString(subversion) + "\n"); #endif } //srand(54321); srand( (unsigned)time( NULL ) ); Engine* mothur = NULL; bool bail = 0; string input; if(argc>1){ input = argv[1]; //m->mothurOut("input = " + input); m->mothurOutEndLine(); if (input[0] == '#') { m->mothurOutJustToLog("Script Mode"); m->mothurOutEndLine(); m->mothurOutEndLine(); mothur = new ScriptEngine(argv[0], argv[1]); }else if (input[0] == '+') { mothur = new ScriptEngine(argv[0], argv[1]); m->gui = true; }else if ((input == "--version") || (input == "-v")) { createLogFile = false; string OS = ""; //version #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #if defined (__APPLE__) || (__MACH__) OS = "Mac "; #else OS = "Linux "; #endif #else OS = "Windows "; #endif #ifdef BIT_VERSION OS += "64Bit Version"; #else OS += "32Bit Version"; #endif m->mothurOut(OS + "\nMothur version=" + mothurVersion + "\nRelease Date=" + releaseDate); m->mothurOutEndLine(); m->mothurOutEndLine(); m->closeLog(); #ifdef USE_MPI MPI_Finalize(); #endif m->mothurRemove(logFileName); return 0; }else if ((input == "--help") || (input == "-h")) { createLogFile = false; m->mothurOutJustToLog("Script Mode"); m->mothurOutEndLine(); m->mothurOutEndLine(); char* temp = new char[16]; *temp = '\0'; strncat(temp, "#help();quit();", 15); argv[1] = temp; mothur = new ScriptEngine(argv[0], argv[1]); }else{ m->mothurOutJustToLog("Batch Mode"); m->mothurOutEndLine(); m->mothurOutEndLine(); mothur = new BatchEngine(argv[0], argv[1]); } }else{ m->mothurOutJustToLog("Interactive Mode"); m->mothurOutEndLine(); m->mothurOutEndLine(); mothur = new InteractEngine(argv[0]); } while(bail == 0) { bail = mothur->getInput(); } //closes logfile so we can rename m->closeLog(); string outputDir = mothur->getOutputDir(); string tempLog = mothur->getLogFileName(); bool append = mothur->getAppend(); string newlogFileName; if (tempLog != "") { newlogFileName = outputDir + tempLog; if (!append) { //need this because 
m->mothurOut makes the logfile, but doesn't know where to put it rename(logFileName.c_str(), newlogFileName.c_str()); //logfile with timestamp }else { ofstream outNewLog; m->openOutputFileAppend(newlogFileName, outNewLog); if (!m->gui) { outNewLog << endl << endl << "*********************************************************************************" << endl << endl; }else { outNewLog << endl; } outNewLog.close(); m->appendFiles(logFileName, newlogFileName); m->mothurRemove(logFileName); } }else{ newlogFileName = outputDir + logFileName; //need this because m->mothurOut makes the logfile, but doesn't know where to put it rename(logFileName.c_str(), newlogFileName.c_str()); //logfile with timestamp } if (!createLogFile) { m->mothurRemove(newlogFileName); } if (mothur != NULL) { delete mothur; } #ifdef USE_MPI MPI_Finalize(); #endif return 0; } catch(exception& e) { m->errorOut(e, "mothur", "main"); exit(1); } } /**************************************************************************************************/ mothur-1.36.1/source/mothur.h000066400000000000000000000266101255543666200161320ustar00rootroot00000000000000#ifndef MOTHUR_H #define MOTHUR_H /* * mothur.h * Mothur * * Created by Sarah Westcott on 2/19/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This file contains all the standard includes we use in the project as well as some common utilities. 
*/ //#include //boost libraries #ifdef USE_BOOST #include #include #endif //io libraries #include #include #include #include #include //exception #include #include #include //containers #include #include #include #include #include #include //math #include #include #include #include //misc #include #include #include #ifdef USE_MPI #include "mpi.h" #endif /***********************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #include #include #include #include #include #include #ifdef USE_READLINE #include #include #endif #else #include //allows unbuffered screen capture from stdin #include //get cwd #include #include #include #include #endif using namespace std; #define exp(x) (exp((double) x)) #define sqrt(x) (sqrt((double) x)) #define log10(x) (log10((double) x)) #define log2(x) (log10(x)/log10(2)) #define isnan(x) ((x) != (x)) #define isinf(x) (fabs(x) == std::numeric_limits::infinity()) typedef unsigned long ull; typedef unsigned short intDist; struct IntNode { int lvalue; int rvalue; int lcoef; int rcoef; IntNode* left; IntNode* right; IntNode(int lv, int rv, IntNode* l, IntNode* r) : lvalue(lv), rvalue(rv), left(l), right(r) {}; IntNode() {}; }; struct ThreadNode { int* pid; IntNode* left; IntNode* right; }; struct diffPair { float prob; float reverseProb; diffPair() { prob = 0; reverseProb = 0; } diffPair(float p, float rp) { prob = p; reverseProb = rp; } }; /**********************************************************/ struct CommonHeader { unsigned int magicNumber; string version; unsigned long long indexOffset; unsigned int indexLength; unsigned int numReads; unsigned short headerLength; unsigned short keyLength; unsigned short numFlowsPerRead; int flogramFormatCode; string flowChars; //length depends on number flow reads string keySequence; //length depends on key length CommonHeader(){ magicNumber=0; indexOffset=0; indexLength=0; numReads=0; 
headerLength=0; keyLength=0; numFlowsPerRead=0; flogramFormatCode='s'; } ~CommonHeader() { } }; /**********************************************************/ struct Header { unsigned short headerLength; unsigned short nameLength; unsigned int numBases; unsigned short clipQualLeft; unsigned short clipQualRight; unsigned short clipAdapterLeft; unsigned short clipAdapterRight; string name; //length depends on nameLength string timestamp; string region; string xy; Header() { headerLength=0; nameLength=0; numBases=0; clipQualLeft=0; clipQualRight=0; clipAdapterLeft=0; clipAdapterRight=0; } ~Header() { } }; /**********************************************************/ struct seqRead { vector flowgram; vector flowIndex; string bases; vector qualScores; seqRead() { } ~seqRead() { } }; /**********************************************************/ struct linePair { unsigned long long start; unsigned long long end; linePair(unsigned long long i, unsigned long long j) : start(i), end(j) {} linePair(){ start=0; end=0; } ~linePair(){} }; /***********************************************************************/ struct PDistCell{ ull index; float dist; PDistCell() : index(0), dist(0) {}; PDistCell(ull c, float d) : index(c), dist(d) {} }; /***********************************************************************/ struct consTax{ string name; string taxonomy; int abundance; consTax() : name(""), taxonomy("unknown"), abundance(0) {}; consTax(string n, string t, int a) : name(n), taxonomy(t), abundance(a) {} }; /***********************************************************************/ struct listCt{ string bin; int binSize; listCt() : bin(""), binSize(0) {}; listCt(string b, int a) : bin(b), binSize(a) {} }; /***********************************************************************/ struct consTax2{ string taxonomy; int abundance; string otuName; consTax2() : otuName("OTUxxx"), taxonomy("unknown"), abundance(0) {}; consTax2(string n, string t, int a) : otuName(n), taxonomy(t), abundance(a) 
{} }; /************************************************************/ struct clusterNode { int numSeq; int parent; int smallChild; //used to make linkTable work with list and rabund. represents bin number of this cluster node clusterNode(int num, int par, int kid) : numSeq(num), parent(par), smallChild(kid) {}; }; /************************************************************/ struct seqDist { int seq1; int seq2; double dist; seqDist() {} seqDist(int s1, int s2, double d) : seq1(s1), seq2(s2), dist(d) {} ~seqDist() {} }; /************************************************************/ struct distlinePair { int start; int end; }; /************************************************************/ struct oligosPair { string forward; string reverse; oligosPair() { forward = ""; reverse = ""; } oligosPair(string f, string r) : forward(f), reverse(r) {} ~oligosPair() {} }; /************************************************************/ struct seqPriorityNode { int numIdentical; string seq; string name; seqPriorityNode() {} seqPriorityNode(int n, string s, string nm) : numIdentical(n), seq(s), name(nm) {} ~seqPriorityNode() {} }; /************************************************************/ struct compGroup { string group1; string group2; compGroup() {} compGroup(string s, string nm) : group1(s), group2(nm) {} string getCombo() { return group1+"-"+group2; } ~compGroup() {} }; /***************************************************************/ struct spearmanRank { string name; float score; spearmanRank(string n, float s) : name(n), score(s) {} }; //*********************************************************************** inline bool compareIndexes(PDistCell left, PDistCell right){ return (left.index > right.index); } //******************************************************************************************************************** inline bool compareSpearman(spearmanRank left, spearmanRank right){ return (left.score < right.score); } 
//******************************************************************************************************************** inline double max(double left, double right){ if (left > right) { return left; } else { return right; } } //******************************************************************************************************************** inline double max(int left, double right){ double value = left; if (left > right) { return value; } else { return right; } } //******************************************************************************************************************** inline double max(double left, int right){ double value = right; if (left > value) { return left; } else { return value; } } //******************************************************************************************************************** //sorts highest to lowest inline bool compareSeqPriorityNodes(seqPriorityNode left, seqPriorityNode right){ if (left.numIdentical > right.numIdentical) { return true; }else if (left.numIdentical == right.numIdentical) { if (left.seq > right.seq) { return true; } else { return false; } } return false; } /************************************************************/ //sorts lowest to highest inline bool compareDistLinePairs(distlinePair left, distlinePair right){ return (left.end < right.end); } //******************************************************************************************************************** //sorts lowest to highest inline bool compareSequenceDistance(seqDist left, seqDist right){ return (left.dist < right.dist); } //******************************************************************************************************************** //returns sign of double inline double sign(double temp){ //find sign if (temp > 0) { return 1.0; } else if (temp < 0) { return -1.0; } return 0; } /***********************************************************************/ // snagged from 
http://www.parashift.com/c++-faq-lite/misc-technical-issues.html#faq-39.2 // works for now, but there should be a way to do it without killing the whole program class BadConversion : public runtime_error { public: BadConversion(const string& s) : runtime_error(s){ } }; //********************************************************************************************************************** template<typename T> void convert(const string& s, T& x, bool failIfLeftoverChars = true){ istringstream i(s); char c; if (!(i >> x) || (failIfLeftoverChars && i.get(c))) throw BadConversion(s); } //********************************************************************************************************************** template<typename T> int sgn(T val){ return (val > T(0)) - (val < T(0)); } //********************************************************************************************************************** template<typename T> bool convertTestFloat(const string& s, T& x, bool failIfLeftoverChars = true){ istringstream i(s); char c; if (!(i >> x) || (failIfLeftoverChars && i.get(c))) { return false; } return true; } //********************************************************************************************************************** template<typename T> bool convertTest(const string& s, T& x, bool failIfLeftoverChars = true){ istringstream i(s); char c; if (!(i >> x) || (failIfLeftoverChars && i.get(c))) { return false; } return true; } //********************************************************************************************************************** template<typename T> string toString(const T&x){ stringstream output; output << x; return output.str(); } //********************************************************************************************************************** template<typename T> string toHex(const T&x){ stringstream output; output << hex << x; return output.str(); } //********************************************************************************************************************** template<typename T> string toString(const T&x, int i){ 
stringstream output; output.precision(i); output << fixed << x; return output.str(); } //********************************************************************************************************************** template<typename T> T fromString(const string& s){ istringstream stream (s); T t; stream >> t; return t; } //********************************************************************************************************************** #endif mothur-1.36.1/source/mothurout.cpp000066400000000000000000004632201255543666200172170ustar00rootroot00000000000000/* * mothurOut.cpp * Mothur * * Created by westcott on 2/25/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothurout.h" //needed for testing project //MothurOut* MothurOut::_uniqueInstance; /******************************************************/ MothurOut* MothurOut::getInstance() { if( _uniqueInstance == 0) { _uniqueInstance = new MothurOut(); } return _uniqueInstance; } /*********************************************************************************************/ set<string> MothurOut::getCurrentTypes() { try { set<string> types; types.insert("fasta"); types.insert("summary"); types.insert("file"); types.insert("accnos"); types.insert("column"); types.insert("design"); types.insert("group"); types.insert("list"); types.insert("name"); types.insert("oligos"); types.insert("order"); types.insert("ordergroup"); types.insert("phylip"); types.insert("qfile"); types.insert("relabund"); types.insert("sabund"); types.insert("rabund"); types.insert("sff"); types.insert("shared"); types.insert("taxonomy"); types.insert("tree"); types.insert("flow"); types.insert("biom"); types.insert("count"); types.insert("processors"); return types; } catch(exception& e) { errorOut(e, "MothurOut", "getCurrentTypes"); exit(1); } } /*********************************************************************************************/ void MothurOut::printCurrentFiles() { try { if (accnosfile != "") { mothurOut("accnos=" + accnosfile); 
mothurOutEndLine(); } if (columnfile != "") { mothurOut("column=" + columnfile); mothurOutEndLine(); } if (designfile != "") { mothurOut("design=" + designfile); mothurOutEndLine(); } if (fastafile != "") { mothurOut("fasta=" + fastafile); mothurOutEndLine(); } if (groupfile != "") { mothurOut("group=" + groupfile); mothurOutEndLine(); } if (listfile != "") { mothurOut("list=" + listfile); mothurOutEndLine(); } if (namefile != "") { mothurOut("name=" + namefile); mothurOutEndLine(); } if (oligosfile != "") { mothurOut("oligos=" + oligosfile); mothurOutEndLine(); } if (orderfile != "") { mothurOut("order=" + orderfile); mothurOutEndLine(); } if (ordergroupfile != "") { mothurOut("ordergroup=" + ordergroupfile); mothurOutEndLine(); } if (phylipfile != "") { mothurOut("phylip=" + phylipfile); mothurOutEndLine(); } if (qualfile != "") { mothurOut("qfile=" + qualfile); mothurOutEndLine(); } if (rabundfile != "") { mothurOut("rabund=" + rabundfile); mothurOutEndLine(); } if (relabundfile != "") { mothurOut("relabund=" + relabundfile); mothurOutEndLine(); } if (sabundfile != "") { mothurOut("sabund=" + sabundfile); mothurOutEndLine(); } if (sfffile != "") { mothurOut("sff=" + sfffile); mothurOutEndLine(); } if (sharedfile != "") { mothurOut("shared=" + sharedfile); mothurOutEndLine(); } if (taxonomyfile != "") { mothurOut("taxonomy=" + taxonomyfile); mothurOutEndLine(); } if (treefile != "") { mothurOut("tree=" + treefile); mothurOutEndLine(); } if (flowfile != "") { mothurOut("flow=" + flowfile); mothurOutEndLine(); } if (biomfile != "") { mothurOut("biom=" + biomfile); mothurOutEndLine(); } if (counttablefile != "") { mothurOut("count=" + counttablefile); mothurOutEndLine(); } if (processors != "1") { mothurOut("processors=" + processors); mothurOutEndLine(); } if (summaryfile != "") { mothurOut("summary=" + summaryfile); mothurOutEndLine(); } if (filefile != "") { mothurOut("file=" + filefile); mothurOutEndLine(); } } catch(exception& e) { errorOut(e, "MothurOut", 
"printCurrentFiles"); exit(1); } } /*********************************************************************************************/ bool MothurOut::hasCurrentFiles() { try { bool hasCurrent = false; if (accnosfile != "") { return true; } if (columnfile != "") { return true; } if (designfile != "") { return true; } if (fastafile != "") { return true; } if (groupfile != "") { return true; } if (listfile != "") { return true; } if (namefile != "") { return true; } if (oligosfile != "") { return true; } if (orderfile != "") { return true; } if (ordergroupfile != "") { return true; } if (phylipfile != "") { return true; } if (qualfile != "") { return true; } if (rabundfile != "") { return true; } if (relabundfile != "") { return true; } if (sabundfile != "") { return true; } if (sfffile != "") { return true; } if (sharedfile != "") { return true; } if (taxonomyfile != "") { return true; } if (treefile != "") { return true; } if (flowfile != "") { return true; } if (biomfile != "") { return true; } if (counttablefile != "") { return true; } if (summaryfile != "") { return true; } if (filefile != "") { return true; } if (processors != "1") { return true; } return hasCurrent; } catch(exception& e) { errorOut(e, "MothurOut", "hasCurrentFiles"); exit(1); } } /*********************************************************************************************/ void MothurOut::clearCurrentFiles() { try { phylipfile = ""; filefile = ""; columnfile = ""; listfile = ""; rabundfile = ""; sabundfile = ""; namefile = ""; groupfile = ""; designfile = ""; orderfile = ""; treefile = ""; sharedfile = ""; ordergroupfile = ""; relabundfile = ""; fastafile = ""; qualfile = ""; sfffile = ""; oligosfile = ""; accnosfile = ""; taxonomyfile = ""; flowfile = ""; biomfile = ""; counttablefile = ""; summaryfile = ""; processors = "1"; } catch(exception& e) { errorOut(e, "MothurOut", "clearCurrentFiles"); exit(1); } } /***********************************************************************/ string 
MothurOut::findProgramPath(string programName){ try { string pPath = ""; //look in ./ //is this the programs path? ifstream in5; string tempIn = "."; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) tempIn += "/" + programName; #else tempIn += "\\" + programName; #endif openInputFile(tempIn, in5, ""); //if this file exists if (in5) { in5.close(); pPath = getFullPathName(tempIn); if (debug) { mothurOut("[DEBUG]: found it, programPath = " + pPath + "\n"); } return pPath; } /* getenv returns NULL when PATH is unset, so guard before building the string */ char* envPathValue = getenv("PATH"); string envPath = ""; if (envPathValue != NULL) { envPath = envPathValue; } //delimiting path char char delim; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) delim = ':'; #else delim = ';'; #endif //break apart path variable by ':' vector<string> dirs; splitAtChar(envPath, dirs, delim); if (debug) { mothurOut("[DEBUG]: dir's in path: \n"); } //get path related to mothur for (int i = 0; i < dirs.size(); i++) { if (debug) { mothurOut("[DEBUG]: " + dirs[i] + "\n"); } //to lower so we can find it string tempLower = ""; for (int j = 0; j < dirs[i].length(); j++) { tempLower += tolower(dirs[i][j]); } //is this mothurs path? if (tempLower.find(programName) != -1) { pPath = dirs[i]; break; } } if (debug) { mothurOut("[DEBUG]: programPath = " + pPath + "\n"); } if (pPath != "") { //add programName so it looks like what argv would look like #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) pPath += "/" + programName; #else pPath += "\\" + programName; #endif }else { //okay programName is not in the path, so the folder programName is in must be in the path //lets find out which one //get path related to the program for (int i = 0; i < dirs.size(); i++) { if (debug) { mothurOut("[DEBUG]: looking in " + dirs[i] + " for " + programName + " \n"); } //is this the programs path? 
ifstream in; string tempIn = dirs[i]; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) tempIn += "/" + programName; #else tempIn += "\\" + programName; #endif openInputFile(tempIn, in, ""); //if this file exists if (in) { in.close(); pPath = tempIn; if (debug) { mothurOut("[DEBUG]: found it, programPath = " + pPath + "\n"); } break; } } } return pPath; } catch(exception& e) { errorOut(e, "MothurOut", "findProgramPath"); exit(1); } } /*********************************************************************************************/ void MothurOut::setFileName(string filename) { try { logFileName = filename; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif openOutputFile(filename, out); #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "setFileName"); exit(1); } } /*********************************************************************************************/ void MothurOut::setDefaultPath(string pathname) { try { //add / to name if needed string lastChar = pathname.substr(pathname.length()-1); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (lastChar != "/") { pathname += "/"; } #else if (lastChar != "\\") { pathname += "\\"; } #endif defaultPath = pathname; } catch(exception& e) { errorOut(e, "MothurOut", "setDefaultPath"); exit(1); } } /*********************************************************************************************/ void MothurOut::setOutputDir(string pathname) { try { outputDir = pathname; } catch(exception& e) { errorOut(e, "MothurOut", "setOutputDir"); exit(1); } } /*********************************************************************************************/ void MothurOut::closeLog() { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif out.close(); #ifdef 
USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "closeLog"); exit(1); } } /*********************************************************************************************/ MothurOut::~MothurOut() { try { _uniqueInstance = 0; } catch(exception& e) { errorOut(e, "MothurOut", "MothurOut"); exit(1); } } /*********************************************************************************************/ void MothurOut::mothurOut(string output) { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif if (!quietMode) { out << output; logger() << output; }else { //check for this being an error if ((output.find("[ERROR]") != string::npos) || (output.find("mothur >") != string::npos)) { out << output; logger() << output; } } #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "MothurOut"); exit(1); } } /*********************************************************************************************/ void MothurOut::mothurOutJustToScreen(string output) { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif if (!quietMode) { logger() << output; }else { //check for this being an error if ((output.find("[ERROR]") != string::npos) || (output.find("mothur >") != string::npos)) { logger() << output; } } #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "MothurOut"); exit(1); } } /*********************************************************************************************/ void MothurOut::mothurOutEndLine() { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif if (!quietMode) { out << endl; logger() << endl; } #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "MothurOutEndLine"); exit(1); } } 
/*********************************************************************************************/ void MothurOut::mothurOut(string output, ofstream& outputFile) { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif if (!quietMode) { out << output; outputFile << output; logger() << output; }else { //check for this being an error if ((output.find("[ERROR]") != string::npos) || (output.find("mothur >") != string::npos)) { out << output; outputFile << output; logger() << output; } } #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "MothurOut"); exit(1); } } /*********************************************************************************************/ void MothurOut::mothurOutEndLine(ofstream& outputFile) { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif if (!quietMode) { out << endl; outputFile << endl; logger() << endl; } #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "MothurOutEndLine"); exit(1); } } /*********************************************************************************************/ void MothurOut::mothurOutJustToLog(string output) { try { #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); if (pid == 0) { //only one process should output to screen #endif if (!quietMode) { out << output; }else { //check for this being an error if ((output.find("[ERROR]") != string::npos) || (output.find("mothur >") != string::npos)) { out << output; } } #ifdef USE_MPI } #endif } catch(exception& e) { errorOut(e, "MothurOut", "MothurOutJustToLog"); exit(1); } } /*********************************************************************************************/ void MothurOut::errorOut(exception& e, string object, string function) { //double vm, rss; //mem_usage(vm, rss); string errorType = toString(e.what()); int pos = errorType.find("bad_alloc"); 
mothurOut("[ERROR]: "); mothurOut(errorType); if (pos == string::npos) { //not bad_alloc mothurOut(" has occurred in the " + object + " class function " + function + ". Please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry."); mothurOutEndLine(); }else { //bad alloc if (object == "cluster"){ mothurOut(" has occurred in the " + object + " class function " + function + ". This error indicates your computer is running out of memory. There are two common causes for this: file size and format.\n\nFile Size:\nThe cluster command loads your distance matrix into RAM, and your distance file is most likely too large to fit in RAM. There are two options to help with this. The first is to use a cutoff. With a cutoff, mothur will only load distances that are below the cutoff. If that is still not enough, there is a command called cluster.split, http://www.mothur.org/wiki/cluster.split which divides the distance matrix, and clusters the smaller pieces separately. You may also be able to reduce the size of the original distance matrix by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. \n\nWrong Format:\nThis error can be caused by trying to read a column formatted distance matrix using the phylip parameter. By default, the dist.seqs command generates a column formatted distance matrix. To make a phylip formatted matrix set the dist.seqs command parameter output to lt. \n\nIf you are unable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry."); }else if (object == "shhh.flows"){ mothurOut(" has occurred in the " + object + " class function " + function + ". This error indicates your computer is running out of memory. The shhh.flows command is very memory intensive. 
This error is most commonly caused by trying to process a dataset that is too large, using multiple processors, or failing to run trim.flows before shhh.flows. If you are running our 32bit version, your memory usage is limited to 4G. If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue. If you are using multiple processors, try running the command with processors=1; the more processors you use, the more memory is required. Running trim.flows with an oligos file, and then shhh.flows with the file option may also resolve the issue. If for some reason you are unable to run shhh.flows with your data, a good alternative is to use the trim.seqs command using a 50-bp sliding window and to trim the sequence when the average quality score over that window drops below 35. Our results suggest that the sequencing error rates by this method are very good, but not quite as good as by shhh.flows and that the resulting sequences tend to be a bit shorter. If you are unable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry. "); }else { mothurOut(" has occurred in the " + object + " class function " + function + ". This error indicates your computer is running out of memory. This is most commonly caused by trying to process a dataset that is too large, using multiple processors, or a file format issue. If you are running our 32bit version, your memory usage is limited to 4G. If you have more than 4G of RAM and are running a 64bit OS, using our 64bit version may resolve your issue. If you are using multiple processors, try running the command with processors=1; the more processors you use, the more memory is required. Also, you may be able to reduce the size of your dataset by using the commands outlined in the Schloss SOP, http://www.mothur.org/wiki/Schloss_SOP. 
If you are unable to resolve the issue, please contact Pat Schloss at mothur.bugs@gmail.com, and be sure to include the mothur.logFile with your inquiry."); } } } /*********************************************************************************************/ //The following was originally from http://stackoverflow.com/questions/669438/how-to-get-memory-usage-at-run-time-in-c // process_mem_usage(double &, double &) - takes two doubles by reference, // attempts to read the system-dependent data for a process' virtual memory // size and resident set size, and return the results in KB. // // On failure, returns 0.0, 0.0 int MothurOut::mem_usage(double& vm_usage, double& resident_set) { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) vm_usage = 0.0; resident_set = 0.0; // 'file' stat seems to give the most reliable results // ifstream stat_stream("/proc/self/stat",ios_base::in); // dummy vars for leading entries in stat that we don't care about // string pid, comm, state, ppid, pgrp, session, tty_nr; string tpgid, flags, minflt, cminflt, majflt, cmajflt; string utime, stime, cutime, cstime, priority, nice; string O, itrealvalue, starttime; // the two fields we want, zeroed so a failed read reports 0.0, 0.0 // unsigned long vsize = 0; long rss = 0; stat_stream >> pid >> comm >> state >> ppid >> pgrp >> session >> tty_nr >> tpgid >> flags >> minflt >> cminflt >> majflt >> cmajflt >> utime >> stime >> cutime >> cstime >> priority >> nice >> O >> itrealvalue >> starttime >> vsize >> rss; // don't care about the rest long page_size_kb = sysconf(_SC_PAGE_SIZE) / 1024; // in case x86-64 is configured to use 2MB pages vm_usage = vsize / 1024.0; resident_set = rss * page_size_kb; mothurOut("Memory Usage: vm = " + toString(vm_usage) + " rss = " + toString(resident_set) + "\n"); return 0; #else /* //windows memory usage // Get the list of process identifiers. 
DWORD aProcesses[1024], cbNeeded, cProcesses; if ( !EnumProcesses( aProcesses, sizeof(aProcesses), &cbNeeded ) ){ return 1; } // Calculate how many process identifiers were returned. cProcesses = cbNeeded / sizeof(DWORD); // Print the memory usage for each process for (int i = 0; i < cProcesses; i++ ) { DWORD processID = aProcesses[i]; PROCESS_MEMORY_COUNTERS pmc; HANDLE hProcess = OpenProcess((PROCESS_QUERY_INFORMATION | PROCESS_VM_READ), FALSE, processID); // Print the process identifier. printf( "\nProcess ID: %u\n", processID); if (NULL != hProcess) { if ( GetProcessMemoryInfo( hProcess, &pmc, sizeof(pmc)) ) { printf( "\tPageFaultCount: 0x%08X\n", pmc.PageFaultCount ); printf( "\tPeakWorkingSetSize: 0x%08X\n", pmc.PeakWorkingSetSize ); printf( "\tWorkingSetSize: 0x%08X\n", pmc.WorkingSetSize ); printf( "\tQuotaPeakPagedPoolUsage: 0x%08X\n", pmc.QuotaPeakPagedPoolUsage ); printf( "\tQuotaPagedPoolUsage: 0x%08X\n", pmc.QuotaPagedPoolUsage ); printf( "\tQuotaPeakNonPagedPoolUsage: 0x%08X\n", pmc.QuotaPeakNonPagedPoolUsage ); printf( "\tQuotaNonPagedPoolUsage: 0x%08X\n", pmc.QuotaNonPagedPoolUsage ); printf( "\tPagefileUsage: 0x%08X\n", pmc.PagefileUsage ); printf( "\tPeakPagefileUsage: 0x%08X\n", pmc.PeakPagefileUsage ); } CloseHandle(hProcess); } } */ return 0; #endif } /***********************************************************************/ int MothurOut::openOutputFileAppend(string fileName, ofstream& fileHandle){ try { fileName = getFullPathName(fileName); fileHandle.open(fileName.c_str(), ios::app); if(!fileHandle) { mothurOut("[ERROR]: Could not open " + fileName); mothurOutEndLine(); return 1; } else { return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openOutputFileAppend"); exit(1); } } /***********************************************************************/ int MothurOut::openOutputFileBinaryAppend(string fileName, ofstream& fileHandle){ try { fileName = getFullPathName(fileName); fileHandle.open(fileName.c_str(), ios::app | ios::binary); 
if(!fileHandle) { mothurOut("[ERROR]: Could not open " + fileName); mothurOutEndLine(); return 1; } else { return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openOutputFileBinaryAppend"); exit(1); } } /***********************************************************************/ void MothurOut::gobble(istream& f){ try { char d; while(isspace(d=f.get())) { ;} if(!f.eof()) { f.putback(d); } } catch(exception& e) { errorOut(e, "MothurOut", "gobble"); exit(1); } } /***********************************************************************/ void MothurOut::gobble(istringstream& f){ try { char d; while(isspace(d=f.get())) {;} if(!f.eof()) { f.putback(d); } } catch(exception& e) { errorOut(e, "MothurOut", "gobble"); exit(1); } } /***********************************************************************/ void MothurOut::zapGremlins(istream& f){ try { char d; while('\0'==(d=f.get())) { ;} if(!f.eof()) { f.putback(d); } } catch(exception& e) { errorOut(e, "MothurOut", "zapGremlins"); exit(1); } } /***********************************************************************/ void MothurOut::zapGremlins(istringstream& f){ try { char d; while('\0'==(d=f.get())) { ;} if(!f.eof()) { f.putback(d); } } catch(exception& e) { errorOut(e, "MothurOut", "zapGremlins"); exit(1); } } /***********************************************************************/ string MothurOut::getline(istringstream& fileHandle) { try { string line = ""; while (!fileHandle.eof()) { //get next character char c = fileHandle.get(); //are you at the end of the line or the stream if ((c == '\n') || (c == '\r') || (c == '\f') || (c == EOF)){ break; } else { line += c; } } return line; } catch(exception& e) { errorOut(e, "MothurOut", "getline"); exit(1); } } /***********************************************************************/ string MothurOut::getline(ifstream& fileHandle) { try { string line = ""; while (fileHandle) { //get next character char c = fileHandle.get(); //are you at the end of the line if ((c == '\n') || (c == '\r') || (c == '\f') || 
(c == EOF)){ break; } else { line += c; } } return line; } catch(exception& e) { errorOut(e, "MothurOut", "getline"); exit(1); } } /***********************************************************************/ #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION inline bool endsWith(string s, const char * suffix){ size_t suffixLength = strlen(suffix); return s.size() >= suffixLength && s.substr(s.size() - suffixLength, suffixLength).compare(suffix) == 0; } #endif #endif string MothurOut::getRootName(string longName){ try { string rootName = longName; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION if (endsWith(rootName, ".gz") || endsWith(rootName, ".bz2")) { int pos = rootName.find_last_of('.'); rootName = rootName.substr(0, pos); cerr << "shortening " << longName << " to " << rootName << "\n"; } #endif #endif if(rootName.find_last_of(".") != rootName.npos){ int pos = rootName.find_last_of('.')+1; rootName = rootName.substr(0, pos); } return rootName; } catch(exception& e) { errorOut(e, "MothurOut", "getRootName"); exit(1); } } /***********************************************************************/ string MothurOut::getSimpleName(string longName){ try { string simpleName = longName; size_t found; found=longName.find_last_of("/\\"); if(found != longName.npos){ simpleName = longName.substr(found+1); } return simpleName; } catch(exception& e) { errorOut(e, "MothurOut", "getSimpleName"); exit(1); } } /***********************************************************************/ int MothurOut::getRandomIndex(int highest){ try { if (highest == 0) { return 0; } int random = (int) ((float)(highest+1) * (float)(rand()) / ((float)RAND_MAX+1.0)); return random; } catch(exception& e) { errorOut(e, "MothurOut", "getRandomIndex"); exit(1); } } /**********************************************************************/ string 
MothurOut::getPathName(string longName){ try { string rootPathName = longName; if(longName.find_last_of("/\\") != longName.npos){ int pos = longName.find_last_of("/\\")+1; rootPathName = longName.substr(0, pos); } return rootPathName; } catch(exception& e) { errorOut(e, "MothurOut", "getPathName"); exit(1); } } /***********************************************************************/ bool MothurOut::dirCheck(string& dirName){ try { if (dirName == "") { return false; } string tag = ""; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are tag = toString(pid); #endif //add / to name if needed string lastChar = dirName.substr(dirName.length()-1); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (lastChar != "/") { dirName += "/"; } #else if (lastChar != "\\") { dirName += "\\"; } #endif //test to make sure directory exists dirName = getFullPathName(dirName); string outTemp = dirName + tag + "temp"+ toString(time(NULL)); ofstream out; out.open(outTemp.c_str(), ios::trunc); if(!out) { mothurOut(dirName + " directory does not exist or is not writable."); mothurOutEndLine(); }else{ out.close(); mothurRemove(outTemp); return true; } return false; } catch(exception& e) { errorOut(e, "MothurOut", "dirCheck"); exit(1); } } /***********************************************************************/ bool MothurOut::dirCheck(string& dirName, string noError){ try { if (dirName == "") { return false; } string tag = ""; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are tag = toString(pid); #endif //add / to name if needed string lastChar = dirName.substr(dirName.length()-1); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (lastChar != "/") { dirName += "/"; } #else if (lastChar != "\\") { dirName += "\\"; } #endif //test to make sure directory exists dirName = getFullPathName(dirName); string outTemp = dirName 
+ tag + "temp"+ toString(time(NULL)); ofstream out; out.open(outTemp.c_str(), ios::trunc); if(!out) { //mothurOut(dirName + " directory does not exist or is not writable."); mothurOutEndLine(); }else{ out.close(); mothurRemove(outTemp); return true; } return false; } catch(exception& e) { errorOut(e, "MothurOut", "dirCheck - noError"); exit(1); } } /***********************************************************************/ //returns true if it exists or if we can make it bool MothurOut::mkDir(string& dirName){ try { bool dirExist = dirCheck(dirName, "noError"); if (dirExist) { return true; } string tag = ""; #ifdef USE_MPI int pid; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are tag = toString(pid); #endif string makeDirectoryCommand = ""; makeDirectoryCommand += "mkdir -p \"" + dirName + "\""; system(makeDirectoryCommand.c_str()); if (dirCheck(dirName)) { return true; } return false; } catch(exception& e) { errorOut(e, "MothurOut", "mkDir"); exit(1); } } //********************************************************************************************************************** map<string, vector<string> > MothurOut::parseClasses(string classes){ try { map<string, vector<string> > parts; //format: category<value1|value2>, dash separated, e.g. treatment<...>-age<...> vector<string> pieces; splitAtDash(classes, pieces); // -> treatment<...>, age<...> for (int i = 0; i < pieces.size(); i++) { string category = ""; string value = ""; bool foundOpen = false; for (int j = 0; j < pieces[i].length(); j++) { if (control_pressed) { return parts; } if (pieces[i][j] == '<') { foundOpen = true; } else if (pieces[i][j] == '>') { j += pieces[i].length(); } else { if (!foundOpen) { category += pieces[i][j]; } else { value += pieces[i][j]; } } } vector<string> values; splitAtChar(value, values, '|'); parts[category] = values; } return parts; } catch(exception& e) { errorOut(e, "MothurOut", "parseClasses"); exit(1); } } /***********************************************************************/ string MothurOut::hasPath(string longName){ try { string path = ""; size_t found; found=longName.find_last_of("~/\\"); 
if(found != longName.npos){ path = longName.substr(0, found+1); } return path; } catch(exception& e) { errorOut(e, "MothurOut", "hasPath"); exit(1); } } /***********************************************************************/ string MothurOut::getExtension(string longName){ try { string extension = ""; if(longName.find_last_of('.') != longName.npos){ int pos = longName.find_last_of('.'); extension = longName.substr(pos, longName.length()); } return extension; } catch(exception& e) { errorOut(e, "MothurOut", "getExtension"); exit(1); } } /***********************************************************************/ bool MothurOut::isBlank(string fileName){ try { fileName = getFullPathName(fileName); ifstream fileHandle; fileHandle.open(fileName.c_str()); if(!fileHandle) { mothurOut("[ERROR]: Could not open " + fileName); mothurOutEndLine(); return false; }else { //check for blank file zapGremlins(fileHandle); gobble(fileHandle); if (fileHandle.eof()) { fileHandle.close(); return true; } fileHandle.close(); } return false; } catch(exception& e) { errorOut(e, "MothurOut", "isBlank"); exit(1); } } /***********************************************************************/ string MothurOut::getFullPathName(string fileName){ try{ string path = hasPath(fileName); string newFileName; int pos; if (path == "") { return fileName; } //its a simple name else { //we need to complete the pathname // ex. 
../../../filename // cwd = /user/work/desktop string cwd; //get current working directory #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (path.find("~") != string::npos) { //go to home directory string homeDir; char *homepath = NULL; homepath = getenv ("HOME"); if ( homepath != NULL) { homeDir = homepath; } else { homeDir = ""; } newFileName = homeDir + fileName.substr(fileName.find("~")+1); return newFileName; }else { //find path if (path.rfind("./") == string::npos) { return fileName; } //already complete name else { newFileName = fileName.substr(fileName.rfind("./")+2); } //save the complete part of the name //char* cwdpath = new char[1024]; //size_t size; //cwdpath=getcwd(cwdpath,size); //cwd = cwdpath; char *cwdpath = NULL; cwdpath = getcwd(NULL, 0); // or _getcwd if ( cwdpath != NULL) { cwd = cwdpath; free(cwdpath); } else { cwd = ""; } //rip off first '/' string simpleCWD; if (cwd.length() > 0) { simpleCWD = cwd.substr(1); } //break apart the current working directory vector<string> dirs; while (simpleCWD.find_first_of('/') != string::npos) { string dir = simpleCWD.substr(0,simpleCWD.find_first_of('/')); simpleCWD = simpleCWD.substr(simpleCWD.find_first_of('/')+1, simpleCWD.length()); dirs.push_back(dir); } //get last one // ex. ../../../filename = /user/work/desktop/filename dirs.push_back(simpleCWD); //ex. 
dirs[0] = user, dirs[1] = work, dirs[2] = desktop int index = dirs.size()-1; while((pos = path.rfind("./")) != string::npos) { //while you don't have a complete path if (pos == 0) { break; //you are at the end }else if (path[(pos-1)] == '.') { //you want your parent directory ../ path = path.substr(0, pos-1); index--; if (index == 0) { break; } }else if (path[(pos-1)] == '/') { //you want the current working dir ./ path = path.substr(0, pos); }else if (pos == 1) { break; //you are at the end }else { mothurOut("cannot resolve path for " + fileName + "\n"); return fileName; } } for (int i = index; i >= 0; i--) { newFileName = dirs[i] + "/" + newFileName; } newFileName = "/" + newFileName; return newFileName; } #else if (path.find("~") != string::npos) { //go to home directory string homeDir; char *homepath = getenv ("HOMEPATH"); if ( homepath != NULL) { homeDir = homepath; } else { homeDir = ""; } newFileName = homeDir + fileName.substr(fileName.find("~")+1); return newFileName; }else { //find path if (path.rfind(".\\") == string::npos) { return fileName; } //already complete name else { newFileName = fileName.substr(fileName.rfind(".\\")+2); } //save the complete part of the name char *cwdpath = NULL; cwdpath = getcwd(NULL, 0); // or _getcwd if ( cwdpath != NULL) { cwd = cwdpath; free(cwdpath); } else { cwd = ""; } //break apart the current working directory vector<string> dirs; while (cwd.find_first_of('\\') != string::npos) { string dir = cwd.substr(0,cwd.find_first_of('\\')); cwd = cwd.substr(cwd.find_first_of('\\')+1, cwd.length()); dirs.push_back(dir); } //get last one dirs.push_back(cwd); //ex. 
dirs[0] = user, dirs[1] = work, dirs[2] = desktop int index = dirs.size()-1; while((pos = path.rfind(".\\")) != string::npos) { //while you don't have a complete path if (pos == 0) { break; //you are at the end }else if (path[(pos-1)] == '.') { //you want your parent directory ../ path = path.substr(0, pos-1); index--; if (index == 0) { break; } }else if (path[(pos-1)] == '\\') { //you want the current working dir ./ path = path.substr(0, pos); }else if (pos == 1) { break; //you are at the end }else { mothurOut("cannot resolve path for " + fileName + "\n"); return fileName; } } for (int i = index; i >= 0; i--) { newFileName = dirs[i] + "\\" + newFileName; } return newFileName; } #endif } } catch(exception& e) { errorOut(e, "MothurOut", "getFullPathName"); exit(1); } } /***********************************************************************/ int MothurOut::openInputFile(string fileName, ifstream& fileHandle, string m){ try { //get full path name string completeFileName = getFullPathName(fileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION // check for gzipped or bzipped file if (endsWith(completeFileName, ".gz") || endsWith(completeFileName, ".bz2")) { string tempName = string(tmpnam(0)); mkfifo(tempName.c_str(), 0666); int fork_result = fork(); if (fork_result < 0) { cerr << "Error forking.\n"; exit(1); } else if (fork_result == 0) { string command = (endsWith(completeFileName, ".gz") ? 
"zcat " : "bzcat ") + completeFileName + string(" > ") + tempName; cerr << "Decompressing " << completeFileName << " via temporary named pipe " << tempName << "\n"; system(command.c_str()); cerr << "Done decompressing " << completeFileName << "\n"; mothurRemove(tempName); exit(EXIT_SUCCESS); } else { cerr << "waiting on child process " << fork_result << "\n"; completeFileName = tempName; } } #endif #endif fileHandle.open(completeFileName.c_str()); if(!fileHandle) { //mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; }else { //check for blank file zapGremlins(fileHandle); gobble(fileHandle); return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openInputFile - no Error"); exit(1); } } /***********************************************************************/ int MothurOut::openInputFile(string fileName, ifstream& fileHandle){ try { //get full path name string completeFileName = getFullPathName(fileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION // check for gzipped or bzipped file if (endsWith(completeFileName, ".gz") || endsWith(completeFileName, ".bz2")) { string tempName = string(tmpnam(0)); mkfifo(tempName.c_str(), 0666); int fork_result = fork(); if (fork_result < 0) { cerr << "Error forking.\n"; exit(1); } else if (fork_result == 0) { string command = (endsWith(completeFileName, ".gz") ? 
"zcat " : "bzcat ") + completeFileName + string(" > ") + tempName; cerr << "Decompressing " << completeFileName << " via temporary named pipe " << tempName << "\n"; system(command.c_str()); cerr << "Done decompressing " << completeFileName << "\n"; mothurRemove(tempName); exit(EXIT_SUCCESS); } else { cerr << "waiting on child process " << fork_result << "\n"; completeFileName = tempName; } } #endif #endif fileHandle.open(completeFileName.c_str()); if(!fileHandle) { mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; } else { //check for blank file zapGremlins(fileHandle); gobble(fileHandle); if (fileHandle.eof()) { mothurOut("[ERROR]: " + completeFileName + " is blank. Please correct."); mothurOutEndLine(); } return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openInputFile"); exit(1); } } /***********************************************************************/ int MothurOut::openInputFileBinary(string fileName, ifstream& fileHandle){ try { //get full path name string completeFileName = getFullPathName(fileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION // check for gzipped or bzipped file if (endsWith(completeFileName, ".gz") || endsWith(completeFileName, ".bz2")) { string tempName = string(tmpnam(0)); mkfifo(tempName.c_str(), 0666); int fork_result = fork(); if (fork_result < 0) { cerr << "Error forking.\n"; exit(1); } else if (fork_result == 0) { string command = (endsWith(completeFileName, ".gz") ? 
"zcat " : "bzcat ") + completeFileName + string(" > ") + tempName; cerr << "Decompressing " << completeFileName << " via temporary named pipe " << tempName << "\n"; system(command.c_str()); cerr << "Done decompressing " << completeFileName << "\n"; mothurRemove(tempName); exit(EXIT_SUCCESS); } else { cerr << "waiting on child process " << fork_result << "\n"; completeFileName = tempName; } } #endif #endif fileHandle.open(completeFileName.c_str(), ios::binary); if(!fileHandle) { mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; } else { //check for blank file zapGremlins(fileHandle); gobble(fileHandle); if (fileHandle.eof()) { mothurOut("[ERROR]: " + completeFileName + " is blank. Please correct."); mothurOutEndLine(); } return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openInputFileBinary"); exit(1); } } /***********************************************************************/ int MothurOut::openInputFileBinary(string fileName, ifstream& fileHandle, string noerror){ try { //get full path name string completeFileName = getFullPathName(fileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION // check for gzipped or bzipped file if (endsWith(completeFileName, ".gz") || endsWith(completeFileName, ".bz2")) { string tempName = string(tmpnam(0)); mkfifo(tempName.c_str(), 0666); int fork_result = fork(); if (fork_result < 0) { cerr << "Error forking.\n"; exit(1); } else if (fork_result == 0) { string command = (endsWith(completeFileName, ".gz") ? 
"zcat " : "bzcat ") + completeFileName + string(" > ") + tempName; cerr << "Decompressing " << completeFileName << " via temporary named pipe " << tempName << "\n"; system(command.c_str()); cerr << "Done decompressing " << completeFileName << "\n"; mothurRemove(tempName); exit(EXIT_SUCCESS); } else { cerr << "waiting on child process " << fork_result << "\n"; completeFileName = tempName; } } #endif #endif fileHandle.open(completeFileName.c_str(), ios::binary); if(!fileHandle) { //mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; } else { //check for blank file zapGremlins(fileHandle); gobble(fileHandle); //if (fileHandle.eof()) { mothurOut("[ERROR]: " + completeFileName + " is blank. Please correct."); mothurOutEndLine(); } return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openInputFileBinary - no error"); exit(1); } } /***********************************************************************/ #ifdef USE_BOOST int MothurOut::openInputFileBinary(string fileName, ifstream& file, boost::iostreams::filtering_istream& in){ try { //get full path name string completeFileName = getFullPathName(fileName); file.open(completeFileName.c_str(), ios_base::in | ios_base::binary); if(!file) { mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; } else { //check for blank file in.push(boost::iostreams::gzip_decompressor()); in.push(file); if (file.eof()) { mothurOut("[ERROR]: " + completeFileName + " is blank. 
Please correct."); mothurOutEndLine(); } return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openInputFileGZBinary"); exit(1); } } /***********************************************************************/ int MothurOut::openInputFileBinary(string fileName, ifstream& file, boost::iostreams::filtering_istream& in, string noerror){ try { //get full path name string completeFileName = getFullPathName(fileName); file.open(completeFileName.c_str(), ios_base::in | ios_base::binary); if(!file) { return 1; } else { //check for blank file in.push(boost::iostreams::gzip_decompressor()); in.push(file); return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openInputFileGZBinary - no error"); exit(1); } } #endif /***********************************************************************/ //results[0] = allGZ, results[1] = allNotGZ vector<bool> MothurOut::allGZFiles(vector<string> & files){ try { vector<bool> results; bool allGZ = true; bool allNOTGZ = true; for (int i = 0; i < files.size(); i++) { if (control_pressed) { break; } //ignore none and blank filenames if ((files[i] != "") && (files[i] != "NONE")) { if (isGZ(files[i])[1]) { allNOTGZ = false; } else { allGZ = false; } } } if (!allGZ && !allNOTGZ) { //mixed bag mothurOut("[ERROR]: Cannot mix .gz and non compressed files. 
Please decompress your files and rerun.\n"); control_pressed = true; } results.push_back(allGZ); results.push_back(allNOTGZ); return results; } catch(exception& e) { errorOut(e, "MothurOut", "allGZFiles"); exit(1); } } /***********************************************************************/ vector<bool> MothurOut::isGZ(string filename){ try { vector<bool> results; results.resize(2, false); #ifdef USE_BOOST ifstream fileHandle; boost::iostreams::filtering_istream gzin; if (getExtension(filename) != ".gz") { return results; } // results[0] = false; results[1] = false; int ableToOpen = openInputFileBinary(filename, fileHandle, gzin, ""); //no error if (ableToOpen == 1) { return results; } // results[0] = false; results[1] = false; else { results[0] = true; } char c; try { gzin >> c; results[1] = true; } catch ( boost::iostreams::gzip_error & e ) { gzin.pop(); fileHandle.close(); return results; // results[0] = true; results[1] = false; } fileHandle.close(); #else mothurOut("[ERROR]: cannot test for gz format without enabling boost libraries.\n"); control_pressed = true; #endif return results; //results[0] = true; results[1] = true; } catch(exception& e) { errorOut(e, "MothurOut", "isGZ"); exit(1); } } /***********************************************************************/ int MothurOut::renameFile(string oldName, string newName){ try { if (oldName == newName) { return 0; } ifstream inTest; int exist = openInputFile(newName, inTest, ""); inTest.close(); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if (exist == 0) { //you could open it so you want to delete it string command = "rm " + newName; system(command.c_str()); } string command = "mv " + oldName + " " + newName; system(command.c_str()); #else mothurRemove(newName); int renameOk = rename(oldName.c_str(), newName.c_str()); #endif return 0; } catch(exception& e) { errorOut(e, "MothurOut", "renameFile"); exit(1); } } 
/***********************************************************************/ int MothurOut::openOutputFile(string fileName, ofstream& fileHandle){ try { string completeFileName = getFullPathName(fileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION // check for gzipped file if (endsWith(completeFileName, ".gz") || endsWith(completeFileName, ".bz2")) { string tempName = string(tmpnam(0)); mkfifo(tempName.c_str(), 0666); cerr << "Compressing " << completeFileName << " via temporary named pipe " << tempName << "\n"; int fork_result = fork(); if (fork_result < 0) { cerr << "Error forking.\n"; exit(1); } else if (fork_result == 0) { string command = string(endsWith(completeFileName, ".gz") ? "gzip" : "bzip2") + " -v > " + completeFileName + string(" < ") + tempName; system(command.c_str()); exit(0); } else { completeFileName = tempName; } } #endif #endif fileHandle.open(completeFileName.c_str(), ios::trunc); if(!fileHandle) { mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; } else { return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openOutputFile"); exit(1); } } /***********************************************************************/ int MothurOut::openOutputFileBinary(string fileName, ofstream& fileHandle){ try { string completeFileName = getFullPathName(fileName); #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #ifdef USE_COMPRESSION // check for gzipped file if (endsWith(completeFileName, ".gz") || endsWith(completeFileName, ".bz2")) { string tempName = string(tmpnam(0)); mkfifo(tempName.c_str(), 0666); cerr << "Compressing " << completeFileName << " via temporary named pipe " << tempName << "\n"; int fork_result = fork(); if (fork_result < 0) { cerr << "Error forking.\n"; exit(1); } else if (fork_result == 0) { string command = string(endsWith(completeFileName, ".gz") ? 
"gzip" : "bzip2") + " -v > " + completeFileName + string(" < ") + tempName; system(command.c_str()); exit(0); } else { completeFileName = tempName; } } #endif #endif fileHandle.open(completeFileName.c_str(), ios::trunc | ios::binary); if(!fileHandle) { mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; } else { return 0; } } catch(exception& e) { errorOut(e, "MothurOut", "openOutputFileBinary"); exit(1); } } /**************************************************************************************************/ int MothurOut::appendFiles(string temp, string filename) { try{ ofstream output; ifstream input; //open output file in append mode openOutputFileBinaryAppend(filename, output); int ableToOpen = openInputFileBinary(temp, input, "no error"); //int ableToOpen = openInputFile(temp, input); int numLines = 0; if (ableToOpen == 0) { //you opened it char buffer[4096]; while (!input.eof()) { input.read(buffer, 4096); output.write(buffer, input.gcount()); //count number of lines for (int i = 0; i < input.gcount(); i++) { if (buffer[i] == '\n') {numLines++;} } } input.close(); } output.close(); return numLines; } catch(exception& e) { errorOut(e, "MothurOut", "appendFiles"); exit(1); } } /**************************************************************************************************/ int MothurOut::appendBinaryFiles(string temp, string filename) { try{ ofstream output; ifstream input; //open output file in append mode openOutputFileBinaryAppend(filename, output); int ableToOpen = openInputFileBinary(temp, input, "no error"); if (ableToOpen == 0) { //you opened it char buffer[4096]; while (!input.eof()) { input.read(buffer, 4096); output.write(buffer, input.gcount()); } input.close(); } output.close(); return ableToOpen; } catch(exception& e) { errorOut(e, "MothurOut", "appendBinaryFiles"); exit(1); } } /**************************************************************************************************/ int 
MothurOut::appendSFFFiles(string temp, string filename) { try{ ofstream output; ifstream input; int ableToOpen = 0; //open output file in append mode string fullFileName = getFullPathName(filename); output.open(fullFileName.c_str(), ios::app | ios::binary); if(!output) { mothurOut("[ERROR]: Could not open " + fullFileName); mothurOutEndLine(); return 1; }else { //get full path name string completeFileName = getFullPathName(temp); input.open(completeFileName.c_str(), ios::binary); if(!input) { //mothurOut("[ERROR]: Could not open " + completeFileName); mothurOutEndLine(); return 1; }else { char buffer[4096]; while (!input.eof()) { input.read(buffer, 4096); output.write(buffer, input.gcount()); } input.close(); } output.close(); } return ableToOpen; } catch(exception& e) { errorOut(e, "MothurOut", "appendSFFFiles"); exit(1); } } /**************************************************************************************************/ int MothurOut::appendFilesWithoutHeaders(string temp, string filename) { try{ ofstream output; ifstream input; //open output file in append mode openOutputFileAppend(filename, output); int ableToOpen = openInputFile(temp, input, "no error"); //int ableToOpen = openInputFile(temp, input); int numLines = 0; if (ableToOpen == 0) { //you opened it string headers = getline(input); gobble(input); if (debug) { mothurOut("[DEBUG]: skipping headers " + headers +'\n'); } char buffer[4096]; while (!input.eof()) { input.read(buffer, 4096); output.write(buffer, input.gcount()); //count number of lines for (int i = 0; i < input.gcount(); i++) { if (buffer[i] == '\n') {numLines++;} } } input.close(); } output.close(); return numLines; } catch(exception& e) { errorOut(e, "MothurOut", "appendFilesWithoutHeaders"); exit(1); } } /**************************************************************************************************/ string MothurOut::sortFile(string distFile, string outputDir){ try { //if (outputDir == "") { outputDir += hasPath(distFile); } string outfile = 
getRootName(distFile) + "sorted.dist"; //if you can, use the unix sort since it's been optimized for years #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n -k +3 " + distFile + " -o " + outfile; system(command.c_str()); #else //you are stuck with my best attempt... //windows sort does not have a way to specify a column, only a character in the line //since we cannot assume that the distance will always be at the same character location on each line //due to variable sequence name lengths, I chose to force the distance into first position, then sort and then put it back. //read in file line by line and put distance first string tempDistFile = distFile + ".temp"; ifstream input; ofstream output; openInputFile(distFile, input); openOutputFile(tempDistFile, output); string firstName, secondName; float dist; while (!input.eof()) { input >> firstName >> secondName >> dist; output << dist << '\t' << firstName << '\t' << secondName << endl; gobble(input); } input.close(); output.close(); //sort using windows sort string tempOutfile = outfile + ".temp"; string command = "sort " + tempDistFile + " /O " + tempOutfile; system(command.c_str()); //read in sorted file and put distance at end again ifstream input2; ofstream output2; openInputFile(tempOutfile, input2); openOutputFile(outfile, output2); while (!input2.eof()) { input2 >> dist >> firstName >> secondName; output2 << firstName << '\t' << secondName << '\t' << dist << endl; gobble(input2); } input2.close(); output2.close(); //remove temp files mothurRemove(tempDistFile); mothurRemove(tempOutfile); #endif return outfile; } catch(exception& e) { errorOut(e, "MothurOut", "sortFile"); exit(1); } } /**************************************************************************************************/ vector MothurOut::setFilePosFasta(string filename, long long& num) { try { vector positions; ifstream inFASTA; //openInputFileBinary(filename, 
inFASTA); string completeFileName = getFullPathName(filename); inFASTA.open(completeFileName.c_str(), ios::binary); string input; unsigned long long count = 0; while(!inFASTA.eof()){ //input = getline(inFASTA); //cout << input << '\t' << inFASTA.tellg() << endl; //if (input.length() != 0) { // if(input[0] == '>'){ unsigned long int pos = inFASTA.tellg(); positions.push_back(pos - input.length() - 1); cout << (pos - input.length() - 1) << endl; } //} //gobble(inFASTA); //has to be here since windows line endings are 2 characters and mess up the positions char c = inFASTA.get(); count++; if (c == '>') { positions.push_back(count-1); if (debug) { mothurOut("[DEBUG]: numSeqs = " + toString(positions.size()) + " count = " + toString(count) + ".\n"); } } } inFASTA.close(); num = positions.size(); if (debug) { mothurOut("[DEBUG]: num = " + toString(num) + ".\n"); } FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (completeFileName.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } /*unsigned long long size = positions[(positions.size()-1)]; ifstream in; openInputFile(filename, in); in.seekg(size); while(in.get()){ if(in.eof()) { break; } else { size++; } } in.close();*/ if (debug) { mothurOut("[DEBUG]: size = " + toString(size) + ".\n"); } positions.push_back(size); positions[0] = 0; return positions; } catch(exception& e) { errorOut(e, "MothurOut", "setFilePosFasta"); exit(1); } } /**************************************************************************************************/ vector MothurOut::setFilePosFasta(string filename, long long& num, char delim) { try { vector positions; ifstream inFASTA; string completeFileName = getFullPathName(filename); inFASTA.open(completeFileName.c_str(), ios::binary); int nameLine = 2; if (delim == '@') { nameLine = 4; } else if (delim == '>') { nameLine = 2; } else { mothurOut("[ERROR]: unknown file deliminator, 
quitting.\n"); control_pressed = true; } unsigned long long count = 0; long long numLines = 0; while(!inFASTA.eof()){ char c = inFASTA.get(); count++; string input = ""; input += c; while ((c != '\n') && (c != '\r') && (c != '\f') && (c != EOF)) { c = inFASTA.get(); count++; input += c; } numLines++; //gobble while(isspace(c=inFASTA.get())) { input += c; count++;} if(!inFASTA.eof()) { inFASTA.putback(c); count--; } if (input.length() != 0) { if((input[0] == delim) && (((numLines-1)%nameLine) == 0)){ //this is a name line //mothurOut(input + '\t' + toString(count+numLines-input.length()) + '\n');// << endl; positions.push_back(count+numLines-input.length()); if (debug) { mothurOut("[DEBUG]: numSeqs = " + toString(positions.size()) + " count = " + toString(count) + input + ".\n"); } }else if (int(c) == -1) { break; } else { input = ""; } } } inFASTA.close(); num = positions.size(); if (debug) { mothurOut("[DEBUG]: num = " + toString(num) + ".\n"); } FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (completeFileName.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } if (debug) { mothurOut("[DEBUG]: size = " + toString(size) + ".\n"); } positions.push_back(size); positions[0] = 0; return positions; } catch(exception& e) { errorOut(e, "MothurOut", "setFilePosFasta"); exit(1); } } /**************************************************************************************************/ vector MothurOut::setFilePosFasta(string filename, int& num) { try { vector positions; ifstream inFASTA; //openInputFile(filename, inFASTA); string completeFileName = getFullPathName(filename); inFASTA.open(completeFileName.c_str(), ios::binary); string input; unsigned long long count = 0; while(!inFASTA.eof()){ //input = getline(inFASTA); //cout << input << '\t' << inFASTA.tellg() << endl; //if (input.length() != 0) { // if(input[0] == '>'){ unsigned long int pos = inFASTA.tellg(); 
positions.push_back(pos - input.length() - 1); cout << (pos - input.length() - 1) << endl; } //} //gobble(inFASTA); //has to be here since windows line endings are 2 characters and mess up the positions char c = inFASTA.get(); count++; if (c == '>') { positions.push_back(count-1); if (debug) { mothurOut("[DEBUG]: numSeqs = " + toString(positions.size()) + " count = " + toString(count) + ".\n"); } } } inFASTA.close(); num = positions.size(); if (debug) { mothurOut("[DEBUG]: num = " + toString(num) + ".\n"); } FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (completeFileName.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } /*unsigned long long size = positions[(positions.size()-1)]; ifstream in; openInputFile(filename, in); in.seekg(size); while(in.get()){ if(in.eof()) { break; } else { size++; } } in.close();*/ if (debug) { mothurOut("[DEBUG]: size = " + toString(size) + ".\n"); } positions.push_back(size); positions[0] = 0; return positions; } catch(exception& e) { errorOut(e, "MothurOut", "setFilePosFasta"); exit(1); } } //********************************************************************************************************************** vector MothurOut::readConsTax(string inputfile){ try { vector taxes; ifstream in; openInputFile(inputfile, in); //read headers getline(in); while (!in.eof()) { if (control_pressed) { break; } string otu = ""; string tax = "unknown"; int size = 0; in >> otu >> size >> tax; gobble(in); consTax temp(otu, tax, size); taxes.push_back(temp); } in.close(); return taxes; } catch(exception& e) { errorOut(e, "MothurOut", "readConsTax"); exit(1); } } //********************************************************************************************************************** int MothurOut::readConsTax(string inputfile, map& taxes){ try { ifstream in; openInputFile(inputfile, in); //read headers getline(in); while (!in.eof()) { if 
(control_pressed) { break; } string otu = ""; string tax = "unknown"; int size = 0; in >> otu >> size >> tax; gobble(in); consTax2 temp(otu, tax, size); string simpleBin = getSimpleLabel(otu); int bin; convert(simpleBin, bin); taxes[bin] = temp; } in.close(); return 0; } catch(exception& e) { errorOut(e, "MothurOut", "readConsTax"); exit(1); } } /**************************************************************************************************/ vector MothurOut::setFilePosEachLine(string filename, int& num) { try { filename = getFullPathName(filename); vector positions; ifstream in; //openInputFile(filename, in); openInputFileBinary(filename, in); string input; unsigned long long count = 0; positions.push_back(0); while(!in.eof()){ //getline counting reads char d = in.get(); count++; while ((d != '\n') && (d != '\r') && (d != '\f') && (d != in.eof())) { //get next character d = in.get(); count++; } if (!in.eof()) { d=in.get(); count++; while(isspace(d) && (d != in.eof())) { d=in.get(); count++;} } positions.push_back(count-1); //cout << count-1 << endl; } in.close(); num = positions.size()-1; FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (filename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } positions[(positions.size()-1)] = size; return positions; } catch(exception& e) { errorOut(e, "MothurOut", "setFilePosEachLine"); exit(1); } } /**************************************************************************************************/ vector MothurOut::setFilePosEachLine(string filename, unsigned long long& num) { try { filename = getFullPathName(filename); vector positions; ifstream in; //openInputFile(filename, in); openInputFileBinary(filename, in); string input; unsigned long long count = 0; positions.push_back(0); while(!in.eof()){ //getline counting reads char d = in.get(); count++; while ((d != '\n') && (d != '\r') && (d != '\f') && (d != 
in.eof())) { //get next character d = in.get(); count++; } if (!in.eof()) { d=in.get(); count++; while(isspace(d) && (d != in.eof())) { d=in.get(); count++;} } positions.push_back(count-1); //cout << count-1 << endl; } in.close(); num = positions.size()-1; FILE * pFile; unsigned long long size; //get num bytes in file pFile = fopen (filename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } positions[(positions.size()-1)] = size; return positions; } catch(exception& e) { errorOut(e, "MothurOut", "setFilePosEachLine"); exit(1); } } /**************************************************************************************************/ vector MothurOut::divideFile(string filename, int& proc) { try{ vector filePos; filePos.push_back(0); FILE * pFile; unsigned long long size; filename = getFullPathName(filename); //get num bytes in file pFile = fopen (filename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //estimate file breaks unsigned long long chunkSize = 0; chunkSize = size / proc; //file to small to divide by processors if (chunkSize == 0) { proc = 1; filePos.push_back(size); return filePos; } if (proc > 1) { //for each process seekg to closest file break and search for next '>' char. 
make that the filebreak for (int i = 0; i < proc; i++) { unsigned long long spot = (i+1) * chunkSize; ifstream in; openInputFile(filename, in); in.seekg(spot); //look for next '>' unsigned long long newSpot = spot; while (!in.eof()) { char c = in.get(); if (c == '>') { in.putback(c); newSpot = in.tellg(); break; } else if (int(c) == -1) { break; } } //there was not another sequence before the end of the file unsigned long long sanityPos = in.tellg(); if (sanityPos == -1) { break; } else { filePos.push_back(newSpot); } in.close(); } } //save end pos filePos.push_back(size); //sanity check filePos for (int i = 0; i < (filePos.size()-1); i++) { if (filePos[(i+1)] <= filePos[i]) { filePos.erase(filePos.begin()+(i+1)); i--; } } proc = (filePos.size() - 1); #else mothurOut("[ERROR]: Windows version should not be calling the divideFile function."); mothurOutEndLine(); proc=1; filePos.push_back(size); #endif return filePos; } catch(exception& e) { errorOut(e, "MothurOut", "divideFile"); exit(1); } } /**************************************************************************************************/ vector MothurOut::divideFile(string filename, int& proc, char delimChar) { try{ vector filePos; filePos.push_back(0); FILE * pFile; unsigned long long size; filename = getFullPathName(filename); //get num bytes in file pFile = fopen (filename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } char secondaryDelim = '>'; if (delimChar == '@') { secondaryDelim = '+'; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //estimate file breaks unsigned long long chunkSize = 0; chunkSize = size / proc; //file to small to divide by processors if (chunkSize == 0) { proc = 1; filePos.push_back(size); return filePos; } //for each process seekg to closest file break and search for next delimChar char. 
make that the filebreak for (int i = 0; i < proc; i++) { unsigned long long spot = (i+1) * chunkSize; ifstream in; openInputFile(filename, in); in.seekg(spot); getline(in); //get to end of line in case you jump into middle of line where the delim char happens to fall. //look for next delimChar unsigned long long newSpot = spot; while (!in.eof()) { char c = in.get(); string input = ""; input += c; while ((c != '\n') && (c != '\r') && (c != '\f') && (c != EOF)) { c = in.get(); input += c; } if (input.length() != 0) { if(input[0] == delimChar){ //this is a potential name line newSpot = in.tellg(); newSpot -=input.length(); //get two lines and look for secondary delim //inf a fasta file this would be a new sequence, in fastq it will be the + line, if this was a nameline. getline(in); gobble(in); if (!in.eof()) { string secondInput = getline(in); gobble(in); if (debug) { mothurOut("[DEBUG]: input= " + input + "\n secondaryInput = " + secondInput + "\n"); } if (secondInput[0] == secondaryDelim) { break; } //yes, it was a nameline so stop else { input = ""; gobble(in); } //nope it was a delim at the beginning of a non nameline, keep looking. 
} }else if (int(c) == -1) { break; } else { input = ""; gobble(in); } } } //there was not another sequence before the end of the file unsigned long long sanityPos = in.tellg(); if (sanityPos == -1) { break; } else { filePos.push_back(newSpot); } in.close(); } //save end pos filePos.push_back(size); //sanity check filePos for (int i = 0; i < (filePos.size()-1); i++) { if (filePos[(i+1)] <= filePos[i]) { filePos.erase(filePos.begin()+(i+1)); i--; } } proc = (filePos.size() - 1); #else mothurOut("[ERROR]: Windows version should not be calling the divideFile function."); mothurOutEndLine(); proc=1; filePos.push_back(size); #endif return filePos; } catch(exception& e) { errorOut(e, "MothurOut", "divideFile"); exit(1); } } /**************************************************************************************************/ vector MothurOut::divideFilePerLine(string filename, int& proc) { try{ vector filePos; filePos.push_back(0); FILE * pFile; unsigned long long size; filename = getFullPathName(filename); //get num bytes in file pFile = fopen (filename.c_str(),"rb"); if (pFile==NULL) perror ("Error opening file"); else{ fseek (pFile, 0, SEEK_END); size=ftell (pFile); fclose (pFile); } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) //estimate file breaks unsigned long long chunkSize = 0; chunkSize = size / proc; //file to small to divide by processors if (chunkSize == 0) { proc = 1; filePos.push_back(size); return filePos; } //for each process seekg to closest file break and search for next '>' char. 
make that the filebreak for (int i = 0; i < proc; i++) { unsigned long long spot = (i+1) * chunkSize; ifstream in; openInputFile(filename, in); in.seekg(spot); //look for next line break unsigned long long newSpot = spot; while (!in.eof()) { char c = in.get(); if ((c == '\n') || (c == '\r') || (c == '\f')) { gobble(in); newSpot = in.tellg(); break; } else if (int(c) == -1) { break; } } //there was not another line before the end of the file unsigned long long sanityPos = in.tellg(); if (sanityPos == -1) { break; } else { filePos.push_back(newSpot); } in.close(); } //save end pos filePos.push_back(size); //sanity check filePos for (int i = 0; i < (filePos.size()-1); i++) { if (filePos[(i+1)] <= filePos[i]) { filePos.erase(filePos.begin()+(i+1)); i--; } } proc = (filePos.size() - 1); #else mothurOut("[ERROR]: Windows version should not be calling the divideFilePerLine function."); mothurOutEndLine(); proc=1; filePos.push_back(size); #endif return filePos; } catch(exception& e) { errorOut(e, "MothurOut", "divideFilePerLine"); exit(1); } } /**************************************************************************************************/ int MothurOut::divideFile(string filename, int& proc, vector& files) { try{ vector filePos = divideFile(filename, proc); for (int i = 0; i < (filePos.size()-1); i++) { //read file chunk ifstream in; openInputFile(filename, in); in.seekg(filePos[i]); unsigned long long size = filePos[(i+1)] - filePos[i]; char* chunk = new char[size]; in.read(chunk, size); in.close(); //open new file string fileChunkName = filename + "." 
+ toString(i) + ".tmp"; ofstream out; openOutputFile(fileChunkName, out); //chunk is not null terminated, so write exactly size bytes out.write(chunk, size); out << endl; out.close(); delete[] chunk; //save name files.push_back(fileChunkName); } return 0; } catch(exception& e) { errorOut(e, "MothurOut", "divideFile"); exit(1); } } /***********************************************************************/ bool MothurOut::isTrue(string f){ try { for (int i = 0; i < f.length(); i++) { f[i] = toupper(f[i]); } if ((f == "TRUE") || (f == "T")) { return true; } else { return false; } } catch(exception& e) { errorOut(e, "MothurOut", "isTrue"); exit(1); } } /***********************************************************************/ float MothurOut::roundDist(float dist, int precision){ try { return int(dist * precision + 0.5)/float(precision); } catch(exception& e) { errorOut(e, "MothurOut", "roundDist"); exit(1); } } /***********************************************************************/ float MothurOut::ceilDist(float dist, int precision){ try { return int(ceil(dist * precision))/float(precision); } catch(exception& e) { errorOut(e, "MothurOut", "ceilDist"); exit(1); } } /***********************************************************************/ vector MothurOut::splitWhiteSpace(string& rest, char buffer[], int size){ try { vector pieces; for (int i = 0; i < size; i++) { if (!isspace(buffer[i])) { rest += buffer[i]; } else { if (rest != "") { pieces.push_back(rest); rest = ""; } while (i < size) { //gobble white space if (isspace(buffer[i])) { i++; } else { rest = buffer[i]; break; } //cout << "next piece buffer = " << nextPiece << endl; } } } return pieces; } catch(exception& e) { errorOut(e, "MothurOut", "splitWhiteSpace"); exit(1); } } /***********************************************************************/ vector MothurOut::splitWhiteSpace(string input){ try { vector pieces; string rest = ""; for (int i = 0; i < input.length(); i++) { if (!isspace(input[i])) { rest += input[i]; } else { if (rest != "") { pieces.push_back(rest); rest = ""; } 
while (i < input.length()) { //gobble white space if (isspace(input[i])) { i++; } else { rest = input[i]; break; } //cout << "next piece buffer = " << nextPiece << endl; } } } if (rest != "") { pieces.push_back(rest); } return pieces; } catch(exception& e) { errorOut(e, "MothurOut", "splitWhiteSpace"); exit(1); } } /***********************************************************************/ vector MothurOut::splitWhiteSpaceWithQuotes(string input){ try { vector pieces; string rest = ""; int pos = input.find('\''); int pos2 = input.find('\"'); if ((pos == string::npos) && (pos2 == string::npos)) { return splitWhiteSpace(input); } //no quotes to worry about else { for (int i = 0; i < input.length(); i++) { if ((input[i] == '\'') || (input[i] == '\"') || (rest == "\'") || (rest == "\"")) { //grab everything til end or next ' or " rest += input[i]; for (int j = i+1; j < input.length(); j++) { if ((input[j] == '\'') || (input[j] == '\"')) { //then quit rest += input[j]; i = j+1; j+=input.length(); }else { rest += input[j]; } } }else if (!isspace(input[i])) { rest += input[i]; } else { if (rest != "") { pieces.push_back(rest); rest = ""; } while (i < input.length()) { //gobble white space if (isspace(input[i])) { i++; } else { rest = input[i]; break; } //cout << "next piece buffer = " << nextPiece << endl; } } } if (rest != "") { pieces.push_back(rest); } } return pieces; } catch(exception& e) { errorOut(e, "MothurOut", "splitWhiteSpaceWithQuotes"); exit(1); } } //********************************************************************************************************************** int MothurOut::readTax(string namefile, map& taxMap, bool removeConfidence) { try { //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; bool error = false; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); 
for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); //are there confidence scores, if so remove them if (removeConfidence) { if (secondCol.find_first_of('(') != -1) { removeConfidences(secondCol); } } map::iterator itTax = taxMap.find(firstCol); if(itTax == taxMap.end()) { bool ignore = false; if (secondCol != "") { if (secondCol[secondCol.length()-1] != ';') { mothurOut("[ERROR]: " + firstCol + " is missing the final ';', ignoring.\n"); ignore=true; } } if (!ignore) { taxMap[firstCol] = secondCol; } if (debug) { mothurOut("[DEBUG]: name = '" + firstCol + "' tax = '" + secondCol + "'\n"); } }else { mothurOut("[ERROR]: " + firstCol + " is already in your taxonomy file, names must be unique.\n"); error = true; } pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); //are there confidence scores, if so remove them if (removeConfidence) { if (secondCol.find_first_of('(') != -1) { removeConfidences(secondCol); } } map::iterator itTax = taxMap.find(firstCol); if(itTax == taxMap.end()) { bool ignore = false; if (secondCol != "") { if (secondCol[secondCol.length()-1] != ';') { mothurOut("[ERROR]: " + firstCol + " is missing the final ';', ignoring.\n"); ignore=true; } } if (!ignore) { taxMap[firstCol] = secondCol; } if (debug) { mothurOut("[DEBUG]: name = '" + firstCol + "' tax = '" + secondCol + "'\n"); } }else { mothurOut("[ERROR]: " + firstCol + " is already in your taxonomy file, names must be unique.\n"); error = true; } pairDone = false; } } } if (error) { control_pressed = true; } if (debug) { mothurOut("[DEBUG]: numSeqs saved = '" + toString(taxMap.size()) + "'\n"); } return 
taxMap.size(); } catch(exception& e) { errorOut(e, "MothurOut", "readTax"); exit(1); } } /**********************************************************************************************************************/ int MothurOut::readNames(string namefile, map& nameMap, bool redund) { try { //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); //parse names into vector vector theseNames; splitAtComma(secondCol, theseNames); for (int i = 0; i < theseNames.size(); i++) { nameMap[theseNames[i]] = firstCol; } pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); //parse names into vector vector theseNames; splitAtComma(secondCol, theseNames); for (int i = 0; i < theseNames.size(); i++) { nameMap[theseNames[i]] = firstCol; } pairDone = false; } } } return nameMap.size(); } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /**********************************************************************************************************************/ int MothurOut::readNames(string namefile, map& nameMap, int flip) { try { //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { 
break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); nameMap[secondCol] = firstCol; pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); nameMap[secondCol] = firstCol; pairDone = false; } } } return nameMap.size(); } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /**********************************************************************************************************************/ int MothurOut::readNames(string namefile, map& nameMap, map& nameCount) { try { nameMap.clear(); nameCount.clear(); //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); //parse names into vector vector theseNames; splitAtComma(secondCol, theseNames); for (int i = 0; i < theseNames.size(); i++) { nameMap[theseNames[i]] = firstCol; } nameCount[firstCol] = theseNames.size(); pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { 
secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); //parse names into vector vector theseNames; splitAtComma(secondCol, theseNames); for (int i = 0; i < theseNames.size(); i++) { nameMap[theseNames[i]] = firstCol; } nameCount[firstCol] = theseNames.size(); pairDone = false; } } } return nameMap.size(); } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /**********************************************************************************************************************/ int MothurOut::readNames(string namefile, map& nameMap) { try { //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); nameMap[firstCol] = secondCol; pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); nameMap[firstCol] = secondCol; pairDone = false; } } } return nameMap.size(); } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /**********************************************************************************************************************/ int MothurOut::readNames(string namefile, map >& nameMap) { try { //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string 
firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); vector temp; splitAtComma(secondCol, temp); nameMap[firstCol] = temp; pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); vector temp; splitAtComma(secondCol, temp); nameMap[firstCol] = temp; pairDone = false; } } } return nameMap.size(); } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /**********************************************************************************************************************/ map MothurOut::readNames(string namefile) { try { map nameMap; //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); int num = getNumNames(secondCol); nameMap[firstCol] = num; if (debug) { mothurOut("[DEBUG]: " + firstCol + ", " + toString(num) + ".\n"); } pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; 
columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); int num = getNumNames(secondCol); nameMap[firstCol] = num; pairDone = false; } } } return nameMap; } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /**********************************************************************************************************************/ map MothurOut::readNames(string namefile, unsigned long int& numSeqs) { try { map nameMap; numSeqs = 0; //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); int num = getNumNames(secondCol); nameMap[firstCol] = num; pairDone = false; numSeqs += num; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); int num = getNumNames(secondCol); nameMap[firstCol] = num; pairDone = false; numSeqs += num; } } } return nameMap; } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } /************************************************************/ int MothurOut::checkName(string& name) { try { if (modifyNames) { for (int i = 0; i < name.length(); i++) { if (name[i] == ':') { name[i] = '_'; changedSeqNames = true; } } } return 0; } catch(exception& e) { errorOut(e, "MothurOut", "checkName"); exit(1); } } 
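/*
Illustrative sketch, not part of mothur: the readNames overloads above all follow the same pattern. They read whitespace-separated pairs (a representative name, then a comma-separated list of names), pass both columns through checkName, and store either the list or its size. The standalone version below shows that pattern with standard streams instead of the 4096-byte buffer and splitWhiteSpace. The names sanitizeName, countNames and readNameCounts are hypothetical, and the sketch assumes names are always modified, whereas checkName only does so when modifyNames is set.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical stand-in for checkName: replaces every ':' with '_'.
// Returns true when the name was changed.
bool sanitizeName(std::string& name) {
    bool changed = false;
    for (std::size_t i = 0; i < name.size(); i++) {
        if (name[i] == ':') { name[i] = '_'; changed = true; }
    }
    return changed;
}

// Counts comma-separated entries; an empty string holds zero names.
int countNames(const std::string& names) {
    if (names.empty()) { return 0; }
    int count = 1;
    for (std::size_t i = 0; i < names.size(); i++) {
        if (names[i] == ',') { count++; }
    }
    return count;
}

// Reads (representative, list) pairs from a stream and maps each
// representative to the number of names in its list.
// A trailing unpaired token is ignored.
std::map<std::string, int> readNameCounts(std::istream& in) {
    std::map<std::string, int> counts;
    std::string firstCol, secondCol;
    while (in >> firstCol >> secondCol) {
        sanitizeName(firstCol);
        sanitizeName(secondCol);
        counts[firstCol] = countNames(secondCol);
    }
    return counts;
}
```

Usage: wrap the file contents in a std::istringstream (or pass a std::ifstream) and call readNameCounts(in). Letting operator>> tokenize on whitespace removes the need to carry a partial token between buffer reads, which is what the rest variable does in the functions above.
*/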
/**********************************************************************************************************************/ int MothurOut::readNames(string namefile, vector& nameVector, map& fastamap) { try { int error = 0; //open input file ifstream in; openInputFile(namefile, in); string rest = ""; char buffer[4096]; bool pairDone = false; bool columnOne = true; string firstCol, secondCol; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); int num = getNumNames(secondCol); map::iterator it = fastamap.find(firstCol); if (it == fastamap.end()) { error = 1; mothurOut("[ERROR]: " + firstCol + " is not in your fastafile, but is in your namesfile, please correct."); mothurOutEndLine(); }else { seqPriorityNode temp(num, it->second, firstCol); nameVector.push_back(temp); } pairDone = false; } } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { if (columnOne) { firstCol = pieces[i]; columnOne=false; } else { secondCol = pieces[i]; pairDone = true; columnOne=true; } if (pairDone) { checkName(firstCol); checkName(secondCol); int num = getNumNames(secondCol); map::iterator it = fastamap.find(firstCol); if (it == fastamap.end()) { error = 1; mothurOut("[ERROR]: " + firstCol + " is not in your fastafile, but is in your namesfile, please correct."); mothurOutEndLine(); }else { seqPriorityNode temp(num, it->second, firstCol); nameVector.push_back(temp); } pairDone = false; } } } return error; } catch(exception& e) { errorOut(e, "MothurOut", "readNames"); exit(1); } } //********************************************************************************************************************** set MothurOut::readAccnos(string 
accnosfile){ try { set names; ifstream in; int ableToOpen = openInputFile(accnosfile, in, ""); if (ableToOpen == 1) { mothurOut("[ERROR]: Could not open " + accnosfile); mothurOutEndLine(); return names; } string name; string rest = ""; char buffer[4096]; unsigned long long count = 0; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { checkName(pieces[i]); names.insert(pieces[i]); count++; } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { checkName(pieces[i]); names.insert(pieces[i]); count++; } } return names; } catch(exception& e) { errorOut(e, "MothurOut", "readAccnos"); exit(1); } } //********************************************************************************************************************** int MothurOut::readAccnos(string accnosfile, vector& names){ try { names.clear(); ifstream in; openInputFile(accnosfile, in); string name; string rest = ""; char buffer[4096]; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { checkName(pieces[i]); names.push_back(pieces[i]); } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { checkName(pieces[i]); names.push_back(pieces[i]); } } return 0; } catch(exception& e) { errorOut(e, "MothurOut", "readAccnos"); exit(1); } } //********************************************************************************************************************** int MothurOut::readAccnos(string accnosfile, vector& names, string noerror){ try { names.clear(); ifstream in; openInputFile(accnosfile, in, noerror); string name; string rest = ""; char buffer[4096]; while (!in.eof()) { if (control_pressed) { break; } in.read(buffer, 4096); vector pieces = splitWhiteSpace(rest, 
buffer, in.gcount()); for (int i = 0; i < pieces.size(); i++) { checkName(pieces[i]); names.push_back(pieces[i]); } } in.close(); if (rest != "") { vector pieces = splitWhiteSpace(rest); for (int i = 0; i < pieces.size(); i++) { checkName(pieces[i]); names.push_back(pieces[i]); } } return 0; } catch(exception& e) { errorOut(e, "MothurOut", "readAccnos"); exit(1); } } /***********************************************************************/ int MothurOut::getNumNames(string names){ try { int count = 0; if(names != ""){ count = 1; for(int i=0;i47 && label[i]<58) { //is a digit newLabel1 += label[i]; } } int num1; if (debug) { mothurOut("[DEBUG]: " + newLabel1 + "\n"); } mothurConvert(newLabel1, num1); simple = toString(num1); return simple; } catch(exception& e) { errorOut(e, "MothurOut", "getSimpleLabel"); exit(1); } } /***********************************************************************/ string MothurOut::mothurGetpid(int threadID){ try { string pid = ""; #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) pid += toString(getpid()); if(debug) { mothurOut("[DEBUG]: " + pid + "\n"); } //remove any weird chars string pid1 = ""; for (int i = 0; i < pid.length(); i++) { if(pid[i]>47 && pid[i]<58) { //is a digit pid1 += pid[i]; } } pid = pid1; #else pid += toString(threadID); #endif return pid; } catch(exception& e) { errorOut(e, "MothurOut", "mothurGetpid"); exit(1); } } /***********************************************************************/ bool MothurOut::isLabelEquivalent(string label1, string label2){ try { bool same = false; //remove OTU or phylo tag string newLabel1 = ""; for (int i = 0; i < label1.length(); i++) { if(label1[i]>47 && label1[i]<58) { //is a digit newLabel1 += label1[i]; } } string newLabel2 = ""; for (int i = 0; i < label2.length(); i++) { if(label2[i]>47 && label2[i]<58) { //is a digit newLabel2 += label2[i]; } } int num1, num2; mothurConvert(newLabel1, num1); mothurConvert(newLabel2, num2); 
if (num1 == num2) { same = true; } return same; } catch(exception& e) { errorOut(e, "MothurOut", "isLabelEquivalent"); exit(1); } } //********************************************************************************************************************** bool MothurOut::isSubset(vector bigset, vector subset) { try { if (subset.size() > bigset.size()) { return false; } //check if each guy in subset is also in bigset for (int i = 0; i < subset.size(); i++) { bool match = false; for (int j = 0; j < bigset.size(); j++) { if (subset[i] == bigset[j]) { match = true; break; } } //you have a guy in subset that had no match in bigset if (match == false) { return false; } } return true; } catch(exception& e) { errorOut(e, "MothurOut", "isSubset"); exit(1); } } /***********************************************************************/ int MothurOut::mothurRemove(string filename){ try { filename = getFullPathName(filename); int error = remove(filename.c_str()); if (debug) { mothurOut("[DEBUG]: removed " + filename + "\n"); } //if (error != 0) { // if (errno != ENOENT) { //ENOENT == file does not exist // string message = "Error deleting file " + filename; // perror(message.c_str()); // } //} return error; } catch(exception& e) { errorOut(e, "MothurOut", "mothurRemove"); exit(1); } } /***********************************************************************/ bool MothurOut::mothurConvert(string item, int& num){ try { bool error = false; if (isNumeric1(item)) { convert(item, num); }else { num = 0; error = true; mothurOut("[ERROR]: cannot convert " + item + " to an integer."); mothurOutEndLine(); commandInputsConvertError = true; } return error; } catch(exception& e) { errorOut(e, "MothurOut", "mothurConvert"); exit(1); } } /***********************************************************************/ bool MothurOut::mothurConvert(string item, intDist& num){ try { bool error = false; if (isNumeric1(item)) { convert(item, num); }else { num = 0; error = true; mothurOut("[ERROR]: cannot 
convert " + item + " to an integer."); mothurOutEndLine(); commandInputsConvertError = true; } return error; } catch(exception& e) { errorOut(e, "MothurOut", "mothurConvert"); exit(1); } } /***********************************************************************/ bool MothurOut::isNumeric1(string stringToCheck){ try { bool numeric = false; if(stringToCheck.find_first_not_of("0123456789.-") == string::npos) { numeric = true; } return numeric; } catch(exception& e) { errorOut(e, "MothurOut", "isNumeric1"); exit(1); } } /***********************************************************************/ bool MothurOut::isInteger(string stringToCheck){ try { bool isInt = false; if(stringToCheck.find_first_not_of("0123456789-") == string::npos) { isInt = true; } return isInt; } catch(exception& e) { errorOut(e, "MothurOut", "isInteger"); exit(1); } } /***********************************************************************/ bool MothurOut::containsAlphas(string stringToCheck){ try { bool containsAlpha = false; if(stringToCheck.find_first_of("AaBbCcDdEeFfGgHhIiJjKkLlMmNnOopPQqRrSsTtUuVvWwXxYyZz") != string::npos) { containsAlpha = true; } return containsAlpha; } catch(exception& e) { errorOut(e, "MothurOut", "containsAlphas"); exit(1); } } /***********************************************************************/ bool MothurOut::mothurConvert(string item, float& num){ try { bool error = false; if (isNumeric1(item)) { convert(item, num); }else { num = 0; error = true; mothurOut("[ERROR]: cannot convert " + item + " to a float."); mothurOutEndLine(); commandInputsConvertError = true; } return error; } catch(exception& e) { errorOut(e, "MothurOut", "mothurConvert"); exit(1); } } /***********************************************************************/ bool MothurOut::mothurConvert(string item, double& num){ try { bool error = false; if (isNumeric1(item)) { convert(item, num); }else { num = 0; error = true; mothurOut("[ERROR]: cannot convert " + item + " to a double."); 
mothurOutEndLine(); commandInputsConvertError = true; } return error; } catch(exception& e) { errorOut(e, "MothurOut", "mothurConvert"); exit(1); } } /**************************************************************************************************/ vector > MothurOut::binomial(int maxOrder){ try { vector > binomial(maxOrder+1); for(int i=0;i<=maxOrder;i++){ binomial[i].resize(maxOrder+1); binomial[i][0]=1; binomial[0][i]=0; } binomial[0][0]=1; binomial[1][0]=1; binomial[1][1]=1; for(int i=2;i<=maxOrder;i++){ binomial[1][i]=0; } for(int i=2;i<=maxOrder;i++){ for(int j=1;j<=maxOrder;j++){ if(i==j){ binomial[i][j]=1; } if(j>i) { binomial[i][j]=0; } else { binomial[i][j]=binomial[i-1][j-1]+binomial[i-1][j]; } } } return binomial; } catch(exception& e) { errorOut(e, "MothurOut", "binomial"); exit(1); } } /**************************************************************************************************/ unsigned int MothurOut::fromBase36(string base36){ try { unsigned int num = 0; map converts; converts['A'] = 0; converts['a'] = 0; converts['B'] = 1; converts['b'] = 1; converts['C'] = 2; converts['c'] = 2; converts['D'] = 3; converts['d'] = 3; converts['E'] = 4; converts['e'] = 4; converts['F'] = 5; converts['f'] = 5; converts['G'] = 6; converts['g'] = 6; converts['H'] = 7; converts['h'] = 7; converts['I'] = 8; converts['i'] = 8; converts['J'] = 9; converts['j'] = 9; converts['K'] = 10; converts['k'] = 10; converts['L'] = 11; converts['l'] = 11; converts['M'] = 12; converts['m'] = 12; converts['N'] = 13; converts['n'] = 13; converts['O'] = 14; converts['o'] = 14; converts['P'] = 15; converts['p'] = 15; converts['Q'] = 16; converts['q'] = 16; converts['R'] = 17; converts['r'] = 17; converts['S'] = 18; converts['s'] = 18; converts['T'] = 19; converts['t'] = 19; converts['U'] = 20; converts['u'] = 20; converts['V'] = 21; converts['v'] = 21; converts['W'] = 22; converts['w'] = 22; converts['X'] = 23; converts['x'] = 23; converts['Y'] = 24; converts['y'] = 24; 
converts['Z'] = 25; converts['z'] = 25; converts['0'] = 26; converts['1'] = 27; converts['2'] = 28; converts['3'] = 29; converts['4'] = 30; converts['5'] = 31; converts['6'] = 32; converts['7'] = 33; converts['8'] = 34; converts['9'] = 35; int i = 0; while (i < base36.length()) { char c = base36[i]; num = 36 * num + converts[c]; i++; } return num; } catch(exception& e) { errorOut(e, "MothurOut", "fromBase36"); exit(1); } } /***********************************************************************/ string MothurOut::findEdianness() { try { // find real endian type string endianType = "unknown"; int num = 1; if(*(char *)&num == 1) { endianType = "LITTLE_ENDIAN"; } else { endianType = "BIG_ENDIAN"; } return endianType; } catch(exception& e) { errorOut(e, "MothurOut", "findEdianness"); exit(1); } } /***********************************************************************/ double MothurOut::median(vector x) { try { double value = 0.0; if (x.size() == 0) { } //error else { //For example, if a < b < c, then the median of the list {a, b, c} is b, and, if a < b < c < d, then the median of the list {a, b, c, d} is the mean of b and c; i.e., it is (b + c)/2. sort(x.begin(), x.end()); //is x.size even? if ((x.size()%2) == 0) { //size() is even. 
median = average of 2 midpoints int midIndex1 = (x.size()/2)-1; int midIndex2 = (x.size()/2); value = (x[midIndex1]+ x[midIndex2]) / 2.0; }else { int midIndex = (x.size()/2); value = x[midIndex]; } } return value; } catch(exception& e) { errorOut(e, "MothurOut", "median"); exit(1); } } /***********************************************************************/ int MothurOut::factorial(int num){ try { int total = 1; for (int i = 1; i <= num; i++) { total *= i; } return total; } catch(exception& e) { errorOut(e, "MothurOut", "factorial"); exit(1); } } /***********************************************************************/ int MothurOut::getNumSeqs(ifstream& file){ try { int numSeqs = count(istreambuf_iterator(file),istreambuf_iterator(), '>'); file.seekg(0); return numSeqs; } catch(exception& e) { errorOut(e, "MothurOut", "getNumSeqs"); exit(1); } } /***********************************************************************/ void MothurOut::getNumSeqs(ifstream& file, int& numSeqs){ try { string input; numSeqs = 0; while(!file.eof()){ input = getline(file); if (input.length() != 0) { if(input[0] == '>'){ numSeqs++; } } } } catch(exception& e) { errorOut(e, "MothurOut", "getNumSeqs"); exit(1); } } /***********************************************************************/ bool MothurOut::checkLocations(string& filename, string inputDir){ try { filename = getFullPathName(filename); int ableToOpen; ifstream in; ableToOpen = openInputFile(filename, in, "noerror"); in.close(); //if you can't open it, try input location if (ableToOpen == 1) { if (inputDir != "") { //default path is set string tryPath = inputDir + getSimpleName(filename); mothurOut("Unable to open " + filename + ". 
Trying input directory " + tryPath); mothurOutEndLine(); ifstream in2; ableToOpen = openInputFile(tryPath, in2, "noerror"); in2.close(); filename = tryPath; } } //if you can't open it, try default location if (ableToOpen == 1) { if (getDefaultPath() != "") { //default path is set string tryPath = getDefaultPath() + getSimpleName(filename); mothurOut("Unable to open " + filename + ". Trying default " + tryPath); mothurOutEndLine(); ifstream in2; ableToOpen = openInputFile(tryPath, in2, "noerror"); in2.close(); filename = tryPath; } } //if you can't open it, it's not in the current working directory or inputDir, so try mothur's executable location if (ableToOpen == 1) { string exepath = argv; string tempPath = exepath; for (int i = 0; i < exepath.length(); i++) { tempPath[i] = tolower(exepath[i]); } exepath = exepath.substr(0, (tempPath.find_last_of('m'))); string tryPath = getFullPathName(exepath) + getSimpleName(filename); mothurOut("Unable to open " + filename + ". Trying mothur's executable location " + tryPath); mothurOutEndLine(); ifstream in2; ableToOpen = openInputFile(tryPath, in2, "noerror"); in2.close(); filename = tryPath; } if (ableToOpen == 1) { mothurOut("Unable to open " + filename + "."); mothurOutEndLine(); return false; } return true; } catch(exception& e) { errorOut(e, "MothurOut", "checkLocations"); exit(1); } } /***********************************************************************/ //This function parses the estimator options and puts them in a vector void MothurOut::splitAtChar(string& estim, vector& container, char symbol) { try { if (symbol == '-') { splitAtDash(estim, container); return; } string individual = ""; int estimLength = estim.size(); for(int i=0;i& container) { try { string individual = ""; int estimLength = estim.size(); bool prevEscape = false; /*for(int i=0;i& container) { try { string individual = ""; int estimLength = estim.size(); bool prevEscape = false; /* for(int i=0;i& container) { try { string individual = ""; int lineNum; int 
estimLength = estim.size(); bool prevEscape = false; /* for(int i=0;i& names) { try { string list = ""; if (names.size() == 0) { return list; } for (int i = 0; i < names.size()-1; i++) { list += names[i] + ","; } //get last name list += names[names.size()-1]; return list; } catch(exception& e) { errorOut(e, "MothurOut", "makeList"); exit(1); } } /***********************************************************************/ //This function parses a string and puts the pieces in a vector void MothurOut::splitAtComma(string& estim, vector& container) { try { string individual = ""; int estimLength = estim.size(); for(int i=0;i Groups) { try { for (int i = 0; i < Groups.size(); i++) { if (groupname == Groups[i]) { return true; } } return false; } catch(exception& e) { errorOut(e, "MothurOut", "inUsersGroups"); exit(1); } } /**************************************************************************************************/ bool MothurOut::inUsersGroups(vector set, vector< vector > sets) { try { for (int i = 0; i < sets.size(); i++) { if (set == sets[i]) { return true; } } return false; } catch(exception& e) { errorOut(e, "MothurOut", "inUsersGroups"); exit(1); } } /**************************************************************************************************/ bool MothurOut::inUsersGroups(int groupname, vector Groups) { try { for (int i = 0; i < Groups.size(); i++) { if (groupname == Groups[i]) { return true; } } return false; } catch(exception& e) { errorOut(e, "MothurOut", "inUsersGroups"); exit(1); } } /**************************************************************************************************/ //returns true if any of the strings in first vector are in second vector bool MothurOut::inUsersGroups(vector groupnames, vector Groups) { try { for (int i = 0; i < groupnames.size(); i++) { if (inUsersGroups(groupnames[i], Groups)) { return true; } } return false; } catch(exception& e) { errorOut(e, "MothurOut", "inUsersGroups"); exit(1); } } 
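/*
Illustrative sketch, not part of mothur: the inUsersGroups overloads above are linear membership tests, and makeList joins names with commas. The standalone version below expresses the same ideas with std::find and const references, which avoids copying the vectors on every call. The names inGroups, anyInGroups and joinWithCommas are hypothetical, and the element type std::string is an assumption for the string overloads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True when groupname appears in groups (linear scan via std::find).
bool inGroups(const std::string& groupname,
              const std::vector<std::string>& groups) {
    return std::find(groups.begin(), groups.end(), groupname) != groups.end();
}

// True when at least one entry of groupnames appears in groups.
bool anyInGroups(const std::vector<std::string>& groupnames,
                 const std::vector<std::string>& groups) {
    for (std::size_t i = 0; i < groupnames.size(); i++) {
        if (inGroups(groupnames[i], groups)) { return true; }
    }
    return false;
}

// Joins names with commas; an empty vector gives an empty string.
std::string joinWithCommas(const std::vector<std::string>& names) {
    std::string list;
    for (std::size_t i = 0; i < names.size(); i++) {
        if (i != 0) { list += ","; }
        list += names[i];
    }
    return list;
}
```

Usage: inGroups("A", groups) answers the single-name question and anyInGroups(wanted, groups) the any-of question. For large group lists a std::set would make each lookup logarithmic, at the cost of building the set first.
*/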
/**************************************************************************************************/ //removes entries that are only white space int MothurOut::removeBlanks(vector& tempVector) { try { vector newVector; for (int i = 0; i < tempVector.size(); i++) { bool isBlank = true; for (int j = 0; j < tempVector[i].length(); j++) { if (!isspace(tempVector[i][j])) { isBlank = false; j+= tempVector[i].length(); } //contains non space chars, break out and save } if (!isBlank) { newVector.push_back(tempVector[i]); } } tempVector = newVector; return 0; } catch(exception& e) { errorOut(e, "MothurOut", "removeBlanks"); exit(1); } } /***********************************************************************/ //this function determines if the user has given us labels that are smaller than the given label. //if so then it returns true so that the calling function can run the previous valid distance. //it's a "smart" distance function. It also checks for invalid labels. bool MothurOut::anyLabelsToProcess(string label, set& userLabels, string errorOff) { try { set::iterator it; vector orderFloat; map userMap; //the conversion process removes trailing 0's which we need to put back map::iterator it2; float labelFloat; bool smaller = false; //unique is the smallest line if (label == "unique") { return false; } else { if (convertTestFloat(label, labelFloat)) { convert(label, labelFloat); }else { //cant convert return false; } } //go through users set and make them floats for(it = userLabels.begin(); it != userLabels.end();) { float temp; if ((*it != "unique") && (convertTestFloat(*it, temp) == true)){ convert(*it, temp); orderFloat.push_back(temp); userMap[*it] = temp; it++; }else if (*it == "unique") { orderFloat.push_back(-1.0); userMap["unique"] = -1.0; it++; }else { if (errorOff == "") { mothurOut(*it + " is not a valid label."); mothurOutEndLine(); } userLabels.erase(it++); } } //sort order sort(orderFloat.begin(), orderFloat.end()); 
/*************************************************/ //is this label bigger than any of the users labels /*************************************************/ //loop through order until you find a label greater than label for (int i = 0; i < orderFloat.size(); i++) { if (orderFloat[i] < labelFloat) { smaller = true; if (orderFloat[i] == -1) { if (errorOff == "") { mothurOut("Your file does not include the label unique."); mothurOutEndLine(); } userLabels.erase("unique"); } else { if (errorOff == "") { mothurOut("Your file does not include the label "); mothurOutEndLine(); } string s = ""; for (it2 = userMap.begin(); it2!= userMap.end(); it2++) { if (it2->second == orderFloat[i]) { s = it2->first; //remove small labels userLabels.erase(s); break; } } if (errorOff == "") {mothurOut( s + ". I will use the next smallest distance. "); mothurOutEndLine(); } } //since they are sorted once you find a bigger one stop looking }else { break; } } return smaller; } catch(exception& e) { errorOut(e, "MothurOut", "anyLabelsToProcess"); exit(1); } } /**************************************************************************************************/ bool MothurOut::checkReleaseVersion(ifstream& file, string version) { try { bool good = true; string line = getline(file); //before we added this check if (line[0] != '#') { good = false; } else { //rip off # line = line.substr(1); vector versionVector; splitAtChar(version, versionVector, '.'); //check file version vector linesVector; splitAtChar(line, linesVector, '.'); if (versionVector.size() != linesVector.size()) { good = false; } else { for (int j = 0; j < versionVector.size(); j++) { int num1, num2; convert(versionVector[j], num1); convert(linesVector[j], num2); //if mothurs version is newer than this files version, then we want to remake it if (num1 > num2) { good = false; break; } } } } if (!good) { file.close(); } else { file.seekg(0); } return good; } catch(exception& e) { errorOut(e, "MothurOut", "checkReleaseVersion"); 
exit(1); } } /**************************************************************************************************/ vector<double> MothurOut::getAverages(vector< vector<double> >& dists) { try{ vector<double> averages; //averages.resize(numComp, 0.0); for (int i = 0; i < dists[0].size(); i++) { averages.push_back(0.0); } for (int thisIter = 0; thisIter < dists.size(); thisIter++) { for (int i = 0; i < dists[thisIter].size(); i++) { averages[i] += dists[thisIter][i]; } } //finds average. for (int i = 0; i < averages.size(); i++) { averages[i] /= (double) dists.size(); } return averages; } catch(exception& e) { errorOut(e, "MothurOut", "getAverages"); exit(1); } } /**************************************************************************************************/ double MothurOut::getAverage(vector<double> dists) { try{ double average = 0; for (int i = 0; i < dists.size(); i++) { average += dists[i]; } //finds average. average /= (double) dists.size(); return average; } catch(exception& e) { errorOut(e, "MothurOut", "getAverage"); exit(1); } } /**************************************************************************************************/ vector<double> MothurOut::getStandardDeviation(vector< vector<double> >& dists) { try{ vector<double> averages = getAverages(dists); //find standard deviation vector<double> stdDev; //stdDev.resize(numComp, 0.0); for (int i = 0; i < dists[0].size(); i++) { stdDev.push_back(0.0); } for (int thisIter = 0; thisIter < dists.size(); thisIter++) { //compute the difference of each dist from the mean, and square the result of each for (int j = 0; j < dists[thisIter].size(); j++) { stdDev[j] += ((dists[thisIter][j] - averages[j]) * (dists[thisIter][j] - averages[j])); } } for (int i = 0; i < stdDev.size(); i++) { stdDev[i] /= (double) dists.size(); stdDev[i] = sqrt(stdDev[i]); } return stdDev; } catch(exception& e) { errorOut(e, "MothurOut", "getStandardDeviation"); exit(1); } } /**************************************************************************************************/ vector<double> 
MothurOut::getStandardDeviation(vector< vector >& dists, vector& averages) { try{ //find standard deviation vector stdDev; //stdDev.resize(numComp, 0.0); for (int i = 0; i < dists[0].size(); i++) { stdDev.push_back(0.0); } for (int thisIter = 0; thisIter < dists.size(); thisIter++) { //compute the difference of each dist from the mean, and square the result of each for (int j = 0; j < dists[thisIter].size(); j++) { stdDev[j] += ((dists[thisIter][j] - averages[j]) * (dists[thisIter][j] - averages[j])); } } for (int i = 0; i < stdDev.size(); i++) { stdDev[i] /= (double) dists.size(); stdDev[i] = sqrt(stdDev[i]); } return stdDev; } catch(exception& e) { errorOut(e, "MothurOut", "getStandardDeviation"); exit(1); } } /**************************************************************************************************/ vector< vector > MothurOut::getAverages(vector< vector< vector > >& calcDistsTotals, string mode) { try{ vector< vector > calcAverages; //calcAverages.resize(calcDistsTotals[0].size()); for (int i = 0; i < calcDistsTotals[0].size(); i++) { //initialize sums to zero. //calcAverages[i].resize(calcDistsTotals[0][i].size()); vector temp; for (int j = 0; j < calcDistsTotals[0][i].size(); j++) { seqDist tempDist; tempDist.seq1 = calcDistsTotals[0][i][j].seq1; tempDist.seq2 = calcDistsTotals[0][i][j].seq2; tempDist.dist = 0.0; temp.push_back(tempDist); } calcAverages.push_back(temp); } if (mode == "average") { for (int thisIter = 0; thisIter < calcDistsTotals.size(); thisIter++) { //sum all groups dists for each calculator for (int i = 0; i < calcAverages.size(); i++) { //initialize sums to zero. for (int j = 0; j < calcAverages[i].size(); j++) { calcAverages[i][j].dist += calcDistsTotals[thisIter][i][j].dist; } } } for (int i = 0; i < calcAverages.size(); i++) { //finds average. 
for (int j = 0; j < calcAverages[i].size(); j++) { calcAverages[i][j].dist /= (float) calcDistsTotals.size(); } } }else { //find median for (int i = 0; i < calcAverages.size(); i++) { //for each calc for (int j = 0; j < calcAverages[i].size(); j++) { //for each comparison vector dists; for (int thisIter = 0; thisIter < calcDistsTotals.size(); thisIter++) { //for each subsample dists.push_back(calcDistsTotals[thisIter][i][j].dist); } sort(dists.begin(), dists.end()); calcAverages[i][j].dist = dists[(calcDistsTotals.size()/2)]; } } } return calcAverages; } catch(exception& e) { errorOut(e, "MothurOut", "getAverages"); exit(1); } } /**************************************************************************************************/ vector< vector > MothurOut::getAverages(vector< vector< vector > >& calcDistsTotals) { try{ vector< vector > calcAverages; //calcAverages.resize(calcDistsTotals[0].size()); for (int i = 0; i < calcDistsTotals[0].size(); i++) { //initialize sums to zero. //calcAverages[i].resize(calcDistsTotals[0][i].size()); vector temp; for (int j = 0; j < calcDistsTotals[0][i].size(); j++) { seqDist tempDist; tempDist.seq1 = calcDistsTotals[0][i][j].seq1; tempDist.seq2 = calcDistsTotals[0][i][j].seq2; tempDist.dist = 0.0; temp.push_back(tempDist); } calcAverages.push_back(temp); } for (int thisIter = 0; thisIter < calcDistsTotals.size(); thisIter++) { //sum all groups dists for each calculator for (int i = 0; i < calcAverages.size(); i++) { //initialize sums to zero. for (int j = 0; j < calcAverages[i].size(); j++) { calcAverages[i][j].dist += calcDistsTotals[thisIter][i][j].dist; } } } for (int i = 0; i < calcAverages.size(); i++) { //finds average. 
            for (int j = 0; j < calcAverages[i].size(); j++) {
                calcAverages[i][j].dist /= (float) calcDistsTotals.size();
            }
        }

        return calcAverages;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "getAverages");
        exit(1);
    }
}
/**************************************************************************************************/
vector< vector<seqDist> > MothurOut::getStandardDeviation(vector< vector< vector<seqDist> > >& calcDistsTotals) {
    try{

        vector< vector<seqDist> > calcAverages = getAverages(calcDistsTotals);

        //find standard deviation
        vector< vector<seqDist> > stdDev;
        for (int i = 0; i < calcDistsTotals[0].size(); i++) { //initialize sums to zero.
            vector<seqDist> temp;
            for (int j = 0; j < calcDistsTotals[0][i].size(); j++) {
                seqDist tempDist;
                tempDist.seq1 = calcDistsTotals[0][i][j].seq1;
                tempDist.seq2 = calcDistsTotals[0][i][j].seq2;
                tempDist.dist = 0.0;
                temp.push_back(tempDist);
            }
            stdDev.push_back(temp);
        }

        for (int thisIter = 0; thisIter < calcDistsTotals.size(); thisIter++) { //compute the difference of each dist from the mean, and square the result of each
            for (int i = 0; i < stdDev.size(); i++) {
                for (int j = 0; j < stdDev[i].size(); j++) {
                    stdDev[i][j].dist += ((calcDistsTotals[thisIter][i][j].dist - calcAverages[i][j].dist) * (calcDistsTotals[thisIter][i][j].dist - calcAverages[i][j].dist));
                }
            }
        }

        for (int i = 0; i < stdDev.size(); i++) { //divide by the number of iterations, then take the square root
            for (int j = 0; j < stdDev[i].size(); j++) {
                stdDev[i][j].dist /= (float) calcDistsTotals.size();
                stdDev[i][j].dist = sqrt(stdDev[i][j].dist);
            }
        }

        return stdDev;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "getStandardDeviation");
        exit(1);
    }
}
/**************************************************************************************************/
vector< vector<seqDist> > MothurOut::getStandardDeviation(vector< vector< vector<seqDist> > >& calcDistsTotals, vector< vector<seqDist> >& calcAverages) {
    try{
        //find standard deviation
        vector< vector<seqDist> > stdDev;
        for (int i = 0; i < calcDistsTotals[0].size(); i++) { //initialize sums to zero.
            vector<seqDist> temp;
            for (int j = 0; j < calcDistsTotals[0][i].size(); j++) {
                seqDist tempDist;
                tempDist.seq1 = calcDistsTotals[0][i][j].seq1;
                tempDist.seq2 = calcDistsTotals[0][i][j].seq2;
                tempDist.dist = 0.0;
                temp.push_back(tempDist);
            }
            stdDev.push_back(temp);
        }

        for (int thisIter = 0; thisIter < calcDistsTotals.size(); thisIter++) { //compute the difference of each dist from the mean, and square the result of each
            for (int i = 0; i < stdDev.size(); i++) {
                for (int j = 0; j < stdDev[i].size(); j++) {
                    stdDev[i][j].dist += ((calcDistsTotals[thisIter][i][j].dist - calcAverages[i][j].dist) * (calcDistsTotals[thisIter][i][j].dist - calcAverages[i][j].dist));
                }
            }
        }

        for (int i = 0; i < stdDev.size(); i++) { //divide by the number of iterations, then take the square root
            for (int j = 0; j < stdDev[i].size(); j++) {
                stdDev[i][j].dist /= (float) calcDistsTotals.size();
                stdDev[i][j].dist = sqrt(stdDev[i][j].dist);
            }
        }

        return stdDev;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "getStandardDeviation");
        exit(1);
    }
}
/**************************************************************************************************/
bool MothurOut::isContainingOnlyDigits(string input) {
    try{
        //are you a digit in ascii code
        for (int i = 0;i < input.length(); i++){
            if( input[i]>47 && input[i]<58){}
            else { return false; }
        }

        return true;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "isContainingOnlyDigits");
        exit(1);
    }
}
/**************************************************************************************************/
int MothurOut::removeConfidences(string& tax) {
    try {
        string taxon;
        string newTax = "";

        while (tax.find_first_of(';') != -1) {

            if (control_pressed) { return 0; }

            //get taxon
            taxon = tax.substr(0,tax.find_first_of(';'));

            int pos = taxon.find_last_of('(');
            if (pos != -1) {
                //is it a number?
int pos2 = taxon.find_last_of(')'); if (pos2 != -1) { string confidenceScore = taxon.substr(pos+1, (pos2-(pos+1))); if (isNumeric1(confidenceScore)) { taxon = taxon.substr(0, pos); //rip off confidence } } } taxon += ";"; tax = tax.substr(tax.find_first_of(';')+1, tax.length()); newTax += taxon; } tax = newTax; return 0; } catch(exception& e) { errorOut(e, "MothurOut", "removeConfidences"); exit(1); } } /**************************************************************************************************/ string MothurOut::removeQuotes(string tax) { try { string taxon; string newTax = ""; for (int i = 0; i < tax.length(); i++) { if (control_pressed) { return newTax; } if ((tax[i] != '\'') && (tax[i] != '\"')) { newTax += tax[i]; } } return newTax; } catch(exception& e) { errorOut(e, "MothurOut", "removeQuotes"); exit(1); } } /**************************************************************************************************/ // function for calculating standard deviation double MothurOut::getStandardDeviation(vector& featureVector){ try { //finds sum double average = 0; for (int i = 0; i < featureVector.size(); i++) { average += featureVector[i]; } average /= (double) featureVector.size(); //find standard deviation double stdDev = 0; for (int i = 0; i < featureVector.size(); i++) { //compute the difference of each dist from the mean, and square the result of each stdDev += ((featureVector[i] - average) * (featureVector[i] - average)); } stdDev /= (double) featureVector.size(); stdDev = sqrt(stdDev); return stdDev; } catch(exception& e) { errorOut(e, "MothurOut", "getStandardDeviation"); exit(1); } } /**************************************************************************************************/ // returns largest value in vector double MothurOut::max(vector& featureVector){ try { if (featureVector.size() == 0) { mothurOut("[ERROR]: vector size = 0!\n"); control_pressed=true; return 0.0; } //finds largest double largest = featureVector[0]; for (int i = 1; i < 
            featureVector.size(); i++) {
            if (featureVector[i] > largest) { largest = featureVector[i]; }
        }

        return largest;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "max");
        exit(1);
    }
}
/**************************************************************************************************/
// returns smallest value in vector
double MothurOut::min(vector& featureVector){
    try {
        if (featureVector.size() == 0) { mothurOut("[ERROR]: vector size = 0!\n"); control_pressed=true; return 0.0; }

        //finds smallest
        double smallest = featureVector[0];
        for (int i = 1; i < featureVector.size(); i++) {
            if (featureVector[i] < smallest) { smallest = featureVector[i]; }
        }

        return smallest;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "min");
        exit(1);
    }
}
/**************************************************************************************************/
int MothurOut::max(int A, int B){
    try {
        //finds largest
        int largest = A;
        if (B > A) { largest = B; }

        return largest;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "max");
        exit(1);
    }
}
/**************************************************************************************************/
int MothurOut::min(int A, int B){
    try {
        //finds smallest
        int smallest = A;
        if (B < A) { smallest = B; }

        return smallest;
    }
    catch(exception& e) {
        errorOut(e, "MothurOut", "min");
        exit(1);
    }
}
/**************************************************************************************************/

mothur-1.36.1/source/mothurout.h

#ifndef MOTHUROUT_H
#define MOTHUROUT_H

/*
 *  mothurOut.h
 *  Mothur
 *
 *  Created by westcott on 2/25/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
* */ #include "mothur.h" /***********************************************/ struct logger { logger() {} ~logger() {} template< class T > logger& operator <<( const T& o ) { cout << o; return *this; } logger& operator<<(ostream& (*m)(ostream&) ) { cout << m; return *this; } }; /***********************************************/ class MothurOut { public: static MothurOut* getInstance(); void setFileName(string); void mothurOut(string); //writes to cout and the logfile void mothurOutEndLine(); //writes to cout and the logfile void mothurOut(string, ofstream&); //writes to the ofstream, cout and the logfile void mothurOutEndLine(ofstream&); //writes to the ofstream, cout and the logfile void mothurOutJustToScreen(string); //writes to cout void mothurOutJustToLog(string); void errorOut(exception&, string, string); void closeLog(); string getDefaultPath() { return defaultPath; } void setDefaultPath(string); string getOutputDir() { return outputDir; } void setOutputDir(string); string getReleaseDate() { return releaseDate; } void setReleaseDate(string r) { releaseDate = r; } string getVersion() { return version; } void setVersion(string r) { version = r; } void addGroup(string g) { Groups.push_back(g); } void setGroups(vector& g) { sort(g.begin(), g.end()); Groups = g; } void clearGroups() { Groups.clear(); } int getNumGroups() { return Groups.size(); } vector getGroups() { sort(Groups.begin(), Groups.end()); return Groups; } void addAllGroup(string g) { namesOfGroups.push_back(g); } void setAllGroups(vector& g) { sort(g.begin(), g.end()); namesOfGroups = g; } void clearAllGroups() { namesOfGroups.clear(); } int getNumAllGroups() { return namesOfGroups.size(); } vector getAllGroups() { sort(namesOfGroups.begin(), namesOfGroups.end()); return namesOfGroups; } vector Treenames; vector sharedBinLabelsInFile; vector currentSharedBinLabels; vector listBinLabelsInFile; string saveNextLabel, argv, sharedHeaderMode, groupMode; bool printedSharedHeaders, printedListHeaders, 
        commandInputsConvertError, changedSeqNames, modifyNames;

    //functions from mothur.h
    //file operations
    bool dirCheck(string&); //completes path, appends appropriate / or \, makes sure dir is writable.
    bool dirCheck(string&, string); //completes path, appends appropriate / or \, makes sure dir is writable. - no error
    bool mkDir(string&); //completes path, appends appropriate / or \. Returns true if it exists or if we can make it
    vector divideFile(string, int&); //divides splitting unevenness by sequence
    vector divideFile(string filename, int& proc, char delimChar);
    vector divideFilePerLine(string, int&); //divides splitting unevenness at line breaks
    int divideFile(string, int&, vector&);
    vector setFilePosEachLine(string, int&);
    vector setFilePosEachLine(string, unsigned long long&);
    vector setFilePosFasta(string, long long&);
    vector setFilePosFasta(string, long long&, char);
    vector setFilePosFasta(string, int&);
    string sortFile(string, string);
    int appendFiles(string, string);
    int appendBinaryFiles(string, string);
    int appendSFFFiles(string, string);
    int appendFilesWithoutHeaders(string, string);
    int renameFile(string, string); //oldname, newname
    string getFullPathName(string);
    string findProgramPath(string programName);
    string hasPath(string);
    string getExtension(string);
    string getPathName(string);
    string getSimpleName(string);
    string getRootName(string);
    bool isBlank(string);
    int openOutputFile(string, ofstream&);
    int openOutputFileBinary(string, ofstream&);
    int openOutputFileAppend(string, ofstream&);
    int openOutputFileBinaryAppend(string, ofstream&);
    int openInputFile(string, ifstream&);
    int openInputFileBinary(string, ifstream&);
    int openInputFileBinary(string, ifstream&, string);
#ifdef USE_BOOST
    int openInputFileBinary(string, ifstream&, boost::iostreams::filtering_istream&);
    int openInputFileBinary(string, ifstream&, boost::iostreams::filtering_istream&, string);
#endif
    int openInputFile(string, ifstream&, string); //no error given
    vector allGZFiles(vector&);
vector isGZ(string); //checks existence and format - will fail for either or both. bool checkLocations(string&, string); //filename, inputDir. checks for file in ./, inputdir, default and mothur's exe location. Returns false if cant be found. If found completes name with location string getline(ifstream&); string getline(istringstream&); void gobble(istream&); void gobble(istringstream&); void zapGremlins(istream&); void zapGremlins(istringstream&); vector splitWhiteSpace(string& rest, char[], int); vector splitWhiteSpace(string); set readAccnos(string); int readAccnos(string, vector&); int readAccnos(string, vector&, string); map readNames(string); map readNames(string, unsigned long int&); int readTax(string, map&, bool); vector readConsTax(string); int readConsTax(string, map&); int readNames(string, map&, map&); int readNames(string, map&); int readNames(string, map&, bool); int readNames(string, map&, int); int readNames(string, map >&); int readNames(string, vector&, map&); int mothurRemove(string); bool mothurConvert(string, int&); //use for converting user inputs. Sets commandInputsConvertError to true if error occurs. Engines check this. bool mothurConvert(string, intDist&); //use for converting user inputs. Sets commandInputsConvertError to true if error occurs. Engines check this. bool mothurConvert(string, float&); //use for converting user inputs. Sets commandInputsConvertError to true if error occurs. Engines check this. bool mothurConvert(string, double&); //use for converting user inputs. Sets commandInputsConvertError to true if error occurs. Engines check this. 
    //searches and checks
    bool checkReleaseVersion(ifstream&, string);
    bool anyLabelsToProcess(string, set&, string);
    bool inUsersGroups(vector, vector); //returns true if any of the strings in first vector are in second vector
    bool inUsersGroups(vector, vector< vector >);
    bool inUsersGroups(string, vector);
    bool inUsersGroups(int, vector);
    void getNumSeqs(ifstream&, int&);
    int getNumSeqs(ifstream&);
    int getNumNames(string);
    int getNumChar(string, char);
    bool isTrue(string);
    bool isContainingOnlyDigits(string);
    bool containsAlphas(string);
    bool isNumeric1(string);
    bool isInteger(string);
    bool isLabelEquivalent(string, string);
    string getSimpleLabel(string);
    string findEdianness();
    string mothurGetpid(int);

    //string manipulation
    void splitAtEquals(string&, string&);
    void splitAtComma(string&, string&);
    void splitAtComma(string&, vector&);
    void splitAtDash(string&, set&);
    void splitAtDash(string&, set&);
    void splitAtDash(string&, vector&);
    void splitAtChar(string&, vector&, char);
    void splitAtChar(string&, string&, char);
    int removeBlanks(vector&);
    vector splitWhiteSpaceWithQuotes(string);
    int splitWhiteSpaceWithQuotes(string, vector&);
    int removeConfidences(string&);
    string removeQuotes(string);
    string makeList(vector&);
    bool isSubset(vector, vector); //bigSet, subset
    int checkName(string&);
    map > parseClasses(string);

    //math operations
    int max(int, int);
    int min(int, int);
    double max(vector&); //returns largest value in vector
    double min(vector&); //returns smallest value in vector
    int factorial(int num);
    vector > binomial(int);
    float ceilDist(float, int);
    float roundDist(float, int);
    unsigned int fromBase36(string);
    double median(vector);
    int getRandomIndex(int); //highest
    double getStandardDeviation(vector&);
    vector getStandardDeviation(vector< vector >&);
    vector getStandardDeviation(vector< vector >&, vector&);
    vector getAverages(vector< vector >&);
    double getAverage(vector);
    vector< vector > getStandardDeviation(vector< vector< vector > >&);
    vector< vector >
getStandardDeviation(vector< vector< vector > >&, vector< vector >&); vector< vector > getAverages(vector< vector< vector > >&, string); vector< vector > getAverages(vector< vector< vector > >&); int control_pressed; bool executing, runParse, jumble, gui, mothurCalling, debug, quietMode; //current files - if you add a new type you must edit optionParser->getParameters, get.current and set.current commands and mothurOut->printCurrentFiles/clearCurrentFiles/getCurrentTypes. add a get and set function. string getPhylipFile() { return phylipfile; } string getColumnFile() { return columnfile; } string getListFile() { return listfile; } string getRabundFile() { return rabundfile; } string getSabundFile() { return sabundfile; } string getNameFile() { return namefile; } string getGroupFile() { return groupfile; } string getOrderFile() { return orderfile; } string getOrderGroupFile() { return ordergroupfile; } string getTreeFile() { return treefile; } string getSharedFile() { return sharedfile; } string getRelAbundFile() { return relabundfile; } string getDesignFile() { return designfile; } string getFastaFile() { return fastafile; } string getSFFFile() { return sfffile; } string getQualFile() { return qualfile; } string getOligosFile() { return oligosfile; } string getAccnosFile() { return accnosfile; } string getTaxonomyFile() { return taxonomyfile; } string getFlowFile() { return flowfile; } string getBiomFile() { return biomfile; } string getCountTableFile() { return counttablefile; } string getSummaryFile() { return summaryfile; } string getFileFile() { return filefile; } string getProcessors() { return processors; } void setListFile(string f) { listfile = getFullPathName(f); } void setTreeFile(string f) { treefile = getFullPathName(f); } void setGroupFile(string f) { groupfile = getFullPathName(f); groupMode = "group"; } void setPhylipFile(string f) { phylipfile = getFullPathName(f); } void setColumnFile(string f) { columnfile = getFullPathName(f); } void 
setNameFile(string f) { namefile = getFullPathName(f); } void setRabundFile(string f) { rabundfile = getFullPathName(f); } void setSabundFile(string f) { sabundfile = getFullPathName(f); } void setSharedFile(string f) { sharedfile = getFullPathName(f); } void setRelAbundFile(string f) { relabundfile = getFullPathName(f); } void setOrderFile(string f) { orderfile = getFullPathName(f); } void setOrderGroupFile(string f) { ordergroupfile = getFullPathName(f); } void setDesignFile(string f) { designfile = getFullPathName(f); } void setFastaFile(string f) { fastafile = getFullPathName(f); } void setSFFFile(string f) { sfffile = getFullPathName(f); } void setQualFile(string f) { qualfile = getFullPathName(f); } void setOligosFile(string f) { oligosfile = getFullPathName(f); } void setAccnosFile(string f) { accnosfile = getFullPathName(f); } void setTaxonomyFile(string f) { taxonomyfile = getFullPathName(f); } void setFlowFile(string f) { flowfile = getFullPathName(f); } void setBiomFile(string f) { biomfile = getFullPathName(f); } void setSummaryFile(string f) { summaryfile = getFullPathName(f); } void setFileFile(string f) { filefile = getFullPathName(f); } void setCountTableFile(string f) { counttablefile = getFullPathName(f); groupMode = "count"; } void setProcessors(string p) { processors = p; mothurOut("\nUsing " + toString(p) + " processors.\n"); } void printCurrentFiles(); bool hasCurrentFiles(); void clearCurrentFiles(); set getCurrentTypes(); private: static MothurOut* _uniqueInstance; MothurOut( const MothurOut& ); // Disable copy constructor void operator=( const MothurOut& ); // Disable assignment operator MothurOut() { control_pressed = false; defaultPath=""; filefile = ""; phylipfile = ""; columnfile = ""; listfile = ""; rabundfile = ""; sabundfile = ""; namefile = ""; groupfile = ""; designfile = ""; orderfile = ""; treefile = ""; sharedfile = ""; ordergroupfile = ""; relabundfile = ""; fastafile = ""; qualfile = ""; sfffile = ""; oligosfile = ""; 
accnosfile = ""; taxonomyfile = ""; processors = "1"; flowfile = ""; biomfile = ""; counttablefile = ""; summaryfile = ""; gui = false; printedSharedHeaders = false; printedListHeaders = false; commandInputsConvertError = false; mothurCalling = false; debug = false; quietMode = false; sharedHeaderMode = ""; groupMode = "group"; changedSeqNames = false; modifyNames = true; } ~MothurOut(); string logFileName; string defaultPath, outputDir; string releaseDate, version; string accnosfile, phylipfile, columnfile, listfile, rabundfile, sabundfile, namefile, groupfile, designfile, taxonomyfile, biomfile, filefile; string orderfile, treefile, sharedfile, ordergroupfile, relabundfile, fastafile, qualfile, sfffile, oligosfile, processors, flowfile, counttablefile, summaryfile; vector Groups; vector namesOfGroups; ofstream out; int mem_usage(double&, double&); }; /***********************************************/ #endif mothur-1.36.1/source/myseqdist.cpp000066400000000000000000000263011255543666200171660ustar00rootroot00000000000000/* * pds.seqdist.cpp * * * Created by Pat Schloss on 8/12/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. 
* */ #include "myseqdist.h" #include "sequence.hpp" /**************************************************************************************************/ correctDist::correctDist(int p) : processors(p) { try { m = MothurOut::getInstance(); } catch(exception& e) { m->errorOut(e, "correctDist", "correctDist"); exit(1); } } /**************************************************************************************************/ correctDist::correctDist(string sequenceFileName, int p) : processors(p) { try { m = MothurOut::getInstance(); getSequences(sequenceFileName); } catch(exception& e) { m->errorOut(e, "correctDist", "correctDist"); exit(1); } } /**************************************************************************************************/ int correctDist::addSeq(string seqName, string seqSeq){ try { names.push_back(seqName); sequences.push_back(fixSequence(seqSeq)); return 0; } catch(exception& e) { m->errorOut(e, "correctDist", "addSeq"); exit(1); } } /**************************************************************************************************/ int correctDist::execute(string distanceFileName){ try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) #else processors = 1; #endif correctMatrix.resize(4); for(int i=0;i<4;i++){ correctMatrix[i].resize(4); } correctMatrix[0][0] = 0.000000; //AA correctMatrix[1][0] = 11.619259; //CA correctMatrix[2][0] = 11.694004; //TA correctMatrix[3][0] = 7.748623; //GA correctMatrix[1][1] = 0.000000; //CC correctMatrix[2][1] = 7.619657; //TC correctMatrix[3][1] = 12.852562; //GC correctMatrix[2][2] = 0.000000; //TT correctMatrix[3][2] = 10.964048; //TG correctMatrix[3][3] = 0.000000; //GG for(int i=0;i<4;i++){ for(int j=0;jerrorOut(e, "correctDist", "execute"); exit(1); } } /**************************************************************************************************/ int correctDist::getSequences(string sequenceFileName){ try { ifstream sequenceFile; 
m->openInputFile(sequenceFileName, sequenceFile); string seqName, seqSeq; while(!sequenceFile.eof()){ if (m->control_pressed) { break; } Sequence temp(sequenceFile); m->gobble(sequenceFile); if (temp.getName() != "") { names.push_back(temp.getName()); sequences.push_back(fixSequence(temp.getAligned())); } } sequenceFile.close(); return 0; } catch(exception& e) { m->errorOut(e, "correctDist", "getSequences"); exit(1); } } /**************************************************************************************************/ vector correctDist::fixSequence(string sequence){ try { int alignLength = sequence.length(); vector seqVector; for(int i=0;ierrorOut(e, "correctDist", "fixSequence"); exit(1); } } /**************************************************************************************************/ int correctDist::createProcess(string distanceFileName){ try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; vector processIDs; bool recalc = false; while(process != processors){ pid_t pid = fork(); if(pid > 0){ processIDs.push_back(pid); process++; } else if(pid == 0){ driver(start[process], end[process], distanceFileName + m->mothurGetpid(process) + ".temp"); exit(0); } else{ m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for (int i = 0; i < processIDs.size(); i++) { kill (processIDs[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; recalc = true; break; } } if (recalc) { start.clear(); end.clear(); for(int i=0;i 0){ processIDs.push_back(pid); process++; } else if(pid == 0){ driver(start[process], end[process], distanceFileName + m->mothurGetpid(process) + ".temp"); exit(0); } else{ m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i=0;iappendFiles((distanceFileName + toString(processIDs[i]) + ".temp"), distanceFileName); 
remove((distanceFileName + toString(processIDs[i]) + ".temp").c_str()); } #endif return 0; } catch(exception& e) { m->errorOut(e, "correctDist", "createProcess"); exit(1); } } /**************************************************************************************************/ int correctDist::driver(int start, int end, string distFileName){ try { ofstream distFile; m->openOutputFile(distFileName, distFile); distFile << setprecision(9); if(start == 0){ distFile << numSeqs << endl; } int startTime = time(NULL); m->mothurOut("\nCalculating distances...\n"); for(int i=start;icontrol_pressed) { distFile.close(); return 0; } double dist = getDist(sequences[i], sequences[j]); distFile << ' ' << dist; } distFile << endl; if(i % 100 == 0){ m->mothurOutJustToScreen(toString(i) + "\t" + toString(time(NULL) - startTime)+"\n"); } } distFile.close(); if((end-1) % 100 != 0){ m->mothurOutJustToScreen(toString(end-1) + "\t" + toString(time(NULL) - startTime)+"\n"); } m->mothurOut("Done.\n"); return 0; } catch(exception& e) { m->errorOut(e, "correctDist", "driver"); exit(1); } } /**************************************************************************************************/ double correctDist::getDist(vector& seqA, vector& seqB){ try { int lengthA = seqA.size(); int lengthB = seqB.size(); vector > alignMatrix(lengthA+1); vector > alignMoves(lengthA+1); for(int i=0;i<=lengthA;i++){ alignMatrix[i].resize(lengthB+1, 0); alignMoves[i].resize(lengthB+1, 'x'); } for(int i=0;i<=lengthA;i++){ alignMatrix[i][0] = 15.0 * i; alignMoves[i][0] = 'u'; } for(int i=0;i<=lengthB;i++){ alignMatrix[0][i] = 15.0 * i; alignMoves[0][i] = 'l'; } for(int i=1;i<=lengthA;i++){ for(int j=1;j<=lengthB;j++){ if (m->control_pressed) { return 0; } double nogap; nogap = alignMatrix[i-1][j-1] + correctMatrix[seqA[i-1]][seqB[j-1]]; double gap; double left; if(i == lengthA){ //terminal gap left = alignMatrix[i][j-1]; } else{ if(seqB[j-1] == getLastMatch('l', alignMoves, i, j, seqA, seqB)){ gap = 4.0; } else{ gap 
= 15.0; } left = alignMatrix[i][j-1] + gap; } double up; if(j == lengthB){ //terminal gap up = alignMatrix[i-1][j]; } else{ if(seqA[i-1] == getLastMatch('u', alignMoves, i, j, seqA, seqB)){ gap = 4.0; } else{ gap = 15.0; } up = alignMatrix[i-1][j] + gap; } if(nogap < left){ if(nogap < up){ alignMoves[i][j] = 'd'; alignMatrix[i][j] = nogap; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = up; } } else{ if(left < up){ alignMoves[i][j] = 'l'; alignMatrix[i][j] = left; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = up; } } } } int i = lengthA; int j = lengthB; int count = 0; // string alignA = ""; // string alignB = ""; // string bases = "ACTG"; // // for(int i=0;i 0 && j > 0){ if (m->control_pressed) { return 0; } if(alignMoves[i][j] == 'd'){ // alignA = bases[seqA[i-1]] + alignA; // alignB = bases[seqB[j-1]] + alignB; count++; i--; j--; } else if(alignMoves[i][j] == 'u'){ if(j != lengthB){ // alignA = bases[seqA[i-1]] + alignA; // alignB = '-' + alignB; count++; } i--; } else if(alignMoves[i][j] == 'l'){ if(i != lengthA){ // alignA = '-' + alignA; // alignB = bases[seqB[j-1]] + alignB; count++; } j--; } } // cout << alignA << endl << alignB << endl; return alignMatrix[lengthA][lengthB] / (double)count; } catch(exception& e) { m->errorOut(e, "correctDist", "getDist"); exit(1); } } /**************************************************************************************************/ int correctDist::getLastMatch(char direction, vector >& alignMoves, int i, int j, vector& seqA, vector& seqB){ try { char nullReturn = -1; while(i>=1 && j>=1){ if (m->control_pressed) { return nullReturn; } if(direction == 'd'){ if(seqA[i-1] == seqB[j-1]) { return seqA[i-1]; } else { return nullReturn; } } else if(direction == 'l') { j--; } else { i--; } direction = alignMoves[i][j]; } return nullReturn; } catch(exception& e) { m->errorOut(e, "correctDist", "getLastMatch"); exit(1); } } 
/**************************************************************************************************/

mothur-1.36.1/source/myseqdist.h

#ifndef CORRECTDIST_H
#define CORRECTDIST_H

/*
 *  pds.seqdist.h
 *
 *
 *  Created by Pat Schloss on 8/12/11.
 *  Copyright 2011 Patrick D. Schloss. All rights reserved.
 *
 */

#include "mothurout.h"

/**************************************************************************************************/

class correctDist {
public:
    correctDist(string, int);
    correctDist(int);
    ~correctDist(){}

    int addSeq(string, string);
    int execute(string);

private:
    MothurOut* m;
    int getSequences(string);
    vector<int> fixSequence(string);

    int driver(int, int, string);
    int createProcess(string);

    double getDist(vector<int>&, vector<int>&);
    int getLastMatch(char, vector<vector<char> >&, int, int, vector<int>&, vector<int>&);

    vector<vector<double> > correctMatrix;

    vector<vector<int> > sequences;

    vector<string> names;
    int numSeqs;
    int processors;
    vector<int> start;
    vector<int> end;
};

/**************************************************************************************************/

#endif

mothur-1.36.1/source/nast.cpp

/*
 *  nast.cpp
 *
 *
 *  Created by Pat Schloss on 12/17/08.
 *  Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 *  This is my implementation of the NAST (nearest alignment space termination) algorithm as described in:
 *
 *  DeSantis TZ, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, Phan R, & Anderson GL. 2006. NAST: a multiple
 *  sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Research. 34:W394-9.
 *
 *  To construct an object one needs to provide a method of getting a pairwise alignment (alignment) and the template
 *  and candidate sequence that are to be aligned to each other.
* */ #include "sequence.hpp" #include "alignment.hpp" #include "nast.hpp" /**************************************************************************************************/ Nast::Nast(Alignment* method, Sequence* cand, Sequence* temp) : alignment(method), candidateSeq(cand), templateSeq(temp) { try { m = MothurOut::getInstance(); maxInsertLength = 0; pairwiseAlignSeqs(); // This is part A in Fig. 2 of DeSantis et al. regapSequences(); // This is parts B-F in Fig. 2 of DeSantis et al. } catch(exception& e) { m->errorOut(e, "Nast", "Nast"); exit(1); } } /**************************************************************************************************/ void Nast::pairwiseAlignSeqs(){ // Here we call one of the pairwise alignment methods to align our unaligned candidate // and template sequences try { alignment->align(candidateSeq->getUnaligned(), templateSeq->getUnaligned()); string candAln = alignment->getSeqAAln(); string tempAln = alignment->getSeqBAln(); if(candAln == ""){ candidateSeq->setPairwise(""); templateSeq->setPairwise(templateSeq->getUnaligned()); } else{ if(tempAln[0] == '-'){ int pairwiseAlignmentLength = tempAln.length(); // we need to make sure that the candidate sequence alignment for(int i=0;i=0; i--){// ends where the template sequence alignment ends, if it runs if(isalpha(tempAln[i])){ // long, we nuke the end of the candidate sequence candAln = candAln.substr(0,i+1); tempAln = tempAln.substr(0,i+1); break; } } } } candidateSeq->setPairwise(candAln); // set the pairwise sequences in the Sequence objects for templateSeq->setPairwise(tempAln); // the candidate and template sequences } catch(exception& e) { m->errorOut(e, "Nast", "pairwiseAlignSeqs"); exit(1); } } /**************************************************************************************************/ void Nast::removeExtraGaps(string& candAln, string tempAln, string newTemplateAlign){ // here we do steps C-F of Fig. 2 from DeSantis et al. 
try { int longAlignmentLength = newTemplateAlign.length(); for(int i=0; i0;leftIndex--){ // then we've got problems... if(!isalpha(candAln[leftIndex])){ leftRoom = 1; //count how far it is to the nearest gap on the LEFT side of the anomaly while(leftIndex-leftRoom>=0 && !isalpha(candAln[leftIndex-leftRoom])) { leftRoom++; } break; } } for(rightIndex=i+1;rightIndex maxInsertLength){ maxInsertLength = insertLength; } if((leftRoom + rightRoom) >= insertLength){ // Parts D & E from Fig. 2 of DeSantis et al. if((i-leftIndex) <= (rightIndex-i)){ // the left gap is closer - > move stuff left there's if(leftRoom >= insertLength){ // enough room to the left to move //cout << "lr newTemplateAlign = " << newTemplateAlign.length() << '\t' << i << '\t' << insertLength << endl; string leftTemplateString = newTemplateAlign.substr(0,i); string rightTemplateString = newTemplateAlign.substr((i+insertLength)); newTemplateAlign = leftTemplateString + rightTemplateString; longAlignmentLength = newTemplateAlign.length(); //cout << "lr candAln = " << candAln.length() << '\t' << leftIndex << '\t' << endl; string leftCandidateString = candAln.substr(0,(leftIndex-insertLength+1)); string rightCandidateString = candAln.substr((leftIndex+1)); candAln = leftCandidateString + rightCandidateString; }else{ // not enough room to the left, have to steal some space to the right //cout << "in else lr newTemplateAlign = " << newTemplateAlign.length() << '\t' << i << '\t' << insertLength << endl; string leftTemplateString = newTemplateAlign.substr(0,i); string rightTemplateString = newTemplateAlign.substr((i+insertLength)); newTemplateAlign = leftTemplateString + rightTemplateString; longAlignmentLength = newTemplateAlign.length(); //cout << " in else lr candAln = " << candAln.length() << '\t' << " leftIndex = " << leftIndex << " leftroom = " << leftRoom << " rightIndex = " << rightIndex << '\t' << " rightroom = " << rightRoom << '\t' << endl; string leftCandidateString = 
candAln.substr(0,(leftIndex-leftRoom+1)); string insertString = candAln.substr((leftIndex+1),(rightIndex-leftIndex-1)); string rightCandidateString = candAln.substr((rightIndex+(insertLength-leftRoom))); candAln = leftCandidateString + insertString + rightCandidateString; } }else{ // the right gap is closer - > move stuff right there's if(rightRoom >= insertLength){ // enough room to the right to move //cout << "rr newTemplateAlign = " << newTemplateAlign.length() << '\t' << i << '\t' << i+insertLength << endl; string leftTemplateString = newTemplateAlign.substr(0,i); string rightTemplateString = newTemplateAlign.substr((i+insertLength)); newTemplateAlign = leftTemplateString + rightTemplateString; longAlignmentLength = newTemplateAlign.length(); //cout << "rr candAln = " << candAln.length() << '\t' << i << '\t' << rightIndex << '\t' << rightIndex+insertLength << endl; string leftCandidateString = candAln.substr(0,rightIndex); string rightCandidateString = candAln.substr((rightIndex+insertLength)); candAln = leftCandidateString + rightCandidateString; } else{ // not enough room to the right, have to steal some // space to the left lets move left and then right... 
//cout << "in else rr newTemplateAlign = " << newTemplateAlign.length() << '\t' << i << '\t' << i+insertLength << endl; string leftTemplateString = newTemplateAlign.substr(0,i); string rightTemplateString = newTemplateAlign.substr((i+insertLength)); newTemplateAlign = leftTemplateString + rightTemplateString; longAlignmentLength = newTemplateAlign.length(); //cout << "in else rr candAln = " << candAln.length() << '\t' << '\t' << (leftIndex-(insertLength-rightRoom)+1) << '\t' << (leftIndex+1,rightIndex-leftIndex-1) << '\t' << (rightIndex+rightRoom) << endl; string leftCandidateString = candAln.substr(0,(leftIndex-(insertLength-rightRoom)+1)); string insertString = candAln.substr((leftIndex+1),(rightIndex-leftIndex-1)); string rightCandidateString = candAln.substr((rightIndex+rightRoom)); candAln = leftCandidateString + insertString + rightCandidateString; } } if ((i - insertLength) < 0) { i = 0; } else { i -= insertLength; } } else{ // there could be a case where there isn't enough room in either direction to move stuff //cout << "in else else newTemplateAlign = " << newTemplateAlign.length() << '\t' << i << '\t' << (i+leftRoom+rightRoom) << endl; string leftTemplateString = newTemplateAlign.substr(0,i); string rightTemplateString = newTemplateAlign.substr((i+leftRoom+rightRoom)); newTemplateAlign = leftTemplateString + rightTemplateString; longAlignmentLength = newTemplateAlign.length(); //cout << "in else else newTemplateAlign = " << candAln.length() << '\t' << (leftIndex-leftRoom+1) << '\t' << (leftIndex+1) << '\t' << (rightIndex-leftIndex-1) << '\t' << (rightIndex+rightRoom) << endl; string leftCandidateString = candAln.substr(0,(leftIndex-leftRoom+1)); string insertString = candAln.substr((leftIndex+1),(rightIndex-leftIndex-1)); string rightCandidateString = candAln.substr((rightIndex+rightRoom)); candAln = leftCandidateString + insertString + rightCandidateString; i -= (leftRoom + rightRoom); } // i -= insertLength; //if i is negative, we want to remove the 
extra gaps to the right if (i < 0) { m->mothurOut("i is negative"); m->mothurOutEndLine(); } } } } catch(exception& e) { m->errorOut(e, "Nast", "removeExtraGaps"); exit(1); } } /**************************************************************************************************/ void Nast::regapSequences(){ //This is essentially part B in Fig 2. of DeSantis et al. try { //cout << candidateSeq->getName() << endl; string candPair = candidateSeq->getPairwise(); string candAln = ""; string tempPair = templateSeq->getPairwise(); string tempAln = templateSeq->getAligned(); // we use the template aligned sequence as our guide int pairwiseLength = candPair.length(); int fullAlignLength = tempAln.length(); if(candPair == ""){ for(int i=0;isetAligned(candAln); return; } int fullAlignIndex = 0; int pairwiseAlignIndex = 0; string newTemplateAlign = ""; // this is going to be messy so we want a temporary template // alignment string while(tempAln[fullAlignIndex] == '.' || tempAln[fullAlignIndex] == '-'){ candAln += '.'; // add the initial '-' and '.' to the candidate and template newTemplateAlign += tempAln[fullAlignIndex];// pairwise sequences fullAlignIndex++; } string lastLoop = ""; while(pairwiseAlignIndexseems to be the opposite of the alpha scenario candAln += candPair[pairwiseAlignIndex]; newTemplateAlign += tempAln[fullAlignIndex];// pairwiseAlignIndex++; fullAlignIndex++; } else if(isalpha(tempPair[pairwiseAlignIndex]) && !isalpha(tempAln[fullAlignIndex]) && !isalpha(candPair[pairwiseAlignIndex])){ // template pairwise has a character, but its full aligned sequence and candidate sequence have gaps // this would happen like we need to add a gap. 
basically the opposite of the alpha situation newTemplateAlign += tempAln[fullAlignIndex];// candAln += "-"; fullAlignIndex++; } else if(!isalpha(tempPair[pairwiseAlignIndex]) && isalpha(tempAln[fullAlignIndex]) && !isalpha(candPair[pairwiseAlignIndex])){ // template and candidate pairwise are gaps and the template aligned is not a gap this should not be possible // would skip the gaps and not progress through full alignment sequence // not tested yet m->mothurOut("We're into D " + toString(fullAlignIndex) + " " + toString(pairwiseAlignIndex)); m->mothurOutEndLine(); pairwiseAlignIndex++; } else{ // everything has a gap - not possible // not tested yet m->mothurOut("We're into F " + toString(fullAlignIndex) + " " + toString(pairwiseAlignIndex)); m->mothurOutEndLine(); pairwiseAlignIndex++; fullAlignIndex++; } } for(int i=fullAlignIndex;i=0;i--){ // ditto. if(candAln[i] == 'Z' || !isalnum(candAln[i])) { candAln[i] = '.'; } else{ end = i; break; } } for(int i=start;i<=end;i++){ // go through the candidate alignment sequence and make sure that candAln[i] = toupper(candAln[i]); // everything is upper case } if(candAln.length() != tempAln.length()){ // if the regapped candidate sequence is longer than the official removeExtraGaps(candAln, tempAln, newTemplateAlign);// template alignment then we need to do steps C-F in Fig. } // 2 of Desantis et al. 
candidateSeq->setAligned(candAln); //cout << "here" << endl; } catch(exception& e) { m->errorOut(e, "Nast", "regapSequences"); exit(1); } } /**************************************************************************************************/ float Nast::getSimilarityScore(){ try { string cand = candidateSeq->getAligned(); string temp = templateSeq->getAligned(); int alignmentLength = temp.length(); int mismatch = 0; int denominator = 0; for(int i=0;ierrorOut(e, "Nast", "getSimilarityScore"); exit(1); } } /**************************************************************************************************/ int Nast::getMaxInsertLength(){ return maxInsertLength; } /**************************************************************************************************/ mothur-1.36.1/source/nast.hpp000066400000000000000000000024761255543666200161250ustar00rootroot00000000000000#ifndef NAST_HPP #define NAST_HPP /* * nast.hpp * * * Created by Pat Schloss on 12/17/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * * This is my implementation of the NAST (nearest alignment space termination) algorithm as described in: * * DeSantis TZ, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, Phan R, & Anderson GL. 2006. NAST: a multiple * sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Research. 34:W394-9. * * To construct an object one needs to provide a method of getting a pairwise alignment (alignment) and the template * and candidate sequence that are to be aligned to each other. 
* */ #include "mothur.h" #include "mothurout.h" class Alignment; class Sequence; /**************************************************************************************************/ class Nast { public: Nast(Alignment*, Sequence*, Sequence*); ~Nast(){}; float getSimilarityScore(); int getMaxInsertLength(); private: void pairwiseAlignSeqs(); void regapSequences(); void removeExtraGaps(string&, string, string); Alignment* alignment; Sequence* candidateSeq; Sequence* templateSeq; int maxInsertLength; MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/nastreport.cpp000066400000000000000000000152471255543666200173540ustar00rootroot00000000000000/* * nastreport.cpp * * * Created by Pat Schloss on 12/19/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "sequence.hpp" #include "nast.hpp" #include "alignment.hpp" #include "nastreport.hpp" /******************************************************************************************************************/ NastReport::NastReport() { try { m = MothurOut::getInstance(); output = ""; } catch(exception& e) { m->errorOut(e, "NastReport", "NastReport"); exit(1); } } /******************************************************************************************************************/ string NastReport::getHeaders() { try { output = ""; output += "QueryName\tQueryLength\tTemplateName\tTemplateLength\t"; output += "SearchMethod\tSearchScore\t"; output += "AlignmentMethod\tQueryStart\tQueryEnd\tTemplateStart\tTemplateEnd\t"; output += "PairwiseAlignmentLength\tGapsInQuery\tGapsInTemplate\t"; output += "LongestInsert\t"; output += "SimBtwnQuery&Template\n"; return output; } catch(exception& e) { m->errorOut(e, "NastReport", "getHeaders"); exit(1); } } /******************************************************************************************************************/ NastReport::NastReport(string candidateReportFName) { 
try { m = MothurOut::getInstance(); m->openOutputFile(candidateReportFName, candidateReportFile); candidateReportFile << "QueryName\tQueryLength\tTemplateName\tTemplateLength\t"; candidateReportFile << "SearchMethod\tSearchScore\t"; candidateReportFile << "AlignmentMethod\tQueryStart\tQueryEnd\tTemplateStart\tTemplateEnd\t"; candidateReportFile << "PairwiseAlignmentLength\tGapsInQuery\tGapsInTemplate\t"; candidateReportFile << "LongestInsert\t"; candidateReportFile << "SimBtwnQuery&Template" << endl; } catch(exception& e) { m->errorOut(e, "NastReport", "NastReport"); exit(1); } } /******************************************************************************************************************/ NastReport::~NastReport() { try { candidateReportFile.close(); } catch(exception& e) { m->errorOut(e, "NastReport", "~NastReport"); exit(1); } } /******************************************************************************************************************/ void NastReport::print(){ try { candidateReportFile << queryName << '\t' << queryLength << '\t' << templateName << '\t' << templateLength << '\t'; candidateReportFile << searchMethod << '\t' << setprecision(2) << fixed << searchScore << '\t'; candidateReportFile << alignmentMethod << '\t' << candidateStartPosition << "\t" << candidateEndPosition << '\t'; candidateReportFile << templateStartPosition << "\t" << templateEndPosition << '\t'; candidateReportFile << pairwiseAlignmentLength << '\t' << totalGapsInQuery << '\t' << totalGapsInTemplate << '\t'; candidateReportFile << longestInsert << '\t'; candidateReportFile << setprecision(2) << similarityToTemplate; candidateReportFile << endl; candidateReportFile.flush(); } catch(exception& e) { m->errorOut(e, "NastReport", "print"); exit(1); } } /******************************************************************************************************************/ string NastReport::getReport(){ try { output = ""; output += queryName + '\t' + toString(queryLength) + '\t' + 
				templateName + '\t' + toString(templateLength) + '\t';

		string temp = toString(searchScore);
		int pos = temp.find_last_of('.');  //find decimal point if there is one

		//if there is a decimal
		if (pos != -1) { temp = temp.substr(0, pos+3); } //set precision to 2 places
		else{ temp += ".00"; }

		output += searchMethod + '\t' + temp + '\t';
		output += alignmentMethod + '\t' + toString(candidateStartPosition) + "\t" + toString(candidateEndPosition) + '\t';
		output += toString(templateStartPosition) + "\t" + toString(templateEndPosition) + '\t';
		output += toString(pairwiseAlignmentLength) + '\t' + toString(totalGapsInQuery) + '\t' + toString(totalGapsInTemplate) + '\t';
		output += toString(longestInsert) + '\t';

		temp = toString(similarityToTemplate);
		pos = temp.find_last_of('.');  //find decimal point if there is one

		//if there is a decimal
		if (pos != -1) { temp = temp.substr(0, pos+3); } //set precision to 2 places
		else{ temp += ".00"; }

		output += temp + '\n';

		return output;
	}
	catch(exception& e) {
		m->errorOut(e, "NastReport", "getReport");
		exit(1);
	}
}

/******************************************************************************************************************/

void NastReport::setCandidate(Sequence* candSeq){
	try {
		queryName = candSeq->getName();
		queryLength = candSeq->getNumBases();
	}
	catch(exception& e) {
		m->errorOut(e, "NastReport", "setCandidate");
		exit(1);
	}
}

/******************************************************************************************************************/

void NastReport::setTemplate(Sequence* tempSeq){
	try {
		templateName = tempSeq->getName();
		templateLength = tempSeq->getNumBases();
	}
	catch(exception& e) {
		m->errorOut(e, "NastReport", "setTemplate");
		exit(1);
	}
}

/******************************************************************************************************************/

void NastReport::setSearchParameters(string method, float score){
	try {
		searchMethod = method;
		searchScore = score;
	}
	catch(exception& e) {
		m->errorOut(e, "NastReport",
"setSearchParameters"); exit(1); } } /******************************************************************************************************************/ void NastReport::setAlignmentParameters(string method, Alignment* align){ try { alignmentMethod = method; candidateStartPosition = align->getCandidateStartPos(); candidateEndPosition = align->getCandidateEndPos(); templateStartPosition = align->getTemplateStartPos(); templateEndPosition = align->getTemplateEndPos(); pairwiseAlignmentLength = align->getPairwiseLength(); totalGapsInQuery = pairwiseAlignmentLength - (candidateEndPosition - candidateStartPosition + 1); totalGapsInTemplate = pairwiseAlignmentLength - (templateEndPosition - templateStartPosition + 1); } catch(exception& e) { m->errorOut(e, "NastReport", "setAlignmentParameters"); exit(1); } } /******************************************************************************************************************/ void NastReport::setNastParameters(Nast nast){ try { longestInsert = nast.getMaxInsertLength(); similarityToTemplate = nast.getSimilarityScore(); } catch(exception& e) { m->errorOut(e, "NastReport", "setNastParameters"); exit(1); } } /******************************************************************************************************************/ mothur-1.36.1/source/nastreport.hpp000066400000000000000000000023041255543666200173470ustar00rootroot00000000000000#ifndef NASTREPORT_HPP #define NASTREPORT_HPP /* * nastreport.hpp * * * Created by Pat Schloss on 12/19/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
 *
 */

#include "mothur.h"

/******************************************************************************************************************/

class NastReport {

public:
	NastReport(string);
	NastReport();
	~NastReport();
	void setCandidate(Sequence*);
	void setTemplate(Sequence*);
	void setSearchParameters(string, float);
	void setAlignmentParameters(string, Alignment*);
	void setNastParameters(Nast);
	void print();
	string getReport();
	string getHeaders();

private:
	string queryName;
	string output;
	int queryLength;
	string templateName;
	int templateLength;
	string searchMethod;
	float searchScore;
	string alignmentMethod;
	int candidateStartPosition, candidateEndPosition;
	int templateStartPosition, templateEndPosition;
	int pairwiseAlignmentLength;
	int longestInsert;
	int totalGapsInQuery, totalGapsInTemplate;
	float similarityToTemplate;
	ofstream candidateReportFile;
	MothurOut* m;
};

/******************************************************************************************************************/

#endif
mothur-1.36.1/source/needlemanoverlap.cpp000066400000000000000000000177721255543666200205010ustar00rootroot00000000000000
/*
 * needleman.cpp
 *
 *
 * Created by Pat Schloss on 12/15/08.
 * Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 * This class is an Alignment child class that implements the Needleman-Wunsch pairwise alignment algorithm as
 * described in:
 *
 * Needleman SB & Wunsch CD. 1970. A general method applicable to the search for similarities in the amino acid
 * sequence of two proteins. J Mol Biol. 48:443-53.
 * Korf I, Yandell M, & Bedell J. 2003. BLAST. O'Reilly & Associates. Sebastopol, CA.
 *
 * This method is simple as it assesses a consistent penalty for each gap position.
* Because this method typically has problems at the ends when two sequences do not full overlap, we employ a separate * method to fix the ends (see Overlap class documentation) * */ #include "alignmentcell.hpp" #include "alignment.hpp" #include "overlap.hpp" #include "needlemanoverlap.hpp" /**************************************************************************************************/ NeedlemanOverlap::NeedlemanOverlap(float gO, float f, float mm, int r) :// note that we don't have a gap extend gap(gO), match(f), mismatch(mm), Alignment(r) { // the gap openning penalty is assessed for try { // every gapped position for(int i=1;ierrorOut(e, "NeedlemanOverlap", "NeedlemanOverlap"); exit(1); } } /**************************************************************************************************/ NeedlemanOverlap::~NeedlemanOverlap(){ /* do nothing */ } /**************************************************************************************************/ void NeedlemanOverlap::align(string A, string B){ try { seqA = ' ' + A; lA = seqA.length(); // algorithm requires a dummy space at the beginning of each string seqB = ' ' + B; lB = seqB.length(); // algorithm requires a dummy space at the beginning of each string if (lA > nRows) { m->mothurOut("One of your candidate sequences is longer than you longest template sequence. Your longest template sequence is " + toString(nRows) + ". 
Your candidate is " + toString(lA) + "."); m->mothurOutEndLine(); } for(int i=1;i= up){ if(diagonal >= left){ alignment[i][j].cValue = diagonal; alignment[i][j].prevCell = 'd'; } else{ alignment[i][j].cValue = left; alignment[i][j].prevCell = 'l'; } } else{ if(up >= left){ alignment[i][j].cValue = up; alignment[i][j].prevCell = 'u'; } else{ alignment[i][j].cValue = left; alignment[i][j].prevCell = 'l'; } } } } Overlap over; over.setOverlap(alignment, lA, lB, 0); // Fix gaps at the beginning and end of the sequences traceBack(); // Traceback the alignment to populate seqAaln and seqBaln } catch(exception& e) { m->errorOut(e, "NeedlemanOverlap", "align"); exit(1); } } /**************************************************************************************************/ void NeedlemanOverlap::alignPrimer(string A, string B){ try { seqA = ' ' + A; lA = seqA.length(); // algorithm requires a dummy space at the beginning of each string seqB = ' ' + B; lB = seqB.length(); // algorithm requires a dummy space at the beginning of each string if (lA > nRows) { m->mothurOut("One of your candidate sequences is longer than you longest template sequence. Your longest template sequence is " + toString(nRows) + ". 
Your candidate is " + toString(lA) + "."); m->mothurOutEndLine(); } for(int i=1;i= up){ if(diagonal >= left){ alignment[i][j].cValue = diagonal; alignment[i][j].prevCell = 'd'; } else{ alignment[i][j].cValue = left; alignment[i][j].prevCell = 'l'; } } else{ if(up >= left){ alignment[i][j].cValue = up; alignment[i][j].prevCell = 'u'; } else{ alignment[i][j].cValue = left; alignment[i][j].prevCell = 'l'; } } } } Overlap over; over.setOverlap(alignment, lA, lB, 0); // Fix gaps at the beginning and end of the sequences traceBack(); // Traceback the alignment to populate seqAaln and seqBaln } catch(exception& e) { m->errorOut(e, "NeedlemanOverlap", "alignPrimer"); exit(1); } } //********************************************************************/ bool NeedlemanOverlap::isEquivalent(char oligo, char seq){ try { bool same = true; oligo = toupper(oligo); seq = toupper(seq); if(oligo != seq){ if(oligo == 'A' && (seq != 'A' && seq != 'M' && seq != 'R' && seq != 'W' && seq != 'D' && seq != 'H' && seq != 'V')) { same = false; } else if(oligo == 'C' && (seq != 'C' && seq != 'Y' && seq != 'M' && seq != 'S' && seq != 'B' && seq != 'H' && seq != 'V')) { same = false; } else if(oligo == 'G' && (seq != 'G' && seq != 'R' && seq != 'K' && seq != 'S' && seq != 'B' && seq != 'D' && seq != 'V')) { same = false; } else if(oligo == 'T' && (seq != 'T' && seq != 'Y' && seq != 'K' && seq != 'W' && seq != 'B' && seq != 'D' && seq != 'H')) { same = false; } else if((oligo == '.' 
			|| oligo == '-')) { same = false; }
			else if((oligo == 'N' || oligo == 'I') && (seq == 'N')) { same = false; }
			else if(oligo == 'R' && (seq != 'A' && seq != 'G')) { same = false; }
			else if(oligo == 'Y' && (seq != 'C' && seq != 'T')) { same = false; }
			else if(oligo == 'M' && (seq != 'C' && seq != 'A')) { same = false; }
			else if(oligo == 'K' && (seq != 'T' && seq != 'G')) { same = false; }
			else if(oligo == 'W' && (seq != 'T' && seq != 'A')) { same = false; }
			else if(oligo == 'S' && (seq != 'C' && seq != 'G')) { same = false; }
			else if(oligo == 'B' && (seq != 'C' && seq != 'T' && seq != 'G')) { same = false; }
			else if(oligo == 'D' && (seq != 'A' && seq != 'T' && seq != 'G')) { same = false; }
			else if(oligo == 'H' && (seq != 'A' && seq != 'T' && seq != 'C')) { same = false; }
			else if(oligo == 'V' && (seq != 'A' && seq != 'C' && seq != 'G')) { same = false; }
		}

		return same;
	}
	catch(exception& e) {
		// report the class and method this handler actually belongs to
		m->errorOut(e, "NeedlemanOverlap", "isEquivalent");
		exit(1);
	}
}

/**************************************************************************************************/
mothur-1.36.1/source/needlemanoverlap.hpp000066400000000000000000000025411255543666200204720ustar00rootroot00000000000000
#ifndef NEEDLEMAN_H
#define NEEDLEMAN_H

/*
 * needleman.h
 *
 *
 * Created by Pat Schloss on 12/15/08.
 * Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 * This class is an Alignment child class that implements the Needleman-Wunsch pairwise alignment algorithm as
 * described in:
 *
 * Needleman SB & Wunsch CD. 1970. A general method applicable to the search for similarities in the amino acid
 * sequence of two proteins. J Mol Biol. 48:443-53.
 * Korf I, Yandell M, & Bedell J. 2003. BLAST. O'Reilly & Associates. Sebastopol, CA.
 *
 * This method is simple as it assesses a consistent penalty for each gap position.
Because this method typically has
 * problems at the ends when two sequences do not fully overlap, we employ a separate method to fix the ends (see
 * Overlap class documentation)
 *
 */

#include "mothur.h"
#include "alignment.hpp"

/**************************************************************************************************/

class NeedlemanOverlap : public Alignment {

public:
	NeedlemanOverlap(float, float, float, int);
	~NeedlemanOverlap();
	void align(string, string);
	void alignPrimer(string, string);

private:
	float gap;
	float match;
	float mismatch;
	bool isEquivalent(char, char);
};

/**************************************************************************************************/

#endif
mothur-1.36.1/source/noalign.cpp000066400000000000000000000013211255543666200165660ustar00rootroot00000000000000
/*
 * noalign.cpp
 *
 *
 * Created by Pat Schloss on 2/19/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "alignment.hpp"
#include "noalign.hpp"

/**************************************************************************************************/

NoAlign::NoAlign(){ /* do nothing */ }

/**************************************************************************************************/

NoAlign::~NoAlign(){ /* do nothing */ }

/**************************************************************************************************/

void NoAlign::align(string A, string B){ }

/**************************************************************************************************/
mothur-1.36.1/source/noalign.hpp000066400000000000000000000010211255543666200165700ustar00rootroot00000000000000
#ifndef NOALIGN_HPP
#define NOALIGN_HPP

/*
 * noalign.hpp
 *
 *
 * Created by Pat Schloss on 2/19/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
* */ #include "mothur.h" /**************************************************************************************************/ class NoAlign : public Alignment { public: NoAlign(); ~NoAlign(); void align(string, string); private: }; /**************************************************************************************************/ #endif mothur-1.36.1/source/observable.h000066400000000000000000000006371255543666200167410ustar00rootroot00000000000000#ifndef OBSERVABLE_H #define OBSERVABLE_H #include "collectdisplay.h" /***********************************************************************/ class Observable { public: virtual void registerDisplay(Display*) = 0; virtual void removeDisplay(Display*) = 0; virtual void notifyDisplays() = 0; virtual ~Observable() {} }; /***********************************************************************/ #endif mothur-1.36.1/source/optionparser.cpp000066400000000000000000000170111255543666200176670ustar00rootroot00000000000000/* * optionparser.cpp * Mothur * * Created by Sarah Westcott on 6/8/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 *
 */

#include "optionparser.h"

/***********************************************************************/

OptionParser::OptionParser(string option) {
	try {
		m = MothurOut::getInstance();
		if (option != "") {

			string key, value;
			//reads in parameters and values
			while((option.find_first_of(',') != -1)) {  //while there are parameters
				m->splitAtComma(value, option);
				m->splitAtEquals(key, value);
				if ((key == "candidate") || (key == "query")) { key = "fasta"; }
				if (key == "template") { key = "reference"; }
				parameters[key] = value;
			}

			//in case there is no comma and to get last parameter after comma
			m->splitAtEquals(key, option);
			if ((key == "candidate") || (key == "query")) { key = "fasta"; }
			if (key == "template") { key = "reference"; }
			parameters[key] = option;
		}
	}
	catch(exception& e) {
		m->errorOut(e, "OptionParser", "OptionParser");
		exit(1);
	}
}

/***********************************************************************/

OptionParser::OptionParser(string option, map<string, string>& copy) {
	try {
		m = MothurOut::getInstance();
		if (option != "") {

			string key, value;
			//reads in parameters and values
			while((option.find_first_of(',') != -1)) {  //while there are parameters
				m->splitAtComma(value, option);
				m->splitAtEquals(key, value);
				if ((key == "candidate") || (key == "query")) { key = "fasta"; }
				if (key == "template") { key = "reference"; }
				parameters[key] = value;
			}

			//in case there is no comma and to get last parameter after comma
			m->splitAtEquals(key, option);
			if ((key == "candidate") || (key == "query")) { key = "fasta"; }
			if (key == "template") { key = "reference"; }
			parameters[key] = option;
		}

		copy = parameters;
	}
	catch(exception& e) {
		m->errorOut(e, "OptionParser", "OptionParser");
		exit(1);
	}
}

/***********************************************************************/

map<string, string> OptionParser::getParameters() {
	try {
		//loop through parameters and look for "current" so you can return the appropriate file
		//doing it here to avoid code duplication in each of the commands

		map<string, string>::iterator it;

		for (it =
parameters.begin(); it != parameters.end();) { if (it->second == "current") { //look for file types if (it->first == "fasta") { it->second = m->getFastaFile(); }else if (it->first == "qfile") { it->second = m->getQualFile(); }else if (it->first == "phylip") { it->second = m->getPhylipFile(); }else if (it->first == "column") { it->second = m->getColumnFile(); }else if (it->first == "list") { it->second = m->getListFile(); }else if (it->first == "rabund") { it->second = m->getRabundFile(); }else if (it->first == "sabund") { it->second = m->getSabundFile(); }else if (it->first == "name") { it->second = m->getNameFile(); }else if (it->first == "group") { it->second = m->getGroupFile(); }else if (it->first == "order") { it->second = m->getOrderFile(); }else if (it->first == "ordergroup") { it->second = m->getOrderGroupFile(); }else if (it->first == "tree") { it->second = m->getTreeFile(); }else if (it->first == "shared") { it->second = m->getSharedFile(); }else if (it->first == "relabund") { it->second = m->getRelAbundFile(); }else if (it->first == "design") { it->second = m->getDesignFile(); }else if (it->first == "sff") { it->second = m->getSFFFile(); }else if (it->first == "flow") { it->second = m->getFlowFile(); }else if (it->first == "oligos") { it->second = m->getOligosFile(); }else if (it->first == "accnos") { it->second = m->getAccnosFile(); }else if (it->first == "taxonomy") { it->second = m->getTaxonomyFile(); }else if (it->first == "biom") { it->second = m->getBiomFile(); }else if (it->first == "count") { it->second = m->getCountTableFile(); }else if (it->first == "summary") { it->second = m->getSummaryFile(); }else if (it->first == "file") { it->second = m->getFileFile(); }else { m->mothurOut("[ERROR]: mothur does not save a current file for " + it->first); m->mothurOutEndLine(); } if (it->second == "") { //no file was saved for that type, warn and remove from parameters m->mothurOut("[WARNING]: no file was saved for " + it->first + " parameter."); 
					m->mothurOutEndLine();
					parameters.erase(it++);
				}else {
					m->mothurOut("Using " + it->second + " as input file for the " + it->first + " parameter.");
					m->mothurOutEndLine();
					it++;
				}
			}else{ it++; }
		}

		return parameters;
	}
	catch(exception& e) {
		m->errorOut(e, "OptionParser", "getParameters");
		exit(1);
	}
}

/***********************************************************************/
//pass a vector of filenames that may match the current namefile.
//this function will look at each one, if the rootnames match, mothur will warn
//the user that they may have neglected to provide a namefile.
//stops when it finds a match.
bool OptionParser::getNameFile(vector<string> files) {
	try {
		string namefile = m->getNameFile();
		bool match = false;

		if ((namefile != "")&&(!m->mothurCalling)) {
			string temp = m->getRootName(m->getSimpleName(namefile));
			vector<string> rootName;
			m->splitAtChar(temp, rootName, '.');

			for (int i = 0; i < files.size(); i++) {
				temp = m->getRootName(m->getSimpleName(files[i]));
				vector<string> root;
				m->splitAtChar(temp, root, '.');

				int smallest = rootName.size();
				if (root.size() < smallest) { smallest = root.size(); }

				int numMatches = 0;
				for(int j = 0; j < smallest; j++) {
					if (root[j] == rootName[j]) { numMatches++; }
				}

				if (smallest > 0) {
					if ((numMatches >= (smallest-2)) && (root[0] == rootName[0])) {
						m->mothurOut("[WARNING]: This command can take a namefile and you did not provide one. The current namefile is " + namefile + " which seems to match " + files[i] + ".");
						m->mothurOutEndLine();
						match = true;
						break;
					}
				}
			}
		}

		return match;
	}
	catch(exception& e) {
		m->errorOut(e, "OptionParser", "getNameFile");
		exit(1);
	}
}

/***********************************************************************/
mothur-1.36.1/source/optionparser.h000066400000000000000000000012471255543666200173400ustar00rootroot00000000000000
#ifndef OPTIONPARSER_H
#define OPTIONPARSER_H

/*
 * optionparser.h
 * Mothur
 *
 * Created by Sarah Westcott on 6/8/09.
 * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "mothurout.h"
#include "command.hpp"

/***********************************************************************/

class OptionParser {
public:
	OptionParser(string);
	OptionParser(string, map<string, string>&);
	~OptionParser() {}
	map<string, string> getParameters();
	bool getNameFile(vector<string>);

private:
	map<string, string> parameters;
	MothurOut* m;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/overlap.cpp000066400000000000000000000054331255543666200166170ustar00rootroot00000000000000
/*
 * overlap.cpp
 *
 *
 * Created by Pat Schloss on 12/15/08.
 * Copyright 2008 Patrick D. Schloss. All rights reserved.
 *
 * This class cleans up the alignment at the 3' end of the alignments. Because the Gotoh and Needleman-Wunsch
 * algorithms start the traceback from the lower-right corner of the dynamic programming matrix, there may be a lot of
 * scattered bases in the alignment near the 3' end of the alignment. Here we basically look for the largest score
 * in the last column and row to determine whether there should be extra gaps in sequence A or sequence B. The gap
 * issues at the 5' end of the alignment seem to take care of themselves in the traceback.
 *
 */

#include "alignmentcell.hpp"
#include "overlap.hpp"

/**************************************************************************************************/

int Overlap::maxRow(vector<vector<AlignmentCell> >& alignment, const int band){

	float max = -100;
	int end = lA - 1;
	int index = end;

	// find the row whose right-hand column entry has the highest score
	for(int i=band;i<lB;i++){
		if(alignment[i][end].cValue >= max){
			index = i;
			max = alignment[i][end].cValue;
		}
	}
	return index;
}

/**************************************************************************************************/

int Overlap::maxColumn(vector<vector<AlignmentCell> >& alignment, const int band){

	float max = -100;
	int end = lB - 1;
	int index = end;

	// find the column whose bottom row entry has the highest alignment score
	for(int i=band;i<lA;i++){
		if(alignment[end][i].cValue >= max){
index = i; max = alignment[end][i].cValue; } } return index; } /**************************************************************************************************/ void Overlap::setOverlap(vector >& alignment, const int nA, const int nB, const int band=0){ lA = nA; lB = nB; int rowIndex = maxRow(alignment, band); // get the index for the row with the highest right hand side score int colIndex = maxColumn(alignment, band); // get the index for the column with the highest bottom row score int row = lB-1; int column = lA-1; if(colIndex == column && rowIndex == row){} // if the max values are the lower right corner, then we're good else if(alignment[row][colIndex].cValue < alignment[rowIndex][column].cValue){ for(int i=rowIndex+1;i >&, const int, const int, const int); private: int maxRow(vector >&, const int); int maxColumn(vector >&, const int); int lA, lB; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/progress.cpp000066400000000000000000000045201255543666200170070ustar00rootroot00000000000000/* * progress.cpp * * * Created by Pat Schloss on 8/14/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. 
* */ #include "progress.hpp" const int totalTicks = 50; const char marker = '|'; /***********************************************************************/ Progress::Progress(){ try { m = MothurOut::getInstance(); m->mothurOut("********************#****#****#****#****#****#****#****#****#****#****#"); nTicks = 0; finalPos = 0; } catch(exception& e) { m->errorOut(e, "Progress", "Progress"); exit(1); } } /***********************************************************************/ Progress::Progress(string job, int end){ try { m = MothurOut::getInstance(); m->mothurOut("********************#****#****#****#****#****#****#****#****#****#****#\n"); cout << setw(20) << left << job << setw(1) << marker; m->mothurOutJustToLog(job); m->mothurOut(toString(marker)); cout.flush(); nTicks = 0; finalPos = end; } catch(exception& e) { m->errorOut(e, "Progress", "Progress"); exit(1); } } /***********************************************************************/ void Progress::newLine(string job, int end){ try { m->mothurOutEndLine(); cout << setw(20) << left << job << setw(1) << marker; m->mothurOutJustToLog(job); m->mothurOut(toString(marker)); cout.flush(); nTicks = 0; finalPos = end; } catch(exception& e) { m->errorOut(e, "Progress", "newLine"); exit(1); } } /***********************************************************************/ void Progress::update(const int currentPos){ try { int ratio = int(totalTicks * (float)currentPos / finalPos); if(ratio > nTicks){ for(int i=nTicks;i<ratio;i++){ m->mothurOut(toString(marker)); cout.flush(); } nTicks = ratio; } } catch(exception& e) { m->errorOut(e, "Progress", "update"); exit(1); } } /***********************************************************************/ void Progress::finish(){ try { for(int i=nTicks;i<totalTicks;i++){ m->mothurOut(toString(marker)); cout.flush(); } m->mothurOutEndLine(); m->mothurOut("***********************************************************************\n"); cout.flush(); } catch(exception& e) { m->errorOut(e, "Progress", "finish"); exit(1); } } 
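The tick arithmetic in Progress::update can be looked at in isolation. The following is a minimal sketch and not part of mothur: newMarkers is a hypothetical helper, and the early return for finalPos <= 0 is an added assumption (the default constructor sets finalPos to 0, and update divides by it without a check).

```cpp
#include <string>

// Hypothetical helper, not part of mothur: returns the markers that an
// update from nTicks to the new tick count would print, and advances nTicks.
std::string newMarkers(int currentPos, int finalPos, int& nTicks,
                       int totalTicks = 50, char marker = '|') {
    // Assumption: treat a non-positive end position as "nothing to draw".
    if (finalPos <= 0) { return std::string(); }

    // Same expression as Progress::update: scale the position to 0..totalTicks.
    int ratio = int(totalTicks * (float)currentPos / finalPos);
    if (ratio <= nTicks) { return std::string(); }

    std::string out(ratio - nTicks, marker);  // only the ticks not yet drawn
    nTicks = ratio;
    return out;
}
```

With totalTicks at 50, moving from position 0 to 10 out of 100 yields five markers, and repeating the same position yields none, which is why update can be called on every iteration without overdrawing the bar.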
/***********************************************************************/ mothur-1.36.1/source/progress.hpp000066400000000000000000000004241255543666200170130ustar00rootroot00000000000000#ifndef PROGRESS_H #define PROGRESS_H #include "mothur.h" #include "mothurout.h" class Progress { public: Progress(); Progress(string, int); void update(int); void newLine(string, int); void finish(); private: int nTicks; int finalPos; MothurOut* m; }; #endif mothur-1.36.1/source/randomforest/000077500000000000000000000000001255543666200171415ustar00rootroot00000000000000mothur-1.36.1/source/randomforest/abstractdecisiontree.cpp000066400000000000000000000334451255543666200240570ustar00rootroot00000000000000// // abstractdecisiontree.cpp // Mothur // // Created by Sarah Westcott on 10/1/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "abstractdecisiontree.hpp" /**************************************************************************************************/ AbstractDecisionTree::AbstractDecisionTree(vector<vector<int> >& baseDataSet, vector<int> globalDiscardedFeatureIndices, OptimumFeatureSubsetSelector optimumFeatureSubsetSelector, string treeSplitCriterion) : baseDataSet(baseDataSet), numSamples((int)baseDataSet.size()), numFeatures((int)(baseDataSet[0].size() - 1)), numOutputClasses(0), rootNode(NULL), nodeIdCount(0), globalDiscardedFeatureIndices(globalDiscardedFeatureIndices), optimumFeatureSubsetSize(optimumFeatureSubsetSelector.getOptimumFeatureSubsetSize(numFeatures)), treeSplitCriterion(treeSplitCriterion) { try { // TODO: instead of calculating this for every DecisionTree // calculate this once in the RandomForest class and pass the values m = MothurOut::getInstance(); for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { break; } int outcome = baseDataSet[i][numFeatures]; vector<int>::iterator it = find(outputClasses.begin(), outputClasses.end(), outcome); if (it == outputClasses.end()){ // find() will return outputClasses.end() if the element is not found 
outputClasses.push_back(outcome); numOutputClasses++; } } if (m->debug) { //m->mothurOut("outputClasses = " + toStringVectorInt(outputClasses)); m->mothurOut("numOutputClasses = " + toString(numOutputClasses) + '\n'); } } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "AbstractDecisionTree"); exit(1); } } /**************************************************************************************************/ int AbstractDecisionTree::createBootStrappedSamples(){ try { vector isInTrainingSamples(numSamples, false); for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { return 0; } // TODO: optimize the rand() function call + double check if it's working properly int randomIndex = rand() % numSamples; bootstrappedTrainingSamples.push_back(baseDataSet[randomIndex]); isInTrainingSamples[randomIndex] = true; } for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { return 0; } if (isInTrainingSamples[i]){ bootstrappedTrainingSampleIndices.push_back(i); } else{ bootstrappedTestSamples.push_back(baseDataSet[i]); bootstrappedTestSampleIndices.push_back(i); } } // do the transpose of Test Samples for (int i = 0; i < bootstrappedTestSamples[0].size(); i++) { if (m->control_pressed) { return 0; } vector tmpFeatureVector(bootstrappedTestSamples.size(), 0); for (int j = 0; j < bootstrappedTestSamples.size(); j++) { if (m->control_pressed) { return 0; } tmpFeatureVector[j] = bootstrappedTestSamples[j][i]; } testSampleFeatureVectors.push_back(tmpFeatureVector); } return 0; } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "createBootStrappedSamples"); exit(1); } } /**************************************************************************************************/ int AbstractDecisionTree::getMinEntropyOfFeature(vector featureVector, vector outputVector, double& minEntropy, int& featureSplitValue, double& intrinsicValue){ try { vector< pair > featureOutputPair(featureVector.size(), pair(0, 0)); for (int i = 0; i < featureVector.size(); 
i++) { if (m->control_pressed) { return 0; } featureOutputPair[i].first = featureVector[i]; featureOutputPair[i].second = outputVector[i]; } // TODO: using default behavior to sort(), need to specify the comparator for added safety and compiler portability. IntPairVectorSorter intPairVectorSorter; sort(featureOutputPair.begin(), featureOutputPair.end(), intPairVectorSorter); vector<int> splitPoints; vector<int> uniqueFeatureValues(1, featureOutputPair[0].first); for (int i = 0; i < featureOutputPair.size(); i++) { if (m->control_pressed) { return 0; } int featureValue = featureOutputPair[i].first; vector<int>::iterator it = find(uniqueFeatureValues.begin(), uniqueFeatureValues.end(), featureValue); if (it == uniqueFeatureValues.end()){ // NOT FOUND uniqueFeatureValues.push_back(featureValue); splitPoints.push_back(i); } } int bestSplitIndex = -1; if (splitPoints.size() == 0){ // TODO: trying out C++'s infinity, don't know if this will work properly // TODO: check the caller of this function; it should check the value of minEntropy and compare it to inf // so that no wrong calculation is done minEntropy = numeric_limits<double>::infinity(); // OUTPUT intrinsicValue = numeric_limits<double>::infinity(); // OUTPUT featureSplitValue = -1; // OUTPUT }else{ getBestSplitAndMinEntropy(featureOutputPair, splitPoints, minEntropy, bestSplitIndex, intrinsicValue); // OUTPUT featureSplitValue = featureOutputPair[splitPoints[bestSplitIndex]].first; // OUTPUT } return 0; } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "getMinEntropyOfFeature"); exit(1); } } /**************************************************************************************************/ double AbstractDecisionTree::calcIntrinsicValue(int numLessThanValueAtSplitPoint, int numGreaterThanValueAtSplitPoint, int numSamples) { try { double upperSplitEntropy = 0.0, lowerSplitEntropy = 0.0; if (numLessThanValueAtSplitPoint > 0) { upperSplitEntropy = numLessThanValueAtSplitPoint * log2((double) numLessThanValueAtSplitPoint / 
(double) numSamples); } if (numGreaterThanValueAtSplitPoint > 0) { lowerSplitEntropy = numGreaterThanValueAtSplitPoint * log2((double) numGreaterThanValueAtSplitPoint / (double) numSamples); } double intrinsicValue = - ((double)(upperSplitEntropy + lowerSplitEntropy) / (double)numSamples); return intrinsicValue; } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "calcIntrinsicValue"); exit(1); } } /**************************************************************************************************/ int AbstractDecisionTree::getBestSplitAndMinEntropy(vector< pair > featureOutputPairs, vector splitPoints, double& minEntropy, int& minEntropyIndex, double& relatedIntrinsicValue){ try { int numSamples = (int)featureOutputPairs.size(); vector entropies; vector intrinsicValues; for (int i = 0; i < splitPoints.size(); i++) { if (m->control_pressed) { return 0; } int index = splitPoints[i]; int valueAtSplitPoint = featureOutputPairs[index].first; int numLessThanValueAtSplitPoint = 0; int numGreaterThanValueAtSplitPoint = 0; for (int j = 0; j < featureOutputPairs.size(); j++) { if (m->control_pressed) { return 0; } pair record = featureOutputPairs[j]; if (record.first < valueAtSplitPoint){ numLessThanValueAtSplitPoint++; } else{ numGreaterThanValueAtSplitPoint++; } } double upperEntropyOfSplit = calcSplitEntropy(featureOutputPairs, index, numOutputClasses, true); double lowerEntropyOfSplit = calcSplitEntropy(featureOutputPairs, index, numOutputClasses, false); double totalEntropy = (numLessThanValueAtSplitPoint * upperEntropyOfSplit + numGreaterThanValueAtSplitPoint * lowerEntropyOfSplit) / (double)numSamples; double intrinsicValue = calcIntrinsicValue(numLessThanValueAtSplitPoint, numGreaterThanValueAtSplitPoint, numSamples); entropies.push_back(totalEntropy); intrinsicValues.push_back(intrinsicValue); } // set output values vector::iterator it = min_element(entropies.begin(), entropies.end()); minEntropy = *it; // OUTPUT minEntropyIndex = (int)(it - 
entropies.begin()); // OUTPUT relatedIntrinsicValue = intrinsicValues[minEntropyIndex]; // OUTPUT return 0; } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "getBestSplitAndMinEntropy"); exit(1); } } /**************************************************************************************************/ double AbstractDecisionTree::calcSplitEntropy(vector< pair > featureOutputPairs, int splitIndex, int numOutputClasses, bool isUpperSplit = true) { try { vector classCounts(numOutputClasses, 0); if (isUpperSplit) { for (int i = 0; i < splitIndex; i++) { if (m->control_pressed) { return 0; } classCounts[featureOutputPairs[i].second]++; } } else { for (int i = splitIndex; i < featureOutputPairs.size(); i++) { if (m->control_pressed) { return 0; } classCounts[featureOutputPairs[i].second]++; } } int totalClassCounts = accumulate(classCounts.begin(), classCounts.end(), 0); double splitEntropy = 0.0; for (int i = 0; i < classCounts.size(); i++) { if (m->control_pressed) { return 0; } if (classCounts[i] == 0) { continue; } double probability = (double) classCounts[i] / (double) totalClassCounts; splitEntropy += -(probability * log2(probability)); } return splitEntropy; } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "calcSplitEntropy"); exit(1); } } /**************************************************************************************************/ int AbstractDecisionTree::getSplitPopulation(RFTreeNode* node, vector< vector >& leftChildSamples, vector< vector >& rightChildSamples){ try { // TODO: there is a possibility of optimization if we can recycle the samples in each nodes // we just need to pointers to the samples i.e. 
vector and use it everywhere and not create the same // sample over and over again // we need to make this const so that it is not modified by all the function calling // currently purgeTreeNodesDataRecursively() is used for the same purpose, but this can be avoided altogether // if re-using the same data over the classes int splitFeatureGlobalIndex = node->getSplitFeatureIndex(); for (int i = 0; i < node->getBootstrappedTrainingSamples().size(); i++) { if (m->control_pressed) { return 0; } vector<int> sample = node->getBootstrappedTrainingSamples()[i]; if (m->control_pressed) { return 0; } if (sample[splitFeatureGlobalIndex] < node->getSplitFeatureValue()) { leftChildSamples.push_back(sample); } else { rightChildSamples.push_back(sample); } } return 0; } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "getSplitPopulation"); exit(1); } } /**************************************************************************************************/ // TODO: checkIfAlreadyClassified() verify code // TODO: use bootstrappedOutputVector for easier calculation instead of using getBootstrappedTrainingSamples() bool AbstractDecisionTree::checkIfAlreadyClassified(RFTreeNode* treeNode, int& outputClass) { try { vector<int> tempOutputClasses; for (int i = 0; i < treeNode->getBootstrappedTrainingSamples().size(); i++) { if (m->control_pressed) { return 0; } int sampleOutputClass = treeNode->getBootstrappedTrainingSamples()[i][numFeatures]; vector<int>::iterator it = find(tempOutputClasses.begin(), tempOutputClasses.end(), sampleOutputClass); if (it == tempOutputClasses.end()) { // NOT FOUND tempOutputClasses.push_back(sampleOutputClass); } } if (tempOutputClasses.size() < 2) { outputClass = tempOutputClasses[0]; return true; } else { outputClass = -1; return false; } } catch(exception& e) { m->errorOut(e, "AbstractDecisionTree", "checkIfAlreadyClassified"); exit(1); } } /**************************************************************************************************/ 
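The entropy bookkeeping in calcSplitEntropy and calcIntrinsicValue is easier to check in a standalone form. The following is a minimal sketch under stated assumptions, not the class's own code: entropyOfCounts and intrinsicValue are hypothetical names, they take plain counts instead of the feature/output pairs used above, and intrinsicValue derives the sample total from its two arguments rather than receiving numSamples separately.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Shannon entropy, in bits, of per-class sample counts. This mirrors the
// second half of calcSplitEntropy: classes with a zero count are skipped.
double entropyOfCounts(const std::vector<int>& classCounts) {
    int total = 0;
    for (std::size_t i = 0; i < classCounts.size(); i++) { total += classCounts[i]; }
    if (total == 0) { return 0.0; }  // an empty side contributes no entropy

    double entropy = 0.0;
    for (std::size_t i = 0; i < classCounts.size(); i++) {
        if (classCounts[i] == 0) { continue; }
        double probability = (double)classCounts[i] / (double)total;
        entropy -= probability * std::log2(probability);
    }
    return entropy;
}

// Intrinsic value of a two-way split, following calcIntrinsicValue:
// -(nL*log2(nL/n) + nG*log2(nG/n)) / n, with n = nL + nG (assumption).
double intrinsicValue(int numLess, int numGreater) {
    int n = numLess + numGreater;
    if (n == 0) { return 0.0; }
    double sum = 0.0;
    if (numLess > 0)    { sum += numLess * std::log2((double)numLess / n); }
    if (numGreater > 0) { sum += numGreater * std::log2((double)numGreater / n); }
    return -sum / n;
}
```

An even two-class node has an entropy of 1 bit and a pure node has 0. When every sample falls on one side of a split the intrinsic value is 0, so a gain ratio formed by dividing by it is undefined; callers may want to guard that case.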
mothur-1.36.1/source/randomforest/abstractdecisiontree.hpp000066400000000000000000000052431255543666200240570ustar00rootroot00000000000000// // abstractdecisiontree.hpp // rrf-fs-prototype // // Created by Abu Zaher Faridee on 7/22/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #ifndef RF_ABSTRACTDECISIONTREE_HPP #define RF_ABSTRACTDECISIONTREE_HPP #include "mothurout.h" #include "macros.h" #include "rftreenode.hpp" #define DEBUG_MODE /**************************************************************************************************/ struct IntPairVectorSorter{ bool operator() (const pair& firstPair, const pair& secondPair) { return firstPair.first < secondPair.first; } }; /**************************************************************************************************/ class AbstractDecisionTree{ public: AbstractDecisionTree(vector >& baseDataSet, vector globalDiscardedFeatureIndices, OptimumFeatureSubsetSelector optimumFeatureSubsetSelector, string treeSplitCriterion); virtual ~AbstractDecisionTree(){} protected: virtual int createBootStrappedSamples(); virtual int getMinEntropyOfFeature(vector featureVector, vector outputVector, double& minEntropy, int& featureSplitValue, double& intrinsicValue); virtual int getBestSplitAndMinEntropy(vector< pair > featureOutputPairs, vector splitPoints, double& minEntropy, int& minEntropyIndex, double& relatedIntrinsicValue); virtual double calcIntrinsicValue(int numLessThanValueAtSplitPoint, int numGreaterThanValueAtSplitPoint, int numSamples); virtual double calcSplitEntropy(vector< pair > featureOutputPairs, int splitIndex, int numOutputClasses, bool); virtual int getSplitPopulation(RFTreeNode* node, vector< vector >& leftChildSamples, vector< vector >& rightChildSamples); virtual bool checkIfAlreadyClassified(RFTreeNode* treeNode, int& outputClass); vector< vector >& baseDataSet; int numSamples; int numFeatures; int numOutputClasses; vector outputClasses; vector< vector > bootstrappedTrainingSamples; 
vector bootstrappedTrainingSampleIndices; vector< vector > bootstrappedTestSamples; vector bootstrappedTestSampleIndices; vector > testSampleFeatureVectors; RFTreeNode* rootNode; int nodeIdCount; map nodeMisclassificationCounts; vector globalDiscardedFeatureIndices; int optimumFeatureSubsetSize; string treeSplitCriterion; MothurOut* m; private: }; /**************************************************************************************************/ #endif mothur-1.36.1/source/randomforest/abstractrandomforest.cpp000066400000000000000000000043601255543666200240770ustar00rootroot00000000000000// // abstractrandomforest.cpp // Mothur // // Created by Sarah Westcott on 10/1/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "abstractrandomforest.hpp" /***********************************************************************/ AbstractRandomForest::AbstractRandomForest(const std::vector < std::vector > dataSet, const int numDecisionTrees, const string treeSplitCriterion = "informationGain") : dataSet(dataSet), numDecisionTrees(numDecisionTrees), numSamples((int)dataSet.size()), numFeatures((int)(dataSet[0].size() - 1)), globalDiscardedFeatureIndices(getGlobalDiscardedFeatureIndices()), globalVariableImportanceList(numFeatures, 0), treeSplitCriterion(treeSplitCriterion) { m = MothurOut::getInstance(); // TODO: double check if the implemenatation of 'globalOutOfBagEstimates' is correct } /***********************************************************************/ vector AbstractRandomForest::getGlobalDiscardedFeatureIndices() { try { vector globalDiscardedFeatureIndices; // calculate feature vectors vector< vector > featureVectors(numFeatures, vector(numSamples, 0)); for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { return globalDiscardedFeatureIndices; } for (int j = 0; j < numFeatures; j++) { featureVectors[j][i] = dataSet[i][j]; } } for (int i = 0; i < featureVectors.size(); i++) { if (m->control_pressed) { return 
globalDiscardedFeatureIndices; } double standardDeviation = m->getStandardDeviation(featureVectors[i]); if (standardDeviation <= 0){ globalDiscardedFeatureIndices.push_back(i); } } if (m->debug) { m->mothurOut("number of global discarded features: " + toString(globalDiscardedFeatureIndices.size())+ "\n"); m->mothurOut("total features: " + toString(featureVectors.size())+ "\n"); } return globalDiscardedFeatureIndices; } catch(exception& e) { m->errorOut(e, "AbstractRandomForest", "getGlobalDiscardedFeatureIndices"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/randomforest/abstractrandomforest.hpp000066400000000000000000000043111255543666200241000ustar00rootroot00000000000000// // abstractrandomforest.hpp // rrf-fs-prototype // // Created by Abu Zaher Faridee on 7/20/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #ifndef rrf_fs_prototype_abstractrandomforest_hpp #define rrf_fs_prototype_abstractrandomforest_hpp #include "mothurout.h" #include "macros.h" #include "abstractdecisiontree.hpp" #define DEBUG_MODE /***********************************************************************/ class AbstractRandomForest{ public: // intialization with vectors AbstractRandomForest(const std::vector < std::vector > dataSet, const int numDecisionTrees, const string); virtual ~AbstractRandomForest(){ } virtual int populateDecisionTrees() = 0; virtual int calcForrestErrorRate() = 0; virtual int calcForrestVariableImportance(string) = 0; /***********************************************************************/ protected: // TODO: create a better way of discarding feature // currently we just set FEATURE_DISCARD_SD_THRESHOLD to 0 to solved this // it can be tuned for better selection // also, there might be other factors like Mean or other stuffs // same would apply for createLocalDiscardedFeatureList in the TreeNode class // TODO: Another idea is getting an aggregated discarded feature indices after the run, 
from combining // the local discarded feature indices // this would penalize a feature, even if in global space the feature looks quite good // the penalization would be averaged, so this woould unlikely to create a local optmina vector getGlobalDiscardedFeatureIndices(); int numDecisionTrees; int numSamples; int numFeatures; vector< vector > dataSet; vector globalDiscardedFeatureIndices; vector globalVariableImportanceList; string treeSplitCriterion; // This is a map of each feature to outcome count of each classes // e.g. 1 => [2 7] means feature 1 has 2 outcome of 0 and 7 outcome of 1 map > globalOutOfBagEstimates; // TODO: fix this, do we use pointers? vector decisionTrees; MothurOut* m; private: }; #endif mothur-1.36.1/source/randomforest/decisiontree.cpp000066400000000000000000000564031255543666200223320ustar00rootroot00000000000000// // decisiontree.cpp // Mothur // // Created by Sarah Westcott on 10/1/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "decisiontree.hpp" DecisionTree::DecisionTree(vector< vector >& baseDataSet, vector globalDiscardedFeatureIndices, OptimumFeatureSubsetSelector optimumFeatureSubsetSelector, string treeSplitCriterion, float featureStandardDeviationThreshold) : AbstractDecisionTree(baseDataSet, globalDiscardedFeatureIndices, optimumFeatureSubsetSelector, treeSplitCriterion), variableImportanceList(numFeatures, 0), featureStandardDeviationThreshold(featureStandardDeviationThreshold) { try { m = MothurOut::getInstance(); createBootStrappedSamples(); buildDecisionTree(); } catch(exception& e) { m->errorOut(e, "DecisionTree", "DecisionTree"); exit(1); } } /***********************************************************************/ int DecisionTree::calcTreeVariableImportanceAndError(int& numCorrect, double& treeErrorRate) { try { vector< vector > randomlySampledTestData(bootstrappedTestSamples.size(), vector(bootstrappedTestSamples[0].size(), 0)); // TODO: is is possible to further speed up the following O(N^2) by 
using std::copy? for (int i = 0; i < bootstrappedTestSamples.size(); i++) { for (int j = 0; j < bootstrappedTestSamples[i].size(); j++) { randomlySampledTestData[i][j] = bootstrappedTestSamples[i][j]; } } for (int i = 0; i < numFeatures; i++) { if (m->control_pressed) { return 0; } // if the index is in globalDiscardedFeatureIndices (i.e, null feature) we don't want to shuffle them vector::iterator it = find(globalDiscardedFeatureIndices.begin(), globalDiscardedFeatureIndices.end(), i); if (it == globalDiscardedFeatureIndices.end()) { // NOT FOUND // if the standard deviation is very low, we know it's not a good feature at all // we can save some time here by discarding that feature vector featureVector = testSampleFeatureVectors[i]; if (m->getStandardDeviation(featureVector) > featureStandardDeviationThreshold) { // NOTE: only shuffle the features, never shuffle the output vector // so i = 0 and i will be alwaays <= (numFeatures - 1) as the index at numFeatures will denote // the feature vector randomlyShuffleAttribute(bootstrappedTestSamples, i, i - 1, randomlySampledTestData); int numCorrectAfterShuffle = 0; for (int j = 0; j < randomlySampledTestData.size(); j++) { if (m->control_pressed) {return 0; } vector shuffledSample = randomlySampledTestData[j]; int actualSampleOutputClass = shuffledSample[numFeatures]; int predictedSampleOutputClass = evaluateSample(shuffledSample); if (actualSampleOutputClass == predictedSampleOutputClass) { numCorrectAfterShuffle++; } } variableImportanceList[i] += (numCorrect - numCorrectAfterShuffle); } } } // TODO: do we need to save the variableRanks in the DecisionTree, do we need it later? vector< pair > variableRanks; for (int i = 0; i < variableImportanceList.size(); i++) { if (m->control_pressed) {return 0; } if (variableImportanceList[i] > 0) { // TODO: is there a way to optimize the follow line's code? 
pair variableRank(0, 0); variableRank.first = i; variableRank.second = variableImportanceList[i]; variableRanks.push_back(variableRank); } } VariableRankDescendingSorter variableRankDescendingSorter; sort(variableRanks.begin(), variableRanks.end(), variableRankDescendingSorter); return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "calcTreeVariableImportanceAndError"); exit(1); } } /***********************************************************************/ // TODO: there must be a way to optimize this function int DecisionTree::evaluateSample(vector testSample) { try { RFTreeNode *node = rootNode; while (true) { if (m->control_pressed) { return 0; } if (node->checkIsLeaf()) { return node->getOutputClass(); } int sampleSplitFeatureValue = testSample[node->getSplitFeatureIndex()]; if (sampleSplitFeatureValue < node->getSplitFeatureValue()) { node = node->getLeftChildNode(); } else { node = node->getRightChildNode(); } } return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "evaluateSample"); exit(1); } } /***********************************************************************/ int DecisionTree::calcTreeErrorRate(int& numCorrect, double& treeErrorRate){ numCorrect = 0; try { for (int i = 0; i < bootstrappedTestSamples.size(); i++) { if (m->control_pressed) {return 0; } vector testSample = bootstrappedTestSamples[i]; int testSampleIndex = bootstrappedTestSampleIndices[i]; int actualSampleOutputClass = testSample[numFeatures]; int predictedSampleOutputClass = evaluateSample(testSample); if (actualSampleOutputClass == predictedSampleOutputClass) { numCorrect++; } outOfBagEstimates[testSampleIndex] = predictedSampleOutputClass; } treeErrorRate = 1 - ((double)numCorrect / (double)bootstrappedTestSamples.size()); return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "calcTreeErrorRate"); exit(1); } } /***********************************************************************/ // TODO: optimize the algo, instead of transposing two time, we can 
extract the feature, // shuffle it and then re-insert in the original place, thus improving running time //This function randomizes abundances for a given OTU/feature. void DecisionTree::randomlyShuffleAttribute(const vector< vector<int> >& samples, const int featureIndex, const int prevFeatureIndex, vector< vector<int> >& shuffledSample) { try { // NOTE: we need (numFeatures + 1) featureVectors, the last extra vector is actually outputVector // restore previously shuffled feature if (prevFeatureIndex > -1) { for (int j = 0; j < samples.size(); j++) { if (m->control_pressed) { return; } shuffledSample[j][prevFeatureIndex] = samples[j][prevFeatureIndex]; } } // now do the shuffling vector<int> featureVectors(samples.size(), 0); for (int j = 0; j < samples.size(); j++) { if (m->control_pressed) { return; } featureVectors[j] = samples[j][featureIndex]; } random_shuffle(featureVectors.begin(), featureVectors.end()); for (int j = 0; j < samples.size(); j++) { if (m->control_pressed) { return; } shuffledSample[j][featureIndex] = featureVectors[j]; } } catch(exception& e) { m->errorOut(e, "DecisionTree", "randomlyShuffleAttribute"); exit(1); } } /***********************************************************************/ int DecisionTree::purgeTreeNodesDataRecursively(RFTreeNode* treeNode) { try { treeNode->bootstrappedTrainingSamples.clear(); treeNode->bootstrappedFeatureVectors.clear(); treeNode->bootstrappedOutputVector.clear(); treeNode->localDiscardedFeatureIndices.clear(); treeNode->globalDiscardedFeatureIndices.clear(); if (treeNode->leftChildNode != NULL) { purgeTreeNodesDataRecursively(treeNode->leftChildNode); } if (treeNode->rightChildNode != NULL) { purgeTreeNodesDataRecursively(treeNode->rightChildNode); } return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "purgeTreeNodesDataRecursively"); exit(1); } } /***********************************************************************/ void DecisionTree::buildDecisionTree(){ try { int generation = 0; rootNode = new 
RFTreeNode(bootstrappedTrainingSamples, globalDiscardedFeatureIndices, numFeatures, numSamples, numOutputClasses, generation, nodeIdCount, featureStandardDeviationThreshold); nodeIdCount++; splitRecursively(rootNode); } catch(exception& e) { m->errorOut(e, "DecisionTree", "buildDecisionTree"); exit(1); } } /***********************************************************************/ int DecisionTree::splitRecursively(RFTreeNode* rootNode) { try { if (rootNode->getNumSamples() < 2){ rootNode->setIsLeaf(true); rootNode->setOutputClass(rootNode->getBootstrappedTrainingSamples()[0][rootNode->getNumFeatures()]); return 0; } int classifiedOutputClass; bool isAlreadyClassified = checkIfAlreadyClassified(rootNode, classifiedOutputClass); if (isAlreadyClassified == true){ rootNode->setIsLeaf(true); rootNode->setOutputClass(classifiedOutputClass); return 0; } if (m->control_pressed) { return 0; } vector featureSubsetIndices = selectFeatureSubsetRandomly(globalDiscardedFeatureIndices, rootNode->getLocalDiscardedFeatureIndices()); // TODO: need to check if the value is actually copied correctly rootNode->setFeatureSubsetIndices(featureSubsetIndices); if (m->control_pressed) { return 0; } findAndUpdateBestFeatureToSplitOn(rootNode); // update rootNode outputClass, this is needed for pruning // this is only for internal nodes updateOutputClassOfNode(rootNode); if (m->control_pressed) { return 0; } vector< vector > leftChildSamples; vector< vector > rightChildSamples; getSplitPopulation(rootNode, leftChildSamples, rightChildSamples); if (m->control_pressed) { return 0; } // TODO: need to write code to clear this memory RFTreeNode* leftChildNode = new RFTreeNode(leftChildSamples, globalDiscardedFeatureIndices, numFeatures, (int)leftChildSamples.size(), numOutputClasses, rootNode->getGeneration() + 1, nodeIdCount, featureStandardDeviationThreshold); nodeIdCount++; RFTreeNode* rightChildNode = new RFTreeNode(rightChildSamples, globalDiscardedFeatureIndices, numFeatures, 
(int)rightChildSamples.size(), numOutputClasses, rootNode->getGeneration() + 1, nodeIdCount, featureStandardDeviationThreshold); nodeIdCount++; rootNode->setLeftChildNode(leftChildNode); leftChildNode->setParentNode(rootNode); rootNode->setRightChildNode(rightChildNode); rightChildNode->setParentNode(rootNode); // TODO: This recursive split can be parrallelized later splitRecursively(leftChildNode); if (m->control_pressed) { return 0; } splitRecursively(rightChildNode); return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "splitRecursively"); exit(1); } } /***********************************************************************/ int DecisionTree::findAndUpdateBestFeatureToSplitOn(RFTreeNode* node){ try { vector< vector > bootstrappedFeatureVectors = node->getBootstrappedFeatureVectors(); if (m->control_pressed) { return 0; } vector bootstrappedOutputVector = node->getBootstrappedOutputVector(); if (m->control_pressed) { return 0; } vector featureSubsetIndices = node->getFeatureSubsetIndices(); if (m->control_pressed) { return 0; } vector featureSubsetEntropies; vector featureSubsetSplitValues; vector featureSubsetIntrinsicValues; vector featureSubsetGainRatios; for (int i = 0; i < featureSubsetIndices.size(); i++) { if (m->control_pressed) { return 0; } int tryIndex = featureSubsetIndices[i]; double featureMinEntropy; int featureSplitValue; double featureIntrinsicValue; getMinEntropyOfFeature(bootstrappedFeatureVectors[tryIndex], bootstrappedOutputVector, featureMinEntropy, featureSplitValue, featureIntrinsicValue); if (m->control_pressed) { return 0; } featureSubsetEntropies.push_back(featureMinEntropy); featureSubsetSplitValues.push_back(featureSplitValue); featureSubsetIntrinsicValues.push_back(featureIntrinsicValue); double featureInformationGain = node->getOwnEntropy() - featureMinEntropy; double featureGainRatio = (double)featureInformationGain / (double)featureIntrinsicValue; featureSubsetGainRatios.push_back(featureGainRatio); } vector::iterator 
minEntropyIterator = min_element(featureSubsetEntropies.begin(), featureSubsetEntropies.end()); vector::iterator maxGainRatioIterator = max_element(featureSubsetGainRatios.begin(), featureSubsetGainRatios.end()); double featureMinEntropy = *minEntropyIterator; // TODO: kept the following line as future reference, can be useful // double featureMaxGainRatio = *maxGainRatioIterator; double bestFeatureSplitEntropy = featureMinEntropy; int bestFeatureToSplitOnIndex = -1; if (treeSplitCriterion == "gainratio"){ bestFeatureToSplitOnIndex = (int)(maxGainRatioIterator - featureSubsetGainRatios.begin()); // if using 'gainRatio' measure, then featureMinEntropy must be re-updated, as the index // for 'featureMaxGainRatio' would be different bestFeatureSplitEntropy = featureSubsetEntropies[bestFeatureToSplitOnIndex]; } else if ( treeSplitCriterion == "infogain"){ bestFeatureToSplitOnIndex = (int)(minEntropyIterator - featureSubsetEntropies.begin()); } else { // TODO: we need an abort mechanism here } // TODO: is the following line needed? 
kept is as future reference // splitInformationGain = node.ownEntropy - node.splitFeatureEntropy int bestFeatureSplitValue = featureSubsetSplitValues[bestFeatureToSplitOnIndex]; node->setSplitFeatureIndex(featureSubsetIndices[bestFeatureToSplitOnIndex]); node->setSplitFeatureValue(bestFeatureSplitValue); node->setSplitFeatureEntropy(bestFeatureSplitEntropy); // TODO: kept the following line as future reference // node.splitInformationGain = splitInformationGain return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "findAndUpdateBestFeatureToSplitOn"); exit(1); } } /***********************************************************************/ vector DecisionTree::selectFeatureSubsetRandomly(vector globalDiscardedFeatureIndices, vector localDiscardedFeatureIndices){ try { vector featureSubsetIndices; vector combinedDiscardedFeatureIndices; combinedDiscardedFeatureIndices.insert(combinedDiscardedFeatureIndices.end(), globalDiscardedFeatureIndices.begin(), globalDiscardedFeatureIndices.end()); combinedDiscardedFeatureIndices.insert(combinedDiscardedFeatureIndices.end(), localDiscardedFeatureIndices.begin(), localDiscardedFeatureIndices.end()); sort(combinedDiscardedFeatureIndices.begin(), combinedDiscardedFeatureIndices.end()); int numberOfRemainingSuitableFeatures = (int)(numFeatures - combinedDiscardedFeatureIndices.size()); int currentFeatureSubsetSize = numberOfRemainingSuitableFeatures < optimumFeatureSubsetSize ? 
numberOfRemainingSuitableFeatures : optimumFeatureSubsetSize; while (featureSubsetIndices.size() < currentFeatureSubsetSize) { if (m->control_pressed) { return featureSubsetIndices; } // TODO: optimize rand() call here int randomIndex = rand() % numFeatures; vector::iterator it = find(featureSubsetIndices.begin(), featureSubsetIndices.end(), randomIndex); if (it == featureSubsetIndices.end()){ // NOT FOUND vector::iterator it2 = find(combinedDiscardedFeatureIndices.begin(), combinedDiscardedFeatureIndices.end(), randomIndex); if (it2 == combinedDiscardedFeatureIndices.end()){ // NOT FOUND AGAIN featureSubsetIndices.push_back(randomIndex); } } } sort(featureSubsetIndices.begin(), featureSubsetIndices.end()); //#ifdef DEBUG_LEVEL_3 // PRINT_VAR(featureSubsetIndices); //#endif return featureSubsetIndices; } catch(exception& e) { m->errorOut(e, "DecisionTree", "selectFeatureSubsetRandomly"); exit(1); } } /***********************************************************************/ // TODO: printTree() needs a check if correct int DecisionTree::printTree(RFTreeNode* treeNode, string caption){ try { string tabs = ""; for (int i = 0; i < treeNode->getGeneration(); i++) { tabs += "|--"; } // for (int i = 0; i < treeNode->getGeneration() - 1; i++) { tabs += "| "; } // if (treeNode->getGeneration() != 0) { tabs += "|--"; } if (treeNode != NULL && treeNode->checkIsLeaf() == false){ m->mothurOut(tabs + caption + " [ gen: " + toString(treeNode->getGeneration()) + " , id: " + toString(treeNode->nodeId) + " ] ( " + toString(treeNode->getSplitFeatureValue()) + " < X" + toString(treeNode->getSplitFeatureIndex()) + " ) ( predicted: " + toString(treeNode->outputClass) + " , misclassified: " + toString(treeNode->testSampleMisclassificationCount) + " )\n"); printTree(treeNode->getLeftChildNode(), "left "); printTree(treeNode->getRightChildNode(), "right"); }else { m->mothurOut(tabs + caption + " [ gen: " + toString(treeNode->getGeneration()) + " , id: " + toString(treeNode->nodeId) + " ] ( 
classified to: " + toString(treeNode->getOutputClass()) + ", samples: " + toString(treeNode->getNumSamples()) + " , misclassified: " + toString(treeNode->testSampleMisclassificationCount) + " )\n"); } return 0; } catch(exception& e) { m->errorOut(e, "DecisionTree", "printTree"); exit(1); } } /***********************************************************************/ void DecisionTree::deleteTreeNodesRecursively(RFTreeNode* treeNode) { try { if (treeNode == NULL) { return; } deleteTreeNodesRecursively(treeNode->leftChildNode); deleteTreeNodesRecursively(treeNode->rightChildNode); delete treeNode; treeNode = NULL; } catch(exception& e) { m->errorOut(e, "DecisionTree", "deleteTreeNodesRecursively"); exit(1); } } /***********************************************************************/ void DecisionTree::pruneTree(double pruneAggressiveness = 0.9) { // find out the number of misclassification by each of the nodes for (int i = 0; i < bootstrappedTestSamples.size(); i++) { if (m->control_pressed) { return; } vector testSample = bootstrappedTestSamples[i]; updateMisclassificationCountRecursively(rootNode, testSample); } // do the actual pruning pruneRecursively(rootNode, pruneAggressiveness); } /***********************************************************************/ void DecisionTree::pruneRecursively(RFTreeNode* treeNode, double pruneAggressiveness){ if (treeNode != NULL && treeNode->checkIsLeaf() == false) { if (m->control_pressed) { return; } pruneRecursively(treeNode->leftChildNode, pruneAggressiveness); pruneRecursively(treeNode->rightChildNode, pruneAggressiveness); int subTreeMisclassificationCount = treeNode->leftChildNode->getTestSampleMisclassificationCount() + treeNode->rightChildNode->getTestSampleMisclassificationCount(); int ownMisclassificationCount = treeNode->getTestSampleMisclassificationCount(); if (subTreeMisclassificationCount * pruneAggressiveness > ownMisclassificationCount) { // TODO: need to check the effect of these two delete calls delete 
treeNode->leftChildNode; treeNode->leftChildNode = NULL; delete treeNode->rightChildNode; treeNode->rightChildNode = NULL; treeNode->isLeaf = true; } } } /***********************************************************************/ void DecisionTree::updateMisclassificationCountRecursively(RFTreeNode* treeNode, vector testSample) { int actualSampleOutputClass = testSample[numFeatures]; int nodePredictedOutputClass = treeNode->outputClass; if (actualSampleOutputClass != nodePredictedOutputClass) { treeNode->testSampleMisclassificationCount++; map::iterator it = nodeMisclassificationCounts.find(treeNode->nodeId); if (it == nodeMisclassificationCounts.end()) { // NOT FOUND nodeMisclassificationCounts[treeNode->nodeId] = 0; } nodeMisclassificationCounts[treeNode->nodeId]++; } if (treeNode->checkIsLeaf() == false) { // NOT A LEAF int sampleSplitFeatureValue = testSample[treeNode->splitFeatureIndex]; if (sampleSplitFeatureValue < treeNode->splitFeatureValue) { updateMisclassificationCountRecursively(treeNode->leftChildNode, testSample); } else { updateMisclassificationCountRecursively(treeNode->rightChildNode, testSample); } } } /***********************************************************************/ void DecisionTree::updateOutputClassOfNode(RFTreeNode* treeNode) { vector counts(numOutputClasses, 0); for (int i = 0; i < treeNode->bootstrappedOutputVector.size(); i++) { int bootstrappedOutput = treeNode->bootstrappedOutputVector[i]; counts[bootstrappedOutput]++; } vector::iterator majorityVotedOutputClassCountIterator = max_element(counts.begin(), counts.end()); int majorityVotedOutputClassCount = *majorityVotedOutputClassCountIterator; vector::iterator it = find(counts.begin(), counts.end(), majorityVotedOutputClassCount); int majorityVotedOutputClass = (int)(it - counts.begin()); treeNode->setOutputClass(majorityVotedOutputClass); } /***********************************************************************/ 
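The split selection in `findAndUpdateBestFeatureToSplitOn` can be read in isolation as the following sketch. It is only an illustration under stated assumptions: `chooseSplitFeature` is a hypothetical free function that does not exist in mothur, it returns -1 for an unrecognised criterion (the original leaves that case as a TODO), and it treats a zero intrinsic value as "no usable split", which the original code does not guard against.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not part of mothur. Returns the position (within the
// candidate list) of the feature to split on, or -1 if the criterion is unknown
// or the inputs are unusable.
int chooseSplitFeature(const std::vector<double>& entropies,
                       const std::vector<double>& intrinsicValues,
                       double ownEntropy,
                       const std::string& criterion) {
    if (entropies.empty() || entropies.size() != intrinsicValues.size()) { return -1; }

    if (criterion == "infogain") {
        // Maximising information gain is the same as minimising the split entropy.
        return (int)(std::min_element(entropies.begin(), entropies.end()) - entropies.begin());
    }

    if (criterion == "gainratio") {
        std::vector<double> gainRatios;
        for (size_t i = 0; i < entropies.size(); i++) {
            double informationGain = ownEntropy - entropies[i];
            // Assumption: a zero intrinsic value yields a gain ratio of 0
            // instead of dividing by zero.
            gainRatios.push_back(intrinsicValues[i] > 0.0 ? informationGain / intrinsicValues[i] : 0.0);
        }
        return (int)(std::max_element(gainRatios.begin(), gainRatios.end()) - gainRatios.begin());
    }

    return -1;  // unknown criterion: the caller must not index with this value
}
```

Returning -1 and checking it at the call site is one possible way to address the "abort mechanism" TODO, since the original would otherwise index `featureSubsetSplitValues` with -1.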
mothur-1.36.1/source/randomforest/decisiontree.hpp000066400000000000000000000053121255543666200223300ustar00rootroot00000000000000
//
//  decisiontree.hpp
//  rrf-fs-prototype
//
//  Created by Abu Zaher Faridee on 5/28/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#ifndef RF_DECISIONTREE_HPP
#define RF_DECISIONTREE_HPP

#include "macros.h"
#include "rftreenode.hpp"
#include "abstractdecisiontree.hpp"

/***********************************************************************/

// Sorts (feature index, importance) pairs by importance, largest first.
struct VariableRankDescendingSorter {
    bool operator() (const pair<int, int>& firstPair, const pair<int, int>& secondPair){
        return firstPair.second > secondPair.second;
    }
};

struct VariableRankDescendingSorterDouble {
    bool operator() (const pair<int, double>& firstPair, const pair<int, double>& secondPair){
        return firstPair.second > secondPair.second;
    }
};

/***********************************************************************/

class DecisionTree: public AbstractDecisionTree{

    friend class RandomForest;

public:

    DecisionTree(vector< vector<int> >& baseDataSet,
                 vector<int> globalDiscardedFeatureIndices,
                 OptimumFeatureSubsetSelector optimumFeatureSubsetSelector,
                 string treeSplitCriterion,
                 float featureStandardDeviationThreshold);

    virtual ~DecisionTree(){ deleteTreeNodesRecursively(rootNode); }

    int calcTreeVariableImportanceAndError(int& numCorrect, double& treeErrorRate);
    int evaluateSample(vector<int> testSample);
    int calcTreeErrorRate(int& numCorrect, double& treeErrorRate);

    void randomlyShuffleAttribute(const vector< vector<int> >& samples,
                                  const int featureIndex,
                                  const int prevFeatureIndex,
                                  vector< vector<int> >& shuffledSample);

    void purgeDataSetsFromTree() { purgeTreeNodesDataRecursively(rootNode); }
    int purgeTreeNodesDataRecursively(RFTreeNode* treeNode);

    void pruneTree(double pruneAggressiveness);
    void pruneRecursively(RFTreeNode* treeNode, double pruneAggressiveness);
    void updateMisclassificationCountRecursively(RFTreeNode* treeNode, vector<int> testSample);
    void updateOutputClassOfNode(RFTreeNode* treeNode);

private:

    void buildDecisionTree();
    int
splitRecursively(RFTreeNode* rootNode); int findAndUpdateBestFeatureToSplitOn(RFTreeNode* node); vector selectFeatureSubsetRandomly(vector globalDiscardedFeatureIndices, vector localDiscardedFeatureIndices); int printTree(RFTreeNode* treeNode, string caption); void deleteTreeNodesRecursively(RFTreeNode* treeNode); vector variableImportanceList; map outOfBagEstimates; float featureStandardDeviationThreshold; }; #endif mothur-1.36.1/source/randomforest/forest.cpp000066400000000000000000000057761255543666200211660ustar00rootroot00000000000000// // forest.cpp // Mothur // // Created by Kathryn Iverson on 10/26/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "forest.h" /***********************************************************************/ Forest::Forest(const std::vector < std::vector > dataSet, const int numDecisionTrees, const string treeSplitCriterion = "gainratio", const bool doPruning = false, const float pruneAggressiveness = 0.9, const bool discardHighErrorTrees = true, const float highErrorTreeDiscardThreshold = 0.4, const string optimumFeatureSubsetSelectionCriteria = "log2", const float featureStandardDeviationThreshold = 0.0) : dataSet(dataSet), numDecisionTrees(numDecisionTrees), numSamples((int)dataSet.size()), numFeatures((int)(dataSet[0].size() - 1)), globalVariableImportanceList(numFeatures, 0), treeSplitCriterion(treeSplitCriterion), doPruning(doPruning), pruneAggressiveness(pruneAggressiveness), discardHighErrorTrees(discardHighErrorTrees), highErrorTreeDiscardThreshold(highErrorTreeDiscardThreshold), optimumFeatureSubsetSelectionCriteria(optimumFeatureSubsetSelectionCriteria), featureStandardDeviationThreshold(featureStandardDeviationThreshold) { m = MothurOut::getInstance(); globalDiscardedFeatureIndices = getGlobalDiscardedFeatureIndices(); // TODO: double check if the implemenatation of 'globalOutOfBagEstimates' is correct } /***********************************************************************/ vector 
Forest::getGlobalDiscardedFeatureIndices() { try { //vector globalDiscardedFeatureIndices; //globalDiscardedFeatureIndices.push_back(1); // calculate feature vectors vector< vector > featureVectors(numFeatures, vector(numSamples, 0) ); for (int i = 0; i < numSamples; i++) { if (m->control_pressed) { return globalDiscardedFeatureIndices; } for (int j = 0; j < numFeatures; j++) { featureVectors[j][i] = dataSet[i][j]; } } for (int i = 0; i < featureVectors.size(); i++) { if (m->control_pressed) { return globalDiscardedFeatureIndices; } double standardDeviation = m->getStandardDeviation(featureVectors[i]); if (standardDeviation <= featureStandardDeviationThreshold){ globalDiscardedFeatureIndices.push_back(i); } } if (m->debug) { m->mothurOut("number of global discarded features: " + toString(globalDiscardedFeatureIndices.size())+ "\n"); m->mothurOut("total features: " + toString(featureVectors.size())+ "\n"); } return globalDiscardedFeatureIndices; } catch(exception& e) { m->errorOut(e, "Forest", "getGlobalDiscardedFeatureIndices"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/randomforest/forest.h000066400000000000000000000055031255543666200206170ustar00rootroot00000000000000// // forest.h // Mothur // // Created by Kathryn Iverson on 10/26/12. Modified abstractrandomforest // Copyright (c) 2012 Schloss Lab. All rights reserved. 
//

#ifndef __Mothur__forest__
#define __Mothur__forest__

#include "mothurout.h"
#include "macros.h"
#include "decisiontree.hpp"
#include "abstractdecisiontree.hpp"

/***********************************************************************/
// this is a re-implementation of the abstractrandomforest class

class Forest{
public:
    // initialization with vectors
    Forest(const std::vector < std::vector<int> > dataSet,
           const int numDecisionTrees,
           const string treeSplitCriterion,
           const bool doPruning,
           const float pruneAggressiveness,
           const bool discardHighErrorTrees,
           const float highErrorTreeDiscardThreshold,
           const string optimumFeatureSubsetSelectionCriteria,
           const float featureStandardDeviationThreshold);
    virtual ~Forest(){ }
    virtual int populateDecisionTrees() = 0;
    virtual int calcForrestErrorRate() = 0;
    virtual int calcForrestVariableImportance(string) = 0;
    virtual int updateGlobalOutOfBagEstimates(DecisionTree* decisionTree) = 0;

    /***********************************************************************/

protected:

    // TODO: create a better way of discarding features.
    // Currently we just set FEATURE_DISCARD_SD_THRESHOLD to 0 to solve this;
    // it can be tuned for better selection.
    // Other statistics, such as the mean, might also be useful.
    // The same applies to createLocalDiscardedFeatureList in the TreeNode class.

    // TODO: Another idea is to aggregate the discarded feature indices after the run by combining
    // the local discarded feature indices.
    // This would penalize a feature even if it looks quite good in the global space.
    // The penalization would be averaged, so it would be unlikely to create a local optimum.

    vector<int> getGlobalDiscardedFeatureIndices();

    int numDecisionTrees;
    int numSamples;
    int numFeatures;
    vector< vector<int> > dataSet;
    vector<int> globalDiscardedFeatureIndices;
    vector<double> globalVariableImportanceList;
    string treeSplitCriterion;

    bool doPruning;
    float pruneAggressiveness;
    bool discardHighErrorTrees;
    float highErrorTreeDiscardThreshold;
    string
optimumFeatureSubsetSelectionCriteria;
    float featureStandardDeviationThreshold;

    // Maps each sample index to the number of times each class was predicted for that sample,
    // e.g. 1 => [2 7] means sample 1 was predicted as class 0 twice and as class 1 seven times.
    map<int, vector<int> > globalOutOfBagEstimates;

    // TODO: fix this, do we use pointers?
    vector<AbstractDecisionTree*> decisionTrees;

    MothurOut* m;

private:

};

#endif /* defined(__Mothur__forest__) */
mothur-1.36.1/source/randomforest/macros.h000066400000000000000000000014151255543666200205770ustar00rootroot00000000000000
//
//  macros.h
//  rrf-fs-prototype
//
//  Created by Abu Zaher Faridee on 5/28/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#ifndef RF_MACROS_H
#define RF_MACROS_H

#include "mothurout.h"

/***********************************************************************/

class OptimumFeatureSubsetSelector{
public:
    OptimumFeatureSubsetSelector(string selectionType = "log2"): selectionType(selectionType){
    }

    // Returns the number of features to sample at each split, or -1 for an unknown selection type.
    int getOptimumFeatureSubsetSize(int numFeatures){
        if (selectionType == "log2"){ return (int)ceil(log2(numFeatures)); }
        else if (selectionType == "squareRoot"){ return (int)ceil(sqrt(numFeatures)); }
        return -1;
    }
private:
    string selectionType;
};

/***********************************************************************/

#endif
mothur-1.36.1/source/randomforest/randomforest.cpp000066400000000000000000000324641255543666200223610ustar00rootroot00000000000000
//
//  randomforest.cpp
//  Mothur
//
//  Created by Sarah Westcott on 10/2/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
// #include "randomforest.hpp" /***********************************************************************/ RandomForest::RandomForest(const vector > dataSet, const int numDecisionTrees, const string treeSplitCriterion = "gainratio", const bool doPruning = false, const float pruneAggressiveness = 0.9, const bool discardHighErrorTrees = true, const float highErrorTreeDiscardThreshold = 0.4, const string optimumFeatureSubsetSelectionCriteria = "log2", const float featureStandardDeviationThreshold = 0.0) : Forest(dataSet, numDecisionTrees, treeSplitCriterion, doPruning, pruneAggressiveness, discardHighErrorTrees, highErrorTreeDiscardThreshold, optimumFeatureSubsetSelectionCriteria, featureStandardDeviationThreshold) { m = MothurOut::getInstance(); } /***********************************************************************/ // DONE int RandomForest::calcForrestErrorRate() { try { int numCorrect = 0; for (map >::iterator it = globalOutOfBagEstimates.begin(); it != globalOutOfBagEstimates.end(); it++) { if (m->control_pressed) { return 0; } int indexOfSample = it->first; vector predictedOutComes = it->second; vector::iterator maxPredictedOutComeIterator = max_element(predictedOutComes.begin(), predictedOutComes.end()); int majorityVotedOutcome = (int)(maxPredictedOutComeIterator - predictedOutComes.begin()); int realOutcome = dataSet[indexOfSample][numFeatures]; if (majorityVotedOutcome == realOutcome) { numCorrect++; } } // TODO: save or return forrestErrorRate for future use; double forrestErrorRate = 1 - ((double)numCorrect / (double)globalOutOfBagEstimates.size()); m->mothurOut("numCorrect = " + toString(numCorrect)+ "\n"); m->mothurOut("forrestErrorRate = " + toString(forrestErrorRate)+ "\n"); return 0; } catch(exception& e) { m->errorOut(e, "RandomForest", "calcForrestErrorRate"); exit(1); } } /***********************************************************************/ int RandomForest::printConfusionMatrix(map intToTreatmentMap) { try { int numGroups = 
intToTreatmentMap.size(); vector > cm(numGroups, vector(numGroups, 0)); for (map >::iterator it = globalOutOfBagEstimates.begin(); it != globalOutOfBagEstimates.end(); it++) { if (m->control_pressed) { return 0; } int indexOfSample = it->first; //key vector predictedOutComes = it->second; //value, vector of all predicted classes vector::iterator maxPredictedOutComeIterator = max_element(predictedOutComes.begin(), predictedOutComes.end()); int majorityVotedOutcome = (int)(maxPredictedOutComeIterator - predictedOutComes.begin()); int realOutcome = dataSet[indexOfSample][numFeatures]; cm[realOutcome][majorityVotedOutcome] = cm[realOutcome][majorityVotedOutcome] + 1; } vector fw; for (int w = 0; w mothurOut("confusion matrix:\n\t\t"); for (int k = 0; k < numGroups; k++) { //m->mothurOut(intToTreatmentMap[k] + "\t"); cout << setw(fw[k]) << intToTreatmentMap[k] << "\t"; } for (int i = 0; i < numGroups; i++) { cout << "\n" << setw(fw[i]) << intToTreatmentMap[i] << "\t"; //m->mothurOut("\n" + intToTreatmentMap[i] + "\t"); if (m->control_pressed) { return 0; } for (int j = 0; j < numGroups; j++) { //m->mothurOut(toString(cm[i][j]) + "\t"); cout << setw(fw[i]) << cm[i][j] << "\t"; } } //m->mothurOut("\n"); cout << "\n"; return 0; } catch(exception& e) { m->errorOut(e, "RandomForest", "printConfusionMatrix"); exit(1); } } /***********************************************************************/ int RandomForest::getMissclassifications(string filename, map intToTreatmentMap, vector names) { try { ofstream out; m->openOutputFile(filename, out); out <<"Sample\tRF classification\tActual classification\n"; for (map >::iterator it = globalOutOfBagEstimates.begin(); it != globalOutOfBagEstimates.end(); it++) { if (m->control_pressed) { return 0; } int indexOfSample = it->first; vector predictedOutComes = it->second; vector::iterator maxPredictedOutComeIterator = max_element(predictedOutComes.begin(), predictedOutComes.end()); int majorityVotedOutcome = 
(int)(maxPredictedOutComeIterator - predictedOutComes.begin()); int realOutcome = dataSet[indexOfSample][numFeatures]; if (majorityVotedOutcome != realOutcome) { out << names[indexOfSample] << "\t" << intToTreatmentMap[majorityVotedOutcome] << "\t" << intToTreatmentMap[realOutcome] << endl; } } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "RandomForest", "getMissclassifications"); exit(1); } } /***********************************************************************/ int RandomForest::calcForrestVariableImportance(string filename) { try { // follow the link: http://en.wikipedia.org/wiki/Dynamic_cast //if you are going to dynamically cast, aren't you undoing the advantage of abstraction. Why abstract at all? //could cause maintenance issues later if other types of Abstract decison trees are created that cannot be cast as a decision tree. for (int i = 0; i < decisionTrees.size(); i++) { if (m->control_pressed) { return 0; } DecisionTree* decisionTree = dynamic_cast(decisionTrees[i]); for (int j = 0; j < numFeatures; j++) { globalVariableImportanceList[j] += (double)decisionTree->variableImportanceList[j]; } } for (int i = 0; i < numFeatures; i++) { globalVariableImportanceList[i] /= (double)numDecisionTrees; } vector< pair > globalVariableRanks; for (int i = 0; i < globalVariableImportanceList.size(); i++) { //cout << "[" << i << ',' << globalVariableImportanceList[i] << "], "; if (globalVariableImportanceList[i] > 0) { pair globalVariableRank(0, 0.0); globalVariableRank.first = i; globalVariableRank.second = globalVariableImportanceList[i]; globalVariableRanks.push_back(globalVariableRank); } } // for (int i = 0; i < globalVariableRanks.size(); i++) { // cout << m->currentBinLabels[(int)globalVariableRanks[i][0]] << '\t' << globalVariableImportanceList[globalVariableRanks[i][0]] << endl; // } VariableRankDescendingSorterDouble variableRankDescendingSorter; sort(globalVariableRanks.begin(), globalVariableRanks.end(), variableRankDescendingSorter); 
ofstream out; m->openOutputFile(filename, out); out <<"OTU\tMean decrease accuracy\n"; for (int i = 0; i < globalVariableRanks.size(); i++) { out << m->currentSharedBinLabels[(int)globalVariableRanks[i].first] << '\t' << globalVariableImportanceList[globalVariableRanks[i].first] << endl; } out.close(); return 0; } catch(exception& e) { m->errorOut(e, "RandomForest", "calcForrestVariableImportance"); exit(1); } } /***********************************************************************/ int RandomForest::populateDecisionTrees() { try { vector errorRateImprovements; for (int i = 0; i < numDecisionTrees; i++) { if (m->control_pressed) { return 0; } if (((i+1) % 100) == 0) { m->mothurOut("Creating " + toString(i+1) + " (th) Decision tree\n"); } // TODO: need to first fix if we are going to use pointer based system or anything else DecisionTree* decisionTree = new DecisionTree(dataSet, globalDiscardedFeatureIndices, OptimumFeatureSubsetSelector(optimumFeatureSubsetSelectionCriteria), treeSplitCriterion, featureStandardDeviationThreshold); if (m->debug && doPruning) { m->mothurOut("Before pruning\n"); decisionTree->printTree(decisionTree->rootNode, "ROOT"); } int numCorrect; double treeErrorRate; decisionTree->calcTreeErrorRate(numCorrect, treeErrorRate); double prePrunedErrorRate = treeErrorRate; if (m->debug) { m->mothurOut("treeErrorRate: " + toString(treeErrorRate) + " numCorrect: " + toString(numCorrect) + "\n"); } if (doPruning) { decisionTree->pruneTree(pruneAggressiveness); if (m->debug) { m->mothurOut("After pruning\n"); decisionTree->printTree(decisionTree->rootNode, "ROOT"); } decisionTree->calcTreeErrorRate(numCorrect, treeErrorRate); } double postPrunedErrorRate = treeErrorRate; decisionTree->calcTreeVariableImportanceAndError(numCorrect, treeErrorRate); double errorRateImprovement = (prePrunedErrorRate - postPrunedErrorRate) / prePrunedErrorRate; if (m->debug) { m->mothurOut("treeErrorRate: " + toString(treeErrorRate) + " numCorrect: " + toString(numCorrect) 
+ "\n"); if (doPruning) { m->mothurOut("errorRateImprovement: " + toString(errorRateImprovement) + "\n"); } } if (discardHighErrorTrees) { if (treeErrorRate < highErrorTreeDiscardThreshold) { updateGlobalOutOfBagEstimates(decisionTree); decisionTree->purgeDataSetsFromTree(); decisionTrees.push_back(decisionTree); if (doPruning) { errorRateImprovements.push_back(errorRateImprovement); } } else { delete decisionTree; } } else { updateGlobalOutOfBagEstimates(decisionTree); decisionTree->purgeDataSetsFromTree(); decisionTrees.push_back(decisionTree); if (doPruning) { errorRateImprovements.push_back(errorRateImprovement); } } } double avgErrorRateImprovement = -1.0; if (errorRateImprovements.size() > 0) { avgErrorRateImprovement = accumulate(errorRateImprovements.begin(), errorRateImprovements.end(), 0.0); // cout << "Total " << avgErrorRateImprovement << " size " << errorRateImprovements.size() << endl; avgErrorRateImprovement /= errorRateImprovements.size(); } if (m->debug && doPruning) { m->mothurOut("avgErrorRateImprovement:" + toString(avgErrorRateImprovement) + "\n"); } // m->mothurOut("globalOutOfBagEstimates = " + toStringVectorMap(globalOutOfBagEstimates)+ "\n"); return 0; } catch(exception& e) { m->errorOut(e, "RandomForest", "populateDecisionTrees"); exit(1); } } /***********************************************************************/ // TODO: need to finalize bettween reference and pointer for DecisionTree [partially solved] // DONE: make this pure virtual in superclass // DONE int RandomForest::updateGlobalOutOfBagEstimates(DecisionTree* decisionTree) { try { for (map::iterator it = decisionTree->outOfBagEstimates.begin(); it != decisionTree->outOfBagEstimates.end(); it++) { if (m->control_pressed) { return 0; } int indexOfSample = it->first; int predictedOutcomeOfSample = it->second; if (globalOutOfBagEstimates.count(indexOfSample) == 0) { globalOutOfBagEstimates[indexOfSample] = vector(decisionTree->numOutputClasses, 0); }; 
            globalOutOfBagEstimates[indexOfSample][predictedOutcomeOfSample] += 1;
        }
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RandomForest", "updateGlobalOutOfBagEstimates");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/randomforest/randomforest.hpp000066400000000000000000000035531255543666200223630ustar00rootroot00000000000000
//
//  randomforest.hpp
//  rrf-fs-prototype
//
//  Created by Abu Zaher Faridee on 7/20/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#ifndef RF_RANDOMFOREST_HPP
#define RF_RANDOMFOREST_HPP

#include "macros.h"
#include "forest.h"
#include "decisiontree.hpp"

class RandomForest: public Forest {

public:

    RandomForest(const vector<vector<int> > dataSet,
                 const int numDecisionTrees,
                 const string treeSplitCriterion,
                 const bool doPruning,
                 const float pruneAggressiveness,
                 const bool discardHighErrorTrees,
                 const float highErrorTreeDiscardThreshold,
                 const string optimumFeatureSubsetSelectionCriteria,
                 const float featureStandardDeviationThreshold);

    // NOTE: if you are going to dynamically cast, aren't you undoing the advantage of abstraction? Why abstract at all?
    // This could cause maintenance issues later if other types of abstract decision trees are created that cannot be cast as a decision tree.
    virtual ~RandomForest() {
        for (vector<AbstractDecisionTree*>::iterator it = decisionTrees.begin();
             it != decisionTrees.end(); it++) {
            // we know that this is a decision tree, so we can do a dynamic_cast here
            DecisionTree* decisionTree = dynamic_cast<DecisionTree*>(*it);
            // calling the destructor by deleting
            delete decisionTree;
        }
    }

    int calcForrestErrorRate();
    int calcForrestVariableImportance(string);
    int populateDecisionTrees();
    int updateGlobalOutOfBagEstimates(DecisionTree* decisionTree);
    int printConfusionMatrix(map<int, string> intToTreatmentMap);
    int getMissclassifications(string, map<int, string> intToTreatmentMap, vector<string> names);

private:
    MothurOut* m;

};

#endif
mothur-1.36.1/source/randomforest/regularizeddecisiontree.cpp000066400000000000000000000002771255543666200245660ustar00rootroot00000000000000
//
//  regularizeddecisiontree.cpp
//  Mothur
//
//  Created by Kathryn Iverson on 11/16/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "regularizeddecisiontree.h"
mothur-1.36.1/source/randomforest/regularizeddecisiontree.h000066400000000000000000000005761255543666200242350ustar00rootroot00000000000000
//
//  regularizeddecisiontree.h
//  Mothur
//
//  Created by Kathryn Iverson on 11/16/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#ifndef __Mothur__regularizeddecisiontree__
#define __Mothur__regularizeddecisiontree__

#include
#include "rftreenode.hpp"
#include "abstractdecisiontree.hpp"

#endif /* defined(__Mothur__regularizeddecisiontree__) */
mothur-1.36.1/source/randomforest/regularizedrandomforest.cpp000066400000000000000000000032171255543666200246110ustar00rootroot00000000000000
//
//  regularizedrandomforest.cpp
//  Mothur
//
//  Created by Kathryn Iverson on 11/16/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "regularizedrandomforest.h"

RegularizedRandomForest::RegularizedRandomForest(const vector<vector<int> > dataSet,
                                                 const int numDecisionTrees,
                                                 const string treeSplitCriterion = "gainratio")
    // TODO: update ctor according to basic RandomForest Class
    : Forest(dataSet, numDecisionTrees, treeSplitCriterion, false, 0.9, true, 0.4, "log2", 0.0) {
    m = MothurOut::getInstance();
}

int RegularizedRandomForest::calcForrestErrorRate() {
    // Stub: not implemented yet, always returns 0.
    try {
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RegularizedRandomForest", "calcForrestErrorRate");
        exit(1);
    }
}

int RegularizedRandomForest::calcForrestVariableImportance(string filename) {
    // Stub: not implemented yet, always returns 0.
    try {
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RegularizedRandomForest", "calcForrestVariableImportance");
        exit(1);
    }
}

int RegularizedRandomForest::populateDecisionTrees() {
    // Stub: not implemented yet, always returns 0.
    try {
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RegularizedRandomForest", "populateDecisionTrees");
        exit(1);
    }
}

int RegularizedRandomForest::updateGlobalOutOfBagEstimates(DecisionTree *decisionTree) {
    // Stub: not implemented yet, always returns 0.
    try {
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RegularizedRandomForest", "updateGlobalOutOfBagEstimates");
        exit(1);
    }
}
mothur-1.36.1/source/randomforest/regularizedrandomforest.h000066400000000000000000000013501255543666200242520ustar00rootroot00000000000000
//
//  regularizedrandomforest.h
//  Mothur
//
//  Created by Kathryn Iverson on 11/16/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
// #ifndef __Mothur__regularizedrandomforest__ #define __Mothur__regularizedrandomforest__ #include "forest.h" #include "decisiontree.hpp" class RegularizedRandomForest: public Forest { public: // RegularizedRandomForest(const vector > dataSet,const int numDecisionTrees, const string); int calcForrestErrorRate(); int calcForrestVariableImportance(string); int populateDecisionTrees(); int updateGlobalOutOfBagEstimates(DecisionTree* decisionTree); private: // MothurOut* m; }; #endif /* defined(__Mothur__regularizedrandomforest__) */ mothur-1.36.1/source/randomforest/rftreenode.cpp000066400000000000000000000100401255543666200217750ustar00rootroot00000000000000// // rftreenode.cpp // Mothur // // Created by Sarah Westcott on 10/2/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "rftreenode.hpp" /***********************************************************************/ RFTreeNode::RFTreeNode(vector< vector > bootstrappedTrainingSamples, vector globalDiscardedFeatureIndices, int numFeatures, int numSamples, int numOutputClasses, int generation, int nodeId, float featureStandardDeviationThreshold) : bootstrappedTrainingSamples(bootstrappedTrainingSamples), globalDiscardedFeatureIndices(globalDiscardedFeatureIndices), numFeatures(numFeatures), numSamples(numSamples), numOutputClasses(numOutputClasses), generation(generation), isLeaf(false), outputClass(-1), nodeId(nodeId), testSampleMisclassificationCount(0), splitFeatureIndex(-1), splitFeatureValue(-1), splitFeatureEntropy(-1.0), ownEntropy(-1.0), featureStandardDeviationThreshold(featureStandardDeviationThreshold), bootstrappedFeatureVectors(numFeatures, vector(numSamples, 0)), bootstrappedOutputVector(numSamples, 0), leftChildNode(NULL), rightChildNode(NULL), parentNode(NULL) { m = MothurOut::getInstance(); for (int i = 0; i < numSamples; i++) { // just doing a simple transpose of the matrix if (m->control_pressed) { break; } for (int j = 0; j < numFeatures; j++) { bootstrappedFeatureVectors[j][i] 
= bootstrappedTrainingSamples[i][j];
        }
    }

    for (int i = 0; i < numSamples; i++) {
        if (m->control_pressed) { break; }
        bootstrappedOutputVector[i] = bootstrappedTrainingSamples[i][numFeatures];
    }

    createLocalDiscardedFeatureList();
    updateNodeEntropy();
}
/***********************************************************************/
int RFTreeNode::createLocalDiscardedFeatureList(){
    try {
        for (int i = 0; i < numFeatures; i++) {
            // TODO: need to check if bootstrappedFeatureVectors == numFeatures, in python code we are using bootstrappedFeatureVectors instead of numFeatures
            if (m->control_pressed) { return 0; }

            vector<int>::iterator it = find(globalDiscardedFeatureIndices.begin(), globalDiscardedFeatureIndices.end(), i);
            if (it == globalDiscardedFeatureIndices.end()) { // NOT FOUND
                double standardDeviation = m->getStandardDeviation(bootstrappedFeatureVectors[i]);
                if (standardDeviation <= featureStandardDeviationThreshold) { localDiscardedFeatureIndices.push_back(i); }
            }
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RFTreeNode", "createLocalDiscardedFeatureList");
        exit(1);
    }
}
/***********************************************************************/
int RFTreeNode::updateNodeEntropy() {
    try {
        vector<int> classCounts(numOutputClasses, 0);
        for (int i = 0; i < bootstrappedOutputVector.size(); i++) {
            classCounts[bootstrappedOutputVector[i]]++;
        }
        int totalClassCounts = accumulate(classCounts.begin(), classCounts.end(), 0);
        double nodeEntropy = 0.0;
        for (int i = 0; i < classCounts.size(); i++) {
            if (m->control_pressed) { return 0; }
            if (classCounts[i] == 0) continue;
            double probability = (double)classCounts[i] / (double)totalClassCounts;
            nodeEntropy += -(probability * log2(probability));
        }
        ownEntropy = nodeEntropy;

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "RFTreeNode", "updateNodeEntropy");
        exit(1);
    }
}
/***********************************************************************/
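// The entropy calculation in RFTreeNode::updateNodeEntropy() can be tried in
// isolation. The block below is a minimal sketch, not mothur code: the name
// shannonEntropy and its signature are assumptions made for illustration. It
// takes per-class sample counts and returns the Shannon entropy in bits,
// skipping empty classes the same way the node code does.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of mothur): Shannon entropy, in bits,
// of a vector of per-class sample counts.
double shannonEntropy(const std::vector<int>& classCounts) {
    long total = 0;
    for (std::size_t i = 0; i < classCounts.size(); i++) {
        assert(classCounts[i] >= 0);  // counts cannot be negative
        total += classCounts[i];
    }
    if (total == 0) { return 0.0; }  // assumption: an empty node has entropy 0

    double entropy = 0.0;
    for (std::size_t i = 0; i < classCounts.size(); i++) {
        if (classCounts[i] == 0) { continue; }  // 0 * log2(0) is treated as 0
        double probability = (double)classCounts[i] / (double)total;
        entropy -= probability * std::log2(probability);
    }
    return entropy;
}
```

// A node holding a single class has entropy 0, and a node split evenly
// between two classes has entropy 1 bit, which is the largest value a
// two-class node can take.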
mothur-1.36.1/source/randomforest/rftreenode.hpp000066400000000000000000000100341255543666200220050ustar00rootroot00000000000000
//
//  rftreenode.hpp
//  rrf-fs-prototype
//
//  Created by Abu Zaher Faridee on 5/29/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#ifndef RF_RFTREENODE_HPP
#define RF_RFTREENODE_HPP

#include "mothurout.h"
#include "macros.h"

class RFTreeNode{

public:

    RFTreeNode(vector< vector<int> > bootstrappedTrainingSamples, vector<int> globalDiscardedFeatureIndices, int numFeatures, int numSamples, int numOutputClasses, int generation, int nodeId, float featureStandardDeviationThreshold = 0.0);

    virtual ~RFTreeNode(){}

    // getters
    // we need to return a const reference so that we have the actual value and not a copy,
    // plus we do not modify the value as well
    const int getSplitFeatureIndex() { return splitFeatureIndex; }
    // TODO: check if this works properly or returns a shallow copy of the data
    const vector< vector<int> >& getBootstrappedTrainingSamples() { return bootstrappedTrainingSamples; }
    const int getSplitFeatureValue() { return splitFeatureValue; }
    const int getGeneration() { return generation; }
    const bool checkIsLeaf() { return isLeaf; }
    // TODO: fix this const pointer dilemma
    // we do not want to modify the data pointed to by getLeftChildNode
    RFTreeNode* getLeftChildNode() { return leftChildNode; }
    RFTreeNode* getRightChildNode() { return rightChildNode; }
    const int getOutputClass() { return outputClass; }
    const int getNumSamples() { return numSamples; }
    const int getNumFeatures() { return numFeatures; }
    const vector<int>& getLocalDiscardedFeatureIndices() { return localDiscardedFeatureIndices; }
    const vector< vector<int> >& getBootstrappedFeatureVectors() { return bootstrappedFeatureVectors; }
    const vector<int>& getBootstrappedOutputVector() { return bootstrappedOutputVector; }
    const vector<int>& getFeatureSubsetIndices() { return featureSubsetIndices; }
    const double getOwnEntropy() { return ownEntropy; }
    const int getTestSampleMisclassificationCount() { return
testSampleMisclassificationCount; } // setters void setIsLeaf(bool isLeaf) { this->isLeaf = isLeaf; } void setOutputClass(int outputClass) { this->outputClass = outputClass; } void setFeatureSubsetIndices(vector featureSubsetIndices) { this->featureSubsetIndices = featureSubsetIndices; } void setLeftChildNode(RFTreeNode* leftChildNode) { this->leftChildNode = leftChildNode; } void setRightChildNode(RFTreeNode* rightChildNode) { this->rightChildNode = rightChildNode; } void setParentNode(RFTreeNode* parentNode) { this->parentNode = parentNode; } void setSplitFeatureIndex(int splitFeatureIndex) { this->splitFeatureIndex = splitFeatureIndex; } void setSplitFeatureValue(int splitFeatureValue) { this->splitFeatureValue = splitFeatureValue; } void setSplitFeatureEntropy(double splitFeatureEntropy) { this->splitFeatureEntropy = splitFeatureEntropy; } // TODO: need to remove this mechanism of friend class //NOTE: friend classes can be useful for testing purposes, but I would avoid using them otherwise. friend class DecisionTree; friend class AbstractDecisionTree; private: vector > bootstrappedTrainingSamples; vector globalDiscardedFeatureIndices; vector localDiscardedFeatureIndices; vector > bootstrappedFeatureVectors; vector bootstrappedOutputVector; vector featureSubsetIndices; int numFeatures; int numSamples; int numOutputClasses; int generation; bool isLeaf; int outputClass; int splitFeatureIndex; int splitFeatureValue; double splitFeatureEntropy; double ownEntropy; int nodeId; float featureStandardDeviationThreshold; int testSampleMisclassificationCount; RFTreeNode* leftChildNode; RFTreeNode* rightChildNode; RFTreeNode* parentNode; MothurOut* m; int createLocalDiscardedFeatureList(); int updateNodeEntropy(); }; #endif mothur-1.36.1/source/randomnumber.cpp000066400000000000000000000217741255543666200176460ustar00rootroot00000000000000#ifndef RANDOMNUMBER_H #define RANDOMNUMBER_H /* * randomnumber.cpp * * * Created by Pat Schloss on 7/6/11. * Copyright 2011 Patrick D. 
Schloss. All rights reserved. * */ #include "randomnumber.h" #include /**************************************************************************************************/ RandomNumberGenerator::RandomNumberGenerator(){ // srand( (unsigned)time( NULL ) ); } /**************************************************************************************************/ float RandomNumberGenerator::randomUniform(){ float randUnif = 0.0000; while(randUnif == 0.0000){ randUnif = rand() / (float)RAND_MAX; } return randUnif; } /**************************************************************************************************/ // //Code shamelessly swiped and modified from Numerical Recipes in C++ // /**************************************************************************************************/ float RandomNumberGenerator::randomExp(){ float randExp = 0.0000; while(randExp == 0.0000){ randExp = -log(randomUniform()); } return randExp; } /**************************************************************************************************/ // //Code shamelessly swiped and modified from Numerical Recipes in C++ // /**************************************************************************************************/ float RandomNumberGenerator::randomNorm(){ float x, y, rsquare; do{ x = 2.0 * randomUniform() - 1.0; y = 2.0 * randomUniform() - 1.0; rsquare = x * x + y * y; } while(rsquare >= 1.0 || rsquare == 0.0); float fac = sqrt(-2.0 * log(rsquare)/rsquare); return x * fac; } /**************************************************************************************************/ /* * Slightly modified version of: * * Mathlib : A C Library of Special Functions * Copyright (C) 1998 Ross Ihaka * Copyright (C) 2000-2005 The R Development Core Team * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. 
* * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, a copy is available at * http://www.r-project.org/Licenses/ * * SYNOPSIS * * #include * float rgamma(float a, float scale); * * DESCRIPTION * * Random variates from the gamma distribution. * * REFERENCES * * [1] Shape parameter a >= 1. Algorithm GD in: * * Ahrens, J.H. and Dieter, U. (1982). * Generating gamma variates by a modified * rejection technique. * Comm. ACM, 25, 47-54. * * * [2] Shape parameter 0 < a < 1. Algorithm GS in: * * Ahrens, J.H. and Dieter, U. (1974). * Computer methods for sampling from gamma, beta, * poisson and binomial distributions. * Computing, 12, 223-246. * * Input: a = parameter (mean) of the standard gamma distribution. * Output: a variate from the gamma(a)-distribution */ float RandomNumberGenerator::randomGamma(float a) { /* Constants : */ const static float sqrt32 = 5.656854; const static float exp_m1 = 0.36787944117144232159;/* exp(-1) = 1/e */ float scale = 1.0; /* Coefficients q[k] - for q0 = sum(q[k]*a^(-k)) * Coefficients a[k] - for q = q0+(t*t/2)*sum(a[k]*v^k) * Coefficients e[k] - for exp(q)-1 = sum(e[k]*q^k) */ const static float q1 = 0.04166669; const static float q2 = 0.02083148; const static float q3 = 0.00801191; const static float q4 = 0.00144121; const static float q5 = -7.388e-5; const static float q6 = 2.4511e-4; const static float q7 = 2.424e-4; const static float a1 = 0.3333333; const static float a2 = -0.250003; const static float a3 = 0.2000062; const static float a4 = -0.1662921; const static float a5 = 0.1423657; const static float a6 = -0.1367177; const static float a7 = 0.1233795; /* State variables [FIXME for threading!] 
:*/ static float aa = 0.; static float aaa = 0.; static float s, s2, d; /* no. 1 (step 1) */ static float q0, b, si, c;/* no. 2 (step 4) */ float e, p, q, r, t, u, v, w, x, ret_val; if (a <= 0.0 || scale <= 0.0){ cout << "error alpha or scale parameter are less than zero." << endl; exit(1); } if (a < 1.) { /* GS algorithm for parameters a < 1 */ e = 1.0 + exp_m1 * a; for(;;) { p = e * randomUniform(); if (p >= 1.0) { x = -log((e - p) / a); if (randomExp() >= (1.0 - a) * log(x)) break; } else { x = exp(log(p) / a); if (randomExp() >= x) break; } } return scale * x; } /* --- a >= 1 : GD algorithm --- */ /* Step 1: Recalculations of s2, s, d if a has changed */ if (a != aa) { aa = a; s2 = a - 0.5; s = sqrt(s2); d = sqrt32 - s * 12.0; } /* Step 2: t = standard normal deviate, x = (s,1/2) -normal deviate. */ /* immediate acceptance (i) */ t = randomNorm(); x = s + 0.5 * t; ret_val = x * x; if (t >= 0.0) return scale * ret_val; /* Step 3: u = 0,1 - uniform sample. squeeze acceptance (s) */ u = randomUniform(); if (d * u <= t * t * t) return scale * ret_val; /* Step 4: recalculations of q0, b, si, c if necessary */ if (a != aaa) { aaa = a; r = 1.0 / a; q0 = ((((((q7 * r + q6) * r + q5) * r + q4) * r + q3) * r + q2) * r + q1) * r; /* Approximation depending on size of parameter a */ /* The constants in the expressions for b, si and c */ /* were established by numerical experiments */ if (a <= 3.686) { b = 0.463 + s + 0.178 * s2; si = 1.235; c = 0.195 / s - 0.079 + 0.16 * s; } else if (a <= 13.022) { b = 1.654 + 0.0076 * s2; si = 1.68 / s + 0.275; c = 0.062 / s + 0.024; } else { b = 1.77; si = 0.75; c = 0.1515 / s; } } /* Step 5: no quotient test if x not positive */ if (x > 0.0) { /* Step 6: calculation of v and quotient q */ v = t / (s + s); if (fabs(v) <= 0.25) q = q0 + 0.5 * t * t * ((((((a7 * v + a6) * v + a5) * v + a4) * v + a3) * v + a2) * v + a1) * v; else q = q0 - s * t + 0.25 * t * t + (s2 + s2) * log(1.0 + v); /* Step 7: quotient acceptance (q) */ if (log(1.0 - 
u) <= q) return scale * ret_val; } for(;;) { /* Step 8: e = standard exponential deviate * u = 0,1 -uniform deviate * t = (b,si)-float exponential (laplace) sample */ e = randomExp(); u = randomUniform(); u = u + u - 1.0; if (u < 0.0) t = b - si * e; else t = b + si * e; /* Step 9: rejection if t < tau(1) = -0.71874483771719 */ if (t >= -0.71874483771719) { /* Step 10: calculation of v and quotient q */ v = t / (s + s); if (fabs(v) <= 0.25) q = q0 + 0.5 * t * t * ((((((a7 * v + a6) * v + a5) * v + a4) * v + a3) * v + a2) * v + a1) * v; else q = q0 - s * t + 0.25 * t * t + (s2 + s2) * log(1.0 + v); /* Step 11: hat acceptance (h) */ /* (if q not positive go to step 8) */ if (q > 0.0) { w = expm1(q); /* ^^^^^ original code had approximation with rel.err < 2e-7 */ /* if t is rejected sample again at step 8 */ if (c * fabs(u) <= w * exp(e - 0.5 * t * t)) break; } } } /* for(;;) .. until `t' is accepted */ x = s + 0.5 * t; return scale * x * x; } /**************************************************************************************************/ // // essentially swiped from http://en.wikipedia.org/wiki/Dirichlet_distribution#Random_number_generation // /**************************************************************************************************/ vector RandomNumberGenerator::randomDirichlet(vector alphas){ int nAlphas = (int)alphas.size(); vector dirs(nAlphas, 0.0000); float sum = 0.0000; for(int i=0;i randomDirichlet(vector alphas); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/rarecalc.cpp000066400000000000000000000023341255543666200167200ustar00rootroot00000000000000/* * rarecalc.cpp * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "rarecalc.h" /***********************************************************************/ EstOutput RareCalc::getValues(int n){ try { EstOutput output(3,0); double richness = (double)numBins; double varS = 0.0000; double varTerm1 = 0.0000; double varTerm2 = 0.0000; double rSummation = 0; for(int i=0;iget(i); rSummation += (bMatrix[N_ni][n]); varTerm1 += (bMatrix[N_ni][n] * (1.0 - bMatrix[N_ni][n] / bMatrix[numSeqs][n])); for(int j=i+1;jget(j)][n] - bMatrix[N_ni][n] * bMatrix[numSeqs-bins->get(j)][n] / bMatrix[numSeqs][n]); } } richness -= (rSummation / bMatrix[numSeqs][n]); varS = (varTerm1 + 2 * varTerm2) / bMatrix[numSeqs][n]; float sd = pow(varS, 0.5); output[0] = richness; output[1] = richness - 1.96 * sd; output[2] = richness + 1.96 * sd; return output; } catch(exception& e) { m->errorOut(e, "RareCalc", "getValues"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/rarecalc.h000066400000000000000000000014501255543666200163630ustar00rootroot00000000000000#ifndef RARECALC_H #define RARECALC_H /* * rarecalc.h * Dotur * * Created by Sarah Westcott on 1/7/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ /* This class is not currently used by Mothur */ #include "calculator.h" /***********************************************************************/ class RareCalc { public: RareCalc(RAbundVector* b) : bins(b), numSeqs(b->getNumSeqs()), maxRank(b->getMaxRank()), numBins(b->getNumBins()) { m = MothurOut::getInstance(); bMatrix = m->binomial(numSeqs+1); } EstOutput getValues(int); string getName() { return "rarecalc"; } private: RAbundVector* bins; vector > bMatrix; int numSeqs, maxRank, numBins; MothurOut* m; }; /***********************************************************************/ #endif mothur-1.36.1/source/raredisplay.cpp000066400000000000000000000111201255543666200174540ustar00rootroot00000000000000/* * raredisplay.cpp * Dotur * * Created by Sarah Westcott on 11/18/08. 
* Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "raredisplay.h" /***********************************************************************/ void RareDisplay::init(string label){ try { this->label = label; } catch(exception& e) { m->errorOut(e, "RareDisplay", "init"); exit(1); } } /***********************************************************************/ void RareDisplay::update(SAbundVector* rank){ try { int newNSeqs = rank->getNumSeqs(); vector data = estimate->getValues(rank); map >::iterator it = results.find(newNSeqs); if (it == results.end()) { //first iter for this count vector temp; temp.push_back(data[0]); results[newNSeqs] = temp; }else { it->second.push_back(data[0]); } } catch(exception& e) { m->errorOut(e, "RareDisplay", "update"); exit(1); } } /***********************************************************************/ void RareDisplay::update(vector shared, int numSeqs, int numGroupComb) { try { vector data = estimate->getValues(shared); map >::iterator it = results.find(numSeqs); if (it == results.end()) { //first iter for this count vector temp; temp.push_back(data[0]); results[numSeqs] = temp; }else { it->second.push_back(data[0]); } } catch(exception& e) { m->errorOut(e, "RareDisplay", "update"); exit(1); } } /***********************************************************************/ void RareDisplay::reset(){ try { nIters++; } catch(exception& e) { m->errorOut(e, "RareDisplay", "reset"); exit(1); } } /***********************************************************************/ void RareDisplay::close(){ try { output->initFile(label); for (map >::iterator it = results.begin(); it != results.end(); it++) { vector data(3,0); sort((it->second).begin(), (it->second).end()); vector thisResults = it->second; double meanResults = m->getAverage(thisResults); data[0] = meanResults; data[1] = (it->second)[(int)(0.025*(nIters-1))]; data[2] = (it->second)[(int)(0.975*(nIters-1))]; //cout << nIters << '\t' << (int)(0.025*(nIters-1)) << '\t' 
<< (int)(0.975*(nIters-1)) << endl; //cout << it->first << '\t' << data[1] << '\t' << data[2] << endl; output->output(it->first, data); } nIters = 1; results.clear(); output->resetFile(); } catch(exception& e) { m->errorOut(e, "RareDisplay", "close"); exit(1); } } /***********************************************************************/ void RareDisplay::inputTempFiles(string filename){ try { ifstream in; m->openInputFile(filename, in); int thisIters, size; in >> thisIters >> size; m->gobble(in); nIters += thisIters; for (int i = 0; i < size; i++) { int tempCount; in >> tempCount; m->gobble(in); map >::iterator it = results.find(tempCount); if (it != results.end()) { for (int j = 0; j < thisIters; j++) { double value; in >> value; m->gobble(in); (it->second).push_back(value); } }else { vector tempValues; for (int j = 0; j < thisIters; j++) { double value; in >> value; m->gobble(in); tempValues.push_back(value); } results[tempCount] = tempValues; } } in.close(); } catch(exception& e) { m->errorOut(e, "RareDisplay", "inputTempFiles"); exit(1); } } /***********************************************************************/ void RareDisplay::outputTempFiles(string filename){ try { ofstream out; m->openOutputFile(filename, out); out << nIters-1 << '\t' << results.size() << endl; for (map >::iterator it = results.begin(); it != results.end(); it++) { out << it->first; for(int i = 0; i < (it->second).size(); i++) { out << '\t' << (it->second)[i]; } out << endl; } out.close(); } catch(exception& e) { m->errorOut(e, "RareDisplay", "outputTempFiles"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/raredisplay.h000066400000000000000000000015671255543666200171370ustar00rootroot00000000000000#ifndef RAREDISPLAY_H #define RAREDISPLAY_H #include "sabundvector.hpp" #include "calculator.h" #include "fileoutput.h" #include "display.h" /***********************************************************************/ class RareDisplay 
: public Display { public: RareDisplay(Calculator* calc, FileOutput* file) : estimate(calc), output(file), nIters(1) {}; ~RareDisplay() { delete estimate; delete output; }; void init(string); void reset(); void update(SAbundVector*); void update(vector shared, int numSeqs, int numGroupComb); void close(); bool isCalcMultiple() { return estimate->getMultiple(); } void outputTempFiles(string); void inputTempFiles(string); private: Calculator* estimate; FileOutput* output; string label; map > results; //maps seqCount to results for that number of sequences int nIters; }; #endif mothur-1.36.1/source/rarefact.cpp000066400000000000000000000226761255543666200167460ustar00rootroot00000000000000/* * rarefact.cpp * Dotur * * Created by Sarah Westcott on 11/18/08. * Copyright 2008 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "rarefact.h" //#include "ordervector.hpp" /***********************************************************************/ int Rarefact::getCurve(float percentFreq = 0.01, int nIters = 1000){ try { RarefactionCurveData* rcd = new RarefactionCurveData(); for(int i=0;iregisterDisplay(displays[i]); } //convert freq percentage to number int increment = 1; if (percentFreq < 1.0) { increment = numSeqs * percentFreq; } else { increment = percentFreq; } #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) if(processors == 1){ driver(rcd, increment, nIters); }else{ vector procIters; int numItersPerProcessor = nIters / processors; //divide iters between processes for (int i = 0; i < processors; i++) { if(i == processors - 1){ numItersPerProcessor = nIters - i * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } createProcesses(procIters, rcd, increment, nIters); } #else driver(rcd, increment, nIters); #endif for(int i=0;iclose(); } delete rcd; return 0; } catch(exception& e) { m->errorOut(e, "Rarefact", "getCurve"); exit(1); } } 
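// Rarefact::getCurve divides nIters between worker processes by giving each
// one nIters / processors iterations and handing whatever is left over to the
// last process. The sketch below isolates that arithmetic so it can be checked
// on its own. The function name splitIterations is an assumption made for
// illustration and does not exist in mothur.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of mothur): mirrors the procIters loop in
// Rarefact::getCurve. Every process gets nIters / processors iterations,
// and the last process also absorbs the remainder.
std::vector<int> splitIterations(int nIters, int processors) {
    assert(processors > 0 && nIters >= 0);

    std::vector<int> procIters;
    int numItersPerProcessor = nIters / processors;  // integer division
    for (int i = 0; i < processors; i++) {
        int share = numItersPerProcessor;
        if (i == processors - 1) {
            // last process: everything not yet assigned
            share = nIters - i * numItersPerProcessor;
        }
        procIters.push_back(share);
    }
    return procIters;
}
```

// With 1000 iterations and 3 processes the split is 333, 333 and 334. When
// nIters is smaller than processors, every process except the last receives
// zero iterations, so the scheme only balances well when nIters is much
// larger than the process count.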
/***********************************************************************/ int Rarefact::driver(RarefactionCurveData* rcd, int increment, int nIters = 1000){ try { for(int iter=0;iterinit(label); } RAbundVector* lookup = new RAbundVector(order->getNumBins()); SAbundVector* rank = new SAbundVector(order->getMaxRank()+1); random_shuffle(order->begin(), order->end()); for(int i=0;icontrol_pressed) { delete lookup; delete rank; delete rcd; return 0; } int binNumber = order->get(i); int abundance = lookup->get(binNumber); rank->set(abundance, rank->get(abundance)-1); abundance++; lookup->set(binNumber, abundance); rank->set(abundance, rank->get(abundance)+1); if((i == 0) || ((i+1) % increment == 0) || (ends.count(i+1) != 0)){ rcd->updateRankData(rank); } } if((numSeqs % increment != 0) || (ends.count(numSeqs) != 0)){ rcd->updateRankData(rank); } for(int i=0;ireset(); } delete lookup; delete rank; } return 0; } catch(exception& e) { m->errorOut(e, "Rarefact", "driver"); exit(1); } } /**************************************************************************************************/ int Rarefact::createProcesses(vector& procIters, RarefactionCurveData* rcd, int increment, int nIters) { try { #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) int process = 1; vector processIDS; bool recalc = false; EstOutput results; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ driver(rcd, increment, procIters[process]); //pass numSeqs to parent for(int i=0;imothurGetpid(process) + toString(i) + ".rarefact.temp"; displays[i]->outputTempFiles(tempFile); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(process) + "\n"); processors = process; for 
(int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } //wait to die for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(toString(processIDS[i]) + toString(j) + ".rarefact.temp"); } } recalc = true; break; } } if (recalc) { //test line, also set recalc to true. //for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } for (int i=0;icontrol_pressed = false; for (int i=0;imothurRemove(toString(processIDS[i]) + toString(j) + ".rarefact.temp");}}processors=3; m->mothurOut("[ERROR]: unable to spawn the number of processes you requested, reducing number to " + toString(processors) + "\n"); vector procIters; int numItersPerProcessor = nIters / processors; //divide iters between processes for (int i = 0; i < processors; i++) { if(i == processors - 1){ numItersPerProcessor = nIters - i * numItersPerProcessor; } procIters.push_back(numItersPerProcessor); } processIDS.resize(0); process = 1; //loop through and create all the processes you want while (process != processors) { pid_t pid = fork(); if (pid > 0) { processIDS.push_back(pid); //create map from line number to pid so you can append files in correct order later process++; }else if (pid == 0){ driver(rcd, increment, procIters[process]); //pass numSeqs to parent for(int i=0;imothurGetpid(process) + toString(i) + ".rarefact.temp"; displays[i]->outputTempFiles(tempFile); } exit(0); }else { m->mothurOut("[ERROR]: unable to spawn the necessary processes."); m->mothurOutEndLine(); for (int i = 0; i < processIDS.size(); i++) { kill (processIDS[i], SIGINT); } exit(0); } } } driver(rcd, increment, procIters[0]); //force parent to wait until all the processes are done for (int i=0;i<(processors-1);i++) { int temp = processIDS[i]; wait(&temp); } //get data created by processes for (int i=0;i<(processors-1);i++) { for(int j=0;jinputTempFiles(s); m->mothurRemove(s); } } return 0; #endif } catch(exception& e) { m->errorOut(e, "Rarefact", "createProcesses"); exit(1); } } 
/***********************************************************************/ int Rarefact::getSharedCurve(float percentFreq = 0.01, int nIters = 1000){ try { SharedRarefactionCurveData* rcd = new SharedRarefactionCurveData(); label = lookup[0]->getLabel(); //register the displays for(int i=0;iregisterDisplay(displays[i]); } //if jumble is false all iters will be the same if (m->jumble == false) { nIters = 1; } //convert freq percentage to number int increment = 1; if (percentFreq < 1.0) { increment = numSeqs * percentFreq; } else { increment = percentFreq; } for(int iter=0;iterinit(label); } if (m->jumble == true) { //randomize the groups random_shuffle(lookup.begin(), lookup.end()); } //make merge the size of lookup[0] SharedRAbundVector* merge = new SharedRAbundVector(lookup[0]->size()); //make copy of lookup zero for(int i = 0; isize(); i++) { merge->set(i, lookup[0]->getAbundance(i), "merge"); } vector subset; //send each group one at a time for (int k = 0; k < lookup.size(); k++) { if (m->control_pressed) { delete merge; delete rcd; return 0; } subset.clear(); //clears out old pair of sharedrabunds //add in new pair of sharedrabunds subset.push_back(merge); subset.push_back(lookup[k]); rcd->updateSharedData(subset, k+1, numGroupComb); mergeVectors(merge, lookup[k]); } //resets output files for(int i=0;ireset(); } delete merge; } for(int i=0;iclose(); } delete rcd; return 0; } catch(exception& e) { m->errorOut(e, "Rarefact", "getSharedCurve"); exit(1); } } /**************************************************************************************/ void Rarefact::mergeVectors(SharedRAbundVector* shared1, SharedRAbundVector* shared2) { try{ for (int k = 0; k < shared1->size(); k++) { //merge new species into shared1 shared1->set(k, (shared1->getAbundance(k) + shared2->getAbundance(k)), "combo"); //set to 'combo' since this vector now contains multiple groups } } catch(exception& e) { m->errorOut(e, "Rarefact", "mergeVectors"); exit(1); } } 
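// The sampling loop in Rarefact::driver walks through the shuffled sequence
// order, updates per-bin abundances, and reports the state after the first
// sequence, after every `increment` sequences, and once more at the end. The
// block below is a simplified sketch of one such pass under stated
// assumptions: it tracks only the number of observed bins rather than a full
// SAbundVector, it leaves out the shuffle and the `ends` set, and it skips a
// duplicate final report. The name observedRichnessCurve is hypothetical and
// is not part of mothur.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of mothur): one rarefaction pass over an
// already ordered list of bin indices. Returns (sequences sampled, bins
// observed) pairs at the same checkpoints Rarefact::driver uses.
std::vector< std::pair<int, int> > observedRichnessCurve(const std::vector<int>& order, int numBins, int increment) {
    assert(increment > 0 && numBins >= 0);

    std::vector<int> abundance(numBins, 0);
    std::vector< std::pair<int, int> > curve;
    int observed = 0;
    int numSeqs = (int)order.size();

    for (int i = 0; i < numSeqs; i++) {
        int binNumber = order[i];
        assert(binNumber >= 0 && binNumber < numBins);

        if (abundance[binNumber] == 0) { observed++; }  // first sequence seen from this bin
        abundance[binNumber]++;

        if ((i == 0) || ((i + 1) % increment == 0)) {
            curve.push_back(std::make_pair(i + 1, observed));
        }
    }

    // report the full sample if the last checkpoint did not land on it
    if (numSeqs > 0 && curve.back().first != numSeqs) {
        curve.push_back(std::make_pair(numSeqs, observed));
    }
    return curve;
}
```

// Running this pass over many independently shuffled copies of `order` and
// averaging the observed counts at each checkpoint gives the rarefaction
// curve that the displays summarize.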
mothur-1.36.1/source/rarefact.h000066400000000000000000000017271255543666200164050ustar00rootroot00000000000000#ifndef RAREFACT_H #define RAREFACT_H #include "rarefactioncurvedata.h" #include "raredisplay.h" #include "ordervector.hpp" #include "mothur.h" class Rarefact { public: Rarefact(OrderVector* o, vector disp, int p, set en) : numSeqs(o->getNumSeqs()), order(o), displays(disp), label(o->getLabel()), processors(p), ends(en) { m = MothurOut::getInstance(); } Rarefact(vector shared, vector disp) : lookup(shared), displays(disp) { m = MothurOut::getInstance(); } ~Rarefact(){}; int getCurve(float, int); int getSharedCurve(float, int); private: OrderVector* order; vector displays; int numSeqs, numGroupComb, processors; string label; set ends; void mergeVectors(SharedRAbundVector*, SharedRAbundVector*); vector lookup; MothurOut* m; int createProcesses(vector&, RarefactionCurveData*, int, int); int driver(RarefactionCurveData*, int, int); }; #endif mothur-1.36.1/source/rarefactioncurvedata.h000066400000000000000000000033341255543666200210060ustar00rootroot00000000000000#ifndef RAREFACTIONCURVEDATA_H #define RAREFACTIONCURVEDATA_H #include "mothur.h" #include "sabundvector.hpp" #include "display.h" #include "observable.h" /***********************************************************************/ class RarefactionCurveData : public Observable { public: RarefactionCurveData() : rank(0) {}; void registerDisplay(Display* o) { displays.insert(o); }; void removeDisplay(Display* o) { displays.erase(o); delete o; }; SAbundVector* getRankData() { return rank; }; void rankDataChanged() { notifyDisplays(); }; void updateRankData(SAbundVector* rv) { rank = rv; rankDataChanged(); }; void notifyDisplays(){ for(set::iterator pos=displays.begin();pos!=displays.end();pos++){ (*pos)->update(rank); } }; private: set displays; SAbundVector* rank; }; /***********************************************************************/ class SharedRarefactionCurveData : public Observable { public: 
SharedRarefactionCurveData() {}; //: shared1(0), shared2(0) void registerDisplay(Display* o) { displays.insert(o); }; void removeDisplay(Display* o) { displays.erase(o); delete o; }; void SharedDataChanged() { notifyDisplays(); }; void updateSharedData(vector r, int numSeqs, int numGroupComb) { shared = r; NumSeqs = numSeqs; NumGroupComb = numGroupComb; SharedDataChanged(); }; void notifyDisplays(){ for(set::iterator pos=displays.begin();pos!=displays.end();pos++){ (*pos)->update(shared, NumSeqs, NumGroupComb); } }; private: set displays; vector shared; int NumSeqs, NumGroupComb; }; /***********************************************************************/ #endif mothur-1.36.1/source/read/000077500000000000000000000000001255543666200153515ustar00rootroot00000000000000mothur-1.36.1/source/read/formatcolumn.cpp000066400000000000000000000247641255543666200206000ustar00rootroot00000000000000/* * formatcolumn.cpp * Mothur * * Created by westcott on 1/13/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "formatcolumn.h" #include "progress.hpp" /***********************************************************************/ FormatColumnMatrix::FormatColumnMatrix(string df) : filename(df){ m->openInputFile(filename, fileHandle); } /***********************************************************************/ int FormatColumnMatrix::read(NameAssignment* nameMap){ try { string firstName, secondName; float distance; int nseqs = nameMap->size(); list = new ListVector(nameMap->getListVector()); Progress* reading = new Progress("Formatting matrix: ", nseqs * nseqs); int lt = 1; int refRow = 0; //we'll keep track of one cell - Cell(refRow,refCol) - and see if it's transpose int refCol = 0; //shows up later - Cell(refCol,refRow). If it does, then its a square matrix //need to see if this is a square or a triangular matrix... 
ofstream out; string tempOutFile = filename + ".temp"; m->openOutputFile(tempOutFile, out); while(fileHandle && lt == 1){ //let's assume it's a triangular matrix... if (m->control_pressed) { out.close(); m->mothurRemove(tempOutFile); fileHandle.close(); delete reading; return 0; } fileHandle >> firstName >> secondName >> distance; // get the row and column names and distance map::iterator itA = nameMap->find(firstName); map::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } if (distance == -1) { distance = 1000000; } if((distance < cutoff) && (itA != itB)){ if(refRow == refCol){ // in other words, if we haven't loaded refRow and refCol... refRow = itA->second; refCol = itB->second; //making it square out << itA->second << '\t' << itB->second << '\t' << distance << endl; out << itB->second << '\t' << itA->second << '\t' << distance << endl; } else if(refRow == itA->second && refCol == itB->second){ lt = 0; } //you are square else if(refRow == itB->second && refCol == itA->second){ lt = 0; } //you are square else{ //making it square out << itA->second << '\t' << itB->second << '\t' << distance << endl; out << itB->second << '\t' << itA->second << '\t' << distance << endl; } reading->update(itA->second * nseqs / 2); } m->gobble(fileHandle); } out.close(); fileHandle.close(); string squareFile; if(lt == 0){ // oops, it was square squareFile = filename; }else{ squareFile = tempOutFile; } //sort file by first column so the distances for each row are together string outfile = m->getRootName(squareFile) + "sorted.dist.temp"; //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + squareFile + " -o " + 
outfile; system(command.c_str()); #else //sort using windows sort string command = "sort " + squareFile + " /O " + outfile; system(command.c_str()); #endif if (m->control_pressed) { m->mothurRemove(tempOutFile); m->mothurRemove(outfile); delete reading; return 0; } //output to new file distance for each row and save positions in file where new row begins ifstream in; m->openInputFile(outfile, in); distFile = outfile + ".rowFormatted"; m->openOutputFile(distFile, out); rowPos.resize(nseqs, -1); int currentRow; int first, second; float dist; map rowMap; map::iterator itRow; //get first currentRow in >> first; currentRow = first; string firstString = toString(first); for(int k = 0; k < firstString.length(); k++) { in.putback(firstString[k]); } while(!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(distFile); m->mothurRemove(tempOutFile); m->mothurRemove(outfile); delete reading; return 0; } in >> first >> second >> dist; m->gobble(in); if (first != currentRow) { //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; currentRow = first; rowMap.clear(); //save row you just read if (dist < cutoff) { rowMap[second] = dist; } }else{ if (dist < cutoff) { rowMap[second] = dist; } } } //print last Row //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; in.close(); out.close(); if (m->control_pressed) { m->mothurRemove(distFile); m->mothurRemove(tempOutFile); m->mothurRemove(outfile); delete reading; return 0; } m->mothurRemove(tempOutFile); m->mothurRemove(outfile); reading->finish(); delete reading; list->setLabel("0"); if (m->control_pressed) 
{ m->mothurRemove(distFile); return 0; } return 1; } catch(exception& e) { m->errorOut(e, "FormatColumnMatrix", "read"); exit(1); } } /***********************************************************************/ int FormatColumnMatrix::read(CountTable* nameMap){ try { string firstName, secondName; float distance; int nseqs = nameMap->size(); list = new ListVector(nameMap->getListVector()); Progress* reading = new Progress("Formatting matrix: ", nseqs * nseqs); int lt = 1; int refRow = 0; //we'll keep track of one cell - Cell(refRow,refCol) - and see if it's transpose int refCol = 0; //shows up later - Cell(refCol,refRow). If it does, then its a square matrix //need to see if this is a square or a triangular matrix... ofstream out; string tempOutFile = filename + ".temp"; m->openOutputFile(tempOutFile, out); while(fileHandle && lt == 1){ //let's assume it's a triangular matrix... if (m->control_pressed) { out.close(); m->mothurRemove(tempOutFile); fileHandle.close(); delete reading; return 0; } fileHandle >> firstName >> secondName >> distance; // get the row and column names and distance int itA = nameMap->get(firstName); int itB = nameMap->get(secondName); if (distance == -1) { distance = 1000000; } if((distance < cutoff) && (itA != itB)){ if(refRow == refCol){ // in other words, if we haven't loaded refRow and refCol... 
refRow = itA; refCol = itB; //making it square out << itA << '\t' << itB << '\t' << distance << endl; out << itB << '\t' << itA << '\t' << distance << endl; } else if(refRow == itA && refCol == itB){ lt = 0; } //you are square else if(refRow == itB && refCol == itA){ lt = 0; } //you are square else{ //making it square out << itA << '\t' << itB << '\t' << distance << endl; out << itB << '\t' << itA << '\t' << distance << endl; } reading->update(itA * nseqs / 2); } m->gobble(fileHandle); } out.close(); fileHandle.close(); string squareFile; if(lt == 0){ // oops, it was square squareFile = filename; }else{ squareFile = tempOutFile; } //sort file by first column so the distances for each row are together string outfile = m->getRootName(squareFile) + "sorted.dist.temp"; //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + squareFile + " -o " + outfile; system(command.c_str()); #else //sort using windows sort string command = "sort " + squareFile + " /O " + outfile; system(command.c_str()); #endif if (m->control_pressed) { m->mothurRemove(tempOutFile); m->mothurRemove(outfile); delete reading; return 0; } //output to new file distance for each row and save positions in file where new row begins ifstream in; m->openInputFile(outfile, in); distFile = outfile + ".rowFormatted"; m->openOutputFile(distFile, out); rowPos.resize(nseqs, -1); int currentRow; int first, second; float dist; map rowMap; map::iterator itRow; //get first currentRow in >> first; currentRow = first; string firstString = toString(first); for(int k = 0; k < firstString.length(); k++) { in.putback(firstString[k]); } while(!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(distFile); m->mothurRemove(tempOutFile); m->mothurRemove(outfile); delete reading; return 0; } in >> first >> second >> dist; m->gobble(in); if (first != currentRow) { //save position in file of each new row 
rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; currentRow = first; rowMap.clear(); //save row you just read if (dist < cutoff) { rowMap[second] = dist; } }else{ if (dist < cutoff) { rowMap[second] = dist; } } } //print last Row //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; in.close(); out.close(); if (m->control_pressed) { m->mothurRemove(distFile); m->mothurRemove(tempOutFile); m->mothurRemove(outfile); delete reading; return 0; } m->mothurRemove(tempOutFile); m->mothurRemove(outfile); reading->finish(); delete reading; list->setLabel("0"); if (m->control_pressed) { m->mothurRemove(distFile); return 0; } return 1; } catch(exception& e) { m->errorOut(e, "FormatColumnMatrix", "read"); exit(1); } } /***********************************************************************/ FormatColumnMatrix::~FormatColumnMatrix(){} /***********************************************************************/ mothur-1.36.1/source/read/formatcolumn.h000066400000000000000000000010521255543666200202260ustar00rootroot00000000000000#ifndef FORMATCOLUMN_H #define FORMATCOLUMN_H /* * formatcolumn.h * Mothur * * Created by westcott on 1/13/10. * Copyright 2010 Schloss Lab. All rights reserved. 
* */ #include "formatmatrix.h" /******************************************************/ class FormatColumnMatrix : public FormatMatrix { public: FormatColumnMatrix(string); ~FormatColumnMatrix(); int read(NameAssignment*); int read(CountTable*); private: ifstream fileHandle; string filename; }; /******************************************************/ #endif mothur-1.36.1/source/read/formatmatrix.h000066400000000000000000000042001255543666200202330ustar00rootroot00000000000000#ifndef FORMATMATRIX_H #define FORMATMATRIX_H /* * formatmatrix.h * Mothur * * Created by westcott on 1/13/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "listvector.hpp" #include "nameassignment.hpp" #include "counttable.h"
//**********************************************************************************************************************
// This class takes a distance matrix file and converts it to a file where each row contains all distances below the cutoff
// for a given sequence.
// Example:
/*
5
A
B 0.01
C 0.015 0.03
D 0.03 0.02 0.02
E 0.04 0.05 0.03 0.02

becomes

0 4 1 0.01 2 0.015 3 0.03 4 0.04
1 4 0 0.01 2 0.03 3 0.02 4 0.05
2 4 0 0.015 1 0.03 3 0.02 4 0.03
3 4 0 0.03 1 0.02 2 0.02 4 0.02
4 4 0 0.04 1 0.05 2 0.03 3 0.02

column 1 - sequence name converted to row number
column 2 - numDists under cutoff
rest of line - sequence row -> distance, sequence row -> distance

if you had a cutoff of 0.03 then the file would look like,

0 3 1 0.01 2 0.015 3 0.03
1 3 0 0.01 2 0.03 3 0.02
2 4 0 0.015 1 0.03 3 0.02 4 0.03
3 4 0 0.03 1 0.02 2 0.02 4 0.02
4 2 2 0.03 3 0.02

This class also creates a vector of ints, rowPos. rowPos[0] = position in the file of distances related to sequence 0.
If a sequence is excluded by the cutoff, its rowPos = -1.
*/ //********************************************************************************************************************** class FormatMatrix { public: FormatMatrix(){ m = MothurOut::getInstance(); } virtual ~FormatMatrix() {} virtual int read(NameAssignment*){ return 1; } virtual int read(CountTable*){ return 1; } void setCutoff(float c) { cutoff = c; } ListVector* getListVector() { return list; } string getFormattedFileName() { return distFile; } vector getRowPositions() { return rowPos; } protected: ListVector* list; float cutoff; string distFile; vector rowPos; MothurOut* m; }; //********************************************************************************************************************** #endif mothur-1.36.1/source/read/formatphylip.cpp000066400000000000000000000366021255543666200206020ustar00rootroot00000000000000/* * formatphylip.cpp * Mothur * * Created by westcott on 1/13/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "formatphylip.h" #include "progress.hpp" /***********************************************************************/ FormatPhylipMatrix::FormatPhylipMatrix(string df) : filename(df) { m->openInputFile(filename, fileHandle); } /***********************************************************************/ //not using nameMap int FormatPhylipMatrix::read(NameAssignment* nameMap){ try { float distance; int square, nseqs; string name; ofstream out; string numTest; fileHandle >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } if(nameMap == NULL){ list = new ListVector(nseqs); list->set(0, name); } else{ list = new ListVector(nameMap->getListVector()); if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } } char d; while((d=fileHandle.get()) != EOF){ if(isalnum(d)){ //you are 
square square = 1; fileHandle.close(); //reset file //open and get through numSeqs, code below formats rest of file m->openInputFile(filename, fileHandle); fileHandle >> nseqs; m->gobble(fileHandle); distFile = filename + ".rowFormatted"; m->openOutputFile(distFile, out); break; } if(d == '\n'){ square = 0; break; } } Progress* reading; reading = new Progress("Formatting matrix: ", nseqs * nseqs); //lower triangle, so must go to column then formatted row file if(square == 0){ int index = 0; ofstream outTemp; string tempFile = filename + ".temp"; m->openOutputFile(tempFile, outTemp); //convert to square column matrix for(int i=1;i<nseqs;i++){ fileHandle >> name; if(nameMap == NULL){ list->set(i, name); } else { if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } } for(int j=0;j<i;j++){ if (m->control_pressed) { outTemp.close(); m->mothurRemove(tempFile); fileHandle.close(); delete reading; return 0; } fileHandle >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff){ outTemp << i << '\t' << j << '\t' << distance << endl; outTemp << j << '\t' << i << '\t' << distance << endl; } index++; reading->update(index); } } outTemp.close(); //format from square column to rowFormatted //sort file by first column so the distances for each row are together string outfile = m->getRootName(tempFile) + "sorted.dist.temp"; //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + tempFile + " -o " + outfile; system(command.c_str()); #else //sort using windows sort string command = "sort " + tempFile + " /O " + outfile; system(command.c_str()); #endif if (m->control_pressed) { m->mothurRemove(tempFile); m->mothurRemove(outfile); delete reading; return 0; } //output to new file distance for each row and save positions in file where new row begins ifstream in; m->openInputFile(outfile, in); distFile = outfile + 
".rowFormatted"; m->openOutputFile(distFile, out); rowPos.resize(nseqs, -1); int currentRow; int first, second; float dist; map<int, float> rowMap; map<int, float>::iterator itRow; //get first currentRow in >> first; currentRow = first; string firstString = toString(first); for(int k = 0; k < firstString.length(); k++) { in.putback(firstString[k]); } while(!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); m->mothurRemove(distFile); m->mothurRemove(outfile); delete reading; return 0; } in >> first >> second >> dist; m->gobble(in); if (first != currentRow) { //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; currentRow = first; rowMap.clear(); //save row you just read rowMap[second] = dist; index++; reading->update(index); }else{ rowMap[second] = dist; } } //print last Row //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; in.close(); out.close(); m->mothurRemove(tempFile); m->mothurRemove(outfile); if (m->control_pressed) { m->mothurRemove(distFile); delete reading; return 0; } } else{ //square matrix convert directly to formatted row file int index = nseqs; map<int, float> rowMap; map<int, float>::iterator itRow; rowPos.resize(nseqs, -1); for(int i=0;i<nseqs;i++){ fileHandle >> name; if(nameMap == NULL){ list->set(i, name); } else { if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } } for(int j=0;j<nseqs;j++){ if (m->control_pressed) { fileHandle.close(); out.close(); m->mothurRemove(distFile); delete reading; return 0; } fileHandle >> distance; if (distance == -1) { distance = 1000000; } if((distance < cutoff) 
&& (j != i)){ rowMap[j] = distance; } index++; reading->update(index); } m->gobble(fileHandle); //save position in file of each new row rowPos[i] = out.tellp(); //output row to file out << i << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; //clear map for new row's info rowMap.clear(); } } reading->finish(); delete reading; fileHandle.close(); out.close(); if (m->control_pressed) { m->mothurRemove(distFile); return 0; } list->setLabel("0"); return 1; } catch(exception& e) { m->errorOut(e, "FormatPhylipMatrix", "read"); exit(1); } } /***********************************************************************/ //not using nameMap int FormatPhylipMatrix::read(CountTable* nameMap){ try { float distance; int square, nseqs; string name; ofstream out; string numTest; fileHandle >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } if(nameMap == NULL){ list = new ListVector(nseqs); list->set(0, name); } else{ list = new ListVector(nameMap->getListVector()); nameMap->get(name); } char d; while((d=fileHandle.get()) != EOF){ if(isalnum(d)){ //you are square square = 1; fileHandle.close(); //reset file //open and get through numSeqs, code below formats rest of file m->openInputFile(filename, fileHandle); fileHandle >> nseqs; m->gobble(fileHandle); distFile = filename + ".rowFormatted"; m->openOutputFile(distFile, out); break; } if(d == '\n'){ square = 0; break; } } Progress* reading; reading = new Progress("Formatting matrix: ", nseqs * nseqs); //lower triangle, so must go to column then formatted row file if(square == 0){ int index = 0; ofstream outTemp; string tempFile = filename + ".temp"; m->openOutputFile(tempFile, outTemp); //convert to square column matrix for(int i=1;i<nseqs;i++){ fileHandle >> name; if(nameMap == NULL){ 
list->set(i, name); } else { nameMap->get(name); } for(int j=0;j<i;j++){ if (m->control_pressed) { outTemp.close(); m->mothurRemove(tempFile); fileHandle.close(); delete reading; return 0; } fileHandle >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff){ outTemp << i << '\t' << j << '\t' << distance << endl; outTemp << j << '\t' << i << '\t' << distance << endl; } index++; reading->update(index); } } outTemp.close(); //format from square column to rowFormatted //sort file by first column so the distances for each row are together string outfile = m->getRootName(tempFile) + "sorted.dist.temp"; //use the unix sort #if defined (__APPLE__) || (__MACH__) || (linux) || (__linux) || (__linux__) || (__unix__) || (__unix) string command = "sort -n " + tempFile + " -o " + outfile; system(command.c_str()); #else //sort using windows sort string command = "sort " + tempFile + " /O " + outfile; system(command.c_str()); #endif if (m->control_pressed) { m->mothurRemove(tempFile); m->mothurRemove(outfile); delete reading; return 0; } //output to new file distance for each row and save positions in file where new row begins ifstream in; m->openInputFile(outfile, in); distFile = outfile + ".rowFormatted"; m->openOutputFile(distFile, out); rowPos.resize(nseqs, -1); int currentRow; int first, second; float dist; map<int, float> rowMap; map<int, float>::iterator itRow; //get first currentRow in >> first; currentRow = first; string firstString = toString(first); for(int k = 0; k < firstString.length(); k++) { in.putback(firstString[k]); } while(!in.eof()) { if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); m->mothurRemove(distFile); m->mothurRemove(outfile); delete reading; return 0; } in >> first >> second >> dist; m->gobble(in); if (first != currentRow) { //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' 
<< itRow->second << '\t'; } out << endl; currentRow = first; rowMap.clear(); //save row you just read rowMap[second] = dist; index++; reading->update(index); }else{ rowMap[second] = dist; } } //print last Row //save position in file of each new row rowPos[currentRow] = out.tellp(); out << currentRow << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; in.close(); out.close(); m->mothurRemove(tempFile); m->mothurRemove(outfile); if (m->control_pressed) { m->mothurRemove(distFile); delete reading; return 0; } } else{ //square matrix convert directly to formatted row file int index = nseqs; map<int, float> rowMap; map<int, float>::iterator itRow; rowPos.resize(nseqs, -1); for(int i=0;i<nseqs;i++){ fileHandle >> name; if(nameMap == NULL){ list->set(i, name); } else { nameMap->get(name); } for(int j=0;j<nseqs;j++){ if (m->control_pressed) { fileHandle.close(); out.close(); m->mothurRemove(distFile); delete reading; return 0; } fileHandle >> distance; if (distance == -1) { distance = 1000000; } if((distance < cutoff) && (j != i)){ rowMap[j] = distance; } index++; reading->update(index); } m->gobble(fileHandle); //save position in file of each new row rowPos[i] = out.tellp(); //output row to file out << i << '\t' << rowMap.size() << '\t'; for (itRow = rowMap.begin(); itRow != rowMap.end(); itRow++) { out << itRow->first << '\t' << itRow->second << '\t'; } out << endl; //clear map for new row's info rowMap.clear(); } } reading->finish(); delete reading; fileHandle.close(); out.close(); if (m->control_pressed) { m->mothurRemove(distFile); return 0; } list->setLabel("0"); return 1; } catch(exception& e) { m->errorOut(e, "FormatPhylipMatrix", "read"); exit(1); } } /***********************************************************************/ FormatPhylipMatrix::~FormatPhylipMatrix(){} /***********************************************************************/ 
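/*
 * Illustrative sketch only, not part of mothur. It restates, in memory and
 * without temp files or the external sort, what the lower-triangle branch of
 * FormatPhylipMatrix::read appears to do: mirror every distance below the
 * cutoff into both rows, then emit one "row <tab> count <tab> col <tab> dist ..."
 * line per row. The name sketchRowFormat and its in-memory input layout are
 * assumptions made for this example. Unlike the file-based lower-triangle
 * branch, it emits a line even for rows with no distance under the cutoff.
 */
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// lower[i] holds the distances from sequence i to sequences 0..i-1 (lower[0] is empty).
static std::vector<std::string> sketchRowFormat(const std::vector< std::vector<float> >& lower, float cutoff) {
    std::vector< std::map<int, float> > rows(lower.size());

    for (std::size_t i = 1; i < lower.size(); i++) {
        for (std::size_t j = 0; j < i && j < lower[i].size(); j++) {
            float d = lower[i][j];
            if (d == -1) { d = 1000000; }   // same "missing distance" convention as the code above
            if (d < cutoff) {               // strict comparison, as in the code above
                rows[i][j] = d;             // making it square
                rows[j][i] = d;
            }
        }
    }

    std::vector<std::string> lines;
    for (std::size_t i = 0; i < rows.size(); i++) {
        std::ostringstream out;
        out << i << '\t' << rows[i].size() << '\t';
        for (std::map<int, float>::iterator it = rows[i].begin(); it != rows[i].end(); it++) {
            out << it->first << '\t' << it->second << '\t';
        }
        lines.push_back(out.str());
    }
    return lines;
}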
mothur-1.36.1/source/read/formatphylip.h000066400000000000000000000010541255543666200202400ustar00rootroot00000000000000#ifndef FORMATPHYLIP_H #define FORMATPHYLIP_H /* * formatphylip.h * Mothur * * Created by westcott on 1/13/10. * Copyright 2010 Schloss Lab. All rights reserved. * */ #include "formatmatrix.h" /******************************************************/ class FormatPhylipMatrix : public FormatMatrix { public: FormatPhylipMatrix(string); ~FormatPhylipMatrix(); int read(NameAssignment*); int read(CountTable*); private: ifstream fileHandle; string filename; }; /******************************************************/ #endif mothur-1.36.1/source/read/readblast.cpp000066400000000000000000000327251255543666200200270ustar00rootroot00000000000000/* * readblast.cpp * Mothur * * Created by westcott on 12/10/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "readblast.h" #include "progress.hpp" //******************************************************************************************************************** //sorts lowest to highest inline bool compareOverlap(seqDist left, seqDist right){ return (left.dist < right.dist); } /*********************************************************************************************/ ReadBlast::ReadBlast(string file, float c, float p, int l, bool ms, bool h) : blastfile(file), cutoff(c), penalty(p), length(l), minWanted(ms), hclusterWanted(h) { try { m = MothurOut::getInstance(); matrix = NULL; } catch(exception& e) { m->errorOut(e, "ReadBlast", "ReadBlast"); exit(1); } } /*********************************************************************************************/ //assumptions about the blast file: //1. if duplicate lines occur the first line is always best and is chosen //2. blast scores are grouped together, ie. a a .... score, a b .... score, a c ....score... 
int ReadBlast::read(NameAssignment* nameMap) { try { //if the user has not given a names file read names from blastfile if (nameMap->size() == 0) { readNames(nameMap); } int nseqs = nameMap->size(); if (m->control_pressed) { return 0; } ifstream fileHandle; m->openInputFile(blastfile, fileHandle); string firstName, secondName, eScore, currentRow; string repeatName = ""; int count = 1; float distance, thisoverlap, refScore; float percentId; float numBases, mismatch, gap, startQuery, endQuery, startRef, endRef, score, lengthThisSeq; ofstream outDist; ofstream outOverlap; //create objects needed for read if (!hclusterWanted) { matrix = new SparseDistanceMatrix(); matrix->resize(nseqs); }else{ overlapFile = m->getRootName(blastfile) + "overlap.dist"; distFile = m->getRootName(blastfile) + "hclusterDists.dist"; m->openOutputFile(overlapFile, outOverlap); m->openOutputFile(distFile, outDist); } if (m->control_pressed) { fileHandle.close(); if (!hclusterWanted) { delete matrix; } else { outOverlap.close(); m->mothurRemove(overlapFile); outDist.close(); m->mothurRemove(distFile); } return 0; } Progress* reading = new Progress("Reading blast: ", nseqs * nseqs); //this is used to quickly find if we already have a distance for this combo vector< map > dists; dists.resize(nseqs); //dists[0][1] = distance from seq0 to seq1 map thisRowsBlastScores; if (!fileHandle.eof()) { //read in line from file fileHandle >> firstName >> secondName >> percentId >> numBases >> mismatch >> gap >> startQuery >> endQuery >> startRef >> endRef >> eScore >> score; m->gobble(fileHandle); currentRow = firstName; lengthThisSeq = numBases; repeatName = firstName + secondName; if (firstName == secondName) { refScore = score; } else{ //convert name to number map::iterator itA = nameMap->find(firstName); map::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == 
nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } thisRowsBlastScores[itB->second] = score; //calc overlap score thisoverlap = 1.0 - (percentId * (lengthThisSeq - startQuery) / endRef / 100.0 - penalty); //if there is a valid overlap, add it if ((startRef <= length) && ((endQuery+length) >= lengthThisSeq) && (thisoverlap < cutoff)) { if (!hclusterWanted) { seqDist overlapValue(itA->second, itB->second, thisoverlap); overlap.push_back(overlapValue); }else { outOverlap << itA->first << '\t' << itB->first << '\t' << thisoverlap << endl; } } } }else { m->mothurOut("Error in your blast file, cannot read."); m->mothurOutEndLine(); exit(1); } //read file while(!fileHandle.eof()){ if (m->control_pressed) { fileHandle.close(); if (!hclusterWanted) { delete matrix; } else { outOverlap.close(); m->mothurRemove(overlapFile); outDist.close(); m->mothurRemove(distFile); } delete reading; return 0; } //read in line from file fileHandle >> firstName >> secondName >> percentId >> numBases >> mismatch >> gap >> startQuery >> endQuery >> startRef >> endRef >> eScore >> score; //cout << firstName << '\t' << secondName << '\t' << percentId << '\t' << numBases << '\t' << mismatch << '\t' << gap << '\t' << startQuery << '\t' << endQuery << '\t' << startRef << '\t' << endRef << '\t' << eScore << '\t' << score << endl; m->gobble(fileHandle); string temp = firstName + secondName; //to check if this file has repeat lines, ie. 
is this a blast instead of a blscreen file //if this is a new pairing if (temp != repeatName) { repeatName = temp; if (currentRow == firstName) { //cout << "first = " << firstName << " second = " << secondName << endl; if (firstName == secondName) { refScore = score; reading->update((count + nseqs)); count++; }else{ //convert name to number map::iterator itA = nameMap->find(firstName); map::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } //save score thisRowsBlastScores[itB->second] = score; //calc overlap score thisoverlap = 1.0 - (percentId * (lengthThisSeq - startQuery) / endRef / 100.0 - penalty); //if there is a valid overlap, add it if ((startRef <= length) && ((endQuery+length) >= lengthThisSeq) && (thisoverlap < cutoff)) { if (!hclusterWanted) { seqDist overlapValue(itA->second, itB->second, thisoverlap); //cout << "overlap = " << itA->second << '\t' << itB->second << '\t' << thisoverlap << endl; overlap.push_back(overlapValue); }else { outOverlap << itA->first << '\t' << itB->first << '\t' << thisoverlap << endl; } } } //end else }else { //end row //convert blast scores to distance and add cell to sparse matrix if we can map::iterator it; map::iterator itDist; for(it=thisRowsBlastScores.begin(); it!=thisRowsBlastScores.end(); it++) { distance = 1.0 - (it->second / refScore); //do we already have the distance calculated for b->a map::iterator itA = nameMap->find(currentRow); itDist = dists[it->first].find(itA->second); //if we have it then compare if (itDist != dists[it->first].end()) { //if you want the minimum blast score ratio, then pick max distance if(minWanted) { distance = max(itDist->second, distance); } else{ distance = min(itDist->second, distance); } //is this distance 
below cutoff if (distance < cutoff) { if (!hclusterWanted) { if (itA->second < it->first) { PDistCell value(it->first, distance); matrix->addCell(itA->second, value); }else { PDistCell value(itA->second, distance); matrix->addCell(it->first, value); } }else{ outDist << itA->first << '\t' << nameMap->get(it->first) << '\t' << distance << endl; } } //not going to need this again dists[it->first].erase(itDist); }else { //save this value until we get the other ratio dists[itA->second][it->first] = distance; } } //clear out last rows info thisRowsBlastScores.clear(); currentRow = firstName; lengthThisSeq = numBases; //add this row to thisRowsBlastScores if (firstName == secondName) { refScore = score; } else{ //add this row to thisRowsBlastScores //convert name to number map::iterator itA = nameMap->find(firstName); map::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } thisRowsBlastScores[itB->second] = score; //calc overlap score thisoverlap = 1.0 - (percentId * (lengthThisSeq - startQuery) / endRef / 100.0 - penalty); //if there is a valid overlap, add it if ((startRef <= length) && ((endQuery+length) >= lengthThisSeq) && (thisoverlap < cutoff)) { if (!hclusterWanted) { seqDist overlapValue(itA->second, itB->second, thisoverlap); overlap.push_back(overlapValue); }else { outOverlap << itA->first << '\t' << itB->first << '\t' << thisoverlap << endl; } } } }//end if current row }//end if repeat }//end while //get last rows info stored //convert blast scores to distance and add cell to sparse matrix if we can map::iterator it; map::iterator itDist; for(it=thisRowsBlastScores.begin(); it!=thisRowsBlastScores.end(); it++) { distance = 1.0 - (it->second / refScore); //do we already have the distance 
calculated for b->a map<string,int>::iterator itA = nameMap->find(currentRow); itDist = dists[it->first].find(itA->second); //if we have it then compare if (itDist != dists[it->first].end()) { //if you want the minimum blast score ratio, then pick max distance if(minWanted) { distance = max(itDist->second, distance); } else{ distance = min(itDist->second, distance); } //is this distance below cutoff if (distance < cutoff) { if (!hclusterWanted) { if (itA->second < it->first) { PDistCell value(it->first, distance); matrix->addCell(itA->second, value); }else { PDistCell value(itA->second, distance); matrix->addCell(it->first, value); } }else{ outDist << itA->first << '\t' << nameMap->get(it->first) << '\t' << distance << endl; } } //not going to need this again dists[it->first].erase(itDist); }else { //save this value until we get the other ratio dists[itA->second][it->first] = distance; } } //clear out info thisRowsBlastScores.clear(); dists.clear(); if (m->control_pressed) { fileHandle.close(); if (!hclusterWanted) { delete matrix; } else { outOverlap.close(); m->mothurRemove(overlapFile); outDist.close(); m->mothurRemove(distFile); } delete reading; return 0; } if (!hclusterWanted) { sort(overlap.begin(), overlap.end(), compareOverlap); }else { outDist.close(); outOverlap.close(); } if (m->control_pressed) { fileHandle.close(); if (!hclusterWanted) { delete matrix; } else { m->mothurRemove(overlapFile); m->mothurRemove(distFile); } delete reading; return 0; } reading->finish(); delete reading; fileHandle.close(); return 0; } catch(exception& e) { m->errorOut(e, "ReadBlast", "read"); exit(1); } } /*********************************************************************************************/ int ReadBlast::readNames(NameAssignment* nameMap) { try { m->mothurOut("Reading names... 
"); cout.flush(); string name, hold, prevName; int num = 1; ifstream in; m->openInputFile(blastfile, in); //ofstream outName; //m->openOutputFile((blastfile + ".tempOutNames"), outName); //read first line in >> prevName; for (int i = 0; i < 11; i++) { in >> hold; } m->gobble(in); //save name in nameMap nameMap->push_back(prevName); while (!in.eof()) { if (m->control_pressed) { in.close(); return 0; } //read line in >> name; for (int i = 0; i < 11; i++) { in >> hold; } m->gobble(in); //is this a new name? if (name != prevName) { prevName = name; if (nameMap->get(name) != -1) { m->mothurOut("[ERROR]: trying to extract names from blast file, and I found dups. Are your sequence names unique? quitting.\n"); m->control_pressed = true; } else { nameMap->push_back(name); } //outName << name << '\t' << name << endl; num++; } } in.close(); //write out names file //string outNames = m->getRootName(blastfile) + "names"; //ofstream out; //m->openOutputFile(outNames, out); //nameMap->print(out); //out.close(); if (m->control_pressed) { return 0; } m->mothurOut(toString(num) + " names read."); m->mothurOutEndLine(); return 0; } catch(exception& e) { m->errorOut(e, "ReadBlast", "readNames"); exit(1); } } /*********************************************************************************************/ mothur-1.36.1/source/read/readblast.h000066400000000000000000000031171255543666200174650ustar00rootroot00000000000000#ifndef READBLAST_H #define READBLAST_H /* * readblast.h * Mothur * * Created by westcott on 12/10/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "sparsedistancematrix.h" #include "nameassignment.hpp" /****************************************************************************************/ //Note: this class creates a sparsematrix and list if the read is executed, but does not delete them on destruction. 
//the user of this object is responsible for deleting the matrix and list if they call the read or there will be a memory leak //it is done this way so the read can be deleted and the information still used. class ReadBlast { public: ReadBlast(string, float, float, int, bool, bool); //blastfile, cutoff, penalty, length of overlap, min or max bsr, hclusterWanted ~ReadBlast() {} int read(NameAssignment*); SparseDistanceMatrix* getDistMatrix() { return matrix; } vector<seqDist> getOverlapMatrix() { return overlap; } string getOverlapFile() { return overlapFile; } string getDistFile() { return distFile; } private: string blastfile, overlapFile, distFile; int length; //number of amino acids overlapped float penalty, cutoff; //penalty is used to adjust error rate bool minWanted; //if true choose min bsr, if false choose max bsr bool hclusterWanted; SparseDistanceMatrix* matrix; vector<seqDist> overlap; MothurOut* m; int readNames(NameAssignment*); }; /*******************************************************************************************/ #endif mothur-1.36.1/source/read/readcluster.cpp000066400000000000000000000241331255543666200203750ustar00rootroot00000000000000/* * readcluster.cpp * Mothur * * Created by westcott on 10/28/09. * Copyright 2009 Schloss Lab. All rights reserved. 
 * */ #include "readcluster.h" /***********************************************************************/ ReadCluster::ReadCluster(string distfile, float c, string o, bool s){ m = MothurOut::getInstance(); distFile = distfile; cutoff = c; outputDir = o; sortWanted = s; list = NULL; } /***********************************************************************/ int ReadCluster::read(NameAssignment*& nameMap){ try { if (format == "phylip") { convertPhylip2Column(nameMap); } else { list = new ListVector(nameMap->getListVector()); } if (m->control_pressed) { return 0; } if (sortWanted) { OutPutFile = m->sortFile(distFile, outputDir); } else { OutPutFile = distFile; } //for use by clusters splitMatrix to convert a phylip matrix to column return 0; } catch(exception& e) { m->errorOut(e, "ReadCluster", "read"); exit(1); } } /***********************************************************************/ int ReadCluster::read(CountTable*& ct){ try { if (format == "phylip") { convertPhylip2Column(ct); } else { list = new ListVector(ct->getListVector()); } if (m->control_pressed) { return 0; } if (sortWanted) { OutPutFile = m->sortFile(distFile, outputDir); } else { OutPutFile = distFile; } //for use by clusters splitMatrix to convert a phylip matrix to column return 0; } catch(exception& e) { m->errorOut(e, "ReadCluster", "read"); exit(1); } } /***********************************************************************/ int ReadCluster::convertPhylip2Column(NameAssignment*& nameMap){ try { //convert phylip file to column file map<int, string> rowToName; map<int, string>::iterator it; ifstream in; ofstream out; string tempFile = distFile + ".column.temp"; m->openInputFile(distFile, in); m->gobble(in); m->openOutputFile(tempFile, out); float distance; int square, nseqs; string name; vector<string> matrixNames; string numTest; in >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, 
nseqs); } rowToName[0] = name; matrixNames.push_back(name); if(nameMap == NULL){ list = new ListVector(nseqs); list->set(0, name); } else{ list = new ListVector(nameMap->getListVector()); if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } } char d; while((d=in.get()) != EOF){ if(isalnum(d)){ square = 1; in.putback(d); for(int i=0;i<nseqs;i++){ in >> distance; } break; } if(d == '\n'){ square = 0; break; } } if(square == 0){ for(int i=1;i<nseqs;i++){ in >> name; rowToName[i] = name; matrixNames.push_back(name); //there's A LOT of repeated code throughout this method... if(nameMap == NULL){ list->set(i, name); for(int j=0;j<i;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff){ out << i << '\t' << j << '\t' << distance << endl; } } } else{ if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } for(int j=0;j<i;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff){ out << i << '\t' << j << '\t' << distance << endl; } } } } } else{ for(int i=1;i<nseqs;i++){ in >> name; rowToName[i] = name; matrixNames.push_back(name); if(nameMap == NULL){ list->set(i, name); for(int j=0;j<nseqs;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff && j < i){ out << i << '\t' << j << '\t' << distance << endl; } } } else{ if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } for(int j=0;j<nseqs;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance 
< cutoff && j < i){ out << i << '\t' << j << '\t' << distance << endl; } } } } } list->setLabel("0"); in.close(); out.close(); if(nameMap == NULL){ nameMap = new NameAssignment(); for(int i=0;i<matrixNames.size();i++){ nameMap->push_back(matrixNames[i]); } } ifstream in2; ofstream out2; string outputFile = m->getRootName(distFile) + "column.dist"; m->openInputFile(tempFile, in2); m->openOutputFile(outputFile, out2); int first, second; float dist; while (in2) { if (m->control_pressed) { in2.close(); out2.close(); m->mothurRemove(tempFile); m->mothurRemove(outputFile); return 0; } in2 >> first >> second >> dist; out2 << rowToName[first] << '\t' << rowToName[second] << '\t' << dist << endl; m->gobble(in2); } in2.close(); out2.close(); m->mothurRemove(tempFile); distFile = outputFile; if (m->control_pressed) { m->mothurRemove(outputFile); } return 0; } catch(exception& e) { m->errorOut(e, "ReadCluster", "convertPhylip2Column"); exit(1); } } /***********************************************************************/ int ReadCluster::convertPhylip2Column(CountTable*& ct){ try { //convert phylip file to column file map<int, string> rowToName; map<int, string>::iterator it; ifstream in; ofstream out; string tempFile = distFile + ".column.temp"; m->openInputFile(distFile, in); m->gobble(in); m->openOutputFile(tempFile, out); float distance; int square, nseqs; string name; vector<string> matrixNames; string numTest; in >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } rowToName[0] = name; matrixNames.push_back(name); if(ct == NULL){ list = new ListVector(nseqs); list->set(0, name); } else{ list = new ListVector(ct->getListVector()); } char d; while((d=in.get()) != EOF){ if(isalnum(d)){ square = 1; in.putback(d); for(int i=0;i<nseqs;i++){ in >> distance; } break; } if(d == '\n'){ square = 0; break; } } if(square == 0){ for(int i=1;i<nseqs;i++){ in >> name; rowToName[i] = name; matrixNames.push_back(name); //there's A LOT 
of repeated code throughout this method... if(ct == NULL){ list->set(i, name); for(int j=0;j<i;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff){ out << i << '\t' << j << '\t' << distance << endl; } } } else{ for(int j=0;j<i;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff){ out << i << '\t' << j << '\t' << distance << endl; } } } } } else{ for(int i=1;i<nseqs;i++){ in >> name; rowToName[i] = name; matrixNames.push_back(name); if(ct == NULL){ list->set(i, name); for(int j=0;j<nseqs;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff && j < i){ out << i << '\t' << j << '\t' << distance << endl; } } } else{ for(int j=0;j<nseqs;j++){ if (m->control_pressed) { in.close(); out.close(); m->mothurRemove(tempFile); return 0; } in >> distance; if (distance == -1) { distance = 1000000; } if(distance < cutoff && j < i){ out << i << '\t' << j << '\t' << distance << endl; } } } } } list->setLabel("0"); in.close(); out.close(); if(ct == NULL){ ct = new CountTable(); for(int i=0;i<matrixNames.size();i++){ ct->push_back(matrixNames[i]); } } ifstream in2; ofstream out2; string outputFile = m->getRootName(distFile) + "column.dist"; m->openInputFile(tempFile, in2); m->openOutputFile(outputFile, out2); int first, second; float dist; while (in2) { if (m->control_pressed) { in2.close(); out2.close(); m->mothurRemove(tempFile); m->mothurRemove(outputFile); return 0; } in2 >> first >> second >> dist; out2 << rowToName[first] << '\t' << rowToName[second] << '\t' << dist << endl; m->gobble(in2); } in2.close(); out2.close(); m->mothurRemove(tempFile); distFile = outputFile; if (m->control_pressed) { m->mothurRemove(outputFile); } return 0; } catch(exception& e) { m->errorOut(e, "ReadCluster", "convertPhylip2Column"); exit(1); } } 
/***********************************************************************/ ReadCluster::~ReadCluster(){} /***********************************************************************/ mothur-1.36.1/source/read/readcluster.h000066400000000000000000000016251255543666200200430ustar00rootroot00000000000000#ifndef READCLUSTER_H #define READCLUSTER_H /* * readcluster.h * Mothur * * Created by westcott on 10/28/09. * Copyright 2009 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "nameassignment.hpp" #include "listvector.hpp" #include "counttable.h" /******************************************************/ class ReadCluster { public: ReadCluster(string, float, string, bool); ~ReadCluster(); int read(NameAssignment*&); int read(CountTable*&); string getOutputFile() { return OutPutFile; } void setFormat(string f) { format = f; } ListVector* getListVector() { return list; } private: string distFile, outputDir; string OutPutFile, format; ListVector* list; float cutoff; MothurOut* m; bool sortWanted; int convertPhylip2Column(NameAssignment*&); int convertPhylip2Column(CountTable*&); }; /******************************************************/ #endif mothur-1.36.1/source/read/readcolumn.cpp000066400000000000000000000213021255543666200202040ustar00rootroot00000000000000/* * readcolumn.cpp * Mothur * * Created by Sarah Westcott on 4/21/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "readcolumn.h" #include "progress.hpp" /***********************************************************************/ ReadColumnMatrix::ReadColumnMatrix(string df) : distFile(df){ successOpen = m->openInputFile(distFile, fileHandle); sim = false; } /***********************************************************************/ ReadColumnMatrix::ReadColumnMatrix(string df, bool s) : distFile(df){ successOpen = m->openInputFile(distFile, fileHandle); sim = s; } /***********************************************************************/ int ReadColumnMatrix::read(NameAssignment* nameMap){ try { string firstName, secondName; float distance; int nseqs = nameMap->size(); DMatrix->resize(nseqs); list = new ListVector(nameMap->getListVector()); Progress* reading = new Progress("Reading matrix: ", nseqs * nseqs); int lt = 1; int refRow = 0; //we'll keep track of one cell - Cell(refRow,refCol) - and see if it's transpose int refCol = 0; //shows up later - Cell(refCol,refRow). If it does, then its a square matrix //need to see if this is a square or a triangular matrix... while(fileHandle && lt == 1){ //let's assume it's a triangular matrix... 
fileHandle >> firstName; m->gobble(fileHandle); fileHandle >> secondName; m->gobble(fileHandle); fileHandle >> distance; // get the row and column names and distance if (m->debug) { cout << firstName << '\t' << secondName << '\t' << distance << endl; } if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } map<string, int>::iterator itA = nameMap->find(firstName); map<string, int>::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff && itA != itB){ if(itA->second > itB->second){ PDistCell value(itA->second, distance); if(refRow == refCol){ // in other words, if we haven't loaded refRow and refCol... 
refRow = itA->second; refCol = itB->second; DMatrix->addCell(itB->second, value); } else if(refRow == itA->second && refCol == itB->second){ lt = 0; } else{ DMatrix->addCell(itB->second, value); } } else if(itA->second < itB->second){ PDistCell value(itB->second, distance); if(refRow == refCol){ // in other words, if we haven't loaded refRow and refCol... refRow = itA->second; refCol = itB->second; DMatrix->addCell(itA->second, value); } else if(refRow == itB->second && refCol == itA->second){ lt = 0; } else{ DMatrix->addCell(itA->second, value); } } reading->update(itA->second * nseqs); } m->gobble(fileHandle); } if(lt == 0){ // oops, it was square fileHandle.close(); //let's start over DMatrix->clear(); //let's start over m->openInputFile(distFile, fileHandle); //let's start over while(fileHandle){ fileHandle >> firstName; m->gobble(fileHandle); fileHandle >> secondName; m->gobble(fileHandle); fileHandle >> distance; // get the row and column names and distance if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } map<string, int>::iterator itA = nameMap->find(firstName); map<string, int>::iterator itB = nameMap->find(secondName); if(itA == nameMap->end()){ m->mothurOut("AAError: Sequence '" + firstName + "' was not found in the names file, please correct\n"); exit(1); } if(itB == nameMap->end()){ m->mothurOut("ABError: Sequence '" + secondName + "' was not found in the names file, please correct\n"); exit(1); } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. 
if(distance < cutoff && itA->second > itB->second){ PDistCell value(itA->second, distance); DMatrix->addCell(itB->second, value); reading->update(itA->second * nseqs); } m->gobble(fileHandle); } } if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } reading->finish(); fileHandle.close(); list->setLabel("0"); return 1; } catch(exception& e) { m->errorOut(e, "ReadColumnMatrix", "read"); exit(1); } } /***********************************************************************/ int ReadColumnMatrix::read(CountTable* countTable){ try { string firstName, secondName; float distance; int nseqs = countTable->size(); DMatrix->resize(nseqs); list = new ListVector(countTable->getListVector()); Progress* reading = new Progress("Reading matrix: ", nseqs * nseqs); int lt = 1; int refRow = 0; //we'll keep track of one cell - Cell(refRow,refCol) - and see if it's transpose int refCol = 0; //shows up later - Cell(refCol,refRow). If it does, then its a square matrix //need to see if this is a square or a triangular matrix... while(fileHandle && lt == 1){ //let's assume it's a triangular matrix... fileHandle >> firstName; m->gobble(fileHandle); fileHandle >> secondName; m->gobble(fileHandle); fileHandle >> distance; // get the row and column names and distance if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } int itA = countTable->get(firstName); int itB = countTable->get(secondName); if (m->control_pressed) { exit(1); } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff && itA != itB){ if(itA > itB){ PDistCell value(itA, distance); if(refRow == refCol){ // in other words, if we haven't loaded refRow and refCol... 
refRow = itA; refCol = itB; DMatrix->addCell(itB, value); } else if(refRow == itA && refCol == itB){ lt = 0; } else{ DMatrix->addCell(itB, value); } } else if(itA < itB){ PDistCell value(itB, distance); if(refRow == refCol){ // in other words, if we haven't loaded refRow and refCol... refRow = itA; refCol = itB; DMatrix->addCell(itA, value); } else if(refRow == itB && refCol == itA){ lt = 0; } else{ DMatrix->addCell(itA, value); } } reading->update(itA * nseqs); } m->gobble(fileHandle); } if(lt == 0){ // oops, it was square fileHandle.close(); //let's start over DMatrix->clear(); //let's start over m->openInputFile(distFile, fileHandle); //let's start over while(fileHandle){ fileHandle >> firstName; m->gobble(fileHandle); fileHandle >> secondName; m->gobble(fileHandle); fileHandle >> distance; // get the row and column names and distance if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } int itA = countTable->get(firstName); int itB = countTable->get(secondName); if (m->control_pressed) { exit(1); } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff && itA > itB){ PDistCell value(itA, distance); DMatrix->addCell(itB, value); reading->update(itA * nseqs); } m->gobble(fileHandle); } } if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } reading->finish(); fileHandle.close(); list->setLabel("0"); return 1; } catch(exception& e) { m->errorOut(e, "ReadColumnMatrix", "read"); exit(1); } } /***********************************************************************/ ReadColumnMatrix::~ReadColumnMatrix(){} /***********************************************************************/ mothur-1.36.1/source/read/readcolumn.h000066400000000000000000000011161255543666200176520ustar00rootroot00000000000000#ifndef READCOLUMN_H #define READCOLUMN_H /* * readcolumn.h * Mothur * * Created by Sarah Westcott on 4/21/09. 
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "readmatrix.hpp" /******************************************************/ class ReadColumnMatrix : public ReadMatrix { public: ReadColumnMatrix(string); ReadColumnMatrix(string, bool); ~ReadColumnMatrix(); int read(NameAssignment*); int read(CountTable*); private: ifstream fileHandle; string distFile; }; /******************************************************/ #endif mothur-1.36.1/source/read/readmatrix.hpp000066400000000000000000000015211255543666200202210ustar00rootroot00000000000000#ifndef READMATRIX_HPP #define READMATRIX_HPP /* * readmatrix.hpp * * * Created by Pat Schloss on 8/13/08. * Copyright 2008 Patrick D. Schloss. All rights reserved. * */ #include "mothur.h" #include "listvector.hpp" #include "nameassignment.hpp" #include "counttable.h" #include "sparsedistancematrix.h" class ReadMatrix { public: ReadMatrix(){ DMatrix = new SparseDistanceMatrix(); m = MothurOut::getInstance(); } virtual ~ReadMatrix() {} virtual int read(NameAssignment*){ return 1; } virtual int read(CountTable*){ return 1; } void setCutoff(float c) { cutoff = c; } SparseDistanceMatrix* getDMatrix() { return DMatrix; } ListVector* getListVector() { return list; } int successOpen; protected: SparseDistanceMatrix* DMatrix; ListVector* list; float cutoff; MothurOut* m; bool sim; }; #endif mothur-1.36.1/source/read/readphylip.cpp000066400000000000000000000415371255543666200202300ustar00rootroot00000000000000/* * readphylip.cpp * Mothur * * Created by Sarah Westcott on 4/21/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
 * */ #include "readphylip.h" #include "progress.hpp" /***********************************************************************/ ReadPhylipMatrix::ReadPhylipMatrix(string distFile){ successOpen = m->openInputFile(distFile, fileHandle); sim=false; } /***********************************************************************/ ReadPhylipMatrix::ReadPhylipMatrix(string distFile, bool s){ successOpen = m->openInputFile(distFile, fileHandle); sim=s; } /***********************************************************************/ int ReadPhylipMatrix::read(NameAssignment* nameMap){ try { float distance; int square, nseqs; string name; vector<string> matrixNames; string numTest; fileHandle >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } matrixNames.push_back(name); if(nameMap == NULL){ list = new ListVector(nseqs); list->set(0, name); } else{ list = new ListVector(nameMap->getListVector()); if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } } char d; while((d=fileHandle.get()) != EOF){ if(isalnum(d)){ square = 1; fileHandle.putback(d); for(int i=0;i<nseqs;i++){ fileHandle >> distance; } break; } if(d == '\n'){ square = 0; break; } } Progress* reading; DMatrix->resize(nseqs); if(square == 0){ reading = new Progress("Reading matrix: ", nseqs * (nseqs - 1) / 2); int index = 0; for(int i=1;i<nseqs;i++){ if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } fileHandle >> name; matrixNames.push_back(name); //there's A LOT of repeated code throughout this method... if(nameMap == NULL){ list->set(i, name); for(int j=0;j<i;j++){ if (m->control_pressed) { delete reading; fileHandle.close(); return 0; } fileHandle >> distance; if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. 
if(distance < cutoff){ PDistCell value(i, distance); DMatrix->addCell(j, value); } index++; reading->update(index); } } else{ if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } for(int j=0;j<i;j++){ fileHandle >> distance; if (m->control_pressed) { delete reading; fileHandle.close(); return 0; } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff){ PDistCell value(nameMap->get(matrixNames[i]), distance); DMatrix->addCell(nameMap->get(matrixNames[j]), value); } index++; reading->update(index); } } } } else{ reading = new Progress("Reading matrix: ", nseqs * nseqs); int index = nseqs; for(int i=1;i<nseqs;i++){ fileHandle >> name; matrixNames.push_back(name); if(nameMap == NULL){ list->set(i, name); for(int j=0;j<nseqs;j++){ fileHandle >> distance; if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff && j < i){ PDistCell value(i, distance); DMatrix->addCell(j, value); } index++; reading->update(index); } } else{ if(nameMap->count(name)==0){ m->mothurOut("Error: Sequence '" + name + "' was not found in the names file, please correct"); m->mothurOutEndLine(); } for(int j=0;j<nseqs;j++){ fileHandle >> distance; if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. 
if(distance < cutoff && j < i){ PDistCell value(nameMap->get(matrixNames[i]), distance); DMatrix->addCell(nameMap->get(matrixNames[j]), value); } index++; reading->update(index); } } } } if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } reading->finish(); delete reading; list->setLabel("0"); fileHandle.close(); return 1; } catch(exception& e) { m->errorOut(e, "ReadPhylipMatrix", "read"); exit(1); } } /***********************************************************************/ int ReadPhylipMatrix::read(CountTable* countTable){ try { float distance; int square, nseqs; string name; vector<string> matrixNames; string numTest; fileHandle >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ", quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, nseqs); } matrixNames.push_back(name); if(countTable == NULL){ list = new ListVector(nseqs); list->set(0, name); } else{ list = new ListVector(countTable->getListVector()); } if (m->control_pressed) { return 0; } char d; while((d=fileHandle.get()) != EOF){ if(isalnum(d)){ square = 1; fileHandle.putback(d); for(int i=0;i<nseqs;i++){ fileHandle >> distance; } break; } if(d == '\n'){ square = 0; break; } } Progress* reading; DMatrix->resize(nseqs); if(square == 0){ reading = new Progress("Reading matrix: ", nseqs * (nseqs - 1) / 2); int index = 0; for(int i=1;i<nseqs;i++){ if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } fileHandle >> name; matrixNames.push_back(name); //there's A LOT of repeated code throughout this method... if(countTable == NULL){ list->set(i, name); for(int j=0;j<i;j++){ if (m->control_pressed) { delete reading; fileHandle.close(); return 0; } fileHandle >> distance; if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. 
if(distance < cutoff){ PDistCell value(i, distance); DMatrix->addCell(j, value); } index++; reading->update(index); } } else{ for(int j=0;j<i;j++){ fileHandle >> distance; if (m->control_pressed) { delete reading; fileHandle.close(); return 0; } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff){ int iIndex = countTable->get(matrixNames[i]); int jIndex = countTable->get(matrixNames[j]); if (m->control_pressed) { delete reading; fileHandle.close(); return 0; } if (iIndex < jIndex) { PDistCell value(jIndex, distance); DMatrix->addCell(iIndex, value); }else { PDistCell value(iIndex, distance); DMatrix->addCell(jIndex, value); } } index++; reading->update(index); } } } } else{ reading = new Progress("Reading matrix: ", nseqs * nseqs); int index = nseqs; for(int i=1;i<nseqs;i++){ fileHandle >> name; matrixNames.push_back(name); if(countTable == NULL){ list->set(i, name); for(int j=0;j<nseqs;j++){ fileHandle >> distance; if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. if(distance < cutoff && j < i){ PDistCell value(i, distance); DMatrix->addCell(j, value); } index++; reading->update(index); } } else{ for(int j=0;j<nseqs;j++){ fileHandle >> distance; if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } if (distance == -1) { distance = 1000000; } else if (sim) { distance = 1.0 - distance; } //user has entered a sim matrix that we need to convert. 
if(distance < cutoff && j < i){ int iIndex = countTable->get(matrixNames[i]); int jIndex = countTable->get(matrixNames[j]); if (m->control_pressed) { delete reading; fileHandle.close(); return 0; } if (iIndex < jIndex) { PDistCell value(jIndex, distance); DMatrix->addCell(iIndex, value); }else { PDistCell value(iIndex, distance); DMatrix->addCell(jIndex, value); } } index++; reading->update(index); } } } } if (m->control_pressed) { fileHandle.close(); delete reading; return 0; } reading->finish(); delete reading; list->setLabel("0"); fileHandle.close(); return 1; } catch(exception& e) { m->errorOut(e, "ReadPhylipMatrix", "read"); exit(1); } } /***********************************************************************/ ReadPhylipMatrix::~ReadPhylipMatrix(){} /***********************************************************************/ mothur-1.36.1/source/read/readphylip.h000066400000000000000000000011141255543666200176600ustar00rootroot00000000000000#ifndef READPHYLIP_H #define READPHYLIP_H /* * readphylip.h * Mothur * * Created by Sarah Westcott on 4/21/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "readmatrix.hpp" /******************************************************/ class ReadPhylipMatrix : public ReadMatrix { public: ReadPhylipMatrix(string); ReadPhylipMatrix(string, bool); ~ReadPhylipMatrix(); int read(NameAssignment*); int read(CountTable*); private: ifstream fileHandle; string distFile; }; /******************************************************/ #endif mothur-1.36.1/source/read/readphylipvector.cpp000066400000000000000000000074451255543666200214530ustar00rootroot00000000000000/* * readphylipvector.cpp * mothur * * Created by westcott on 1/11/11. * Copyright 2011 Schloss Lab. All rights reserved. 
 * */ #include "readphylipvector.h" /***********************************************************************/ ReadPhylipVector::ReadPhylipVector(string d) { try { m = MothurOut::getInstance(); distFile = d; } catch(exception& e) { m->errorOut(e, "ReadPhylipVector", "ReadPhylipVector"); exit(1); } } /***********************************************************************/ vector<string> ReadPhylipVector::read(vector< vector<double> >& matrix) { try { vector<string> names; ifstream in; m->openInputFile(distFile, in); //check whether matrix is square char d; int square = 1; int numSeqs; string name; string numTest; in >> numTest >> name; if (!m->isContainingOnlyDigits(numTest)) { m->mothurOut("[ERROR]: expected a number and got " + numTest + ". I suspect you entered a column formatted file as a phylip file, quitting."); m->mothurOutEndLine(); exit(1); } else { convert(numTest, numSeqs); } while((d=in.get()) != EOF){ //is d a number meaning its square if(isalnum(d)){ square = 1; break; } //is d a line return meaning its lower triangle if(d == '\n'){ square = 2; break; } } in.close(); //reopen and read now that you know whether you are square ifstream f; m->openInputFile(distFile, f); int rank; f >> rank; names.resize(rank); matrix.resize(rank); if(square == 1){ for(int i=0;i<rank;i++) { matrix[i].resize(rank); } for(int i=0;i<rank;i++) { f >> names[i]; for(int j=0;j<rank;j++) { if (m->control_pressed) { return names; } f >> matrix[i][j]; if (matrix[i][j] == -0.0000) matrix[i][j] = 0.0000; } } } else if(square == 2){ for(int i=0;i<rank;i++) { matrix[i].resize(rank); } matrix[0][0] = 0.0000; f >> names[0]; for(int i=1;i<rank;i++) { f >> names[i]; matrix[i][i]=0.0000; for(int j=0;j<i;j++) { if (m->control_pressed) { return names; } f >> matrix[i][j]; if (matrix[i][j] == -0.0000) matrix[i][j] = 0.0000; matrix[j][i]=matrix[i][j]; } } } f.close(); return names; } catch(exception& e) { m->errorOut(e, "ReadPhylipVector", "read"); exit(1); } } /***********************************************************************/ vector<string> ReadPhylipVector::read(vector<seqDist>& matrix) { try { vector<string> names; ifstream in; m->openInputFile(distFile, in); //check whether matrix is square char d; int square = 1; 
int numSeqs; string name; in >> numSeqs >> name; while((d=in.get()) != EOF){ //is d a number meaning its square if(isalnum(d)){ square = 1; break; } //is d a line return meaning its lower triangle if(d == '\n'){ square = 2; break; } } in.close(); //reopen and read now that you know whether you are square ifstream f; m->openInputFile(distFile, f); int rank; float temp; f >> rank; names.resize(rank); if(square == 1){ for(int i=0;i<rank;i++) { f >> names[i]; for(int j=0;j<rank;j++) { if (m->control_pressed) { return names; } f >> temp; if (j < i) { //only save lt seqDist dist(i, j, temp); matrix.push_back(dist); } } } } else if(square == 2){ f >> names[0]; for(int i=1;i<rank;i++) { f >> names[i]; for(int j=0;j<i;j++) { if (m->control_pressed) { return names; } f >> temp; seqDist dist(i, j, temp); matrix.push_back(dist); } } } f.close(); return names; } catch(exception& e) { m->errorOut(e, "ReadPhylipVector", "read"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/read/readphylipvector.h000066400000000000000000000014411255543666200211060ustar00rootroot00000000000000#ifndef READPHYLIPVECTOR_H #define READPHYLIPVECTOR_H /* * readphylipvector.h * mothur * * Created by westcott on 1/11/11. * Copyright 2011 Schloss Lab. All rights reserved. * */ #include "mothur.h" #include "mothurout.h" /******************************************************/ class ReadPhylipVector { public: ReadPhylipVector(string); //phylipfile - lt or square ~ReadPhylipVector() {} vector<string> read(vector< vector<double> >&); //pass in matrix to fill with values, returns vector of strings containing names in phylipfile vector<string> read(vector<seqDist>&); //pass in matrix to fill with values, returns vector of strings containing names in phylipfile private: string distFile; MothurOut* m; }; /******************************************************/ #endif mothur-1.36.1/source/read/readtree.cpp000066400000000000000000000270261255543666200176570ustar00rootroot00000000000000/* * readtree.cpp * Mothur * * Created by Sarah Westcott on 1/22/09. 
* Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. * */ #include "readtree.h" /* Special characters to trees: , ) ( ; [ ] : */ /***********************************************************************/ ReadTree::ReadTree() { try { m = MothurOut::getInstance(); } catch(exception& e) { m->errorOut(e, "ReadTree", "ReadTree"); exit(1); } } /***********************************************************************/ int ReadTree::AssembleTrees() { try { //assemble users trees for (int i = 0; i < Trees.size(); i++) { if (m->control_pressed) { return 0; } Trees[i]->assembleTree(); } return 0; } catch(exception& e) { m->errorOut(e, "ReadTree", "AssembleTrees"); exit(1); } } /***********************************************************************/ int ReadTree::readSpecialChar(istream& f, char c, string name) { try { m->gobble(f); char d = f.get(); if(d == EOF){ m->mothurOut("Error: Input file ends prematurely, expecting a " + name + "\n"); exit(1); } if(d != c){ m->mothurOut("Error: Expected " + name + " in input file. 
Found " + toString(d) + ".\n"); exit(1); }
        if(d == ')' && f.peek() == '\n'){ m->gobble(f); }
        return d;
    }
    catch(exception& e) {
        m->errorOut(e, "ReadTree", "readSpecialChar");
        exit(1);
    }
}
/**************************************************************************************************/
int ReadTree::readNodeChar(istream& f) {
    try {
        // while(isspace(d=f.get())) {;}
        m->gobble(f);
        int d = f.get(); //keep the int returned by get() so the EOF test also works where char is unsigned
        if(d == EOF){
            m->mothurOut("Error: Input file ends prematurely, expecting a left parenthesis\n");
            exit(1);
        }
        return d;
    }
    catch(exception& e) {
        m->errorOut(e, "ReadTree", "readNodeChar");
        exit(1);
    }
}
/**************************************************************************************************/
float ReadTree::readBranchLength(istream& f) {
    try {
        float b;

        if(!(f >> b)){
            m->mothurOut("Error: Missing branch length in input tree.\n");
            exit(1);
        }
        m->gobble(f);
        return b;
    }
    catch(exception& e) {
        m->errorOut(e, "ReadTree", "readBranchLength");
        exit(1);
    }
}
/***********************************************************************/
/***********************************************************************/
//Child Classes Below
/***********************************************************************/
/***********************************************************************/
//This class reads a file in Newick form and stores it in a tree.
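// Illustration only, not part of mothur: a minimal sketch of how the special
// characters listed at the top of this file ( , ) ( ; [ ] : ) delimit leaf names
// in a Newick string. The function name newickLeafNames is hypothetical, and the
// sketch assumes unquoted names; it skips branch lengths, internal node labels
// and [ ... ] comments rather than storing them.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper: returns the leaf names of a Newick string in file order.
std::vector<std::string> newickLeafNames(const std::string& newick) {
    std::vector<std::string> names;
    std::string current;
    bool inComment = false;  // true while inside [ ... ]
    bool expectLeaf = true;  // true at the start and right after '(' or ','

    for (std::string::size_type i = 0; i < newick.size(); i++) {
        char c = newick[i];
        if (inComment) { if (c == ']') { inComment = false; } continue; }
        if (c == '[') { inComment = true; continue; }
        if (c == '(') { expectLeaf = true; current.clear(); continue; }
        if (c == ',' || c == ')' || c == ':' || c == ';') {
            // a delimiter ends whatever name was being collected
            if (expectLeaf && !current.empty()) { names.push_back(current); }
            current.clear();
            // only a comma can be followed by another leaf name; text after ')'
            // is an internal label and text after ':' is a branch length
            expectLeaf = (c == ',');
            continue;
        }
        if (std::isspace(static_cast<unsigned char>(c))) { continue; }
        if (expectLeaf) { current += c; }
    }
    return names;
}
```

// For "(A:0.1,(B:0.2,C:0.3)x:0.4,D);" the sketch collects A, B, C and D and
// ignores the internal label x and all branch lengths.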
int ReadNewickTree::read(CountTable* ct) {
    try {
        holder = "";
        int c;
        int error = 0; //stays 0 when the file contains no trees, so the check at the end never reads an uninitialized value
        int comment = 0;

        //if you are not a nexus file
        if ((c = filehandle.peek()) != '#') {
            while((c = filehandle.peek()) != EOF) {
                if (m->control_pressed) { filehandle.close(); return 0; }
                while ((c = filehandle.peek()) != EOF) {
                    if (m->control_pressed) { filehandle.close(); return 0; }
                    // get past comments
                    if(c == '[') { comment = 1; }
                    if(c == ']'){ comment = 0; }
                    if((c == '(') && (comment != 1)){ break; }
                    filehandle.get();
                }

                //make new tree
                T = new Tree(ct);

                numNodes = T->getNumNodes();
                numLeaves = T->getNumLeaves();

                error = readTreeString(ct);

                //save trees for later commands
                Trees.push_back(T);
                m->gobble(filehandle);
            }
        //if you are a nexus file
        }else if ((c = filehandle.peek()) == '#') {
            //get right number of seqs from nexus file.
            Tree* temp = new Tree(ct); delete temp;

            nexusTranslation(ct); //reads file through the translation and updates treemap
            while((c = filehandle.peek()) != EOF) {
                if (m->control_pressed) { filehandle.close(); return 0; }
                // get past comments
                while ((c = filehandle.peek()) != EOF) {
                    if (m->control_pressed) { filehandle.close(); return 0; }
                    if(holder == "[" || holder == "[!"){ comment = 1; }
                    if(holder == "]"){ comment = 0; }
                    if((holder == "tree" || holder == "end;") && comment != 1){ holder = ""; comment = 0; break;}
                    filehandle >> holder;
                }

                //pass over the "tree rep.6878900 = "
                while (((c = filehandle.get()) != '(') && ((c = filehandle.peek()) != EOF) ) {;}

                if (c == EOF ) { break; }
                filehandle.putback(c); //put back first ( of tree.
                //make new tree
                T = new Tree(ct);
                numNodes = T->getNumNodes();
                numLeaves = T->getNumLeaves();

                //read tree info
                error = readTreeString(ct);

                //save trees for later commands
                Trees.push_back(T);
            }
        }

        if (error != 0) { readOk = error; }

        filehandle.close();

        return readOk;
    }
    catch(exception& e) {
        m->errorOut(e, "ReadNewickTree", "read");
        exit(1);
    }
}
/**************************************************************************************************/
//This function reads the file through the translation of the sequence names and updates treemap.
string ReadNewickTree::nexusTranslation(CountTable* ct) {
    try {
        holder = "";
        int numSeqs = m->Treenames.size(); //must save this so that when we clear old names we still know how many sequences there were
        int comment = 0;

        // get past comments
        while(holder != "translate" && holder != "Translate"){
            if(holder == "[" || holder == "[!"){ comment = 1; }
            if(holder == "]"){ comment = 0; }
            filehandle >> holder;
            if(holder == "tree" && comment != 1){return holder;}
        }

        string number, name;
        for(int i=0;i<numSeqs;i++){
            filehandle >> number;
            filehandle >> name;
            name.erase(name.end()-1); //erase the comma
            ct->renameSeq(name, toString(number));
        }

        return name;
    }
    catch(exception& e) {
        m->errorOut(e, "ReadNewickTree", "nexusTranslation");
        exit(1);
    }
}
/**************************************************************************************************/
int ReadNewickTree::readTreeString(CountTable* ct) {
    try {
        int n = 0;
        int lc, rc;
        int rooted = 0;

        int ch = filehandle.peek();

        if(ch == '('){
            n = numLeaves; //number of leaves / sequences, we want node 1 to start where the leaves left off

            lc = readNewickInt(filehandle, n, T, ct);
            if (lc == -1) { m->mothurOut("error with lc"); m->mothurOutEndLine(); m->control_pressed = true; return -1; } //reports an error in reading

            if(filehandle.peek()==','){
                readSpecialChar(filehandle,',',"comma");
            }
            // ';' means end of tree.
            else if((ch=filehandle.peek())==';' || ch=='['){
                rooted = 1;
            }

            if(rooted != 1){
                rc = readNewickInt(filehandle, n, T, ct);
                if (rc == -1) { m->mothurOut("error with rc"); m->mothurOutEndLine(); m->control_pressed = true; return -1; } //reports an error in reading
                if(filehandle.peek() == ')'){
                    readSpecialChar(filehandle,')',"right parenthesis");
                }
            }
        }
        //note: treeclimber had the code below added - not sure why?
        else{
            filehandle.putback(ch);
            char name[MAX_LINE];
            filehandle.get(name, MAX_LINE,'\n');
            SKIPLINE(filehandle, ch);

            n = T->getIndex(name);

            if(n!=0){
                m->mothurOut("Internal error: The only taxon is not taxon 0.\n");
                //exit(1);
                readOk = -1; return -1;
            }
            lc = rc = -1;
        }

        while(((ch=filehandle.get())!=';') && (filehandle.eof() != true)){;}

        if(rooted != 1){
            T->tree[n].setChildren(lc,rc);
            T->tree[n].setBranchLength(0);
            T->tree[n].setParent(-1);
            if(lc!=-1){ T->tree[lc].setParent(n); }
            if(rc!=-1){ T->tree[rc].setParent(n); }
        }

        //T->printTree(); cout << endl;
        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "ReadNewickTree", "readTreeString");
        exit(1);
    }
}
/**************************************************************************************************/
int ReadNewickTree::readNewickInt(istream& f, int& n, Tree* T, CountTable* ct) {
    try {
        if (m->control_pressed) { return -1; }

        int c = readNodeChar(f);

        if(c == '('){

            //to account for multifurcating trees generated by fasttree, we are forcing them to be bifurcating
            //read all children
            vector<int> childrenNodes;
            while(f.peek() != ')'){
                int child = readNewickInt(f, n, T, ct);
                if (child == -1) { return -1; } //reports an error in reading
                //cout << "child = " << child << endl;
                childrenNodes.push_back(child);

                //after a child you either have , or ), check for both
                if(f.peek()==')'){ break; }
                else if (f.peek()==',') { readSpecialChar(f,',',"comma"); }
                else {;}
            }
            //cout << childrenNodes.size() << endl;
            if (childrenNodes.size() < 2) { m->mothurOut("Error in tree, please correct."); m->mothurOutEndLine(); return -1; }

            //then force into 2 node structure
            for (int i = 1; i < childrenNodes.size(); i++) {
                int lc, rc;
                if (i == 1) { lc = childrenNodes[i-1]; rc = childrenNodes[i]; }
                else { lc = n-1; rc = childrenNodes[i]; }
                //cout << i << '\t' << lc << '\t' << rc << endl;
                T->tree[n].setChildren(lc,rc);
                T->tree[lc].setParent(n);
                T->tree[rc].setParent(n);

                //T->printTree(); cout << endl;
                n++;
            }

            //to account for extra ++ in looping
            n--;

            if(f.peek()==')'){
                readSpecialChar(f,')',"right parenthesis");
                //to pass over labels in trees
                c=filehandle.get();
                while((c!=',') && (c != -1) && (c!= ':') && (c!=';')&& (c!=')')){ c=filehandle.get(); }
                filehandle.putback(c);
            }

            if(f.peek() == ':'){
                readSpecialChar(f,':',"colon");

                if(n >= numNodes){ m->mothurOut("Error: Too many nodes in input tree\n"); readOk = -1; return -1; }

                T->tree[n].setBranchLength(readBranchLength(f));
            }else{
                T->tree[n].setBranchLength(0.0);
            }

            return n++;

        }else{
            f.putback(c);
            string name = "";
            char d=f.get();
            while(d != ':' && d != ',' && d!=')' && d!='\n'){
                name += d;
                d=f.get();
            }
            //cout << name << endl;
            int blen = 0;
            if(d == ':') { blen = 1; }

            f.putback(d);

            //set group info
            vector<string> group = ct->getGroups(name);
            //cout << name << endl;
            //find index in tree of name
            int n1 = T->getIndex(name);

            //adds sequence names that are not in group file to the "xxx" group
            if(group.size() == 0) {
                m->mothurOut("Name: " + name + " is not in your groupfile, and will be disregarded. \n"); //readOk = -1; return n1;

                vector<string> currentGroups = ct->getNamesOfGroups();
                if (!m->inUsersGroups("xxx", currentGroups)) { ct->addGroup("xxx"); }
                currentGroups = ct->getNamesOfGroups();
                vector<int> thisCounts; thisCounts.resize(currentGroups.size(), 0);
                for (int h = 0; h < currentGroups.size(); h++) {
                    if (currentGroups[h] == "xxx") { thisCounts[h] = 1; break; }
                }
                ct->push_back(name, thisCounts);

                group.push_back("xxx");
            }

            T->tree[n1].setGroup(group);

            T->tree[n1].setChildren(-1,-1);

            if(blen == 1){
                f.get();
                T->tree[n1].setBranchLength(readBranchLength(f));
            }else{
                T->tree[n1].setBranchLength(0.0);
            }

            while((c=f.get())!=0 && (c != ':' && c != ',' && c!=')') ) {;}

            f.putback(c);

            return n1;
        }
    }
    catch(exception& e) {
        m->errorOut(e, "ReadNewickTree", "readNewickInt");
        exit(1);
    }
}
/**************************************************************************************************/
/**************************************************************************************************/

mothur-1.36.1/source/read/readtree.h000066400000000000000000000027101255543666200173150ustar00rootroot00000000000000
#ifndef READTREE_H
#define READTREE_H
/*
 *  readtree.h
 *  Mothur
 *
 *  Created by Sarah Westcott on 1/22/09.
 *  Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved.
 *
 */

#include "mothur.h"
#include "tree.h"
#include "counttable.h"

#define MAX_LINE 513
#define SKIPLINE(f,c) {while((c=f.get())!=EOF && ((c) != '\n')){}}

class Tree;

/****************************************************************************/

class ReadTree {
    public:
        ReadTree();
        virtual ~ReadTree() {};

        virtual int read(CountTable*) = 0;
        int readSpecialChar(istream&, char, string);
        int readNodeChar(istream& f);
        float readBranchLength(istream& f);

        vector<Tree*> getTrees() { return Trees; }
        int AssembleTrees();

    protected:
        vector<Tree*> Trees;
        CountTable* ct;
        int numNodes, numLeaves;
        MothurOut* m;
};

/****************************************************************************/

class ReadNewickTree : public ReadTree {
    public:
        ReadNewickTree(string file) : treeFile(file) { m->openInputFile(file, filehandle); readOk = 0; }
        ~ReadNewickTree() {};
        int read(CountTable*);

    private:
        Tree* T;
        int readNewickInt(istream&, int&, Tree*, CountTable*);
        int readTreeString(CountTable*);
        string nexusTranslation(CountTable*);
        ifstream filehandle;
        string treeFile;
        string holder;
        int readOk; // readOk = 0 means success, readOk = -1 means errors.
};

/****************************************************************************/

#endif

mothur-1.36.1/source/read/splitmatrix.cpp000066400000000000000000001012571255543666200204430ustar00rootroot00000000000000
/*
 *  splitmatrix.cpp
 *  Mothur
 *
 *  Created by westcott on 5/19/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "splitmatrix.h"
#include "phylotree.h"
#include "distancecommand.h"
#include "seqsummarycommand.h"

/***********************************************************************/
SplitMatrix::SplitMatrix(string distfile, string name, string count, string tax, float c, string t, bool l){
    m = MothurOut::getInstance();
    distFile = distfile;
    cutoff = c;
    namefile = name;
    method = t;
    taxFile = tax;
    countfile = count;
    large = l;
}
/***********************************************************************/
SplitMatrix::SplitMatrix(string ffile, string name, string count, string tax, float c, float cu, string t, int p, bool cl, string output){
    m = MothurOut::getInstance();
    fastafile = ffile;
    namefile = name;
    countfile = count;
    taxFile = tax;
    cutoff = c; //tax level cutoff
    distCutoff = cu; //for fasta method if you are creating distance matrix you need a cutoff for that
    method = t;
    processors = p;
    classic = cl;
    outputDir = output;
}
/***********************************************************************/
int SplitMatrix::split(){
    try {

        if (method == "distance") {
            splitDistance();
        }else if ((method == "classify") || (method == "fasta")) {
            splitClassify();
        }else {
            m->mothurOut("Unknown splitting method, aborting split."); m->mothurOutEndLine();
            map<string, string> temp;
            if (namefile != "") { temp[distFile] = namefile; }
            else { temp[distFile] = countfile; }
            dists.push_back(temp);
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitMatrix", "split");
        exit(1);
    }
}
/***********************************************************************/
int SplitMatrix::splitDistance(){
    try {

        if (large) { splitDistanceLarge(); }
        else { splitDistanceRAM(); }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitMatrix", "splitDistance");
        exit(1);
    }
}
/***********************************************************************/
int SplitMatrix::splitClassify(){
    try {
        cutoff = int(cutoff);

        map<string, int> seqGroup;
        map<string, int>::iterator it;
        map<string, int>::iterator it2;

        int numGroups = 0;

        //build tree from users taxonomy file
        PhyloTree*
        phylo = new PhyloTree();

        map<string, string> temp;
        m->readTax(taxFile, temp, true);

        for (map<string, string>::iterator itTemp = temp.begin(); itTemp != temp.end();) {
            phylo->addSeqToTree(itTemp->first, itTemp->second);
            temp.erase(itTemp++);
        }

        phylo->assignHeirarchyIDs(0);

        //make sure the cutoff is not greater than maxlevel
        if (cutoff > phylo->getMaxLevel()) { m->mothurOut("splitcutoff is greater than the longest taxonomy, using " + toString(phylo->getMaxLevel())); m->mothurOutEndLine(); cutoff = phylo->getMaxLevel(); }

        //for each node in tree
        for (int i = 0; i < phylo->getNumNodes(); i++) {

            //is this node within the cutoff
            TaxNode taxon = phylo->get(i);

            if (taxon.level == cutoff) {//if yes, then create group containing this nodes sequences
                if (taxon.accessions.size() > 1) { //if this taxon just has one seq its a singleton
                    for (int j = 0; j < taxon.accessions.size(); j++) {
                        seqGroup[taxon.accessions[j]] = numGroups;
                    }
                    numGroups++;
                }
            }
        }

        delete phylo;

        if (method == "classify") {
            splitDistanceFileByTax(seqGroup, numGroups);
        }else {
            createDistanceFilesFromTax(seqGroup, numGroups);
        }

        return 0;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitMatrix", "splitClassify");
        exit(1);
    }
}
/***********************************************************************/
int SplitMatrix::createDistanceFilesFromTax(map<string, int>& seqGroup, int numGroups){
    try {

#ifdef USE_MPI
        createDistanceFilesFromTaxMPI(seqGroup, numGroups);
#else
        map<string, int> copyGroups = seqGroup;
        map<string, int>::iterator it;
        set<string> names;

        ifstream in;
        m->openInputFile(fastafile, in);

        //open output files
        map<int, ofstream*> outFiles;
        for (int i = 0; i < numGroups; i++) {
            ofstream* outFile = new ofstream();
            //remove old temp files, just in case
            m->mothurRemove((fastafile + "." + toString(i) + ".temp"));
            m->openOutputFileAppend((fastafile + "."
+ toString(i) + ".temp"), *outFile);
            outFiles[i] = outFile;
        }

        //parse fastafile
        while (!in.eof()) {
            Sequence query(in); m->gobble(in);
            if (query.getName() != "") {

                it = seqGroup.find(query.getName());

                //save names in case no namefile is given
                if ((namefile == "") && (countfile == "")) { names.insert(query.getName()); }

                if (it != seqGroup.end()) { //not singleton
                    query.printSequence(*outFiles[it->second]);
                    copyGroups.erase(query.getName());
                }
            }
        }
        in.close();

        //Close output files
        for (map<int, ofstream*>::iterator it = outFiles.begin(); it != outFiles.end(); it++) {
            it->second->close();
            delete it->second;
            it->second = 0;
        }

        //warn about sequences in groups that are not in fasta file
        for(it = copyGroups.begin(); it != copyGroups.end(); it++) {
            m->mothurOut("ERROR: " + it->first + " is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate an error."); m->mothurOutEndLine();
            exit(1);
        }

        copyGroups.clear();

        //process each distance file
        for (int i = 0; i < numGroups; i++) {

            string options = "";
            if (classic) { options = "fasta=" + (fastafile + "." + toString(i) + ".temp") + ", processors=" + toString(processors) + ", output=lt"; }
            else { options = "fasta=" + (fastafile + "." + toString(i) + ".temp") + ", processors=" + toString(processors) + ", cutoff=" + toString(distCutoff); }
            if (outputDir != "") { options += ", outputdir=" + outputDir; }

            m->mothurCalling = true;
            m->mothurOut("/******************************************/"); m->mothurOutEndLine();
            m->mothurOut("Running command: dist.seqs(" + options + ")"); m->mothurOutEndLine();
            m->mothurCalling = true;

            Command* command = new DistanceCommand(options);

            m->mothurOut("/******************************************/"); m->mothurOutEndLine();

            command->execute();
            delete command;
            m->mothurCalling = false;

            m->mothurRemove((fastafile + "." + toString(i) + ".temp"));

            //remove old names files just in case
            if (namefile != "") { m->mothurRemove((namefile + "."
+ toString(i) + ".temp")); } else { m->mothurRemove((countfile + "." + toString(i) + ".temp")); } } //restore old fasta file name since dist.seqs overwrites it with the temp files m->setFastaFile(fastafile); vector tempDistFiles; for(int i=0;ihasPath(fastafile); } string tempDistFile = ""; if (classic) { tempDistFile = outputDir + m->getRootName(m->getSimpleName((fastafile + "." + toString(i) + ".temp"))) + "phylip.dist";} else { tempDistFile = outputDir + m->getRootName(m->getSimpleName((fastafile + "." + toString(i) + ".temp"))) + "dist"; } tempDistFiles.push_back(tempDistFile); } splitNames(seqGroup, numGroups, tempDistFiles); if (m->control_pressed) { for (int i = 0; i < dists.size(); i++) { m->mothurRemove((dists[i].begin()->first)); m->mothurRemove((dists[i].begin()->second)); } dists.clear(); } #endif return 0; } catch(exception& e) { m->errorOut(e, "SplitMatrix", "createDistanceFilesFromTax"); exit(1); } } /***********************************************************************/ #ifdef USE_MPI int SplitMatrix::createDistanceFilesFromTaxMPI(map& seqGroup, int numGroups){ try { map copyGroups = seqGroup; map::iterator it; set names; for (int i = 0; i < numGroups; i++) { //remove old temp files, just in case m->mothurRemove((fastafile + "." + toString(i) + ".temp")); } int pid, numSeqsPerProcessor; int tag = 2001; MPI_Status status; MPI_Comm_rank(MPI_COMM_WORLD, &pid); //find out who we are MPI_Comm_size(MPI_COMM_WORLD, &processors); if (pid == 0) { ifstream in; m->openInputFile(fastafile, in); //parse fastafile ofstream outFile; while (!in.eof()) { Sequence query(in); m->gobble(in); if (query.getName() != "") { it = seqGroup.find(query.getName()); //save names in case no namefile is given if ((namefile == "") && (countfile == "")) { names.insert(query.getName()); } if (it != seqGroup.end()) { //not singleton m->openOutputFileAppend((fastafile + "." 
+ toString(it->second) + ".temp"), outFile); query.printSequence(outFile); outFile.close(); copyGroups.erase(query.getName()); } } } in.close(); //warn about sequence in groups that are not in fasta file for(it = copyGroups.begin(); it != copyGroups.end(); it++) { m->mothurOut("ERROR: " + it->first + " is missing from your fastafile. This could happen if your taxonomy file is not unique and your fastafile is, or it could indicate and error."); m->mothurOutEndLine(); exit(1); } copyGroups.clear(); } //process each distance file for (int i = 0; i < numGroups; i++) { string options = ""; if (classic) { options = "fasta=" + (fastafile + "." + toString(i) + ".temp") + ", processors=" + toString(processors) + ", output=lt"; } else { options = "fasta=" + (fastafile + "." + toString(i) + ".temp") + ", processors=" + toString(processors) + ", cutoff=" + toString(distCutoff); } if (outputDir != "") { options += ", outputdir=" + outputDir; } m->mothurOut("/******************************************/"); m->mothurOutEndLine(); m->mothurOut("Running command: dist.seqs(" + options + ")"); m->mothurOutEndLine(); Command* command = new DistanceCommand(options); m->mothurOut("/******************************************/"); m->mothurOutEndLine(); command->execute(); delete command; m->mothurRemove((fastafile + "." + toString(i) + ".temp")); //remove old names files just in case if (namefile != "") { m->mothurRemove((namefile + "." + toString(i) + ".temp")); } else { m->mothurRemove((countfile + "." + toString(i) + ".temp")); } } //restore old fasta file name since dist.seqs overwrites it with the temp files m->setFastaFile(fastafile); vector tempDistFiles; for(int i=0;ihasPath(fastafile); } string tempDistFile = ""; if (classic) { tempDistFile = outputDir + m->getRootName(m->getSimpleName((fastafile + "." + toString(i) + ".temp"))) + "phylip.dist";} else { tempDistFile = outputDir + m->getRootName(m->getSimpleName((fastafile + "." 
+ toString(i) + ".temp"))) + "dist"; } tempDistFiles.push_back(tempDistFile); } if (pid == 0) { splitNames(seqGroup, numGroups, tempDistFiles); } if (m->control_pressed) { for (int i = 0; i < dists.size(); i++) { m->mothurRemove((dists[i].begin()->first)); m->mothurRemove((dists[i].begin()->second)); } dists.clear(); } return 0; } catch(exception& e) { m->errorOut(e, "SplitMatrix", "createDistanceFilesFromTax"); exit(1); } } #endif /***********************************************************************/ int SplitMatrix::splitDistanceFileByTax(map& seqGroup, int numGroups){ try { map::iterator it; map::iterator it2; ofstream outFile; ifstream dFile; m->openInputFile(distFile, dFile); for (int i = 0; i < numGroups; i++) { //remove old temp files, just in case m->mothurRemove((distFile + "." + toString(i) + ".temp")); } //for buffering the io to improve speed //allow for 10 dists to be stored, then output. vector outputs; outputs.resize(numGroups, ""); vector numOutputs; numOutputs.resize(numGroups, 0); //you can have a group made, but their may be no distances in the file for this group if the taxonomy file and distance file don't match //this can occur if we have converted the phylip to column, since we reduce the size at that step by using the cutoff value vector validDistances; validDistances.resize(numGroups, false); //for each distance while(dFile){ string seqA, seqB; float dist; if (m->control_pressed) { dFile.close(); for (int i = 0; i < numGroups; i++) { m->mothurRemove((distFile + "." + toString(i) + ".temp")); } } dFile >> seqA >> seqB >> dist; m->gobble(dFile); //if both sequences are in the same group then they are within the cutoff it = seqGroup.find(seqA); it2 = seqGroup.find(seqB); if ((it != seqGroup.end()) && (it2 != seqGroup.end())) { //they are both not singletons if (it->second == it2->second) { //they are from the same group so add the distance if (numOutputs[it->second] > 30) { m->openOutputFileAppend((distFile + "." 
+ toString(it->second) + ".temp"), outFile); outFile << outputs[it->second] << seqA << '\t' << seqB << '\t' << dist << endl; outFile.close(); outputs[it->second] = ""; numOutputs[it->second] = 0; validDistances[it->second] = true; }else{ outputs[it->second] += seqA + '\t' + seqB + '\t' + toString(dist) + '\n'; numOutputs[it->second]++; } } } } dFile.close(); string inputFile = namefile; if (countfile != "") { inputFile = countfile; } vector tempDistFiles; for (int i = 0; i < numGroups; i++) { //remove old temp files, just in case string tempDistFile = distFile + "." + toString(i) + ".temp"; tempDistFiles.push_back(tempDistFile); m->mothurRemove((inputFile + "." + toString(i) + ".temp")); //write out any remaining buffers if (numOutputs[i] > 0) { m->openOutputFileAppend((distFile + "." + toString(i) + ".temp"), outFile); outFile << outputs[i]; outFile.close(); outputs[i] = ""; numOutputs[i] = 0; validDistances[i] = true; } } splitNames(seqGroup, numGroups, tempDistFiles); if (m->control_pressed) { for (int i = 0; i < dists.size(); i++) { m->mothurRemove((dists[i].begin()->first)); m->mothurRemove((dists[i].begin()->second)); } dists.clear(); } return 0; } catch(exception& e) { m->errorOut(e, "SplitMatrix", "splitDistanceFileByTax"); exit(1); } } /***********************************************************************/ int SplitMatrix::splitDistanceLarge(){ try { vector > groups; //for buffering the io to improve speed //allow for 30 dists to be stored, then output. vector outputs; vector numOutputs; vector wroteOutPut; int numGroups = 0; //ofstream outFile; ifstream dFile; m->openInputFile(distFile, dFile); while(dFile){ string seqA, seqB; float dist; dFile >> seqA >> seqB >> dist; if (m->control_pressed) { dFile.close(); for(int i=0;i 0){ m->mothurRemove((distFile + "." 
+ toString(i) + ".temp")); } } return 0; } if(dist < cutoff){ //cout << "in cutoff: " << dist << endl; int groupIDA = -1; int groupIDB = -1; int groupID = -1; for(int i=0;i::iterator aIt = groups[i].find(seqA); set::iterator bIt = groups[i].find(seqB); if(groupIDA == -1 && aIt != groups[i].end()){//seqA is not already assigned to a group and is in group[i], so assign seqB to group[i] groups[i].insert(seqB); groupIDA = i; groupID = groupIDA; //cout << "in aIt: " << groupID << endl; // break; } else if(groupIDB == -1 && bIt != groups[i].end()){//seqB is not already assigned to a group and is in group[i], so assign seqA to group[i] groups[i].insert(seqA); groupIDB = i; groupID = groupIDB; // cout << "in bIt: " << groupID << endl; // break; } if(groupIDA != -1 && groupIDB != -1){//both ifs above have been executed, so we need to decide who to assign them to if(groupIDA < groupIDB){ // cout << "A: " << groupIDA << "\t" << groupIDB << endl; groups[groupIDA].insert(groups[groupIDB].begin(), groups[groupIDB].end()); //merge two groups into groupIDA groups[groupIDB].clear(); groupID = groupIDA; } else{ // cout << "B: " << groupIDA << "\t" << groupIDB << endl; groups[groupIDB].insert(groups[groupIDA].begin(), groups[groupIDA].end()); //merge two groups into groupIDB groups[groupIDA].clear(); groupID = groupIDB; } break; } } //windows is gonna gag on the reuse of outFile, will need to make it local... if(groupIDA == -1 && groupIDB == -1){ //we need a new group set newGroup; newGroup.insert(seqA); newGroup.insert(seqB); groups.push_back(newGroup); string tempOut = seqA + '\t' + seqB + '\t' + toString(dist) + '\n'; outputs.push_back(tempOut); numOutputs.push_back(1); wroteOutPut.push_back(false); numGroups++; } else{ string fileName = distFile + "." 
+ toString(groupID) + ".temp"; //have we reached the max buffer size if (numOutputs[groupID] > 60) { //write out sequence ofstream outFile; outFile.open(fileName.c_str(), ios::app); outFile << outputs[groupID] << seqA << '\t' << seqB << '\t' << dist << endl; outFile.close(); outputs[groupID] = ""; numOutputs[groupID] = 0; wroteOutPut[groupID] = true; }else { outputs[groupID] += seqA + '\t' + seqB + '\t' + toString(dist) + '\n'; numOutputs[groupID]++; } if(groupIDA != -1 && groupIDB != -1){ //merge distance files of two groups you merged above string row, column, distance; if(groupIDAappendFiles(fileName2, fileName); m->mothurRemove(fileName2); //write out the merged memory if (numOutputs[groupID] > 60) { ofstream tempOut; m->openOutputFile(fileName, tempOut); tempOut << outputs[groupID]; outputs[groupID] = ""; numOutputs[groupID] = 0; tempOut.close(); } //outFile.close(); wroteOutPut[groupID] = true; wroteOutPut[groupIDB] = false; }else{ } //just merge b's memory with a's memory } else{ numOutputs[groupID] += numOutputs[groupIDA]; outputs[groupID] += outputs[groupIDA]; outputs[groupIDA] = ""; numOutputs[groupIDA] = 0; if (wroteOutPut[groupIDA]) { string fileName2 = distFile + "." + toString(groupIDA) + ".temp"; /*ifstream fileB(fileName2.c_str(), ios::ate); outFile.open(fileName.c_str(), ios::app); long size; char* memblock; size = fileB.tellg(); fileB.seekg (0, ios::beg); int numRead = size / 1024; int lastRead = size % 1024; for (int i = 0; i < numRead; i++) { memblock = new char [1024]; fileB.read (memblock, 1024); string temp = memblock; outFile << temp.substr(0, 1024); delete memblock; } memblock = new char [lastRead]; fileB.read (memblock, lastRead); //not sure why but it will read more than lastRead char...?? 
string temp = memblock; outFile << temp.substr(0, lastRead); delete memblock; fileB.close();*/ m->appendFiles(fileName2, fileName); m->mothurRemove(fileName2); //write out the merged memory if (numOutputs[groupID] > 60) { ofstream tempOut; m->openOutputFile(fileName, tempOut); tempOut << outputs[groupID]; outputs[groupID] = ""; numOutputs[groupID] = 0; tempOut.close(); } //outFile.close(); wroteOutPut[groupID] = true; wroteOutPut[groupIDA] = false; }else { } //just merge memory } } } } m->gobble(dFile); } dFile.close(); vector tempDistFiles; for (int i = 0; i < numGroups; i++) { string fileName = distFile + "." + toString(i) + ".temp"; tempDistFiles.push_back(fileName); //remove old names files just in case if (numOutputs[i] > 0) { ofstream outFile; outFile.open(fileName.c_str(), ios::app); outFile << outputs[i]; outFile.close(); } } map seqGroup; for (int i = 0; i < groups.size(); i++) { for (set::iterator itNames = groups[i].begin(); itNames != groups[i].end();) { seqGroup[*itNames] = i; groups[i].erase(itNames++); } } splitNames(seqGroup, numGroups, tempDistFiles); return 0; } catch(exception& e) { m->errorOut(e, "SplitMatrix", "splitDistanceLarge"); exit(1); } } //******************************************************************************************************************** int SplitMatrix::splitNames(map& seqGroup, int numGroups, vector& tempDistFiles){ try { ofstream outFile; map::iterator it; string inputFile = namefile; if (countfile != "") { inputFile = countfile; } for(int i=0;imothurRemove((inputFile + "." 
+ toString(i) + ".temp")); } singleton = inputFile + ".extra.temp"; ofstream remainingNames; m->openOutputFile(singleton, remainingNames); bool wroteExtra = false; ifstream bigNameFile; m->openInputFile(inputFile, bigNameFile); //grab header line string headers = ""; if (countfile != "") { headers = m->getline(bigNameFile); m->gobble(bigNameFile); } string name, nameList; while(!bigNameFile.eof()){ bigNameFile >> name >> nameList; m->getline(bigNameFile); m->gobble(bigNameFile); //extra getline is for rest of countfile line if groups are given. //did this sequence get assigned a group it = seqGroup.find(name); if (it != seqGroup.end()) { m->openOutputFileAppend((inputFile + "." + toString(it->second) + ".temp"), outFile); outFile << name << '\t' << nameList << endl; outFile.close(); }else{ wroteExtra = true; remainingNames << name << '\t' << nameList << endl; } } bigNameFile.close(); for(int i=0;igobble(fileHandle); if (!fileHandle.eof()) { //check map temp; if (countfile != "") { //add header ofstream out; string newtempNameFile = tempNameFile + "2"; m->openOutputFile(newtempNameFile, out); out << "Representative_Sequence\ttotal" << endl; out.close(); m->appendFiles(tempNameFile, newtempNameFile); m->mothurRemove(tempNameFile); m->renameFile(newtempNameFile, tempNameFile); } temp[tempDistFile] = tempNameFile; dists.push_back(temp); }else{ ifstream in; m->openInputFile(tempNameFile, in); while(!in.eof()) { in >> name >> nameList; m->gobble(in); wroteExtra = true; remainingNames << name << '\t' << nameList << endl; } in.close(); m->mothurRemove(tempNameFile); } } fileHandle.close(); } remainingNames.close(); if (!wroteExtra) { m->mothurRemove(singleton); singleton = "none"; }else if (countfile != "") { //add header ofstream out; string newtempNameFile = singleton + "2"; m->openOutputFile(newtempNameFile, out); out << "Representative_Sequence\ttotal" << endl; out.close(); m->appendFiles(singleton, newtempNameFile); m->mothurRemove(singleton); 
m->renameFile(newtempNameFile, singleton); } return 0; } catch(exception& e) { m->errorOut(e, "SplitMatrix", "splitNames"); exit(1); } } //******************************************************************************************************************** int SplitMatrix::splitDistanceRAM(){ try { vector > groups; vector outputs; int numGroups = 0; ifstream dFile; m->openInputFile(distFile, dFile); while(dFile){ string seqA, seqB; float dist; dFile >> seqA >> seqB >> dist; if (m->control_pressed) { dFile.close(); for(int i=0;i 0){ m->mothurRemove((distFile + "." + toString(i) + ".temp")); } } return 0; } if(dist < cutoff){ //cout << "in cutoff: " << dist << endl; int groupIDA = -1; int groupIDB = -1; int groupID = -1; for(int i=0;i::iterator aIt = groups[i].find(seqA); set::iterator bIt = groups[i].find(seqB); if(groupIDA == -1 && aIt != groups[i].end()){//seqA is not already assigned to a group and is in group[i], so assign seqB to group[i] groups[i].insert(seqB); groupIDA = i; groupID = groupIDA; //cout << "in aIt: " << groupID << endl; // break; } else if(groupIDB == -1 && bIt != groups[i].end()){//seqB is not already assigned to a group and is in group[i], so assign seqA to group[i] groups[i].insert(seqA); groupIDB = i; groupID = groupIDB; // cout << "in bIt: " << groupID << endl; // break; } if(groupIDA != -1 && groupIDB != -1){//both ifs above have been executed, so we need to decide who to assign them to if(groupIDA < groupIDB){ // cout << "A: " << groupIDA << "\t" << groupIDB << endl; groups[groupIDA].insert(groups[groupIDB].begin(), groups[groupIDB].end()); //merge two groups into groupIDA groups[groupIDB].clear(); groupID = groupIDA; } else{ // cout << "B: " << groupIDA << "\t" << groupIDB << endl; groups[groupIDB].insert(groups[groupIDA].begin(), groups[groupIDA].end()); //merge two groups into groupIDB groups[groupIDA].clear(); groupID = groupIDB; } break; } } //windows is gonna gag on the reuse of outFile, will need to make it local... 
if(groupIDA == -1 && groupIDB == -1){ //we need a new group set newGroup; newGroup.insert(seqA); newGroup.insert(seqB); groups.push_back(newGroup); string tempOut = seqA + '\t' + seqB + '\t' + toString(dist) + '\n'; outputs.push_back(tempOut); numGroups++; } else{ outputs[groupID] += seqA + '\t' + seqB + '\t' + toString(dist) + '\n'; if(groupIDA != -1 && groupIDB != -1){ //merge distance files of two groups you merged above string row, column, distance; if(groupIDAgobble(dFile); } dFile.close(); vector tempDistFiles; for (int i = 0; i < numGroups; i++) { string fileName = distFile + "." + toString(i) + ".temp"; tempDistFiles.push_back(fileName); if (outputs[i] != "") { ofstream outFile; outFile.open(fileName.c_str(), ios::ate); outFile << outputs[i]; outFile.close(); } } map seqGroup; for (int i = 0; i < groups.size(); i++) { for (set::iterator itNames = groups[i].begin(); itNames != groups[i].end();) { seqGroup[*itNames] = i; groups[i].erase(itNames++); } } splitNames(seqGroup, numGroups, tempDistFiles); return 0; } catch(exception& e) { m->errorOut(e, "SplitMatrix", "splitDistanceRAM"); exit(1); } } //******************************************************************************************************************** //sorts biggest to smallest inline bool compareFileSizes(map left, map right){ FILE * pFile; long leftsize = 0; //get num bytes in file string filename = left.begin()->first; pFile = fopen (filename.c_str(),"rb"); string error = "Error opening " + filename; if (pFile==NULL) perror (error.c_str()); else{ fseek (pFile, 0, SEEK_END); leftsize=ftell (pFile); fclose (pFile); } FILE * pFile2; long rightsize = 0; //get num bytes in file filename = right.begin()->first; pFile2 = fopen (filename.c_str(),"rb"); error = "Error opening " + filename; if (pFile2==NULL) perror (error.c_str()); else{ fseek (pFile2, 0, SEEK_END); rightsize=ftell (pFile2); fclose (pFile2); } return (leftsize > rightsize); } 
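/*
 * Illustrative sketch, not part of mothur. splitDistanceRAM above places two
 * sequences in the same group whenever their distance is below the cutoff, and
 * merges two existing groups when a pair links them. The source does this by
 * scanning a vector of name sets. The sketch below expresses the same grouping
 * rule with a different technique, a disjoint-set (union-find) structure.
 * The class name NameGroups and its methods are hypothetical.
 */

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (assumption, not mothur API): disjoint-set over sequence names.
class NameGroups {
public:
    // Record that two sequences are closer than the cutoff, so they share a group.
    // If they were in different groups, the two groups are merged.
    void link(const std::string& a, const std::string& b) {
        int ra = find(idOf(a));
        int rb = find(idOf(b));
        if (ra != rb) { parent[rb] = ra; }
    }

    // True only if both names have been seen and belong to the same group.
    bool sameGroup(const std::string& a, const std::string& b) {
        std::map<std::string, int>::iterator ia = ids.find(a);
        std::map<std::string, int>::iterator ib = ids.find(b);
        if (ia == ids.end() || ib == ids.end()) { return false; }
        return find(ia->second) == find(ib->second);
    }

    // Number of distinct groups among the names seen so far.
    int numGroups() {
        int n = 0;
        for (int i = 0; i < (int)parent.size(); i++) {
            if (find(i) == i) { n++; }
        }
        return n;
    }

private:
    std::map<std::string, int> ids;  // sequence name -> index into parent
    std::vector<int> parent;         // parent[i] == i marks a group root

    // Look up a name, registering it as its own group on first sight.
    int idOf(const std::string& name) {
        std::map<std::string, int>::iterator it = ids.find(name);
        if (it != ids.end()) { return it->second; }
        int id = (int)parent.size();
        ids[name] = id;
        parent.push_back(id);
        return id;
    }

    // Find the root of x, halving the path as we go.
    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }
};
```

/*
 * Usage, under the same assumptions: call link(seqA, seqB) for every row of the
 * column-formatted distance file whose distance is below the cutoff. Names that
 * are never linked stay outside every group, which corresponds to the singleton
 * names that splitNames writes to the ".extra.temp" file.
 */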
/***********************************************************************/
//returns map of distance files -> namefile sorted by distance file size
vector< map< string, string> > SplitMatrix::getDistanceFiles(){
    try {

        sort(dists.begin(), dists.end(), compareFileSizes);

        return dists;
    }
    catch(exception& e) {
        m->errorOut(e, "SplitMatrix", "getDistanceFiles");
        exit(1);
    }
}
/***********************************************************************/
SplitMatrix::~SplitMatrix(){}
/***********************************************************************/
mothur-1.36.1/source/read/splitmatrix.h000066400000000000000000000031131255543666200201000ustar00rootroot00000000000000
#ifndef SPLITMATRIX_H
#define SPLITMATRIX_H
/*
 *  splitmatrix.h
 *  Mothur
 *
 *  Created by westcott on 5/19/10.
 *  Copyright 2010 Schloss Lab. All rights reserved.
 *
 */

#include "mothur.h"
#include "mothurout.h"

/******************************************************/

class SplitMatrix {

    public:

        SplitMatrix(string, string, string, string, float, string, bool); //column formatted distance file, namesfile, countfile, cutoff, method, large
        SplitMatrix(string, string, string, string, float, float, string, int, bool, string); //fastafile, namefile, countfile, taxFile, taxcutoff, cutoff, method, processors, classic, outputDir

        ~SplitMatrix();
        int split();
        vector< map<string, string> > getDistanceFiles(); //returns map of distance files -> namefile sorted by distance file size
        string getSingletonNames() { return singleton; } //returns namesfile containing singletons

    private:
        MothurOut* m;

        string distFile, namefile, singleton, method, taxFile, fastafile, outputDir, countfile;
        vector< map< string, string> > dists;
        float cutoff, distCutoff;
        bool large, classic;
        int processors;

        int splitDistance();
        int splitClassify();
        int splitDistanceLarge();
        int splitDistanceRAM();
        int splitNames(map<string, int>& groups, int, vector<string>&);
        int splitDistanceFileByTax(map<string, int>&, int);
        int createDistanceFilesFromTax(map<string, int>&, int);
    #ifdef USE_MPI
        int
createDistanceFilesFromTaxMPI(map<string, int>& seqGroup, int numGroups);
    #endif

};

/******************************************************/

#endif
mothur-1.36.1/source/read/treereader.cpp000066400000000000000000000104131255543666200201760ustar00rootroot00000000000000
//
//  treereader.cpp
//  Mothur
//
//  Created by Sarah Westcott on 4/11/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "treereader.h"
#include "readtree.h"
#include "groupmap.h"

/***********************************************************************/
TreeReader::TreeReader(string tf, string cf) : treefile(tf), countfile(cf) {
    try {
        m = MothurOut::getInstance();
        ct = new CountTable();
        ct->readTable(cf, true, false);

        //if no groupinfo in count file we need to add it
        if (!ct->hasGroupInfo()) {
            ct->addGroup("Group1");
            vector<string> namesOfSeqs = ct->getNamesOfSeqs();
            for (int i = 0; i < namesOfSeqs.size(); i++) {
                ct->setAbund(namesOfSeqs[i], "Group1", ct->getNumSeqs(namesOfSeqs[i]));
            }
        }
        namefile = "";
        groupfile = "";
        readTrees();
    }
    catch(exception& e) {
        m->errorOut(e, "TreeReader", "TreeReader");
        exit(1);
    }
}
/***********************************************************************/
TreeReader::TreeReader(string tf, string gf, string nf) : treefile(tf), groupfile(gf), namefile(nf) {
    try {
        m = MothurOut::getInstance();
        countfile = "";
        ct = new CountTable();
        if (namefile != "") { ct->createTable(namefile, groupfile, true); }
        else {
            Tree* tree = new Tree(treefile); delete tree; //extracts names from tree to make faked out groupmap
            set<string> nameMap;
            map<string, string> groupMap;
            set<string> gps;
            for (int i = 0; i < m->Treenames.size(); i++) { nameMap.insert(m->Treenames[i]); }
            if (groupfile == "") {
                gps.insert("Group1");
                for (int i = 0; i < m->Treenames.size(); i++) { groupMap[m->Treenames[i]] = "Group1"; }
            }
            else {
                GroupMap g(groupfile);
                g.readMap();
                vector<string> seqs = g.getNamesSeqs();
                for (int i = 0; i < seqs.size(); i++) {
                    string group = g.getGroup(seqs[i]);
                    groupMap[seqs[i]] = group;
                    gps.insert(group);
                }
            }
            ct->createTable(nameMap,
groupMap, gps);
        }

        readTrees();
    }
    catch(exception& e) {
        m->errorOut(e, "TreeReader", "TreeReader");
        exit(1);
    }
}
/***********************************************************************/
bool TreeReader::readTrees() {
    try {

        int numUniquesInName = ct->getNumUniqueSeqs();
        //if (namefile != "") { numUniquesInName = readNamesFile(); }

        ReadTree* read = new ReadNewickTree(treefile);
        int readOk = read->read(ct);

        if (readOk != 0) { m->mothurOut("Read Terminated."); m->mothurOutEndLine(); delete read; m->control_pressed=true; return 0; }

        read->AssembleTrees();
        trees = read->getTrees();
        delete read;

        //make sure all files match
        //if you provide a namefile, we use the number of names in the namefile as long as the number of uniques matches the number of tree names.
        int numNamesInTree;
        if (namefile != "") {
            if (numUniquesInName == m->Treenames.size()) { numNamesInTree = ct->getNumSeqs(); }
            else { numNamesInTree = m->Treenames.size(); }
        }else { numNamesInTree = m->Treenames.size(); }

        //output any names that are in group file but not in tree
        if (numNamesInTree < ct->getNumSeqs()) {
            vector<string> namesSeqsCt = ct->getNamesOfSeqs();
            for (int i = 0; i < namesSeqsCt.size(); i++) {
                //is that name in the tree?
                int count = 0;
                for (int j = 0; j < m->Treenames.size(); j++) {
                    if (namesSeqsCt[i] == m->Treenames[j]) { break; } //found it
                    count++;
                }

                if (m->control_pressed) { for (int i = 0; i < trees.size(); i++) { delete trees[i]; } return 0; }

                //then you did not find it so report it
                if (count == m->Treenames.size()) {
                    m->mothurOut(namesSeqsCt[i] + " is in your name or group file and not in your tree. It will be disregarded."); m->mothurOutEndLine();
                    ct->remove(namesSeqsCt[i]);
                }
            }
        }

        return true;
    }
    catch(exception& e) {
        m->errorOut(e, "TreeReader", "readTrees");
        exit(1);
    }
}
/***********************************************************************/
mothur-1.36.1/source/read/treereader.h000066400000000000000000000013671255543666200176530ustar00rootroot00000000000000
#ifndef Mothur_treereader_h
#define Mothur_treereader_h

//
//  treereader.h
//  Mothur
//
//  Created by Sarah Westcott on 4/11/12.
//  Copyright (c) 2012 Schloss Lab. All rights reserved.
//

#include "mothurout.h"
#include "tree.h"
#include "counttable.h"

class TreeReader {

public:

    TreeReader(string tf, string cf);
    TreeReader(string tf, string gf, string nf);
    ~TreeReader() {}

    vector<Tree*> getTrees() { return trees; }

private:
    MothurOut* m;
    vector<Tree*> trees;
    CountTable* ct;
    //map nameMap; //dupName -> uniqueName
    // map names;

    string treefile, groupfile, namefile, countfile;

    bool readTrees();
    int readNamesFile();
};

#endif
mothur-1.36.1/source/refchimeratest.cpp000066400000000000000000000471631255543666200201620ustar00rootroot00000000000000
/*
 *  refchimeratest.cpp
 *  Mothur
 *
 *  Created by Pat Schloss on 1/31/11.
 *  Copyright 2011 Schloss Lab. All rights reserved.
* */ #include "refchimeratest.h" #include "mothur.h" int MAXINT = numeric_limits::max(); //*************************************************************************************************************** RefChimeraTest::RefChimeraTest(vector& refs, bool aligned) : aligned(aligned){ m = MothurOut::getInstance(); numRefSeqs = refs.size(); referenceSeqs.resize(numRefSeqs); referenceNames.resize(numRefSeqs); for(int i=0;ierrorOut(e, "RefChimeraTest", "printHeader"); exit(1); } } //*************************************************************************************************************** int RefChimeraTest::analyzeQuery(string queryName, string querySeq, ofstream& chimeraReportFile){ int numParents = -1; if(aligned){ numParents = analyzeAlignedQuery(queryName, querySeq, chimeraReportFile); } else{ numParents = analyzeUnalignedQuery(queryName, querySeq, chimeraReportFile); } return numParents; } //*************************************************************************************************************** int RefChimeraTest::analyzeAlignedQuery(string queryName, string querySeq, ofstream& chimeraReportFile){ vector > left; left.resize(numRefSeqs); vector > right; right.resize(numRefSeqs); vector singleLeft, bestLeft; vector singleRight, bestRight; for(int i=0;i= 3){// || (minMismatchToChimera == 0 && bestSequenceMismatch != 0)){ nMera = 2; chimeraRefSeq = stitchBimera(leftParentBi, rightParentBi, breakPointBi); } else{ nMera = 1; chimeraRefSeq = referenceSeqs[bestMatchIndex]; } bestRefAlignment = chimeraRefSeq; bestQueryAlignment = querySeq; double distToChimera = calcDistToChimera(bestQueryAlignment, bestRefAlignment); chimeraReportFile << queryName << '\t' << referenceNames[bestMatchIndex] << '\t' << bestSequenceMismatch << '\t'; chimeraReportFile << referenceNames[leftParentBi] << ',' << referenceNames[rightParentBi] << '\t' << breakPointBi << '\t'; chimeraReportFile << minMismatchToChimera << '\t'; chimeraReportFile << '\t' << distToChimera << '\t' << nMera << 
endl; bestMatch = bestMatchIndex; return nMera; } //*************************************************************************************************************** int RefChimeraTest::analyzeUnalignedQuery(string queryName, string querySeq, ofstream& chimeraReportFile){ int nMera = 0; int seqLength = querySeq.length(); vector queryAlign; queryAlign.resize(numRefSeqs); vector refAlign; refAlign.resize(numRefSeqs); vector > leftDiffs; leftDiffs.resize(numRefSeqs); vector > rightDiffs; rightDiffs.resize(numRefSeqs); vector > leftMaps; leftMaps.resize(numRefSeqs); vector > rightMaps; rightMaps.resize(numRefSeqs); int bestRefIndex = -1; int bestRefDiffs = numeric_limits::max(); double bestRefLength = 0; for(int i=0;i= 3){ for(int i=0;i singleLeft(seqLength, numeric_limits::max()); vector bestLeft(seqLength, -1); for(int l=0;l singleRight(seqLength, numeric_limits::max()); vector bestRight(seqLength, -1); for(int l=0;l::max(); int leftParent = 0; int rightParent = 0; int breakPoint = 0; for(int l=0;l= 3){// || (minMismatchToChimera == 0 && bestSequenceMismatch != 0)){ nMera = 2; int breakLeft = leftMaps[leftParent][breakPoint]; int breakRight = rightMaps[rightParent][rightMaps[rightParent].size() - breakPoint - 2]; string left = refAlign[leftParent]; string right = refAlign[rightParent]; for(int i=0;i<=breakLeft;i++){ if (m->control_pressed) { return 0; } if(left[i] != '-' && left[i] != '.'){ reference += left[i]; } } for(int i=breakRight;icontrol_pressed) { return 0; } if(right[i] != '-' && right[i] != '.'){ reference += right[i]; } } } else{ nMera = 1; reference = referenceSeqs[bestRefIndex]; } double alignLength; double finalDiffs = alignQueryToReferences(querySeq, reference, bestQueryAlignment, bestRefAlignment, alignLength); double finalDistance = finalDiffs / alignLength; chimeraReportFile << queryName << '\t' << referenceNames[bestRefIndex] << '\t' << bestRefDiffs << '\t'; chimeraReportFile << referenceNames[leftParent] << ',' << referenceNames[rightParent] << 
'\t' << breakPoint << '\t'; chimeraReportFile << bestChimeraMismatches << '\t'; chimeraReportFile << '\t' << finalDistance << '\t' << nMera << endl; } else{ bestQueryAlignment = queryAlign[bestRefIndex]; bestRefAlignment = refAlign[bestRefIndex]; nMera = 1; chimeraReportFile << queryName << '\t' << referenceNames[bestRefIndex] << '\t' << bestRefDiffs << '\t'; chimeraReportFile << "NA\tNA\tNA\tNA\t1" << endl; } bestMatch = bestRefIndex; return nMera; } /**************************************************************************************************/ double RefChimeraTest::alignQueryToReferences(string query, string reference, string& qAlign, string& rAlign, double& length){ try { double GAP = -5; double MATCH = 1; double MISMATCH = -1; int queryLength = query.length(); int refLength = reference.length(); vector > alignMatrix; alignMatrix.resize(queryLength + 1); vector > alignMoves; alignMoves.resize(queryLength + 1); for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i].resize(refLength + 1, 0); alignMoves[i].resize(refLength + 1, 'x'); } for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i][0] = 0;//GAP * i; alignMoves[i][0] = 'u'; } for(int i=0;i<=refLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[0][i] = 0;//GAP * i; alignMoves[0][i] = 'l'; } for(int i=1;i<=queryLength;i++){ if (m->control_pressed) { return 0; } for(int j=1;j<=refLength;j++){ double nogapScore; if(query[i-1] == reference[j-1]){ nogapScore = alignMatrix[i-1][j-1] + MATCH; } else { nogapScore = alignMatrix[i-1][j-1] + MISMATCH; } double leftScore; if(i == queryLength) { leftScore = alignMatrix[i][j-1]; } else { leftScore = alignMatrix[i][j-1] + GAP; } double upScore; if(j == refLength) { upScore = alignMatrix[i-1][j]; } else { upScore = alignMatrix[i-1][j] + GAP; } if(nogapScore > leftScore){ if(nogapScore > upScore){ alignMoves[i][j] = 'd'; alignMatrix[i][j] = nogapScore; } else{ alignMoves[i][j] = 'u'; 
alignMatrix[i][j] = upScore; } } else{ if(leftScore > upScore){ alignMoves[i][j] = 'l'; alignMatrix[i][j] = leftScore; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = upScore; } } } } int end = refLength - 1; int maxRow = 0; double maxRowValue = -2147483647; for(int i=0;i maxRowValue){ maxRow = i; maxRowValue = alignMatrix[i][end]; } } end = queryLength - 1; int maxColumn = 0; double maxColumnValue = -2147483647; for(int j=0;j maxColumnValue){ maxColumn = j; maxColumnValue = alignMatrix[end][j]; } } int row = queryLength-1; int column = refLength-1; if(maxColumn == column && maxRow == row){} // if the max values are the lower right corner, then we're good else if(alignMatrix[row][maxColumn] < alignMatrix[maxRow][column]){ for(int i=maxRow+1;i 0 && j > 0){ if (m->control_pressed) { return 0; } if(alignMoves[i][j] == 'd'){ qAlign = query[i-1] + qAlign; rAlign = reference[j-1] + rAlign; if(query[i-1] != reference[j-1]){ diffs++; } length++; i--; j--; } else if(alignMoves[i][j] == 'u'){ qAlign = query[i-1] + qAlign; if(j != refLength) { rAlign = '-' + rAlign; diffs++; length++; } else { rAlign = '.' + rAlign; } i--; } else if(alignMoves[i][j] == 'l'){ rAlign = reference[j-1] + rAlign; if(i != queryLength){ qAlign = '-' + qAlign; diffs++; length++; } else { qAlign = '.' 
+ qAlign; } j--; } } if(i>0){ qAlign = query.substr(0, i) + qAlign; rAlign = string(i, '.') + rAlign; } else if(j>0){ qAlign = string(j, '.') + qAlign; rAlign = reference.substr(0, j) + rAlign; } return diffs; } catch(exception& e) { m->errorOut(e, "RefChimeraTest", "alignQueryToReferences"); exit(1); } } /**************************************************************************************************/ int RefChimeraTest::getUnalignedDiffs(string qAlign, string rAlign, vector& leftDiffs, vector& leftMap, vector& rightDiffs, vector& rightMap){ try { int alignLength = qAlign.length(); int lDiffs = 0; int lCount = 0; for(int l=0;lcontrol_pressed) { return 0; } if(qAlign[l] == '-'){ lDiffs++; } else if(qAlign[l] != '.'){ if(rAlign[l] == '-'){ lDiffs++; } else if(qAlign[l] != rAlign[l]){;// && rAlign[l] != '.'){ lDiffs++; } leftDiffs[lCount] = lDiffs; leftMap[lCount] = l; lCount++; } } int rDiffs = 0; int rCount = 0; for(int l=alignLength-1;l>=0;l--){ if (m->control_pressed) { return 0; } if(qAlign[l] == '-'){ rDiffs++; } else if(qAlign[l] != '.'){ if(rAlign[l] == '-'){ rDiffs++; } else if(qAlign[l] != rAlign[l]){;// && rAlign[l] != '.'){ rDiffs++; } rightDiffs[rCount] = rDiffs; rightMap[rCount] = l; rCount++; } } return 0; } catch(exception& e) { m->errorOut(e, "RefChimeraTest", "getUnalignedDiffs"); exit(1); } } /**************************************************************************************************/ int RefChimeraTest::getAlignedMismatches(string& querySeq, vector >& left, vector >& right, int& bestRefSeq){ int bestSequenceMismatch = MAXINT; for(int i=0;i=0;l--){ if(querySeq[l] != '.' && referenceSeqs[i][l] != '.' 
&& querySeq[l] != referenceSeqs[i][l] && referenceSeqs[i][l] != 'N'){ rDiffs++; } right[i][index++] = rDiffs; } if(lDiffs < bestSequenceMismatch){ bestSequenceMismatch = lDiffs; bestRefSeq = i; } } return bestSequenceMismatch; } /**************************************************************************************************/ int RefChimeraTest::getChimera(vector >& left, vector >& right, int& leftParent, int& rightParent, int& breakPoint, vector& singleLeft, vector& bestLeft, vector& singleRight, vector& bestRight){ singleLeft.resize(alignLength, MAXINT); bestLeft.resize(alignLength, -1); for(int l=0;l >& left, vector >& right, int& leftParent, int& middleParent, int& rightParent, int& breakPointA, int& breakPointB, vector& singleLeft, vector& bestLeft, vector& singleRight, vector& bestRight){ int bestTrimeraMismatches = MAXINT; leftParent = -1; middleParent = -1; rightParent = -1; breakPointA = -1; breakPointB = -1; vector > minDelta; minDelta.resize(alignLength); vector > minDeltaSeq; minDeltaSeq.resize(alignLength); for(int i=0;i&, bool); int printHeader(ofstream&); int analyzeQuery(string, string, ofstream&); int getClosestRefIndex(); string getClosestRefAlignment(); string getQueryAlignment(); private: int getAlignedMismatches(string&, vector >&, vector >&, int&); int analyzeAlignedQuery(string, string, ofstream&); int analyzeUnalignedQuery(string, string, ofstream&); double alignQueryToReferences(string, string, string&, string&, double&); int getUnalignedDiffs(string, string, vector&, vector&, vector&, vector&); int getChimera(vector >&, vector >&, int&, int&, int&, vector&, vector&, vector&, vector&); int getTrimera(vector >&, vector >&, int&, int&, int&, int&, int&, vector&, vector&, vector&, vector&); string stitchBimera(int, int, int); string stitchTrimera(int, int, int, int, int); double calcDistToChimera(string&, string&); vector referenceSeqs; vector referenceNames; int numRefSeqs; int alignLength; int bestMatch; string bestRefAlignment; string 
bestQueryAlignment; bool aligned; MothurOut* m; }; #endif mothur-1.36.1/source/seqnoise.cpp000066400000000000000000000625641255543666200170050ustar00rootroot00000000000000/* * mySeqNoise.cpp * * * Created by Pat Schloss on 8/31/11. * Copyright 2011 Patrick D. Schloss. All rights reserved. * */ #include "seqnoise.h" #include "sequence.hpp" #include "listvector.hpp" #include "inputdata.h" #define MIN_DELTA 1.0e-6 #define MIN_ITER 20 #define MAX_ITER 1000 #define MIN_COUNT 0.1 #define MIN_TAU 1.0e-4 #define MIN_WEIGHT 0.1 /**************************************************************************************************/ int seqNoise::getSequenceData(string sequenceFileName, vector& sequences){ try { ifstream sequenceFile; m->openInputFile(sequenceFileName, sequenceFile); while(!sequenceFile.eof()){ if (m->control_pressed) { break; } Sequence temp(sequenceFile); m->gobble(sequenceFile); if (temp.getName() != "") { sequences.push_back(temp.getAligned()); } } sequenceFile.close(); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "getSequenceData"); exit(1); } } /**************************************************************************************************/ int seqNoise::addSeq(string seq, vector& sequences){ try { sequences.push_back(seq); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "addSeq"); exit(1); } } /**************************************************************************************************/ //no checks for file mismatches int seqNoise::getRedundantNames(string namesFileName, vector& uniqueNames, vector& redundantNames, vector& seqFreq){ try { string unique, redundant; ifstream namesFile; m->openInputFile(namesFileName, namesFile); for(int i=0;icontrol_pressed) { break; } namesFile >> uniqueNames[i]; m->gobble(namesFile); namesFile >> redundantNames[i]; m->gobble(namesFile); seqFreq[i] = m->getNumNames(redundantNames[i]); } namesFile.close(); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "getRedundantNames"); 
exit(1); } } /**************************************************************************************************/ int seqNoise::addRedundantName(string uniqueName, string redundantName, vector& uniqueNames, vector& redundantNames, vector& seqFreq){ try { uniqueNames.push_back(uniqueName); redundantNames.push_back(redundantName); seqFreq.push_back(m->getNumNames(redundantName)); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "addRedundantName"); exit(1); } } /**************************************************************************************************/ int seqNoise::getDistanceData(string distFileName, vector& distances){ try { ifstream distFile; m->openInputFile(distFileName, distFile); int numSeqs = 0; string name = ""; distFile >> numSeqs; for(int i=0;icontrol_pressed) { break; } distances[i * numSeqs + i] = 0.0000; distFile >> name; for(int j=0;j> distances[i * numSeqs + j]; distances[j * numSeqs + i] = distances[i * numSeqs + j]; } } distFile.close(); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "getDistanceData"); exit(1); } } /**************************************************************************************************/ int seqNoise::getListData(string listFileName, double cutOff, vector& otuData, vector& otuFreq, vector >& otuBySeqLookUp){ try { ifstream listFile; m->openInputFile(listFileName, listFile); bool adjustCutoff = true; string lastLabel = ""; while(!listFile.eof()){ ListVector list(listFile); m->gobble(listFile); //10/18/13 - change to reading with listvector to accomodate changes to the listfiel format. ie. adding header labels. 
string thisLabel = list.getLabel(); lastLabel = thisLabel; if (thisLabel == "unique") {} //skip to next label in listfile else { double threshold; m->mothurConvert(thisLabel, threshold); if(threshold < cutOff){} //skip to next label in listfile else{ adjustCutoff = false; int numOTUs = list.getNumBins(); otuFreq.resize(numOTUs, 0); for(int i=0;icontrol_pressed) { return 0; } string otu = list.get(i); int count = 0; string number = ""; for(int j=0;jcontrol_pressed) { return 0; } otuBySeqLookUp[otuData[i]].push_back(i); } for(int i=0;icontrol_pressed) { return 0; } for(int j=otuBySeqLookUp[i].size();jgetNumBins(); otuFreq.resize(numOTUs, 0); for(int i=0;icontrol_pressed) { return 0; } string otu = list->get(i); int count = 0; string number = ""; for(int j=0;jcontrol_pressed) { return 0; } otuBySeqLookUp[otuData[i]].push_back(i); } for(int i=0;icontrol_pressed) { return 0; } for(int j=otuBySeqLookUp[i].size();jerrorOut(e, "seqNoise", "getListData"); exit(1); } } /**************************************************************************************************/ int seqNoise::updateOTUCountData(vector otuFreq, vector > otuBySeqLookUp, vector > aanI, vector& anP, vector& anI, vector& cumCount ){ try { int numOTUs = otuFreq.size(); int count = 0; for(int i=0;icontrol_pressed) { return 0; } for(int j=0;jerrorOut(e, "seqNoise", "updateOTUCountData"); exit(1); } } /**************************************************************************************************/ double seqNoise::calcNewWeights( vector& weights, // vector seqFreq, // vector anI, // vector cumCount, // vector anP, // vector otuFreq, // vector tau // ){ try { int numOTUs = weights.size(); double maxChange = -1; cout.flush(); for(int i=0;icontrol_pressed) { return 0; } double change = weights[i]; weights[i] = 0.0000; for(int j=0;j maxChange){ maxChange = change; } cout.flush(); } return maxChange; } catch(exception& e) { m->errorOut(e, "seqNoise", "calcNewWeights"); exit(1); } } 
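/*
 * Illustrative sketch, not part of mothur. getListData above skips the
 * "unique" label and every label whose numeric threshold is below cutOff, and
 * it tracks lastLabel and an adjustCutoff flag, which suggests a fallback to
 * the last label in the file when nothing reaches the cutoff. The helper below
 * shows only that label choice. Its name, the use of strtod in place of
 * m->mothurConvert, and the choice to stop at the first qualifying label are
 * assumptions.
 */

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (assumption, not mothur API): choose which list-file label to use.
// Returns the first label other than "unique" whose numeric value is >= cutOff,
// otherwise the last label seen, or "" when there are no labels at all.
std::string pickListLabel(const std::vector<std::string>& labels, double cutOff) {
    std::string lastLabel = "";
    for (size_t i = 0; i < labels.size(); i++) {
        lastLabel = labels[i];
        if (labels[i] == "unique") { continue; }          // never selected directly
        double threshold = std::strtod(labels[i].c_str(), NULL);
        if (threshold >= cutOff) { return labels[i]; }     // first label at or above the cutoff
    }
    return lastLabel;                                      // fallback: cutoff gets adjusted to this label
}
```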
/**************************************************************************************************/ int seqNoise::calcCentroids( vector anI, vector anP, vector& change, vector& centroids, vector cumCount, vector distances,/// vector seqFreq, vector otuFreq, vector tau ){ try { int numOTUs = change.size(); int numSeqs = seqFreq.size(); for(int i=0;icontrol_pressed) { return 0; } int minFIndex = -1; double minFValue = 1e10; change[i] = 0; double count = 0.00000; int freqOfOTU = otuFreq[i]; for(int j=0;j 0 && count > MIN_COUNT){ vector adF(freqOfOTU); vector anL(freqOfOTU); for(int j=0;jerrorOut(e, "seqNoise", "calcCentroids"); exit(1); } } /**************************************************************************************************/ int seqNoise::checkCentroids(vector& weights, vector centroids){ try { int numOTUs = centroids.size(); vector unique(numOTUs, 1); double minWeight = MIN_WEIGHT; for(int i=0;icontrol_pressed) { return 0; } if(weights[i] < minWeight){ unique[i] = -1; } } for(int i=0;icontrol_pressed) { return 0; } if(unique[i] == 1){ for(int j=i+1; jerrorOut(e, "seqNoise", "checkCentroids"); exit(1); } } /**************************************************************************************************/ int seqNoise::setUpOTUData(vector& otuData, vector& percentage, vector cumCount, vector tau, vector otuFreq, vector anP, vector anI){ try { int numOTUs = cumCount.size(); int numSeqs = otuData.size(); vector bestTau(numSeqs, 0); vector bestIndex(numSeqs, -1); for(int i=0;icontrol_pressed) { return 0; } for(int j=0;j bestTau[index2]){ bestTau[index2] = thisTau; bestIndex[index2] = i; } } } for(int i=0;icontrol_pressed) { return 0; } otuData[i] = bestIndex[i]; percentage[i] = 1 - bestTau[i]; } return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "setUpOTUData"); exit(1); } } /**************************************************************************************************/ int seqNoise::finishOTUData(vector otuData, vector& otuFreq, vector& 
anP, vector& anI, vector& cumCount, vector >& otuBySeqLookUp, vector >& aanI, vector& tau){ try { int numSeqs = otuData.size(); int numOTUs = otuFreq.size(); int total = numSeqs; otuFreq.assign(numOTUs, 0); tau.assign(numSeqs, 1); anP.assign(numSeqs, 0); anI.assign(numSeqs, 0); for(int i=0;icontrol_pressed) { return 0; } int otu = otuData[i]; total++; otuBySeqLookUp[otu][otuFreq[otu]] = i; aanI[otu][otuFreq[otu]] = i; otuFreq[otu]++; } updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "finishOTUData"); exit(1); } } /**************************************************************************************************/ int seqNoise::getLastMatch(char direction, vector >& alignMoves, int i, int j, vector& seqA, vector& seqB){ try{ char nullReturn = -1; while(i>=1 && j>=1){ if (m->control_pressed) { return nullReturn; } if(direction == 'd'){ if(seqA[i-1] == seqB[j-1]) { return seqA[i-1]; } else { return nullReturn; } } else if(direction == 'l') { j--; } else { i--; } direction = alignMoves[i][j]; } return nullReturn; } catch(exception& e) { m->errorOut(e, "seqNoise", "getLastMatch"); exit(1); } } /**************************************************************************************************/ int seqNoise::countDiffs(vector query, vector ref){ try { //double MATCH = 5.0; //double MISMATCH = -2.0; //double GAP = -2.0; vector > correctMatrix(4); for(int i=0;i<4;i++){ correctMatrix[i].resize(4); } correctMatrix[0][0] = 0.000000; //AA correctMatrix[1][0] = 11.619259; //CA correctMatrix[2][0] = 11.694004; //TA correctMatrix[3][0] = 7.748623; //GA correctMatrix[1][1] = 0.000000; //CC correctMatrix[2][1] = 7.619657; //TC correctMatrix[3][1] = 12.852562; //GC correctMatrix[2][2] = 0.000000; //TT correctMatrix[3][2] = 10.964048; //TG correctMatrix[3][3] = 0.000000; //GG for(int i=0;i<4;i++){ for(int j=0;j > alignMatrix(queryLength + 1); vector > alignMoves(queryLength + 1); for(int 
i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i].resize(refLength + 1, 0); alignMoves[i].resize(refLength + 1, 'x'); } for(int i=0;i<=queryLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[i][0] = 15.0 * i; alignMoves[i][0] = 'u'; } for(int i=0;i<=refLength;i++){ if (m->control_pressed) { return 0; } alignMatrix[0][i] = 15.0 * i; alignMoves[0][i] = 'l'; } for(int i=1;i<=queryLength;i++){ if (m->control_pressed) { return 0; } for(int j=1;j<=refLength;j++){ double nogap; nogap = alignMatrix[i-1][j-1] + correctMatrix[query[i-1]][ref[j-1]]; double gap; double left; if(i == queryLength){ //terminal gap left = alignMatrix[i][j-1]; } else{ if(ref[j-1] == getLastMatch('l', alignMoves, i, j, query, ref)){ gap = 4.0; } else{ gap = 15.0; } left = alignMatrix[i][j-1] + gap; } double up; if(j == refLength){ //terminal gap up = alignMatrix[i-1][j]; } else{ if(query[i-1] == getLastMatch('u', alignMoves, i, j, query, ref)){ gap = 4.0; } else{ gap = 15.0; } up = alignMatrix[i-1][j] + gap; } if(nogap < left){ if(nogap < up){ alignMoves[i][j] = 'd'; alignMatrix[i][j] = nogap; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = up; } } else{ if(left < up){ alignMoves[i][j] = 'l'; alignMatrix[i][j] = left; } else{ alignMoves[i][j] = 'u'; alignMatrix[i][j] = up; } } } } int i = queryLength; int j = refLength; int diffs = 0; // string alignA = ""; // string alignB = ""; // string bases = "ACTG"; while(i > 0 && j > 0){ if (m->control_pressed) { return 0; } if(alignMoves[i][j] == 'd'){ // alignA = bases[query[i-1]] + alignA; // alignB = bases[ref[j-1]] + alignB; if(query[i-1] != ref[j-1]) { diffs++; } i--; j--; } else if(alignMoves[i][j] == 'u'){ if(j != refLength){ // alignA = bases[query[i-1]] + alignA; // alignB = '-' + alignB; diffs++; } i--; } else if(alignMoves[i][j] == 'l'){ if(i != queryLength){ // alignA = '-' + alignA; // alignB = bases[ref[j-1]] + alignB; diffs++; } j--; } } // cout << diffs << endl; // cout << alignA << endl; // cout << 
alignB << endl; // cout << endl; return diffs; } catch(exception& e) { m->errorOut(e, "seqNoise", "countDiffs"); exit(1); } } /**************************************************************************************************/ vector<int> seqNoise::convertSeq(string bases){ try { vector<int> numbers(bases.length(), -1); for(int i=0;i<bases.length();i++){ if (m->control_pressed) { return numbers; } char b = bases[i]; if(b == 'A') { numbers[i] = 0; } else if(b=='C') { numbers[i] = 1; } else if(b=='T') { numbers[i] = 2; } else if(b=='G') { numbers[i] = 3; } else { numbers[i] = 0; } } return numbers; } catch(exception& e) { m->errorOut(e, "seqNoise", "convertSeq"); exit(1); } } /**************************************************************************************************/ string seqNoise::degapSeq(string aligned){ try { string unaligned = ""; for(int i=0;i<aligned.length();i++){ if (m->control_pressed) { return ""; } if(aligned[i] != '-' && aligned[i] != '.'){ unaligned += aligned[i]; } } return unaligned; } catch(exception& e) { m->errorOut(e, "seqNoise", "degapSeq"); exit(1); } } /**************************************************************************************************/ int seqNoise::writeOutput(string fastaFileName, string namesFileName, string uMapFileName, vector<double> finalTau, vector<int> centroids, vector<int> otuData, vector<string> sequences, vector<string> uniqueNames, vector<string> redundantNames, vector<int> seqFreq, vector<double>& distances){ try { int numOTUs = finalTau.size(); int numSeqs = uniqueNames.size(); ofstream fastaFile(fastaFileName.c_str()); ofstream namesFile(namesFileName.c_str()); ofstream uMapFile(uMapFileName.c_str()); vector<int> maxSequenceAbund(numOTUs, 0); vector<int> maxSequenceIndex(numOTUs, 0); for(int i=0;i<numSeqs;i++){ if (m->control_pressed) { return 0; } if(maxSequenceAbund[otuData[i]] < seqFreq[i]){ maxSequenceAbund[otuData[i]] = seqFreq[i]; maxSequenceIndex[otuData[i]] = i; } } int count = 1; for(int i=0;i<numOTUs;i++){ if (m->control_pressed) { return 0; } if(finalTau[i] > 0){ if(maxSequenceIndex[i] != centroids[i] && distances[maxSequenceIndex[i]*numSeqs + centroids[i]] == 0){ // 
cout << uniqueNames[centroids[i]] << '\t' << uniqueNames[maxSequenceIndex[i]] << '\t' << count << endl; centroids[i] = maxSequenceIndex[i]; } int index = centroids[i]; fastaFile << '>' << uniqueNames[index] << endl << sequences[index] << endl; namesFile << uniqueNames[index] << '\t'; string refSeq = sequences[index]; string redundantSeqs = redundantNames[index];; vector frequencyData; for(int j=0;j rUnalign = convertSeq(refDegap); uMapFile << "ideal_seq_" << count << '\t' << finalTau[i] << endl; uMapFile << uniqueNames[index] << '\t' << seqFreq[index] << "\t0\t" << refDegap << endl; for(int j=0;jcontrol_pressed) { return 0; } redundantSeqs += ',' + redundantNames[frequencyData[j].index]; uMapFile << uniqueNames[frequencyData[j].index] << '\t' << seqFreq[frequencyData[j].index] << '\t'; string querySeq = sequences[frequencyData[j].index]; string queryDegap = degapSeq(querySeq); vector qUnalign = convertSeq(queryDegap); int udiffs = countDiffs(qUnalign, rUnalign); uMapFile << udiffs << '\t' << queryDegap << endl; } uMapFile << endl; namesFile << redundantSeqs << endl; count++; } } fastaFile.close(); namesFile.close(); uMapFile.close(); return 0; } catch(exception& e) { m->errorOut(e, "seqNoise", "writeOutput"); exit(1); } } /************************************************************************************************** int main(int argc, char *argv[]){ double sigma = 100; sigma = atof(argv[5]); double cutOff = 0.08; int minIter = 10; int maxIter = 1000; double minDelta = 1e-6; string sequenceFileName = argv[1]; string fileNameStub = sequenceFileName.substr(0,sequenceFileName.find_last_of('.')) + ".shhh"; vector sequences; getSequenceData(sequenceFileName, sequences); int numSeqs = sequences.size(); vector uniqueNames(numSeqs); vector redundantNames(numSeqs); vector seqFreq(numSeqs); string namesFileName = argv[4]; getRedundantNames(namesFileName, uniqueNames, redundantNames, seqFreq); string distFileName = argv[2]; vector distances(numSeqs * numSeqs); 
getDistanceData(distFileName, distances); string listFileName = argv[3]; vector otuData(numSeqs); vector otuFreq; vector > otuBySeqLookUp; getListData(listFileName, cutOff, otuData, otuFreq, otuBySeqLookUp); int numOTUs = otuFreq.size(); vector weights(numOTUs, 0); vector change(numOTUs, 1); vector centroids(numOTUs, -1); vector cumCount(numOTUs, 0); vector tau(numSeqs, 1); vector anP(numSeqs, 0); vector anI(numSeqs, 0); vector anN(numSeqs, 0); vector > aanI = otuBySeqLookUp; int numIters = 0; double maxDelta = 1e6; while(numIters < minIter || ((maxDelta > minDelta) && (numIters < maxIter))){ updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); maxDelta = calcNewWeights(weights, seqFreq, anI, cumCount, anP, otuFreq, tau); calcCentroids(anI, anP, change, centroids, cumCount, distances, seqFreq, otuFreq, tau); checkCentroids(weights, centroids); otuFreq.assign(numOTUs, 0); int total = 0; for(int i=0;i currentTau(numOTUs); for(int j=0;j minWeight && distances[i * numSeqs+centroids[j]] < offset){ offset = distances[i * numSeqs+centroids[j]]; } } for(int j=0;j minWeight){ currentTau[j] = exp(sigma * (-distances[(i * numSeqs + centroids[j])] + offset)) * weights[j]; norm += currentTau[j]; } else{ currentTau[j] = 0.0000; } } for(int j=0;j MIN_TAU){ int oldTotal = total; total++; tau.resize(oldTotal+1); tau[oldTotal] = currentTau[j]; otuBySeqLookUp[j][otuFreq[j]] = oldTotal; aanI[j][otuFreq[j]] = i; otuFreq[j]++; } } anP.resize(total); anI.resize(total); } numIters++; } updateOTUCountData(otuFreq, otuBySeqLookUp, aanI, anP, anI, cumCount); vector percentage(numSeqs); setUpOTUData(otuData, percentage, cumCount, tau, otuFreq, anP, anI); finishOTUData(otuData, otuFreq, anP, anI, cumCount, otuBySeqLookUp, aanI, tau); change.assign(numOTUs, 1); calcCentroids(anI, anP, change, centroids, cumCount, distances, seqFreq, otuFreq, tau); vector finalTau(numOTUs, 0); for(int i=0;i&); int addSeq(string, vector&); int getRedundantNames(string, vector&, vector&, 
vector&); int addRedundantName(string, string, vector&, vector&, vector&); int getDistanceData(string, vector&); int getListData(string, double, vector&, vector&, vector >&); int updateOTUCountData(vector, vector >, vector >, vector&, vector&, vector&); double calcNewWeights(vector&,vector,vector,vector,vector,vector,vector); int calcCentroids(vector,vector,vector&,vector&,vector,vector,vector,vector,vector); int checkCentroids(vector&, vector); int setUpOTUData(vector&, vector&, vector, vector, vector, vector, vector); int finishOTUData(vector, vector&, vector&, vector&, vector&, vector >&, vector >&, vector&); int writeOutput(string, string, string, vector, vector, vector, vector, vector, vector, vector, vector&); private: MothurOut* m; int getLastMatch(char, vector >&, int, int, vector&, vector&); int countDiffs(vector, vector); vector convertSeq(string); string degapSeq(string); }; /**************************************************************************************************/ #endif mothur-1.36.1/source/sharedutilities.cpp000066400000000000000000000253141255543666200203510ustar00rootroot00000000000000/* * sharedutilities.cpp * Mothur * * Created by Sarah Westcott on 4/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "sharedutilities.h" #include "sharedrabundvector.h" #include "sharedordervector.h" /**************************************************************************************************/ void SharedUtil::getSharedVectors(vector Groups, vector& lookup, SharedOrderVector* order) { try { //delete each sharedrabundvector in lookup for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } lookup.clear(); sort(Groups.begin(), Groups.end()); //create and initialize vector of sharedvectors, one for each group for (int i = 0; i < Groups.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(order->getNumBins()); temp->setLabel(order->getLabel()); temp->setGroup(Groups[i]); lookup.push_back(temp); } int numSeqs = order->size(); //sample all the members for(int i=0;iget(i); int abundance; //set info for sharedvector in chosens group for (int j = 0; j < lookup.size(); j++) { if (chosen.group == lookup[j]->getGroup()) { abundance = lookup[j]->getAbundance(chosen.bin); lookup[j]->set(chosen.bin, (abundance + 1), chosen.group); break; } } } } catch(exception& e) { m->errorOut(e, "SharedUtil", "getSharedVectors"); exit(1); } } /**************************************************************************************************/ void SharedUtil::getSharedVectorswithReplacement(vector Groups, vector& lookup, SharedOrderVector* order) { try { //delete each sharedrabundvector in lookup for (int j = 0; j < lookup.size(); j++) { delete lookup[j]; } lookup.clear(); //create and initialize vector of sharedvectors, one for each group for (int i = 0; i < Groups.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(order->getNumBins()); temp->setLabel(order->getLabel()); temp->setGroup(Groups[i]); lookup.push_back(temp); } int numSeqs = order->size(); //sample all the members for(int i=0;iget(random); int abundance; //set info for sharedvector in chosens group for (int j = 0; j < lookup.size(); j++) { if (chosen.group == lookup[j]->getGroup()) { abundance = 
lookup[j]->getAbundance(chosen.bin); lookup[j]->set(chosen.bin, (abundance + 1), chosen.group); break; } } } } catch(exception& e) { m->errorOut(e, "SharedUtil", "getSharedVectorswithReplacement"); exit(1); } } /**************************************************************************************************/ //validates userGroups against allGroups; uses all groups if none were given or none are valid void SharedUtil::setGroups(vector<string>& userGroups, vector<string>& allGroups) { try { sort(userGroups.begin(), userGroups.end()); sort(allGroups.begin(), allGroups.end()); if (userGroups.size() != 0) { if (userGroups[0] != "all") { //check that groups are valid for (int i = 0; i < userGroups.size(); i++) { if (isValidGroup(userGroups[i], allGroups) != true) { m->mothurOut(userGroups[i] + " is not a valid group, and will be disregarded."); m->mothurOutEndLine(); // erase the invalid group from userGroups userGroups.erase(userGroups.begin()+i); i--; } } //if the user only entered invalid groups if (userGroups.size() == 0) { m->mothurOut("You provided no valid groups. 
I will run the command using all the groups in your file."); m->mothurOutEndLine(); for (int i = 0; i < allGroups.size(); i++) { userGroups.push_back(allGroups[i]); } } }else{//user has entered "all" and wants the default groups userGroups.clear(); for (int i = 0; i < allGroups.size(); i++) { userGroups.push_back(allGroups[i]); } } }else { //the user has not entered groups for (int i = 0; i < allGroups.size(); i++) { userGroups.push_back(allGroups[i]); } } } catch(exception& e) { m->errorOut(e, "SharedUtil", "setGroups"); exit(1); } } /**************************************************************************************************/ //need to have mode because different commands require a different number of valid groups void SharedUtil::setGroups(vector<string>& userGroups, vector<string>& allGroups, string mode) { try { sort(userGroups.begin(), userGroups.end()); sort(allGroups.begin(), allGroups.end()); if (userGroups.size() != 0) { if (userGroups[0] != "all") { //check that groups are valid for (int i = 0; i < userGroups.size(); i++) { if (isValidGroup(userGroups[i], allGroups) != true) { m->mothurOut(userGroups[i] + " is not a valid group, and will be disregarded."); m->mothurOutEndLine(); // erase the invalid group from userGroups userGroups.erase(userGroups.begin()+i); i--; } } }else{//user has entered "all" and wants the default groups userGroups.clear(); for (int i = 0; i < allGroups.size(); i++) { userGroups.push_back(allGroups[i]); } } }else { //the user has not entered groups for (int i = 0; i < allGroups.size(); i++) { userGroups.push_back(allGroups[i]); } } if ((mode == "collect") || (mode == "rarefact") || (mode == "summary") || (mode == "treegroup")) { //if the user entered fewer than 2 valid groups if ((userGroups.size() == 0) || (userGroups.size() == 1)) { m->mothurOut("When using the groups parameter you must have at least 2 valid groups. 
I will run the command using all the groups in your groupfile."); m->mothurOutEndLine(); userGroups.clear(); for (int i = 0; i < allGroups.size(); i++) { userGroups.push_back(allGroups[i]); } } } } catch(exception& e) { m->errorOut(e, "SharedUtil", "setGroups"); exit(1); } } /**************************************************************************************/ //for parsimony and unifrac commands you set pairwise groups as well as an allgroups in calc void SharedUtil::setGroups(vector<string>& userGroups, vector<string>& allGroups, string& label, int& numGroups, string mode){ //globaldata->Groups, your tree or group map, allgroups, mode try { sort(userGroups.begin(), userGroups.end()); sort(allGroups.begin(), allGroups.end()); numGroups = 0; label = ""; //if the user has not entered specific groups to analyze then do them all if (userGroups.size() != 0) { if (userGroups[0] != "all") { //check that groups are valid for (int i = 0; i < userGroups.size(); i++) { if (isValidGroup(userGroups[i], allGroups) != true) { m->mothurOut(userGroups[i] + " is not a valid group, and will be disregarded."); m->mothurOutEndLine(); // erase the invalid group from globaldata->Groups userGroups.erase(userGroups.begin()+i); i--; } } }else { //user wants all groups userGroups.clear(); for (int i=0; i < allGroups.size(); i++) { if (allGroups[i] != "xxx") { userGroups.push_back(allGroups[i]); } } } }else { //the user has not entered groups for (int i=0; i < allGroups.size(); i++) { if (allGroups[i] != "xxx") { if (mode == "weighted") { userGroups.push_back(allGroups[i]); }else { numGroups = 1; label += allGroups[i] + "-"; } } } //strip the trailing '-' from label label = label.substr(0, label.length()-1); if ((mode != "weighted") && (allGroups.size() > 10)) { label = "merged"; } } if (mode == "weighted") { //if the user only entered invalid groups if (userGroups.size() == 0) { for (int i=0; i < allGroups.size(); i++) { if (allGroups[i] != "xxx") { userGroups.push_back(allGroups[i]); } } m->mothurOut("When using the groups 
parameter you must have at least 2 valid groups. I will run the command using all the groups in your groupfile."); m->mothurOutEndLine(); }else if (userGroups.size() == 1) { m->mothurOut("When using the groups parameter you must have at least 2 valid groups. I will run the command using all the groups in your groupfile."); m->mothurOutEndLine(); userGroups.clear(); for (int i=0; i < allGroups.size(); i++) { if (allGroups[i] != "xxx") { userGroups.push_back(allGroups[i]); } } } numGroups = userGroups.size(); }else if ((mode == "unweighted") || (mode == "parsimony")) { //if the user only entered invalid groups if ((userGroups.size() == 0) && (numGroups == 0)) { m->mothurOut("When using the groups parameter you must have at least 1 valid group. I will run the command using all the groups in your groupfile."); m->mothurOutEndLine(); for (int i = 0; i < allGroups.size(); i++) { if (allGroups[i] != "xxx") { userGroups.push_back(allGroups[i]); } } } if (numGroups != 1) { numGroups = userGroups.size(); } } } catch(exception& e) { m->errorOut(e, "SharedUtil", "setGroups"); exit(1); } } /**************************************************************************************/ void SharedUtil::getCombos(vector<string>& groupComb, vector<string> userGroups, int& numComp) { //groupcomb, globaldata->Groups, numcomb try { sort(userGroups.begin(), userGroups.end()); //calculate number of comparisons i.e. 
with groups A,B,C = AB, AC, BC = 3; numComp = 0; for (int i=0; i< userGroups.size(); i++) { numComp += i; for (int l = 0; l < i; l++) { if (userGroups[i] > userGroups[l]) { //set group comparison labels groupComb.push_back(userGroups[l] + "-" + userGroups[i]); }else{ groupComb.push_back(userGroups[i] + "-" + userGroups[l]); } } } } catch(exception& e) { m->errorOut(e, "SharedUtil", "getCombos"); exit(1); } } /**************************************************************************************/ bool SharedUtil::isValidGroup(string groupname, vector<string> groups) { try { for (int i = 0; i < groups.size(); i++) { if (groupname == groups[i]) { return true; } } return false; } catch(exception& e) { m->errorOut(e, "SharedUtil", "isValidGroup"); exit(1); } } /**************************************************************************************/ void SharedUtil::updateGroupIndex(vector<string>& userGroups, map<string, int>& index) { try { index.clear(); for (int i = 0; i < userGroups.size(); i++) { index[userGroups[i]] = i; } } catch(exception& e) { m->errorOut(e, "SharedUtil", "updateGroupIndex"); exit(1); } } /**************************************************************************************/ mothur-1.36.1/source/sharedutilities.h000066400000000000000000000026431255543666200200160ustar00rootroot00000000000000#ifndef SHAREDUTIL_H #define SHAREDUTIL_H /* * sharedutilities.h * Mothur * * Created by Sarah Westcott on 4/9/09. * Copyright 2009 Schloss Lab UMASS Amherst. All rights reserved. 
* */ #include "mothur.h" #include "mothurout.h" class SharedRAbundVector; class SharedOrderVector; /**************************************************************************************************/ class SharedUtil { public: SharedUtil() { m = MothurOut::getInstance(); } ~SharedUtil() {}; void getSharedVectors(vector<string>, vector<SharedRAbundVector*>&, SharedOrderVector*); void getSharedVectorswithReplacement(vector<string>, vector<SharedRAbundVector*>&, SharedOrderVector*); void setGroups(vector<string>&, vector<string>&); //globaldata->Groups, your tree or group map void setGroups(vector<string>&, vector<string>&, string); //globaldata->Groups, your tree or group map, mode void setGroups(vector<string>&, vector<string>&, string&, int&, string); //globaldata->Groups, your tree or group map, allgroups, numGroups, mode void getCombos(vector<string>&, vector<string>, int&); //groupcomb, globaldata->Groups, numcomb void updateGroupIndex(vector<string>&, map<string, int>&); //globaldata->Groups, groupmap->groupIndex bool isValidGroup(string, vector<string>); private: MothurOut* m; }; /**************************************************************************************************/ #endif mothur-1.36.1/source/singlelinkage.cpp000066400000000000000000000071761255543666200177710ustar00rootroot00000000000000 #include "mothur.h" #include "cluster.hpp" /***********************************************************************/ SingleLinkage::SingleLinkage(RAbundVector* rav, ListVector* lv, SparseDistanceMatrix* dm, float c, string s, float a) : Cluster(rav, lv, dm, c, s, a) {} /***********************************************************************/ //This function returns the tag of the method. string SingleLinkage::getTag() { return("nn"); } /*********************************************************************** //This function clusters based on the single linkage method. 
void SingleLinkage::update(double& cutOFF){ try { smallCol = dMatrix->getSmallestCell(smallRow); nColCells = dMatrix->seqVec[smallCol].size(); nRowCells = dMatrix->seqVec[smallRow].size(); vector deleted(nRowCells, false); int rowInd; int search; bool changed; // The vector has to be traversed in reverse order to preserve the index // for faster removal in removeCell() for (int i=nRowCells-1;i>=0;i--) { if (dMatrix->seqVec[smallRow][i].index == smallCol) { rowInd = i; // The index of the smallest distance cell in rowCells } else { search = dMatrix->seqVec[smallRow][i].index; for (int j=0;jseqVec[smallCol][j].index != smallRow) { //if you are not the small cell if (dMatrix->seqVec[smallCol][j].index == search) { changed = updateDistance(dMatrix->seqVec[smallCol][j], dMatrix->seqVec[smallRow][i]); dMatrix->updateCellCompliment(smallCol, j); dMatrix->rmCell(smallRow, i); deleted[i] = true; break; } } } if (!deleted[i]) { // Assign the cell to the new cluster // remove the old cell from seqVec and add the cell // with the new row and column assignment again float distance = dMatrix->seqVec[smallRow][i].dist; dMatrix->rmCell(smallRow, i); if (search < smallCol){ PDistCell value(smallCol, distance); dMatrix->addCell(search, value); } else { PDistCell value(search, distance); dMatrix->addCell(smallCol, value); } sort(dMatrix->seqVec[smallCol].begin(), dMatrix->seqVec[smallCol].end(), compareIndexes); sort(dMatrix->seqVec[search].begin(), dMatrix->seqVec[search].end(), compareIndexes); } } } clusterBins(); clusterNames(); // remove also the cell with the smallest distance dMatrix->rmCell(smallRow, rowInd); } catch(exception& e) { m->errorOut(e, "SingleLinkage", "update"); exit(1); } } /***********************************************************************/ //This function updates the distance based on the nearest neighbor method. 
bool SingleLinkage::updateDistance(PDistCell& colCell, PDistCell& rowCell) { try { bool changed = false; if (colCell.dist > rowCell.dist) { colCell.dist = rowCell.dist; } return(changed); } catch(exception& e) { m->errorOut(e, "SingleLinkage", "updateDistance"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/slibshuff.cpp000066400000000000000000000044671255543666200171420ustar00rootroot00000000000000/* * slibshuff.cpp * Mothur * * Created by Pat Schloss on 4/8/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. * */ #include "slibshuff.h" /***********************************************************************/ SLibshuff::SLibshuff(FullMatrix* D, int it, float co) : Libshuff(D, it, 0, co){} /***********************************************************************/ float SLibshuff::evaluatePair(int i, int j){ return sCalculate(i,j); } /***********************************************************************/ vector > SLibshuff::evaluateAll(){ try{ savedMins.resize(numGroups); vector > dCXYValues(numGroups); for(int i=0;ierrorOut(e, "SLibshuff", "evaluateAll"); exit(1); } } /***********************************************************************/ double SLibshuff::sCalculate(int x, int y){ try{ double sum = 0.0,t=0.0; minX = getMinX(x); if (m->control_pressed) { return sum; } minXY = getMinXY(x,y); if (m->control_pressed) { return sum; } sort(minX.begin(), minX.end()); if (m->control_pressed) { return sum; } sort(minXY.begin(), minXY.end()); if (m->control_pressed) { return sum; } int ix=0,iy=0; while( (ix < groupSizes[x]) && (iy < groupSizes[x]) ) { double h = (ix-iy)/double(groupSizes[x]); if(minX[ix] < minXY[iy]) { sum += (minX[ix] - t)*h*h; t = minX[ix++]; } else { sum += (minXY[iy] - t)*h*h; t = minXY[iy++]; } } if(ix < groupSizes[x]) { while(ix < groupSizes[x]) { double h = (ix-iy)/double(groupSizes[x]); sum += (minX[ix] - t)*h*h; t = minX[ix++]; } } else { while(iy < groupSizes[x]) { double h 
= (ix-iy)/double(groupSizes[x]); sum += (minXY[iy] - t)*h*h; t = minXY[iy++]; } } return sum; } catch(exception& e) { m->errorOut(e, "SLibshuff", "sCalculate"); exit(1); } } /***********************************************************************/ mothur-1.36.1/source/slibshuff.h000066400000000000000000000006541255543666200166010ustar00rootroot00000000000000#ifndef SLIBSHUFF #define SLIBSHUFF /* * slibshuff.h * Mothur * * Created by Pat Schloss on 4/8/09. * Copyright 2009 Patrick D. Schloss. All rights reserved. * */ #include "fullmatrix.h" #include "libshuff.h" class SLibshuff : public Libshuff { public: SLibshuff(FullMatrix*, int, float); vector > evaluateAll(); float evaluatePair(int, int); private: double sCalculate(int, int); }; #endif mothur-1.36.1/source/subsample.cpp000066400000000000000000000405011255543666200171350ustar00rootroot00000000000000// // subsample.cpp // Mothur // // Created by Sarah Westcott on 4/2/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "subsample.h" //********************************************************************************************************************** Tree* SubSample::getSample(Tree* T, CountTable* ct, CountTable* newCt, int size) { try { Tree* newTree = NULL; //remove seqs not in sample from counttable vector Groups = ct->getNamesOfGroups(); newCt->copy(ct); newCt->addGroup("doNotIncludeMe"); map doNotIncludeTotals; vector namesSeqs = ct->getNamesOfSeqs(); for (int i = 0; i < namesSeqs.size(); i++) { doNotIncludeTotals[namesSeqs[i]] = 0; } for (int i = 0; i < Groups.size(); i++) { if (m->inUsersGroups(Groups[i], m->getGroups())) { if (m->control_pressed) { break; } int thisSize = ct->getGroupCount(Groups[i]); if (thisSize >= size) { vector names = ct->getNamesOfSeqs(Groups[i]); vector random; for (int j = 0; j < names.size(); j++) { int num = ct->getGroupCount(names[j], Groups[i]); for (int k = 0; k < num; k++) { random.push_back(j); } } random_shuffle(random.begin(), random.end()); vector 
sampleRandoms; sampleRandoms.resize(names.size(), 0); for (int j = 0; j < size; j++) { sampleRandoms[random[j]]++; } for (int j = 0; j < sampleRandoms.size(); j++) { newCt->setAbund(names[j], Groups[i], sampleRandoms[j]); } sampleRandoms.clear(); sampleRandoms.resize(names.size(), 0); for (int j = size; j < thisSize; j++) { sampleRandoms[random[j]]++; } for (int j = 0; j < sampleRandoms.size(); j++) { doNotIncludeTotals[names[j]] += sampleRandoms[j]; } }else { m->mothurOut("[ERROR]: You have selected a size that is larger than the number of sequences in group "+Groups[i]+".\n"); m->control_pressed = true; } } } for (map<string, int>::iterator it = doNotIncludeTotals.begin(); it != doNotIncludeTotals.end(); it++) { newCt->setAbund(it->first, "doNotIncludeMe", it->second); } newTree = new Tree(newCt); newTree->getCopy(T, true); return newTree; } catch(exception& e) { m->errorOut(e, "SubSample", "getSample-Tree"); exit(1); } } //********************************************************************************************************************** //assumes whole maps dupName -> uniqueName map<string, string> SubSample::deconvolute(map<string, string> whole, vector<string>& wanted) { try { map<string, string> nameMap; //whole will be empty if user gave no name file, so we don't need to make a new one if (whole.size() == 0) { return nameMap; } vector<string> newWanted; for (int i = 0; i < wanted.size(); i++) { if (m->control_pressed) { break; } string dupName = wanted[i]; map<string, string>::iterator itWhole = whole.find(dupName); if (itWhole != whole.end()) { string repName = itWhole->second; //do we already have this rep? 
map<string, string>::iterator itName = nameMap.find(repName); if (itName != nameMap.end()) { //add this seq to dups list (itName->second) += "," + dupName; }else { //first sighting of this seq nameMap[repName] = dupName; newWanted.push_back(repName); } }else { m->mothurOut("[ERROR]: "+dupName+" is not in your name file, please correct.\n"); m->control_pressed = true; } } wanted = newWanted; return nameMap; } catch(exception& e) { m->errorOut(e, "SubSample", "deconvolute"); exit(1); } } //********************************************************************************************************************** vector<string> SubSample::getSample(vector<SharedRAbundVector*>& thislookup, int size) { try { //save mothurOut's binLabels to restore for next label vector<string> saveBinLabels = m->currentSharedBinLabels; int numBins = thislookup[0]->getNumBins(); for (int i = 0; i < thislookup.size(); i++) { int thisSize = thislookup[i]->getNumSeqs(); if (thisSize != size) { string thisgroup = thislookup[i]->getGroup(); OrderVector order; for(int p=0;p<numBins;p++){ for(int j=0;j<thislookup[i]->getAbundance(p);j++){ order.push_back(p); } } random_shuffle(order.begin(), order.end()); SharedRAbundVector* temp = new SharedRAbundVector(numBins); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); delete thislookup[i]; thislookup[i] = temp; for (int j = 0; j < size; j++) { if (m->control_pressed) { return m->currentSharedBinLabels; } int bin = order.get(j); int abund = thislookup[i]->getAbundance(bin); thislookup[i]->set(bin, (abund+1), thisgroup); } } } //subsampling may have created some otus with no sequences in them eliminateZeroOTUS(thislookup); if (m->control_pressed) { return m->currentSharedBinLabels; } //keep the subsampled binLabels to return, then restore mothurOut's saved binLabels vector<string> subsampleBinLabels = m->currentSharedBinLabels; m->currentSharedBinLabels = saveBinLabels; return subsampleBinLabels; } catch(exception& e) { m->errorOut(e, "SubSample", "getSample-shared"); exit(1); } } 
//********************************************************************************************************************** int SubSample::eliminateZeroOTUS(vector<SharedRAbundVector*>& thislookup) { try { vector<SharedRAbundVector*> newLookup; for (int i = 0; i < thislookup.size(); i++) { SharedRAbundVector* temp = new SharedRAbundVector(); temp->setLabel(thislookup[i]->getLabel()); temp->setGroup(thislookup[i]->getGroup()); newLookup.push_back(temp); } //for each bin vector<string> newBinLabels; string snumBins = toString(thislookup[0]->getNumBins()); for (int i = 0; i < thislookup[0]->getNumBins(); i++) { if (m->control_pressed) { for (int j = 0; j < newLookup.size(); j++) { delete newLookup[j]; } return 0; } //look at each sharedRabund and make sure they are not all zero bool allZero = true; for (int j = 0; j < thislookup.size(); j++) { if (thislookup[j]->getAbundance(i) != 0) { allZero = false; break; } } //if they are not all zero add this bin if (!allZero) { for (int j = 0; j < thislookup.size(); j++) { newLookup[j]->push_back(thislookup[j]->getAbundance(i), thislookup[j]->getGroup()); } //if there is a bin label use it otherwise make one string binLabel = "Otu"; string sbinNumber = toString(i+1); if (sbinNumber.length() < snumBins.length()) { int diff = snumBins.length() - sbinNumber.length(); for (int h = 0; h < diff; h++) { binLabel += "0"; } } binLabel += sbinNumber; if (i < m->currentSharedBinLabels.size()) { binLabel = m->currentSharedBinLabels[i]; } newBinLabels.push_back(binLabel); } } for (int j = 0; j < thislookup.size(); j++) { delete thislookup[j]; } thislookup.clear(); thislookup = newLookup; m->currentSharedBinLabels = newBinLabels; return 0; } catch(exception& e) { m->errorOut(e, "SubSample", "eliminateZeroOTUS"); exit(1); } } //********************************************************************************************************************** int SubSample::getSample(SAbundVector*& sabund, int size) { try { OrderVector* order = new OrderVector(); *order = sabund->getOrderVector(NULL); int 
numBins = order->getNumBins(); int thisSize = order->getNumSeqs(); if (thisSize > size) { random_shuffle(order->begin(), order->end()); RAbundVector* rabund = new RAbundVector(numBins); rabund->setLabel(order->getLabel()); for (int j = 0; j < size; j++) { if (m->control_pressed) { delete order; delete rabund; return 0; } int bin = order->get(j); int abund = rabund->get(bin); rabund->set(bin, (abund+1)); } delete sabund; sabund = new SAbundVector(); *sabund = rabund->getSAbundVector(); delete rabund; }else if (thisSize < size) { m->mothurOut("[ERROR]: The size you requested is larger than the number of sequences in the sabund vector. You requested " + toString(size) + " and you only have " + toString(thisSize) + " seqs in your sabund vector.\n"); m->control_pressed = true; } //always free order, even when control_pressed was set above delete order; return 0; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "getSample"); exit(1); } } //********************************************************************************************************************** CountTable SubSample::getSample(CountTable& ct, int size, vector<string> Groups) { try { CountTable sampledCt; if (!ct.hasGroupInfo()) { m->mothurOut("[ERROR]: Cannot subsample by group because your count table doesn't have group information.\n"); m->control_pressed = true; return sampledCt; } map<string, vector<int> > tempCount; for (int i = 0; i < Groups.size(); i++) { sampledCt.addGroup(Groups[i]); vector<string> names = ct.getNamesOfSeqs(Groups[i]); vector<string> allNames; for (int j = 0; j < names.size(); j++) { if (m->control_pressed) { return sampledCt; } int num = ct. 
getGroupCount(names[j], Groups[i]); for (int k = 0; k < num; k++) { allNames.push_back(names[j]); } } random_shuffle(allNames.begin(), allNames.end()); if (allNames.size() < size) { m->mothurOut("[ERROR]: You have selected a size that is larger than the number of sequences in group "+Groups[i]+".\n"); m->control_pressed = true; } else{ for (int j = 0; j < size; j++) { if (m->control_pressed) { return sampledCt; } map<string, vector<int> >::iterator it = tempCount.find(allNames[j]); if (it == tempCount.end()) { //we have not seen this sequence at all yet vector<int> tempGroups; tempGroups.resize(Groups.size(), 0); tempGroups[i]++; tempCount[allNames[j]] = tempGroups; }else{ tempCount[allNames[j]][i]++; } } } } //build count table for (map<string, vector<int> >::iterator it = tempCount.begin(); it != tempCount.end();) { sampledCt.push_back(it->first, it->second); tempCount.erase(it++); } return sampledCt; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "getSample"); exit(1); } } //********************************************************************************************************************** CountTable SubSample::getSample(CountTable& ct, int size, vector<string> Groups, bool pickedGroups) { try { CountTable sampledCt; if (!ct.hasGroupInfo() && pickedGroups) { m->mothurOut("[ERROR]: Cannot subsample with groups because your count table doesn't have group information.\n"); m->control_pressed = true; return sampledCt; } if (ct.hasGroupInfo()) { map<string, vector<int> > tempCount; vector<item> allNames; map<string, int> groupMap; vector<string> myGroups; if (pickedGroups) { myGroups = Groups; } else { myGroups = ct.getNamesOfGroups(); } for (int i = 0; i < myGroups.size(); i++) { sampledCt.addGroup(myGroups[i]); groupMap[myGroups[i]] = i; vector<string> names = ct.getNamesOfSeqs(myGroups[i]); for (int j = 0; j < names.size(); j++) { if (m->control_pressed) { return sampledCt; } int num = ct. 
getGroupCount(names[j], myGroups[i]); for (int k = 0; k < num; k++) { item temp(names[j], myGroups[i]); allNames.push_back(temp); } } } random_shuffle(allNames.begin(), allNames.end()); if (allNames.size() < size) { if (pickedGroups) { m->mothurOut("[ERROR]: You have selected a size that is larger than the number of sequences.\n"); } else { m->mothurOut("[ERROR]: You have selected a size that is larger than the number of sequences in the groups you chose.\n"); } m->control_pressed = true; return sampledCt; } else{ for (int j = 0; j < size; j++) { if (m->control_pressed) { return sampledCt; } map >::iterator it = tempCount.find(allNames[j].name); if (it == tempCount.end()) { //we have not seen this sequence at all yet vector tempGroups; tempGroups.resize(myGroups.size(), 0); tempGroups[groupMap[allNames[j].group]]++; tempCount[allNames[j].name] = tempGroups; }else{ tempCount[allNames[j].name][groupMap[allNames[j].group]]++; } } } //build count table for (map >::iterator it = tempCount.begin(); it != tempCount.end();) { sampledCt.push_back(it->first, it->second); tempCount.erase(it++); } //remove empty groups for (int i = 0; i < myGroups.size(); i++) { if (sampledCt.getGroupCount(myGroups[i]) == 0) { sampledCt.removeGroup(myGroups[i]); } } }else { vector names = ct.getNamesOfSeqs(); map nameMap; vector allNames; for (int i = 0; i < names.size(); i++) { int num = ct.getNumSeqs(names[i]); for (int j = 0; j < num; j++) { allNames.push_back(names[i]); } } if (allNames.size() < size) { m->mothurOut("[ERROR]: You have selected a size that is larger than the number of sequences.\n"); m->control_pressed = true; return sampledCt; } else { random_shuffle(allNames.begin(), allNames.end()); for (int j = 0; j < size; j++) { if (m->control_pressed) { return sampledCt; } map::iterator it = nameMap.find(allNames[j]); //we have not seen this sequence at all yet if (it == nameMap.end()) { nameMap[allNames[j]] = 1; } else{ nameMap[allNames[j]]++; } } //build count table for 
(map::iterator it = nameMap.begin(); it != nameMap.end();) { sampledCt.push_back(it->first, it->second); nameMap.erase(it++); } } } return sampledCt; } catch(exception& e) { m->errorOut(e, "SubSampleCommand", "getSample"); exit(1); } } //********************************************************************************************************************** mothur-1.36.1/source/subsample.h000066400000000000000000000037451255543666200166130ustar00rootroot00000000000000#ifndef Mothur_subsample_h #define Mothur_subsample_h // // subsample.h // Mothur // // Created by Sarah Westcott on 4/2/12. // Copyright (c) 2012 Schloss Lab. All rights reserved. // #include "mothurout.h" #include "sharedrabundvector.h" #include "treemap.h" #include "tree.h" #include "counttable.h" struct item { string name; string group; item() {} item(string n, string g) : name(n), group(g) {} ~item() {} }; //subsampling overwrites the sharedRabunds. If you need to reuse the original use the getSamplePreserve function. class SubSample { public: SubSample() { m = MothurOut::getInstance(); } ~SubSample() {} vector getSample(vector&, int); //returns the bin labels for the subsample, mothurOuts binlabels are preserved so you can run this multiple times. Overwrites original vector passed in, if you need to preserve it deep copy first. Tree* getSample(Tree*, CountTable*, CountTable*, int); //creates new subsampled tree. Uses first counttable to fill new counttable with sabsampled seqs. Sets groups of seqs not in subsample to "doNotIncludeMe". int getSample(SAbundVector*&, int); //destroys sabundvector passed in, so copy it if you need it CountTable getSample(CountTable&, int, vector); //subsample a countTable bygroup(same number sampled from each group, returns subsampled countTable CountTable getSample(CountTable&, int, vector, bool); //subsample a countTable. If you want to only sample from specific groups, pass in groups in the vector and set bool=true, otherwise set bool=false. 
private: MothurOut* m; int eliminateZeroOTUS(vector&); map deconvolute(map wholeSet, vector& subsampleWanted); //returns new nameMap containing only subsampled names, and removes redundants from subsampled wanted because it makes the new nameMap. }; #endif mothur-1.36.1/source/svm/000077500000000000000000000000001255543666200152435ustar00rootroot00000000000000mothur-1.36.1/source/svm/svm.cpp000066400000000000000000001602021255543666200165550ustar00rootroot00000000000000// // svm.cpp // support vector machine // // Created by Joshua Lynch on 6/19/2013. // Copyright (c) 2013 Schloss Lab. All rights reserved. // #include #include #include #include #include #include #include #include #include "svm.hpp" // OutputFilter constants const int OutputFilter::QUIET = 0; const int OutputFilter::INFO = 1; const int OutputFilter::mDEBUG = 2; const int OutputFilter::TRACE = 3; #define RANGE(X) X, X + sizeof(X)/sizeof(double) // parameters will be tested in the order they are specified const string LinearKernelFunction::MapKey = "linear";//"LinearKernel"; const string LinearKernelFunction::MapKey_Constant = "constant";//"LinearKernel_Constant"; const double defaultLinearConstantRangeArray[] = {0.0, -1.0, 1.0, -10.0, 10.0}; const ParameterRange LinearKernelFunction::defaultConstantRange = ParameterRange(RANGE(defaultLinearConstantRangeArray)); const string RbfKernelFunction::MapKey = "rbf";//"RbfKernel"; const string RbfKernelFunction::MapKey_Gamma = "gamma";//"RbfKernel_Gamma"; const double defaultRbfGammaRangeArray[] = {0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0}; const ParameterRange RbfKernelFunction::defaultGammaRange = ParameterRange(RANGE(defaultRbfGammaRangeArray)); const string PolynomialKernelFunction::MapKey = "polynomial";//"PolynomialKernel"; const string PolynomialKernelFunction::MapKey_Constant = "constant";//"PolynomialKernel_Constant"; const string PolynomialKernelFunction::MapKey_Coefficient = "coefficient";//"PolynomialKernel_Coefficient"; const string 
PolynomialKernelFunction::MapKey_Degree = "degree";//"PolynomialKernel_Degree"; const double defaultPolynomialConstantRangeArray[] = {0.0, -1.0, 1.0, -2.0, 2.0, -3.0, 3.0}; const ParameterRange PolynomialKernelFunction::defaultConstantRange = ParameterRange(RANGE(defaultPolynomialConstantRangeArray)); const double defaultPolynomialCoefficientRangeArray[] = {0.01, 0.1, 1.0, 10.0, 100.0}; const ParameterRange PolynomialKernelFunction::defaultCoefficientRange = ParameterRange(RANGE(defaultPolynomialCoefficientRangeArray)); const double defaultPolynomialDegreeRangeArray[] = {2.0, 3.0, 4.0}; const ParameterRange PolynomialKernelFunction::defaultDegreeRange = ParameterRange(RANGE(defaultPolynomialDegreeRangeArray)); const string SigmoidKernelFunction::MapKey = "sigmoid"; const string SigmoidKernelFunction::MapKey_Alpha = "alpha"; const string SigmoidKernelFunction::MapKey_Constant = "constant"; const double defaultSigmoidAlphaRangeArray[] = {1.0, 2.0}; const ParameterRange SigmoidKernelFunction::defaultAlphaRange = ParameterRange(RANGE(defaultSigmoidAlphaRangeArray)); const double defaultSigmoidConstantRangeArray[] = {1.0, 2.0}; const ParameterRange SigmoidKernelFunction::defaultConstantRange = ParameterRange(RANGE(defaultSigmoidConstantRangeArray)); const string SmoTrainer::MapKey_C = "smoc";//"SmoTrainer_C"; const double defaultSmoTrainerCRangeArray[] = {0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0}; const ParameterRange SmoTrainer::defaultCRange = ParameterRange(RANGE(defaultSmoTrainerCRangeArray)); MothurOut* m = MothurOut::getInstance(); LabelPair buildLabelPair(const Label& one, const Label& two) { LabelVector labelPair(2); labelPair[0] = one; labelPair[1] = two; return labelPair; } // Dividing a dataset into training and testing sets while maintaining equal // representation of all classes is done using a LabelToLabeledObservationVector. // This container is used to divide datasets into groups of LabeledObservations // having the same label. 
// For example, given a LabeledObservationVector like
//     ["blue",  [1.0, 2.0, 3.0]]
//     ["green", [3.0, 4.0, 5.0]]
//     ["blue",  [2.0, 3.0, 4.0]]
//     ["green", [4.0, 5.0, 6.0]]
// the corresponding LabelToLabeledObservationVector looks like
//     "blue"  : [["blue",  [1.0, 2.0, 3.0]], ["blue",  [2.0, 3.0, 4.0]]]
//     "green" : [["green", [3.0, 4.0, 5.0]], ["green", [4.0, 5.0, 6.0]]]
void buildLabelToLabeledObservationVector(LabelToLabeledObservationVector& labelToLabeledObservationVector, const LabeledObservationVector& labeledObservationVector) {
    for ( LabeledObservationVector::const_iterator j = labeledObservationVector.begin(); j != labeledObservationVector.end(); j++ ) {
        labelToLabeledObservationVector[j->first].push_back(*j);
    }
}

// MeanAndStd accumulates a running mean and sample standard deviation in a
// single pass (Welford's online algorithm). Call initialize() before the
// first call to processNextValue().
class MeanAndStd {
private:
    double n;
    double M2;
    double mean;

public:
    MeanAndStd() : n(0.0), M2(0.0), mean(0.0) {}
    ~MeanAndStd() {}

    void initialize() {
        n = 0.0;
        mean = 0.0;
        M2 = 0.0;
    }

    void processNextValue(double x) {
        n += 1.0;
        double delta = x - mean;
        mean += delta / n;
        M2 += delta * (x - mean);
    }

    double getMean() {
        return mean;
    }

    // sample standard deviation; not defined for fewer than two values
    double getStd() {
        double variance = M2 / (n - 1.0);
        return sqrt(variance);
    }
};

// The FeatureLabelMatches functor is used in a call to remove_if in the
// removeFeature function. It returns true if the feature argument has the
// feature label given to the constructor.
class FeatureLabelMatches { public: FeatureLabelMatches(const string& _featureLabel) : featureLabel(_featureLabel){} bool operator() (const Feature& f) { return f.getFeatureLabel() == featureLabel; } private: const string& featureLabel; }; Feature removeFeature(Feature featureToRemove, LabeledObservationVector& observations, FeatureVector& featureVector) { FeatureLabelMatches matchFeatureLabel(featureToRemove.getFeatureLabel()); featureVector.erase( remove_if(featureVector.begin(), featureVector.end(), matchFeatureLabel), featureVector.end() ); for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) { observations[observation].removeFeatureAtIndex(featureToRemove.getFeatureIndex()); } // update the feature indices for ( int i = 0; i < featureVector.size(); i++ ) { featureVector.at(i).setFeatureIndex(i); } featureToRemove.setFeatureIndex(-1); return featureToRemove; } FeatureVector applyStdThreshold(double stdThreshold, LabeledObservationVector& observations, FeatureVector& featureVector) { // calculate standard deviation of each feature // remove features with standard deviation less than or equal to stdThreshold MeanAndStd ms; // loop over features in reverse order so we can get the index of each // for example, // if there are 5 features a,b,c,d,e // and features a, c, e fall below the stdThreshold // loop iteration 0: remove feature e (index 4) -- features are now a,b,c,d // loop iteration 1: leave feature d (index 3) // loop iteration 2: remove feature c (index 2) -- features are now a,b,d // loop iteration 3: leave feature b (index 1) // loop iteration 4: remove feature a (index 0) -- features are now b,d FeatureVector removedFeatureVector; for ( int feature = observations[0].second->size()-1; feature >= 0 ; feature-- ) { ms.initialize(); m->mothurOut("feature index " + toString(feature)); m->mothurOutEndLine(); for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) 
{ ms.processNextValue(observations[observation].second->at(feature)); } m->mothurOut( "feature " + toString(feature) + " has std " + toString(ms.getStd()) ); m->mothurOutEndLine(); if ( ms.getStd() <= stdThreshold ) { m->mothurOut( "removing feature with index " + toString(feature) ); m->mothurOutEndLine(); // remove this feature Feature featureToRemove = featureVector.at(feature); removedFeatureVector.push_back( removeFeature(featureToRemove, observations, featureVector) ); } } reverse(removedFeatureVector.begin(), removedFeatureVector.end()); return removedFeatureVector; } // this function standardizes data to mean 0 and variance 1 // but this may not be a good standardization for OTU data void transformZeroMeanUnitVariance(LabeledObservationVector& observations) { bool vebose = false; // online method for mean and variance MeanAndStd ms; for ( Observation::size_type feature = 0; feature < observations[0].second->size(); feature++ ) { ms.initialize(); //double n = 0.0; //double mean = 0.0; //double M2 = 0.0; for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) { ms.processNextValue(observations[observation].second->at(feature)); //n += 1.0; //double x = observations[observation].second->at(feature); //double delta = x - mean; //mean += delta / n; //M2 += delta * (x - mean); } //double variance = M2 / (n - 1.0); //double standardDeviation = sqrt(variance); if (vebose) { m->mothurOut( "mean of feature " + toString(feature) + " is " + toString(ms.getMean()) ); m->mothurOutEndLine(); m->mothurOut( "std of feature " + toString(feature) + " is " + toString(ms.getStd()) ); m->mothurOutEndLine(); } // normalize the feature double mean = ms.getMean(); double std = ms.getStd(); for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) { observations[observation].second->at(feature) = (observations[observation].second->at(feature) - mean ) / std; } } } double 
getMinimumFeatureValueForObservation(Observation::size_type featureIndex, LabeledObservationVector& observations) {
    double featureMinimum = numeric_limits<double>::max();
    for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) {
        if ( observations[observation].second->at(featureIndex) < featureMinimum ) {
            featureMinimum = observations[observation].second->at(featureIndex);
        }
    }
    return featureMinimum;
}

double getMaximumFeatureValueForObservation(Observation::size_type featureIndex, LabeledObservationVector& observations) {
    // numeric_limits<double>::min() is the smallest positive double, not the
    // most negative one, so start from -max() to handle negative features
    double featureMaximum = -numeric_limits<double>::max();
    for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) {
        if ( observations[observation].second->at(featureIndex) > featureMaximum ) {
            featureMaximum = observations[observation].second->at(featureIndex);
        }
    }
    return featureMaximum;
}

// this function standardizes data to minimum value 0.0 and maximum value 1.0
void transformZeroOne(LabeledObservationVector& observations) {
    for ( Observation::size_type feature = 0; feature < observations[0].second->size(); feature++ ) {
        double featureMinimum = getMinimumFeatureValueForObservation(feature, observations);
        double featureMaximum = getMaximumFeatureValueForObservation(feature, observations);
        // standardize the feature
        // note: a constant feature (maximum == minimum) makes the range zero
        for ( ObservationVector::size_type observation = 0; observation < observations.size(); observation++ ) {
            double x = observations[observation].second->at(feature);
            double xstd = (x - featureMinimum) / (featureMaximum - featureMinimum);
            observations[observation].second->at(feature) = xstd / (1.0 - 0.0) + 0.0;
        }
    }
}

//
// SVM member functions
//

// the discriminant member function returns +1 or -1
int SVM::discriminant(const Observation& observation) const {
    // d is the discriminant function
    double d = b;
    for ( int i = 0; i < y.size(); i++ ) {
        d += y[i]*a[i]*inner_product(observation.begin(), observation.end(), x[i].second->begin(), 0.0);
    }
    return d > 0.0 ?
        1 : -1;
}

LabelVector SVM::classify(const LabeledObservationVector& twoClassLabeledObservationVector) const {
    LabelVector predictionVector;
    for ( LabeledObservationVector::const_iterator i = twoClassLabeledObservationVector.begin(); i != twoClassLabeledObservationVector.end(); i++ ) {
        Label prediction = classify(*(i->getObservation()));
        Label actual = i->getLabel();
        //cout << "classification of actual " << actual << " is " << prediction << endl;
        predictionVector.push_back(prediction);
    }
    return predictionVector;
}

// the score member function classifies each labeled observation from the
// argument and returns the fraction of correct classifications
// don't need this any more????
double SVM::score(const LabeledObservationVector& twoClassLabeledObservationVector) const {
    //cout << "score:" << endl;
    double s = 0.0;
    for ( LabeledObservationVector::const_iterator i = twoClassLabeledObservationVector.begin(); i != twoClassLabeledObservationVector.end(); i++ ) {
        Label predicted_label = classify(*(i->second));
        //cout << "in score actual label: '" << i->first << "' predicted label: '" << predicted_label << "'" << endl;
        if ( predicted_label == i->first ) {
            s = s + 1.0;
        }
        else {
        }
    }
    return s / double(twoClassLabeledObservationVector.size());
}

void SvmPerformanceSummary::init(const SVM& svm, const LabeledObservationVector& actualLabels, const LabelVector& predictedLabels) {
    // accumulate four counts:
    //     tp (true positive)  -- correct classifications (classified +1 as +1)
    //     fp (false positive) -- incorrect classifications (classified -1 as +1)
    //     fn (false negative) -- incorrect classifications (classified +1 as -1)
    //     tn (true negative)  -- correct classification (classified -1 as -1)
    // the label corresponding to discriminant +1 will be the 'positive' class
    NumericClassToLabel discriminantToLabel = svm.getDiscriminantToLabel();
    positiveClassLabel = discriminantToLabel[1];
    negativeClassLabel = discriminantToLabel[-1];
    //cout << "positive class label: " << positiveClassLabel << endl;
    //cout << "negative class label: " << negativeClassLabel << endl;
    //cout << "actual labels vector has length " << actualLabels.size() << endl;
    //cout << "predicted labels vector has length " << predictedLabels.size() << endl;

    double tp = 0;
    double fp = 0;
    double fn = 0;
    double tn = 0;
    double unknown = 0;
    for (int i = 0; i < actualLabels.size(); i++) {
        Label predictedLabel = predictedLabels.at(i);
        Label actualLabel = actualLabels.at(i).getLabel();
        //cout << "predicted: " << predictedLabel << " actual: " << actualLabel << endl;
        if ( actualLabel.compare(positiveClassLabel) == 0) {
            if ( predictedLabel.compare(positiveClassLabel) == 0 ) {
                tp++;
            }
            else if ( predictedLabel.compare(negativeClassLabel) == 0 ) {
                fn++;
            }
            else {
                m->mothurOut( "actual label is positive but something is wrong" ); m->mothurOutEndLine();
            }
        }
        else if ( actualLabel.compare(negativeClassLabel) == 0 ) {
            if ( predictedLabel.compare(positiveClassLabel) == 0 ) {
                fp++;
            }
            else if ( predictedLabel.compare(negativeClassLabel) == 0 ) {
                tn++;
            }
            else {
                m->mothurOut( "actual label is negative but something is wrong" ); m->mothurOutEndLine();
            }
        }
        else {
            // in the event we have been given an observation that is labeled
            // neither positive nor negative then we will get a false classification
            //cout << "unrecognized actual label " << actualLabel << endl;
            // compare() returns 0 on a match, so test against 0 explicitly:
            // a positive prediction counts as a false positive, anything else
            // as a false negative
            if ( predictedLabel.compare(positiveClassLabel) == 0 ) {
                fp++;
            }
            else {
                fn++;
            }
        }
    }

    if ( tp == 0 && fp == 0 ) {
        precision = 0;
    }
    else {
        precision = tp / (tp + fp);
    }

    // guard recall the same way as precision to avoid dividing by zero
    if ( tp == 0 && fn == 0 ) {
        recall = 0;
    }
    else {
        recall = tp / (tp + fn);
    }

    if ( precision == 0 && recall == 0 ) {
        f = 0;
    }
    else {
        f = 2.0 * (precision * recall) / (precision + recall);
    }

    accuracy = (tp + tn) / (tp + tn + fp + fn);
    //cout << "svm performance summary for labels " << positiveClassLabel << " " << negativeClassLabel << endl;
    //cout << "tp: " << tp << " fp: " << fp << " tn: " << tn << " fn: " << fn << endl;
    //cout << "precision: " << precision << " recall: " << recall << " f: " << f << " accuracy: " << accuracy << endl;
}
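/*
 * SvmPerformanceSummary::init derives precision, recall, F and accuracy from
 * the tp, fp, fn and tn counts. The arithmetic can be sketched on its own as
 * shown here. This is only an illustration: BinarySummary and summarize are
 * hypothetical names that are not part of mothur, and the sketch assumes that
 * every ratio with a zero denominator should be reported as 0.
 *
```cpp
#include <cassert>

// Hypothetical helper type (not part of mothur): the four statistics
// derived from a two-class confusion matrix.
struct BinarySummary {
    double precision;
    double recall;
    double f;
    double accuracy;
};

// Derive precision, recall, F-measure and accuracy from the four counts.
// Assumption of this sketch: a ratio with a zero denominator is reported
// as 0 rather than NaN.
BinarySummary summarize(double tp, double fp, double fn, double tn) {
    assert(tp >= 0 && fp >= 0 && fn >= 0 && tn >= 0);
    BinarySummary s;
    s.precision = (tp + fp == 0) ? 0.0 : tp / (tp + fp);
    s.recall = (tp + fn == 0) ? 0.0 : tp / (tp + fn);
    double sum = s.precision + s.recall;
    s.f = (sum == 0) ? 0.0 : 2.0 * (s.precision * s.recall) / sum;
    double total = tp + fp + fn + tn;
    s.accuracy = (total == 0) ? 0.0 : (tp + tn) / total;
    return s;
}
```
 *
 * For example, summarize(3, 1, 1, 5) yields precision, recall and F of 0.75
 * and an accuracy of 0.8, since 8 of the 10 observations are classified
 * correctly.
 */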
MultiClassSVM::MultiClassSVM(const vector s, const LabelSet& l, const SvmToSvmPerformanceSummary& p, OutputFilter of) : twoClassSvmList(s.begin(), s.end()), labelSet(l), svmToSvmPerformanceSummary(p), outputFilter(of), accuracy(0) {} MultiClassSVM::~MultiClassSVM() { for ( int i = 0; i < twoClassSvmList.size(); i++ ) { delete twoClassSvmList[i]; } } // The fewerVotes function is used to find the maximum vote // tally in MultiClassSVM::classify. This function returns true // if the first element (number of votes for the first label) is // less than the second element (number of votes for the second label). bool fewerVotes(const pair& p, const pair& q) { return p.second < q.second; } Label MultiClassSVM::classify(const Observation& observation) { map labelToVoteCount; for ( int i = 0; i < twoClassSvmList.size(); i++ ) { Label predictedLabel = twoClassSvmList[i]->classify(observation); labelToVoteCount[predictedLabel]++; } pair winner = *max_element(labelToVoteCount.begin(), labelToVoteCount.end(), fewerVotes); LabelVector winningLabels; winningLabels.push_back(winner.first); for ( map::const_iterator i = labelToVoteCount.begin(); i != labelToVoteCount.end(); i++ ) { if ( i->second == winner.second && i->first != winner.first ) { winningLabels.push_back(i->first); } } if ( winningLabels.size() == 1) { // we have a winner } else { // we have a tie throw MultiClassSvmClassificationTie(winningLabels, winner.second); } return winner.first; } double MultiClassSVM::score(const LabeledObservationVector& multiClassLabeledObservationVector) { double s = 0.0; for (LabeledObservationVector::const_iterator i = multiClassLabeledObservationVector.begin(); i != multiClassLabeledObservationVector.end(); i++) { //cout << "classifying observation with label " << i->first << endl; try { Label predicted_label = classify(*(i->second)); if ( predicted_label == i->first ) { s = s + 1.0; } else { // predicted label does not match actual label } } catch ( MultiClassSvmClassificationTie& e ) { 
if ( outputFilter.debug() ) { m->mothurOut( "classification tie for observation " + toString(i->datasetIndex) + " with label " + toString(i->first) ); m->mothurOutEndLine(); } } } return s / double(multiClassLabeledObservationVector.size()); } class MaxIterationsExceeded : public exception { virtual const char* what() const throw() { return "maximum iterations exceeded during SMO"; } } maxIterationsExceeded; //SvmTrainingInterruptedException smoTrainingInterruptedException("SMO training interrupted by user"); // The train method implements Sequential Minimal Optimization as described in // "Support Vector Machine Solvers" by Bottou and Lin. // // SmoTrainer::train releases a pointer to an SVM into the wild so we must be // careful about handling the LabeledObservationVector.... Must create a copy // of those labeled vectors??? SVM* SmoTrainer::train(KernelFunctionCache& K, const LabeledObservationVector& twoClassLabeledObservationVector) { const int observationCount = twoClassLabeledObservationVector.size(); const int featureCount = twoClassLabeledObservationVector[0].second->size(); if (outputFilter.debug()) m->mothurOut( "observation count : " + toString(observationCount) ); m->mothurOutEndLine(); if (outputFilter.debug()) m->mothurOut( "feature count : " + toString(featureCount) ); m->mothurOutEndLine(); // dual coefficients vector a(observationCount, 0.0); // gradient vector g(observationCount, 1.0); // convert the labels to -1.0,+1.0 vector y(observationCount); if (outputFilter.trace()) m->mothurOut( "assign numeric labels" ); m->mothurOutEndLine(); NumericClassToLabel discriminantToLabel; assignNumericLabels(y, twoClassLabeledObservationVector, discriminantToLabel); if (outputFilter.trace()) m->mothurOut( "assign A and B" ); m->mothurOutEndLine(); vector A(observationCount); vector B(observationCount); for ( int n = 0; n < observationCount; n++ ) { if ( y[n] == +1.0) { A[n] = 0.0; B[n] = C; } else { A[n] = -C; B[n] = 0; } if (outputFilter.trace()) 
m->mothurOut( toString(n) + " " + toString(A[n]) + " " + toString(B[n]) ); m->mothurOutEndLine(); } if (outputFilter.trace()) m->mothurOut( "assign K" ); m->mothurOutEndLine(); int m_count = 0; vector u(3); vector ya(observationCount); vector yg(observationCount); double lambda = numeric_limits::max(); while ( true ) { if (m->control_pressed) { return 0; } //if ( externalSvmTrainingInterruption.interruptTraining() ) { // this should be a specialized exception //cout << "***************************** interrupting training **********************************" << endl; //throw smoTrainingInterruptedException; //} m_count++; int i = 0; // 0 int j = 0; // 0 double yg_max = numeric_limits::min(); double yg_min = numeric_limits::max(); if (outputFilter.trace()) m->mothurOut( "m = " + toString(m_count) ); m->mothurOutEndLine(); for ( int k = 0; k < observationCount; k++ ) { ya[k] = y[k] * a[k]; yg[k] = y[k] * g[k]; } if (outputFilter.trace()) { m->mothurOut( "yg ="); for ( int k = 0; k < observationCount; k++ ) { //cout << A[k] << " " << B[k] << " " << y[k] << " " << a[k] << " " << g[k] << " " << ya[k] << " " << yg[k] << endl; m->mothurOut( " " + toString(yg[k])); } m->mothurOutEndLine(); } for ( int k = 0; k < observationCount; k++ ) { if ( ya[k] < B[k] && yg[k] > yg_max ) { yg_max = yg[k]; i = k; } if ( A[k] < ya[k] && yg[k] < yg_min ) { yg_min = yg[k]; j = k; } //cout << "n = " << n << endl; //cout << ya[k] << " " << yg[k] << endl; //cout << "j = " << j << " yg[j] = " << yg[j] << endl; } // maximum violating pair is i,j if (outputFilter.trace()) { m->mothurOut( "maximal violating pair: " + toString(i) + " " + toString(j) ); m->mothurOutEndLine(); m->mothurOut( " i = " + toString(i) + " features: "); for ( int feature = 0; feature < featureCount; feature++ ) { m->mothurOut( toString(twoClassLabeledObservationVector[i].second->at(feature)) + " "); }; m->mothurOutEndLine(); m->mothurOut( " j = " + toString(j) + " features: "); for ( int feature = 0; feature < featureCount; 
feature++ ) { m->mothurOut( toString(twoClassLabeledObservationVector[j].second->at(feature)) + " "); }; m->mothurOutEndLine(); } // parameterize this if ( m_count > 1000 ) { //1000 // what happens if we just go with what we've got instead of throwing an exception? // things work pretty well for the most part // might be better to look at lambda??? if (outputFilter.debug()) m->mothurOut( "iteration limit reached with lambda = " + toString(lambda) ); m->mothurOutEndLine(); break; } // using lambda to break is a good performance enhancement if ( yg[i] <= yg[j] or lambda < 0.0001) { break; } u[0] = B[i] - ya[i]; u[1] = ya[j] - A[j]; double K_ii = K.similarity(twoClassLabeledObservationVector[i], twoClassLabeledObservationVector[i]); double K_jj = K.similarity(twoClassLabeledObservationVector[j], twoClassLabeledObservationVector[j]); double K_ij = K.similarity(twoClassLabeledObservationVector[i], twoClassLabeledObservationVector[j]); u[2] = (yg[i] - yg[j]) / (K_ii+K_jj-2.0*K_ij); if (outputFilter.trace()) m->mothurOut( "directions: (" + toString(u[0]) + "," + toString(u[1]) + "," + toString(u[2]) + ")" ); m->mothurOutEndLine(); lambda = *min_element(u.begin(), u.end()); if (outputFilter.trace()) m->mothurOut( "lambda: " + toString(lambda) ); m->mothurOutEndLine(); for ( int k = 0; k < observationCount; k++ ) { double K_ik = K.similarity(twoClassLabeledObservationVector[i], twoClassLabeledObservationVector[k]); double K_jk = K.similarity(twoClassLabeledObservationVector[j], twoClassLabeledObservationVector[k]); g[k] += (-lambda * y[k] * K_ik + lambda * y[k] * K_jk); } if (outputFilter.trace()) { m->mothurOut( "g ="); for ( int k = 0; k < observationCount; k++ ) { m->mothurOut( " " + toString(g[k])); } m->mothurOutEndLine(); } a[i] += y[i] * lambda; a[j] -= y[j] * lambda; } // at this point the optimal a's have been found // now use them to find w and b if (outputFilter.trace()) m->mothurOut( "find w" ); m->mothurOutEndLine(); vector 
w(twoClassLabeledObservationVector[0].second->size(), 0.0); double b = 0.0; for ( int i = 0; i < y.size(); i++ ) { if (outputFilter.trace()) m->mothurOut( "alpha[" + toString(i) + "] = " + toString(a[i]) ); m->mothurOutEndLine(); for ( int j = 0; j < w.size(); j++ ) { w[j] += a[i] * y[i] * twoClassLabeledObservationVector[i].second->at(j); } if ( A[i] < a[i] && a[i] < B[i] ) { b = yg[i]; if (outputFilter.trace()) m->mothurOut( "b = " + toString(b) ); m->mothurOutEndLine(); } } if (outputFilter.trace()) { for ( int i = 0; i < w.size(); i++ ) { m->mothurOut( "w[" + toString(i) + "] = " + toString(w[i]) ); m->mothurOutEndLine(); } } // be careful about passing twoClassLabeledObservationVector - what if this vector // is deleted??? // // we can eliminate elements of y, a and observation vectors corresponding to a = 0 vector support_y; vector nonzero_a; LabeledObservationVector supportVectors; for (int i = 0; i < a.size(); i++) { if ( a.at(i) == 0.0 ) { // this dual coefficient does not correspond to a support vector } else { support_y.push_back(y.at(i)); nonzero_a.push_back(a.at(i)); supportVectors.push_back(twoClassLabeledObservationVector.at(i)); } } //return new SVM(y, a, twoClassLabeledObservationVector, b, discriminantToLabel); if (outputFilter.info()) m->mothurOut( "found " + toString(supportVectors.size()) + " support vectors" ); m->mothurOutEndLine(); return new SVM(support_y, nonzero_a, supportVectors, b, discriminantToLabel); } typedef map LabelToNumericClassLabel; // For SVM training we need to assign numeric class labels of -1.0 and +1.0. // This method populates the y vector argument with -1.0 and +1.0 // corresponding to the two classes in the labelVector argument. 
// For example, if labeledObservationVector looks like this:
//     [ (0, "blue",  [...some observations...]),
//       (1, "green", [...some observations...]),
//       (2, "blue",  [...some observations...]) ]
// Then after the function executes the y vector will look like this:
//     [-1.0,   blue
//      +1.0,   green
//      -1.0]   blue
// and discriminantToLabel will look like this:
//     { -1.0 : "blue",
//       +1.0 : "green" }
// The label "blue" is mapped to -1.0 because it is (lexicographically) less than "green".
// When given labels "blue" and "green" this function will always assign "blue" to -1.0 and
// "green" to +1.0. This is not fundamentally important but it makes testing easier and is
// not a hassle to implement.
void SmoTrainer::assignNumericLabels(vector<double>& y, const LabeledObservationVector& labeledObservationVector, NumericClassToLabel& discriminantToLabel) {
    // it would be nice if we assign -1.0 and +1.0 consistently for each pair of labels
    // I think the label set will always be traversed in sorted order so we should get this for free

    // we are going to overwrite arguments y and discriminantToLabel
    y.clear();
    discriminantToLabel.clear();

    LabelSet labelSet;
    buildLabelSet(labelSet, labeledObservationVector);
    LabelVector uniqueLabels(labelSet.begin(), labelSet.end());
    if (labelSet.size() != 2) {
        // the label set may hold fewer or more than 2 labels; report what we got
        cerr << "unexpected label set size " << labelSet.size() << endl;
        for (LabelSet::const_iterator i = labelSet.begin(); i != labelSet.end(); i++) {
            cerr << " label " << *i << endl;
        }
        throw SmoTrainerException("SmoTrainer::assignNumericLabels requires exactly 2 labels");
    }
    else {
        LabelToNumericClassLabel labelToNumericClassLabel;
        labelToNumericClassLabel[uniqueLabels[0]] = -1.0;
        labelToNumericClassLabel[uniqueLabels[1]] = +1.0;
        for ( LabeledObservationVector::const_iterator i = labeledObservationVector.begin(); i != labeledObservationVector.end(); i++ ) {
            y.push_back( labelToNumericClassLabel[i->first] );
        }
        discriminantToLabel[-1.0] = uniqueLabels[0];
        discriminantToLabel[+1.0] = uniqueLabels[1];
    }
}

// this is a convenience function for getting parameter ranges for all kernels
void getDefaultKernelParameterRangeMap(KernelParameterRangeMap& kernelParameterRangeMap) {
    ParameterRangeMap linearParameterRangeMap;
    linearParameterRangeMap[SmoTrainer::MapKey_C] = SmoTrainer::defaultCRange;
    linearParameterRangeMap[LinearKernelFunction::MapKey_Constant] = LinearKernelFunction::defaultConstantRange;

    ParameterRangeMap rbfParameterRangeMap;
    rbfParameterRangeMap[SmoTrainer::MapKey_C] = SmoTrainer::defaultCRange;
    rbfParameterRangeMap[RbfKernelFunction::MapKey_Gamma] = RbfKernelFunction::defaultGammaRange;

    ParameterRangeMap polynomialParameterRangeMap;
    polynomialParameterRangeMap[SmoTrainer::MapKey_C] = SmoTrainer::defaultCRange;
    polynomialParameterRangeMap[PolynomialKernelFunction::MapKey_Constant] = PolynomialKernelFunction::defaultConstantRange;
    polynomialParameterRangeMap[PolynomialKernelFunction::MapKey_Coefficient] = PolynomialKernelFunction::defaultCoefficientRange;
    polynomialParameterRangeMap[PolynomialKernelFunction::MapKey_Degree] = PolynomialKernelFunction::defaultDegreeRange;

    ParameterRangeMap sigmoidParameterRangeMap;
    sigmoidParameterRangeMap[SmoTrainer::MapKey_C] = SmoTrainer::defaultCRange;
    sigmoidParameterRangeMap[SigmoidKernelFunction::MapKey_Alpha] = SigmoidKernelFunction::defaultAlphaRange;
    sigmoidParameterRangeMap[SigmoidKernelFunction::MapKey_Constant] = SigmoidKernelFunction::defaultConstantRange;

    kernelParameterRangeMap[LinearKernelFunction::MapKey] = linearParameterRangeMap;
    kernelParameterRangeMap[RbfKernelFunction::MapKey] = rbfParameterRangeMap;
    kernelParameterRangeMap[PolynomialKernelFunction::MapKey] = polynomialParameterRangeMap;
    kernelParameterRangeMap[SigmoidKernelFunction::MapKey] = sigmoidParameterRangeMap;
}

//
// OneVsOneMultiClassSvmTrainer
//
// An instance of OneVsOneMultiClassSvmTrainer is intended to work with a single set of data
// to produce a single instance of MultiClassSVM.
// That's why observations and labels go into
// the constructor.
OneVsOneMultiClassSvmTrainer::OneVsOneMultiClassSvmTrainer(SvmDataset& d, int e, int t, OutputFilter& of) :
        svmDataset(d),
        evaluationFoldCount(e),
        trainFoldCount(t),
        outputFilter(of) {

    buildLabelSet(labelSet, svmDataset.getLabeledObservationVector());
    buildLabelToLabeledObservationVector(labelToLabeledObservationVector, svmDataset.getLabeledObservationVector());
    buildLabelPairSet(labelPairSet, svmDataset.getLabeledObservationVector());
}

void buildLabelSet(LabelSet& labelSet, const LabeledObservationVector& labeledObservationVector) {
    for (LabeledObservationVector::const_iterator i = labeledObservationVector.begin(); i != labeledObservationVector.end(); i++) {
        labelSet.insert(i->first);
    }
}

// This function uses the LabeledObservationVector argument to populate the LabelPairSet
// argument with pairs of labels. For example, if labeledObservationVector looks like this:
//     [ ("blue", x), ("green", y), ("red", z) ]
// then the labelPairSet will be populated with the following label pairs:
//     ("blue", "green"), ("blue", "red"), ("green", "red")
// The order of labels in the pairs is determined by the ordering of labels in the temporary
// LabelSet. By default this order will be ascending. However, labels are taken off the
// temporary labelStack in reverse order, so the labelStack is initialized with reverse iterators.
// In the end our label pairs will be in sorted order.
void OneVsOneMultiClassSvmTrainer::buildLabelPairSet(LabelPairSet& labelPairSet, const LabeledObservationVector& labeledObservationVector) {
    //cout << "buildLabelPairSet" << endl;
    LabelSet labelSet;
    buildLabelSet(labelSet, labeledObservationVector);
    LabelVector labelStack(labelSet.rbegin(), labelSet.rend());
    while (labelStack.size() > 1) {
        Label label = labelStack.back();
        labelStack.pop_back();
        LabelPair labelPair(2);
        labelPair[0] = label;
        for (LabelVector::const_iterator i = labelStack.begin(); i != labelStack.end(); i++) {
            labelPair[1] = *i;
            labelPairSet.insert(
                //make_pair(label, *i)
                labelPair
            );
        }
    }
}

// The LabelMatchesEither functor is used only in calls to remove_copy_if in the
// OneVsOneMultiClassSvmTrainer::train method. Despite its name, it returns true if the
// labeled observation argument has NEITHER of the two labels. remove_copy_if skips the
// elements for which the predicate is true, so only observations carrying one of the
// two labels are copied.
class LabelMatchesEither {
public:
    LabelMatchesEither(const Label& _label0, const Label& _label1) : label0(_label0), label1(_label1) {}

    bool operator() (const LabeledObservation& o) {
        return !((o.first == label0) || (o.first == label1));
    }

private:
    const Label& label0;
    const Label& label1;
};

MultiClassSVM* OneVsOneMultiClassSvmTrainer::train(const KernelParameterRangeMap& kernelParameterRangeMap) {
    double bestMultiClassSvmScore = 0.0;
    // initialized so that the pointer is never read while indeterminate
    MultiClassSVM* bestMc = NULL;

    KernelFunctionFactory kernelFunctionFactory(svmDataset.getLabeledObservationVector());

    // first divide the data into a 'development' set for tuning hyperparameters
    // and an 'evaluation' set for measuring performance
    int evaluationFoldNumber = 0;
    KFoldLabeledObservationsDivider kFoldDevEvalDivider(evaluationFoldCount, svmDataset.getLabeledObservationVector());
    for ( kFoldDevEvalDivider.start(); !kFoldDevEvalDivider.end(); kFoldDevEvalDivider.next() ) {
        const LabeledObservationVector& developmentObservations = kFoldDevEvalDivider.getTrainingData();
        const LabeledObservationVector& evaluationObservations = kFoldDevEvalDivider.getTestingData();

        evaluationFoldNumber++;
        if (
            outputFilter.debug() ) {
            m->mothurOut( "evaluation fold " + toString(evaluationFoldNumber) + " of " + toString(evaluationFoldCount) ); m->mothurOutEndLine();
        }

        vector<SVM*> twoClassSvmList;
        SvmToSvmPerformanceSummary svmToSvmPerformanceSummary;
        SmoTrainer smoTrainer(outputFilter);
        LabelPairSet::iterator labelPair;
        for (labelPair = labelPairSet.begin(); labelPair != labelPairSet.end(); labelPair++) {
            // generate training and testing data for this label pair
            Label label0 = (*labelPair)[0];
            Label label1 = (*labelPair)[1];
            if ( outputFilter.debug() ) {
                m->mothurOut("training SVM on labels " + toString(label0) + " and " + toString(label1) ); m->mothurOutEndLine();
            }

            double bestMeanScoreOnKFolds = 0.0;
            ParameterMap bestParameterMap;
            string bestKernelFunctionKey;
            LabeledObservationVector twoClassDevelopmentObservations;
            LabelMatchesEither labelMatchesEither(label0, label1);
            remove_copy_if(
                developmentObservations.begin(),
                developmentObservations.end(),
                back_inserter(twoClassDevelopmentObservations),
                labelMatchesEither
                //[&](const LabeledObservation& o){
                //    return !((o.first == label0) || (o.first == label1));
                //}
            );
            KFoldLabeledObservationsDivider kFoldLabeledObservationsDivider(trainFoldCount, twoClassDevelopmentObservations);
            // loop on kernel functions and kernel function parameters
            for ( KernelParameterRangeMap::const_iterator kmap = kernelParameterRangeMap.begin(); kmap != kernelParameterRangeMap.end(); kmap++ ) {
                string kernelFunctionKey = kmap->first;
                KernelFunction& kernelFunction = kernelFunctionFactory.getKernelFunctionForKey(kmap->first);
                ParameterSetBuilder p(kmap->second);
                for (ParameterMapVector::const_iterator hp = p.getParameterSetList().begin(); hp != p.getParameterSetList().end(); hp++) {
                    kernelFunction.setParameters(*hp);
                    KernelFunctionCache kernelFunctionCache(kernelFunction, svmDataset.getLabeledObservationVector());
                    smoTrainer.setParameters(*hp);
                    if (outputFilter.debug()) {
                        m->mothurOut( "parameters for " + toString(kernelFunctionKey) + " kernel" );
                        m->mothurOutEndLine();
                        for ( ParameterMap::const_iterator i = hp->begin(); i != hp->end(); i++ ) {
                            m->mothurOut( " " + toString(i->first) + ":" + toString(i->second) ); m->mothurOutEndLine();
                        }
                    }
                    double meanScoreOnKFolds = trainOnKFolds(smoTrainer, kernelFunctionCache, kFoldLabeledObservationsDivider);
                    if ( meanScoreOnKFolds > bestMeanScoreOnKFolds ) {
                        bestMeanScoreOnKFolds = meanScoreOnKFolds;
                        bestParameterMap = *hp;
                        bestKernelFunctionKey = kernelFunctionKey;
                    }
                }
            }

            if ( bestMeanScoreOnKFolds == 0.0 ) {
                m->mothurOut( "failed to train SVM on labels " + toString(label0) + " and " + toString(label1) ); m->mothurOutEndLine();
                throw exception();
            }
            else {
                if ( outputFilter.debug() ) {
                    m->mothurOut( "trained SVM on labels " + label0 + " and " + label1 ); m->mothurOutEndLine();
                    m->mothurOut( " best mean score over " + toString(trainFoldCount) + " folds is " + toString(bestMeanScoreOnKFolds) ); m->mothurOutEndLine();
                    m->mothurOut( " best parameters for " + bestKernelFunctionKey + " kernel" ); m->mothurOutEndLine();
                    for ( ParameterMap::const_iterator p = bestParameterMap.begin(); p != bestParameterMap.end(); p++ ) {
                        m->mothurOut( " " + toString(p->first) + " : " + toString(p->second) ); m->mothurOutEndLine();
                    }
                }

                LabelMatchesEither labelMatchesEither(label0, label1);
                LabeledObservationVector twoClassDevelopmentObservations;
                remove_copy_if(
                    developmentObservations.begin(),
                    developmentObservations.end(),
                    back_inserter(twoClassDevelopmentObservations),
                    labelMatchesEither
                    //[&](const LabeledObservation& o){
                    //    return !((o.first == label0) || (o.first == label1));
                    //}
                );
                if (outputFilter.info()) {
                    m->mothurOut( "training final SVM with " + toString(twoClassDevelopmentObservations.size()) + " labeled observations" ); m->mothurOutEndLine();
                    for ( ParameterMap::const_iterator i = bestParameterMap.begin(); i != bestParameterMap.end(); i++ ) {
                        m->mothurOut( " " + toString(i->first) + ":" + toString(i->second) ); m->mothurOutEndLine();
                    }
                }

                KernelFunction& kernelFunction =
                    kernelFunctionFactory.getKernelFunctionForKey(bestKernelFunctionKey);
                kernelFunction.setParameters(bestParameterMap);
                smoTrainer.setParameters(bestParameterMap);
                KernelFunctionCache kernelFunctionCache(kernelFunction, svmDataset.getLabeledObservationVector());
                SVM* svm = smoTrainer.train(kernelFunctionCache, twoClassDevelopmentObservations);
                //cout << "done training final SVM" << endl;
                twoClassSvmList.push_back(svm);
                // return a performance summary using the evaluation dataset
                LabeledObservationVector twoClassEvaluationObservations;
                remove_copy_if(
                    evaluationObservations.begin(),
                    evaluationObservations.end(),
                    back_inserter(twoClassEvaluationObservations),
                    labelMatchesEither
                    //[&](const LabeledObservation& o){
                    //    return !((o.first == label0) || (o.first == label1));
                    //}
                );
                SvmPerformanceSummary p(*svm, twoClassEvaluationObservations);
                svmToSvmPerformanceSummary[svm->getLabelPair()] = p;
            }
        }

        MultiClassSVM* mc = new MultiClassSVM(twoClassSvmList, labelSet, svmToSvmPerformanceSummary, outputFilter);
        //double score = mc->score(evaluationObservations);
        mc->setAccuracy(evaluationObservations);
        if ( outputFilter.debug() ) {
            m->mothurOut( "fold " + toString(evaluationFoldNumber) + " multiclass SVM score: " + toString(mc->getAccuracy()) ); m->mothurOutEndLine();
        }
        if ( mc->getAccuracy() > bestMultiClassSvmScore ) {
            bestMc = mc;
            bestMultiClassSvmScore = mc->getAccuracy();
        }
        else {
            delete mc;
        }
    }

    if ( outputFilter.info() ) {
        m->mothurOut( "best multiclass SVM has score " + toString(bestMc->getAccuracy()) ); m->mothurOutEndLine();
    }

    //for ( SvmVector::iterator i = bestMc->getSvmList().begin(); i != bestMc->getSvmList().end(); i++ ) {
    //    SvmPerformanceSummary bestMc->getSvmPerformanceSummary(*i);
    //}

    return bestMc;
}

//SvmTrainingInterruptedException multiClassSvmTrainingInterruptedException("one-vs-one multiclass SVM training interrupted by user");

double OneVsOneMultiClassSvmTrainer::trainOnKFolds(SmoTrainer& smoTrainer, KernelFunctionCache& kernelFunction,
        KFoldLabeledObservationsDivider& kFoldLabeledObservationsDivider) {
    double meanScoreOverKFolds = 0.0;
    double online_mean_n = 0.0;
    double online_mean_score = 0.0;
    meanScoreOverKFolds = -1.0; // means we failed to train a SVM

    for ( kFoldLabeledObservationsDivider.start(); !kFoldLabeledObservationsDivider.end(); kFoldLabeledObservationsDivider.next() ) {
        const LabeledObservationVector& kthTwoClassTrainingFold = kFoldLabeledObservationsDivider.getTrainingData();
        const LabeledObservationVector& kthTwoClassTestingFold = kFoldLabeledObservationsDivider.getTestingData();
        if (outputFilter.info()) {
            m->mothurOut( "fold " + toString(kFoldLabeledObservationsDivider.getFoldNumber()) + " training data has " + toString(kthTwoClassTrainingFold.size()) + " labeled observations" ); m->mothurOutEndLine();
            m->mothurOut( "fold " + toString(kFoldLabeledObservationsDivider.getFoldNumber()) + " testing data has " + toString(kthTwoClassTestingFold.size()) + " labeled observations" ); m->mothurOutEndLine();
        }
        if (m->control_pressed) { return 0; }
        // if ( externalSvmTrainingInterruption.interruptTraining() ) {
        //     m->mothurOut( "training interrupted by user" ); m->mothurOutEndLine();
        //     throw multiClassSvmTrainingInterruptedException;
        // }
        else {
            try {
                // braces keep the end-of-line output inside the debug check
                if (outputFilter.debug()) { m->mothurOut( "begin training" ); m->mothurOutEndLine(); }
                SVM* evaluationSvm = smoTrainer.train(kernelFunction, kthTwoClassTrainingFold);
                SvmPerformanceSummary svmPerformanceSummary(*evaluationSvm, kthTwoClassTestingFold);
                double score = evaluationSvm->score(kthTwoClassTestingFold);
                //double score = svmPerformanceSummary.getAccuracy();
                if (outputFilter.debug()) {
                    m->mothurOut( "score on fold " + toString(kFoldLabeledObservationsDivider.getFoldNumber()) + " of test data is " + toString(score) ); m->mothurOutEndLine();
                    m->mothurOut( "positive label: " + toString(svmPerformanceSummary.getPositiveClassLabel()) ); m->mothurOutEndLine();
                    m->mothurOut( "negative label: " +
                        toString(svmPerformanceSummary.getNegativeClassLabel()) ); m->mothurOutEndLine();
                    m->mothurOut( " precision: " + toString(svmPerformanceSummary.getPrecision()) + " recall: " + toString(svmPerformanceSummary.getRecall()) + " f: " + toString(svmPerformanceSummary.getF()) + " accuracy: " + toString(svmPerformanceSummary.getAccuracy()) ); m->mothurOutEndLine();
                }
                online_mean_n += 1.0;
                double online_mean_delta = score - online_mean_score;
                online_mean_score += online_mean_delta / online_mean_n;
                meanScoreOverKFolds = online_mean_score;

                delete evaluationSvm;
            }
            catch ( exception& e ) {
                m->mothurOut( "exception: " + toString(e.what()) ); m->mothurOutEndLine();
                m->mothurOut( " on fold " + toString(kFoldLabeledObservationsDivider.getFoldNumber()) + " failed to train SVM with C = " + toString(smoTrainer.getC()) ); m->mothurOutEndLine();
            }
        }
    }
    if (outputFilter.debug()) {
        m->mothurOut( "done with cross validation on C = " + toString(smoTrainer.getC()) ); m->mothurOutEndLine();
        m->mothurOut( " mean score over " + toString(kFoldLabeledObservationsDivider.getFoldNumber()) + " folds is " + toString(meanScoreOverKFolds) ); m->mothurOutEndLine();
    }
    // the failure sentinel is -1.0, so report failure for any non-positive mean score
    if ( meanScoreOverKFolds <= 0.0 ) {
        m->mothurOut( "failed to train SVM with C = " + toString(smoTrainer.getC()) ); m->mothurOutEndLine();
    }
    return meanScoreOverKFolds;
}

class UnrankedFeature {
public:
    UnrankedFeature(const Feature& f) : feature(f), rankingCriterion(0.0) {}
    ~UnrankedFeature() {}

    Feature getFeature() const { return feature; }

    double getRankingCriterion() const { return rankingCriterion; }
    void setRankingCriterion(double rc) { rankingCriterion = rc; }

private:
    Feature feature;
    double rankingCriterion;
};

bool lessThanRankingCriterion(const UnrankedFeature& a, const UnrankedFeature& b) {
    return a.getRankingCriterion() < b.getRankingCriterion();
}

bool lessThanFeatureIndex(const UnrankedFeature& a, const UnrankedFeature& b) {
    return a.getFeature().getFeatureIndex() < b.getFeature().getFeatureIndex();
}

typedef list<UnrankedFeature>
    UnrankedFeatureList;


// Only the linear svm can be used here.
// Consider allowing only parameter ranges as arguments.
// Right now any kernel can be sent in.
// It would be useful to remove more than one feature at a time
// Might make sense to turn last two arguments into one
RankedFeatureList SvmRfe::getOrderedFeatureList(SvmDataset& svmDataset, OneVsOneMultiClassSvmTrainer& t, const ParameterRange& linearKernelConstantRange, const ParameterRange& smoTrainerParameterRange) {
    KernelParameterRangeMap rfeKernelParameterRangeMap;
    ParameterRangeMap linearParameterRangeMap;
    linearParameterRangeMap[SmoTrainer::MapKey_C] = smoTrainerParameterRange;
    linearParameterRangeMap[LinearKernelFunction::MapKey_Constant] = linearKernelConstantRange;

    rfeKernelParameterRangeMap[LinearKernelFunction::MapKey] = linearParameterRangeMap;

    /*
    UnrankedFeatureList unrankedFeatureList;
    //for ( FeatureVector::iterator f = svmDataset.getFeatureVector().begin(); f != svmDataset.getFeatureVector().end(); f++ ) {
    for (int featureIndex = 0; featureIndex < svmDataset.getFeatureVector().size(); featureIndex++) {
        Feature f = svmDataset.getFeatureVector().at(featureIndex);
        unrankedFeatureList.push_back(UnrankedFeature(f));
    }
    */
    // the rankedFeatureList is empty at first
    RankedFeatureList rankedFeatureList;
    // loop until all but one feature have been eliminated
    // no need to eliminate the last feature, after all
    int svmRfeRound = 0;
    //while ( rankedFeatureList.size() < (svmDataset.getFeatureVector().size()-1) ) {
    while ( svmDataset.getFeatureVector().size() > 1 ) {
        svmRfeRound++;
        m->mothurOut( "SVM-RFE round " + toString(svmRfeRound) + ":" ); m->mothurOutEndLine();
        UnrankedFeatureList unrankedFeatureList;
        for (int featureIndex = 0; featureIndex < svmDataset.getFeatureVector().size(); featureIndex++) {
            Feature f = svmDataset.getFeatureVector().at(featureIndex);
            unrankedFeatureList.push_back(UnrankedFeature(f));
        }
        m->mothurOut( toString(unrankedFeatureList.size()) + " unranked features" );
        m->mothurOutEndLine();
        MultiClassSVM* s = t.train(rfeKernelParameterRangeMap);
        m->mothurOut( "multiclass SVM accuracy: " + toString(s->getAccuracy()) ); m->mothurOutEndLine();

        m->mothurOut( "two-class SVM performance" ); m->mothurOutEndLine();
        int labelFieldWidth = 2 + max_element(s->getLabels().begin(), s->getLabels().end())->size();
        int performanceFieldWidth = 10;
        int performancePrecision = 3;
        // tab-separated header; the original "\f\a" were form-feed and bell escapes, not column separators
        m->mothurOut("class 1\tclass 2\tprecision\trecall\tf\taccuracy"); m->mothurOutEndLine();
        for ( SvmVector::const_iterator svm = s->getSvmList().begin(); svm != s->getSvmList().end(); svm++ ) {
            SvmPerformanceSummary sps = s->getSvmPerformanceSummary(**svm);
            // tab-separate the fields so they line up with the header
            m->mothurOut(toString(sps.getPositiveClassLabel()) + "\t" + toString(sps.getNegativeClassLabel()) + "\t" + toString(sps.getPrecision()) + "\t" + toString(sps.getRecall()) + "\t" + toString(sps.getF()) + "\t" + toString(sps.getAccuracy()) ); m->mothurOutEndLine();
            // m->mothurOut( setw(labelFieldWidth) + setprecision(performancePrecision) + sps.getPositiveClassLabel()
            //     + setw(labelFieldWidth) + setprecision(performancePrecision) + sps.getNegativeClassLabel()
            //     + setw(performanceFieldWidth) + setprecision(performancePrecision) + sps.getPrecision()
            //     + setw(performanceFieldWidth) + setprecision(performancePrecision) + sps.getRecall()
            //     + setw(performanceFieldWidth) + setprecision(performancePrecision) + sps.getF()
            //     + setw(performanceFieldWidth) + setprecision(performancePrecision) + sps.getAccuracy() ); m->mothurOutEndLine();
        }
        // calculate the 'ranking criterion' for each (remaining) feature using each binary svm
        for (UnrankedFeatureList::iterator f = unrankedFeatureList.begin(); f != unrankedFeatureList.end(); f++) {
            const int i = f->getFeature().getFeatureIndex();
            // rankingCriterion combines feature weights for feature i in all svms
            double rankingCriterion = 0.0;
            for ( SvmVector::const_iterator svm = s->getSvmList().begin(); svm != s->getSvmList().end(); svm++ ) {
                // output SVM performance summary
                // calculate the weight w of feature i for this svm
                double wi = 0.0;
                for (int j = 0; j < (*svm)->x.size(); j++) {
                    // all support vectors contribute to wi
                    wi += (*svm)->a.at(j) * (*svm)->y.at(j) * (*svm)->x.at(j).second->at(i);
                }
                // accumulate weights for feature i from all svms
                rankingCriterion += pow(wi, 2);
            }
            // update the (unranked) feature ranking criterion
            f->setRankingCriterion(rankingCriterion);
        }
        delete s;

        // sort the unranked features by ranking criterion
        unrankedFeatureList.sort(lessThanRankingCriterion);

        // eliminate the bottom 1/(n+1) features - this is very slow but gives good results
        ////int eliminateFeatureCount = ceil(unrankedFeatureList.size() / (iterationCount+1.0));
        // eliminate the bottom 1/3 features - fast but results slightly different from above
        // how about 1/4?
        int eliminateFeatureCount = ceil(unrankedFeatureList.size() / 4.0);
        m->mothurOut( "eliminating " + toString(eliminateFeatureCount) + " feature(s) of " + toString(unrankedFeatureList.size()) + " total features"); m->mothurOutEndLine();
        m->mothurOutEndLine();
        UnrankedFeatureList featuresToEliminate;
        for ( int i = 0; i < eliminateFeatureCount; i++ ) {
            // remove the lowest ranked feature(s) from the list of unranked features
            UnrankedFeature unrankedFeature = unrankedFeatureList.front();
            unrankedFeatureList.pop_front();

            featuresToEliminate.push_back(unrankedFeature);
            // put the lowest ranked feature at the front of the list of ranked features
            // the first feature to be eliminated will be at the back of this list
            // the last feature to be eliminated will be at the front of this list
            rankedFeatureList.push_front(RankedFeature(unrankedFeature.getFeature(), svmRfeRound));

            /*
            // TODO speed things up by removing the feature completely???
            const int unrankedFeatureIndex = unrankedFeature.getFeature().getFeatureIndex();
            for (LabeledObservationVector::iterator v = svmDataset.getLabeledObservationVector().begin(); v != svmDataset.getLabeledObservationVector().end(); v++) {
                v->second->at(unrankedFeatureIndex) = 0.0;
            }
            */
        }
        featuresToEliminate.sort(lessThanFeatureIndex);
        reverse(featuresToEliminate.begin(), featuresToEliminate.end());
        for (UnrankedFeatureList::iterator g = featuresToEliminate.begin(); g != featuresToEliminate.end(); g++) {
            Feature unrankedFeature = g->getFeature();
            removeFeature(unrankedFeature, svmDataset.getLabeledObservationVector(), svmDataset.getFeatureVector());
        }
        //cout << "remaining unranked features:" << endl;
        //for (UnrankedFeatureList::iterator g = unrankedFeatureList.begin(); g != unrankedFeatureList.end(); g++) {
        //    cout << " feature " << g->getFeature().getFeatureLabel() << " with index " << g->getFeature().getFeatureIndex() << endl;
        //}
        // end of experiment
    }

    // there may be one feature left
    svmRfeRound++;
    //cout << unrankedFeatureList.size() << " feature(s) remain" << endl;
    for ( FeatureVector::iterator f = svmDataset.getFeatureVector().begin(); f != svmDataset.getFeatureVector().end(); f++ ) {
        rankedFeatureList.push_front(RankedFeature(*f, svmRfeRound));
    }
    return rankedFeatureList;
}
mothur-1.36.1/source/svm/svm.hpp000066400000000000000000001115301255543666200165620ustar00rootroot00000000000000
//
// svm.hpp
// support vector machine
//
// Created by Joshua Lynch on 6/19/2013.
// Copyright (c) 2013 Schloss Lab. All rights reserved.
//

#ifndef svm_hpp_
#define svm_hpp_

#include #include #include #include #include #include #include #include #include #include
#include "mothurout.h"

// For the purpose of training a support vector machine
// we need to calculate a dot product between two feature
// vectors.
// In general these feature vectors are not
// restricted to lists of doubles, but in this implementation
// feature vectors (or 'observations' as they will be called from here on)
// will be vectors of doubles.
typedef vector<double> Observation;

/*
class Observation {
public:
    Observation() {}
    ~Observation() {}

private:
    vector<double> obs;
};
*/

// A dataset is a collection of labeled observations.
// The ObservationVector typedef is a vector
// of pointers to Observations. Pointers are used here since
// datasets will be rearranged many times during cross validation.
// Using pointers to Observations makes copying the elements of
// an ObservationVector cheap.
typedef vector<Observation*> ObservationVector;

// Training a support vector machine requires labeled data. The
// Label typedef defines what will constitute a class 'label' in
// this implementation.
typedef string Label;
typedef vector